

 
 
 
 
 
 
 
 
 
 
 
 
 
 

ISBN: 978-989-97292-0-9

Title: EuroITV 2011 - Adjunct Proceedings

Editors: Damásio, Manuel José; Cardoso, Gustavo; Quico, Célia; Geerts, David

Date: 20110401

Publisher: COFAC / Universidade Lusófona de Humanidades e Tecnologias



PREFACE 



It is with great pleasure that, on behalf of Universidade Lusófona de Humanidades e Tecnologias / CICANT – Center for Research in Applied Communications, Culture and New Technologies and LINI - Lisbon Internet and Networks Institute, we welcome you to our institutions and the city of Lisbon. Our institutions joined this event convinced of its relevance for reflection on, and development of, the state of the art in digital television and associated applications. The evolution of television and media has been a central topic of research and training at our institutions, and it is our belief that an organization like EuroITV, with the high scientific quality that characterizes it, makes an undeniable contribution to the advancement of knowledge in this field and to the shaping of a community dynamic around this topic.
 


Dear EuroITV 2011 participants,

EuroITV is increasingly competitive in terms of paper submissions and acceptance rate. This year, we received approximately 200 paper submissions relevant for the main and adjunct proceedings of this conference from 24 countries. Through a peer-review process carried out by the program committee and experts, we accepted for the adjunct proceedings 6 demos, 8 doctoral consortium proposals, 8 posters, 12 industrial case study papers and 24 workshop papers. We believe we have created a content-rich, high-quality technical program spanning three full days and covering four different tracks that represent the activities in the TV research area. Presentations include keynote speeches by Jonathan Taplin from University of Southern California, "Long Time Coming; Has Interactive TV Finally Arrived?", Fernando Pereira from Instituto Superior Técnico, "Visual Compression: the Foundational Technology for Better TV Experiences", and Al Kovalick from AVID, "The Media Cloud and its Future".
 


We would like to thank the authors for providing the content of the program and all members of the organizing committee for their dedication to the success of EuroITV2011 and timely review of the submissions. We would like to acknowledge AVID, Caixa Geral de Depósitos, Fundação Gulbenkian, Fundação para a Ciência e Tecnologia, Instituto de Cinema e Audiovisual, Gabinete para os Meios de Comunicação Social, Fundação Luso-Americana, Abreu and ACM for their sponsorship and support. We also thank our media partners, RTP and the International Journal of Digital Television. Finally, we are grateful to our colleagues at Universidade Lusófona and LINI who worked on the organization of this event. It is therefore with great pleasure that we collaborate with EuroITV and wish you a fruitful event, one that is not exhausted in the days of the conference but opens up spaces and opportunities for new collaborations and scientific dialogues.
 



Manuel José Damásio
EUROITV’11 General Chair
Universidade Lusófona, PT

Gustavo Cardoso
EUROITV’11 General Chair
LINI, PT

Célia Quico
EUROITV’11 Program Chair
Universidade Lusófona, PT

David Geerts
EUROITV’11 Program Chair
K.U. Leuven, BE


TABLE OF CONTENTS

PREFACE

TABLE OF CONTENTS

Keynotes

Demos
  N-Screen Live Baseball Game Watching System: Novel Interaction Concepts within a Public Setting
  Extraction of Contextual Web Information from TV Video
  Automatic Measurement of Play-out Differences for Social TV, Interactive TV, Gaming and Inter-destination Synchronization
  User Interface Toolkit for Ubiquitous TV
  Demo: using speech recognition for in situ video tagging
  Value-added services and identification system: an approach to elderly viewers

Doctoral Consortium
  Research for Development of Value Added Services for Connected TV
  Collaboration in Broadcast Media and Content
  Televisual Leisure Experiences of Different Generations of Basque Speakers
  Mobile TV: Towards a Theory for Mobile Television
  Enhancing and Evaluating the User Experience of Interactive TV Systems and their Interaction Techniques
  Subjective Quality Assessment of Free Viewpoint Video Objects
  Allocation Algorithms for Interactive TV Advertisements
  Video access and interaction based on emotions

Posters
  Online iTV use by older people: preliminary findings of a rapid ethnographical study
  Multipleye – Concurrent Information Delivery on Public Displays
  Older Adults and Digital Interactive Television: Use of a Wii Controller
  Predicting Where, When and What People Will Watch on TV Based on their Past Viewing History
  Unusual Co-Production: Online Co-Creation in Cross Media Format Development
  Trendy Episode Detection at a Very Short Time Granularity for Intelligent VOD Service: A Case Study of Live Baseball Game
  Spatial Tiling And Streaming in an Immersive Media Delivery Network
  Inclusion of multiple devices, languages and advertisements in iFanzy, a personalized EPG

ITV in Industry
  Convergence of Televised Content and Game
  heckle.at TV
  Let the audience direct
  RendezVous
  Set top, over the top, future!
  Smart TV and how to do it right
  Innovating Usability
  The ubiquitous remote control
  Content plus Interactivity as a Key Differentiator
  ITV strategy: The use of Direct and Indirect Communication as a strategy for the creation of interactive scripts
  Ambient Media Ecosystems for TV - A forecast 2013

Tutorials
  Designing and Evaluating Social Video and Television
  How to investigate the Quality of User Experience for Ubiquitous TV?
  Deploying Social TV: Content, Connectivity, and Communication

Workshops

Workshop 1: “Quality of Experience for Multimedia Content Sharing: Ubiquitous QoE Assessment and Support”
  Quality of Experience of Multimedia Services: Past, Present, and Future
  Internet TV Architecture Based on Scalable Video Coding
  Adaptive testing for video quality assessment
  Aligning subjective tests using a low costs common set
  Impact of Reduced Quality Encoding on Object Identification in Stereoscopic Video
  Impact of Disturbance Locations on Video Quality of Experience

Workshop 2: “FutureTV-2011: Making Television Personal & Social”
  Analysis of the Information Value of User Connections for Video Recommendations in a Social Network
  Employing User-Assigned Tags to Provide Personalized as well as Collaborative TV Recommendations
  Social and Interactive TV: An Outside-In Approach
  Analyzing Twitter for Social TV: Sentiment Extraction for Sports
  OurTV: Creating Media Contents Communities through Real-World Interactions
  ITV services for socializing in public places
  Ubeel: Generating Local Narratives for Public Displays from Tagged and Annotated Video Content
  Hybrid algorithms for recommending new items in personal TV
  Mining Knowledge TV: A Proposal for Data Integration in the Knowledge TV Environment

Workshop 3: “Interactive Digital TV in Emergent Economies”
  GEmPTV: Ginga-NCL Emulator for Portable Digital TV
  Business Process Modeling in UML for Interactive Digital Television
  Guidelines for the content production of t-learning
  Evaluation of an interactive TV service to reinforce dental hygiene education in children
  Experiences in Designing and Implementing an Extension API to Converge iDTV and Home Gateway
  An Approach for Content Personalization of Context-Sensitive Interactive TV Applications
  A Framework Architecture for Digital Games to the Brazilian Digital Television

EuroITV 2011 - Organizing Committee


Keynotes

Jonathan Taplin (Annenberg Innovation Lab - University of Southern California)

Jonathan Taplin is a Professor at the Annenberg School for Communication at the University of Southern California. Taplin is the Managing Director of the USC Annenberg Innovation Lab (http://www.annenberglab.com/) and also blogs at http://jontaplin.com/, about which Cory Doctorow of Boing Boing said, "Taplin's blog is as eclectic as he is, a straight-up analysis blog that rips into the headlines, illuminating everything from economic news to the writers' strike to heavy weather to democratic politics."
 


Taplin's areas of specialization are international communication management and the field of digital media entertainment. Taplin began his entertainment career in 1969 as Tour Manager for Bob Dylan and The Band. In 1973 he produced Martin Scorsese's first feature film, Mean Streets, which was selected for the Cannes Film Festival. Between 1974 and 1996, Taplin produced 26 hours of television documentaries (including The Prize and Cadillac Desert for PBS) and 12 feature films including The Last Waltz, Until The End of the World, Under Fire and To Die For. His films were nominated for Oscar and Golden Globe awards and chosen for the Cannes Film Festival seven times.


In 1984 Taplin acted as the investment advisor to the Bass Brothers in their successful attempt to save Walt Disney Studios from a corporate raid. This experience brought him to Merrill Lynch, where he served as vice president of media mergers and acquisitions. In this role, he helped re-engineer the media landscape on transactions such as the leveraged buyout of Viacom. Taplin was a founder of Intertainer and has served as its Chairman and CEO since June 1996. Intertainer was the pioneer video-on-demand company for both cable and broadband Internet markets. Taplin holds two patents for video on demand technologies. Professor Taplin has provided consulting services on Broadband technology to the President of Portugal and the Parliament of the Spanish state of Catalonia. In May of 2010 he was appointed Managing Director of the Annenberg Innovation Lab.


Mr. Taplin graduated from Princeton University. He is a member of the Academy Of Motion Picture Arts and Sciences and sits on the International Advisory Board of the Singapore Media Authority and the Board of Directors of Public Knowledge. Mr. Taplin was appointed by Governor Arnold Schwarzenegger to the California Broadband Task Force in January of 2007.


Fernando Pereira (Instituto Superior Técnico, Lisboa/Portugal)

Fernando Pereira is currently with the Electrical and Computers Engineering Department of Instituto Superior Técnico and with Instituto de Telecomunicações, Lisbon, Portugal (http://www.img.lx.it.pt/~fp/). He is responsible for the participation of IST in many national and international research projects. He often acts as project evaluator and auditor for various organizations.


He is an Area Editor of the Signal Processing: Image Communication Journal, a member of the Editorial Board of the Signal Processing Magazine, and is or has been an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, and IEEE Signal Processing Magazine. He is or has been a member of the IEEE Signal Processing Society Technical Committees on Image, Video and Multidimensional Signal Processing, and Multimedia Signal Processing, and of the IEEE Circuits and Systems Society Technical Committees on Visual Signal Processing and Communications, and Multimedia Systems and Applications. He was an IEEE Distinguished Lecturer in 2005 and elected as an IEEE Fellow in 2008.


He is or has been a member of the Scientific and Program Committees of many international conferences. He was the General Chair of the Picture Coding Symposium (PCS) in 2007 and the Technical Program Co-Chair of the Int. Conference on Image Processing (ICIP) in 2010. He has been participating in the MPEG standardization activities, notably as the head of the Portuguese delegation, chairman of the MPEG Requirements Group, and chairman of many Ad Hoc Groups related to the MPEG-4 and MPEG-7 standards. He is a co-editor of ‘The MPEG-4 Book’ and ‘The MPEG-21 Book’, which are reference books in their topics.


He won the first Portuguese IBM Scientific Award in 1990, an “ISO award for Outstanding Technical Contribution” for his contributions to the MPEG-4 Visual Standard in 1998 and an Honour Mention of the UTL/Santander Totta Award for Electrotechnical Engineering in 2009. He has contributed more than 200 papers in international journals, conferences and workshops, and given several dozen invited talks at conferences and workshops. His areas of interest are video analysis, coding, description and adaptation, and advanced multimedia services.


Al Kovalick (AVID, U.S.A.)

Al Kovalick has worked in the field of hybrid AV/IT systems for the past 18 years. Previously, he was a digital systems designer and technical strategist for Hewlett-Packard. Following HP, from 1999 to 2004, he was the CTO of the Broadcast Products Division at Pinnacle Systems. Currently, he is with Avid and serves as an Enterprise Strategist and Fellow.

Al is an active speaker, educator, author and participant with industry bodies including SMPTE and AMWA. He has presented over 50 papers at industry conferences worldwide and holds 18 US and foreign patents. He is the author of the book “Video Systems in an IT Environment; The Basics of Networked Media and File-Based Workflows” (2nd edition, 2009). Al was awarded the SMPTE David Sarnoff Gold Medal in 2009. He has a BSEE degree from San Jose State University and an MSEE degree from the University of California at Berkeley. He is a life member of Tau Beta Pi, an IEEE member and a SMPTE Fellow.

Demos


N-Screen Live Baseball Game Watching System: Novel Interaction Concepts within a Public Setting

Hogun Park, Geun Young Lee, Dongmahn Seo, Sun-Bum Youn, Suhyun Kim, Heedong Ko

Imaging Media Research Center, Korea Institute of Science and Technology (KIST), Seoul, Korea {hogun, gylee, sarum, dmonkey, suhyun.kim, ko}@imrc.kist.re.kr

ABSTRACT

Recently, as social media has entered the interactive TV domain, many studies have attempted to provide better emotional engagement and satisfaction. However, these approaches are still limited in how they use different types of screens and support social collaboration within a public setting. For example, when people are watching TV together, the TV is not a suitable place for a personal chat, and a mobile phone is too small to support every sharing activity. In order to provide a seamless social experience across any connected screens, we present an N-Screen-based collaborative baseball watching system. It provides user engagement interfaces within a public setting and novel N-Screen interaction concepts.

Categories and Subject Descriptors

H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems–Video; H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces

General Terms

Design

Keywords

Social TV, N-Screen, Live Sports Game

1. INTRODUCTION

As social media research has entered the interactive TV domain, many studies have tried to provide a more socialized experience on TV. However, most existing social TV platforms aim to connect viewers with their friends and families by providing a virtual shared space [1][2]. Even if their communication is more socialized around TV content, they are still limited in how they use different types of screens and support social collaboration. In particular, a number of communities and corresponding individual viewpoints are derived from live events such as a baseball game or a musical performance. Depending on the interests of co-viewers, some parts of the content are worth sharing, but others are not. To facilitate this social watching activity, it is necessary to provide a seamless experience across the screens of participants within a public setting. In this paper, we present an N-Screen collaborative sports watching system. In this system, a public display such as a TV constitutes a novel global communication medium. It connects all surrounding displays to enable ubiquitous and cooperative watching. We make the following contributions: (1) user engagement interfaces within a public setting, and (2) new N-Screen interaction concepts and their implementation for baseball game watching. For evaluation, we implemented a live baseball game watching system on an Android set-top box, an Android mobile phone application, and a PC. In this paper, we present an overview of our proposed system (Section 2) and details of the N-Screen-based watching system interfaces (Section 3).

2. SYSTEM OVERVIEW


Figure 1. Overall Framework of Proposed System

The system framework of our proposed approach is illustrated in Figure 1. First, every user needs to register their personal displays, such as mobile phones, with one of the nearest public displays. The public display serves as a medium to support group-based watching activities, and any user can initiate sharing activities through the public display. This system newly introduces the idea that one of the participants can become a leader, a so-called media jockey (Section 3.2.2), who can act as a media director and producer. In our engagement interface, the media jockey takes a proactive role in organizing all information and providing intermediate responses to a live event and to co-viewers. In other words, a group of users, including a media jockey and viewers, creates its own social watching community, and they collaboratively organize a new broadcasting stream. The community can be a group of friends or supporters and need not be in the same place. Our P2P streaming system supports transmitting and receiving multiple live video streams, and the media jockey plays a leading role in creating a broadcasting channel by organizing and selecting some of them.

In our demonstration, a TV tuner, 3 HD cameras, and 3 spherical cameras were installed at a baseball stadium, and real-time streaming with guaranteed synchronization [5] and low media zapping time [3] (<0.5 sec) was accomplished. Furthermore, microblogs corresponding to a live event are analyzed to extract summary keywords, and the system generates highlight video clips by associating the time spans of keywords with recorded videos. On the TV, the keyword summary and available video clips are displayed at the bottom of the screen when a new issue occurs.

3. INTERFACE OF N-SCREEN LIVE BASEBALL WATCHING SYSTEM

3.1 Public display


Figure 2. Public Display Interfaces

The public display serves as a ubiquitous medium that supports watching within a public setting. It provides supporting clues for communicating and understanding emerging issues of a game and the activities of members. The key design criterion is that users should not be distracted too frequently. In particular, to provide highlight videos and keywords, we designed an algorithm [4] that determines an appropriate interaction time by detecting a bursty period with keywords. It proceeds in several steps. First, social media feeds corresponding to a live baseball game are segmented into fixed time spans for extracting the most representative keywords. To obtain a keyword, it makes use of a parameter-free bursty keyword detection algorithm. It then associates the time spans containing bursty keywords with recorded video and notifies viewers on the screen, as in Figure 2. The figure also shows screenshots of the ranking in the prediction-based game, the keyword summary, and video sharing (in this case, a panorama video).
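The steps described above (segmenting the feed into fixed time spans, finding bursty keywords, and mapping them back to the recording) can be sketched roughly as follows. This is only an illustration under stated assumptions: the parameter-free detector of [4] is not specified here, so a simple frequency-ratio rule stands in for it, and the function names, the 60-second span, and the `ratio` and `min_count` thresholds are all hypothetical.

```python
from collections import Counter, defaultdict

def segment_feed(posts, span_sec=60):
    """Group (timestamp_sec, text) posts into fixed-length time spans."""
    spans = defaultdict(list)
    for ts, text in posts:
        spans[int(ts // span_sec)].append(text)
    return spans

def bursty_keywords(spans, ratio=3.0, min_count=3):
    """Return {span_index: [keywords]} for words whose count in a span is
    at least `ratio` times their mean count over all spans.
    NOTE: stand-in heuristic, not the parameter-free algorithm of [4]."""
    counts = {i: Counter(w for t in texts for w in t.lower().split())
              for i, texts in spans.items()}
    n_spans = len(counts)
    total = Counter()
    for c in counts.values():
        total.update(c)
    result = {}
    for i, c in counts.items():
        hits = sorted(w for w, k in c.items()
                      if k >= min_count and k >= ratio * total[w] / n_spans)
        if hits:
            result[i] = hits
    return result

def clip_ranges(bursts, span_sec=60):
    """Map bursty spans to (start_sec, end_sec) ranges in the recording."""
    return {i: (i * span_sec, (i + 1) * span_sec) for i in bursts}

posts = [(5, "nice pitch"), (70, "homerun homerun"), (75, "homerun wow"),
         (80, "what a homerun"), (130, "nice game"), (190, "nice inning")]
bursts = bursty_keywords(segment_feed(posts))
ranges = clip_ranges(bursts)
```

In this toy feed, only the second minute contains a word that is both frequent and concentrated, so only that span would be offered as a highlight clip. A real deployment would also need tokenization suited to the language of the feed.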

3.2 Personal Display

3.2.1 Mobile Phone


Figure 3. Mobile Phone Interfaces

The mobile phone allows participants to watch a live game and compete against their friends using the provided polling and voting interfaces. As shown in Figure 3, users can suggest a vote topic and see the result graphically. Participants can also request a different video angle or navigate the live panorama video using our multi-touch interface. These activities are shared on the public display, giving participants a more engaging and immersive experience.



3.2.2 Media Jockey

One of the community members can take the role of media jockey. In this system, the media jockey mediates interaction within the group, recommends angles of interest, and organizes the public displays. In other words, the media jockey can monitor a list of emerging videos and all social activities within the group, and can thus become a content programmer who generates a new channel. The media jockey concept is intended to encourage proactive participation and provide better awareness of co-viewers in an N-Screen environment.


Figure 4. Media Jockey Interfaces

4. DISCUSSION AND CONCLUSION

This paper proposed an N-Screen live baseball game watching system that provides a seamless social experience across any connected platform. It supports public and personal display interfaces with corresponding dynamic resource and interaction management. To evaluate the system, we demonstrated it with the 2010 championship series of the Korea baseball league. It provides a novel experience of a live baseball game, and the given information and media are organized in a more intuitive way. In future work, more immersive temporal and spatial metaphors will be incorporated in order to organize all information and media. We believe that a virtual world or spherical video can be a good starting point for this development.

5. ACKNOWLEDGMENTS

This research is supported by the Korea Institute of Science and Technology under the "Development of Tangible Web Platform" project.

6. REFERENCES

[1] Mate, S. and Curcio, D. D. I. 2010. Mobile and interactive social television. IEEE Communications Magazine, Vol. 47, No. 12, 116-122.

[2] Nathan, M., Harrison, C., Yarosh, S., Terveen, L., Stead, L., and Amento, B. 2008. CollaboraTV: making television viewing social again. In Proc. of UXTV 2008, 85-94.

[3] Joo, H., Song, H., Lee, D.B., and Lee, I. 2008. An Effective IPTV Channel Control Algorithm Considering Channel Zapping Time and Network Utilization. IEEE Transactions on Broadcasting, Vol. 54, No. 2, 208-216.

[4] Park, H., Youn, S.B., Lee, G.Y., and Ko, H. 2011. Trendy Episode Detection at a Very Short Time Granularity for Intelligent VOD Service: A Case Study of Live Baseball Game. In Proc. of EuroITV 2011.

[5] Seo, D., Kim, S., and Song, G. 2010. SyncStream: Synchronized Media Streaming System in a Peer-to-Peer Environment. In Proc. of HumanCom 2010, 1-5.

Extraction of Contextual Web Information from TV Video

T. Chattopadhyay

Innovation lab, Kolkata Saltlake Electronics Complex Tata Consultancy Services t.chattopadhyay@tcs.com

Aniruddha Sinha

Innovation lab, Kolkata Saltlake Electronics Complex Tata Consultancy Services aniruddha.sinha@tcs.com

Avik Ghose

Innovation lab, Kolkata Saltlake Electronics Complex Tata Consultancy Services avik.ghose@tcs.org

Provat Biswas

Innovation lab, Kolkata Saltlake Electronics Complex Tata Consultancy Services provat.biswas@tcs.org

Arpan Pal

Innovation lab, Kolkata Saltlake Electronics Complex Tata Consultancy Services arpan.pal@tcs.org

ABSTRACT

This demo extracts contextual web information from TV video and renders it on the same TV or on a second screen such as an iPad or iPhone. The objective of this demo is to present an emulation that depicts the functionalities of the STB and TV in a PC-based environment.

The Connected TV can be described as an Internet-enabled TV, via either a set-top box or some other technology. Recent market trends in consumer electronics show that consumers are increasingly demanding Internet-connected TVs: sales of such products rose to $1 billion in the second quarter of 2009, compared to $776 million in the first quarter of the same year. We have a Home Infotainment Platform (HIP) that supports Internet and TV simultaneously. The input to the HIP is analog video, either through composite (CVBS) or RF cable. The output of the HIP is composite video in which the input TV signal is blended with graphics and Internet content. In this demo we propose a novel system for connected TV that can recognize the context of TV news videos and obtain related information from the Internet or RSS feeds. That information can be pushed to a second screen, such as a handheld mobile device like an iPhone. We can demonstrate the system by recognizing the context of a recorded video on a laptop and pushing it as a text stream to an iPad or iPhone adjacent to it.

General Terms

Demo

Keywords

Demo, OCR, Video OCR, Keyword spotting

1. HIGH-LEVEL DESIGN OF THE PROPOSED TOOL

The proposed system consists of the following items:

• Analog TV signal with video and audio
• Extraction of breaking news from the news videos
• Finding the keywords from the breaking news
• Browser-based application to fetch information related to the keywords from the Internet or RSS feeds
• Blended display module for mashing up the Internet information with the TV video, or pushing it as a text stream to an iPad or iPhone

2. MERITS

• This demo can send relevant information to a second screen that may be of interest to the person watching the TV.
• In India, as per a report from TRAI, 93% of TV users are still using an analog RF cable feed. The proposed system is also compatible with this input.
• The proposed tool is an end-to-end system that can bridge the gap between the technology and the market for TV web mashups with contextual information.
• It can localize the news area and breaking news automatically.

3. NOVELTY

The existing connected TV services are more or less restricted to movie purchase and rental, TV show purchase, access to YouTube, music services and media on the home network. Moreover, the Internet mashups are primarily based on a predefined set of RSS feeds. The proposed solution enhances the user experience by deriving the context of the viewed channel and then fetching the related information to provide a ubiquitous experience of mashups.



Figure 1: Overview of the Technical steps involved

4. APPLICATIONS

• Connected TV with contextual Internet mashups
• Interactive advertisements and purchase: this involves fetching information from the Internet that is relevant to a particular advertisement, allowing viewers to review and purchase the products online.

5. TECHNICAL DETAILS

The overview is depicted in Figure 1. The demo uses the following technical modules:

• Text localization from the streamed video
• Preprocessing the text region prior to Optical Character Recognition (OCR)
• Recognition of the characters from the video
• Removal of false positives
• Spotting the breaking news from the text streams
• Keyword spotting
• Efficient search on the Internet
• Pushing the retrieved information to an iPad or iPhone, or showing it as blended text over the same video
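The later modules in this list (false-positive removal, breaking-news spotting and keyword spotting) can be illustrated with a small sketch that operates on text already produced by OCR. The source does not describe how these modules work internally, so everything here is an assumption made for illustration: the temporal-persistence rule for discarding false positives, the "BREAKING NEWS" marker, the tiny stopword list, and all function names.

```python
from collections import Counter

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "for", "at", "is"}

def stable_lines(frames, min_frames=3):
    """Keep OCR lines seen in at least `min_frames` frames; strings that
    appear only briefly are treated as false positives (assumed heuristic)."""
    seen = Counter()
    for frame in frames:
        # Normalize and de-duplicate within a frame before counting.
        seen.update({line.strip().upper() for line in frame})
    return sorted(line for line, n in seen.items() if n >= min_frames)

def spot_breaking_news(lines, marker="BREAKING NEWS"):
    """Return the text that follows the (assumed) ticker marker."""
    return [l[len(marker):].strip(" :-") for l in lines if l.startswith(marker)]

def spot_keywords(headline):
    """Drop stopwords and non-alphabetic tokens from a headline."""
    return [w for w in headline.lower().split()
            if w.isalpha() and w not in STOPWORDS]

# Each inner list holds the OCR output of one video frame.
frames = [["BREAKING NEWS: Flood in Kolkata", "x7#"],
          ["Breaking News: Flood in Kolkata"],
          ["BREAKING NEWS: Flood in Kolkata", "q!"]]
headlines = spot_breaking_news(stable_lines(frames))
keywords = spot_keywords(headlines[0])
```

The resulting keyword list is what a search module would take as its query. Filtering by persistence across frames is one plausible design choice because ticker text stays on screen far longer than spurious OCR hits.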

6. TECHNICAL REQUIREMENT

The following systems are required to show the demo:

• Laptop: to store the recorded video and process the video
• Wireless Internet connectivity to search the Internet and fetch related information
• iPad or iPhone as second screen to show the related information

7. DEMO SCENARIO

The demo scenario is as follows:

• The laptop opens the stored TV content and extracts the contextual information from the news video using the approach described above. It represents the context in XML format.
• The user launches the application (app) on the second screen when there is an intent to have augmented information on the broadcast content.
• The app requests the content (in XML format) from the connected TV over either an ad-hoc Wi-Fi connection or a Bluetooth pairing.
• The gadget then connects to the Internet and requests augmented information based on the class of context detected.
• Extracted keywords are used as inputs to Web 2.0 search and news APIs provided by web-site owners like Google, Yahoo etc.
• This augmented information is then displayed on the second screen with real-time context.

The apps have been implemented on iPhone, iPad and Android tablet (Galaxy Tab).
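As a rough illustration of the XML exchange in this scenario, the sketch below builds a context document on the laptop side and parses it on the second-screen side, using Python's standard `xml.etree.ElementTree`. The source does not give the actual schema, so the element and attribute names (`context`, `channel`, `class`, `keyword`) and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_context_xml(channel, context_class, keywords):
    """Serialize the detected context as XML (hypothetical schema)."""
    root = ET.Element("context", {"channel": channel, "class": context_class})
    for kw in keywords:
        ET.SubElement(root, "keyword").text = kw
    return ET.tostring(root, encoding="unicode")

def parse_context_xml(xml_text):
    """Recover the context class and keyword list on the second screen."""
    root = ET.fromstring(xml_text)
    return root.get("class"), [k.text for k in root.findall("keyword")]

xml_text = build_context_xml("news-1", "breaking_news", ["flood", "kolkata"])
context_class, keywords = parse_context_xml(xml_text)
```

The second-screen app would then use the parsed class and keywords to decide which search or news service to query.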

8. SCREEN SHOTS

Here we show the video displayed on the connected TV (Fig. 2); in this demo the laptop simulates the connected TV. Screenshots of the second screen, namely the iPad and the iPhone, are shown in Figure 3 and Figure 4 respectively. These second screens are used to display the information related to the video on the connected TV.

Figure 2: News Video Displayed in the Connected TV

Figure 3: Related information in the iPad

Figure 4: Related information in the iPhone

Automatic Measurement of Play-out Differences for Social TV, Interactive TV, Gaming and Inter-destination Synchronization

R.N. Mekuria

TNO P.O. Box 5050 2600 GB Delft, Netherlands +31 6446 66987

roefi20@gmail.com

H.M. Stokking

TNO P.O. Box 5050 2600 GB Delft, Netherlands +31 6516 08646

hans.stokking@tno.nl

Dr. M.O. van Deventer

TNO P.O. Box 5050 2600 GB Delft, Netherlands +31 651 914 918

oskar.vandeventer@tno.nl

ABSTRACT

Inter-destination media (play-out) synchronization for social TV has gained attention from research and industry in recent years. Applications include social TV and interactive game shows. To motivate further research into inter-destination synchronization technologies, pilot measurements of play-out differences in different TV-broadcasting systems are desirable. However, to our knowledge, no broadly applicable measurement system exists. This paper fills this gap by presenting and implementing a robust system that can detect constant or slowly varying differences in media play-out between different receivers without accessing receiver hardware or the network. The measurements are relevant in various use cases such as football watching, social TV, interactive game shows around TV content, TV input lag comparison for gaming and validation of current inter-destination synchronization technologies. The robustness of the system is demonstrated in a working prototype.

Categories and Subject Descriptors

H.5.1 [Multimedia Information Systems]: Video – broadcasting, media synchronization.

General Terms

Measurement, Performance.

Keywords

Inter-destination media synchronization, Social-TV, video delay, input-lag, broadcast-tv, gaming

1. INTRODUCTION

Digital TV broadcasting and video technologies have improved visual image quality, support different devices such as TVs, PCs and smart phones, and enable interactivity between consumers of content. Users can watch video or TV and simultaneously use text or voice chat, obtaining a social-TV experience. However, because different digital broadcasting techniques and schemes have different processing delays, play-out differences between receivers occur. An interactive experience such as social TV can be disrupted by these play-out differences. This was noticed by Shamma et al. [1], who proposed play-out synchronization to enhance social-TV watching. Companies have also started to offer solutions for watching video together in a synchronized way, e.g. Watchitoo, clipsync.com and BBC iPlayer. Web sites like YouTube Social and Synchtube enable synchronized playback of YouTube videos while watching with Facebook friends. Standards on inter-destination media synchronization for IPTV and web broadcasting have also been introduced recently [2] [3]. Techniques for fixing play-out differences in networks are an active topic of research, of which a survey is given in [4]. Play-out differences in TV broadcasting became noticeable to consumers in the Netherlands during the 2010 FIFA World Cup. Consumers subscribed to different digital broadcast technologies saw goals at different times, spoiling the experience of consumers lagging behind. Play-out differences can also cause unfairness in an interactive quiz show around TV content, where users answer questions by phone or internet to compete with other viewers. When consumers and broadcasters become more aware of the play-out difference issue, measurements will become relevant to them as an indicator of the quality of service. This paper presents a tool that can accurately and automatically measure play-out differences (inter-destination synchronization) between receivers at a similar location, relevant to each of the use cases described in this section.

2. EXISTING PLAY-OUT DIFFERENCE MEASUREMENT METHODS AND THEIR LIMITATIONS

Traditionally, inter-destination media (play-out) synchronization techniques have been applied in video conferencing applications to enhance the collaborative work experience. Play-out differences need to be measured in a test setup to measure and compare the performance of these techniques. Nunome and Tasaka performed such a validation study [5]. Timestamps of received packets were used together with a fixed estimate of the expected decoding and play-out delay to estimate differences between receivers. This approach works well with terminals running similar hardware and a similar network/protocol stack, which can often be expected in commercial video conferencing systems. However, it is not practical for measuring play-out differences in TV broadcasts, where many different networks and receiver types exist. Also, proprietary TV broadcast systems are often not open to third-party measurements in the network and receivers. Another drawback of the timestamp method is that delays can be introduced after digital reception (timestamp) by both the set-top box and the (digital) TV. Therefore we conclude that measuring differences in play-out between broadcasters with timestamps is not always possible and not always completely accurate. In the gaming community a similar synchronization measurement problem exists, related to the measurement of TV input lag [6]. In [6], a digital clock was displayed on different screens, which were recorded together with a separate video camera. Time lags were determined by reading out the displayed digital clocks. This approach is accurate but hard to perform or to automate in practice. Also, when measuring TV broadcasts, a video of a clock signal is not always available.

Stokking et al. [7] conducted several measurements comparing play-out differences between TV broadcasts by using a front recording and comparing scene changes manually on a frame-by-frame basis. From these measurements the play-out difference was computed. Whereas this approach is accurate, end-to-end and does not require access to receiver hardware or network, it is a lot of manual work and hence unsuited for long-time and/or real-time measurements. We have automated the latter approach by extending this method with automatic scene change detection and robust play-out difference estimation. Our method accounts for processing delay after packet reception and does not need on-screen clocks to compute the play-out difference.

3. SYSTEM OUTLINE

Figure 1. Front recording test setup (Receiver 1, Receiver 2, video camera)

Our system uses a recording of two devices displaying a similar video file or similar TV content, as shown in Figure 1. A (web) camera is used to record the terminals and to store the recording as a video signal V(x,y,t). From V(x,y,t), the device on the left side and the device on the right side are automatically extracted into smaller videos L(x,y,t) and R(x,y,t), after the upper and lower points have first been selected manually in a user interface. The two separate video segments are fed into the scene change detector, which was tested to detect scene changes with a probability of approximately 90-95% and a false positive (detection) rate of approximately 0.1%.

Figure 2. Play-out difference detection system outline

The scene-change detector used was a custom-built implementation of [8]. The detector was carefully tuned to maintain this performance with smaller screen sizes, dark images and low-resolution recordings. Figure 2 shows the outline of the system in a diagram. The cross correlation computed from the two detector outputs gives an easily detectable peak (maximum) at the play-out difference.
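The cross-correlation step can be illustrated with a small sketch. It is not the authors' MATLAB implementation: the function name `estimate_playout_lag`, the per-frame 0/1 representation of detected scene changes and the frame rate are assumptions made only for this example.

```python
def estimate_playout_lag(left, right, max_lag):
    """Return (lag, score) for the lag at which `right` best matches `left`.

    `left` and `right` are lists of 0/1 flags, one per frame, where 1 marks
    a detected scene change (hypothetical representation). A positive lag
    means the right-hand device plays out later than the left-hand one.
    """
    best_lag, best_score = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation value at this lag: number of coinciding changes.
        score = 0
        for t, flag in enumerate(left):
            u = t + lag
            if flag and 0 <= u < len(right) and right[u]:
                score += 1
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Toy data: scene changes on the left device at frames 10, 40 and 95.
# The right device shows them 7 frames later; the detector misses one
# change (frame 102) and reports one false positive (frame 60).
n = 120
left = [0] * n
right = [0] * n
for t in (10, 40, 95):
    left[t] = 1
for t in (17, 47, 60):
    right[t] = 1

lag, score = estimate_playout_lag(left, right, max_lag=30)
FRAME_RATE = 25.0  # frames per second, assumed for the example
delay_seconds = lag / FRAME_RATE
```

Because the peak is built from several coinciding scene changes, a single missed or spurious detection lowers or adds one count but does not move the maximum, which is in line with the robustness claimed for the approach.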



4. DEMONSTRATION

Our demonstrator, implemented in MATLAB, shows that recordings of movies playing on different devices, ranging from smart phones to flat-screen TVs, can be fed into the system to quickly estimate the play-out difference. It enables play-out difference detection even when no timestamps or displayed clock signals are available. The robustness to screen size and brightness is also clearly observed in practice. Currently, our system works on recorded files, but it is ready to be extended to real-time measurements. By using a mobile DVB-T receiver that was tested to have approximately constant play-out differences at different locations, we were able to compare play-out differences of non-co-located TV broadcast receivers indirectly, as in [7].

5. CONCLUSION

This article presented a system that measures and computes the play-out difference between receivers at a similar physical location. The method is more accurate, more automatic and easier to deploy than previous methods. It enables easy measurements on proprietary TV networks or systems. TV distributors can use the system to compare their signal delay with that of their competitors and potentially use the results to promote their services to, for example, co-located soccer match viewers. Companies offering interactive games around TV content can use play-out difference measurements as a basis for their synchronization strategy. The system can also be used to validate and assess the quality of the current generation of inter-destination synchronization algorithms and solutions on the market. Moreover, gamers can compare TVs for their input lag by connecting two screens to a single gaming console or DVD player.

6. REFERENCES

[1] Shamma, D. A., Bastea-Forte, M., Joubert, N., and Liu, Y. Enhancing online personal connections through the synchronized sharing of online video. In CHI '08 Extended Abstracts on Human Factors in Computing Systems, 2008.

[2] ETSI TS 183064. Telecommunications and Internet converged Services and Protocols for Advanced Networking (TISPAN); NGN integrated IPTV subsystem stage 3 specification, 2010.

[3] Stokking et al. RTCP XR Block Type for inter-destination media synchronization, Internet Draft, 2010.

[4] Boronat, F., Lloret, J., Garcia, M. Multimedia group and inter-stream synchronization techniques: A comparative study. In Elsevier Information Systems, volume 34, issue 1, pages 108-131, 2009.

[5] Toshiro Nunome, Shuji Tasaka. An Application-Level QoS Comparison of Inter-destination synchronization schemes for continuous media multicasting. GLOBECOM'03, 2003.

[6] HDTV lag the unofficial guide: http://sites.google.com/site/hdtvlag/

[7] H.M. Stokking, M.O. van Deventer, O.A. Niamut, F.A. Walraven, R.N. Mekuria, "IPTV Inter-Destination Synchronization: A Network-Based Approach", ICIN 2010, Berlin, 11-14 October 2010.

[8] Chung-Lin Huang and Bing-Yao Liao: "A Robust Scene-Change Detection Method for Video Segmentation", IEEE Transactions on Circuits and Systems for Video Technology, 2001.

User Interface Toolkit for Ubiquitous TV

Javier Burón Fernández

1. Department of Informatics and Numeric Analysis, Cordoba University, Córdoba, Spain

jburon@uco.es

Konstantinos Chorianopoulos

Department of Informatics, Ionian University, Corfu, Greece

choko@ionio.gr

Enrique García Salcines 1 , Carlos de Castro Lozano 1 , Beatriz Sainz de Abajo 2

2. Department of Communications and Signal Theory and Telematics Engineering, Higher Technical School of Telecommunications Engineering, University of Valladolid, Spain.

(egsalcines,ma1caloc)@uco.es

beasai@tel.uva.es

ABSTRACT

The wide adoption of small and powerful mobile computers, such as smart phones and tablets, has raised the opportunity to employ them in multi-user and multi-device iTV scenarios. In particular, the standardization of HTML5 and the growth of cloud services have made the web browser a suitable tool for managing multimedia content and the user interface, in order to provide seamless session mobility among devices such as smart phones, tablets and TV screens. In this paper we present an architecture and a prototype that let people instantaneously transfer the video they are watching between web devices. This architecture is based on two pillars: WebSockets, a new HTML5 feature, and Internet TV (YouTube, Yahoo Video, Vimeo, etc.). We demonstrate the flexibility of the proposed architecture in a prototype that employs the YouTube API and facilitates seamless session mobility in a ubiquitous TV scenario. This flexible experimental set-up lets us test several hypotheses, such as those concerning user attention and user behavior, in the presence of multiple users and multiple videos on personal and shared screens.

Keywords

Tablet, TV, interaction, design, evaluation

1. INTRODUCTION

Since the advent of PDAs, several studies have explored replacing the remote control in the interaction with interactive television. One of the most influential works for this research is that of Robertson et al. (1996), which proposed a prototype for real estate search on a PDA communicating bidirectionally via infrared with an interactive television. The authors propose a design guide that stresses the importance of distributing information across the appropriate devices: the PDA is suited to displaying text and some icons, whereas the television is suited to displaying large images, video or audio. The nature and quantity of the information thus determine how, and on which device, it is displayed. Their research also gives priority to increasing synchronized cooperation between both devices.

Until now, user interface systems have drawn a clear distinction between input and output devices; this is the case for desktop computers, TVs and telephones. Smart phones and tablets are devices that do not make this distinction. Moreover, the plenitude of devices enables the creation of ubiquitous computing scenarios, where the user can interact with two or more devices. One significant research issue is then how to balance the visual interface between two devices with output abilities.

The remote control has been the most common way to interact with iTV. Some released products, such as RedEye 1, let the user interact with the TV through a second screen to perform basic content control operations; however, such a product works only as a WiFi-to-infrared translator for different devices. The popularity of mobile computers such as smart phones and tablets allows us to leverage the established way of interaction. A second screen could give the user more information and the possibility to interact by controlling, enriching or sharing the content (Cesar et al. 2009, Cesar et al. 2008). In this work, we examine alternative scenarios for controlling the content in a dual-screen set-up.

2. SYSTEM ARCHITECTURE

In our research, we are exploring alternative multi-device visual interface configurations in the context of a leisure environment and for entertainment applications. For this purpose, we have developed a flexible experimental set-up, which we plan to employ in several user evaluations. The latter are focused on the actual user behavior in the face of important parameters, such as attention, engagement, and enjoyment.

The system architecture for the experimental set-up consists of: 1) a TV connected to a laptop, 2) an Apache/PHP server, 3) an iPad and an iPhone, 4) a local area network, 5) an Apple Remote connected to the TV using infrared. Figure 1 shows a simplified drawing of the system architecture. The novelty of this architecture is the use of WebSockets, a technology drafted as part of HTML5, which lets us connect two browsers bidirectionally. Thanks to this characteristic, we can turn an iPad, or any device whose browser supports WebSockets, into a TV remote controller. For this purpose, several web pages (depending on the scenario) have been developed. From these pages the user can control other web pages that represent the TV.
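The control path between the controller page and the TV page can be sketched without a real network. In the example below a small in-memory hub stands in for the WebSocket server and plain callbacks stand in for browser connections. The JSON message format (`{"cmd": ...}`), the class names `Hub` and `TvPage`, and the command set are illustrative assumptions, not the prototype's actual protocol.

```python
import json


class Hub:
    """In-memory stand-in for a WebSocket server that relays text frames
    between the pages of one session (hypothetical design)."""

    def __init__(self):
        self._clients = {}  # client id -> callback receiving a text frame

    def connect(self, client_id, on_message):
        self._clients[client_id] = on_message

    def send(self, sender_id, text):
        # Forward the frame to every other client in the session.
        for client_id, on_message in self._clients.items():
            if client_id != sender_id:
                on_message(text)


class TvPage:
    """Stand-in for the web page that represents the TV."""

    def __init__(self, playlist):
        self.playlist = playlist
        self.index = 0
        self.playing = False

    def on_message(self, text):
        cmd = json.loads(text).get("cmd")
        if cmd == "play":
            self.playing = True
        elif cmd == "pause":
            self.playing = False
        elif cmd == "next":
            self.index = min(self.index + 1, len(self.playlist) - 1)
        elif cmd == "previous":
            self.index = max(self.index - 1, 0)
        # Unknown commands are ignored.


hub = Hub()
tv = TvPage(["video-a", "video-b", "video-c"])  # placeholder video ids
hub.connect("tv", tv.on_message)
hub.connect("tablet", lambda text: None)  # the controller ignores frames

# The tablet acts as the remote control.
for cmd in ("play", "next", "next", "next", "previous", "pause"):
    hub.send("tablet", json.dumps({"cmd": cmd}))
```

Keeping the commands as small text messages means the same TV page could be driven by any controller page, which matches the idea of developing one controller page per scenario.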

1 https://thinkflood.com/products/redeye/what-is-redeye/



This work is focused on the secondary-screen as a control device for TV content. Previous research has regarded the secondary- screen as an editing and a sharing interface, but has neglected the control aspect. In particular, we are seeking to understand the balance between the shared and the personal screen during alternative TV-control scenarios that regard the secondary-screen as a: 1) simple remote control, 2) related information display, 3) mirror of the same TV content.


Figure 1 System Architecture Simplified

3. ONGOING RESEARCH

For our research we consider the following situation: Peter is watching a list of short sailing videos and wants to control the video content: play, pause and stop the video, skip to the next or the previous video, and see more information about the current video and the next one. It is worth highlighting that the proposed functionality is a subset of that provided by the API of YouTube, which offers a rather diverse and growing pool of video content. Our aim is to evaluate interaction concepts for iTV. For this reason, the prototypes involve only very simple actions, from which we expect to draw conclusions for the design of coupled-display interfaces in a leisure environment in general.

So far we have developed four scenarios of iTV interaction:

1. To interact with iTV using a remote control: In this case, the user interacts with iTV using a remote controller. To control the content there is a play/pause button and two arrows, right and left, to select the next or the previous video. The Menu button is used to show the information related to the video and the next video on the list.

2. To interact with iTV using a tablet as remote controller: In this case, all the overlay information shown in the first scenario is displayed on the tablet, clearing the first screen of interactive information so that it does not disturb other users.

3. To interact with iTV using a tablet as remote controller: In this case, all the overlay information is displayed on the TV.

4. iTV inside the tablet and a shared screen: This scenario supposes that the user is watching iTV on the tablet and that there is a shared TV. The user can “fly out”, or expand, what he is watching onto the shared TV. This scenario is the most interesting one: the user can extend what they are watching to another shared screen and can also retrieve, or “fly in”, any video that is being watched on the TV.
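The “fly out” and “fly in” actions of the fourth scenario amount to copying playback state from one screen to another. A minimal sketch follows, assuming that this state is just a video identifier plus a playback position; the `Screen` class, its fields and the video identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Screen:
    """Playback state of one screen (hypothetical model)."""
    name: str
    video_id: Optional[str] = None
    position_s: float = 0.0  # playback position in seconds


def transfer(source: Screen, target: Screen) -> None:
    """Make `target` show what `source` is showing, at the same position."""
    target.video_id = source.video_id
    target.position_s = source.position_s


def fly_out(tablet: Screen, tv: Screen) -> None:
    # Expand the tablet's video onto the shared TV.
    transfer(tablet, tv)


def fly_in(tablet: Screen, tv: Screen) -> None:
    # Retrieve the TV's video onto the tablet.
    transfer(tv, tablet)


tablet = Screen("tablet", video_id="sailing-01", position_s=42.0)
tv = Screen("tv", video_id="sailing-07", position_s=3.5)

fly_out(tablet, tv)             # the TV now shows sailing-01 at 42 s
tablet.video_id = "sailing-02"  # the user moves on to another clip
tablet.position_s = 0.0
fly_in(tablet, tv)              # the tablet picks the shared video up again
```

In a set-up like the one described, such a state record would travel as one message between the two web pages; carrying the position along with the identifier is an assumption made here to keep playback continuous.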

As has been shown, three of the scenarios include the same options and functionalities. This is important to note because the more complex these functionalities are, the more appropriate the tablet is for carrying them out. However, it is when we perform the common actions we usually do when watching videos on the Internet that advanced visual interfaces on a second screen can affect the user's attention negatively (Figure 2).


Figure 2 Fourth scenario illustration.

In summary, we are motivated by the introduction and wide adoption of small and powerful mobile computers, such as smart phones and tablets. The latter have raised the opportunity of employing them in multi-device scenarios and blurring the distinction between input and output.

4. ACKNOWLEDGEMENTS

This work was partially supported by the European Commission Marie Curie Fellowship program (MC-ERG-2008-230894) and by the ORVITA2 project at EATCO, Cordoba University, Spain.

5. REFERENCES

[1] Cesar, P., Bulterman, D. C., Geerts, D., Jansen, J., Knoche, H., and Seager, W. 2008. Enhancing social sharing of videos: fragment, annotate, enrich, and share. In Proceeding of the 16th ACM International Conference on Multimedia (MM '08). ACM, New York, NY, 11-20.

[2] Cesar, P., Bulterman, D.C.A., and Jansen, J. 2009. Leveraging the User Impact: An Architecture for Secondary Screens Usage in an Interactive Television Environment. Springer/ACM Multimedia Systems Journal (MSJ), 15(3): 127-142.

[3] Fallahkhair, S., Pemberton, L. and Griffiths, R. 2005. Dual Device User Interface Design for Ubiquitous Language Learning: Mobile Phone and Interactive Television (iTV).

[4] Robertson, S., Wharton, C., Ashworth, C., and Franzke, M. 1996. Dual device user interface design: PDAs and interactive television. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '96). ACM, New York, NY, 79-86.

Demo: using speech recognition for in situ video tagging

Chengyuan Peng

VTT PL 1000 02044 VTTd , Espoo Finland +358 20 722 111

chengyuan.peng@vtt.fi

John Grönvall

Arcada University of Applied Science Jan-Magnus Jansson Plats 1 00550 Helsinki, Finland +358 50 368 4607

john.gronvall@arcada.fi

Lasse Becker

Lingsoft Oy Linnankatu 10A 20100 Turku, Finland +358 2 279 3300

lasse.becker@lingsoft.fi

ABSTRACT

In this demo we present our attempt to solve a well-known problem: how to generate interesting and useful tags describing the content of user generated mobile phone video material. In our case the tagging takes place just after the video recording, at the end of the video file. By using specific keywords we delimit the tags from the rest of the audio track.

Categories and Subject Descriptors

H5.2. User interfaces. Natural language. Theory and methods.

General Terms

Documentation, Design, Experimentation, Human Factors, Standardization, Languages.

Keywords

Mobile video, tag, speech recognition, metadata search, participatory journalism, mediation, local TV.

1. INTRODUCTION

There has been quite substantial research done on participatory elements in different news contexts [1][2][3][4][5]. Therefore we conclude that, if the citizen really wants to make his/her voice heard in a local TV context, given the right tools, the aether is currently open for delivering one's personal opinion [6]. However, when people publish their videos, they usually do not bother to enter the necessary metadata tags, most likely because doing so is considered awkward and laborious [7].

In this paper we address a well-known problem: how to generate interesting and useful tags for user generated video content. We present our system of speech tagging in situ, while recording video material on a mobile phone.

2. CURRENT AND PREVIOUS WORK

For some time there has been a multitude of annotation tools available for tagging (or annotating) video material in real time or in retrospect [8][9][10][11][12]. Using speech to tag videos is not new [13][14][15]. We base our work on these prior research projects, but in contrast to these systems we want to develop a simple and automated process for retrieving the metadata in situ, at the moment of video recording. While voice recognition per se is a well-established technique, there have, to our knowledge, not been any attempts to use it for in situ tagging of UGC videos at the time of recording.

3. PROBLEM DESCRIPTION

Having reliable and good metadata for UGC clips is essential when gathering material for the local TV broadcast. The two research questions we set out to answer are:

How can we generate interesting and useful tags in situ, describing the content of user generated video material at the time of recording?

4. THE “SUITCASE OF STORIES” DEMO

The goal of the “Suitcase of stories” experiment was to create a total of 500 minutes of speech-tagged video. A class of Film & Television students was chosen for the experiment, using 20 Nokia N97 phones for a period of one week.

Everyone in the group was to produce short video segments (30-90 seconds in length) totalling 5 minutes of content each day, for five days in a row. The students were asked to film at 5 specific locations. In addition, they were to enable location tagging via GPS in the phone and shoot 5 minutes in a tram, thus testing the usability of the GPS system. Each video clip was tagged by speaking into the handset at the end of the filming process. The tagging was specified to the students as follows:

When done filming your 30-90s press PAUSE,

Say the word 'KIRAHVI' (giraffe)

Speak your single word tags

Say the word 'KROKOTIILI' (crocodile)

Speak one free-text continuous sentence describing the scene into the phone

Press STOP

After five days the phones were collected. The soundtrack was extracted using ffmpeg and then sent for recognition to the Lingsoft voice recognition system (http://stt.lingsoft.fi/stt.php), which in turn produced plain-text words that were stored in a database together with the video.
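The role of the two spoken keywords as delimiters can be sketched as follows. This is an illustration only, not the code used in the experiment: the function name `split_transcript`, the assumption that the recognizer output arrives as a flat list of words, and the Finnish sample words are all made up for the example.

```python
TAG_KEYWORD = "kirahvi"          # spoken before the single-word tags
SENTENCE_KEYWORD = "krokotiili"  # spoken before the free-text sentence


def split_transcript(words):
    """Split recognized words into (tags, sentence_words).

    Words before the first tag keyword belong to the clip itself and are
    ignored. If a keyword is missing, the corresponding part is empty.
    """
    lowered = [w.lower() for w in words]
    if TAG_KEYWORD not in lowered:
        return [], []
    start = lowered.index(TAG_KEYWORD) + 1
    if SENTENCE_KEYWORD in lowered[start:]:
        end = lowered.index(SENTENCE_KEYWORD, start)
        return words[start:end], words[end + 1:]
    # No sentence keyword recognized: everything after the tag keyword
    # is treated as tags.
    return words[start:], []


# Made-up recognizer output for one clip.
recognized = (
    "no niin KIRAHVI ratikka sade kiire "
    "KROKOTIILI ihmiset odottavat ratikkaa sateessa"
).split()
tags, sentence = split_transcript(recognized)
```

A parser of this kind depends entirely on both keywords being recognized, so a misrecognized keyword leaves the tags and the sentence indistinguishable.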

5.

RESULTS AND ANALYSIS

To get an estimate of how well the transcription succeeded, we calculated the percentage of correctly recognized tags, meaning words spoken after the keyword ‘KIRAHVI’, using a small group of samples (97 out of approx. 800 total samples). Out of these samples there were 54 correctly recognized tags and 43 words recognized incorrectly (or not at all). This corresponds to 56 % accuracy, which is much less than what the recognizer achieves in ideal circumstances (with no background noise, a quality microphone and clearly spoken words).

Furthermore, in discussions with the student group after the production week it became clear that the main difficulties in our tagging experiment were both technical and due to the end users:

People did not think carefully before speaking their tags, which resulted in sloppy pronunciation and bad recognition.

Many of the words were out of vocabulary because they were names of persons and physical locations in the city.

The students should have been put through a retrieval exercise first, learning to identify what constitutes a “good” tag. In this way the use of sensible tags would have increased. Similarly, if they had had a chance to try out an automatic speech recognition (ASR) system in advance, they would have understood the importance of speaking clearly into the microphone, with a firm voice and pronunciation, thus improving the ASR process.

The number of tags per clip was quite good, but some students clearly did not remember what they had just filmed, or were suffering from the awkwardness of the milieu they were in at the moment. The students said at times they felt silly entering the tags at public locations.

The tags were for the most part substantives and verbs, with a few adjectives mixed in.

The content of the videos was to a large extent scenery from the places we asked them to visit. Many clips were shot at the University where they study, along with shots of their friends going about their everyday business. This is not very representative of what typical UGC material could be.

Even though the group of students was enthusiastic, few of them managed to produce the required 5 min of footage per day.

The free-text sentence at the end was reduced to words in their base form, making the whole idea of one “human readable sentence” useless.

The main keywords were not always understood as the triggers (delimiters) they were intended to be.

The GPS-based location metadata was lost altogether, since it is not stored with the actual video file and we did not understand how to retrieve it. Also, the GPS failed to get adequate satellite coverage in the time needed when an interesting shooting opportunity appeared.

The process of extracting the audio from hundreds of video files was tedious, even though ffmpeg and some bash scripts made it easier.

Our original idea that the video should be uploaded over the existing open WLANs in Helsinki was doomed because the actual transmission speeds from the phone were far too slow.

Many did not realize that you cannot shoot with the phone in horizontal landscape mode; this led to a number of videos that would have to be rotated 90 degrees.

Often the footage was surprisingly steady, probably because the students are all familiar with the use of a video camera, and thus not representative of general UGC.

6.

CONCLUSIONS

The results show that the idea of in situ speech recognition is promising, but having to do the speech recognition offline makes the process impractical. The in situ aspect is important: when entering the tags immediately, the author has a better recollection of what he/she has just shot. This makes the tagging process more natural than having to go through the material and tag it in retrospect.

7. REFERENCES

[1] Boczkowski, P.J. (2004a) ‘The Processes of Adopting Multimedia and Interactivity in Three Online Newsrooms’, Journal of Communication 54(2): 197–213.

[2] Boczkowski, P.J. (2004b) ‘Digitizing the News: Innovation in Online Newspapers’. Cambridge, MA: MIT Press.

[3] Mitchelstein, E. and Boczkowski, P.J. “Between tradition and change: A review of recent research on online news production,” Journalism, vol. 10, 2009, pp. 562-586.

[4] Wardle, C. and Williams, A. (2010) ‘Beyond user-generated content: a production study examining the ways in which UGC is used at the BBC’. Media Culture Society 2010.

[5] Niekamp, R. “Sharing Ike: Citizen Media Cover a Breaking Story,” Electronic News, vol. 4, 2010, pp. 83-96.

[6] Scheufele, D.A. and Nisbet, M.C. “Being a Citizen Online: New Opportunities and Dead Ends,” The Harvard International Journal of Press/Politics, vol. 7, 2002.

[7] Rodden, K. and Wood, K. How Do People Manage Their Digital Photographs? CHI 2003.

[8] W. E. Mackay, “EVA: An experimental video annotator for symbolic analysis of video data,” ACM SIGCHI Bulletin, vol. 21, no. 2, p. 71, 1989.

[9] Abowd, G. D., Gauger, M., Lachenmann, A. 2003. The family video archive: An annotation and browsing environment for home movies. In Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, 1–8.

[10] D. A. Shamma, R. Shaw, P. L. Shafton, and Y. Liu, “Watch what I watch: using community activity to understand content,” in Proceedings of the international workshop on Workshop on multimedia information retrieval, 2007, p. 275–284.

[11] N. Diakopoulos, S. Goldenberg, and I. Essa, “Videolyzer: quality analysis of online informational video for bloggers and journalists,” in Proceedings of the 27th international conference on Human factors in computing systems, 2009, p. 799–808.

[12] P. Cesar, D. C. A. Bulterman, J. Jansen, D. Geerts, H. Knoche, and W. Seager, “Fragment, tag, enrich, and send,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 5, no. 3, pp. 1-27, 2009.

[13] M. Cherubini, X. Anguera, N. Oliver, and R. de Oliveira, “Text versus speech: a comparison of tagging input modalities for camera phones,” in Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services, 2009, p. 1–10.

[14] R. Zhang, S. North, and E. Koutsofios, “A comparison of speech and GUI input for navigation in complex visualizations on mobile devices,” in Proceedings of the 12th international conference on Human computer interaction with mobile devices and services, 2010, p. 357–360.

[15] P. Froehlich and F. Hammer, “Expressive Text-to-Speech: A user-centred approach to sound design in voice enabled mobile applications,” in Proc. Second Symposium on Sound Design, 2004.

Value-added services and identification system: an approach to elderly viewers

Telmo Silva

CETAC.MEDIA/DeCA Universidade de Aveiro Portugal +351 234370200

tsilva@ua.pt

Jorge Ferraz

CETAC.MEDIA/DeCA Universidade de Aveiro Portugal +351 234370200

jfa@ua.pt

Pedro Almeida

CETAC.MEDIA/DeCA Universidade de Aveiro Portugal +351 234370200

almeida@ua.pt

Osvaldo Pacheco

Depart. Elect. Telec. e Inf. Universidade de Aveiro Portugal +351 234370200

orp@ua.pt

ABSTRACT

Nowadays, with the advent of technical and interactive improvements in TV, operators may provide users with a more personalized television experience, which demands a reliable identification system centred on the viewer rather than on the Set-Top Box. Where senior viewers are concerned, an automatic and non-intrusive system seems more suitable than the input of a user ID and a matching PIN. In this work, with an approach based on hands-on experience, demos, direct observations and interviews, we attempt to find the Viewer Identification System best suited to this generation of users.

Categories and Subject Descriptors

H.1.2 [Models and Principles]: User/Machine Systems–human factors.

General Terms

Human Factors, Experimentation, Verification.

Keywords

User Centred Design, iTV, Elderly, Viewer Identification.

1. INTRODUCTION

Watching TV is a daily activity for most human beings. In recent years, with the advent of new TV distribution systems such as Digital Terrestrial Television (DTT) and Internet Protocol Television (IPTV), this activity has been changing [1]. Some of these systems introduced a return channel, which has the potential to provide a high level of content personalization. In this technological scenario, the number of interactive TV (iTV) services is constantly increasing. Since the typical scenario relies on multiple unidentified viewers using the same Set-Top Box (STB), the offered experience is not completely adjusted to the viewer. This limitation can be overcome if the TV provider knows who is really watching TV, allowing the offer of interactive services more suited to the viewer's profile, such as personalized ads, automatic tuning of favourite channels, adjusted audio descriptions, personalised health care systems or communication services. To accomplish this, it is of great importance to equip the TV provider infrastructure with a reliable Viewer Identification System (VIS). In the particular context of this work we are especially interested in the development of a VIS targeted at senior viewers. Therefore it is important to understand their needs, motivations and behaviours when they are in front of the TV set. The motivation for this target audience derives from the current worldwide scenario, where the number of older persons is increasing, as confirmed by the World Health Organization (W.H.O.). It reports a growth rate of 2.7% per year in the group of people aged over 65 (with a prediction of 2 billion people in this group in 2050) [5]. This trend justifies careful consideration not only of the needs and characteristics of this target group when developing appliances and services for their homes, but also of their corresponding gratifications [6].

2. IDENTIFYING ELDERLY VIEWERS

Each person/environment set is unique and has specificities that are complex to address in system design. Taking this into account, we defined a research process that begins with a set of exploratory interviews, followed by the development of a prototype and its validation through a new set of interviews. This research process is based on a User-Centred Design approach, as David Geerts suggests in [2]: "A good human-centred design approach will lead to applications that take user’s needs and the context of use into account".

2.1 Research process

The research process starts with a set of five exploratory interviews to help in the system design. The experience gathered from this first contact helped us adjust the interviewing style. Taking into consideration the sociological and technical literature review and the data collected in this first phase, a prototype (explained in 2.1.2) was developed to support a second round of interviews. This new round (nine interviews) was carried out to gather opinions on a medical reminder service (considered one of the potential value-added services for this age group) and on the different identification systems presented in the prototype.

2.1.1. Exploratory interviews – phase I

Elderly people are a highly heterogeneous group living in multifaceted environments influenced by several social structures. Due to this heterogeneity, we decided to use a broad interview guide approach to ensure that all the issues were addressed in the interviews. This approach allowed a degree of freedom and adaptability in the interviewing process that was important to create a more relaxed and enthusiastic environment [3]. For the exploratory interviews, the participants (two women and three men) were randomly selected from a list of inhabitants of Anadia, a small Portuguese city. To ensure a relaxed environment, a preliminary phone call was made to each interviewee explaining the process and the motivations of the work, and all the interviews were carried out at the participants’ homes. In this group, the estimated time spent watching television was three and a half hours per day. We could verify that the people in this group: i) do not often use computers or the internet; ii) have watching TV as their main occupation; iii) generally do not like to talk about technological gadgets; iv) have difficulties in conceiving scenarios related to the integration of new technologies; and v) are mostly concerned about health issues and agree on the advantages of an iTV service in this field.

Concerning the viewer identification system, we described (in plain language) a set of options to the interviewees: i) RFID card (a card that should be passed near a reader); ii) fingerprint reader in the remote control; iii) PIN code; iv) voice recognition; v) a bracelet with an identifier; vi) mobile phone with Bluetooth activated; vii) face recognition; and viii) remote control with a gyroscope providing handling recognition. These interviews (phase I) revealed that, without a prototype, it is difficult for seniors to clearly identify the advantages of an automatic identification system in an iTV context. Due to this constraint, their answers were dispersed: the fingerprint reader and the RFID card each got one vote and face recognition got two. The other methods did not get any votes. However, three of the five interviewees mentioned the need to be able to turn off this system whenever they wanted. All valued a system that can help with daily activities and events (e.g. medical prescriptions) and support their caregivers’ network. These interviews reinforced the need to develop a functional prototype that clearly demonstrates the identification methods together with a value-added service.

2.1.2. medControl prototype

In order to present to users a layer of services that benefit from the identification systems, the prototype was built on top of a medical reminder service. This module (medControl) was developed under the research project iNeighbourTV (PTDC/CCI-COM/100824/2008), targeted at senior citizens. medControl triggers alerts on top of the TV screen when the senior viewer needs to take his or her medication. It was developed using the MS Mediaroom Presentation Framework (PF). On top of this iTV service, a multimodal viewer identification system was developed and used in the interviews. This multimodal system can perform viewer identification through: i) PIN insertion using the remote control; ii) Bluetooth pairing with the user’s mobile phone; and iii) detection of 13.56 MHz RFID tags (in an identification card). A laptop computer running the PF simulator was used as an STB at the interviewees’ homes. An RFID tag reader and a Bluetooth driver were also part of the prototype. The identification module, a Java-based software (VIS, Viewer Identification System), reads the RFID tags, discovers the nearest Bluetooth devices and forwards viewer identification data to the iTV service.
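The way a multimodal identification module of this kind might combine its three inputs can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java of the prototype, and it is not the authors' implementation: the `Viewer` record, the `VIEWERS` registry, the `identify_viewer` function and the precedence order (RFID card, then Bluetooth phone, then PIN) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Viewer:
    name: str
    pin: str                 # PIN typed on the remote control
    bluetooth_address: str   # address of the viewer's mobile phone
    rfid_tag: str            # identifier stored on the 13.56 MHz card


# Hypothetical household registry; every value below is invented.
VIEWERS = (
    Viewer("Maria", "1234", "00:11:22:33:44:55", "04A1B2C3"),
    Viewer("Jose", "9876", "66:77:88:99:AA:BB", "04D4E5F6"),
)


def identify_viewer(rfid_tag: Optional[str] = None,
                    bluetooth_addresses: Iterable[str] = (),
                    pin: Optional[str] = None) -> Optional[Viewer]:
    """Return the viewer matched by the first modality that succeeds.

    Assumed precedence: RFID card, then a known phone discovered over
    Bluetooth, then a PIN typed on the remote control.
    """
    if rfid_tag is not None:
        for viewer in VIEWERS:
            if viewer.rfid_tag == rfid_tag:
                return viewer
    # Normalise discovered addresses so the comparison ignores letter case.
    nearby = {address.upper() for address in bluetooth_addresses}
    for viewer in VIEWERS:
        if viewer.bluetooth_address in nearby:
            return viewer
    if pin is not None:
        for viewer in VIEWERS:
            if viewer.pin == pin:
                return viewer
    return None  # nobody recognised; the service stays unpersonalised
```

In this sketch the automatic modalities are tried before the PIN, so that a viewer carrying the card or the phone never has to type anything; a real system could order or weight the modalities differently.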

2.1.3. Exploratory interviews – phase II

This set of interviews was useful to get insights from the participants’ experiences and to gather more in-depth information on the topics concerned [3]. This type of interview is common in the development of projects based on a User-Centred Design (UCD) methodology [4]. As in the first phase of interviews, we visited the participants at their homes in order to keep them engaged in their natural environment. The researcher also tried to contribute to a relaxed environment by making clear that the intention was not to test the participants’ technical skills. The second group of interviewees included nine new participants (five women and four men) over 55 years of age. The adopted procedures were similar to those of the first group of interviews. All the invited individuals accepted, were very kind and demonstrated their willingness to help and to be interviewed again if necessary. All interviews except one (due to a request from the interviewee) were recorded. During the tests/interviews we found that the use of the prototype was very important, giving the interviewees a solid and tangible image of the VIS goals. This evaluation helped us not only to improve the prototype but also to get information about the most suitable VIS. Along with the prototype, we also proposed other identification options (the same as in phase I) and tried to determine which one was the favourite.

2.1.4. Selected findings

The collected data (from interview phases I and II) show that the estimated time spent watching TV (across all participants) is around three and a half hours a day. Specifically, from phase II we gathered the following information: i) although all the interviewees were familiar with a TV set, they used the remote control mostly to adjust the volume and select channels (6 in 9 users); ii) all the interviewees considered the identification system useful and recognized the enhancements to iTV services that can be built on it; iii) 4 in 9 of the interviewees mentioned that the VIS should be as automatic as possible (without the need for user intervention); iv) communication services and life support systems, especially those related to medical care, were often mentioned as key enhancements that could be obtained if a viewer identification system were implemented; and v) help instructions should be used extensively and should be “always present”. Beyond these general indications, our main focus was to detect a trend amongst the several automatic user identification techniques. In this respect, we found that the spectrum of answers about the VIS was considerably wide, making it impossible to find a clear trend.

3. CONCLUSIONS

At the beginning of this work we aimed to identify a trend concerning a VIS and we defined a simple research process. We also realized that it was very difficult to illustrate and explain our ideas without a tangible prototype. Although the results of the interviews were non-statistical, it was clear that a prototype is an excellent means to involve older people and to explain to them technological enhancements in the interactive TV area. Regarding the Viewer Identification System, we cannot clearly identify a generally suitable method.

4. REFERENCES

[1] Ardissono, Liliana, Kobsa, Alfred, and Maybury, Mark T., Personalized Digital Television. Human–Computer Interaction Series. Vol. 6. 2004: Springer. 331.

[2] Geerts, David, Sociability heuristics for interactive TV. Supporting the social uses of television, in Faculteit Sociale. 2009, Katholieke Universiteit Leuven.

[3] Kvale, Steinar, Interviews: An Introduction to Qualitative Research Interviewing, ed. S. Publications. 1996.

[4] Lewis, Clayton and Rieman, John, Task-centered user interface design - A Practical Introduction. 1994.

[5] Organization, World Health. World Health Organization launches new initiative to address the health needs of a rapidly ageing population. 2004 [cited 2-1-2011]; Available from: http://www.who.int/mediacentre/news/releases/2004/pr60/en/.

[6] Ruggiero, Thomas E., Uses and Gratifications Theory in the 21st Century. Mass Communication & Society, 2000. 3(1): p. 34.

Doctoral Consortium



Research for Development of Value Added Services for Connected TV

Tanushyam Chattopadhyay

Innovation Lab, Kolkata Tata Consultancy Services India

t.chattopadhyay@tcs.com

ABSTRACT

Recent trends in emerging markets [1], [2] show that connected TV is becoming very popular in developing countries such as India and the Philippines. A connected TV can be described as an Internet-enabled TV. One such product, referred to as the Home Infotainment Platform (HIP) [3], combines the functions of a television and a computer by allowing customers to use their television sets for low-bandwidth video chats and to access Internet websites. It is now commercially available in India [4] and the Philippines [5]. The research presented in this thesis is motivated by the business need to implement value added services, hereinafter referred to as VAS, on top of this product, which can work as its Unique Selling Point. The main motivation behind the research is to develop interesting VAS for the company’s product using frugal computing. As a result, the thesis is not focused on one particular problem; instead, it provides frugal solutions for the several goals that need to be achieved to develop the VAS for the global product. The planned VAS for these connected TVs include video conferencing, video encryption and watermarking, context-based web and TV mash up, video summarization and an Electronic Program Guide (EPG) for cable feed channels [8]. This thesis is devoted to the development of these VAS for connected TV. As all these services need to be developed on an embedded platform, the primary task is to realize the required video CODEC on the target DSP platform. As H.264 is judged to be the best video CODEC of the day [9] because of its compression efficiency, video quality and network friendliness, we have developed most of the VAS on top of the H.264 CODEC. Security can be ensured by either encrypting the video or putting a watermark in the video. Context can be extracted at the top level by recognizing the channel the user is viewing and then getting the relevant information from the website of that particular channel. In addition, textual information in a TV show provides information related to the show at any particular instant of time.

Keywords

VAS, Connected TV, H.264, Text localization, EPG, Video Security

1. INTRODUCTION

The total media viewing and sharing experience is changing and getting richer every day, as videos, music and other multimedia content flood the Internet. The main reason behind the popularity of this product (HIP) in developing countries is that Internet penetration in those countries is significantly lower [6] than the penetration of television. As per the report [7], 75% of the population of India owns a TV. TV has been a favored device for home infotainment for decades. In order to provide a unified Internet experience on TV, it is imperative that the Internet experience blends into the TV experience. This in turn means that it is necessary to create novel VAS that enrich the standard broadcast TV watching experience through Internet capability. This necessity eventually translates into the need for different applications such as secured distribution of multimedia content, communication using video chat over TV, and applications that can understand what the user is watching on broadcast TV (referred to as TV-context) and provide the user with additional information/interactivity in the same context using the Internet connectivity. Understanding the basic TV context is quite simple for digital TV broadcast (cable or satellite) using the metadata provided in the digital TV stream. But in developing countries, digital TV penetration is quite low. For example, in India, more than 90% of TV households still have analog broadcast cable TV. Understanding the TV context in the analog broadcast scenario is a big challenge. Even among the small percentage of homes where satellite TV has penetrated in the form of Direct-to-Home (DTH) service, almost all lack the back-channel connectivity needed for providing true interactivity.

2. REVIEW OF RELATED WORKS

In this section a brief overview of the state of the art in the related fields is given.

Realization of Video CODEC on Embedded Platform: It is reported in the literature [9] that the improvement in video quality and compression ratio of H.264 is obtained at the cost of increased computational complexity and memory requirements. The state of the art is therefore analyzed in light of these challenges. A detailed discussion cannot be provided here because of the page constraint, so only the gist of the analysis is given.

Reduction of Computational Complexity: Two different approaches have been taken to reduce the computational complexity, which can be measured in terms of Mega Cycles Per Second (MCPS): (i) platform independent optimization and (ii) platform specific optimization. Some works also describe the optimization of the encoder execution time as a whole, such as [10], [11].

Memory Optimization: In [12] the authors propose a novel near-optimal filtering order that achieves a significant reduction in memory requirement. This work also significantly reduces MCPS. However, their methodology is applicable to an FPGA prototype; it cannot be used on a commercially available DSP platform, where the user does not have the flexibility to modify the hardware architecture.

The above state of the art reveals some limitations. The platform independent optimization techniques give good optimization, but at the cost of coding efficiency. Moreover, some of these algorithms are suboptimal and not compliant with the standard. These algorithms are generic and thus can be applied to any type of video. As the target applications are mainly video telephony and video conferencing, there is very little motion in the videos. So if the nature of the video can be exploited, more optimization can be obtained. Thus, in this thesis an optimization technique is proposed based on the statistical analysis of the selected modes for these types of videos. On the other hand, the platform specific optimizations are not suited to our choice of video conferencing/video telephony. Nor does any of the literature focus on efficient rate control at low cost to get better video quality.

A comprehensive survey on H.264 video security is given in [13], [14]. As video security itself is a vast field of research, we have restricted the state of the art analysis to the study of encryption and watermarking for videos, and more specifically for H.264 compressed domain videos.

Encryption: Descriptions of video encryption can be found in [15]-[18]. Some of these techniques (i) encrypt the motion vectors, (ii) encrypt the entropy coded stream or (iii) scramble the prediction modes to achieve encryption. But to the best of our knowledge only two such works can be found, in [16], [17].

Watermarking: Different classifications of watermarking technology are described in [14]. Broadly, video watermarking techniques can be classified into two types of approaches, namely (i) pixel domain, where the watermark can be directly inserted in the raw video data, and (ii) compressed domain, where the watermark is integrated during the encoding process or implemented by partially decoding the compressed video data. The major problem of implementing the pixel based approaches in the proposed solution is the additional overhead of decoding the compressed video. Moreover, the watermarking technique for the proposed system should be compliant with the compressed H.264 video format, which differs from the previous video codecs in different aspects as described in [9]. We have also described the differences between H.264 and other video codecs in chapter 2.

Text Information Extraction from Video: The input video format for the proposed system differs with the source of the input signal. The input video may come from a Direct To Home (DTH) service or in the form of a Radio Frequency (RF) cable. In the case of DTH, the input video is in H.264, MPEG or any other compressed digital video format, whereas in the case of RF cable the input is an analog video signal. In the second case the video is first digitized for further processing. The Text Information Extraction (TIE) module localizes the candidate text regions in the video. The state of the art shows that the approaches for TIE can be classified broadly into two sets of solutions: (i) using pixel domain information when the input video is in raw format and (ii) using compressed domain information when the input video is in compressed format. A comprehensive survey on TIE is given in [19], where the different techniques in the literature between 1994 and 2004 are discussed. A survey of the recent works in this field can be found in [20].

EPG for RF Enabled TV: Some related work on channel logo recognition can be found in [21]-[24]. The best performance on the x86 platform is observed for the approaches described in [24]. But the approaches taken in [24] involve PCA and ICA, which are computationally very expensive and thus difficult to realize on the said DSP platform with real time performance. So no solution is available in the literature that can recognize the channel logos in real time and provide the EPG for RF feed TVs.
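To make the pixel domain class of watermarking mentioned in this section concrete, the following minimal sketch embeds watermark bits in the least significant bit (LSB) of raw luma samples. LSB embedding is a textbook technique chosen here only for illustration; it is not the method proposed in the thesis, and the function names `embed_watermark` and `extract_watermark` are hypothetical.

```python
from typing import List


def embed_watermark(luma: List[int], bits: List[int]) -> List[int]:
    """Write each watermark bit into the LSB of one 8-bit luma sample."""
    if len(bits) > len(luma):
        raise ValueError("watermark is longer than the frame")
    marked = list(luma)  # copy, so the original frame is left untouched
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the mark bit.
        marked[i] = (marked[i] & ~1) | (bit & 1)
    return marked


def extract_watermark(luma: List[int], n_bits: int) -> List[int]:
    """Read back the first n_bits watermark bits from the sample LSBs."""
    return [sample & 1 for sample in luma[:n_bits]]
```

Each sample changes by at most one grey level, which is why such a mark is visually negligible. It also shows the drawback noted above: the mark lives in raw samples, so a compressed stream has to be decoded before it can be inserted or read, and a mark this simple would not be expected to survive lossy re-encoding.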

3. MOTIVATION FOR THE PRESENT WORK

A review of the previous works on VAS for HIP-like systems reveals that most of the studies concentrate on different sub-problems instead of providing a complete end-to-end solution. The work embodied in this thesis is motivated by the aim of filling this gap. The major challenge in developing such a product is the resource constraint of the target hardware, namely CPU speed and memory. Some of the existing algorithms provide a good solution for some of the sub-problems in a PC environment, but these solutions cannot be implemented on a fixed point DSP platform. The proposed study focuses on developing VAS such as (i) security of the broadcast video and (ii) context information extraction from streamed video. Moreover, all of these solutions need to be deployed on the target hardware. We therefore plan to incorporate the following VAS as features of the HIP.

Low bandwidth video applications: This feature enables the user to hold a video conference, while watching TV, with another person who has a similar HIP installed at home. The basic idea behind this feature is that the TV picture is reduced to a lower resolution and the user can use the rest of the TV screen for video conferencing. This feature comes from the wish list of customers in urban areas of India whose wards are working abroad. Another solution with a low bandwidth requirement is place shifting, which enables the user to access the home video content over broadband. But both of these solutions can be implemented only when an efficient video CODEC, satisfying the requirement of high video quality at a low bandwidth, is realized on a DSP platform. As H.264 is considered the best video CODEC of the day, we have implemented H.264 on a low cost DSP platform.

Video Encryption: This feature was motivated by the demand from the TV channel agencies when the PVR was introduced in the market. The video encryption algorithm allows the user to record the video content using a key that can be derived from the hardware identification number of the PVR or HIP. As a consequence, the user can be tracked if he/she wants to use the recorded content for some illegal business purpose.
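The idea of binding a recording to the device that made it can be sketched as follows. This is an illustration under stated assumptions, not the encryption algorithm of the thesis: the key derivation (HMAC-SHA256 over the hardware identifier with a provider secret), the toy XOR keystream, and the names `derive_device_key`, `keystream` and `xor_crypt` are all hypothetical, and the identifiers in the usage note are invented.

```python
import hashlib
import hmac


def derive_device_key(hardware_id: str, master_secret: bytes) -> bytes:
    """Derive a 32-byte per-device key from the hardware identification number."""
    return hmac.new(master_secret, hardware_id.encode("utf-8"),
                    hashlib.sha256).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Expand the key into `length` pseudo-random bytes (HMAC in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(payload: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    stream = keystream(key, len(payload))
    return bytes(p ^ k for p, k in zip(payload, stream))
```

For example, content recorded with `derive_device_key("HIP-0001", secret)` decrypts correctly only with the key of that same device, which is what ties a leaked recording back to one PVR or HIP. The XOR keystream is for illustration only; a deployed system would use a vetted cipher with a fresh nonce per recording.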

Video Watermarking: The need for video watermarking was motivated by the requirement of a major content provider company to insert a watermark into the content it provides in a content delivery network (CDN). The same algorithm can be extended to streaming video applications such as the video on demand (VoD) services provided by DTH service providers. The company also looked for a watermarking evaluation system that can evaluate any watermarking system.

Mash up of TV context and Internet information: In a generation of Google TV and Yahoo Connected TV, it is impossible to survive in the connected TV market without providing a mash up feature. But as most people in India use analog cable TV and all of the above mentioned products are based on digital video only, there is a need to develop a system that addresses this variation of input, too. Moreover, the quality of the video obtained from an analog feed is quite poor in comparison to that obtained from the DTH service.

EPG for RF feed TVs: The same gap in technology arising from the source of video content motivates us to develop such an EPG service for users whose TV receives an RF feed signal.

We have proposed solutions that achieve about 80% of the accuracy and efficiency of the best related PC-based solution at 20% of the cost in terms of execution time and hardware. This concept, commonly known as frugal computing, mainly targets consumer electronics (CE) products in developing countries. The current thesis is mainly motivated by the aim of providing such frugal solutions that can be deployed on top of the HIP product already developed by the organization.

4. SCOPE OF FUTURE RESEARCH

The study presented here can be extended in several directions. Some of them are highlighted below:

• Video Screen Layout Segmentation: The layout of a video is very complex. We have tried running different document page layout segmentation techniques on different frames of news videos, but none of these methods produced a significant result.

• Frame by frame annotation of video frames using multimodal cues: The proposed method for the mash up of web information and TV context is based on the textual content of the video only. A better result can perhaps be obtained if multimodal cues such as audio and image are used. This can be used for annotating the frames and indexing the video.

• Cross lingual Information Retrieval: The textual content from the news video can be further used to retrieve related information from other languages. Script identification and subsequently Cross lingual Information Retrieval (CLIR) are further research issues involved with this problem.

• Automatic channel logo region identification: We have found that channel logo region identification for animated channels is a challenge. Automatic localization for these channels (like 9xM) is a possible future extension of the present research.

5. CONTRIBUTION OF THE THESIS

As far as the state of the art is concerned, this thesis makes several contributions to the development of VAS for a connected television. Some of the major contributions are briefly discussed below:

• In this thesis some novel approaches have been proposed to provide better video quality, coding efficiency and reduced MCPS even under the constraints of the target hardware. Improvement in video quality and coding efficiency under a constant bit rate is achieved by implementing a novel algorithm for adaptively selecting the basic unit for rate control. The proposed method also reduces the computational complexity using platform independent and platform specific optimization techniques, and yet meets the very low memory constraint of the target processor for a standard H.264 baseline encoder without sacrificing the rate-distortion performance. The platform independent optimizations are useful because this version of the code can be ported to any DSP platform for further platform specific optimizations. Almost 40% MCPS reduction with respect to the optimized reference code is achieved at the cost of less than 1% reduction in Peak Signal to Noise Ratio (PSNR).

• This thesis deals with an encryption scheme for H.264 video that can be implemented on a DSP platform. With Personal Video Recorder (PVR) enabled STBs and connected TVs, any user can easily store any TV program. The proposed technique is capable of preventing illegal distribution of video content stored in the PVR. This thesis presents a fast yet robust video encryption algorithm that performs real-time encryption of video in H.264 format on a commercially available DSP platform. The algorithm is also applied in a real-time place-shifting solution on a DSP platform. The approach has no negative effect on compression ratio or video quality. Mathematically, it can be shown that the proposed method is more robust than the methods for encrypting H.264 video described in the state of the art analysis.

• With the advent of high-speed machines, a hacker nowadays finds it less difficult to break any encryption key, even though it may require a large number of attempts. Therefore, an encryption method alone is not sufficient for copyright protection and ownership authentication of stored and streamed videos. In this thesis digital watermarking techniques have been proposed for this purpose. A fast method for watermarking streamed H.264 video data is proposed to meet the real time criteria of streaming video. This solution was also deployed in a content delivery network (CDN) environment.

• In this thesis a novel TV and web mash-up application is described. This application first extracts the relevant textual information from the TV video, arriving in either analog or digital format, and then mashes up the related information from the web to provide a true connected TV experience to the viewers. Unlike with digital TV transmission, for an analog feed it is not possible to automatically get contextual information about TV programs from any metadata. The text in a TV channel is extracted by text region identification, followed by pre-processing of the text regions and Optical Character Recognition (OCR) on them. The applications are presented for an x86 based PC platform and an ARM based dual-core platform. This type of system is not available in the literature.

• The thesis presents a novel method for recognizing channel logos from streamed videos in real time, which has various applications for VAS in the connected TV space. The prototype is developed on the x86 platform and then ported to a commercially available DSP with nearly 100% accuracy in real time. In India, where most people still watch TV using a Radio Frequency (RF) feed cable, this image processing based approach to providing an EPG is novel in nature.
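The first contribution above expresses the quality cost of the optimizations as a relative reduction in PSNR. As a reminder of how that figure is computed, the sketch below implements the standard definition, PSNR = 10 log10(peak^2 / MSE), for 8-bit samples. The helper `relative_drop_percent` and the decibel values in the usage note are made up for illustration and are not results from the thesis.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int],
         peak: int = 255) -> float:
    """Peak Signal to Noise Ratio in dB between two equally sized frames."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def relative_drop_percent(baseline_db: float, optimized_db: float) -> float:
    """Relative PSNR reduction of the optimized encoder, in percent."""
    return (baseline_db - optimized_db) / baseline_db * 100.0
```

With hypothetical values, an encoder whose PSNR falls from 36.0 dB to 35.7 dB loses about 0.83%, which would fall within a "less than 1%" budget of the kind quoted above.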

6. REFERENCES

[1] A. Wooldridge, “A special report on innovation in emerging markets”, The Economist, Page(s) 6, 17th April, 2010. [2] B. Stelter, “A TV-Internet Marriage Awaits Blessings of All Parties”, [3] A. Pal, C. Bhaumik, M. Prashant, A. Ghose, “Home Infotainment Platform”, UCMA2010, Miyazaki, Japan, June 2010. [4] Economic Times, “Dialog can raise internet penetration”, Economic Times Kolkata, Section-Business and IT, Page(s)5, Apr 27, 2010. [5] “SMART launches SurfTV”, http:// smart.com.ph/corporate/newsroom/SurfTV.htm, Last accessed on 13th Jan, 2011. [6] “World Internet Usage Statistics News and World Population Stats”, http:// www.internetworldstats.com/stats.htm., Last Accessed on Oct 2010.

[7] “ITU report baffled over RP’s high mobile phone, TV penetration standard document styles”, http:// technews.com.ph/ ?p=1627, Last Accessed on Oct 2010. [8] T. Chattopadhyay and C. Agnuru, “Generation of Electronic Program Guide for RF fed TV Channels by Recognizing the Channel Logo using Fuzzy Multifactor Analysis”, ISCE’10,7-10 June, Germany,

2010.

[9] I. E. G. Richardson, “H.264 and MPEG-4 Video Compression”, ISBN 0-470-84837-5, 2003.

[10] X. Kim and C. C. Jay Kuo, “A Feature-based Approach to Fast H.264 Intra/Inter Mode Decision”, ISCAS’05, Page(s)308-311, May 23-26, 2005. [11] H. Wang and Z. Zhu, “Fast Mode Decision and

30 


Reduction of the Reference Frames for H.264 Encoder”, IICCA’05, Vol. 6, Page(s) 1040-1043 , June

2005.

[12] S. Y. Shih,

C. R. Chang, and

Y. L. Lin, “A near

optimal deblocking filter for H.264 advanced video coding”, Asia and South Pacific Conference on Design Automation, 2006., Page(s) 24-27, 2006.

[13] T. Chttopadhyay and A. Pal, “A Survey on Video Security with Focus on H.264: Steganography, cryptography and watermarking techniques”, Proc. of the 2nd National Conference on Recent Trends in Information System (ReTIS 2008), Page(s) 63-67, Kolkata, India, 2008.

[14] S. Bhattacharya, T. Chttopadhyay, and A. Pal, “A Survey on Different Video Watermarking Techniques and Comparative Analysis with Reference to H.264/ AVC”, ISCE’06, Page(s) 616-621, Russia, 2006. [15] Y. Ye, X. Zhengquan, and L. Wei, “A Compressed Video Encryption Approach Based on Spatial Shuffling”, Proc. of 8th International Conference on Signal Processing, Volume 4, Page(s)16-20, Greece,

2006.

[16]

Encryption Algorithm for H.264”, ICICS’05, Page(s) 1121-1124, Thailand 2005. [17] Y. Zou, T. Huang, W.Gao, and L. Huo, “H.264 video encryption scheme adaptive to DRM”, IEEE Transactions on Consumer Electronics, Vol.52, no.4, Page(s) 1289-1297, Nov. 2006.

[18] Z. Liu and X. L. Sch, “Motion vector encryption in multimedia streaming”, Proc. of 10th International Multimedia Modelling Conference, Page(s) 64-71, http:// www.nytimes.com/2011/ 01/ 10/ business/m edia /10tv.html? r A=ustralia, 2004. 2, 9th January, 2011. Last accessed on 14th January,

Y. Li, L. Liang, Z. Su, and J. Jiang, “A New Video

2011.

[19] K. Jung, K. I. Kim, and A. K. Jain, “Text Information Extraction in Images and Video: A Survey”, Pattern Recognition, Volume 37, Issue 5,Page(s) 977-997, May

2004.

[20] T. Chattopadhyay, A. Pal, A. Sinha, “Recognition of Characters from Streaming Videos”, Chapter 2 of Character Recognition, Published by SCIYO,Page(s) 21-42, 2010, ISBN 978-953-307-105-3. [21] E. Esen, M. Soysal, T. K. Ates, A. Saracoglu, and A. A. Alatan, “A fast method for animated TV logo detection”, CBMI 2008, Page(s) .236-241, June 2008.

[22] A. Ekin and E. Braspenning, “Spatial detection of TV channel logos as outliers from the content”, in Proc. VCIP, SPIE, 2006.

[23] J. Wang, L. Duan, Z. Li, J. Liu, H. Lu, and J. S. Jin, “A robust method for TV logo tracking in video streams”, ICME, 2006.

[24] N. Ozay and B. Sankur, “Automatic TV Logo Detection And Classification In Broadcast Videos”, EUSIPCO 2009, Page(s) 839-843, Scotland, 2009.

Collaboration in Broadcast Media and Content

Sabine Bachmayer

Department of Telecooperation Johannes Kepler University Linz, Austria

sabine.bachmayer@jku.at

ABSTRACT

In recent years, we have observed a marked shift in broadcasting from mainly passive analog technology, such as conventional television, towards digital technology. This has caused a boom in the development of interactive applications (beyond teletext in TV) and in inviting viewers to participate in the content in a collaborative manner. Several popular participatory TV program formats have demonstrated this by inviting their audiences, for instance, to vote for a person. To date, these collaborative acts have been executed via parallel platforms, such as telephone and Internet. In summary, broadcast formats with audience involvement and technologies for adding interactivity both exist but remain unlinked in most cases. This poses a problem since synchronicity is lost, and collaboration requires a common focus which, in broadcasting, is the broadcast content.

This proposal describes the challenges and working process of my PhD research in the field of computer science, which focuses on collaboration in the broadcasting area. The main research issue is examining whether it is possible to expand 1:n parallel broadcasting into collaboration without the use of parallel platforms. The main contribution is the development of a reference architecture for realizing such scenarios.

Categories and Subject Descriptors

H.5.3 [Group and Organisation Interfaces]: Collaborative Computing - Synchronous Interaction; H.5.1 [Multimedia Information System]: Video - Interactive TV, Collaborative TV, MPEG-J

General Terms

Design, Human Factors

Keywords

Collaborative TV, Interactive TV, Collaboration in Broadcasting

1. INTRODUCTION

In the past decade, digital and participatory broadcast technologies have emerged alongside traditional analog broadcasting, which is mainly passive (except for simple interactive applications like teletext) and informative. The concept of collaboration in broadcasting is therefore not completely new. A simple and well-established example is inviting the audience to vote for or against a person or an item, as practised by most participatory reality and casting shows.

State-of-the-art work in collaborative broadcasting can be categorized, firstly, by how closely it is bound to TV content (whether it is (1) loose or (2) tied to TV content; see footnote 1) and, secondly, by whether it focuses on (a) enhancing broadcast environments with collaborative services or (b) using parallel platforms (e.g., telecommunication features).

Cases (1a) and (2b) occur quite often in state-of-the-art work. Examples of (1a) are synchronous and asynchronous chats or commendation functions, and tools for group building and recommendation [1, 3, 5, 6, 11, 12]. Case (2b) deals with participatory content, using parallel platforms for participation (mostly individual) and collaboration [13, 14]. The gap appears in case (2a), which describes collaborative applications that enhance TV environments and are tied to a certain (genre of) TV program format. Exceptions occur in T-Learning, for instance in the work of López-Nores [9, 10], and in entertainment TV with the LIVE system. The LIVE system provides passive collaborative influence on the content through the viewers’ behavior (e.g., channel switching), which is observed by the broadcaster [7]. Case (1b) is not relevant for this work.

The main contribution of this PhD research is to create the missing link between medium, content and collaboration from a technical perspective. This comprises three steps:

Firstly, the key feature is the development of non-linear and participatory broadcast content which invites the audience to become active, instead of adding activity to passive, linear content. Secondly, the development of collaborative applications which are embedded in the television environment. Thirdly, the realization of a linkage mechanism between the delivery medium, participatory content and the collaborative activity, regardless of whether the collaboration influences delivery medium and content.

1 Adapted from http://soc.kuleuven.be/com/mediac/socialitv/results.htm, introductory presentation, slide 7

Delivery medium (hereafter termed medium or media) denotes the technical realization and representation of the content. Well-known media standards are, for example, MPEG-2 and MPEG-4 as employed in the digital video broadcasting (DVB) standard in Europe. Characteristics of the medium are metadata, frames, time stamps and other features that are defined by the standard used. Content denotes the substance that is transmitted via the medium as an abstract model (e.g., a movie or radio show) and consumed by the audience. Characteristics of the content are, for instance, start and end point(s), scenes and characters in scenes. Its course is defined by the narrative structure, timing and pace. Access to the content is provided via the medium by a mapping mechanism (mapping characteristics of the medium to those of the content).
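The mapping mechanism described above can be illustrated with a minimal sketch. It assumes that the only medium characteristic used is a time stamp in seconds and that the content is modeled as a list of scenes. The names `Scene`, `ContentMap` and `scene_at` are hypothetical and are not part of MPEG-2, MPEG-4 or DVB.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """A content-level unit; the fields are illustrative assumptions."""
    title: str
    start_s: float  # time stamp (seconds) at which the scene begins
    characters: tuple[str, ...]


class ContentMap:
    """Maps a medium characteristic (time stamp) to a content characteristic (scene)."""

    def __init__(self, scenes):
        # Keep scenes sorted by start time so lookups can use binary search.
        self._scenes = sorted(scenes, key=lambda s: s.start_s)
        self._starts = [s.start_s for s in self._scenes]

    def scene_at(self, timestamp_s):
        """Return the scene playing at timestamp_s, or None before the first scene."""
        index = bisect_right(self._starts, timestamp_s) - 1
        return self._scenes[index] if index >= 0 else None


show = ContentMap([
    Scene("Opening", 0.0, ("host",)),
    Scene("Question round", 90.0, ("host", "candidate")),
])
```

With such a map, a player that only sees time stamps could determine, for example, that a candidate is on screen and use this as a hook for a collaborative service.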

I want to develop a reference architecture for realizing prototypical collaborative broadcast scenarios beyond the well-known voting. One option is to extend MHP (see footnote 2) with collaborative support. This was, for instance, suggested for T-Learning by López-Nores [9, 10]. Another possibility is to use the MPEG-J framework [8] in MPEG-4 scenarios. For this research, I will pursue the latter possibility.

The next section presents the aims and objectives of my research, including the main research questions. This is followed by a description of the methodologies applied and previous work. The paper concludes with prospects and future work.

2. AIMS AND OBJECTIVES

This research project aims to develop a reference architecture incorporating a pool of predefined collaborative services on the one hand and standard linkage mechanisms on the other hand. The architecture shall act as a tool kit for realizing a prototypical collaborative broadcast scenario. The main research question, “What is necessary to expand 1:n parallel broadcasting into collaboration?”, implies five research challenges:

Q1: How can the linkage mechanism be designed? This question addresses the challenge of linking collaboration with both the medium and the content.

Link to the medium: by linking to characteristics of the medium. For instance, the audience is invited to participate in a live chat on a certain topic for a certain time period. The service is called, activated and deactivated by metadata embedded in and transmitted via the broadcast medium.

Link to the content: by linking to the course or characteristics of the content. For instance, activating a collaborative service to help a candidate answer a question in a game show. The linkage is realized by using the candidate, who is acting in the scene, as a hook point.

Q2: How must broadcast content be structured, and where are the hooks to link medium and content to collaboration? Structuring the content is related to the content’s storyline. The key is to design and produce participatory content that invites people to collaborate. Which genres are suitable? For the linkage, a set of hooks must be defined, for instance, the appearance of a certain person or object in a scene of the content, or the beginning of frames, time stamps or metadata in the medium.

Q3: How must collaboration be designed for this purpose? Are existing collaborative mechanisms (e.g., communication mechanisms, concurrency control, context awareness) and tools (e.g., computer-supported cooperative work tools, rating and recommendation systems) applicable for this purpose and for connecting to the broadcast medium / content [4]? In addition, hooks in the collaboration that correspond to those in the medium and content are necessary (e.g., to ascertain the majority, reactions and counter-reactions).

2 Multimedia Home Platform http://www.mhp.org

Q4: How can collaborative interaction be measured for further application in a collaborative broadcast scenario? To use collaborative interaction for further application (e.g., to influence the course and characteristics of content), it is necessary to measure (e.g., the level of activity), analyze (e.g., the outcome of the collaborative activity) and finally quantify (i.e., indicate the outcome of the measurement and analysis as numeric values) the collaborative activity.

Q5: What requirements and support must be satisfied? In general, which technology best supports user participation and collaboration? Define technical requirements (e.g., a run-time environment), security requirements (e.g., privacy and resistance against attacks), exception handling (e.g., handling the unexpected drop-out of participants) and the manner of support for this real-time system.
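To make the idea of hooks in Q1 and Q2 more concrete, the following is a minimal sketch of how a linkage mechanism might connect hooks in the medium or content to collaborative services. It is an illustration under assumptions, not the architecture developed in this research: the class names, the string hook identifiers (such as `meta:chat-start`) and the two actions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CollaborativeService:
    """A service such as a chat or quiz; the name and fields are illustrative."""
    name: str
    active: bool = False


@dataclass
class LinkageRegistry:
    """Links hooks in the medium or content to collaborative services."""
    # Maps a hook identifier to a list of (service, action) pairs.
    _links: dict = field(default_factory=dict)

    def link(self, hook, service, action):
        if action not in ("activate", "deactivate"):
            raise ValueError(f"unknown action: {action}")
        self._links.setdefault(hook, []).append((service, action))

    def on_hook(self, hook):
        """Called by a (hypothetical) player whenever it detects a hook."""
        changed = []
        for service, action in self._links.get(hook, []):
            service.active = action == "activate"
            changed.append(service.name)
        return changed


chat = CollaborativeService("live-chat")
helper = CollaborativeService("help-the-candidate")
registry = LinkageRegistry()
# Medium-level hooks: metadata markers embedded in the stream.
registry.link("meta:chat-start", chat, "activate")
registry.link("meta:chat-end", chat, "deactivate")
# Content-level hook: a candidate appears in a game-show scene.
registry.link("content:candidate-on-screen", helper, "activate")
```

Here the live chat is switched on and off by metadata (link to the medium), while the helper service is triggered by the appearance of the candidate (link to the content).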

3. METHODOLOGY AND PREVIOUS WORK

This section introduces the applied methodology, consisting of two main steps, and the work done so far.

3.1 Step 1 - Descriptive and Non-Empirical Research

The descriptive and non-empirical research phase consisted of a literature review and scenario construction.

3.1.1 Step 1.a - Literature Review

To classify the state-of-the-art work in “collaboration in streaming and broadcasting”, as this work was initially entitled, it was necessary to construct a new taxonomy because, firstly, most of the existing taxonomies were too specialized (e.g., those covering the geometric structure of video content [2] and those relating to the network and the user interface) and, secondly, a focus on the linkage mechanism was missing completely. 32 approaches in this area were chosen for the state-of-the-art analysis with the aim of building a representative overview of the scientific work done so far.

The taxonomy developed consists of six categories, namely the narration space of the content (to analyze whether the content is participatory), the level of linkage between collaboration and medium / content (to analyze whether any linkage exists), the scope of collaboration (to analyze the level of collaboration), the type of interaction (to analyze the type of interaction used for the collaboration), delivery medium and delivery network (both to distinguish between streaming and broadcasting).

After conducting this analysis, it was necessary to focus on the linkage between collaboration and content. For this purpose, the results of the analysis were narrowed down by building five classes from the categories level of linkage, narration space and type of delivery network. The classification indicates whether linkage exists and, in case it does, (i) whether it relates to the content (storyline, topic, genre, etc.) or to the delivery medium (e.g., by changing the medium’s state collaboratively from play to pause), (ii) to which type of content it links (linear or participatory) and (iii) whether the linkage is realized by streaming or broadcasting. As mentioned in the introduction, linkage between content and collaboration has been realized in very few cases. In streaming, collaboration has been realized more frequently, with a focus on controlling the state of the medium collaboratively or enhancing the medium with interactive elements.

In the following, this work focuses on the broadcast area or, to be more exact, on television environments. The decision is justified firstly by the findings mentioned above, and secondly by technological differences which make broadcasting more attractive for collaboration, for example:

- Connectivity - 1:n in broadcasting, 1:1 in streaming.

- Marked standardization in broadcasting - DVB, which uses the MPEG-2 / MPEG-4 media formats in Europe, in contrast to streaming, where many media formats are in use (e.g., Windows Media, Quicktime, MP4, Flash, ...).

And thirdly, by the possibility of tying in with popular participatory TV program formats that are well established in the broadcasting sector and currently use parallel platforms.

In summary, the result of this step was an analysis of existing work. The lack of linkage mechanisms and reference architectures in broadcasting, and the prevailing usage of linear content, became the starting point.

3.1.2 Step 1.b - Scenario Building

Based on these results, concrete story scenarios were built and, in a next step, classified and abstracted to deduce functional and non-functional requirements. The classification of these scenarios focuses, on the one hand, on the collaboration between the viewers (consumer’s side) and, on the other hand, on linking television medium / content and collaborative services (producer’s or broadcaster’s side). In summary, the consumer scenarios were classified and abstracted into a-/synchronous influence and non-influence of the medium and content, as shown in Figure 1, and the producer and broadcaster scenarios into enhancement and real-time change of the medium and content. This work focuses on synchronous scenarios.

[Figure 1 image: panels a) to e). Left side: non-influencing collaborative services; right side: influencing collaborative services. Labels: Legend (Collaborative Service, Starting Point); Medium (<metadata>, Frame #X, Attribute X changes); Content (Drama / Game Show, structured in several steps).]

Figure 1: Schematic representation of collaborative services without influence (left side) and influence (right side) on medium and content

Consumer scenario: Non-influencing collaboration does not affect the characteristics of the linked medium (e.g., Figure 1c) or the course and characteristics of the linked content (e.g., Figure 1a and b). An example would be to enable participation by providing a collaborative quiz service in conjunction with a broadcast quiz show. The quiz service is linked, for instance, to the genre, which is a characteristic of the content, and it is linked to the medium as it is, for example, initiated and terminated by metadata or by a certain attribute of the medium.

Consumer scenario: Influencing collaboration affects the characteristics of the linked medium (e.g., Figure 1e) or the course and characteristics of the content (e.g., Figure 1d). One example would be to provide a 60-second chat as a joker in “Who Wants to Be a Millionaire”. This service is linked to the content and influences its course, and it is linked to the medium, as it is provided for a specified time interval. Influencing the medium means, for example, adapting the duration of the interval depending on the intensity of the collaborative activity.
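As an illustration of the last point, the sketch below adapts the length of a chat interval to the intensity of the collaborative activity. The linear rule, the parameter names and the default values (`max_extension_s`, `target_rate`) are assumptions chosen for the example. The paper does not prescribe a particular formula.

```python
def adapted_duration(base_s, messages, participants,
                     max_extension_s=30.0, target_rate=0.5):
    """Return a chat interval length scaled by collaborative activity.

    Activity is quantified as messages per participant per second over the
    base interval. All parameters and the linear rule are assumptions.
    """
    if base_s <= 0:
        raise ValueError("base_s must be positive")
    if participants <= 0:
        return base_s  # nobody joined, keep the scheduled interval
    rate = messages / (participants * base_s)
    # Scale the extension linearly with activity, capped at max_extension_s.
    intensity = min(rate / target_rate, 1.0)
    return base_s + intensity * max_extension_s
```

For a 60-second joker chat, an inactive group keeps the scheduled 60 seconds, while a very active group can gain at most `max_extension_s` additional seconds.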

Producer, broadcaster scenario: Enhancing content. Prepare medium and content before broadcast for (non-)influencing collaboration by enhancing them with selected collaborative services and linkage.

Broadcaster scenario: Changing content. Make synchronized changes to the medium and content automatically and in real time during the broadcast.

By abstracting and classifying story scenarios, functional and non-functional requirements were identified. Functional requirements include the enhancement and update of medium and content, provision of private and open groups, session management, notification of the opportunity to participate, and medium and content analysis.

3.2 Step 2 - Engineering Research

In the engineering research phase, the reference architecture (artifact) is modeled, developed and evaluated.

3.2.1 Step 2.a - Model Construction (using UML)

UML use cases are designed for the following perspectives:

1.) Producer’s and broadcaster’s view:

- Define and provide the collaborative services used.

- Enhance medium and content with selected collaborative services (by using MPEG-J).

- Provide methods and interfaces to define hooks (a) in the medium and content and (b) in the collaboration.

- Link hooks (a) and (b) by using the linkage mechanisms provided.

2.) Consumer’s perspective (client):

- Provide a player that receives the incoming media stream and displays the decompressed video content. To enable participation, the player must analyze the characteristics of the received and decompressed medium and its content. Available collaborative services must be indicated to the viewer.

- Connect participants on the one hand, and provide a back channel to the broadcaster (server) on the other hand.

- Measure, analyze and quantify collaborative activity. Send results to the broadcaster (server).

- Provide support and exception handling.

3.) Broadcaster’s perspective (server):

- Receive and analyze incoming quantified data from the consumers.

- Change medium and content with respect to the received data and their analysis.

- Compress and broadcast the modified media stream.

Each step of the model construction includes the determination and analysis of the characteristics and requirements that are necessary for the medium and content, the collaboration and the linkage. This step will lead to answering the research questions Q1 to Q4 in a theoretical manner.
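The interplay between the consumer's and the broadcaster's use cases can be sketched as follows. This is a simplified illustration under assumptions: collaborative activity is reduced to a vote per participant, the back channel is modeled as a plain function call, and the names `QuantifiedResult`, `quantify` and `choose_branch` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantifiedResult:
    """Numeric summary a client sends over the back channel (assumed shape)."""
    participants: int
    winning_option: str
    support: float  # share of participants backing the winning option


def quantify(votes):
    """Client side: measure, analyze and quantify one collaborative activity.

    votes maps a participant id to the option that participant chose.
    """
    if not votes:
        raise ValueError("no collaborative activity to quantify")
    counts = Counter(votes.values())
    option, top = counts.most_common(1)[0]
    return QuantifiedResult(len(votes), option, top / len(votes))


def choose_branch(results, default_branch):
    """Server side: pick the content branch backed by most participants."""
    totals = Counter()
    for result in results:
        # Weight each group's winner by the number of supporters in it.
        totals[result.winning_option] += result.support * result.participants
    if not totals:
        return default_branch
    return totals.most_common(1)[0][0]


group_1 = quantify({"p1": "B", "p2": "B", "p3": "A"})
group_2 = quantify({"p4": "A"})
next_branch = choose_branch([group_1, group_2], default_branch="A")
```

In a real system the server would then change medium and content accordingly and broadcast the modified stream. That step is omitted here.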

3.2.2 Step 2.b - Construction of an artifact

Based on the previously defined model, a layered reference architecture (Figure 2) will be developed in this phase. The reference architecture processes an MPEG-2 / MPEG-4 video stream delivered via IPTV and enables the implementation of a prototypical collaborative broadcasting scenario, as described in the previous sections and in step 1.b. This step will lead to answering the research questions Q1 to Q4 in a practical manner.

[Figure 2 image: Layered Reference Architecture with the packages Medium / Content, Linkage and Collaboration. Labels: User Interface, Service Management, Linkage Management, Hooking Points, Management, Analysis, Measurement, Change, Quantification, Network Layer.]

Figure 2: Schematic Representation of the Reference Architecture

3.2.3 Step 2.c - Destruction of an artifact

Destruction of the artifact includes testing and evaluation. Testing of this reference architecture will be conducted by building working prototypes of the defined collaborative broadcast scenarios. Due to limited time and funding, the prototypes will use existing (and possibly linear) video content and consider the consumer’s, the producer’s and the broadcaster’s perspectives, as mentioned before. Building prototypes from this reference architecture demonstrates its functionality. The prototypes in turn are examined by measuring the previously specified essential and desirable functional and non-functional requirements. This step will lead to verifying the answers to research questions Q1 to Q4 and to answering Q5.

4. PROSPECTS AND FUTURE WORK

Having finished the descriptive and non-empirical research and step a) of the engineering research, I have recently started constructing the artifact. During the next months I will develop a reference architecture as illustrated in Figure 2. Before developing the basic packages (Medium / Content, Linkage and Collaboration), an analysis of broadcast content (of medium and content) and of existing collaborative services will be done.

The reference architecture will be evaluated by realizing the already defined story scenarios (mentioned in step 1.b) as prototypes. The prototypes will in turn be evaluated in terms of the previously elaborated essential and desirable functional and non-functional requirements. Since the design and production of a participatory broadcasting content format is beyond the scope of this work, the prototypes will be tested with existing broadcast video content. However, to address the “big picture” and to give a complete example scenario, I will create a storyboard of a participatory TV program format. Its purpose is to illustrate the role of participatory content in any activity beyond passive watching of TV.



5. REFERENCES

[1] J. Abreu, P. Almeida, and V. Branco. 2beon: interactive television supporting interpersonal communication. In Proceedings of the 6th Eurographics workshop on Multimedia 2001, pages 199–208. Springer New York, 2002.

[2] S. Bachmayer, A. Lugmayr, and G. Kotsis. Convergence of collaborative web approaches and interactive tv program formats. International Journal of Web Information Systems, 6(1):74–94, 2010.

[3] E. Boertjes. Connectv: Share the experience. In Proceedings of the 5th international conference on Interactive TV: a Shared Experience, volume 4471 of Lecture Notes in Computer Science, pages 139–140. Springer, 2007.

[4] U. Borghoff and J. Schlichter. Computer-Supported Cooperative Work - Introduction to Distributed Applications. Springer Berlin, 2000.

[5] T. Coppens, L. Trappeniers, and M. Godon. Amigotv: A social tv experience through triple-play convergence. In Proceedings of the 2nd international conference on interactive TV, Brighton, UK, 2004.

[6] F. de Oliveira, C. Batista, and G. de Souza Filho. A3tv: anytime, anywhere and by anyone tv. In Proceedings of the 12th international conference on Entertainment and media in the ubiquitous era, pages 109–113. ACM, 2008.

[7] S. Grünvogel et al. A novel system for interactive live tv. In Proceedings of the 6th international conference on Entertainment Computing, pages 193–204. Springer Berlin / Heidelberg, 2007.

[8] ISO / IEC. International Standard 14496: Information technology - Coding of audio-visual objects, Part 5: Reference software, 2001.

[9] M. López-Nores et al. Technologies to support collaborative learning over the multimedia home platform. In W. Liu, Y. Shi, and Q. Li, editors, ICWL, volume 3143 of Lecture Notes in Computer Science, pages 83–90. Springer, 2004.

[10] M. López-Nores et al. Formal specification applied to multiuser distributed services: experiences in collaborative t-learning. Journal of Systems and Software, 79:1141–1155, August 2006.

[11] K. Luyten et al. Telebuddies: social stitching with interactive television. In Proceedings of the 1st international conference on Human factors in computing systems, pages 1049–1054. ACM, 2006.

[12] M. Nathan et al. Collaboratv: making television viewing social again. In Proceeding of the 1st international conference on Designing interactive user experiences for TV and video, pages 85–94. ACM, 2008.

[13] P. Tuomi. Sms-based human-hosted interactive tv in finland. In Proceedings of the 1st international conference on Designing interactive user experiences for TV and video, pages 67–70. ACM, 2008.

[14] M. Ursu et al. Interactive tv narratives: Opportunities, progress, and challenges. ACM TOMCCAP, 4(4):1–39, 2008.

All links were checked in April 2011.

Televisual Leisure Experiences of Different Generations of Basque Speakers

Iratxe Aristegi Fradua

Faculty of Social and Human Sciences - University of Deusto
Unibertsitateen etorbidea 24, 48007 Bilbao, Basque Country, Spain
(+34) 944139075
iariste@deusto.es

Xabier Landabidea Urresti

Institute of Leisure Studies - University of Deusto
Unibertsitateen etorbidea 24, 48007 Bilbao, Basque Country, Spain
(+34) 944139075
xlandabidea@deusto.es

Aurora Madariaga Ortuzar

Institute of Leisure Studies - University of Deusto
Unibertsitateen etorbidea 24, 48007 Bilbao, Basque Country, Spain
(+34) 944139075
aurora.madariaga@deusto.es

ABSTRACT

In this research project I argue that the connection between Leisure Studies and Audience Studies is unavoidable, fundamental and fertile in its possibilities, as media are becoming increasingly significant in defining the leisure of 21st-century citizens of affluent societies. Audience Studies have focused on the quantitative analysis of media consumption, while research practices traditionally associated with Leisure Studies, such as time budgets, have tended to measure observable activities, overlooking the meanings given to and taken from those same practices [10].

This project is innovative in stressing the need for a new model for audience analysis that blends the polysemic nature of the terms leisure and television, especially in a transforming media landscape. It is argued that adapting the concept of leisure experience to the field of TV audiences is key to a better understanding of the present and future forms of television. This paper presents the justification and relevance of the issue at hand, the broad theoretical concepts on which it is based, the thesis’ aim, scope and objectives, and a brief note on its methodology and proposed case study with Basque speakers, which is still at a preliminary stage of definition, as well as the provisional index of the study.

Categories and Subject Descriptors

H.5.2. [User Interfaces] - Evaluation/methodology.

General Terms

Human Factors, Measurement, Performance.

Keywords

Audience Studies, Leisure Studies, Television, Experience, Media Ecosystem, Basque speakers.

1. JUSTIFICATION

Media-related leisure has gained increasing psychological, sociological and economic weight in industrialized societies, to such a degree that it can be argued that entertainment and media have become inseparable concepts for the majority of the citizens of affluent societies [3]. Today it is becoming increasingly difficult to portray the leisure of a 21st-century citizen without considering the media texts (written, audiovisual, digital, analog ...) that he/she consumes and participates in.

Mass media have become meaningful elements in the everyday life of individuals and communities alike in postindustrial and developing countries. Citizens’ everyday spaces (sitting rooms and bedrooms, classrooms, cultural centers, pubs, vehicles) and daily times (routines, habits, frequencies, periods) are saturated with media texts. Individuals in developed countries spend most of their free time watching television, listening to the radio, surfing the internet or playing on a console. Media have become omnipresent, and they have done so especially through leisure. Today, it is unavoidable to study leisure when considering media, as well as to study media when considering leisure [10].

Television provides a fertile empirical and conceptual starting point for a joint exploration of the leisure experiences of media audiences. Studying TV and its audiences from the perspective of Leisure Studies is justified both by the massive aggregate time that is devoted to it and by its contested but not yet overthrown centrality in a rapidly changing digital media landscape. Both present and future TV are critical for Leisure Studies.

Traditional audience research methods have mostly been interested in counting and weighing the time we spend in front of the screen and in trying to measure the effects and reach of media on their audiences, but so far they have not provided us with answers to questions such as how and why we insert television into our daily lives, what meanings TV takes on in people’s everyday life, or which pleasures we find in our relations with it.



Leisure and Audience Studies can no longer avoid these challenges. Not only because media have become, along with tourism, major social manifestations of leisure in contemporary societies, but also because they are the objects of human choice and the specific articulations of human freedom.

My PhD thesis aims to contribute to the understanding of the choices behind audience figures by analyzing the meanings that people build in relation to television and the roles that TV plays in their everyday life. A model for the analysis of audiences’ televisual leisure experiences will be presented for that purpose.

2. RESEARCH FRAMEWORK

The term television, far from having the single meaning traditionally associated with it, refers to multiple realities: a household electronic device, a social institution, a content production and distribution system, a leisure resource, a leisure practice, an industry, a market and so on. Alain Le Diberder and Nathalie Coste-Cerdan [2] refer to it as an unknown social object, our society’s “immense and central object, which, unable to avoid, we stop perceiving, like the totem, expressing and concentrating all the hopes and fears of the modern tribe” (1990: 12).

TV in its more traditional as well as its most interactive and cross-media forms is, in Javier Callejo’s words [1], a medium of multiple identities which accumulates socially represented attributes, such as those of the audience who represents itself in front of the TV set. Beyond the specific devices and their technical specifications, my thesis is concerned with the concept of television that people construct in their everyday relations with it: how they watch, read, understand, love, hate, take into consideration and reject TV in their everyday lives. The object of study of this thesis is television as something able to enable or make impossible, facilitate or prevent, limit or condition leisure experiences in relation to it.

Interestingly, the concept of leisure shares its polysemic nature with that of television, to the point that it is difficult to determine what is and what is not leisure. Despite the rich and colorful development of interdisciplinary Leisure Studies during the 20th century, especially in the 1990s, a conclusive and universal definition of leisure remains elusive. Leisure theorists have ventured different definitions, like the one proposed by Robert Stebbins, “Leisure may be defined as: uncoerced activity engaged in during free time, which people want to do and, in either a satisfying or a fulfilling way (or both), use their abilities and resources to succeed at this” [15], or the one by the Institute of Leisure Studies, “Broadly, Leisure comprises freely chosen experiences and actions, carried out in areas of freedom, without primarily utilitarian objectives, and which report satisfaction to the individual.” [11]. Nevertheless, John Neulinger’s proposition is also generally accepted: “Perhaps it is best to realize that there is no answer to this question, or better, that there is no correct answer” [12].

Furthermore, as Leisure can be studied from different paradigms, each perspective tends to highlight certain characteristics while overshadowing others. Indeed, an overly simplistic definition becomes problematic when confronted with the nuances of everyday human experience. Traditionally understood in opposition to work (“leisure as rest and recuperation from work” or “as an antidote to the stresses and strains of modern life” [7]), the concept of leisure has undergone continuous transformations through history and has reached a relative conceptual and theoretical emancipation, possibly induced by its explosively growing social relevance. “Considered, until very recently to be a danger or a secondary matter, today it is understood as a field of development, identification and a right” [4].

On the one hand, from an objective point of view, leisure has to do with the available free time, with the time spent doing something, with the resources used and the related actions [4]. It refers to the materials employed, the spaces occupied, and the recurring habits and practices that are carried out. On the other hand, a subjective standpoint gives more relevance to the satisfaction, pleasures and meanings that can arise from the experience. Leisure is an area of human experience which is sought out and composed of freely chosen pleasant activities, but its outcome will never depend entirely on the action itself, nor on the subject’s free time, economic situation or education level by themselves.

Insofar as leisure is a personal experience that is at once individual and social, it cannot be understood as a completely subjective phenomenon, because a person’s life will always be situated in a specific social and material context. For the Humanist Leisure Perspective of the Leisure Studies Institute of the University of Deusto, leisure is, at the same time, a social phenomenon, an integral personal experience, and a basic human right. This threefold meaning has been explored by the author in relation to television in his dissertation [9] and represents the theoretical starting point of the current PhD thesis.

Marie Gillespie states that “The media are cultural institutions that trade in symbols, stories and meanings. As such they shape the forms of knowledge and ignorance, values and beliefs that circulate in society” [6]. It is this trade in meanings, stories and symbols that constitutes the core of their social relevance as manifestations and enablers of leisure, which this thesis aims to explore. The meanings of television and leisure are not fixed, but change throughout history and across human groups and individuals.

Exploring the everyday connections that different generations of Basque speakers make between TV and leisure will be one of the key elements for understanding their conceptions of the place that television has in their lives, and for understanding the evolution of these two terms, both charged with multiple meanings. Comparing the discourse of different age-groups, which show different levels of media literacy and of expertise with information and communication technologies (ICTs), will help us clarify the possibilities and manifestations of social and individual interaction with television.

3. SCOPE AND OBJECTIVES

This research project aims to contribute to Audience Studies with an analysis of the leisure experiences of different generations of Basque speakers that goes beyond the counting of time and the analysis of the textual content of the medium. I attempt to explore television consumption not as a leisure practice, but as a complex, multidimensional leisure experience. The preliminary objectives and working hypotheses are introduced below:

Main objective: To explore the leisure experiences of different generations of Basque speakers in their relationships with a television in transition.

Secondary objectives:

O1: To explore the contributions of the Humanist Leisure perspective to Audience Studies and vice versa.

O2: To analyze the discourse of participating subjects in relation to their attitudes, emotions and feelings in the use of television in their everyday lives.

O3: To identify and compare the key structuring elements of various age-groups’ relation with television and their conception of it.

O4: To introduce a model for audience analysis that takes television as a framework for leisure experiences, beyond the perspective of leisure practice.

This project is based on the following working hypotheses:

i. Different people establish and develop different relationships with TV. Television viewers show different skills, goals, strategies and usages of TV, and they result in different experiences, including those of leisure.

ii. Both media and the way audiences engage with media are changing profoundly. Different generations have had different contacts with media that result in distinct expertise developments, which lead to different ways of engaging with media.

iii. The skills, goals, strategies and usages employed in TV consumption differ between age-groups, due to, among other factors, differences in expertise with information and communication technologies, in media literacy levels and in personal and social agendas.

iv. These relationships are complex and varied and are not exhausted by quantitative audience measurements.

v. TV-related leisure experiences do not necessarily start when switching on the TV set and do not end when it is switched off either.

vi. The experience of the subject of media leisure can be known through the analysis of her/his discourse.

Given that "Television does not mean what it once did" [14], it follows that neither does the study of its audiences. While time and space were the main parameters in the past century, 21st-century Audience Studies must delve deeper into the leisure experience of individuals and communities.

This concern motivated the present project and defines the main contribution it aims to make: to understand, and to provide a model for approaching, television as a reality that enables, limits, conditions and changes leisure experiences.

4. METHODOLOGY

The aim of this thesis is not to study the times and spaces in which Basque speakers watch TV (when, where, how much, how many times…), the contents they consume (what, how, through which channels…), or to generalize trends in Basque audiences, but to collect the audiences’ discourses in order to compare and understand their leisure experiences.



The methodological ambition of the thesis is one of understanding, not of totality [16]. The in-depth case study of different generations of Basque speakers will provide a "detailed examination of a single example" that "can be used in the preliminary stages of an investigation to generate hypotheses" but is not limited to that role [5]. This approach will enable a necessarily incomplete but intensive exploration of the meanings and pleasures found and built around television, following the ethnographic statement that “experience shows that intensive study provides understanding, while extensive doesn’t” [8].

The nature and complexity of the phenomenon of human experience requires a qualitative approach for its understanding. In the terms used by Chris Rojek [13], what is needed here is an idiographic approach rather than a nomothetic one: a non-generalizing methodology rather than a generalizing one. My interest lies in the construction of meaning and in the living of meaningful recreation, entertainment and leisure experiences. Therefore the object of study can only be approached through the narration of these experiences, as the experience only occurs within the person, never outside. The importance, meaning and significance that audiences grant to television can only be known through their own expression.

The working language of this PhD thesis is Basque. The text itself will be written in Basque and in English, and the fieldwork (interviews and focus groups) will naturally also be conducted in Basque, although English, Spanish and French will be employed when indispensable to clarify important aspects of the case study to the participants (especially in the case of migrant Basque speakers abroad).

The techniques chosen for the collection of data through the case study will be the in-depth interview and the focus group (both homogeneous and heterogeneous in their composition regarding age-groups). Two pilot focus groups have been completed at this stage, one in Spanish and another in Basque, which have helped to pretest the interview and group-discussion scripts and to determine their limits and reach regarding the proposed objectives.

5. PROVISIONAL INDEX OF THE THESIS

This PhD thesis will have three distinct parts:

Section 1: Theoretical framework

Chapter 1: Leisure and leisure experience

Chapter 2: Television and audience studies

Chapter 3: The media ecosystem

Section 2: Analysis of the experiences of the audience

Chapter 4: Methodology of fieldwork

Chapter 5: Analysis and results of the Case Study

Chapter 6: Comparison and typology of leisure experiences

Section 3: The audience analysis model

Chapter 7: The audience analysis model

Chapter 8: Conclusions and recommendations

At this point in time the index of the thesis is provisional, as the author is in the process of drafting the theoretical section. The methodology of the fieldwork and the definition of the Case Study will be nurtured by the contributions of the present conference and redefined during a predoctoral stay in 2011.

6. ACKNOWLEDGMENTS

I would like to thank my two supervisors, Dr. Iratxe Aristegi Fradua and Dr. Aurora Madariaga Ortuzar; without their help this research project would not have been the same. My thanks also go to Jone Goirigolzarri Garaizar for her invaluable help and constant support. I also want to thank all sixteen participants in the two focus groups that constituted the first methodological pretest.

7. REFERENCES

[1] Callejo, J. La audiencia activa. El consumo televisivo: discursos y estrategias. Centro de Investigaciones Sociológicas (CIS), Madrid, 1995.

[2] Coste Cerdan, N. and Le Diberder, A. Romper las cadenas. Una introducción a la post-televisión. Gustavo Gili, Barcelona, 1990.

[3] Cuenca Amigo, J. and Landabidea Urresti, X. El ocio mediático y la transformación de la experiencia en Walter Benjamin: hacia una comprensión activa del sujeto receptor. In VIII Congreso Vasco de Sociología y Ciencia Política Sociedad e Innovación en el Siglo XXI. (Bilbao). 2010, 33.

[4] Cuenca, M. Las artes escénicas como experiencia de ocio creativo. In Cuenca, M., Lazcano, I. and Landabidea Urresti, X. eds. Sobre ocio creativo: situación actual de las Ferias de Artes Escénicas. Universidad de Deusto, Bilbao, 2010, 13.

[5] Flyvbjerg, B. Five misunderstandings about case-study research. In Seale, C., Gobo, G., Gubrium, J. and Silverman, D. eds. Qualitative Research Practice. Sage, London and Thousand Oaks, CA, 2004, 420-434.

[6] Gillespie, M. Media audiences. Open University Press, Maidenhead, 2005.

[7] Haywood, L., Kew, F., Bramham, P., Spink, J., Capenerhurst, J. and Henry, I. Understanding leisure. Stanley Thornes, Cheltenham, 1995.

[8] Herskovits, M. J. Some Problems in Ethnography. In Spencer, E. F. ed. Method and Perspective in Anthropology. University of Minnesota Press, Minneapolis, 1954.

[9] Landabidea Urresti, X. Hacia una aproximación cualitativa a las experiencias televisivas de distintas generaciones. (2009).

[10] Landabidea Urresti, X., Aristegi Fradua, I. and Madariaga Ortuzar, A. Aisiazko praktikatik aisiazko esperientziara: ezinbesteko berrikuntzak telebista-audientzien ikerketan. (2011).



[11] Maiztegui, C., Martinez, S. and Monteagudo, M. J. Thesaurus de Ocio. Universidad de Deusto, Bilbao, 1996.

[12] Neulinger, J. The psychology of leisure. C.C. Thomas, Michigan, 1981.

[13] Rojek, C. The labour of leisure: the culture of free time. SAGE Publications Ltd, 2009.

[14] Shimpach, S. Television in Transition. Wiley-Blackwell, Chichester, UK, 2010.

[15] Stebbins, R. A. Choice and experiential definitions of leisure. Leisure Sciences, 27 (2005), 349-352.

[16] Velasco, H. and Díaz de Rada, Á. La lógica de la investigación etnográfica. Trotta, Madrid, 2006.

Mobile TV: Towards a Theory for Mobile Television

Luis Miguel Pato

Dept. of Communication and Art

University of Beira Interior

Covilhã, Rua Marquês d'Ávila e Bolama

(+351) 275 319 700

luis13pato@gmail.com

ABSTRACT

With people's successful adoption of mobile devices and the imminent switch to Digital Terrestrial Television, the transition of TV to emergent mobile scenarios is foreseeable. In this upcoming scenario, viewing patterns and behaviors are defined by time, place and social context. Within these conditions, the transmission of personalized television through mobile phones is believed to have a tremendous end-user impact. In this doctoral investigation our aim is to measure this impact in a country where this emergent medium can be observed (Portugal). Our methodology proposes to apprehend the reality of mobile television (mTV) through fundamental social and theoretical assumptions, interviews with the main players in Portugal's mTV market, and a statistical evaluation of Portuguese viewers' motivations for watching mTV and their resulting satisfaction levels. We intend to base our theoretical framework on the media gratifications perspective and on laboratory sessions with three samples of mTV users/adopters in Portugal. Through this approach we believe it is possible to apprehend mTV's usage and current reality in Portugal, and possibly its foreseeable future.

Categories and Subject Descriptors

H.4.3 [Information Systems] [Communications Applications], H.5.1 [Multimedia]

General Terms

Measurement, Experimentation, Human Factors, Theory.

Keywords

Mobile Television, Expectancies, Satisfaction Levels, Consumers, Portugal, Media Theory

1. INTRODUCTION

Today, wireless communication networks are believed to be one of the fastest-evolving areas of contemporary societies. From a media-theoretical perspective, McLuhan’s historic vision of globalization is realized in today’s general access to media content, and to the technological devices needed to consume it, without geographical boundaries. It is therefore understandable that the mobile phone is generally considered the trademark of contemporary society. In Innis’s terms, we could say that this device and its impact define the society that we live in [1]. However, consuming television through a mobile phone is a whole different story. First, it is important to understand that we are talking about the convergence of two success stories: television, and the consumption of this medium through an enhanced end-user mobile media experience on a cell phone. In addition, with the deadline for the implementation of DTT (Digital Terrestrial Television) set for 1 January 2012 and the progressive adoption of DVB-H, mobile television (mTV) is described as a fundamental “killer application” of TV’s near future. There is also the ever more frequent use of YouTube to watch television on mobile phones. In this study we consider both of these realities as mTV. Some issues therefore remain to be solved regarding the sources and forms of diffusion of this type of television. Beyond these issues, we observed that most current trials and academic research are based on purely technical analyses. To our understanding this is a serious problem because, in a world where wireless telecommunications emerge quickly, overlooking the data provided by early adopters is risky. Throughout the long history of research on and evaluation of new mass media in media studies, one basic idea has always emerged: early end-user acceptance is precious. Our perspective is therefore of a more social order, because we believe that “it’s about the people (…) not just the technology” [2].

2. BACKGROUND

Today, several realities are associated with the term Mobile TV. Nevertheless, they all mean the same thing: television diffused through mobile platforms. At the most basic level, this television content is based on live broadcast emissions (Pull) and on Push-and-Store for a quasi-PVR (Personal Video Recorder) television consumption experience. When it comes to methods of delivery, we have satellite (DMB, GPRS), cellular operators (UMTS, CDMA) and terrestrial (DVB-H and WiFi) [3], [8]. As for the delivered television content, our review of the literature showed that programming grids are basically filled with the re-emission of recycled broadcast television programs. The same thing happens in Portugal, the country where our doctoral investigation will be carried out. We believe that consumers may not be satisfied with this situation. However, this is nothing new. Historically, for example, “the railway did not introduce movement, or the wheel or road (…) but it enlarged the scale of previous human functions, creating new kinds of work and leisure” [4]. We therefore believe it is safe to assume that an experimental phase is needed whenever a new medium is developed, and that mTV is in this phase [5]. It still has unsolved identity issues [6], [7]. Currently, the mTV market is defined through the following realities of re-used broadcast television [5]:

TV in your Pocket

This is purely and simply the rebroadcast of television programs that are first aired on conventional television programming grids [5]. Contemporary corporate mTV offerings show that this concept is characterized by the promise of an individualized, personal television experience, but without mTV-specific content [5]. Users may personalize their television experience, but its basis will always be conventional television. This kind of mTV rebroadcasting of regular linear television content is defined as “Simulcasting Linear TV” [7]. Also under this heading is the concept of “Repurposed TV”, where existing content is recycled for the mobile medium with minimal adaptation: it is basically the same content that is aired on regular TV grids, but programs are split up into smaller segments or cropped to better suit the smaller screens of mobile devices [9].

TV anytime, anywhere

This concept is based on releasing television viewers from the obligation to consume television in a specific place. It highlights consumers’ ability to control their medium to the extent that they may choose how, where, when, and what kind of TV content they consume [10].

TV on the Go

This concept promotes a “fast-food idea” of television, with the intention of emphasizing the differences between mobile and traditional TV viewing [5]. We agree with this perspective, because mobile devices are operated at “arm’s length” and continued viewing can cause eye discomfort and eyestrain [11]. We therefore also consider that content for this type of television must be short. This kind of content is mobile-specific: “a necessary final step in the evolution of mobile television” [9]. Shani Orgad considers that the mobile phone’s small screen, shorter usage duration and noisier usage environment should lead to a new visual grammar that will eventually be expressed through mobile-specific content [5].

Enhanced TV

As with interactive television, this perspective is what some authors define as an “out of the box” issue [12]. In simple terms, this proposal is based on the interactive possibilities that characterize television: it concerns the potential creation of innovative ways of including users and tailoring media content to satisfy individual needs [5], [12]. With regard to mTV specifically, there has been extensive discussion of its potential to provide a platform for user-generated content [5].

Having reviewed these concepts, we feel obliged to ask: “Does mTV enhance a new television experience?” Or is it possible to conclude that “Conventional television interaction and consumption habits are not enhancing a new television experience”? In an attempt to answer these questions, we can say that the specificity of mTV content is currently the main topic of discussion. Nevertheless, linear content remains essential to the present and future of mTV, because this kind of television is regarded in a very McLuhanian manner: as an extension of the classical media, or as a parallel media reality that exists side by side with classical media [4]. One thing is sure: plenty of issues are still unsolved, and therefore this reality is still unclear.

We believe that television diffused through a mobile phone should exploit the unique benefits of this device [13]. We therefore believe that consumers might not be satisfied with mTV’s current reality. Why? Mobile phone use is fragmented, and users’ media consumption desires are therefore individualized and divided across various fixed and mobile media platforms [14], [15], [18]. When it comes to applications and content, we believe that mobile phone users demand interactive, flexible, enhanced, personal, and context-aware media realities [7], and the same applies to mTV [9]. We can therefore define these aspects as possible mTV expectations. So, can redistributed mTV content satisfy these needs?

3. THEORETICAL PERSPECTIVE

When we think about the motivations that may underlie the potential desire for “Mobile TV” (mTV), we believe it is important to look at the issue from a theoretical point of view. We therefore approach this topic through the “Uses and Gratifications Theory” (UGT). We chose this theory because we propose a theoretical approach based on investigating an audience that is active in its media use, and the UGT perspective accounts for this [11]. It seeks to understand consumers’ motivations and concerns in the context of media use. We approached this theory through previous investigations that we believe can define the mobile phone’s current converged reality: Television [11], [19], [27], [26], [28], Internet [20], Digital Television [23], Internet and Computer Mediated Technology [22], [24] and Cellular Phone [29], [21]. However, we found no UGT study that investigates the “Mobile Television” (mTV) reality. This gap gives the investigation its originality and its scientific contribution, because we believe that we can expand this theory’s approach by including the mTV reality.

As a final note, it is important to state that the selected UGT-based studies all indicate that intangible issues (of an emotional nature) are, in fact, the most important elements of the expectations that consumers have of their media. This is fundamental to developing this doctoral study’s questionnaires for the selected samples. Applying the UGT framework will help this investigation understand “how” and “why” consumers use their cellular phones to watch television. This leads us to our investigation methodology, which is presented in the following part of this paper.

4. METHODOLOGY

This investigation will follow a fourfold approach, divided into the following phases:

4.1. Literature review

In this part our main intention is to identify the problem (mobile television end-user expectancy and acceptance) and explore it through various theoretical perspectives. Since the literature on mobile television (mTV) is scarce, we will draw on the media and social academic investigations that are considered fundamental. We shall start with McLuhan’s perspective, because he was the first theoretician to identify the concept of the contemporary “active audience”, implying that it desired constant connectivity and communication without concern for geographical boundaries [4]. Since we are focusing on the social apprehension of a wireless technology and its progressive social changes, we shall also draw on other social academic analyses. Given that we are evaluating consumers’ interaction with mTV, we selected Media Effects Theory, because we will study mTV’s effects on its consumers. However, as expressed earlier, we also intend to collect data regarding end-user expectancies and satisfaction levels towards mTV consumption, and the Uses and Gratifications Theory (UGT) suits this objective well. This sub-tradition of media effects theory focuses on investigating the motives behind consumers’ selection of a certain mass medium [11]. It also allows an experimental or quasi-experimental approach in which the evaluated data can be manipulated in order to discover motives and media selection patterns [11]. Since this investigation includes a laboratory phase, this perspective is necessary for apprehending consumers’ expectancies and satisfaction levels regarding the use of mTV. Besides this academic perspective, we intend to consult current academic journals, market reports and white papers, so that a multidisciplinary theoretical perspective is maintained.

4.2. Data analyses of previous mTV user studies

Since we are approaching an emergent technology in the corporate media scenario, we believe that it is important to analyze end-user studies conducted during mobile television trials. These investigations might help us apprehend the contemporary “Mobile Television” (mTV) reality from the market’s perspective. However, all of the trials reported in these studies are industry-driven events, and their results must therefore be analyzed with caution. Moreover, since we do not believe that these data alone can provide a complete picture of the mTV market, we also intend to conduct semi-structured interviews with “Mobile TV” channel/project directors and mTV distribution directors/project managers. This leads us to the following investigation moment.



4.3. mTV market expert session of interviews

The potential lack of professionally sourced data on mTV will be compensated with other sources of information, such as experts in the field of mTV. We believe that this approach is important for apprehending the mTV panorama. We therefore selected a panel composed of two types of experts: professionals from Portuguese television and TV production companies (RTP, SIC, TVI and Produções Fictícias) and professionals responsible for wireless network companies (TMN, Vodafone, Sapo – Sapo Mobile and Optimus). These experts were selected because, when it comes to mTV content production and progressive emissions, only the national television companies mentioned have ventured into this emergent market, while the wireless companies are responsible for supporting these mobile emissions. Besides this corporate reality, we might also include interviews with mTV experts who have demonstrated their expertise through academic and scientific publications in the fields that we are studying.

4.4. Laboratory evaluation of mTV interaction

We believe that the previous research moments will provide us with the necessary elements to develop two laboratory sessions with end-user samples. For these phases we shall select three samples of mTV users composed of teenagers, young adults and middle-aged people. We estimate that we may need at least 200 individuals. Through these heterogeneous samples we believe that a general apprehension of Portugal’s mTV reality is possible. In the first part of the laboratory sessions, we shall consider which variables (expectancies) may be defined as unique expectancies for mTV consumption. So, before the first moment of interaction with mTV, we will apply a questionnaire with closed questions designed to estimate expectancies. These evaluation elements will be derived from previous UGT investigations regarding mobile phones, television and the Internet, the technical and media elements that converge in current mobile phones and, progressively, in mTV. After this first evaluation moment, each user will watch a previously prepared session of current national mTV applications, with programs that reflect the current market offer in Portugal. The TV genres to be evaluated are news and entertainment programs that are broadcast through mTV emissions, as well as other content previously downloaded to the mobile phone that we will use in these sessions. After this interaction moment, we shall apply to each user a second questionnaire to collect data on the satisfaction levels associated with the previous expectancies. Through this approach, we believe we will outline mTV’s unique expectancy variables. In both of these evaluation moments we will apply a nine-point Likert questionnaire ranging from “Strongly Disagree” to “Strongly Agree”. Since we are dealing with emotional aspects, we will also apply an Osgood semantic differential scale in both evaluation moments to observe whether the end-user’s opinion shifts.

All data collected in these sessions will be statistically evaluated with the SPSS statistical software. Since we are still in an initial phase, we may also include a similar mTV end-user evaluation moment in an outdoor environment, in order to limit any bias introduced by the laboratory setting.
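The before/after design described above can be illustrated with a short sketch. It is only an illustration in Python, not the SPSS procedure planned for the study: the item wording, the five scores and the function name gratification_gap are hypothetical, and the paired t statistic is one common choice for comparing expectancy and satisfaction scores from the same participants, assumed here rather than taken from the proposal.

```python
from math import sqrt
from statistics import mean, stdev

# Nine-point scale: 1 = "Strongly Disagree", 9 = "Strongly Agree".
SCALE_MIN, SCALE_MAX = 1, 9


def gratification_gap(expected, obtained):
    """Compare paired scores for one questionnaire item.

    expected: scores given before the viewing session (expectancy).
    obtained: scores given by the same participants afterwards (satisfaction).
    Returns (mean_expected, mean_obtained, mean_gap, t_statistic).
    """
    if len(expected) != len(obtained) or len(expected) < 2:
        raise ValueError("need paired scores from at least two participants")
    for score in list(expected) + list(obtained):
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"score {score} is outside the 1 to 9 scale")
    diffs = [after - before for before, after in zip(expected, obtained)]
    mean_gap = mean(diffs)
    sd = stdev(diffs)
    # Paired t statistic; undefined when every participant shifts equally.
    t_stat = mean_gap / (sd / sqrt(len(diffs))) if sd > 0 else None
    return mean(expected), mean(obtained), mean_gap, t_stat


# Hypothetical scores for one invented item ("I can watch wherever I am").
before = [8, 7, 9, 6, 8]
after = [5, 6, 7, 6, 4]
exp_mean, obt_mean, gap, t = gratification_gap(before, after)
print(f"expected={exp_mean:.1f} obtained={obt_mean:.1f} gap={gap:.1f}")
```

A negative gap on an item would suggest that the current mTV offer falls short of what participants expected on that dimension; whether the shift is statistically meaningful would still have to be judged against the appropriate t distribution or with the tests the study finally adopts.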

5. CONCLUSION

As a final note, we could say that mTV promises new, engaging and customized television experiences. In our investigation we will attempt to explain how this reality will unfold in Portugal and to evaluate mTV’s usage and current reality from the end-user’s perspective. Through this approach we expect to characterize the Portuguese national mTV market from the perspective of end-user expectancies and to see whether the current offer is satisfactory, so that we can propose some changes to it. Another foreseeable conclusion is that mTV will be considered a parallel market to national broadcast TV. We also expect to understand whether mTV enhances the cell phone’s specifications or the medium’s characteristics. Through the selected end-user-based approach we believe we will apprehend the unique dimensions of the consumption of this type of TV in Portugal, thus contributing to the development of UGT. However, since this paper represents a doctoral investigation that is only beginning, these aspects have yet to be demonstrated through the scientific approach that we intend to develop in this study.

6. REFERENCES

Innis, H. 2008. The Bias of Communication, Toronto, University of Toronto Press. Norman, D. 2002. The Design of Everyday things, New York, Basic Books.

Knoche, H., McCarthy, J., & Sasse, M. 2005. Can small be beautiful?: assessing image resolution requirements for mobile TV. In Proceedings of the 13 th Annual ACM International conference on Multimedia. (New York 2005). Multimedia '05. ACM Press, New York, NY, 1-10. DOI= 10.1145/1101149.1101331.

McLuhan, M. 1995 Understanding Media: The Extensions of Man, Cambridge Mass USA, Cambridge Press.

Orgad, S. 2006. This box was made for walking. London School of Economics.,

http://europe.nokia.com/NOKIA_COM_1/Press/Press_Eve

nts/mobile_tv_report,_november_10,_2006/Mobil_TV_Re

port.pdf.

Carlsson, C., & Walden, P. 2007. Mobile TV-to live or die by content. In System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference. IEEExplore, Waikoloa, HI. Doi = 10.1109/HICSS.2007.382.

O'Hara, K., Mitchell, A., & Vorbau, A. 2007. Consuming video on mobile devices. In Proceedings of the SIGCHI conference on Human factors in computing systems. New York DOI = 10.1145/1240624.1240754.

Kumar, A., 2007. Mobile TV: DVB-H, DMB, 3G Systems and Rich Media Applications, Oxford, Focal Press.

Marcus, A., Roibás, C., Anxo 2010. Mobile TV: Customizing Content and Experience Vol. 1., London - United Kingdom, Springer.

Södergård, C. 2003. Mobile television-technology and user experiences. Report on the Mobile-TV project. VTT, Finland.

Katz, E., Blumler, J., & Gurevitch, M., 1973. Uses and gratifications research. Berverly Hills – California, SAGE Pub.

Gawlinski, M. 2003. Interactive television production. London:

Focal Press.

Ahonen, T. T. 2008. Mobile as the 7th of the Mass Media, London, Future Text Ltd.

Ling, R., 2004. The mobile connection: The cell phone's impact on society, Oslo, Morgan Kaufmann Pub.

Ling, R., 2008. New Tech, New Ties How Mobile Communication is Reshaping Social Cohesion, Cambridge – Massachusetts, MIT Press.

Pavlik, J. V. 2008. Media in the Digital Age, New York, Columbia University Press.

Personaz, J. J. I. 2006. Métodos Cuantitativos de Investigación en Comunicación, Barcelona, Editorial Bosch S.A.

Levinson, P., 2004. Cellphone: The story of the world's most mobile medium and how it has transformed everything!, New York, Palgrave.

Conway, J., & Rubin, A., 1991. Psychological predictors of television viewing motivation. Communication Research, 18(4), (Jul. 2010), 443.

Ko, H., 2002. A Structural Equation Model of the Uses and Gratifications Theory. Ritualized and Instrumental Internet Usage. Retrieved from https://listserv.cmich.edu/cgi-bin/wa.exe?A2=ind0209&L=aejmc&T=0&O=D&P=22182.

Leung, L., & Wei, R. 2000. More than just talk on the move: Uses and gratifications of the cellular phone. Journalism and Mass Communication Quarterly, 77(2), 308-320.

Lin, C. 2001. Audience attributes, media supplementation, and likely online service adoption. Mass Communication and Society, 4(1), 19-38. Retrieved from http://pdfserve.informaworld.com/225045_778384746_785315275.pdf.

Livaditi, J., Vassilopoulou, K., Lougos, C., & Chorianopoulos, K. 2003. Needs and gratifications for interactive TV implications for designers. In System Sciences, 2003. HICSS 2003. 36th Annual Hawaii International Conference. IEEExplore, Waikoloa, HI. Doi = http://doi.ieeecomputersociety.org/10.1109/HICSS.2003.1174237.

Papacharissi, Z., & Rubin, A. 2000. Predictors of Internet use. Journal of Broadcasting & Electronic Media, 44(2), 175-196. Retrieved from http://pdfserve.informaworld.com/918041_778384746_783685029.pdf.

Papacharissi, Z., & Zaks, A. 2006. Is broadband the future? An analysis of broadband technology potential and diffusion. Telecommunications Policy, 30(1), 64-75.

Rubin, A. 1983. Television uses and gratifications: The interactions of viewing patterns and motivations. Journal of Broadcasting & Electronic Media, 27(1), (Jun. 2008), 37-51.

Rubin, A., & Perse, E. 1987. Audience activity and television news gratifications. Communication Research, 14(1), 58.

Rubin, A., & Rubin, R. 1982. Older Persons' TV Viewing Patterns and Motivations. Communication Research, 9(2), (Mar. 2010), 287.

Chorianopoulos, K. 2008. Personalized and mobile digital TV applications. Multimedia Tools and Applications, 36(1), (Jun. 2008), 1-10.



Enhancing and Evaluating the User Experience of Interactive TV Systems and their Interaction Techniques

Michael M. Pirker

ICS-IRIT 118, Route de Narbonne 31062 Toulouse, France 0033 (0) 561 55 77 07

Michael.Pirker@irit.fr

ABSTRACT

This paper describes the focus of my PhD thesis on how to enhance and evaluate the User Experience (UX) of interaction technologies that are applied in interactive Television (iTV) systems. Interaction technologies for iTV systems differ from standard desktop interaction; my thesis will thus describe the following aspects: (a) the usage context (how iTV usage, e.g. in the living room, differs from other usage situations), (b) the set of currently available methods for evaluating UX, and (c) how to enhance the UX of interaction technologies for iTV systems. Given that UX evaluation methods, and especially methods that support UX-oriented development, are rare, the following research objectives were defined: to understand (1) how users’ UX concepts are related to interaction technologies that are used for iTV systems and how an interaction technology contributes to the overall UX when interacting with an iTV system; (2) how usability and user experience are related in that specific domain (e.g. does the enhanced UX of a gesture-based interaction really contribute to a positive UX in the long term, or is usability the key factor for long-term use?); (3) how to inform the design and development process to improve the UX of the interaction technique and the system (before a product is available); and finally (4) how the consumption of iTV content on a variety of devices (cross-device usage) will change the overall UX. The main contribution of this PhD thesis lies in the developed evaluation methods, which should make it possible to better understand and evaluate the UX of iTV services and their respective interaction technologies in the future.

Categories and Subject Descriptors

H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms

Measurement, Human Factors

Keywords

User Experience Evaluation, iTV, interaction techniques



1. INTRODUCTION

The living room, sometimes called the “campfire” of the new age, is still one of the most central and important areas in the home. It is a place where people can relax, but also gather and enjoy leisure activities including entertainment and games. Interactive systems used in the living room receive only limited attention from the HCI research community: while there is a lot of work on how to improve future generations of games, including UX measurement 0, as well as work on social media, personalization, recommendation, and communities, less work is dedicated to understanding the user experience of entertainment applications including interactive TV, especially when it comes to the evaluation of interaction techniques for the living room. Interactive systems that are mainly used in the living room are currently subject to dramatic change: the way TV and other media are consumed is changing due to new forms of interactive TV services, including IPTV, and a new generation of TVs. The user is no longer limited to watching a certain number of TV channels; new TVs and set-top boxes enable the user to, e.g., access the Internet on the TV, rent Video on Demand (VOD) movies, play games, access weather and traffic information, watch video clips, communicate with others and use “apps”. The way the TV and its services are controlled is also changing: TV and entertainment services found in today’s IPTV offers are no longer controlled only with a standard remote control, but also with the mobile phone, with game-oriented input devices allowing motion control (e.g. Nintendo Wii, Free Box 6.0), and with gesture recognition (Microsoft Xbox 360 Kinect).

When measuring the UX of these new forms of interaction techniques, the following problems occur: it is unclear to what extent the user experience of an interaction technique in the living room can be investigated in the same way as in other domains (e.g. for a mobile phone). The same holds true for the comparison with user experience evaluation of games: are the same factors important for entertainment activities in the living room? Games differ from standard interactive TV applications, as they are not task-oriented and typically focus primarily on the fun aspect, which is not the case for, e.g., a VOD service. But is the User Experience in terms of interaction with an interactive TV really comparable to a game? Is it comparable to a mobile phone interaction? Or will we simply fail to understand the User Experience in the living room when applying UX evaluation methods from other areas? We thus see the need to develop specialized methods that are appropriate for the evaluation of interaction techniques for iTV in the living room context. These methods can subsequently help to improve the UX of interacting with iTV already in the design process and early development stages.

2. STATE OF THE ART

Given our research goals and objectives, our research focuses on UX and its evaluation, which will be briefly discussed in this section. A lot of effort has recently been put in by researchers and practitioners alike to find a clearer definition of UX and its evaluation methods 0, but the HCI community still has no unified definition of UX. An ISO standard defining UX exists (ISO 9241-210), but leaves a lot of room for interpretation: “A person’s perceptions and responses that result from the use and/or anticipated use of a product, system or service”. There are several reasons why a more refined definition of UX is difficult to reach. UX is associated with a broad range of “fuzzy and dynamic concepts” 0 with a multitude of meanings, ranging from “being a synonym for traditional usability” to beauty, hedonic, affective or experiential aspects of technology usage. Additionally, the term UX is also influenced by several concepts from other areas, like fun, playability, or flow. Within this multitude of concepts, it has been pointed out 0 that the inclusion and exclusion of particular variables seem arbitrary, depending on the author’s background and interest.

For our research on understanding UX in the living room, we compiled a working definition of UX, based on definitions by Hassenzahl & Tractinsky 0 and Desmet & Hekkert 0:

“The user experience when interacting with an iTV system in the specific living room context is mainly influenced by: the subjective perception of the quality of experience that is elicited by the interaction of a user with the interactive TV system, which may change dynamically depending on the situational context of usage and time. Factors influencing the quality of experience include feelings and emotions that are elicited (emotional experience), the degree to which our senses are gratified by the system (aesthetic experience), meanings and values that are attached to the system, the perception of system characteristics like utility, purpose and usability, and how well these factors fit the current situational and temporal context.”

In the current literature, UX is described as being dynamic, context-dependent, and subjective (individual) 0. It highlights non-utilitarian aspects of interactions, shifting the focus to user affect, sensation, and the meaning as well as value of such interactions in everyday life 0. More generally, UX focuses on the interaction between a person and a product or service, and is likely to change over time and with an embedding context 0,0.

A broad variety of UX evaluation methods is available today. To measure the user experience beyond the instrumental, task-based approach, Hassenzahl introduced the AttrakDiff¹ questionnaire. Approaches focusing on the evaluation of emotion and affect include those that evaluate the emotional state of the user with questionnaires, while other evaluation approaches include physiological measurements or evaluation of valence and arousal. To evaluate situational or temporal experiences, some approaches in mobile UX exist, using conceptual-analytical research and data gathering techniques 0. For prototypes, usability evaluation methods can be enhanced by including experiential aspects in the evaluations, e.g. experience sampling in long-term field trials 0. To get a clear picture of how UX changes over time, it has also been proposed 0 to measure various aspects of UX both in different contexts and at different points of time.

¹ See also http://www.attrakdiff.de/AttrakDiff/Publikationen/

For the development and application of UX evaluation methods, it is important to start from a clear definition of UX 0 with an appropriate underlying model 0. The formal definition of UX issued by ISO suggests that UX can be measured in a way similar to the behavioral and attitudinal metrics of usability (i.e. users’ performance and satisfaction) 0. As a result of the still ongoing research to define the scope of UX, current methods, techniques and tools used to evaluate UX are most of the time taken from the large pool of traditional usability methods 0, thus established techniques such as questionnaire, interview, and think-aloud remain important for capturing self-reported data 0. For the development of a new UX evaluation approach it would be important to understand the relationship of UX to other factors which are important for the development of interactive systems. Especially usability seems to be connected to user experience and is likely to be a sub-factor within UX 0, which also matches our position (cf. working definition above), while others 0 just see it as a source of product experience.

Based on our research goals and objectives, other research topics will be investigated within the thesis but, due to page limit constraints, are not discussed further in this paper. These include: the evaluation of interaction techniques; research about influences of the usage context, e.g. for cross-device usage; design and development methods that support UX models; and models that explain the interrelation of usability and UX.

3. RESEARCH PROBLEM

The goal of the presented research is to develop a set of methods to better capture the UX of interaction technologies, as well as entertainment services and systems in the living room, focusing especially on interactive Television (iTV). The living room itself constitutes a special usage context and serves many specific usage situations during leisure time. These include various aspects of entertainment and social activities, where different usage situations arise with different devices, some of them passive and laid-back, others requiring active usage and participation. Context and usage situation heavily influence the user experience when interacting with an iTV system: while a user immersed in a movie likely wants to change the volume simply by pressing a button on the remote control without looking, other activities, especially games-related ones, may be enhanced by performing gestures to interact with the user interface (as can be observed with recent developments for games with gesture input, e.g. Microsoft Kinect). The major problem is that currently available UX evaluation methods do not support various aspects that are of interest in our research, e.g. factors related to the properties of the remote control or the interaction technique itself.

Evaluation of UX in games showed that user experience can be quite independent of usability. While games have to provide a minimum degree of usability (e.g. the possibility to control the game), usability is just one sub-factor among others within UX (e.g. presence, involvement, and flow 0) that seem to shape the UX more intensely and gain a lot of importance once a certain level of usability is given. UX evaluation in games today includes a broad variety of factors, playability 0 being one among others. In the context of the living room, it can be assumed that the factors that matter for and influence media usage and UX differ from those in a work environment: e.g. voluntariness or mood may be named as a major difference between work and leisure.

Another important aspect that has to be kept in mind, especially when focusing on the evaluation of interaction technologies in the living room, is the differentiation between the content that is delivered by means of a certain device and the usage experience of the device itself. Existing evaluation methods tend to focus either on a certain aspect of UX or still on basic usability targets 0. Combined methods (e.g. AttrakDiff) exist but seem to lack some aspects of importance for our focus area, the living room and iTV, like haptic properties of the remote control that could influence UX. Another question is whether UX evaluation should be included in the usability evaluation or carried out separately and, in either case, when and how.

Thus, the research focus should be on the identification, analysis and evaluation of factors that are important and contributing to UX in this specific context of use, the living room, if possible at the real location of usage and within a normal usage situation, keeping in mind and being adaptable for recent and future technological changes as well as changes in usage situations.

4. RESEARCH GOALS

The research objectives are to understand: (1) how users’ UX concepts are related to interaction technologies that are used for iTV systems and how an interaction technology contributes to the overall UX when interacting with an iTV system; (2) how usability and user experience are related in that specific domain (e.g. does the enhanced UX of a gesture-based interaction really contribute to a positive UX in the long term, or is usability the key factor for long-term use?); (3) how to inform the design and development process to improve the UX of the interaction technique and the system (before a product is available); and finally (4) how the consumption of iTV content on a variety of devices (cross-device usage) will change the overall UX.

This leads to the research goal, which is to develop a set of methods to better capture the UX of interaction technologies, services and systems in the living room, focusing especially on iTV. The methods should fit the living room context and properly incorporate factors that are important for evaluating the UX of media usage and interaction technologies in this context from a user’s perspective. They should make it possible to evaluate the UX of a system and its accompanying interaction technologies quickly and easily, both during product development and for existing products. The set of methods developed within this PhD thesis aims to be general enough to be applicable to various devices and interaction technologies, taking into account recent and future technological changes, while at the same time being focused enough to still properly grasp the UX of media and interaction technology in the living room. This will be approached by carefully choosing and addressing UX factors that seem to have high importance and impact in this context of usage, identified within current UX literature as well as during studies focusing on this issue. The methods thus do not claim to provide a comprehensive evaluation of the multifaceted construct of UX, but rather try to provide valuable insights for our small area of research.

5. METHODOLOGY

In order to identify factors that are influencing UX, a literature review has been conducted as a first step to get an overview on concepts, evaluation methods and related work, followed by research conducted to identify factors from a user’s perspective.

5.1. Previous Work



In previously conducted studies, we compared field usability studies to lab usability studies, evaluating the same system in both conditions 0. Within the field study, we addressed the topic of user needs during the pre-interview in order to identify important aspects from a user’s perspective. During the study, participants stated that they wanted the system to be easy to handle, user-friendly, and usable without an operating manual. Other user needs stated were individualization and safety issues, as well as the reduction of devices via an all-in-one device. UX was evaluated in this trial using the AttrakDiff questionnaire. Concerning the evaluation of interaction technologies, we also conducted a lab study, comparing touch-based to button-based interaction (using the same remote control shape and functionality) and investigating the relation of user experience and usability 0. iTV usability might still be an important factor in the early usage phases of the system (allowing access to content), but user experience is becoming more and more important. When investigating the relation between usability and user experience, it was noted that for the compared product, a remote control, good usability values do not necessarily imply a better UX, and low usability values can at the same time go along with high UX ratings. Given the high rating of hedonic quality and the good assessment of the touch-based interaction technology, it is concluded that product design as well as visual appeal influence the users’ willingness to use a product.

5.2. Studies to Identify Major UX Factors

In order to address the research goals and get a better understanding of which UX concepts and factors are important for the evaluation of an iTV system in the home, two ethnographically oriented studies were conducted in 2010. Within these studies, the question of which factors contribute to a positive UX was addressed in order to identify factors that are really important from a user’s perspective and in the real usage context in the home. The studies were conducted in two different countries with a total of 69 participating households and 179 participants (149 adults). Besides other topics that are beyond the focus of this paper, factors influencing the UX of media usage in the home, and especially in the living room, were addressed and led to first insights for the further development of the UX evaluation method. Aesthetic experience (including visual and haptic experience), utility, purpose, the elicitation of emotions, functionality and usability were the UX factors that were stated most often and are relevant for our context of research. Other UX factors that were not named directly but observed during analysis were the need for stimulation and identification, as well as the contextual factors time, place/situation, social influences and whether a device is perceived as personal or not. Others, e.g. the need for diversion, were omitted because they relate to content rather than to interaction technology. The need for relatedness, or rather its fulfillment, was only observed for technologies that offer communication features and may thus be neglected for our research focus; nevertheless it may gain importance when new services offering communication features reach a mass audience in the iTV sector.
Additionally, based on the identified UX factors, our conclusion is that media content does not interfere much with the evaluation of the iTV system, services and interaction techniques, especially when combining expert-oriented and user-oriented evaluation, and thus might be neglected, as the influences of the mediated content on the UX are beyond our research focus.

5.3. Current State and Future Work

At the moment, the findings gathered during the ethnographic studies 0 0, combined with those identified in the literature and in previous studies, are being used to develop a UX questionnaire for our domain as a first step; the questionnaire is currently undergoing first evaluations and an examination of its validity within user tests. It focuses on the UX evaluation of interaction technologies in the living room, and should make it possible to investigate and measure UX factors already in early design phases. The preliminary version of the questionnaire and the underlying framework will be presented at the doctoral consortium. As described previously in the methodology section, first steps have already been taken within the thesis: a first version of the methodology has been developed and is in the course of being evaluated. At this point, the thesis has progressed far enough to present first results and receive feedback from the community working within the same or related areas, to further improve the research within the remaining one to one and a half years of the PhD thesis. The doctoral consortium should thus serve as a forum to provide valuable feedback for the further development of the UX methods. Especially interesting would be feedback about the methodology chosen, including the question of whether all important aspects of UX can be adequately addressed with a questionnaire, and about the viability of and requirements for expert evaluation. Additionally, feedback about the UX factors identified and how to incorporate other UX factors, how methods could be further combined, what benefits they could offer in the development process and which insights the methods could provide would be interesting topics for further discussion. The conference community seems to offer an ideal opportunity to further discuss the potential of the proposed approach and methodology, possible drawbacks and areas where further investigation might be necessary.
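As a rough illustration of how responses to such a questionnaire might be aggregated per UX factor, the following minimal Python sketch averages one participant's item ratings for each factor. The item names, the grouping of items into factors and the 7-point scale are assumptions made purely for illustration; they are not taken from the questionnaire under development.

```python
from statistics import mean

# Hypothetical mapping of questionnaire items to UX factors.
# Factor names echo factors mentioned in the text; the items are invented.
FACTOR_ITEMS = {
    "aesthetic_experience": ["visual_appeal", "haptic_feel"],
    "usability": ["easy_to_handle", "no_manual_needed"],
    "emotion": ["enjoyable"],
}


def factor_scores(responses, factor_items=FACTOR_ITEMS, scale=(1, 7)):
    """Average one participant's item ratings per UX factor.

    responses: dict mapping item name -> rating on the assumed scale.
    Factors without any answered item are reported as None.
    """
    low, high = scale
    scores = {}
    for factor, items in factor_items.items():
        # Keep only the items this participant actually answered.
        ratings = [responses[item] for item in items if item in responses]
        if any(not low <= r <= high for r in ratings):
            raise ValueError(f"rating outside {low}-{high} for factor {factor!r}")
        scores[factor] = mean(ratings) if ratings else None
    return scores


if __name__ == "__main__":
    # Made-up example ratings, not study data.
    example = {"visual_appeal": 6, "haptic_feel": 4,
               "easy_to_handle": 7, "no_manual_needed": 5}
    print(factor_scores(example))
```

Averaging per factor is only one possible scoring choice; a validated instrument might weight items differently or report them individually.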

The next steps of the thesis will include the development of expert guidelines that can be used in the tradition of heuristic evaluation to understand if and to what extent future systems support major UX factors. Here the application and adaption of evaluation methods taken from structural and functional playability 0 seem reasonable, as they are already addressing functional (i.e. more usability-related) as well as structural (i.e. more aesthetic-related) concepts which have a substantial interconnection to current UX concepts and models. These guidelines should offer valuable benefits for the fast-paced industrial product development cycle, where other means of UX evaluation may not be appropriate due to project time constraints or time and manpower needed to carry out the evaluation.

6. CONTRIBUTION AND CONCLUSION

To sum up: for the evaluation of the UX of interactive TV systems and the respective interaction technologies, factors from other areas like gaming and mobile usage, as well as product-related factors, are important. Based on the factors that we identified in several studies and the literature, a set of methods is being developed that makes it possible to investigate and measure these factors already in early design phases. The current approach is to use method triangulation with a questionnaire as a first step, including evaluation of the user interface, the interaction technique and the orthogonality between interaction and user interface, which will be followed by guidelines for expert evaluation in the future. The main contribution of this PhD thesis lies in the proposed framework and evaluation methods for better understanding and evaluating the UX of interaction technologies for iTV and its services in a living room setting. The research conducted to identify the UX factors in this setting will contribute to a better understanding of which aspects are really important in this context, which influencing factors might change the UX and which factors should be included in a UX evaluation method for interaction technologies in the living room. The UX evaluation methods will offer the possibility to quickly and easily evaluate UX throughout the whole product design and development cycle.

7. REFERENCES

Bernhaupt, R. (Ed.) 2010. Evaluating User Experience in Games: Concepts and Methods. London: Springer.

Bernhaupt, R., Pirker, M., Weiss, A., Wilfinger, D., Tscheligi, M. 2011. Security, Privacy, and Personalization: Informing Next Generation Interaction Concepts for Interactive TV Systems. ACM Comp. in Entertainm. In press.

Desmet, P. M. A. & Hekkert, P. 2007. Framework of product experience. Intern. Journal of Design, 1(1), 57-66.

Hassenzahl, M. and Tractinsky, N. 2006. User Experience - a research agenda. In: Behavior & Information Technology, 25(2), pp. 91-97.

Hassenzahl, M. and Roto, V. 2007. Being and doing: A perspective on User Experience and its measurement. Interfaces, 72, 10-12.

Järvinen, A., Heliö, S. and Mäyrä, F. 2002. Communication and Community in Digital Entertainment Services. Online http://tampub.uta.fi/tup/951-44-5432-4.pdf

Law, E.L.-C., Roto, V., Hassenzahl, M., Vermeeren, A., Kort, J. 2009. Understanding, scoping and defining user experience: a survey approach. In: Proc. CHI09, 719-728.

Law, E.L.-C. and Van Schaik, P. 2010. Modelling user experience - An agenda for research and practice. Interacting with Computers, 22(5), pp. 313-322.

Pirker, M., Bernhaupt, R. and Mirlacher, T. 2010. Investigating usability and user experience as possible entry barriers for touch interaction in the living room. In Proc. EuroITV 2010. ACM, New York. 145-154.

Pirker, M. and Bernhaupt, R. 2011. Measuring User Experience in the Living Room: Results from an Ethnographically Oriented Field Study Indicating Major Evaluation Factors. In Proc. EuroITV 2011. Accepted.

Roto, V., Ketola, P. & Huotari, S. 2008. User Experience Evaluation in Nokia, in CHI'08 Workshops, 3961-3964.

Takatalo, J., Häkkinen, J., Kaistinen, J. and Nyman, G. 2010. Presence, Involvement, and Flow in Digital Games. In Bernhaupt, R. (Ed.) 2010. Evaluating User Experience in Games: Concepts and Methods. London: Springer, p. 23-46.

Vermeeren, A., Law, E.L.-C., Roto, V., Obrist, M., Hoonhout, J. and Väänänen-Vainio-Mattila, K. 2010. User experience evaluation methods: current state and development needs. In Proc. NordiCHI 2010, ACM, 521-530.

Wilfinger, D., Pirker, M., Bernhaupt, R., and Tscheligi, M. 2009. Evaluating and investigating an iTV interaction concept in the field. In Proc. EuroITV '09, 175-178. ACM.



Subjective Quality Assessment of Free Viewpoint Video Objects

Sara Kepplinger

Institute for Media Technology Ilmenau University of Technology 98693 Ilmenau, Germany 0049 (0) 3677 69 2671

Sara.Kepplinger@tu-ilmenau.de

ABSTRACT

This paper presents an overview of the intended contribution to quality assessment of free viewpoint video representations in the video communication use case within the author’s PhD proposal. This proposal will analyze opportunities and obstacles for the use of free viewpoint video objects within video communication systems, focussing on subjective quality of experience. Quality estimation of emerging free viewpoint video object technology in video communication has not been covered yet and adequate approaches are missing. The challenges are the definition of quality influencing factors, the formulation of a measure, and the linking of quality evaluation with technical realization. The paper outlines the theoretical background and intended work. A short description of the related project Skalalgo3d, which offers a useful application framework for the intended work, is included. Preliminary results outlined here consist of a tentative research framework and the evaluations conducted so far.

Categories and Subject Descriptors

H.5.1 [Multimedia Information Systems]: Evaluation/methodology

General Terms

Algorithms, Measurement, Design, Experimentation, Human Factors

Keywords

Free viewpoint video, video communication, methodology, quality of experience

1. INTRODUCTION

Free viewpoint video applications enable the user to navigate interactively and freely within a visual real world scene representation. Applications, like free viewpoint choice on DVD, or similar approaches on TV or online, gain more and more attention in the field of interactive media. Free viewpoint video objects (or 3DVOB) usage within the context of video communication may offer sociability and communication support. This can be achieved by technical possibilities to over-

47 


come the obstacles of absent eye contact, or freedom of choice regarding the viewing angle and distance to the dialog partner, for example. These are activities which are possible and usual in real face-to-face conversations. There are different approaches to realize this way of representation using multiple views of the recorded scenes. This complex processing chain can be regenerated in different ways of acquisition, processing, scene representation, coding, transmission, and presentation. This paper is describing the planned efforts within the PhD proposal in order to pay more attention to the user’s perception of these new visual representations allowing interactivity. One goal is to define an extended model or an absolute measure for overall quality including subjective quality assessment. Therefore, the correlation between the used algorithm(s) and achieved quality will be considered. The opportunity of this approach is to gain further insights which may be useful for system adaptivity and processing scalability. The challenges of this scheme are mainly (still) open questions about novel algorithms for image analysis and synthesis on one hand, and the development of evaluation and measurement methods of visual quality on the other hand. This emergent field of research is influenced by several different approaches in both: image processing (e.g. [13], [16]), and the inclusion of subjective quality assessment for overall quality estimation (e.g. [17], [12]). In the following, the most relevant work for the author’s PhD proposal will be outlined, starting with a short introduction into the technical background. This proposal focuses on the quality assessment within the described technical context and use case. Two main questions are being addressed: How to include the subjective quality estimation by the user? How to identify the most relevant quality influencing factors in order to provide an extended quality model supporting technical optimization? 
The remainder of the paper is organized as follows. Section 2 states the problem addressed in this research project by explaining the theoretical starting point and the intended goals. Section 3 gives a general description of the project Skalalgo3d, the approaches chosen within the project, and related work. Section 4 describes the planned methodological approach. First evaluation steps and preliminary results of previous research are outlined briefly in section 5. Section 6 concludes with a discussion leading to future work.

2. STATEMENT OF THE PROBLEM

The theoretical starting point is published research concerning the definition of a quality measure for free viewpoint video objects which was developed at the Ilmenau University of Technology [11]. This measure includes definitions of influencing quality parameters as well as measurable characteristics based on objectively quantifiable errors. The description of the measure, called 3DVQM, clearly states that it is open to extension by subjective quality estimation and by the definition and evaluation of quality influencing factors that are yet to be identified [5]. Initial efforts towards extending the measurements by subjective quality of experience were already made using synthetic free viewpoint video objects [3]. However, up to now, subjective assessment of natural free viewpoint video objects and the resulting user experience have received only little attention, and more effort is needed to include users early [11] (see also section 3, Related Work). A video communication use case serves as a framework regarding eye-contact and other communication based factors. Based on this, and on the further technical development of the processing steps, the aim is to identify further quality influencing factors by means of subjective quality assessment. This will lead to the definition of terms and quantifiers and to proposed patterns for applying the results to prospective technical developments. From the author’s experience, literature analysis, and work within the project, a number of questions arose for the PhD proposal:

What kind of methodology accounts for reliable subjective quality assessment of free viewpoint video objects in the particular use case of video communication?

Which further factors influencing free viewpoint video object quality can be identified?

To what extent do factors identified by means of subjective quality assessment influence the overall quality of experience?

How can the identified factors benefit prospective technical development and processing algorithms of free viewpoint video objects?

Within this interdisciplinary approach, these questions mainly address methodology and practical realization in the area of human computer interaction, as well as the intended impact on processing development. The novelties the PhD intends to bring about are definitions of (further) quality influencing factors, a way of linking quality evaluation with the technical realization of free viewpoint videos, and, as a result, the formulation of an adequate measure.

3. RELATED WORK

The work related to the PhD proposal covers three main topics. First, there are evaluation approaches with similar goals. Second, there is the technical realization of free viewpoint video objects in general and their usage for eye contact support in video communication. Before these, the project Skalalgo3d is described in this section, as the PhD proposal arose within the framework of this project.

3.1. Skalalgo3d

The project Skalalgo3d (Scalable algorithms for 3D video objects under consideration of subjective quality factors) intends to improve free viewpoint video objects and eye contact support used within the context of video communication. The project work of Skalalgo3d is divided into two general working areas: the technical realization of free viewpoint video objects and the identification of subjective quality factors. This concerns optimum processing as well as high-quality display under different conditions. The project is funded by the German Research Foundation (DFG).

3.2. Technical Realization of 3DVOB

In general, the procedure of 3DVOB generation starts with the acquisition of a time-variable, three-dimensional object. The methods of the reconstruction processes differ in principle [5]. They can be model based, based on disparity analysis, or a combination of both. The differences are due to different usage of interpolation, warping, morphing and the recording of several different camera views. The technical development within the project Skalalgo3d is based on the current processing chain shown in Figure 1. It uses MATLAB and the project’s internal ReVOGS (Realistic Video Object Generation System).

A person is recorded by at least two ordinary webcams. This is followed by the processing of first representations of the recorded scene, which may include adequate pre-processing such as colour correction, keying, and calibration. From these, second representations are generated by rectification and analysis for accurate disparity determination.

[Figure 1: diagram of the processing chain. Recoverable labels: Acquisition; Calibration (internal and/or external camera parameters); Rectification; Color Correction; Keying (manually with Combustion, later: automatic algorithm); Analysis (disparity determination); Synthesis. Tools named in the diagram: MATLAB and ReVOGS.]

Figure 1. Current status of 3DVOB generation in Skalalgo3d

After this, the view synthesis leads to the intended 3DVOB, provided by different and new views.

Different approaches are available for the view synthesis, the disparity analysis and refinement, as well as for the usage of 3DVOB for eye-contact support. They are outlined in the following sub-sections.

3.2.1. Different disparity and synthesis methods

In developing the most adequate algorithms for creating a high-quality 3DVOB representation, different approaches to disparity analysis and view synthesis are considered. These approaches differ in their cost-benefit ratio.

Table 1. Summary of used disparity and synthesis methods

Processing step      | Different methods used
---------------------+-----------------------------------------------
View synthesis       | Linear interpolation
                     | Linear interpolation plus median filtering
Disparity analysis   | Windowed NCC cost measure
Disparity refinement | With / without hole filling after cross check
                     | With / without temporal cleaning

Table 1 summarizes the methods used so far. The view synthesis is done either by interpolating between neighbouring pixels only, or with an additional classical filter (median filtering, see Table 1) in order to reduce the so-called “salt-and-pepper noise”, a visual disturbance. The disparity analysis undertaken for the test items is realized by the classical usage of a cost based measure. Within the refinement, the variants differ in the usage of a hole filling filter or temporal cleaning. These different approaches may result in differently perceived quality of the representation. A more detailed description of disparity analysis in general is given in [13].
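The windowed NCC cost measure and the cross check named in Table 1 can be illustrated with a minimal Python sketch. It is not the MATLAB or ReVOGS implementation used in Skalalgo3d: the function names, the window radius, and the tolerance are assumptions chosen for illustration, and rectified grayscale views stored as lists of rows are assumed. Hole filling and temporal cleaning are not covered.

```python
import math

def window(img, y, x, r):
    """Flatten the (2r+1) x (2r+1) window centred on pixel (y, x)."""
    return [img[j][i] for j in range(y - r, y + r + 1)
                      for i in range(x - r, x + r + 1)]

def ncc(a, b, eps=1e-12):
    """Normalized cross-correlation of two equally long value lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) *
                    sum((q - mb) ** 2 for q in b))
    return num / (den + eps)

def ncc_disparity(ref, other, max_disp, r, sign):
    """Per-pixel disparity that maximizes the windowed NCC score.

    sign = -1: ref is the left view, candidates lie at x - d in the right view.
    sign = +1: ref is the right view, candidates lie at x + d in the left view.
    Pixels whose window would leave the image keep the value -1.
    """
    h, w = len(ref), len(ref[0])
    disp = [[-1] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = window(ref, y, x, r)
            best_score, best_d = -math.inf, -1
            for d in range(max_disp + 1):
                xo = x + sign * d
                if xo - r < 0 or xo + r >= w:
                    continue  # candidate window outside the image
                score = ncc(patch, window(other, y, xo, r))
                if score > best_score:
                    best_score, best_d = score, d
            disp[y][x] = best_d
    return disp

def cross_check(disp_left, disp_right, tol=1):
    """Mark left disparities as holes (-1) if the right map disagrees."""
    out = [row[:] for row in disp_left]
    for y, row in enumerate(disp_left):
        for x, d in enumerate(row):
            if d < 0:
                continue
            xr = x - d  # matching column in the right view
            if xr < 0 or disp_right[y][xr] < 0 or abs(disp_right[y][xr] - d) > tol:
                out[y][x] = -1
    return out
```

Pixels invalidated by the cross check are the holes that a subsequent hole filling step would have to close, which is why the refinement variants can lead to visibly different results.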

3.2.2. Support of eye contact in video communication

The problem of usual video communication systems is the impossibility of eye-contact.

Figure 2. Problem of eye-contact in video communication

The user either has to look at the display to get information or to look into the camera in order to simulate eye-gaze (see Figure 2). Eye-contact is seen as a critical factor in the fields of communication, psychology, and sociology. In 1976, Argyle and Cook [1] analyzed the role of gaze and mutual gaze in conversations and communication. One possibility offered by free viewpoint video objects is to support eye-contact in video communication on computers, televisions, or mobile devices. This can be realized by eye-adjustment or by using the so-called Wollaston illusion, i.e. adjusting the displayed person’s position without manipulating the eyes. Skalalgo3d provides this support through virtual camera positioning, an approach described to some extent in [8]. Several approaches to eye-contact support have already been published. In [10], virtual view image synthesis for eye-contact in a TV conversation system is described. In [15], the effects of gaze direction and depth on perceived eye contact and perceived gaze direction are compared between 2D and 3D display conditions. In [2], the role of eye gaze in avatar mediated conversational interfaces is analysed. One possible technical realization of eye contact via a camera/display system for videophone applications is described in [7].

3.3. Existing Evaluation Methods

Objective quality measures compute metrics representing comparable reference values, mostly focused on technical feasibility. Several defined measures, parameters, and assessment methods are available in order to rate general video quality objectively [17]. Subjective quality measurements intend to include the users and their opinion. This is expressed, for example, via judgment (e.g. yes/no) or adjustment (e.g. the user changes influencing factors and chooses the preferred outcome), resulting in a measure (e.g. mean opinion score) representing these judgments. However, developments, especially in the field of 3D technology, call for the inclusion of more sophisticated subjective measures in order to adequately consider human perception, which may differ from the objective rating [6]. In the context of 3DVOB quality assessment, preliminarily defined subjective quality factors derive, for example, from occlusion, distortion, and shape, as outlined in [11]. However, the identification of influences and a tighter definition of their extent call for further exploration. This is the intention of the author’s PhD proposal. Researchers in related areas (mainly associated with user interface design) have already worked on the measurement of user experience and user acceptance, also in a pre-prototype stage of product development (e.g. [9]).

Activities are being undertaken to clarify the different efforts made to understand the quality of experience of new technologies, as summarized in [4]. Research on free viewpoint video object technology has up to now mainly regarded technical feasibility. Hence, user inclusion has not attracted much attention in this particular emergent field of research. Specific approaches are available concerning the evaluation of video quality in different usage contexts by means of subjective (e.g. [5]) as well as objective measurement (e.g. [6]). However, the quality estimation of emerging free viewpoint video object technology in video communication has not been covered yet [14].
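As a small illustration of how individual judgments are condensed into a measure such as the mean opinion score, the following Python sketch computes the score together with a 95% confidence half-width. The function name and the five-point example ratings are hypothetical, and the interval uses the normal approximation (1.96 times the standard error), which is an assumption that becomes rough for small samples.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, half-width of the 95% confidence interval).

    ratings: numeric opinion scores of one test item, one value per observer.
    The half-width uses the normal approximation 1.96 * s / sqrt(n).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    mos = statistics.mean(ratings)
    s = statistics.stdev(ratings)  # sample standard deviation
    return mos, 1.96 * s / math.sqrt(n)

# Hypothetical ratings of one test item on a five-point scale.
mos, half_width = mean_opinion_score([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Such a single score summarizes the judgments but does not reveal which factors caused them, which is the gap the proposed identification of quality influencing factors aims at.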

4. METHODOLOGY

First and foremost, the goal of the PhD effort is to identify factors of subjective quality experience (e.g. disturbing fringes, recognized holes, missing eye contact) and their respective extent of impact, formulated as a measure. To achieve this, an adequate method needs to be defined. As described in section 3.3, several methodological approaches concentrate on objective measurements (e.g. concerning system processes) in early system development phases and pay little attention to subjective factors. As a consequence, the first step within the proposed evaluation framework is an explorative approach in order to eliminate non-critical factors and to define a range of applicable evaluation methods. In a second step, the method specified by then is applied and verified for collecting data about subjective factors influencing quality. The final step is the formulation of the most adequate methodological approach providing a quality measure. This measure allows a mathematical formulation of quality influencing factors derived from the users’ perception and may therefore be integrated into the technical processing chain (e.g. in the form of a perceptual coder or something similar).

5. PRELIMINARY RESULTS

As a first approach to the described topic, several methods were applied in 2009 in order to gain more information. Expert interviews, focus groups, and online questionnaires were conducted to collect information on possible prior experience and users’ ideas about possible free viewpoint video object usage.

In 2010, a methodological framework was established in cooperation with the Institute of Psychology at the University of Salzburg, Austria, through the systematic evaluation of pre-produced test items. The goal was to detect to what extent the resulting quality of different processing steps was acceptable and to examine subjective factors influencing the experienced quality. This included the experience of eye-contact and the measurement of possible influences of several characteristics (e.g. appeal, trustworthiness…) of the shown conversational partner or of different conversation contexts (private talk vs. professional conference). The test items were different free viewpoint video objects (produced with the described processing chain) showing four different people (two men, two women) representing a possible video communication partner. In this study, a total of 322 data sets were collected. The data collection was carried out within three weeks in November 2010 in a laboratory providing a standardized environment (i.e. lighting conditions) in five separate rooms with personal computers and 19” LCD displays. Table 2 summarizes the design of the evaluation study and shows the different pre-defined independent variables.

Table 2. Combination of different evaluation variables

Technical items | Test items (i.e. videos, 10 sec.), with/without eye-contact, different view synthesis and disparity analysis (Table 1)
Usage context   | Private talk with friend | Professional talk to adviser
Content shown   | Man | Woman              | Man | Woman

With this number of data sets, every possible test setting contains at least 15 data sets. The settings vary in the combination of the different evaluation variables. For the first evaluation activities, a set of different test items was created using the described methods of view synthesis and disparity analysis (see also Table 1 in section 3.2.1).
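How test settings arise from combining evaluation variables, and how the number of data sets per setting can be checked, is sketched below in Python. The factor names and levels are illustrative assumptions loosely following Table 1 and Table 2; the exact crossing used in the study is not specified here, so the resulting number of settings is hypothetical.

```python
from collections import Counter
from itertools import product

# Illustrative levels only; the actual design of the study may differ.
FACTORS = {
    "eye_contact": ["with", "without"],
    "view_synthesis": ["linear", "linear+median"],
    "usage_context": ["private", "professional"],
    "person": ["man", "woman"],
}

def all_settings(factors):
    """Enumerate every combination of factor levels as a dict."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def setting_key(setting):
    """Hashable, order-independent identifier of one setting."""
    return tuple(sorted(setting.items()))

def coverage(settings, data_sets):
    """Count how many collected data sets fall into each setting."""
    counts = Counter(setting_key(d) for d in data_sets)
    return {setting_key(s): counts.get(setting_key(s), 0) for s in settings}

def under_covered(settings, data_sets, minimum=15):
    """Return the settings that have fewer than `minimum` data sets."""
    return [k for k, n in coverage(settings, data_sets).items() if n < minimum]
```

With a check of this kind, settings that fall below the intended minimum can be spotted while data collection is still running.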

6. DISCUSSION AND FUTURE WORK

There are several open questions concerning the identification of subjective quality factors and their measurement. In a first step, within the framework of the author’s PhD proposal, data is collected to provide a basis for explorative analysis. The analysis of the preliminary data will be organized in three phases. First, quality influencing factors will be identified via categorization and correlation analysis, paying attention to the different evaluation variables (Table 2). In a second step, the identified factors will be weighted, leading to a list of influences on quality. This is followed by the removal of non-critical factors and the conception of a further evaluation in May 2011. For this evaluation, it is planned to let users define for themselves the best combination of the provided processing steps in order to obtain the free viewpoint video object representation with the best experienced quality. The identification or development of an ideal methodology for reaching the above-mentioned goals is seen as a necessary accompanying effort and therefore as a main part of the overall result. Results will be published gradually in conference publications, the intended PhD work, and the project report of Skalalgo3d until the end of February 2012.

7. REFERENCES

[1] Argyle, M., Cook, M. 1976. Gaze and mutual gaze. Cambridge University Press, New York, USA.

[2] Colburn, A., R. 2000. The Role of Eye Gaze in Avatar Mediated Conversational Interfaces. In: Avatar Mediated Conversational Interfaces, Technical Report MSR-TR-2000-81, Microsoft Research, Microsoft Corporation, One Microsoft Way.

[3] Fan, F. 2008. Analyse von Qualitätsparametern von 3D-Videoobjekten. Diplomarbeit, Technical University Ilmenau, Ilmenau, GER.

[4] Geerts, D., De Moor, K., Ketykó, I., Jacobs, A., Van den Bergh, J., Joseph, W., Martens, L., De Marez, L. 2010. Linking an Integrated Framework with Appropriate Methods for Measuring QoE. In: Proceedings of the QoMEX 2010, Second International Workshop on Quality of Multimedia Experience (June 21–23, 2010. Trondheim, Norway), Trondheim, Norway, Paper No. 158.

[5] ITU-R BT.500-11. ITU Recommendation. 2002. Methodology for the subjective assessment of the quality of television pictures. ITU.

[6] Jumisko-Pyykkö, S., Reiter, U., Weigel, C. 2007. Produced quality is not perceived quality – A qualitative approach to overall audiovisual quality. IEEE Xplore.

[7] Kollarits, R. V., Woodworth, C., Ribera, J. F., and Gitlin, R. D. 1995. An eye contact camera/display system for videophone applications using a conventional direct-view lcd. SID 1995, Digest.

[8] Korn, T. 2009. Kalibrierung und Blickrichtungsanalyse für ein 3D Videokonferenzsystem. Diplomarbeit, Technical University Ilmenau, Ilmenau, GER.

[9] Law, E., Roto, V., Hassenzahl, M., Vermeeren, A., Kort, J. 2009. Understanding, Scoping and Defining User Experience: A Survey Approach. In Proc. Human Factors in Computing Systems, CHI’09. April 4-9, 2009, Boston, MA, USA.

[10] Murayama, D., Kimura, K., Hosaka, T., Hamamoto, T., Shibuhisa, N., Tanaka, S., Sato, S., Saito, S. 2010. Virtual view image synthesis for eye-contact in TV conversation system. In: Conference Proceedings of Electronic Imaging (17-21 January 2010. San Jose Convention Center, San Jose, California, United States), San Jose, USA, Paper No. 7526-11.

[11] Rittermann, M. 2007. Zur Qualitätsbeurteilung von 3D-Videoobjekten. Dissertation, Technical University Ilmenau, Ilmenau, GER.

[12] Jumisko-Pyykkö, S., Strohmeier, D., Utriainen, T., and Kunze, K. 2010. Descriptive quality of experience for mobile 3D video. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (NordiCHI '10). ACM, New York, NY, USA, 266-275. DOI=10.1145/1868914.1868947.

[13] Scharstein, D., Szeliski, R. 2002. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. In: International Journal of Computer Vision, 47(1-3):7–42.

[14] Schreer, O., Kauff, P., Sikora, Th. (eds.) 2005. 3D Video communication: Algorithms, concepts and real-time systems in human centered communication. John Wiley & Sons, Ltd., UK.

[15] Van Eijk, R., Kuijsters, A., Dijkstra, K., IJsselsteijn, W. A. 2010. Human Sensitivity to eye contact in 2D and 3D videoconferencing. In: Proceedings of the QoMEX 2010, Second International Workshop on Quality of Multimedia Experience (June 21–23, 2010. Trondheim, Norway), Trondheim, Norway, Paper No. 76.

[16] Weigel, C., Schwarz, S., Korn, T., Wallebohr, M. 2009. Interactive free viewpoint video from multiple stereo. In: Proceedings of the 2009 3DTV Conference: The True Vision – Capture, Transmission and Display of 3D Video (3DTV-CON 2009, Potsdam, Germany, May 4-6, 2009). IEEE Xplore, DOI = 10.1109/3DTV.2009.5069663.

[17] Winkler, S. 2005. Digital Video Quality – Vision Models and Metrics. John Wiley & Sons, UK.

Allocation Algorithms for Interactive TV Advertisements

Ron Adany

Department of Computer Science Bar-Ilan University Ramat-Gan 52900, Israel

adanyr@cs.biu.ac.il

ABSTRACT

In this research we consider the problem of allocating personalized advertisements (ads) to interactive TV viewers. We focus on the optimization problem of maximizing revenue while taking into account the special constraints and requirements of the TV ads industry. The research is part of studies towards a Ph.D. in Computer Science, currently in the third year of the planned four-year study. In this paper we define the research problem, present the achievements attained to date, detail the research plan and discuss the contribution of the work.

1. INTRODUCTION

Personalization is the next generation in the world of advertisement, and it is attractive to all players. From the commercial companies’ perspective, it offers the possibility to tailor their advertisements to specific audiences and to ensure that the target population receives the desired ads in the desired format. From the standpoint of the service suppliers, i.e. the media companies and the operators, whose major source of income is advertisement [13], it is a way to increase revenue [7, 13]. And, from the viewers’ perspective, it allows them to view ads which best suit their profile, preferences and interests.

Ad personalization is already extensively used on the Internet (e.g. Google AdWords [9]), but not on TV. There are several key points that distinguish TV ads, as we consider them in this research, from Internet ads, such as the environment, the method of exposure, the pricing method, allocation constraints, etc. Over the past few years we have witnessed progress in technology, infrastructure upgrades, increased use of alternative TV screens, e.g. cell-phones, and the penetration of interactive TV. This progress has given rise to real personalized services, and to the assignment of advertisements to specific viewers based on their interests and their relevance to the advertised content. The personalization problem becomes even more important for the mobile TV platform, where there is no uncertainty with respect to who is watching [1, 2].

Many studies concern the problem of selecting the personal advertisements most suitable to each individual viewer, e.g. [12, 15, 16], and many others focus on how to deliver them, e.g. [4, 16]. Our research supplements these studies by using their results as input, with the goal of optimizing the allocation of ads. Optimizing personal TV advertisement is still an open problem for which, to date, no adequate solution has been proposed. In this work we propose algorithmic solutions and do not deal with the hardware or infrastructure problems.

Throughout this research we assume that the infrastructure is similar to the iMedia system framework, which is designed for personal advertisement in the interactive TV environment [4]. Based on frameworks such as iMedia, the entire process of personalized advertisement is as follows. Given advertisement requests, ads are allocated to viewers and playlists of ads are generated for the planned time periods in some centralized computing center. Then, advertisement contracts are signed with the advertisers according to the allocations, and the playlists are delivered to and stored in the Set-Top-Box (STB) units with which each viewer is equipped, as is common today. During the planned time periods, viewers watch TV, and on commercial breaks each STB airs ads based on the viewer’s playlist. At the end of each time period, each STB sends an ad viewing report to the centralized computing center, detailing the actual airing of the ads from the viewer’s playlist. At the end of all the planned time periods the billing process is activated according to the signed contracts.

The research is part of the NEGEV Consortium [17], targeted at developing personalized content services, and directed by SintecMedia [20], a high-tech company that designs and implements management systems for the TV broadcasting, cable and satellite industries.

2. PROBLEM DESCRIPTION

The Ads Allocation problem concerns the allocation of personal TV advertisements to viewers. We consider two versions of the Ads Allocation problem. The deterministic version, where the problem’s data is known in advance, is presented in Section 2.1. The uncertain multi-period version, where there are multiple allocation periods and uncertainty about the problem’s data, is presented in Section 2.2.

2.1 Deterministic Version

The input for the Ads Allocation problem consists of a set of ads and a set of viewers. Each viewer is associated with a viewing capacity and a profile. Each ad is associated with a transmission length, a required rating, a required airing frequency, a profit and a profile defining the target population. The ad rating indicates the required number of different viewers to whom the ad must be assigned in order to be considered allocated and be paid. The ad frequency corresponds to the number of times the same viewer should view the ad in order for it to be considered assigned to that viewer. The target population defines the set of viewers that are relevant for the ad. An example of a viewer’s parameters would be a viewing capacity of 20 hours a week with a profile of a male from London in the 35-40 age group. An example of an ad request would be a 30 second ad that needs to be allocated to 20,000 viewers, 10 times to each viewer, which will result in a profit of $10,000, with a target population of females from NYC in the 20-35 age group.

The goal of the Ads Allocation problem is to maximize the profit from a valid assignment of ads to viewers. A valid assignment that will result in payment must satisfy the ad rating and frequency requirements, must not exceed the viewers’ viewing capacities and must be personal, i.e. suit the ad’s target population and the viewers’ profiles.

The deterministic version of the Ads Allocation problem is an extension of several well-studied optimization problems such as the Generalized Assignment Problem (GAP) [14], the Multiple Knapsack Problem (MKP) [5], and the Multiple Knapsack problem with Assignment Restrictions (MKAR) [6, 19]. All of these problems are NP-hard and, as an extension of them, the Ads Allocation problem is also NP-hard. Consequently, our proposed method for solving the problem is the heuristic approach (see Section 3.2), which is very common in solving instances of GAP [14].
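The deterministic problem statement can be made concrete with a small Python sketch of the input data, a validity check, and a simple greedy baseline. This is not the heuristic proposed in this research: the class and function names, the representation of the target population as a predicate, and the profit-density ordering are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class Viewer:
    vid: str
    capacity: int                 # viewing capacity in seconds
    profile: Dict[str, object]

@dataclass
class Ad:
    aid: str
    length: int                   # transmission length in seconds
    rating: int                   # required number of distinct viewers
    frequency: int                # required airings per assigned viewer
    profit: float
    target: Callable[[Dict[str, object]], bool]  # target-population test

def greedy_allocate(ads: List[Ad], viewers: List[Viewer]) -> Dict[str, Set[str]]:
    """Greedy baseline: try ads in order of profit per second of airtime."""
    remaining = {v.vid: v.capacity for v in viewers}
    allocation: Dict[str, Set[str]] = {}
    def density(ad: Ad) -> float:
        return ad.profit / (ad.length * ad.frequency * ad.rating)
    for ad in sorted(ads, key=density, reverse=True):
        need = ad.length * ad.frequency   # airtime per assigned viewer
        eligible = [v for v in viewers
                    if ad.target(v.profile) and remaining[v.vid] >= need]
        if len(eligible) < ad.rating:
            continue  # rating cannot be met, so the ad would not be paid
        eligible.sort(key=lambda v: remaining[v.vid], reverse=True)
        chosen = eligible[:ad.rating]
        for v in chosen:
            remaining[v.vid] -= need
        allocation[ad.aid] = {v.vid for v in chosen}
    return allocation

def is_valid(ads: List[Ad], viewers: List[Viewer],
             allocation: Dict[str, Set[str]]) -> bool:
    """Check rating, target population and viewing capacity constraints."""
    ad_by_id = {a.aid: a for a in ads}
    viewer_by_id = {v.vid: v for v in viewers}
    used = {v.vid: 0 for v in viewers}
    for aid, vids in allocation.items():
        ad = ad_by_id[aid]
        if len(vids) < ad.rating:
            return False
        for vid in vids:
            if not ad.target(viewer_by_id[vid].profile):
                return False
            used[vid] += ad.length * ad.frequency
    return all(used[v.vid] <= v.capacity for v in viewers)

def total_profit(ads: List[Ad], allocation: Dict[str, Set[str]]) -> float:
    """Profit is earned only for ads that appear in the allocation."""
    return sum(a.profit for a in ads if a.aid in allocation)
```

An ad either reaches its full rating or is skipped entirely, which reflects the all-or-nothing payment condition and distinguishes the problem from a plain knapsack formulation.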

2.2 Multi-Period Uncertain Version

The multi-period uncertain version of the Ads Allocation problem is an extension of the deterministic case into a multi-period problem where the viewers’ viewing capacities are uncertain. While data regarding the ads’ requests, as well as data concerning the viewers’ profiles (obtained e.g. by asking the viewers), are known in advance, the data on a viewer’s viewing capacity is only a prediction of how much time the viewer will watch TV within a certain period. Situations where viewers watch more or less than expected are possible. The latter case, i.e. less viewing time than expected, is more problematic, since some ads may not be fulfilled, which will cause a loss in revenue. However, the case of more viewing time than expected is also problematic, since knowing the actual viewing time in advance could have allowed more ads to be allocated, which in turn would increase revenue, which is our goal. We assume that each estimated viewing time, c_j, of viewer v_j is given together with some uncertainty factor 0 ≤ u_j ≤ 1, where the “real” viewing capacity is a value in the range of

[c j