Event on Demand with MPEG-21 Video Adaptation System
Min Xu (1, 2), Jiaming Li (1), Liang-Tien Chia (1), Jesse S. Jin (2), Yiqun Hu (1), Bu-Sung Lee (1), Deepu Rajan (1)
(1) School of Computer Engineering, Nanyang Technological University, Singapore, 639798. Email: {mxu, liji0006, asltchia, y030070, ebslee, asdrajan}@ntu.edu.sg
(2) School of Design, Communication and IT, University of Newcastle, Callaghan 2308, Australia. Email: {M.Xu}@studentmail.newcastle.edu.au, {Jesse.Jin}@newcastle.edu.au
ABSTRACT
In this paper, we present an event-on-demand (EoD) video adaptation system. The proposed system supports users in deciding their events of interest and considers network conditions to adapt the video source by event selection and frame dropping.
Firstly, events are detected by audio/video analysis and annotated by the description schemes (DSs) provided by MPEG-7 Multimedia Description Schemes (MDSs). Then, to achieve a generic adaptation solution, the adaptation is developed following the MPEG-21 Digital Item Adaptation (DIA) framework. We look at an early release of the MPEG-21 Reference Software on XML generation and develop our own system for EoD video adaptation in three steps: 1) the event information is parsed from the MPEG-7 annotation XML file together with the bitstream to generate a generic Bitstream Syntax Description (gBSD); 2) users’ preference, Network Characteristic and Adaptation QoS (AQoS) are considered for making the adaptation decision; 3) the adaptation engine automatically parses the adaptation decisions and the gBSD to achieve adaptation.
Unlike most existing adaptation work, the system adapts video of events of interest according to users’ preference. Implementation following the MPEG-7 and MPEG-21 standards provides a generic video adaptation solution. gBSD-based adaptation avoids complex video computation. Thirty students from various departments were invited to test the system and their responses have been positive.
Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed Systems—Distributed applications; H.3.5 [Information Storage and Retrieval]: Online Information Services—Web-based services
General Terms
Algorithms, Design, Experimentation
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM’06, October 23–27, 2006, Santa Barbara, California, USA.
Copyright 2006 ACM 1-59593-447-2/06/0010 ...$5.00.
Keywords
Event on Demand, Adaptation, Event detection, Annotation, MPEG-7, MPEG-21
1. INTRODUCTION
With the increasing amount of multimedia data and the development of multimedia communication techniques, there is an increasing need to develop effective and efficient video adaptation systems. In various media environments, users may access and interact with multimedia content on different types of terminals and networks. As shown in Fig. 1, video adaptation plays an important role between the video database and users. It supports exchange, access, and manipulation of multimedia data according to users’ preference and network condition.
Figure 1: The role of video adaptation
Video adaptation is still a challenging field. Earlier work developed encoding schemes to reduce video size or provide scalability for video adaptation [14, 9, 22]. With the increasing number of video formats, attention turned towards transcoding video from one format to another in order to make the video compatible with the new usage environment [25]. Besides encoding and transcoding, a popular adaptation approach is to select, reduce or replace some video elements, such as dropping shots and frames in a video clip [7], dropping pixels and DCT coefficients in an image frame [4], replacing video sequences with still frames [5], etc. Although these methods provide feasible ways for video adaptation, there are still some limitations, as follows.
1. Most existing adaptation systems currently focus on achieving a certain defined SNR or bitrate without considering users’ preference and experience.
2. The current media adaptation solutions tend to be proprietary and therefore lack a universal framework.
3. Transcoding and video element removal incur high computational complexity and cost.
In this paper, our proposed system alleviates the above limitations in the three ways described below.
Firstly, our adaptation system takes account of users’ preference by asking users to select events in which they are interested. Sometimes, users may only want to watch the interesting video segments instead of wasting time browsing the whole video. An event is a feasible entry point for users to access certain video segments because events are related to users’ understanding and are also a good semantic index to the video. Taking account of users’ preference on events, video adaptation allocates more resources to the video parts which attract users than to the unattractive parts.
Secondly, in order to provide a generic solution to satisfy a wide variety of applications, our system is implemented based on the MPEG-21 Digital Item Adaptation (DIA) framework. Some international standards, such as MPEG-7 and MPEG-21, define format-independent technologies to support users to exchange, access, consume, trade and otherwise manipulate digital items in an efficient, transparent and interoperable way [16, 1].
Finally, using a generic Bitstream Syntax Description (gBSD), which is unaware of the bitstream coding format, to describe the structure of the bitstream provides interoperability in Digital Item Adaptation (DIA). Implementing adaptation based on gBSD instead of the video itself helps to adapt resources quickly with minimal computation cost. It alleviates the computational complexity of transcoding, which treats the bitstream in a bit-by-bit manner. Furthermore, gBSD can provide structure description at different syntax layers, which enables adaptation at different levels.
The rest of this paper is organized as follows. The adaptation system architecture will be briefly introduced in Section 2. Section 3 will review our previous work on sports event detection. Detected events of interest are annotated by MPEG-7 descriptors, which are presented in Section 4. Section 5 presents the details of Digital Item Adaptation (DIA), which is the core of our proposed adaptation system. Experiments and results are presented in Section 6. Conclusions are given in Section 7.
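To make the gBSD argument in this section concrete, the following is a minimal sketch of description-driven adaptation. It is not the MPEG-21 gBSD schema or the system's implementation: the `Unit` record, its fields and the `adapt` function are hypothetical simplifications, chosen only to show that the adaptation step can copy or skip byte ranges named in a description without decoding the bitstream.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Unit:
    """One described unit of the bitstream (hypothetical, much simpler than a real gBSD unit)."""
    start: int   # byte offset of the unit in the source bitstream
    length: int  # size of the unit in bytes
    marker: str  # label the adaptation step can match on, e.g. an event name


def adapt(bitstream: bytes, description: List[Unit], keep: Set[str]) -> bytes:
    """Copy only the byte ranges whose marker is in `keep`; the payload is never decoded."""
    return b"".join(
        bitstream[u.start:u.start + u.length]
        for u in description
        if u.marker in keep
    )


# Toy bitstream made of three 4-byte units.
source = b"AAAABBBBCCCC"
description = [Unit(0, 4, "Highlight"), Unit(4, 4, "Normal"), Unit(8, 4, "Replay")]
adapted = adapt(source, description, {"Highlight", "Replay"})
```

Because the work is a lookup over the description plus byte copies, its cost depends on the number of described units rather than on the coding format, which is the property the text attributes to gBSD-based adaptation.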
2. RELATED WORK
This section presents traditional adaptation methods and some related adaptation techniques.
The early work was mostly concerned with network conditions for multimedia streaming services. In order to adapt video files for fluctuating networks, the network transmission mechanisms [19, 24] dynamically adapt the video sequence by flexibly dropping portions of elements in the video file, such as enhancement layers, frames and so on. To make the video scalable for layer or frame dropping, several encoding schemes have been proposed, such as Fine Granularity Scalability (FGS) video coding [14], Multiple Description Coding (MDC) [9], wavelet-based scalable coding [22], etc. Previous works focus on how to estimate network QoS and achieve good video quality with limited network resources.
Nowadays, the structure of networks is changing from homogeneous to heterogeneous. Different network architectures have different transmission capabilities. Normally, there are two approaches when dealing with problems in multimedia services via complex heterogeneous networks [8]: adaptive transmission, which enforces traditional guaranteed network resource allocation, and adaptive applications, which are more tolerant to inevitable fluctuations in the supporting environment’s performance. The problem of maximizing overall quality in an adaptive multimedia system has been abstracted to a Utility Model [12] to incorporate the dynamics of a heterogeneous network environment. Some utility-based adaptation schemes [13, 23, 21] have been proposed to optimize the quality of multimedia service under network constraints.
The various capabilities of terminal devices at the end of a network increase the complexity of multimedia services. Some works [26, 17] focus on how to do adaptation concerning limited resources on terminal devices, such as energy, screen size, presentation capability, etc.
At the same time, the set of emerging rich media formats to be delivered is growing fast. People do not want to bother building a specialized adaptation mechanism for every upcoming format. An alternative way to adapt multimedia files between different container formats is transcoding [3, 10]. In [17], semantic knowledge about context is used to guide physical adaptation: conversion, scaling and distillation. To handle bandwidth degradation, one method tries to drop shots or frames in the video sequence [7]. Instead of dropping shots completely, some methods retain the keyframes of the shot [5]. Pixels and coefficients are dropped at frame level [4]. However, an adaptation process whose objective is only to reduce bandwidth utilization cannot satisfy users’ requests. User-specified adaptation has been addressed in the literature [6, 11] recently. These works focus on adapting low-level features, such as color depth, whereas users might pay more attention to semantic aspects than to low-level features.
Our proposed adaptation system bridges the gap between users’ semantic preference and video content. The implementation based on the MPEG-21 DIA framework provides a generic adaptation solution. The MPEG-21 standard provides reference software to generate and parse XML files describing the video source, users’ environment, network condition and so on [2]. We use the MPEG-21 reference software and develop our system by improving Structured Scalable Meta-formats (SSM) version 2.0 for content agnostic digital item adaptation [18].
3. THE ADAPTATION SYSTEM
Before we describe our system step by step, an overview of the system architecture is given. Our proposed adaptation system has two primary processes (Fig. 2): event identification and annotation, and MPEG-21 Digital Item Adaptation. The MPEG-21 DIA can be employed on the server and any intermediate proxies between server and client, on the basis of the user-provided preferences. In the experiments, the adaptation process is performed on the server side and can be easily extended to adaptation on proxies.
Figure 2: Adaptation system architecture
Firstly, some pre-defined interesting events are identified by our event identification and annotation module and stored in MPEG-7 structured format. Detecting semantic highlights or events in video has attracted much interest. Most of the previous methods rely on a single feature (audio, video or transcripts), and each feature provides some hints on interesting video events or video highlights. Recently, our event detection work based on audio sound identification and video scene detection has shown promising results [15]. According to users’ preference, these detected events can be tagged with their own priorities for video adaptation, when necessary.
Secondly, the event information is parsed from the MPEG-7 annotation XML file together with the bitstream to generate the generic Bitstream Syntax Description (gBSD). When users’ requests, device capabilities and user preferences are sent to the Usage Environment Description (UED), the adaptation decision engine determines the decision point according to AQoS in order to maximize user satisfaction and adapt to the constrained environment, such as the network condition. The decision point and the gBSD instruct the adaptation operation engine to alter the bitstream and send it to users.
The Event on Demand adaptation represents a promising strategy especially for streaming applications, where the crucial phase is the video content analysis. The event identification and annotation engine is more suited for video in which the events are easily detected, such as sports videos. However, the adaptation scenario is easily extended to other video domains. For example, a movie can be adapted by user-selected scenes, news video can be adapted by news content, etc. In this paper, video scenarios from basketball games, which were presented in a previous publication [15], are used to demonstrate how well the proposed adaptation system performs. Compared to [18], the most significant improvement is EoD adaptation, which is achieved by the following points.
• As a pre-processing step, the Event Identification and Annotation module provides an event index for the video to be adapted. The module enables users to directly access preferred events.
• The gBSD structure and descriptions are designed for easy storing and parsing of both bitstream format related information and event related information, such as event label and event duration.
• By incorporating users’ preference, AQoS is designed to flexibly allocate limited network bandwidth via events.
• According to the changes in average bandwidth, the adaptation decision engine can dynamically make the adaptation decision and signal the adaptation operation to adjust the rule of adaptation. Currently, the network bandwidth is estimated by monitoring the transmission time and the file size of past video segments.
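The last two points can be illustrated with a short sketch. The bandwidth estimate follows the description above (bits transferred divided by transmission time over past segments). The allocation rule is an assumption made purely for illustration, not the paper's AQoS: the function names, the integer priority scale and the `full_rate` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple


def estimate_bandwidth(history: List[Tuple[int, float]]) -> float:
    """Average bandwidth in bits/s from (size_in_bytes, transfer_seconds) of past segments."""
    total_bits = sum(size * 8 for size, _ in history)
    total_time = sum(seconds for _, seconds in history)
    if total_time <= 0:
        raise ValueError("total transfer time must be positive")
    return total_bits / total_time


def keep_ratio(priority: int, bandwidth: float, full_rate: float) -> float:
    """Fraction of frames to keep for an event (hypothetical policy, not the paper's AQoS).

    priority 0 drops the event entirely; higher priorities scale the
    bandwidth-limited baseline up, capped at keeping every frame.
    """
    if priority <= 0:
        return 0.0
    base = min(1.0, bandwidth / full_rate)
    return min(1.0, base * priority)


# Two past segments of 125,000 bytes, each delivered in one second.
history = [(125_000, 1.0), (125_000, 1.0)]
bandwidth = estimate_bandwidth(history)

# Assumed user preference: higher number means more interest in the event.
priorities: Dict[str, int] = {"Highlight": 2, "Normal": 1, "Close-up": 0}
plan = {event: keep_ratio(p, bandwidth, full_rate=2_000_000.0)
        for event, p in priorities.items()}
```

With the estimated bandwidth at half of the assumed full rate, this toy policy keeps every frame of the preferred event, half of the frames of a normal event, and drops the unwanted event, which mirrors the idea of allocating limited bandwidth via events.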
4. EVENT DETECTION AND MPEG-7 BASED ANNOTATION
Capturing interesting events can be regarded as a pre-processing step in video adaptation. It provides a feasible way for users to access video content by selecting interesting events.
4.1 Event Identification
Event identification is a challenging problem due to the gap between low-level perceptual features and high-level human understanding of videos. We seek some middle-level features, such as specific audio sounds and video scenes. These specific audio sounds give significant hints pointing to interesting events. For example, the sound of a ball hitting the rim of the basket may be used to confirm the event of a basketball shot being taken. Excited commentator and audience sounds are most likely the consequence of a shot. Additionally, the video scenes provide certain constraints on event occurrence. By summarizing some heuristic decision rules to combine audio events and video scenes, interesting events are detected. More details can be found in our previous work [15]. Six basketball events are detected: Replay, Highlight (goal or shot), Foul, Penalty, Close-up and Normal.
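As a rough illustration of how heuristic rules can combine audio cues with the video scene, consider the sketch below. The actual decision rules are given in [15] and are not reproduced here; every label and rule in this code (`rim_hit`, `whistle`, the scene names and their mapping to events) is a hypothetical placeholder chosen for the example.

```python
from typing import Set


def classify_segment(sounds: Set[str], scene: str) -> str:
    """Map detected audio cues and a scene label to one of six event names.

    All rules and labels are illustrative assumptions, not the rules of [15].
    """
    # Scene-only rules: the scene type alone decides the event.
    if scene == "replay":
        return "Replay"
    if scene == "close_up":
        return "Close-up"
    if scene == "free_throw":
        return "Penalty"
    # Audio cues are interpreted under the constraint of a court-view scene.
    if scene == "court":
        if "whistle" in sounds:
            return "Foul"
        excited = {"excited_commentator", "excited_audience"} & sounds
        if "rim_hit" in sounds and excited:
            return "Highlight"
    return "Normal"


label = classify_segment({"rim_hit", "excited_audience"}, "court")
```

The scene acts as a constraint and the sounds as evidence, so the same rim-hit sound yields a Highlight only when the scene and the crowd or commentator reaction are consistent with a shot.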
4.2 Event Annotation
MPEG-7 is a new multimedia standard, designed for describing multimedia content by providing a rich set of standardized descriptors and description schemes. We utilize the description schemes (DSs) of content management and description provided by MPEG-7 MDSs to represent the results of event identification. A small snippet of event annotation using an MPEG-7 XML file is listed in Fig. 3. The AudioVisual DS is utilized to describe the temporal decomposition of a video entity. In each TemporalDecomposition