
The Electronic Library

Multimedia networking issues for digital video libraries

Dimitris N. Kanellopoulos
Article information:
To cite this document:
Dimitris N. Kanellopoulos, (2014), "Multimedia networking issues for digital video libraries", The Electronic
Library, Vol. 32 Iss 6 pp. 898-922
Downloaded by UNIVERSITAS SUMATERA UTARA At 00:41 16 March 2017 (PT)
Downloaded on: 16 March 2017, At: 00:41 (PT)
References: this document contains references to 73 other documents.
The fulltext of this document has been downloaded 626 times since 2014*

*Related content and download information correct at time of download.

Multimedia networking issues
for digital video libraries
Dimitris N. Kanellopoulos
Department of Mathematics, University of Patras, Patras, Greece

Received 12 January 2013
Revised 24 February 2013
Accepted 24 February 2013

Abstract
Purpose: The purpose of this paper is to provide a tutorial and survey on recent advances in
multimedia networking from an integrated perspective of both video networking and building digital
video libraries. The nature of video networking, coupled with various recent developments in
standards, proposals and applications, poses great challenges to the research and industrial
communities working in this area.
Design/methodology/approach: This paper presents an insightful analysis of recent and
emerging multimedia applications in digital video libraries and of video coding standards and their
applications in digital libraries. Emphasis is placed on those standards and mechanisms that enable
fully interoperable multimedia content adaptation according to the vision of Universal Multimedia
Access.
Findings: The tutorial helps elucidate the similarities and differences among the considered
standards and networking applications. A number of research trends and challenges are identified, and
selected promising solutions are discussed. This should stimulate further thought on the
development of this area and open up more research and application opportunities.
Research limitations/implications: The paper does not provide methodical studies of
networking application scenarios for all the discussed video coding standards and Quality of Service
(QoS) management mechanisms.
Practical implications: The paper provides an overview of which technologies/mechanisms are
being used broadly in networking scenarios of digital video libraries. The discussed networking
scenarios bring together video coding standards and various emerging wireless networking paradigms
toward innovative application scenarios.
Originality/value: QoS mechanisms and video coding standards that support multimedia
applications for digital video libraries need to become well known to library managers and professional
associations in the fields of libraries and archives. The comprehensive overview and critiques of
existing standards and application approaches offer a valuable reference for researchers and system
developers in related research and industrial communities.
Keywords: Multimedia networking, Digital video libraries, Quality of Service, MPEG-21 digital
item adaptation, Scalable video coding
Paper type: General review

The Electronic Library, Vol. 32 No. 6, 2014, pp. 898-922. Emerald Group Publishing Limited.
DOI 10.1108/EL-01-2013-0009

1. Introduction
Today, there are many applications of digital video libraries in education, medicine,
publishing, law, consumerism, research and so forth. The rapid growth of digital
libraries (DLs) has changed our lives more quickly than anticipated. The
applications of DLs range from technical to home-use applications and from critical to
entertainment-based applications (Pratha et al., 2006). A DL can involve various types of
data such as text, speech, audio, images, graphics, and video (Ramaiah, 1998). For
example, a DL object could be a document, such as a computer science technical report,
a weather map (image), an interactive presentation of a speech, a video clip of a movie,
an instructional visual aid or even an olfaction (i.e. a smell). Enhancing multimedia
applications with olfactory stimuli has the potential to create a more complex and
richer user multimedia experience, by heightening the sense of reality (Ghinea and
Ademoye, 2012). A DL includes collections of data which are stored in digital formats
and accessible via computers. The digital content may be stored locally or accessed
remotely via computer networks. DLs provide an integrated set of services for
capturing, cataloguing, storing, searching, protecting and retrieving information. These
services provide a coherent organization and convenient access to typically large
amounts of digital information (Gonçalves et al., 2007).
DELOS[1] is a Network of Excellence on Digital Libraries, and [2] is a
Coordination Action on Digital Library Interoperability, Best Practices and Modelling
Foundations. In the context of these umbrellas, DL researchers and practitioners have
produced a Reference Model for Digital Library Management Systems (Candela et al.,
2011, p. 3). In this reference model, a DL is defined as:
A potentially virtual organisation, that comprehensively collects, manages, and preserves for
the long depth of time rich digital content, and offers to its targeted user communities
specialised functionality on that content, of defined quality, and according to comprehensive
codified policies.
At present, digital video libraries have to encompass various technologies such as
storage, databases, multimedia networking, information systems, artificial intelligence,
multimedia databases, high performance processing, communications, user interface,
hypertext, hypermedia and security. Figure 1 depicts a digital video library architecture.
The major components of such an architecture are presented below:
The Media Server is specialized application software that provides video on
demand and is dedicated to storing various digital media (meaning digital videos/
movies, audio/music, and picture files).

Figure 1. Architecture of a digital video library

Multimedia (video) Metadata Server: Metadata are particularly useful in video,
where information about its contents (such as transcripts of conversations and
text descriptions of its scenes) is not directly understandable by a computer, but
where efficient search is desirable.
Media Store: Two kinds of formats store information together with a specified
temporal component: one for subtitles and another for transcripts, which
can also be used for subtitles. The formats are SRT or SUB for subtitles and TTXT
for transcripts. The MP4Box program is a convenient tool for producing and
managing files in these formats.
Application Controller: This is the communication entity between the external users
and the system. It consists of the following sub-components:
Media Submission Manager;
Digital Library Application Manager;
Media Player Interface; and
Media Encoder.
The Media Submission Manager uploads media content onto the Media Server. It
interacts with the Segmentation Engine, to which it supplies the media content as
input, and it handles multiple upload requests, both offline and online. The main
function of the Digital Library Application Manager is the efficient execution of
procedures (programs, routines and scripts) that support the management of DL
applications. Such procedures can be related to media content creation. The Media
Encoder encodes videos that are uploaded in different formats, while the Media
Player Interface acts as an interface between the client-side player and the server.
The Segmentation Engine is the core component that addresses the performance
issues of the multimedia DL. It consists of the Synchronizer, Metadata Generator,
Striper and the Transcription Tool. The Synchronizer synchronizes the audio and
video files, as well as the segmented files, for smooth playback. The output of the
Synchronizer is sent to the Media Player for playback. Inside the Synchronizer, the
Cyclic Redundancy Checker detects unintended changes to raw data. Blocks of data
entering the Cyclic Redundancy Checker get a short check value attached, based on
the remainder of a polynomial division of their contents. On retrieval, the
calculation is repeated, and corrective action can be taken against presumed data
corruption if the check values do not match. The Metadata Generator generates
unique metadata for each segment of a lengthy video or for every video file and its
segments. It stores the metadata in the Metadata Server, which enhances the
searching and indexing mechanism as well as access to the media content in the
Media Server. Based on the metadata, the segments are created and given as
input to the Striper. The Striper performs the striping operation: it segments
each video/audio file and stores the segments in the Media Server. Finally, the
Transcription Tool transcribes video files and their segments. The outcome of this
process is an input to the Metadata Generator. For performance reasons, the
transcription must be done at the end-user's side.
Indexer: DLs provide indexing to help users identify and understand the
characteristics of the information they need. There are extensive studies of
the problem of automatic index construction for text-based DLs. However, the
construction of multimedia DLs continues to represent a challenge because
multimedia objects usually lack sufficient text information to ensure reliable
index learning. Hwang et al. (2010) tackled the problem of automatic index
construction for multimedia objects by employing Web usage logs and limited
keywords pertaining to multimedia objects. Web usage logs provided valuable
information for building indexes of multimedia DLs with limited textual
information. Their proposed methods generally yielded better indexes,
particularly for the artwork data set.
The Multimedia Search Engine (in the depicted architecture) is a video search
engine. It allows users to search by video format type and by length of the clip.
Videos can be searched for by simple metadata or by complex metadata generated
by indexing. Search results are usually accompanied by a thumbnail view of the
video.
Multimedia networking is changing the ways by which DLs provide access to
information sources and services (Hwang, 2009). Most multimedia content providers of
DLs have started their own streaming infrastructures to deliver their video content,
either live or on-demand. Many multimedia applications have also matured in the past
few years, ranging from distance learning to desktop video conferencing, workgroup
collaboration, instant messaging, and imaging. Increasingly, multimedia content of DLs
is being accessed by a large number of diverse users and clients at any time, and from
anywhere, across various communication channels, such as the Internet and wireless
networks. As mobile cellular networks and wireless local area networks (LANs) are
evolving to carry multimedia data, an all-Internet protocol (IP)-based system similar to
the Internet is likely to be employed due to its cost efficiency, improved reliability,
allowance of easy implementation of new services, independence of control and
transport, and importantly, easy integration of multiple networks. In digital video
libraries, various multimedia applications benefit from the rapid development of
encoding techniques of multimedia data sources and effective Quality of Service (QoS)
management mechanisms (Ahmed et al., 2007). Meanwhile, interoperability solutions
are being proposed to integrate wired and wireless heterogeneous networking systems
provided for DL applications. Another challenge is ensuring that the
multimedia-networked content is fully interoperable, with ease of management and
standardized multimedia content adapted for interoperable delivery, and that
intellectual property management and protection (i.e. digital rights management [DRM])
is successfully incorporated in the DL system.
The above discussion reveals some of the hot multimedia issues in digital video
library systems connected to IP networks. This article briefly presents issues
concerning software tools and metadata standards for developing digital video libraries.
Afterwards, it considers multimedia services and applications in DLs, as well as
research results on multimedia networking issues that support such applications.
Special emphasis is given to those standards and mechanisms that could make the
multimedia content of DLs fully interoperable according to the vision of Universal
Multimedia Access (UMA). According to the UMA vision, any user/device should be able
to consume any multimedia content, anytime and anywhere (Pereira and Burnett, 2003).
2. Software tools and metadata standards for DLs
DL projects are being initiated all over the world for materials of different formats and
domains. To organize, store, and retrieve digital content, many libraries and archiving
centers are using either proprietary or open-source software. Digital media requires
continuous processes to keep it compliant with current technology. It is not only
necessary to organize digital content but also important to preserve it to ensure
accessibility, sustainability and retrieval across time. Madalli et al. (2012) presented an
analytical study along with observations regarding digital preservation support
available in existing open-source digital library software (OSS-DL) based on test beds
created for that purpose. Gonçalves et al. (2007) elaborated on the meaning of quality in
DLs by proposing a model that is totally grounded in a formal framework for DLs: 5S
(Streams, Structures, Spaces, Scenarios and Societies). For each major DL concept in the
framework, they formally defined a number of dimensions of quality and proposed a set
of numerical indicators for those quality dimensions. In particular, they considered key
concepts of a minimal DL: catalogue, collection, digital object, metadata specification,
repository and services. Regarding quality dimensions, they considered: accessibility,
accuracy, completeness, composability, conformance, consistency, effectiveness,
efficiency, extensibility, pertinence, preservability, relevance, reliability, reusability,
significance, similarity and timeliness. Regarding measurement, they studied
characteristics like: response time (with regard to efficiency), cost of migration (with
respect to preservability) and number of service failures (to assess reliability). For some
key DL concepts, the quality dimension and numerical indicator pairs were illustrated
through their application to a number of real-world DLs. Gonçalves et al. (2007) also
discussed connections between the proposed dimensions of DL quality and an expanded
version of a workshop's consensus view of the life cycle of information in DLs. Such
connections can be used to determine when and where quality issues can be measured,
assessed, and improved, as well as how possible quality problems can be prevented,
detected, and eliminated. Amato et al. (2004) described the MILOS[3] software
component that supports design and effective implementation of DL applications.
MILOS supports the storage and content-based retrieval of any multimedia documents
whose descriptions are provided by using arbitrary metadata models represented in
extensible markup language (XML). MILOS is flexible in the management of documents
containing different types of data and content descriptions. It is efficient and scalable in
the storage and content-based retrieval of these documents.

2.1 OSS for digital video libraries

Libraries are regular users of open-source software (OSS). There are a number of
library-related projects, some of which are listed in Table I. These range from
simple scripts that produce statistics, to integrated library systems, to institutional
repository software. Examples of such software are CDSware (developed by CERN) and
Fedora (developed jointly by the University of Virginia and Cornell University, with funding
from the Andrew W. Mellon Foundation). Fedora[4] (Flexible Extensible Digital Object
and Repository Architecture) is an open-source digital object repository management
system that demonstrates how a distributed DL architecture can be deployed using
Web-based technologies, including XML and Web services. Wei (2011) described
DSpace, E-Prints and Greenstone DL Software, which are all widely used OSS for digital
libraries.

Table I. Library-related projects based on OSS

Name          Type of project
Apache        Web server
CDS Invenio   Integrated library systems
DSpace        Digital library software
E-Prints      Digital library software
GIMP          OS image manipulation software
GNOME         Unix desktop environment
Greenstone    Digital library software
GridSphere    The open-source portlet Web-based portal
iVai          Web-based digital library portal
MyLibrary     A digital library framework and toolkit
uPortal       The open-source enterprise portal framework

The Open Video Digital Library Toolkit (OVDLT) project provides tools to libraries,
museums, and other institutions holding moving image collections to more easily create
Web-based digital video libraries. Funded by the Institute of Museum and Library
Services and now released as an open-source product under the MIT License, the
OVDLT project provides a no-cost solution for libraries, archives, museums and other
institutions that want to make available their digital video resources through their own
Web-based DL. OVDLT[5] runs on Linux or Mac OS X 10.5. Its features include:
Rich end-user features: favorites, user-generated playlists, playlist annotations,
tagging and saved searches.
Easy library administration: site-integrated form-based library configuration, user
management and easily configurable metadata schema.
Quick, intuitive cataloguing: integrated forms, easy control over public/private
videos and featured videos and one-click poster frame selection.
Automatic video preview generation: storyboards, fast-forwards and excerpts.

2.2 Descriptive metadata standards for libraries and L2L protocols

Increasingly, DLs move their collections to the Web and share them with users of other
institutions via the Open Archives Initiative Protocol for Metadata Harvesting
(OAI-PMH, 2006) and the OpenURL standard (Apps and MacIntyre, 2006). Multimedia
DLs require the strong footing of the Resource Description Framework (RDF)
vision, supplemented with descriptive metadata standards such as Dublin Core,
Metadata Object Description Schema (MODS) or Metadata Encoding and Transmission
Standard (METS). According to Park and Lu (2009), DLs also need the strength of XML
encoding schemas, related Document Type Definitions (DTDs) and Extensible Stylesheet
Language (XSL) transformations between the non-traditional data streams and the
Hypertext Markup Language (HTML) front-end.
There are many diverse DLs that use different metadata standards. This also implies
a variety of library-to-library (L2L) protocols (Park and Lu, 2009):
Z39.50: One of the oldest protocols used in library environments. It allows
queries to be formulated without having to know anything about the target
database because its syntax is abstracted from the underlying database structure.
Search/Retrieve via URL (SRU): A standard search protocol for Internet
search queries that utilizes Contextual Query Language (CQL), a standard query
syntax. CQL is a formal language for representing queries
to information retrieval systems such as search engines, bibliographic catalogues,
and museum collection information. Based on the semantics of Z39.50, its design
objective is that queries be human readable and writable, and that the language be
intuitive while maintaining the expressiveness of more complex query languages.
DIENST: A Hypertext Transfer Protocol (HTTP)-based network protocol. It
uses HTTP GET queries to embed requests, which are then sent to individually
defined services. Each service can support a set of operations, so-called verbs.
These services provide their response in text or XML format. The DIENST
protocol allows:
access to digital objects;
deposition of new resources;
discovery and browsing of resources; and
user registration.
Communication with and among individual DIENST services is established upon
an open protocol.
OAI-PMH: This protocol provides an interoperability framework based on
metadata harvesting. The OAI-PMH defines two classes of participants: data
providers and service providers. The OAI-PMH is based on HTTP requests and
XML responses. The response utilizes Dublin Core as its metadata format.
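
The request/response pattern of OAI-PMH described above can be sketched as follows. This is a minimal illustration, not a harvester: the base URL `http://example.org/oai` and the sample record are hypothetical, no network call is made, and the helper names are assumptions. The `verb` and `metadataPrefix=oai_dc` parameters follow the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the Dublin Core element set used in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build the HTTP GET URL that a service provider sends to a data provider."""
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


def dc_titles(xml_text: str) -> list:
    """Extract all Dublin Core titles from an XML response fragment."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC + "title")]


# Hypothetical response fragment, standing in for what a data provider returns.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample video</dc:title>
  <dc:format>video/mp4</dc:format>
</record>"""

print(oai_pmh_url("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
print(dc_titles(SAMPLE_RECORD))
```

A real harvester would also have to handle resumption tokens and error responses, which are omitted here.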

2.3 DRM software

The ability to make perfect copies and the ease with which those copies can
be distributed also facilitate misuse, illegal copying and distribution (piracy),
plagiarism and misappropriation. Popular Internet software based on a peer-to-peer
architecture has been used to share copyrighted movies, music, software and other
materials (Liu et al., 2008). Concerned about the consequences of illegal copying and
distribution on a massive scale, content owners are interested in DRM systems, which
can protect their rights and preserve the economic value of digital video. A DRM system
protects and enforces the rights associated with the use of digital content.
Unfortunately, the technical challenges for securing digital content are formidable and
previous approaches have not succeeded. Lin et al. (2005) overviewed the concepts and
approaches for video DRM and described methods for providing security, including the
roles of encryption and video watermarking. Current efforts and issues were described
in encryption, watermarking, and key management. Finally, they identified challenges
and directions for further investigation in video DRM.

3. Multimedia applications in DLs

In this section, we present diverse multimedia applications in DLs.
3.1 Recommender systems for video libraries
Bollen et al. (2007) described a minimalist methodology to develop usage-based
recommender systems for multimedia DLs. A prototype recommender system based on
this strategy was implemented for the Open Video Project, a DL of videos that are freely
available for download. Sequential patterns of video retrievals were extracted from the
project's Web download logs and analyzed to generate a network of video relationships.
A spreading activation algorithm located video recommendations by searching for
associative paths connecting query-related videos. Bollen et al. (2007) evaluated the
performance of the resulting system relative to an item-based collaborative filtering
technique operating on user profiles extracted from the same log data.
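
The general idea of a usage-based network combined with spreading activation can be sketched as below. This is an illustrative simplification and not the algorithm of Bollen et al. (2007): the edge weighting (counts of consecutive retrievals), the decay factor, the number of propagation steps and all function names are assumptions.

```python
from collections import defaultdict


def build_network(sessions):
    """Link videos retrieved consecutively in the same session; weight = co-occurrence count."""
    weights = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for first, second in zip(session, session[1:]):
            if first != second:
                weights[first][second] += 1.0
                weights[second][first] += 1.0
    return weights


def recommend(weights, seeds, steps=2, decay=0.5):
    """Spread activation from the query-related seed videos and rank the other videos."""
    activation = {video: 1.0 for video in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for video, energy in frontier.items():
            neighbours = weights.get(video, {})
            total = sum(neighbours.values())
            for other, weight in neighbours.items():
                # Each neighbour receives a decayed share proportional to its edge weight.
                next_frontier[other] += decay * energy * weight / total
        for video, energy in next_frontier.items():
            activation[video] = activation.get(video, 0.0) + energy
        frontier = next_frontier
    candidates = [video for video in activation if video not in seeds]
    return sorted(candidates, key=lambda video: (-activation[video], video))


sessions = [["a", "b", "c"], ["a", "b"], ["b", "d"]]  # hypothetical download sessions
print(recommend(build_network(sessions), seeds=["a"]))
```

Videos reachable only through longer associative paths receive less activation, so directly co-retrieved videos rank first.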

3.2 Spatial-temporal information-based video retrieval

Video retrieval is increasingly based on image content. A number of studies on video
retrieval have used low-level pixel content related to statistical moments, shape, color
and texture. However, it is well-recognized that such information is not enough for
uniquely discriminating across different multimedia content. The use of semantic
information, especially that derived from spatiotemporal analysis, is of great value in
multimedia annotation, archiving and retrieval. Ren et al. (2009) detailed how the use of
spatiotemporal semantic knowledge is changing the way in which modern research is
conducted. They reviewed a number of studies and concepts related to such analysis
and illustrated important conclusions on where future research is headed.

3.3 Semantic search of cultural content

Semantic search is of major importance in present-day DLs, such as Europeana[6]. It
is noteworthy that the multimedia search engine in the depicted architecture of a digital
video library (Figure 1) can be semantics-based. Content metadata constitute the main
descriptions of cultural items that are analyzed, mapped and used to interpret users'
queries, so that the most appropriate content is selected and presented to the users.
Kollia et al. (2012) presented a new semantic search methodology, including a query
answering mechanism. This mechanism meets the semantics of users' queries and
enriches the answers by exploiting appropriate visual features, both local and Moving
Picture Experts Group (MPEG)-7, through an inter-woven knowledge and machine
learning based approach. Kollia et al. (2012) presented an experimental study by using
content from the Europeana DL, and involving both thematic knowledge and extracted
visual features from its images, illustrating the improved performance of their proposed
semantic search approach. From another perspective, the increasing amount of
information available on the Internet is changing the forms of classification and access
to data. A major challenge is how to classify, locate and access knowledge in DLs
tackling the huge amount of resources the Web provides. Therefore, improving DLs by
means of different strategies, particularly using semantics, remains a promising and
interesting approach. García-Crespo et al. (2011) presented CallimachusDL, a
semantics-based DL which provides faceted search, enhanced access possibilities and a
proof-of-concept implementation. CallimachusDL represents a novel approach to DLs,
integrating social Web and multimedia elements in a semantically annotated repository.
The results of the implementation indicated that the features proposed in
CallimachusDL are encouraging and extendable in the use of DLs.
3.4 Image browsers and photo albums
Viewing high-resolution images on DLs is a common activity. For this purpose,
specialized zoomable browsers have been proposed to reduce the network traffic.
Kanellopoulos et al. (2012) presented a lightweight and platform-independent zoomable
browser that supports annotations of high-quality digital content. This browser can
display high-resolution images in resource-limited mobile environments like PDAs,
PALMs and smartphones, even over low-speed connections. It was developed using
lightweight Web technologies, which makes the tool ideal for downloading highly
detailed digital content. The researchers evaluated the browser and the main finding is
that data transfer is reduced with the aid of the browser, while the presented information
is of high quality and could be used in various operating environments. From another
viewpoint, Amato et al. (2006) designed and implemented an online photo album
(PhotoBook), which is a DL application that allows people to manage their own photos,
to share them with friends, and to make them publicly available and searchable.
PhotoBook uses a complex internal metadata schema (MPEG-7) and allows users to
simply express complex queries (combining similarity search and fielded search),
enabling them to retrieve material of interest even if metadata are imprecise or missing.

3.5 Broadcasting for digital video library applications

Digital Video Broadcasting (DVB) is a suite of internationally accepted open standards
for digital television (Reimers, 2006). Hua and Sheu (2000) investigated a novel multicast
technique, called Skyscraper Broadcasting (SB), for digital video library applications.
They discussed the data fragmentation technique, the broadcasting strategy, and the
client design. They also demonstrated the correctness of their technique and derived
mathematical equations to analyse its storage requirement. To assess its performance,
they compared it to the latest designs known as Pyramid Broadcasting (PB) and
Permutation-Based Pyramid Broadcasting (PPB). Their study indicated that PB offers
excellent access latency. However, it requires very large storage space and disk
bandwidth at the receiving end. PPB is able to address these problems. However, this is
accomplished at the expense of longer access latency and more complex
synchronization. With SB, they can achieve the low latency of PB while using only 20
per cent of the buffer space required by PPB.
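
The data fragmentation behind SB can be illustrated with a short sketch. The segment-size series (1, 2, 2, 5, 5, 12, 12, 25, 25, ...) and the latency estimate below are recalled from the general SB literature rather than taken from this article, so they should be read as assumptions; the function names and the example parameters are hypothetical.

```python
def skyscraper_series(channels: int) -> list:
    """Relative sizes of the video fragments broadcast on each channel (assumed SB series)."""
    series = []
    for n in range(1, channels + 1):
        if n == 1:
            size = 1
        elif n in (2, 3):
            size = 2
        elif n % 4 == 0:
            size = 2 * series[-1] + 1
        elif n % 4 == 2:
            size = 2 * series[-1] + 2
        else:  # n % 4 is 1 or 3: repeat the previous size
            size = series[-1]
        series.append(size)
    return series


def worst_case_latency(video_minutes: float, channels: int, width: int) -> float:
    """Duration of the first (smallest) fragment, which bounds the wait before playback.

    `width` caps the fragment sizes and thereby limits the client buffer requirement.
    """
    units = sum(min(size, width) for size in skyscraper_series(channels))
    return video_minutes / units


print(skyscraper_series(8))
print(worst_case_latency(120, channels=4, width=5))
```

Under these assumptions, raising the number of channels shortens the first fragment and therefore the access latency, while the width cap keeps the largest fragment, and with it the buffer space at the receiving end, bounded.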

3.6 Internet protocol-based television

Internet protocol television (IPTV) concerns video infotainment (a combination of
information and entertainment) and represents a solution for interactive television-like
services over IP-based networks. Operators and vendors are currently working on IPTV
standardization efforts (e.g. ATIS/IIF, ITU-T IPTV-GSI, ETSI TISPAN) to promote wider
availability and interoperability of IPTV as a secure, reliable, managed multimedia service.
Mikóczy et al. (2012) presented a generic IPTV architecture, and explained standardization
efforts such as TISPAN, OIPF, ITU-T and ATIS specifications for the next generation IPTV.
They reviewed new approaches in multimedia services and media delivery, and
presented open issues in IPTV networks. It is worth mentioning that Lin et al. (2009)
examined an auto-assembled multimedia presentation (MP) from DLs, in which the
retrieved media objects are dynamically composed to form a continuously played
TV-like presentation. They proposed techniques for ordering the media objects in
such a presentation so as to reduce its total presentation lag in a high-delay network
environment. In particular, they proposed some computationally efficient heuristic
algorithms that can obtain near-optimal sequences. Their proposed algorithms
significantly reduced the lag of a given presentation, compared with a random sequence.
3.7 Mobile Web services for a library
Wang et al. (2012) provided directions for designing mobile Web services for a library. In
addition, they pointed out how to evaluate the performance and patron satisfaction of
mobile Web services through system log analysis and patron questionnaires.

3.8 Online tutorials

A storyboarding process for rapid outline of tutorial content can be developed (Recep
Okur and Gms, 2010; Bailin and Pea, 2007). Then, online tutorials using free and
low-cost software and Web-based tools can be created (Silver and Nickel, 2005). For
example, a tutor can develop online tutorials by using the CAMTASIA[7] software that
does screen capture and recording. Afterwards, the online tutorial can be published and
distributed to the end-users of the DL.
The above multimedia applications can be managed by the Digital Library
Application Manager (Figure 1). These applications run in IP-based networks and
require proper multimedia networking infrastructures. Such networked multimedia
applications for libraries have created a tremendous impact on computing and network
infrastructures. In the next sections, we consider critical multimedia networking issues
for digital video libraries.

4. Multimedia networking issues for digital video libraries

A large number of diverse users and clients access multimedia content of DLs at any
time, and from anywhere, across various communication channels such as the Internet
and wireless networks. As mobile cellular and wireless LAN networks are evolving to
transmit multimedia data, an all IP-based system similar to the Internet is expected to be
employed due to its cost efficiency, improved reliability, allowance of easy
implementation of new services, independence of control and transport, and
importantly, easy integration of multiple networks. Nevertheless, reliable transmission
of multimedia over such an integrated IP-based network poses many challenges. This is
not just due to the inherently lower transmission rates provided by these networks but
also due to related problems such as competing traffic, congestion, fading, interference
and mobility, all of which lead to varying transmission capacity and losses. As a result,
to achieve a high level of acceptability and proliferation of networked multimedia, a
solution for reliable and efficient transmission over IP and wireless networks is required.

4.1 QoS requirements

Various network protocols and architectures supporting QoS assurance are available
now and are still being developed. Gozdecki et al. (2003) provided an overview of
commonly used terminology related to QoS assurance in IP networks. QoS
requirements, imposed by multimedia applications, are specified by the following four
closely related parameters:
(1) Bandwidth on demand refers to the data rate, measured in bit/s (channel capacity or
throughput-bandwidth consumption), that is required to transfer
continuous media data (i.e. video and audio). Video applications require more
bandwidth than audio applications. Generally, the more bandwidth that is available
for multimedia content, the higher the quality of the service. In the Next-Generation
Internet (NGI), the speed of network links and routers will be improved radically so that
network congestion will be unlikely and QoS guarantees will be provided by design. This
endeavor will include optical Wavelength-Division Multiplexing (WDM) technologies
being considered by the NGI initiative (Paul et al., 2011).
(2) End-to-end delay, which affects the user's satisfaction with the application. It
includes capturing, digitizing and encoding/compressing media data,
transporting them from the source to the destination, and decoding and displaying
them to the user. Low end-to-end delay is preferred (Lu, 2000).
(3) Delay variation (or delay jitter), which is the variation of end-to-end delay from
one packet to the next packet within the same packet stream (connection/flow).
Low delay jitter is preferred (Kanellopoulos, 2009).
(4) Acceptable error rate or loss rate without retransmission, as the delay would be
intolerable with retransmission. Robustness to data losses is required since,
depending on the channel condition, partial data losses may occur (Lu, 2000;
Kanellopoulos, 2009).
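
As an illustration only, the four parameters above can be estimated from a packet trace. The following Python sketch assumes a hypothetical trace format (one record per packet with send and receive timestamps, and None for a lost packet); the names Packet and qos_summary are not taken from the cited works. Jitter is computed here as the mean absolute difference between the delays of consecutive delivered packets, which is one of several possible definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int
    sent_at: float                 # seconds, sender clock
    received_at: Optional[float]   # seconds, receiver clock; None if lost


def qos_summary(packets: List[Packet], payload_bits: int) -> dict:
    """Estimate throughput, delay, jitter and loss rate from a packet trace."""
    delivered = [p for p in packets if p.received_at is not None]
    if not delivered:
        return {"throughput_bps": 0.0, "mean_delay_s": None,
                "mean_jitter_s": None, "loss_rate": 1.0}

    # End-to-end delay of every delivered packet.
    delays = [p.received_at - p.sent_at for p in delivered]

    # Delay variation between consecutive delivered packets of the same flow.
    jitter = [abs(b - a) for a, b in zip(delays, delays[1:])]

    # Observation window: first transmission to last reception.
    window = max(p.received_at for p in delivered) - min(p.sent_at for p in packets)

    return {
        "throughput_bps": len(delivered) * payload_bits / window if window > 0 else 0.0,
        "mean_delay_s": sum(delays) / len(delays),
        "mean_jitter_s": sum(jitter) / len(jitter) if jitter else 0.0,
        "loss_rate": 1 - len(delivered) / len(packets),
    }
```

A low loss rate combined with a high jitter value would suggest that buffering at the receiver, rather than error control, is the more useful remedy for that flow.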

Bhargava and Annamalai (2000) have conducted experiments to measure the
communication overhead in the response time of DLs. They have studied the correlation
between communication and size of data, between communication and type of data and
the communication delay to various sites in a local and wide area network. In particular,
they presented different strategies for reducing delay while communicating multimedia
data. Images are amenable to losing data without losing semantics of the image. Lossy
compression techniques reduce the quality of the image and reduce the size, leading to a
lower communication delay. The authors compared the communication delay between
compressed and uncompressed images, and studied the overhead due to compression
and decompression. Additionally, they presented issues in providing DL service to
mobile users and presented a framework for efficient communication of DL data.
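
The trade-off between transfer time and compression overhead described above can be made concrete with a simple model. The sketch below is not the measurement method of Bhargava and Annamalai (2000); it assumes that delay is the sum of a fixed latency, the serialization time (size divided by bandwidth) and, for compressed images, a fixed compression plus decompression overhead. All function names and figures are hypothetical.

```python
def transfer_delay_s(size_bytes: int, bandwidth_bps: float, latency_s: float = 0.0) -> float:
    """Time to deliver size_bytes over a link of the given bandwidth."""
    return latency_s + size_bytes * 8 / bandwidth_bps


def compression_pays_off(size_bytes: int, compression_ratio: float,
                         bandwidth_bps: float, codec_overhead_s: float) -> bool:
    """True if sending the compressed image is faster than sending it raw.

    codec_overhead_s is the assumed total time spent compressing at the
    server and decompressing at the client.
    """
    plain = transfer_delay_s(size_bytes, bandwidth_bps)
    compressed = codec_overhead_s + transfer_delay_s(
        int(size_bytes / compression_ratio), bandwidth_bps)
    return compressed < plain
```

Under this model a 1 MB image compressed 10:1 with 0.1 s of codec overhead is delivered faster over a 1 Mbit/s link, whereas over a 1 Gbit/s link the codec overhead dominates and compression no longer reduces the delay.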

4.2 Multicast support

It is a common requirement of multimedia communication to send data from one source
to multiple destinations. Efficient multicasting protocols can reduce bandwidth
requirements (Paul, 1998). Given the multi-receiver nature of video programs, real-time
video distribution has emerged as one of the most important IP multicast applications,
and it requires bandwidth adaptability. Real-time video multicast applications have to
adapt to the dynamic network conditions, but still offer reasonable playback quality to
the receivers. Liu and Zhang (2003) presented a survey on adaptive video multicast
solutions. As video and shared data are essential to many distributed tasks, audio of
sufficient quality is a necessary condition for almost any successful real-time
interaction. Cooperative caching and application-level multicast are two technologies
that can be implemented in a multimedia content delivery network for delivering
on-demand and live multimedia contents respectively. Ni and Tsang (2005) introduced
the ideas and approaches of implementing cooperative caching and application-level
multicast under a hierarchical architecture to achieve large-scale multimedia content
delivery.

4.3 Multimedia synchronization issues

Continuous media streams such as video and audio are characterized by a well-defined
temporal relationship between subsequent presentation units to be played. A
presentation unit is a logical data unit that is perceivable by the user. Multimedia
synchronization is the process of preserving the temporal order of one or more media
streams. This process can be implemented in the Synchronizer Entity, depicted in Figure
1. The problem of maintaining continuity within a single stream is referred to as
intra-stream synchronization, whereas the problem of maintaining continuity among
the streams is called inter-stream synchronization. These two types of synchronization
are necessary both for live streams and for stored media stream presentations. Manvi
and Venkataram (2006) proposed an agent-based synchronization framework to handle
three synchronization mechanisms (point, real-time and adaptive) at the application
service level, depending on the life/run-time presentation requirements of the
multimedia applications. Zhang and Gollapudi (2000) investigated a framework and
systematic strategies for supporting the continuous and synchronized retrieval and
presentation of multimedia data streams in a client/server distributed multimedia
environment for educational DLs. Specifically, they established a practical framework
for specifying multimedia objects, tasks, schedules and synchronization constraints
between media streams. They identified the QoS parameters critical to the support of
MPs for learning and training activities. Based on the proposed framework and QoS
specifications, they developed presentation scheduling and buffer management
strategies, which can enforce the specified QoS requirements in an educational DL.
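
The two kinds of synchronization can be sketched as follows. This is a minimal illustration, not the mechanism of Manvi and Venkataram (2006) or of Zhang and Gollapudi (2000): intra-stream synchronization is modelled as a list of equally spaced playout deadlines, and inter-stream synchronization as a comparison between the video timestamp and an audio master clock. The 80 ms tolerance is an assumed value, and the function names are hypothetical.

```python
from typing import List


def playout_deadlines(start_s: float, unit_count: int, units_per_second: float) -> List[float]:
    """Intra-stream: each presentation unit is due one period after the previous one."""
    period = 1.0 / units_per_second
    return [start_s + i * period for i in range(unit_count)]


def sync_action(audio_clock_s: float, video_pts_s: float, tolerance_s: float = 0.08) -> str:
    """Inter-stream: decide what to do with the next video frame.

    Audio is treated as the master stream; tolerance_s is an assumed
    acceptable skew between the two streams.
    """
    skew = video_pts_s - audio_clock_s
    if skew > tolerance_s:
        return "wait"      # video is ahead of audio: hold the frame
    if skew < -tolerance_s:
        return "drop"      # video is behind audio: skip the frame to catch up
    return "present"
```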

4.4 Adaptive media coding

Multimedia data should be coded in such a way that acceptable audio/video playback
quality is still achieved when some data packets are severely delayed or lost. Coding
multimedia data into multiple layers is the basic solution. Some layers are assigned high
priority and contain the essential data needed to generate a basic, acceptable play-out
audio/video quality. Extra layers contain data that add details (or quality) to
the basic quality and are assigned low priority. If the DL system is overloaded,
low-priority data can be dropped first, with little effect on play-out quality. This
effect is named graceful quality degradation, and it can be obtained by the use of error
control techniques such as Forward Error Correction (FEC) (Mohr et al., 2000).
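
Priority-based dropping of layers can be sketched as below. The example is a simplification under stated assumptions: each layer has a single priority (0 for the base layer) and a fixed rate, each layer depends on all layers of higher priority, and FEC is not modelled. The names Layer and select_layers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    priority: int    # 0 = base layer (highest priority)
    rate_kbps: int


def select_layers(layers: List[Layer], capacity_kbps: int) -> List[Layer]:
    """Keep layers in priority order until the available capacity is exhausted."""
    kept: List[Layer] = []
    used = 0
    for layer in sorted(layers, key=lambda l: l.priority):
        if used + layer.rate_kbps > capacity_kbps:
            break  # lower-priority layers depend on this one, so stop here
        kept.append(layer)
        used += layer.rate_kbps
    return kept
```

With a base layer of 300 kbps and enhancement layers of 500 and 1,200 kbps, a capacity of 1,000 kbps keeps the base layer and the first enhancement layer, so quality degrades in steps rather than collapsing.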

4.5 Video compression algorithms

Humans are less sensitive to loss of video than of audio because audio is significant for
comprehension. Besides, video requires larger bandwidth (100 kbps-15 Mbps) than audio
(8-128 kbps). In light of this evidence, audio is given higher priority and, as a result, only the
video can be used for adaptation. Video is always compressed before transmission because
raw video would otherwise consume far too much bandwidth, except in specialist photonic
networks. The two main compression techniques used for video are:
(1) Discrete Cosine Transformation (DCT)-based; and
(2) wavelet transforms-based.
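
To see why raw video is rarely transmitted, the uncompressed bit rate can be computed directly. The sketch below assumes 8-bit samples with 4:2:0 chroma subsampling, which gives 12 bits per pixel on average; the frame size and frame rate are illustrative values, not figures from the text.

```python
def raw_video_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 12) -> int:
    """Uncompressed bit rate; 12 bits/pixel corresponds to 8-bit 4:2:0 video."""
    return width * height * fps * bits_per_pixel


# Standard-definition example: 720 x 576 pixels at 25 frames per second.
sd_rate = raw_video_bitrate_bps(720, 576, 25)
print(f"{sd_rate / 1e6:.1f} Mbit/s uncompressed")
```

For this example the raw rate is about 124 Mbit/s, roughly eight times the 15 Mbps upper bound quoted above for compressed video.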

In the event of congestion in the network, the video encoder can reduce its encoding rate
by temporal scaling (reducing the frame rate) or spatial scaling (reducing resolution).
DCT is the compression method used in the popular MPEG (Moving Picture Experts
Group) set of standards (ISO/IEC 13818-2, 1994). MPEG standards are used for both
video and audio signals. DCTs are used in MPEG-1, MPEG-2 and JPEG. The
transformed coefficients are quantized using scalar quantization and run-length
encoded before transmission. The transformed higher frequency coefficients of video
are truncated given that the human eye is insensitive to these coefficients. The
compression relies on two basic methods:
(1) intra-frame DCT coding for reduction of spatial redundancy; and
(2) inter-frame motion compensation for reduction of temporal redundancy.
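
The chain of transform, scalar quantization and run-length encoding can be illustrated in one dimension. This is a deliberately simplified sketch: real MPEG encoders apply a two-dimensional DCT to 8 x 8 blocks, use quantization matrices and scan the coefficients in zig-zag order before entropy coding. The function names and the quantization step are assumptions made for the example.

```python
import math
from typing import List, Tuple


def dct_1d(samples: List[float]) -> List[float]:
    """Orthonormal DCT-II of a block of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform scalar quantization; small high-frequency terms become zero."""
    return [int(round(c / step)) for c in coeffs]


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Encode as (number of preceding zeros, non-zero value); trailing zeros are implied."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


def run_length_decode(pairs: List[Tuple[int, int]], length: int) -> List[int]:
    out: List[int] = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * (length - len(out)))
    return out
```

For a smooth block such as a gentle brightness ramp, almost all of the energy ends up in the first coefficients, so the quantized block is mostly zeros and the run-length code is much shorter than the block itself.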

MPEG-2 video has three kinds of frames: I, P and B.

(1) I frames are independent frames compressed using only intra-frame coding.
(2) P frames are predictive frames, which carry the signal difference from the previous
frame together with motion vectors.
(3) B frames are interpolated (i.e. encoded based on the previous and the next frame).

MPEG-2 video is transmitted in Group of Pictures (GoP) format, which specifies the
distribution of I, P and B frames in the video stream (Aramvith and Sun, 2010). The
MPEG compression methods can be used for adaptation with two main schemes. First,
the rate of the source can be changed by using different quantization levels and encoding
rates (Duffield et al., 1998). Second, DCT coefficients can be partitioned and transmitted
in several layers with different priorities. The base layer carries the important video
information and an additional layer improves the quality. In the event of congestion, the
lower priority layer can be dropped to reduce the rate (Eleftheriadis and Batra, 2004).
Lotfallah et al. (2006) presented and evaluated adaptive streaming mechanisms, which
are based on the visual content features for non-scalable (single-layered) encoded video,
whereby the adaptation is achieved by selectively dropping B frames.
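
Rate adaptation by dropping B frames can be sketched as follows. The sketch relies on the fact that B frames in MPEG-2 are not used as references, so removing them does not break the decoding of the remaining frames. It drops B frames from the end of the GoP until a bit budget is met; it does not use the visual content features of Lotfallah et al. (2006), and the function name and frame sizes are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[str, int]  # (frame type: "I", "P" or "B", size in bits)


def adapt_gop(frames: List[Frame], budget_bits: int) -> Tuple[List[Frame], int]:
    """Drop B frames, last first, until the GoP fits the budget (if possible)."""
    total = sum(size for _, size in frames)
    keep = [True] * len(frames)
    for idx in range(len(frames) - 1, -1, -1):
        if total <= budget_bits:
            break
        frame_type, size = frames[idx]
        if frame_type == "B":      # I and P frames are references and are never dropped
            keep[idx] = False
            total -= size
    kept = [f for f, k in zip(frames, keep) if k]
    return kept, total
```

If the budget cannot be met even after all B frames are removed, the function returns the I and P frames only, and a further measure such as coarser quantization would be needed.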
MPEG-4 is an original collection of methods defining compression of audio and
visual digital data. Uses of MPEG-4 include compression of audio and visual data for
streaming media and CD distribution, voice (telephone and videophone) and broadcast
television applications. MPEG-4 provides:
improved coding efficiency;
ability to encode mixed media data (video, audio and speech);
error resilience to enable robust transmission; and
ability to interact with the audio-visual scene generated at the receiver.

H.264/MPEG-4 Advanced Video Coding (AVC) is a standard for video compression
(Wiegand et al., 2003). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC
standard (formally, ISO/IEC 14496-10 - MPEG-4 Part 10, AVC) are jointly maintained so
that they have identical technical content. H.264/AVC/MPEG-4 Part 10 contains a
number of features which allow it to compress video effectively and provide more
flexibility for application to a wide variety of network environments. In particular, some
such key features include:
multi-picture inter-picture prediction;
spatial prediction from the edges of neighboring blocks for intra coding;
lossless macro-block coding features;
flexible interlaced-scan video coding features;
a quantization design;
an in-loop de-blocking filter which helps prevent the blocking artifacts common to
other DCT-based image compression techniques, resulting in better visual
appearance and compression efficiency;
entropy coding; and
loss resilience features.

These techniques help H.264 to perform significantly better than earlier standards under
a wide variety of conditions in a wide variety of application environments. H.264 can
often perform radically better than MPEG-2 video, typically obtaining the same quality
at half of the bit rate or less, especially in high bit rate and high resolution situations.
H.264/AVC has a reference
software implementation that can be freely downloaded. Grecos and Wang (2011)
presented detailed technical descriptions for recent and emerging video coding
standards, in particular the H.264 family. Moreover, they introduced the applications of
selected video coding standards in emerging wireless networks with an emphasis on
scalable video streaming in multi-homed mobile networks.

4.6 Summary
Some of the important issues that have to be addressed when one is designing a
multimedia communication system for DLs are:
storage organization and management;
available physical bandwidth in the delivery path to the user;
QoS management (real-time delivery and adaptability to the environment);
information management (indexing and retrieval);
user satisfaction; and
security (especially the management of content rights).

Key requirements that should be satisfied are the following ones:

Efficient end-to-end transmission over different networks exhibiting various
characteristics and QoS guarantees.
Adaptation to the QoS provided by the network. There is large network
heterogeneity in the Internet, so there are many options to deal with: wireline,
wireless and 2/2.5/3G mobile networks. The network heterogeneity and
decentralized control of the Internet are the major obstacles to providing
scalable and reliable QoS. The problem of how to extend QoS
capabilities across the multi-provider domain of DLs for providing end-to-end
services has not been solved satisfactorily to date (Ahmed et al., 2007).
Kanellopoulos (2011) provided an overview of research results on QoS
mechanisms in IP networks that support representative multimedia
communication applications in the cultural heritage sector.
Easy adaptability to rate variations, since the available transmission
capacity may vary due to interference, overlapping wireless LANs, competing
traffic, mobility, multipath fading and so forth.
Limited complexity implementations for mobile wireless devices.
Support for device scalability and user preferences because various clients may be
connected at different data rates and request transmissions that are optimized for
their respective connections and capabilities. Novel techniques and standards
(discussed in Section 5) can support device scalability and user preferences.

To address the above-mentioned requirements, innovative solutions are needed for
adaptive and error-resilient multimedia compression, error control, error protection and
concealment, multimedia streaming architectures, channel models and channel
estimation, packetization and scheduling and so forth. Such solutions can be developed
by a combination of theory, tools and methods from the fields of networking, signal
processing and computer engineering. This integrated and cross-disciplinary approach
has led to the advent of a new research wave in compression, joint source-channel
coding, and network-adaptive media delivery and has motivated the emergence of new
compression standards, transmission protocols and networking solutions. Recently, the
DL community has realized the potential of such integrated solutions for multimedia

5. Enabling multimedia content of DLs fully interoperable

The Internet is a heterogeneous and constantly evolving environment in which end
users of DLs are making use of different types of client devices like notebooks, desktop
PCs, workstations, set-top boxes, TV sets and mobile devices such as PDAs, cell phones
and hand-held devices. All those devices have different capabilities in terms of
computational power, memory size, display size or network capabilities. At the same
time, there are a large number of possible media formats for multimedia content. As a
result, current end-user devices cannot display all kinds of multimedia data.
Nevertheless, this does not conform to the vision of UMA (Pereira and Burnett, 2003).
According to the UMA vision, any user/device can consume any multimedia content,
anytime and anywhere. Displaying multimedia content on heterogeneous client devices
is a complex task because users have different content/presentation
preferences and intend to consume the content at different locations, times and under
altering circumstances, within a variety of different contexts. Generally, end users can
specify explicit personal preferences, which should be taken into account when
servicing the client.

5.1 Multimedia representation approaches

The representation of multimedia content on the Internet remains challenging because
the content available on the multimedia server can be heterogeneous (e.g. in terms of
encoding). For example, a video can be encoded in different formats such as MPEG-1,
MPEG-2, MPEG-4 or H.264, using different encoder settings such as spatial and
temporal resolution, color depth or bit rate. Actually, there is a large number of file and
encoding formats in which media resources are stored at the multimedia server. A
multimedia server must adapt the multimedia resource correspondingly before sending
it to the client for the following two reasons:
(1) The capability of the client's device to transform the resource by itself may
be limited.
(2) Preferences and personalization requirements of the end users must be taken
into account. New opportunities for personalization on the content level have
emerged because of the new possibilities of annotating the multimedia
resources. For example, research efforts based on the MPEG-7 standard
(MPEG-7, 2004) allow enriching media content with semantic content
annotations, which in turn facilitate new forms of multimedia experience such as
semantic-based content selection and filtering. Unfortunately, only a few
projects have exploited the extended metadata annotation possibilities available
with the new MPEG standards (Martinez et al., 2002; Kanellopoulos, 2012).

Multimedia characteristics, including spatial resolution, bit rate or format and coding
parameters, should be considered for delivering multimedia content in the UMA
context. With a proper selection of these parameters, the DL system would have the
ability to deliver multimedia content according to the most suitable format, bit rate,
language and other parameters, taking into account user needs, user device constraints,
network status and original multimedia format. The MPEG-7 standard provides a rich
set of tools to describe multimedia content. These descriptions are associated with the
content to support operational requirements, such as filtering, searching, indexing,
classification or extraction of certain multimedia features. This associated data may be
physically embedded in the multimedia content, in the same stream, or the same storage
device. MPEG-7 supports different granularities in the descriptions and can describe
content at different levels. Furthermore, this does not depend on the actual multimedia
container or format, regardless of whether the descriptions are embedded in the encoded
multimedia asset. Besides multimedia content descriptions, it is necessary to describe
the usage environment; that is, network and device characteristics and user preferences
and needs. MPEG-7 does not offer support to these features, but the MPEG-21 standard
addresses these types of descriptions.
The MPEG-21 standard (MPEG-21, 2002; Burnett et al., 2003) provides mechanisms
that allow end users to specify explicit, personal preferences on the multimedia content.
MPEG-21 is an ISO/IEC standard that defines an open framework for
multimedia delivery and consumption involving all parties in the delivery and
consumption chain. MPEG-21 may be considered an open framework for multimedia
distribution. This standard describes the context, actions, resources and elements
involved in multimedia delivery. Its architecture is based on two concepts:
(1) Digital Items (DIs); and
(2) User actions for supporting the exchange, access, trade and otherwise
manipulating DIs in an efficient, transparent and interoperable way.

DIs are modeling units offering descriptions of all content formats, and the standard has
enough flexibility to create different types of DIs targeted to different user needs: music
albums, film collections, etc. These entities wrap resources or multimedia content with
metadata, identifiers, authorizations and the methods needed for users to interact and
exchange DIs. These entities are defined as XML objects according to the provided
definition model (i.e. digital item declaration (DID)). Moreover, part 7 of MPEG-21
(MPEG-21 Part 7, 2007) defines a Digital Item Adaptation (DIA) framework that
supports the adaptation of DIs depending on network status, user device features and
user profiles. This part of the standard defines a set of environment description tools to
describe the main elements involved in delivering multimedia content. The MPEG-21
DIA framework provides systematic solutions for choosing the optimal adaptation
operation under given conditions and supports interoperable video adaptation. Multimedia
content adaptation involves the execution of one or more transformation operations on
the content. It uses descriptive information about the content, user preferences and
usage context to provide the variant of the content best suited to the usage scenario
(Jannach and Leopold, 2007).
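
The adaptation decision described above can be sketched as a selection among content variants given a description of the usage environment. The example does not use the MPEG-21 DIA XML schemas; the classes Variant and UsageEnvironment and their fields are hypothetical stand-ins for the content description and the environment description tools.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Variant:
    codec: str
    width: int
    height: int
    bitrate_kbps: int


@dataclass(frozen=True)
class UsageEnvironment:
    codecs: FrozenSet[str]    # formats the terminal can decode
    display_width: int
    display_height: int
    bandwidth_kbps: int       # currently available network capacity


def choose_variant(variants: List[Variant], env: UsageEnvironment) -> Optional[Variant]:
    """Return the highest-bit-rate variant that the terminal and network can handle."""
    usable = [
        v for v in variants
        if v.codec in env.codecs
        and v.width <= env.display_width
        and v.height <= env.display_height
        and v.bitrate_kbps <= env.bandwidth_kbps
    ]
    return max(usable, key=lambda v: v.bitrate_kbps, default=None)
```

When no stored variant satisfies the constraints, the function returns None; in a complete system this is the point at which a transformation operation such as transcoding or downscaling would be triggered.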

5.2 Multimedia streaming protocol

Streaming media refers to audio, video or other multimedia content that is played over
the Internet while it is still being transmitted. It can be used for
online tutorials and news, live online broadcasts, online advertising, distance learning,
real-time teleconferences and so forth. The Real-time Transport Protocol (RTP) defines a
standardized packet format for delivering audio and video over IP networks (Perkins,
2003). RTP is used extensively in communication and entertainment systems that
involve streaming media. Recently, the usage of RTP has decreased because
delivering data according to RTP's small packet model is less efficient than using
larger data frames. In particular, delivering multimedia content in larger HTTP
segments is more effective and has several additional benefits. For example, HTTP
traffic passes easily through typical firewall configurations because it uses outgoing
connections. Besides, HTTP does not keep any information
about session state on the server and there is, therefore, not any additional management
cost on content or resources. For these reasons, the popularity and usage of HTTP
streaming has risen. Thus, the MPEG group reacted by introducing a new HTTP
streaming protocol, named Dynamic Adaptive Streaming over HTTP (MPEG-DASH)
(MPEG-DASH Part 1, 2012; Sodagar, 2011). This new standard is based on a
combination of two components, namely, media content and manifest file. This
combination identifies the stream for any player and destination by means of URL
addressing. In the MPEG-DASH context, the media stream is called MP and defines a set
of sequences of small HTTP segments (Figure 2). Each sequence corresponds to a short
interval of playback time (i.e. periods) of original multimedia content. These periods
contain one or several adaptation sets that describe one or more representations of a
single stream. These representations define one or more audio or video streams with
different parameters or encoding alternatives. For instance, an adaptation set might
contain several representations with different bit rates of a same video or audio
component. Each representation is composed of a set of media segments
identified through URLs that correspond to chunks of data managed by the HTTP
streaming protocols. These chunks may be discrete files or byte ranges in a single media
file stored on a content HTTP server. The manifest file is named Media
Presentation Description (MPD), and is an XML file that contains the information
describing and identifying the periods of a media stream.

Figure 2.
The MPD data model

Precisely, the MPD does not contain any media data but describes the accessible
segments and corresponding
timing. According to this data model, DASH clients parse the MPD document and select
the best adaptation set according to the device's capabilities and the user's profile. Finally,
MPEG-DASH also supports layered codecs such as Scalable Video Coding (SVC) (Muller
et al., 2012) or Multiview Video Coding (MVC) (Vetro et al., 2011). In the case of SVC, each
layer is described by a different representation. The SVC layers are structured as one
base layer and several enhancement layers that depend on lower layers, down to the
base layer. This dependency can be described in the MPD, which allows an advanced and
efficient usage of network resources and supports dynamic adaptation according to network
status. It must be mentioned that the MPEG-DASH protocol can constitute the main
mechanism of the Player module of the DL architecture (Figure 1).
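
The way a client reads the MPD hierarchy (period, adaptation set, representation) and picks a representation can be sketched with the Python standard library. The sample manifest below is a heavily reduced, hypothetical MPD that omits segment information and most mandatory attributes; only the namespace and the element and attribute names follow the MPEG-DASH schema. The selection rule (highest bandwidth not exceeding the measured throughput) is one simple policy among many.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Reduced, illustrative manifest: one period, one adaptation set, three bit rates.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="low" bandwidth="500000" width="640" height="360"/>
      <Representation id="mid" bandwidth="1200000" width="1280" height="720"/>
      <Representation id="high" bandwidth="4000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def select_representation(mpd_xml: str, available_bps: int) -> str:
    """Return the id of the best representation for the measured throughput."""
    root = ET.fromstring(mpd_xml.strip())
    reps = root.findall("./mpd:Period/mpd:AdaptationSet/mpd:Representation", NS)
    if not reps:
        raise ValueError("MPD contains no representations")
    affordable = [r for r in reps if int(r.get("bandwidth")) <= available_bps]
    if affordable:
        best = max(affordable, key=lambda r: int(r.get("bandwidth")))
    else:
        # Nothing fits: fall back to the lowest bit rate rather than stalling.
        best = min(reps, key=lambda r: int(r.get("bandwidth")))
    return best.get("id")
```

A real client repeats this choice for every segment, using the throughput observed while downloading the previous segments, which is what makes the streaming adaptive.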

5.3 Scalable video coding

Digital multimedia transmission and storage systems are characterized by a fixed
spatial temporal stream format that would require network status to be as stable as
possible. However, in real-world situations, this condition is seldom satisfied.
Furthermore, fixed multimedia formats are poorly suited to the variety of users'
preferences and of devices with different capabilities. SVC (Schwarz et al., 2007) is an
extension of the H.264/MPEG-4 AVC video compression standard that was introduced
as a convenient solution to the problems enumerated above. The main concept in SVC is
coding high-quality video streams as one or more sub-streams in a similar way to
existing video coding standards, but according to different quality and transmission
parameters. Thus, the term scalable in SVC refers to the capability of removing parts of
the global stream (i.e. some sub-streams) in a way that the resulting stream is still valid,
but with degraded quality with respect to the original stream. It must be mentioned that
SVC can be embedded in the Media Encoder module of the DL architecture depicted in
Figure 1. SVC offers temporal, spatial and quality scalability (Ohm, 2005), as seen in
Figure 3. SVC also supports the combination of these capabilities, actually
supporting multiple representations of multimedia content with different
spatial-temporal resolution and bit rates in a single stream. With these features, source
content is encoded only once according to the highest requirements (i.e. high resolution
and bit rate), avoiding the need to re-code content for each specific application or
situation. Moreover, SVC contains parts with different video qualities, which in
conjunction with unequal error protection is mainly useful in multimedia transmission
scenarios over the Internet. Thus, SVC also offers resilience and protection against errors.

Figure 3.
Temporal, spatial and quality scalability

SVC defines a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL).
VCL is a coded representation of the multimedia source, while NAL formats this data to
support header information to use VCL data in a variety of systems and situations. NAL
units are packets with an integer number of bytes. The first byte of an NAL unit
represents the type of data and the remaining bytes correspond to the data payload.
These NAL units are classified into VCL NAL units, which contain coded slices, and
non-VCL NAL units. The latter contain additional information (i.e.
parameter sets) and Supplemental Enhancement Information (SEI). This latter
information assists the decoding process or related processes like bit-stream
manipulation. A set of NAL units is called an Access Unit and a set of successive Access
Units compose a coded video sequence that represents a part of a bit stream. The VCL
units follow a block-based hybrid video coding approach. The source picture is
partitioned into macro-blocks, each one covering a rectangular area of 16 × 16 luma
samples in the 4:2:0 chroma sampling format. These macro-blocks are organized into
slices, each of which can be parsed independently of the other slices. In SVC, scalability is
supported at bit-stream level, so to obtain a bit stream with a reduced spatial-temporal
resolution and/or fidelity, some NAL units are discarded, keeping the ones needed to
decode the stream according to a specific spatial-temporal resolution. A consequence of
this approach is SVC defining one base layer and several enhancement layers (Figure 4).
Each layer represents a bit stream with specific spatial resolution and fidelity, and is
referenced by a layer identifier. The layer with identifier zero is called the base layer
and is available in some access units. This layer is non-scalable, so it does not
employ any information from other layers for decoding. In each access unit, the layers
are encoded in increasing order of their layer identifiers. Layers needing some information
from other layers to be decoded are called enhancement layers, and spatial resolution or
fidelity is modified in each enhancement in a way that one layer increases the resolution
or fidelity of the previous layer. Therefore, each input picture of a spatial or fidelity layer
is split into macro-blocks enabling intra-layer coding. In addition to the basic coding,
SVC includes inter-layer prediction methods to improve the coding efficiency of
enhancement layers by adding a macro-block coding mode, referencing blocks in
several layers. It is noteworthy that Choi et al. (2007) proposed a dynamic adaptation
scheme of SVC bit-stream using the MPEG-21 DIA tool.

Figure 4.
SVC layer scheme
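
The NAL unit header and the idea of bit-stream extraction can be sketched as follows. The header layout (one forbidden bit, two bits of nal_ref_idc and five bits of nal_unit_type) follows H.264; the treatment of types 1 to 5 and 20 as VCL units reflects our reading of the standard for AVC and its SVC extension. The extraction function is a simplification: it assumes each NAL unit has already been tagged with a single hypothetical layer identifier, whereas real SVC streams carry separate dependency, quality and temporal identifiers in a header extension.

```python
from typing import Dict, List, Tuple


def parse_nal_header(first_byte: int) -> Dict[str, int]:
    """Split the first byte of an H.264 NAL unit into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1-5 are coded slices in AVC; type 20 is the coded slice extension used by SVC."""
    return 1 <= nal_unit_type <= 5 or nal_unit_type == 20


# (layer identifier, payload) pairs; the tagging itself is assumed to be done elsewhere.
TaggedNal = Tuple[int, bytes]


def extract_substream(nal_units: List[TaggedNal], max_layer_id: int) -> List[TaggedNal]:
    """Discard every NAL unit that belongs to a layer above max_layer_id."""
    return [unit for unit in nal_units if unit[0] <= max_layer_id]
```

For example, the byte 0x67 decodes to nal_unit_type 7 (a sequence parameter set, a non-VCL unit), while 0x65 decodes to type 5 (a coded slice of an IDR picture). Calling extract_substream with max_layer_id set to 0 leaves only the base layer, which remains decodable on its own.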

6. Conclusion
Multimedia networking is evolving into one of the most active research areas.
Nevertheless, several problems remain open: the optimal design of source coding schemes
for transmission over a variety of networks, joint source-channel coding trade-offs
and flexible multimedia architectures. A major challenge is ensuring that
the networked multimedia content of digital video libraries is fully interoperable and
easy to manage, that standardized multimedia content is adapted for interoperable
delivery and that intellectual property management and protection (i.e. DRM) is
successfully incorporated in the DL system.
In this paper, we provided an overview of multimedia services and applications for
digital video libraries, such as spatial-temporal information-based video retrieval,
recommender services for video library objects, semantic search of cultural content,
image browsers and photo albums, broadcasting for digital video library applications,
IPTV, mobile Web services and information access anytime and anywhere.
Such multimedia networking applications benefit from the rapid development of
encoding techniques for multimedia data sources and of effective QoS management
mechanisms. We also presented research results on various aspects of multimedia
networking that support such services and applications. In particular, we focused on
those standards and mechanisms that make multimedia content adaptation fully
interoperable in DLs, in line with the vision of UMA. We discussed state-of-the-art
multimedia representation approaches, the multimedia streaming protocol DASH and
SVC techniques. Despite significant advances in video signal processing,
ubiquitous networking standards and emerging technologies, various challenges
remain to be addressed. There is an inherent need for frameworks, standards,
techniques, QoS management mechanisms and other tools that deal with the various
components/issues in multimedia networking for digital video libraries, as indicated
throughout our paper.

Notes
1. DELOS Network of Excellence on Digital Libraries,
2. Digital Library Interoperability, Best Practices and Modelling Foundations,
3. MILOS: Multimedia Digital Library for On-line Search,
4. FEDORA: Flexible Extensible Digital Object and Repository Architecture, www.fedora-
5. OVDLT: The Open Video Digital Library Toolkit,
6. The Europeana DL,
7. CAMTASIA software,

References
Ahmed, T., Asgari, A., Mehaoua, A., Borcoci, E., Berti-Equille, L. and Kormentzas, G. (2007),
End-to-end quality of service provisioning through an integrated management system for
multimedia content delivery, Computer Communications, Vol. 30 No. 3, pp. 638-651.
Amato, G., Bolettieri, P., Debole, F., Falchi, F., Rabitti, F. and Savino, P. (2006), Using MILOS to
build a multimedia digital library application: the PhotoBook experience, Research and
Advanced Technology for Digital Libraries, Vol. 4172, pp. 379-390.
Amato, G., Gennaro, C., Rabitti, F. and Savino, P. (2004), MILOS: a multimedia content
management system for digital library applications, Research and Advanced Technology
for Digital Libraries, Vol. 3232, pp. 14-25.
Apps, A. and MacIntyre, R. (2006), Why OpenURL?, D-Lib Magazine, Vol. 12 No. 5.
Aramvith, S. and Sun, M.T. (2010), MPEG-1 and MPEG-2 video standards, in Bovik, A.C. (Ed),
Handbook of Image and Video Processing, 2nd ed., Academic Press, Waltham, MA,
pp. 833-848.
Bailin, A. and Pea, A. (2007), Online library tutorials, narratives, and scripts, The Journal of
Academic Librarianship, Vol. 33 No. 1, pp. 106-117.
Bhargava, B. and Annamalai, M. (2000), A communication framework for digital libraries,
Multimedia Tools and Applications, Vol. 10 Nos 2/3, pp. 205-236.
Bollen, J., Nelson, M.L., Geisler, G. and Araujo, R. (2007), Usage derived recommendations for a
video digital library, Journal of Network and Computer Applications, Vol. 30 No. 3,
pp. 1059-1083.
Burnett, I., de Walle, R.V., Hill, K., Bormans, J. and Pereira, F. (2003), MPEG-21: goals and
achievements, IEEE MultiMedia Magazine, Vol. 10 No. 6, pp. 60-70.
Candela, L., Athanasopoulos, G., Castelli, D., El Raheb, K., Innocenti, P., Ioannidis, Y., Katifori, A.,
Nika, A., Vullo, G. and Ross, S. (2011), The digital library reference model, Project
Choi, H., Kang, J.W. and Kim, J.-G. (2007), Dynamic and interoperable adaptation of SVC for
QoS-enabled streaming, IEEE Transactions on Consumer Electronics, Vol. 53 No. 2,
pp. 384-389.
Duffield, N.G., Ramakrishnan, K.K. and Reibman, A.R. (1998), SAVE: an algorithm for smoothed
adaptive video over explicit rate networks, IEEE/ACM Transactions on Networking,
Vol. 6 No. 6, pp. 717-728.
Eleftheriadis, A. and Batra, P. (2004), Optimal data partitioning of MPEG-2 coded video, IEEE
Transactions on Circuits and Systems for Video Technology, Vol. 14 No. 10, pp. 1195-1209.
García-Crespo, Á., Gómez-Berbís, J.M., Colomo-Palacios, R. and García-Sánchez, F. (2011), Digital
libraries and Web 3.0. The CallimachusDL approach, Computers in Human Behavior,
Vol. 27 No. 4, pp. 1424-1430.
Ghinea, G. and Ademoye, O. (2012), The sweet smell of success: enhancing multimedia
applications with olfaction, ACM Transactions on Multimedia Computing,
Communications, and Applications, Vol. 8 No. 1, pp. 2.
Gonçalves, M.A., Moreira, B.L., Fox, E.A. and Watson, L.T. (2007), What is a good digital
library? A quality model for digital libraries, Information Processing &
Management, Vol. 43 No. 5, pp. 1416-1437.
Gozdecki, J., Jajszczyk, A. and Stankiewicz, R. (2003), Quality of service terminology in IP
networks, IEEE Communications Magazine, Vol. 41 No. 3, pp. 153-159.
Grecos, C. and Wang, Q. (2011), Advances in video networking: standards and applications,
International Journal of Pervasive Computing and Communications, Vol. 7 No. 1, pp. 22-43.
Hua, K.A. and Sheu, S. (2000), An efficient periodic broadcast technique for digital video
libraries, Multimedia Tools and Applications, Vol. 10 Nos 2/3, pp. 157-177.
Hwang, J.-N. (2009), Multimedia Networking: From Theory to Practice, Cambridge University
Press, New York, NY.
Hwang, S.-Y., Yang, W.-S. and Ting, K.-D. (2010), Automatic index construction for multimedia
digital libraries, Information Processing & Management, Vol. 46 No. 3, pp. 295-307.
ISO/IEC 13818-2 (1994), Generic coding of moving pictures and associated audio information,
Technical report, MPEG (Moving Pictures Expert Group), International Organization for
Jannach, D. and Leopold, K. (2007), Knowledge-based multimedia adaptation for ubiquitous
multimedia consumption, Journal of Network and Computer Applications, Vol. 30 No. 3,
pp. 958-982.
Johnson, R. (2010), Freescale #DSP tackles scalable video coding, available at: http://nextgenlog. (accessed January 2013).
Kanellopoulos, D. (2009), High-speed multimedia networks: critical issues and trends, in Lee, I.
(Ed), Handbook of Research on Telecommunications Planning and Management for
Business, IGI Global, Hershey, PA, pp. 775-787.
Kanellopoulos, D. (2011), Quality of service in networks supporting cultural multimedia
applications, Program: Electronic Library and Information Systems, Vol. 45 No. 1,
pp. 50-66.
Kanellopoulos, D. (2012), Semantic annotation and retrieval of documentary media objects, The
Electronic Library, Vol. 30 No. 5, pp. 721-747.
Kanellopoulos, D., Lalos, P. and Tombras, G. (2012), Implementing a zoomable web browser with
annotation features for managing libraries of high quality images, International Journal of
Innovative Computing, Information and Control, Vol. 8 No. 10, pp. 7725-7235.
Kollia, I., Kalantidis, Y., Rapantzikos, K. and Stafylopatis, A. (2012), Improving semantic search
in digital libraries using multimedia analysis, Journal of Multimedia, Vol. 7 No. 2,
pp. 193-204.
Lin, E.T., Eskicioglu, A.M., Lagendijk, R.L. and Delp, E.J. (2005), Advances in digital video
content protection, Proceedings of the IEEE, Vol. 93 No. 1, pp. 171-183.
Lin, F.-C., Lai, C.-Y. and Hong, J.-S. (2009), Heuristic algorithms for ordering media objects to
reduce presentation lags in auto-assembled multimedia presentations from digital
libraries, The Electronic Library, Vol. 27 No. 1, pp. 134-148.
Liu, J. and Zhang, Y.-Q. (2003), Adaptive video multicast over the Internet, IEEE Multimedia,
Vol. 10 No. 1, pp. 22-31.
Liu, Y., Guo, Y. and Liang, C. (2008), A survey on peer-to-peer video streaming systems, Peer-to-
Peer Networking and Applications, Vol. 1 No. 1, pp. 18-28.
Lotfallah, O., Reisslein, M. and Panchanathan, S. (2006), Adaptive video transmission schemes
using MPEG-7 motion intensity descriptor, IEEE Transactions on Circuits and Systems
for Video Technology, Vol. 16 No. 8, pp. 929-946.
Lu, G. (2000), Issues and technologies for supporting multimedia communications over the
Internet, Computer Communications, Vol. 23 Nos 14/15, pp. 1323-1335.
Madalli, D.P., Barve, S. and Amin, S. (2012), Digital preservation in open-source digital library
software, The Journal of Academic Librarianship, Vol. 38 No. 3, pp. 161-164.
Manvi, S. and Venkataram, P. (2006), An agent based synchronization scheme for multimedia
applications, The Journal of Systems and Software, Vol. 79 No. 5, pp. 701-713.
Martinez, J., Koenen, R. and Pereira, F. (2002), MPEG-7 the generic multimedia content
description standard part 1, IEEE MultiMedia Magazine, Vol. 9 No. 2, pp. 78-87.
Mikóczy, E., Vidal, I. and Kanellopoulos, D. (2012), IPTV evolution towards NGN and hybrid
scenarios, Informatica, Vol. 36 No. 1, pp. 3-12.
Mohr, A.E., Riskin, E.A. and Ladner, R.E. (2000), Unequal loss protection: graceful degradation of
image quality over packet erasure channels through forward error correction, IEEE
Journal on Selected Areas in Communications, Vol. 18 No. 6, pp. 819-828.
MPEG-21 (2002), Overview V. 5, ISO/IEC JTC1/SC29/WG11 N5231.
MPEG-21 Part 7 (2007), ISO/IEC 21000-7, Information technology multimedia framework
(MPEG-21) Part 7: digital item adaptation.
MPEG-7 (2004), Overview V. 10, ISO/IEC JTC1/SC29/WG11 N6828.
MPEG-DASH Part 1 (2012), ISO/IEC 23009-1, Information Technology Dynamic Adaptive
Streaming Over HTTP (DASH) Part 1: Media Presentation Description and Segment
Müller, C., Renzi, D., Lederer, S., Battista, S. and Timmerer, C. (2012), Using scalable video coding
for dynamic adaptive streaming over HTTP in mobile environments, Proceedings of the
20th European Signal Processing Conference (EUSIPCO), 27-31 August, pp. 2208-2212.
Ni, J. and Tsang, D.H.K. (2005), Large scale cooperative caching and application-level multicast in
multimedia content delivery networks, IEEE Communications Magazine, Vol. 43 No. 5,
pp. 98-105.
OAI-PMH (2006), Open archives initiative protocol for metadata harvesting, available at: (accessed 12 December 2012).
Ohm, J.-R. (2005), Advances in scalable video coding, Proceedings of the IEEE, Vol. 93 No. 1,
pp. 42-56.
Ozer, J. (2011), What is MPEG DASH?, available at:
rticle.aspx?ArticleID79041 (accessed December 2012).
Park, J. and Lu, C. (2009), Application of semi-automatic metadata generation in libraries: types,
tools, and techniques, Library and Information Science Research, Vol. 31 No. 4, pp. 225-323.
Paul, S. (1998), Multicasting on the Internet and its Applications, Kluwer, Dordrecht.
Paul, S., Pan, J. and Jain, R. (2011), Architectures for the future networks and the Next Generation
Internet: a survey, Computer Communications, Vol. 34 No. 1, pp. 2-42.
Pereira, F. and Burnett, I. (2003), Universal multimedia experiences for tomorrow, IEEE Signal
Processing Magazine, Vol. 20 No. 2, pp. 63-73.
Perkins, C. (2003), RTP: Audio and Video for the Internet, Addison-Wesley Professional.
Pratha, L., Mattam, M., Ambati, V. and Reddy, R. (2006), Multimedia digital library: performance
and scalability issues, International Conference of the Universal Digital Library (ICUDL),
Bibliotheca Alexandrina, Alexandria.
Ramaiah, C.K. (1998), Multimedia systems in libraries and their applications, DESIDOC Bulletin
of Information Technology, Vol. 18 No. 6, pp. 25-40.
Recep Okur, M. and Gümüş, S. (2010), Storyboarding issues in online course production process,
Procedia - Social and Behavioral Sciences, Vol. 2 No. 2, pp. 4712-4716.
Reimers, U. (2006), DVB the family of international standards for digital video broadcasting,
Proceedings of the IEEE, Vol. 94 No. 1, pp. 173-182.
Ren, W., Singh, S., Singh, M. and Zhu, Y.S. (2009), State-of-the-art on spatio-temporal
information-based video retrieval, Pattern Recognition, Vol. 42 No. 2, pp. 267-282.
Schwarz, H., Marpe, D. and Wiegand, T. (2007), Overview of the scalable video coding extension
of the H.264/AVC standard, IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 17 No. 9, pp. 1103-1120.
Silver, S. and Nickel, L. (2005), Are online tutorials effective? A comparison of online and
classroom library instruction methods, Research Strategies, Vol. 20 No. 4, pp. 389-396.
Sodagar, I. (2011), The MPEG-DASH standard for multimedia streaming over the Internet,
IEEE MultiMedia, Vol. 18 No. 4, pp. 62-67.
Vetro, A., Wiegand, T. and Sullivan, G.J. (2011), Overview of the stereo and multiview video
coding extensions of the H.264/MPEG-4 AVC standard, Proceedings of the IEEE, Vol. 99
No. 4, pp. 626-642.
Wang, C.-Y., Ke, H.-R. and Lu, W.-C. (2012), Design and performance evaluation of mobile web
services in libraries: a case study of the Oriental Institute of Technology Library, The
Electronic Library, Vol. 30 No. 1, pp. 33-50.
Wei, Z. (2011), Research on the application of Open Source Software in digital library, Procedia
Engineering, Vol. 15, pp. 1662-1667.
Wiegand, T., Sullivan, G., Bjøntegaard, G. and Luthra, A. (2003), Overview of the H.264/AVC
Video Coding Standard, IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 13 No. 7, pp. 560-576.
Zhang, A. and Gollapudi, S. (2000), QoS management in educational digital library
environments, Multimedia Tools and Applications, Vol. 10 Nos 2/3, pp. 133-156.

Further Reading
Cordes, S. (2008), Process management for library multimedia development service, Library
Management, Vol. 29 No. 3, pp. 185-198.
Kanellopoulos, D. (2010), Intelligent multimedia engines for multimedia content adaptation,
International Journal of Multimedia Intelligence and Security, Vol. 1 No. 1, pp. 53-75.
Naren, K., Csilla, F. and Duminda, W. (2004), An authorization model for multimedia digital
libraries, International Journal on Digital Libraries, Vol. 4 No. 3, pp. 139-155.
Özer, I.B., Wolf, W. and Akansu, A. (2002), A graph-based object description for information
retrieval in digital image and video libraries, Journal of Visual Communication and Image
Representation, Vol. 13 No. 4, pp. 425-459.
Schwartz, C. (2000), Digital libraries: an overview, The Journal of Academic Librarianship,
Vol. 26 No. 6, pp. 385-393.
Witten, I.H., Bainbridge, D. and Nichols, D. (2009), How to Build a Digital Library, Morgan
Kaufmann.

About the author

Dimitris N. Kanellopoulos is a member of the Educational Software Development Laboratory
(ESD Lab) in the Department of Mathematics at the University of Patras, Greece. He holds a PhD
in multimedia communications from the University of Patras. Since 1990, he has been a Research
Assistant in the Department of Electrical and Computer Engineering at the University of Patras
and has been involved in several European Union R&D projects. He is a reviewer for journals such as
International Journal of Communication Systems, Journal of Systems and Software, Information
Sciences, IETE Technical Review and The Electronic Library. He has served as a technical program
committee (TPC) member to more than 30 international conferences. His research interests include
multimedia communications, multimedia networks, intelligent information systems and
knowledge representation. He has published many papers in international journals and conferences in these
areas. Recently, he edited a book titled Intelligent Multimedia Technologies for Networking
Applications: Techniques and Tools, published by IGI Global. Dimitris Kanellopoulos can be
contacted at:
