
Robust Multilayer Control for Enhanced

Wireless Telemedical Video Streaming

Maria G. Martini, Senior Member, IEEE, Robert S.H. Istepanian, Senior Member, IEEE,
Matteo Mazzotti, Member, IEEE, and Nada Y. Philip, Member, IEEE
Abstract: M-health is an emerging area of research and one of the key challenges in future research in this area is medical video
streaming over wireless channels. Contrasting requirements of almost lossless compression and low available bandwidth have to be
tackled in medical quality video streaming in ultrasound and radiology applications. On one side, compression techniques need to be
conservative, in order to avoid removing perceptively important information; on the other side, error resilience and correction should be
provided, with the constraint of a limited bandwidth. A quality-driven, network-aware approach for joint source and channel coding
based on a controller structure specifically designed for enhanced video streaming in a robotic teleultrasonography system is
presented. The designed application based on robotic teleultrasonography is described and the proposed method is simulated in a
wireless environment in two different scenarios; the video quality improvement achievable through the proposed scheme in such an
application is remarkable, resulting in a peak signal-to-noise ratio (PSNR) improvement of more than 4 dB in both scenarios.
Index Terms: Medical video streaming, wireless telemedicine, m-health, cross-layer design, robotic ultrasonography.

1 Introduction

Current and emerging developments in wireless communications integrated with developments in perva-
sive and wearable technologies will have a radical impact
on future healthcare delivery systems. M-Health can be
defined as mobile computing, medical sensors, and
communication technologies for healthcare [1], [2]. This
emerging concept represents the evolution of e-health
systems from traditional desktop telemedicine platforms
to wireless and mobile configurations.
In this paper, we present an advanced mobile healthcare
application example (a mobile robotic tele-echography
system) requiring a demanding medical data and video
streaming traffic in a heterogeneous network topology that
combines 3G and WLAN environments.
Since, in this case, medical video streaming is the most
demanding application, it represents the main focus of this
work. Medical video compression techniques for telemedi-
cal applications have requirements of high fidelity, in order
to avoid loss of information that could help diagnosis. In
order to keep diagnostic accuracy, lossless compression
techniques are thus often considered when medical video
sequences are involved. However, when transmission is
over band-limited, error-prone wireless channels, a com-
promise should be made between compression fidelity and
protection/resilience to channel errors and packet loss.
From the medical imaging perspective, it has been observed
that when lossy compression is limited to ratios from 1:5 to
1:29, compression can be achieved with no loss in
diagnostic accuracy [3]. Furthermore, even if the final
diagnosis should be done using an image that has been
reversibly compressed, irreversible compression still plays
a critical role when quick access to data stored in a remote
location is needed. For these reasons, lossy compression
techniques have been considered for medical images and
ultrasound medical video [4].
Recently, joint source and channel coding and decoding
(JSCC/D) [5] techniques that include a coordination
between source and channel encoders were investigated
[9], e.g., for transmission of audio data [10], images [11], and
video [12]. It was shown that, for wireless audio and video
transmission, separate design of source and channel coding
does not necessarily lead to the optimal solution [5], nor is
always applicable, in particular, when transmitting data
with real-time constraints or operating on sources where the
bit error sensitivity of encoded data varies significantly. In
some of these works, transmission is adapted to source
characteristics (unequal error protection (UEP)), either at
channel coding level or through source adaptive modulation
[10], [13]. JSCC/D techniques may also require the use of
rate/distortion curves or models of the source in order to
perform the optimal compromise between source compres-
sion and channel protection [11]. Joint source and channel
coding involves joint design of the source encoder, at the
application layer, and of channel encoder/modulator, at the
physical layer. The most recent video source codecs (such as
MPEG-4 [7], H.264 [8], and SVC [14]) have built-in error
resilience tools and allow the decoder to cope with errors
and conceal them at the receiver side [15], [16], [17]. Such
tools are taken into account in the considered framework.
Cross-layer design is a recent further evolution of the
concept of joint source and channel coding, targeting the
joint design of the classically separated OSI layers.
Characteristics and limits of such an approach are described,
. M.G. Martini, R.S.H. Istepanian, and N.Y. Philip are with the Faculty of Computing, Information Systems and Mathematics, Kingston University, Penrhyn Road, Kingston-Upon-Thames, KT1 2EE, London, UK. E-mail: {m.martini, r.istepanian, n.philip}
. M. Mazzotti is with CNIT/DEIS, University of Bologna, Italy.
Manuscript received 7 Mar. 2007; revised 29 Apr. 2008; accepted 3 Mar. 2009; published online 2 Apr. 2009.
For information on obtaining reprints of this article, please send e-mail to:, and reference IEEECS Log Number TMC-2007-03-0068.
Digital Object Identifier no. 10.1109/TMC.2009.78.
1536-1233/10/$26.00 © 2010 IEEE. Published by the IEEE CS, CASS, ComSoc, IES, & SPS.
e.g., in [18], [19]. Despite the recent interest in such
techniques, no study to date addresses the application of
JSCC/D and cross-layer approaches to advanced mobile
telemedical applications ("m-JSCC" in the following).
In this paper, we present the application and the
performance analysis of our cross-layer approach [6] for a
robotic ultrasonography system with high (diagnostic)-
quality medical video streaming requirements. Among the
several assessment metrics for medical video quality that
have been proposed (examples are those described in [20],
[21], [22], [23], [24], [25], [26]), we will focus on the classic
peak signal-to-noise ratio (PSNR), for comparison purposes,
and on metrics evaluating structural distortion [22], which
better represent diagnostic accuracy. A quality-driven
approach is considered here in the sense that the received
quality is monitored and such information is used for the
selection of system parameters.
The paper is organized as follows: In Section 2, the
robotic mobile tele-echography system is presented to-
gether with the requirements for ultrasonography video
transmission. In Section 3, the management of the informa-
tion to be exchanged among the system component blocks
is addressed, and the logical units responsible for system
optimization, referred to in the following as multilayer (or
JSCC/D) controllers and having a key role in the system, are
analyzed. Results and discussion are provided in Section 4.
2 The Robotic Mobile Tele-Echography System

Teleultrasound systems for remote diagnosis have been
proposed in the last 10 years [27], [28], [29], [30], [31], [32],
[33], [34] given the need to allow teleconsultation when
access of the medical specialist to the sonographer is not
possible. An advanced medical robotic system was devel-
oped in the mObile Tele-Echography using an ultra-Light
rObot (OTELO) European IST project. The project resulted
in a fully integrated end-to-end mobile tele-echography
system for population groups that are not served locally,
either temporarily or permanently, by medical ultrasound
experts [35]. The system comprises a fully portable
teleoperated robot allowing a specialist sonographer to
perform a real-time robotized tele-echography to remote
patients. OTELO is a remotely controlled system designed
to achieve reliable ultrasound imaging at an isolated site,
distant from a specialist clinician. Fig. 1 shows the main
operational blocks. This tele-echography system is com-
posed of the following:
1. An expert site, where the medical expert interacts
with a dedicated patented pseudohaptic fictive
probe, instrumented to control the positioning of
the remote robot and emulating the ultrasound probe
that medical experts are used to handling, thus
providing better ergonomics.
2. The communication media. We developed commu-
nication software based upon IP protocol to adapt
to different communication links (wired and wire-
less links).
3. A patient site made up of the 6 degrees of freedom
(DoF) lightweight robotic system and its control unit.
Further details on this system are described in [2], [35].
With recent advances in mobile technologies, WLAN,
PDAs, and other hand-held devices equipped with different
software and hardware capabilities are used by physicians,
nurses, and other paramedical staff who need to be updated
with the medical reports of patients over the air interface. In
this paper, OTELO is integrated in a 3G/WLAN environment
so that healthcare professionals who are on the move
might have continuous access to patient information.
2.1 The OTELO System: Functional Modalities and
3G/WLAN Wireless Connectivity
It is well known that 3G cellular technology is characterized
by wide area coverage, which is its biggest advantage. On
the other hand, 802.11 WLAN offers high bandwidth
connections at low cost but in limited range. These two
mainstream wireless access methods have dominated the
wireless broadband Internet market. However, the most
probable application scenario is the coexistence of both.
Telemedicine is one of the multimedia applications that will
benefit from this scenario.
OTELO can be considered a bandwidth-demanding
m-health system with challenging classes of QoS require-
ments, since several medical ultrasound images, robot
control data, and other data have to be transmitted simultaneously.
Fig. 2 shows the proposed 3G/WLAN connectivity of the
OTELO system and the interface requirements. In this
scenario, we assume that the OTELO's Expert Station is
connected to the OTELO system via the specialist hospital
WLAN network.

Fig. 1. The OTELO mobile robotic system.
The detailed medical and nonmedical OTELO data
traffic characteristics are shown in Table 1. As ultrasound
images are mostly transferred from the robot probe to the
OTELO Expert Station, the air interface between the OTELO
Patient Station and the Radio Network Controller (RNC)
bearer is characterized by an asymmetric traffic load. Still
ultrasound images, streamed ultrasound images, ambient
video, sound, and robot control data are sent over the
uplink channel, while only robot control data, ambient video,
and sound need to be sent on the downlink to the patient
side (i.e., uploaded from the Expert Station).
From Table 1, it can be seen that for the OTELO system,
the most bandwidth-demanding traffic consists of medical
ultrasound (US) streaming data. For this reason, the focus in
this work is on the transmission of US data. According to
communication link limitations, various scenarios can be
identified with respect to the data traffic that should be sent
simultaneously in order to enable the medical examination
to be performed. For the current study, we consider the
following options:
1. When the expert is searching for a specific organ
(liver, kidney, etc.), high-quality images may not be
required and simple compression methods or lossy
techniques can be applied. The lowest data rate
acceptable by medical experts in this scenario is
approximately 210 Kbit/s with a frame rate of 15 fps.
2. When the organ of interest is found and small
displacements of the robot take place, it may be
necessary to consider lossless compression techni-
ques that would bring higher image quality to the
expert. This lossless compression can be applied to
the whole image or to a region of interest (ROI). From
the medical perspective and in order to provide a
real-time virtual interactivity between the remote
consultant and the manipulated robot, the round trip
delay at the expert station, between the commanded robot
position and the reception of the corresponding image,
should not exceed 300 ms.
Fig. 2. 3G/WLAN wireless OTELO connectivity system. The hospital site is represented on the left in the ellipse and the patient station is depicted on
the right.
TABLE 1
OTELO Medical Data Requirements and Corresponding Data Rates [35]
3. There is the need for multisite specialist wireless
connectivity in the hospital, to provide a second
diagnostic opinion on the received ultrasound
images. Hence, in this study, we assume an
additional multispecialist WLAN connectivity sys-
tem to provide such a service.
2.2 WLAN Connectivity for Expert Diagnosis
It is well known that the IEEE 802.11e WLAN standard
adds quality-of-service (QoS) features and multimedia
support to the existing IEEE 802.11b and IEEE 802.11a
wireless standards, while maintaining full backward
compatibility with them. An orthogonal frequency division
multiplexing (OFDM) encoding scheme, rather than FHSS
or DSSS, is used in the 802.11a standard. 802.11b, often called
Wi-Fi, uses complementary code keying (CCK) as the
modulation method, which allows higher data speeds and
is less susceptible to multipath propagation interference.
Although WLAN connectivity offers higher bandwidth,
we need to consider that data transmitted in the hospital
WLAN may have been received from the UMTS link. In this
case, the UMTS link represents the bottleneck, and the
source bit rate in the WLAN section is limited by the one
received from the UMTS link. However, more error
protection can be provided to data in the WLAN section.
As shown in Fig. 2, the extended use of WLAN
connectivity is configured for second opinion and multi-
diagnosis services on the expert side of the OTELO system.

3 Proposed Architecture for Ultrasound Video Transmission

The proposed architecture for ultrasound video transmission
over 3G/WLAN systems is described in this section,
focusing, in particular, on the system controller structure.
Fig. 3 illustrates the overall proposed system architecture,
from the transmitter side (patient side) in the upper part of
the figure to the receiver side (expert side) in the lower part,
including the signaling used for transmitting the JSCC/D
control information in the system. We focus, in fact, on the
transmission of ultrasound video from patient to specialist.
Besides the traditional tasks performed at the application
level (source encoding, application processing such as
ciphering), at the network level (including real-time transport
protocol (RTP)/UDPLite/IPv6 packetization, impact of the
IPv6 wired network, and Robust Header Compression (RoHC)),
at medium access (including enhanced mechanisms for WiFi),
and at radio access (channel encoding, interleaving, modula-
tion), the architecture includes two controller units, at the
physical and application layers. These controllers are
introduced to supervise the different (de)coders, (de)-
modulation, and (de)compression modules and to adapt their
parameters to changing conditions, through the sharing of
information about the source, network and channel conditions,
and user requirements. For the controlling purpose, a signaling
mechanism has been defined, described in detail in the
following section.
3.1 Side Information Exchange Mechanisms in the Proposed System
System optimization is performed according to information
about the different system blocks, which is collected and
managed by the system controllers. In particular, the
information that is taken into account by the system for
optimization is composed of source significance information
(SSI), i.e., the information on the sensitivity of the source
(encoded medical video) bitstream to channel errors;
channel state information (CSI); decoder reliability informa-
tion (DRI), i.e., soft values output by the channel decoder;
source a priori information (SRI), e.g., statistical information
on the source; source a posteriori information (SAI), i.e.,
information only available after source decoding; network
state information (NSI), represented, e.g., by packet loss rate
and delay; and finally, the ultrasonography video quality
measure, output from the source decoder (at the expert site)
and used as feedback information for system optimization.
This last measure is critical, as the target of the overall system
optimization is the maximization of the received ultrasound
video quality, since it corresponds to the possibility to
perform a correct diagnosis. See also [23], [26] for the
relevance of perceptual optimization.

Fig. 3. m-JSCC/D architecture for the OTELO system.

Due to the
fact that this quality measure is regularly sent back to the
patient side for action, the evaluation should be performed
on the fly and without reference (or reduced reference) to
the transmitted frame (see, e.g., [25]).
Clearly, when considering real-time diagnostic systems,
this control information needs to be transferred through
the network and systems layers, in a timely and band-
width-efficient manner. The impact of the network and
protocol layers is quite often neglected when studying joint
source and channel coding, and only minimal effort has been
made to find solutions providing efficient interlayer
signaling mechanisms for JSCC/D. Different mechanisms
have been identified that could allow information
exchange transparently to the network layers (see, e.g.,
[36], [37]). Besides these novel solutions, several
transport protocols can carry, along with the payload,
some control information. In particular, UDP, UDPlite, and
datagram congestion control protocol (DCCP) protocols are
considered at transport layer level. For further information
on such solutions, the reader may refer to [37].
Finally, it should be noted that additional information is
requested by the system for the setup phase, where
information on available options (e.g., available channel
encoders and available channel coding rates, available
modulators, . . . ) and a priori information on the transmitted
ultrasound video (e.g., statistical characterization of video
sequence) are exchanged, the session is negotiated, and
default parameters are set (e.g., authentication key, modules
default settings).
3.2 Principle of the m-JSCC/D Controller Structure
Fig. 4 shows a schematic of the multilayer controller
structure, which represents the core of the proposed m-health
transmission system and aims at global system optimization
in terms of received medical video quality, by collecting
step-by-step information on the system and providing up-to-date
control parameters to the relevant system blocks.
The system controller is represented by two distinct
units, namely the physical layer (PHY) controller and the
application layer (APP) controller. The latter collects
information from the network (NSI: packet loss rate, delay,
and delay jitter) and from the source (e.g., SSI), and has
access to reduced channel state information and to the
quality metric of the previously decoded frame (or group
of frames) of ultrasound video. According to this informa-
tion, it produces controls for the source encoder block (e.g.,
quantization parameters, frame rate, error resilience tools to
activate) and the network.
The task of the PHY controller unit is to provide controls
to the physical layer blocks, i.e., the channel encoder,
modulator, and interleaver.
A more detailed description of the controller component
units with exemplifications of their functionalities is
presented below.
3.3 Application Layer Controller Unit
Given the amount of information that can be exploited and
the number of parameters to be set, the specific application
controller has been modeled as a finite state machine, so that
the controller can switch among the different states
according to the collected information. At the beginning of
each iteration cycle, the controller decides the next operat-
ing state, which is defined by a fixed set of configuration
parameters for the different blocks of the chain. The choice
of the new state is based on the history and feedback
information coming from the blocks at the receiver (expert)
side, relevant to the previous cycle.
The algorithm dynamically performs adaptation to chan-
nel conditions, network conditions, and source character-
istics, by considering the feedback information available.
The feedback information collected at the expert side
and used at patient side for the choice of medical video
coding parameters and transmission parameters is sum-
marized as follows:
. Medical video quality at the expert end: PSNR or
other quality metric (e.g., based on structural
distortion [22], or without reference to the original
sequence [25]). A no-reference or reduced reference
metric should in fact be considered in a realistic
implementation: at the expert side, the original
frame transmitted from the patient side is not
available for comparison; furthermore, attention
should be paid to the choice of a metric representing
diagnostic accuracy, which is the final goal of
medical image and video transmission.
Fig. 4. Controller structure (patient side).
. Reduced CSI, composed, for example, of the average
signal-to-noise ratio (SNR) in one controller step and
of channel coherence time.
. NSI: number of lost packets, average jitter, and
average round trip time (RTT).
The main configuration parameters set by the APP m-
JSCC/D controller and modifiable at each controller step are:
. frame rate of encoded medical video,
. quantization parameters of encoded medical video,
. group of pictures (GOP) size (i.e., intraframe refresh
rate) of encoded medical video, and
. average code-rate channel protection, as a conse-
quence of the choice of the source encoding
parameters and of the knowledge of the available total bit rate.
In order to reduce the number of possible
configurations and to avoid switching continuously from
one given set of parameters to another, only a limited set of
possibilities for these parameters is considered, resulting in
a limited number M of possible states. Exemplifications
are provided in Section 4 and in Table 2 for medical video
encoded according to the MPEG-4 standard.
The adaptive algorithm that has been tested takes into
account the trend of the medical video quality feedback
from the source decoder and the average $E_b/N_0$ experienced
on the wireless link in the previous controlling cycle,
where $E_b$ is the average energy per coded bit and $N_0$ is the
one-sided noise spectral density.
Typically, a low ultrasonography video quality value
associated with a negative quality trend will cause a
transition to a state characterized by higher robustness,
i.e., higher source compression, more error resilience tools
[16], [17], and higher channel protection.
The APP controller algorithm is run every cycle of length T
(e.g., one second). States are numbered from the most robust
(State 1) to the least robust (State M), corresponding to the
highest error-free quality. An example is reported in Table 2.
At the end of each cycle, the network condition is checked.
When there is network congestion, indicated by a high
value of the packet loss rate (PLR) feedback in the NSI, the
controller immediately sets the state to the first one,
characterized by the lowest source bit rate (corresponding
to the minimum requirements for OTELO medical video),
in order to reduce as much as possible the amount of data
that have to flow through the IPv6 network.
Otherwise, the state is selected according to the ultrasound
video quality values $Q_i, Q_{i-1}, \ldots$ achieved in previous cycles.
The state number is decreased (a more robust state is
selected for the following cycle) if a reduction in US video
quality is observed, i.e., $\Delta Q_i < 0$, and increased (a less
robust state, corresponding to a higher errorless quality, is
selected) if a US video quality improvement is observed,
i.e., $\Delta Q_i \geq 0$. In order to avoid too many oscillations,
proper controls checking previous states are considered.
The state number can be increased/decreased by one
step ($\mathrm{state}_{i+1} = \mathrm{state}_i \pm 1$) or by two steps ($\mathrm{state}_{i+1} =
\mathrm{state}_i \pm 2$) according to the observed US video quality value $Q_i$,
which is compared with proper thresholds. The
temporal average of the quality of the video frames in the cycle
just terminated is considered as the US video quality:

$$Q_i = \frac{1}{N_i} \sum_{k=1}^{N_i} Q_{i,k}, \qquad (1)$$

where $Q_{i,k}$ is the quality of the $k$th video frame in controller
cycle $i$ and $N_i$ is the number of frames transmitted in cycle $i$
just terminated.
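As an illustration, the state-update policy described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the number of states, the congestion threshold on the PLR, and the quality threshold triggering a two-step move are all hypothetical, since the text fixes only the general rules (congestion forces State 1; the quality trend moves the state by one or two steps).

```python
# Illustrative sketch of the APP-layer controller's state-update rule.
# Thresholds and the number of states are hypothetical placeholders.

def average_quality(frame_qualities):
    """Eq. (1): temporal average of per-frame quality over one cycle."""
    return sum(frame_qualities) / len(frame_qualities)

def next_state(state, q_prev, q_curr, plr,
               num_states=5, plr_congestion=0.05, big_change=2.0):
    """Return the state for the next controller cycle.

    state          -- current state (1 = most robust .. num_states)
    q_prev, q_curr -- average US video quality of the last two cycles
    plr            -- packet loss rate reported in the NSI feedback
    """
    if plr > plr_congestion:      # congestion: fall back to minimum bit rate
        return 1
    delta = q_curr - q_prev
    step = 2 if abs(delta) > big_change else 1
    if delta < 0:                 # quality dropping: move to a more robust state
        state -= step
    else:                         # quality improving: allow a less robust state
        state += step
    return max(1, min(num_states, state))
```

For instance, a mild quality drop moves the controller one state down, while a reported PLR above the congestion threshold overrides everything and selects State 1.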
As anticipated above, the video quality metric consid-
ered for closed-loop control should be evaluated with no
reference or with only partial reference to the original US
video frames, which are not available at the expert side. An
example of such metric is the metric defined in [24].
However, for the final assessment and results shown in
Section 4, the following metrics are considered:
1. Peak signal-to-noise ratio (PSNR):

$$\mathrm{PSNR} = 20 \log_{10} \frac{255}{\mathrm{RMSE}}, \qquad (2)$$

where RMSE is the square root of the mean square error

$$\mathrm{MSE} = \frac{1}{W H} \sum_{i,j} \left[ \hat{f}(i,j) - f(i,j) \right]^2, \qquad (3)$$

where $\hat{f}(i,j)$ and $f(i,j)$ are the luminance values of pixel
$(i,j)$ in the reconstructed and the original frame of
dimension $W \times H$.
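The PSNR of Eqs. (2)-(3) can be computed directly from two luminance frames; a minimal dependency-free Python sketch (frames as lists of rows of 8-bit values):

```python
# PSNR between an original and a reconstructed 8-bit luminance frame,
# following Eqs. (2)-(3).

import math

def psnr(ref, rec, peak=255.0):
    """PSNR = 20 log10(peak / RMSE); frames are lists of pixel rows."""
    h, w = len(ref), len(ref[0])
    mse = sum((ref[i][j] - rec[i][j]) ** 2
              for i in range(h) for j in range(w)) / (w * h)
    if mse == 0:
        return float("inf")       # identical frames
    return 20.0 * math.log10(peak / math.sqrt(mse))
```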
2. Structural similarity metric (SSIM) [22]. The SSIM
index presented in [22], as shown in (4), can be written
as the product of three independent contributions,
representing the luminance information, the contrast
information, and the structural information.
With $x$ and $y$ indicating the reference and
received image:

$$\mathrm{SSIM}(x, y) = l(x, y) \, c(x, y) \, s(x, y), \qquad (4)$$

where the luminance comparison is represented by
the term

$$l(x, y) = \frac{2 \mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad (5)$$

and for the contrast comparison:

$$c(x, y) = \frac{2 \sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}. \qquad (6)$$

The structural comparison term $s(x, y)$ is

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}. \qquad (7)$$

In the expressions above, it is

$$\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad (8)$$

$$\sigma_x = \left( \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)^2 \right)^{1/2}, \qquad (9)$$

$$\sigma_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y), \qquad (10)$$

and $C_1$, $C_2$, and $C_3$ are proper constants.

TABLE 2
Sets of Parameter Values Used by the APP Controller Unit
These metrics have been considered since they are the most
commonly used, thus allowing easier evaluation and compar-
ison of results.
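The SSIM index of Eqs. (4)-(10) can be sketched as follows. This is a global (whole-sequence) version for clarity; practical implementations apply it over local windows and average the results. The constant values are an assumption (the common choice C1 = (0.01·255)², C2 = (0.03·255)², C3 = C2/2), since the text only calls them "proper constants":

```python
# Global SSIM over two pixel sequences, following Eqs. (4)-(10).
# Constants C1, C2, C3 are assumed values, not specified in the text.

import math

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    n = len(x)
    c3 = c2 / 2.0
    mu_x, mu_y = sum(x) / n, sum(y) / n                       # Eq. (8)
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)
    sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)     # Eq. (9)
    sigma_xy = sum((a - mu_x) * (b - mu_y)
                   for a, b in zip(x, y)) / (n - 1)           # Eq. (10)
    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)  # Eq. (5)
    c = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    s = (sigma_xy + c3) / (sigma_x * sigma_y + c3)            # Eq. (7)
    return l * c * s                                          # Eq. (4)
```

Identical images give an index of 1; a uniform luminance shift lowers only the luminance term l(x, y), leaving contrast and structure untouched.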
The GOP length, i.e., intraframe refresh rate, is selected
according to the reduced CSI available at the application
layer: higher GOP lengths are selected for better channel conditions.
If the indication is to increase the state number and the
current state corresponds to the maximum value, the state
is unchanged. Similarly, the state is unchanged if the
indication is to decrease and the current state corresponds
to the minimum.
Given the average bit rate associated to the chosen state,
the code rate available for signal protection is evaluated
considering the constraint on the total coded bit rate
$R_{tot} = R_s / R_c$, where $R_s$ is the average source coding rate
and $R_c$ is the target average protection (channel coding) rate.
If physical-layer UEP is adopted, given the available total
coded bit rate ($R_{tot}$), the average channel coding rate ($R_c$)
is derived by the application m-JSCC/D controller and
proposed to the PHY controller (see Fig. 4). The knowledge
of the bit rate is of course approximate and based on
rate/source parameter models developed by the authors or
on average values evaluated in previous controller steps.
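The rate budget can be made concrete with a back-of-the-envelope calculation: the average channel code rate left for protection is the ratio between the source bit rate of the chosen state and the total coded bit rate granted on the link. The numbers below are illustrative only (650 kbit/s matches the coded rate used in the simulations of Section 4; the function name is ours):

```python
# Rate budget sketch: average channel code rate r_c = R_s / R_tot,
# i.e., the fraction of the coded bit rate carrying source bits.

def average_code_rate(source_kbps, total_coded_kbps):
    if source_kbps > total_coded_kbps:
        raise ValueError("source rate exceeds the coded-rate budget")
    return source_kbps / total_coded_kbps

# e.g., an encoded stream of about 433 kbit/s in a 650 kbit/s budget
# leaves an average code rate of roughly 2/3 for channel protection.
```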
The length of the controller time step has to be chosen
according to the requirements of the application. The
controller time step has to be short enough to allow quick
adaptation to channel conditions, but long enough not to
reduce compression efficiency too much.
The information about the system is acquired by the
controllers in the first time steps, where, e.g., the motion
and statistical characteristics of the medical video sequence,
to be exploited in the following time steps, are acquired by
the APP controller subunit.
Furthermore, as mentioned above, the OTELO system
requires a maximum delay of 300 ms. In this case, attention
should be paid in order to comply with such constraints. If
preencoding is considered for medical video and frames are
encoded group by group, the group of frames coded
together should correspond to a portion of video shorter
than 300 ms. It is important, in fact, that the medical expert
experiences only a small delay between the positioning of
the probe and the visualized ultrasonography video frame.
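The 300 ms bound above directly caps how many frames may be buffered and encoded as one group before transmission. A quick check (the 15 fps figure comes from the scenarios of Section 2; the function name is ours):

```python
# Maximum number of frames that can be coded as one group while
# keeping the buffered video segment shorter than the delay bound.

def max_frames_per_group(frame_rate_fps, max_delay_s=0.3):
    return int(frame_rate_fps * max_delay_s)

# At 15 fps, at most 4 frames may be grouped under the 300 ms bound.
```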
3.4 Physical Layer Controller Unit
The compressed medical video bitstream may be separated
into partitions or layers with different sensitivity to channel
errors. A different protection can thus be used for the
different partitions, when allocating the average channel
coding rate available. As an example, in the case of MPEG-4
video, the bitstream can be separated into packets and each
packet can be separated into a header and two data
partitions, with different error sensitivity. Packets from I
(intra) frames can be separated in a first class related to DC
DCT coefficients and a second class related to AC DCT
coefficients, whereas P (predicted) frames packets can be
separated into two partitions relevant to motion and texture
data, respectively. This different sensitivity can be exploited
to perform UEP, either at application or at physical level.
The video stream sensitivity can be modeled similarly to [12]
in order to simplify the UEP policy. In the case of
H.264-based compression, the data partitioning tool can
similarly be exploited, while in SVC the granularity offered
by scalable video coding may be invoked. Unequal protection based on
ROI can also be considered, by exploiting the possibility
offered by the MPEG-4 standard to separate any video
sequence into video objects that can be managed differently.
In this view, the identification of regions of interest allows
dedicating higher protection to the region of interest,
resulting in an increase in diagnostic accuracy for a fixed
available bandwidth.
The task of the physical layer controller subunit is to decide
on the unequal error protection strategy, i.e., on the channel
coding rate for the different source layers, each with a
different sensitivity to errors, with the goal of minimizing the
total distortion $D_{tot}$ due to compression and channel errors,
under the constraint of the average channel coding rate $R_c$
selected by the application controller. The general procedure
for channel coding rate selection is described in detail in [12].
Furthermore, the PHY controller subunit sets the parameters
for bit-loading in multicarrier modulation and the interleaver
characteristics, and it performs a trade-off with receiver
complexity. Again, the metric chosen to represent
distortion should be representative of diagnostic accuracy.
4 Results and Discussion

In order to demonstrate the feasibility of the system and to
evaluate the achievable performance, the proposed con-
trolled system has been implemented with its different
subblocks, namely, application layer controller; source
encoder/decoder (three possible codecs: MPEG-4, H.264/
AVC, and SVC); cipher/decipher unit; RTP header inser-
tion/removal; transport protocol header (e.g., UDPLite,
UDP, or DCCP) insertion/removal; IPv6 header insertion/
removal; IPv6 mobility modeling; IPv6 network simulation;
RoHC; DLL header insertion/removal; Radio Link, includ-
ing physical layer controller, channel encoder/decoder
(convolutional, rate compatible punctured convolutional
(RCPC), low-density parity check (LDPC) codes with soft
and iterative decoding allowed), interleaver, modulator
(also OFDM, TCM, TTCM, STTC, soft and iterative
demodulation allowed), and channel (e.g., additive white
gaussian noise (AWGN), Rayleigh fading, shadowing,
frequency selective channels). The implementation of some
of the blocks above was performed in the framework of the
PHOENIX EU project and is described in [37].
The proposed structure is implemented in simulated
laboratory environment with images and video stream
acquired from the real OTELO system. The video
sequences acquired by the robotic sonographer are fed to
the source codec, which performs source (MPEG-4/H.264)
encoding (according to the parameters suggested by the
APP controller) by every controller time step. The encoded
bitstream is then processed by the lower layers, and finally
transmitted over the wireless channel. The parameters of
upper layers, down to the network, are determined by the
application layer controller unit at every APP controller time step. The parameters of the lower layers, in particular of the physical layer, are determined at runtime by the PHY controller unit with its own time step (shorter than or equal to that of the APP controller). The application controller unit performs source bit rate adaptation and the
physical layer one provides UEP, according to the average
bit rate suggested by the APP layer controller, and drives
adaptive bit-loading for multicarrier modulation. A default
parameter setting is considered in the initialization phase.
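The two nested control loops, the slow APP step and the faster PHY step, can be sketched as follows; the step lengths, thresholds, and rate increments are invented for illustration and are not the actual controller of the paper:

```python
APP_STEP = 1.0     # s, application-layer controller period (illustrative)
PHY_STEP = 0.25    # s, physical-layer controller period; divides APP_STEP

def run_controllers(duration_s, read_ssi, read_quality):
    """Every PHY step, adapt the channel code rate to the short-term SSI;
    every APP step, adapt the source bit rate to received-quality feedback.
    The two callbacks stand in for real measurements."""
    log = []
    source_rate_kbps = 300            # default setting used at initialization
    for i in range(int(duration_s / PHY_STEP)):
        t = i * PHY_STEP
        ssi = read_ssi(t)
        if t % APP_STEP == 0:         # slow loop: source rate adaptation
            q = read_quality(t)
            step = 20 if q > 0.9 else -20
            source_rate_kbps = max(210, min(384, source_rate_kbps + step))
        # fast loop: spend more parity when the channel fades
        code_rate = 1/2 if ssi < 10 else 2/3
        log.append((t, source_rate_kbps, code_rate))
    return log

trace = run_controllers(2.0, read_ssi=lambda t: 8 if t < 1 else 15,
                        read_quality=lambda t: 0.95)
```

The source rate is clamped to the 210-384 Kbit/s range of the states described below; the PHY loop reacts four times per APP step in this toy setting.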
4.1 Simulation Setup
The 802.11e WLAN support was added at the radio link
level, with a coded bit rate of 12 Mbit/s. The ultrasono-
graphy video stream is coded according to the MPEG-4
standard and is assumed multiplexed with other real-time
transmissions, so that it occupies only an average portion of
the available bandwidth corresponding to a coded bit rate
of 650 Kbit/s. CIF resolution has been selected. The
MoMuSys MPEG-4 reference video codec is considered,
with some modifications in the decoder to improve bit error
resilience. The modified decoder is used in both the adapted and the nonadapted system.
RoHC is applied in order to compress the transport and network headers by transmitting only nonredundant header information.
Channel codes are irregular repeat-accumulate (IRA) LDPC codes with a (3,500, 10,500) mother code (rate 1/3), properly punctured and shortened in order to obtain
different code rates. The resulting codewords are always
4,200 bits long. The code rate is 2/3 for the nonadapted
system (EEP); in the adapted case, the code rate can change
according to SSI in order to perform UEP. The average
coded bit rate is the same in both cases considered.
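The rate adaptation by puncturing and shortening reduces to simple arithmetic: for a fixed 4,200-bit codeword and a target rate r, k' = 4,200·r information bits are kept, the surplus mother-code information bits are shortened away, and the surplus parity is punctured. A sketch of this bookkeeping (assuming the (3,500, 10,500) notation is (k, n)):

```python
K_MOTHER, N_MOTHER = 3500, 10500      # (3,500, 10,500) rate-1/3 IRA LDPC
N_WORD = 4200                          # fixed codeword length on the channel

def adapt_rate(target_rate):
    """Return (info bits kept, shortened info bits, punctured parity bits)
    needed to turn the mother code into an (N_WORD, k') code of the
    requested rate."""
    k = round(N_WORD * target_rate)               # info bits per codeword
    if not 0 < k <= K_MOTHER:
        raise ValueError("rate not reachable from this mother code")
    shortened = K_MOTHER - k                      # zero-padded, not sent
    parity_mother = N_MOTHER - K_MOTHER           # 7,000 parity bits
    punctured = parity_mother - (N_WORD - k)      # parity bits not sent
    if punctured < 0:
        raise ValueError("would need more parity than the mother code has")
    return k, shortened, punctured

# rate 2/3 (the EEP reference): 2,800 info bits, shorten 700, puncture 5,600
print(adapt_rate(2/3))
print(adapt_rate(1/2))   # a stronger rate, e.g., for fades (illustrative)
```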
In the first case, the modulation is a classical OFDM with 48 carriers for data transmission and a frame duration of 4 μs; margin-adaptive bit-loading techniques, managed by the PHY joint source and channel coding (JSCC) controller, are considered in the adapted system.
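Margin-adaptive bit-loading fixes the number of bits per OFDM frame and assigns them to subcarriers so as to minimize transmit power. A greedy Hughes-Hartogs-style sketch (illustrative, not the exact algorithm used in the system; the channel gains are made up):

```python
import heapq

def margin_adaptive_loading(gains, target_bits, max_bits=8, gamma=1.0):
    """Distribute `target_bits` over the subcarriers so that total transmit
    power is minimized, given per-carrier power gains |H_k|^2. The
    incremental power of the (b+1)-th bit on carrier k is 2**b * gamma / g_k,
    so we always add the next bit where it is cheapest."""
    bits = [0] * len(gains)
    heap = [(gamma / g, k) for k, g in enumerate(gains)]  # cost of 1st bit
    heapq.heapify(heap)
    for _ in range(target_bits):
        dp, k = heapq.heappop(heap)
        bits[k] += 1
        if bits[k] < max_bits:
            heapq.heappush(heap, (dp * 2, k))   # next bit costs twice as much
    return bits

# 48 data carriers as in the OFDM setup; half of them in a deep notch
gains = [1.0] * 24 + [0.25] * 24
alloc = margin_adaptive_loading(gains, target_bits=96)
```

With these gains the good carriers end up carrying three bits each and the faded ones a single bit, which is exactly the waterfilling-like behavior bit-loading is meant to produce.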
The channel is obtained according to the ETSI channel
A model, representing the conditions of a typical office
(hospital) environment. It also takes into account a log-normal flat fading component with a channel coherence time of 5 s, modeling the fading effects due to large obstacles. A median signal-to-noise ratio of 13.2 dB has been considered.
For the scenario considered, the number of possible
states controlled at APP layer is 5. Each state is character-
ized by different sets of values for the above-mentioned
parameters. State 1 corresponds to the lowest source data
rate (the lowest video quality) and the highest robustness,
whereas state 5 corresponds to the highest source data rate
(the highest video quality) and the lowest robustness. Thus, decreasing the state number corresponds to increasing the robustness of the transmission at the cost of a loss in the error-free received video quality. Table 2 summarizes the states
considered in the example.
The source bit rate after MPEG-4 compression depends
on the APP controller status and ranges from 210 Kbit/s
(state 1) to 384 Kbit/s (state 5), also taking into account the overhead due to the various network headers; it is thus in good accordance with the OTELO requirements reported in Table 1 and Section 2. Note that in some cases, to keep an
acceptable video quality in very deep fades or network
congestion, the most robust states have to consider lower
frame rates than those required by the OTELO system. This
simulation setup is summarized in Table 3.
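The switching among the five states can be pictured as a threshold rule on the measured side information, for instance with hysteresis to avoid oscillating near a boundary. The thresholds below are invented for illustration; the real state definitions are those of Table 2:

```python
# hypothetical SSI thresholds (dB) separating the five APP states
THRESHOLDS = [6, 9, 12, 15]     # state 1 below 6 dB ... state 5 above 15 dB

def select_state(ssi_db, current_state, hysteresis_db=1.0):
    """State 1: lowest source rate / highest robustness;
    state 5: highest source rate / lowest robustness."""
    target = 1 + sum(ssi_db > t for t in THRESHOLDS)
    # hysteresis: only move when the SSI is clearly past the boundary
    if target > current_state and \
            ssi_db < THRESHOLDS[current_state - 1] + hysteresis_db:
        return current_state
    if target < current_state and \
            ssi_db > THRESHOLDS[target - 1] - hysteresis_db:
        return current_state
    return target
```

For example, a controller sitting in state 2 with the SSI hovering at 9.5 dB stays put, while a clear rise to 10.5 dB moves it up to state 3.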
A second scenario is described in the following. The case of transmission over a nonfrequency-selective channel affected by Rayleigh fading and log-normal shadowing, with a median signal-to-noise ratio of 5 dB, is taken into account. The considered channel codes are RCPC codes with mother code rate 1/3, constraint length 5, and puncturing period 8.
Simple BPSK modulation is considered here. Robust header
compression is applied. The APP controller states corre-
spond to a minimum source bit rate after MPEG-4 coding of
210 Kbps and a maximum of 384 Kbps (thus, respecting
OTELO specifications in Table 1). The considered combina-
tions of parameters for MPEG-4 video are the same as above.
State 4 is the reference one, i.e., the one considered in the
nonadapted case, whereas the controller switches among
the states in Table 2 in the adapted case. The maximum bit
rate over the channel is 450 Kbps.
4.2 Numerical Results
Fig. 5 reports comparative results in terms of PSNR and SSIM [22] versus time, in the first setup described above. The
quality curves reported in the graph have been obtained
through the average of four distinct simulations, run with
different noise seeds. Quality values averaged over 1 s are
reported in the curves. The quality values have been
normalized with respect to the maximum value achieved
in order to allow the comparison of different metrics in the
same figure. An average gain of 4.4 dB in terms of PSNR is provided by the adapted system, allowing diagnosis with much higher accuracy than in the nonadapted case, as visual results confirm. It is evident that, in deep channel fades, the medical video quality in the proposed system is kept at acceptable levels. In particular, when such fades occur in the first part of the ultrasonography session, while the medical doctor is searching for the specific organ, maintaining acceptable quality reduces the search time, since no time is wasted on intervals in which the quality of the communication is unacceptable.
Fig. 6 reports the complementary cumulative distribu-
tion function of the received medical video quality
expressed in terms of SSIM for the proposed and the
reference system. The graph allows visualizing the
comparison in terms of the percentage of time the video
quality is above a prefixed threshold.
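The curve in Fig. 6 is simply the empirical survival function of the per-second quality samples; computing it is straightforward (the SSIM values below are made up for illustration):

```python
def ccdf(samples):
    """Empirical complementary CDF: for each threshold q, the fraction of
    time the quality stayed strictly above q."""
    xs = sorted(samples)
    n = len(xs)
    return [(q, sum(x > q for x in xs) / n) for q in xs]

ssim_per_second = [0.81, 0.93, 0.88, 0.95, 0.90, 0.86]
for q, p in ccdf(ssim_per_second):
    print(f"SSIM > {q:.2f} for {p:.0%} of the time")
```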
Since PSNR is a more commonly used quality metric, PSNR results over 30 seconds are reported in Table 4 for an easier evaluation.
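For reference, the PSNR figures in Table 4 follow from the per-frame mean squared error against the original sequence, averaged over the frames; a minimal sketch for 8-bit video:

```python
import math

def psnr_db(original, received, peak=255):
    """Per-frame PSNR for 8-bit samples; frames are flat lists of pixels."""
    mse = sum((a - b) ** 2 for a, b in zip(original, received)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def sequence_psnr(orig_frames, recv_frames):
    """Average of the per-frame PSNR values over the whole sequence."""
    vals = [psnr_db(o, r) for o, r in zip(orig_frames, recv_frames)]
    return sum(vals) / len(vals)
```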
Fig. 7 shows the comparative visual results for the
echocardiography sequence acquired on the expert side, in
the same setup. The original frame (no. 422) is reported in
Fig. 7a. The corresponding received video frame with the
nonadapted system is reported in Fig. 7b; this figure clearly
shows evident artifacts, in terms of light stripes, affecting
the accuracy of the diagnosis. Fig. 7c shows the corresponding received video frame with the adapted system,
presenting a much higher visual quality, also reflected in
very good diagnosis accuracy.
Fig. 8 reports comparative results showing the performance with (ROAM) and without the proposed controller structure, in the second setup described in the previous section, i.e., in the case of a time-correlated, nonfrequency-selective channel. Results are obtained through the average of five simulations of 30 seconds each. Again, the video quality of subsequent frames in one second is averaged to provide a single value. The average gain observed in terms of PSNR is 4 dB in this case.
TABLE 3. Summary of the Main Simulation Parameters Considered (First Scenario)
Fig. 5. Comparative performance of the proposed and reference architecture. Normalized PSNR and SSIM versus time.
Fig. 6. Complementary cumulative distribution function of ultrasound video quality in terms of SSIM.
Fig. 9 shows the complementary cumulative distribution
function of video quality at the expert side, in terms of
SSIM, in the same setup. The gain with the adapted
(ROAM) system is evident also with this different visuali-
zation, highlighting the percentage of time the ultrasound
video quality is above a prefixed quality value.
5 CONCLUSION
A new multilayer controller structure for enhanced medical video streaming in robotic teleultrasonography applications (m-JSCC/D) is introduced in this paper. In particular, an
application controller unit, driving the source encoder
parameters with the knowledge of channel and network
state information and of the medical video quality at the
expert site, and a physical controller unit, allowing the
adaptation of the medical video bitstream to channel
conditions by exploiting the knowledge of the character-
istics of the ultrasonography video stream, are described in
detail. The proposed structure is implemented in a simulated laboratory environment with images and video streams acquired from the real robotic ultrasonography system
(OTELO). Comparative simulation results in the case of
ultrasonography video transmission over a WLAN link
show that a considerable improvement in terms of both objective and subjective medical video quality is achieved with the proposed system.
Work is currently underway to test the proposed system in real telemedical and clinical settings and to verify the performance of the robotic diagnostic system in hospital and emergency situations.
Fig. 7. Comparative visual results of the acquired medical video images.
(a) Frame no. 422: original. (b) Frame no. 422: MPEG-4, without m-JSCC/D. (c) Frame no. 422: MPEG-4, with m-JSCC/D.
TABLE 4. PSNR Results (in dB), 30 s Ultrasound Video Sequence
Fig. 8. Normalized PSNR and SSIM versus time with the adapted
(ROAM) and reference systems. Second scenario.
ACKNOWLEDGMENTS
The work of M.G. Martini and M. Mazzotti was partially supported by the European Commission in the framework
of the PHOENIX IST project under contract FP6-IST-1-
001812. The PHOENIX project partners are also acknowl-
edged. R.S.H. Istepanian and N. Philip are grateful to the
European Union for supporting the project EU IST-32516
OTELO: Integrated, end-to-end, mobile tele-echography
system. Maria G. Martini was with CNIT, University of
Bologna, Italy, at the time of this work.
REFERENCES
[1] R.S.H. Istepanian, S. Laxminarayan, and C.C. Pattichis, M-Health: Emerging Mobile Health Systems. Springer, 2006.
[2] R.S.H. Istepanian, E. Jovanov, and Y.T. Zhang, M-Health: Beyond
Seamless Mobility for Global Wireless Healthcare Connectivity-
Editorial, IEEE Trans. Information Technology in Biomedicine, vol. 8, no. 4, pp. 405-414,
Dec. 2004.
[3] P.C. Cosman et al., Thoracic CT Images: Effect of Lossy Image
Compression on Diagnostic Accuracy, Radiology, vol. 190,
pp. 517-524, 1994.
[4] H. Yu, Z. Lin, and F. Pan, Applications and Improvement of
H.264 in Medical Video Compression, IEEE Trans. Circuits and
Systems I, special issue on biomedical circuits and systems: a new
wave of technology, vol. 52, no. 12, pp. 2707-2716, Dec. 2005.
[5] J.L. Massey, Joint Source and Channel Coding, Communications
Systems and Random Process Theory, pp. 279-293, Sijthoff &
Noordhoff, 1978.
[6] M.G. Martini, M. Mazzotti, C. Lamy-Bergot, J. Huusko, and P.
Amon, Content Adaptive Network Aware Joint Optimization for
Wireless Video Transmission, IEEE Comm. Magazine, vol. 45,
no. 1, pp. 84-90, Jan. 2007.
[7] F. Pereira and T. Ebrahimi, The MPEG-4 Book. Prentice Hall, 2002.
[8] T. Wiegand, G.J. Sullivan, G. Bjøntegaard, and A. Luthra, An
Overview of the H.264/AVC Video Coding Standard, IEEE
Trans. Circuits and Systems for Video Technology, vol. 13, no. 7,
pp. 560-576, July 2003.
[9] J. Hagenauer and T. Stockhammer, Channel Coding and
Transmission Aspects for Wireless Multimedia, Proc. IEEE,
vol. 87, no. 10, pp. 1764-1777, Oct. 1999.
[10] J. Hagenauer, N. Seshadri, and C.E. Sundberg, The Performance
of Rate-Compatible Punctured Convolutional Codes for Digital
Mobile Radio, IEEE Trans. Comm., vol. 38, no. 7, pp. 966-980, July 1990.
[11] J. Modestino and D. Daut, Combined Source-Channel Coding of
Images, IEEE Trans. Comm., vol. 27, no. 11, pp. 1644-1659, Nov. 1979.
[12] M.G. Martini and M. Chiani, Rate-Distortion Models for Unequal
Error Protection for Wireless Video Transmission, Proc. IEEE
Vehicular Technology Conf. (VTC 04), May 2004.
[13] D. Dardari, M.G. Martini, M. Mazzotti, and M. Chiani, Layered
Video Transmission on Adaptive OFDM Wireless Systems,
Eurasip J. Applied Signal Processing, vol. 2004, no. 10, pp. 1557-
1567, Aug. 2004.
[14] H. Schwarz, D. Marpe, and T. Wiegand, Overview of the Scalable
Video Coding Extension of the H.264/AVC Standard, IEEE
Trans. Circuits and Systems for Video Technology, vol. 17, no. 9,
pp. 1103-1120, Sept. 2007.
[15] Y. Wang, S. Wenger, J. Wen, and A.K. Katsaggelos, Error-
Resilient Video Coding Techniques, IEEE Signal Processing
Magazine, vol. 17, no. 4, pp. 61-82, July 2000.
[16] R. Talluri, Error-Resilient Video Coding in the ISO MPEG-4
Standard, IEEE Comm. Magazine, vol. 36, no. 6, pp. 112-119, June 1998.
[17] Y. Wang and Q.F. Zhu, Error Control and Concealment for Video
Communication: A Review, Proc. IEEE, vol. 86, no. 5, pp. 974-997,
May 1998.
[18] V. Srivastava and M. Motani, Cross-Layer Design: A Survey and
the Road Ahead, IEEE Comm. Magazine, vol. 43, no. 12, pp. 112-
119, Dec. 2005.
[19] V. Kawadia and P.R. Kumar, A Cautionary Perspective on Cross-
Layer Design, IEEE Wireless Comm., vol. 12, no. 1, pp. 3-11, Feb. 2005.
[20] J.O. Limb, Distortion Criteria of the Human Viewer, IEEE Trans.
Systems, Man and Cybernetics (SMC), vol. 9, no. 12, pp. 778-793,
Dec. 1979.
[21] C.J. van den Branden Lambrecht and O. Verscheure, Perceptual
Quality Measure Using a Spatio-Temporal Model of the Human
Visual System, Proc. Soc. Photo-Optical Instrumentation Engineers
(SPIE) Conf., pp. 450-461, 1996.
[22] Z. Wang, L. Lu, and A.C. Bovik, Video Quality Assessment
Based on Structural Distortion Measurement, Signal Processing:
Image Comm., vol. 19, no. 2, pp. 121-132, Feb. 2004.
[23] I. Cheng and A. Basu, Perceptually Optimized 3D Transmission over Wireless Networks, IEEE Trans. Multimedia, vol. 9, no. 2,
pp. 386-396, Feb. 2007.
[24] M. Zanotti, M.G. Martini, and M. Chiani, Reduced Reference
Image and Video Quality Assessment Based on Structural
Similarity, Technical Report IEIIT-002-06, 2006.
[25] Z. Wang, H.R. Sheikh, and A.C. Bovik, No-Reference Perceptual
Quality Assessment of JPEG Compressed Images, Proc. IEEE Intl
Conf. Image Processing, Sept. 2002.
[26] J.L. Mannos and D.J. Sakrison, The Effects of a Visual Fidelity
Criterion on Encoding of Images, IEEE Trans. Information Theory,
vol. 20, no. 4, pp. 525-536, July 1974.
[27] J. Sublett, B. Dempsey, and A.C. Weaver, Design and Imple-
mentation of a Digital Teleultrasound System for Real-Time
Remote Diagnosis, Proc. IEEE Ann. Symp. Computer-Based Medical
Systems, pp. 292-299, June 1995.
[28] R. Ribeiro, R. Conceicao, J.A. Rafael, A.S. Pereira, M. Martins, and
R. Lourenco, Teleconsultation for Cooperative Acquisition,
Analysis and Reporting of Ultrasound Studies, Proc. Conf.
Telemedicine (TeleMed 98), Nov. 1998.
[29] G. Kontaxakis, S. Walter, and G. Sakas, Eu-Teleinvivo, an
Integrated Portable Telemedicine Workstation Featuring Acquisi-
tion, Processing and Transmission over Low-Bandwidth Lines of
3D Ultrasound Volume Images, Proc. Intl Conf. Information
Technology Applications in Biomedicine, Nov. 2000.
[30] A. Vilchis, J. Troccaz, P. Cinquin, F. Courreges, G. Poisson, and B.
Tondu, Robotic Tele-Ultrasound System (TER): Slave Robot
Control, Proc. First Intl Fed. Automatic Control (IFAC) Conf.
Telematics Application in Automation and Robotics, pp. 95-100, July
[31] K. Masuda, E. Kimura, N. Tateishi, and K. Ishihara, Develop-
ment of Remote Echographic Diagnosis System by Using Probe
Movable Mechanism and Transferring Echogram via High Speed
Digital Network, Proc. Ninth Mediterranean Conf. Medical and
Biological Eng. and Computing (MEDICON 01), pp. 96-98, June 2001.
[32] F. Courreges, P. Vieyres, R.S.H. Istepanian, P. Arbeille, and C. Bru,
Clinical Trials and Evaluation of a Mobile, Robotic Tele-
Ultrasound System, J. Telemedicine and Telecare (JTT), vol. 2005,
no. 1, pp. 46-49, 2005.
Fig. 9. Complementary cumulative distribution function of ultrasound
video quality in terms of SSIM. Adapted (ROAM) versus reference
system. Second scenario.
[33] A. Gourdon, P. Poignet, G. Poisson, P. Vieyres, and P. Marche, A
New Robotic Mechanism for Medical Application, Proc. IEEE/
ASME Conf. Advanced Intelligent Mechatronics, pp. 33-38, Sept. 1999.
[34] S.A. Garawi, F. Courreges, R.S.H. Istepanian, H. Zisimopoulus,
and P. Gosset, Performance Analysis of a Compact Robotic Tele-
Echography e-Health System over Terrestrial and Mobile Com-
munication Links, Proc. Fifth IEE Intl Conf. 3G Mobile Comm. Technologies (3G 2004), pp. 118-122, Oct. 2004.
[35] S.A. Garawi, R.S.H. Istepanian, and M.A. Abu-Rgheff, 3G
Wireless Communications for Mobile Robotic Tele-Ultrasonogra-
phy Systems, IEEE Comm. Magazine, vol. 44, no. 4, pp. 91-96, Apr. 2006.
[36] M.G. Martini and M. Chiani, Proportional Unequal Error
Protection for MPEG-4 Video Transmission, Proc. IEEE Intl Conf.
Comm. (ICC 01), June 2001.
[37] M.G. Martini, M. Mazzotti, C. Lamy-Bergot, P. Amon, G. Panza, J.
Huusko, J. Peltola, G. Jeney, G. Feher, and S.X. Ng, A
Demonstration Platform for Network Aware Joint Optimization
of Wireless Video Transmission, Proc. IST Mobile Summit, June
Maria G. Martini received the laurea degree in
electronic engineering (summa cum laude) from
the University of Perugia, Italy, in July 1998, and
the PhD degree in electronics and computer
science from the University of Bologna, Italy, in
March 2002. She is currently a senior lecturer in
the Faculty of Computing, Information Systems
and Mathematics, at Kingston University, Lon-
don, where she is also coordinating the Wireless
Multimedia Networking Research Group and the
participation of the group in the OPTIMIX European project. After a
collaboration with the University Hospital of Perugia, Italy, and with the
University of Rome, Italy, she joined the Dipartimento di Elettronica,
Informatica e Sistemistica (DEIS), University of Bologna, in February
1999. In 2004-2007, she was with the National Inter-University
Consortium for the Telecommunications (CNIT), Italy. She has worked
as a key person for several national and international projects, such as
the JSCC project, with Philips Research, the Joint Source and Channel
Coding-Driven Digital Baseband Design for 4G Multimedia Streaming
(JOCO) EU IST project, and the PHOENIX (Jointly Optimizing Multi-
media Transmission in IP-Based Wireless Networks) European IST
project, leading in particular the activity on the cross-layer system
controller. She serves as a reviewer for international journals and
conferences and has participated or is participating in the organizing
committees and technical program committees of several international
conferences (recently, MOBIMEDIA 2008, IEEE PIMRC 2008, IEEE
WCNC 2009, and IEEE Pervasive Healthcare 2009). She was the
general chair of the EUMOB 2008 Symposium in Oulu, Finland, and is
currently the general chair of the Fifth International Mobile Multimedia
Communications Conference (MOBIMEDIA 2009) in London. She is
coordinating the edition of the Strategic Applications Agenda (SAA) on
mobile health and inclusion applications in the framework of the
eMobility European Technology Platform. Her research interests are
mainly in joint source and channel coding, error resilient video
transmission, wireless multimedia networks, cross-layer design, deci-
sion theory, frame synchronization, and in the application of knowledge
from the communications field to the medical field. She holds several
international patents on wireless video transmission. She is a senior
member of the IEEE.
Robert S.H. Istepanian received the PhD
degree from the Electronic and Electrical En-
gineering Department at Loughborough Univer-
sity, United Kingdom, in 1994. He is currently a
professor of data communications at Kingston
University, London, and a visiting professor in
the Division of Cellular and Molecular Medicine
at St. George's University of London. He is the
founder and director of the Mobile Information
and Network Technologies Research Centre
(MINT) in Kingston University. He held several academic and research
academic posts in the UK and Canada, including senior lectureships at the University of Portsmouth and Brunel University in the UK; he was also an associate professor at Ryerson University, Toronto, and an adjunct professor at the University of Western Ontario in Canada. He is
currently the 2008 Leverhulme distinguished visiting fellow at the Centre
for Global e-Health Innovation at the University of Toronto and the University Health Network. He is an investigator and coinvestigator of
several EPSRC and EU research grants on wireless telemedicine and
other research/visiting grants from the British Council and the Royal
Society, the Royal Academy of Engineering, and the Leverhulme Trust.
He was also the UK lead investigator of several EU-IST and e-Ten
projects in the areas of mobile healthcare. He is also a member of several expert and grant review committees and, more recently, was a member of the Canada Foundation for Innovation's expert panel on strategic healthcare projects. He currently serves on several IEEE
Transactions and international journals editorial boards, including the
IEEE Transactions on Information Technology in Biomedicine (since
1997), the IEEE Transactions on NanoBioScience, the IEEE Transac-
tions on Mobile Computing, the International Journal of Telemedicine
and Applications, and the Journal of Mobile Multimedia. He has also
served as the guest editor of several special issues of IEEE
Transactions. He was the cochairman of the UK/RI chapter of IEEE
Engineering in Medicine and Biology in 2002. He has also served as an
expert and reviewer on numerous funding bodies in the UK and Canada,
as an invited keynote speaker at several international conferences, and
as a technical committee member or chair of several national and
international conferences. He has published more than 170 refereed
journal and conference papers and edited three books, including
chapters in the areas of mobile communications for healthcare, m-health
technologies, and biomedical signals processing. He is a fellow of the
Institute of Engineering Technology (IET) (formerly the IEE) and a senior
member of the IEEE.
Matteo Mazzotti received a degree in tele-
communications engineering (with the highest
honors) and the PhD degree in electronic
engineering, computer science and telecommu-
nication from the University of Bologna in July
2002 and May 2007, respectively. Currently, he
is working with the National Research Council
(CNR), Italy, and the National Inter-University
Consortium for the Telecommunications (CNIT),
Italy. His main research interests include multi-
media communications, joint source and channel coding, broadcast
technologies, and wireless communication systems. He is a member
of the IEEE.
Nada Y. Philip received the PhD degree for her
thesis titled Medical Quality of Service for
Optimized Ultrasound Streaming in Wireless
Robotic Teleultrasonography System from the
Faculty of Computing, Information Systems and
Mathematics at Kingston University, United King-
dom, in 2008. Currently, she is a lecturer at
Kingston University and an honorary tutor at St.
George's University of London, United Kingdom.
She is a member of the Mobile Information and
Network Technologies Research Centre (MINT) at Kingston University.
Her research interests include data communication, networking, and
information technology in healthcare and medical applications. She is a
member of the Institute of Engineering Technology (IET) and the IEEE.