An Efficient Security System for CABAC Bin-strings of H.264/SVC
Mamoona Naveed Asghar, and Mohammad Ghanbari, Fellow, IEEE

School of Computer Science and Electronic Engineering, University of Essex, Colchester, United Kingdom CO4 3SQ. (e-mail: masghaa@essex.ac.uk; ghan@essex.ac.uk).

Abstract: The distribution of copyrighted scalable video content to differing digital devices requires protection during rendering and transmission. In this paper we propose a complete security system for H.264/SVC codec and present a solution for: 1) the bit-rate and format compliance problems by careful selection of entropy coder syntax elements (bin-strings) for selective encryption (SE), and 2) the problem of managing multiple layer encryption keys for scalable video distribution. A standard key management protocol, multimedia Internet keying protocol (MIKEY), is implemented for the hierarchical key generation mechanism, in which a subscriber has only one encryption key to unlock all scalable layers that have been subscribed to. The evaluation demonstrates the resulting video quality degradation arising from SE for many CIF and 4CIF test video sequences, without there being any impact upon the bit-rate or format compliancy, and with small computational delay. The security and statistical analysis performed further verify the effectiveness of the proposed security system for H.264/SVC. The proposed system is highly suitable for video distribution to users who have subscribed to a varying degree of video quality on devices with medium to high computational resources.

Index Terms: AES-CFB algorithm, CABAC, H.264/SVC, MIKEY, key management, security system

I. INTRODUCTION

WITH advances in digital media, increases in processing power and network bandwidth, numerous multimedia applications and codecs have evolved in recent years. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG have standardized scalable video coding (SVC) which is an extension of the state-of-the-art H.264/AVC standard [1][2]. Scalable Video Coding (H.264/SVC) [3] permits the transmission and decoding of partial bit-streams to provide video services at various temporal, spatial and/or quality resolutions, as well as preserving a reconstruction quality that is high enough relative to the rate of the partial bit-streams.

Copyrighted digital content is always vulnerable to plagiarism attacks by its ease of copying and modification. Therefore, the concerns that exist about their protection and authentication are significant. Cryptography is a conventional technique that has been used for many decades to provide multimedia content protection. To improve the run-time performance and to avoid computational complexities, selective/partial encryption for video content security has been suggested. The selective encryption is carried out on the most significant information at a choice of different stages of the codec, such as on the original pixels, the transform coefficients, the quantization indexes, the bit-planes, the entropy coder, or the final bit-stream [4]. Encryption alters the video statistics, resulting in the issues of bit-rate overhead and format compliancy at the decoder. However, applying the encryption at the entropy coding stage minimizes these problems.

The cryptographic algorithms used for encryption are never a secret; their steps are visible to public. The object which should be hidden from the public and unauthorized access, is the ‘key’ used for encryption by cryptographic algorithms. Kerckhoffs' principle [5] declares that the rival can know the chosen cipher algorithm but not the key; thus the security of the ‘key’ is imperative for data security. Therefore, the proposed security system for scalable video coding should not only involve the protection of data by encryption; it must also incorporate the protection of secret values (keys) with a solution that provides management of multiple keys with minimal overhead.

The key generation and distribution is the issue to crucially tackle in order to further enhance the security of any cipher algorithm. This field of study has not been given enough attention in the past studies on multimedia security. So, to fulfill the current need for scalable content distribution, we have devised an efficient security system including digital rights management (DRM) state, giving “sufficient encryption” [6] with a key management mechanism to further enhance the protection of the encryption key. The idea behind “sufficient encryption” is that the scalable contents should have enough security with selective encryption and must reach the user in a scrambled form, rather than be watchable with a single encryption key for all layers. Receiving the upper layers in scrambled form enhances a user’s interest in subscribing to the upper high-quality video layers.

The proposed security system includes: 1) careful selection of context adaptive binary arithmetic coding (CABAC) syntax elements (bin-strings) for selective encryption (SE), 2) application of SE on bin-strings of H.264/SVC on a per layer basis by using Advanced Encryption Standard (AES) with a cipher feedback mode (AES-CFB) in a compression-friendly and format compliant manner, and 3) a key management (generation/distribution) scheme to improve the content security of H.264/SVC layers by using the IETF standard key management protocol MIKEY.

The rest of the paper is organized in the following manner. Section II overviews the prior research in the area of H.264
protection and key management. Section III describes elements of the H.264/SVC codec, specifically CABAC entropy coding used in this study. It also reviews the key management protocol (MIKEY) and cipher algorithm (AES-CFB). Section IV describes the proposed security system features and implementation. Section V elaborates the experimental results on video appearance after application of SE with a detailed performance analysis. Finally, Section VI contains concluding remarks with some proposals for future work.

II. PRIOR RESEARCH ON H.264 SECURITY

Many multimedia selective content encryption methods have been proposed over the last five years [7][8], for the security of the latest H.264/AVC standard video codec. The study in [9] describes the SE on I-frames by extracting them from the H.264 bit-stream and using the AES algorithm for ciphering/deciphering. The scheme reduces the computational cost but it is non-format compliant. Additionally, the idea is not suitable for selective encryption, because I-frame encryption is not as significant as encryption of other encoding components, e.g. Data Part A from the Data Partition Mode of H.264/AVC [10]. The idea of scrambling DCT coefficients [11] was applied by Wang et al. [12] but it degraded the compression efficiency. Selective scrambling of bits, DCT coefficients and motion vectors was proposed by Zeng et al. [13], which also degraded the compression efficiency. Spinsante et al. [14] proposed H.264/AVC partial encryption of quantization parameters (QP), deblocking filter coefficients and Intra prediction mode, one by one, and altogether for the final outcome. The selected parameters increase the bit-rate and the encryption algorithm which is described for the results is inefficient. Fan et al. [15][16] presented a novel video encryption scheme for H.264/AVC. Three different block/stream cipher algorithms AES, FLEX (Fast Leak Extraction) and XOR are used to encrypt the H.264/AVC stream. The work describes the Unequal Secure Encryption (USE) approach in which they classified the important and un-important video content by using data partitioning. The important data content is encrypted by AES, the least important by FLEX, and the XOR technique is used to show the alternative simple encryption. The paper is a significant contribution towards H.264/AVC selective encryption but the computational cost can be further minimized by not encrypting the identified un-important video data.

Limited research has been carried out on the security of SVC [17]. Apostolopoulos [18] investigated a Secure Scalable Streaming (SSS) framework which provides end-to-end security and in-network secure transcoding for content using SVC [19][20]. The NAL level encryption for H.264/AVC and SVC is also proposed in the study of [21]. The NAL units are individually encrypted after the compression so that they have no side-effect on compression efficiency and format compliancy of the bit-stream. The scheme is applied by setting the NAL unit type of encrypted NALs outside the defined range; the decoder is forced to reject those NALs, unless encryption is enabled. The scheme is only applied to the SVC enhancement layers, with a small effect on bit-rate. In [22][23], a low-quality free preview application was already developed by performing transparent encryption/conditional access on the H.264/SVC layers. The scalable enhancement layers are encrypted by using AES, while leaving the base layer in plain format. However, the authors point out that the enhancement layers are non-format compliant.

On the other hand, there is a conviction that if the base layer is protected then no one can get the data from the enhancement layers and the whole SVC bit-stream is secured. The research shows [24] that if objects are encrypted in this way, the real content can be easily guessed without decryption. Consequently, Algin et al. [25] proposed the idea of SE on SVC with three security levels. The idea concerns the encryption of signs of coefficients, signs of Motion Vectors (MV) and alteration of DC values. The sign encryption has no effect on bit-rate and compression efficiency (as the signs are equally distributed) but DC value alterations affect the compression efficiency.

There has been some recent work on key generation/distribution for standard and scalable video coding. Li et al. [26][27] devised a NAL level selective encryption technique for H.264/SVC. The scheme encrypts the instantaneous decoding refresh (IDR) Pictures, sequence parameter set (SPS), picture parameter set (PPS) on individual NAL units [26] and intra prediction modes (IPM) with signs of textures for base layer [27] by using the stream cipher Leak Extraction (LEX) Algorithm. The LEX uses three keys for each of the three NAL units. The study pointed out required future work such as a key management scheme, which is a foremost issue in the security of any cipher algorithm. Wang et al. [28] demonstrated the idea of hierarchical key generation for the cipher algorithm to encrypt the partial H.264/AVC video content including the intra-prediction mode, motion vector differences (MVD), and quantization coefficients. Every frame has a unique key in the whole video as in the group of pictures (GOP) key generation design by Yuan et al. [29]. Three sub-keys are also derived, each for the encryption of intra-prediction mode, motion vector differences, and quantization coefficients. Therefore, if an attacker can hack the frame key, he can decipher the frame but cannot obtain the frame contents. This appears to be a derivative of MIKEY with multiple key overhead but with reduced efficacy. Perhaps, it was done to reduce the computational cost. Nevertheless, we feel that using the derivative instead of MIKEY has weakened the security specifically in wireless transmission. The study in [30] investigates the scalable layer protection with individual layer keys. The keys are generated for individual scalable NAL units, meaning N different keys are derived and distributed to decode the individual layer. The scheme is complicated and has a high computational cost for identifying the NAL units related to the scalable features. The same selective parameters for encryption are extended for the protection of the region of interest (ROI) with a stream cipher [31]. Park et al. [32][33] designed a hierarchical key management scheme for the selective encryption of SVC. The study in [32] proposed the scheme of partial encryption of base and enhancement layers. The intra prediction mode (IPM), motion vector differences and residual (texture sign bits) are encrypted in the base layer. For the security of spatial and SNR scalability layers the texture sign bits in every layer are encrypted, but for temporal scalability layers the MVD
sign bits and texture sign bits are encrypted. The authors of [33] have devised a key management scheme by creating multiple keys i.e. all layer keys are generated with the help of an MD5 hash [34]. The NAL unit key is generated by a Hash Message Authentication Code (HMAC) [35] and features created by the absolute DC and some threshold values. The key management scheme provides robustness against the known brute-force attack due to the different NAL unit keys.

All the reviewed studies have their own devised key management mechanisms but they do not refer to any standard key management protocol. The author of [36] pointed out that the earlier data communication protocols/standards had very few security features. Generally, the security was handled at a system level that uses the communication protocols. But these days the communication protocols alone cannot handle the proliferating security demands of digital devices (smart phones, tablets, netbooks, and laptops). Hence there is a need for a key management mechanism to enhance the functionality of the communication security protocols.

III. AN OVERVIEW OF H.264/SVC

A. H.264 Scalable Video Coding

H.264/SVC technology permits devices to send and receive multi-layered bit-streams; it allows the transmission and decoding of partial bit-streams to provide video services with different frame rates, spatial resolutions (picture size) and quality (SNR). Scalable video has a base and a number of enhancement layers containing various improvements in frame rates, resolution and quality per layer. Considering encryption will alter data characteristics, it should be applied where it has a minimal side effect. This can be achieved by applying the encryption as part of the entropy coding where all the natural redundancies have already been exploited for maximum compression efficiency. The problem with joint security and compression is to make sure that the security will not affect the compression efficiency [37]. Due to this the encryption at entropy coding needs to be handled with great care, since tampering with the statistical dependency of the symbols will harm the compression efficiency.

The entropy coding used in H.264/AVC and its extension H.264/SVC, is context adaptive, and is applied in the two forms of Huffman and arithmetic coding [38]. Our purpose of choosing CABAC over its Huffman counterpart, context adaptive variable length coding (CAVLC), is based on the greater range of parameters for encryption that CABAC provides over CAVLC with more compression efficiency. The H.264 Main profile and the various High profiles, which deal with 4CIF resolution pictures and above, support CABAC. Thus, it appears that the multi-scale video distribution of the future will support CABAC. Currently, the outlook is for full VGA resolution on standard streaming mobile applications (e.g. Apple's FaceTime), full 720p high definition (HD) on mobile devices and full 1080i HD for desktop streaming, which will require a reduction in bit-rate that can be supported by CABAC.

B. Context Adaptive Binary Arithmetic Coding

CABAC [39] is one of the entropy coding modes used by H.264/SVC to achieve high compression and can be easily computed on devices with medium to high computational resources. To make entropy coding computationally efficient, both CAVLC and CABAC use the single infinite extent codeword, called Exp-Golomb code [40] to generate the required code for most of the data elements [41].

CABAC encoding (Fig. 1) is based on three steps: 1) Binarization, 2) Context Modeling (CM) and 3) Binary arithmetic coding (BAC). The binarization step is the elementary stage for the CABAC encoder. Here the input non-binary syntax elements, such as the quantized transform coefficients, macroblock type specifier or motion vector components are converted into unique binary codewords, known as bin-strings, for a given syntax element. The bit position in each bin-string, known as a bin, is then passed to one of the two coding mode decisions: regular coding mode and by-pass coding mode. The bins in regular coding mode are passed to the next step, context modeling/probability distribution and then encoded by the regular BAC engine. The bins from the bypass coding mode skip the context modeling step and directly enter the bypass BAC engine for the encoding process. These bins are related to the sign information of MVD and the signs of transform coefficients levels or for lower significant bins which are assumed to be uniformly distributed.

Fig. 1. CABAC Encoder Top view

CABAC uses five binarization schemes according to syntax elements similar to Huffman trees for binary sequences, which are as follows:
(1) Unary code – each unsigned integer value symbol x ≥ 0 is mapped onto x “1” bits followed by a “0” terminating bit.
(2) Truncated unary (TU) code – defined for x with 0 ≤ x ≤ CV (cutoff value); x is coded with a unary code if x < CV. If x = CV, the terminating 0 bit is neglected and the TU codeword consists of x “1” bits only.
(3) kth order Exp-Golomb (EGk) code – it is a derivative of Golomb codes [30]. Each unsigned integer value symbol x is mapped onto two sequential bit strings: a prefix, and a suffix. The prefix part of the EGk codeword consists of a unary code with $l_s$ bits of 1 and one termination bit 0. The length $l_s$ of the prefix string of bit 1 is $l_s = \lfloor \log_2 (x/2^k + 1) \rfloor$, and the EGk suffix part is computed as the binary representation of $x + 2^k (1 - 2^{l_s})$, which uses $k + l_s$ significant bits. In the kth order EGk code, all symbols with the same $l_s$ therefore share the same codeword length, $k + 2 l_s + 1$.
(4) Fixed-length (FL) code – this FL binarization is commonly applied to syntax elements with fairly uniform distribution, where each bit in FL binary format represents a specific coding decision e.g. coded block pattern (CBP) symbol related to the luma residual data part. In FL a symbol x within a finite size of cutoff value CV is represented by a fixed number of bits, $l_s = \lceil \log_2 CV \rceil$.
(5) Concatenation of the first and third scheme (UEGk) – There are three situations where concatenations of the four basic types are used: (a) UFL – coded_block_pattern is encoded using a 4-bit FL prefix for luma and TU suffix with cutoff value CV = 2 for chroma; (b) UEG3 – motion vector differences are encoded with a concatenation of a unary prefix and a 3rd order Exp-Golomb code suffix: for a value MVD, the prefix is a TU coding with cutoff value CV = 9 of the value min(|MVD|, 9), or, if MVD = 0, just the bit 0. If |MVD| ≥ 9, a suffix is output with the value |MVD| - 9 using the EG3 code. A sign bit is then output if |MVD| > 0: 0 if MVD is positive and 1 otherwise; and, (c) UEG0 – absolute values of transform coefficient levels are coded using a TU prefix with cutoff value CV = 14 and the EG0 suffix. The syntax elements (coeff_abs_value_minus1 = abs_level – 1) are coded by using this scheme, while the zero-valued coefficient levels are encoded using a significance map.

C. Multimedia Internet Keying Protocol

MIKEY [42] is designed to tackle the key exchange problems, especially in real-time networks. The key management protocol is devised to enable end-to-end security i.e. only the participants involved in the communication have authorized access to the generated key(s) and hence to the content. MIKEY uses a total of eight keys. The keys are generated on either sender side or both sides (sender and receiver) and are described as the:

1) Traffic Generation Key (TGK)
2) Traffic Encryption Key (TEK)
3) Encryption Keys (one for each sender and receiver)
4) Authentication Keys (one for each sender and receiver)
5) Salting Keys (one for each sender and receiver)

MIKEY supports five methods for transporting/establishing a TGK or to set up a common secret, for all communication scenarios by using: a pre-shared key, public-key encryption, Diffie-Hellman (DH) key exchange, DH-HMAC (HMAC-Authenticated Diffie-Hellman), and RSA-R (Reverse RSA). MIKEY has the capability of establishing keys and parameters for more than one security protocol (or for multiple instances of the same security protocol) at the same time. The TEK can be used directly by the security protocol or it can be used to derive further master keys from the TEK. It is however up to the security protocol to define how the TEK is used.

D. Advanced Encryption Standard

The AES [43] is based on a modified substitution-permutation network. AES can use keys of lengths 128 bits, 192 bits and 256 bits. For both ciphering and deciphering, the AES algorithm uses a round function that is comprised of four different byte-oriented transformations.

AES is basically a symmetric key block cipher using 128-bit block size but it can be used as a stream cipher in Cipher Feedback (CFB), Output Feedback (OFB) and Counter (CTR) modes. In selective encryption a small number of bytes are encrypted, so implementing the AES as a stream cipher is recommended [44]. Among the above mentioned three modes, the CFB mode is used to build a self-synchronization stream cipher which provides confidentiality at transmitter and receiver. This property makes this mode a valid choice for the real-time video applications. However, the scheme could adopt another mode such as OFB, and in this case more protection against errors in transmission would become available, at a cost in lack of self-synchronization. Thus, choice of mode is not critical to the proposed scheme.

The AES is chosen for encryption because of its strength against all exhaustive key search attacks. It is estimated that the time required for breaking a 128 bit key by applying all possible keys at 50 billion keys/sec takes $5 \times 10^{21}$ years [45].

IV. PROPOSED SECURITY SYSTEM

When the same copyrighted multimedia content is distributed to multiple users with different scalable features, there is a need for transmitting scalable coded layers separately hence demanding individual layer security. We choose selective encryption on bin-strings of the CABAC entropy coder which are the input to the probability/context model and are finally coded with a binary arithmetic coder. The research aim is to devise an efficient security system that provides sufficient encryption and a key management mechanism for SVC layers.

A. Bins Selection for Selective Encryption

The CABAC coder has multiple parameters (bin-strings) which can be encrypted; for example, transform coefficients (TC), motion vector differences (MVD), delta quantization parameters (dQP) and the arithmetical signs of TC and MVD. To make the SE more effective, we need to choose sensibly the parameters for the encryption. There are two constraints in parameters selection:
1) Compression friendliness specifies that the SE must not disturb the compression efficiency of the encoder, else the SE would increase the encrypted data size to be transferred for a given bandwidth. It can be controlled by keeping the size of the encrypted bin-string (codewords) the same as the size of the input bin-string, and also by keeping the context model unchanged for the given syntax element.
2) Format compliance means the SE must not change the overall video statistics which would otherwise make the SVC decoder complain about decoding the selectively encrypted bit-stream.

To fulfill the above two constraints we can make some recommendations for SE. Some of these recommendations are made on the basis of experimental results (to be described in Section V) while others pertain to the nature of syntax elements. The SE should not be applied on:
− the intracoded syntax elements having relationship with neighboring macroblock (MB) syntax elements like Intra DC and AC: Because it increases the bit-rate and drift in the values of syntax elements and also the bit-stream will not be decodable at some stage.
− the intercoded syntax elements like motion vector differences (MVDs): Because this prediction residual, used to predict the future MBs, alters the video statistics by changing magnitudes and increases the bit-rate while the bit-stream can be decodable.
− Delta QP syntax element: Because it causes bit-rate fluctuation, either increasing or decreasing the overall bit-rate according to new encrypted dQP values.
− Macroblock header information (encoded first in CABAC encoding): Because it is used for the prediction of future MBs, this is related to format compliance.
− Coded-block-flag (CBF): Because it makes the bit-stream non-format compliant. Every 4×4 block within a MB is encoded if CBP and MB mode are set for it. The encoded 4×4 block has a CBF syntax element showing the existence of NZs in the current block.
− Unary and truncated unary (TU) bin-strings: Because they have different codeword lengths and cause the change of bit-rate.
− FL bins: Because they have the mandatory header information.

The above discussed syntax elements have disadvantages with either lower compression efficiency and/or non-format compliancy. Therefore, we have eliminated them from our proposed SE scheme.

We have proposed that the SE should be applied on the bin-strings which are equally distributed, since this does not change the compression efficiency of the codec and is encoded by the bypass BAC engine with uniform probability [39]. We found three bin-strings to fulfill our purpose of SE, these being:
i) UEG3 suffix;
ii) UEG0 suffix; and
iii) Signs of the transform coefficient levels.
The UEG3 suffix consists of the MVD sign bits if two conditions hold i.e. |MVD| ≥ 9 and 0 < |MVD| < 9; the sign bits of TC levels and the suffix of UEG0 can be encrypted only when abs_level > 14. The experimental results (Section V) show that the selected bins are fully compression friendly, format compliant, and do not alter the context models.

B. Bins selection for SVC layers

An SVC coded video has a base and a number of enhancement layers [46], depending on how the three scalabilities of temporal, spatial and quality at various resolutions are used. The selected bins must be compliant with all three scalabilities in SVC coded video, so the bins selection is done also by keeping in mind the specific scalability type as well. In SVC, every temporal layer requires the change of MVD, dQP and coefficients, while spatial layers require the change of coefficients only and SNR layers require the change of dQP and coefficients. Such SVC layer behavior shows that the UEG0 suffix and sign of coefficient levels are the most suitable parameters for the SVC layer encryption because the coefficients are changing with every scalability option. The UEG3 suffix encryption is more suitable for temporal scalability but is meaningful for all three scalabilities, as spatial and SNR are usually combined with temporal scalability. In Fig. 2, X represents the temporal scalability, Y for spatial and Z for SNR Scalability. Z0 and Z1 denote the SNR quality levels.

Fig. 2. Bins selection per spatio-SNR-temporal scalability

C. SE on Bins by AES-CFB

The SE is implemented on individual SVC coded layers by using AES-CFB, which is a stream cipher; hence it does not alter the number of output bits. The CFB mode uses an initialization vector (IV) (fixed in our implementation) and an encryption key (variable) for each data block. The single encryption key is used to encrypt all data blocks of each layer i.e. one encryption key will encrypt all three chosen bin-strings on individual layers. So the client will receive only one encryption key to decrypt and watch the subscribed data. Let the three chosen bin-strings be Ƥ1B1, Ƥ2B2, Ƥ3B3 and their encryption represented as Ƈ1B1, Ƈ2B2, Ƈ3B3; then the general cipher process can be presented as:

∑ (Ƈ1B1, Ƈ2B2, Ƈ3B3) = {∑ (Ƥ1B1, Ƥ2B2, Ƥ3B3)} XOR {Encrypt (Ƈ1B1-1, Ƈ2B2-1, Ƈ3B3-1)}   (1)

The encryption process (Fig. 3) is done in a unique way which makes the bit-rate consistent. The encryption of the sign bits is not tricky, because of constant bin sizes, although handling of UEG0 suffix bins is different, as the suffixes have variable length codewords. If the sizes of suffix codewords are changed after encryption, the bit-rate will definitely increase. So, to make the codewords compression friendly, we first count the suffix bins present in each UEG0 bin string and then perform the encryption only on the number of existing suffix bins rather than on the whole suffix allocated size. This scheme makes the encrypted codewords of the same size as the original ones.

Fig. 3. Block Diagram of SE over CABAC bins

The decryption process is reversed at the CABAC decoding side. The client is supplied with the same encryption key that was used for enciphering. The encrypted values of the bin-strings are converted into the original bin values and then passed to the inverse binarization, quantization and DCT process to get the finally de-ciphered and fully decoded bit-stream. The general de-cipher process can be represented as:

∑ (Ƥ1B1, Ƥ2B2, Ƥ3B3) = {Encrypt (Ƈ1B1-1, Ƈ2B2-1, Ƈ3B3-1)} XOR {∑ (Ƈ1B1, Ƈ2B2, Ƈ3B3)}   (2)

D. Key Management Scheme

Besides SE, another objective of the paper is to devise a key management scheme that provides users access to the layers that they have subscribed to while stopping access to
other layers. Providing scalable security requires that SE is applied on all layers of data from $Bl_0$ (base layer) to $El_n$ (top enhancement layer). If client $C_i$ has subscribed to receive the data of layer $El_i$, he must have access to the entire lower layer encryption keys (i.e. $ek_i$ to $ek_0$) to be able to decode the subscribed layer data. The management of all sets of layer $El_i$ keys for client $C_i$ is a potential security hazard especially when the scalable data has a large number of layers. Many problems arise with the generation of a large number of keys, specifically: 1) high computational cost of generating multiple keys at one time to get access to the $Bl_0$ to $El_i$ data, 2) memory consumption, and 3) time to save $ek_0$ to $ek_i$ keys which are sizeable as per the security requirements. Consequently, the goal is to derive a mechanism in which each client needs to hold a single encryption key to retrieve the subscribed layer data. A single key significantly reduces the security hazards related to key management, storage and transmission. In this work MIKEY is used to achieve this goal.

MIKEY generates the two major keys (TGK and TEK) which will further generate the lower keys in a hierarchical fashion. Table I shows the characteristics of all MIKEY keys (key length, life time and constants) with their generation/distribution summaries.

TABLE I
CHARACTERISTICS OF MIKEY KEYS
Keys | Key Length (bits) | Generation/Distribution Methods | MIKEY Constants & Parameters | Key Life Time
TGK (Master key) | 128 | Diffie-Hellman | DH prime & base values | 01 month
TEK | 128 | HMAC-SHA1(TGK) | 0x2AD01C64 | Daily for 12 Hrs.
Master Encryption key (eK) | 128 | HMAC-SHA1(TEK) | 0x15798CEF | For Session
Authentication Key (aK) | 160 | HMAC-SHA1(TEK) | 0x1B5C7973 | Unique for every User
Salt Keys (sK) | 112 | HMAC-SHA1(TEK) | 0x39A2C14B | Daily for 12 Hrs.

TGK is generated by the Diffie-Hellman algorithm and it generates TEK, while TEK further generates the master encryption key, authentication key and salt key. The purpose of salt key generation is to enhance the security by altering some bytes of TEK on a daily basis and thus stop look-up table based attacks. The few bytes of the salt key are replaced in the TEK and after 12 hours use of TEK, the salted TEK will be used for the next 12 hours. The general equations for the overall keys generation scheme are:

$\mathrm{TGK} \rightarrow g^{sr} \bmod p$ (Diffie-Hellman)   (3)

where p is a prime number, g is the generator, and sr denotes the sender and receiver RAND values.

$\mathrm{TEK} \rightarrow \mathrm{HMAC}(\mathrm{TGK},\ \text{MIKEY Constant} \,\|\, \mathrm{RAND},\ \text{TEK length})$   (4)

$\text{Master } ek \rightarrow \mathrm{HMAC}(\mathrm{TEK},\ \text{MIKEY eK Constant} \,\|\, \mathrm{RAND},\ \text{eK length})$   (5)

layer $El_n$ encryption key $ek_n$ will generate its immediate lower layer $El_{n-1}$ key $ek_{n-1}$, $ek_{n-1}$ will generate the $ek_{n-2}$ key and so on. This key generation is carried out at the client side. All the recursively derived keys will be stored in the working memory. The generalized concept of encryption keys generation for lower SVC layers is represented as:

$ek_n \rightarrow \mathrm{HMAC}(\mathrm{TEK},\ ek_n\ \text{Constant} \,\|\, \mathrm{RAND},\ ek_n\ \text{length})$   (8)

$ek_{n-1} \rightarrow \mathrm{HMAC}(ek_n,\ ek_{n-1}\ \text{Constant} \,\|\, \mathrm{RAND},\ ek_{n-1}\ \text{length})$   (9)

$ek_{n-2} \rightarrow \mathrm{HMAC}(ek_{n-1},\ ek_{n-2}\ \text{Constant} \,\|\, \mathrm{RAND},\ ek_{n-2}\ \text{length})$   (10)

RAND is generated according to the PRF (a keyed pseudo-random function) in [42]. The overall key generation scheme is shown in Fig. 4.

Fig. 4. Keys generation mechanism

After the key generation and distribution, the proposed solution will provide client authentication and SE of the layers by using the AES-CFB stream cipher algorithm. The idea behind the SE of scalable layers can be understood from Fig. 5.

Fig. 5. Keys per SVC layers (rows El2, El1 and Bl0 with keys ek2, ek1 and ek0, across Frames 1 to 5)

Three ascending order scalable layers are shown in Fig. 5; the lowest is the base layer and the upper two are enhancement layers. The term ‘frame’ in Fig. 5 generalizes I, P and B frames with their respective contents. Fig. 5 shows that SE is applied by key $ek_0$ on the base layer video frames 1 and 5 (horizontal lines patterns). Three video frames 1, 3 and 5 are on the first enhancement layer $El_1$; frames 1 and 5 are already encrypted by $ek_0$; only frame 3 (vertical lines) belongs to the $El_1$, so SE is applied on frame 3 only by key $ek_1$. This process of encryption is continued on all the above layers. The frames which are already encrypted on lower layers will not be re-encrypted on upper layers. Only the respective layer frame(s) will be encrypted with the corresponding layer encryption key. The equations for the SE on bit-streams within layers are:

$ek_2\ (\mathrm{SE}) \rightarrow El_2\ \text{Frames} - El_1\ \text{Frames}$   (11)

$ek_1 \rightarrow El_1\ \text{Frames} - Bl_0\ \text{Frames}$   (12)

$ek_0 \rightarrow Bl_0\ \text{Frames}$   (13)
(5) The process of SE on frames can be generalized as:
ak Æ HMAC (TEK , MIKEY aK Constant || RAND, aK length) (6) ekn (SE) Æ Eln Frames – Eln-1 Frames (14)
skÆ HMAC (TEK , MIKEY sK Constant || RAND, sK length (7)
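The layer-wise rule in (11)-(14) amounts to a set difference between the frames visible at a layer and those already covered by the layer below. The following Python sketch illustrates it, assuming each layer is described by the cumulative set of frame numbers it contains. The function name `frames_per_key` is hypothetical, and the frame set used for El2 is an assumption for illustration (the text only states the frames of Bl0 and El1).

```python
from typing import Dict, List, Sequence, Set


def frames_per_key(layer_frames: Sequence[Set[int]]) -> Dict[int, List[int]]:
    """Map each layer index n to the frames encrypted with ek_n.

    layer_frames[n] is the cumulative set of frames present at layer n
    (index 0 is the base layer). Following Eq. (14), layer n encrypts
    only the frames that the layer below does not already contain.
    """
    assignment: Dict[int, List[int]] = {}
    previous: Set[int] = set()
    for n, frames in enumerate(layer_frames):
        assignment[n] = sorted(set(frames) - previous)
        previous = set(frames)
    return assignment


# Frames of Bl0 and El1 follow the description of Fig. 5;
# the El2 set {1, 2, 3, 4, 5} is assumed for this example.
layers = [{1, 5}, {1, 3, 5}, {1, 2, 3, 4, 5}]
result = frames_per_key(layers)
```

With these inputs frames 1 and 5 fall under ek0, frame 3 under ek1 and the remaining frames under ek2, so no frame is encrypted twice.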
The master encryption key further generates the 128-bit lower layer keys. The lower layer keys are then used to encrypt the content of the SVC lower layers by the use of self-defined constants for each layer. The keys are generated in recursively hierarchical fashion, i.e. top enhancement SVC

V. EVALUATION
The performance of the proposed SE with key management scheme has been tested with the SVC reference software (Joint Scalable Video Model) JSVM 9.19.10 version encoder. For the evaluation of results, several different test
video sequences were chosen with different combinational features such as colors with high/low contrast, motion, texture and objects etc. The experiments were performed on CIF (352 × 288 pixels/frame) resolution and 4CIF (704 × 576 pixels/frame) resolution video sequences. Both CIF and 4CIF resolution test sequences were encoded into four layers (one base with three enhancement layers) representing three temporal, two spatial and two SNR scalable levels for CIF and four temporal, two spatial and two SNR scalable levels for 4CIF resolution pictures. All test videos were encoded in a Main/High profile (for base layer encoding) and Baseline profile for enhancement layers. The Intra and Inter frames were selectively encrypted in sequence of their occurrence in a bit-stream with GOP size 8 and Intra period 16. The SE results are formulated by taking different QP values, different encoding frame rates and by calculation of computational overhead in terms of encryption/decryption and key generation timings.

To demonstrate the efficiency of our proposed scheme, we have encoded 90 frames at 30 fps for CIF and 4CIF resolution videos. Table II compares the average PSNR of 90 frames (I+P+B) with and without SE. It shows the suitability of our SE scheme for both Intra and Inter frames of CIF and higher resolution pictures. The average PSNR value of luma is in the lower range for all CIF and 4CIF resolution pictures.

TABLE II
COMPARISON OF AVERAGE PSNR (dB) OF 90 FRAMES (I+P+B) AT QP 24
Sequences (CIF)   Plain PSNR Y  SE PSNR Y  Plain PSNR U  SE PSNR U  Plain PSNR V  SE PSNR V
CITY              38.6          10.3       45.4          30.0       46.8          31.7
CONTAINER         40.8          7.4        46.4          25.0       46.7          25.0
CREW              40.3          12.0       44.9          12.0       44.7          22.7
FOOTBALL          38.5          10.6       43.1          20.6       44.0          19.4
FOREMAN           39.6          8.1        44.8          24.5       47.2          26.2
HARBOUR           37.6          7.4        44.7          21.7       45.8          34.7
ICE               41.8          10.6       48.1          29.7       48.7          25.3
MOBILE            37.6          7.3        41.3          18.8       41.1          15.0
NEWS              42.2          11.3       45.1          19.9       46.3          24.3
SOCCER            39.4          7.9        45.4          22.8       46.9          21.6
Sequences (4CIF)
CITY              37.1          10.0       44.9          26.8       46.8          29.3
HARBOUR           37.2          7.1        44.7          23.1       46.2          32.6
ICE               41.7          11.1       49.2          31.1       49.9          27.5
SOCCER            39.2          7.2        45.4          21.4       47.2          22.4

We performed the experiments at different QP values of 8, 16, 24, 32, 40 and 48 to show the independence of the proposed SE on the QP values. The graphs in Fig. 6 show the PSNR variance in YUV values with and without SE on Mobile (CIF) and ICE (4CIF) videos on different QP values for Intra and Inter frames. Both graphs verify that our SE scheme is independent of QP, and the average PSNR is still in the lowest range at all QP values.

A. Security Analysis
The robustness of the proposed key management and encryption system against various security attacks has been evaluated by the following tests.

i) Replacement Attacks: To evaluate the strength of the proposed encryption system, we performed experiments on different sequences by replacing the encrypted bits of data with constant bits and determining the PSNR values against such a guessing attack. If someone tries to guess and insert the data with the intention of improving video quality and making it watchable, they would tend to insert the specific constant bits or random strings of 0 and 1. But the proposed system is robust against such replacement/guessing attacks. As an experiment, in the News (CIF) video, we replaced the encrypted data with 0’s on MVD signs, 1’s on signs of run levels and added a constant integer value of five in the suffixes to get the video. The result is a distorted image in Fig. 7(c).

Fig. 7. Impact of replacement attack on the News (CIF) sequence encoded with 90 frames (I+P+B) and QP 24. (a) Frame # 41 [Y=42.29, U=45.15, V=46.33] dB. (b) Proposed SE [Y=11.35, U=19.92, V=24.32] dB. (c) Replacement attack [Y=3.94, U=17.58, V=19.01] dB.

ii) Video perception test with different keys: In another experiment we tested the effect of changing the keys and checked the sensitivity of video perception against variation in the keys. Let us assume a malicious user is able to guess the key to a very near exact value, i.e. there is a difference of only one or two bits as compared to the exact key value, and uses this guessed key for decryption. It is observed that the video can still not be decrypted. On the other hand, the video quality is noticeably changed with even a single bit change in the key. This test shows that a hacker’s attempt to guess the video parameters will fail until the hacker is able to guess the exact key. Let us assume the hacker has guessed the exact key (nearly impossible for a 128-bit key); even then, they will succeed for a very short time, as every time the same sequence is played, it will be encrypted with a different encryption key and will have a different perception. We did this experiment (Fig. 8) by encoding the same sequence twice; as a new encryption key is generated and used every time, a different video was produced. We also tried to decrypt with a key of only one bit changed. The results confirmed our findings, as mentioned above.
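The key sensitivity reported in test ii) can be reproduced in miniature with the Python sketch below. It is an illustration only and rests on a loud assumption: AES-CFB is not available in the Python standard library, so a SHA-256 counter-mode keystream XORed with the data is swapped in as a stand-in stream cipher. The helper names, the example key and the dummy payload are all hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256(key || counter) blocks (NOT AES-CFB)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def flip_bit(key: bytes, bit: int) -> bytes:
    """Return a copy of key with a single bit inverted."""
    changed = bytearray(key)
    changed[bit // 8] ^= 1 << (bit % 8)
    return bytes(changed)


def bit_errors(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


key = bytes(range(16))                 # made-up 128-bit key
payload = bytes(64)                    # dummy 512-bit stand-in for bin-string bits
cipher = xor_cipher(key, payload)
recovered = xor_cipher(key, cipher)                 # exact key
near_miss = xor_cipher(flip_bit(key, 0), cipher)    # key differs by one bit
errors = bit_errors(payload, near_miss)
```

With the exact key the payload is recovered, whereas a key that differs in a single bit produces output that disagrees with the payload in a large share of its 512 bits, which mirrors the behaviour described for a near-miss key.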
[Plots: PSNR (dB) against QP values (8 to 48) for the original and encrypted Y, U and V components.]
Fig. 6. PSNR variance of (a) Mobile (CIF) sequence and (b) ICE (4CIF) sequence at different QP values

Fig. 8. Impact of keys on video perception of the News (CIF) sequence encoded with 90 frames (I+P+B) and QP 24. (a) Frame #41 [Y=42.29, U=45.15, V=46.33] dB, (b) ek change by 1 bit [Y=11.34, U=19.87, V=24.78] dB, and (c) ek change by 2 bits [Y=11.37, U=19.79, V=24.70] dB.

iii) Exhaustive Key Search Attack: The exhaustive key search is a strategy to find the correct key by continuously trying every possible key in turn until the correct key is identified.
However, it is not practicable to find a 128-bit key by exhaustive key search. To quantify this security we can relate the number of generated attacks on data and keys with the Poisson probability distribution, given by $P(\mu; n) = \frac{e^{-\mu}\,\mu^{n}}{n!}$, where e is a constant equal to approx. 2.71828, µ is the average number of attacks and n is the actual number of attacks occurring in the fixed interval of time or region. P defines the probability of a given number of events (attacks) occurring in a fixed interval of time. CISCO security statistics [47] show that an attack on a host machine occurs every five minutes, translating to about 300 attacks per day. We assume that 20% of these attacks are on video and if there is one attack every hour, a continuous time Markov chain [48] can be associated with the attacks queue. Our system is robust enough to meet the security needs, as the time for a traffic encryption key (TEK) is fixed i.e. 12 hrs and after every 12 hrs TEK will be changed. Within these 12 hrs the number of attacks that can occur is not likely to successfully break the key, as the TEK will be replaced by a new one. So the previously rendered successful attack will be useless for all subsequent key changes.

B. Statistical Analysis
An image data distribution can be examined by two statistical measuring parameters which are mean µ and its standard deviation σ. The pixels within an image are highly correlated with each other in horizontal, vertical and diagonal directions. As a result, when the image is encrypted the entropy (data randomness) falls and correlation becomes high because the video frames (texture & edges) are converted into flat regions and produce artifacts in the image. During SE, pixel values were truncated to a maximum and minimum of 255 and 0 respectively. This causes the spread of dark or very bright colors across the video image, which is why correlation and data randomness increase in the encrypted video. Correlation of adjacent pixels is dependent on the local µp and σp. A statistical analysis on video was performed on the Mobile (CIF) sequence to show the impact of SE on video statistics. The mean (Table III) and standard deviation (Table IV) were determined for the local neighborhood of each pixel, before averaging across all pixels and all frames of the tested sequence.

TABLE III
MEAN (µ) OF SE FOR MOBILE (CIF) SEQUENCE WITH 90 FRAMES (I+P+B) AT DIFFERENT QP VALUES
QP Values  µ of Plain Y  µ of SE Y  µ of Plain U  µ of SE U  µ of Plain V  µ of SE V
8          135.23        46.02      113.25        111.52     131.61        126.82
16         135.31        54.31      113.29        119.12     131.74        96.29
24         135.42        53.25      113.45        119.07     131.81        145.56
32         135.53        42.29      113.51        121.24     131.93        122.98
40         135.47        41.18      113.38        111.20     131.98        97.71
48         135.28        29.09      113.42        103.92     132.07        113.83

TABLE IV
STANDARD DEVIATION (σ) OF SE FOR MOBILE (CIF) SEQUENCE WITH 90 FRAMES (I+P+B) AT DIFFERENT QP VALUES
QP Values  σ of Plain Y  σ of SE Y  σ of Plain U  σ of SE U  σ of Plain V  σ of SE V
8          63.58         44.05      21.83         26.54      26.50         38.21
16         63.50         46.81      21.76         29.23      26.40         39.52
24         63.26         44.54      21.52         24.56      26.13         34.70
32         62.82         40.29      21.14         28.08      25.66         45.99
40         61.95         38.82      20.51         28.08      25.10         44.03
48         59.01         33.62      20.37         28.44      24.74         52.51

Table IV shows the standard deviation of luma values after SE. Note these are smaller than the original video while the chroma values are larger; this produces the dark or bright color pictures. The statistical analysis shows that the luma and chroma values of the whole video are drastically changed by the proposed SE and there is no way to extrapolate/derive the encrypted parts from the un-encrypted parts.

C. Computational Overhead Analysis
The computational overhead is calculated on the basis of the additional processing time required for the encoding and decoding of test sequences with SE on whole SVC bit-stream and on per layer basis. The experiments were performed on a machine, Intel Core 2 Duo (3.33GHz) processor with 4GB RAM. Tables V(a) and V(b) show the encoding and decoding timings of the ICE (CIF) and ICE (4CIF) videos respectively at different frame rates with and without SE. It is also noted here that additional encoding and decoding delay (Tables V(a) and V(b), column no. 4 and 7) includes the keys generation time as well which is calculated separately and shown in Fig. 10. The processing delays are negligible as they fall in the range of milliseconds, verifying the efficiency of the proposed scheme on Intra and Inter frames for four-layer SVC, on both the encoder and decoder side. Fig. 9 shows the additional computational delay on encoding and decoding in milliseconds with different numbers of frames (x-axis).

Table V (a)
THE COMPUTATIONAL OVERHEAD MEASUREMENT (MILLISECONDS) FOR THE ICE (CIF) SEQUENCE AT A DIFFERENT NUMBER OF ENCODED FRAMES (I+P+B) AND QP 24
No. of frames  Encoding time without SE  Encoding time with SE  Encoding Delay  Decoding time with SE  Decoding time without SE  Decoding Delay
10             6248.72                   6227.62                21.1            454.35                 439.05                    15.3
30             19152.41                  19108.91               43.5            970.59                 937.09                    33.5
50             31933.08                  31865.18               67.9            1469.85                1423.15                   46.7
70             44753.66                  44664.46               89.2            2008.4                 1941.2                    67.2
90             57724.86                  57609.76               115.1           2490.14                2406.24                   83.9

Table V (b)
THE COMPUTATIONAL OVERHEAD MEASUREMENT (MILLISECONDS) FOR THE ICE (4CIF) SEQUENCE AT A DIFFERENT NUMBER OF ENCODED FRAMES (I+P+B) AND QP 24
No. of frames  Encoding time without SE  Encoding time with SE  Encoding Delay  Decoding time with SE  Decoding time without SE  Decoding Delay
10             21955                     22012                  55              1321.31                1285.59                   35.72
30             68432                     68552                  120             3514.5                 3440.18                   74.32
50             113956                    114193                 237             5615.69                5510.39                   105.30
70             158945                    159316                 371             7645.58                7481.89                   163.69
90             204964                    205461                 497             9719.98                9489.50                   230.48

[Plots: encoding delay and decoding delay (time in milliseconds) against the number of frames (10 to 90).]
Fig. 9. Additional encoding and decoding delay caused by SE on (a) ICE (CIF) video, and (b) ICE (4CIF) video.

Different numbers of frames were tested to study the suitability of the scheme for real-time transmissions, for
example by encoding 10, then 30, up to 90 frames. The ITU-T G.114 [49] recommends a maximum of a 150 ms one-way latency for real-time streaming over the Internet. Suppose the sender and receiver both have a maximum buffer size of 30 frames, the results (Tables V(a) and V(b)) show that the encoding and decoding overheads are negligible for both CIF and 4CIF resolutions when a sequence is encoded 30 frames at a time. Thus the proposed security system is suitable for the streaming of pre-encoded video such as in Web TV, and IPTV applications, as well as for interactive real-time communication scenarios such as video conferencing applications, to which SVC is now being applied.

The computational overhead is not only calculated by the encryption and decryption timings on an SVC bit-stream, the key generation timings are also calculated separately to show the exact computational overhead of deriving multiple keys. Fig. 10 depicts the time (in microseconds) required for generating the desired keys.

[Bar chart: key generation time in microseconds for TEK, aK, master eK, sK and the layer keys Ln to L0, with series for 2, 4, 6, 8 and 10 layers.]
Fig. 10. Keys generation timings

For each subscriber, three keys TEK, ak and a master key ek have to be derived. TEK and ak will be generated once at the client registration stage and must be unique for each client. In addition, depending upon the subscribed layers by the client, the master encryption key is generated by the system for the subscribed layer, and sent to the client. Then he derives his own encryption keys for all the lower layers. It is a hierarchical system and each layer encryption key ek1 is derived from its former layer ek0. The timings given in Fig. 10 are for the keys of the scalable layers El10, El8, El6, El4 and El2, but for generating hierarchical encryption keys these must be derived from layer El10 to El0. The experiments show that the timings of generating TEK, aK, master eK and sK are the same whether they are generated for layer El0 or layer El10. The difference is shown in the encryption keys generation timings of layers Ln to L0. If the hierarchical encryption keys are derived for just two scalable layers (base and enhancement layers), it will take 49 microseconds and if they are generated for ten layers (one base and nine enhancement layers) then it will take 109 microseconds. The frequent key generation does not cause much additional overhead on the encryption/decryption computational cost of the proposed system because of the negligible key generation time.

The computational cost is also calculated for each scalable layer. We have measured the per layer encoding/decoding time with SE on the News (CIF) sequence. The maximum processing delay time for 90 frames at 30 fps for the entire four layers encoding was 0.1124 ms and decoding processing delay was 0.1213 ms. Fig. 11 shows the impact of encryption on the individual layers; the quality of sequence and YUV PSNR values are degraded gradually with the specific layer encryption. If someone has the key for the base layer but not for the enhancement layers then he can only view the base layer contents while the enhancements remain encrypted.

Fig. 11. Impact of SE applied on layer basis on the News (CIF) sequence with 90 frames (I+P+B) and QP 24. (a) SE applied on Layer 0 [Y=14.5, U=23, V=28.3] dB, (b) SE applied on Layer 0, 1 [Y=12.9, U=21.7, V=26.8] dB, (c) SE applied on Layer 0, 1, 2 [Y=11.6, U=19.9, V=24.3] dB, and (d) SE applied entire layers [Y=11.3, U=19.8, V=24.2] dB.

Fig. 12 shows the layer-wise decryption of News (CIF) sequence. Considering four SVC layers, if someone has just the base layer key; the video on the base layer is totally clear (but with smaller resolution, SNR and frame rate). However, the client cannot view the data of three enhancement layers and the video will be the same as shown in Fig. 12(a). The keys are generated in hierarchical top-down fashion, if someone is subscribed to layer 1 data and has the layer 1 key only, then he can generate the lower layer 0 key and be able to view the video shown by Fig. 12(b). Fig. 12(d) shows the video with layer 3 key enabled and has the maximum quality.

Fig. 12. Impact of having a layer wise key for decryption on the News (CIF) sequence with 90 frames (I+P+B) and QP 24. (a) Decryption by Layer 0 key (eK0) [Y=13.86, U=23.51, V=25.99] dB, (b) Decryption by Layer 1 (eK1) key [Y=16.61, U=25.46, V=28.96] dB, (c) Decryption by Layer 2 (eK2) key [Y=41.31, U=43.66, V=45.93] dB, and (d) Decryption by Layer 3 (eK3) key [Y=42.29, U=45.15, V=46.33] dB.

D. Comparative Analysis
For comparative analysis of our proposed key management and SE scheme with the existing work, we choose eight encryption and key management methods specifically for CABAC entropy coding of H.264/SVC scalable video codec. The chosen proposed techniques are compared on the basis of the following parameters which are denoted by comparison symbol Cn in comparison Table VI:

C1- Selected parameters for encryption
C2- Compression friendliness
C3- Format compliance
C4- Entropy coding
C5- Bit-rate overhead
C6- Encryption algorithms
C7- Incorporated key management scheme for SVC layers
C8- Key management protocol

All the compared techniques are applied in the same domain of selective encryption on CABAC of H.264/SVC. The encryption proposed by Stütz et al. [21] was applied on NAL units of an SVC bit-stream and it was reported that there was a small bit-rate overhead due to the change in the number of bytes after NAL unit encryption. Recent research regarding SVC is presented by [25], which has detailed work on SVC
TABLE VI
COMPARATIVE ANALYSIS OF PROPOSED SECURITY SYSTEM

Proposed schemes              C1                                                          C2   C3   C4             C5   C6                 C7   C8
Thomas Stütz et al. [21]      NAL Unit                                                    Yes  Yes  CABAC/CAVLC    Yes  AES-ECB            No   Not specified
Gul Boztok Algin et al. [25]  DC alteration, signs of texture and MVD                     No   Yes  Not specified  Yes  XOR                No   Not specified
Chunhua Li et al. [26][27]    IDR frames, PPS, SPS, IPM, signs of texture                 No   Yes  CABAC/CAVLC    Yes  LEX stream cipher  Yes  Not specified
Yong Geun Won et al. [30]     Signs of texture, MVD and FGS                               Yes  Yes  CABAC          No   XOR stream cipher  Yes  Not specified
Yeongyun Kim et al. [31]      Region of interest (ROI) with signs of texture MVD and FGS  Yes  Yes  CABAC          No   XOR stream cipher  No   Not specified
Su Wan Park et al. [32][33]   IPM, signs of residual and MVD                              No   Yes  CABAC/CAVLC    Yes  Stream cipher      Yes  Not specified
Our scheme                    UEG3 suffix, UEG0 suffix, and signs of TC levels            Yes  Yes  CABAC          No   AES-CFB            Yes  MIKEY

layers. However, the DC value alteration in [25] damages the video statistics before compression and thus causes a bit-rate overhead. The IPM encryption [26][27][32][33] changes the video statistics, hence compression efficiency degradation increases the bit-rate. The studies in [26][27][30][33] provide complete security systems for SVC layers and complex key management schemes are proposed, without any reference to standard key management protocols. More than one key was generated per layer in these works, hence they do not solve the problem of overhead for managing multiple keys for each layer. The selective encryption presented in [30] was implemented in a similar way on ROI by Kim et al. [31] but without a key management scheme.

To summarize, we have proposed a complete security system for scalable video content protection. It incorporates the standard security algorithm AES-CFB for SE on justified SVC bin-strings; and the key management protocol (MIKEY) is used for client authentication at the registration phase and also for key generation/distribution on layer basis.

VI. CONCLUSIONS AND FUTURE WORK
In this paper, an efficient complete security system has been proposed for H.264 scalable video codec on CABAC bin-strings. The security system incorporates selective protection of the scalable layers utilizing DRM techniques [50] for client authentication at registration stage and efficient key management mechanism through MIKEY. AES-CFB is used for SE on sensibly chosen bin-strings by taking into account the security of the video, compression efficiency, bit-rate fluctuation, format compliance and scalability features (Temporal, SNR and Spatial) of H.264/SVC. The results show that our scheme is fully implementable with all scalable features (Temporal, SNR and Spatial) of SVC and with Intra and Inter coded (I, P & B) frames. The performance of the proposed system is justified by many important factors such as: a security analysis on video perception and keys, video statistical analysis after the application of pro-compression encryption, computational overhead calculation caused by SE with a keys generation process and comparative analysis with existing work. The results demonstrate that the proposed security system has no drawbacks over security, compression efficiency, bit-rate and format compliance on the decoder side, except the inevitable minimal computational overhead due to SE over SVC layers.

The significance of the proposed system is to resolve the multiple key overhead issues: the subscriber of each layer will receive only one encryption key to use, but this key will transparently open the doors of all the layers below. The proposed system is suitable for video distribution to users who have subscribed to a different video quality regarding bandwidth, storage and device rendering capabilities. The same system can be extended to ROI for bit-rate reduction in video surveillance [31][51] without any modification. The error resilience [52] issues for the proposed system can be investigated in the transmission scenarios of scalable layers as a future work.

REFERENCES
[1] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, 2003.
[2] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits Sys. Mag., vol. 4, no. 1, pp. 7-28, 2004.
[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, 2007.
[4] Y. Mao, and M. Wu, “A joint signal processing and cryptographic approach to multimedia encryption,” IEEE Trans. Image Process., vol. 15, no. 7, pp. 2061–2075, July 2006.
[5] F. Cayre, C. Fontaine, and T. Furon, “Watermarking security: Theory and practice,” IEEE Trans. Signal Process., vol. 53, no. 10, pp. 3976–3987, 2005.
[6] H. D. Engel, R. Kutil, and A. Uhl, “A symbolic transform attack on lightweight encryption based on wavelet filter parameterization,” in Proc. of ACM Multimedia and Security Workshop, MM-SEC ’06, pp. 202-207, Geneva, Switzerland, Sept. 2006.
[7] B. Furht, E. Muharemagic, and D. Socek, eds., Multimedia Encryption and Watermarking, Springer Verlag, New York, NY, 2005.
[8] A. Uhl, and A. Pommer, “Image and video encryption: From Digital Rights Management to secured personal communication,” Advances in Information Security Series, vol. 15. Springer-Verlag, New York, NY, 2005.
[9] M. Abomhara, O. Zakaria, O.O. Khalifa, A.A. Zaidan, and B.B. Zaidan, “Enhancing selective encryption for H.264/AVC using Advanced Encryption Standard,” Int. J. of Computer and Electrical Eng., vol. 2, no. 2, pp. 223-229, 2010.
[10] B. Barmada, M. M. Ghandi, E. V. Jones, and M. Ghanbari, “Prioritized transmission of data partitioned H.264 video with hierarchical QAM,” IEEE Signal Process. Lett., vol. 12, no. 8, pp. 577-580, August 2005.
[11] P. Melih, and D. Vadi, “A MPEG-2-transparent scrambling technology,” IEEE Trans. Consum. Electron., vol. 48, no. 2, pp. 345-355, May 2002.
[12] C. Wang, H.B. Yu, and M. Zheng, “A DCT-based MPEG-2 transparent scrambling algorithm,” IEEE Trans. Consum. Electron., vol. 49, no. 4, pp. 1208-1213, Nov. 2003.
[13] W. Zeng, and S. Lei, “Efficient frequency domain selective scrambling of digital video,” IEEE Trans. Multimedia, vol. 5, no. 1, pp. 118-129, March 2003.
[14] S. Spinsante, F. Chiaraluce, and E. Gambi, “Masking video information by partial encryption of H.264/AVC coding parameters,” 13th Europ. Signal Proc. Conf., 2005.
[15] Y. Fan, J. Wang, T. Ikenaga, Y. Tsunoo, and S. Goto, “An unequal secure encryption scheme for H.264/AVC video compression standard,” IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, vol. 91, no. 1, pp. 12-21, 2008.
[16] Y. Fan, J. Wang, T. Ikenaga, Y. Tsunoo, and S. Goto, “A new video encryption scheme for H.264/AVC,” Advances in Multimedia Information Processing, 2007, LNCS vol. 4810, pp. 246–255.
[17] J.R. Ohm, “Advances in scalable video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 42-56, 2005.
[18] J.G. Apostolopoulos, “Architectural principles for secure streaming & secure adaptation in the developing scalable video coding (SVC) standard,” Invited paper presented at the Network-Aware Multimedia Processing and Communications special session at IEEE ICIP 2006.
[19] S.J. Wee, and J.G. Apostolopoulos, “Secure scalable video streaming for wireless networks,” IEEE Int. Conf. Acoustics, Speech, and Sig. Proc., May 2001, pp. 2049-2052.
[20] S.J. Wee, and J.G. Apostolopoulos, “Secure scalable streaming enabling transcoding without decryption,” IEEE Int. Conf. Image Proc., Oct. 2001, pp. 437-440.
[21] T. Stütz, and A. Uhl, “Format-compliant encryption of H.264/AVC and SVC,” Tenth IEEE Int. Symposium Multimedia, Jan. 2009, pp. 446-451.
[22] E. Magli, M. Grangetto, and G. Olmo, “Conditional access techniques for H.264/AVC and H.264/SVC compressed video,” IEEE Trans. Circuits Sys. Video Technol., 2008.
[23] E. Magli, M. Grangetto, and G. Olmo, “Transparent encryption techniques for H.264/AVC and H.264/SVC compressed video,” J. of Signal Proc, vol. 91, no. 5, May 2011.
[24] C. Yuan, B.B. Zhu, Y. Wang, S. Li, and Y. Zhong, “Efficient and fully scalable encryption for MPEG-4 FGS,” in Proc. of the IEEE Int. Symposium on Circuits and Syst., May 2003, pp. 620–623.
[25] G.B. Algin, and E. T. Tunali, “Scalable video encryption of H.264 SVC Codec,” J. of Visual Communication and Image Representation, vol. 22, no. 4, pp. 353-364, May 2011.
[26] C. Li, X. Zhou, and Y. Zong, “NAL level encryption for scalable video coding,” in Proc. PCM, no. 5353, pp. 496–505, 2008.
[27] C. Li, X. Zhou, and Y. Zong, “Layered Encryption for Scalable Video Coding,” IEEE Conf. on Image and Signal Proc., Oct. 2009, pp. 1–4.
[28] X. Wang, N. Zheng, and L. Tian, “Hash key-based video encryption scheme for H.264/AVC,” Signal Processing: Image Communication, vol. 25, no. 6, pp. 427-437, Jul. 2010.
[29] C. Yuan, Y. Zhong, and Y. He, “Selective video stream encryption algorithm based on chaos,” Chinese Journal of Computers, vol. 27, no. 2, pp. 257-263, 2004.
[30] Y.G. Won, T.M. Bae, and Y.M. Ro, “Scalable protection and access control in full scalable video coding,” in Proc. 5th Int. Workshop on Digital Watermarking, 2006, LNCS vol. 4283, pp. 407–421.
[31] Y. Kim, S.H. Jin, T.M. Bae, and Y.M. Ro, “A selective video encryption for the region of interest in Scalable Video Coding,” IEEE Region 10 Conference, 2007, pp. 1-4.
[32] S.W. Park, and S.U. Shin, “An efficient encryption and key management scheme for layered access control of H.264/Scalable Video Coding,” IEICE Trans. on Information and Systems, vol. 92, no. 5, pp. 851-858, 2009.
[33] S.W. Park, and S.U. Shin, “Efficient selective encryption scheme for the H.264/Scalable Video Coding (SVC),” Int. Conf. Networked Computing and Advanced Inf. Management, 2008, pp. 371-376.
[34] R. Rivest, “The MD5 Message-Digest Algorithm,” IETF RFC 1321, Apr. 1992.
[35] H. Krawczyk, M. Bellare, and R. Canetti, “HMAC: Keyed-hashing for message authentication,” IETF RFC 2104, 1997.
[36] G.B. White, E.A. Fisch, and U.W. Pooch, Computer System and Network Security, CRC Press, Boca Raton, FL, 1995.
[37] E. Magli, M. Grangetto, and G. Olmo, “Joint source, channel coding, and secrecy,” EURASIP J. on Information Security, vol. 2007, Article ID 79048, 7 pages, 2007.
[38] G. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC Advanced Video Coding standard: overview and introduction to the fidelity range extensions,” SPIE Conf. on Applications of Digital Image Processing XXVII, 2004, pp. 454-474.
[39] D. Marpe, H. Schwarz, and T. Wiegand, “Context-adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 620–636, July 2003.
[40] J. Teuhola, “A compression method for clustered bit-vectors,” Inf. Proc. Lett., vol. 7, no. 6, pp. 308-311, 1978.
[41] M. Ghanbari, Standard codecs: Image compression to advanced video coding, 3rd edition, IET Press, London, UK, 2011.
[42] J. Arkko, E. Carrara, F. Lindholm, M. Naslund, and K. Norrman, “MIKEY: Multimedia Internet KEYing,” IETF RFC 3830, 2004.
[43] Federal Information Processing Standards Publication 197, November 26, 2001, ADVANCED ENCRYPTION STANDARD (AES), available from http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[44] M. Kuchar, “Dispelling the myths of cryptography,” Database and Network Journal, vol. 30, no. 2, pp. 3-3, 2000.
[45] B. Esslinger, “The CrypTool Script: Cryptography, Mathematics and More,” 10th edition {distributed with CrypTool version 1.4.30}, 2010.
[46] B.B. Zhu, M.D. Swanson, and S. Li, “Encryption and authentication for scalable multimedia: Current state of the art and challenges,” in Proc. SPIE Internet Multimedia Management System, vol. 5601, pp. 157-170, Oct. 2004.
[47] D. Tesch, and G. Abelar, “Security threat mitigation and response: Understanding Cisco security MARS,” Cisco Press, Indianapolis, IN, 2006.
[48] D. Malone, and W. G. Sullivan, “Guesswork and entropy,” IEEE Trans. Inf. Theory, vol. 50, no. 3, pp. 525-526, 2004.
[49] ITU-T, One-Way Transmission Time, ITU-T Recommend. G.114, Feb. 1996.
[50] E. T. Lin, A. M. Eskicioglu, R. L. Lagendijk, and E. J. Delp, “Advances in digital video content protection,” Proceedings of the IEEE, vol. 93, no. 1, pp. 171-183, 2005.
[51] J.M. Rodrigues, W. Puech, and A. Bors, “Selective encryption of human skin in JPEG images,” in Proc. IEEE Int. Conf. Image Process, pp. 1981-1984, Oct. 2006.
[52] A. Massoudi, F. Lefebvre, C.D. Vleeschouwer, B. Macq, and J.-J. Quisquater, “Overview on selective encryption of image and video, challenges and perspectives,” EURASIP J. on Information Security, [online journal] Article ID 179290, 18 pages, 2008.

Mamoona Naveed Asghar received the Bachelor's degree in Computer Science from the Islamia University of Bahawalpur, Punjab, Pakistan, and the Master's degree in Computer Science, with a major in Computer Networks Security, from the International Islamic University, Islamabad, Pakistan. She is currently a PhD student in the School of Computer Science and Electronic Engineering, University of Essex, United Kingdom.
Before her PhD, she was serving at her parent university, the Islamia University of Bahawalpur, Punjab, Pakistan, where she had been an Assistant Professor in the Department of Computer Science and IT since 2006. Her research interests include the security aspects of multimedia (audio and video), compression, encryption, steganography, secure transmission, and key management schemes for standard and scalable video.

Mohammad Ghanbari (M’78-SM’97-F’01) is an Emeritus Professor in the School of Computer Science and Electronic Engineering, University of Essex, United Kingdom. He is best known for his pioneering work on two-layer video coding, now known as SNR scalability in the standard video codecs, which earned him the Fellowship of IEEE in 2001. He has registered eleven international patents and published more than 600 technical papers on various aspects of video networking, many of which have had a fundamental influence on the field. These include video/image compression, layered/scalable video coding, video transcoding, motion estimation, and video quality metrics. He is the author or co-author of seven books, and his book Video Coding: An Introduction to Standard Codecs, published by IET Press in 1999, received the Rayleigh Prize as the best book of the year 2000 from the IET.
He has been an organizing member of several international conferences and workshops. He was the general chair of the 1997 International Workshop on Packet Video and guest editor of numerous special issues on video networking. He served as Associate Editor of IEEE Transactions on Multimedia (IEEE-T-MM) from 1998 to 2004 and represented the University of Essex as one of the six UK academic partners in the Virtual Centre of Excellence in Digital Broadcasting and Multimedia. He is a Fellow of IEEE, a Fellow of IET, and a Chartered Engineer (CEng).
