protection and key management. Section III describes elements of the H.264/SVC codec, specifically the CABAC entropy coding used in this study. It also reviews the key management protocol (MIKEY) and cipher algorithm (AES-CFB). Section IV describes the proposed security system features and implementation. Section V elaborates the experimental results on video appearance after application of SE with a detailed performance analysis. Finally, Section VI contains concluding remarks with some proposals for future work.

II. PRIOR RESEARCH ON H.264 SECURITY
Many multimedia selective content encryption methods have been proposed over the last five years [7][8] for the security of the latest H.264/AVC standard video codec. The study in [9] describes SE on I-frames by extracting them from the H.264 bit-stream and using the AES algorithm for ciphering/deciphering. The scheme reduces the computational cost but it is non-format compliant. Additionally, the idea is not suitable for selective encryption, because I-frame encryption is not as significant as encryption of other components, e.g. Data Part A from the Data Partition Mode of H.264/AVC [10]. The idea of scrambling DCT coefficients [11] was applied by Wang et al. [12] but it degraded the compression efficiency. Selective scrambling of bits, DCT coefficients and motion vectors was proposed by Zeng et al. [13], which also degraded the compression efficiency. Spinsante et al. [14] proposed H.264/AVC partial encryption of quantization parameters (QP), deblocking filter coefficients and Intra prediction mode, one by one, and altogether for the final outcome. The selected parameters increase the bit-rate and the encryption algorithm which is described for the results is inefficient. Fan et al. [15][16] presented a novel video encryption scheme for H.264/AVC. Three different block/stream cipher algorithms, AES, FLEX (Fast Leak Extraction) and XOR, are used to encrypt the H.264/AVC stream. The work describes the Unequal Secure Encryption (USE) approach in which they classified the important and unimportant video content by using data partitioning. The important data content is encrypted by AES, the least important by FLEX, and the XOR technique is used to show the alternative simple encryption. The paper is a significant contribution towards H.264/AVC selective encryption but the computational cost can be further minimized by not encrypting the identified unimportant video data.
Limited research has been carried out on the security of SVC [17]. Apostolopoulos [18] investigated a Secure Scalable Streaming (SSS) framework which provides end-to-end security and in-network secure transcoding for content using SVC [19][20]. NAL level encryption for H.264/AVC and SVC is also proposed in [21]. The NAL units are individually encrypted after compression so that the encryption has no side-effect on the compression efficiency and format compliance of the bit-stream. The scheme is applied by setting the NAL unit type of encrypted NALs outside the defined range, so that the decoder is forced to reject those NALs unless encryption is enabled. The scheme is only applied to the SVC enhancement layers, with a small effect on bit-rate. In [22][23], a low-quality free preview application was developed by performing transparent encryption/conditional access on the H.264/SVC layers. The scalable enhancement layers are encrypted by using AES, while leaving the base layer in plain format. However, the authors point out that the enhancement layers are non-format compliant.
On the other hand, there is a conviction that if the base layer is protected then no one can get the data from the enhancement layers and the whole SVC bit-stream is secured. The research in [24] shows that if objects are encrypted in this way, the real content can be easily guessed without decryption. Consequently, Algin et al. [25] proposed the idea of SE on SVC with three security levels. The idea concerns the encryption of signs of coefficients, signs of Motion Vectors (MV) and alteration of DC values. The sign encryption has no effect on bit-rate and compression efficiency (as the signs are equally distributed) but DC value alterations affect the compression efficiency.
There has been some recent work on key generation/distribution for standard and scalable video coding. Li et al. [26][27] devised a NAL level selective encryption encoding technique for H.264/SVC. The scheme encrypts the instantaneous decoding refresh (IDR) pictures, sequence parameter set (SPS) and picture parameter set (PPS) on individual NAL units [26], and intra prediction modes (IPM) with signs of textures for the base layer [27], by using the stream cipher Leak Extraction (LEX) algorithm. LEX uses three keys, one for each of the three NAL units. The study pointed out required future work such as a key management scheme, which is a foremost issue in the security of any cipher algorithm. Wang et al. [28] demonstrated the idea of hierarchical key generation for the cipher algorithm to encrypt partial H.264/AVC video content including the intra-prediction mode, motion vector differences (MVD), and quantization coefficients. Every frame has a unique key in the whole video, as in the group of pictures (GOP) key generation design by Yuan et al. [29]. Three sub-keys are also derived, one each for the encryption of intra-prediction mode, motion vector differences, and quantization coefficients. Therefore, if an attacker can hack the frame key, he can decipher the frame but cannot obtain the frame contents. This appears to be a derivative of MIKEY with multiple key overhead but with reduced efficacy. Perhaps it was done to reduce the computational cost. Nevertheless, we feel that using the derivative instead of MIKEY has weakened the security, specifically in wireless transmission. The study in [30] investigates scalable layer protection with individual layer keys. The keys are generated for individual scalable NAL units, meaning N different keys are derived and distributed to decode the individual layer. The scheme is complicated and has a high computational cost for identifying the NAL units related to the scalable features. The same selective parameters for encryption are extended for the protection of the region of interest (ROI) with a stream cipher [31]. Park et al. [32][33] designed a hierarchical key management scheme for the selective encryption of SVC. The study in [32] proposed a scheme of partial encryption of base and enhancement layers. The intra prediction mode (IPM), motion vector differences and residual (texture sign bits) are encrypted in the base layer. For the security of spatial and SNR scalability layers the texture sign bits in every layer are encrypted, but for temporal scalability layers the MVD
sign bits and texture sign bits are encrypted. The authors of [33] have devised a key management scheme by creating multiple keys, i.e. all layer keys are generated with the help of an MD5 hash [34]. The NAL unit key is generated by a Hash Message Authentication Code (HMAC) [35] and features created by the absolute DC and some threshold values. The key management scheme provides robustness against the known brute-force attack due to the different NAL unit keys.
All the reviewed studies have their own devised key management mechanisms but they do not refer to any standard key management protocol. The author of [36] pointed out that the earlier data communication protocols/standards had very few security features. Generally, the security was handled at a system level that uses the communication protocols. But these days the communication protocols alone cannot handle the proliferating security demands of digital devices (smart phones, tablets, netbooks, and laptops). Hence there is a need for a key management mechanism to enhance the functionality of the communication security protocols.

III. AN OVERVIEW OF H.264/SVC
A. H.264 Scalable Video Coding
H.264/SVC technology permits devices to send and receive multi-layered bit-streams; it allows the transmission and decoding of partial bit-streams to provide video services with different frame rates, spatial resolutions (picture size) and quality (SNR). Scalable video has a base and a number of enhancement layers containing various improvements in frame rates, resolution and quality per layer. Considering encryption will alter data characteristics, it should be applied where it has a minimal side effect. This can be achieved by applying the encryption as part of the entropy coding, where all the natural redundancies have already been exploited for maximum compression efficiency. The problem with joint security and compression is to make sure that the security will not affect the compression efficiency [37]. Due to this the encryption at entropy coding needs to be handled with great care, since tampering with the statistical dependency of the symbols will harm the compression efficiency.
The entropy coding used in H.264/AVC and its extension H.264/SVC is context adaptive, and is applied in the two forms of Huffman and arithmetic coding [38]. Our purpose of choosing CABAC over its Huffman counterpart, context adaptive variable length coding (CAVLC), is based on the greater range of parameters for encryption that CABAC provides over CAVLC with more compression efficiency. The H.264 Main profile and the various High profiles, which deal with 4CIF resolution pictures and above, support CABAC. Thus, it appears that the multi-scale video distribution of the future will support CABAC. Currently, the outlook is for full VGA resolution on standard streaming mobile applications (e.g. Apple's FaceTime), full 720p high definition (HD) on mobile devices and full 1080i HD for desktop streaming, which will require a reduction in bit-rate that can be supported by CABAC.

B. Context Adaptive Binary Arithmetic Coding
CABAC [39] is one of the entropy coding modes used by H.264/SVC to achieve high compression and can be easily computed on devices with medium to high computational resources. To make entropy coding computationally efficient, both CAVLC and CABAC use the single infinite extent codeword, called Exp-Golomb code [40], to generate the required code for most of the data elements [41].
CABAC encoding (Fig. 1) is based on three steps: 1) Binarization, 2) Context Modeling (CM) and 3) Binary arithmetic coding (BAC). The binarization step is the elementary stage of the CABAC encoder. Here the input non-binary syntax elements, such as the quantized transform coefficients, macroblock type specifier or motion vector components, are converted into unique binary codewords, known as bin-strings, for a given syntax element. Each bit position in a bin-string, known as a bin, is then passed to one of the two coding modes: regular coding mode or bypass coding mode. The bins in regular coding mode are passed to the next step, context modeling/probability distribution, and then encoded by the regular BAC engine. The bins from the bypass coding mode skip the context modeling step and directly enter the bypass BAC engine for the encoding process. These bins are related to the sign information of MVD and the signs of transform coefficient levels, or to lower significant bins which are assumed to be uniformly distributed.
Fig. 1. CABAC Encoder Top view
CABAC uses five binarization schemes according to syntax elements, similar to Huffman trees for binary sequences, which are as follows:
(1) Unary code – each unsigned integer value symbol x ≥ 0 is mapped onto x “1” bits followed by a “0” terminating bit.
(2) Truncated unary (TU) code – defined for x with 0 ≤ x ≤ CV (cutoff value); x is coded with a unary code if x < CV. If x = CV, the terminating 0 bit is neglected and the TU codeword comprises x “1” bits only.
(3) kth order Exp-Golomb (EGk) code – it is a derivative of Golomb codes [30]. Each unsigned integer value symbol x is mapped onto two sequential bit strings: a prefix and a suffix. The prefix part of the EGk codeword consists of a unary code with ls bits of 1 and one termination bit 0. The length ls of the prefix string of 1 bits is ls = ⌊log2(x / 2^k + 1)⌋, and the EGk suffix part is computed as the binary representation of x + 2^k (1 – 2^ls), which uses k + ls significant bits. The total length of a kth order EGk codeword is therefore k + 2 ls + 1 bits.
(4) Fixed-length (FL) code – this FL binarization is commonly applied to syntax elements with fairly uniform distribution, where each bit in FL binary format represents a specific coding decision, e.g. the coded block pattern (CBP) symbol related to the luma residual data part. In FL a symbol x within a finite size of cutoff value CV is represented by a codeword of fixed length lFL = ⌈log2 CV⌉ bits.
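The four basic binarization schemes of Section III-B (unary, truncated unary, EGk and fixed-length) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, bin-strings are modelled as text strings of "0" and "1", the EGk prefix length is taken as ls = ⌊log2(x / 2^k + 1)⌋, and the FL code assumes 0 ≤ x < CV with ⌈log2 CV⌉ bits. It is not taken from the JSVM reference software.

```python
def unary(x: int) -> str:
    """Unary code: x '1' bits followed by one terminating '0' bit."""
    return "1" * x + "0"


def truncated_unary(x: int, cv: int) -> str:
    """TU code with cutoff cv: unary for x < cv, only x '1' bits when x == cv."""
    if not 0 <= x <= cv:
        raise ValueError("x must satisfy 0 <= x <= cv")
    return "1" * x if x == cv else unary(x)


def exp_golomb(x: int, k: int) -> str:
    """k-th order Exp-Golomb code: unary prefix of ls ones, then k + ls suffix bits."""
    ls = ((x >> k) + 1).bit_length() - 1          # floor(log2(x / 2^k + 1))
    suffix_value = x + (1 << k) * (1 - (1 << ls))  # x + 2^k (1 - 2^ls)
    suffix_len = k + ls
    suffix = format(suffix_value, "b").zfill(suffix_len) if suffix_len else ""
    return unary(ls) + suffix


def fixed_length(x: int, cv: int) -> str:
    """FL code (assumption: 0 <= x < cv, using ceil(log2(cv)) bits)."""
    if cv < 2 or not 0 <= x < cv:
        raise ValueError("need cv >= 2 and 0 <= x < cv")
    n_bits = (cv - 1).bit_length()                 # ceil(log2(cv))
    return format(x, "b").zfill(n_bits)
```

With these definitions every EGk codeword has k + 2 ls + 1 bits, which matches the length stated for scheme (3).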
(5) Concatenation of the first and third scheme (UEGk) – there are three situations where concatenations of the four basic types are used: (a) UFL – coded_block_pattern is encoded using a 4-bit FL prefix for luma and a TU suffix with cutoff value CV = 2 for chroma; (b) UEG3 – motion vector differences are encoded with a concatenation of a unary prefix and a 3rd order Exp-Golomb code suffix: for a value MVD, the prefix is a TU coding with cutoff value CV = 9 of the value min(|MVD|, 9), or, if MVD = 0, just the bit 0. If |MVD| ≥ 9, a suffix is output with the value |MVD| – 9 using the EG3 code. A sign bit is then output if |MVD| > 0: 0 if MVD is positive and 1 otherwise; and (c) UEG0 – absolute values of transform coefficient levels are coded using a TU prefix with cutoff value CV = 14 and the EG0 suffix. The syntax elements (coeff_abs_value_minus1 = abs_level – 1) are coded by using this scheme, while the zero-valued coefficient levels are encoded using a significance map.
C. Multimedia Internet Keying Protocol
MIKEY [42] is designed to tackle the key exchange problems, especially in real-time networks. The key management protocol is devised to enable end-to-end security, i.e. only the participants involved in the communication have authorized access to the generated key(s) and hence to the content. MIKEY uses a total of eight keys. The keys are generated on either the sender side or both sides (sender and receiver) and are described as the:

1) Traffic Generation Key (TGK)
2) Traffic Encryption Key (TEK)
3) Encryption Keys (one for each sender and receiver)
4) Authentication Keys (one for each sender and receiver)
5) Salting Keys (one for each sender and receiver)

MIKEY supports five methods for transporting/establishing a TGK or to set up a common secret, for all communication scenarios, by using: a pre-shared key, public-key encryption, Diffie-Hellman (DH) key exchange, DH-HMAC (HMAC-Authenticated Diffie-Hellman), and RSA-R (Reverse RSA). MIKEY has the capability of establishing keys and parameters for more than one security protocol (or for multiple instances of the same security protocol) at the same time. The TEK can be used directly by the security protocol or it can be used to derive further master keys from the TEK. It is however up to the security protocol to define how the TEK is used.
D. Advanced Encryption Standard
The AES [43] is based on a modified substitution-permutation network. AES can use keys of lengths 128 bits, 192 bits and 256 bits. For both ciphering and deciphering, the AES algorithm uses a round function that is comprised of four different byte-oriented transformations.
AES is basically a symmetric key block cipher using a 128-bit block size but it can be used as a stream cipher in Cipher Feedback (CFB), Output Feedback (OFB) and Counter (CTR) modes. In selective encryption a small number of bytes are encrypted, so implementing the AES as a stream cipher is recommended [44]. Among the above mentioned three modes, the CFB mode is used to build a self-synchronizing stream cipher which provides confidentiality at transmitter and receiver. This property makes this mode a valid choice for real-time video applications. However, the scheme could adopt another mode such as OFB, and in this case more protection against errors in transmission would become available, at the cost of a lack of self-synchronization. Thus, the choice of mode is not critical to the proposed scheme.
The AES is chosen for encryption because of its strength against all exhaustive key search attacks. It is estimated that the time required for breaking a 128-bit key by applying all possible keys at 50 billion keys/sec is 5 × 10^21 years [45].

IV. PROPOSED SECURITY SYSTEM
When the same copyrighted multimedia content is distributed to multiple users with different scalable features, there is a need for transmitting scalable coded layers separately, hence demanding individual layer security. We choose selective encryption on bin-strings of the CABAC entropy coder, which are the input to the probability/context model and are finally coded with a binary arithmetic coder. The research aim is to devise an efficient security system that provides sufficient encryption and a key management mechanism for SVC layers.

A. Bins Selection for Selective Encryption
The CABAC coder has multiple parameters (bin-strings) which can be encrypted; for example, transform coefficients (TC), motion vector differences (MVD), delta quantization parameters (dQP) and the arithmetical signs of TC and MVD. To make the SE more effective, we need to choose sensibly the parameters for the encryption. There are two constraints in parameter selection:
1) Compression friendliness specifies that the SE must not disturb the compression efficiency of the encoder, else the SE would increase the encrypted data size to be transferred for a given bandwidth. It can be controlled by keeping the size of the encrypted bin-string (codewords) the same as the size of the input bin-string, and also by keeping the context model unchanged for the given syntax element.
2) Format compliance means the SE must not change the overall video statistics, which would otherwise make the SVC decoder complain about decoding the selectively encrypted bit-stream.
To fulfill the above two constraints we can make some recommendations for SE. Some of these recommendations are made on the basis of experimental results (to be described in Section V) while others pertain to the nature of syntax elements. The SE should not be applied on:
− the intracoded syntax elements having a relationship with neighboring macroblock (MB) syntax elements, like Intra DC and AC, because it increases the bit-rate and the drift in the values of syntax elements, and the bit-stream will not be decodable at some stage.
− the intercoded syntax elements like motion vector differences (MVDs), because this prediction residual, used to predict future MBs, alters the video statistics by changing magnitudes and increases the bit-rate, although the bit-stream remains decodable.
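The compression-friendliness constraint of Section IV-A (the encrypted bin-string keeps the size of the input bin-string) can be illustrated with a minimal Python sketch. This is an assumption-laden toy, not the proposed AES-CFB implementation: the keystream is a stand-in built from SHA-256 in counter fashion because the Python standard library has no AES, and the function names and the sample bin-string are hypothetical.

```python
import hashlib


def keystream_bits(key: bytes, n: int) -> str:
    """Stand-in keystream of n bits (SHA-256 over key || counter), NOT AES-CFB."""
    bits = ""
    counter = 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        bits += "".join(f"{byte:08b}" for byte in block)
        counter += 1
    return bits[:n]


def xor_bins(bins: str, key: bytes) -> str:
    """XOR each bin with one keystream bit; output length equals input length."""
    stream = keystream_bits(key, len(bins))
    return "".join("1" if b != s else "0" for b, s in zip(bins, stream))


# Hypothetical bin-string, e.g. bins that would be coded in bypass mode.
plain_bins = "0110100"
cipher_bins = xor_bins(plain_bins, b"layer-key")
recovered = xor_bins(cipher_bins, b"layer-key")  # XOR with the same stream decrypts
```

Because each bin is replaced by exactly one bin, the length of the bin-string is unchanged; whether the context model and format compliance are also preserved depends on which syntax elements are selected, as discussed in the constraints above.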
other layers. Providing scalable security requires that SE is applied on all layers of data from Bl0 (base layer) to El_n (top enhancement layer). If client C_i has subscribed to receive the data of layer El_i, he must have access to all the lower layer encryption keys (i.e. ek_i to ek0) to be able to decode the subscribed layer data. The management of all sets of layer El_i keys for client C_i is a potential security hazard, especially when the scalable data has a large number of layers. Many problems arise with the generation of a large number of keys, specifically: 1) high computational cost of generating multiple keys at one time to get access to the Bl0 to El_i data, 2) memory consumption, and 3) time to save the ek0 to ek_i keys, which are sizeable as per the security requirements. Consequently, the goal is to derive a mechanism in which each client needs to hold a single encryption key to retrieve the subscribed layer data. A single key significantly reduces the security hazards related to key management, storage and transmission. In this work MIKEY is used to achieve this goal.
MIKEY generates the two major keys (TGK and TEK) which will further generate the lower keys in a hierarchical fashion. Table I shows the characteristics of all MIKEY keys (key length, life time and constants) with their generation/distribution summaries.
TABLE I
CHARACTERISTICS OF MIKEY KEYS
Keys | Key Length (bits) | Generation/Distribution Methods | MIKEY Constants & Parameters | Life Time
TGK (Master key) | 128 | Diffie-Hellman | DH prime & base values | 01 month
TEK | 128 | HMAC-SHA1(TGK) | 0x2AD01C64 | Daily for 12 Hrs.
Master Encryption key (eK) | 128 | HMAC-SHA1(TEK) | 0x15798CEF | For Session
Authentication Key (aK) | 160 | HMAC-SHA1(TEK) | 0x1B5C7973 | Unique for every User
Salt Keys (sK) | 112 | HMAC-SHA1(TEK) | 0x39A2C14B | Daily for 12 Hrs.

TGK is generated by the Diffie-Hellman algorithm and it generates TEK, while TEK further generates the master encryption key, authentication key and salt key. The purpose of salt key generation is to enhance the security by altering some bytes of TEK on a daily basis and thus stop look-up table based attacks. A few bytes of the salt key are replaced in the TEK and after 12 hours use of TEK, the salted TEK will be used for the next 12 hours. The general equations for the overall key generation scheme are:
TGK → g^(sr) mod p (Diffie-Hellman) (3)
where p = prime number, g = generator, sr = sender and receiver RAND values
TEK → HMAC (TGK, MIKEY Constant || RAND, TEK length) (4)
Master ek → HMAC (TEK, MIKEY eK Constant || RAND, eK length) (5)
ak → HMAC (TEK, MIKEY aK Constant || RAND, aK length) (6)
sk → HMAC (TEK, MIKEY sK Constant || RAND, sK length) (7)
The master encryption key further generates the 128-bit lower layer keys. The lower layer keys are then used to encrypt the content of the SVC lower layers by the use of self-defined constants for each layer. The keys are generated in a recursively hierarchical fashion, i.e. the top enhancement SVC layer El_n encryption key ek_n will generate its immediate lower layer El_(n-1) key ek_(n-1), ek_(n-1) will generate the ek_(n-2) key and so on. This key generation is carried out at the client side. All the recursively derived keys will be stored in the working memory. The generalized concept of encryption key generation for lower SVC layers is represented as:
ek_n → HMAC (TEK, ek_n Constant || RAND, ek_n length) (8)
ek_(n-1) → HMAC (ek_n, ek_(n-1) Constant || RAND, ek_(n-1) length) (9)
ek_(n-2) → HMAC (ek_(n-1), ek_(n-2) Constant || RAND, ek_(n-2) length) (10)
RAND is generated according to the PRF (a keyed pseudo-random function) in [42]. The overall key generation scheme is shown in Fig. 4.
Fig. 4. Keys generation mechanism
After the key generation and distribution, the proposed solution will provide client authentication and SE of the layers by using the AES-CFB stream cipher algorithm. The idea behind the SE of scalable layers can be understood from Fig. 5.
Fig. 5. Keys per SVC layers [rows: El2 with key ek2, El1 with key ek1, Bl0 with key ek0; columns: Frame 1 to Frame 5]
Three scalable layers are shown in ascending order in Fig. 5; the lowest is the base layer and the upper two are enhancement layers. The term ‘frame’ in Fig. 5 generalizes I, P and B frames with their respective contents. Fig. 5 shows that SE is applied by key ek0 on the base layer video frames 1 and 5 (horizontal line patterns). Three video frames 1, 3 and 5 are on the first enhancement layer El1; frames 1 and 5 are already encrypted by ek0, and only frame 3 (vertical lines) belongs to El1, so SE is applied on frame 3 only by key ek1. This process of encryption is continued on all the layers above. The frames which are already encrypted on lower layers will not be re-encrypted on upper layers. Only the respective layer frame(s) will be encrypted with the corresponding layer encryption key. The equations for the SE on bit-streams within layers are:
ek2 (SE) → El2 Frames – El1 Frames (11)
ek1 → El1 Frames – Bl0 Frames (12)
ek0 → Bl0 Frames (13)
The process of SE on frames can be generalized as:
ek_n (SE) → El_n Frames – El_(n-1) Frames (14)

V. EVALUATION
The performance of the proposed SE with key management scheme has been tested with the SVC reference software (Joint Scalable Video Model) JSVM 9.19.10 version encoder. For the evaluation of results, several different test
Fig. 6. PSNR variance of (a) Mobile (CIF) sequence and (b) ICE (4CIF) sequence at different QP values [plots of PSNR (dB) against QP values 8 to 48 for the original and encrypted Y, U and V components]
Fig. 8. Impact of keys on video perception of the News (CIF) sequence encoded with 90 frames (I+P+B) and QP 24. (a) Frame #41 [Y=42.29, U=45.15, V=46.33] dB, (b) ek change by 1 bit [Y=11.34, U=19.87, V=24.78] dB, and (c) ek change by 2 bits [Y=11.37, U=19.79, V=24.70] dB.
iii) Exhaustive Key Search Attack: The exhaustive key search is a strategy to find the correct key by continuously trying every possible key in turn until the correct key is identified.
However, it is not practicable to find a 128-bit key by exhaustive key search. To quantify this security we can relate the number of generated attacks on data and keys with the Poisson probability distribution, given by P(µ; n) = e^(–µ) µ^n / n!, where e is a constant equal to approx. 2.71828, µ is the mean number of attacks and n is the actual number of attacks occurring in the fixed interval of time or region. P defines the probability of a given number of events (attacks) occurring in a fixed interval of time. CISCO security statistics [47] show that an attack on a host machine occurs every five minutes, translating to about 300 attacks per day. We assume that 20% of these attacks are on video and, if there is one attack every hour, a continuous time Markov chain [48] can be associated with the attacks queue. Our system is robust enough to meet the security needs, as the lifetime of a traffic encryption key (TEK) is fixed at 12 hrs and after every 12 hrs the TEK will be changed. Within these 12 hrs the number of attacks that can occur is not likely to successfully break the key, as the TEK will be replaced by a new one. So a previously successful attack will be useless for all subsequent key changes.
B. Statistical Analysis
An image data distribution can be examined by two statistical measuring parameters, which are the mean µ and its standard deviation σ. The pixels within an image are highly correlated with each other in horizontal, vertical and diagonal directions. As a result, when the image is encrypted the entropy (data randomness) falls and correlation becomes high because the video frames (texture & edges) are converted into flat regions and produce artifacts in the image. During SE, pixel values were truncated to a maximum and minimum of 255 and 0 respectively. This causes the spread of dark or very bright colors across the video image, which is why correlation and data randomness increase in the encrypted video. Correlation of adjacent pixels is dependent on the local µp and σp. A statistical analysis on video was performed on the Mobile (CIF) sequence to show the impact of SE on video statistics. The mean (Table III) and standard deviation (Table IV) were determined for the local neighborhood of each pixel, before averaging across all pixels and all frames of the tested sequence.
TABLE III
MEAN (µ) OF SE FOR MOBILE (CIF) SEQUENCE WITH 90 FRAMES (I+P+B) AT DIFFERENT QP VALUES
QP Values | µ of Plain Y | µ of SE Y | µ of Plain U | µ of SE U | µ of Plain V | µ of SE V
8 | 135.23 | 46.02 | 113.25 | 111.52 | 131.61 | 126.82
16 | 135.31 | 54.31 | 113.29 | 119.12 | 131.74 | 96.29
24 | 135.42 | 53.25 | 113.45 | 119.07 | 131.81 | 145.56
32 | 135.53 | 42.29 | 113.51 | 121.24 | 131.93 | 122.98
40 | 135.47 | 41.18 | 113.38 | 111.20 | 131.98 | 97.71
48 | 135.28 | 29.09 | 113.42 | 103.92 | 132.07 | 113.83
TABLE IV
STANDARD DEVIATION (σ) OF SE FOR MOBILE (CIF) SEQUENCE WITH 90 FRAMES (I+P+B) AT DIFFERENT QP VALUES
QP Values | σ of Plain Y | σ of SE Y | σ of Plain U | σ of SE U | σ of Plain V | σ of SE V
8 | 63.58 | 44.05 | 21.83 | 26.54 | 26.50 | 38.21
16 | 63.50 | 46.81 | 21.76 | 29.23 | 26.40 | 39.52
24 | 63.26 | 44.54 | 21.52 | 24.56 | 26.13 | 34.70
32 | 62.82 | 40.29 | 21.14 | 28.08 | 25.66 | 45.99
40 | 61.95 | 38.82 | 20.51 | 28.08 | 25.10 | 44.03
48 | 59.01 | 33.62 | 20.37 | 28.44 | 24.74 | 52.51
Table IV shows the standard deviation of luma values after SE. Note that these are smaller than in the original video while the chroma values are larger; this produces the dark or bright color pictures. The statistical analysis shows that the luma and chroma values of the whole video are drastically changed by the proposed SE and there is no way to extrapolate/derive the encrypted parts from the un-encrypted parts.

C. Computational Overhead Analysis
The computational overhead is calculated on the basis of the additional processing time required for the encoding and decoding of test sequences with SE on the whole SVC bit-stream and on a per layer basis. The experiments were performed on a machine with an Intel Core 2 Duo (3.33GHz) processor and 4GB RAM. Tables V(a) and V(b) show the encoding and decoding timings of the ICE (CIF) and ICE (4CIF) videos respectively at different numbers of encoded frames with and without SE. It is also noted here that the additional encoding and decoding delay (Tables V(a) and V(b), columns no. 4 and 7) includes the key generation time as well, which is calculated separately and shown in Fig. 10. The processing delays are negligible as they fall in the range of milliseconds, verifying the efficiency of the proposed scheme on Intra and Inter frames for four-layer SVC, on both the encoder and decoder side. Fig. 9 shows the additional computational delay on encoding and decoding in milliseconds with different numbers of frames (x-axis).
Table V (a)
THE COMPUTATIONAL OVERHEAD MEASUREMENT (MILLISECONDS) FOR THE ICE (CIF) SEQUENCE AT A DIFFERENT NUMBER OF ENCODED FRAMES (I+P+B) AND QP 24
No. of frames | Encoding time without SE | Encoding time with SE | Encoding Delay | Decoding time with SE | Decoding time without SE | Decoding Delay
10 | 6248.72 | 6227.62 | 21.1 | 454.35 | 439.05 | 15.3
30 | 19152.41 | 19108.91 | 43.5 | 970.59 | 937.09 | 33.5
50 | 31933.08 | 31865.18 | 67.9 | 1469.85 | 1423.15 | 46.7
70 | 44753.66 | 44664.46 | 89.2 | 2008.4 | 1941.2 | 67.2
90 | 57724.86 | 57609.76 | 115.1 | 2490.14 | 2406.24 | 83.9
Table V (b)
THE COMPUTATIONAL OVERHEAD MEASUREMENT (MILLISECONDS) FOR THE ICE (4CIF) SEQUENCE AT A DIFFERENT NUMBER OF ENCODED FRAMES (I+P+B) AND QP 24
No. of frames | Encoding time without SE | Encoding time with SE | Encoding Delay | Decoding time with SE | Decoding time without SE | Decoding Delay
10 | 21955 | 22012 | 55 | 1321.31 | 1285.59 | 35.72
30 | 68432 | 68552 | 120 | 3514.5 | 3440.18 | 74.32
50 | 113956 | 114193 | 237 | 5615.69 | 5510.39 | 105.30
70 | 158945 | 159316 | 371 | 7645.58 | 7481.89 | 163.69
90 | 204964 | 205461 | 497 | 9719.98 | 9489.50 | 230.48
Fig. 9. Additional encoding and decoding delay caused by SE on (a) ICE (CIF) video, and (b) ICE (4CIF) video. [plots of encoding delay and decoding delay, time in milliseconds, against No. of frames 10 to 90]
Different numbers of frames were tested to study the suitability of the scheme for real-time transmissions, for
Fig. 10. Keys generation timings [x-axis: keys TEK, aK, master eK, sK, and layer keys Ln to L0]
For each subscriber, three keys have to be derived: TEK, aK and a master key eK. TEK and aK are generated once, at the client registration stage, and must be unique for each client. In addition, depending on the layers to which the client has subscribed, the system generates the master encryption key for the subscribed layer and sends it to the client. The client then derives its own encryption keys for all the lower layers. It is a hierarchical system and each layer encryption key eK1 is derived from its former layer eK0. The timings given in Fig. 10 are for the keys of the scalable layers El10, El8, El6, El4 and El2, but for generating hierarchical encryption keys these must be derived from layer El10 to El0. The experiments show that the timings of generating TEK, aK, master eK and sK are the same whether they are generated for layer El0 or layer El10. The difference appears in the generation timings of the encryption keys of layers Ln to L0. If the hierarchical encryption keys are derived for just two scalable layers (base and enhancement layers), it takes 49 microseconds, and if they are generated for ten layers (one base and nine enhancement layers), it takes 109 microseconds. Because the key generation time is negligible, frequent key generation does not add much overhead to the encryption/decryption computational cost of the proposed system.

The computational cost is also calculated for each scalable layer. We have measured the per-layer encoding/decoding time with SE on the News (CIF) sequence. The maximum processing delay for 90 frames at 30 fps for encoding all four layers was 0.1124 ms, and the decoding processing delay was 0.1213 ms. Fig. 11 shows the impact of encryption on the individual layers; the quality of sequence and YUV

[Figure 12: decoded frames, panels (a) to (d)]
Fig. 12. Impact of having a layer wise key for decryption on the News (CIF) sequence with 90 frames (I+P+B) and QP 24. (a) Decryption by Layer 0 key (eK0) [Y=13.86, U=23.51, V=25.99] dB, (b) Decryption by Layer 1 key (eK1) [Y=16.61, U=25.46, V=28.96] dB, (c) Decryption by Layer 2 key (eK2) [Y=41.31, U=43.66, V=45.93] dB, and (d) Decryption by Layer 3 key (eK3) [Y=42.29, U=45.15, V=46.33] dB.

D. Comparative Analysis
To compare our proposed key management and SE scheme with the existing work, we chose eight encryption and key management methods specifically for CABAC entropy coding of the H.264/SVC scalable video codec. The chosen techniques are compared on the basis of the following parameters, which are denoted by the comparison symbol Cn in Table VI:

C1- Selected parameters for encryption
C2- Compression friendliness
C3- Format compliance
C4- Entropy coding
C5- Bit-rate overhead
C6- Encryption algorithms
C7- Incorporated key management scheme for SVC layers
C8- Key management protocol

All the compared techniques are applied in the same domain of selective encryption on CABAC of H.264/SVC. The encryption proposed by Stütz et al. [21] was applied on NAL units of an SVC bit-stream, and it was reported that there was a small bit-rate overhead due to the change in the number of bytes after NAL unit encryption. Recent research regarding SVC is presented by [25], which has detailed work on SVC
TABLE VI
COMPARATIVE ANALYSIS OF PROPOSED SECURITY SYSTEM
Proposed schemes | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8
Thomas Stütz et al. [21] | NAL unit | Yes | Yes | CABAC/CAVLC | Yes | AES-ECB | No | Not specified
Gul Boztok Algin et al. [25] | DC alteration, signs of texture and MVD | No | Yes | Not specified | Yes | XOR | No | Not specified
Chunhua Li et al. [26][27] | IDR frames, PPS, SPS, IPM, signs of texture | No | Yes | CABAC/CAVLC | Yes | LEX stream cipher | Yes | Not specified
Yong Geun Won et al. [30] | Signs of texture, MVD and FGS | Yes | Yes | CABAC | No | XOR stream cipher | Yes | Not specified
Yeongyun Kim et al. [31] | Region of interest (ROI) with signs of texture, MVD and FGS | Yes | Yes | CABAC | No | XOR stream cipher | No | Not specified
Su Wan Park et al. [32][33] | IPM, signs of residual and MVD | No | Yes | CABAC/CAVLC | Yes | Stream cipher | Yes | Not specified
Our scheme | UEG3 suffix, UEG0 suffix, and signs of TC levels | Yes | Yes | CABAC | No | AES-CFB | Yes | MIKEY
layers. However, the DC value alteration in [25] damages the video statistics before compression and thus causes a bit-rate overhead. The IPM encryption [26][27][32][33] changes the video statistics; the resulting loss of compression efficiency increases the bit-rate. The studies in [26][27][30][33] provide complete security systems for SVC layers and propose complex key management schemes, without any reference to standard key management protocols. More than one key was generated per layer in these works; hence they do not solve the problem of the overhead of managing multiple keys for each layer. The selective encryption presented in [30] was implemented in a similar way on ROI by Kim et al. [31] but without a key management scheme.

To summarize, we have proposed a complete security system for scalable video content protection. It incorporates the standard security algorithm AES-CFB for SE on justified SVC bin-strings, and the key management protocol (MIKEY) is used for client authentication at the registration phase and also for key generation/distribution on a per-layer basis.

VI. CONCLUSIONS AND FUTURE WORK
In this paper, an efficient complete security system has been proposed for the H.264 scalable video codec on CABAC bin-strings. The security system incorporates selective protection of the scalable layers, utilizing DRM techniques [50] for client authentication at the registration stage and an efficient key management mechanism through MIKEY. AES-CFB is used for SE on sensibly chosen bin-strings by taking into account the security of the video, compression efficiency, bit-rate fluctuation, format compliance and scalability features (Temporal, SNR and Spatial) of H.264/SVC. The results show that our scheme is fully implementable with all scalable features (Temporal, SNR and Spatial) of SVC and with Intra and Inter coded (I, P & B) frames. The performance of the proposed system is justified by many important factors such as: a security analysis on video perception and keys, video statistical analysis after the application of pro-compression encryption, computational overhead calculation caused by SE with a keys generation process, and comparative analysis with existing work. The results demonstrate that the proposed security system has no drawbacks over security, compression efficiency, bit-rate and format compliance on the decoder side, except the inevitable minimal computational overhead due to SE over SVC layers.

The significance of the proposed system is to resolve the multiple key overhead issues: the subscriber of each layer will receive only one encryption key to use, but this key will transparently open the doors of all the layers below. The proposed system is suitable for video distribution to users who have subscribed to a different video quality regarding bandwidth, storage and device rendering capabilities. The same system can be extended to ROI for bit-rate reduction in video surveillance [31][51] without any modification. The error resilience [52] issues for the proposed system can be investigated in the transmission scenarios of scalable layers as future work.

REFERENCES
[1] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, 2003.
[2] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits Sys. Mag., vol. 4, no. 1, pp. 7-28, 2004.
[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, 2007.
[4] Y. Mao, and M. Wu, “A joint signal processing and cryptographic approach to multimedia encryption,” IEEE Trans. Image Process., vol. 15, no. 7, pp. 2061–2075, July 2006.
[5] F. Cayre, C. Fontaine, and T. Furon, “Watermarking security: Theory and practice,” IEEE Trans. Signal Process., vol. 53, no. 10, pp. 3976–3987, 2005.
[6] H. D. Engel, R. Kutil, and A. Uhl, “A symbolic transform attack on lightweight encryption based on wavelet filter parameterization,” in Proc. of ACM Multimedia and Security Workshop, MM-SEC ’06, pp. 202-207, Geneva, Switzerland, Sept. 2006.
[7] B. Furht, E. Muharemagic, and D. Socek, eds., Multimedia Encryption and Watermarking, Springer Verlag, New York, NY, 2005.
[8] A. Uhl, and A. Pommer, “Image and video encryption: From Digital Rights Management to secured personal communication,” Advances in Information Security Series, vol. 15. Springer-Verlag, New York, NY, 2005.
[9] M. Abomhara, O. Zakaria, O.O. Khalifa, A.A. Zaidan, and B.B. Zaidan, “Enhancing selective encryption for H.264/AVC using Advanced Encryption Standard,” Int. J. of Computer and Electrical Eng., vol. 2, no. 2, pp. 223-229, 2010.
[10] B. Barmada, M. M. Ghandi, E. V. Jones, and M. Ghanbari, “Prioritized transmission of data partitioned H.264 video with hierarchical QAM,” IEEE Signal Process. Lett., vol. 12, no. 8, pp. 577-580, August 2005.
[11] P. Melih, and D. Vadi, “A MPEG-2-transparent scrambling technology,” IEEE Trans. Consum. Electron., vol. 48, no. 2, pp. 345-355, May 2002.
[12] C. Wang, H.B. Yu, and M. Zheng, “A DCT-based MPEG-2 transparent scrambling algorithm,” IEEE Trans. Consum. Electron., vol. 49, no. 4, pp. 1208–1213, Nov. 2003.
[13] W. Zeng, and S. Lei, “Efficient frequency domain selective scrambling of digital video,” IEEE Trans. Multimedia, vol. 5, no. 1, pp. 118-129, March 2003.
[14] S. Spinsante, F. Chiaraluce, and E. Gambi, “Masking video information by partial encryption of H.264/AVC coding parameters,” 13th Europ. Signal Proc. Conf., 2005.
[15] Y. Fan, J. Wang, T. Ikenaga, Y. Tsunoo, and S. Goto, “An unequal secure encryption scheme for H.264/AVC video compression standard,” IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, vol. 91, no. 1, pp. 12-21, 2008.
[16] Y. Fan, J. Wang, T. Ikenaga, Y. Tsunoo, and S. Goto, “A new video encryption scheme for H.264/AVC,” Advances in Multimedia Information Processing, 2007, LNCS vol. 4810, pp. 246–255.
[17] J.R. Ohm, “Advances in scalable video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 42-56, 2005.
[18] J.G. Apostolopoulos, “Architectural principles for secure streaming & secure adaptation in the developing scalable video coding (SVC) standard,” Invited paper presented at the Network-Aware Multimedia Processing and Communications special session at IEEE ICIP 2006.
[19] S.J. Wee, and J.G. Apostolopoulos, “Secure scalable video streaming for wireless networks,” IEEE Int. Conf. Acoustics, Speech, and Sig. Proc., May 2001, pp. 2049-2052.
[20] S.J. Wee, and J.G. Apostolopoulos, “Secure scalable streaming enabling transcoding without decryption,” IEEE Int. Conf. Image Proc., Oct. 2001, pp. 437-440.
[21] T. Stütz, and A. Uhl, “Format-compliant encryption of H.264/AVC and SVC,” Tenth IEEE Int. Symposium Multimedia, Jan. 2009, pp. 446-451.
[22] E. Magli, M. Grangetto, and G. Olmo, “Conditional access techniques for H.264/AVC and H.264/SVC compressed video,” IEEE Trans. Circuits Sys. Video Technol., 2008.
[23] E. Magli, M. Grangetto, and G. Olmo, “Transparent encryption techniques for H.264/AVC and H.264/SVC compressed video,” J. of Signal Proc, vol. 91, no. 5, May 2011.
[24] C. Yuan, B.B. Zhu, Y. Wang, S. Li, and Y. Zhong, “Efficient and fully scalable encryption for MPEG-4 FGS,” in Proc. of the IEEE Int. Symposium on Circuits and Syst., May 2003, pp. 620–623.
[25] G.B. Algin, and E. T. Tunali, “Scalable video encryption of H.264 SVC Codec,” J. of Visual Communication and Image Representation, vol. 22, no. 4, pp. 353-364, May 2011.
[26] C. Li, X. Zhou, and Y. Zong, “NAL level encryption for scalable video coding,” in Proc. PCM, no. 5353, pp. 496–505, 2008.
[27] C. Li, X. Zhou, and Y. Zong, “Layered Encryption for Scalable Video Coding,” IEEE Conf. on Image and Signal Proc., Oct. 2009, pp. 1–4.
[28] X. Wang, N. Zheng, and L. Tian, “Hash key-based video encryption scheme for H.264/AVC,” Signal Processing: Image Communication, vol. 25, no. 6, pp. 427-437, Jul. 2010.
[29] C. Yuan, Y. Zhong, and Y. He, “Selective video stream encryption algorithm based on chaos,” Chinese Journal of Computers, vol. 27, no. 2, pp. 257-263, 2004.
[30] Y.G. Won, T.M. Bae, and Y.M. Ro, “Scalable protection and access control in full scalable video coding,” in Proc. 5th Int. Workshop on Digital Watermarking, 2006, LNCS vol. 4283, pp. 407–421.
[31] Y. Kim, S.H. Jin, T.M. Bae, and Y.M. Ro, “A selective video encryption for the region of interest in Scalable Video Coding,” IEEE Region 10 Conference, 2007, pp. 1-4.
[32] S.W. Park, and S.U. Shin, “An efficient encryption and key management scheme for layered access control of H.264/Scalable Video Coding,” IEICE Trans. on Information and Systems, vol. 92, no. 5, pp. 851-858, 2009.
[33] S.W. Park, and S.U. Shin, “Efficient selective encryption scheme for the H.264/Scalable Video Coding (SVC),” Int. Conf. Networked Computing and Advanced Inf. Management, 2008, pp. 371-376.
[34] R. Rivest, “The MD5 Message-Digest Algorithm,” IETF RFC 1321, Apr. 1992.
[35] H. Krawczyk, M. Bellare, and R. Canetti, “HMAC: Keyed-hashing for message authentication,” IETF RFC 2104, 1997.
[36] G.B. White, E.A. Fisch, and U.W. Pooch, Computer System and Network Security, CRC Press, Boca Raton, FL, 1995.
[37] E. Magli, M. Grangetto, and G. Olmo, “Joint source, channel coding, and secrecy,” EURASIP J. on Information Security, vol. 2007, Article ID 79048, 7 pages, 2007.
[38] G. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC Advanced Video Coding standard: overview and introduction to the fidelity range extensions,” SPIE Conf. on Applications of Digital Image Processing XXVII, 2004, pp. 454-474.
[39] D. Marpe, H. Schwarz, and T. Wiegand, “Context-adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 620–636, July 2003.
[40] J. Teuhola, “A compression method for clustered bit-vectors,” Inf. Proc. Lett., vol. 7, no. 6, pp. 308-311, 1978.
[41] M. Ghanbari, Standard codecs: Image compression to advanced video coding, 3rd edition, IET Press, London, UK, 2011.
[42] J. Arkko, E. Carrara, F. Lindholm, M. Naslund, and K. Norrman, “MIKEY: Multimedia Internet KEYing,” IETF RFC 3830, 2004.
[43] Federal Information Processing Standards Publication 197, Advanced Encryption Standard (AES), November 26, 2001, available from http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[44] M. Kuchar, “Dispelling the myths of cryptography,” Database and Network Journal, vol. 30, no. 2, pp. 3-3, 2000.
[45] B. Esslinger, “The CrypTool Script: Cryptography, Mathematics and More,” 10th edition (distributed with CrypTool version 1.4.30), 2010.
[46] B.B. Zhu, M.D. Swanson, and S. Li, “Encryption and authentication for scalable multimedia: Current state of the art and challenges,” in Proc. SPIE Internet Multimedia Management System, vol. 5601, pp. 157-170, Oct. 2004.
[47] D. Tesch, and G. Abelar, Security threat mitigation and response: Understanding Cisco security MARS, Cisco Press, Indianapolis, IN, 2006.
[48] D. Malone, and W. G. Sullivan, “Guesswork and entropy,” IEEE Trans. Inf. Theory, vol. 50, no. 3, pp. 525-526, 2004.
[49] ITU-T, One-Way Transmission Time, ITU-T Recommend. G.114, Feb. 1996.
[50] E. T. Lin, A. M. Eskicioglu, R. L. Lagendijk, and E. J. Delp, “Advances in digital video content protection,” Proceedings of the IEEE, vol. 93, no. 1, pp. 171-183, 2005.
[51] J.M. Rodrigues, W. Puech, and A. Bors, “Selective encryption of human skin in JPEG images,” in Proc. IEEE Int. Conf. Image Process, pp. 1981-1984, Oct. 2006.
[52] A. Massoudi, F. Lefebvre, C.D. Vleeschouwer, B. Macq, and J.-J. Quisquater, “Overview on selective encryption of image and video, challenges and perspectives,” EURASIP J. on Information Security, [online journal] Article ID 179290, 18 pages, 2008.