3 SOURCE CODING Communication systems are designed to transmit the information generated by a source to some destination. Information sources may take a variety of different forms. For example, in radio broadcasting, the source is generally an audio source (voice or music). In TV broadcasting, the information source is a video source whose output is a moving image. The outputs of these sources are analog signals and, hence, the sources are called analog sources. In contrast, computers and storage devices, such as magnetic or optical disks, produce discrete outputs (usually binary or ASCII characters) and, hence, they are called discrete sources. Whether a source is analog or discrete, a digital communication system is designed to transmit information in digital form. Consequently, the output of the source must be converted to a format that can be transmitted digitally. This conversion of the source output to a digital form is generally performed by the source encoder, whose output may be assumed to be a sequence of binary digits. In this chapter, we treat source encoding based on mathematical models of information sources and a quantitative measure of the information emitted by a source. We consider the encoding of discrete sources first and then we discuss the encoding of analog sources. We begin by developing mathematical models for information sources. 3-1 MATHEMATICAL MODELS FOR INFORMATION SOURCES Any information source produces an output that is random, ie., the source Output is characterized in statistical terms. Otherwise, if the source output CHAPTER. SOURCE copInG 83 were known exactly, there would be no need to transmit it, In this section, we consider both discrete and analog information sources, and we postulate ‘mathematical models for each type of source. The simplest type of discrete source is one that emits a sequence of letters selected from a finite alphabet. For example, a binary source emits a binary sequence of the form 100101110..., where the alphabet consists of the two letters {0, 1}. More generally, a discrete information source with an alphabet of L possible letters, say {x,,t2,...,Xch emits a sequence of letters selected from the alphabet. To construct a mathematical model for a discrete source, we assume that each letter in the alphabet {xy,.x2,...,..t;} has a given probability p, of occurrence. That is. Pea P(X =n), 1SK 1 and for all shifts m. In other words, the joint probabilities for any arbitrary length sequence of source outputs are invariant under a shift in the time origin. An analog source has an output waveform x(t) that is a sample function of a stochastic process X(1). We assume that X(t) is a stationary stochastic process with autocorrelation function 4,,(t) and power spectral density ®,,(f). When X(J) is a bandlimited stochastic process, i.c., P(f)=0 for [f|>W, the sampling theorem may be used to represent X(t) as oan anw(1- xo- 3 x(5) ee iw)| 2 G11) where {X(n/2W)} denote the samples of the process X(t) taken at the sampling (Nyquist) rate of f, = 2W samples/s. Thus, by applying the sampling theorem, we may convert the output of an analog source into an equivalent 84° piarrat conmumicarions discrete-time source. Then, the source output is characterized statistically by the joint pdf p(x1,2,..., 4m) for all m>1, where X, = X(n/2W), 1 soURcE cooNc 87 Let us consider some special cases: First, if po= p= 0, the channel is called noiseless and 1(0; 0) = log, 2 = 1 bit Hence, the output specifies the input with certainty. 
On the other hand, if Po=p;=4, the channel is useless because 10: 0) = logs =0 However, if po =p, = 4, then 1(0,0) = log, § = 0.587 1(0, 1) = log, }= — 1 bit - In addition to the definition of mutual information and self-information, it is useful to define the conditional self-information as 1 |) = loa 5G T= “Hoe Psd 9) (2-5) ‘Then, by combining (3-2-1), (3-2-3), and (3-2-5), we obtain the relationship te) — Tey) G26) We interpret J(«,| y,) as the self-information about the event X =x, after having observed the event Y=y,. Since both I(x,)20 and I(x,|y)>0, it follows that 1(x;9)) <0 when I(x, {y,) > JG), and I(;,) >0 when I(x, |) < (x). Hence, the mutual information between a pair of events can be either positive, or negative, or zero. Tey) = 3-2-1 Average Mutual Information and Entropy Having defined the mutual information associated with the pair of events (x,y), which are possible outcomes of the two random variables X and Y, we can obtain the average value of the mutual information by simply weighting 1x.) by the probability of occurrence of the joint event and summing over all possible joint events. Thus, we obtain WX; ¥) = 2 2 PR yMEY) SF POY) = F Pe, ytog Pee 27 2% Peo meese pop oa as the average mutual information between X and Y. We observe that 1(X;¥)=0 whea X and Y are statistically independent. An important characteristic of the average mutual information is that 1(X;¥)®0 (sce Problem 3-4), Similarly, we define the average self-information, denoted by H(X), as Hx) = 3 esnesy => P(x) log P(x) (3-28) When X represents the alphabet of possible output letters from a source, H(X) represents the average selfinformation per source letter, and it is called the entropyt of the source. In the special case in which the letters from the source are equally probable, P(x;) = 1/n for all i, and, hence, x)= 3 hog? =logn G29) In general, H(X)0, it follows that H(X)>H(X |), with equality if and only if X and Y are statistically independent. If we interpret H(X | ¥) as the average amount of (conditional self-information) uncertainty in X after we observe ¥, and H(X) as the average amount of uncertainty (self-information) prior to the observation, then 1(X:¥) is the average amount of (mutual information) uncertainty provided about the set X by the observation of the set Y. Since H(X)> H(X | ¥), it is clear that conditioning on the observation Y does not increase the entropy. Example 3-2-4 Let us evaluate the H(X|Y) and 1(X;Y) for the binary-input, binary- output channel treated previously in Example 3-2-2 for the case where Po™ pi=p. Let the probabilities of the input symbols be P(X =0)=q and P(X =1)= 1—q. Then the entropy is H(X) = H(q)= —@ 10g q — (1-9) log (1-9) where H(q) is the binary entropy function and the conditional entropy H(X | Y) is defined by (3-2-11). A plot of H(X'| Y) as a function of q with FIGURE 322 90 oicrrat communications Conditional entropy for binary-input, binary- ‘output symmetric channel pxos 7 HOY = condenser a ‘¢= probably of ymbot X= 0 P as a parameter is shown in Fig. 3-2-2. The average mutual information 1(X; ¥) is plotted in Fig. 3-2-3. As in the preceding: example, when the conditional entropy H(X | ¥) is viewed in terms of a channel whose input is X and whose output is Y, H(X | ¥)is called the equivocation and is interpreted as the amount of average uncertainty remaining in X after observation of ¥. 
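The quantities in this example are easy to check numerically. The short sketch below (Python with NumPy; the function names and the particular values q = 0.5, p = 0.1 are our own choices for illustration) evaluates the binary entropy function H(q), the equivocation H(X | Y) defined by (3-2-11), and the average mutual information I(X; Y) = H(X) − H(X | Y) for a binary symmetric channel with crossover probability p and input probabilities (q, 1 − q).

```python
import numpy as np

def Hb(q):
    """Binary entropy function H(q) in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def bsc_equivocation(q, p):
    """H(X | Y) for a BSC with input P(X=0) = q and crossover probability p."""
    # Joint probabilities P(x, y) for x, y in {0, 1}.
    P = np.array([[q * (1 - p), q * p],
                  [(1 - q) * p, (1 - q) * (1 - p)]])
    Py = P.sum(axis=0)            # marginal P(y)
    Px_given_y = P / Py           # column j holds P(x | y = j)
    return -(P * np.log2(Px_given_y)).sum()

q, p = 0.5, 0.1
HX = Hb(q)
HXY = bsc_equivocation(q, p)
print(f"H(X) = {HX:.4f}, H(X|Y) = {HXY:.4f}, I(X;Y) = {HX - HXY:.4f} bits")
# For q = 0.5 the channel inputs are equiprobable and I(X;Y) = 1 - H(p) = 0.531 bits.
```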
FIGURE 3:23 Average mutual information for binary-input, binary-output symametric channel, exo CHARTER x souRCE coniING 9 The results given above can be generalized to more than two random variables. In particular, suppose we have a block of k random variables XjX2°-+Xq. with joint probability P(xyxy «++ x4) = P(X, = 44, X= Xz.-+++Xq =x). Then, the entropy for the block is defined as HUM Xe X= DDD Ply a) og Pl a) 213) wei et Since the joint probability P(x,x; +x.) can be factored as PCa ta x4) = POPC | y)Ps | Buta) + Pte free te 1) (3-214) it follows that HX XX y0+ Xi) = W(X) + HG |X) + (Xa | X1X2) He HG Xe Ke) = DHX Xr Ke) 215) By applying the result H(X)>H(X|¥), where X= and Y= X X20 Xue in (3-215) we obtain HX Xa XS DHX) (3-216) with equality if and only if the random variables X,.X3... statistically independent. 3-2-2 Information Measures for Continuous Random Variables ‘The definition of mutual information given above for discrete random variables may be extended in a straightforward manner to continuous random variables. In particular, if X and Y are random variables with joint pdf p(x, y) and marginal pdfs p(x) and p(y), the average mutual information between X and Y is defined as ie ply |x) XY) [freee L208 pexyp(y) 4 (3-2-17) Although the definition of the average mutual information carries over to 92 oarrat commuNcATIONS continuous random variables, the concept of self-information does not. The problem is that a continuous random variable requires an infinite number of binary digits to represent it exactly. Hence, its self-information is infinite and, therefore, its entropy is also infinite. Nevertheless, we shall define a quantity that we call the differential entropy of the continuous random variable X as H(X)= f P(x) log p(x) dx (3-218) ‘We emphasize that this quantity does nor have the physical meaning of self-information, although it may appear to be a natural extension of the definition of entropy for a discrete random variable (see Problem 3-6). By defining the average conditional entropy of X given ¥ as, aaxiy=-f f peytosee ly) dx dy (3-219) the average mutual information may be expressed as UX, ¥) = H(X)- K(X Y) or, alternatively, as U(X Y) = H(¥) - H(Y |X) In some, cases of practical interest, the random variable X is discrete and Y 1s continuous. To be specific, suppose that X has possible outcomes x,, i=1,2,...,1, and Y is described by its marginal pdf p(y). When X and ¥ are statistically dependent, we may express p(y) as Py) = & ply |x)PC) ‘The mutual information provided about the event X = x, by the occurrence of the event ¥ =y is Hexiy) = og? L20PD 9) = BT OPE) ‘Then, the average mutual information between X and Y is 1a SL oo laren —— @22y cuarrer » source coon 93 Example 3-2-5 Suppose that X is a discrete random variable with two equally probable outcomes x, =A and x= —A. Let the conditional pdfs p(y | x,), i= 1, 2, be gaussian with mean x, and variance a7. That is, pla=gece? ava 222) pu |-A)= gece ee (3-2-22) ‘The average mutual information obtained from (3-2-21) becomes. =f 20] 4) 1 rn=3f [ovr artog®+ pty | ~Ay tog LA) ay (32.23) ply) = Spy |4) +e |-A)I (3-224) In Chapter 7, it will be shown that the average mutual information 1(X; Y) given by (3-2-23) represents the channel capacity of a binary-input additive white gaussian noise channel. 3-3 CODING FOR DISCRETE SOURCES In Section 3-2 we introduced a measure for the information content associated with a discrete random variable X. 
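As a brief aside before turning to source coding: the average mutual information in (3-2-23) of Example 3-2-5 has no simple closed form, but it is easily evaluated by numerical integration. A minimal sketch is given below (Python with SciPy; the function name and the ratio A/σ = 1 are arbitrary choices for illustration, and the quoted output value is only approximate).

```python
import numpy as np
from scipy.integrate import quad

def awgn_binary_mutual_info(A, sigma):
    """Numerically evaluate I(X;Y) of (3-2-23) for equiprobable X = +/-A and
    Y = X + N, where N is zero-mean Gaussian noise with variance sigma**2."""
    def pdf(y, mean):
        return np.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    def integrand(y):
        p_plus, p_minus = pdf(y, A), pdf(y, -A)
        p_y = 0.5 * (p_plus + p_minus)      # (3-2-24)
        total = 0.0
        for p_cond in (p_plus, p_minus):
            if p_cond > 0:
                total += 0.5 * p_cond * np.log2(p_cond / p_y)
        return total

    value, _ = quad(integrand, -A - 10 * sigma, A + 10 * sigma)
    return value

print(awgn_binary_mutual_info(A=1.0, sigma=1.0))   # roughly 0.49 bits for A/sigma = 1
```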
When X is the output of a discrete source, the entropy H(X) of the source represents the average amount of information emitted by the source. In this section, we consider the process of encoding the output of a source, i.e., the process of representing the source output by a sequence of binary digits. A measure of the efficiency of a source-encoding method can be obtained by comparing the average number of binary digits per output letter from the source to the entropy H(X). ‘The encoding of a discrete source having a finite alphabet size may appear, at first glance, to be a relatively simple problem. However, this is true only when the source is memoryless, .e., when successive symbols from the source are statistically independent and each symbol is encoded separately. The discrete memoryless source (DMS) is by far the simplest model that can be devised for a physical source. Few physical sources, however, closely fit this idealized mathematical model. For example, successive output letters from a machine printing English text are expected to be statistically dependent. On the other hand, if the machine output is a computer program coded in Fortran, the sequence of output letters is expected to exhibit a much smaller dependence. In any case, we shall demonstrate that it is always more efficient to encode blocks of symbol instead of encoding each symbol separately. By making the block size sufficiently large, the average number of binary digits per output letter from the source can be made arbitrarily close to the entropy of the source. 3-31 Coding for Discrete Memoryless Sources Suppose that a DMS produces an output letter or symbol every 1, seconds. Each symbol is selected from a finite alphabet of symbols x, i=1,2,...,L, occurring with probabilities P(x,), i= 1,2,..., L. The entropy of the DMS in bits per source symbol is Hx) = 3, P(x) logs Pex) H(X). ‘The efficiency of the encoding for the DMS is defined as the ratio H(X)/R. We observe that when L is a power of 2 and the source letters are equally probable, R= H(X). Hence, a fixed-length code of R bits per symbol attains 100% efficiency. However, if L is not a power of 2 but the source symbols are still equally probable, R differs from H(X) by at most 1 bit per-symbol. When loge L>> 1, the efficiency of this encoding scheme is high. On the other hand, when L is small, the efficiency of the fixed-length code can be increased by ‘encoding a sequence of J symbols at a time. To accomplish the desired ‘encoding, we require L’ unique code words. By using sequences of N binary digits, we can accommodate 2" possible code words. NV must be selected such that N>J log: L Hence, the minimum integer value of N required is N= log LJ+1 3-4) Now the average number of bits per source symbol is N/J = R, and, thus, the ‘CHAPTER x SouRcE copNG 9S inefficiency has been reduced by approximately a factor of 1/J relative to the symbol-by-symbol encoding described above. By making J sufficiently large, the efficiency of the encoding procedure, measured by the ratio JH(X)IN, can be made as close to unity as desired. The encoding methods described above introduce no distortion since the encoding of source symbols or blocks of symbols into code words is unique. This type of encoding is called noiseless. Now, suppose we attempt to reduce the code rate R by relaxing the condition that the encoding process be,unique. For example, suppose that only a fraction of the L’ blocks of symbols is encoded uniquely. 
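Before developing this lossy block-coding idea further, it is worth checking numerically how the noiseless fixed-length scheme described above behaves as the block length J grows. A minimal sketch follows (Python; the five-letter alphabet and its probabilities are invented purely for illustration, and the function names are our own), after which the argument resumes.

```python
import math

def fixed_length_rate(L, J):
    """Bits per source letter when J-letter blocks from an L-ary alphabet are
    mapped to fixed-length binary code words, N = floor(J*log2 L) + 1 as in
    (3-3-4) (or N = J*log2 L exactly when that product is an integer)."""
    bits = J * math.log2(L)
    N = int(bits) if bits.is_integer() else int(bits) + 1
    return N / J

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0.2, 0.2, 0.1, 0.1]          # hypothetical 5-letter DMS
H = entropy(probs)
for J in (1, 2, 5, 10, 50):
    R = fixed_length_rate(len(probs), J)
    print(f"J = {J:3d}: R = {R:.3f} bits/letter, efficiency H(X)/R = {H / R:.3f}")
```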
To be specific, let us select the 2”—1 most probable J-symbol blocks and encode each of them uniquely, while the remaining 1’ ~ (2-1) J-symbol blocks are represented by the single remaining code word, This procedure results in a decoding failure ‘or (distortion) probability of error every time a low probability block is mapped into this single code word. Let P, denote this probability of error. Based on this block encoding procedure, Shannon (198a) proved the following source coding theorem. Source Coding Theorem I Let X be the ensemble of letters from a DMS with finite entropy H(X). Blocks of J symbols from the source are encoded into code words of length N from a binary alphabet. For any ¢>0, the probabilily P. of a block decoding failure can be made arbitrarily small if Rat onnyre 35) and J is sufficiently large. Conversely, if REH(X)—e€ 33-6) then P becomes arbitrarily close to 1 as J is made sufficiently large. From this theorem, we observe that the average number of bits per symbol Tequired to encode the output of 2 DMS with arbitrarily small probability of decoding failure is lower bounded by the source entropy H(X). On the other hand, if R. This property makes the code words instantaneously decodable. Code III given in Table 3-3-1 has the tree structure shown in Fig. 3-3-2. We note that in this case the code is uniquely decodable but nor instantaneously decodable, Clearly, this code does not satisfy the prefix condition. Our main objective is to devise a systematic predure for constructing uniquely decodable variable-length codes that are efficient in the sense that the average number of bits per source letter, defined as the quantity a R= > Pla) B37) is minimized. The conditions for the existence of a code that satisfies the. prefix condition are given by the Kraft inequality. Kraft Inequality A necessary and sufficient condition for the existence of a binary code with code words having lengths 7, <2 <... <7; that satisfy the prefix condition is “ Drs1 (3-3-8) First, we prove that (3-3-8) is a sufficient condition for the existence of a code that satisfies the prefix condition. To construct such a code, we begin with a full binary tree of order m =, that has 2" terminal nodes and two nodes of order k stemming from each node of order k 1, for each k, 1 =k available to be assigned to the next code word. Thus, we have constructed a code tree that is embedded in the full tree of 2" nodes as illustrated in Fig. 3-3-3, for a tree having 16 terminal nodes and @ source output consisting of five letters with m= 1, m,=2, ,=3, and ny=ns=4, To prove that (3-3-8) is a necessary condition, we observe that in the code tree of order m=n,, the number of terminal nodes eliminated from the total number of 2" terminal nodes is Dwre2 Hence, S21 and the proof of (3-3-8) is complete. ‘The Kraft inequality may be used to prove the following (noiseless) source coding theorem, which applies to codes that satisfy the prefix condition. Source Coding Theorem I Let X be the ensemble of letters from a DMS with finite entropy H(X), and output letters x4, 1 plogs—~ > parte Hoe m1 Pe kt ‘ an = 2 Pe lows (3-310) Use of the inequality In.x Pl.) > >P(x,). We begin the encoding process with the two least probable symbols x, and x;. These two symbols are tied together as shown in Fig. 3-3-4, with the upper branch assigned a 0 and the lower branch assigned a 1. The probabilities of these two branches are added together at the node where the two branches meet to yield the probability 0.01. 
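The merging procedure that this example now walks through by hand repeats until a single node of probability 1 remains, and it is compact enough to sketch in code. In the sketch below (Python, standard library only), the probabilities are those of Example 3-3-1 as best they can be read from Fig. 3-3-4; under that assumption the program reproduces H(X) ≈ 2.11 and an average length of 2.21 bits/symbol, although the individual code-word lengths may differ from the figure because, as the example goes on to note, the Huffman code is not unique.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return Huffman code-word lengths for the given symbol probabilities."""
    # Heap entries: (probability, tie-breaker, indices of the symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)      # the two least probable nodes
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                    # every symbol under the merged node gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005]   # Example 3-3-1, as read from Fig. 3-3-4
lengths = huffman_lengths(probs)
R_bar = sum(p * n for p, n in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
print(f"lengths = {lengths}, R = {R_bar:.2f}, H(X) = {H:.2f} bits/symbol")
```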
Now we have the source symbols x,,... , x5 plus a new symbol, say x4, obtained by combining 4. and x. The next step is to join the two least probable symbols from the Set Mi, X2 3, Kay Xs XG. These are x5 and x;, which have a combined probability of 0.05. The branch from x, is assigned a 0 and the branch from 4 is assigned a 1. This procedure continues until'we exhaust the set of possible source letters. The result is a code’tree with branches that contain the desired code words. The code words are obtained by beginning at the Fightmost node in the tree and proceeding to-the left. The resulting code words are listed in Fig. 3-3-4. The average number of binary digits per symbol for this code is R= 2.21 bits/symbol. The entropy of the source is 2.11 bits/symbol. We make the observation that the code is not necessarily unique, For example, at the next to the last step in the encoding procedure, we have a tie between x, and x3, since these symbols are equally probable. At this point, we chose to pair x, with x. An alternative is to pair x2 with x}. If we choose this FIGURE 335 FIGURE 335 charmer x: source copmc 101 03s 030 07 0.0: 06s 004 0003: ai 00s: ‘An alternative code for the DMS in Example 3:41 22) pairing, the resulting code is illustrated in Fig. 3-3-5. The average number of bits per source symbol for this code is also 2.21. Hence, the resulting codes are ‘equally efficient, Secondly, the assignment of a 0 to the upper branch and a 1 to the lower (less probable) branch is arbitrary. We may simply reverse the assignment of a 0 and 1 and still obtain an efficient code satisfying the prefix condition. Example 3-3-2 As a second example, let us determine the Huffman code for the output of a DMS illustrated in Fig. 3-3-6. The entropy of this source is H(X)= 2.63 bits/symbol. The Huffman code as illustrated in Fig. 3-3-6 has an average length of R = 2.70 bits/symbol. Hence, its efficiency is 0.97. ‘Hulfman code for Example 3-32 036 0 ogg ou on on ——____] : 5 % ' 5 010 an 0 n ou 229 © 100 on fi Pa 5 ‘01 007 —_______p tog ho oas 5 10 ° qi ht oot i % a” 002 r ay=20_Re27 TABLE 332 202 _diarrat conmunicaTions ‘The variable-length encoding (Huffman) algorithm described in the above examples generates a prefix code having an R that satisfies (3-3-9). However, instead of encoding on a symbol-by-symbol basis, a more efficient procedure is to encode blocks of J symbols at a time. In such a case, the bounds in (3-3-9) of source coding theorem II become JH(X)< RB, X, is AX Xi) =D HG | Xa Xa) 314) where H(X, | X,X2-+ + X,-1) is the conditional entropy of the ith symbol from the source given the previous i~1 symbols. The entropy per letter for the -symbol block is defined as HA) = EHO XX) (3-3-15) We define the information content of a stationary source as the entropy per letter in (3-3-15) in the limit as k—» «, That is, HAX) im H,(X) = lim pH, Me) (33-16) 104 o1crrat. communications ‘The existence of this limit is established below. ‘As an alternative, we may define the entropy per letter from the source in terms of the conditional entropy A(X, |X,X2"++Xr-1) im the limit as k approaches infinity. Fortunately, this limit also exists and is identical to the limit in (3-3-16). ‘That is. HX) = fim HX |XX Xa) 3-17) This result is also established below. Our development follows the approach in Gallager (1968). First, we show that HUG | Xi Xa0 0 Xp) SACK |X Xa 0 Xia) G-3-18) for k>2. 
From our previous result that conditioning on a random variable ‘cannot increase entropy, we have AUX, | Xi X20 Xe) SMe |XX Xe-1) (3-3-19) From the stationarity of the source, we have A(X, | X2Xy00 1) A(Xe-1 | XiX2-+ Xe-2) (33-20) Hence, (3-3-18) follows immediately. This result demonstrates that H(X, [X,Xz°+ + X41) is a nonincreasing sequence in k. Second, we have the result HX) = H(X | XX -+ Xi) (3-3-21) which follows immediately from (3-3-14) and (3-3-15) and the fact that the last. term in the sum of (3-3-14) is a lower bound on each of the other k — 1 terms. Third, from the definition of H,(X), we may write AO = PH Aa Xs) FAK | Xi HUG = DH) + HX Xd) 1 —~(X) + ph) which reduces to Hy(X) = Hy-(X) (3-3-2) Hence, H,(X) is a nonincreasing sequence in k. Since H,(X) and the conditional entropy H(X;-|X,--+Xz-1) are both CHAPTER x souRCE come 105 nonnegative and nonincreasing with k, both limits must exist. Their limiting forms can be established by using (3-3-14) and (3-3-15) to express H,,,(X) as Ha) = EGO Kas) Ppp Xe Mean) Heo |X Fe tH Key |X Kee) ‘Since the conditional entropy is nonincreasing, the first term in the square brackets serves as an upper bound on the other terms. Hence, HOSE HOG Xa Xd ed [Xia Xd (3323) For a fixed k, the limit of (33-23) as j—+ yields HOAX) =, A(X) > fim H(Xs | XiXa-~* Xe-a) (3-326) which establishes (3-3.17). Now suppose we have a discrete stationary source’that emits J letters with H,(X) as the entropy per letter. We can encode the sequence of J letters with a variable-length Huffman code that satisfies the prefix condition by following the procedure described in the previous section. The resulting code has an average number of bits for the J-letter block that satisfies the condition HX X) SR < A(X X41 (33.27) By dividing each term of (3-3-27) by J, we obtain the bouills on the average number R= R,/J of bits per source letter as Hye R< nian} (3-3-28) By increasing the block size J, we can approach H,(X) arbitrarily closely, and in the limit as J > @, R satisfies HAX) 2, However, in practice, the statistics of a source output are often unknown. In principle, it is possible to estimate the probabilities of the discrete source output by simply observing a long information sequence emitted by the source and obtaining the probabilities empirically. Except for the estimation of the marginal prob- abilities {p,}, corresponding to the frequency of occurrence of the individual source output letters, the computational complexity involved in estimating joint probabilities is extremely high. Consequently. the application of the Huffman coding method to-source coding for many real sources with memory is generally impractical. In contrast to the Huffman coding algorithm, the Lempel-Ziv source coding algorithm is designed to be independent of the source statistics, Hence, the Lempel-Ziv algorithm belongs to the class of universal source coding algorithms. is a variable-to-fixed-length algorithm, where the encoding is performed as described below. In the Lempel-Ziv algorithm, the sequence at the output of the discrete source is parsed into variable-length blocks, which are called phrases. A new phrase is introduced every time a block of letters from the source differs from some previous phrase in the last letter. The phrases are listed in a dictionary, which stores the location of the existing phrases. In encoding a new phrase, we simply specify the location of the existing phrase in the dictionary and append the new letter. 
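A minimal sketch of this parsing and encoding rule is given below (Python, standard library only; the short binary string and the function name are our own, chosen only to illustrate the idea ahead of the longer worked example that follows). Each code word is the fixed-length binary location of the longest previously seen phrase followed by the new source letter, with location 0 reserved for the empty phrase.

```python
def lempel_ziv_parse(bits):
    """Parse a binary string into Lempel-Ziv phrases; return the phrases and
    the (location-of-previous-phrase, new-letter) pairs used to encode them."""
    dictionary = {"": 0}                 # location 0 = the empty phrase
    phrases, pairs = [], []
    current = ""
    for b in bits:
        current += b
        if current not in dictionary:    # a new phrase ends with this letter
            dictionary[current] = len(dictionary)
            phrases.append(current)
            pairs.append((dictionary[current[:-1]], b))
            current = ""
    return phrases, pairs                # a trailing fragment matching an old phrase is dropped

source = "10101010010011101"             # short illustrative sequence
phrases, pairs = lempel_ziv_parse(source)
width = max(1, (len(phrases) - 1).bit_length())   # bits needed for a dictionary location
codewords = [format(loc, f"0{width}b") + letter for loc, letter in pairs]
print(phrases)     # ['1', '0', '10', '101', '00', '100', '11']
print(codewords)   # 4-bit code words; as in the text, such a short input is not compressed
```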
As an example, consider the binary sequence 1010101001001 11010100001 10011 1010110001101 Parsing the sequence as described above produces the following phrases: 1, 0, 10, 11, O1, 00, 100, 111, 010, 1000, O11, 001, 110, 101, 10001, 1011 We observe that each phrase in the sequence is a concatenation of a previous phrase with a new output letter from the source. To encode the phrases, we cunetem x. source cope 307 TABLE 334 DICTIONARY FOR LEMPEL-ZIV ALGORITHM Dictionary Dictionary Code lecation contents word 1 ooo 1 ‘20001 2 onto o ‘00000 3 ol 0 0010 + 100 u 0011 so! a co101 6 ow 0 0100 7 ot 100 opie 8 1000 um 1001 9 wor ow oto 0 1010 1000 oni n wou on o1o1i R100 cor ono BHO 0 01006) 4 ne wo oot sant 10001 10101 16 rot 101 construct a dictionary as shown in Table 3-3-4. The dictionary locations are numbered consecutively, beginning with 1 and counting up, in this case to 16, which is the number of phrases in the sequence. The different phrases corresponding to each location are also listed, as shown. The codewords are determined by listing the dictionary location (in binary form) of the previous Phrase that matches the new phrase in all but the last location. Then, the new output letter is appended to the dictionary location of the previous phrase. Initially, the location 0000 is used to encode a phrase that has not appeared previously. The source decoder for the code constructs an identical table at the receiving end of the communication system and decodes the received sequence accordingly. It should be observed that the table encoded 44 source bits into 16 code words of five bits each, resulting in 80 coded bits. Hence, the algorithm provided no data compression at all. However, the inefficiency is due to the fact that the sequence we have considered is very short. As the sequence is increased in length, the encoding procedure becomes more efficient and results in a compressed sequence at the output of the source. How do we select the overall length of the table? In general, no matter how farge the table is, it will eventually overflow. To solve the overflow problem, the source encoder and source decoder must agree to remove phrases from the tespective dictionaries that are not useful and substitute new phrases in their place. 108 picrrat commusicarions ‘The Lempel-Ziv algorithm is widely used in the compression of computer files. The “compress” and “uncompress” utilities under the UNIX® operating system and numerous algorithms under the MS-DOS operating system are implementations of various versions of this algorithm. 3-4 CODING FOR ANALOG SOURCES—OPTIMUM QUANTIZATION As indicated in Section 3-1, an analog source emits a message waveform x(1) that is a sample function of a stochastic process X(t). When X(t) is a bandlimited, stationary stochastic process, the sampling theorem allows us to represent X(1) by a sequence of uniform samples taken at the Nyquist rate. By applying the sampling theorem, the output of an analog source is converted to an equivalent discrete-time sequence of samples. The samples are then quantized in amplitude and encoded. One type of simple encoding is to Tepresent each discrete amplitude level by a sequence of binary digits. Hence, if we have L levels, we need R = logs L bits per sample if L is a power of 2, or R =Llog, LJ +1 if L is not a power of 2. 
On the other hand, if the levels are not equally probable, and the probabilities of the output levels are known, we may use Huffman coding (also called entropy coding) to improve the efficiency of the encoding process. Quantization of the amplitudes of the sampled signal results in data compression but it also introduces some distortion of the waveform or a loss of, signal fidelity. The minimization of this distortion is considered in this section. Many of the results given in this section apply directly to a discrete-time, continuous amplitude, memoryless gaussian source. Such a source serves as a good model for the residual error in a number of source coding methods described in Section 35, 341 Rate-Distortion Function Let us begin the discussion of signal quantization by considering the distortion introduced when the samples from the information source are quantized to a fixed number of bits. By the term “distortion,” we mean some measure of the difference between the actual source samples {x,} and the corresponding quantized values Z,, which we denote by dfs, %}. For example, a commonly used distortion measure is the squared-error distortion, defined as ie, Fe) = te — 4) G-4-1) which is used to characterize the quantization error in PCM in Section 3-5-1. Other distortion measures may take the general form (Xe, Bu) = be ~ Fal?” (3-4-2) where p takes values from the set of positive integers. The case p =2 has the advantage of being mathematically tractable. 109 If d(x. %) is the distortion measure per letter, the distortion between a sequence of n samples X, and the corresponding n quantized values X, is the average over the n source output samples, i.c., 40%, Ke) = 2S desu, 80) G-43) The source output is a random process. and, hence, the m samples in X,, are random variables. Therefore, d(X,, X,) is a random variable. Its expected value is defined as the distortion D, ie., D> FAK, KI)=7 S Aldo z= ld] 44) where the last step follows from the assumption that the source output process is stationary. Now suppose we have a memoryless source with a continucus-amplitude output X that has a pdf p(x), a quantized amplitude output alphabet X, and a per letter distortion measure d(x, ), where xeX and ¢ ¢X. Then, the minimum rate in bits per source output that is required to represent the output X of the memoryless source with a distortion less than or equal to D is called the raie-distortion function R(D) and is defined as ROY cpedti nn (OX) Ca where 1(X; X) is the average mutual information between X and X. In general, the rate R(D) decreases as D increases or. conversely, R(D) increases as D decreases. One interesting model of a continuous-amplitude, memoryless information source is the gaussian source model. In this case, Shannon proved the following, fundamental theorem on the rate-distortion function. ‘Theorem: Rate-Distortion Function for s Memoryless Gaussian Source (Shannon, 19594) The minimum information rate necessary to represent the output of a discrete-time, continuous-amplitude memoryless gaussian source based on a mean-square-error distortion measure per symbol (single letter distortion measure) is $log:(93/D) (0o3) (3-46) where o} is the variance of the gaussian source output. FIGURE 341 110 praia, commuNicaTIONS tO) ey 5 Rate distortion function fora continuous-ampitude memoryless “OOF Oa “Gs _OT 1 ‘gaussian source. ve} We should note that (3-4-6) implies that no information need be transmitted when the distortion D > a2. 
Specifically, D = 07 can be obtained by using zeros in the reconstruction of the signal. For D > 02, we can use statistically independent, zero-mean gaussian noise samples with a variance of D ~ 0? for the reconstruction. R,(D) is plotted in Fig. 3-4-1. The rate distortion function R(D) of a source is associated with the following basic source coding theorem in information theory. ‘Theorem: Source Coding with a Distortion Measure (Shannon, 1959a) ‘There exists an encoding scheme that maps the source output into code words such that for any given distortion D, the minimum rate R(D) bits per symbol (sample) is sufficient to reconstruct the source output with an average distortion that is arbitrarily close to D. It is clear, therefore, that the rate distortion function R(D) for any source represents a lower bound on the source rate that is possible for a given level of distortion. Let us return to the result in (3-4-6) for the rate distortion function of a memoryless gaussian source. If we reverse the functional dependence between D and R, we may express D in terms of R as DAR)=2*03 G47) This funcion is called the distortion-rate function for the discrete-time, memoryless gaussian source ‘When we express the distortion in (3-4-7) in dB, we obtain 10 logio Dg(R) = —6R + 10 logy a? (3-48) Note that the mean square distortion decreases at a rate of 6 dB/bit. Explicit results on the rate distortion functions for memoryless non-gaussian sources are not available. However, there are useful upper and lower bounds on the rate distortion function for any discrete-time, continuous-amplitude. memoryless source. An upper bound is given by the following theorem. ‘Theorem: Upper Bound on R(D) The rate-distortion function of a memoryless, continuous-amplitude source with zero mean and finite variance o7 with respect to the mean-square-error distortion measure is upper bounded as a R(D) = Hogs (W) When the output of this source is sampled at the Nyquist rate, the samples are uncorrelated and, since the source is gaussian, they are also statistically DIFFERENTIAL ENTROPIES AND RATE DISTORTION COMPARISONS OF FOUR COMMON PDFs FOR SIGNAL MODELS i R,D)-R“D)_DAR)— vat pte) Hay tbis/aample) (any ett) Gaussian saad Slog, (2meo?) oO 0 1 Univorm ATI VSe, ——{ogs(t203) ass Iss can ob g-VFie, ag! 2 Laplasan Mog: @e'02) e108 az Gamma viwiae, log, (4xe™"er7/3) 0709 425, TABLE 42 CHAPTER 3, souRcE coping 113 independent. Hence, the equivalent discrete-time gaussian source is memory- less, The rate-distortion function for each sample is given by (3-4-6). Therefore, the rate-distortion function for the band-limited white gaussian source in bits/s is R(D)= Whos (0 : af UK DA ~ pte dx 42ffuee- 1d —x)pc9 ae (423) TABLE 3-42 Ad icrrat cosmesications OPTIMUM STEP SIZES FOR UNIFORM QUANTIZATION OF A. GAUSSIAN RANDOM VARIABLE, Number of Optimum step Minimum MSE 10108 Dyin output levels Size ope Dawe (4B) 3 1.396 03834 44 4 os? 0.1188 ~925 ® (0.3860 o.as7as 1427 6 03382 donisa 938 2 ORS 00349 2ST In this case, the minimization of D is carried out with respect to the step-size parameter 4. By differentiating D with respect to 4, we obtain > (2k-1) FQQRK ~ 1) — x)p(x) dx +u-vf FUL ~ 18 ~ x)p(x) dx = 0 (3-4-24) where f'(x) denotes the derivative of f(x). 
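As the text goes on to note, this minimization is normally carried out numerically. The sketch below (Python with SciPy; the function names and the search interval are our own) takes the mean-square-error criterion f(x) = x² and a zero-mean, unit-variance Gaussian pdf, and simply searches the distortion of (3-4-23) over the step size Δ rather than solving (3-4-24) directly; for L = 4 levels it should land near the tabulated minimum of about 0.1188 (−9.25 dB).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def uniform_quantizer_mse(delta, L=4):
    """Mean-square error of a symmetric uniform (mid-rise) quantizer with L
    levels and step size delta, for a zero-mean, unit-variance Gaussian input."""
    def q(x):
        # index of the quantization cell, limited to the L available levels
        k = np.clip(np.floor(x / delta), -L // 2, L // 2 - 1)
        return (k + 0.5) * delta                      # reconstruction level (cell midpoint)
    pdf = lambda x: np.exp(-x * x / 2) / np.sqrt(2 * np.pi)
    integrand = lambda x: (x - q(x)) ** 2 * pdf(x)
    return quad(integrand, -10, 10, limit=200)[0]     # integrate over +/- 10 std deviations

res = minimize_scalar(uniform_quantizer_mse, bounds=(0.1, 3.0), method="bounded")
print(f"optimum step = {res.x:.4f}, MSE = {res.fun:.4f} "
      f"({10 * np.log10(res.fun):.2f} dB)")
```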
By selecting the error criterion function f(x), the solution of (3-4-24) for the optimum step size can be obtained numerically on a digital computer for any given pdf p(x), For the mean-square-error criterion, for which f(r) =", Max (1960) evaluated the optimum step size og and the minimum mean square error when the pdf p(x) is zero-mean gaussian with unit variance. Some of these resulls are given in Table 3-4-2, We observe that the minimum mean square distortion Das decreases by a little more than 5 dB for each doubling of the number of levels L. Hence, each additional bit that is employed in a uniform quantizer with optimum step size jg. for a gaussian-distributed signal amplitude reduces the distortion by more than 5 dB. By relaxing the constraint that the quantizer be uniform, the distortion can be reduced further. In this case, we let the output level be # =, when the input signal amplitude is in the range x4, <2 " fis-npeyar (3-425) which is now minimized by optimally selecting the {24} and (x,). The necessary conditions for a minimum distortion are obtained by differentiating D with respect to the {%4) and {&%J. The result of this minimization is the pair of equations Sa 44) = fan) k cub-1 G426) [Ores opeydr=0, ke12. (427) TABLE 343 TABLE 3-44 CHAPTER»: souRce copING 11S OPTIMUM FOUR-LEVEL QUANTIZER FOR A GAUSSIAN RANDOM VARIABLE Level & * % 1 09816-1510 2 00 ~n.s28 3 9816 0.4528 4 = 1510 As a special case, we again consider minimizing the mean square vatue of the distortion. In this case, f(x) = x? and, hence, (3-4-26) becomes Het Meta K=12, -1 (3-4.28) which is the midpoint between %, and ,,,. The corresponding equations determining {z,} are iE Gx ple)dx=0, k=1,2, (3-4.29) Thus, £, is the centroid of the area of p(x) between x4, and x). These equations may be solved numerically for any given p(x). Tables 3-4-3 and 3-4-4 give the results of this optimization obtained by Max OPTIMUM EIGHT-LEVEL QUANTIZER FOR A GAUSSIAN RANDOM VARIABLE (MAX, 1960) Level k * 4 t 1788 ~2182 2 1050-1344 3 05008-07560 4 © 02451 5 0.5006 02451 6 1.050 0.7360 7 1948 we 8 ® 2182 Dyan = 003454 10 log Dax, = —14.62 4B TABLE 3.45 116 prarrat communications COMPARISON OF OFTIMUM UNIFORM AND. NONUNIFORM QUANTIZERS FOR A GAUSSIAN RANDOM VARIABLE (MAX, 1960; PAEZ AND GLISSON, 1972) 10105 Dass R (its/eample) Uniform (48) Nomunitorm (4 n44 930 1402 -022 36.02 -31.89 3781 (1960) for the optimum four-level and eight-level quantizers of a gaussian distributed signal amplitude having zero mean and unit variance. In Table 3-4-5, we compare the minimum mean square distortion of a uniform quantizer to that of a nonuniform quantizer for the gaussian-distributed signal amplitude. From the results of this table, we observe that the difference in the performance of the two types of quantizers is relatively small for small values of R (less than 0.5 dB for R <3), but it increases as R increases. For example, at R=5, the nonuniform quantizer is approximately 1.5 dB better than the uniform quantizer. tis instructive to plot the minimum distortion as a function of the bit rate R= log L bits per source sample (letter) for both the uniform and nonuniform quantizers. These curves are illustrated in Fig. 3-4-2. The functional depen- dence of the distortion D on the bit rate R may be expressed as D(R). the distortion-rate function, We observe that the distortion-rate function for the optimum nonuniform quantizer falls below that of the optimum uniform quantizer. 
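The pair of necessary conditions (3-4-28) and (3-4-29) also suggests a simple iterative numerical solution (often called the Lloyd-Max iteration): alternately set each threshold to the midpoint of the adjacent output levels and each output level to the centroid of its cell. A minimal sketch for the zero-mean, unit-variance Gaussian pdf is given below (Python with SciPy; the function name, the starting levels, and the iteration count are our own choices). For L = 4 it should reproduce the entries of Table 3-4-3 (output levels near ±0.4528 and ±1.510) and a minimum distortion of about −9.30 dB.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def lloyd_max_gaussian(L=4, iterations=200):
    """Iterate (3-4-28)/(3-4-29) for a zero-mean, unit-variance Gaussian pdf."""
    levels = np.linspace(-2.0, 2.0, L)                  # arbitrary starting output levels
    for _ in range(iterations):
        thresholds = 0.5 * (levels[:-1] + levels[1:])   # (3-4-28): midpoints of adjacent levels
        edges = np.concatenate(([-np.inf], thresholds, [np.inf]))
        for k in range(L):                              # (3-4-29): each level = centroid of its cell
            a, b = edges[k], edges[k + 1]
            mass = norm.cdf(b) - norm.cdf(a)
            levels[k] = quad(lambda x: x * norm.pdf(x), a, b)[0] / mass
    thresholds = 0.5 * (levels[:-1] + levels[1:])
    edges = np.concatenate(([-np.inf], thresholds, [np.inf]))
    D = sum(quad(lambda x, c=levels[k]: (x - c) ** 2 * norm.pdf(x),
                 edges[k], edges[k + 1])[0] for k in range(L))
    return thresholds, levels, D

thresholds, levels, D = lloyd_max_gaussian(L=4)
print("thresholds:", np.round(thresholds, 4))           # about [-0.9816, 0, 0.9816]
print("levels:    ", np.round(levels, 4))               # about [-1.510, -0.4528, 0.4528, 1.510]
print(f"MSE = {D:.4f} ({10 * np.log10(D):.2f} dB)")     # about 0.1175 (-9.30 dB)
```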
Since any quantizer reduces a continuous amplitude source into a discrete amplitude source, we may treat the discrete amplitude as letters, say X ={%.15k SL), with associated probabilities {p,}. If the signal ampli- tudes are statistically independent, the discrete source is memoryless and, hence, its entropy is Ae ~ 2 pa logs px (3-4.30) For example, the optimum four-level nonuniform quantizer for the gaussian-distributed signal amplitude results in the probabilities p, = p,= 0.1635 for the two outer levels and p2 =p; = 0.3365 for the two inner levels. The entropy for the discrete source is H(X)=1.911 bits/letter. Hence, with FIGURE 3.42 TABLE 3.46 Haren» sovece counc M17 Opium ‘Optima aanun form Eniopy coding Drunionte Sy fanetion oe eS, aaetan sce \ Deki? : as iieeeaeraaseeees Robes sample Distonion versus rate curves for discrete-time memoryless gaussian source entropy coding (Hufiman coding) of blocks of output letters, we can achieve the minimum distortion of -9.30dB with 1.911 bits/letter instead of 2bits/letter. Max (1960) has given the entropy for the discrete source letters resulting from quantization. Table 3-4-6 lists the values of the entropy for the nonuniform quantizer. These values are also plotted in Fig. 3-4-2 and labeled entropy coding. From this discussion, we conclude that the quantizer can be optimized when the pdf of the continuous source output is known. The optimum quantizer of L=2" levels results in a minimum distortion of D(R), where R=log: L. ENTROPY OF THE OUTPUT OF AN OPTIMUM NONUNIFORM QUANTIZER FOR A GAUSSIAN RANDOM VARIABLE iMAX, 1960) z Entropy Distortion (oitfsample) ——_(bitflette) ——10lotie Dan 1 10 ~4a 2 rou 930 3 2828 la 4 Ams 2022 5 4730 26.42 JAB otartaL communications bits/sample. Thus, this distortion can be achieved by simply representing each quantized sample by R bits. However, more efficient encoding is possible. The discrete source output that results from quantization is characterized by a set of probabilities {p,} that can be used to design efficient variable-length codes for the source output (entropy coding). The efficiency of any encoding method can be compared with the distortion-rate function or, equivalently, the rate-distortion function for the discrete-time, continuous-amplitude source that is characterized by the given pdf. If we compare the performance of the optimum nonuniform quantizer with the: distortion-rate function, we find, for example, that at a distortion of 26 dB, entropy coding is 0.41 bits/sample more than the minimum rate given by (3-4-8), and simple block coding of each letter requires 0.68 bits/sample more than the minimum rate. We also observe that the distortion rate functions for the optimal uniform and nonuniform quantizers for the gaussian source approach the slope of ~6 dB/bit asymptotically for large R. 34-3 Vector Quantization In the previous section, we considered the quantization of the output signal from a continuous-amplitude source when the quantization is performed on a sample-by-sample basis, ie., by scalar quantization. In this section, we consider the joint quantization of a block of signal samples or a block of signal parameters. This type of quantization is called block or vector quantization. It is widely used in speech coding for digital cellular systems. A fundamental result of rate-distortion theory is that better performance can be achieved by quantizing vectors instead of scalars, even if the continuous-amplitude source is memoryless. 
If, in addition, the signal samples or signal parameters are statistically dependent, we can exploit the dependency by jointly quantizing blocks of samples or parameters and, thus, achieve an even greater efficiency (lower bit rate) compared with that which is achieved by scalar quantization. ‘The vector quantization problem may be formulated as follows. We have an n-dimensional vector X= (x1 x2 + x] with real-valued, continuous- amplitude components {r,,1 PUKE CEUX.) [Ke G] => Pike Gs), d(x, P(X) AX, (34:32) where P(X € C,) is the probability that the vector X falls in the cell C, and p(X) is the joint pdf of the m random variables. As in the case of scalar quantization. we can minimize D by selecting the cells {C,.1> L). An iterative clustering algorithm, called the K ‘means algorithm, where in our case K = 1, can be applied to the training vectors. This algorithm iteratively subdivides the M training vectors into L clusters such that the two necessary conditions for optimality are satisfied. The K means algorithm may be described as follows [Makhoul er al. (1985)). K Means Algorithm Step 1 Initialize by setting the iteration number i=0. Choose a set of output vectors K(0), 15k SL. Step 2 Classify the training vectors [X(m), 11 is a constant that is selected to minimize the total distortion. A block diagram of s DM encoder-decoder that incorporates this adaptive algorithm is illustrated in Fig. 3-5-9. Several other variations of adaptive DM encoding have been investigated and described in the technical literature. A particularly effective and popular technique first proposed by Greefkes (1970) is called continuously variable slope delia modulation (CVSD). In CVSD the adaptive step-size parameter may be expressed as 4, = ad) + hy if2,, 2-1, and é,_2 have the same sign; otherwise, Smad, | + ky ‘The parameters a, k,, and k2 are selected such that 0> k>0. For more discussion on this and other variations of adaptive DM. the interested reader is referred to the papers by Jayant (1974) and Flanagan ef a. (1979), which contain extensive references. 136 prarrat communicaTions. Encoder fe owe FIGURE 35.9 An example of a delta modulation system with adaptive step size. PCM, DPCM, adaptive PCM, and adaptive DPCM and DM are all source encoding techniques that attempt to faithfully represent the output waveform from the source. The following class of waveform encoding methods is based on a spectral decomposition of the source signal. 3-5-2 Spectral Waveform Coding In this section, we briefly describe waveform coding methods that filter the source output signal into a number of frequency bands or subbands and separately encode the signal in each subband. The waveform encoding may be cnarrer > source copna 137 performed either on the time-domain waveforms in each subband or on the frequency-domain representation of the corresponding time-domain waveform in each subband, Subband Coding In subband coding (SBC) of speech and image signals, the signal is divided into a small number of subbands and the time waveform in each subband is encoded separately. In speech coding. for example, the lower frequency bands contain most of the spectral energy in voiced speech. In addition, quantization noise is morc noticeable to the ear in the lower- frequency bands. Consequently, more bits are used for the lower-band signals and fewer are used for the higher-frequency bands. Filter design is particularly important in achieving good performance in SBC. 
In practice, quadrature-mirror filters (QMFs) are generally used because they yield an alias-free response due to their perfect reconstruction property {see Vaidyanathan, 1993). By using QMFs in subband coding, the lower- frequency band is repeatedly subdivided by factors of two, thus creating octave-band filters. The output of each OMF filler is decimated by a factor of, two, in order to reduce the sampling rate. For example, suppose that the bandwidth of a speech signal extends to 3200Hz. The first pair of QMFs divides the spectrum into the low (0-1600 Hz) and high (1600-3200 Hz) bands. Then, the low band is split into low (0-800 Hz) and high (800-1600 Hz) bands by the use of another pair of QMFs. A third subdivision by another pair of QMS can split the 0-800 Hz band into low (0-400 Hz) and high (400-800 Hz) bands. Thus, with three pairs of QMFs, we have obtained signals in the frequency bands 0-400, 400-800, 800-1600 and 1600-3200 Hz. The time- domain signal in each subband may now be encoded with different precision. In practice, adaptive PCM has been used for waveform encoding of the signal in each subband. Adaptive Transform Coding In adaptive transform coding (ATC), the source signal is sampled and subdivided into frames of N, samples, and the data in each frame is uansformed into the spectral domain for coding and transmission. At the source decoder. each frame of spectral samples is transformed back into the time domain and the signal is synthesized from the time-domain samples and passed through a D/A converter. To achieve coding efficiency, we assign more bits to the more important spectral coefficients and fewer bits to the less important spectral coefficients. In addition, by designing an adaptive allocation in the assignment of the total number of bits to the spectral coefficients, we can adapt to possibly changing statistics of the source signal. An objective in selecting the transformation from the time domain to the frequency domain is to achieve uncorrelated spectral samples. In this sense. the Karhunen-Loéve transform (KLT) is optimal in that it yields spectral values that are uncorrelated, but the KLT is generally difficult to compute (see 138 o1cstat communications Wintz, 1972). The DFT and the discrete cosine transform (DCT) are viable alternatives, although they are suboptimum. Of these two, the DCT yields good performance compared with the KLT, and is generally used in practice (see Campanella and Robinson, 1971; Zelinsky and Noll, 1977). In speech coding using ATC, it is possible to attain communication-quality speech at a rate of about 9600 bits/s. 3-5-3 Model-Based Source Coding In contrast to the waveform encoding methods described above, model-based source coding represents @ completely different approach. In this, the source is modeled as a linear system (filter) that, when excited by an appropriate input signal, results in the observed source output. Instead of transmitting the samples of the source waveform to the receiver, the parameters of the linear system are transmitted atong with an appropriate excitation signal. If the number of parameters is sufficiently small, the model-based methods provide a large compression of the data. The most widely used model-based coding method is called linear predictive coding (LPC). 
In this, the sampled sequence, denoted by x», n =0, 1,...,N~ 1, is assumed to have been generated by an all-pole (discrete-time) filter having the transfer function Hiz)= £ G5-18) t= $ az Appropriate excitation functions are an impulse, a sequence of impulses, or a sequence of white noise with unit variance. In any case, suppose that the input sequence is denoted by u,, 2 =0,1,2,.... Then the output sequence of the all-pole model satisfies the difference equation mS ant Gun, 0.1.2, G-5-19) In general, the observed source output x,, 1 =0,1,2,...,.N—1, does not satisfy the difference equation (3-5-19), but only its model does. If the input is white-noise sequence or an impulse, we may form an estimate (or prediction) of x, by the weighted linear combination 2.2 3S ate n>0 (35-20) The difference between x, and £,, namely, nen km am 3 ates 35-21) caamies 5 source cone 139 represents the error between the observed value x, and the estimated (predicted) value &, The filter coefficients {a,} can be selected to minimize the mean square value of this error. ‘Suppose for the moment that the input {v,] is a white-noise sequence. Then, the filter output x, is a random sequence and so is the difference ¢, = x, ~ fy The ensemble average of the squared error is G,= Ele) = 60)- 2S a0k)+ SS mandth—m) (35-22) where é(} is the autocorrelation function of the sequence x,. 0.1....,.N-=1. But 4; is identical to the MSE given by (3-5-X) for a predictor used in DPCM. Consequently, minimization of ¢, in (3-5-2) yields the set of normal equations given previously by (3-5-9). To completely specify the filter H(z), we must also determine the filter gain G. From (3-5-19), we have E\(Gu,F} (35-23) where é, is the residual MSE obtained from (3-5-2) by substituting the ‘optimum prediction coefficients, which result from the solution of (3-5-9). With this substitution, the expression for €, and. hence. G? simplifies to G, = G= 8) — Y ac dtk) (35-24) In practice. we do not usually know @ priori the true autocorrelation function of the source output. Hence, in place of d(m), we substitute an estimate (1) as given by (3-5-10}. which is obtained from the set of samples 41 =0.1,...,N— 1, emitted by the source. As indicated previously, the Levinson—Durbin elgorithm derived in Appen- dix A may be used to solve for the predictor coefficients {a,} recursively. beginning with a first-order predictor and iterating the order of the predictor up to order p. The recursive equations for the {a,} may be expressed as $0) ~ Ea, d0-K) Oe aaa Oy a Ful ve VSRSERT é=(1-a,)%, FIGURE 3541 140 oxortat conmunicarions where ay, k=1,2,...,4, are the coefficients of the ith-order predictor. The desired coefficients for the predictor of order p are 8 Op, k=1,2.. and the residual MSE is P (3-5-26) #-@=30)- 3 adh) = 60) f (1-43) (3-5-27) We observe that the recursive relations in (3-5-25) give us not only the coefiicients of the predictor for order p, but also the predictor coefficients of al orders less than p. ‘The residual MSE |,2,...,p, forms a monotone decreasing se- quence, ie. <8, <...<@,<%, and the prediction coefficients a, satisfy the condition Mail... , px. fespectively. Prove that the entropy H(X) of the source is at most log 2 346 Determine the differentia! entropy H(X) of the uniformly distributed random variable X with pdf at (O X, where K is 2 positive integer, what is the entropy of X? 146 rortar communications AL Let X and ¥ denote two jointy distributed discrete valued random variables. Show that HX) D Pex ylog Por) HY) =~ P(x. 
y)log Piy) b Use the above result to show that HX) = HX) + HO) When does equality hold? ¢ Show that HX | Y) = HU) with equality if and only if X and ¥ are independent. 342 Two binary random variables X and ¥ are distributed according to the joint distributions p(X = ¥=0)= p(X =0. Y= 1) =p(X = ¥ = 1) = 4. Compute A(X), HAY), HX | ¥), HOE |X), and HEX. ¥) 313 A Markov process is a process with one-step memory. i... a process such that pis, for all n. Show that, for a stationary Markov process, the entropy rate is given by HX! &, 3-14 Let ¥ = @(X), where g denotes a deterministic function. Show that, in general, HY) = HX), When does equality hold? BAS Show that /(X:¥) = H(X) + HOY} ~ HOY) 316 Show that. for statistically independent events, WX X X= S HK) 3417 For a noiseless channel. show that H(X | ¥)=0, 3418 Show that WX Xe] Ky) = A(X) X= HX |XX) and that MUX, |X) HUN |X ) 349 Let X be a random variable with pdf py(e) and let Ye aX +b be a hinear transformation of X. where a and b are two constants. Determine the differentia’ entropy H(Y) in terms of H(X), 3-20 The outputs x,, 3, and x, of a DMS with corresponding probabilities p, = 0.45 0.35, and p; = 0.20 are transformed by the linear transformation Y= aX +d, a and 6 are constants. Determine the entropy (Y) and comment on wha effect the transformation has had on the entropy of X. 3-21 The optimum four-level nonuniform quantizer for a youssian-distributed signal amplitude results in the four levels ay. as. a, and ay, with corresponding probabilities of occurrence p, = p: = 0.3365 and p, =p, = 0.1635, FIGURE P322 ‘CHAPTER 8 source copiNc 147 Aen) a ee rn 1 Design a Huffman code that encodes a single level at a time and determine the average bit rate. bb Design a Huffman code that encodes two output levels at a time and determine the average bit rate. What is the minimum rate obtained by encoding J output levels at a time as Je? 322A first-order Markov source is characterized by the state probabilities P(x). i=1,2,...,L, and the transition probabilities P(t, |x), k= 1,2,....L, and k #1. The entropy of the Markov source is PU )H(X | 4) where H(X | %) is the entropy conditioned on the source bein Determine the entropy of the binary, first-order Markov source shown in Fig. 3-22, which has the transition probabilities P(x, |x.) =0.2 and P(x, | x2) =03. {Note that the conditional entropies H(X |x1) and H(X |x) are given by the binary entropy functions H[P(x, | x,)] and H[P(x, | x)]. respectively] How does the entropy of the Markov source compare with the entropy of a binary DMS with the same output letter probabilities P(x.) and P(x,)? 323 A memoryless source has the alphabet sf ={-5,-3, -1,0,1,3,5} with corre sponding probabilities (0.05,0.1,0.1,0.15, 0.05, 0.25, 0.3}. ‘a Find the entropy of the source. 'b Assuming that the source is quantized according to the quantization rule 9(-5)= 4(-3) =4 a(—1)= 40) = g(t) =0 9(3)=q(5)=4 find the entropy of the quantized source. 3-24 Design a ternary Huffman code, using 0, 1, and 2 as letters, for @ source with output alphabet probabilities given by {0.05,0.1, 0.15, 0.17, 0.18, 0.22, 0.13}. What is the resulting average codeword length? Compare the average codeword length with the entropy of the source. (In what base would you compute the logarithms in the expression for the entropy for a meaningful comparison?) 
325 Find the Lempel-Ziv source code for the binary source sequence {900100100000011000010000000100000010100001 0000001 10100000001 100 Recover the original sequence back from the Lempel-Ziv source code. [Hint: You require two passes of the binary sequence to decide on the size of the dictionary.) 3:26 Find the differential entropy of the continuous random variable X in the following cases: 148 oicrra communications # X is an exponential random variable with parameter A>0, ie., fasy={9 b X is a Laplacian random variable with parameter A>0, ie. dwn f= yer © Xisa triangular random variable with parameter A>0, ie., (+a (-ASx<0) fel) =} (-x Fay (Oa) (see Berger, 1971. 1 How many bits per sample are required to represent the outpats ofthis source with an average distortion not exceeding 44? b Plot R(D) for thee diferent values of A and discus the etfect of changes in A on these plots. 3-28 It can be shown that if X isa zero-mean continuous random variable wth variance 2%, its rate distortion function, subject to squared error dlstortion measure. satisies the lower and upper bounds given by the ineg H(X)~ Hog 2xeD = R(D)< Slog to" where h(%) denotes the differential entropy ofthe random variable X (see Cover and Thomas, 1991). Show that. for a Gaussian random variable, the lower and upper bounds cine b Plot the lower and upper bounds fora Laplacian source with & « Plot the lower and upper bounds for a triangutat source with 329 stationary random procets hes an autocorrelation fonction. given by Ry = }A’e“"'cos2efir and itis known that the random process never exceeds 6 in magnitude. Assuming A=, how many quantization levels are required. 10 guarantee a sgral-to-quantization noise ratio of at fast 60 dB? 330 An additive white gaussian noise channel has the output Y ~ X + G, where X is the channel input and G isthe ncse wiih probability density fonction 1 2" ire, Af X is a white gaussian input with E(X)=0 and E(X*) = a the conditional diferemtial entropy H(X | G): b the average mutual information 1X: ¥) 331A DMS has an alphabet of eight letters x, f, determine 1,.2....,8, with probabilities CHAPTER 3) soURcE copna 149 given in Problem 3-7. Use the Huffman encoding procedure to determine a ternary code (using symbols 0, I, and 2) for encoding the source output. [Hint: Add a symbol x5 with probability py~0, and group three symbols at a tine, 382 Dame sheer there oxite 4 binary code with code word legs (essay ms, m4) = (1, 2,2, 3) that satisfy the prefix condition. 3.33 Consider a binary biock code with 2" code words of the same length n. Show that the Kraft inequality is satisfied for such a code. 334 Show that the entropy of an a-dimensionsl gaussian vector X= [x, x; with zero mean and covariance matrix M is HX) = Hog: (2n2)" (MI 335 Consider a DMS with output bits, (0,1) that are equiprobable. Define the distortion measure es D = P,, where P, is the probability of error in transmitting the binary symbols to the user over a BSC. Then the rate distortion function is (Berger, 1971) R(D)=1+ Dog, D+(1~D)lops(1~D), 08D = F<} Plot R(D) for 0< D3 336 Evaluate the rate distortion function for an M-ary symmetric channel where D=Pyand R(D) = log: M+ D log: D+ (I~ D) oes 4 for M =2, 4,8, and 16. Py is the probability of error. 337 Consider the use of the weighted mean-square-error (MSE) distortion measure defined as 4X, X) = (X— KY W(X ~ K) W is a symmetric, positive-definite wieghting matrix. 
By factorizing W as show that d,(X,X) is equivalent to an unweighted MSE distortion measure d,(X’,X') involving transformed vectors X’“and X’. 3-38 Consider a stationary stochastic signal sequence {X(n)} with zero mean and autocorrelation sequence 1 (n=0) d(ny=y} (n= 21) 0 (otherwise) 4 Determine the prediction coefficient of the firstorder minimum MSE predictor for {X(n)} given by (n) = a.x(n = 1) and the corresponding minimum mean square error 'b Repeat (a) for the second-order predictor 2(n) = a,x(n ~ 1) + a,x(n ~2) 150 vicrTat commenicaTions ho +f Z YY Ye by Ts oan o z 7 Wan e P32 ‘ 5 ty = Z Z Ly fs 5 t ar 7 o 7 FIGURE P3.9 “ 339 Consider the encoding of the random variables 1, and x, that are characterized by the joint pdf p(x,,x3) given by ya { 15/76 Git eC) eet eaelo (otherwise) as shown in Fig, 3-39. Evaluate the bitrates required for uniform quantization of xx and «separately (scalar quantization) and combined (vector) quantization of (1.49). Determine the difference in bitrate when a = 4b. 3-40 Consider the encoding of two random variables X and ¥ that are uniformly distributed on the region between two squares as shown in Fig. P3-40. a Find f(x) and f(y) bb Assume that each of the random variables X and ¥ are quantized using four fevel uniform quantizers. What is the resulting distortion? What is the resulting umber of bits per (X. ¥) pair? FIGURE P340 CHAPTER» souRCE copNG 151 FIGURE P3.41 «¢ Now assume that instead of scalar quantizers for X and ¥. we employ a vector ‘quantizer to achieve the same level of distortion as in (b). What is the resulting umber of bits per source output pair (X, ¥)? 341 Two random variables X and Y are uniformly distributed on the square shown in Fig. P3-41, fa Find fa(x) and f(y) 'b Assume that each of the random variables X and Y are quantized using four level uniform quantizers, What is the resulting cistortion? What is the resulting number of bits per (X, ¥) pair? «¢ Now assume that, instead of scalar quantizers for X and Y, we employ a vector quantizer with the same number of bits per source output pair (X, ¥) as in (b). ‘What is the resulting distortion for this vector quantizer?
