bandwidth and signal-to-noise ratio, and thereby serves as a benchmark for comparing the performance of communication systems in the light of information theory. The chapter closes with an application of information theory to optimum digital detection. We'll introduce signal space as a means of representing and quantifying signals; then the principles of optimum detection will be applied to digital communication.

OBJECTIVES
After studying this chapter and working the exercises, you should be able to do each of the following:
1. Define Shannon's measure of information (Sect. 16.1).
2. Calculate the information rate of a discrete memoryless source (Sect. 16.1).
3. Analyze a discrete memoryless channel, given the source and transition probabilities (Sect. 16.2).
4. State and apply Shannon's fundamental theorem for information transmission on a noisy channel (Sects. 16.2 and 16.3).
5. State and apply the Hartley-Shannon law for a continuous channel (Sect. 16.3).
6. Draw the vector representation of two or more signals (Sect. 16.4).
7. Construct the decision regions for MAP detection in a two-dimensional space (Sect. 16.5).
8. Find the decision functions for a MAP receiver, and evaluate the symbol error probabilities when there are rectangular decision regions (Sect. 16.5).

16.1 INFORMATION MEASURE AND SOURCE ENCODING

Quite logically, we begin our study of information theory with the measure of information. Then we apply information measure to determine the information rate of discrete sources. Particular attention will be given to binary coding for discrete memoryless sources, followed by a brief look at predictive coding for sources with memory.

Information Measure

The crux of information theory is the measure of information. Here we use information as a technical term, not to be confused with "knowledge" or "meaning", concepts that defy precise definition and quantitative measurement. In the context of communication, information relates to the unpredictability of messages: the less predictable a message is, the more information its delivery conveys.

Suppose, for instance, that you're planning a trip to a distant city. To determine what clothes to pack, you might hear one of the following forecasts:

- The sun will rise.
- There will be scattered rainstorms.
- There will be a tornado.

The first forecast conveys essentially no information, since you're already quite sure the sun will rise. The prediction of scattered rainstorms carries more information, and the forecast of a tornado conveys still more, because tornadoes are rare and unexpected events; you might even decide to cancel the trip.

Notice that the messages have been listed in order of decreasing likelihood and increasing information. The less likely the message, the more information it conveys. We thus conclude that information measure must be related to uncertainty, the uncertainty of the user as to what the message will be. Alternatively, we can say that information measures the freedom of choice exercised by the source in selecting a message. When a source freely chooses from many different messages, the user is highly uncertain as to which message will be selected.

Whether you prefer the source or the user viewpoint, it should be evident that information measure involves the probability of a message. If x_i denotes an arbitrary message and P_i = P(x_i) is the probability of the event that x_i is selected for transmission, then the amount of information I_i associated with x_i should be some function of P_i. Specifically, Shannon defined information measure by the logarithmic function

    I_i = log_b (1/P_i)    (1)

By this definition, even a nonsensical forecast such as "The sun will rain tornadoes" would convey lots of information, being very improbable, despite its lack of substance.
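To make Eq. (1) concrete, the short Python sketch below evaluates the self-information of the three forecasts; the probabilities assigned to them are illustrative assumptions, not values given in the text.

```python
import math

def self_information(p, base=2):
    """Shannon self-information I = log_b(1/P) of a message with probability p."""
    return math.log(1.0 / p, base)

# Hypothetical forecast probabilities (illustrative only):
forecasts = [("The sun will rise", 0.999),
             ("Scattered rainstorms", 0.30),
             ("A tornado", 0.001)]
for msg, p in forecasts:
    print(f"{msg:22s} P = {p:<6} I = {self_information(p):6.3f} bits")
```

The printout confirms the ordering argued above: the near-certain message carries almost no information, while the rare one carries roughly ten bits.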

Although not immediately obvious, the definition in Eq. (1) has several important and meaningful consequences, including

    I_i ≥ 0        for 0 ≤ P_i ≤ 1    (2a)
    I_i → 0        for P_i → 1        (2b)
    I_i > I_j      for P_i < P_j      (2c)

so information increases with uncertainty. Furthermore, suppose a source produces two successive and independent messages, x_i and x_j, with joint probability P(x_i x_j) = P_i P_j; then

    I_ij = log_b [1/(P_i P_j)] = log_b (1/P_i) + log_b (1/P_j) = I_i + I_j    (3)

so the total information equals the sum of the individual message contributions. Shannon's information measure log_b (1/P_i) is the only function of P_i that satisfies all of the properties in Eqs. (2) and (3).

Specifying the logarithmic base b determines the unit of information. The standard convention of information theory takes b = 2, and the corresponding unit is the bit, a name coined by J. W. Tukey as the contraction of "binary digit." Equation (1) thus becomes I_i = log2 (1/P_i) bits. This convention normalizes information measure relative to the most elementary source, a source that selects from just two equiprobable messages. For if P(x_1) = P(x_2) = 1/2, then I_1 = I_2 = log2 2 = 1 bit. In other words, 1 bit is the amount of information needed to choose between two equally likely alternatives.

Binary digits enter the picture simply because any two things can be represented by the two binary digits. However, you must carefully distinguish information bits from binary digits per se, especially since a binary digit may convey more or less than one bit of information, depending upon its probability. To prevent possible misinterpretation, the abbreviation binit is sometimes used for a binary digit as a message or code element. When necessary, you can convert to base-2 from natural or common logarithms via

    log2 v = (ln v)/(ln 2) = (log10 v)/(log10 2)    (4)

If P_i = 1/10, for instance, then I_i = (log10 10)/0.301 = 3.32 bits.

Entropy and Information Rate

Now consider an information source that emits a sequence of symbols selected from an alphabet of M different symbols, i.e., an M-ary alphabet. Let X denote the entire set of symbols x_1, ..., x_M. We can treat each symbol x_i as a message that occurs with probability P_i and conveys the self-information I_i. The set of symbol probabilities, of course, must satisfy

    Σ_{i=1}^{M} P_i = 1    (5)

We'll assume that the source is stationary, so the probabilities remain constant over time. We'll also assume that successive symbols are statistically independent and come from the source at an average rate of r symbols per second. These properties define the model of a discrete memoryless source. The average information per symbol is then given by the source entropy

    H(X) ≜ Σ_{i=1}^{M} P_i log2 (1/P_i)  bits/symbol    (6)

Shannon borrowed the name and notation H from a similar expression in statistical mechanics. Subsequently, various physical and philosophical arguments have been put forth relating thermodynamic entropy to communication entropy (or "comentropy"). But we'll interpret Eq. (6) from the more pragmatic observation that when the source emits a sequence of n >> 1 symbols, the total information to be transferred is about nH(X) bits. The duration of this sequence is about n/r, so the information must be transferred at the average rate nH(X)/(n/r) = rH(X) bits per second. Accordingly, we define the source information rate

    R ≜ rH(X)  bits/sec    (7)

The value of the entropy itself is bounded by

    0 ≤ H(X) ≤ log2 M    (8)

The lower bound corresponds to no uncertainty or freedom of choice, which occurs when one symbol has probability P_i ≈ 1 while the remaining probabilities are essentially zero, so the source almost always emits the same symbol. The upper bound corresponds to maximum uncertainty or freedom of choice, which occurs when P_i = 1/M and the symbols are equally likely.

To illustrate the variation of H(X) between these extremes, take the special but important case of a binary source (M = 2) with

    P_1 = p      P_2 = 1 − p

so that

    H(X) = Ω(p) ≜ p log2 (1/p) + (1 − p) log2 [1/(1 − p)]    (9)

in which we've introduced the "horseshoe" function Ω(p).
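The entropy of Eq. (6), the rate of Eq. (7), and the binary entropy function Ω(p) of Eq. (9) translate directly into code. In the minimal sketch below the symbol probabilities and the rate r are assumed placeholder values.

```python
import math

def entropy(probs, base=2):
    """H(X) = sum of P_i * log_b(1/P_i) for a discrete memoryless source, Eq. (6)."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to one"
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

def omega(p):
    """Binary entropy ("horseshoe") function of Eq. (9)."""
    return entropy([p, 1.0 - p]) if 0 < p < 1 else 0.0

r = 1000                      # assumed symbol rate, symbols/sec
probs = [0.2, 0.3, 0.5]       # assumed M = 3 source
H = entropy(probs)
print(f"H(X) = {H:.3f} bits/symbol, R = r*H(X) = {r * H:.0f} bits/sec")
print(f"Omega(0.5) = {omega(0.5):.3f}  (the maximum of the binary entropy)")
```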
The plot of Ω(p) in Fig. 16.1-1 displays a rather broad maximum centered at p = 1 − p = 1/2, where H(X) = log2 2 = 1 bit/symbol; H(X) then decreases monotonically to zero as p → 1 or 1 − p → 1.

Proving the lower bound in Eq. (8) with arbitrary M is easily done once you note that v log2 (1/v) → 0 as v → 0. The proof of the upper bound H(X) ≤ log2 M involves a few more steps but deserves the effort. First, we introduce another set of probabilities Q_1, Q_2, ..., Q_M and replace log2 (1/P_i) in Eq. (6) with log2 (Q_i/P_i). Conversion from base-2 to natural logarithms gives the quantity

    Σ_i P_i log2 (Q_i/P_i) = (1/ln 2) Σ_i P_i ln (Q_i/P_i)

where it's understood that all sums range from i = 1 to M. Second, we invoke the inequality ln v ≤ v − 1, which becomes an equality only if v = 1, as seen in Fig. 16.1-2. Thus, letting v = Q_i/P_i and using Eq. (5),

    Σ_i P_i log2 (Q_i/P_i) ≤ (1/ln 2) Σ_i P_i (Q_i/P_i − 1) = (1/ln 2) (Σ_i Q_i − 1)

Figure 16.1-1  Binary entropy as a function of the probability p.

Third, we impose the condition

    Σ_i Q_i ≤ 1    (10a)

so it follows that

    Σ_i P_i log2 (Q_i/P_i) ≤ 0    (10b)

Finally, taking Q_i = 1/M we have

    Σ_i P_i log2 [1/(M P_i)] = Σ_i P_i log2 (1/P_i) − Σ_i P_i log2 M = H(X) − log2 M ≤ 0

thereby confirming that H(X) ≤ log2 M. The equality holds only in the equally likely case P_i = 1/M, so that v = Q_i/P_i = 1 for all i.

EXAMPLE 16.1-1  Source Entropy and Signaling Rate

Suppose a source emits r = 2000 symbols/sec selected from an alphabet of size M = 4 with the symbol probabilities and self-information listed in Table 16.1-1. Equation (6) gives the source entropy

    H(X) = (1/2) × 1 + (1/4) × 2 + (1/8) × 3 + (1/8) × 3 = 1.75 bits/symbol

which falls somewhat below the maximum value log2 M = 2. The information rate is R = 2000 × 1.75 = 3500 bits/sec, and appropriate coding should make it possible to transmit the source information at the binary signaling rate r_b ≥ 3500 binits/sec.

Figure 16.1-2  Plot of v − 1 and ln v.

Table 16.1-1  Symbol probabilities and self-information for Example 16.1-1

    x_i    P_i    I_i
    A      1/2    1
    B      1/4    2
    C      1/8    3
    D      1/8    3

EXERCISE 16.1-1

Suppose a source has M = 3 symbols with probabilities P_1 = p and P_2 = P_3. Show that H(X) = Ω(p) + 1 − p. Then evaluate the maximum value of H(X) and the corresponding value of p.

Coding for a Discrete Memoryless Channel

When a discrete memoryless source produces M equally likely symbols, so H(X) = log2 M, all symbols convey the same amount of information and fixed-length coding is as good as any. But when the symbol probabilities are unequal, efficient transmission calls for source encoding that takes account of the variable amount of information per symbol. Here we'll investigate source encoding with a binary encoder; equivalent results for nonbinary encoding just require changing the logarithmic base.

The binary encoder in Fig. 16.1-3 converts incoming source symbols to codewords consisting of binary digits produced at some fixed rate r_b.

Figure 16.1-3  Source encoding.

Viewed from its output, the encoder looks like a binary source with entropy Ω(p) and information rate r_b Ω(p) ≤ r_b log2 2 = r_b. Coding obviously does not generate additional information, nor does it destroy information, providing that the code is uniquely decipherable. Thus, equating the encoder's input and output information rates, we conclude that R = rH(X) = r_b Ω(p) ≤ r_b, or r_b/r ≥ H(X). The quantity N̄ ≜ r_b/r is an important parameter called the average code length. Physically, N̄ corresponds to the average number of binary digits per source symbol. Mathematically, we write the statistical average

    N̄ = Σ_{i=1}^{M} P_i N_i    (11)

where N_i represents the length of the codeword for the ith symbol.
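As a numerical check, the sketch below reproduces the entropy and information rate of Example 16.1-1 and evaluates the average code length of Eq. (11) for an assumed set of codeword lengths.

```python
import math

probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}      # Table 16.1-1
r = 2000                                              # symbols/sec

H = sum(p * math.log2(1 / p) for p in probs.values())
print(f"H(X) = {H} bits/symbol, R = r*H(X) = {r * H:.0f} bits/sec")   # 1.75 and 3500

# Average code length of Eq. (11) for assumed word lengths N_i:
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}
N_bar = sum(probs[s] * lengths[s] for s in probs)
print(f"N_bar = {N_bar} binits/symbol, r_b = r*N_bar = {r * N_bar:.0f} binits/sec")
```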
_ Shannon’s source coding theorem states that the minimum value of Nis bounded by HX) SN H(X) if the code has reasonably good efficiency. The ratio R/r, = H(X)/N = 1 serves as the measure of efficiency for suboptimum codes. ‘The source coding theorem presumes a uniquely decipherable code to ensure that no information is lost. This requirement imposes an additional but indirect con- straint on N. Specifically, as a necessary and sufficient condition for @ uniquely deci- pherable binary code, the word lengths N; must be such that u K=r"si (13) a which is the Kraft inequality. The simplest encoding process generates a fixed- length code, all codewords having the same length N; = W. The Kraft inequality in Eq, (13) then becomes K = M2-" = 1, so decipherability requires N = log, M and the resulting efficiency is H(X)/N < H(X)/log, M. When H(X) < log: M, higher efficiency calls for variable-length c to reduce the average code length N, An ‘example should help clarify these concepts. Source Coding and Efficiency Table 16.1-2 lists some potential codes for the source in Example 16,11. Code | is a fixed-length code with NV = log, M = 2 binits/symbol, as compared to H(X) = ; 776 CHAPTER 16 ® Information and Detection Theory Table 16.1-2 tstative source codes a 4% Code t Code I Code III Code IV 4 ip 00 0 0 0 a 4 o1 1 1 0 cK 10 10 ou rt) Dp i " out m1 R20 Las 1.875 las x 10 Ls 0.9375 10 i 1:75 bits/symbol. The efiieney is A(A)/N = 1.75/2 = 88%—not bad, but we can do better with a variable-length code, Application of Eqs. (11) and (13) to Code Il gives Weatxtetxreix24)x2=125 1 Karts og gy 97 The result V < H(X) is meaningless because K > 1, which tells us that this code is not uniquely decipherable. For instance, the code sequence 10011 could be decoded as BAABB or CABB or CAD, and so on, Hence, Cove II effectively destroys source information and cannot be used. Code III, known as a comma code, has K'< 1 and ensures decipherability by ifarking the start of each word with the binary digit 0. However, the extra comma digits result in W = 1.875 > H(X). Code IV i a tree code withthe property that no codeword appears as the prefix in another codeword. Thus, for example, you can check that the code sequence 110010111 unambiguously represents the message CABD. This code is optimum for the source in question since it has 1.75 = H(X) as well as K = 1, Having demonstrated optimum coding by example, we tum to a general proof of the source coding theorem. For this purpose we. sien with Bq. (106), taking Q.= 2°%/K where K is the same as in Eq, (13), which satisfies Eq. (10a), Then 1 i 1 = DP tog, 2 BA (lost Mi lots k) = (0) =~ tops & 50 or, since log, K $0, we have Wem which establishes the lower bound in Eq. (12). The equality holds when K F,= Q,. Optimum souree encoding with N= H(X) theresone requires K symbol probabilities of the form Poh 12,54 (14) so that N, = log, P, 16.1 Information Measure and Source Encoding An optimum code must also have equally likely 1s and 0s in order to maximi2e the binary entropy at A(p) = 1, Code IV in Table 16,1-2 exemplifies these optimal Properties, Granted, we can’t expect every source to obey Eq, (14), and we certainly can't Control the statistics of an information source. Even so, Eq. (14) contains a signifi- cant implication for practical source coding, to wit: Symbols that occur with high probability should be assigned shorter codewords than symbols that occur with low probability. 
Long before Shannon, Samuel Morse applied the same commonsense principle, short codewords for probable symbols, to his telegraph code for English letters. (Incidentally, Morse estimated letter probabilities by counting the distribution of type in a printer's font.) We'll invoke this principle to establish the upper bound in the source coding theorem.

Let the length of the ith codeword be an integer N_i that falls within

    log2 (1/P_i) ≤ N_i < log2 (1/P_i) + 1    (15)

This rule satisfies the Kraft inequality, and multiplying Eq. (15) by P_i and summing over i yields

    H(X) ≤ N̄ < H(X) + 1    (16)

which gives the upper bound in Eq. (12) with ε = 1. A smaller effective penalty is obtained when H(X) >> 1 or N_i ≈ log2 (1/P_i) for all i. If neither condition holds, then we must resort to the process called extension coding.

For an extension code, n successive source symbols are grouped into blocks and the encoder operates on the blocks rather than on individual symbols. Since each block consists of n statistically independent symbols, the block entropy is just nH(X). Thus, when the coding rule in Eq. (15) is applied to the extension code, Eq. (16) becomes nH(X) ≤ nN̄ < nH(X) + 1, where nN̄ is the average number of binary digits per block. Upon dividing by n we get

    H(X) ≤ N̄ < H(X) + 1/n    (17)

which is our final result. Equation (17) restates the source coding theorem with ε = 1/n. It also shows that N̄ → H(X) as n → ∞, regardless of the source statistics. We've thereby proved that an nth-extension code comes arbitrarily close to optimum source coding. What we haven't addressed is the technique of actual code construction. Systematic algorithms for efficient and practical source coding are presented in various texts. The following example serves as an illustration of one technique.

EXAMPLE 16.1-3  Shannon-Fano Coding

Shannon-Fano coding generates an efficient code in which the word lengths increase as the symbol probabilities decrease, but not necessarily in strict accordance with Eq. (15). The algorithm provides a tree-code structure to ensure unique decipherability. We'll apply this algorithm to the source with M = 8 and H(X) = 2.15 whose statistics are listed in Table 16.1-3.

The Shannon-Fano algorithm involves a succession of divide-and-conquer steps. For the first step, you draw a line that divides the symbols into two groups such that the group probabilities are as nearly equal as possible; then you assign the digit 0 to each symbol in the group above the line and the digit 1 to each symbol in the group below the line. For all subsequent steps, you subdivide each group into subgroups and again assign digits by the previous rule. Whenever a group contains just one symbol, as happens in the first and third steps in the table, no further subdivision is possible and the codeword for that symbol is complete. When all groups have been reduced to one symbol, the codewords are given by the assigned digits reading from left to right. A careful examination of Table 16.1-3 should clarify this algorithm.

Table 16.1-3  Shannon-Fano coding

    x_i    P_i     Codeword
    A      0.50    0
    B      0.15    1 0 0
    C      0.15    1 0 1
    D      0.08    1 1 0
    E      0.08    1 1 1 0
    F      0.02    1 1 1 1 0
    G      0.01    1 1 1 1 1 0
    H      0.01    1 1 1 1 1 1

The resulting Shannon-Fano code in this case has N̄ = 2.18, so the efficiency is 2.15/2.18 = 99%. Thus, if the symbol rate is r = 1000, then r_b = r N̄ = 2180 binits/sec, slightly greater than R = rH(X) = 2150 bits/sec. As comparison, a fixed-length code would require N̄ = log2 8 = 3 and r_b = 3000 binits/sec.

EXERCISE 16.1-2

Apply the Shannon-Fano algorithm to the source in Table 16.1-2. Your result should be identical to Code IV. Then confirm that this code has the optimum property that N_i = I_i and that 0s and 1s are equally likely.
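The divide-and-conquer rule of Example 16.1-3 lends itself to a short recursive sketch. Applied to Table 16.1-3 it reproduces the listed codewords and N̄ = 2.18. When two split points are equally balanced, the code below keeps the earlier one; that choice is an implementation assumption, not part of the algorithm's statement.

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) in order of decreasing probability.
    Returns {name: codeword} built by the Shannon-Fano splitting rule."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    best = 1
    for k in range(2, len(symbols)):
        # pick the split that makes the two group probabilities most nearly equal
        if (abs(2 * sum(p for _, p in symbols[:k]) - total)
                < abs(2 * sum(p for _, p in symbols[:best]) - total)):
            best = k
    upper, lower = symbols[:best], symbols[best:]
    code = {s: "0" + w for s, w in shannon_fano(upper).items()}
    code.update({s: "1" + w for s, w in shannon_fano(lower).items()})
    return code

table = [("A", 0.50), ("B", 0.15), ("C", 0.15), ("D", 0.08),
         ("E", 0.08), ("F", 0.02), ("G", 0.01), ("H", 0.01)]
code = shannon_fano(table)
N_bar = sum(p * len(code[s]) for s, p in table)
print(code)
print(f"average length N_bar = {N_bar:.2f} binits/symbol")   # 2.18
```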
Predictive Coding for Sources With Memory

Up to this point we've assumed a memoryless source whose successive symbols are statistically independent. But many information sources have memory, in the sense that the symbol probabilities depend on one or more previous symbols. Written language, being governed by rules of spelling and grammar, provides a good illustration of a source with memory. For instance, the letter U (capitalized or not) occurs in English text with probability P(U) ≈ 0.02 on the basis of relative frequency; but if the previous letter is Q, then the conditional probability becomes P(U|Q) ≈ 1. Clearly, memory effect reduces uncertainty and thereby results in a lower value of entropy than would be calculated using absolute rather than conditional source statistics.

Suppose that a source has a first-order memory, so it "remembers" just one previous symbol. To formulate the entropy, let P_ij be the conditional probability that symbol x_j is chosen after symbol x_i. Substituting P_ij for P_j in Eq. (6), we have the conditional entropy

    H(X|x_i) = Σ_j P_ij log2 (1/P_ij)    (18)

which represents the average information per symbol given that the previous symbol was x_i. Averaging over all possible previous symbols then yields

    H(X) = Σ_i P_i H(X|x_i)    (19)

An equivalent expression applies to the general case of a qth-order memory. However, the notation gets cumbersome because x_i must be replaced by the state of the source defined in terms of the previous q symbols, and there are M^q possible states to consider.

A source with memory is said to be redundant when the conditional probabilities significantly reduce H(X) compared to the upper bound log2 M. The redundancy of English text has been estimated at about 50 percent, meaning that roughly half the symbols in a long passage are not essential to convey the information. For example, yu shd b abl t read ths evn tho sevrl ltrs r msng. It likewise follows that if uncertainty is reduced by memory effect, then predictability is increased. Coding for efficient transmission can then be based on some prediction method. Here we'll analyze the scheme known as predictive run encoding for a discrete source with memory.

Consider the encoding system in Fig. 16.1-4a, where source symbols are first converted to a binary sequence with digit rate r_b. Let x(i) denote the ith digit in the binary sequence, and let x̂(i) be the corresponding digit generated by a predictor. Mod-2 addition of x(i) and x̂(i) yields the binary error sequence e(i), such that e(i) = 0 when x̂(i) = x(i) and e(i) = 1 when x̂(i) ≠ x(i). The error sequence is fed back to the predictor in order to update the prediction process. The decoding system in Fig. 16.1-4b employs an identical predictor to reconstruct the source symbols from the mod-2 sum x(i) = x̂(i) ⊕ e(i). We'll assume the source is sufficiently predictable that a predictor can be devised whose probability of a correct prediction is p = P[e(i) = 0] > 1/2. We'll also assume that prediction errors are statistically independent. Hence the entropy of the error sequence is Ω(p) < 1, and appropriate encoding of that sequence should allow information transmission at a rate approaching r_b Ω(p) < r_b.

Figure 16.1-4  Predictive coding: (a) encoder; (b) decoder.

EXERCISE 16.1-3

In facsimile transmission, a text is scanned and sample values are converted to 0s and 1s for white and black, respectively. Since there are usually many more 0s than 1s, the source clearly has memory. Suppose a predictor can be built with p = 0.9. Estimate an upper bound on the source entropy less than log2 2 = 1 bit/sample.
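The mod-2 predictive encoder and decoder of Fig. 16.1-4 can be sketched as follows. The repeat-the-previous-digit predictor and the source statistics used here are stand-in assumptions; any predictor could be substituted, and the decoder recovers the source exactly in every case.

```python
import random

def encode(bits, predict):
    """Predictive encoder: e(i) = x(i) XOR x_hat(i); the predictor sees past digits."""
    errors, prev = [], 0
    for x in bits:
        errors.append(x ^ predict(prev))
        prev = x
    return errors

def decode(errors, predict):
    """Matching decoder: x(i) = x_hat(i) XOR e(i)."""
    bits, prev = [], 0
    for e in errors:
        x = predict(prev) ^ e
        bits.append(x)
        prev = x
    return bits

predict = lambda prev: prev                               # assumed toy predictor
src = [int(random.random() < 0.1) for _ in range(1000)]   # mostly-0, facsimile-like data
err = encode(src, predict)
assert decode(err, predict) == src                        # reconstruction is exact
print("estimated p = P(correct prediction) =", 1 - sum(err) / len(err))
```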
16.2 INFORMATION TRANSMISSION ON DISCRETE CHANNELS

This section applies information theory to the study of information transmission. We'll assume that both the source and the transmission channel are discrete, so we can measure the amount of information transferred and define the channel capacity. Shannon showed that, with appropriate coding, nearly errorless information transmission is possible on a noisy channel if the rate does not exceed the channel capacity. This fundamental theorem will be examined in conjunction with the binary symmetric channel, an important channel model for the analysis of digital communication systems.

Mutual Information

Figure 16.2-1 represents a discrete information transmission system: the source selects symbols from an alphabet X for transmission over the channel, and the destination observes symbols from an alphabet Y. We want to measure the information transferred in this system. Several types of symbol probabilities will be needed to deal with the two alphabets here, and we'll use the notation defined as follows:

- P(x_i) is the probability that the source selects symbol x_i for transmission.
- P(y_j) is the probability that symbol y_j is received at the destination.
- P(x_i, y_j) is the joint probability that x_i is transmitted and y_j is received.
- P(x_i | y_j) is the conditional probability that x_i was transmitted, given that y_j is received.
- P(y_j | x_i) is the conditional probability that y_j is received, given that x_i was transmitted.

Figure 16.2-1  Discrete information transmission system.

Figure 16.2-2  Forward transition probabilities for a noisy discrete channel.

We'll assume, for simplicity, that the channel is time-invariant and memoryless, so the conditional probabilities are independent of time and previous symbol transmissions. The conditional probabilities P(y_j | x_i) then have special significance as the channel's forward transition probabilities. For example, Fig. 16.2-2 depicts the forward transitions for a noisy channel with two source symbols and three destination symbols. If this system is intended to deliver y_j = y_1 when x_i = x_1 and y_j = y_2 when x_i = x_2, then the symbol error probabilities are given by P(y_j | x_i) for j ≠ i.

Our quantitative description of information transfer on a discrete memoryless channel begins with the mutual information

    I(x_i; y_j) ≜ log2 [P(x_i | y_j) / P(x_i)]    (1)

which measures the amount of information transferred when x_i is transmitted and y_j is received. To lend support to this definition, we'll look at two extreme cases. On the one hand, suppose we have an ideal noiseless channel such that each y_j uniquely identifies a particular x_i; then P(x_i | y_j) = 1 and I(x_i; y_j) = log2 [1/P(x_i)], so the transferred information equals the self-information of x_i. On the other hand, suppose the channel noise has such a large effect that y_j is totally unrelated to x_i; then P(x_i | y_j) = P(x_i) and I(x_i; y_j) = log2 1 = 0, so no information is transferred. These extreme cases make sense intuitively.

Most transmission channels fall somewhere between the extremes of perfect transfer and zero transfer. To analyze the general case, we define the average mutual information

    I(X; Y) ≜ Σ_{i,j} P(x_i, y_j) log2 [P(x_i | y_j) / P(x_i)]  bits/symbol    (2)

where the summation subscripts indicate that the statistical average is taken over both alphabets. The quantity I(X; Y) represents the average amount of source information gained per received symbol, as distinguished from the average information per source symbol represented by the source entropy H(X).
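Given the source probabilities and the forward transition probabilities P(y_j | x_i), the received-symbol probabilities and the average mutual information of Eq. (2) follow mechanically. The two-input, three-output channel below is a made-up numerical stand-in for the channel of Fig. 16.2-2.

```python
import math

Px = [0.6, 0.4]                        # P(x_i), assumed
Pyx = [[0.90, 0.08, 0.02],             # P(y_j | x_1), assumed
       [0.05, 0.05, 0.90]]             # P(y_j | x_2), assumed

Py = [sum(Px[i] * Pyx[i][j] for i in range(2)) for j in range(3)]
I = 0.0                                # average mutual information, Eq. (2)
for i in range(2):
    for j in range(3):
        Pxy = Px[i] * Pyx[i][j]                     # joint probability
        if Pxy > 0:
            Px_given_y = Pxy / Py[j]                # Bayes' rule
            I += Pxy * math.log2(Px_given_y / Px[i])
print("P(y_j) =", [round(p, 3) for p in Py])
print(f"I(X;Y) = {I:.3f} bits/symbol")
```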
Several different but equivalent expressions for the mutual information can be derived using the probability relationships

    P(x_i, y_j) = P(x_i | y_j) P(y_j) = P(y_j | x_i) P(x_i)    (3a)

    P(y_j) = Σ_i P(y_j | x_i) P(x_i)      P(x_i) = Σ_j P(x_i | y_j) P(y_j)    (3b)

In particular, let's expand Eq. (2) as

    I(X; Y) = Σ_{i,j} P(x_i, y_j) log2 [1/P(x_i)] − Σ_{i,j} P(x_i, y_j) log2 [1/P(x_i | y_j)]

The first term simplifies to

    Σ_i [ Σ_j P(x_i, y_j) ] log2 [1/P(x_i)] = Σ_i P(x_i) log2 [1/P(x_i)] = H(X)

Hence,

    I(X; Y) = H(X) − H(X|Y)    (4)

where we've introduced the equivocation

    H(X|Y) ≜ Σ_{i,j} P(x_i, y_j) log2 [1/P(x_i | y_j)]    (5)

Equation (4) says that the average information transfer per symbol equals the source entropy minus the equivocation. Correspondingly, the equivocation represents the information lost in the noisy channel.

For another perspective on information transfer, we return to Eq. (2) and note from Eq. (3a) that P(x_i | y_j)/P(x_i) = P(y_j | x_i)/P(y_j), so I(X; Y) = I(Y; X). Therefore, upon interchanging X and Y in Eq. (4), we have

    I(X; Y) = H(Y) − H(Y|X)    (6)

with

    H(Y) = Σ_j P(y_j) log2 [1/P(y_j)]      H(Y|X) = Σ_{i,j} P(x_i, y_j) log2 [1/P(y_j | x_i)]    (7)

Equation (6) says that the information transferred equals the destination entropy H(Y) minus the noise entropy H(Y|X) added by the channel. The interpretation of H(Y|X) as noise entropy follows from our previous observation that the set of forward transition probabilities P(y_j | x_i) includes the symbol error probabilities.

EXAMPLE 16.2-1  The Binary Symmetric Channel

Figure 16.2-3 depicts the model of a binary symmetric channel (BSC). There are two source symbols with probabilities

    P(x_1) = p      P(x_2) = 1 − p

and the forward transition probabilities are P(y_2 | x_1) = P(y_1 | x_2) = α and P(y_1 | x_1) = P(y_2 | x_2) = 1 − α.

Figure 16.2-3  Binary symmetric channel (BSC).

This model represents any binary transmission system in which errors are statistically independent and the error probabilities are the same for both symbols, so the average error probability is

    P_e = P(x_1) P(y_2 | x_1) + P(x_2) P(y_1 | x_2) = pα + (1 − p)α = α

Given the forward transition probabilities, we'll use I(X; Y) = H(Y) − H(Y|X) to calculate the mutual information in terms of p and α.

The destination entropy H(Y) is easily found by treating the output of the channel as a binary source with symbol probabilities P(y_1) and P(y_2) = 1 − P(y_1). We thus write H(Y) = Ω[P(y_1)], where Ω( ) is the binary entropy function defined in Eq. (9), Sect. 16.1, and

    P(y_1) = P(y_1 | x_1) P(x_1) + P(y_1 | x_2) P(x_2) = (1 − α)p + α(1 − p) = α + p − 2αp

obtained with the help of Eqs. (3a) and (3b). For the noise entropy H(Y|X), we substitute P(x_i, y_j) = P(y_j | x_i) P(x_i) into Eq. (7) to get

    H(Y|X) = Σ_{i,j} P(y_j | x_i) P(x_i) log2 [1/P(y_j | x_i)]

which then reduces to H(Y|X) = Ω(α). The channel's symmetry causes this result to be independent of p. Putting the foregoing expressions together, we finally have

    I(X; Y) = Ω(α + p − 2αp) − Ω(α)    (8)

so the information transfer over a BSC depends on both the error probability α and the source probability p. If the noise is small, then α << 1 and I(X; Y) ≈ Ω(p) = H(X); if the noise is very large, then α = 1/2 and I(X; Y) = 0.

EXERCISE 16.2-1

Confirm that H(Y|X) = Ω(α) for a BSC.

EXERCISE 16.2-2

Consider a channel with the property that x_i and y_j are statistically independent for all i and j. Show that H(X|Y) = H(X) and I(X; Y) = 0.
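Equation (8) can be cross-checked against a direct evaluation of Eqs. (6) and (7); the values of p and α below are arbitrary test points.

```python
import math

def omega(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def bsc_mutual_info(p, a):
    """I(X;Y) for a BSC via Eq. (8)."""
    return omega(a + p - 2*a*p) - omega(a)

def bsc_mutual_info_direct(p, a):
    """Direct H(Y) - H(Y|X) from the joint distribution, Eqs. (6) and (7)."""
    Px = [p, 1 - p]
    Pyx = [[1 - a, a], [a, 1 - a]]
    Py = [sum(Px[i] * Pyx[i][j] for i in range(2)) for j in range(2)]
    HY = sum(q * math.log2(1 / q) for q in Py if q > 0)
    HYX = sum(Px[i] * Pyx[i][j] * math.log2(1 / Pyx[i][j])
              for i in range(2) for j in range(2) if Pyx[i][j] > 0)
    return HY - HYX

p, a = 0.3, 0.05
print(bsc_mutual_info(p, a), bsc_mutual_info_direct(p, a))   # the two agree
```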
Discrete Channel Capacity

It's evident that discrete memoryless channels transfer a definite amount of information I(X; Y), despite corrupting noise. A given channel usually has fixed source and destination alphabets and fixed forward transition probabilities, so the only variables in I(X; Y) are the source probabilities P(x_i). Consequently, maximum information transfer requires specific source statistics, obtained perhaps through source encoding. Let the resulting maximum value of I(X; Y) be denoted by

    C_s ≜ max over P(x_i) of I(X; Y)  bits/symbol    (9)

This quantity represents the maximum amount of information transferred per channel symbol (on the average) and is called the channel capacity. We also measure capacity in terms of information rate. Specifically, if s stands for the maximum symbol rate allowed by the channel, then the capacity per unit time is

    C = s C_s  bits/sec    (10)

which represents the maximum rate of information transfer. The significance of channel capacity becomes most evident in the light of Shannon's fundamental theorem for a noisy channel, stated as follows:

If a channel has capacity C and a source has information rate R < C, then there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors. Conversely, if R > C, then it is not possible to transmit the information without errors.

A general proof of this theorem goes well beyond our scope, but we'll attempt to make it plausible by considering two particular cases. First, suppose we have an ideal noiseless channel with μ = 2^ν symbols. Then I(X; Y) = H(X), which is maximized if P(x_i) = 1/μ for all i. Thus,

    C_s = max H(X) = log2 μ = ν    (11)

Errorless transmission rests in this case on the fact that the channel is noiseless. However, we still need a coding system like the one diagrammed in Fig. 16.2-4 to match the source and channel. The binary source encoder generates binary digits at the rate r_b ≈ R for conversion to μ-ary channel symbols at the rate s = r_b/log2 μ = r_b/ν. Hence,

    R ≤ r_b = sν ≤ C  bits/sec

where optimum source encoding achieves the maximum information transfer R = C. Transmission at R > C would require a coding system that violates the Kraft inequality; consequently, decoding errors would occur even though the channel is noiseless.

Figure 16.2-4  Encoding system for an ideal noiseless discrete channel, with binary-to-μ-ary conversion.

A more realistic case, including channel noise, is the binary symmetric channel from Example 16.2-1. We previously found that I(X; Y) = Ω(α + p − 2αp) − Ω(α), with Ω(α) being constant for a fixed error probability α. But Ω(α + p − 2αp) varies with the source probability p and reaches a maximum value of unity when α + p − 2αp = 1/2, which is satisfied for any α if p = 1/2, that is, equally likely binary input symbols. Thus, the capacity of a BSC is

    C_s = 1 − Ω(α)    (12)

The plot of C_s versus α in Fig. 16.2-5 shows that C_s ≈ 1 for α << 1, but the capacity rapidly drops to zero as α → 0.5. The same curve applies for 0.5 ≤ α ≤ 1 if you replace α with 1 − α, equivalent to interchanging the two output symbols.

Figure 16.2-5  Capacity of a BSC.
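A quick numerical check of Eq. (12): sweeping the source probability p confirms that I(X;Y) peaks at p = 1/2 and that the peak equals 1 − Ω(α). The value of α is arbitrary.

```python
import math

def omega(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def I_bsc(p, a):                       # Eq. (8)
    return omega(a + p - 2*a*p) - omega(a)

a = 0.1
best_p = max((k / 1000 for k in range(1001)), key=lambda p: I_bsc(p, a))
print(f"alpha = {a}: I(X;Y) is maximized near p = {best_p}")
print(f"max I = {I_bsc(best_p, a):.4f}, capacity 1 - Omega(alpha) = {1 - omega(a):.4f}")
```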
Reliable transmission on a BSC requires channel coding for error control in addition to source coding. Our study of error-control coding in Chap. 13 demonstrated that the error probability per binary message digit could, in fact, be made much smaller than the transmission error probability α. For instance, the (15, 11) Hamming code has code rate R_c = 11/15 and an output error probability per message digit proportional to α². If the BSC has α = 10^-3 and symbol rate s, then the Hamming code yields P_be ≈ 10^-5 at the message digit rate r_b = R_c s ≈ 0.73s. Shannon's theorem asserts that a better coding system would yield virtually errorless transmission at the rate r_b = R ≈ C = s[1 − Ω(α)] ≈ 0.99s.

Figure 16.2-6  Encoding system for a BSC.

To make that assertion plausible, consider channel codewords of length N used to represent M equally likely messages, with the number of messages given by

    M = 2^[N(C_s − ε)]    (13)

where ε is a small positive quantity. Each codeword then carries log2 M = N(C_s − ε) bits of information, so the information transfer approaches C_s bits per channel symbol as ε → 0; and we assert that appropriate channel coding makes it possible to recover the M-ary symbols at the destination with arbitrarily low probability of error, providing that the wordlength N is very large. In fact, we'll eventually let N → ∞ to ensure errorless information transfer. Ideal channel coding for the BSC thus involves infinite time delay. Practical coding systems with finite time delay and finite wordlength will fall short of ideal performance.

The reasoning behind large wordlength for error control comes from the vector picture of binary codewords introduced in Sect. 13.1. Specifically, recall that all 2^N possible words of length N can be visualized as vectors in an N-dimensional space where distance is measured in terms of Hamming distance. Let the vector V in Fig. 16.2-7 represent one of the M channel codewords, and let V' be the received codeword with n erroneous binits caused by transmission errors. The Hamming distance between V' and V equals n, a random variable ranging from 0 to N. However, when N is very large, V' almost always falls within a Hamming sphere of radius d < N centered on V, so the decoder can correctly identify the transmitted codeword provided that no other codeword lies within the same sphere. Rather than constructing a specific set of good codewords, Shannon's argument takes the M codewords to be selected at random from the 2^N possible vectors. Although the random coding approach met with considerable criticism at first, it has subsequently been recognized as a powerful method in coding and information theory. We'll adopt random codeword selection, and we'll write the probability of a decoding error as

    P_e = P_ne + P_ce

Here, P_ne is the probability of a noise error, meaning that V' falls outside the Hamming sphere, and P_ce is the probability of a "code error" in the sense that two or more selected codewords fall within the same Hamming sphere.

Figure 16.2-7  Vector representation of codewords.

A noise error corresponds to the event n ≥ d. Since transmission errors are statistically independent and occur with probability α < 1/2, n is governed by the binomial distribution (p. 371) with mean and variance

    n̄ = Nα      σ_n² = Nα(1 − α)

If we take the Hamming radius as

    d = Nβ      α < β < 1/2    (14)

then d − n̄ = N(β − α) grows in direct proportion to N while the standard deviation σ_n grows only as the square root of N, so the probability of a noise error becomes vanishingly small as N → ∞.

To represent the probability of a "code error," let the vector V in Fig. 16.2-7 represent one of the selected codewords, and let m denote the number of vectors within its Hamming sphere. The remaining M − 1 codewords are chosen randomly from the entire set of 2^N vectors, and the probability of selecting one of the m vectors is m 2^-N. Hence,

    P_ce ≤ (M − 1) m 2^-N < M m 2^-N = m 2^(-N[Ω(α) + ε])    (15)

where we've inserted Eq. (13) for M and written C_s = 1 − Ω(α).

Now we need an upper bound on m as N → ∞. Each of the m vectors within a Hamming sphere of radius d represents a binary word that differs from V in no more than d places, and the number of words that differ in exactly i places equals a binomial coefficient. Thus

    m = Σ_{i=0}^{d} (N choose i) ≤ (d + 1) (N choose d)

since there are d + 1 terms in this sum and the last term is the largest, d = Nβ being less than N/2. Then, for the factorial of each of the large numbers N, d = Nβ, and N − d = N(1 − β), we apply Stirling's approximation

    k! ≈ sqrt(2πk) k^k e^-k    (16)

and a few manipulations lead to the upper-bound expression

    m ≤ [(Nβ + 1)/sqrt(2πNβ(1 − β))] 2^[NΩ(β)]    (17)

Combining Eqs. (15) and (17) then gives

    P_ce < [(Nβ + 1)/sqrt(2πNβ(1 − β))] 2^(-N[ε − Ω(β) + Ω(α)])    (18)

which is our final result. Equation (18) shows that the probability of a decoding error caused by random codeword selection goes to zero as N → ∞, providing that ε > Ω(β) − Ω(α). For a given value of the channel parameter α, we can take β > α to satisfy Eq. (14) and still have an arbitrarily small value of ε > Ω(β) − Ω(α). Under these conditions, we achieve errorless information transfer on the BSC with R → C as N → ∞.
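Both error terms in the random-coding argument can be evaluated numerically. With assumed values of α, β, and ε satisfying β > α and ε > Ω(β) − Ω(α), the exact binomial tail P(n ≥ Nβ) and the bound of Eq. (18) both shrink rapidly as N grows.

```python
import math

def omega(p):
    return -p*math.log2(p) - (1-p)*math.log2(1-p)

def noise_error(N, alpha, beta):
    """Exact P(n >= N*beta) for binomially distributed transmission errors."""
    d = math.ceil(N * beta)
    pk = (1 - alpha) ** N                        # P(n = 0)
    total = pk if d == 0 else 0.0
    for k in range(1, N + 1):                    # P(k) from P(k-1) recursively
        pk *= (N - k + 1) / k * alpha / (1 - alpha)
        if k >= d:
            total += pk
    return total

def code_error_bound(N, alpha, beta, eps):
    """Upper bound of Eq. (18) on the probability of a 'code error'."""
    Nb = N * beta
    return ((Nb + 1) / math.sqrt(2*math.pi*Nb*(1-beta))
            * 2 ** (-N * (eps - omega(beta) + omega(alpha))))

alpha, beta = 0.05, 0.08
eps = 1.5 * (omega(beta) - omega(alpha))         # comfortably above the minimum
for N in (100, 500, 2000):
    print(N, noise_error(N, alpha, beta), code_error_bound(N, alpha, beta, eps))
```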
16.3 CONTINUOUS CHANNELS AND SYSTEM COMPARISONS

Having developed the concepts of information transmission for the simplified discrete case, we're now ready to consider the more realistic case of a continuous source and channel. We'll begin with the measure of information from a source that emits a continuous signal. The material may seem heavy going at first, but we'll then make some reasonable assumptions about the transmission of continuous signals to express channel capacity in terms of bandwidth and signal-to-noise ratio, a result known as the Hartley-Shannon law. This result leads us to the definition of an ideal communication system, which serves as a standard for system comparisons and a guide for the design of improved communication systems.

Continuous Information

A continuous information source produces a time-varying signal x(t). We'll treat the set of possible signals as an ensemble of waveforms generated by some random process, assumed to be ergodic. We'll further assume that the process has a finite bandwidth, meaning that x(t) is completely characterized in terms of periodic sample values. Thus, at any sampling instant, the collection of possible sample values constitutes a continuous random variable X described by its probability density function p_X(x).

The average amount of information per sample value of x(t) is measured by the entropy function

    H(X) ≜ ∫ p_X(x) log2 [1/p_X(x)] dx    (1)

where the integral runs over all x and, as before, the logarithmic base is b = 2. This expression has obvious similarities to the definition of entropy for a discrete source. However, Eq. (1) turns out to be a relative measure of information rather than an absolute measure.

The absolute entropy of a continuous source can, in principle, be defined from the following limiting operation. Let the continuous RV X be approximated by a discrete RV with values x_i = i Δx for i = 0, ±1, ±2, ..., and probabilities P(x_i) ≈ p_X(x_i) Δx. Then, based on the formula for discrete entropy, let the absolute entropy be

    H_abs(X) = lim (Δx → 0) Σ_i p_X(x_i) Δx log2 [1/(p_X(x_i) Δx)]

Passing from summation to integration yields

    H_abs(X) = H(X) + H_0(X)    (2a)

where

    H_0(X) = − lim (Δx → 0) log2 Δx ∫ p_X(x) dx = − lim (Δx → 0) log2 Δx = log2 ∞ = ∞    (2b)

which is the reference for the relative entropy H(X). Since H_0(X) = ∞, the absolute entropy of a continuous source is always infinite, an understandable result in view of the fact that X is a continuous RV with an uncountable number of possible values.

Relative entropy, being finite, serves as a useful measure of information from continuous sources if you avoid misleading comparisons involving different references. In particular, consider two information signals x(t) and z(t). For a meaningful comparison of their entropy functions we write

    H_abs(Z) − H_abs(X) = H(Z) − H(X) + [H_0(Z) − H_0(X)]

where

    H_0(Z) − H_0(X) = − lim (Δz, Δx → 0) log2 (Δz/Δx) = − log2 |dz/dx|

If the signals are related in some manner and if |dz/dx| = 1, then the reference values are equal and H(Z) and H(X) are directly comparable.

The inherent nature of relative entropy precludes absolute bounds on H(X). In fact, the value of H(X) can be positive, zero, or negative, depending upon the source PDF.
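Relative entropy per Eq. (1) is easy to evaluate numerically. The uniform densities below are illustrative and show that H(X) can indeed come out positive, zero, or negative.

```python
import math

def relative_entropy(pdf, lo, hi, steps=100000):
    """Numerically integrate H(X) = integral of p(x) log2(1/p(x)) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    H = 0.0
    for k in range(steps):
        p = pdf(lo + (k + 0.5) * dx)
        if p > 0:
            H += p * math.log2(1 / p) * dx
    return H

# A uniform PDF of width a has H(X) = log2(a): positive, zero, or negative.
for a in (4.0, 1.0, 0.25):
    H = relative_entropy(lambda x: 1.0 / a, 0.0, a)
    print(f"uniform width {a}: H(X) = {H:+.3f} bits  (log2(a) = {math.log2(a):+.3f})")
```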
Nonetheless, a reasonable question to ask is: What p_X(x) maximizes H(X) for a given source?

This question is significant because a given source usually has specific signal constraints, such as a fixed peak value or average power, that limit the possible PDFs. Stating the problem in more general terms, we seek the function p = p(x) that maximizes an integral of the form

    I = ∫ F(x, p) dx    (3a)

whose integrand F(x, p) is a specified function of x and p. The variable function p is subject to a set of k constraints given by

    ∫ F_n(x, p) dx = c_n      n = 1, 2, ..., k    (3b)

with the c_n being constants. A theorem from the calculus of variations says that I is maximum (or minimum) when p satisfies the equation

    ∂F/∂p + Σ_{n=1}^{k} λ_n ∂F_n/∂p = 0    (4)

where the λ_n are Lagrange's undetermined multipliers. The values of the λ_n are found by substituting the solution of Eq. (4) into the constraint equations.

In the problem at hand, we wish to maximize H(X) as defined by Eq. (1). Thus, we take

    F(x, p) = p log2 (1/p) = −p (ln p)/(ln 2)

Furthermore, p_X(x) must obey the essential PDF property

    ∫ p_X(x) dx = 1    (5)

so we always have the constraint function and constant F_1 = p, c_1 = 1. Additional constraints come from the particular source limitations, as illustrated in the important example that follows.

EXAMPLE 16.3-1  Source Entropy With Fixed Average Power

Consider the case of a source with fixed average power, defined in terms of the PDF by

    S = ∫ x² p_X(x) dx    (6)

which imposes the additional constraint function F_2 = x²p and constant c_2 = S. Inserting F, F_1, and F_2 into Eq. (4) yields

    −(ln p + 1) + λ_1 + λ_2 x² = 0

Thus,

    p(x) = e^(λ_1 − 1) e^(λ_2 x²) = A e^(λ_2 x²)

where ln 2 has been absorbed in λ_1 and λ_2. After using Eqs. (5) and (6) to evaluate the multipliers, we get

    p_X(x) = [1/sqrt(2πS)] e^(−x²/2S)    (7)

a gaussian function with zero mean and variance σ² = S. The corresponding maximum entropy is calculated from Eq. (1) by writing log2 [1/p_X(x)] = log2 sqrt(2πS) + (x²/2S) log2 e. Therefore,

    H(X) = log2 sqrt(2πS) + (1/2) log2 e = (1/2) log2 2πeS    (8)

obtained with the help of Eqs. (5) and (6). Note that this relative entropy has a negative value when 2πeS < 1. Even so, for any fixed average power S, we've established the important result that H(X) ≤ (1/2) log2 2πeS and that the entropy is maximum when p_X(x) is a zero-mean gaussian PDF. Other source constraints of course lead to different results.

EXERCISE 16.3-1

Suppose a source has a peak-value limitation, such that −M ≤ x ≤ M. By finding the PDF that maximizes H(X), show that H(X) ≤ log2 2M. Then let the source signal be amplified to produce z(t) = Kx(t). Find H(Z), and explain why the information conveyed is unchanged even though H(Z) ≠ H(X).
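A numerical cross-check of Example 16.3-1: among PDFs with the same average power S, the gaussian attains the value (1/2) log2 2πeS of Eq. (8) while others fall below it. The uniform and Laplacian alternatives and the value of S are chosen only for comparison.

```python
import math

def H(pdf, lo, hi, steps=100000):
    """Numerical evaluation of the relative entropy integral, Eq. (1)."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(lo + (k + 0.5) * dx)
        if p > 0:
            total += p * math.log2(1 / p) * dx
    return total

S = 2.0                                                   # fixed average power
gauss   = lambda x: math.exp(-x*x / (2*S)) / math.sqrt(2*math.pi*S)
uniform = lambda x: 1 / (2*math.sqrt(3*S)) if abs(x) <= math.sqrt(3*S) else 0.0
laplace = lambda x: math.exp(-abs(x) * math.sqrt(2/S)) / math.sqrt(2*S)

L = 12 * math.sqrt(S)                                     # integration half-width
print("bound (1/2)log2(2*pi*e*S) =", 0.5 * math.log2(2*math.pi*math.e*S))
for name, pdf in [("gaussian", gauss), ("uniform", uniform), ("laplacian", laplace)]:
    print(f"{name:9s} H(X) = {H(pdf, -L, L):.4f} bits")
```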
Continuous Channel Capacity

Information transfer on a continuous channel takes the form of signal transmission. The source emits a signal x(t) which, after corruption by transmission noise, appears at the destination as another signal y(t). The average mutual information is defined by analogy with the discrete case to be

    I(X; Y) ≜ ∫∫ p_XY(x, y) log2 [p_X(x|y)/p_X(x)] dx dy    (9)

where p_X(x) is the source PDF, p_XY(x, y) is the joint PDF, and so on. Averaging with respect to both X and Y removes the potential ambiguities of relative entropy. Thus, Eq. (9) measures the absolute information transfer per sample value of y(t) at the destination. It can be shown from Eq. (9) that I(X; Y) ≥ 0 and that I(X; Y) = 0 when the noise is so great that y(t) is unrelated to x(t).

Usually, we know the forward transition PDF p_Y(y|x) rather than p_X(x|y). We then calculate I(X; Y) from the equivalent expression

    I(X; Y) = H(Y) − H(Y|X)    (10)

in which H(Y) is the destination entropy and H(Y|X) is the noise entropy given by

    H(Y|X) = ∫∫ p_XY(x, y) log2 [1/p_Y(y|x)] dx dy    (11)

If the channel has independent additive noise such that y(t) = x(t) + n(t), then p_Y(y|x) = p_N(y − x), where p_N(n) is the noise PDF. Consequently, H(Y|X) reduces to

    H(Y|X) = ∫ p_N(n) log2 [1/p_N(n)] dn

independent of p_X(x).

Now consider a channel with fixed forward transition PDF, so the maximum information transfer per sample value of y(t) is

    C_s ≜ max over p_X(x) of I(X; Y)  bits/sample    (12)

If the channel also has a fixed bandwidth B, then y(t) is a bandlimited signal completely defined by sample values taken at the Nyquist rate f_s = 2B, identical to the maximum allowable signaling rate for a given bandwidth B. (Samples taken at a rate greater than the Nyquist rate would not be independent and would carry no additional information.) The maximum rate of information transfer then becomes

    C = 2B C_s  bits/sec    (13)

which is the capacity of a bandlimited continuous channel. For the important case of a channel corrupted by additive white gaussian noise, carrying out the maximization in Eq. (12) with the signal power constrained to S and the in-band noise power equal to N yields C_s = (1/2) log2 (1 + S/N), so that

    C = B log2 (1 + S/N)    (14)

the celebrated Hartley-Shannon law. Shannon's fundamental theorem applies here in the sense that nearly errorless transmission is possible at any information rate R ≤ C, and an ideal communication system is defined as one that achieves this performance. Errorless transmission is approached only in the limit as the signaling duration T → ∞, which means that the number of signaling waveforms M → ∞ and the coding delay becomes infinitely large, so the ideal system is physically unrealizable. However, real systems having large but finite M and T can come as close as we wish to ideal performance. (One such system is examined in Sect. 16.5.) With this thought in mind, let's further consider the properties of the hypothetical ideal system.

The capacity relation C = B log2 (1 + S/N) underscores the fundamental role of bandwidth and signal-to-noise ratio in communication. It also shows that we can exchange increased bandwidth for decreased signal power, a trade-off previously observed in wideband noise-reduction systems such as FM and PCM. The Hartley-Shannon law specifies the optimum bandwidth-power exchange, and it moreover suggests the possibility of bandwidth compression.

Keeping in mind that noise power varies with bandwidth as N = N_0 B, we can explore the trade-off between bandwidth and signal power by writing

    C = B log2 [1 + S/(N_0 B)]    (15)

Thus, if N_0 and R have fixed values, information transmission at the rate R ≤ C requires

    S/(N_0 R) ≥ (B/R) (2^(R/B) − 1)    (16)

which becomes an equality when R = C. Figure 16.3-2 shows the resulting plot of S/N_0R in dB versus B/R. The region on the lower left corresponds to R > C, a forbidden condition for reliable communication. This plot reveals that bandwidth compression (B/R < 1) demands a dramatic increase of signal power, while bandwidth expansion (B/R > 1) reduces S/N_0R asymptotically toward a limiting value of about −1.6 dB as B/R → ∞. In fact, an ideal system with infinite bandwidth has the finite channel capacity

    C_∞ = lim (B → ∞) B log2 [1 + S/(N_0 B)] = S/(N_0 ln 2) ≈ 1.44 S/N_0    (17)

Equation (17) is derived from Eq. (15) written in the form

    C = [S/(N_0 ln 2)] [ln (1 + λ)]/λ      λ = S/(N_0 B)

The series expansion ln (1 + λ) = λ − λ²/2 + ... then shows that [ln (1 + λ)]/λ → 1 as λ → 0, corresponding to B → ∞. Note that C_∞ is the maximum capacity for fixed S and N_0, so

    S/(N_0 R) ≥ S/(N_0 C_∞) = ln 2 ≈ −1.6 dB

Figure 16.3-2  Trade-off between bandwidth and signal power with an ideal system.

From the shape of the curve in Fig. 16.3-2, we conclude that C ≈ C_∞ for B/R ≥ 10.
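Equation (16) and the −1.6 dB limit are easy to tabulate; the B/R values below are arbitrary sample points.

```python
import math

def snr_per_bit_db(B_over_R):
    """Minimum S/(N0*R) in dB from Eq. (16): (B/R)*(2^(R/B) - 1)."""
    ratio = B_over_R * (2 ** (1 / B_over_R) - 1)
    return 10 * math.log10(ratio)

for b in (0.1, 0.5, 1, 2, 5, 10, 100):
    print(f"B/R = {b:6}: S/N0R >= {snr_per_bit_db(b):6.2f} dB")
print("limit as B/R -> infinity:", 10 * math.log10(math.log(2)), "dB")   # about -1.6 dB
```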
It must be stressed that these results pertain to an ideal but unrealizable system. Consequently, they establish upper limits on the performance of any real system whose transmission channel has the same bandwidth and signal-to-noise ratio.

Consider, as an illustration, the problem of transmitting pictures from a space probe near Mars, given the following information. A digitized picture will consist of 120,000 pixels, each pixel having one of 16 brightness levels, so picture transmission involves

    RT = 120,000 × log2 16 = 480,000 bits

assuming equally likely brightness levels. The Mars-to-Earth link is a microwave radio system with antenna gains of 26 dB and 56 dB, and the received signal power works out to be

    S_R ≈ 5 × 10^-17 W

The noise density at the receiver is N_0 = 8 × 10^-21 W/Hz, obtained from Eq. (6), Sect. 9.4. Since no transmission bandwidth was specified, let's assume that B/R ≥ 10. An ideal system would then have

    R ≤ C ≈ C_∞ = 1.44 S_R/N_0 ≈ 9000 bits/sec

and the corresponding bandwidth must be B ≥ 10R ≈ 90 kHz. Therefore, the transmission time per picture is

    T ≥ 480,000 bits / 9000 bits/sec ≈ 53 sec

A real system, of course, would require more time for picture transmission. The point here is that no system with the same assumed specifications can achieve a smaller transmission time, unless some predictive source encoding reduces RT.

EXERCISE 16.3-2

A keyboard machine with 64 different symbols is connected to a voice telephone channel having B = 3 kHz and S/(N_0 B) = 30 dB. (a) Calculate the maximum allowable symbol rate for errorless transmission. (b) Assuming B can be changed while the other parameters are fixed, find the symbol rate with B = 1 kHz and with B → ∞.

System Comparisons

Having determined the properties of an ideal communication system, we're ready to reexamine various systems from previous chapters and compare their performance in the light of information theory. Such comparisons provide important guidelines for the development of new or improved communication systems, both digital and analog. For all comparisons, we'll assume an AWGN channel with transmission bandwidth B_T and signal-to-noise ratio (S/N)_R = S_R/(N_0 B_T) at the receiver.

Consider first the case of binary baseband transmission. Previously, we found that the maximum signaling rate is 2B_T and that the minimum transmission error probability is Q[sqrt((S/N)_R)], which requires the use of polar sinc-pulse signals and matched filtering. Now we'll view this system as a binary symmetric channel with symbol rate s = 2B_T and transmission error probability α = Q[sqrt((S/N)_R)]. Hence, the BSC system capacity is

    C = 2B_T [1 − Ω(α)]      α = Q[sqrt((S/N)_R)]    (18)

where Ω(α) is the binary entropy function from Eq. (9), Sect. 16.1. With sufficiently elaborate error-control coding, we can obtain nearly error-free information transfer at R ≤ C.

Figure 16.3-3  Channel bit rate/bandwidth versus signal-to-noise ratio for a BSC and an ideal continuous system.

Figure 16.3-3 shows C/B_T versus (S/N)_R for a BSC and for an ideal continuous system with C/B_T = log2 [1 + (S/N)_R]. The BSC curve flattens off at C/B_T = 2 because the maximum entropy of a noise-free binary signal is one bit per symbol. Hence, we must use M-ary signaling to get closer to ideal performance when (S/N)_R >> 1. At lower signal-to-noise ratios, where C/B_T < 2, the gap between the BSC and ideal curves suggests that there should be a way of extracting more information from a noisy binary signal. This observation has led to the sophisticated technique known as soft-decision decoding, in which binary codewords are decoded from sample values of the continuous signal-plus-noise waveform rather than from a regenerated (hard-decision) binary waveform.
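The two curves of Fig. 16.3-3 follow from Eq. (18) and the Hartley-Shannon law. In this sketch Q(x) is evaluated through the complementary error function, and the signal-to-noise ratios are arbitrary sample points.

```python
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def omega(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

for snr_db in (0, 5, 10, 15, 20, 30):
    snr = 10 ** (snr_db / 10)
    alpha = Q(math.sqrt(snr))                 # transmission error probability
    bsc = 2 * (1 - omega(alpha))              # C/B_T for the BSC, Eq. (18)
    ideal = math.log2(1 + snr)                # C/B_T for the ideal continuous channel
    print(f"(S/N)_R = {snr_db:2d} dB: BSC C/B_T = {bsc:.3f}, ideal C/B_T = {ideal:.3f}")
```

The printout shows the BSC curve saturating at 2 while the ideal curve keeps growing, which is the gap that motivates M-ary signaling and soft-decision decoding.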
Now let’s examine the performance of M-ary digital communication systems without error-control coding. We'll assume a binary source that emits equiprobable symbols at rate r,, Ifthe error probability is small, say P,, = 10*, then the informa- tion rate is R ~ r, since almost all of the binary digits correctly convey information to the destination. For transmission purposes, let the binary data be converted to a Gray-code M-ary signal with M = 2, An M-ary baseband system with polar sine pulses and matched filtering then has ne 2 6K Bo Pe x! sd on) mm where y, = E,/No and E,, = Sp/r,, which is the average energy per binary digit. Upon solving Eq, (19) for the value of y, that gives P,, ~ 10°, we obtain the plot of r4/By versus y, in Fig. 16.3-4. This same curve also applies for bandpass m Comparisons 16.3. Continuous Channels and Systen nlBr 160 10 20 » dB Figure 16.34 Performance comparison of ideal sytem and Mery system with Por = Wo". Analog source AWGN chase Destination Modulator Domed C= Bplogy ( +) Figure 16.3-5 Analog modulation system transmission via APK modulation with M replaced by Vi, while other modulation ‘methods have poorer performance. The comparison curve for an ideal system is cal~ culated from Eq. (16) taking R = r, and S/NoR = Sq/Nors = Yo. We thus see that real digital systems with »)/B, 2 2 and small but nonzero error probability require at least 6-7 dB more signal energy than an ideal system. Error-control coding would reduce the energy needed for a specified error probability, thereby shifting the sys- tem performance curve closer to the ideal. Finally, we come to analog communication in which bandwidth and signal-to-noise ratio are the crucial performance factors, rather than information rate and error probabil- ity. Consider, therefore, the analog modulation system represented by Fig. 16.3-5, where the analog signal at the destination has bandwidth 17 and signal-to-noise ratio (S/N)p. The maximum output information rate is R= W log, [1 + (S/N)p), and it ‘cannot exceed the transmission channel capacity C = By logy [1 + (Se/NoBr)]. Set- ting R < Cand solving for (S/N) p yields ° ) -1 (20) (5),= (+) S\ e(1+ =)" - N/o ia) R= Woes [+S 802 Sree 16 Information and Detection Theory 0% Figure 1 a 6.3. “S$ Petformonce omarion of analog modvlaon eystens Where 4 i: Normalized parameters b and + are defined by b= BW y= Sy Ny Equation oe ohare wane quality for an ideal sys Particular, wer Y With fcr?! ompare the analog performanes “xed bandwidth SUS b With fixed (S/N)p. Tatio band the power-bandwidth exchange given by ver- be Fieure 163-6 repeats some of ea ious analog performance curves for ie hh together With the curve for an ideal system. The he avy dots mark the thresh- » and We see that wideband noise fall short of deat ‘igure 16.37 depicts the ower-bandwidth exchange of analog modulation, sys tems, taking (S/¥)9 = 50 dB and § 1/2. SSB and DSB appear here as sin. Ble points since bis fixed, and SSE x equivalent to an ideal system with b = | Meaning no wideband j Previous equations, an old. Observe that Pt improvement, Based on these two. ‘Comparison figures, we conclude that digital transmission via PCM makes better use of the ‘apacity of continuous channel than conventional ana- og modulation. This difference between digital and analog transmission of analog
