CHAPTER 5: INFORMATION THEORY

INTRODUCTION [1, 2]

With the pioneering work of Wiener, Shannon, Gabor and others starting in the mid-1940s, the study of information and communication has been formalised as statistical communication theory, or communication-oriented information theory. This enables theoreticians as well as practical designers to evaluate existing and futuristic communication systems, to evolve new codes for signal design, and to optimise information transfer in the transmitter-channel-receiver path. In a generalised communication link, the information is generated in a source, transformed and coded into a suitable signal, transmitted through a channel, and then transferred to a sink through a decoder-receiver. For efficient transmission of information, it is necessary to transform the information at many stages, as dictated by the characteristics of the channel and the optimum receiver. The communication channels are broadly classified as: (a) discrete noiseless channel, (b) discrete channel with discrete noise, (c) discrete channel with continuous noise, and (d) continuous channel with continuous noise. For the discrete noiseless channel, e.g., in coding language information, one is led to Shannon-Fano-Huffman codes. For the discrete channel disturbed by discrete noise, the information is protected by error-correcting codes, as discussed in the last chapter. For the discrete channel with continuous noise (mostly white and Gaussian), e.g., in binary and multilevel PAM transmission, the optimum receiver uses a correlation detector, and the coding problem is a search for waveforms which are mutually as uncorrelated as possible, e.g., orthogonal codes. For the continuous channel with continuous noise, solutions are found in the form of wideband modulation systems, e.g., FM, PPM, PAM-FM, PCM-AM, PCM-FM, and coherent detection techniques using matched filters and correlators.
The major developments in the area of information theory started with Shannon's classical theorems on 'Code capacity' and 'Channel capacity', which specify the maximum information that may be generated by a set of symbols, and the information that may be transmitted over a physical noisy channel. Shannon's theorems did not provide the methods of achieving the ideal channel capacity, given the statistics of the channel, but only indicated a yardstick with which all practical communication systems can be tested to evaluate their efficiency. Since the publication of Shannon's results in 1948, theoreticians and designers have been constantly trying to find methods of coding the information in such a way as to reach the so-called Shannon's limit. A large part of the development in coding techniques and optimum receiver design is the result of this effort.

This chapter discusses the information measure of sources, including Markov sources, language capacity, code capacity, and channel capacity. The fundamental theorems of Shannon are discussed and the results used to evaluate some of the existing communication systems. Noiseless coding as well as coding for noisy channels are illustrated. The exchange of bandwidth for signal-to-noise ratio is also discussed.

5.1 INFORMATION AND ENTROPY [1, 2]

The quantitative measure of information is based on our intuitive notion of the word 'information', i.e., the more unexpected an event is (with a priori probability p small), the more information is obtained when the event occurs (with a posteriori probability of the event = 1). Conveniently, this information is expressed as:

    I(x) = log [1/p(x)] = -log p(x)    (5.1)

where p(x) = probability of occurrence of x, and the unit of I(x) depends on the base of the logarithm in eqn. (5.1). The different units are known as:

    I(x) = -log2 p(x) bits
         = -loge p(x) = -ln p(x) nats    (5.2)
         = -log10 p(x) Hartleys

and, in general, I(x) = -logD p(x) D-ary units.
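The units of eqn. (5.2) differ only in the base of the logarithm, so a single routine covers bits, nats and Hartleys; a minimal sketch (the function name is mine, not the text's):

```python
import math

def information(p: float, base: float = 2.0) -> float:
    """Self-information I(x) = -log p(x), in units set by the log base (eqn. 5.1)."""
    return -math.log(p) / math.log(base)

# For p = 1/8 the information is 3 bits; the same event measured in nats and Hartleys:
p = 0.125
bits = information(p, 2)          # 3.0 bits
nats = information(p, math.e)     # = 3 ln 2 nats
hartleys = information(p, 10)     # = 3 log10(2) Hartleys
```

One unit therefore converts to another by a constant factor: 1 nat = log2(e) bits and 1 Hartley = log2(10) bits.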
The information measure can be any monotonically decreasing function of p, but the special choice of log p is justified by the following useful properties:

(i) The information content of a message increases as p(x) decreases, and vice versa.
(ii) I(x) is a continuous function of p(x).
(iii) The total information of two or more independent messages/events is the sum of the individual informations.

Property (iii) is evident from the following example. If two independent events (x_1, x_2) have a priori probabilities p_1 and p_2, then the joint probability of their occurrence is p = p_1 p_2, and the information associated with the joint event is

    I(x_1, x_2) = -log p = -log p_1 - log p_2 = I(x_1) + I(x_2)

Consider a zero-memory information source S delivering messages from its alphabet X = {x_1, x_2, ..., x_m} with probabilities p{x_i} = {p_1, p_2, ..., p_m}. If N messages are delivered (N → ∞), then symbol x_i occurs Np_i times and each occurrence conveys information -log p_i. Hence the total information due to the Np_i occurrences is -Np_i log p_i, and the total information due to all N messages is -N Σ p_i log p_i. The Entropy H(X) of the source S is now defined as the average amount of information per source symbol:

    H(X) = -Σ_{i=1}^{m} p_i log p_i, with Σ p_i = 1    (5.3)

Thus the entropy is associated with a source, whereas information is associated with a message. H(X) is also referred to as the Marginal entropy, and p(x) as the Marginal probability. The entropy, as defined in eqn. (5.3), has the following important properties:

(i) H(X) = 0 if all p_i's, except one, are zero.
(ii) H(X) = -Σ p_i log p_i ≤ log m, with equality if p_1 = p_2 = ... = p_m = 1/m,    (5.4)

and log m is the maximum for any possible set of p_i's. This can be shown by writing the entropies in nats:

    ln m - H(X) = Σ p_i ln m + Σ p_i ln p_i = Σ p_i ln (m p_i)

By using the inequality

    ln (1/x) ≥ (1 - x),    (5.5)*

here with x = 1/(m p_i),

    ln m - H(X) = Σ p_i ln (m p_i) ≥ Σ p_i [1 - 1/(m p_i)] = Σ p_i - Σ (1/m) = 1 - 1 = 0

*The inequality of eqn. (5.5) is obtained from the well-known inequality ln x ≤ (x - 1).    (5.5a)

Thus H(X) ≤ ln m (in nats), and ln m = H(X) if and only if p_i = 1/m.

For a binary source with alphabet {0, 1} and probabilities (p, 1 - p),

    H(X) = H(p) = p log2 (1/p) + (1 - p) log2 [1/(1 - p)]    (5.6)

A plot of H(p) is shown in Fig. 5.1, where it is seen that H(p) is maximum, with H(p) = 1 bit, only when p = (1 - p) = 0.5.

Figure 5.1: H(p) vs. p for a binary source

For an m-symbol source, the maximum entropy H(X)_max increases only as log m, and for any other set with p(i) ≠ p(j) the entropy of the source is less, as shown in the following examples.

Example 5.1: A source delivers 4 symbols x_1, x_2, x_3, x_4 with p{x_i} = {1/2, 1/4, 1/8, 1/8}. Then

    H(X) = -Σ p_i log p_i = (1/2) log 2 + (1/4) log 4 + (2/8) log 8 = 1.75 bits.

If, however, all p(i)'s were equal, with p(i) = 1/4, then max H(X) = log2 4 = 2 bits.

Example 5.2: In the English language, the probabilities of occurrence of the 27 symbols (26 letters + space) are not equal; they are tabulated in various texts. In descending order, the p(i)'s run approximately as:

    Symbol:  Space   E       T       A      ...                ...
    p(i):    0.187   0.107   0.086   0.067  ...  0.001  0.0006

and H(X) = -Σ p_i log p_i = 4.065 bits (as calculated from all the p(i)'s). If, however, we choose the letters at random and with equal probability, then the information values of all 27 letters are the same, and H(X)_max of the source = -Σ_{1}^{27} (1/27) log2 (1/27) = 4.76 bits/symbol. This means that the length of messages could be shortened by the ratio 4.065/4.76 = 0.854 if equal-probability random coding were used in forming English words. In other words, approximately 17 letters (= antilog2 4.065) would be sufficient to express English messages if the choice of the letters were ideally random.

5.1.1 Entropy of Several Alphabets

Consider two sources S_1 and S_2 delivering symbols x_i and y_j; there may be some dependence (correlation) between the x's and the y's.
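The entropy of eqn. (5.3) is a one-line computation; a minimal sketch that reproduces Example 5.1 and the bound H(X) ≤ log m (the helper name is mine):

```python
import math

def entropy(probs):
    """H(X) = -sum p_i log2 p_i, in bits/symbol (eqn. 5.3)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example 5.1: p = {1/2, 1/4, 1/8, 1/8}
H = entropy([0.5, 0.25, 0.125, 0.125])   # 1.75 bits
H_max = math.log2(4)                     # 2 bits, the equiprobable maximum of eqn. (5.4)
```

The unequal distribution loses 0.25 bit per symbol relative to the equiprobable maximum, exactly as the example states.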
In conformity with the definition of the joint probability p(x_i, y_j), the entropy of the joint event (x_i, y_j), also known as the Joint Entropy of the sources S_1 and S_2, is defined as:

    H(X, Y) = -Σ_{i=1}^{m} Σ_{j=1}^{n} p(x_i, y_j) log p(x_i, y_j)    (5.7)

where

    p(x_i, y_j) = p(x_i) p(y_j/x_i)    (5.7a)

If {X} and {Y} are independent of each other, then p(x_i, y_j) = p(x_i) p(y_j), and

    H(X, Y) = H(X) + H(Y)    (5.8)

Similarly, the Conditional Entropy H(Y/X), the entropy of Y given that X has occurred, is defined as:*

    H(Y/X) = -Σ_i Σ_j p(x_i, y_j) log p(y_j/x_i)    (5.9)

If {X} and {Y} are independent, then

    H(Y/X) = -Σ_i Σ_j p(x_i, y_j) log p(y_j) = -Σ_j p(y_j) log p(y_j) = H(Y)

The interrelations between the above entropies, defined by eqns. (5.3), (5.7) and (5.9), are given as:

    (i)   H(X, Y) = H(X) + H(Y/X)    (5.10a)
                  = H(Y) + H(X/Y)    (5.10b)
    (ii)  H(X) ≥ H(X/Y)    (5.10c)
          H(Y) ≥ H(Y/X)    (5.10d)
    (iii) H(X, Y) ≤ H(X) + H(Y)    (5.10e)

Relation (i) is easily shown by expanding H(X, Y) as:

    H(X, Y) = -Σ_i Σ_j p(x_i) p(y_j/x_i) log [p(x_i) p(y_j/x_i)]
            = -Σ_i Σ_j p(x_i) p(y_j/x_i) log p(x_i) - Σ_i Σ_j p(x_i) p(y_j/x_i) log p(y_j/x_i)
            = H(X) + H(Y/X)    (5.10a)

since -Σ_i Σ_j p(x_i) p(y_j/x_i) log p(x_i) = -Σ_i p(x_i) log p(x_i) [Σ_j p(y_j/x_i)] = -Σ_i p(x_i) log p(x_i) = H(X). Eqn. (5.10b) is proved by using the relation:

    p(x_i, y_j) = p(x_i) p(y_j/x_i) = p(y_j) p(x_i/y_j)

Relation (ii) is proved by using the inequality of eqn. (5.5). Writing the entropies in nats,

    H(X) - H(X/Y) = -Σ_i Σ_j p(x_i, y_j) ln p(x_i) + Σ_i Σ_j p(x_i, y_j) ln p(x_i/y_j)
                  = Σ_i Σ_j p(x_i, y_j) ln [p(x_i/y_j)/p(x_i)]
                  ≥ Σ_i Σ_j p(x_i, y_j) [1 - p(x_i)/p(x_i/y_j)]    (by eqn. (5.5))
                  = Σ_i Σ_j [p(x_i, y_j) - p(x_i) p(y_j)]
                  = 1 - 1 = 0    (5.10c)

since p(x_i, y_j) p(x_i)/p(x_i/y_j) = p(x_i) p(y_j). Equality occurs when {X} and {Y} are independent, giving H(X/Y) = H(X). Similarly, eqn. (5.10d) is proved. The relation (5.10e) follows from eqns. (5.10a) and (5.10d).

*Equation (5.9) is derived by first defining H(Y/x_i) = -Σ_j p(y_j/x_i) log p(y_j/x_i) and then averaging this over all values of x_i, giving:

    H(Y/X) = E{H(Y/x_i)} = Σ_i p(x_i) H(Y/x_i) = -Σ_i Σ_j p(x_i, y_j) log p(y_j/x_i)    (5.9)
The above results for two variables may be extended to three or more variables, giving the relations:

    H(X, Y, Z) = H(X, Y) + H(Z/X, Y)
    H(X, Y) ≥ H(X, Y/Z)    (5.11)

and, for q variables,

    H(S_1, S_2, ..., S_q) = H(S_1, S_2, ..., S_{q-1}) + H(S_q/S_1, S_2, ..., S_{q-1})

where H(S_1, ..., S_q) is the joint entropy of all the sources S_1 to S_q.

Example 5.3: Consider that two sources S_1 and S_2 emit messages x_1, x_2, x_3 and y_1, y_2, y_3 with the joint probability p(X, Y) as shown in matrix form. Calculate H(X), H(Y), H(X/Y) and H(Y/X). Given:

                 y_1    y_2    y_3
    p(X, Y) x_1  3/40   1/40   1/40
            x_2  1/20   3/20   1/20
            x_3  1/8    1/8    3/8

By adding the entries in rows and in columns, we get:

    p(x_1) = 1/8, p(x_2) = 1/4, p(x_3) = 5/8
    p(y_1) = 1/4, p(y_2) = 3/10, p(y_3) = 9/20

The matrices p(Y/X) and p(X/Y) are calculated by using eqn. (5.7a), and they are:

                 y_1   y_2   y_3
    p(Y/X) x_1   3/5   1/5   1/5
           x_2   1/5   3/5   1/5
           x_3   1/5   1/5   3/5

                 y_1   y_2    y_3
    p(X/Y) x_1   3/10  1/12   1/18
           x_2   1/5   1/2    1/9
           x_3   1/2   5/12   5/6

Using eqns. (5.3), (5.7) and (5.9):

    H(X) = 1/8 log 8 + 1/4 log 4 + 5/8 log (8/5) = 1.3 bits/symbol
    H(Y) = 1/4 log 4 + 3/10 log (10/3) + 9/20 log (20/9) = 1.54 bits/symbol
    H(Y/X) = [3/40 log (5/3) + 1/40 log 5 + 1/40 log 5 + 1/20 log 5 + 3/20 log (5/3)
              + 1/20 log 5 + 1/8 log 5 + 1/8 log 5 + 3/8 log (5/3)] = 1.37 bits/symbol

Similarly, H(X/Y) = 1.13 bits/symbol, and H(X, Y) = 2.67 bits/symbol < H(X) + H(Y).

The example illustrates the interrelations between the different entropies. The matrix p(Y/X) is also known as the channel/transition/noise matrix and is shown graphically in Fig. 5.2.

Figure 5.2: Graphical representation of the channel (transition) matrix p(Y/X)

5.1.2 Entropies of a Communication Network

An interpretation of the different entropies defined so far may be given in terms of the input/output of a communication channel, as shown in Fig. 5.3. A coder normally changes the statistics of the source symbols {x_i} and is characterized by the transition matrix p(Y/X).
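The figures of Example 5.3 can be reproduced directly from the joint matrix, using H(Y/X) = H(X, Y) - H(X) and H(X/Y) = H(X, Y) - H(Y) from eqns. (5.10a)-(5.10b); a minimal sketch:

```python
import math

log2 = math.log2

# Joint probability matrix p(x_i, y_j) of Example 5.3
pxy = [[3/40, 1/40, 1/40],
       [1/20, 3/20, 1/20],
       [1/8,  1/8,  3/8 ]]

px = [sum(row) for row in pxy]                        # marginals p(x_i): 1/8, 1/4, 5/8
py = [sum(row[j] for row in pxy) for j in range(3)]   # marginals p(y_j): 1/4, 3/10, 9/20

H_X  = -sum(p * log2(p) for p in px)                  # ~1.30 bits/symbol
H_Y  = -sum(p * log2(p) for p in py)                  # ~1.54 bits/symbol
H_XY = -sum(p * log2(p) for row in pxy for p in row)  # ~2.67 bits/symbol
H_Y_given_X = H_XY - H_X                              # ~1.37, by eqn. (5.10a)
H_X_given_Y = H_XY - H_Y                              # ~1.13, by eqn. (5.10b)
```

The same numbers give the mutual information of Example 5.4: I(X, Y) = H(X) - H(X/Y) ≈ 0.17 bit/symbol.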
Similarly, in an actual channel, the statistics of the transmitted symbols are changed due to the noise present, and the received symbols {y_j} have to be optimally interpreted from the inverse probability matrix p(X/Y). We designate the different entropies as follows:

    Source → Coder/Transmitter → Channel → Receiver
     H(X)                        p(Y/X)     H(Y)

Figure 5.3: A communication system model

H(X) = average information/symbol of the source, i.e., the entropy of the source (also known as the Marginal entropy).

H(Y) = entropy of the receiver.

H(X, Y) = average information per pair of transmitted and received symbols (x_i, y_j), or equivalently the average uncertainty of the communication link as a whole (i.e., the joint entropy of (x_i, y_j)).

H(Y/X) = the entropy of the received symbols when the transmitter state is known. This conditional entropy is the measure of information given by H(Y/x_i) averaged over all {x_i}.

H(X/Y) = the entropy of the source when the state of the receiver is known. This measure of information, given by H(X/y_j) averaged over {y_j}, is also known as the 'Equivocation'.

In spite of the change of the transmitter statistics p{x_i} due to noise in the channel, certain probabilistic information regarding the transmitter states is available once p{x_i/y_j} is calculated at the receiver. The initial uncertainty regarding {x_i} is -log p{x_i}, and the final uncertainty about {x_i} after the reception of {y_j} is -log p{x_i/y_j}; thus the information gain through the channel (also known as the Mutual Information contained in {x_i/y_j}, or the Transinformation of the channel) is given by:

    I(x_i, y_j) = log [p(x_i/y_j)/p(x_i)] = log [p(x_i, y_j) / (p(x_i) p(y_j))]    (5.12)*

*For a noiseless channel, p(x_i/y_j) = 1, and I(x_i, y_j) = -log p(x_i), as in eqn. (5.1); it then follows that H(X) = Σ p(x_i) I(x_i). This information is usually known as self-information.

Averaging eqn.
(5.12) over all possible values of x_i and y_j, one obtains the Average Information gained by the receiver:

    I(X, Y) = E{I(x_i, y_j)} = Σ_i Σ_j p(x_i, y_j) I(x_i, y_j) bits/symbol
            = Σ_i Σ_j p(x_i, y_j) log [p(x_i/y_j)/p(x_i)]
            = H(X) - H(X/Y), by eqns. (5.7) and (5.9).    (5.13)

By using eqn. (5.10),

    I(X, Y) = H(Y) - H(Y/X)
            = H(X) + H(Y) - H(X, Y) bits/symbol    (5.14)

where H(X/Y) is specially called the 'Equivocation', meaning thereby the loss of information due to channel noise. Further, for a noise-free channel, H(Y/X) = H(X/Y) = 0, and

    I(X, Y) = H(X) = H(Y) = H(X, Y)    (5.15)

For a channel where {x_i} and {y_j} are independent and have no correlation between them, H(X/Y) = H(X), H(Y/X) = H(Y), and

    I(X, Y) = H(X) - H(X/Y) = 0    (5.16)

so that no information is transmitted through the channel.

Example 5.4: Using the channel matrix given in Example 5.3, one now obtains:

    I(X, Y) = H(Y) - H(Y/X) = 1.54 - 1.37 = 0.17 bits/symbol
            = H(X) - H(X/Y) = 1.3 - 1.13 = 0.17 bits/symbol.

Further discussion of the channel, e.g., the Rate of Transmission, Channel Capacity, etc., will be taken up in a later section.

5.1.3 Extension of Zero-memory Sources [5, 6]

If multi-alphabet source outputs are to be coded into words of a smaller alphabet, then it is necessary to have an extension of the coder alphabets. Consider that 8 messages are to be coded with a binary alphabet (0/1); then it is necessary to have the binary words {000, 001, 010, ..., 110, 111}, leading to the third extension of the source coder. In general, for m messages, with m = 2^n, an nth extension of the binary alphabet is necessary. Thus, if a zero-memory source S has the source alphabet {x_1, ..., x_m}, then the nth extension of S, called S^n, is a zero-memory source with m^n symbols {y(1), y(2), ..., y(m^n)} as its higher-order alphabet. The corresponding statistics of the extended source S^n are given by:

    p{y(j)} = p(x_{j1}) p(x_{j2}) ... p(x_{jn})

since each y(j) corresponds to a word made of n alphabets of {x_i}, i.e.,
    y(j) = (x_{j1}, x_{j2}, ..., x_{jn})

The condition Σ_j p{y(j)} = 1 is satisfied, since

    Σ_{j=1}^{m^n} p{y(j)} = Σ_{j1} Σ_{j2} ... Σ_{jn} p(x_{j1}) p(x_{j2}) ... p(x_{jn})
                          = [Σ_{i=1}^{m} p(x_i)]^n = 1

(since each of the sums Σ p(x_i) = 1). The entropy of the extended source S^n is now given by:

    H(S^n) = Σ_j p{y(j)} log [1/p{y(j)}] = nH(S)    (5.17)

Thus the entropy of the source S^n is n times that of the primary source H(S).

Example 5.5: Consider the third extension S^3 of a binary source with p(0) = 0.25 and p(1) = 0.75, having the primary source entropy H(S) = 1/4 log 4 + 3/4 log (4/3) = 0.811 bit. The extended source S^3 has the alphabets and the corresponding p{y(j)} given by:

    Symbols of S^3:  y_1   y_2   y_3   y_4   y_5   y_6   y_7   y_8
    Alphabets:       000   001   010   011   100   101   110   111
    p{y(j)}:         1/64  3/64  3/64  9/64  3/64  9/64  9/64  27/64

It is observed that Σ_{j=1}^{8} p{y(j)} = 1, and

    H(S^3) = 1/64 log 64 + 3 × (3/64) log (64/3) + 3 × (9/64) log (64/9) + (27/64) log (64/27)
           = 6/64 + 9/64 × (6 - 1.58) + 27/64 × (6 - 3.17) + 27/64 × (6 - 4.75)
           = 2.43 = 3H(S).

Example 5.6: Consider a source with alphabets x_1, x_2, x_3, x_4 and a priori probabilities p{x_i} = {1/2, 1/4, 1/8, 1/8}. A typical long message will thus contain x_1 about half the time, x_2 about a quarter of the time, and so on, satisfying p{x_i}. The entropy of the source is:

    H(S) = H(X) = -Σ_{i=1}^{4} p(x_i) log p(x_i) = 1.75 bits/symbol.

Suppose the source is extended to deliver messages of two symbols; then the new alphabets and the corresponding p{y(j)} are given as:

    Symbols of S^2:  y_1    y_2    y_3    y_4    y_5    y_6    y_7    y_8
    Alphabets:       x1x1   x1x2   x1x3   x1x4   x2x1   x2x2   x2x3   x2x4
    p{y(j)}:         1/4    1/8    1/16   1/16   1/8    1/16   1/32   1/32

    Symbols of S^2:  y_9    y_10   y_11   y_12   y_13   y_14   y_15   y_16
    Alphabets:       x3x1   x3x2   x3x3   x3x4   x4x1   x4x2   x4x3   x4x4
    p{y(j)}:         1/16   1/32   1/64   1/64   1/16   1/32   1/64   1/64

and H(S^2) = 1/4 log 4 + 2/8 log 8 + 5/16 log 16 + 4/32 log 32 + 4/64 log 64 = 3.5 bits/symbol pair = 2H(S).
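Eqn. (5.17) can be checked by enumerating an extension explicitly, as in Example 5.5; a minimal sketch (assuming nothing beyond the product rule for p{y(j)}):

```python
import math
from itertools import product

def entropy(probs):
    """H = -sum p log2 p, in bits (eqn. 5.3)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example 5.5: binary source with p(0) = 0.25, p(1) = 0.75
p = [0.25, 0.75]
H1 = entropy(p)                                  # ~0.811 bit

# Third extension S^3: 8 words 000..111 with product probabilities (1/64 ... 27/64)
p3 = [a * b * c for a, b, c in product(p, p, p)]
H3 = entropy(p3)                                 # ~2.43 = 3 * H1, confirming eqn. (5.17)
```

The same enumeration with `product(p, p)` reproduces the 16-symbol second extension of Example 5.6.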
5.2 SOURCE ENCODING: NOISELESS CODING [4, 5]

In practical communication systems, it is necessary to transform the m-ary source alphabet {x_1, ..., x_m} into a convenient form, say binary or D-ary digits, to match the channel. Since the rate of transmission of information in a noisy channel is maximum if the source probabilities p(x_i) are all equal, i.e., H(X) = log m (this will be shown in a later section), it is desirable that the transformed code alphabets {C_1, C_2, ..., C_D} have equal probability of occurrence; hence the technique is also known as Entropy coding. To avoid ambiguity in deciphering the codes, as well as to minimise the time of transmission of the message, the codes should have the following properties:

1. Codes should be 'separable' or uniquely decodable. Also, they should be 'comma-free', i.e., no synchronising signal should be required to recognise the words. This restricts the selection of codes in such a way that no shorter code can be a prefix of a longer code. Such codes are also called Instantaneous codes.

2. The average length of the code words,

    L̄ = Σ p_i L_i, L_i = length of the ith code word,    (5.18)

should be minimum, and L̄ should approach the value H(X)/log D, subject to the condition L̄ ≥ H(X)/log D (this will be proved later). Since Σ p_i L_i is to be minimum, the code words with larger L_i should have smaller p_i, i.e.,

    L_i ≈ log (1/p_i)    (5.19)

3. The code efficiency η is defined as

    η = H(X)/(L̄ log D) ≤ 1

with η = 1 for the optimum (also called compact) codes. Further, the Redundancy of codes is defined as R = 1 - η.

Example 5.7: Consider a source alphabet A, B, C, ..., G, H having p(x_i) given as:

    p(x_i) = 1/2, 1/4, 1/16, 1/16, 1/32, 1/32, 1/32, 1/32.

If we have to code the alphabet with binary (1/0) digits, then the code-generating process is shown in Table 5.1. The symbols are arranged according to their decreasing probability.
The first selection is made such that the symbols are divided into two groups with Σ p(x_i) in each group = 1/2; thus symbol A is designated by 1 and the rest by 0. Next, the remaining symbols {B, ..., H} are divided into two groups, B and {C, D, ..., H}, so that Σ p(x_i) = 1/4 in each; B is now designated by 1 and the rest by 0. This process continues till we reach the last symbol. The resulting code is shown in the table.

Table 5.1

    Symbol  p(x_i)  Code
    A       1/2     1
    B       1/4     01
    C       1/16    0011
    D       1/16    0010
    E       1/32    00011
    F       1/32    00010
    G       1/32    00001
    H       1/32    00000

The following properties of the code are evident:

(a) For any code word, the number of digits required is L_i = log (1/p_i); e.g., for E a 5-digit code is used, and log 32 = 5.

(b) L̄ = Σ p_i L_i = 1/2 × 1 + 1/4 × 2 + 2 × (1/16) × 4 + 4 × (1/32) × 5 = 2.125, and H(X) = Σ p(x_i) log [1/p(x_i)] = 2.125 bits/symbol = L̄.

(c) In a long message derived from these binary codes, the average number of occurrences of '1' per symbol = 1/2 + 1/4 + 2/16 + 1/16 + 2/32 + 1/32 + 1/32 = 17/16, and similarly for '0'. Therefore, by normalising, p(0) = (17/16)/(34/16) = 0.5 = p(1), and the coder output has maximum entropy in terms of binary digits.

(d) If the 8 alphabets were directly coded into binary digits, then the codes would be {000, 001, 010, ..., 110, 111}, each symbol requiring 3 digits, giving Σ p_i L_i = 3 bits/symbol > H(X).

The above technique of grouping the symbol probabilities into two halves and then designating each half with 1/0 is also known as the Shannon-Fano technique. We have, however, considered a very convenient set of p(x_i), where log 1/p(x_i) was an integer. In the general case, where log 1/p(x_i) is not an integer, Shannon [4] has suggested that L_i should have the bounds (for binary coding):

    log [1/p(x_i)] ≤ L_i < log [1/p(x_i)] + 1    (5.20)

If the alphabet size m is large, then the resultant code alphabet is quite efficient, but for small m the code may be inefficient, as will be seen from the following example. Black [15] has suggested that the smallest integer not less than log [1/p(x_i)] may be taken as the code length.

Example 5.8: Consider again an 8-alphabet source with p(x_i) as shown in Table 5.2. Following the Shannon-Fano process of Example 5.7, the binary codes are generated as shown.
In the table, the values of log [1/p(x_i)] are also shown, and it is seen that L_i = ⌈log 1/p(x_i)⌉, as suggested by Black; the bound on L_i given by eqn. (5.20), however, is larger than the actual values in the table.

Table 5.2

    H(X) = Σ p(x_i) log [1/p(x_i)] = 2.663 bits/symbol
    L̄ = Σ p_i L_i = 2.69
    η (with D = 2) = 2.663/2.69 = 99%

At the coder output, p(0) = 0.53 and p(1) = 0.47.

Huffman [11] has suggested a simple method of constructing separable codes with minimum redundancy (i.e., minimum L̄). The rules of construction are:

1. The symbols are arranged according to their decreasing probability.
2. For minimum L̄, L(i) should be greater than or equal to L(i - 1). Only in the case of the last two symbols, L(m) = L(m - 1), and they should differ only in the last digit.
3. Select the last two symbols x_{m-1} and x_m and assign 0/1 to them. Form a composite symbol with probability p(x_{m-1}) + p(x_m), and construct a reduced source with (m - 1) alphabets, ordered according to decreasing probability.
4. Assign 0/1 to the last two symbols of the new column, and again construct a second reduced source ordered as above.
5. Continue this till all symbols are assigned 0/1 codes.

Example 5.9: Huffman coding with a binary alphabet. Consider a source with 8 alphabets as given in Table 5.3. The method of constructing the reduced sources is shown there. By tracing the code path backwards, the assigned code is obtained as shown.

    H(X) = 2.75 bits and L̄ = Σ p_i L_i = 2.8, giving η = 2.75/2.8 = 98.2%.

The Shannon-Fano method also gives a similar code with L̄ = 2.8.
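Huffman's reduction rules can be sketched with a priority queue; this is not the book's tabular layout, but it performs the same repeated merging of the two least probable (composite) symbols and reproduces the L̄ = 2.8 of Example 5.9:

```python
import heapq
import math

def huffman_lengths(probs):
    """Optimal binary codeword lengths by repeatedly pairing the two least
    probable (composite) symbols, as in Huffman's rules 3-5."""
    # Each heap entry: (probability, tiebreak, list of source-symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, t, s2 = heapq.heappop(heap)
        for i in s1 + s2:       # every symbol inside a merge gains one code digit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, t, s1 + s2))
    return lengths

# Example 5.9 source
p = [0.22, 0.20, 0.18, 0.15, 0.10, 0.08, 0.05, 0.02]
L = huffman_lengths(p)
avg_len = sum(pi * li for pi, li in zip(p, L))    # 2.80
H = -sum(pi * math.log2(pi) for pi in p)          # ~2.75
efficiency = H / avg_len                          # ~0.982
```

Ties in the merging order may change individual lengths, but every Huffman code is optimal, so the average length L̄ is the same 2.80 in all cases.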
Table 5.3: Huffman reduction of the 8-symbol source. The successive reduced sources are formed by combining the two least probable entries (reduction steps: 0.05 + 0.02 = 0.07; 0.08 + 0.07 = 0.15; 0.15 + 0.10 = 0.25; 0.18 + 0.15 = 0.33; 0.22 + 0.20 = 0.42; 0.33 + 0.25 = 0.58; 0.58 + 0.42 = 1). Since the 0/1 assignment at each merge is arbitrary, one consistent backward tracing gives:

    Symbol  p(x_i)  Code
    A       0.22    10
    B       0.20    11
    C       0.18    000
    D       0.15    001
    E       0.10    010
    F       0.08    0110
    G       0.05    01110
    H       0.02    01111

Example 5.10: Huffman coding with D = 4.

For a code with D > 2 the rules are exactly similar, except that each combination now contains D symbols. However, the first composite symbol should be formed out of the last r source symbols, where r = m - k(D - 1), k being the highest integer keeping r ≥ 2, so that the last reduced source has exactly D entries in its column. In the above example, for D = 4, r = 8 - 3k = 2 with k = 2, and a code with symbols {0, 1, 2, 3} can be constructed as given in Table 5.4.

Table 5.4 (first reduction: 0.05 + 0.02 = 0.07; second: 0.15 + 0.10 + 0.08 + 0.07 = 0.40; final column 0.40, 0.22, 0.20, 0.18 assigned 0, 1, 2, 3):

    Symbol  p(x_i)  Code
    A       0.22    1
    B       0.20    2
    C       0.18    3
    D       0.15    00
    E       0.10    01
    F       0.08    02
    G       0.05    030
    H       0.02    031

From the table, L̄ log D = 2 × Σ p_i L_i = 2 × 1.47 = 2.94, and the efficiency of this code is worse than that of the binary code. However, for large values of m, binary and quaternary codes are about equally efficient.

Example 5.11: Coding of an extended source S^n.

Consider the two-alphabet source with p(x_1) = 0.25 and p(x_2) = 0.75 discussed in Example 5.5. The best binary code for this source is simply x_1 = 0, x_2 = 1, giving L̄ = 1, H(X) = 0.811 bit, and η = 0.811. If, however, we consider the extended source S^3 with alphabets {000, 001, 010, ..., 110, 111}, then the codes for the new set of symbols (by using the Huffman technique) are as shown in Table 5.5. One consistent assignment is:

Table 5.5

    Symbol  Alphabet  p{y(j)}  Code
    y_8     111       27/64    1
    y_4     011       9/64     011
    y_6     101       9/64     010
    y_7     110       9/64     001
    y_2     001       3/64     00000
    y_3     010       3/64     00001
    y_5     100       3/64     00010
    y_1     000       1/64     00011

We now have H(S^3) = 2.43 bits and L̄_3 = Σ p_i L_i = 2.47 bits, giving η = 2.43/2.47 ≈ 98.6%, a remarkable improvement over the coding without extension. Further extension of the source gives a better η (e.g., η for S^4 ≈ 99.1%), but the rate of improvement becomes slower.

5.3 SHANNON'S FIRST FUNDAMENTAL THEOREM

It has been shown, by the earlier examples and coding techniques, that with efficient (noiseless) encoding the average length L̄ of the codewords at the coder output approaches H(X) per symbol of the source, particularly with S^n, the nth extension of the source. Since the average information per source symbol is H(X) bits, a message of n source symbols gives nH(X) bits of information, and equivalently the source generates M_s = 2^{nH(X)} distinct typical messages. If the duration of each symbol is t_0 sec, then in time T sec the number of source messages generated is M_s = 2^{(T/t_0)H(X)}, with n = T/t_0. At the coder output, the average information per codeword is L̄ log D bits, and (with one codeword per source symbol) the number of code messages generated in T sec is M_c = D^{(T/t_0)L̄}. It is intuitively evident that M_c has to be greater than or equal to M_s if overflow of the buffer memory at the coder input is to be avoided. Thus we have the bound M_c = D^{(T/t_0)L̄} ≥ M_s = 2^{(T/t_0)H(X)}, and therefore

    L̄ ≥ H(X)/log D    (5.21)

What is now necessary to show is that, with n → ∞, L̄ → H(X)/log D. Shannon has formalised these results in the following theorem: "Suppose there is a zero-memory source S with entropy H(X) per symbol and a noiseless channel with a capacity C bits/message. Then it is possible to encode the output of S (with the encoder delivering messages to the channel) so that all the information generated in S is transmitted through the channel without loss if, and only if, C ≥ H(X)."
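The construction behind the theorem can be illustrated with the integer lengths L_i = ⌈log2(1/p_i)⌉ suggested by Shannon and Black in Sec. 5.2, which always satisfy the Kraft inequality and the bound H(X) ≤ L̄ < H(X) + 1; a sketch with an illustrative (non-dyadic) distribution:

```python
import math

def shannon_lengths(probs, D=2):
    """Integer code lengths L_i = ceil(log_D 1/p_i) (cf. eqn. 5.20)."""
    return [math.ceil(-math.log(p, D)) for p in probs]

p = [0.4, 0.3, 0.2, 0.1]          # illustrative source, not from the text
L = shannon_lengths(p)            # [2, 2, 3, 4]
kraft = sum(2.0 ** -l for l in L) # <= 1, so an instantaneous code with these lengths exists
H = -sum(pi * math.log2(pi) for pi in p)
avg = sum(pi * li for pi, li in zip(p, L))
# H <= avg < H + 1: the noiseless-coding bound, tightened per symbol by extension
```

Extending the source (coding blocks of n symbols) squeezes the per-symbol overhead from 1 bit down to 1/n bit, which is exactly the limiting argument of the theorem.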
If we now choose the codeword lengths L_i as the integers satisfying

    log [1/p(x_i)] ≤ L_i log D < log [1/p(x_i)] + 1    (5.22)

then they also satisfy the Kraft inequality,* which states that the necessary and sufficient condition for the existence of an instantaneous code with word lengths L_i is:

    Σ_{i=1}^{m} D^{-L_i} ≤ 1    (5.23)

so that each L_i, as given in eqn. (5.22), is acceptable as a word length of an instantaneous code. Now, multiplying eqn. (5.22) by p(x_i) and summing over all i, we get

    Σ p(x_i) log [1/p(x_i)] ≤ L̄ log D < Σ p(x_i) log [1/p(x_i)] + 1

or

    H(X)/log D ≤ L̄ < H(X)/log D + 1/log D    (5.24)

To obtain better efficiency, one may use the nth extension of S, with L̄_n as the new average code length. Since eqn. (5.24) is valid for any zero-memory source, it is also valid for S^n, which has H(S^n) = nH(X), leading to

    nH(X)/log D ≤ L̄_n < nH(X)/log D + 1

or

    H(X)/log D ≤ L̄_n/n < H(X)/log D + 1/n

and, with n → ∞,

    lim_{n→∞} L̄_n/n = H(X)/log D    (5.25)

Here L̄_n/n is the average number of code symbols used per single symbol of S when the input to the coder consists of n-symbol messages from the extended source S^n. Note that L̄_n/n ≠ L̄, where L̄ is the average code length for the source S itself; in general (see Example 5.11), L̄_n/n ≤ L̄.

*The proof that eqn. (5.22) also satisfies eqn. (5.23) is as follows. The left inequality of eqn. (5.22) can be written as log_D [1/p(x_i)] ≤ L_i, or 1/p(x_i) ≤ D^{L_i}, or p(x_i) ≥ D^{-L_i}. Summing over all i, we get 1 = Σ p(x_i) ≥ Σ D^{-L_i}.    (5.23a)

For an extended qth-order Markov source S_q^n, the corresponding equation is modified as

    H(S_q^n) = nH(S_q) + ε, ε > 0

or

    H(S_q^n)/n = H(S_q) + ε/n

and, for n → ∞,

    lim_{n→∞} H(S_q^n)/n = H(S_q)    (5.33)

In other words, for large n, the intersymbol constraints in the Markov source become less and less significant, and the average entropy per n-symbol word tends to n times the entropy of the Markov source. Shannon's first fundamental theorem, given by eqn. (5.25), may now be shown to hold for Markov sources as well.
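For a first-order Markov source, the entropy appearing in the limit above is the conditional entropy of the next symbol given the present one, weighted by the stationary symbol probabilities; a toy two-state sketch (the transition probabilities are illustrative, not from the text):

```python
import math

# Illustrative 2-state transition matrix p(x_j | x_i)
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution pi solves pi = pi P; closed-form for two states.
pi1 = P[1][0] / (P[0][1] + P[1][0])   # 0.4 / 0.5 = 0.8
pi = [pi1, 1 - pi1]

def H(row):
    """Entropy of a probability row, in bits."""
    return -sum(p * math.log2(p) for p in row if p > 0)

H_markov = sum(pi_i * H(row) for pi_i, row in zip(pi, P))  # entropy rate, bits/symbol
H_adjoint = H(pi)   # zero-memory source with the same symbol probabilities
```

As the text states for the adjoint source, the intersymbol constraints can only reduce the entropy: here H_markov ≈ 0.57 bit/symbol against H_adjoint ≈ 0.72 bit/symbol.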
Since the first-order Markov source S_1 and its adjoint S̄_1 have identical first-order symbol probabilities {p(x_1), p(x_2), ..., p(x_m)}, and S̄_1 is a zero-memory source with H(S_1) ≤ H(S̄_1), the average code length for both sources would be identical. Thus eqn. (5.21) also holds for S_1, giving

    L̄ log D ≥ H(S̄_1) ≥ H(S_1)

Using the nth extension of S_1, eqn. (5.25) is now modified as

    H(S̄_1^n) ≤ L̄_n log D < H(S̄_1^n) + 1

Substituting from eqn. (5.32) and dividing by n,

    H(S_1) ≤ (L̄_n/n) log D < H(S_1) + [H(S̄_1) - H(S_1) + 1]/n

This gives the bound for the average code length of a first-order Markov source, and for large n,

    lim_{n→∞} L̄_n/n = H(S_1)/log D

The result may be extended to qth-order Markov sources, giving

    lim_{n→∞} L̄_n/n = H(S_q)/log D    (5.35)

5.4.2 Entropy of the English Language [12]

In Example 5.2 it was shown that the probabilities of English letters in words are not equal, and therefore H/letter = 4.065 bits, not the 4.76 bits obtained with all letters equiprobable. There are further constraints in forming words, such that some combinations, e.g., th, qu, etc., are more frequent, and in many cases a subsequent letter in a word may easily be predicted from a few earlier letters. Thus the English language source, like all other language sources, is a Markov source with finite memory, since the conditional probabilities p(x/s_1, s_2, ..., s_q) on the previous letters are well established. It is then possible to calculate the digram, trigram and q-gram entropies for the language by using eqn. (5.29).

Consider initially that the language source is only a first-order Markov source, i.e., that the intersymbol influence extends only to the immediately preceding letter. Then the digram entropy is

    H(S_1) = H(x/s_1) = H(s_1, x) - H(s_1)

Statistics of digrams, trigrams, etc. for English are available, and from the tabulated values, H(s_1, s_2) = 7.70 bits and H(s_1, s_2, s_3) = 11.0 bits.
Thus:

    H(S_1) = 7.7 - 4.065 = 3.635 bits/letter

The digram structure has reduced the information value of the language to 3.64/4.76 = 76.5% of the maximum possible. Considering the intersymbol influence to extend over the earlier two letters, the source is now a second-order Markov source, and the trigram entropy is

    H(S_2) = H(x/s_1, s_2) = H(s_1, s_2, x) - H(s_1, s_2) = 11.0 - 7.7 = 3.3 bits/letter

The q-gram entropy is calculated in the same way by using eqn. (5.29), but all such statistics are not easily available. It has been shown that the 8-gram entropy, H(x/s_1, ..., s_7), is approximately 2.3 bits/letter.

Shannon [12] has suggested an alternative method of determining the entropy of English from word statistics. From the tabulated word frequencies, as given by Zipf, the most common word 'the' is assigned the word-order number 1, the second most common word 'of' is assigned word number 2, and so on. The probabilities p_n of words, plotted against the word order n on a log-log scale as in Fig. 5.5, lie approximately on a straight line of slope (-1). Thus p_n of the nth most common word is given by

    p_n = 0.1/n    (5.36)

with Σ p_n reaching unity at a maximum n of 8727. We may neglect n > 8727, and the law holds good for many languages. The entropy per word is now

    H/word = -Σ_{n=1}^{8727} p_n log p_n = 11.82 bits/word

Figure 5.5: Word probability in English (after Zipf); 'the' (p ≈ 0.07) and 'of' (p ≈ 0.034) head the list.

Taking each word as 4.5 letters, H/letter = 11.82/4.5 = 2.62 bits. Including the space symbol, each word has approximately 5.5 letters, and H/letter = 2.14 bits. Finally, if one considers all possible intersymbol influences in English, then H/letter ≈ 1.5 bits only.
Thus the English language has an efficiency of 1.5/4.76 ≈ 30%, and a redundancy of about 70%.

5.4.3 Source Entropy and Code-channel Capacity [1, 5]

It has been shown in Section 5.3 that the numbers of source messages M_s and of code messages M_c grow exponentially with time T:

    M_s = 2^{(T/t_0)H(X)}    (5.37a)
    M_c = 2^{(T/t_0)L̄ log D}    (5.37b)

However, it was assumed there that all symbols have equal duration t_0 and that there is no intersymbol probability constraint. It is shown below that the law of exponential growth of messages still holds when these restrictions are removed.

Case I: Markov Sources.

It has been shown in eqn. (5.35) that for the nth extension of a qth-order Markov source, the average length of the code words satisfies

    lim_{n→∞} L̄_n/n = H(S_q)/log D

Since the duration of an n-symbol source word is n t_0, the number of messages generated by the Markov source in time T is

    M(S_q) = exp_2 [(T/(n t_0)) L̄_n log D] = exp_2 [(T/t_0) H(S_q)]    (5.38)

which is similar to eqn. (5.37a).

Alternatively, we may consider that the intersymbol influence in the source words is of finite range, say less than a duration of q symbols, and group the T-duration messages into groups of q symbols each. The number of different sequences of q-symbol duration is then m^q = α, where m = size of the source alphabet. Further, we may assume that the occurrences of these sequences are approximately independent, their internal intersymbol influences being absorbed in their overall probability distribution.* Now consider a very long sequence, B symbols long, made up of B/q = γ groups of q-symbol words. If the probabilities of these α sequences are {p_1, p_2, ..., p_α}, then the probability of occurrence of a B-long sequence is, by the law of large numbers,

    P = (p_1)^{γ p_1} (p_2)^{γ p_2} ... (p_α)^{γ p_α}    (5.39)

and this probability is the same for all ergodic sequences of length B. If a sequence is not ergodic, its probability is vanishingly small and may be considered zero.

*This also gives an excellent method of coding q-symbol words by the Huffman method, as discussed in Sec. 5.2.
The number of possible ergodic sequences is then:
M(S_B) = 1/P
and
log M(S_B) = -log P = -β Σᵢ pᵢ log pᵢ (5.40)
*This grouping also gives an excellent method of coding the q-symbol words by the Huffman method.
The constant A_B below accounts for the intersymbol constraints within the q-symbol groups and varies only slowly with B; for purely random sequences it is unity. Thus M(S_B) may be written as:
M(S_B) = A_B exp₂ [βH(q)] (5.41)
where A_B = constant, and H(q) = -Σᵢ pᵢ log pᵢ = entropy per q-symbol word. It is seen that the number of possible sequences M(S_B) grows exponentially with the length B of the sequence, even when intersymbol constraints are present, provided that the sequences are ergodic. In terms of the T-sec duration of B-long sequences:
M(S_B) = A_B exp₂ [(T/qt₀)H(q)]; Bt₀ = T (5.42)
which is similar to eqn. (5.38). The asymptotic source (language) capacity may now be defined as:
Rₛ = lim (T→∞) [log M(T)]/T (5.43)
Moreover, the total language information of B-long sequences is -log P, and therefore:
language information/symbol = -(1/B) log P = -(1/q) Σᵢ pᵢ log pᵢ = H(q)/q ... (from eqn. (5.41)) (5.44)

Case II: Unequal Duration of Code Symbols [1, Appx. 1-4]
It has been assumed in Sec. 5.3 that the code symbols are all of equal duration. However, if the symbols are not of equal duration, e.g., in pulsewidth-modulated signals, then the number of possible code messages generated in time T is given as:
N(T) = N(T - t₁) + N(T - t₂) + ... + N(T - t_D) (5.45)
where t₁ ≠ t₂ ≠ ... ≠ t_D, and tᵢ = duration of the symbol dᵢ, for a code alphabet of D symbols.
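The growth rate implied by the recursion (5.45) is governed by the largest real root λ₀ of the characteristic equation λ^(-t₁) + λ^(-t₂) + ... + λ^(-t_D) = 1, the code capacity being log λ₀. A minimal numerical sketch (the symbol durations are invented for illustration) solves this equation by bisection:

```python
from math import log2

def code_capacity(durations, lo=1.0 + 1e-9, hi=16.0):
    """Capacity log2(lam0), where lam0 is the largest real root of
    sum(lam**-t for t in durations) = 1.  The left side is strictly
    decreasing in lam > 1, so bisection converges to the unique root."""
    f = lambda lam: sum(lam ** -t for t in durations) - 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid          # sum still exceeds 1: root lies to the right
        else:
            hi = mid
    return log2(lo)

# Two symbols of durations 1 and 2 time units (a dot and a dash, say):
# lam satisfies 1/lam + 1/lam**2 = 1, i.e., lam = golden ratio 1.618...,
# so the capacity is log2(1.618) = 0.694 bits per unit time.
C = code_capacity([1, 2])
```

With equal durations t₁ = t₂ = 1 the routine returns exactly 1 bit per unit time, recovering the binary case of eqn. (5.37b).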
The solution of the above difference equation for large T is of the form:
lim (T→∞) N(T) = A₀ λ₀^T (5.46)*
where A₀ is a constant and λ₀ is the largest real root of the characteristic equation:
x^(-t₁) + x^(-t₂) + ... + x^(-t_D) = 1
*The equation is also true for source codes, e.g., the Morse telegraph code, where there are certain specific fixed constraints for the transition from one state to the other.
It is seen that N(T) grows exponentially with T for T → ∞, which is similar to eqn. (5.37b). The code capacity can now be defined as:
R_c = lim (T→∞) [log N(T)]/T = log λ₀ (5.47)

Entropy vs. Code Capacity
In view of eqns. (5.43) and (5.47), it is now evident that for successful transmission of all the information generated by the source in time T, the condition to be satisfied is:
M_c(T) ≥ M(S_q) for T → ∞, i.e., R_c ≥ Rₛ (5.48)
Shannon's first fundamental theorem may now be restated for general sources and code-channels as: if the code capacity of a code-channel and encoder exceeds the average information per unit time generated in the information source, i.e., if R_c > Rₛ, then there is a method of encoding all the information into the channel (even if one has to wait for a long time T). There is no method of encoding all the information if it is generated at a rate exceeding the code capacity, i.e., if R_c < Rₛ.
The proper encoding, of course, will require a large buffer memory at the input to the coder, and a second memory at the output if codes of unequal length and duration are to be transmitted through the channel in a synchronous mode.

5.5 DISCRETE CHANNEL WITH DISCRETE NOISE [4, 6, 8]
In digital communication, binary or multilevel (D-ary) signals are normally transmitted, and the channel noise produces errors at the output. The channel characteristic is normally defined by the error rate in the channel or, equivalently, by the transition/noise matrix of the channel, as shown in Fig. 5.2. The mutual information I(X, Y) has been defined by eqn. (5.3) in Sec. 5.1.2, and is given by:
I(X, Y) = H(X) - H(X/Y) = H(Y) - H(Y/X) = H(X) + H(Y) - H(X, Y) (5.49)
It is the rate of transmission in bits/symbol, and the rate of transmission in bits per second is defined as R(X, Y) = I(X, Y)/t, where t = duration of each symbol.
The different entropies of a multiinput/multioutput channel, and the mutual information based on any of the formulae of eqns. (5.13), (5.14), have been calculated in earlier examples. In general, the channel may have m inputs and n outputs, and all the entropies may be calculated using eqns. (5.13), (5.14).

Example 5.14: Consider a channel with two inputs x₁, x₂ and three outputs y₁, y₂, y₃, whose noise matrix is given below. Calculate I(X, Y), with p(x₁) = p(x₂) = 0.5.
Given:
p(Y/X): x₁ [3/4 1/4 0]; x₂ [0 1/2 1/2]
Then
p(X, Y): x₁ [3/8 1/8 0]; x₂ [0 1/4 1/4]
and
p(X/Y): x₁ [1 1/3 0]; x₂ [0 2/3 1]
where p(y₁) = 3/8, p(y₂) = 3/8, p(y₃) = 1/4.
Then,
H(Y) = -3/8 log 3/8 - 3/8 log 3/8 - 1/4 log 1/4 = 1.56 bits/symbol
H(X) = 1 bit/symbol
H(Y/X) = 0.90 bits/symbol, and H(X/Y) = 0.34 bits/symbol
Therefore,
I(X, Y) = H(Y) - H(Y/X) = 1.56 - 0.90 = 0.66 bits/symbol
= H(X) - H(X/Y) = 1.0 - 0.34 = 0.66 bits/symbol

Example 5.15: Consider another example with m ≠ n, whose noise matrix is given below. Given that {X} = {1/4, 2/5, 3/20, 3/20, 1/20}, calculate I(X, Y).
p(Y/X): x₁ [1 0 0 0]; x₂ [0 3/4 1/4 0]; x₃ [0 0 1/3 2/3]; x₄ [2/3 0 1/3 0]; x₅ [0 1 0 0]
With p(X) as given, the p(X, Y) matrix is:
x₁ [1/4 0 0 0]; x₂ [0 3/10 1/10 0]; x₃ [0 0 1/20 1/10]; x₄ [1/10 0 1/20 0]; x₅ [0 1/20 0 0]
This gives p(Y) = {0.35, 0.35, 0.2, 0.1}, and
p(X/Y): x₁ [5/7 0 0 0]; x₂ [0 6/7 1/2 0]; x₃ [0 0 1/4 1]; x₄ [2/7 0 1/4 0]; x₅ [0 1/7 0 0]
Then H(X) = 2.06, H(Y) = 1.86, H(X, Y) = 2.665, H(Y/X) = 0.61, H(X/Y) = 0.81.
Therefore,
I(X, Y) = H(Y) - H(Y/X) = 1.86 - 0.61 = 1.25 bits/symbol
= H(X) - H(X/Y) = 2.06 - 0.81 = 1.25 bits/symbol

Channel Capacity and Efficiency
The channel capacity C of a channel is defined as the maximum of the mutual information that may be transmitted through the channel. As such:
C = I(X, Y)max bits/symbol = R(X, Y)max bits/sec (5.50)
From eqns. (5.13), (5.14), it is evident that, given the noise matrix p(Y/X), C may be obtained by maximizing I(X, Y), and the resultant H(X) gives the values of p(X).
The transmission efficiency or channel efficiency η is then defined as:
η = I(X, Y)/C (5.51)
and the redundancy of the channel:
ν = [C - I(X, Y)]/C (5.52)
Since the maximization problem indicated by eqn. (5.50) is rather difficult, we consider first some of the simpler channels, e.g., the binary channel and its extensions.

5.5.1 Binary Symmetric Channel (BSC)
The most common and widely used channel is the BSC, whose transition matrix p(Y/X) = [q p; p q] and channel diagram are shown in Fig. 5.6. The mutual information of the channel is:
I(X, Y) = H(Y) - H(Y/X)
where H(Y/X) = -p log p - q log q = H(p), and H(Y) is maximum when p(x₁) = p(x₂) = 0.5, with p + q = 1.
Figure 5.6: A BSC and its channel matrix
The channel capacity is defined as the maximum of I(X, Y), and for the BSC with p(x₁) = p(x₂) = 0.5:
C = 1 - H(p) = 1 + p log p + q log q (5.53)
In this particular symmetric case, the 'equivocation' H(X/Y) = H(Y/X). An interesting interpretation of the equivocation may be given if we consider an idealized communication system with the above symmetric noise matrix and an observer with an auxiliary noiseless channel, as shown in Fig. 5.7. The observer compares the transmitted and the received symbols, and whenever there is an error, a correction signal in the form of '1' is sent to the receiver; otherwise the observer continues to send '0' and no change is made in the received symbols. The correction signal transmitted by the observer is the additional information supplied to the receiver to compensate for the loss due to the noise in the channel, and is equal to the equivocation H(X/Y). This may be calculated as follows:
Pr. of sending 1 by the observer (with Pr.(x = 1) = Pr.(x = 0) = 0.5) = Pr. of error in the channel = (1/2)p + (1/2)p = p
Pr. of sending 0 by the observer = 1 - p = q
Therefore, the additional
information supplied = -p log p - q log q = H(X/Y)
Figure 5.7: An idealized binary communication system, with an observer and a noiseless correction channel
and the transinformation of the channel
= H(X) - H(X/Y) = H(Y) - H(Y/X) = 1 + p log p + q log q (5.54)

Example 5.16: Consider that a source is transmitting equiprobable 1/0 symbols at the rate of 10³ bits/sec, and the probability of error in the channel, Pₑ = 1/16. Then the rate of transmission R/sec is:
R/sec = 10³[1 + 15/16 log 15/16 + 1/16 log 1/16] = 10³[1 - 0.087 - 0.25] = 663 bits/sec
It is interesting to note that if p and q are interchanged in the channel matrix, the same rate of transmission is obtained. But as p → 0.5, R decreases, and R = 0 for p = 0.5, as shown in Fig. 5.8. If, however, p(x₁) ≠ p(x₂), then one has to use the eqn. R = H(Y) - H(Y/X), where, for a given value of error probability p:
p(y₁) = p(x₁)q + p(x₂)p
p(y₂) = p(x₁)p + p(x₂)q
H(Y/X) = p log 1/p + q log 1/q = H(p)
Therefore, R = H(Y) - H(p)
and R is maximum for p(x₁) = p(x₂) = 0.5. The variation of R with p(x₁) is shown in Fig. 5.9.
Figure 5.8: Transmission rate in a BSC
Figure 5.9: Mutual information for a BSC

Example 5.17: Consider a binary channel with the same noise matrix as in Example 5.16, but p₁ = p(0) = 1/4 and p₂ = p(1) = 3/4. We may now calculate:
p(y = 0) = 1/4 × 15/16 + 3/4 × 1/16 = 9/32
p(y = 1) = 1/4 × 1/16 + 3/4 × 15/16 = 23/32
Therefore,
H(Y) = 9/32 log 32/9 + 23/32 log 32/23 = 0.857
and
I(X, Y) = H(Y) - H(p) = 0.857 - 0.337 = 0.52 (H(p) from Ex. 5.16)
Therefore, for a transmission rate of 10³ symbols/sec, the information rate R/sec = 520 bits/sec.

Binary Erasure Channel (BEC)
In data communication, it is a common practice to use ARQ (automatic request for retransmission) techniques for 100% correct data recovery.
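The two worked rates above follow from one generic computation: given the prior p(X) and the noise matrix p(Y/X), form p(Y) and evaluate I(X, Y) = H(Y) - H(Y/X). A sketch (the function and variable names are mine, not the book's):

```python
from math import log2

def mutual_information(px, pyx):
    """I(X;Y) in bits, from prior px[i] and channel matrix pyx[i][j] = p(y_j | x_i)."""
    ny = len(pyx[0])
    py = [sum(px[i] * pyx[i][j] for i in range(len(px))) for j in range(ny)]
    hy = -sum(p * log2(p) for p in py if p > 0)                  # H(Y)
    hyx = -sum(px[i] * pyx[i][j] * log2(pyx[i][j])               # H(Y/X)
               for i in range(len(px)) for j in range(ny) if pyx[i][j] > 0)
    return hy - hyx

bsc = [[15/16, 1/16], [1/16, 15/16]]        # noise matrix of Examples 5.16-5.17
r_equiprobable = 1000 * mutual_information([0.5, 0.5], bsc)     # ~663 bits/sec
r_skewed = 1000 * mutual_information([0.25, 0.75], bsc)         # ~520 bits/sec
```

The same routine handles any m-input/n-output noise matrix, e.g., those of Examples 5.14 and 5.15.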
In the receiver, an error-detecting circuit is used along with some parity checks transmitted with the data, and the erroneous data is rejected, asking for retransmission. The transition matrix of the BEC is shown in Fig. 5.10. Assume that p(x₁) = p(x₂) = 0.5, and
p(Y/X): x₁ [q p 0]; x₂ [0 p q]
Then,
p(X, Y): x₁ [q/2 p/2 0]; x₂ [0 p/2 q/2]
Figure 5.10: A BEC
Therefore, p(y₁) = q/2, p(y₃) = q/2, and p(y₂) = (1/2)p + (1/2)p = p. Then,
p(X/Y): x₁ [1 1/2 0]; x₂ [0 1/2 1]
This gives H(X/y₁) = H(X/y₃) = 0,
H(X/y₂) = -1/2 log 1/2 - 1/2 log 1/2 = 1
and H(X/Y) = 0 + 0 + p(1) = p
Therefore,
I(X, Y) = H(X) - H(X/Y) = 1 - p = q bits/symbol (5.55)
In this particular case, use of the eqn. I(X, Y) = H(Y) - H(Y/X) will not be correct, as H(Y) involves y₂, but the information given by y₂ is rejected at the receiver.

5.5.2 Extension of Binary Channels [5]
It has been shown in Sec. 5.1.3 that for the nth extension of the source, H(Sⁿ) = nH(S). Similarly, the mutual information of the nth extension of the channel may be shown to be:
I(Xⁿ, Yⁿ) = nI(X, Y) (5.56)
Consider again the channel matrix given in Fig. 5.6 for a BSC. For the second extension of the channel, the channel matrix is now expanded as:
p(Y²/X²): 00 [q² qp pq p²]; 01 [qp q² p² pq]; 10 [pq p² q² qp]; 11 [p² pq qp q²]
For p(0) = p(1) = 0.5, p(x₁) = p(x₂) = p(x₃) = p(x₄) = 0.25. Similarly,
p(y₁) = p(y₂) = p(y₃) = p(y₄) = 1/4(p² + q² + 2pq) = 0.25, since (p + q) = 1.
Therefore, H(Y²) = 2 bits, and
H(Y²/X²) = -(q² log q² + p² log p² + 2pq log pq) = -2[q log q + p log p] bits
Therefore,
I(X², Y²) = 2[1 + q log q + p log p] = 2I(X, Y) (5.57)
For the nth extension, H(Yⁿ) = nH(Y), and because of the symmetry of the channel, H(Yⁿ/Xⁿ) = nH(Y/X), giving eqn. (5.56).

Cascaded Channels
Sometimes the channels are cascaded for operational reasons, and if both the channels are noisy, there is an information loss in the final outcome.
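The BEC result just derived, I(X, Y) = 1 - p = q, and the additivity law (5.56) are both easy to confirm numerically. The following sketch (function and variable names are mine) applies a generic mutual-information routine to the erasure channel, and to the second extension of a BSC built as the Kronecker product of the BSC matrix with itself:

```python
from math import log2

def mutual_information(px, pyx):
    """I(X;Y) in bits, from prior px and channel matrix pyx[i][j] = p(y_j | x_i)."""
    py = [sum(px[i] * row[j] for i, row in enumerate(pyx)) for j in range(len(pyx[0]))]
    hy = -sum(v * log2(v) for v in py if v > 0)
    hyx = -sum(px[i] * v * log2(v) for i, row in enumerate(pyx) for v in row if v > 0)
    return hy - hyx

p, q = 0.2, 0.8
bec = [[q, p, 0.0], [0.0, p, q]]              # middle output y2 = erasure
i_bec = mutual_information([0.5, 0.5], bec)   # equals q = 0.8 bits/symbol

# Second extension of a BSC (eqn. 5.57): the 4x4 channel matrix is the
# Kronecker product of the 2x2 BSC matrix with itself, and I doubles.
bsc = [[q, p], [p, q]]
bsc2 = [[a * c, a * d, b * c, b * d] for a, b in bsc for c, d in bsc]
i1 = mutual_information([0.5, 0.5], bsc)
i2 = mutual_information([0.25] * 4, bsc2)     # equals 2 * i1
```

The erasure channel check also illustrates the caveat above: the routine uses H(Y) - H(Y/X), which here happens to equal H(X) - H(X/Y) = q because the erasure output carries no information about X.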
Figure 5.11: Two BSC's in cascade, with p(x₁) = p(x₂) = 0.5
Consider that two BSC's are cascaded, giving the overall channel matrix, as shown in Fig. 5.11:
p(Z/X) = [q' p'; p' q']
where q' = p² + q² = 1 - 2pq, p' = 2pq
Thus the cascaded channel is equivalent to a single BSC with error probability = 2pq. We have I(X, Y) = 1 - H(p), and therefore:
I(X, Z) = 1 - H(2pq) (5.58)
Similarly, for a cascade of three BSC's, giving the output {u₁, u₂}, the mutual information is:
I(X, U) = 1 - H(3pq² + p³) (5.59)
where the equivalent
q' = q³ + 3p²q
p' = p³ + 3pq²
and (p' + q') = (p + q)³ = 1.
It is seen that the equivalent error probability increases with the cascading of more channels, and therefore the overall mutual information decreases. For a number of non-identical channels cascaded, the general relations are:
I(X, Y) ≥ I(X, Z) ≥ I(X, U) (5.60)
and equality holds when the later channels are noiseless (except in some special cases).

Repetition of Signals: Additivity of Mutual Information
To improve the channel efficiency (equivalently, to reduce the error rate), a useful technique is to repeat the signals {X} at the channel input and to detect only the repeated signals as {Y}, as shown in Fig. 5.12. Consider the BSC of Fig. 5.6, with inputs {X} = {00, 11} and acceptable outputs {Y} = {00, 11}, the rest, {01, 10}, being discarded (erased), as in a BEC. Then the equivalent channel matrix is:
Figure 5.12: A repetitive BSC
p(Y/X): 00 [q² p² pq pq]; 11 [p² q² pq pq]
p(y₁) = p(y₂) = (p² + q²)/2; p(y₃) = p(y₄) = pq
Therefore,
H(Y)ᵣ = (p² + q²) log [2/(p² + q²)] + 2pq log (1/pq)
H(Y/X)ᵣ = q² log 1/q² + p² log 1/p² + 2pq log 1/pq
I(X, Y)₂ = H(Y)ᵣ - H(Y/X)ᵣ
= (p² + q²) log [2/(p² + q²)] + q² log q² + p² log p²
= (p² + q²) [1 + {q²/(p² + q²)} log {q²/(p² + q²)} + {p²/(p² + q²)} log {p²/(p² + q²)}] (5.61)
It is seen that the channel is now equivalent to a BSC with error probability p' = p²/(p² + q²) and a normalising factor of (p² + q²) = probability of observing either 00 or 11 at the output.
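The cascade results (5.58)-(5.60) amount to multiplying channel matrices: the product of BSC matrices is again a BSC matrix, whose error probability can be checked against the closed forms 2pq and p³ + 3pq². A brief sketch (function names are mine; p = 0.1 is an arbitrary illustrative value):

```python
from math import log2

def matmul(a, b):
    """Product of two row-stochastic channel matrices (channels in cascade)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def h(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.1
q = 1 - p
bsc = [[q, p], [p, q]]

two = matmul(bsc, bsc)        # equivalent BSC, error probability 2*p*q
three = matmul(two, bsc)      # error probability p**3 + 3*p*q**2

i1 = 1 - h(p)                 # I(X, Y)
i2 = 1 - h(two[0][1])         # I(X, Z) = 1 - H(2pq)
i3 = 1 - h(three[0][1])       # I(X, U) = 1 - H(p^3 + 3pq^2)
# data-processing ordering: i1 >= i2 >= i3
```

Each extra hop pushes the equivalent error probability toward 0.5, so the mutual information falls monotonically, as eqn. (5.60) states.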
A similar calculation for three repetitions gives:
I(X, Y)₃ = (p³ + q³)[1 - H(p″)] + 3pq[1 - H(p)] (5.62)
where p″ = p³/(p³ + q³).

Example 5.18: Assume a BSC with p = 1/4 and q = 3/4, and p(0) = p(1) = 0.5. Calculate the improvements in the rate of transmission by 2 and 3 repetitions of the input.
For the BSC without repetitions:
I(X, Y) = 1 + p log p + q log q = 1 + 1/4 log 1/4 + 3/4 log 3/4 = 0.189 bits/symbol
With 2 repetitions, the equivalent error rate p' = 1/10, and
I(X, Y)₂ = (p² + q²)[1 - H(p')] = 5/8[1 + p' log p' + q' log q']
= 5/8[1 - 1/10 log 10 - 9/10 log 10/9] = 5/8 × 0.533 = 0.333 bits/symbol
With 3 repetitions, the error rate p″ = 1/28 and q″ = 27/28:
I(X, Y)₃ = (p³ + q³)[1 - H(p″)] + 3pq[1 - H(p)]
= (7/16)[1 - 1/28 log 28 - 27/28 log 28/27] + (9/16) × 0.189
= 7/16 × 0.78 + 0.106 = 0.446 bits/symbol
It is thus seen that the mutual information improves considerably with repetitions. A plot of eqns. (5.61) and (5.62) is shown in Fig. 5.13, for various values of p. This technique of improving the error rate is somewhat similar to error-correcting codes and is widely used in radar-signal reception.
Figure 5.13: I(X, Y) for a BSC with r repetitions

5.5.3 Channel Capacity of Channels: General Case [4, 6]
Consider a simple binary channel, given by a noise matrix which is not symmetric.
