
# Class Notes: Digital Communications

Prof. J.C. Olivier
Department of Electrical, Electronic and Computer Engineering
University of Pretoria, Pretoria

Revision 3
September 8, 2008


## Contents

0.1 Preface

1 Introduction
  1.1 Overview of Wireless Communications
  1.2 The transmitter data burst structure
  1.3 The dispersive radio channel
  1.4 The model of the entire communication system

2 Introduction to Probability theory and Detection
  2.1 Introduction
  2.2 Probability theory, Detection and some odd experiments
    2.2.1 Background
    2.2.2 Applications of Bayes's theorem
  2.3 Conclusion

3 The modulator and demodulator
  3.1 Modulation continued
    3.1.1 The concept of base band signal processing and detection
    3.1.2 Types of modulation
    3.1.3 Binary phase shift keying (BPSK)
    3.1.4 Four level pulse amplitude modulation (4PAM)
    3.1.5 Quadrature phase shift keying (QPSK)
    3.1.6 Eight phase shift keying (8 PSK)
  3.2 De-modulation
    3.2.1 What if there is multipath?

4 Detection
  4.1 Introduction
  4.2 The static Gaussian channel
    4.2.1 Computing more than just the most likely symbol: probabilities of all constellation points, and the corresponding coded bit probabilities computed by the receiver
  4.3 MLSE - the most likely sequence estimate
    4.3.1 Finding the sequence x via the MLSE
    4.3.2 3 tap detector
    4.3.3 Discussion
  4.4 Probabilistic Detection via Bayesian Inference for Multipath channels
    4.4.1 Sub optimal detected bit probability calculation
    4.4.2 Optimal symbol probability calculation using Bayesian Detection
  4.5 Forward-Backward MAP detection
    4.5.1 An example
  4.6 Assignments

5 Frequency Domain Modulation and Detection: OFDM
  5.1 Introduction
  5.2 Circulant matrix theory
  5.3 The Transmitter for OFDM systems
    5.3.1 Cyclic time domain multipath propagation
  5.4 OFDM receiver, i.e. MAP detection
    5.4.1 MAP detection with trivial complexity
    5.4.2 Matlab demo
  5.5 Assignment

6 Channel Estimation
  6.1 Introduction
  6.2 Optimum receiver filter and sufficient statistics
  6.3 The linear model
  6.4 Least Squares Estimation
  6.5 A representative example
  6.6 Generalized Least Squares Estimation
    6.6.1 The generalized least squares procedure
  6.7 Conclusion
  6.8 Assignment

7 Minimum Mean Square Error (MMSE) Estimation, Prefilter and Prediction
  7.1 Introduction
  7.2 Minimum mean square error (MMSE) estimation
    7.2.1 The principle of orthogonality
    7.2.2 Geometrical interpretation of the principle of orthogonality
  7.3 Applying minimum mean square error (MMSE) estimation: Let us design a linear prefilter
    7.3.1 Matched filter, minimum phase filter and spectral Factorization
    7.3.2 MMSE prefilter design
    7.3.3 Evaluating matrix E{yy†} and vector E{s[n]* y}
  7.4 A representative example
  7.5 Stochastic processes and MMSE estimation
    7.5.1 Prediction
  7.6 Assignments

8 Information Theory and Error Correction Coding
  8.1 Linear block codes
    8.1.1 Repetition codes
    8.1.2 General linear block codes
    8.1.3 Decoding linear block codes using the Parity Check matrix H
  8.2 Convolutional codes and Min-Sum (Viterbi) decoding
    8.2.1 Decoding the convolutional codes
  8.3 Assignments

## List of Figures

The data burst using pilot or training symbols.
The normalized autocorrelation function for a training sequence.
The multi-path channel and time domain representation at the receiver.
The transmitter and receiver flow in a wireless communication system.
The modulation of 1 bit in amplitude modulation.
The modulation of 4 coded bits x via BPSK modulation.
The modulation of 8 coded bits from x via 4 PAM modulation.
The modulation of 8 coded bits from x via QPSK modulation.
The modulation of 3 coded bits from x via 8 PSK modulation.
The Gaussian pulse shaping filter used in GSM.
The de-modulation using a matched filter and optimum sampling.
The layout of a typical receiver, indicating where the detector comes into play.
MAP detection on a static Gaussian channel is selecting the modulation constellation point closest to the noise corrupted received samples. Two cases are shown, one where the channel quality is good (high SNR) and one where the channel quality is poor (low SNR).
The road-map between town A and B - infer the shortest route with least cost or distance?
The trellis - infer the shortest route with least cost - that will be the MLSE sequence!
The trellis - infer the shortest route with least cost - that will be the MLSE sequence!
The forward-backward MAP trellis for BPSK.
The OFDM transmitter frame format making the multipath propagation cyclic.
The first stages of the receiver hardware.
The estimated impulse response $\tilde{c}$ and its z-plane representation.
The principle of orthogonality.
The representation of the matched filter, the feed-forward filter and the feedback filter.
The representation of the MMSE-DF prefilter. The prefilter is the combination of the matched filter and the feed-forward filter. The feedback filter is the post prefilter CIR.
The overall impulse response c before the prefilter.
The overall impulse response b after the prefilter.
The interpolation problem.
GSM channel fading with and without frequency hop.
The convolutional encoder and state diagram.
The convolutional decoder trellis based on the state diagram.

## 0.1 Preface

The notes deal with a number of different topics needed in Digital Communications theory. The notes reference only two other texts, namely the book by MacKay [1] and the book by Proakis [2], as well as a few key papers from the open literature. Most of the material in these notes can be found in these references; however, it is presented in a style that is easy to understand, and in a logical order that facilitates the student's appreciation of the big picture of Digital Communications systems.

On our open source website http://opensource.ee.up.ac.za a complete GSM simulator is available, and it contains most of the blocks needed in a modern communication system. Each chapter in these notes deals with some of the techniques used in the simulator, and in the assignments after each chapter the student will use the simulator to complete the assignments. The idea is that the student will re-create the material for herself/himself by completing these assignments.

Feedback on these notes would be appreciated and can be sent to the email address below.

J.C. Olivier
Pretoria
September 2005
corne.olivier@eng.up.ac.za


## Chapter 1 Introduction

### 1.1 Overview of Wireless Communications

Wireless communications is a term that is applicable to any system transmitting information from one location to another while making use of radio wave propagation. For example, a system transmitting information over a fiber optic cable is a communication system, but is not wireless. Some of the problems that plague wireless communications and lead to errors in detected bits, such as thermal noise in the receiver and channel dispersion (multi-path propagation), may also apply to the case where the electromagnetic waves are guided, such as in fiber optic or coaxial cable systems, but the use of radio wave propagation presents unique challenges.

First of all, use of radio wave propagation in wireless communications systems requires the use of receiver antennas, which will receive any radio wave source inside the frequency band of interest. This may for example include human made noise or interference sources such as other transmitters operating in frequency bands close to the band of interest, leading to cross talk when the transmitter and receiver filters are unable to completely suppress the transmission in adjacent channels. Or it may be radio waves from the solar system, which are omnipresent.

Secondly, radio waves are susceptible to any relative movement between the transmitter and the receiver, as such movement causes Doppler shift. Doppler shift causes radio wave fading, that is, the radio wave is constantly undergoing a multiplicative distortion that varies the wave amplitude and phase during transmission from the transmitter to the receiver.

Channel dispersion is caused by multi-path propagation. By that we mean that multiple copies of the modulated radio wave that are delayed in time arrive at the receiver. Multi-path may be caused by reflections from mountains, buildings, or any large object able to reflect a significant amount of the transmitted wave. Hence, depending on the environment where the system is deployed, the type of multi-path present may be different. In rural areas where mountains or hills are absent, multi-path may essentially be absent, and at the other extreme in built-up urban areas buildings may cause significant multi-path. This is also the case in hilly terrain areas, where hills or mountains may cause multi-path with large delay because of the distances involved.

Thus we may summarise by saying that the wireless communications system has to operate in an environment where the transmitter and/or the receiver is mobile (moving), causing Doppler shift that leads to fading, where multi-path propagation is present causing channel dispersion, and where a multitude of interference sources, both human made and extraterrestrial, are impairing the receiver's ability to detect transmitted information reliably. It is difficult to say in general which of these impairments are dominant, as the conditions prevalent in different locations are different. In designing the optimal receiver, based on the theory of estimation and detection, we have to design a receiver that is able to mitigate all of these impairments simultaneously.

Later it will become evident that the selection of signal processing methods used in the receiver is based on statistical models and assumptions that are made about the operating conditions and impairments. These impairments may be artificially generated or modelled on a computer, and the receiver may so be simulated and its performance determined to a large extent before deployment in the field. Experience has shown that actual performance obtained in practice closely matches the performance predicted based on computer modelling and simulation.

### 1.2 The transmitter data burst structure

In modern digital communication systems the objective is to transmit discrete data or source bits reliably over the channel. The bits will in general be represented or modulated as complex symbols chosen from a discrete modulation alphabet. These symbols (complex numbers) are unknown at the receiver, where they will be estimated or detected. In later chapters it will become clear that the detection process requires information about the transmit filters, the RF channel and the receiver filters. Since the transmitter or receiver or both may be mobile, the RF channel is time variant and is in general unknown at the receiver. The receiver will thus need to perform an estimation of the channel properties.

In order to perform the channel estimation, we will in general require known pilot or training symbols to be intermittently transmitted. These will be used by the receiver for channel estimation. The pilot symbols enable the channel that is unknown to be estimated. Hence, it has become a standard procedure to include a short sequence of pilot (training) symbols in between data symbols to form a radio burst, as shown in Figure 1.1.

[Figure 1.1: The data burst using pilot or training symbols. Burst layout: 3 tails | 58 data symbols | 26 training | 58 data symbols | 5 tails.]

In the receiver we will use the transmitted training (pilot) symbols that are known a priori to estimate the overall impulse response valid over a short period of time, which we denote by the vector $c(t)$. The period of time over which we assume the CIR remains valid is determined by the length in time of the data burst as shown in Figure 1.1.

The choice of pilot symbols is based on the need to have the autocorrelation function of the pilot sequence as close to a Kronecker delta as possible. For example, GSM uses a 26 symbol sequence of $+1$ and $-1$ values, and its normalised autocorrelation function is shown in Figure 1.2. In later chapters it will become evident that the autocorrelation properties are important in designing the optimal channel impulse response estimator.
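The Kronecker delta requirement on the pilot autocorrelation can be checked numerically. The following is a minimal sketch, not the simulator's code. The bit pattern `TSC0_BITS` is GSM training sequence code 0 as commonly tabulated and is an assumption here; it is not necessarily the sequence used in these notes. The helper name `normalised_autocorrelation` is likewise hypothetical.

```python
import numpy as np

# Assumed example: GSM training sequence code 0, written as bits.
# This is an illustration and may differ from the sequence in the notes.
TSC0_BITS = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0,
             0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]


def normalised_autocorrelation(symbols):
    """Aperiodic autocorrelation for lags -(N-1)..(N-1), scaled so lag 0 is 1."""
    s = np.asarray(symbols, dtype=float)
    r = np.correlate(s, s, mode="full")   # length 2N - 1, lag 0 at index N - 1
    return r / r[len(s) - 1]


pilot = 2 * np.array(TSC0_BITS) - 1       # map bit 0 -> -1 and bit 1 -> +1
r = normalised_autocorrelation(pilot)
zero_lag = len(pilot) - 1                 # index of lag 0 in the "full" output
sidelobes = np.delete(np.abs(r), zero_lag)
print("peak:", r[zero_lag], "largest sidelobe:", sidelobes.max())
```

The closer the sidelobes are to zero, the closer the sequence is to the ideal Kronecker delta, which is what makes the channel estimate simple to compute.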

The pilot symbols are used to estimate the effective overall Channel Impulse Response (CIR) valid for one burst, under the assumption that the channel is changing slowly compared to the duration of a single burst. So for any one burst, the channel impulse response is arbitrary, except for the length of the channel impulse response, which is fixed and related to the receiver sample rate and the multi-path environment the receiver operates in.

[Figure 1.2: The normalised autocorrelation function for a training sequence.]

### 1.3 The dispersive radio channel

The radio channel impulse response in fact decays with time. Later copies of the wave have less and less energy since they propagated larger distances, and the radio channel impulse response appears to be finite in time. Thus it has become a standard procedure to model the radio channel in discrete time as a Finite Impulse Response (FIR) filter. The sampling rate of the receiver determines to a large extent how fast the FIR taps will decay with time, i.e. how many there are, as a high sample rate implies short time differences between different taps and thus less decay. Thus in general, large bandwidth systems, i.e. high sample rate systems, will experience much longer FIR channels than narrow-band or low sampling rate systems.

Movement between the transmitter and receiver will cause fading due to Doppler phenomena, a complicated topic not considered in these notes (see footnote 1). The important point is that generally each tap of the channel impulse response fades (i.e. varies) over time independently from the other taps.

Footnote 1: The reader is referred to the work by Zheng and Xiao available on IEEE Xplore for a detailed exposition of the simulation of these processes.

Let us consider a situation where the transmitter transmits a series of modulated symbols denoted as $A_{n-2}, A_{n-1}, A_n, A_{n+1}$ and the receiver uses a sample rate of $1/T$ Hz. The multipath environment the receiver is operating in is depicted in Figure 1.3. There is a direct path between the 2 antennas, which takes $\tau_1$ seconds to travel between the antennas. There are 2 secondary paths that are each delayed a fraction of $T$ seconds longer than the direct path, then there are 2 delayed paths reflecting off mountains that are delayed $T$ and $2T$ seconds each. So five copies of the symbol $A_{n-2}$ arrive at the receiver, each with a different amplitude, phase and time delay. Hence multiple copies of all the symbols are arriving at the receiver.

[Figure 1.3: The multi-path channel and time domain representation at the receiver. The paths between transmitter and receiver have delays $\tau_1$, $\tau_2$, $\tau_3$, $\tau_4 = \tau_1 + T$ and $\tau_5 = \tau_1 + 2T$; at each sample time the contributions due to $A_n$, $A_{n-1}$ and $A_{n-2}$ are weighted by the taps $c_0$, $c_1$ and $c_2$.]

The receiver however samples the output of the demodulator at a multiple of $T$ seconds, as is clear from Figure 1.3. Let us consider what the receiver finds at sample $nT$, denoted $r[n]$. It finds present at that point in time contributions from symbols $A_{n-2}$, $A_{n-1}$ and $A_n$. See Figure 1.3 for a pictorial presentation. Mathematically we can write this series of symbols present at time $n$ as

$$r[n] = \sum_{k=0}^{L} c_k A_{n-k} + n_s[n] \tag{1.1}$$

where $n_s[n]$ represents thermal noise, and each $c_k$ is a tap of the channel impulse response. The term $\sum_{k=0}^{L} c_k A_{n-k}$ is a discrete convolution, and that is the reason why we refer to $c$ as the channel's impulse response.

### 1.4 The model of the entire communication system

Let us now present the overall wireless communications system, as depicted in Figure 1.4. To start off with, we have a data source, or a voice, or an image that we wish to send over a horrible channel where the transmitted signal will fade, undergo multipath propagation, and where noise and interference signals are added to it. At the output of the receiver we wish to have a reliable copy of the source, with few or no mistakes if possible. However, the transmitter is completely unaware of the channel. How is this achieved in practice? It is achieved in several steps, each contributing a key part of the overall Digital Communications system. Each block represents a key part of the communication system's ability to transmit and receive information.

[Figure 1.4: The transmitter and receiver flow in a wireless communication system. Transmitter: source (voice/data), source compression, encoder (data bits x to coded bits z), modulator (bits to symbol, pulse shaping filter g(t)) producing s(t). RF channel model with added noise. Receiver: de-modulator (matched filter) producing y, overall channel estimator producing c, symbol/soft bit detector, decoder, source decompression, estimated source.]

• The first key part is the source compression block in the transmitter. Here theoretically all or at least most of the redundancy of the source is removed. This process is complicated and is an ongoing research venture. The technology for achieving compression is constantly changing, and is not a solved topic at this point in time.

• Next the data without any redundancy, denoted as $x$ in Figure 1.4, are passed to the error correction encoder. Its job is to add some redundancy to $x$ to produce $z$. Now why would we add redundancy when we did all the work to remove it in the previous block? The reason is that the redundancy we add here is controlled redundancy, which is added in very smart and ingenious ways. This redundancy will be exploited in the decoder to correct errors, a topic covered in detail in these notes.

• The data with redundancy, denoted $z$, is binary (ones and zeros), and cannot be transmitted in that form over a channel, regardless of whether the channel is a coaxial cable, a wireless channel or deep space between Pluto and Earth. To be transmitted via an antenna over a channel, the data must be modulated to transform the binary $z$ into a series of complex data symbols denoted $A_i$. This is done in the modulator. The complex symbols $A_i$ can be used to modulate a carrier RF electromagnetic wave over the channel. The real part of $A_i$ is used to modulate the in-phase part of the carrier wave, and the imaginary part to modulate the quadrature part of the carrier wave. The more valid points we permit on the complex plane in the modulator, the more bits we can grab from $z$ per symbol, and the higher the throughput will be over the channel. But the more points we have, the more vulnerable we will be to noise in the receiver. This will all become clear later on in these notes.

• The modulated symbols in analogue form, denoted $s(t)$, are transmitted one by one over the channel with a time duration of $T$ seconds each, so that $1/T$ is the symbol rate. The smaller $T$ is, the faster data can be transmitted over the channel, but since the bandwidth use is proportional to $1/T$, the price we pay for that is we consume more bandwidth, a scarce and expensive resource. The dispersive channel causes multiple copies of $s(t)$ to arrive at the receiver input port, where thermal noise is also added to it, before the receiver filters it.

• The receiver filter used to filter the distorted signal that arrives at the receiver is matched to the transmission pulse shaping function $g(t)$ used in the modulator. This causes the Signal to Noise Ratio (SNR) presented to the rest of the receiver to be maximised, which is a desirable situation.

• The filtered and sampled signal that is the output of the de-modulator is denoted $y[n]$, and is presented to the channel impulse response estimator. Here the overall channel impulse response denoted $c$ is estimated by the receiver using known pilot symbols in the transmitted burst.

• The received vector $y[n]$ and the channel impulse response $c$ are passed to the first of 2 detection devices in the receiver, the detector, where the symbols formed in the modulator are transformed back into bits. These bits are denoted $\tilde{z}$ because there can be errors in this estimated version of $z$, and the probability for each bit in $\tilde{z}$ is provided as well by the detector for use by the decoder. The character of the detector is dictated by the length of the vector $c$, i.e. the number of taps in it. If there is a single tap, with the rest zero, the detector is very simple and the methods developed using elementary probability theory can be used as is. However, even if there is just one additional tap, so there are two taps in $c$, then we need to resort to graphical methods called a trellis. There exists a very efficient and elegant algorithm to find the optimal solution on these graphs, known as the Viterbi algorithm, and several options exist with various advantages and disadvantages.

• The decoder has the job of fixing errors that are present in $\tilde{z}$, and forms the key part of the modern marvel of Digital Communications systems. Without the decoder (coding theory) we would not be able to produce error free estimates $\tilde{x}$ at the receiver, and data communications would be virtually impossible. The methods used to form the decoder are also based on graphs or trellises.

• Finally, the decompression device uses $\tilde{x}$ to reconstruct the original source. In practice Cyclic Redundancy Check (CRC) codes are used in the frame, and if an error is detected in the reconstructed source in spite of the best efforts of the decoder, then a Repeat Frame Transmission request is sent back to the transmitter, and the entire frame is sent again. This function is performed by a higher layer in the protocol stack used in the communication system.
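To make the modulator and the FIR channel model of equation (1.1) concrete, here is a minimal sketch under stated assumptions. The function names `bpsk_modulate` and `fir_channel`, the 3-tap impulse response `c` and the noise level are all hypothetical choices for illustration; the sketch also assumes that symbols before the start of the burst are zero.

```python
import numpy as np


def bpsk_modulate(bits):
    """Map bits {0, 1} to BPSK symbols {-1, +1}, stored as complex numbers."""
    return (2 * np.asarray(bits) - 1).astype(complex)


def fir_channel(symbols, taps, noise_std, rng):
    """Return r[n] = sum_k c_k A[n-k] + n_s[n] for n = 0..len(symbols)-1.

    Symbols before the burst are taken as zero (an assumption), and the
    noise is complex Gaussian with total standard deviation noise_std.
    """
    clean = np.convolve(symbols, taps)[: len(symbols)]
    noise = noise_std * (rng.standard_normal(len(symbols))
                         + 1j * rng.standard_normal(len(symbols))) / np.sqrt(2)
    return clean + noise


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=8)          # stand-in for coded bits
A = bpsk_modulate(bits)                    # modulated symbols A_n
c = np.array([1.0, 0.5, 0.2j])             # hypothetical CIR with L = 2
r = fir_channel(A, c, noise_std=0.1, rng=rng)
print(r)
```

Each received sample mixes the current symbol with the two previous ones, which is exactly why the detector needs an estimate of `c` before it can recover the bits.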

## Chapter 2 Introduction to Probability theory and Detection

### 2.1 Introduction

Here we study inference, a science where we are given information via observations, and we are required to infer the value of a parameter or some property of a random variable [1], and quantify those estimates probabilistically. This is a situation commonly found in Digital Communications systems, where the observed data in the receiver is corrupted by noise that is unknown (stochastic). The only knowledge we do have is a statistical description of the noise, i.e. the noise probability density function (PDF), and given the noisy observed data (that was also corrupted by multipath propagation) and knowledge of the noise PDF our job is to figure out what was transmitted.

Clearly therefore, the process of Inference that is performed in the receiver is a statistical one, and statistics and a proficiency in applying the concepts of statistics are needed. In these notes it is assumed that the reader has a basic understanding of statistics and its applications, but the concepts behind Inference are explained here using several experiments that were taken from [1], since these contain the essential elements needed in chapters to follow.

### 2.2 Probability theory, Detection and some odd experiments

#### 2.2.1 Background

##### Binomial Distribution

Define

$$\binom{N}{r} = \frac{N!}{(N-r)!\,r!} \tag{2.1}$$

A bent coin, i.e. an unfair coin, has a probability $f$ of coming up heads. We perform the experiment $N$ times. What is the probability distribution of the number of heads $r$? It has a binomial distribution given by

$$P(r \mid f, N) = \binom{N}{r} f^{r} (1-f)^{N-r} \tag{2.2}$$
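Equation (2.2) is easy to evaluate directly. The sketch below is illustrative only: the function name `binomial_pmf` and the values $N = 10$, $f = 0.3$ are assumptions chosen for the example.

```python
from math import comb


def binomial_pmf(r, N, f):
    """P(r | f, N) = C(N, r) f^r (1 - f)^(N - r), as in equation (2.2)."""
    return comb(N, r) * f**r * (1 - f) ** (N - r)


N, f = 10, 0.3                                   # hypothetical bent coin
pmf = [binomial_pmf(r, N, f) for r in range(N + 1)]
print("sum over r:", sum(pmf))                   # the distribution sums to one
print("P(r = 3):", binomial_pmf(3, N, f))
```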

##### Probability

An ensemble $X$ is a triple $(x, A_x, P_x)$:

• $x$ is the outcome, or the value of the random variable.

• It may take on one of the possible values defined by the set $A_x$, usually called the alphabet.

• The probability that $x = a_i$ is $P(x = a_i) = p_i$, with $p_i \geq 0$ and $\sum_{a_i \in A_x} P(x = a_i) = 1$.

We don't always want to write in such a formal way, so we will use informal notation. We will simply write $P(a_i)$.

##### Joint ensembles

The joint ensemble $XY$ is an ordered pair $x, y$, and $P(x, y)$ is the joint probability of $x$ and $y$. For a joint probability, the two variables are not necessarily independent. By that we mean that the joint probability cannot in general be written as a product $P(x)P(y)$. If we can, they are independent by definition.

##### Marginal probability

$$P(a_i) = \sum_{y \in A_y} P(x = a_i, y) \tag{2.3}$$

We will use the marginal probability extensively in what will follow.

##### Conditional probability

We can compute probability conditioned on other given information. Formally

$$P(x = a_i \mid y = b_j) \equiv \frac{P(x = a_i, y = b_j)}{P(y = b_j)} \tag{2.4}$$

but $P(y = b_j)$ must be larger than 0, else it is undefined.

##### Product rule

Let us assume we have some joint probability, say $P(x, y \mid H)$, where this probability is based on a Hypothesis $H$. Later on we will see how we have to make a Hypothesis if we want to infer. We may write

$$P(x, y \mid H) = P(x \mid y, H)\,P(y \mid H) = P(y \mid x, H)\,P(x \mid H) \tag{2.5}$$

known as the chain rule or product rule. This will become a cornerstone of inference, so spend time on it.

##### Sum rule

We may expand $P(x \mid H)$ as

$$P(x \mid H) = \sum_{y} P(x, y \mid H) = \sum_{y} P(x \mid y, H)\,P(y \mid H) \tag{2.6}$$

and this is the sum rule.
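The marginal, conditional, product and sum rules can all be verified on a small joint table. The sketch below uses a hypothetical 2 by 2 joint distribution; the numbers are made up for illustration and do not come from the notes.

```python
import numpy as np

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])

p_x = joint.sum(axis=1)               # marginal P(x), equation (2.3)
p_y = joint.sum(axis=0)               # marginal P(y)
p_x_given_y = joint / p_y             # P(x | y), equation (2.4), per column
p_y_given_x = joint / p_x[:, None]    # P(y | x), per row

# Product rule (2.5): both factorisations rebuild the joint table.
rebuilt_a = p_x_given_y * p_y
rebuilt_b = p_y_given_x * p_x[:, None]

# Sum rule (2.6): summing P(x | y) P(y) over y recovers P(x).
p_x_from_sum_rule = (p_x_given_y * p_y).sum(axis=1)

# x and y are independent only if the joint equals the outer product.
independent = np.allclose(joint, np.outer(p_x, p_y))
print(p_x, p_y, independent)
```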

Bayes's theorem

From the product rule we may find that

$$ P(y \mid x, H) = \frac{P(x \mid y, H)\,P(y \mid H)}{P(x \mid H)} \tag{2.7} $$

$$ \phantom{P(y \mid x, H)} = \frac{P(x \mid y, H)\,P(y \mid H)}{\sum_{y'} P(x \mid y', H)\,P(y' \mid H)} \tag{2.8} $$

2.2.2 Applications of Bayes's theorem

Gottlieb's nasty disease [1]

Gottlieb has a test done for a disease he suspects that he has. The doctor tells Gottlieb that this test is 95% reliable, i.e. in 95% of the cases in which people really have the disease, the result is positive. Likewise, in 95% of the cases in which people don't have the disease the test is negative. The doctor found from past experience (prior information) that for Gottlieb, being male and of a certain age and background, the chances are 1% that he has it, without the test being done. So Gottlieb's results come back, and the test is positive. The doctor tells him it is probable that he has the nasty disease, because the test is so reliable.

The question is, do we as Bayesians agree with the doctor's assessment? A Bayesian believes in using probabilities to infer, i.e. we think that in order to believe something (like the statement from the doctor) probability theory must say it is probable, since our belief is founded in probability theory.

How do we approach the problem? First, let us define Gottlieb's state of health by the variable 'a', and the test result by the variable 'b'. a = 1 implies Gottlieb definitely has the disease, a = 0 thus means he definitely does not have it. In the language of Bayesian inference, we need to infer the probability that Gottlieb has the disease given the test result, $P(a = 1 \mid b = 1)$, exploiting all knowledge we have available (that was given):
• $P(b = 1 \mid a = 1) = 0.95$
• $P(b = 1 \mid a = 0) = 0.05$
• $P(b = 0 \mid a = 1) = 0.05$
• $P(b = 0 \mid a = 0) = 0.95$

Do we agree? Also we know $P(a = 1) = 0.01$ and $P(a = 0) = 0.99$, the prior information.

So what do Bayesians think $P(a = 1 \mid b = 1)$ is? This doctor seems to think that the probability that Gottlieb has the disease is 'high', but let us compute its exact value:

$$ P(a = 1 \mid b = 1) = \frac{P(b = 1 \mid a = 1)\,P(a = 1)}{P(b = 1 \mid a = 1)\,P(a = 1) + P(b = 1 \mid a = 0)\,P(a = 0)} = 0.16 \tag{2.9} $$

A Bayesian thinks the probability that Gottlieb really has the disease is rather small, only 16%. So who is right, the doctor or the Bayesian?

Experiments with black and white balls [1]

Take a long hard look at the previous example, and make sure that you really understand the essence of the Bayesian approach. It involves the so called inverse probability. Bayes's theorem turns the probabilities around, i.e. where we have a given b, we end up with the inverse, b given a. In most inverse problems (the interesting ones) we need to infer the conditional probability of one or more unobserved variables, given some observed variables. This will form the core of our approach to detection in Communications theory. So let us do some experiments with white and black balls.

An urn contains K balls, B are black, and K - B are white. We draw a ball at random, then replace it, N times. What is the probability distribution for the number of times, say n, a black ball is drawn? First of all, why do we say "probability distribution"? It is because there is a finite probability that a black ball is drawn either 0, or 1, or 2, or ... N times. It may be more improbable that a black ball is drawn x times than say y times, but the point is that the parameter n has some probability distribution. Define $f_B = B/K$, then the distribution is binomial, and is given by

$$ P(n \mid f_B, N) = \binom{N}{n} f_B^{\,n} (1 - f_B)^{N-n} \tag{2.10} $$

Now let us continue and do an inverse experiment with the balls. We have 11 urns, all identical. These we denote $u \in \{0, 1, 2, 3, \cdots, 10\}$. Each urn contains 10 balls. Urn u contains u black balls, and 10 - u white balls. Candy, our experimental lady, chooses at random an urn u, but we as onlookers don't know the urn number she selects. From this urn she draws N times, each time replacing the ball. It so happens that Candy obtained n = 3 black balls after N = 10 draws. Candy asks us to guess the number of the urn she is using.

The key idea to answer this question is to realize that we need to compute the probability for each urn, all 11 of them, and then choose the most probable one. We thus use probability to make a choice. This is a theme that will keep on repeating throughout this course. There we will compute the probability of each valid symbol in the receiver, and choose the most likely one.

To answer the question posed above, we need to compute the probability distribution of the urn identification label u; we will then choose the urn with maximum probability. Thus we need to compute $P(u \mid n, N)$, since both n (which was in this case 3) and N (which was 10) are given:

$$ P(u \mid n, N) = \frac{P(u)\,P(n \mid u, N)}{P(n \mid N)} \tag{2.11} $$

Now we just go ahead and compute the needed quantities that enable us to make the inference:
• $P(u)$: in fact $P(u) = \frac{1}{11}$ for all u, because Candy chose the urn randomly. They are all equally likely.
• $P(n \mid u, N)$: this we know from theory is the binomial distribution $\binom{N}{n} f_u^{\,n} (1 - f_u)^{N-n}$, with $f_u = \frac{u}{10}$ because each urn contains 10 balls, and u are black.
• What about $P(n \mid N)$? This is the marginal probability of n, given by $P(n \mid N) = \sum_u P(u, n \mid N) = \sum_u P(u)\,P(n \mid u, N)$. For this case, given n = 3 and N = 10, $P(n = 3 \mid N = 10) = 0.083$.

Below we have the probability distribution in tabular form:

u     P(u | n = 3, N = 10)
0     0
1     0.063
2     0.22
3     0.29
4     0.24
5     0.13
6     0.047
7     0.0099
8     0.00086
9     0.0000096
10    0

So what is the most likely urn that Candy is using given the evidence? It appears to be urn 3. (Explain why the probability calculation says that it is definitely not urn 0 or 10.) Secondly, if Candy chose a ball 20 times, replacing it each time, what do we think would happen to the distribution and our uncertainty? So how does more evidence influence uncertainty?

Notation and naming conventions
• $P(u)$ is called the Prior probability for u.
• $P(n \mid u, N)$ is called the likelihood of u.
• $P(u \mid n, N)$ is called the Posterior Probability of u.
• $P(n \mid N)$ is called the evidence or marginal likelihood.

We continue. We ask Candy to draw another ball from the same urn. What do we think is the probability that she will draw a black ball? Standard statistical analysis will solve this problem as follows. It will say "well we know the most probable urn is urn 3, so we select urn 3 as the most likely candidate. So under the Hypothesis that she is drawing from urn 3, the probability for a next black ball is 0.3". This is an incorrect solution according to Bayesian inference. We cannot make a Hypothesis about what urn she is drawing from to predict, unless the probabilities for the other urns were negligible, which they were not in this case. The probabilities for some of the other urns are not far off, so we are uncertain. We must include the uncertainty explicitly in our prediction, by summing (integrating) over all the urns and incorporating the probability distribution we computed above. The Bayesian approach is to say:

$$ P(\text{next ball is black} \mid n, N) = \sum_u P(\text{ball } N+1 \text{ is black} \mid u, n, N)\,P(u \mid n, N). $$

Here $P(\text{ball } N+1 \text{ is black} \mid u, n, N) = f_u = \frac{u}{10}$, regardless of n and N. It is because the urn is GIVEN. What about $P(u \mid n, N)$? It is the probability distribution we computed in the first part of this experiment. It contains the uncertainty that we have in what urn Candy is drawing from. Substituting numerical values, we find $P(\text{next ball is black} \mid n = 3, N = 10) = 0.333$. So the correct probability computation yields a slightly higher value, but one based on probability theory and the idea that probabilities can be used to infer.

The unfair Coin [1]

This is a classic problem, first studied by Thomas Bayes in 1763. The essential ideas are simple, but the consequences far reaching. We are given a coin and asked "what is the probability that the next toss, in this case the first toss, is heads?". The answer is that if it is a fair coin, the probability is 0.5. Now, let us be more general and say we don't know yet if the coin is fair, i.e. it may have a bias and tend to come up more heads than tails (or vice versa). Now what would we say if asked the same question? If it is the first toss, a good guess would be again to say 0.5, because we do not have any observed information regarding the behaviour of the coin.

Now we are asked the same question, but we are provided with a number of previous tosses and the results as evidence. So a useful question to ask is "what is the probability of the coin coming up heads for the F + 1 toss, given a sequence of observations of the previous F tosses"? The probability for coming up heads for the F + 1 toss, say a, is therefore a parameter we wish to infer. So here the parameter to be inferred is itself a probability! As usual, we seek to write down the probability for the parameter a, given the observed data, i.e. given the observed sequence s that contains F entries. So we wish to infer $P(a \mid s, F)$, where a is the probability for the F + 1 toss of coming up heads, given the results of the previous F tosses. By the sum rule, we may predict a as

$$ P(a \mid s, F) = \int P(a \mid p_a)\,P(p_a \mid s, F)\,dp_a \tag{2.12} $$

where $p_a$ is the probability of coming up heads. Since $P(a \mid p_a) = p_a$, we focus on the second term. The prediction thus has the effect of incorporating the uncertainty we have about $p_a$, and the probability distribution $P(p_a \mid s, F)$ itself has to be inferred from the data as well. We may infer $P(p_a \mid s, F)$ using Bayes's theorem as follows:

$$ P(p_a \mid s, F) = \frac{P(s \mid p_a, F)\,P(p_a)}{P(s \mid F)} \tag{2.13} $$

• $P(s \mid p_a, F) = p_a^{F_a} (1 - p_a)^{F_b}$, where $F_a$ indicates the number of heads, $F_b$ the number of tails, and $F = F_a + F_b$.
• $P(p_a)$: this is a more difficult question to answer without ambiguity. Since we have no idea of the extent of the coin's bias, a good assumption is to assume it can be anything, i.e. $P(p_a) = 1$. However, this is by no means a unique choice, and we may use other priors too.

2.3 Conclusion

This chapter introduced probability theory and a central concept that will be used repeatedly in these notes: when one is faced with choosing between M possible explanations for experimental data where the evidence is not enough to make a definitive choice, one has no choice but to compute the probability of each possibility (or Hypothesis), and choose the one with the highest probability. Given the observed information and the stated assumptions of the experiment, the best choice among the M possible explanations is to choose the most probable one. In the rest of the notes this theme will repeat over and over in the design of Digital Communication systems.
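As a numerical check on the two worked examples of this chapter, the sketch below recomputes the urn posterior and the predictive probability for Candy's experiment. It also evaluates the coin prediction (2.12) by simple numerical integration under the flat prior P(p_a) = 1. The function names, the grid size and the coin counts in the last line are assumptions chosen for illustration only.

```python
from math import comb

N_DRAWS, N_BLACK = 10, 3
urns = range(11)                 # u = 0, 1, ..., 10
prior = 1.0 / 11.0               # P(u): all urns equally likely

def likelihood(u, n=N_BLACK, N=N_DRAWS):
    """Binomial P(n | u, N) with f_u = u / 10."""
    f = u / 10.0
    return comb(N, n) * f**n * (1.0 - f) ** (N - n)

evidence = sum(prior * likelihood(u) for u in urns)             # P(n | N)
posterior = [prior * likelihood(u) / evidence for u in urns]    # P(u | n, N)

# Most probable urn, and the prediction averaged over ALL urns.
map_urn = max(urns, key=lambda u: posterior[u])
p_next_black = sum((u / 10.0) * posterior[u] for u in urns)

def coin_predictive(heads, tails, grid=10000):
    """Midpoint-rule estimate of the integral of p_a * P(p_a | s, F),
    assuming the flat prior P(p_a) = 1."""
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        like = p**heads * (1.0 - p) ** tails
        num += p * like
        den += like
    return num / den

print(round(evidence, 3), map_urn, round(p_next_black, 3))  # 0.083 3 0.333
print(round(coin_predictive(heads=3, tails=1), 3))           # hypothetical counts
```

For the flat prior the coin integral has the closed form (F_a + 1)/(F + 2), so three heads and one tail give a prediction of 2/3 rather than the raw frequency 3/4.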

Chapter 3 The modulator and demodulator

In general terms the job of the modulator is to take a few bits from the encoder at a time, say Q bits, and produce a complex symbol that is an analogue function of time, say s(t), representing those Q bits, and that may be used to modulate the in-phase and quadrature phase components of a carrier wave that is transmitted over the channel. The time it takes for the symbol to be transmitted is T seconds, so the transmitter is able to send 1 symbol over the channel in T seconds, i.e. Q coded bits are sent over the channel in T seconds. The more bits the modulated symbol s(t) represents, the higher is the throughput of the Communication system, but the more vulnerable the receiver will be to noise, i.e. the Bit Error Rate for $\tilde{z}$ in the receiver will increase if Q increases.

Refer to Figure 3.1, where we use the simplest form of modulation, namely amplitude modulation. If the bit to be sent over the channel is a logical 1, we multiply the RF carrier wave by 1. If it is a logical 0 we multiply the carrier wave by -1. The symbol duration is T seconds, i.e. we transmit 1 symbol per T seconds. In this case we also transmit 1 bit per T seconds, because each symbol just represents 1 bit in this simple modulation scheme.

[Figure 3.1: The modulation of 1 bit in amplitude modulation. The data symbol wave g(t) = A, where A = 1 if info bit = "1" and A = -1 if info bit = "0", multiplies the radio frequency carrier wave $RF(t) = \cos(2\pi f_c t)$. The product $s(t) = g(t)\,RF(t) = \mathrm{Re}\{A \exp(j 2\pi f_c t)\}$, where $nT < t < (n+1)T$ for the n'th symbol, goes to the antenna or power amplifier.]

In the literature the data symbol wave g(t) is mostly referred to as the pulse shaping function. The reason is that we may shape the signal sent to the power amplifier and antenna by choosing g(t) wisely. There are many reasons why we would want to do that. For example the government may restrict the spectrum where you are allowed to transmit (they mostly do), and hence you want the RF transmitted signal to not spill over outside the frequency band that you rented from the authorities (since you will be fined severely if it does). In the simple example in Figure 3.1 the pulse shaping function was in fact

$$ g(t) = 1 \quad \text{for } nT < t < (n+1)T. \tag{3.1} $$

So without knowing it, we chose the simplest possible shaping function, namely a square pulse. Its frequency spectrum has a shape $\frac{\sin(\alpha\omega)}{\omega}$, which you can draw to see what it looks like. It is not optimal, since it does not utilise the frequency spectrum wisely when forced to fit into the finite frequency band available. Figure out for yourself why this is so. Secondly, since g(t) had only one of 2 possible values, its alphabet size was 2. In practice we may choose M values in general. In fact, we need not always choose a real number like we did above. We can choose complex-valued levels and a complex pulse shaping function, since the operator that selects the real part (Re{}), as shown in Figure 3.1, guarantees that the final waveform sent to the power amplifier and transmitter antenna is real, as only real signals exist in the real world. Nothing however stops us from using complex signals in the modulator mathematics, and it is generally done like that in the literature.
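The spectral shape quoted above follows from a one-line Fourier transform. The worked equation below is a sketch that assumes the pulse occupies 0 < t < T (the n = 0 case of the square pulse) and uses the convention $G(\omega) = \int g(t) e^{-j\omega t} dt$; under these assumptions the constant in $\sin(\alpha\omega)/\omega$ comes out as $\alpha = T/2$.

```latex
G(\omega) = \int_{0}^{T} 1 \cdot e^{-j\omega t}\,dt
          = \frac{1 - e^{-j\omega T}}{j\omega}
          = e^{-j\omega T/2}\,\frac{2\sin(\omega T/2)}{\omega},
\qquad
|G(\omega)| = \frac{2\,|\sin(\omega T/2)|}{|\omega|}.
```

The magnitude has nulls at $\omega = 2\pi k/T$ for nonzero integer k, and its envelope only decays as $1/|\omega|$, which is a hint towards why a square pulse spills energy far outside the main lobe.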

3.1 Modulation continued

3.1.1 The concept of base band signal processing and detection

In the literature the function g(t) is considered to be in the base band, because it has not yet been frequency translated (multiplied by a carrier wave) to the RF carrier frequency where the government rented you some bandwidth to operate in. For example the GSM cellular system in SA is on the 800 to 900 MHz band. All the signal processing mathematics can be done on the base band, since the translation up to RF frequency before transmission can be reversed again at the receiver by translating down again to the base band. Remember the receiver knows the RF carrier frequencies it is supposed to operate on.

3.1.2 Types of modulation

There are many types of modulation alphabets that we may use for modulation. If the alphabet contains M entries, then we can map Q = log2 (M ) bits to each symbol from that alphabet. Let us consider a number of modulation schemes for diﬀerent alphabet sizes.

3.1.3 Binary phase shift keying (BPSK)

This is the one you are familiar with, that we used above in Figure 3.1. Here the alphabet A contains two entries, i.e. the i'th component of A must be one of two values, or $A_i \in \{1, -1\}$. We are at liberty to choose our own pulse shaping function to make the analogue symbol s(t), and we denote that function as g(t). So for BPSK the modulator for a part of the binary string z operates as shown in Figure 3.2. What becomes clear for the case of BPSK modulation is that 1 bit from x maps to 1 symbol $s_n(t)$ that is T seconds long. So every T seconds we are able to transmit 1 coded bit. Also, in this case the modulated symbols $s_n(t)$ are real valued. They do not have an imaginary part, because the alphabet contains only the real elements $\{-1, 1\}$. Later it will become clear that even though BPSK is only able to transmit a single bit per symbol, it is very immune to noise. Also, we may view BPSK as modulating the phase of the carrier wave, because both alphabet elements have a magnitude of 1. Thus the amplitude is not modulated, only the phase is modulated.

[Figure 3.2: The modulation of 4 coded bits x via BPSK modulation. Bit string x = 1 0 1 0 with alphabet $A_1 = -1$, $A_2 = +1$; the symbol is selected to match the bit from x, giving $s_1(t) = A_2 g(t)$, $s_2(t) = A_1 g(t)$, $s_3(t) = A_2 g(t)$, $s_4(t) = A_1 g(t)$, where g(t) is a pulse of height 1 and duration T. Each symbol is also drawn as a point on the real axis of the complex (real, imag) plane.]
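The BPSK modulator described above can be sketched in a few lines. This is an illustration only: the carrier frequency `fc`, the number of samples per symbol and the function name are assumptions, and the rectangular pulse of equation (3.1) is used for g(t).

```python
import math

def bpsk_waveform(bits, fc=4.0, T=1.0, samples_per_symbol=32):
    """Samples of s(t) = A_n * g(t) * cos(2*pi*fc*t) with a rectangular g(t) = 1.

    fc and samples_per_symbol are arbitrary illustrative choices."""
    samples = []
    for n, bit in enumerate(bits):
        A = 1.0 if bit == 1 else -1.0          # alphabet {+1, -1}
        for k in range(samples_per_symbol):
            t = (n + k / samples_per_symbol) * T
            samples.append(A * math.cos(2.0 * math.pi * fc * t))
    return samples

s = bpsk_waveform([1, 0, 1, 0])   # the bit string of Figure 3.2
print(len(s), s[0])
```

A logical 0 simply flips the sign of the carrier, which is the same as shifting its phase by 180 degrees, so the amplitude of the waveform never changes.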

3.1.4 Four level pulse amplitude modulation (4PAM)

4PAM is a case of amplitude modulation. Since there are 4 points in the alphabet, or $A_i \in \{1, 0.5, -0.5, -1\}$, we are able to map 2 bits from x per symbol. We choose exactly the same pulse shaping strategy as in the previous example; it is only the components of the alphabet A that change. 4PAM is shown in Figure 3.3.

[Figure 3.3: The modulation of 8 coded bits from x via 4 PAM modulation. Bit string x = 10110001 with alphabet $A_1 = -1$, $A_2 = -0.5$, $A_3 = 0.5$, $A_4 = 1$; the symbol is selected to match the bits from x, giving "10": $s_1(t) = A_3 g(t)$, "11": $s_2(t) = A_2 g(t)$, "00": $s_3(t) = A_4 g(t)$, "01": $s_4(t) = A_1 g(t)$, where g(t) is a pulse of height 1 and duration T. Each symbol is also drawn as a point on the real axis of the complex (real, imag) plane.]
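The bit-to-symbol mapping for 4PAM can be written as a simple table lookup. In the sketch below the table `PAM4` is the mapping as read from Figure 3.3, which should be treated as an assumption if the figure differs; `map_bits` is a hypothetical helper that works for any alphabet of size M with Q = log2(M) bits per symbol.

```python
import math

# Bit pair -> amplitude, as read from Figure 3.3 (an assumption of this sketch).
PAM4 = {"01": -1.0, "11": -0.5, "10": 0.5, "00": 1.0}

def map_bits(bit_string, table):
    """Split bit_string into groups of Q = log2(M) bits and look up each symbol."""
    Q = int(math.log2(len(table)))
    if len(bit_string) % Q != 0:
        raise ValueError("bit string length must be a multiple of Q")
    return [table[bit_string[i:i + Q]] for i in range(0, len(bit_string), Q)]

symbols = map_bits("10110001", PAM4)
print(symbols)  # [0.5, -0.5, 1.0, -1.0]
```

Eight coded bits become four symbols, so the transmission takes 4T seconds instead of the 8T seconds BPSK would need.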

3.1.5 Quadrature phase shift keying (QPSK)

QPSK can also be viewed as 4 PSK, and is a phase modulation technique. The amplitude is not modulated as was the case for 4PAM, but it is also able to map 2 bits to each alphabet point. Once again we choose the same pulse shaping function as we did in the previous cases, but here the alphabet symbols are complex, and thus the analogue symbol $s_n(t)$ is also complex, so that both the in-phase and quadrature components of the carrier wave will be modulated. Refer to Figure 3.4 for an explanation of the modulation scheme. Clearly 8 coded bits from x were mapped to 4 QPSK symbols.

[Figure 3.4: The modulation of 8 coded bits from x via QPSK modulation. Bit string x = 10010011 with alphabet $A_1 = -1$, $A_2 = -j$, $A_3 = 1$, $A_4 = j$ on the complex (real, imag) plane; the symbol is selected to match the bits from x, giving "10": $s_1(t) = A_3 g(t)$, "01": $s_2(t) = A_1 g(t)$, "00": $s_3(t) = A_2 g(t)$, "11": $s_4(t) = A_4 g(t)$, where g(t) is a pulse of height 1 and duration T.]
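To see how a complex QPSK symbol modulates both carrier components, the sketch below computes one transmitted sample in two equivalent ways: as the real part of the complex product, and as an in-phase cosine minus a quadrature sine. The table `QPSK` is the mapping as read from Figure 3.4 and the carrier frequency `fc` is arbitrary; both are assumptions of this example.

```python
import cmath
import math

# Bit pair -> complex symbol, as read from Figure 3.4 (an assumption of this sketch).
QPSK = {"10": 1 + 0j, "11": 1j, "01": -1 + 0j, "00": -1j}

def rf_sample(A, t, fc=4.0):
    """Real transmitted sample Re{A exp(j 2 pi fc t)} for a rectangular g(t) = 1."""
    return (A * cmath.exp(2j * math.pi * fc * t)).real

def iq_sample(A, t, fc=4.0):
    """The same sample written with in-phase (cos) and quadrature (sin) carriers."""
    w = 2.0 * math.pi * fc * t
    return A.real * math.cos(w) - A.imag * math.sin(w)

for bits in ("10", "01", "00", "11"):          # the bit pairs of x = 10010011
    A = QPSK[bits]
    print(bits, round(rf_sample(A, 0.03), 4), round(iq_sample(A, 0.03), 4))
```

The real part of the symbol scales the cosine carrier and the imaginary part scales the sine carrier, which is why a complex alphabet still produces a real signal at the antenna.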

3.1.6 Eight phase shift keying (8 PSK)

In this case we are able to map 3 coded bits from x to each symbol, which is complex. We choose again the same pulse shaping function to produce the analogue symbols used to modulate the carrier wave. Figure 3.5 shows the 8 PSK alphabet and the bit mapping used. Note that in all cases only 1 bit changes its value when moving from one symbol to a neighbouring symbol, a strategy known as Gray mapping.
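The Gray mapping property is easy to verify mechanically. The sketch below walks around the 8 PSK circle and checks that neighbouring labels differ in exactly one bit. The list `LABELS` is the bit labelling for symbol indices 1 to 8 as read from Figure 3.5, so it is an assumption if the figure differs.

```python
# Bit labels for symbol indices 1..8 in order around the circle,
# as read from Figure 3.5 (an assumption of this sketch).
LABELS = ["111", "011", "010", "000", "001", "101", "100", "110"]

def hamming(a, b):
    """Number of bit positions in which the two labels differ."""
    return sum(x != y for x, y in zip(a, b))

# Each symbol is compared with its neighbour; the last wraps around to the first.
neighbour_distances = [
    hamming(LABELS[k], LABELS[(k + 1) % len(LABELS)]) for k in range(len(LABELS))
]
is_gray = all(d == 1 for d in neighbour_distances)
print(neighbour_distances, is_gray)
```

The benefit is that the most common detection error, mistaking a symbol for one of its two nearest neighbours, corrupts only one of the three bits.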

3.2 De-modulation

What we dealt with in the previous sections were the transmitter operations, i.e. getting binary data onto an RF carrier wave, known as modulation. In modulation the information bits are grouped Q at a time, and each group of Q bits is allocated to one symbol to be transmitted. For example 8 PSK represents 3 bits per symbol of duration T seconds. So every T seconds one symbol is transmitted over the air via the transmitter antenna, and it suffers multipath distortion and attenuation over the channel. Over a fixed period of time, a series of complex symbols $s = [s_1(t), s_2(t), s_3(t), s_4(t), \cdots, s_N(t)]$ were used to modulate the carrier wave in the transmitter. All in all QN bits were so transmitted in NT seconds if each symbol could represent Q bits. The receiver, on the other hand, has to perform de-modulation, the opposite of the modulation process that was performed in the transmitter. The receiver does not know, to begin with, what data the transmitter transmitted. In addition, the channel distorts the transmitted data due to multipath propagation, and finally the receiver is bombarded with thermal noise in its own electronics plus other interference sources (both human made and non-human made). It therefore has a very difficult job, summarized in a single word: de-modulation.

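[Figure 3.5: The modulation of 3 coded bits from x via 8 PSK modulation. Constellation points on the complex plane, listed as symbol index, bits, value: M=1, 111, $(0+j1)$; M=2, 011, $\frac{1}{\sqrt{2}}(1+j1)$; M=3, 010, $(1+j0)$; M=4, 000, $\frac{1}{\sqrt{2}}(1-j1)$; M=5, 001, $(0-j1)$; M=6, 101, $\frac{1}{\sqrt{2}}(-1-j1)$; M=7, 100, $(-1+j0)$; M=8, 110, $\frac{1}{\sqrt{2}}(-1+j1)$. Example: x = 011 gives $s(t) = A_2 g(t)$.]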

The first step in the receiver is to move (translate) the signal which is located at the carrier frequency back into baseband. That is indicated in Figure 3.6, and is performed using a local oscillator, a multiplier and a baseband filter. The local oscillator is not perfect, i.e. it drifts somewhat over time and this causes a so-called frequency offset error, but we will not deal with that complication now. Let us assume that the local oscillator is perfect. After translation to the base band as shown in Figure 3.6, the receiver has received a corrupted version of the series of transmitted symbols s (complex again, now in base band). The corruption due to the multipath propagation can be modelled as a discrete convolution process, i.e. the multipath channel is seen as a system (black box) that has an impulse response denoted c that is either known or, if not (mostly the case), can be estimated somehow ¹. So, since we now assume to know or at least have an estimate of c, we may model the effect of the channel as a convolution with the transmitted data. Let r(t) denote the received signal after the translation and bandpass filter operations. The sum of the receiver internal thermal noise plus all other interference sources received by the receiver antenna is denoted $n_s(t)$ and is assumed additive. Then in mathematical terms the baseband received signal r(t) can be expanded in terms of the transmitted data s(t) over a symbol duration T seconds long as

$$ r(t) = \sum_{k=0}^{L} c_k\, s_{n-k}(t) + n_s(t) \tag{3.2} $$
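The convolution in (3.2) can be illustrated at the symbol level, where each $s_n(t)$ is replaced by its amplitude. The sketch below is noise free and treats symbols before the start of the burst as zero; the two-tap channel, the BPSK symbols and the function name `multipath` are assumptions chosen for illustration.

```python
def multipath(symbols, c):
    """Noise-free channel output sum_k c[k] * symbols[n - k].

    Symbols before the start of the burst are taken to be zero."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, ck in enumerate(c):
            if n - k >= 0:
                acc += ck * symbols[n - k]
        out.append(acc)
    return out

# Hypothetical two-tap channel (L = 1) acting on four BPSK symbols.
r = multipath([1, -1, 1, 1], c=[1.0, 0.5])
print(r)  # [1.0, -0.5, 0.5, 1.5]
```

Every output value except the first now depends on two transmitted symbols, which is the inter-symbol interference that the channel memory introduces.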

At this point in Figure 3.6 we have an analogue baseband signal r(t) that has not been demodulated yet. Demodulation is complete when the binary data that the transmitter transmitted has been recovered by the receiver. To do that, modern receivers apply digital techniques based on Detection techniques ². But before a digital detection operation can be performed, we must convert the analogue signal r(t) to an equivalent digital one, denoted y[n], where n now indicates sampled (digital) time. Each increment of n will then imply that T seconds of physical time, i.e. a symbol time, has elapsed. The question that now arises is what series of steps the de-modulator must follow to produce a digital signal that we can pass to the detector.

¹ This channel estimation problem has been solved in very innovative ways in cellular systems; we will get to it in chapters to follow.
² Detection methods are regarded by some as Artificial Intelligence agents using probabilistic methods, a topic dealt with in the next chapter.

[Figure 3.6: The first stages of the receiver hardware, indicating where the detector (an AI device) comes into play. Blocks shown: receiver antenna, RF antenna electronics, bandpass filter, multiplier or mixer, local oscillator, bandpass filter, r(t), matched filter, digital sampler, y[n], detector or AI rational agent, estimated data bits.]

With reference to Figure 3.7 we see that the receiver simply convolves the baseband signal r(t) with a filter with impulse response h(t). However, the filter h(t) is not just any old filter; it is chosen to complement the pulse shaping function g(t) for reasons that will become clear below. Specifically it is chosen as

$$ h(t) = g(T - t). \tag{3.3} $$

This choice is called the matched filter, since the output of the matched filter will achieve a maximum over the symbol time T. At this maximum the output is sampled to convert it to a digital sample. This sample will have the highest possible Signal to Noise Ratio (SNR). There are no other filters able to produce an SNR higher than the SNR for the matched filter, i.e. among all linear filters it is the optimum one.

Using the concept of relaxation we relax the conditions to make the analysis simpler, and assume that c has only 1 tap (one entry, i.e. it is not a vector under this assumption). So here we assume $c = [c_0]$, and hence for this special case the convolution summation disappears and $r(t) = c_0 s_n(t) + n_s(t)$. Thus the output of the sampler at the peak of the matched filter output when $c = [c_0]$ (the relaxation assumption) is given by

$$ y[n] = \left( r(t) * h(t) \right)\big|_{t=T} \tag{3.4} $$
$$ \phantom{y[n]} = \left( c_0\, s_{n-0}(t) * h(t) + n_s(t) * h(t) \right)\big|_{t=T} \tag{3.5} $$
$$ \phantom{y[n]} = \left( c_0 A_n\, g(t) * h(t) \right)\big|_{t=T} + n_s[n] \tag{3.6} $$
$$ \phantom{y[n]} = \left( c_0 A_n\, g(t) * g(T - t) \right)\big|_{t=T} + n_s[n] \tag{3.7} $$
$$ \phantom{y[n]} = c_0 A_n + n_s[n] \tag{3.8} $$

if g(t) is chosen so that $\left( g(t) * g(T - t) \right)\big|_{t=T} = 1$.

[Figure 3.7: The de-modulation using a matched filter and optimum sampling. The matched filter-sampler pair: r(t) enters the matched filter h(t) = g(T - t), whose output is sampled at t = nT by the sampler to convert it to the digital sequence y[n].]

So now we have a mathematical relation relating the digital sample y[n] produced by the demodulator, the channel impulse response $c_0$ and the thermal noise sample $n_s[n]$. $A_n$, you may recall, is the complex symbol that the transmitter created from the binary data, and it is unknown. It is the detector's (or AI agent's) job to figure out what $A_n$ is, given y[n] and a priori knowledge of the probability distribution function for $n_s[n]$. Does that sound familiar (EAI 310!)? With a good estimate for $A_n$ from the detector, the data bits are recovered, albeit full of errors due to the noise. However, as you will later see, we will use error correction coding (another large field of research in AI) in the transmitter to be able to correct those errors in the receiver.

3.2.1 What if there is multipath?

In the previous section we made a relaxation assumption that there were no multipath components in the channel. In general, clearly there is multipath, so how does the above analysis change then? For the case where there are L + 1 taps (multi-path components) in the channel impulse response vector c, the output of the de-modulator will be

$$ y[n] = \left( r(t) * h(t) \right)\big|_{t=T} \tag{3.9} $$
$$ \phantom{y[n]} = \sum_{k=0}^{L} c_k A_{n-k} + n_s[n] \tag{3.10} $$

and a vector with P entries denoted $y = [y[1], y[2], \cdots, y[P]]$ is passed from the demodulator to the detector. However in this case the use of the matched filter is insufficient to yield an output SNR that is maximised. A later chapter dealing with prefilter design will address this case in detail.
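The single-tap result (3.8) of Section 3.2 can be reproduced with a sampled version of the matched filter. The sketch below is a discrete-time illustration under stated assumptions: g is a rectangular pulse sampled S times and scaled to unit energy, which is the discrete analogue of requiring (g(t) * g(T - t)) at t = T to equal 1. The channel gain `c0`, the symbol `A` and the noise level `sigma` are hypothetical values.

```python
import math
import random

def matched_filter_sample(r, g):
    """Convolve r with h[k] = g[S-1-k] and return the output at the last sample.

    At that instant the convolution equals the correlation sum_k r[k] * g[k]."""
    S = len(g)
    h = g[::-1]
    return sum(r[k] * h[S - 1 - k] for k in range(S))

S = 16
g = [1.0 / math.sqrt(S)] * S            # rectangular pulse with unit energy
c0, A = 0.8, -1.0                        # hypothetical one-tap channel and BPSK symbol
sigma = 0.1                              # assumed noise standard deviation per sample
rng = random.Random(0)

clean = [c0 * A * gk for gk in g]
noisy = [v + rng.gauss(0.0, sigma) for v in clean]

y_clean = matched_filter_sample(clean, g)   # equals c0 * A up to rounding
y_noisy = matched_filter_sample(noisy, g)   # c0 * A plus a single noise sample
print(round(y_clean, 6), y_noisy)
```

Without noise the sampler output is exactly the channel gain times the transmitted symbol, and with noise it is that value plus one noise sample, as in equation (3.8).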

Chapter 4 Detection

4.1 Introduction

In this chapter we study detection, using Bayesian Inference. Specifically we study the concepts behind the inference of unobserved parameters given observed data, and the interpretation of methods based on the Maximum Likelihood (ML), Maximum Likelihood Sequence Estimation (MLSE) and Maximum a Posteriori Probability (MAP) criteria, which need to be scrutinised in detail. There can be a lot of confusion between these concepts, so one has to be careful when making statements about them. The examples in this chapter have been chosen to make the subtle differences clear. Communications systems offer a very nice environment to study these topics, which are rather complicated, in a practical way that is easy to simulate (and eventually to understand) on a computer.

Most communication systems can be broken down to a few blocks as shown in Figure 1.4. In this course we will look at two of the blocks: the symbol detector (also referred to as the equaliser in the literature) and the decoder. Most of the concepts that are important to understand in Detection can be studied by applying the ideas to these two blocks. This process applies equally well to, say, a magnetic or optical recording device. The storage medium is noisy, and can be viewed as the "channel". Data networks can also be modelled in this way: the channel is the cables and the transmitter and receiver hardware where the data symbols are corrupted. This is an important class of problems often found in practice.

4.2 The static Gaussian channel

The first channel, and a classic example, is the symmetric and static (time invariant) Gaussian channel. There is no fading of the transmitted signal like typically will occur in radio channels, and the only impairment is additive Gaussian noise. By that we mean the noise has a Gaussian distributed pdf. Denote the input to the channel as x and the observed output as y. The input x is passed to the channel, where it is corrupted with the Gaussian noise, and we observe the corrupted output y. The symbols x can be one of N possibilities as allowed by the alphabet in use. For example in 8 PSK, there will be 8 discrete possibilities for x. The noise energy ($\sigma^2$) can be estimated in the receiver, a topic for a chapter to follow; here we assume we know it.

How can we design a device (or intelligent agent) capable of inferring what was transmitted given the observed data (a symbol detector)? There are a few assumptions we can make that are applicable for this simple channel. Here the channel is memoryless, i.e. the outcome of observed symbol i is independent of previous transmitted symbols ¹. And we know the noise is white and its distribution is Gaussian. Hence on the basis of the observed symbol y we have the posterior probability that the transmitted symbol was $A_k$ (one of the possible symbols transmitted; for example if we have BPSK ² then $A_k$ could be one of two possibilities, namely -1 or 1, corresponding to a logical "0" or "1"),

$$ P(x = A_k \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)} \tag{4.1} $$

where $P(y \mid x)$ is proportional to the PDF, which is Gaussian (the noise is Gaussian):

$$ p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\|y - x\|^2}{2\sigma^2} \right). \tag{4.2} $$

So we can write

$$ P(x = A_k \mid y) \propto \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\|y - A_k\|^2}{2\sigma^2} \right) \frac{P(A_k)}{P(y)}. \tag{4.3} $$

Recall what we did when we had to guess the urn Candy was using to draw balls from. We calculated the probability of all the urns, then chose the one with the maximum probability. That was the best we could do. Nobody has yet come up with a better approach. So let us apply this same technique to this static Gaussian channel, where we have to guess what was transmitted given noisy observed data, i.e. make the Maximum a Posteriori (MAP) choice. To apply MAP, we must choose $A_k$ as

$$ \arg\max_{A_k \forall k} \left\{ P(x = A_k \mid y) \right\} = \arg\max_{A_k \forall k} \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\|y - A_k\|^2}{2\sigma^2} \right) \frac{P(A_k)}{P(y)} \right\}. \tag{4.4} $$

We must now address the issues of the evidence term $P(y)$ and the prior term $P(A_k)$. The key observation is that the evidence term is not affected by what symbol $A_k$ is being considered when deciding which one maximises the posterior probability. It is the same regardless of the choice of $A_k$ and hence can be moved outside the brackets. The prior we deal with by saying that all symbols are equally likely, an assumption which is valid if we were transmitting random data such as compressed voice. So it can also be moved outside the brackets and be neglected, as it does not influence the maximisation process. So we have to choose $A_k$ as

$$ \arg\max_{A_k \forall k} \left\{ P(x = A_k \mid y) \right\} = \arg\max_{A_k \forall k} \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\|y - A_k\|^2}{2\sigma^2} \right) \right\}. \tag{4.5} $$

Now notice what determines the maximisation of $P(x = A_k \mid y)$: it is the minimisation of the Euclidean distance between y and $A_k$, i.e. $\|y - A_k\|^2$. Increasing noise energy causes the differentiation between different $A_k$ to become more blurred, and hence inference becomes more difficult. So we can yet again simplify the MAP choice by writing it as

$$ \arg\max_{A_k \forall k} \left\{ P(x = A_k \mid y) \right\} = \arg\max_{A_k \forall k} \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\|y - A_k\|^2}{2\sigma^2} \right) \right\} = \arg\min_{A_k \forall k} \|y - A_k\|^2. \tag{4.6} $$

¹ Would this assumption hold if the channel introduced multipath propagation?
² Binary Phase Shift Keying
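The final rule, pick the alphabet point closest to y, takes only a few lines to implement. The sketch below also evaluates the Gaussian expression of (4.5) directly to confirm that both routes select the same symbol. The received value, the noise variance `sigma2` and the function names are illustrative assumptions.

```python
import math

def map_detect(y, alphabet):
    """Minimum Euclidean distance choice: MAP for equal priors and Gaussian noise."""
    return min(alphabet, key=lambda A: abs(y - A) ** 2)

def gaussian_metric(y, A, sigma2):
    """The bracketed term of (4.5) for a candidate symbol A."""
    return math.exp(-abs(y - A) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

QPSK = [1 + 0j, 1j, -1 + 0j, -1j]
y = -0.7 + 0.4j                              # hypothetical noisy observation

by_distance = map_detect(y, QPSK)
by_posterior = max(QPSK, key=lambda A: gaussian_metric(y, A, sigma2=0.5))
print(by_distance, by_posterior)  # (-1+0j) (-1+0j)
```

The noise variance changes how confident the detector is, but not which symbol it picks, which is why it drops out of the decision rule.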

This proves that the MAP choice is the one that minimises the Euclidean distance between the observed noisy output $y$ and the alphabet points on the complex constellation, as shown in Figure 4.1. Two cases are shown, one where the channel quality is good (high SNR) and one where the channel quality is poor (low SNR). In both cases the transmitter sent 10 symbols that were all $-1$, and a QPSK modulation scheme was used. This of course is not known by the receiver: it will select the closest of the 4 constellation points, since that is what we proved MAP detection tells us to do under these conditions (Gaussian static channel). In the next section we will see how the MAP choice is complicated when the channel also introduces multipath signals.

[Figure 4.1: MAP detection on a static Gaussian channel is selecting the modulation constellation point closest to the noise corrupted received samples. Two cases are shown, one where the channel quality is good (high SNR, small noise power) and one where the channel quality is poor (low SNR, large noise power). Each panel shows the observations $y(n)$ in the complex plane (real and imaginary axes) around the QPSK points $1$, $j$, $-1$ and $-j$.]

#### 4.2.1 Computing more than just the most likely symbol: probabilities of all constellation points, and the corresponding coded bit probabilities computed by the receiver

In the previous section we computed just the best (MAP) constellation point. However, the decoder that follows the detector can do much with the probability information for each encoded bit, as we will see in a later chapter (so-called soft decision decoding). So let us compute the probabilities for all the constellation points and for the bits used to make up the modulated symbols. Imagine we have a transmitter transmitting 8 PSK symbols, i.e. each symbol represents 3 bits from the encoded vector. The constellation used is shown in Figure 3.5. The transmitter transmitted a symbol, denoted $A_i$, where $i$ was one of 8 possibilities. The receiver must try to determine what the transmitted symbol (i.e. $i$) was, using a device called a detector.

Now, you (the receiver) are given an observed complex number that came out of the de-modulator, namely $y[1] = -1.1 + j1$. What was transmitted? Our strategy is based on what we learned from Candy's example with the white and black balls. There we learned that the optimal strategy is to compute the probability of each possibility, then choose the one with the maximum probability (the most likely one). We will follow that same strategy here. We thus compute the posterior probability of each of the 8 possible symbols it could have been, and then choose the maximum one. Thus for the $k$'th symbol in the alphabet we need to compute $P(x = A_k|y)$, which is given by

$$P(x = A_k|y) = \frac{P(y|x)\,P(x)}{P(y)}. \tag{4.7}$$

We assume all symbols are equally likely, so $P(x)$ is $\frac{1}{8}$ regardless of $k$. $P(y)$ is common to all values of $k$. We know the noise is Gaussian. The probability $P(y|x)$ is thus proportional to the noise pdf, which has a Gaussian distribution. Hence we can write

$$P(x = A_k|y) = \beta \exp\left(-\frac{|y - x|^2}{2\sigma^2}\right) \tag{4.8}$$

where $P(x) = \frac{1}{8}$ was absorbed into the constant $\beta$ along with all other constants, including $P(y)$. There are 8 possibilities for $A_k$, $k = \{1, 2, 3, 4, 5, 6, 7, 8\}$, so for the $P(x = A_k|y)$ terms we have to compute the 8 Euclidean distance metrics

$$D(k) = |y[1] - A_k|^2 \tag{4.9}$$

and then we have 8 values for $D(k)$. These we may substitute into equation (4.8), but the value of $\beta$ is still undetermined. The value of $\beta$ has to be determined: it is very important to realize that $P(x = A_k|y) \neq \exp\left(-|y - x|^2/2\sigma^2\right)$, since a probability cannot be equal to a probability density function. Since $P(x = A_k|y)$ is a probability (not a pdf) it has to comply with the axioms of probability theory. One of them says that if a probability is summed over all its possible outcomes, it must yield one. Hence we may demand

$$\sum_{k=1}^{8} P(x = A_k|y) = 1. \tag{4.10}$$

The value of $\beta$ may be determined by combining equations (4.8), (4.9) and (4.10). This is left to the reader as an exercise. The most probable symbol turns out to be symbol $A_{k=8}$. So the bits that the transmitter sent were most likely 1, 1, 0.

The next question is "what is the probability of the 3 bits $a$, $b$, $c$ being 1, 1, 0?" We know the symbol probabilities, so we may compute the bit probabilities. The probability for the first bit $a$ to be a one is

$$P(a = \text{'1'}) = P(A_{k=8}|y) + P(A_{k=7}|y) + P(A_{k=6}|y) + P(A_{k=1}|y). \tag{4.11}$$

For bit $b$ we have

$$P(b = \text{'1'}) = P(A_{k=8}|y) + P(A_{k=1}|y) + P(A_{k=2}|y) + P(A_{k=3}|y). \tag{4.12}$$

For bit $c$ to be a zero we have

$$P(c = \text{'0'}) = P(A_{k=8}|y) + P(A_{k=7}|y) + P(A_{k=4}|y) + P(A_{k=3}|y). \tag{4.13}$$

As an exercise, go and compute the 3 bit probabilities. Which bit was most reliably detected? Intuitively, why is this so?
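The normalisation in (4.8) to (4.10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8 PSK layout $A_k = e^{j2\pi k/8}$, the noise variance `sigma2` and the helper names `symbol_posteriors` and `bit_probability` are hypothetical and not taken from the notes. In particular, the layout is not the labelling of Figure 3.5, so the winning index need not be $A_8$ as in the text.

```python
import cmath
import math

def symbol_posteriors(y, constellation, sigma2):
    """Return P(x = A_k | y) for every point A_k, following eqs. (4.8)-(4.10)."""
    # Euclidean distance metrics D(k) of eq. (4.9)
    metrics = [abs(y - a) ** 2 for a in constellation]
    # Unnormalised Gaussian weights of eq. (4.8), i.e. without beta
    weights = [math.exp(-d / (2.0 * sigma2)) for d in metrics]
    # beta follows from demanding that the probabilities sum to one, eq. (4.10)
    beta = 1.0 / sum(weights)
    return [beta * w for w in weights]

def bit_probability(posteriors, indices):
    """Sum the posteriors of the symbols (1-based indices) that carry a given bit value."""
    return sum(posteriors[k - 1] for k in indices)

# ASSUMPTION: a hypothetical 8 PSK layout A_k = exp(j*2*pi*k/8), k = 1..8.
# It is NOT the labelling of Figure 3.5, so the winning index differs from the text.
constellation = [cmath.exp(2j * math.pi * k / 8) for k in range(1, 9)]
y = complex(-1.1, 1.0)   # the observation y[1] used in the text
sigma2 = 0.5             # ASSUMPTION: noise variance chosen for illustration

posteriors = symbol_posteriors(y, constellation, sigma2)
k_map = max(range(1, 9), key=lambda k: posteriors[k - 1])   # MAP symbol index
# Index set of eq. (4.11); only meaningful with the bit labelling of Figure 3.5
p_a_is_one = bit_probability(posteriors, (8, 7, 6, 1))

print("posteriors:", [round(p, 4) for p in posteriors])
print("MAP index under the assumed layout:", k_map)
print("sum over the index set of (4.11):", round(p_a_is_one, 4))
```

With the true constellation and bit labelling of Figure 3.5 substituted for the assumed layout, the same two helpers would give the symbol and bit probabilities asked for in the exercise.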

### 4.3 MLSE - the most likely sequence estimate

In the previous section we dealt with channels that were impaired by Gaussian noise but had no memory (no ISI). In that case it was easy to show how the optimal inference is derived from knowledge of the noise distribution (PDF) and Bayes's theorem. However, such channels are hard to come by in practice. Most storage media, communications channels, and waveguides such as cables and fibers have memory, due to a variety of reasons that we won't go into here. In most practical cases we can model these channels as linear time invariant convolutional channels, i.e. we can write the relationship between the transmitted symbols $A_k$ and received symbols $y$ as

$$y[k] = \sum_{i=0}^{L} c_i A_{k-i} + n_s[k] \tag{4.14}$$

where $c$ is a vector containing the channel impulse response (IR) and $n_s$ is additive white Gaussian noise. The reader may invest in figuring out for herself why multiple taps in the IR can be viewed as modelling the channel memory. We are of course assuming that the channel is sampled at or above the Nyquist rate, with the samples indexed by $k$. The assumption that the channel $c$ is time invariant can be satisfied if we consider the detection of small enough blocks of data symbols. In general we don't know the IR, and it needs to be estimated, but we will not get into the topic of channel estimation in this chapter; we will in a later chapter. For now, accept the fact that we may accurately estimate $c$ using some known symbols in between unknown data symbols, and simply assume that the channel estimate is available from a channel estimator module in the receiver.

Now, we pose a simple question, but with profound implications: given a block of observed symbols $y = \{y[1], y[2], \cdots, y[N]\}$, what was the entire block of symbols $x[1], x[2], \cdots, x[N]$ that was transmitted (not just one of the symbols)? Notice that we cannot solve this problem as before because of the channel memory (the IR has multiple taps), i.e. we need to infer the most probable sequence. To refine the question posed above, we identify two aspects of that question:

• We need to estimate the most probable block of data, referred to as a sequence.

• We need to infer the probability of each symbol being correct in that sequence.

Later in these notes it will be shown that the most probable sequence does not necessarily contain the most probable symbols. In this section we consider the inference of the most probable sequence. Inferring the most probable symbols is treated in the next two sections.

So using Bayes's theorem, we write the posterior probability of the sequence $x$ as

$$P(x|y) = \frac{P(y|x)\,P(x)}{P(y)}. \tag{4.15}$$

The noise is white (uncorrelated), so that we can use the separability of the noise and write

$$p(y|x) = \frac{1}{(2\pi\sigma^2)^N} \exp\left(-\sum_{k=1}^{N} \left|y_k - \sum_{i=0}^{L} h_i x_{k-i}\right|^2 / 2\sigma^2\right) \tag{4.16}$$

where $h_i$ denotes the taps of the channel impulse response, written $c_i$ in (4.14). So what about the prior $P(x)$? Since we may have an interleaver after the encoder, and in the absence of other information, the detector (the device we are now designing) may assume that all symbols are equally likely. The probability of the data $P(y)$ does not influence the choice of $x$. Hence, we come to the conclusion that we may in fact just find the sequence $x$ that maximises $\exp\left(-\sum_{k=1}^{N} |y_k - \sum_{i=0}^{L} h_i x_{k-i}|^2 / 2\sigma^2\right)$, the likelihood function. For that reason, this type of MAP sequence detector, given that all transmitted symbols are equally probable, is also called the Maximum Likelihood Sequence Estimator (MLSE).

#### 4.3.1 Finding the sequence x via the MLSE

It is one thing to write down the expressions for the Bayesian inference of the sequence $x$, quite another to do it in a computationally efficient manner. For example, one foolproof option is called complete enumeration. Assume we have BPSK modulation: we simply go through all $2^N$ combinations of the sequence of length $N$ and choose the best one. As the length of the block $N$ increases, the complexity grows as $2^N$ and we require exponentially more computations on the computer, so complete enumeration quickly becomes intractable. An algorithm exists that can solve this optimisation problem exactly with significantly less complexity. It is called the Min-Sum algorithm, which has been invented in more than one field of science; in Communications it is known as Viterbi's algorithm. First we recognise that in maximising the likelihood function (Maximum Likelihood), we just need to minimise $-\log P(y|x)$, i.e.

$$F = \sum_{k=1}^{N} \left|y_k - \sum_{i=0}^{L} h_i x_{k-i}\right|^2. \tag{4.17}$$

Before we solve the problem of minimising this function with the Min-Sum algorithm, let us look at a simple example of the application of this algorithm.

**The Min-Sum algorithm - infer the shortest route with least cost or distance?**

Consider a map of a province, as indicated in Figure 4.2, where 2 cities are connected via several towns. We ask, how do we choose the shortest path from A to B? We do not want to compute the total distance of all possible paths (complete enumeration) because it is too expensive. We make use of the concept of message passing. Information gained at one node of the map is passed to its neighbours, thereby eliminating the need for complete enumeration. So we proceed according to the Min-Sum algorithm, working from A towards B one set of towns at a time.

[Figure 4.2: The road-map between town A and B. The direction of travel is from A, via the towns H and I, then J, K and L, then M and N, to B; each road is labelled with its length in miles.]

The steps on the map of Figure 4.2 are:

• We start at A, where the distance travelled is 0. This information is passed to H and I. From A to H there is a path with cost 40 miles. There is an alternative path to I with cost 10 miles. The cost associated with node H is thus 40, and with node I 10.

• We examine the paths to the next set of towns, i.e. J, K, L. The path A-H-J has cost 60 miles, found by simply adding the cost of H-J to the cost known at H. However, to K there are two competing paths, A-H-K and A-I-K. We compute both, and prune away the worst one, in this case A-H-K. We now associate the cost 30 miles with node K. There is also a path A-I-L, with cost 20 miles at node L.

• There are two competing paths to M, via J and via K. We select the least one, via K, giving A-I-K-M with cost 50 at M. There are also two paths to N; we select A-I-K-N with cost 40 miles at N.

• There are two remaining paths to B, A-I-K-N-B and A-I-K-M-B. We select the path of the two with least cost, i.e. A-I-K-M-B with total cost 60 miles. We prune away the other path.

The survivor or winner path A-I-K-M-B is the path that gave the least cost. This in essence is the power of the min-sum algorithm: it cuts out all the redundant calculations, and retains the minimum possible calculations needed to get exactly the same answer as complete enumeration. The redundancies removed cause no degradation of the final results. It is optimal in the sequence sense.

Now we return to the MLSE or min-sum detection. The trick of solving the minimisation of $F$ is to realize that we may draw a map, graph or trellis that represents all the paths that are possible for the sequence $x$, and that we may identify a cost at each node of the trellis, incorporating the history of the path that was taken to get to that node. For example, let us draw a trellis for the case of BPSK modulation, so that there are 2 possibilities for each data symbol, i.e. 1 or -1. We show only part of the trellis in Figure 4.3. Time flows from left to right. There is 1 known pilot symbol at the start and at the end of the trellis: each a 1. The impulse response we have in this case is $h = [c(0), c(1)]$, i.e. it has $L + 1 = 2$ taps. The Euclidean metrics indicated are computed as $-\log(P(y|x))$. However, because of the memory due to $h$, we need to delay the first pruning by $L$ nodes, in this case 1. This can be seen by noting that a contest between 2 paths only develops at $n = 2$. Thus we need to compute 4 metrics per time $n$. So at each node for $n = 2, 3, 4, \cdots$ we compute the accumulated metrics that contest that node, and choose the winner. We prune away the losing path. At the end of the trellis we find the overall winner. The path it took resolves the most probable sequence taken by the transmitter. Thus we may read off the most probable sequence of transmitted symbols for the case in Figure 4.3 by following the winner path, one symbol (1 or -1) per time $n$. In this case each symbol also represents a bit, so the most probable transmitted bits from the encoded vector $z$ follow directly, with the symbol 1 representing the bit 1 and the symbol -1 the bit 0. MLSE with min-sum detection does not cause any noise enhancement.

[Figure 4.3: The trellis - infer the shortest route with least cost - that will be the MLSE sequence! The observed data from the demodulator, $y[0], \cdots, y[6]$, are shown at times $n = 0, \cdots, 6$. The transmitter has 2 possible states per time $n$, 1 or -1, since BPSK was used. The branch metrics are $\Delta_k = |y[n] - c_1 A^{n-1} - c_0 A^{n}|^2$, where each $A$ is either 1 or -1 depending on its position in the trellis. The winner path has the least total accumulated $\Delta$.]

#### 4.3.2 3 tap detector

In the previous example, the channel memory was given by the impulse response vector $c = [c_0\ c_1]$, which had 2 taps and led to the simple trellis of Figure 4.3. This enabled us to associate one transmitted symbol value per state (node). However, for the 3 tap case the impulse response vector is $c = [c_0, c_1, c_2]$, and we have to associate 2 transmitted symbol values with each state in the trellis. Let us denote that pair by $A^n A^{n-1}$. With this notation we may set up a trellis for BPSK and 3 taps as shown in Figure 4.4. Note that certain states are not connected here as they are illegal, i.e. the transmitter is unable to move from certain states to certain others. The min-sum algorithm is executed on this trellis in the same way as before. In general, if we have an $(L+1)$ tap channel impulse response $c$ with a modulator that has $M$ symbols in its alphabet, then the trellis will have $M^L$ states per time node. On such a trellis there will be $M^L$ winning paths per time node, one per state (since $M$ paths contest at each state), but only one overall winner that will resolve the transmitter symbols as the most likely sequence that the transmitter transmitted.

[Figure 4.4: The trellis - infer the shortest route with least cost - that will be the MLSE sequence! The states are the symbol pairs 11, 1-1, -11 and -1-1, with observations $y[1], \cdots, y[7]$. The branch metrics are $\Delta = |y[n] - c_0 A^{n} - c_1 A^{n-1} - c_2 A^{n-2}|^2$, and the winner path has the least cost.]

#### 4.3.3 Discussion

Note that the min-sum algorithm was able to produce accurate estimates of the data symbols in the receiver when the channel has memory, but was unable to produce any estimate of the probability of the individual symbols being correct. This is a fundamental limitation of the min-sum (Viterbi) algorithm, as the decoder that follows the detector may use probability information efficiently to decode, as we shall see in later chapters. Many authors have extended the min-sum algorithm so that it produces probability information as well, most notably the Soft Output Viterbi Algorithm (SOVA) by Hagenauer in Germany. However, the probabilities are in fact suboptimal, and direct approaches based on Bayesian inference or MAP criteria are able to produce better estimates for the probabilities when there is multipath. A method based on Bayesian detection was devised by Abend and Fritchman in the early 70's, and a MAP method using forward and backward iterations on the trellis was devised by Bahl et al [3] at about the same time and is known as the BCJR algorithm. However, the BCJR algorithm was known in the Artificial Intelligence community already at that time as the Pearl Belief Propagation algorithm. (The latter algorithm can be most efficiently executed on a so-called Frey graph, a more sophisticated representation than a trellis [1], and involves a forward and backward iteration on the graph. See http://opensource.ee.up.ac.za/ for a demo on the decoding of repetition and convolutional codes using the Pearl Belief Propagation algorithm on a Frey graph.) In the next section we present an approximate method based on the Viterbi trellis, and the method by Abend and Fritchman. The section after that introduces the forward-backward MAP algorithm.

### 4.4 Probabilistic Detection via Bayesian Inference for Multipath channels

#### 4.4.1 Sub optimal detected bit probability calculation

For the case where $L = 0$ we had no memory in the channel (1 tap channel), and the probabilistic detection dealt with in a previous section was in fact trivial. For the case where $L > 0$ we would need to compute the probabilities in a similar fashion as follows:

$$P(x = A_k|y) = \beta \exp\left(-\frac{\left|y - c_0 A_k - \sum_{m=1}^{L} c_m A_{k-m}\right|^2}{2\sigma^2}\right). \tag{4.18}$$

The problem is that we don't know $A_{k-m}$, $m \geq 1$, before executing the min-sum algorithm. Hence we can use a multi step procedure:

1. Complete a min-sum detector. This produces an estimate for all the symbol values without any probability info.

2. Use the output from the Viterbi (min-sum) detector to find the values for $A_{k-m}$, $m \geq 1$.

3. For each $k$, compute $P(x = A_k|y)$ using (4.18) and the $A_{k-m}$, $m \geq 1$, from step 2.
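The three-step procedure above can be sketched in Python for BPSK over a 2 tap channel. This is a minimal illustration under stated assumptions, not the detector of the GSM simulator: the taps `c`, the noise variance, the test data and the names `min_sum_detect` and `suboptimal_probabilities` are all hypothetical, a single known pilot symbol 1 precedes the data, and no tail pilot is used.

```python
import math

SYMBOLS = (1, -1)  # BPSK alphabet

def branch_metric(y_k, a_now, a_prev, c):
    """Euclidean metric |y[k] - c0*A_k - c1*A_{k-1}|^2 for a 2 tap channel."""
    return abs(y_k - c[0] * a_now - c[1] * a_prev) ** 2

def min_sum_detect(y, c, pilot=1):
    """Step 1: min-sum (Viterbi) hard decisions A_1..A_N, given the known pilot A_0."""
    cost = {pilot: 0.0}   # accumulated metric of the survivor ending in each state
    path = {pilot: []}    # the survivor itself
    for y_k in y:
        new_cost, new_path = {}, {}
        for a in SYMBOLS:
            # Contest between the paths entering state a: keep the cheapest one
            prev = min(cost, key=lambda p: cost[p] + branch_metric(y_k, a, p, c))
            new_cost[a] = cost[prev] + branch_metric(y_k, a, prev, c)
            new_path[a] = path[prev] + [a]
        cost, path = new_cost, new_path
    return path[min(cost, key=cost.get)]

def suboptimal_probabilities(y, c, sigma2, pilot=1):
    """Steps 2 and 3: plug the hard decisions for A_{k-1} into eq. (4.18)."""
    hard = min_sum_detect(y, c, pilot)
    history = [pilot] + hard          # history[k] is the decision for A_k
    prob_plus = []
    for k, y_k in enumerate(y, start=1):
        w = {a: math.exp(-branch_metric(y_k, a, history[k - 1], c) / (2.0 * sigma2))
             for a in SYMBOLS}
        prob_plus.append(w[1] / (w[1] + w[-1]))   # beta normalises the two outcomes
    return hard, prob_plus

# ASSUMPTIONS: channel taps, noise variance and data are made up for illustration
c = (1.0, 0.5)
sent = [1, -1, -1, 1, 1, -1]
y = [c[0] * a + c[1] * b for a, b in zip(sent, [1] + sent[:-1])]   # noise free
hard, prob_plus = suboptimal_probabilities(y, c, sigma2=0.5)
print(hard)
print([round(p, 3) for p in prob_plus])
```

The probabilities are suboptimal because each one conditions on a single hard decision for the previous symbol instead of averaging over both possibilities.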

#### 4.4.2 Optimal symbol probability calculation using Bayesian Detection

Assuming that data symbols are chosen from a set of discrete complex values, referred to as the modulation alphabet, the estimate of the transmitted symbols made at the receiver is called detection (or sometimes equalisation). For example, we may have an 8 PSK modulation alphabet. The receiver has access to a burst of received symbols $z$, of length $Q$. It also has access to an estimate of the channel impulse response (CIR) $b$. The question thus is how we will choose or infer the estimated data symbols, given the CIR and the entire received sequence, such that the probability of making an incorrect decision is minimised.

The approach we will use in this section is referred to as Maximum a Posteriori Probability (MAP) symbol detection based on Bayesian Inference. The idea is that we will detect each symbol independently, given the entire received sequence $z$ and the CIR $b$. It will become evident that we will exploit some assumptions about the channel noise statistics, i.e. by assuming we know the probability density function (pdf) of the additive noise process, and the fact that the channel memory, i.e. the number of taps in the CIR, is finite. For example, a symbol $d_n$ transmitted at discrete time $n$ can only influence the received symbols $z_i$ at times $i \in \{n, n + 1, \cdots, n + L\}$. Thus, we may write the a posteriori probability for the transmitted symbol at time $k$ being symbol $i$ in the alphabet, i.e. $d_k = i$, using Bayes's theorem as

$$P(d_k = i|z_{k+L}, \cdots, z_1) = \frac{p(z_{k+L}, \cdots, z_1|d_k = i)\,P(d_k = i)}{p(z_{k+L}, \cdots, z_1)} \tag{4.19}$$

where $P(d_k = i)$ is the probability that the $i$'th point in the symbol alphabet was transmitted. Secondly, we note that in many practical cases we may make the assumption that the $M$ symbols in the alphabet are equally likely, as the data bits themselves are random after a sufficiently long interleaver has removed any correlation in the encoded data bits. However, in this presentation we will keep the term $P(d_k = i)$ as an unknown, since we use Bayesian Inference here, where we need to consider the prior. Also, in iterative detectors (also referred to as turbo detectors or turbo equalizers), where a priori information about the symbols $d_k = i$ is shared between the decoder and the symbol detector, $P(d_k = i)$ may vary. Notice that in the Pearl Belief Propagation algorithm we needed no assumptions on the prior.

The denominator is in fact identical regardless of the choice $i$ for symbol $d_k$, and thus maximising $P(d_k = i|z_{k+L}, \cdots, z_1)$ implies maximising the numerator, as we need to pick one of the $M$ possible symbols in the alphabet as the estimated data symbol. Thus, the MAP criterion for detecting $d_k$ is

$$\tilde{d}_k = \arg\max_{d_i}\; p(z_{k+L}, \cdots, z_1|d_k = i)\,P(d_k = i). \tag{4.20}$$

We assume we have known tail symbols, i.e. we know $d_k\ \forall\ k \in \{0, -1, \cdots\}$. Thus we may start by detecting $d_{k=1}$ given $z_{L+1}, \cdots, z_1$, as $d_{k=1}$ cannot influence the value of $z_k\ \forall\ k > L + 1$ since the channel memory is $L + 1$ (the CIR has $L + 1$ taps). We have

$$\begin{aligned}
\tilde{d}_1 &= \arg\max_{d_1}\; p(z_{1+L}, \cdots, z_1|d_1)\,P(d_1) \\
&= \arg\max_{d_1} \sum_{d_{L+1}} \cdots \sum_{d_2} p(z_{1+L}, \cdots, z_1|d_{L+1}, \cdots, d_1)\,P(d_{L+1}, \cdots, d_1)
\end{aligned} \tag{4.21}$$

where $\tilde{d}_1$ denotes the decision on transmitted symbol $d_1$, a process we will call detection. (Historically this is called equalisation, because the process of detection implies that the disturbing effects of the channel were equalised to obtain the detected estimate $\tilde{d}_1$.) We assume the noise is statistically independent, since it has been whitened by the prefilter (see Chapter 5). The reader may now appreciate the importance of noise whitening to the detection process, since the statistical independence makes it possible to simplify the equations above by rewriting $p(z_{1+L}, \cdots, z_1|d_{L+1}, \cdots, d_1)$ as

$$p(z_{1+L}, \cdots, z_1|d_{L+1}, \cdots, d_1) = p(z_{1+L}|d_{L+1}, \cdots, d_1)\,p(z_L|d_L, \cdots, d_0) \cdots p(z_1|d_1, \cdots, d_{1-L}). \tag{4.22}$$

In addition, assuming the noise is Gaussian enables us to write the pdf used above as

$$p(z_k|d_k, \cdots, d_{k-L}) = \frac{1}{\sqrt{\pi N_0}}\, e^{-\left|z_k - \sum_{j=0}^{L} b[j]\,d[k-j]\right|^2 / N_0}. \tag{4.23}$$

Assuming no a priori information about the a priori probability $P(d_{L+1}, \cdots, d_1)$ is available, we may make the assumption that all symbols are equally likely, and hence $P(d_{L+1}, \cdots, d_1)$ becomes a constant that does not influence the detection of symbol $d_k$.

We now move to $k = 2$. We have

$$\begin{aligned}
\tilde{d}_2 &= \arg\max_{d_2}\; p(z_{2+L}, \cdots, z_1|d_2)\,P(d_2) \\
&= \arg\max_{d_2} \sum_{d_{L+2}} \cdots \sum_{d_3} p(z_{2+L}, \cdots, z_1|d_{L+2}, \cdots, d_2)\,P(d_{L+2}, \cdots, d_2) \\
&= \arg\max_{d_2} \sum_{d_{L+2}} \cdots \sum_{d_3} p(z_{2+L}|d_{L+2}, \cdots, d_2)\,p(z_{1+L}, \cdots, z_1|d_{L+1}, \cdots, d_2)\,P(d_{L+2}, \cdots, d_2) \\
&= \arg\max_{d_2} \sum_{d_{L+2}} \cdots \sum_{d_3} p(z_{2+L}|d_{L+2}, \cdots, d_2)\,P(d_{L+2}) \sum_{d_1} p(z_{1+L}, \cdots, z_1|d_{L+1}, \cdots, d_1)\,P(d_{L+1}, \cdots, d_1).
\end{aligned} \tag{4.24}$$

The term in the last line of equation (4.24) that is summed over $d_1$ may be determined from information gathered when detecting $d_1$. In other words, the MAP detector is recursive. It does not require the re-computation of information obtained from detected symbols for times prior to $k$. This leads to a huge saving in computational requirements. We will not present the detection of $d_3$, as it should be clear that it follows a similar route as the detection of $d_2$, with the recursions continuing as time $k$ increases for detecting $d_k$ in general.

### 4.5 Forward-Backward MAP detection

Sequence detection produces optimal hard symbol values, but unfortunately only sub-optimal probabilistic information regarding the reliability of those decisions. In a later chapter dealing with the decoding of error correction codes, we will find that the reliability information is very important to the decoder, and hence one would like to have handy a detection algorithm that can provide optimal probabilistic information about each detected symbol. Using Bayesian inference we presented an algorithm capable of precisely that (the previous section); however, we had to either know or guess the prior probabilities of the symbols. A different algorithm exists that is able to provide optimal probabilistic information, without the need to know or guess the prior probabilities of the symbols. It is known as the Pearl Belief Propagation algorithm in the artificial intelligence community [1], but in Digital Communications it is known as the BCJR or Forward-Backward MAP algorithm [3].

Let us consider the BPSK alphabet as an example, with a 2 tap trellis. It has two possible symbols, $-1$ or $+1$. The modulator in the transmitter followed a specific path through the trellis shown in Figure 4.5, from left to right on the trellis, and it is the job of the receiver, where the MAP algorithm resides, to estimate the probabilities that each symbol was transmitted. The symbol may take on one of $M$ possible values dictated by the modulation alphabet used; in the case of BPSK there are two possibilities, a $-1$ or a $+1$. Formally the symbol probability is given by marginalization as

$$P(t_n|y) = \sum_{t_{n^\dagger}:\; n^\dagger \neq n} P(t|y). \tag{4.25}$$

Here $y$ is the received or observed vector from the demodulator, $t$ is the vector of transmitted symbols, and $t_n$ is the $n$'th symbol we wish to infer at the receiver.

[Figure 4.5: The forward-backward MAP trellis for BPSK. The observed data from the demodulator, $y[0], \cdots, y[6]$, are shown at times $n = 0, \cdots, 6$. The nodes are numbered 0 to 11 from left to right, and each edge is labelled with the symbol value (1 or -1) it carries. The transmitter has 2 possible states per time $n$, 1 or -1, since BPSK was used.]

In the trellis used for the MAP detection, the edges of the trellis have associated with them not the Euclidean distance metric, as was the case for the min-sum algorithm, but rather the likelihood itself. The likelihood has the form

$$P(y_n|t_n) = A \exp\left(-\frac{\left|y_n - \sum_{k=0}^{1} c_k t_{n-k}\right|^2}{2\sigma^2}\right) \tag{4.26}$$

where $t_n$ is given, $c$ is the channel impulse response, assumed known or at least estimated, and $y_n$ is the observation from the de-modulator. With each edge in the trellis we associate a symbol value that was transmitted. If we number each node in the trellis sequentially from left to right as $0, \cdots, I$, then the edge that connects node $j$ to node $i$ (assuming they are connected, which we denote by $j \in \mathcal{P}(i)$, meaning $j$ is a parent node of $i$) has a value $t_{i,j}$. Let $i$ run from 0 to $I$, and let $w_{i,j}$ be the likelihood itself (given above) associated with the edge from node $j$ to node $i$ with value $t_n = t_{i,j}$, while the set of states considered for node $i$ is $\mathcal{P}(i)$. Compute the forward pass messages, each associated with a node, say $i$, as

$$\alpha_i = \sum_{j \in \mathcal{P}(i)} w_{ij}\,\alpha_j \tag{4.27}$$

and $\alpha_0 = 1$. The second set of messages, from right to left, is similarly computed as

$$\beta_j = \sum_{i:\, j \in \mathcal{P}(i)} w_{ij}\,\beta_i \tag{4.28}$$

and $\beta_I = 1$. Now let $i$ run over the nodes at time $n$ and $j$ over the nodes at time $n - 1$, and let $t_{ij}$ be the values of $t_n$ associated with the trellis edge from node $j$ to node $i$. Compute terms proportional to the probability as

$$r_n^{t=1} = \sum_{i,j:\, j \in \mathcal{P}(i),\, t_{ij}=1} \alpha_j\, w_{ij}\, \beta_i \tag{4.29}$$

for the symbol to be a '1' and

$$r_n^{t=-1} = \sum_{i,j:\, j \in \mathcal{P}(i),\, t_{ij}=-1} \alpha_j\, w_{ij}\, \beta_i \tag{4.30}$$

for the symbol to be a '-1'. The term proportional to the probability contains a yet to be determined constant of proportionality which is a function of $A$, defined above. Since summing over all outcomes of a probability must yield 1, we may demand that

$$r_n^{t=1} + r_n^{t=-1} = 1 \tag{4.31}$$

which yields the constant of proportionality for time instant $n$ and thus correctly normalises both terms, so that the probability is

$$P(t_n = t|y) = r_n^{(t)}. \tag{4.32}$$

#### 4.5.1 An example

Let us assume all the forward and backward messages on the trellis have been computed, and we want to determine the probability that at time $n = 1$ the transmitted symbol was a '1' or a '-1'. There are only two nodes at each time slice since we use BPSK, so that

$$r_{n=1}^{t=\text{'1'}} = \alpha_2 w_{32} \beta_3 + \alpha_1 w_{31} \beta_3 \tag{4.33}$$

and

$$r_{n=1}^{t=\text{'-1'}} = \alpha_1 w_{41} \beta_4 + \alpha_2 w_{42} \beta_4. \tag{4.34}$$

We now normalise for $n = 1$ by demanding $r_{n=1}^{t=\text{'-1'}} + r_{n=1}^{t=\text{'1'}} = 1$, and then find after normalisation that

$$P(t_1 = \text{'1'}|y) = r_1^{(t=\text{'1'})}. \tag{4.35}$$

Remember that in computer calculations there may be underflow or overflow in calculating $\beta$ and $\alpha$, but that may be avoided by re-normalising when needed. In the end, the normalisation constants disappear anyway when the outcomes are summed to one to produce a probability.
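The forward and backward recursions (4.27) to (4.32) can be sketched in Python and checked against the brute-force marginalisation (4.25). The sketch makes several assumptions that are not in the notes: BPSK over a 2 tap channel, one known pilot symbol 1 at each end of the block, states indexed by (time, symbol value) rather than by the node numbers of Figure 4.5, and made-up taps, noise variance and observations. The function names are hypothetical.

```python
import math
from itertools import product

SYMBOLS = (1, -1)  # BPSK alphabet

def edge_weight(y_n, t_now, t_prev, c, sigma2):
    """Likelihood of eq. (4.26) without the constant A, for a 2 tap channel."""
    return math.exp(-abs(y_n - c[0] * t_now - c[1] * t_prev) ** 2 / (2.0 * sigma2))

def forward_backward(y, c, sigma2, pilot=1):
    """Return P(t_n = +1 | y) for n = 1..N; t_0 and t_N are known pilots."""
    N = len(y)
    allowed = [(pilot,)] + [SYMBOLS] * (N - 1) + [(pilot,)]

    def w(n, s, sp):
        return edge_weight(y[n - 1], s, sp, c, sigma2)

    alpha = [dict.fromkeys(SYMBOLS, 0.0) for _ in range(N + 1)]
    beta = [dict.fromkeys(SYMBOLS, 0.0) for _ in range(N + 1)]
    alpha[0][pilot] = 1.0
    beta[N][pilot] = 1.0
    for n in range(1, N + 1):                  # forward pass, eq. (4.27)
        for s in allowed[n]:
            alpha[n][s] = sum(w(n, s, sp) * alpha[n - 1][sp] for sp in allowed[n - 1])
        scale = sum(alpha[n].values())         # re-normalise against underflow
        for s in SYMBOLS:
            alpha[n][s] /= scale
    for n in range(N, 0, -1):                  # backward pass, eq. (4.28)
        for sp in allowed[n - 1]:
            beta[n - 1][sp] = sum(w(n, s, sp) * beta[n][s] for s in allowed[n])
        scale = sum(beta[n - 1].values())
        for sp in SYMBOLS:
            beta[n - 1][sp] /= scale

    posteriors = []
    for n in range(1, N + 1):                  # eqs. (4.29)-(4.32)
        r = {s: sum(alpha[n - 1][sp] * w(n, s, sp) * beta[n][s] for sp in allowed[n - 1])
             for s in SYMBOLS}
        posteriors.append(r[1] / (r[1] + r[-1]))
    return posteriors

def enumerate_posteriors(y, c, sigma2, pilot=1):
    """Brute-force marginalisation of eq. (4.25) over all sequences, for checking."""
    N = len(y)
    total, plus = 0.0, [0.0] * N
    for middle in product(SYMBOLS, repeat=N - 1):
        t = (pilot,) + middle + (pilot,)
        weight = math.prod(edge_weight(y[n - 1], t[n], t[n - 1], c, sigma2)
                           for n in range(1, N + 1))
        total += weight
        for n in range(1, N + 1):
            if t[n] == 1:
                plus[n - 1] += weight
    return [p / total for p in plus]

# ASSUMPTIONS: taps, noise variance and observations are made up for illustration
c, sigma2 = (1.0, 0.5), 0.8
y = [1.3, -0.2, -1.7, 0.4, 0.9, 1.6]
fb = forward_backward(y, c, sigma2)
bf = enumerate_posteriors(y, c, sigma2)
print([round(p, 4) for p in fb])
print([round(p, 4) for p in bf])
```

The per-time-slice rescaling of the messages is the re-normalisation mentioned above; it cancels when the two outcomes are normalised to sum to one.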

### 4.6 Assignments

1. Using the GSM simulator, identify the symbol detector (equaliser) function. It is based on the so-called Max Log MAP algorithm, a sub-optimal implementation of the forward-backward MAP algorithm. Develop your own detector based on the Min-Sum algorithm, and compute the symbol probabilities using the suboptimal procedure explained in this chapter. The idea is to use the min-sum hard bits, since they are optimal, and to scale the decisions based on the sub-optimal probability calculations. Form the soft bit outputs as $z_{soft} = (2\tilde{z} - 1)\left|\ln \frac{P_{\tilde{z}=\text{'1'}}}{P_{\tilde{z}=\text{'0'}}}\right|$.
   1) Plot BLER (block error rate) versus $E_b/N_0$, for MCS (Modulation and Coding Scheme) 1 and MCS 4. Choose values for $E_b/N_0$ that yield sensible BLER values, typically between 0.3 and 0.01. Rather simulate fewer points with more frames/blocks per point, otherwise the curves are not reliable.
   2) On the same graphs, plot the Max Log MAP BLER values. Comment on what you find, especially the behaviour for the 2 coding schemes, where the one is at a low code rate and the other at a high code rate.

2. Using the GSM simulator, identify the symbol detector (equaliser) function. It is based on the so-called Max Log MAP algorithm, a sub-optimal implementation of the forward-backward MAP algorithm. Develop your own detector based on the Abend and Fritchman detector, which is also based on probabilities. Form the soft bits using the same procedure as in the given Max Log MAP algorithm.
   1) Plot BLER (block error rate) versus $E_b/N_0$, for MCS 1 and MCS 4. Choose values for $E_b/N_0$ that yield sensible BLER values, typically between 0.3 and 0.01. Rather simulate fewer points with more frames/blocks per point, otherwise the curves are not reliable.
   2) On the same graphs, plot the Max Log MAP BLER values. Comment on what you find, especially the behaviour for the 2 coding schemes, where the one is at a low code rate and the other at a high code rate.

3. Using the GSM simulator, identify the symbol detector (equaliser) function. It is based on the so-called Max Log MAP algorithm, a sub-optimal implementation of the forward-backward MAP algorithm. Develop your own detector based on the forward-backward MAP algorithm, which is also based on probabilities. Form the soft bits using the same procedure as in the given Max Log MAP algorithm.
   1) Plot BLER (block error rate) versus $E_b/N_0$, for MCS 1 and MCS 4. Choose values for $E_b/N_0$ that yield sensible BLER values, typically between 0.3 and 0.01. Rather simulate fewer points with more frames/blocks per point, otherwise the curves are not reliable.
   2) On the same graphs, plot the Max Log MAP BLER values. Comment on what you find, especially the behaviour for the 2 coding schemes, where the one is at a low code rate and the other at a high code rate.

4. Compare the BLER results for MCS 1 and 4 for all 3 methods and Max Log MAP on the same graphs. Discuss the advantages and disadvantages of each in terms of complexity and performance.


## Chapter 5: Frequency Domain Modulation and Detection: OFDM

### 5.1 Introduction

In the previous chapters we studied time domain modulation and time domain detection. These led to the development of trellis based detection methods to achieve both Maximum Likelihood and Maximum A-posteriori Probability detection. These are generally complex, especially if the time domain channel impulse response contains many taps and/or the modulation constellation is large.

Engineers have a long tradition of mitigating complex time domain operations in the frequency domain. We are comfortable with Laplace and Fourier transformations to render differential operators and/or convolution operators into a form that uses only algebraic equations. Generations of engineers have done this, even up to the present day. To jog your memory, think of how simple it is to find circuit transfer functions by performing a Laplace transformation and then factoring pure algebraic equations in the $s$ domain.

It was thus a natural question to ask ourselves if it is possible to modulate and detect in the frequency domain, and then somehow render the detection process trivial. It turns out that the answer to this question is affirmative, to the extent that regardless of how many taps there are in the time domain, the frequency domain detection remains trivial. The solution has become known as Orthogonal Frequency Division Multiplexing (OFDM) modulation and detection, and is the modulation of choice in many emerging wireless communications standards at the time of writing. This chapter will present and analyse OFDM.

### 5.2 Circulant matrix theory

OFDM is based on the properties of circulant matrices. Circulant matrices form a commutative algebra, since for any two given circulant matrices $A$ and $B$, the sum $A + B$ is circulant, the product $AB$ is circulant, and $AB = BA$. A circulant matrix $C$ is a matrix built up from only $n$ elements $c_0, c_1, \cdots, c_{n-1}$. It has a special structure given by

$$C = \begin{bmatrix}
c_0 & c_{n-1} & \cdots & c_2 & c_1 \\
c_1 & c_0 & c_{n-1} & \cdots & c_2 \\
c_2 & c_1 & c_0 & \cdots & c_3 \\
\vdots & & & \ddots & \vdots \\
c_{n-1} & c_{n-2} & c_{n-3} & \cdots & c_0
\end{bmatrix}. \tag{5.1}$$

A key property of any circulant matrix is that the eigenvectors of a circulant matrix of given size are merely the columns of the discrete Fourier transform matrix of the same size. Consequently, the eigenvalues of a circulant matrix can be readily calculated by a Fast Fourier transform (FFT) of $c$. Since the eigenvectors of any circulant matrix are simply the columns of the matrix $F$, any circulant matrix can be written or factorized as

$$C = F^H \Lambda F \tag{5.2}$$

where $\Lambda$ is a diagonal matrix with the diagonal vector containing the eigenvalues. These are just equal to the FFT of $c$. The discrete Fourier transform matrix is given by

$$F = \frac{1}{\sqrt{N}} \begin{bmatrix}
e^{-j\frac{2\pi(0)(0)}{N}} & e^{-j\frac{2\pi(0)(1)}{N}} & \cdots & e^{-j\frac{2\pi(0)(N-1)}{N}} \\
e^{-j\frac{2\pi(1)(0)}{N}} & e^{-j\frac{2\pi(1)(1)}{N}} & \cdots & e^{-j\frac{2\pi(1)(N-1)}{N}} \\
\vdots & \vdots & \ddots & \vdots \\
e^{-j\frac{2\pi(N-1)(0)}{N}} & e^{-j\frac{2\pi(N-1)(1)}{N}} & \cdots & e^{-j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}. \tag{5.3}$$

Finally, note that

$$F F^H = I \tag{5.4}$$

where $I$ is the identity matrix and $H$ indicates the Hermitian transpose. $F$ is thus unitary, i.e. its Hermitian transpose is also its inverse.

### 5.3 The Transmitter for OFDM systems

#### 5.3.1 Cyclic time domain multipath propagation

In OFDM we want to exploit the nice properties of cyclic matrix theory. To do that, we need to make the multipath propagation cyclic, so we change the transmission format to a cyclic one as indicated in Figure 5.1. The baseband model of multipath propagation as we are used to is given by

$$r[n] = \sum_{i=0}^{L} h[i]\,d[n - i] + n_s[n] \tag{5.5}$$

where $h$ is the time domain channel impulse response as estimated by the receiver in the normal manner (see next chapter), $n_s[n]$ is the thermal noise sample at time $n$, and $d$ are the transmitted symbols. $r[1]$ corresponds to $d[1]$ in Figure 5.1 and so on.

[Figure 5.1: The OFDM transmitter frame format making the multipath propagation cyclic. The last P symbols of the frame $d_1, d_2, d_3, \cdots, d_n$ are copied to the front of the frame, where they are known as the "cyclic prefix".]

If the transmission burst is cyclic as in Figure 5.1, then the received frame can be written in cyclic matrix form as

$$\begin{bmatrix} r[1] \\ r[2] \\ r[3] \\ \vdots \\ r[n] \end{bmatrix} =
\begin{bmatrix}
h_0 & 0 & 0 & h_L & \cdots & h_2 & h_1 \\
h_1 & h_0 & 0 & 0 & h_L & \cdots & h_2 \\
h_2 & h_1 & h_0 & 0 & \cdots & h_L & h_3 \\
\vdots & & & & & & \vdots \\
0 & 0 & \cdots & h_L & \cdots & h_1 & h_0
\end{bmatrix}
\begin{bmatrix} d[1] \\ d[2] \\ d[3] \\ \vdots \\ d[n] \end{bmatrix} +
\begin{bmatrix} n_s[1] \\ n_s[2] \\ n_s[3] \\ \vdots \\ n_s[n] \end{bmatrix}. \tag{5.6}$$

Clearly, this baseband model is cyclic since the matrix $H$ is cyclic. The cyclic prefix must be longer than the length of the CIR vector $h$.

OFDM modulation views the vector $d$ as the inverse FFT of the modulated symbols from the modulator, denoted $D$. Using the inverse fast Fourier transform matrix $F^H$ we may write $d$ as

$$d = F^H D. \tag{5.7}$$

So in an OFDM system the transmitted data $d$ is formed by doing an inverse FFT on the symbols from the modulator, a key difference from other methods and the essential idea that removes the ISI with trivial complexity in the receiver. Of course the other key idea was to make the propagation cyclic by prepending the frame with the cyclic prefix.

### 5.4 OFDM receiver, MAP detection

The received vector in the receiver from the matched filter sampler pair is denoted $r$, and is corrupted by ISI and thermal noise. Hence we may write

$$r = H d + n_s \tag{5.8}$$

where the matrix $H$ is a circulant matrix constructed using $h$ on the rows as defined above in the transmitter. Our job is to estimate the most probable $D$ given $r$, a theme that should be familiar by now.
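The claim that the cyclic prefix turns the multipath channel of (5.5) into the circulant matrix of (5.6) can be checked numerically. The following Python sketch does so without noise; the taps `h`, the data `d`, the prefix length and the helper names are assumptions chosen purely for illustration.

```python
def circulant(h, n):
    """n x n circulant matrix whose first column is h zero padded, as in eq. (5.6)."""
    column = list(h) + [0.0] * (n - len(h))
    return [[column[(row - col) % n] for col in range(n)] for row in range(n)]

def linear_channel(x, h):
    """Noise-free multipath channel: the linear convolution of eq. (5.5)."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, x_i in enumerate(x):
        for j, h_j in enumerate(h):
            out[i + j] += x_i * h_j
    return out

def receive_block(d, h, prefix_len):
    """Prepend the cyclic prefix, pass through the channel, drop the prefix samples."""
    frame = d[-prefix_len:] + d
    received = linear_channel(frame, h)
    return received[prefix_len:prefix_len + len(d)]

# ASSUMPTIONS: taps, data and prefix length are made up for illustration
h = [1.0, 0.6, 0.3]
d = [1, -1, 1, 1, -1, -1, 1, -1]
H = circulant(h, len(d))
via_matrix = [sum(H[r][k] * d[k] for k in range(len(d))) for r in range(len(d))]
via_channel = receive_block(d, h, prefix_len=4)
print(via_matrix)
print(via_channel)
```

Both printed vectors agree: after the prefix samples are discarded, the linear channel acts on the frame exactly as the circulant matrix $H$ does.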

5.4.1 MAP detection with trivial complexity

MAP detection with trivial complexity in spite of the channel impulse response having $L$ taps was our objective in OFDM. Let us now see how that is possible. Recall that $d = F^H D$, and by substituting this into equation (5.8) we find

$$ r = H F^H D + n_s. \tag{5.9} $$

Now taking the FFT on both sides we may write

$$ F r = F H F^H D + F n_s, \tag{5.10} $$

which may be written as (using the decomposition theorem for cyclic matrices)

$$ F r = F F^H \Lambda F F^H D + F n_s. \tag{5.11} $$

But we know that

$$ F F^H = I, \tag{5.12} $$

so we may simplify this to

$$ F r = \Lambda D + F n_s. \tag{5.13} $$

So we end up with a new equation to solve given above. The observed symbols in this equation are the FFT of the received symbols, the noise is still Gaussian because the FFT does not change the statistics, and most importantly, the matrix $\Lambda$ is diagonal, i.e. it contains no memory. ($\Lambda$ is a diagonal matrix that contains $N$ values, given by the DFT of $h$ where $h$ with $L$ taps is zero padded to length $N$.) So in other words it may be solved by applying straightforward symbol by symbol MAP without memory which has a trivial complexity. The ISI was perfectly removed because of the cyclic matrix properties introduced by the modified Tx frame and the fact that the inverse FFT of the data was transmitted instead of the modulation symbols themselves.

5.4.2 Matlab demo

The reader may convince herself that the MAP detection is trivial by executing the Matlab code below where there is no noise. Add your own noise to calculate the BER and see for yourself that it is the same as MLSE with Viterbi, but that the complexity is trivial.

    clear all
    L = 8;
    z = sign(randn(1,L));           % random data
    Z = ifft(z);                    % inverse FFT
    Z = [1*Z(L-1:L) Z];             % add cyclic redundancy (cyclic prefix of 2 samples)
    ch = [1 0.6];                   % the multitap IR, put in what you want
    R = conv(Z,ch);                 % the dispersive channel - no noise - add your own!
    H = fft([ch zeros(1,L-2)]);     % the FD Ch est
    z_h = sign(real(conj(H).*fft(R(3:L+2))./(conj(H).*H)));   % MAP estimate at receiver is trivial
    err = z - z_h;                  % the error - there is none!
    std(err)

5.5 Assignment

Add noise at the correct $E_b/N_0$ to the demo code and verify that the BER is the same as what Viterbi attains for any channel with $L$ taps. Be careful with the noise energy in OFDM: it needs some normalisation because of the FFT operators.


Chapter 6 Channel Estimation

6.1 Introduction

Channel estimation is the first task in the receiver shown in Figure 6.1, as the RF channel impulse response at the symbol rate is unknown at the receiver. Thus even though the pulse shaping filter and anti aliasing filters are known the overall impulse response is not. The reader will recall that a matched filter is needed to achieve a maximum output Signal to Noise Ratio (SNR). This is an important realization, since it implies that we cannot design a receiver filter that is matched to the overall impulse response before estimation. Our approach will therefore be to estimate the overall impulse response, and then to design a prefilter (Figure 6.1) with the objective of maximising the SNR, a topic addressed in the next Chapter.

[Figure 6.1: The layout of a typical receiver. Blocks: Channel Estimation (input r[n], output c[n]), Prefilter (output z[n]), Soft bit detector, De-Interleaver, Soft Decision Decoder (output hard bits b[n]).]

In general the noise present at the output of the channel is coloured, and hence Maximum Likelihood estimation of the overall channel impulse response will require knowledge of the noise covariance function (we will prove this in later Chapters). Since the noise covariance function is in general not known, it is convenient to deploy Least Squares (LS) estimation. LS estimation requires no statistical description of the overall impulse response, nor does it require statistical knowledge of the noise. The LS approach simply chooses the channel impulse response in such a way that the weighted errors

between the given measurements and a linear model are minimised.

6.2 Optimum receiver filter and sufficient statistics

After suitable RF electronics have been utilised in the receiver front end, we receive a baseband analogue signal $r(t)$. Since transmission was performed in the form of data bursts as discussed above, we will have a finite number of received samples available after sampling $r(t)$ corresponding to a burst. These we denote as $r[n]$ where $n$ denotes discrete time that may be used for digital processing in a digital signal processor. We now study the form of the optimal receiver, specifically the receiver filter and sampling rate.

Given an overall impulse response $c(t)$ that is time invariant over the duration of the burst, we will limit this study to a short period of time where the impulse response $c(t)$ remains unchanged. Imagine $Z$ discrete symbols $s[n]$ are transmitted at a rate $T_s$ during this time, then the received signal will be

$$ r(t) = \sum_{n}^{Z} s[n]\, c(t - nT_s) + n(t). \tag{6.1} $$

Now let us expand the signal $r(t)$ in terms of a complete orthonormal basis with basis functions denoted by $\phi_k(t)$. Hence we have

$$ r(t) = \lim_{N \to \infty} \sum_{k=1}^{N} r[k]\, \phi_k(t). \tag{6.2} $$

Since $\langle \phi_k(t), \phi_i(t) \rangle = \delta(|k - i|)$ we may write

$$ r[k] = \langle r(t), \phi_k(t) \rangle \tag{6.3} $$

via the projection theorem. Hence $r[k]$ can be written as

$$ r[k] = \Big\langle \sum_{n}^{Z} s[n]\, c(t - nT_s), \phi_k(t) \Big\rangle + \langle n(t), \phi_k(t) \rangle = \sum_{n} s[n]\, c[k-n] + n_s[k]. \tag{6.4} $$

Assuming the noise sequence $n_s[k]$ is Gaussian and white the joint probability density function of the variables $r = \{r[1], r[2], \cdots, r[N]\}$ conditioned on the transmitted symbols $s$ is

$$ p(r|s) = \left( \frac{1}{2\pi N_0} \right)^{N} \exp\left( -\frac{1}{2N_0} \sum_{k=1}^{N} \Big| r[k] - \sum_{n} s[n]\, c[k-n] \Big|^2 \right). \tag{6.5} $$

We may now take the limit where $N$ approaches infinity, and write the logarithm of $p(r|s)$ (log likelihood) as

$$ PM(s) = -\int_{-\infty}^{\infty} \Big| r(t) - \sum_{n}^{Z} s[n]\, c(t - nT_s) \Big|^2 dt. \tag{6.6} $$

Expanding and integrating we find that

$$ PM(s) \propto 2\,\mathrm{Re}\Big( \sum_{n} s[n]^* z[n] \Big) - \sum_{n} \sum_{m} s[n]^* s[m]\, x[n-m] \tag{6.7} $$

where

$$ z[n] = \int_{-\infty}^{\infty} r(t)\, c^*(t - nT_s)\, dt \tag{6.8} $$

and

$$ x[n] = \int_{-\infty}^{\infty} c^*(t)\, c(t + nT_s)\, dt. \tag{6.9} $$

We therefore conclude that passing $r(t)$ through a filter matched to $c(t)$ and then sampling at a rate $T_s$ yields samples $z[n]$ that form a set of sufficient statistics for detecting $s$. Hence in theory an optimal receiver filter exists, namely the matched filter.

However there is a problem constructing this filter in practice. Although the transmission and receiver filters are known, the overall CIR $c[n]$ is not known a priori at the receiver, because the RF channel itself causes fading which is unpredictable in most cases. The channel is called dispersive or frequency selective if the sampled CIR $c[i]$ is non-zero for $i > 0$. This is indicated in Figure 1.4. Each such term represents interference of the transmitted signal $s[n]$ with itself because the overall channel has memory. This is frequently called inter symbol interference (ISI).

We thus propose to use a fixed receiver filter with a bandwidth chosen according to the transmitted bandwidth or other system constraints; it is not chosen to be a matched filter, as that will be done in the prefilter. What is important is that the receiver filter causes as little an increase to the length of the overall CIR as possible. In later chapters it will become evident that the length of the CIR determines the complexity of the optimal detector, and we therefore may choose the receiver filter accordingly. One such filter is the raised cosine filter, which will not add to the length of the overall CIR. This is quite common in practical communication systems.

The task of the matched filter is given to the prefilter: after the overall CIR has been estimated, a suitable matched digital filter may be designed based on this estimated CIR. However, the prefilter also has the task of changing the phase response of the overall CIR after the prefilter so that the leading taps become dominant, for reasons that will become apparent in later chapters. Also we will whiten the additive noise with the aid of the prefilter as this will simplify the detection process. The output of the prefilter thus yields samples $z[n]$ that form sufficient statistics for detection. Chapter 7 will address the design of such a filter. This chapter will now address the Channel Estimation problem, the first stage of the receiver.

6.3 The linear model

Before we proceed to the formulation of the channel estimation problem, we need to lay down the foundations of least squares estimation, in the form of the linear model. The linear model says that observations $r = \{r[0], r[1], \cdots, r[Q]\}^T$ consist of a signal component $d = \{d[0], d[1], \cdots, d[Q]\}^T$ plus an error component $n = \{n[0], n[1], \cdots, n[Q]\}^T$ given by

$$ r = d + n. \tag{6.10} $$

We now postulate a model for the signal component, in fact a linear model, that obeys the equation

$$ d = H c \tag{6.11} $$

where $H$ is a matrix, and $c$ is a parameter vector

$$ c = \{c[0], c[1], \cdots, c[P]\}^T. \tag{6.12} $$

The matrix $H$ is composed of columns $h_n$ and we may write $H = \{h_0, \cdots, h_P\}$. Each column vector is a mode of the signal $d$, and the signal $d$ consists of a linear combination of these modes:

$$ d = \sum_{n=0}^{P} c_n h_n. \tag{6.13} $$

It is precisely these combiner weights $c_n$ that are the parameters that we wish to estimate. Thus we have an equation error model

$$ r = H c + n, \tag{6.14} $$

where $n$ represents the noise. In general, with $N$ the number of observations and $P$ possibly more or less than $N$, we may have an under determined case ($P > N$), a determined case ($P = N$) or the overdetermined case ($P < N$). It is the last case we are particularly interested in here. It leads naturally to least squares fitting or estimation, presented next.

6.4 Least Squares Estimation

We receive a vector of $K + 1$ measurements from the channel denoted $r = \{r[n], r[n+1], \cdots, r[n+K]\}^T$ and using (6.4) with the overall channel impulse response denoted by a vector $c = \{c[0], c[1], \cdots, c[L]\}^T$ we may set up a linear model as

$$ r = Q c + n. \tag{6.15} $$

The length of the overall impulse response $L + 1$ depends on the sampling rate, the RF channel model length, the pulse shaping filter and the anti-aliasing filters. Consider the matrix $Q$ shown below:

$$
Q = \begin{bmatrix}
t[n] & t[n-1] & t[n-2] & \cdots & t[n-L] \\
t[n+1] & t[n] & t[n-1] & \cdots & t[n-L+1] \\
t[n+2] & t[n+1] & t[n] & \cdots & t[n-L+2] \\
\vdots & \vdots & \vdots & \ddots & \vdots
\end{bmatrix}. \tag{6.16}
$$

$t[n]$ represents the transmitted training symbol at sample $n$. The matrix $Q$ is fully populated by the transmitted training symbols, and these are known at the receiver. Given the observations $r$, under the linear model, we need to estimate $c$.

For the matrix to be a full rank matrix, the columns of $Q$ need to be linearly independent. However, these are just time shifted versions of the training sequence. We thus conclude that we require the training sequence to have an autocorrelation function that approximates a Kronecker delta function, so that time shifted versions of the training sequence are at least linearly independent. The reader may verify that this is the case for the training sequence given in Chapter 1.

For a given estimate of $c$, the squared error between $r$ and the linear model $Q c$ is $\epsilon^2 = \mathrm{tr}[(r - Qc)(r - Qc)^\dagger] = n^\dagger n$, which is to be minimised to obtain the least squares estimate. Thus

$$ \frac{\partial \epsilon^2}{\partial c} = -2\, Q^\dagger (r - Q c) \tag{6.17} $$

and equating the gradient to zero produces the estimate

$$ \tilde{c} = (Q^\dagger Q)^{-1} Q^\dagger r. \tag{6.18} $$

The matrix $Q^\dagger Q$ is called the Grammian matrix. It is the cross-correlation matrix of the transmitted training sequence. The optimal training sequence will make $Q^\dagger Q = qI$ with $q$ a constant, and achieve the minimum mean squared error. However, using a modulation alphabet of fixed discrete size and short sequences, such a sequence does not exist, and a computer search for the best suboptimal sequence may be performed instead.

6.5 A representative example

We focus on the GSM system where the pulse shaping filter used in the transmitter is Gaussian, as shown in Figure 6.2 with four samples per symbol. The Gaussian pulse shaping filter causes inter symbol interference of three consecutive transmitted symbols, apart from that introduced by the RF channel. Moreover, the receiver anti aliasing filter was not matched to the transmission pulse shaping filter, and even if we selected to do that, the channel realization for this burst was unknown, so that we would not have accomplished a maximum output SNR.

[Figure 6.2: The Gaussian pulse shaping filter used in GSM. Axes: tap setting [Volt] versus tap number.]

For simplicity, we assume here that the RF channel has 3 taps at the symbol rate with settings $\frac{1}{\sqrt{3}}[1, 1, 1]$. We have 26 measurements for $r$ taken at a SNR of 15 dB. The matrix $Q$ is built from the training sequence, and since the latter sequence has an autocorrelation function which approximates a Kronecker delta, the Grammian is highly diagonally dominant and invertible. We apply equation (6.18) to estimate $\tilde{c}$ and the magnitude is shown in Figure 6.3 along with the z-plane representation. Here we clearly see that the overall impulse response is not minimum phase as some nulls are located outside the unit circle. This problem is addressed in a later Chapter where it is shown that a suitable prefilter is needed to achieve both a minimum phase impulse response and a maximum SNR.
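The normal equations (6.18) can be tried out on a small case. The following is a minimal sketch in Python using only the standard library, under stated assumptions. The 26-symbol ±1 training sequence is chosen for illustration and is not necessarily the sequence of Chapter 1. The channel is only the 3-tap RF channel $\frac{1}{\sqrt{3}}[1, 1, 1]$, so the Gaussian pulse shaping filter is ignored. All symbols are real valued, and the helper names `solve` and `ls_channel_estimate` are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ls_channel_estimate(t, r, L):
    """Return the LS estimate (Q'Q)^-1 Q'r and the Grammian Q'Q for real training symbols t.

    r must hold the received samples aligned with t[L], ..., t[len(t) - 1].
    """
    Q = [[t[k - i] for i in range(L + 1)] for k in range(L, len(t))]
    gram = [[sum(row[a] * row[b] for row in Q) for b in range(L + 1)] for a in range(L + 1)]
    rhs = [sum(row[a] * rk for row, rk in zip(Q, r)) for a in range(L + 1)]
    return solve(gram, rhs), gram

# Illustrative 26-symbol training sequence (assumption, not necessarily that of Chapter 1).
bits = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
t = [1.0 - 2.0 * b for b in bits]

L = 2
c_true = [3 ** -0.5] * 3            # the 3-tap RF channel (1/sqrt(3)) [1, 1, 1]
clean = [sum(c_true[i] * t[k - i] for i in range(L + 1)) for k in range(L, len(t))]

c_noiseless, gram = ls_channel_estimate(t, clean, L)

random.seed(1)
sigma = 10 ** (-15 / 20)            # noise std for roughly 15 dB SNR, assuming unit signal power
noisy = [x + random.gauss(0.0, sigma) for x in clean]
c_noisy, _ = ls_channel_estimate(t, noisy, L)

print(gram)                         # diagonal entries equal the number of rows of Q
print(c_noiseless, c_noisy)
```

Without noise the estimate should reproduce the channel up to rounding. With noise the taps should scatter around the true values, with a spread that shrinks as more training symbols are used. Inspecting `gram` shows how close the chosen sequence comes to the ideal $Q^\dagger Q = qI$.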

[Figure 6.3: The estimated impulse response $\tilde{c}$ and its z-plane representation. Left panel: tap setting [Volt] versus tap number. Right panel: imaginary part versus real part.]

Another important observation is that we used $26 - (L + 1)$ observations to estimate 6 taps. Thus the system is over-determined which makes the estimation relatively immune to noise. This choice is to satisfy the need to have a LS estimation rich in measurements while parameters are few.

6.6 Generalised Least Squares Estimation

In previous sections, we assumed that the noise covariance matrix $V$ with elements $V_{ij} = \mathrm{Cov}[n_i, n_j]$, with $n_i$ the noise sample at time $i$, was unknown. Hence we applied LS estimation in the form of the normal equations given by (6.18) since it does not require knowledge of $V$. However, after the CIR has been estimated via the LS method, we may take a second look at the baseband received model given by

$$ r = Q c + n \tag{6.19} $$

since with $r$, $Q$ and $c$ now available after the estimation, we may form an estimate of $n$ using training symbols. Thus, we may in turn estimate $V$. The question now arises how we may improve the estimate of $c$ by exploiting further knowledge of the noise covariance.

6.6.1 The generalised least squares procedure

Imagine we have a model for an experiment containing $Z$ realizations, in the form

$$ Y_i = \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_k x_{ik} + n_i \quad \forall\, i \in [1, 2, \cdots, Z]. \tag{6.20} $$

Denoting by $\Theta$ the vector of parameters to be estimated and by $y$ and $N$ the vectors containing the observations and noise samples respectively, we have

$$ y = X \Theta + N \tag{6.21} $$

where

$$
X = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{Z1} & x_{Z2} & x_{Z3} & \cdots & x_{Zk}
\end{bmatrix} \tag{6.22}
$$

is the matrix of set points of the $k$ input variables $x_1, x_2, x_3, \cdots, x_k$ during the $Z$ experiments. Let us assume the errors (noise) $N$ have zero mean, and covariance $V$. Given the actual observed responses $y$ from the $Z$ experiments, the generalised least squares estimate (GLSE) $\tilde{\Theta}$ is the one which minimises the quadratic form

$$ (y - X\tilde{\Theta})^\dagger V^{-1} (y - X\tilde{\Theta}) \tag{6.23} $$

with respect to $\tilde{\Theta}$. Differentiating and equating to zero produces the estimate

$$ \tilde{\Theta} = (X^\dagger V^{-1} X)^{-1} X^\dagger V^{-1} y. \tag{6.24} $$

How do we justify using the quadratic form given above? Well we argue that if the errors (noise) in the model have a multivariate Normal pdf with covariance $V$ then the log likelihood function of the parameters $\Theta$ is given by the quadratic form, apart from an additive constant. Thus if the errors were multivariate Normal, then the GLSE would be a MLE, which is encouraging, since this was not the case for the LSE in the previous sections.

Secondly, let us replace the estimates $\tilde{\Theta}$ by the corresponding estimators

$$ \tilde{Q} = (X^\dagger V^{-1} X)^{-1} X^\dagger V^{-1} y. \tag{6.25} $$

Then we may prove that the estimator $\tilde{Q}$ is such that the mean square error between $L = \lambda_1 \theta_1 + \lambda_2 \theta_2 + \cdots + \lambda_k \theta_k$ and

$$ \tilde{L} = \lambda_1 \tilde{q}_1 + \lambda_2 \tilde{q}_2 + \cdots + \lambda_k \tilde{q}_k \tag{6.26} $$

is minimised. Hence an arbitrary linear function of the parameters is estimated with minimum mean square error, without otherwise assuming or specifying the noise pdf. These two properties serve as justification for using the GLSE rather than the LSE, and in practice a small improvement is so obtained in Bit Error Rate performance of the receiver, especially at high SNR where the noise covariance is better estimated.

6.7 Conclusion

This chapter presented the key ideas behind channel estimation. It was shown that the unknown RF channel impulse response prevents us from knowing the channel perfectly, and that an estimation of the overall impulse response is inevitable. More so, the noise at the output of the channel is coloured, so that a maximum likelihood estimation is not possible. We thus concluded that least squares estimation is a viable alternative as it needs no statistical description of the noise.

We showed why the training sequence transmitted at the transmitter must have desirable autocorrelation properties, and concluded that a prefilter must follow the channel estimator to achieve maximum output SNR.

6.8 Assignment

1) In the module Main.m these lines of code appear:

    % Channel Estimates
    [ir,noise_s0] = ch_est_1s(transmitted,rx_4s(1:4:length(rx_4s)-3),26,7);
    rx = rx_4s(1:4:length(rx_4s)-3);

Remove this estimator based on LS theory, and create an estimator using the generalised LS estimator, using only the 26 training symbols located in the transmitted burst, that will estimate the overall channel impulse response ir (a 7 tap FIR filter) and feed that to the prefilter. You have as knowns the received sequence, and the known pilot/training sequence contained in the transmitted burst (26 symbols). Then plot the raw BER vs. Eb/No for the TU channel model at 50 km/h with fading and explain how you created the estimator. In your report, explain why the generalised LS estimator does not appear to produce better results than the standard LS estimator.


Chapter 7 Minimum Mean Square Error (MMSE) Estimation, Prefilter and Prediction

7.1 Introduction

In previous chapters we saw that for any given data burst we do not know the impulse response of the RF channel at the receiver. Hence the overall channel impulse response needs to be estimated, and consequently before this estimation a maximum output signal to noise ratio (SNR) cannot be achieved. We then indicated that an additional filter, referred to as a prefilter, is required to maximise the output SNR. We also showed that the estimated overall impulse response, denoted $c$, is typically not in minimum phase form. In practical terms this means that the energy in the leading taps, say $c[0]$, $c[1]$, is not maximised. Later it will become evident that this requirement plays a key role in reducing the complexity of the detector.

The last consideration is the fact that the noise present at the input of the prefilter is coloured. Although the noise covariance matrix can be estimated from the training sequence and can be used to modify the maximum likelihood metric in the detector, it is convenient to perform noise whitening also in the prefilter. Thus the detector will be presented a signal corrupted by white additive noise, which will be shown to simplify the optimal detector.

7.2 Minimum mean square error (MMSE) estimation

We studied LS estimators for channel estimation as we did not have available the noise covariance function after the receiver filter. However, given the overall channel estimate is now available, as well as a set of training symbols during each data burst, we may estimate the noise covariance function, and exploit that knowledge in an estimator. In general, given a set of measurements $y$ and a vector $x$ that we need to estimate given that $y$ contains information about $x$, we know from the Gauss-Markov theorem that the conditional mean of $x$ is a linear function of the measurement $y$ when $y$ and $x$ are jointly normal. This fundamental result

has many consequences. For example, we may set up a linear minimum mean square error estimator of $x$ where $x$ is forced to be a linear function of $y$, whether $x$ and $y$ are jointly normal or not. Using the least squares method we will not need the noise covariance function, but if we assume that we do have at least an estimate of the noise covariance function, can we exploit that information to make even better estimates? The answer to the question above is affirmative, and here is how we do it.

7.2.1 The principle of orthogonality

We are given $n$ random variables $x_1, x_2, x_3, \cdots, x_n$. The objective is to find $n$ constants $a_1, a_2, a_3, \cdots, a_n$ so that if we form a linear combination of the random variables with these constants to estimate another random variable, say $s$, the estimation error we make is minimised. The question is how can we set up a general approach to make sure that we choose the best set of constants? One way would be to apply the LS method of the previous chapter. First we set up the estimation formulation:

$$ \tilde{s} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \tag{7.1} $$

where $\tilde{s}$ is the estimate of $s$. Let us denote the Mean Square of the error $\epsilon = s - \tilde{s}$ as $P$, so $P$ is given by

$$ P = E\{ | s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n) |^2 \} = E\{ |\epsilon|^2 \}. \tag{7.2} $$

Note that this is now no longer merely a LS error: it involves the $E\{\}$ operator, hence the term Minimum Mean Squared Error (MMSE) criterion. The solution to choosing the best set of constants $a$ is to invoke the orthogonality principle:

Theorem 1 The MS error $P$ is a minimum if the constants $a$ are chosen so that the error $\epsilon$ is orthogonal to the data, i.e.

$$ E\{ \epsilon\, x_i^* \} = 0 \quad \forall\, i. \tag{7.3} $$

Application of the orthogonality principle then usually leads to a set of linear equations to be solved, yielding the optimal choice for $a$ in the MS sense. This step leads to the Wiener-Hopf equations, which are useful in designing MMSE based estimators. Some texts call this the Yule-Walker solution (see Papoulis), some call it the Wiener-Hopf equations. The mathematical form of these linear equations is thus given by

$$ E\{ [ s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n) ]\, x_i^* \} = 0, \quad i = 1, \cdots, n. \tag{7.4} $$

To further develop the theory, we now move on to matrix notation. We constrain the estimator of $s$ to be a linear function of $x$, i.e. $\tilde{s} = K x$, and we invoke the orthogonality principle

$$ E\{ (s - Kx)\, x^\dagger \} = 0. \tag{7.5} $$

Hence we find that $R_{sx^\dagger} - K R_{xx^\dagger} = 0$ and the Wiener-Hopf solution for the linear estimator follows as

$$ K = R_{sx^\dagger} R_{xx^\dagger}^{-1}. \tag{7.6} $$

This choice for $K$ minimises the mean squared error among all linear functions of $x$. There is no better linear function we can choose in the mean squared error sense.
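A small numerical case may help to see the orthogonality principle and the Wiener-Hopf solution at work. The following is a minimal sketch in Python (standard library only) under assumptions that are not in the notes. A real, zero mean variable $s$ is observed twice, $x_1 = s + n_1$ and $x_2 = s + n_2$, with independent noise of different variances. The function name `wiener_gain_2` and all numerical values are hypothetical.

```python
import random

def wiener_gain_2(var_s, var_n1, var_n2):
    """K = R_sx R_xx^{-1} for x1 = s + n1, x2 = s + n2 (real, zero mean, independent)."""
    # Second order statistics implied by the model.
    r11, r12, r22 = var_s + var_n1, var_s, var_s + var_n2   # entries of R_xx
    rs1, rs2 = var_s, var_s                                   # entries of R_sx
    det = r11 * r22 - r12 * r12
    # Row vector R_sx times the inverse of the symmetric 2x2 matrix R_xx.
    k1 = (rs1 * r22 - rs2 * r12) / det
    k2 = (rs2 * r11 - rs1 * r12) / det
    return k1, k2

random.seed(0)
var_s, var_n1, var_n2 = 1.0, 0.5, 2.0
k1, k2 = wiener_gain_2(var_s, var_n1, var_n2)

# Draw one set of samples and reuse it for every estimator that is compared.
samples = []
for _ in range(20000):
    s = random.gauss(0.0, var_s ** 0.5)
    x1 = s + random.gauss(0.0, var_n1 ** 0.5)
    x2 = s + random.gauss(0.0, var_n2 ** 0.5)
    samples.append((s, x1, x2))

def stats(g1, g2):
    """Sample estimates of E{eps^2}, E{eps x1}, E{eps x2} for the estimator g1*x1 + g2*x2."""
    n = len(samples)
    errs = [(s - g1 * x1 - g2 * x2, x1, x2) for s, x1, x2 in samples]
    mse = sum(e * e for e, _, _ in errs) / n
    c1 = sum(e * x1 for e, x1, _ in errs) / n
    c2 = sum(e * x2 for e, _, x2 in errs) / n
    return mse, c1, c2

mse_opt, c1, c2 = stats(k1, k2)      # Wiener-Hopf gains
mse_avg, _, _ = stats(0.5, 0.5)      # plain averaging, for comparison

print(k1, k2)
print(mse_opt, c1, c2, mse_avg)
```

With the Wiener-Hopf gains the sample correlations between the error and each observation should be close to zero, as the orthogonality principle requires. The mean squared error should also be lower than for plain averaging of the two observations. The less noisy observation receives the larger weight.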

There are assumptions that need to be made regarding the invertibility of $R_{xx^\dagger}$. What constraints must be put on the vector $x$ to guarantee that $R_{xx^\dagger}$ is invertible? What does it imply for $x$ in practical terms?

The reader may now well think that since we don't know $s$ the analysis led us nowhere. Actually it will become evident that it did lead us to an excellent estimation strategy. The trick is just to recognise that even though we don't know $s$, we know the statistical properties of $s$. For example, $R_{sx^\dagger}$ is the cross correlation matrix between the parameter to be estimated and the data, and this we may know a priori or we may be able to compute it, without knowing explicitly what $s$ looks like.

7.2.2 Geometrical interpretation of the principle of orthogonality

Assume an abstract space where the random variables span a subspace $S_n$. Then any linear combination of the random variables is a vector $\tilde{s}$ in subspace $S_n$. However, the random variable to be estimated, $s$, does not necessarily lie in this subspace $S_n$, as depicted in Figure 7.1. The projection from this vector $s$ onto the subspace $S_n$ yields the distance from $s$ to $\tilde{s}$, and represents the error vector $\epsilon$. To minimise the length of this error vector, we must make this projection orthogonal to the subspace. This is the geometrical manifestation of the principle of orthogonality.

[Figure 7.1: The principle of orthogonality. The estimate $\tilde{s} = a_1 x_1 + a_2 x_2$ lies in the plane spanned by $x_1$ and $x_2$; the error $\epsilon$ joins $\tilde{s}$ to $s$.]

As always, the best way to understand theory is to use it, as the student needs to be able to recreate the material for themselves. Thus in the next section, we will apply the MS estimation theory to the design of a linear filter known as a prefilter, and in the following section we will study channel tracking in GSM that is used when mobiles move very fast. A prefilter transforms the impulse response of a system into its minimum phase form, meaning the impulse response after the prefilter is dominated by its leading taps. This makes detection far more efficient. The student must be able to explain this statement clearly.

7.3 Applying minimum mean square error (MMSE) estimation: Let us design a linear prefilter

7.3.1 Matched filter, minimum phase filter and spectral factorisation

Consider the deployment of a matched filter, a feed-forward filter, a feedback filter and a decision device as shown in Figure 7.2. The prefilter is the combination of the matched filter and the feed-forward filter. Later, we may absorb the matched filter into the feed-forward filter, but for now, in order to develop the needed theory, let us stick to this arrangement where the matched filter and the feed-forward filters are kept separate. The feedback filter is the post prefilter CIR. We are interested in deriving the optimal form of the feed-forward filter, and it will become clear in the next chapter how this will reduce detector complexity.

[Figure 7.2: The representation of the matched filter, the feed-forward filter and the feedback filter. The prefilter consists of the matched filter c*[-n] (output y[n]) followed by the FIR filter f[n] (output z[n]); the feedback FIR filter with taps b[1], b[2], ... is driven by the known pilot symbols s[n] and its output is subtracted ahead of the decision device.]

Decision feedback is used only in the synthesis of the optimal prefilter coefficients. It should be emphasised at this point in our discussion that using decision feedback in no way implies that our detector will need to deploy decision feedback. By feeding back known pilot symbols, ISI due to channel memory is perfectly cancelled and an instantaneous decision on the current symbol may be made. By minimising the MSE between the instantaneous decision and the known training (pilot) symbols, the energy in the leading taps after the prefilter is maximised, which is what we are aiming for. In this way, the detector may be based on the leading taps where most of the energy is located.

These are qualitative statements, and we would like to formulate these in precise mathematical language. Denoting the post prefilter CIR by vector $b$, and the CIR prior to the prefilter as $c$, the following relationship will hold if the prefilter performs a minimum phase transformation:

$$ \sum_{n=0}^{q} |b[n]|^2 \geq \sum_{n=0}^{q} |c[n]|^2 \quad \forall\, q. \tag{7.7} $$

Hence, a non-minimum phase CIR energy per tap (discrete time) never increases faster than that of the corresponding transformed minimum phase CIR. To develop the exact form of the minimum phase transformation filter, we transform the problem to the z-domain. The sampled matched filter output in the z-domain is

$$ Y(z) = C^*(z)\, C(z)\, D(z) + C^*(z)\, N(z) \tag{7.8} $$

where $D(z)$ represents the transmitted symbols in the z-domain, $C(z)$ the estimated CIR in the z-domain, and $N(z)$ the noise in the z-domain. The z transformation of the autocorrelation of $c$ is nonnegative and hence there are no zeros in the power spectrum. In the z domain, we may factorise as

$$ C^*(z)\, C(z) = G(z)\, G^*\!\left(\frac{1}{z^*}\right). \tag{7.9} $$

$G(z)$ is canonical, i.e. it is causal, monic and minimum-phase (see the work by Forney 1973, reproduced in Proakis [2]). Hence we find that $G^*(\frac{1}{z^*})$ is anti-causal, monic and maximum-phase. After the matched filter the impulse response has both pre and post cursor components since $c^*[-n] * c[n]$ is symmetric with respect to $[n]$. The feed-forward filter should be chosen to cancel the precursors of the impulse response after the matched filter with respect to the current time instant $[n]$. Therefore theoretically the feed-forward filter should be chosen to have a z-domain form given by $F(z) = \frac{1}{G^*(\frac{1}{z^*})}$. Hence we have after the feed-forward filter a z-domain representation given by

$$ Z(z) = F(z)\, G(z)\, G^*\!\left(\frac{1}{z^*}\right) D(z) + F(z)\, C^*(z)\, N(z) = G(z)\, D(z) + \frac{C^*(z)}{G^*(\frac{1}{z^*})}\, N(z) \tag{7.10} $$

so that the post prefilter CIR is $G(z)$ which is causal and minimum phase. In practice we may find $G$ by assigning roots of $C^*(z)C(z)$ with magnitude greater than 1 (outside the unit circle) to the feed-forward filter (that is anti-causal), and the rest to the feedback filter (that is causal, where roots inside the unit circle imply stability). Thus, the feed-forward filter is anti-causal, the feedback filter is causal, and the impulse response is in minimum phase form. If we plot the poles and nulls of such an impulse response in the z-domain, there will be no nulls outside the unit circle. This follows from the formal definition of minimum phase, see for example Papoulis.

Now that we have determined that the feed-forward filter needs to be anti-causal, we take a look at the noise after the application of the matched and feed-forward filter. The post prefilter noise is given by $\frac{C^*(z)}{G^*(\frac{1}{z^*})} N(z)$ and is clearly non-white even if the noise fed to the prefilter was white to begin with. We thus identify the need to employ a post prefilter noise whitening filter, since white noise will simplify the operation of the detector [2]. Actually, the noise whitening filter will also be absorbed into the prefilter as explained in sections to follow.

However, $F(z) = \frac{1}{G^*(\frac{1}{z^*})}$ is an Infinite Impulse Response (IIR) filter in the time domain. Thus, if we wanted to approximate the IIR filter by a FIR filter, it would theoretically have to be infinitely long. In practice we would therefore need to truncate the FIR feed-forward filter to be finite, assuming that the IIR response has a decaying time response. We will now focus our attention on methods to find the coefficients of the approximating FIR filters of finite length with the appropriate noise whitening properties.

7.3.2 MMSE prefilter design

Let us apply the MS estimation theory to the design of a linear anti-causal filter known as a prefilter. A prefilter transforms the impulse response of a system into its minimum phase form, meaning the impulse response after the prefilter is dominated by its leading taps. The reasons for the filter to be anti-causal were given in a previous section.
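To make the minimum phase idea concrete, the following minimal sketch in Python (standard library only) builds a short real valued CIR from assumed zeros, reflects the zero that lies outside the unit circle to the inside, and compares the two responses. The zeros, the helper names and the choice to keep the scale factor are assumptions made for illustration. In particular the result `g` is not monic, so that the autocorrelation is preserved exactly. This is not the MMSE design procedure of this section, only an illustration of the target property.

```python
def polymul(a, b):
    """Multiply two polynomials in z^-1 (equivalently, convolve two tap vectors)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def from_zeros(zeros, gain=1.0):
    """Taps of gain * prod_k (1 - z_k z^-1) for real zeros z_k."""
    taps = [gain]
    for zk in zeros:
        taps = polymul(taps, [1.0, -zk])
    return taps

def autocorr(c):
    """Deterministic autocorrelation x[m] = sum_n c[n] c[n+m], m >= 0 (real taps)."""
    return [sum(c[n] * c[n + m] for n in range(len(c) - m)) for m in range(len(c))]

def partial_energy(c):
    """Running sum of tap energies, i.e. the quantity compared for each q."""
    total, out = 0.0, []
    for tap in c:
        total += tap * tap
        out.append(total)
    return out

zeros = [2.0, 0.5]                  # hypothetical CIR zeros; 2.0 lies outside the unit circle
c = from_zeros(zeros)               # non-minimum phase CIR: [1.0, -2.5, 1.0]

# Reflect each zero outside the unit circle to 1/z_k and scale by |z_k|,
# which leaves the magnitude response (and the autocorrelation) unchanged.
gain, reflected = 1.0, []
for zk in zeros:
    if abs(zk) > 1.0:
        gain *= abs(zk)
        reflected.append(1.0 / zk)
    else:
        reflected.append(zk)
g = from_zeros(reflected, gain)     # minimum phase equivalent: [2.0, -2.0, 0.5]

print(autocorr(c), autocorr(g))
print(partial_energy(c), partial_energy(g))
```

Both responses share the same autocorrelation, so they are indistinguishable after a matched filter. The partial energy of `g` is nevertheless never below that of `c` for any number of leading taps, which is the leading tap dominance the prefilter is designed to achieve.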

One way of designing a minimum phase prefilter, that is unconditionally stable, is to use the filter-detector combination as shown in Figure 7.3, where now we have absorbed the matched filter, the feed-forward filter and the post prefilter noise whitening filter into a single feed-forward filter.

[Figure 7.3: The representation of the MMSE-DF prefilter. The received samples r[n] pass through the anti-causal prefilter f to give z[n]; the feedback ISI, formed from past symbols with taps b[1], b[2], ..., is subtracted ahead of the decision device.]

Firstly, we feed back past symbols to eliminate ISI using the impulse response $b$ valid after the prefilter as given in (7.13), so that minimisation of the MSE of decisions tends to maximise energy in the leading feedback tap $b[0]$. This tends to yield a post prefilter impulse response that is leading tap dominant and consequently minimum phase. We cannot prove this assertion, since the FIR filters we are using in fact only approximate the theoretical IIR filters needed to guarantee minimum phase properties; however in practice we find that the CIR $b$ after the prefilter is in fact minimum phase if the lengths of the FIR prefilters are correctly chosen.

Secondly, we have an anti-causal feed-forward filter. The filter-detector combination architecture has been carefully chosen to guarantee that decisions made based on $\tilde{s}[n]$ are without delay, a property that has been shown to lead to reduced complexity soft bit detectors in earlier chapters. Realisability of an anti-causal feed-forward filter is possible by suitable delay in the receiver.

Thirdly, we have available after channel estimation an estimate of the noise samples $n_s[n]$, where we will assume that it is a Gaussian distributed sequence and has an autocorrelation function

$$ E\{ n_s[k]^*\, n_s[j] \} = \begin{cases} N_0\, x[j-k] & |k - j| \leq L \\ 0 & \text{otherwise} \end{cases} \tag{7.11} $$

where $E\{\}$ denotes the expectation operator.

We select to use an anti-causal prefilter with coefficients $f$ and we may represent the filter operation on the received sequence $r = \{r[n], r[n+1], \cdots\}^T$ as

$$ z = f^T r. \tag{7.12} $$

The sequence $z$ has an impulse response denoted by $b$ so that we may model the post prefilter baseband signal as

$$ z[n] = \sum_{i=0}^{L} b[i]\, s[n-i] + n_w[n] \tag{7.13} $$

where $n_w[n]$ is a whitened Gaussian noise process and $s = \{s[n], s[n-1], \cdots\}^T$ the transmitted symbol sequence. A procedure for choosing $f$ and $b$ based on MMSE criteria now follows.

The decision device can then make a hard decision $\hat{s}[n]$ since the correct past symbols $s[n-1], s[n-2], \ldots$ are fed back. Additionally, $s[n]$ can only take on a finite set of values (defined by the modulation alphabet), and we assumed $b[0] = 1$. With $b[0] = 1$ by definition, $\tilde{s}[n]$ must be at least an estimate of $s[n]$ for a decision device operating in the presence of noise. We argue that the best choice for $\mathbf{f}$ and $\mathbf{b}$, in the minimum mean squared error (MMSE) sense, assuming $b[0] = 1$, is the unique one that minimises the difference between the estimate $\tilde{s}[n]$ and $s[n]$. This is the best we may do to enable the decision device to make the correct decisions in the presence of noise as it yields estimates $\tilde{s}$ as close to $s$ as is possible with linear filter $\mathbf{f}$. Mathematically, this choice is given by

$$\min E\{|\epsilon[n]|^2\} = \min E\{|\tilde{s}[n] - s[n]|^2\} \tag{7.14}$$

where $\epsilon[n]$ is the instantaneous error. From Figure 7.3 we may derive an expression for $\epsilon[n]$ as

$$\epsilon[n] = \mathbf{w}^T \mathbf{y} - s[n] \tag{7.15}$$

where $\mathbf{w}$ and $\mathbf{y}$ are given by

$$\mathbf{w} = \{f[0]\; f[1]\; \cdots\; f[P]\;\; -b[1]\; \cdots\; -b[L]\}^T \tag{7.16}$$

$$\mathbf{y} = \{r[n]\; \cdots\; r[n+P]\;\; s[n-1]\; \cdots\; s[n-L]\}^T. \tag{7.17}$$

The MMSE can therefore be written as

$$\min E\{|\mathbf{w}^T \mathbf{y} - s[n]|^2\} \tag{7.18}$$

and the solution for $\mathbf{w}$ is given by the Wiener-Hopf equation as

$$E\{\mathbf{y}\mathbf{y}^\dagger\}\, \mathbf{w}^* = E\{s[n]^* \mathbf{y}\} \tag{7.19}$$

where $*$ indicates complex conjugate and $\dagger$ the Hermitian transpose. The solution $\mathbf{w}$ yields both the feed-forward filter $\mathbf{f}$ and feedback $\mathbf{b}$ coefficients jointly. The feedback coefficients $\mathbf{b}$ are the desired impulse response to be used with the final prefiltered signal z in the soft bit detector.

We now turn to the output SNR after the prefilter. As was stated in Chapter 1, the prefilter has as one of its objectives the maximisation of the output SNR (a property of a matched filter). We may define the output SNR as

$$\mathrm{SNR}_0 \propto \frac{\|\mathbf{b}\|^2}{E\{\|\epsilon\|^2\}}. \tag{7.20}$$

Since we assumed $b[0] = 1$ in our synthesis procedure, the numerator energy is fixed, while the denominator energy is minimised. Thus the SNR is maximised. This implies that the prefilter will act as a matched filter as well as transforming the impulse response to have dominant leading taps.

7.3.3 Evaluating matrix $E\{\mathbf{y}\mathbf{y}^\dagger\}$ and vector $E\{s[n]^* \mathbf{y}\}$

The matrix $E\{\mathbf{y}\mathbf{y}^\dagger\}$ may be written as

$$E\{\mathbf{y}\mathbf{y}^\dagger\} = E\begin{bmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{21} & \Psi_{22} \end{bmatrix}. \tag{7.21}$$

We shall derive expressions for each $\Psi$. Starting with $\Psi_{11}$ we have

$$\Psi_{11} = \begin{bmatrix} r[k]r^*[k] & r[k]r^*[k+1] & \cdots & r[k]r^*[k+P] \\ r^*[k]r[k+1] & r[k+1]r^*[k+1] & \cdots & r[k+1]r^*[k+P] \\ \vdots & \vdots & \ddots & \vdots \\ r^*[k]r[k+P] & \cdots & \cdots & r[k+P]r^*[k+P] \end{bmatrix}. \tag{7.22}$$

$\Psi_{12}$ is given by

$$\Psi_{12} = \begin{bmatrix} r[k]s^*[k-1] & r[k]s^*[k-2] & \cdots \\ r[k+1]s^*[k-1] & r[k+1]s^*[k-2] & \cdots \\ \vdots & \vdots & \ddots \\ r[k+P]s^*[k-1] & r[k+P]s^*[k-2] & \cdots \end{bmatrix} \tag{7.23}$$

and $\Psi_{21}$ by

$$\Psi_{21} = \begin{bmatrix} s[k-1]r^*[k] & s[k-1]r^*[k+1] & \cdots \\ s[k-2]r^*[k] & s[k-2]r^*[k+1] & \cdots \\ \vdots & \vdots & \ddots \\ s[k-L]r^*[k] & s[k-L]r^*[k+1] & \cdots \end{bmatrix}. \tag{7.24}$$

$\Psi_{22}$ is given by

$$\Psi_{22} = \begin{bmatrix} s[k-1]s^*[k-1] & s[k-1]s^*[k-2] & \cdots \\ s[k-2]s^*[k-1] & s[k-2]s^*[k-2] & \cdots \\ \vdots & \vdots & \ddots \\ s[k-L]s^*[k-1] & s[k-L]s^*[k-2] & \cdots \end{bmatrix}. \tag{7.25}$$

Vector $E\{s[n]^* \mathbf{y}\}$ is given by

$$E\{s^*[n]\mathbf{y}\} = E\begin{bmatrix} s^*[k]r[k] \\ s^*[k]r[k+1] \\ \vdots \\ s^*[k]r[k+P] \\ s^*[k]s[k-1] \\ s^*[k]s[k-2] \\ \vdots \\ s^*[k]s[k-L] \end{bmatrix}. \tag{7.26}$$

We now turn to the individual terms of these matrices and vector. We assume that the noise and data sequences are uncorrelated. First of all we require the term $E\{r[k]r^*[k]\}$. Using (6.4) we may write

$$E\{r[k]r^*[k]\} = E\left\{\left(\sum_{i=0}^{L} c[i]s[k-i] + n_s[k]\right)\left(\sum_{i=0}^{L} c^*[i]s^*[k-i] + n_s^*[k]\right)\right\} \tag{7.27}$$

hence

$$E\{r[k]r^*[k]\} = \sum_{i=0}^{L} |c[i]|^2 + E\{n[k]n^*[k]\}. \tag{7.28}$$

$E\{n[k]n^*[k]\}$ is the noise energy $N_0$. The term $E\{r[k]r^*[k+\alpha]\}$ is given by

$$E\{r[k]r^*[k+\alpha]\} = \sum_{i=0}^{L-\alpha} c[i]c^*[i+\alpha] + E\{n[k]n^*[k+\alpha]\}. \tag{7.29}$$

$E\{n[k]n^*[k+\alpha]\}$ is not zero as the noise is not white. It is given by (7.11). The inclusion of the noise covariance enables noise whitening, i.e. the output noise sequence $n_w[n]$ will be white and have an autocorrelation function which approximates a Kronecker delta function. There are two more terms we need to evaluate, namely $E\{s^*[k]r[k+\alpha]\}$ and $E\{s^*[k]s[k+\alpha]\}$. These are given by

$$E\{s^*[k]r[k+\alpha]\} = c[\alpha] \tag{7.30}$$

and

$$E\{s^*[k]s[k+\alpha]\} = \delta[\alpha] \tag{7.31}$$

where we assumed that the variance of the training sequence is unity.

7.4 A representative example

We select the Typical Urban RF channel model from GSM, with a transmitted pulse shaping filter which is Gaussian and was presented in Chapter 6, and we select a single burst to examine where each tap undergoes Rayleigh fading. We select an SNR of 10 dB. In the receiver we use a raised cosine filter with unity bandwidth and roll off factor 0.5. The overall impulse response $\mathbf{c}$ is shown in Figure 7.4, and the impulse response after the prefilter in Figure 7.5. In the latter figure the MMSE-DF synthesized filter $\mathbf{b}$ is shown along with a LS estimate of $\mathbf{b}$ performed after the prefilter. It is clear that the prefiltered impulse response is minimum phase, and the leading taps are dominant. We cannot prove that the MMSE-DF synthesis procedure will always yield a minimum phase impulse response, but in practice it is frequently observed because the synthesis procedure forces decisions without any delay, causing the leading taps to be dominant. A second interesting observation is that the estimated impulse response $\mathbf{b}$ indicates that $b[0]$ is in fact smaller than unity. We may show that in fact $b[0] = \mathbf{f}^T \mathbf{c}$ and the synthesis procedure is biased. Moreover, $\mathbf{c}$ itself is only an estimate of the actual overall impulse response.
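The synthesis procedure of Section 7.3 can be sketched numerically. The following is a minimal illustration, not the simulator's code: it fills the blocks of $E\{\mathbf{y}\mathbf{y}^\dagger\}$ and the vector $E\{s^*[n]\mathbf{y}\}$ from the expectations derived above and solves the Wiener-Hopf equation for $\mathbf{w}^*$. The function names `solve` and `mmse_df`, the argument `noise_acf` (the noise autocorrelation at lags 0, 1, ...) and the small Gauss-Jordan solver are assumptions made for this example only.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.
    Works for real or complex entries."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mmse_df(c, P, noise_acf):
    """Return (f, b) for CIR taps c[0..L], feed-forward order P and
    noise autocorrelation noise_acf[a] = E{n[k] n*[k+a]} (hypothetical input)."""
    c = [complex(x) for x in c]
    L = len(c) - 1

    def tap(i):
        return c[i] if 0 <= i <= L else 0j

    def rr(alpha):
        # E{r[k] r*[k+alpha]}; negative lags follow by conjugate symmetry
        a = abs(alpha)
        val = sum(c[i] * c[i + a].conjugate() for i in range(L - a + 1)) if a <= L else 0j
        val += noise_acf[a] if a < len(noise_acf) else 0.0
        return val if alpha >= 0 else val.conjugate()

    n = P + 1 + L
    R = [[0j] * n for _ in range(n)]
    for i in range(P + 1):
        for j in range(P + 1):
            R[i][j] = rr(j - i)                    # Psi_11: E{r[k+i] r*[k+j]}
        for j in range(1, L + 1):
            R[i][P + j] = tap(i + j)               # Psi_12: E{r[k+i] s*[k-j]}
            R[P + j][i] = tap(i + j).conjugate()   # Psi_21 is its Hermitian transpose
    for j in range(1, L + 1):
        R[P + j][P + j] = 1.0                      # Psi_22: unit-variance symbols
    p = [tap(i) for i in range(P + 1)] + [0j] * L  # E{s*[k] y}
    w = [x.conjugate() for x in solve(R, p)]       # the equation is solved for w*
    f = w[:P + 1]
    b = [1.0] + [-x for x in w[P + 1:]]            # w holds -b[1..L]; b[0] = 1
    return f, b


f, b = mmse_df([1.0, 0.5], P=4, noise_acf=[1e-6])
```

For this two-tap channel, which is already minimum phase, and almost no noise, the feed-forward filter collapses to a single leading tap and the feedback tap approaches c[1], as one would expect from the argument above.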

Figure 7.4: The overall impulse response c before the prefilter. (Panels: tap setting [Volt] versus tap number, and imaginary part versus real part.)

7.5 Stochastic processes and MMSE estimation

Many problems encountered in practice are of a stochastic nature. They vary with time, are unpredictable, and can assume a continuous value. For example, in any receiver the local oscillator will not be perfectly tuned to the carrier frequency and this will cause frequency offset. The frequency offset can be modelled in baseband by modifying the baseband signal representation as

$$r[n] = e^{j(\omega_o nT + \theta)} \sum_{k=0}^{L} h[k]\, d[n-k] + n_s[n], \tag{7.32}$$

where $\omega_o$ and $\theta$ represent the offset and are stochastic processes. The question is, can we estimate such an offset, for example in a GSM burst, and once it has been estimated, can we track it over time by predicting it into the future? These questions were studied in a general framework by Norbert Wiener (see footnote 7).

In this section we treat the MMSE problem for the continuous case. Generalisation to the discrete case is straightforward. Formally we wish to estimate the present value of $s(t)$, which is a stochastic process, in terms of the values of another process $x(t)$ which was specified for an interval $a \le t \le b$. The desirable linear estimate $\hat{s}(t)$ of $s(t)$ is an integral

$$\hat{s}(t) = E\{s(t)\,|\,x(\xi),\ a \le \xi \le b\} = \int_a^b h(\alpha)\, x(\alpha)\, d\alpha \tag{7.33}$$

where $h(\alpha)$ is a function to be estimated. The orthogonality principle can be applied to the estimation error $s(t) - \hat{s}(t)$ and we find

$$E\left\{\left[s(t) - \int_a^b h(\alpha)\, x(\alpha)\, d\alpha\right] x(\xi)\right\} = 0, \quad a \le \xi \le b. \tag{7.34}$$

Footnote 7: N. Wiener, "Extrapolation, Interpolation, and Smoothing of Stationary Time Series", MIT Press, 1950.

Figure 7.5: The overall impulse response b after the prefilter. (Panels: tap setting [Volt] versus tap number, showing the estimated IR b and the computed IR b, and imaginary part versus real part.)

which leads to

$$R_{sx}(t, \xi) = \int_a^b h(\alpha)\, R_{xx}(\alpha, \xi)\, d\alpha, \quad a \le \xi \le b. \tag{7.35}$$

This integral equation is usually solved numerically. In the rest of this chapter we assume that all processes to be investigated here are WSS and real - if they are complex the results still hold, except that the complex conjugate needs to be applied.

7.5.1 Prediction

1) We wish to estimate the future value $s(t+\lambda)$ of a stationary process $s(t)$ simply in terms of its present value:

$$\hat{s}(t+\lambda) = E\{s(t+\lambda)\,|\,s(t)\} = a\, s(t). \tag{7.36}$$

What is the optimal choice for $a$ given the stated assumptions? Applying the orthogonality principle we find

$$E\{[s(t+\lambda) - a\, s(t)]\, s(t)\} = 0 \tag{7.37}$$

which yields

$$a = \frac{R(\lambda)}{R(0)}. \tag{7.38}$$

So it turns out that the optimal choice for $a$ is based on the correlation properties of the process $s(t)$. Isn't that illuminating? If $s(t)$ was completely random, in the sense that it has a white spectrum, how would $R$ look and how would the predictor be able to predict the future value? Can we thus predict the future of such a process?

2) Let us now assume we want to improve on this prediction by not using only its present value but also its first derivative $s'(t)$. Then how would we go about reformulating the prediction equation? Well, we will have two unknowns and set up the predictor linearly as

$$\hat{s}(t+\lambda) = E\{s(t+\lambda)\,|\,s(t), s'(t)\} = a_1 s(t) + a_2 s'(t). \tag{7.39}$$

Again we ask the question: what are the optimal choices for $a_1$ and $a_2$? Again applying the orthogonality principle $s(t+\lambda) - \hat{s}(t+\lambda) \perp s(t), s'(t)$, and given the identities $R'(0) = 0$, $R_{ss'}(\tau) = -R'_{ss}(\tau)$ and $R_{s's'}(\tau) = -R''_{ss}(\tau)$, we find

$$a_1 = \frac{R(\lambda)}{R(0)} \tag{7.40}$$

and

$$a_2 = \frac{R'(\lambda)}{R''(0)}. \tag{7.41}$$

Now isn't that illuminating? So the optimal choices in this case, where the derivative is used as well as the current value of the process, involve the derivatives of the correlation function. For most processes, say $s(t)$, $R_{ss}(\tau)$ has a decaying form, i.e. it is dominated by the values at $\tau = 0$. So if $\lambda$ becomes large, of the two terms, which one will dominate? Study an example of your choice.

3) Next up we study interpolation. This is an important topic, and we will apply this to the GSM simulator and a real world problem to make the point - in the process the student will re-create the material for herself. Formally we wish to estimate the value $s(t+\lambda)$ of a process $s(t)$ in terms of its $2N+1$ samples that are nearest to $t$, as shown in Figure 7.6.

Figure 7.6: The interpolation problem. (Sketch: samples of s(t) spaced T apart from t-NT to t+NT, with the point s(t+λ) to be estimated lying between t and t+T.)

The key point one needs to get here is that we think that all the nearest N points known/given to us may contribute to the optimal estimate of $s(t+\lambda)$. So the most general linear interpolator we can set up is

$$\hat{s}(t+\lambda) = \sum_{k=-N}^{N} a_k\, s(t+kT), \quad 0 < \lambda < T. \tag{7.42}$$

Are we in agreement? The reader must be convinced that there is no better linear estimator we can come up with. Now that we have formulated the optimal estimator, how do we decide what the values of $a_k$ must be? To answer this question, think about what we would do using a LS approach. Is that the best we can do? How about if we knew the correlation properties of the process $s(t)$? Can we do better using that information? That is exactly what MMSE estimation will do for us: it will exploit the correlation properties of the process $s(t)$. Let us develop the best values of $a_k$ in terms of the MMSE approach.

First, using the orthogonality principle we find

$$E\left\{\left[s(t+\lambda) - \sum_{k=-N}^{N} a_k\, s(t+kT)\right] s(t+nT)\right\} = 0 \quad \forall\ |n| \le N \tag{7.43}$$

from which it follows

$$\sum_{k=-N}^{N} a_k\, R(kT - nT) = R(\lambda - nT), \quad -N \le n \le N. \tag{7.44}$$

This constitutes a set of linear equations that can be solved for $a_k$, and these will be optimal in the MMSE sense. Again, the key ingredient in MMSE estimation over LS estimation is that we use the correlation properties of the process $s(t)$ in the former.
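As a small numerical illustration of (7.44), the sketch below builds the $(2N+1) \times (2N+1)$ system and solves it for the coefficients $a_k$. It is only a sketch under stated assumptions: the exponential autocorrelation `R_exp`, the helper `solve` and the function name `mmse_interpolator` are choices made for this example and do not come from the GSM simulator. With N = 0 the system reduces to the one-tap predictor $a = R(\lambda)/R(0)$ of (7.38).

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mmse_interpolator(R, lam, T, N):
    """Coefficients a_k, k = -N..N, solving sum_k a_k R(kT - nT) = R(lam - nT)."""
    idx = range(-N, N + 1)
    A = [[R(k * T - n * T) for k in idx] for n in idx]
    rhs = [R(lam - n * T) for n in idx]
    return solve(A, rhs)


def R_exp(tau):
    """Assumed autocorrelation for the example: R(tau) = exp(-|tau|)."""
    return math.exp(-abs(tau))


a = mmse_interpolator(R_exp, lam=0.5, T=1.0, N=1)       # a[0], a[1], a[2] <-> k = -1, 0, 1
a_pred = mmse_interpolator(R_exp, lam=0.5, T=1.0, N=0)  # one-tap predictor
```

For this particular autocorrelation the sample at $t - T$ receives zero weight and the two samples that bracket $t + \lambda$ share the weight equally when $\lambda = T/2$. A different $R(\tau)$ will spread the weights differently, which is exactly the information a LS fit does not use.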

7.6 Assignments

1. This is a first class major assignment - be prepared to spend significant time on it. Download and read the work by J. Cioffi and N. Al-Dhahir on MMSE prefilter design (from IEEE Xplore), in addition to the material presented in these notes. Then create your own prefilter function for the simulator, and show some results that indicate yours produces the same results as mine that came with the simulator. Next, perform a LS optimization of the minimum phase prefilter in the GSM simulator (http://opensource.ee.up.ac.za/), i.e. write new code based on LS and replace the built-in prefilter function with your LS (see footnote 8).
a. Plot BER vs. thermal noise for both the MMSE and LS prefilters.
b. Plot BER vs. co-channel interference for both the MMSE and LS prefilters with thermal noise insignificant (Eb/No = 100 dB).
c. Plot BER vs. adjacent channel interference for both the MMSE and LS prefilters with thermal noise insignificant (Eb/No = 100 dB).
d. You need to explain the results you will find (see footnote 9).

2. This is a first class major assignment - be prepared to spend significant time on it. In the GSM simulator, you will notice that the GSM burst (see earlier chapters in these notes for an explanation) has been set up so that the pilot symbols (26 of them) are placed in the middle of the burst - and that is where the channel impulse response (CIR) is estimated. The estimated CIR is then used anywhere in the burst for detection (equalization). The simulator uses GSM in the frequency hop mode, so that the CIR for each burst is totally different. Figure 7.7 depicts the difference between GSM with and without frequency hop.
a. Modify the simulator to disable frequency hop, and make sure the fading is continuous. Verify this by plotting the fading over many bursts. Tip: for that you will need to save the state of the fading simulator and feed it as an additional input to subsequent calls to the fading simulator so that the fading becomes continuous over time (multiple bursts).
b. Set the channel selection to flat fading, i.e. make the dispersion profile have 1 single tap; we call the fading Rayleigh fading. If the simulator doesn't support it, add it.
c. Now set the simulator input file to a very high Doppler speed equivalent to 250 km/h. This is the sort of speed trains in Europe regularly achieve, and GSM networks are supposed to cover moving trains.
d. Confirm for yourself that the CIR is now varying over the burst, and the assumption made by the equalizer that the CIR is static is invalid.
e. Simulate the raw BER for flat fading at 250 km/h over a range of SNR values.
f. Numerically, determine the correlation properties of the fading for each tap after the prefilter.
g. Now use MMSE estimation theory to estimate/predict the CIR over the burst (i.e. track it over the burst) using enough samples from previous bursts, and you decide how many samples you need to use.
h. Now use this predicted CIR, which is a function of time, to equalize - you need to change the detector code to incorporate this CIR which is a function of time.
i. Re-simulate the raw BER - verify that you achieve significant gain for 250 km/h.
j. Simulate at 3 km/h, and verify that you do not lose performance relative to the static assumption case.

Figure 7.7: GSM channel fading with and without frequency hop. (Panels: fading process s(t) without frequency hop and with frequency hop, each showing the value estimated from the pilots in the middle of a burst.)

3. This is a first class major assignment - be prepared to spend significant time on it. Repeat Assignment 8 but use a LS approach. Compare the results using LS estimators to that using MMSE estimators. Explain any differences in results. Tip: LS estimators can be made very sophisticated - for example see the work by Olivier and Xiao (see footnote 10).

Footnote 8: If you want to you may use the work by Olivier and Xiao (J.C. Olivier and C. Xiao, "Joint Optimization of FIR Pre-Filter and Channel Estimate for Sequence Estimation", IEEE Trans. Communications, Volume 50, Issue 9, Sept. 2002, pages 1401-1404) on LS prefilter design available from IEEE Xplore - the results in that paper you will not be able to reproduce since it used oversampling, while the simulator for this assignment is based on 1 sample per symbol (S/S).

Footnote 9: Hint - to explain the results for co-channel interference, you may need to read up on cyclo-stationary noise (such as co-channel interference) and think about the impact that has on the LS prefilter that whitens the noise/interference without knowledge of the noise covariance matrix. Conversely, for adjacent channel interference the noise covariance matrix used in MMSE design is only estimated - what does that imply?

Footnote 10: On IEEE Xplore or the journal version: C. Xiao and J.C. Olivier, "Nonselective fading channel estimation with non-uniformly spaced pilot symbols," International Journal of Wireless Information Networks, vol. 7, no. 3, pp. 177-185, 2000.


Chapter 8

Information Theory and Error Correction Coding

The modern marvel of Digital Communication systems would not be possible without the concepts developed in information theory and the theory of error correction coding. Information theory was independently developed by Claude Shannon in the late 40's, and spawned an entire new discipline, with its own industry, which is known today as Digital Communication systems. Information theory promised us the possibility of error free communications, given certain conditions are met. Shannon proposed that a channel's capacity to carry information must be measured in bits, since the bit is the essence of information. The fundamental property of all channels that limits the rate and quality of communication is called the capacity and denoted C. So C has units, bits/second/Hz. Imagine we have a source of information that we wish to send over a channel. Shannon proposed his famous noisy channel theorem as follows.

Theorem 2 (Shannon's Noisy channel theorem) If the desired rate of communication is R bits/second/Hz, then an encoding and decoding device in principle exist, such that if the encoding device is used in the transmitter, and the decoding device in the receiver, the communication system will be able to achieve arbitrarily small bit error rate, if and only if the rate $R < C$, the capacity of the channel.

Shannon did not tell us how to design a suitable encoder and decoder, hence the science of error correction coding developed after the seminal work of Shannon, and is still continuing today. Many wireless standards still employ so-called convolutional codes for their ease of operation and decoding, and remarkable resistance to fading. GSM is an example of such a standard, where a convolutional code is used with a constraint length of 7. Recently, Low Density Parity Check (LDPC) codes were rediscovered, after having been forgotten when invented in 1962 by R. Gallager (at which time they were un-implementable), and have been shown to reach the limits set by Shannon within fractions of a dB on the static Gaussian channel. The LDPC codes are in the class of linear block codes. This chapter presents an introduction to the theory of linear block codes, and a relatively detailed account of convolutional codes with a state of the art decoder, based on the method of Viterbi (min-sum).

8.1 Linear block codes

All the work in this section assumes that mathematics takes place over a finite field called GF(2), named after the French mathematician Galois.

8.1.1 Repetition codes

The most intuitive way of encoding information at the transmitter is simply to send the same information more than once - a repetition code. The receiver's decoding function then simply has the job of using the multiple copies to figure out what the transmitter intended to communicate. The encoded message sent by the transmitter according to 3 times repetition, for an information vector of say $\mathbf{x} = [101]$, is given by $\mathbf{z}$ as

$$\mathbf{z} = [101\ 101\ 101]. \tag{8.1}$$

How can we transform a vector $\mathbf{x}$ to a vector $\mathbf{z}$ that is 3 times as long? With a matrix that is not square. Let us denote the repetition encoder by a matrix $\mathbf{G}$, hence mathematically

$$\mathbf{z} = \mathbf{x}\mathbf{G}. \tag{8.2}$$

How would $\mathbf{G}$ look in this particular case where we have a 3 times repetition code? The answer is in fact trivial and given by

$$\mathbf{G} = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}. \tag{8.3}$$

Verify that the matrix $\mathbf{G}$ given above does the job it proclaims to do.

8.1.2 General linear block codes

Obviously, we can make weird codes different from $\mathbf{G}$ given above for repetition codes by changing the generator matrix $\mathbf{G}$. Imagine a random code for the same information sequence $\mathbf{x}$, say

$$\mathbf{G}_r = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}. \tag{8.4}$$

Using elementary operations (i.e. adding rows, multiplying rows with scalars, and permutation of columns), we may reduce this matrix to the systematic form

$$\mathbf{G}_c = \begin{bmatrix} 1 & 0 & 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} & p_{16} \\ 0 & 1 & 0 & p_{21} & p_{22} & p_{23} & p_{24} & p_{25} & p_{26} \\ 0 & 0 & 1 & p_{31} & p_{32} & p_{33} & p_{34} & p_{35} & p_{36} \end{bmatrix}. \tag{8.5}$$

It is called systematic because the uncoded bits appear as is in the coded bit string. The mathematical properties of the systematic form matrix are identical to the original matrix, i.e. we did not add nor remove any information when reducing a matrix to the systematic form. Hence we may write $\mathbf{G} = [\mathbf{I}\,|\,\mathbf{P}]$. $\mathbf{P}$ is called the parity check bits, and $\mathbf{I}$ is an identity matrix. The columns of the parity check bits must be linearly independent, otherwise the matrix is rank deficient.

Do such codes achieve better BER than the repetition code? Generally yes - repetition codes are not good codes. Finding the code that will produce the best BER at the receiver comes down to choosing the best parity check matrix. There are many block codes invented over several decades, such as the Hamming, Reed-Solomon and BCH codes [2], but it turns out that the best block codes, able to achieve BER performance close to the Shannon limits, are simply systematic codes with randomly chosen parity check bits. However, to approach the Shannon limits the size of the matrix $\mathbf{G}$ must become very large (i.e. $\mathbf{x}$ must become very long) so that many bits must be coded together in a large frame. In such a case the BER performance of the code approaches the Shannon limits, but the decoding complexity becomes prohibitively large, since the decoding problem for this case is NP complete. Generally it was thought the situation is hopeless until Robert Gallager was able to show in 1962 that the decoding problem may be practically solvable, if and only if the matrix $\mathbf{P}$ is sparse. For example the columns of $\mathbf{P}$ contain only 3 ones (the rest are zero) regardless of the size of $\mathbf{P}$. These codes, called Low Density Parity Check codes, have in fact achieved the Shannon limit within a fraction of a dB using frames that contain $10^5$ bits, with decoding performed by the Pearl belief propagation algorithm.
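To make the generator-matrix view concrete, here is a minimal sketch of encoding $\mathbf{z} = \mathbf{x}\mathbf{G}$ over GF(2) for the 3 times repetition code, together with a simple majority-vote decoder. The function names `encode` and `majority_decode` are made up for this illustration, and majority voting is just one obvious way for the receiver to use the multiple copies.

```python
def encode(x, G):
    """Compute z = x G with all arithmetic modulo 2 (GF(2))."""
    return [sum(xi * gij for xi, gij in zip(x, col)) % 2 for col in zip(*G)]


# Generator matrix of the 3 times repetition code: three identity blocks side by side.
G_REP3 = [
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1],
]


def majority_decode(z, k=3, reps=3):
    """Decide each source bit by majority vote over its `reps` received copies."""
    return [1 if sum(z[i + k * r] for r in range(reps)) > reps // 2 else 0
            for i in range(k)]


z = encode([1, 0, 1], G_REP3)                           # the codeword of (8.1)
x_hat = majority_decode([1, 0, 1, 1, 1, 1, 1, 0, 0])    # two received bits in error
```

Replacing `G_REP3` by any other binary generator matrix gives the corresponding linear block encoder; only the decoder is specific to repetition.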

8.1.3 Decoding linear block codes using the Parity Check matrix H

Decoding the linear block codes uses the Parity Check matrix $\mathbf{H}$ with the property that

$$\mathbf{G}\mathbf{H}^\dagger = \mathbf{0} \tag{8.6}$$

where $\dagger$ denotes the transpose. Any codeword $\mathbf{z}$ that the transmitter transmits, computed as $\mathbf{x}\mathbf{G}$, is orthogonal to any row of the matrix $\mathbf{H}$. Mathematically

$$\mathbf{z}\mathbf{H}^\dagger = \mathbf{0}. \tag{8.7}$$

For the systematic matrix $\mathbf{G} = [\mathbf{I}\,|\,\mathbf{P}]$, $\mathbf{H}$ is given by

$$\mathbf{H} = [\mathbf{P}^\dagger\,|\,\mathbf{I}]. \tag{8.8}$$

We can use this fact in many ways. One way is to use it in the Pearl Belief Propagation algorithm, which produces near optimal results. However, to gain insight into the decoding process we will consider a sub-optimum decoding procedure here. First we recognize that the detector sends the decoder not only estimates of the detected bits, but also probability information about the reliability of those bit estimates. So we may use the hard bit estimates $\tilde{\mathbf{z}}$ from the detector, to test if it is a valid code word, by computing (8.7). If we find that $\tilde{\mathbf{z}}\mathbf{H}^\dagger = \mathbf{0}$ the detected codeword $\tilde{\mathbf{z}}$ contains no errors, and the decoding job is completed since the code is systematic (why is that so?). If we don't find zero, we look at the probability information also provided by the detector, and change the bit values of the bits with probabilities closest to 0.5, since those are most likely the incorrect bits. Then we test the modified vector $\tilde{\mathbf{z}}$ by checking if now $\tilde{\mathbf{z}}\mathbf{H}^\dagger = \mathbf{0}$. If it is, decoding is done, else we change more bits with probability close to 0.5.

Decoding the block code

Let us now look at an example of sub-optimal decoding for the block code. The encoder uses an encoder generator matrix given by

$$\mathbf{G} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}. \tag{8.9}$$

What is the rate of the encoder? It produces 7 coded bits for $\mathbf{z}$ if the uncoded vector $\mathbf{x}$ from the source has 4 bits. So it is a rate $R = \frac{4}{7}$ code. We may compute $\mathbf{H}$ as

$$\mathbf{H} = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}. \tag{8.10}$$

The modulator used was a BPSK modulator, while the channel had 3 taps. The transmitter transmitted $\mathbf{x}$ that contains 4 bits. The decoder in the receiver is given the matrix for $\tilde{\mathbf{z}}$ and asked to determine what $\mathbf{x}$ was. The detector, based on a Viterbi algorithm with sub-optimal probability estimates, produced an estimate for $\tilde{\mathbf{z}}$ as

$$\tilde{\mathbf{z}} = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ 0.9 & 0.9 & 0.8 & 0.51 & 0.58 & 0.9 & 0.7 \end{bmatrix}. \tag{8.11}$$

The first row is the bit values and the second row is the probability of that decision being correct. First of all

we know that the vector $\mathbf{x}$ has to have 4 bits. To make sure that the hard decisions $\tilde{\mathbf{z}}$ are not correct (because if they were then decoding is not needed) we compute

$$[1, 0, 0, 1, 0, 0, 1]\, \mathbf{H}^\dagger = [111] \neq \mathbf{0}. \tag{8.12}$$

Clearly the detector produced hard bits that were wrong - if they were without errors the test above would have produced $\mathbf{0}$. To decode the received vector, we now determine the most likely bit in error, which we can see is bit number 4 in $\tilde{\mathbf{z}}$ since it has a probability of 0.51, the closest to 0.5 of all the probabilities. So we compute

$$[1, 0, 0, \mathbf{0}, 0, 0, 1]\, \mathbf{H}^\dagger = [100] \neq \mathbf{0} \tag{8.13}$$

where the modified bit is indicated in bold. Again we did not find $\mathbf{0}$ so there is still an error, however generally speaking there was some improvement since fewer bits are in error. Let us change the next most likely bit to be wrong; that would be bit 5 in $\tilde{\mathbf{z}}$ with a probability of 0.58. Hence we compute

$$[1, 0, 0, 0, \mathbf{1}, 0, 1]\, \mathbf{H}^\dagger = [000] = \mathbf{0} \tag{8.14}$$

and we conclude that the transmitter must have transmitted

$$\mathbf{z} = [1, 0, 0, 0, 1, 0, 1] \tag{8.15}$$

so that

$$\tilde{\mathbf{x}} = [1, 0, 0, 0] \tag{8.16}$$

because the code is systematic and thus the first 4 bits are the source bits. Sub-optimal decoding is now completed. The reader may now appreciate what the probability information provided by the Viterbi (min-sum) detector is useful for - without it we would have had to change every possible combination of the 7 bits, a task that becomes impossible to perform even for moderate sizes of $\mathbf{G}$.

8.2 Convolutional codes and Min-Sum (Viterbi) decoding

Convolutional codes are probably some of the most widely used codes in use today. GSM and its derivatives for data communications such as EDGE use convolutional codes. The reason is that the convolutional codes are easy to implement and to decode, and can be efficiently and optimally decoded by the min-sum (Viterbi) algorithm. Moreover, in recent times (early 1990's) it has been shown that near optimal codes can be derived from convolutional codes if two such codes are combined and iteratively decoded (so-called turbo codes [2]). Figure 8.1 shows a typical convolutional rate $R = \frac{1}{3}$ code (the encoder) with 3 taps, and its state diagram. The state diagram has states corresponding to the two memory elements of the 3 taps in the encoder. So in general we associate $2^{L-1}$ states with a convolutional code with L taps. The state diagram shows the transitions that the encoder will undergo if fed by a 1 by dotted lines, and the transitions that the encoder will undergo if fed by a 0 as input by solid lines. Each time, the encoder produces 3 output encoded bits (associated with the edges in the trellis) after having been fed 1 uncoded bit, because it is a rate $R = \frac{1}{3}$ code. Those bits are also indicated on the state diagram.
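Returning briefly to the block code example of Section 8.1.3, the sub-optimal procedure can be sketched in a few lines: compute the syndrome $\tilde{\mathbf{z}}\mathbf{H}^\dagger$, and while it is non-zero flip the not-yet-flipped bit whose decision probability is closest to 0.5. The function names `syndrome` and `flip_least_reliable` are invented for this sketch. As in the text the flips are cumulative, and nothing guarantees that this greedy search reaches a valid codeword in general.

```python
G = [[1, 0, 0, 0, 1, 0, 1],
     [0, 1, 0, 0, 1, 1, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 0, 1, 1]]

H = [[1, 1, 1, 0, 1, 0, 0],
     [0, 1, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]


def syndrome(z, H):
    """Compute z H^T modulo 2; all zeros means z is a valid codeword."""
    return [sum(a * b for a, b in zip(z, row)) % 2 for row in H]


def flip_least_reliable(z_hat, prob, H, k):
    """Flip bits in order of increasing reliability until the syndrome is zero.

    prob[i] is the probability (>= 0.5) that hard decision z_hat[i] is correct.
    Returns the first k bits of the codeword found, or None if none was found.
    """
    z = list(z_hat)
    order = sorted(range(len(z)), key=lambda i: prob[i])  # closest to 0.5 first
    for i in order:
        if not any(syndrome(z, H)):
            break
        z[i] ^= 1
    return z[:k] if not any(syndrome(z, H)) else None


z_hat = [1, 0, 0, 1, 0, 0, 1]
prob = [0.9, 0.9, 0.8, 0.51, 0.58, 0.9, 0.7]
x_hat = flip_least_reliable(z_hat, prob, H, k=4)
```

Running the sketch on the detector output of the example flips bit 4 and then bit 5, which mirrors the three syndrome tests of the worked example.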

Figure 8.1: The convolutional encoder and state diagram. (Encoder: the input uncoded bit feeds two delay elements τ and three modulo-2 adders producing output bits α, β and γ. State diagram: states 00, 01, 10 and 11, with each edge labelled by its three output bits and drawn according to whether the uncoded bit is 0 or 1.)

8.2.1 Decoding the convolutional codes

The encoded bits are fed to the modulator, where they are modulated into analogue symbols according to some chosen modulation scheme (QPSK, 8PSK, BPSK etc.). These are transmitted over a noisy multipath channel, and after being matched filtered the signal is fed to the detector. The detector has the job of reversing the multipath channel, and estimating the encoded bits $\tilde{\mathbf{z}}$. These estimates are not only the bits themselves, but also the probabilities. These are now fed to the decoder (see Figure 1.4). For hard decision decoding, the decoder just needs the hard bit values, not the probabilities. The decoder will then base its estimates of $\tilde{\mathbf{x}}$ on the Hamming distance between the bit estimates from the detector and the candidate transmitted bits in the decoder trellis, but hard decision decoding is very suboptimal as it doesn't use the probability information supplied by the detector. Hence we will rather look at the optimal soft bit decoding where all information available to the decoder is exploited.

First of all, since the encoder has 3 taps, we may set up a trellis as we did also for the 3 tap detector, as shown in Figure 8.2. The trellis has states (1, 1), (1, −1), (−1, 1), and (−1, −1) corresponding to the possible bits in its 2 tap memory as ("1, 1"), ("1, 0"), ("0, 1"), and ("0, 0"). For each edge connecting two states (if it is a legal connection, i.e. the encoder was able to move/transition between the 2 states) a cost, say $\delta_i$, has to be computed. For the detector this was computed as the Euclidean distance between the observed complex number coming from the demodulator and matched filter, and the candidate symbols convolved with the impulse response. For the decoder we also compute a Euclidean distance metric, but in this case simply the accumulated Euclidean distance between the 3 observed soft bits (defined below) and the candidate bits given by the edge in question in the trellis.

Figure 8.2: The convolutional decoder trellis based on the state diagram. (Trellis: the soft bits $z_s[0], z_s[1], \ldots$ are consumed three at a time, each edge carries its candidate bits α, β, γ and a cost Δ, and the winner path is the one of least cost.)

The soft bit is formed by the decoder itself using information from the detector as

$$\tilde{z}^{\mathrm{soft}} = (2\tilde{z} - 1)\left|\ln\frac{P_{\tilde{z}='1'}}{P_{\tilde{z}='0'}}\right| \tag{8.17}$$

where $P_{\tilde{z}='1/0'}$ is the probability that the estimated bit was a "1/0" as provided by the detector, and $\tilde{z}$ is the hard bit value (i.e. 0/1) from the Viterbi detector. Hence the cost, say $\delta_i$, for an edge in this trellis (for a rate $R = \frac{1}{3}$ code) is given by

$$\delta_i = |\tilde{z}^{\mathrm{soft}}_i - \alpha^i|^2 + |\tilde{z}^{\mathrm{soft}}_{i+1} - \beta^i|^2 + |\tilde{z}^{\mathrm{soft}}_{i+2} - \gamma^i|^2 \tag{8.18}$$

where $\alpha$, $\beta$, $\gamma$ denote the 3 bits that are associated with the edge in question. Note that each edge in the trellis 'consumes' 3 bits from the detector in the metric calculation. That is because it is a rate $R = \frac{1}{3}$ code, i.e. 3 coded bits are delivered by the encoder for each uncoded bit.

How would the symbol rate $\frac{1}{T}$ and the uncoded bit rate, say $\frac{1}{\tau}$, relate? It depends on the modulation scheme used. If the modulator uses BPSK then the symbol rate would be three times the uncoded bit rate $\frac{1}{\tau}$. For example if we use the rate $R = \frac{1}{3}$ code, and we used an 8PSK modulation scheme, then the symbol rate and uncoded bit rate would be the same.

The power of soft decision decoding becomes obvious if we recognize that of the three terms in the $\delta$ calculation the ones that conflict with a bit of high certainty from the detector are independently punished. Also, when a bit from the detector is uncertain, then the relevant term in the expression for $\delta$ becomes fixed regardless of the trellis state, i.e. that term becomes benign and does not influence the optimal path that the Viterbi algorithm will choose.

After the trellis is populated with all the $\delta$'s we apply the min-sum (Viterbi) algorithm to find the path through the trellis with least cost. Back tracing it yields the states the encoder went through over the entire frame that was encoded. How does that yield $\tilde{\mathbf{x}}$? Simply by recognizing that the uncoded bit required to make the encoder move from one state to the next is unique, and so with the estimated states known the estimated uncoded bits $\tilde{\mathbf{x}}$ can be read off from the state diagram.

The reader may realize that the decoder based on Viterbi decoding has many things in common with the detector studied previously. For example here we also only know the hard bits $\tilde{\mathbf{x}}$ (unless we do extra work). These estimated hard bits are fed to the de-compression device to form the

original source, be it a voice, an image or just a data file. The de-compression device is assumed to only require hard bits. If it did require soft bits, the decoder can be modified to provide also a probability estimate, or a different decoder algorithm can be used that is able to provide optimal soft bit information, such as the BCJR [3] algorithm.
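The soft-decision decoding described in Section 8.2.1 can be sketched as follows. This is only an illustration under stated assumptions: the tap connections in `GENS` are hypothetical (the connections of the encoder in Figure 8.1 are not reproduced here), the trellis is assumed to start in state "0, 0" and is not terminated, and survivor paths are stored explicitly instead of back tracing. The soft bits follow the log-ratio definition given earlier and the edge cost is the sum of three squared distances to candidate bits mapped to ±1.

```python
import math

# Hypothetical tap connections on (current bit u, previous bit m1, bit before that m2)
# for the three outputs alpha, beta, gamma of a rate 1/3, 3 tap encoder.
GENS = ((1, 1, 1), (1, 0, 1), (1, 1, 0))


def edge_bits(u, m1, m2):
    """The three coded bits produced when bit u enters the encoder in state (m1, m2)."""
    return [(g[0] & u) ^ (g[1] & m1) ^ (g[2] & m2) for g in GENS]


def encode(bits):
    """Rate 1/3 convolutional encoder starting from the all-zero state."""
    m1 = m2 = 0
    out = []
    for u in bits:
        out += edge_bits(u, m1, m2)
        m1, m2 = u, m1
    return out


def soft_bit(hard, p1):
    """Soft bit from hard decision (0/1) and the detector's probability of a '1'."""
    p1 = min(max(p1, 1e-12), 1.0 - 1e-12)   # guard the logarithm
    return (2 * hard - 1) * abs(math.log(p1 / (1.0 - p1)))


def viterbi(soft):
    """Min-sum search for the least-cost path; returns the estimated uncoded bits."""
    cost = {(0, 0): 0.0}
    paths = {(0, 0): []}
    for t in range(len(soft) // 3):
        obs = soft[3 * t:3 * t + 3]          # each edge consumes 3 soft bits
        new_cost, new_paths = {}, {}
        for (m1, m2), acc in cost.items():
            for u in (0, 1):
                cand = edge_bits(u, m1, m2)
                delta = sum((z - (2 * cb - 1)) ** 2 for z, cb in zip(obs, cand))
                nxt = (u, m1)
                if acc + delta < new_cost.get(nxt, float("inf")):
                    new_cost[nxt] = acc + delta
                    new_paths[nxt] = paths[(m1, m2)] + [u]
        cost, paths = new_cost, new_paths
    best = min(cost, key=cost.get)
    return paths[best]


x = [1, 0, 1, 1, 0, 0]
z = encode(x)
soft = [soft_bit(b, 0.9 if b else 0.1) for b in z]   # confident, correct decisions
wrong = z[4] ^ 1                                     # one wrong but uncertain decision
soft[4] = soft_bit(wrong, 0.55 if wrong else 0.45)
x_hat = viterbi(soft)
```

The wrong decision carries a soft value close to zero, so its term in the edge cost barely depends on the candidate bit and the remaining confident bits decide the path, which is the behaviour argued for in the text.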

8.3 Assignments

1. In the GSM simulator the provided decoder is based on the min-sum strategy. Develop your own decoder using the forward-backward MAP decoder, replace the given one and simulate for the BLER for MCS 1 and 4. It may help to redraw the trellis to show each transmitted bit explicitly, one per edge. On such a trellis the likelihood for each edge is easily defined - see the chapter on detection where the forward-backward MAP algorithm was explained. See [1] Chapter 25 for a treatment of this type of trellis, well suited to the forward-backward MAP decoder.


Bibliography

[1] D.J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003. (http://www.inference.phy.cam.ac.uk/mackay/itila/)

[2] J. Proakis, Digital Communications, Fourth Edition, McGraw-Hill, 2001.

[3] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Information Theory, vol. IT-20, pp. 284-287, Mar. 1974.