\[
\int u(t)\, v(t)\, dt =
\begin{cases}
0, & \text{if } u \neq v, \\
\text{const}, & \text{if } u = v.
\end{cases}
\]
2.2.5 OFDM: How to Find Orthogonal Signals
There are several possible ways to create orthogonal signals. The solution presented
here uses the Discrete Fourier Transform (DFT). A hardware-efficient implementation
of the DFT is the Fast Fourier Transform (FFT). All the sinusoids of the DFT form an
orthogonal basis. If a time-discrete signal is transformed with the DFT, it is essentially
correlated with those base sinusoids. Furthermore, the DFT is invertible: using the
inverse DFT (IDFT, or its fast implementation, the inverse FFT, IFFT), the original
signal can be reconstructed [3]. The mathematical background is well described in [4].
A sample system is shown in Fig. 2.2. The basic ideas behind that system are as follows.
A whole collection of (complex) source symbols is considered to be in the frequency
domain. The IDFT translates them into the time domain. Those discrete samples are
transformed into a continuous signal that can be transmitted over the channel. The
receiver samples the signal and transforms it back into the frequency domain using the
DFT. If there is no noise present and the channel is perfect, the symbols at the receiver
are the same as the ones that were transmitted.
As the base functions of the DFT overlap each other without interfering, the spectral
efficiency of the signal is a lot higher than in the case of a simple FDM and approaches
the case of the single carrier system.
Figure 2.2: OFDM system using a DFT twice [4]. Note: Instead of using an IDFT and a
DFT (or an IFFT and an FFT), one can use two DFTs, because the DFT and the IDFT
are very similar. In that case, several adaptations need to be made to the datapath of
the transmitter!
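The transmit and receive chain described in this section can be sketched in a few lines. The following is only an illustration under simplifying assumptions: a naive O(N^2) DFT stands in for the FFT, the channel is ideal and noise-free, and the symbol values, the block size N = 8 and all function names are hypothetical. The correlation uses the complex conjugate of the second signal, which is the usual convention for complex-valued signals.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse applies the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def tone(k, n):
    """k-th base sinusoid of an n-point DFT."""
    return [cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)]

def correlate(u, v):
    """Discrete counterpart of the orthogonality integral."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

N = 8
# Source symbols (QPSK here), considered to be in the frequency domain.
symbols = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]

tx_samples = dft(symbols, inverse=True)  # transmitter: IDFT
rx_symbols = dft(tx_samples)             # receiver: DFT (ideal channel)

round_trip_error = max(abs(a - b) for a, b in zip(symbols, rx_symbols))
same_tone = abs(correlate(tone(2, N), tone(2, N)))        # equals N
different_tones = abs(correlate(tone(2, N), tone(5, N)))  # numerically zero

print(round_trip_error, same_tone, different_tones)
```

The round-trip error and the correlation of two different tones are zero up to rounding, while a tone correlated with itself yields the constant N.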
2.2.6 OFDM: Cyclic Prefix
In a multipath channel, several delayed versions of the original signal appear at the
receiver. One speaks of intersymbol interference (ISI) if an OFDM symbol gets
distorted by the previous one. In the general case, only the first few samples of the
symbol get distorted. The problem can be solved by waiting a specific time between
transmitting two consecutive symbols. The length of this guard interval (in the time
domain) depends on the channel [3].
The other problem is that a single OFDM symbol can interfere with itself. This is called
intrasymbol interference. The reason is the following: a convolution in the time domain
is equivalent to a multiplication in the frequency domain if and only if the signal is
either periodic or infinitely long. Neither condition is fulfilled for a standard OFDM
system [3].
The solution is to make the OFDM symbol appear periodic. This is done by using a
cyclic prefix (CP): the last few samples of the symbol are copied to the beginning of the
symbol, where the guard interval would originally be. The cyclic prefix contains only
redundant data and can therefore be discarded at the receiver, so there is no problem
with ISI [3].
Using a cyclic prefix leads to a significant simplification of the receiver: instead of
having to remove a convolution in time (between the signal and the channel), it is only
necessary to remove a multiplication in the frequency domain [3].
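The effect of the cyclic prefix can be checked numerically. The sketch below is an illustration only: the 3-tap channel `h`, the block size, the CP length and all names are hypothetical, the channel is noise-free, and a naive DFT replaces the FFT. It assumes that the CP is at least as long as the channel memory (CP >= len(h) - 1).

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse applies the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def linear_convolve(x, h):
    """Linear convolution, i.e. what a multipath channel does to the signal."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

N, CP = 8, 3
h = [1.0, 0.5, 0.25]  # hypothetical multipath channel, memory of 2 samples
X = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]

x = dft(X, inverse=True)           # time-domain OFDM symbol
tx = x[-CP:] + x                   # copy the last CP samples to the front
rx = linear_convolve(tx, h)        # channel
y = rx[CP:CP + N]                  # receiver discards the CP
Y = dft(y)
H = dft(h + [0.0] * (N - len(h)))  # channel response on the N tones

# With the CP in place, each tone sees a single complex multiplication.
tone_error = max(abs(Yk - Hk * Xk) for Yk, Hk, Xk in zip(Y, H, X))
X_hat = [Yk / Hk for Yk, Hk in zip(Y, H)]  # one division per tone
symbol_error = max(abs(a - b) for a, b in zip(X, X_hat))

print(tone_error, symbol_error)
```

Both errors vanish up to rounding. Without the prefix (CP = 0), the same check fails because the linear convolution no longer looks circular within the block.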
2.2.7 OFDM: Noise Considerations
The most common noise source in a wireless system is thermal noise, which usually
manifests itself as Additive White Gaussian Noise (AWGN). As the noise spectrum is
uniform in the frequency domain, this kind of noise impairs an OFDM system to the
same degree as a single carrier system [3].
Another common type of noise is impulse noise. This type of broadband noise is generally
only present during a short period. As described before, the OFDM system performs
better under impulse noise than a single carrier system [3].
Colored noise is difficult to handle because, unlike AWGN, it doesn't have a constant
spectrum. A simple solution for high noise environments is to lower the data rate [3].
If there are other systems present, carrier interference can occur. An OFDM system can
handle that by disabling the affected subcarriers [3].
Another type of imperfection emerges from the local oscillator. There are two effects
that have to be considered: phase noise (sometimes called phase jitter) and the
frequency offset. Phase noise originates from the fact that the oscillator frequency
changes randomly within a small range. Seen in the frequency domain, the oscillator
does not produce a single peak but rather a smeared-out peak. Phase noise affects
every subcarrier. As the spectral width of a subcarrier is smaller than that of a single
carrier system, phase noise affects OFDM systems more severely than single carrier
systems [3].
The frequency offset is the difference between the average frequency of the oscillator
and the expected frequency. Clock quality, temperature and other effects are generally
responsible for this offset. A solution to this problem is to introduce pilot subcarriers
for synchronization. It has to be noted that introducing pilot subcarriers reduces the
maximum data rate [3].
2.2.8 Existing Systems Using OFDM
Two of the most prominent systems using OFDM are ADSL (Asymmetric Digital
Subscriber Line) and DVB-T (Digital Video Broadcasting - Terrestrial). The first is used
for high-speed internet connections and the second for European digital television [3].
A system that uses both MIMO and OFDM will be the next generation Wireless LAN
(WLAN 802.11n). The final specifications are not yet available, but there are already
devices on the market based on a draft (e.g. [5]). Those new devices promise
a significantly higher data rate than previous generations.
2.3 Channel Models
2.3.1 Notation
The following notation will be used:
x(t)  signal leaving the transmitter (time domain)
X     signal vector in frequency domain: input of IFFT
y(t)  signal reaching the receiver (time domain)
Y     received signal vector in frequency domain: output of FFT
h(t)  channel impulse response (time domain)
H     channel response matrix in frequency domain
n(t)  additive noise (time domain)
N     noise vector in frequency domain
T     total number of transmitting antennas
τ     index of the transmitting antenna
R     total number of receiving antennas
r     index of the receiving antenna
C     total number of subcarriers
c     index of the subcarrier
2.3.2 SISO Channel Model
The simplest possible system is a SISO (Single-Input, Single-Output) system. In the
time domain, it can be written as:
\[ y(t) = x(t) * h(t) + n(t) \]
where * denotes convolution. This is equivalent to the following notation in the
frequency domain:
\[ Y(f) = X(f) \cdot H(f) + N(f) \]
2.3.3 OFDM Channel Model for C Channels (SISO)
An OFDM system can be represented by the following model in frequency domain:
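\[
\begin{aligned}
y_{c=1} &= H_{c=1}\, x_{c=1} + n_{c=1} \\
y_{c=2} &= H_{c=2}\, x_{c=2} + n_{c=2} \\
y_{c=3} &= H_{c=3}\, x_{c=3} + n_{c=3} \\
&\;\;\vdots \\
y_{c=C} &= H_{c=C}\, x_{c=C} + n_{c=C}
\end{aligned}
\]
The received values form the vector $Y_{C \times 1}$, the transmitted symbols the vector
$X_{C \times 1}$ and the noise terms the vector $N_{C \times 1}$.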
Each line corresponds to one of the orthogonal tones.
2.3.4 MIMO Channel Model for a 4x4 System
To simplify the notation for a MIMO system with T transmitting and R receiving
antennas, it is assumed that T = R = 4. Such a general setup is shown in Fig. 2.3. It
is straightforward to change the number of transmitting or receiving antennas.

Figure 2.3: A standard approach for a MIMO system with 4 transmitting and 4 receiving
antennas. (Diagram: transmitter antennas T1 to T4, receiver antennas R1 to R4, each
pair connected by a path h11, h21, h31, h41, ..., h44.)

The system shown in Fig. 2.3 can be written in the frequency domain in the following way:
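\[
\underbrace{\begin{bmatrix} y_{r=1} \\ y_{r=2} \\ y_{r=3} \\ y_{r=4} \end{bmatrix}}_{Y_{R \times 1}}
=
\underbrace{\begin{bmatrix}
h_{r=1,\tau=1} & h_{r=1,\tau=2} & h_{r=1,\tau=3} & h_{r=1,\tau=4} \\
h_{r=2,\tau=1} & h_{r=2,\tau=2} & h_{r=2,\tau=3} & h_{r=2,\tau=4} \\
h_{r=3,\tau=1} & h_{r=3,\tau=2} & h_{r=3,\tau=3} & h_{r=3,\tau=4} \\
h_{r=4,\tau=1} & h_{r=4,\tau=2} & h_{r=4,\tau=3} & h_{r=4,\tau=4}
\end{bmatrix}}_{H_{R \times T}}
\underbrace{\begin{bmatrix} x_{\tau=1} \\ x_{\tau=2} \\ x_{\tau=3} \\ x_{\tau=4} \end{bmatrix}}_{X_{T \times 1}}
+
\underbrace{\begin{bmatrix} n_{r=1} \\ n_{r=2} \\ n_{r=3} \\ n_{r=4} \end{bmatrix}}_{N_{R \times 1}}
\]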
It can be assumed that only one antenna is transmitting and all the others are not
sending any signal at all. In that case, the equation simplifies to:
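\[
\underbrace{\begin{bmatrix} y_{r=1} \\ y_{r=2} \\ y_{r=3} \\ y_{r=4} \end{bmatrix}}_{Y_{R \times 1}}
=
\underbrace{\begin{bmatrix}
h_{r=1,\tau=1} & h_{r=1,\tau=2} & h_{r=1,\tau=3} & h_{r=1,\tau=4} \\
h_{r=2,\tau=1} & h_{r=2,\tau=2} & h_{r=2,\tau=3} & h_{r=2,\tau=4} \\
h_{r=3,\tau=1} & h_{r=3,\tau=2} & h_{r=3,\tau=3} & h_{r=3,\tau=4} \\
h_{r=4,\tau=1} & h_{r=4,\tau=2} & h_{r=4,\tau=3} & h_{r=4,\tau=4}
\end{bmatrix}}_{H_{R \times T}}
\underbrace{\begin{bmatrix} x_{\tau=1} \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{X_{T \times 1}}
+
\underbrace{\begin{bmatrix} n_{r=1} \\ n_{r=2} \\ n_{r=3} \\ n_{r=4} \end{bmatrix}}_{N_{R \times 1}}
\]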
This can be further simplified to:
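\[
\underbrace{\begin{bmatrix} y_{r=1} \\ y_{r=2} \\ y_{r=3} \\ y_{r=4} \end{bmatrix}}_{Y_{R \times 1}}
=
\underbrace{\begin{bmatrix} h_{r=1,\tau=1} \\ h_{r=2,\tau=1} \\ h_{r=3,\tau=1} \\ h_{r=4,\tau=1} \end{bmatrix}}_{H_{R \times 1}}
x_{\tau=1}
+
\underbrace{\begin{bmatrix} n_{r=1} \\ n_{r=2} \\ n_{r=3} \\ n_{r=4} \end{bmatrix}}_{N_{R \times 1}}
\]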
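The simplification for a single transmitting antenna can be verified with a small numerical sketch. The channel coefficients below are made-up values chosen only for illustration, the noise is set to zero so that the structure is visible, and the helper name `matvec` is hypothetical.

```python
def matvec(H, x):
    """Multiply an R x T matrix (list of rows) by a vector of length T."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

# Hypothetical 4x4 channel matrix for one tone: H[r][t] is the path from
# transmitting antenna t+1 to receiving antenna r+1.
H = [
    [ 0.9+0.1j,  0.2-0.3j, -0.1+0.4j,  0.3+0.2j],
    [ 0.1-0.5j,  0.8+0.2j,  0.3-0.1j, -0.2+0.1j],
    [-0.3+0.2j,  0.1+0.6j,  0.7-0.2j,  0.4-0.4j],
    [ 0.2+0.3j, -0.4+0.1j,  0.5+0.5j,  0.6-0.1j],
]

# General case: all four antennas transmit (noise omitted here).
x_all = [1+1j, 1-1j, -1+1j, -1-1j]
y_all = matvec(H, x_all)

# Only the first antenna transmits: the product collapses to the first
# column of H scaled by the single transmitted symbol.
x_single = [1+1j, 0j, 0j, 0j]
y_single = matvec(H, x_single)
first_column = [row[0] * x_single[0] for row in H]

column_error = max(abs(a - b) for a, b in zip(y_single, first_column))
print(y_all)
print(column_error)
```

This is also the reason why sending from one antenna at a time allows the receiver to read off one column of the channel matrix per transmission.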
2.3.5 MIMO-OFDM Channel Model
As can be seen from the SISO OFDM channel model, the different OFDM subchannels
can be treated separately. This makes it possible to formulate a simple model for a
MIMO-OFDM system: the whole system can be seen as a stack of C different MIMO
systems. A graphic showing such a system is presented in Fig. 2.4.

Figure 2.4: A channel model for a MIMO-OFDM system. (Diagram: for each of the
subchannels 1 to C, Y_{Rx1} = H_{RxT} . X_{Tx1} + N_{Rx1}; stacked over all subchannels,
the dimensions are Rx1xC, RxTxC, Tx1xC and Rx1xC.)
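The stacked model can be written down directly as one independent matrix equation per subchannel. The sketch below only illustrates the data layout; the random channel coefficients, the noise level and all variable names are assumptions made for the example and do not correspond to any of the channel models discussed in this thesis.

```python
import random

def matvec(H, x):
    """Multiply an R x T matrix (list of rows) by a vector of length T."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

rng = random.Random(0)
T, R, C = 4, 4, 64  # transmitting antennas, receiving antennas, subcarriers

def cgauss(sd=1.0):
    """Complex Gaussian sample with standard deviation sd per dimension."""
    return complex(rng.gauss(0, sd), rng.gauss(0, sd))

# One R x T channel matrix, one T x 1 symbol vector and one R x 1 noise
# vector per subchannel: shapes C x R x T, C x T and C x R.
H = [[[cgauss() for _ in range(T)] for _ in range(R)] for _ in range(C)]
X = [[complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(T)]
     for _ in range(C)]
N = [[cgauss(0.05) for _ in range(R)] for _ in range(C)]

# Each subchannel is an independent MIMO system: Y_c = H_c X_c + N_c.
Y = [[hx + n for hx, n in zip(matvec(H[c], X[c]), N[c])] for c in range(C)]

print(len(Y), len(Y[0]))
```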
2.3.6 The TGn Channels
In 2004, the Task Group N (TGn) (footnote 1) published a set of channel models
applicable to indoor MIMO WLAN systems. The model[s] can be used for both 2 GHz and
5 GHz frequency band[s]. There are six different channel models: A, B, C, D, E and
F. Model A is an optional model and should not be used for system performance
comparisons [6].
The following steps are taken for models B to F (footnote 2):
- Start with delay profiles of models B-F.
- Manually identify clusters in each of the five models.
- Extend clusters so that they overlap, determine tap powers (see Appendix A).
- Assume PAS [power angular spectrum] shape of each cluster and corresponding
  taps (Laplacian).
- Assign AS [angular spread] to each cluster and corresponding taps.
- Assign mean AoA [angle of arrival] (AoD [angle of departure]) to each cluster
  and corresponding taps.
- Assume antenna configuration.
- Calculate correlation matrices for each tap.

Footnote 1: IEEE P802.11 - TASK GROUP N;
http://www.ieee802.org/11/Reports/tgn_update.htm
Footnote 2: quoted directly from [6] to show the complexity of the models
The TGn also calculated the mean capacity in bits per second per Hz for all models. The
results show that model C has the lowest capacity of all proposed models. This suggests
that channel C is the most challenging of the channel models. This is the reason that
TGn C is used for the simulations in this thesis.
2.4 Reconstruction of the Original Data
The MIMO-OFDM channel model suggests that if the exact channel matrix and the
exact noise vector were known, the original data could be reconstructed perfectly. In
any real system with a limited amount of training data, however, neither the channel
matrix nor the noise vector can be estimated perfectly. The amount of training data is
limited because additional training data reduces the throughput and because any real
wireless channel is time-varying. These imperfections can cause errors in the detected
symbols. By improving the performance of the receiver, the number of errors can be
minimized. This thesis deals with the estimation of the noise variance (or equivalently
the SNR), as the noise variance is an important parameter for decoding the received
signals.
3 Literature Review
3.1 Method
This part of the thesis presents a selection of papers that might be relevant to the topic
of interest. The papers are sorted alphabetically by the family name of the author.
As the methods and parameters used for simulation vary highly between the different
papers, numerical comparisons of algorithms are omitted in this section.
An algorithm is suitable if the following points are satisfied:
- better than or equally accurate as other algorithms of similar setup and complexity
- adaptable to MIMO-OFDM
- well enough documented to be implementable in a reasonable amount of time
- complexity of calculations within reasonable limits and therefore suitable for
  hardware implementation
Any additions not present in the paper and added by the author of this thesis are written
in italics.
3.2 Papers
3.2.1 Aldana et al. 2000: Accurate Noise Estimates in Multicarrier
Systems
Aldana et al. [7] presented in their work two different algorithms to estimate the noise
variance in multicarrier systems. Those algorithms would therefore be suitable for
OFDM systems. The two presented algorithms do not use any known training signals.
The first algorithm presented is the EM (Expectation Maximization) algorithm. The
algorithm is iterative and converges only slowly. Those two facts make this algorithm
unsuitable for application in a real system.
The second algorithm is a decision-directed algorithm. Similar to the previous
algorithm, this one is suitable for OFDM signals, operates in the frequency domain and
does not need any training data.
\[ \hat{N}_k = Y_k - H_k \hat{X}_k \]
\[ \hat{\sigma}_k^2 = \frac{1}{L} \sum_{n=1}^{L} \lvert \hat{N}_k \rvert^2 \]
\[ \mathrm{SNR}_{\mathrm{QAM}} = \frac{\lvert H_k \rvert^2 \, d^2 \, (M^2 - 1)}{6 \, \hat{\sigma}_k^2} \]
$Y_k$ is the received signal of the k-th tone. $H_k$ is the gain of subchannel k and is
assumed to be known (or at least accurately guessed). $\hat{X}_k$ is the estimate of the
transmitted symbol of the k-th tone. Known training symbols might improve the quality
of the estimated SNR. M is the number of symbols (M-ary QAM) and L is the block
length. d is the distance between symbols. The authors come to the conclusion that
their algorithm underestimates the true SNR and that, in order to get reliable results, a
look-up table (LUT) depending on the modulation scheme should be implemented.
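A decision-directed noise estimate of this kind can be sketched for a single tone as follows. This is not the implementation from [7]: it assumes unit-energy QPSK instead of general M-ary QAM, a perfectly known subchannel gain, a noise residual of N = Y - H * X_hat averaged over L symbols, and a hard-decision slicer. All names and parameter values are hypothetical.

```python
import math
import random

def qpsk_slicer(z):
    """Hard decision: nearest unit-energy QPSK constellation point."""
    s = 1 / math.sqrt(2)
    return complex(s if z.real >= 0 else -s, s if z.imag >= 0 else -s)

rng = random.Random(1)
L = 2000                 # block length
H_k = 0.8 - 0.6j         # assumed known subchannel gain, |H_k| = 1
true_snr_db = 15.0
sigma2 = abs(H_k) ** 2 / 10 ** (true_snr_db / 10)  # complex noise variance
sd = math.sqrt(sigma2 / 2)                         # per real dimension

noise_energy = 0.0
for _ in range(L):
    X = complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
    Y = H_k * X + complex(rng.gauss(0, sd), rng.gauss(0, sd))
    X_hat = qpsk_slicer(Y / H_k)   # decision on the equalized symbol
    N_hat = Y - H_k * X_hat        # noise residual
    noise_energy += abs(N_hat) ** 2

sigma2_hat = noise_energy / L
snr_hat_db = 10 * math.log10(abs(H_k) ** 2 / sigma2_hat)
print(round(snr_hat_db, 2))
```

At this SNR, wrong decisions are very rare, so the residual is essentially the true noise. At low SNR, wrong decisions distort the residual, which is consistent with the bias reported by the authors and the proposed look-up table correction.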
3.2.2 Athanasios et al. 2005: SNR Estimation Algorithms in AWGN for
HiperLAN/2 Transceiver
Athanasios et al. [8] present two different algorithms for the HiperLAN/2 system that
employs OFDM. Both algorithms estimate the SNR in a 64-QAM system.
The first algorithm is called MMSE (Minimum Mean Square Error). This algorithm uses
training signals a and works in the frequency domain.
\[ a = \{a_1, a_2, \ldots, a_L\} \]
\[ C = Y a^H \]
\[ E = \lVert Y \rVert^2 \]
\[ \mathrm{SNR} = \frac{\lvert C \rvert^2}{\lVert a \rVert^2 E - \lvert C \rvert^2} \]
The authors state that it is also possible to only use the real or the imaginary part of the
received data to reduce the complexity of the calculation, whereas the drop in precision
should be only minimal.
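One reading of the MMSE estimator can be sketched as follows. The sketch assumes that C is the correlation of the received vector Y with the training sequence a, that E is the received energy, and that the estimate is SNR = |C|^2 / (||a||^2 E - |C|^2); this form is an assumption and should be checked against [8]. The flat channel gain `h`, the QPSK training sequence and all other names are hypothetical.

```python
import math
import random

rng = random.Random(4)
L = 2000                 # number of training symbols
h = 0.6 + 0.8j           # unknown flat channel gain, |h| = 1
true_snr_db = 10.0
sigma2 = abs(h) ** 2 / 10 ** (true_snr_db / 10)  # complex noise variance
sd = math.sqrt(sigma2 / 2)

# Known unit-energy training sequence and the corresponding received data.
a = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
     for _ in range(L)]
Y = [h * ai + complex(rng.gauss(0, sd), rng.gauss(0, sd)) for ai in a]

C = sum(y * ai.conjugate() for y, ai in zip(Y, a))  # C = Y a^H
E = sum(abs(y) ** 2 for y in Y)                     # received energy
A = sum(abs(ai) ** 2 for ai in a)                   # training energy

# Energy along the training sequence over the energy orthogonal to it.
snr_hat = abs(C) ** 2 / (A * E - abs(C) ** 2)
snr_hat_db = 10 * math.log10(snr_hat)
print(round(snr_hat_db, 2))
```

The estimator needs no separate channel estimate: the projection onto the training sequence captures the signal part, and everything orthogonal to it is attributed to noise.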
The second algorithm is called EVM (Error Vector Magnitude). It estimates the sent
symbols and calculates the average and the variance of them. It is not specified in detail
how those symbols should be estimated and the algorithm seems to exhibit a rather
poor performance compared to the MMSE algorithm.
3.2.3 Athanasios et al. 2006: SNR Estimation for Low Bit Rate OFDM
Systems in AWGN channels
Athanasios et al. [9] present two different algorithms for OFDM systems. The second
one is the MMSE algorithm already presented in [8].
The first algorithm is called SNV (Squared Signal to Noise Variance). Again, this
estimator needs estimates of the received symbol and the performance seems to be
inferior to the MMSE algorithm.
3.2.4 Beaulieu et al. 2000: Comparison of Four SNR Estimators for
QPSK Modulations
Beaulieu et al. [10] present four different estimators for QPSK modulations in the time
domain. $X_i$ is the in-phase component and $Y_i$ is the quadrature component. The
algorithm with the best performance is:
\[
\widehat{\mathrm{SNR}}_2 = L \left[ \sum_{i=1}^{L}
\frac{\left( \lvert X_i \rvert - \lvert Y_i \rvert \right)^2}{X_i^2 + Y_i^2} \right]^{-1}
\]
It has to be further investigated if and how this algorithm could be used for an OFDM
system. The same algorithm is also presented in the frequency domain by Hong et
al. [11].
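The idea behind this estimator can be tried out on synthetic QPSK data. The sketch assumes the form SNR = L / sum((|X_i| - |Y_i|)^2 / (X_i^2 + Y_i^2)), where the absolute values are an assumption: for QPSK both components have the same magnitude, so their difference contains only noise. The AWGN-only test signal and all names are hypothetical.

```python
import math
import random

def snr_estimate(samples):
    """SNR from the in-phase (real) and quadrature (imaginary) components."""
    total = 0.0
    for z in samples:
        x, y = z.real, z.imag
        total += (abs(x) - abs(y)) ** 2 / (x * x + y * y)
    return len(samples) / total

rng = random.Random(5)
L = 5000
true_snr_db = 15.0
amplitude = 1 / math.sqrt(2)                           # unit-energy QPSK
sd = math.sqrt(1 / 10 ** (true_snr_db / 10) / 2)       # per real dimension

samples = [
    complex(rng.choice((-1, 1)) * amplitude + rng.gauss(0, sd),
            rng.choice((-1, 1)) * amplitude + rng.gauss(0, sd))
    for _ in range(L)
]

snr_hat_db = 10 * math.log10(snr_estimate(samples))
print(round(snr_hat_db, 2))
```

No training data and no symbol decisions are needed, which is what makes this type of estimator attractive. How it carries over to the tones of an OFDM system is a separate question.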
3.2.5 Boumard 2003: Novel Noise Variance and SNR Estimation
Algorithm for Wireless MIMO OFDM Systems
Boumard [12] presents an algorithm to estimate the SNR in a 2x2 MIMO-OFDM system
in the frequency domain. The algorithm needs some well-defined training symbols (two
per antenna, sent individually) and the results from a channel estimator. The algorithm
is able to calculate both the SNR per subcarrier and the overall SNR. The algorithm
seems to perform well as long as the channel is reasonably slow fading. It needs to be
further investigated how this algorithm can be adapted for a 4x4 MIMO-OFDM system
with predefined training symbols. The principal challenges are the use of given training
symbols and the expansion to a 4x4 system.
3.2.6 Pauluzzi et al. 2000: A Comparison of SNR Estimation
Techniques for the AWGN Channel
Pauluzzi et al. [13] present five different SNR estimation techniques for PSK modulation
in an AWGN channel.
The first algorithm is called SSME (Split Symbol Moments Estimator) and is only valid
for BPSK modulation.
The second algorithm is the ML (Maximum Likelihood) estimator. There are two
versions of that algorithm: one that uses known training symbols and one that uses
guesses of the transmitted symbols. The data-aided version seems to perform near
the optimum and the non-data-aided version performs equally well for high SNRs. To
use this algorithm, it has to be adapted to the MIMO-OFDM system, as the system used
by Pauluzzi et al. is quite different.
The third algorithm is the SNV estimator that is also presented in [14] and [9].
The fourth algorithm is the $M_2M_4$ (Second- and Fourth-Order Moments) estimator.
This estimator seems to perform similarly to the ML algorithm except in low SNR
environments, where it performs worse.
The fifth algorithm presented is the SVR (Signal to Variance Ratio) estimator. It
performs significantly worse than the ML estimator, especially in high SNR environments.
3.2.7 Ren et al. 2005: A New SNR's Estimator for QPSK Modulations
in an AWGN Channel
Ren et al. [15] present the $M_2M_4$ algorithm from [13] and an improved version of
this algorithm. The improved version seems to perform better than the original and
also better than the ML algorithm in high noise environments (SNR < 0 dB). As this
region is not suitable for fast wireless communication anyway, the algorithm doesn't offer
any advantage over the ML algorithm.
3.2.8 Ren et al. 2008: SNR Estimation Algorithm Based on the
Preamble for Wireless OFDM Systems
Ren et al. [16] analyze the algorithm presented by Boumard [12] and come to the
conclusion that the performance of this algorithm depends highly on the frequency
selectivity of the channel. They propose an improved version of Boumard's algorithm to
solve that problem. The authors also present several simulations that seem to confirm
that fact.
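\[
\hat{W} = \frac{4}{N} \sum_{k=0}^{N-1} \left( \operatorname{Im}\left[
Y_{0,k}\, c_{0,k}^{*}\, \frac{\hat{H}_k^{*}}{\lvert \hat{H}_k \rvert} \right] \right)^2
\]
\[
\hat{S} = \hat{M}_2 - \hat{W}, \qquad
\hat{M}_2 = \frac{1}{N} \sum_{k=0}^{N-1} \lvert Y_{0,k} \rvert^2
\]
\[ \mathrm{SNR}_{\mathrm{av}} = \frac{\hat{S}}{\hat{W}} \]
\[ \mathrm{SNR}_{\mathrm{subch}\,k} = \frac{\lvert \hat{H}_k \rvert^2}{\hat{W}} \]
N is the size of the IFFT/FFT. $Y_{m,k}$ is the m-th symbol of the k-th subcarrier after the
FFT at the receiver. $c_{m,k}$ is the m-th symbol on the k-th subcarrier. $\hat{H}_k$ is the
channel coefficient estimate.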
3.2.9 Schmidl et al. 1997: Robust Frequency and Timing
Synchronization for OFDM
Schmidl et al. [17] present a time domain approach for synchronizing transmitter and
receiver. As a byproduct they suggest an SNR estimator working in the time domain.
This estimator works well for SNRs below 20 dB. Above this level, $M(d_{opt})$ is so
close to 1 that an accurate estimate of the SNR cannot be determined, but only that
the SNR is high.
3.2.10 Shin et al. 2001: Simple SNR Estimation Methods for QPSK
Modulated Short Bursts
Shin et al. [18] present two algorithms to estimate the SNR in a QPSK modulated
system.
The first algorithm is the EVM algorithm also presented by Athanasios et al. [8]. The
algorithm is rather simple and doesn't need any estimates at all (at least for the QPSK
case and not too low SNR). The authors also attribute a higher accuracy to this algorithm
than in [8].
1. check if Re{Y} > 0 and if Im{Y} > 0, i.e. determine the region of the received value
2. for a given time period, collect the values for each of the four regions
3. estimate the SNR by: SNR = average^2 / variance
4. repeat to get an average
This algorithm is simple to implement and independent of any other hardware. It should
also be easy to transform to the OFDM case.
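The four steps above can be sketched as follows. This is one possible interpretation, not the implementation from [18]: the average and the variance are computed per region on the complex values, the SNR is taken as |average|^2 / variance, and the results of the four regions are averaged. The AWGN-only test signal and all names are hypothetical.

```python
import math
import random

def region(z):
    """Step 1: region (quadrant) of a received value."""
    return (z.real > 0, z.imag > 0)

def evm_snr(samples):
    # Step 2: collect the values for each of the four regions.
    regions = {}
    for z in samples:
        regions.setdefault(region(z), []).append(z)
    # Step 3: SNR = |average|^2 / variance, evaluated per region.
    estimates = []
    for values in regions.values():
        if len(values) < 2:
            continue
        mean = sum(values) / len(values)
        variance = sum(abs(v - mean) ** 2 for v in values) / len(values)
        estimates.append(abs(mean) ** 2 / variance)
    # Step 4: average the individual estimates.
    return sum(estimates) / len(estimates)

rng = random.Random(2)
true_snr_db = 15.0
amplitude = 1 / math.sqrt(2)                           # unit-energy QPSK
sd = math.sqrt(1 / 10 ** (true_snr_db / 10) / 2)       # per real dimension

received = [
    complex(rng.choice((-1, 1)) * amplitude + rng.gauss(0, sd),
            rng.choice((-1, 1)) * amplitude + rng.gauss(0, sd))
    for _ in range(4000)
]

snr_hat_db = 10 * math.log10(evm_snr(received))
print(round(snr_hat_db, 2))
```

At low SNR, noisy values cross into neighbouring regions and distort both the average and the variance, which matches the restriction to not too low SNR mentioned in this section.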
The second algorithm presented is the MMSE that is also presented by Athanasios et
al. [8]. Interestingly, the MMSE algorithm is considered to be inferior to the EVM
algorithm by Shin et al., whereas Athanasios et al. come to the opposite conclusion.
3.2.11 Xu et al. 2005: Subspace-Based Noise Variance and SNR
Estimation for OFDM Systems
Xu et al. [19] present a subspace-based algorithm for SNR estimation in OFDM
systems. The algorithm is computationally quite complex: 1) Make an eigenvector
decomposition of the correlation matrix R.
3.2.12 Xu et al. 2005: A Novel SNR Estimation Algorithm for OFDM
Xu et al. [20] present a broad range of algorithms. Among them are the ML, the MMSE
and the $M_2M_4$ algorithms already presented in other papers.
Based on Boumard's algorithm [12], they develop a new algorithm that should perform
better in time-varying channels.
\[ \hat{R}_G(l) = \frac{1}{J} \sum_{j=0}^{J-1} y(i, j)\, y^{*}(\ldots) \]
\[ \hat{S}_G = \hat{R}_G(1) + \frac{\hat{R}_G(1) - \hat{R}_G(2)}{3} \tag{3.2} \]
\[ \hat{N}_G = \frac{1}{J} \sum_{j=0}^{J-1} y(i, j)\, y^{*}(i, j) - \hat{S}_G \tag{3.3} \]
\[ \mathrm{SNR} = \frac{\hat{S}_G}{\hat{N}_G} \tag{3.4} \]
y(i, j) is the j-th symbol on the i-th subcarrier.
3.2.13 Yücek et al. 2006: MMSE Noise Power and SNR Estimation for
OFDM Systems
Yücek et al. [21] propose to use an estimator with a two-dimensional filter over
time and frequency. To reduce the computational complexity, they propose to use
a rectangular window for the filter. The authors come to the conclusion that their
approach significantly improves the SNR estimation in colored noise. The paper
continues work proposed in an earlier paper by the same authors [22]. If colored
noise should be a problem, this algorithm could be further investigated, despite its high
computational complexity.
3.3 Other Related Papers
The following papers are related to the problem but are too far away from
the actual problem to be adapted with a reasonable amount of work:
- Alagha 2001: Cramer-Rao Bounds of SNR Estimates for BPSK and QPSK Modulated
  Signals [23]
  This paper presents the theoretical bounds that can be achieved by the best
  possible algorithm.
- Benedict et al. 1967: The Joint Estimation of Signal to Noise from the Sum
  Envelope [24]
  This paper provides some basic theory about estimating noise in narrowband
  AWGN systems.
- He et al. 1998: Effective SNR Estimation in OFDM System Simulation [25]
  Some basic principles about using OFDM without the DFT are presented. But more
  important is the following quote: "Disregarding the form of distortions/interferences,
  by the virtual of the central limit theorem, the noise part in eqn. (10) tends to
  approach a Gaussian process, and it has been shown that if n(t) is a Wide-Sense
  Stationary (WSS) process, the noise part in eqn. (10) tends to be white." This
  indicates that it might be reasonable to assume that SNR estimation has a higher
  probability of success if done in the frequency domain.
  Further, a rather basic algorithm for SNR estimation is presented.
- Jeruchim et al. 1989: Estimation of the Signal-to-Noise Ratio (SNR) in
  Communication Simulation [26]
  A very basic paper providing some estimator theory.
- Kerr 1966: On Signal and Noise Level Estimation in a Coherent PCM Channel [27]
  A basic paper that is too far away from the actual problem to be of any direct use.
- Türkboylari et al. 1998: An Efficient Algorithm for Estimating the Signal-to-
  Interference Ratio in TDMA Cellular Systems [28]
  A rather complex algorithm for TDMA systems.
- Wiesel et al. 2002: Data-Aided Signal-to-Noise-Ratio Estimation in Time Selective
  Fading Channels [29]
  A time selective channel model is presented and a generalized class of ML
  detectors for that model is derived.
- Wiesel et al. 2002: Non-Data-Aided Signal-to-Noise-Ratio Estimation [30]
  A non-data-aided version of the ML detector is presented along with an $M_2M_4$
  estimator. Further, a non-data-aided iterative algorithm is presented.
- Wiesel et al. 2006: SNR Estimation in Time-Varying Fading Channels [31]
  The Cramer-Rao bound (CRB) is derived for data-aided SNR estimation. It is
  shown that the data-aided CRB is the same for time-constant and time-varying
  channels. But this doesn't mean that all the algorithms perform equally well in
  time-varying channels. A generalized ML detector is derived for a polynomial-
  in-time, time-varying fading channel. This algorithm is iterative. If time variation
  should be found to be a problem in the real system, it would probably be worth
  considering this algorithm, even though iterative behavior usually means high
  computational costs.
4 Simulations
4.1 Description of the Simulation Environment
The simulation environment performs the following tasks for each sweep:
1. generate a data stream in the time domain, consisting of:
   - 2 short preambles (64 samples + 16 samples for the CP each)
   - 2 long preambles (a total of 128 samples + 32 samples for the CP)
   - MIMO training (320 samples, 80 per transmitting antenna)
   - random data to transmit (64 samples + 16 samples for the CP)
2. transmit the data (apply channel matrix)
3. generate AWGN noise corresponding to the SNR setting (all channels equal
   amount of noise)
4. add the generated noise to the received data
5. estimate SNR
6. configure receiver and decode data bits
7. calculate the BER
8. repeat steps 2 to 6 for all SNR steps
At the end of all sweeps, the average of the BER is calculated for each channel SNR.
It has to be noted that a real system should send more data in order to increase
the throughput. This is not done here because the focus is on the SNR estimation.
In order to get reliable results with a reasonable amount of computation time, it is
preferred to increase the number of sweeps rather than to increase the amount of data
per sweep. The estimated SNR is the average of the four SNRs calculated for each
receiving antenna.
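Steps 3 and 4 of the list above, generating AWGN for a given SNR setting and adding it to the received data, can be sketched as follows. This is not the code of the simulation environment: it assumes that the SNR is defined as the ratio of the average signal power to the average noise power of the stream, and the function name `awgn_for_snr` and the QPSK test signal are hypothetical.

```python
import math
import random

def awgn_for_snr(signal, snr_db, rng):
    """Complex white Gaussian noise whose power matches the given SNR."""
    p_signal = sum(abs(s) ** 2 for s in signal) / len(signal)
    p_noise = p_signal / 10 ** (snr_db / 10)
    sd = math.sqrt(p_noise / 2)      # split evenly over real and imaginary part
    return [complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in signal]

rng = random.Random(6)
snr_db = 12.0

# Stand-in for the data reaching one receiving antenna.
signal = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(20000)]

noise = awgn_for_snr(signal, snr_db, rng)              # step 3
received = [s + n for s, n in zip(signal, noise)]      # step 4

# Sanity check: the SNR measured from the generated noise matches the setting.
p_signal = sum(abs(s) ** 2 for s in signal) / len(signal)
p_noise = sum(abs(n) ** 2 for n in noise) / len(noise)
measured_snr_db = 10 * math.log10(p_signal / p_noise)
print(round(measured_snr_db, 2))
```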
4.2 Best and Worst Cases
The simulation environment (footnote 1) was used to generate a plot of the BER using
the exact SNR of the channel as an SNR estimation. This curve is expected to be the
lower bound that can be achieved. To investigate the potential benefit of a good SNR
estimator, several BER curves were plotted using constant SNR estimators. It was
expected that those curves touch the ideal curve at the points where the estimated SNR
is equal to the channel SNR. In all other cases, they should lead to a higher BER. This
plot can be seen in Fig. 4.1.
As can be seen from the plot, the constant SNR estimators perform slightly better at
certain points than the one using the SNR of the channel directly. How can it be that
the simplest of all SNR estimators performs better at certain points than the ideal
estimator? Is it a bug in the simulation environment? The answer can be found by
repeating the same simulation (footnote 2), but this time using a perfect channel
estimator instead of the FDMLE channel estimator. In this case, the plot looks as
expected (see Fig. 4.2).
The reason therefore seems to be that the channel estimator adds additional noise to
the signal. This is not surprising, as the FDMLE channel estimator is fairly simple and
basically takes only one sample for each channel matrix entry (which still results in
the transmission of four OFDM symbols for a 4x4 system!). In order to get a perfect
SNR estimator, one therefore has to take into account that the estimated SNR has to be
lower (i.e. higher noise) than the actual channel SNR. From the intersections of the
constant 10 dB and 20 dB curves with the ideal curve, it can be assumed that the
estimated SNR should be approximately 2 dB lower than the actual channel SNR. It has
further to be noted that this 2 dB difference has only a small influence on the BER.
The constant SNR estimators perform better in the region where they overestimate the
channel SNR than in the region where they underestimate it (if the channel SNR is, for
example, 30 dB, it is better to estimate 50 dB than to estimate 10 dB). It therefore
follows that a constant SNR estimator should be chosen such that it always
overestimates the actual channel SNR. Comparing the constant 50 dB curve with the
ideal curve shows that an average of 5 dB of SNR can be recovered by using a good SNR
estimator instead of a constant (high) SNR estimator. This is equivalent to a decrease
in the BER by about a factor of two (if the channel SNR is above 5 dB). The benefit is
lower if compared to the 30 dB curve, but still significant. It is therefore worth
investing some time to find a good SNR estimator.

Footnote 1: SNR range: 0-30 dB (step: 1 dB); number of sweeps: 20000 (seed=0..9);
channel model: TGn C; transmitting antennas: 4; receiving antennas: 4; number of tones: 64;
channel estimator: FDMLE; demapper: MMSE; modulation: QPSK
Footnote 2: SNR range: 0-30 dB (step: 1 dB); number of sweeps: 20000 (seed=0..9);
channel model: TGn C; transmitting antennas: 4; receiving antennas: 4; number of tones: 64;
channel estimator: ideal; demapper: MMSE; modulation: QPSK

Figure 4.1: This simulation shows the differences between an SNR estimator using the
actual channel SNR and several constant SNR estimators. The channel estimator used
is FDMLE. (Plot: BER on a logarithmic axis from 10^-3 to 10^0 versus channel SNR from
0 to 30 dB; curves: SNRest = SNR of channel, SNRest = 10 dB, 20 dB, 30 dB and 50 dB.)

Figure 4.2: This simulation shows the differences between an SNR estimator using the
actual channel SNR and several constant SNR estimators. The ideal channel estimator
(i.e. perfect channel knowledge) is used. (Plot: BER on a logarithmic axis from 10^-4 to
10^0 versus channel SNR from 0 to 30 dB; curves: SNRest = SNR of channel,
SNRest = 10 dB, 20 dB, 30 dB and 50 dB.)
4.3 Perfect SNR Shifted
Fig. 4.1 suggests that it is generally better to overestimate the SNR than to
underestimate it. This is certainly true for large deviations from the actual channel
SNR. The effects of slightly over- or underestimating the channel SNR are explored
(footnote 3) in Fig. 4.3 and Fig. 4.4.
Fig. 4.3 and Fig. 4.4 confirm that the channel estimator adds approximately 2 dB of
noise. They also show that approximately half a decibel is lost if the estimation is in
the range of -5...+1 dB of the actual channel SNR and that around one decibel is lost
for the range of -6...+2 dB of the channel SNR.
The second interesting result from Fig. 4.3 and Fig. 4.4 is that the loss in performance
increases quite fast for higher deviations. If one assumes -2 dB to be the optimal case,
then a deviation of 3 dB results in half a decibel of performance loss, whereas a
deviation of 4 dB leads to a full decibel of performance loss!
Footnote 3: SNR range: 0-30 dB (step: 1 dB); number of sweeps: 20000 (seed=0..9);
channel model: TGn C; transmitting antennas: 4; receiving antennas: 4; number of tones: 64;
channel estimator: FDMLE; demapper: MMSE; modulation: QPSK

Figure 4.3: This figure shows the simulation results of an SNR estimator using the
actual channel SNR with an offset of several decibels. (Plot: BER on a logarithmic axis
from 10^-3 to 10^0 versus channel SNR from 0 to 30 dB; curves: SNRest = SNR channel,
SNRest = SNR channel + 6 dB, + 3 dB, + 2 dB, + 1 dB, - 1 dB, - 2 dB, - 3 dB, - 4 dB,
- 5 dB, - 6 dB, and SNRest = 50 dB.)

Figure 4.4: This figure shows the simulation results of an SNR estimator using the
actual channel SNR with an offset of several decibels. Detailed version of the plot in
Fig. 4.3. (Plot: BER around 10^-2 versus channel SNR from 21 to 24 dB; same curves as
in Fig. 4.3.)
5 Algorithm Design
5.1 Several Approaches and Why They Don't Work (...Too
Well)
5.1.1 Using Only the FFT Output
The simplest approach would be to use the output of the FFT directly, without any
correction terms from the channel matrix. This generally does not produce any reliable
results, as every tone on every possible channel generally experiences a different
influence from the channel itself (phase shift and amplitude change, i.e. multiplication
with a complex channel matrix coefficient). To use an EVM-style algorithm, one would
have to apply the algorithm for every transmitter-receiver-tone combination. It would
therefore be necessary to send the same known symbol several times in series. This is
obviously not a good solution, as a lot of potential channel capacity is wasted.
5.1.2 Using the Channel Matrix
Every approach that employs the inverse of the channel matrix is doomed: the channel
matrix is generally not invertible. Inverting the channel matrix can be circumvented by
rewriting the algorithm or by using known training signals where no tone is sent by more
than one antenna at any moment.
But the inversion is not the only problem: using the channel matrix itself is highly
problematic. To estimate the channel matrix in a 4x4 system, each antenna has to
transmit each tone once alone. It is then possible to fill in the channel matrix with the
values at the receiver. This results in four complete OFDM symbols that have to be sent
including their CP. Compared to other setup steps, this step is quite costly and should
therefore not be repeated, at least not in a 4x4 system.
If an algorithm (for example the one presented by Ren et al. [16], see footnote 1) uses
this estimated channel matrix, the measured noise is zero. This is because the estimation
of the channel matrix assumed that there is no noise. If the signal power is then divided
by the noise power, the result is a high number which has nothing to do with the actual SNR.
As mentioned before, it would be possible to get a better estimate of the channel matrix,
but this is no option in a real system. It is also not desirable to have an SNR estimator
that depends on the performance of the channel estimator. SNR estimators that
need the channel matrix are not generally bad: some of them (e.g. the one from Ren et
al. [16]) perform near the optimum for a perfect channel estimator. They
can therefore be a valid solution if an extremely accurate channel estimator is used. A
plot (see footnote 2) showing the performance of the Ren2008 and an adapted EVM algorithm
can be seen in Fig. 5.1.

Footnote 1: The same problem exists for Aldana et al. [7], Athanasios et al. [9], Boumard [12]
and others.
Footnote 2: SNR range: 0..30 dB (step: 1 dB); number of sweeps: 20000 (seed = 0..9);
channel model: TGn C; transmitting antennas: 4; receiving antennas: 4; number of tones: 64;
channel estimator: FDMLE/ideal; demapper: MMSE; modulation: QPSK.
[Figure 5.1 (plot): BER (logarithmic axis) versus SNR (channel) [dB], 0 to 30 dB. Curves:
Ren2008 with perfect channel estimator; Ren2008 with FDMLE channel estimator; EVM with ideal
channel estimator; EVM with FDMLE channel estimator; constant 50dB; channel SNR.]
Figure 5.1: This plot shows the high performance of the Ren2008 and an adapted EVM
algorithm for an ideal channel estimator. It further shows the bad performance when
using the FDMLE channel estimator. It is not entirely clear why the Ren2008 algorithm
performs badly at low channel SNR in combination with the ideal channel estimator. It
can further be noted that with the FDMLE channel estimator, both algorithms perform
slightly worse than the 50dB constant algorithm. This suggests that 50dB is not enough
to be the upper limit, but it is close enough for the 0..30dB range.
5.2 Proposed Algorithm
5.2.1 General Idea
The algorithm uses the short preambles transmitted in the training phase. The system
transmits a clearly defined number of short preambles (generally two or four). One
short preamble consists of a repeating signal part of 16 samples plus another 16 samples
for the CP. In the ideal case, this leads to a series of five identical 16-sample signals
(subsignals) per short preamble. For four short preambles, this theoretically results
in twenty identical subsignals that can be compared to estimate the signal power and
the noise power. It has to be noted that at least the first subsignal is heavily distorted
due to the setup of filters and the automatic gain control (AGC) and therefore cannot be
used.
To estimate the SNR, an average of all available subsignals is taken. This average
signal should be nearly identical to the signal received without noise, as long as the
noise is additive and has a mean value near zero (this is the case for AWGN). From
this estimated subsignal, the signal power $P_{s,est}$ can be calculated. Using the original
received signal, the power of the signal plus noise $P_{s+n}$ can be calculated. Those results
can be used to estimate the SNR:
\[ \mathrm{SNR}_{est} = \frac{P_{s,est}}{P_{s+n} - P_{s,est}} = \frac{P_{s,est}}{P_{n,est}} \]
This algorithm will be denoted "proposed algorithm" to distinguish it from other
algorithms. The numbers provided are specific to the system used but can easily be
adapted for other configurations.
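The general idea can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind the figures: the function names `estimate_snr` and `noisy_preamble`, the unit-magnitude test subsignal and the per-sample power normalization are assumptions made for this example only.

```python
import cmath
import math
import random

SUB_LEN = 16  # samples per subsignal, as in the text


def estimate_snr(samples, sub_len=SUB_LEN):
    """Estimate the linear SNR from M back-to-back noisy copies of one subsignal."""
    m = len(samples) // sub_len
    if m < 2:
        raise ValueError("at least two complete subsignals are needed")
    used = samples[: m * sub_len]
    # Average the M copies position by position to suppress the noise.
    avg = [sum(used[l + sub_len * i] for i in range(m)) / m for l in range(sub_len)]
    p_s_est = sum(abs(v) ** 2 for v in avg) / sub_len        # signal power per sample
    p_s_plus_n = sum(abs(v) ** 2 for v in used) / len(used)  # total power per sample
    # Raises ZeroDivisionError for a perfectly noise-free input.
    return p_s_est / (p_s_plus_n - p_s_est)


def noisy_preamble(m, snr_db, rng, sub_len=SUB_LEN):
    """Hypothetical test signal: M copies of a unit-power subsignal plus complex AWGN."""
    sub = [cmath.exp(2j * math.pi * rng.random()) for _ in range(sub_len)]
    sigma = math.sqrt(10.0 ** (-snr_db / 10.0))  # noise std. dev. per complex sample
    scale = sigma / math.sqrt(2.0)               # std. dev. of real and imaginary part
    return [s + complex(rng.gauss(0, scale), rng.gauss(0, scale))
            for _ in range(m) for s in sub]


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = estimate_snr(noisy_preamble(9, 10.0, rng))
    print("estimated SNR [dB]:", 10.0 * math.log10(estimate))
```

With only a few subsignals the estimate scatters noticeably around the true value; the analytical results in the following subsection quantify this.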
5.2.2 Mathematical Formulation and Analytical Results
Original Signals
All formulas provided are written in the discrete time domain, i.e. directly after the
IDFT at the transmitter and directly before the DFT at the receiver. The index $\tau$
denotes the transmitting antenna.

16-sample subsignal:
\[ c_\tau[l] = \begin{cases} c_{\tau,l} \in \mathbb{C}, & \text{if } l = 0 \ldots 15, \\ 0, & \text{else.} \end{cases} \]
The transmitted signal:
\[ s_\tau[k] = \sum_{i=0}^{m \cdot 5} c_\tau[k - i \cdot 16] \]
The received signal $y_r[k]$ for receiving antenna $r$ is then the following:
\[ y_r[k] = \sum_{\tau=1}^{4} (s_\tau * h_{r,\tau})[k] + n[k] \]
\[ (s_\tau * h_{r,\tau})[n] = \sum_{k=-\infty}^{\infty} s_\tau[k] \cdot h_{r,\tau}[n - k]
   = \sum_{k=-\infty}^{\infty} s_\tau[n - k] \cdot h_{r,\tau}[k] \]
$n[k]$ is assumed to be IID AWGN and $h$ is the channel impulse response. Due to the
convolution, the received signal $y_r[k]$ is generally not periodic anymore.
The Received Signal Rewritten
It is shown that if the first and the last 8 samples of $y$ are cut away, the remaining signal
is periodic again. The important points are:
- $h_{r,\tau}[k] = 0$ if $k < 0$ due to the causality of the channel.
- The cyclic prefix is 16 samples long and assumed to be chosen carefully to avoid
  ISI. Therefore the impulse response $h_{r,\tau}[k]$ is zero if $k \geq 8$.
- $h$ is assumed to be constant during the whole transmission (slow enough fading
  channel).
- The channel does in the worst case distort the first 8 samples of the next 16-sample
  subsignal. This is done in a periodic manner.
- The sum of multiple signals with the same period is periodic again.
Those facts lead to the conclusion that if the first and the last 8 samples are cut
away, the rest of the signal is periodic again. It is easy to see that this is true for all
receiving antennas. Every receiving antenna can therefore be treated individually.
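The periodicity argument can be checked with a small noise-free experiment. The sketch below is illustrative only: the random subsignal, the random 8-tap channel and the helper names `convolve` and `is_periodic` are assumptions. Because the output is truncated to the length of the input, only the leading transient appears here; the trailing 8 samples matter when another signal follows the short preambles.

```python
import random

SUB_LEN = 16     # period of the transmitted short-preamble signal
CHANNEL_LEN = 8  # assumed: h[k] = 0 for k < 0 and for k >= 8


def convolve(signal, taps):
    """Linear convolution of signal with a causal channel, truncated to len(signal)."""
    out = []
    for n in range(len(signal)):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += signal[n - k] * h
        out.append(acc)
    return out


def is_periodic(seq, period, tol=1e-9):
    """True if seq[i] equals seq[i + period] for every valid index i."""
    return all(abs(seq[i] - seq[i + period]) <= tol for i in range(len(seq) - period))


if __name__ == "__main__":
    rng = random.Random(0)
    sub = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(SUB_LEN)]
    taps = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(CHANNEL_LEN)]
    rx = convolve(sub * 5, taps)  # five identical subsignals through the channel
    print("whole signal periodic:", is_periodic(rx, SUB_LEN))
    print("after cutting 8 samples:", is_periodic(rx[8:], SUB_LEN))
```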
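This leads to a modified received signal $\tilde{y}[k]$ that can be written in the following
way:
\[ \tilde{y}[k] = \left( \sum_{i=0}^{M-1} z[k - i \cdot 16] \right) + \tilde{n}[k] \]
The newly introduced signal $z[l]$ is defined as:
\[ z[l] = \begin{cases} z_{l} \in \mathbb{C}, & \text{if } l = 0 \ldots 15, \\ 0, & \text{else.} \end{cases} \]
It is possible to calculate the different components $z_{l}$, but it is in this case not necessary.
$M$ is the number of available 16-sample subsignals. The noise signal $\tilde{n}[k]$ is generally a
truncated version of the former noise signal $n[k]$ and can be defined (assuming AWGN)
the following way:
\[ \mathrm{Re}\{\tilde{n}[k]\} = \begin{cases} n_{kr} \in \mathbb{R} \text{ so that } n_{kr} \sim N(0, \sigma_n^2/2), & \text{if } k = 0 \ldots 16 \cdot M - 1, \\ 0, & \text{else (cut away).} \end{cases} \]
\[ \mathrm{Im}\{\tilde{n}[k]\} = \begin{cases} n_{ki} \in \mathbb{R} \text{ so that } n_{ki} \sim N(0, \sigma_n^2/2), & \text{if } k = 0 \ldots 16 \cdot M - 1, \\ 0, & \text{else (cut away).} \end{cases} \]
\[ E\left[ \mathrm{Re}\{\tilde{n}[k]\}^2 + \mathrm{Im}\{\tilde{n}[k]\}^2 \right] = \sigma_n^2 \]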
The Received Signal as a Random Variable
Each sample of the received subsignal can also be interpreted as a random variable:
\[ \tilde{y}[k] \sim N\left( z[\mathrm{mod}_{16}(k)],\ \sigma_n^2 \right) \]
The Averaged Signal
In a next step, the average $\bar{s}[l]$ of all 16-sample subsignals in $\tilde{y}[k]$ is calculated. If an
infinite number of such subsignals were available, the average would be expected to be
$z[l]$, as the noise terms cancel out according to the law of large numbers:
\[ \bar{s}[l] = \frac{1}{M} \sum_{i=0}^{M-1} \tilde{y}[l + 16 \cdot i] \]
\[ = \frac{1}{M} \left[ \tilde{y}[l] + \tilde{y}[l + 16] + \ldots + \tilde{y}[l + (M-1) \cdot 16] \right] \]
\[ = z[l] + \frac{1}{M} \sum_{i=0}^{M-1} \tilde{n}[l + 16 \cdot i] \]
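The Averaged Signal as a Random Variable
The expectation of this average signal is calculated:
\[ E[\bar{s}[l]] = z[l] + \frac{1}{M} \sum_{i=0}^{M-1} E[\tilde{n}[l + 16 \cdot i]] = z[l] \]
The following property was used:
\[ E[X + Y] = E[X] + E[Y] \]
Further, the variance of the average signal is calculated:
\[ \mathrm{var}(\bar{s}[l]) = E\left[ |\bar{s}[l] - z[l]|^2 \right]
   = \frac{1}{M^2} E\Big[ \Big| \underbrace{\sum_{i=0}^{M-1} \tilde{n}[l + 16 \cdot i]}_{\sim N(0,\ M \sigma_n^2)} \Big|^2 \Big]
   = \frac{\sigma_n^2}{M} \]
The following formulas were used:
\[ X, Y \sim N(0, \sigma^2),\ \text{IID} \quad \Rightarrow \quad X + Y \sim N(0, \sigma^2 + \sigma^2) \]
\[ \mathrm{var}(Z) = \sigma_z^2 = E[(Z - E[Z])^2] \]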
The average signal $\bar{s}[l]$ can then be written as a random variable:
\[ \bar{s}[l] \sim N\left( z[l],\ \frac{\sigma_n^2}{M} \right) \]
This result is plausible, as the mean value is as expected and the variance decreases
in inverse proportion to the number of averaged subsignals $M$.
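The variance result can be cross-checked with a small Monte Carlo experiment. The sketch below is only an illustration under the stated AWGN assumption; the function name `averaged_sample_variance` and the chosen trial count are arbitrary choices for this example.

```python
import random


def averaged_sample_variance(m, sigma_n, trials, rng):
    """Empirical variance of the mean of m complex AWGN samples (true mean is zero)."""
    scale = sigma_n / 2 ** 0.5  # std. dev. of the real and of the imaginary part
    total = 0.0
    for _ in range(trials):
        mean = sum(complex(rng.gauss(0, scale), rng.gauss(0, scale))
                   for _ in range(m)) / m
        total += abs(mean) ** 2  # squared deviation from the true mean 0
    return total / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for m in (2, 4, 9):
        measured = averaged_sample_variance(m, 1.0, 20000, rng)
        print(f"M={m}: measured {measured:.4f}, predicted {1.0 / m:.4f}")
```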
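The Signal Power
In a next step, the signal power (see footnote 3) is calculated:
\[ P_{\bar{s}} = R_{\bar{s}\bar{s}}[0] = \sum_{i=0}^{15} |\bar{s}[i]|^2 = \sum_{i=0}^{15} \bar{s}[i] \cdot (\bar{s}[i])^* \]
The following formulas for a noncentral chi-square distributed variable $Z$, built from
independent $X_i \sim N(\mu_i, \sigma_i^2)$, were used:
\[ Z = \sum_{i=0}^{k-1} \left( \frac{X_i}{\sigma_i} \right)^2 \]
\[ \lambda_z = \sum_{i=0}^{k-1} \left( \frac{\mu_i}{\sigma_i} \right)^2 \]
\[ \mathrm{mean}(Z) = k + \lambda_z \]
\[ \sigma_z^2 = \mathrm{var}(Z) = 2 \cdot (k + 2 \lambda_z) \]
\[ E[a \cdot X^n] = a \cdot E[X^n] \]
\[ \mathrm{var}(a \cdot X) = a^2 \cdot \mathrm{var}(X) \]

Footnote 3: It has to be noted that the power of $\bar{s}$ is only equal to the signal power in the
limit $M \to \infty$. The algorithm assumes that the signal power is equal to the power of $\bar{s}$
for all $M > 1$. This is justified by the fact that in the end the SNR is estimated and not
calculated.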
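From those formulas it can be seen that the power of $\bar{s}$ follows a (scaled) noncentral
chi-square distribution. This can be written the following way:
\[ P_{\bar{s}} = \frac{\sigma_n^2}{M} \underbrace{\sum_{i=0}^{15} \left| \frac{\bar{s}[i]}{\sigma_n / \sqrt{M}} \right|^2}_{:= Z} \]
\[ \lambda_Z = \sum_{i=0}^{15} \left| \frac{z[i]}{\sigma_n / \sqrt{M}} \right|^2 = \frac{M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \]
\[ \mathrm{mean}(Z) = 16 + \frac{M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \]
One could argue that this is not true, as $z[i]$ is not Gaussian distributed. But this does
not matter, as the square is taken anyway. The following property holds:
\[ |z^2| = |z|^2 \]
The mean signal power is then written as:
\[ \mathrm{mean}(P_{\bar{s}}) = \frac{\sigma_n^2}{M} \cdot \mathrm{mean}(Z)
   = \frac{\sigma_n^2}{M} \left( 16 + \frac{M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \right)
   = \frac{16 \sigma_n^2}{M} + \sum_{i=0}^{15} |z[i]|^2 \]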
This result makes sense, as it is exactly the signal power for $M \to \infty$ (many samples)
or for $\sigma_n \to 0$ (no noise). Next, the variance is calculated:
\[ \mathrm{var}(P_{\bar{s}}) = \frac{\sigma_n^4}{M^2} \cdot \mathrm{var}(Z)
   = \frac{\sigma_n^4}{M^2} \cdot 2 \left( 16 + \frac{2M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \right)
   = \frac{32 \sigma_n^4}{M^2} + \frac{4 \sigma_n^2}{M} \sum_{i=0}^{15} |z[i]|^2 \]
As before, the variance is zero as expected for the cases $M \to \infty$ (many samples) or
$\sigma_n \to 0$ (no noise). It is slightly confusing that the signal power has an influence on the
variance of the signal power. The following example helps to clarify the situation. It is
assumed that the noise amplitude is in the range [-1, 1] (not AWGN anymore). If the signal
amplitude is equal to 1, then the resulting signal power is distributed in the range [0, 4].
If the signal amplitude is assumed to be 3, then the resulting signal power is distributed
in the range [4, 16]. It is therefore obvious that a higher average signal power leads to a
higher variance in the total signal power.
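The Signal Plus Noise Power
In the next step, the total power is calculated:
\[ P_y = R_{\tilde{y}\tilde{y}}[0] = \sum_{i=0}^{M \cdot 16 - 1} |\tilde{y}[i]|^2
   = \sum_{i=0}^{M \cdot 16 - 1} \tilde{y}[i] \cdot (\tilde{y}[i])^*
   = \sigma_n^2 \underbrace{\sum_{i=0}^{M \cdot 16 - 1} \frac{|\tilde{y}[i]|^2}{\sigma_n^2}}_{:= Z} \]
\[ \lambda_Z = \sum_{i=0}^{M \cdot 16 - 1} \left| \frac{z[\mathrm{mod}_{16}(i)]}{\sigma_n} \right|^2
   = \frac{M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \]
\[ \mathrm{mean}(Z) = 16 \cdot M + \frac{M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \]
\[ \mathrm{var}(Z) = 2 \left( 16 \cdot M + 2 \cdot \frac{M}{\sigma_n^2} \sum_{i=0}^{15} |z[i]|^2 \right) \]
This leads to the following mean power value:
\[ \mathrm{mean}(P_y) = \sigma_n^2 \cdot \mathrm{mean}(Z) = M \left( 16 \sigma_n^2 + \sum_{i=0}^{15} |z[i]|^2 \right) \]
This is the expected result, as it is the sum of the signal power and the noise power.
The variance can be calculated as:
\[ \mathrm{var}(P_y) = \sigma_n^4 \cdot \mathrm{var}(Z) = M \sigma_n^2 \left( 32 \sigma_n^2 + 4 \sum_{i=0}^{15} |z[i]|^2 \right) \]
As expected, the variance goes to zero for $\sigma_n \to 0$ (no noise). It is slightly confusing
to have a factor of $M$ in front of the variance term. But again, an example shows the
reason: Assume that the noise is in the interval [-1, 1]. The signal amplitude is assumed
to be 1. If only one sample is taken, the signal power is in the range [0, 4]. If two
samples are taken, the total signal power is in the range [0, 8] = 2 · [0, 4].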
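The Noise Signal
It is also possible to calculate the estimated noise signal $\hat{n}[k]$ directly:
\[ \hat{n}[k] = \tilde{y}[k] - \sum_{i=0}^{M-1} \bar{s}[k - 16 \cdot i] \]
\[ = \tilde{n}[k] - \frac{1}{M} \left( \sum_{i=0}^{M-1} \tilde{n}[\mathrm{mod}_{16}(k) + i \cdot 16] \right) \]
\[ = \frac{M-1}{M} \tilde{n}[k] - \frac{1}{M} \sum_{\substack{i=0 \\ i \neq \lfloor k/16 \rfloor}}^{M-1} \tilde{n}[\mathrm{mod}_{16}(k) + i \cdot 16] \]
The power of the estimated noise signal is then:
\[ P_{\hat{n}} = \sum_{i=0}^{M \cdot 16 - 1} |\hat{n}[i]|^2
   = \frac{(M-1) \sigma_n^2}{M^2} \underbrace{\sum_{i=0}^{M \cdot 16 - 1} \frac{|\hat{n}[i]|^2}{(M-1) \sigma_n^2 / M^2}}_{:= Z} \]
\[ \lambda_z = \sum_{i=0}^{M \cdot 16 - 1} \frac{\left| \frac{M-1}{M} \tilde{n}[i] \right|^2}{(M-1) \sigma_n^2 / M^2}
   = \frac{M-1}{\sigma_n^2} \sum_{i=0}^{M \cdot 16 - 1} |\tilde{n}[i]|^2
   \overset{E[|n|^2] = \sigma_n^2}{=} M \cdot 16 \cdot (M-1) \]
\[ \mathrm{mean}(Z) = M^2 \cdot 16 \]
\[ \mathrm{var}(Z) = \sigma_z^2 = 2 \cdot 16 \cdot M (2M - 1) \]
This leads to the following mean noise power value:
\[ \mathrm{mean}(P_{\hat{n}}) = 16 \cdot (M-1) \cdot \sigma_n^2 \]
For $M \to \infty$, this results in a mean value of $\sigma_n^2$ per sample as expected. The variance
can be calculated as:
\[ \mathrm{var}(P_{\hat{n}}) = 32 \sigma_n^4 \frac{(M-1)^2 (2M-1)}{M^3} \]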
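Summary of the Mean Power Terms Normalized Per Sample
As an overview, the mean values of the different power terms are presented here,
averaged per sample:
\[ \mathrm{mean}(P_y) = \sigma_n^2 + \frac{1}{16} \sum_{i=0}^{15} |z[i]|^2 \]
\[ \mathrm{mean}(P_{\bar{s}}) = \frac{\sigma_n^2}{M} + \frac{1}{16} \sum_{i=0}^{15} |z[i]|^2 \]
\[ \mathrm{mean}(P_{\hat{n}}) = \sigma_n^2 \cdot \frac{M-1}{M} \]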
Those results indicate that the following property is true for the per-sample powers:
\[ P_{\hat{n}} = P_y - P_{\bar{s}} \]
The property is not proven here. Numerical examples strongly indicate that the
property holds, and the mean values indicate it too. This property is important, as it
makes it unnecessary to calculate the estimated noise signal, which saves hardware costs.
It also makes sense from a physical point of view: The total power is the power
of the signal plus the power of the noise. So if the signal power is subtracted from this
total power, the remaining power is the noise power.
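The property can be checked numerically with a short sketch. The function `power_terms` and its use of unnormalized sums (so that the relation reads P_n = P_y - M · P_s instead of the per-sample form) are assumptions made for this illustration. Expanding the squared magnitude of the difference between each sample and the average suggests that the cross terms collapse to -2 · M · P_s, so the relation should hold for arbitrary input up to rounding errors.

```python
import random

SUB_LEN = 16


def power_terms(samples, sub_len=SUB_LEN):
    """Return (P_y, P_s, P_n, M) as plain sums of squared magnitudes."""
    m = len(samples) // sub_len
    used = samples[: m * sub_len]
    avg = [sum(used[l + sub_len * i] for i in range(m)) / m for l in range(sub_len)]
    p_y = sum(abs(v) ** 2 for v in used)  # total power over all 16*M samples
    p_s = sum(abs(v) ** 2 for v in avg)   # power of the averaged subsignal
    # Power of the estimated noise: every sample minus its averaged counterpart.
    p_n = sum(abs(v - avg[k % sub_len]) ** 2 for k, v in enumerate(used))
    return p_y, p_s, p_n, m


if __name__ == "__main__":
    rng = random.Random(2)
    data = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(9 * SUB_LEN)]
    p_y, p_s, p_n, m = power_terms(data)
    print("P_n          :", p_n)
    print("P_y - M * P_s:", p_y - m * p_s)
```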
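Calculation of the SNR
The last step is to estimate the SNR. This is done in the following way:
\[ \widehat{\mathrm{SNR}} := \frac{P_{\bar{s}} \cdot M}{P_y - P_{\bar{s}} \cdot M} \]
It is interesting to see what the average SNR looks like:
\[ \mathrm{mean\ SNR} = \frac{M \cdot \mathrm{mean}(P_{\bar{s}})}{\mathrm{mean}(P_y) - M \cdot \mathrm{mean}(P_{\bar{s}})}
   = \frac{1 + \frac{M}{16 \sigma_n^2} \sum_{i=0}^{15} |z[i]|^2}{M - 1}
   = \frac{1 + M \cdot \mathrm{SNR}_{true}}{M - 1}
   \neq E[\widehat{\mathrm{SNR}}] \]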
It has to be noted that this result is not equal to the expectation of the SNR, as the
following equation is generally not true for arbitrary random variables $A$ and $B$:
\[ E\left[ \frac{A}{B - A} \right] = \frac{E[A]}{E[B] - E[A]} \]
It is not easily possible to calculate the expectation value of the quotient of two
noncentral chi-square variables. Therefore, the approximated values of the mean SNR
were calculated for several $M$ and various SNR, as they should show a tendency. The
results can be seen in Fig. 5.2. As expected, the results get better with a higher $M$ and
higher channel SNR.
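The approximation is easy to evaluate directly. The following sketch, with the hypothetical helper names `approx_mean_snr` and `approx_offset_db`, computes the approximate difference between the true SNR and the mean estimated SNR in decibels for a given number of subsignals.

```python
import math


def approx_mean_snr(snr_true, m):
    """Approximation (1 + M * SNR_true) / (M - 1) from the text, linear scale."""
    if m < 2:
        raise ValueError("M must be at least 2")
    return (1.0 + m * snr_true) / (m - 1)


def approx_offset_db(snr_true_db, m):
    """Approximate true SNR minus mean estimated SNR, in dB."""
    snr_true = 10.0 ** (snr_true_db / 10.0)
    return snr_true_db - 10.0 * math.log10(approx_mean_snr(snr_true, m))


if __name__ == "__main__":
    for m in (2, 4, 9, 100):
        offsets = [round(approx_offset_db(snr_db, m), 2) for snr_db in (0, 10, 20, 30)]
        print(f"M={m}: offsets at 0/10/20/30 dB = {offsets}")
```

The offset is negative, i.e. the approximation overestimates the SNR, and it shrinks in magnitude for larger M and higher channel SNR.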
[Figure 5.2 (plot): difference SNR - SNR_hat [dB] versus SNR (channel) [dB], 0 to 30 dB,
one curve per value of M.]
Figure 5.2: This figure shows the difference between the expected SNR and the mean
calculated SNR. The lowest curve is for M = 2. Each higher curve increases the value
of M by one; the highest curve is for M = 100.
5.2.3 Simulation of the Proposed Algorithm
The proposed algorithm was tested using the simulation environment (see footnote 4). No
nonidealities were considered in this run. The noise was purely AWGN. The results of the
simulation are shown in Fig. 5.3. It can be seen that the algorithm performs near the optimum
for M = 9. The result of the SNR estimation is independent of the channel estimator,
whereas the BER depends on the estimated channel matrix!
5.2.4 The Influence of the Number of Samples
In a next step, the influence of the number of available subsignals M was investigated
(see footnote 5). Fig. 5.2 together with Fig. 4.4 suggests that the influence of the number
of subsignals should be rather small, at least for M > 4. The results can be seen in Fig. 5.4
and Fig. 5.5.
5.2.5 The Mean Value of the Estimated SNR
As mentioned before, Fig. 5.2 only shows an approximation of the estimated SNR. The
exact curves were calculated using the simulation environment (see footnote 6). The results
can be seen in Fig. 5.6. Qualitatively, the curves look the same, which suggests that the
approximation made is quite accurate. The most obvious difference is an offset of around
one decibel that can be seen by comparing the two figures.

Footnotes 4, 5 and 6 (identical simulation parameters): SNR range: 0..30 dB (step: 1 dB);
number of sweeps: 20000 (seed = 0..9); channel model: TGn C; transmitting antennas: 4;
receiving antennas: 4; number of tones: 64; channel estimator: FDMLE; demapper: MMSE;
modulation: QPSK.
[Figure 5.3 (plot): BER (logarithmic axis) versus SNR (channel) [dB], 0 to 30 dB. Curves:
proposed algorithm (M=9); const 50dB; channel SNR.]
Figure 5.3: This figure shows the simulation results that were obtained using the
proposed algorithm with M = 9. The simulated BER is close to the best possible BER
and is, as discussed already, better than taking the exact channel SNR.
[Figure 5.4 (plot): BER (logarithmic axis) versus SNR (channel) [dB], 0 to 30 dB. Curves:
channel SNR; constant 50dB; proposed algorithm for M = 9, 8, 7, 6, 5, 4, 3, 2.]
Figure 5.4: This figure shows the simulation results that were obtained using the
proposed algorithm with different M. As expected, the performance is better for high
M.
[Figure 5.5 (plot): BER (logarithmic axis) versus SNR (channel) [dB], 22 to 25 dB. Same
curves as in Fig. 5.4.]
Figure 5.5: This figure shows the same results as Fig. 5.4 in more detail. It can be seen
that the BER is near the optimum for M > 4 and even the results with smaller M are still
acceptable (losing less than 1dB in the worst case M = 2).
[Figure 5.6 (plot): mean(SNR - SNR_hat) [dB] versus SNR (channel) [dB], 0 to 30 dB. Curves:
proposed algorithm for M = 9, 8, 7, 6, 5, 4, 3, 2.]
Figure 5.6: This figure shows the simulated mean SNR values of the algorithm for
several M.
5.2.6 Frequency Offset
In this simulation (see footnote 7), the effects of a frequency offset between transmitter
and receiver were investigated. As can be seen in Fig. 5.7, the effects of a frequency offset
seem to be negligible as long as the offset is below 100ppm (parts per million). Even at
200ppm, the performance is nearly ideal for the whole investigated range. A frequency
offset above 200ppm results in a visible degradation of the performance that eventually
gets worse than the 50dB constant estimator.
5.2.7 Ignore Frequency Offset and Save Hardware Costs
It would be possible to ignore the frequency offset and save hardware costs at the same
time by using the absolute values of the input samples instead of the real and imaginary
parts:
\[ y_n = \text{received sample, complex} \]
\[ y_n \cdot e^{2 \pi i \Delta f t} = \text{received sample with frequency offset, complex} \]
\[ |y_n \cdot e^{2 \pi i \Delta f t}| = |y_n| = \text{absolute value of the received sample, real} \]
This scenario was tested in a simulation (see footnote 8). The results can be seen in
Fig. 5.8. The loss is around one decibel compared to the optimal case. This approach is
therefore interesting if a high frequency offset is present or additional hardware costs are
to be saved.
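The invariance of the absolute value under a frequency offset can be illustrated with a few lines of Python. The helper `apply_frequency_offset`, the sample values and the offset and sample period used in the demo are assumptions chosen only for this example.

```python
import cmath
import math


def apply_frequency_offset(samples, delta_f, sample_period):
    """Rotate sample n by exp(2*pi*i*delta_f*t) with t = n * sample_period."""
    return [y * cmath.exp(2j * math.pi * delta_f * n * sample_period)
            for n, y in enumerate(samples)]


if __name__ == "__main__":
    received = [1 + 2j, -3 + 0.5j, 0.25 - 4j, 2 - 2j]
    # Assumed example values: 100 kHz offset at a 20 MHz sample rate.
    shifted = apply_frequency_offset(received, 100e3, 1 / 20e6)
    for y, y_off in zip(received, shifted):
        print(abs(y), abs(y_off))  # the magnitudes agree up to rounding
```

The rotation changes the real and imaginary parts of every sample after the first, but leaves the magnitudes untouched, which is why an estimator working on absolute values does not see the offset.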
Footnotes 7 and 8 (identical simulation parameters): SNR range: 0..30 dB (step: 1 dB);
number of sweeps: 20000 (seed = 0..9); channel model: TGn C; transmitting antennas: 4;
receiving antennas: 4; number of tones: 64; channel estimator: FDMLE; demapper: MMSE;
modulation: QPSK.

[Figure 5.7 (plot): BER (logarithmic axis) versus SNR (channel) [dB], 0 to 30 dB. Curves:
const 50dB; channel SNR; proposed algorithm (M=9) with 20ppm, 50ppm, 100ppm, 200ppm, 500ppm,
1000ppm and 10000ppm frequency offset.]
Figure 5.7: This figure shows the simulated BER values for M = 9 under different
frequency offset scenarios.

[Figure 5.8 (plot): BER (logarithmic axis) versus SNR (channel) [dB], 0 to 30 dB. Curves:
const 50dB; channel SNR; proposed algorithm (input = |input|).]
Figure 5.8: This figure shows the simulation results of the proposed algorithm, where
each complex input sample was replaced by the absolute value of each sample (M = 9).

5.2.8 Limited Precision
To efficiently implement the algorithm in hardware, one can only use a fixed number of
bits per sample. The influence of the number of effective bits on the SNR estimation
was investigated using a simulation (see footnote 9). The results can be seen in Fig. 5.9.
The plot shows that there is no visible impact on the accuracy in the range from 0 to 30 dB
channel SNR as long as at least eight effective bits are used. Six effective bits already
show a distinct deviation from the ideal case and four effective bits are definitely
not enough to store small noise terms.
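The effect of a limited number of effective bits can be illustrated with a simple quantization model. This is a sketch under assumptions: the function name `quantize` is hypothetical, and scaling by the largest real or imaginary component (rather than by the largest complex magnitude) is one possible reading of "the largest received sample".

```python
def quantize(samples, effective_bits):
    """Scale so the largest component maps to 2**bits - 1, round, and scale back."""
    full_scale = 2 ** effective_bits - 1
    peak = max(max(abs(v.real), abs(v.imag)) for v in samples)
    if peak == 0:
        return list(samples)
    scale = full_scale / peak
    # Rounding to integers models the limited precision of the stored samples.
    return [complex(round(v.real * scale), round(v.imag * scale)) / scale
            for v in samples]


if __name__ == "__main__":
    samples = [1 + 0j, 0.5 - 0.25j, 0.001 + 0j]
    for bits in (4, 6, 8, 10):
        print(bits, quantize(samples, bits))
```

In this toy example the smallest sample is rounded to zero for low bit counts, which mirrors the observation that small noise terms cannot be stored with four effective bits.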
5.2.9 Proposed Algorithm: Further Ideas and Simulations
Until now, all simulations of the proposed algorithm were conducted in an ideal
environment without any hardware effects or changing channel coefficients. Nonidealities
could include phase noise, a DC offset, a small amplitude modulation or other types
of noise (e.g. shot noise). In some of those cases, it might be possible to adapt the
algorithm to take care of certain hardware effects, for example removing the offset with
a high-pass filter.
But is it really a good idea to remove such effects? If the aim is to estimate the channel
SNR as accurately as possible, it would be favorable to do so. On the other hand, the
removal of those effects would likely decrease the performance of the whole system if
those effects are not removed for the rest of the received signal too. One can argue that
they probably don't originate in the channel, but can be treated as if they did.
Footnote 9: SNR range: 0..30 dB (step: 1 dB); number of sweeps: 20000 (seed = 0..9);
channel model: TGn C; transmitting antennas: 4; receiving antennas: 4; number of tones: 64;
channel estimator: FDMLE; demapper: MMSE; modulation: QPSK.

[Figure 5.9 (plot): mean(SNR - SNR_hat) [dB] versus SNR (channel) [dB], 0 to 30 dB. Curves:
10 bit effective; 8 bit effective; 6 bit effective; 4 bit effective.]
Figure 5.9: This figure shows the simulation results of the proposed algorithm, where
each sample has only a limited precision (M = 9). 8 effective bits, for example, mean
that the absolute value of the largest received sample can be stored in an 8 bit unsigned
integer. The other samples are scaled proportionally.
6 Implementation
6.1 Requirements and Limitations
The algorithm is implemented on a Xilinx Virtex-4 FPGA (type: xc4vsx55, see [32]).
As the algorithm has to share the space on the FPGA with other components, it is crucial
that the implementation is optimized for minimal hardware usage. The clock frequency
is given, so there is no point in optimizing the implementation for speed, as long as the
design is able to run with the given 80 MHz clock signal. The maximal allowed latency
is defined by the requirement that the results have to be ready before the consecutive
long preamble has fully arrived. The long preamble is 128 samples long and one sample
arrives every fourth clock cycle. This results in a maximal allowed latency of
128 · 4 = 512 clock cycles.
The SNR estimation has to be done for each receiving antenna individually. Besides
the SNR, it is further necessary to compute the noise variance. The data arrives
separated into a real and an imaginary part. One new data sample pair arrives
every fourth clock cycle. Both signals are 10 bits long and use one of those 10 bits for
the sign (two's complement). The automatic gain control is adjusted in a way that
it approximately scales the largest signal parts of the short preamble to half of the
possible amplitude. This is done in order to prevent data loss, as the peak-to-average
power ratio (PAPR) in MIMO-OFDM systems is potentially large [33]. This results in
an effectively used data width of 8 bits for both the real and the imaginary part. As
discussed in the previous chapter, 8 effective bits are accurate enough for the expected
SNR range of 0 to 30 dB.
A complete list of all input and output signals with their definition can be found in
table 6.1.
For the hardware cost analysis, the following assumptions, which are consistent with the
testbed, are made: The relevant data of the short preamble consists of a maximum of
10 complex subsignals with 16 samples each, and 10 bits are used for each of the real and
the imaginary part.
NAME                      FORMAT             DESCRIPTION
INPUTS
DATA_STREAM_REAL          signed, 10 bit     one new sample each 4 clock cycles, real part of the sample
DATA_STREAM_IMAG          signed, 10 bit     one new sample each 4 clock cycles, imaginary part of the sample
AGC_CONSTANT              1 bit              is 0 if the AGC didn't freeze yet and 1 if AGC frozen
SHORT_PREAM_FINISHED      1 bit              is 1 if the last sample of the short preamble arrives and 0 otherwise
NEW_SAMPLE_READY          1 bit              is 1 if a new sample arrived and 0 otherwise
SELECT_OPERATION_MODE     1 bit              is 0 for constant SNR and 1 for estimated SNR
CLOCK                     1 bit              clock signal
RESET                     1 bit              active low reset
OUTPUTS
SNR                       >10 bit, unsigned  the calculated SNR value
SNR_READY                 1 bit              is 1 as soon as the SNR calculation finished, otherwise 0 if no valid value present
SIGMA_S                   >10 bit, unsigned  the calculated value for the noise variance (sigma squared) per complex sample
SIGMA_S_READY             1 bit              is 1 as soon as the sigma square calculation finished, otherwise 0 if no valid value present
Table 6.1: This table shows all incoming and outgoing signals of the SNR estimation
block.
6.2 First Approach
The most direct way would be to store all received values. Once all values have arrived, one
could then calculate the desired values in parallel. This approach is equivalent to a
direct implementation of standard software code and needs very little control logic. An
approximation of the hardware costs for this approach can be found in table 6.2.
TYPE                    NUMBER NEEDED    USE
10 bit registers        2 · 160 = 320    store incoming values
10 bit multipliers      320              square each signal sample
20,21,... bit adders    320              adder tree for the total signal plus noise power
...                     ...              ... and so on...
Table 6.2: Approximate hardware costs for approach 1. Only about half of the parts are
listed, as it is obvious that this approach is not a good one considering the constraints.
6.3 Second Approach
As the datapaths for the total signal plus noise power and the estimated signal power
have very different needs, it seems sensible to separate them.
The total signal plus noise power datapath needs two registers, two multipliers and
two adders. First, the real and the imaginary part of the sample are both squared and then
added. This is equivalent to the power (squared magnitude) of the current sample. This power
is added to the power of the previous samples. Every 16 samples, the total power is written
to the second register to ensure that no partially finished cycles contribute to the final
result. Some additional control logic is needed to enable the two registers.
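The behavior of this datapath can be modeled in software. The class below is a behavioral sketch, not the HDL implementation: the name `TotalPowerAccumulator` and its attributes are hypothetical, and bit widths and the enable logic are not modeled.

```python
class TotalPowerAccumulator:
    """Running sum of re^2 + im^2 that is latched after every full 16-sample cycle."""

    def __init__(self, cycle_len=16):
        self.cycle_len = cycle_len
        self.running = 0  # first register: accumulates every incoming sample
        self.latched = 0  # second register: only updated after full cycles
        self.count = 0

    def push(self, re, im):
        """Add the power of one complex sample given as integer real and imaginary parts."""
        self.running += re * re + im * im
        self.count += 1
        if self.count % self.cycle_len == 0:
            self.latched = self.running


if __name__ == "__main__":
    acc = TotalPowerAccumulator()
    for _ in range(20):  # one full cycle plus four samples of the next one
        acc.push(1, 2)
    print(acc.running, acc.latched)
```

After 20 samples the second register still holds the power of the first 16 samples only, so a partially received subsignal does not contribute to the result.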
The estimated signal power datapath can also be simplified. It is not necessary to save all
samples, but only the present ones and the average of the previous ones. A schematic
using that approach and minimizing additional control by implementing a shift register
is presented in Fig. 6.1. Compared to the first approach, this version already saves large
amounts of hardware. The total costs are still high, as can be seen in the approximation
in table 6.3.
TYPE                                 NUMBER NEEDED    USE
10 bit multipliers                   2                total signal plus noise power datapath
20/30 bit adder                      1 each           total signal plus noise power datapath
30 bit register                      2                total signal plus noise power datapath
15 bit multipliers                   2 · 16 = 32      SMART_ADDER, estimated signal datapath
15 bit register                      2 · 16 = 32      SMART_ADDER, estimated signal datapath
15 bit adders                        2 · 16 = 32      SMART_ADDER, estimated signal datapath
30, 31, 32, 33, 34 bit adders        16, 8, 4, 2, 1   adder tree, estimated signal datapath
10 bit registers                     2 · 16 = 32      shift register, estimated signal datapath
additional adders, multipliers,                       to calculate the SNR and the noise variance
logic, dividers
Table 6.3: Approximate hardware costs for second approach.
6.4 Final Approach
Additional hardware costs can be saved in the estimated signal datapath. The adder
and the multiplier from the SMART_ADDER entity can be shared among all 16 samples,
as they don't produce any relevant data most of the time. As new samples only arrive
every fourth clock period, they could theoretically also be shared between the real and
the imaginary samples. This is not done, as the amount of hardware to be saved is small
compared to the control overhead.

[Figure 6.1 (schematic): chain of 16 SAMPLE_SQUARE_ENT blocks fed by DATA_STREAM_REAL and
DATA_STREAM_IMAG, producing SIG_POW; control signals INITIALIZE and FULL_CYCLE. Each
SAMPLE_SQUARE_ENT (ports REAL_IN, IMAG_IN, REAL_OUT, IMAG_OUT, SAMPLE_SQUARE, FULL_CYCLE,
INITIALIZE) contains SMART_ADDER_ENT blocks and D flip-flops.]
Figure 6.1: Second approach: Schematic of the datapath for the estimated signal
power. Compared to the first approach, the costs are highly reduced.
About half of the registers in the estimated signal datapath can be omitted by realizing
that it is not necessary to save all the old averaged samples. Those were introduced to
ensure that only full cycles are considered. Otherwise, one would need a counter for each
sample and a division by the number of samples that were added. This is not a
good option, as the division is more costly than a few registers. The trick is not
to save all the averaged samples but the total power of those instead.
The different parts of the design are introduced in the following subsections.
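The arithmetic behind the final approach can be sketched in software. The model below is an assumption-laden illustration rather than the hardware design: the function `snr_from_sums` is hypothetical, it processes complete lists instead of a sample stream, and it ignores bit widths, pipelining and control signals. It works with sums of the M subsignals instead of averages, so that the only divisions are the two final ones.

```python
def snr_from_sums(samples_re, samples_im, sub_len=16):
    """Return (SNR, noise variance per sample) using sums instead of averages."""
    m = len(samples_re) // sub_len
    n = m * sub_len
    # Total signal plus noise power over all complete subsignals (P_y).
    total_power = sum(samples_re[k] ** 2 + samples_im[k] ** 2 for k in range(n))
    # Position-wise sums of the M subsignals (no division by M).
    sum_re = [0] * sub_len
    sum_im = [0] * sub_len
    for k in range(n):
        sum_re[k % sub_len] += samples_re[k]
        sum_im[k % sub_len] += samples_im[k]
    # Power of the summed subsignal, equal to M^2 times the averaged signal power.
    summed_power = sum(r * r + i * i for r, i in zip(sum_re, sum_im))
    # Multiplying P_y by M brings both terms to the same scale before subtracting.
    noise_scaled = m * total_power - summed_power
    sigma_sq = noise_scaled / (sub_len * m * m)
    snr = summed_power / noise_scaled if noise_scaled else float("inf")
    return snr, sigma_sq


if __name__ == "__main__":
    # Toy input: subsignal length 2, M = 2, integer samples as in a fixed-point datapath.
    print(snr_from_sums([2, 4, 4, 2], [0, 0, 0, 0], sub_len=2))
```

Keeping everything in integer sums until the final divisions mirrors the motivation given in the text: a division per averaged sample would cost more hardware than a few additional registers.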
6.4.1 SNR_EST_ENT
The SNR_EST_ENT entity is the top level design entity. A schematic can be seen in
Fig. 6.2. The thick lines represent the datapath and all other lines the control path. The
datapath is separated into two parts: the part for the total signal plus noise power
and the part for the estimated signal power.
The total signal plus noise power datapath consists of:
- The TOTAL_POWER_ENT(d=10,e=9): calculating the total power of M 16-sample
  subsignals including all the noise.
- The NUMBER_OF_FULL_CYCLES_ENT: counting the number of finished 16-sample
  subsignals M.
- The multiplier linking both of them, resulting in:
  \[ M^2 \cdot P_{(16\text{-sample subsignal}) + \text{noise}} \]
The estimated signal power datapath consists of:
- Both AVERAGE_SIGNAL_ENT: adding up all M subsignals for both the real and
  the imaginary part.
- The TOTAL_POWER_ENT(d=15,e=4): calculating the power of the added up
  real and imaginary 16-sample subsignals, resulting in:
  \[ M^2 \cdot P_{(16\text{-sample subsignal})} \]
The calculation of the SNR and the noise variance:
- The adder: subtracting the signal power from the signal plus noise power, resulting
  in the noise power.
- The first divider (NR_DIVISION_ENT): dividing the noise power first by 16 and
  then by $M^2$ to get the noise variance per sample: $\sigma_n^2$
- The second divider (NR_DIVISION_ENT): dividing the signal power by the noise
  power, resulting in the SNR.
There is some additional logic at the output of the datapath to switch to constant SNR
and noise variance values instead of the estimated ones, and some logic that sets the
ready signals to high as soon as the calculations are finished.
The two registers with gray background inserted into the datapath and the control path
do not have any functional tasks. They are pipeline registers to ensure that the desired
clock period is met.
The control path consists of several small units. The general tasks are:
- FULL_CYCLE_FINISHED_ENT: counts the samples that have already arrived in the current
  16-sample subsignal and notifies other entities when a complete subsignal has arrived.
- INITIALIZE_ENT: initializes all other entities at the beginning.
- VALID_DATA_ENT: has a 1 at the output as long as the arriving data is valid.
- CONT_AV_SIG_ENT: control unit for the estimated signal datapath.
Figure 6.2: SNR_EST_ENT, the top level design entity. (Annotation in the figure: number
of SNR output bits 10 -> 0...30 dB, 14 -> 0...40 dB, 17 -> 0...50 dB.)
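The arithmetic of the two datapaths can be illustrated in software. The following Python
sketch is not the VHDL design: it assumes floating-point arithmetic instead of the fixed
bit widths, and the function name `estimate_snr` and its list-of-lists input are
hypothetical. It computes the two power terms, their difference and the two quotients
in the order described above.

```python
def estimate_snr(subsignals):
    """Sketch of the SNR_EST_ENT arithmetic (floating point, not bit accurate).

    subsignals: list of M lists, each holding the complex samples of one
    16-sample subsignal. Returns (snr, noise_variance) as linear values.
    """
    m = len(subsignals)          # number of finished subsignals M
    n = len(subsignals[0])       # samples per subsignal (16 in the design)

    # Total signal plus noise power of all samples, multiplied by M.
    total = m * sum(abs(x) ** 2 for sub in subsignals for x in sub)

    # Add up samples that belong together, then take the power of the sums.
    summed = [sum(sub[k] for sub in subsignals) for k in range(n)]
    signal = sum(abs(s) ** 2 for s in summed)

    noise = total - signal                   # the adder (used as subtractor)
    noise_variance = noise / (n * m * m)     # divide by 16 and by M^2
    snr = signal / noise if noise > 0 else float("inf")
    return snr, noise_variance


# Two subsignals: constant signal 1.0 with a disturbance of +0.1 and -0.1.
snr, sigma2 = estimate_snr([[1.1 + 0j] * 16, [0.9 + 0j] * 16])
```

In this toy input the disturbance has a power of 0.01 per sample and the signal a power
of 1, so the sketch returns a noise variance of about 0.01 and an SNR of about 100 (20 dB).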
6.4.2 TOTAL_POWER_ENT
The TOTAL_POWER_ENT entity calculates the power of a stream of data arriving
separated into the real and the imaginary part. The schematic can be seen in Fig. 6.3.
The implementation is straightforward: once a new data sample arrives, the real
and the imaginary part are squared and added. The result is then added to the previously
stored power. Once a full subsignal has arrived, the total power is stored in the second
register. The first register can further be initialized with the value zero to start a new
computation.
Figure 6.3: TOTAL_POWER_ENT, calculating the power of a stream of data. (Bit widths
annotated in the figure: d bits at the inputs, 2*d bits after squaring, 2*d+1 bits after
adding both squares, 2*d+1+e bits in the accumulator. Worst case: 16 (samples) *
5 (subsignals) * 4 (OFDM symbols) = 320 runs, 2^9 = 512 ==> 30 bit ==> e=9.)
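The behavior of the two registers and the bit widths annotated in Fig. 6.3 can be
sketched as follows. This is a behavioral Python model under assumptions, not the VHDL
source: the class `TotalPower`, the helper `accumulator_width` and the exact order in
which INITIALIZE and FULL_CYCLE_FINISHED take effect within one step are hypothetical.

```python
def accumulator_width(d, runs):
    """Bits needed to accumulate `runs` values of re^2 + im^2 with d-bit inputs."""
    e = (runs - 1).bit_length()   # smallest e with 2**e >= runs
    return 2 * d + 1 + e          # 2*d for a square, +1 for re^2 + im^2, +e for the sum


class TotalPower:
    """Behavioral sketch of TOTAL_POWER_ENT (one call to step per NEW_SAMPLE_READY)."""

    def __init__(self):
        self.acc = 0     # first register: running power
        self.total = 0   # second register: power of the last finished subsignal

    def step(self, re, im, initialize=False, full_cycle_finished=False):
        if initialize:               # assumption: clear before accumulating
            self.acc = 0
        self.acc += re * re + im * im
        if full_cycle_finished:      # latch the result into the second register
            self.total = self.acc
        return self.total
```

With these assumptions, `accumulator_width(10, 320)` gives the 30 bits quoted for
TOTAL_POWER_ENT(d=10,e=9), and `accumulator_width(15, 16)` gives 35 bits for
TOTAL_POWER_ENT(d=15,e=4).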
6.4.3 AVERAGE_SIGNAL_ENT
The AVERAGE_SIGNAL_ENT entity sums up all the samples that belong together (e.g.
every first sample of a 16-sample subsignal is accumulated in the register s0). The samples
are simply added up and not averaged in this entity to save the cost and latency of a
divider. The schematic can be seen in Fig. 6.4.
Every arriving sample is added to the corresponding sample from the previous
subsignals. As only one sample arrives at a time, the adder can be shared among
all 16 samples. To ensure that only finished subsignals have an influence on the final
result, the values are read out before they are overwritten by the newly arriving values.
As before, some additional circuitry was added to initialize the entity.
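The read-before-overwrite behavior with one shared adder can be sketched in Python.
The class name `AverageSignal` and its interface are hypothetical and only model the
behavior described above, not the register-level design.

```python
class AverageSignal:
    """Behavioral sketch of AVERAGE_SIGNAL_ENT for one component (real or imaginary)."""

    def __init__(self, length=16):
        self.regs = [0] * length   # registers s0 ... s15
        self.pos = 0               # plays the role of SAMPLE_COUNTER

    def initialize(self):
        self.regs = [0] * len(self.regs)
        self.pos = 0

    def step(self, sample):
        """Accumulate one sample; return the sum that was stored before this sample."""
        previous = self.regs[self.pos]            # read out first ...
        self.regs[self.pos] = previous + sample   # ... then overwrite (single shared adder)
        self.pos = (self.pos + 1) % len(self.regs)
        return previous
```

Because `step` returns the old register content, the value read out while subsignal
number m is arriving only contains the m-1 subsignals that are already complete.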
6.4.4 FULL_CYCLE_FINISHED_ENT
The FULL_CYCLE_FINISHED_ENT entity counts the arriving samples within a 16-sample
subsignal and notifies the NUMBER_OF_FULL_CYCLES_ENT entity when a complete
subsignal has arrived. The schematic can be seen in Fig. 6.5.
The implementation of the FULL_CYCLE_FINISHED_ENT consists of a counter and
some additional hardware. The delay register and the AND gate at the output ensure
that this entity confirms a finished subsignal only if all 16 samples have
arrived. The other difficulty is to reset the counter once it has reached the value 15.
This can be done either by a controlled overflow or via the multiplexer. The second
version is implemented.
6.4.5 NUMBER_OF_FULL_CYCLES_ENT
The NUMBER_OF_FULL_CYCLES_ENT entity counts the number of complete subsignals
that have arrived. The schematic can be seen in Fig. 6.5. The implementation of the
NUMBER_OF_FULL_CYCLES_ENT is straightforward: it consists of a counter with an
initialization mechanism.
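The interplay of the two counters can be sketched as follows. The Python classes below
are a simplified behavioral model with hypothetical names; the delay register and the
AND gate of the real entity are not modeled, and the reset of the sample counter is
written as an explicit selection, mirroring the multiplexer variant.

```python
class FullCycleFinished:
    """Counts samples 0..15 and reports when a 16-sample subsignal is complete."""

    def __init__(self, length=16):
        self.length = length
        self.count = 0

    def step(self):
        """Call once per arriving sample; returns True on the last sample."""
        finished = self.count == self.length - 1
        # Multiplexer variant: select 0 explicitly instead of relying on overflow.
        self.count = 0 if finished else self.count + 1
        return finished


class NumberOfFullCycles:
    """Counts complete subsignals (the value M used in the datapath)."""

    def __init__(self):
        self.cycles = 0

    def step(self, full_cycle_finished, initialize=False):
        if initialize:
            self.cycles = 0
        elif full_cycle_finished:
            self.cycles += 1
        return self.cycles


sample_counter = FullCycleFinished()
cycle_counter = NumberOfFullCycles()
for _ in range(40):                      # 40 samples = 2 full subsignals + 8 samples
    m = cycle_counter.step(sample_counter.step())
```

After the 40 samples of this example, `m` equals 2 and the sample counter stands at 8.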
Figure 6.4: AVERAGE_SIGNAL_ENT, averaging all samples that belong together. (Annotations
in the figure: registers s0 to s15 behind a 4-to-16 demultiplexer; worst case: 20 runs
==> 2^5, i.e. the 10-bit input grows to 15 bits.)
Figure 6.5: NUMBER_OF_FULL_CYCLES_ENT and FULL_CYCLE_FINISHED_ENT,
counting subsignals. (Annotations in the figure: 4-bit sample counter, 5-bit cycle
counter; worst case: 20 runs ==> 2^5.)
6.4.6 INITIALIZE_ENT
The INITIALIZE_ENT entity is a simple rising edge detector. The schematic can be
seen in Fig. 6.6.
Figure 6.6: INITIALIZE_ENT, initializes the rest of the circuit as soon as the AGC
freezes. (The schematic consists of a D flip-flop and an AND gate with the input
AGC_CONSTANT and the output INITIALIZE.)
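A rising edge detector of this kind can be sketched in a few lines of Python. The class
name is hypothetical; the model assumes that the flip-flop stores the previous value of
AGC_CONSTANT and that the AND gate combines the current value with the inverted stored one.

```python
class RisingEdgeDetector:
    """Behavioral sketch of INITIALIZE_ENT: one flip-flop plus one AND gate."""

    def __init__(self):
        self.previous = 0   # content of the D flip-flop

    def step(self, agc_constant):
        """Return 1 for exactly one step when agc_constant changes from 0 to 1."""
        initialize = int(bool(agc_constant) and not self.previous)
        self.previous = int(bool(agc_constant))   # flip-flop samples the input
        return initialize


detector = RisingEdgeDetector()
pulses = [detector.step(v) for v in (0, 0, 1, 1, 1, 0, 1)]
```

For the input sequence above, `pulses` contains a 1 only at the two positions where the
input changes from 0 to 1.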
6.4.7 VALID_DATA_ENT
The VALID_DATA_ENT entity is a small automaton that produces a logic one at its
output as long as valid data samples from the short preamble are arriving. This
means that the AGC has to be frozen and the short preambles are not finished yet. The
schematic can be seen in Fig. 6.7.
6.4.8 CONT_AV_SIG_ENT
The CONT_AV_SIG_ENT entity is responsible for controlling the estimated signal
datapath. As soon as a complete subsignal has arrived, the automaton produces the control
signal to read out the correct values in the correct order from the AVERAGE_SIGNAL_ENT
entity. It further initializes and controls the second TOTAL_POWER_ENT entity. The
automaton is also responsible for starting the division in the NR_DIVISION_ENT entity.
The schematic can be seen in Fig. 6.8.
Figure 6.7: VALID_DATA_ENT, monitors the state of the arriving samples. (Moore automaton
with the states ST0, ST1 and ST2, the inputs AGC_CONSTANT and SHORT_PREAM_FINISHED and
the output VALID_DATA.)
6.4.9 NR_DIVISION_ENT
The NR_DIVISION_ENT is a parametrized division entity. Division by a variable is
generally a rather complex operation in hardware, but it cannot be circumvented in this
case. Different algorithms exist, each with its own advantages and disadvantages.
The requirements for the division algorithm to be implemented are:
- division: $A = Q \cdot B + R$. The residual R is not needed, which can be justified
  in the following way: the SNR value has a sufficiently high resolution if only the
  integer value is taken, except in the extremely low dB ranges that are usually not
  of much interest: 1 -> 0 dB, 2 -> 3 dB, 3 -> 4.8 dB, 4 -> 6 dB, ..., 100 -> 20 dB,
  101 -> 20.04 dB, ...
- both A and B are p-bit unsigned values
- latency small enough, ideally below 100 clock cycles
- small hardware costs, preferably parametrizable
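The resolution argument from the first requirement can be checked numerically. The
short Python sketch below (the helper name `to_db` is hypothetical) converts integer
SNR quotients to decibels and shows how the step between neighboring integers shrinks.

```python
import math


def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(ratio)


# Integer quotients and the SNR in dB they represent.
resolution = {q: round(to_db(q), 2) for q in (1, 2, 3, 4, 100, 101)}

# Step size in dB between two neighboring integer quotients.
step_low = to_db(2) - to_db(1)       # about 3 dB at the low end
step_high = to_db(101) - to_db(100)  # about 0.04 dB around 20 dB
```

The step of about 3 dB between the quotients 1 and 2 drops to about 0.04 dB between
100 and 101, which is why dropping the residual only matters at very low SNR.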
There exist four algorithms that are sometimes called slow division algorithms: the
Restoring, Non-Performing, Non-Restoring and SRT division algorithms. If the residual
is not needed, the Non-Restoring algorithm is faster than the Restoring and the
Non-Performing algorithms. The SRT algorithm uses a lookup table (LUT) and is the one
responsible for the Intel Pentium bug that was discovered in 1994 [34]. A LUT is
undesirable for this application, as the hardware costs are not negligible and a LUT
makes it harder to parametrize the algorithm.
There exist two algorithms that are sometimes called fast division algorithms: the
Newton-Raphson and the Goldschmidt algorithms. Both of them need a LUT [35, 36].
The latter is used in AMD processors [37].
Figure 6.8: CONT_AV_SIG_ENT, control for the estimated signal datapath. (Mealy automaton
with a waiting state and the states "processed sample 0" to "processed sample 14"; input
FULL_CYCLE_FINISHED, outputs SIG_SEL, SIG_READY and SIG_INIT.)
Out of these algorithms, the Non-Restoring digital division algorithm looks most
promising. It is therefore investigated further. The algorithm is rather simple:
1. A, B are unsigned p-bit values
2. set: $r_0 = A$
3. start with i = 1 and repeat steps 4 and 5 up to and including i = p + 1
4. $r_i = \begin{cases} r_{i-1} - B \cdot 2^{p+1-i}, & \text{if } r_{i-1} \geq 0 \\ r_{i-1} + B \cdot 2^{p+1-i}, & \text{if } r_{i-1} < 0 \end{cases}$
5. $q_{p+1-i} = 1$ if $r_i \geq 0$, and $q_{p+1-i} = 0$ otherwise
6. when finished, $Q = [q_p, ..., q_1, q_0]$ is the desired result
A numerical example for clarification can be found in Fig. 6.9.
$A = 105$, $B = 5$, $A / B = ?$
$r_0 = 105$
$r_1 = 105 - 5 \cdot 2^7 = -535$, $q_7 = 0$
$r_2 = -535 + 5 \cdot 2^6 = -215$, $q_6 = 0$
$r_3 = -215 + 5 \cdot 2^5 = -55$, $q_5 = 0$
$r_4 = -55 + 5 \cdot 2^4 = 25$, $q_4 = 1$
$r_5 = 25 - 5 \cdot 2^3 = -15$, $q_3 = 0$
$r_6 = -15 + 5 \cdot 2^2 = 5$, $q_2 = 1$
$r_7 = 5 - 5 \cdot 2^1 = -5$, $q_1 = 0$
$r_8 = -5 + 5 \cdot 2^0 = 0$, $q_0 = 1$
$Q = 2^4 + 2^2 + 2^0 = 21$, as expected
Figure 6.9: A numerical example for the digital Non-Restoring division algorithm.
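The steps of the algorithm translate directly into software. The following Python
sketch is a reference model only, not the hardware entity: the function name
`nr_divide` is hypothetical, Python integers stand in for the two's complement
registers, and the quotient bit of step i is stored at position p + 1 - i, as in the
numerical example of Fig. 6.9.

```python
def nr_divide(a, b, p):
    """Non-restoring division sketch: returns floor(a / b).

    a, b: unsigned values with at most p bits, b > 0.
    """
    if not (0 <= a < (1 << p) and 0 < b < (1 << p)):
        raise ValueError("a and b must be p-bit unsigned values with b > 0")
    r = a   # r_0 = A
    q = 0
    for i in range(1, p + 2):        # i = 1 ... p + 1
        shift = p + 1 - i            # multiplication by 2^(p+1-i) is a left shift
        if r >= 0:
            r -= b << shift
        else:
            r += b << shift
        if r >= 0:                   # in hardware: inspect the sign bit (MSB)
            q |= 1 << shift
    return q


quotient_fig_6_9 = nr_divide(105, 5, 7)   # the example from Fig. 6.9
```

The sketch can be compared against Python's integer division for all operand pairs of
a small word width, which is a convenient way to generate test vectors for a hardware
implementation.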
The NR-Division algorithm needs a 2p + 1 bit adder, three p-bit registers for the inputs
and the output, and a multiplication by a power of two. The latter can be done by simply
shifting the desired value left (logical shift left, LSL). Checking whether a value is
smaller than zero is easy, as this information is stored in the most significant bit (MSB)
if the two's complement number representation is used. Additionally, some control
logic is needed.
This NR-Division seems to fulfill the low hardware requirements. The latency of this
algorithm is p clock cycles and therefore no problem for this application, as p = 35 is
much smaller than the allowed number of latency cycles. The only critical question left
is whether the NR algorithm with its 2p + 1 bit adder is fast enough to meet the clock
cycle requirements. A parametrized 35-bit version with some adaptations to meet the
clock cycle requirement is presented in Fig. 6.10. The parts with the gray background
fulfill no functional tasks; they are simply present to shorten the longest path and
therefore allow for a higher clock frequency. It is not necessary to initialize this circuit,
as the algorithm changes all values automatically. Another possibility would be to make
the divider smaller and cut away the last few bits of the input values (i.e. dividing both
A and B by a power of two). This could lead to a loss in precision.
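The loss in precision caused by cutting away input bits can be illustrated with a small
Python sketch. The function `truncated_divide` is hypothetical and only models the idea
of dropping the k least significant bits of both operands before dividing.

```python
def truncated_divide(a, b, k):
    """Integer division after dropping the k least significant bits of a and b."""
    b_short = b >> k
    if b_short == 0:
        raise ZeroDivisionError("divisor vanishes after truncation")
    return (a >> k) // b_short


exact_large = 100000 // 3000                       # 33
approx_large = truncated_divide(100000, 3000, 4)   # operands stay large: still 33

exact_small = 1000 // 7                            # 142
approx_small = truncated_divide(1000, 7, 2)        # divisor shrinks to 1: 250
```

The example suggests that the truncation is harmless as long as the divisor remains
large after the shift, whereas a small divisor (here the noise power) makes the result
unreliable.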
6.4.10 Mapping Onto FPGA
The different entities were programmed in VHDL and then mapped onto the FPGA. An
overview of the hardware costs can be seen in Table 6.4. This overview shows that the
implementation uses only a small part of the available FPGA resources. The number
of flip-flops and 4-LUTs can be further reduced by omitting one of the dividers at the
output. One could either share one divider for both outputs or, if one of the outputs is
not needed, simply omit that divider. The results also show that most of the slices use
either logic or storage, but rarely both. It has to be noted further that the gate count is
rather a marketing number and is not suitable for direct comparison to standard ASIC
designs.
6.4.11 Testing
The system was tested using several sets of test vectors. It seems to perform as expected.
For a better understanding of the signal forms, a sample run can be seen in Fig. 6.11.
Figure 6.10: NR_DIVISION_ENT, the division entity. (Inputs DATA_A, DATA_B and
BOTH_VALUES_READY, outputs RESULT and RESULT_READY; requires A, B > 0 and delivers
Q = round down(A/B). The adder is 2p+1 bits wide; theoretically 2p+2 bits, but no overflow
is possible due to the restrictions of the inputs.)
Numerical example given in the figure, (A=100 / B=3) = 33, precision: 7 bit:
r0 = A = 100 > 0
r1 = r0 - 2^7*B = -284 < 0 ==> q7 = 0 (128)
r2 = r1 + 2^6*B = -92 < 0 ==> q6 = 0 (64)
r3 = r2 + 2^5*B = 4 >= 0 ==> q5 = 1 (32)
r4 = r3 - 2^4*B = -44 < 0 ==> q4 = 0 (16)
r5 = r4 + 2^3*B = -20 < 0 ==> q3 = 0 (8)
r6 = r5 + 2^2*B = -8 < 0 ==> q2 = 0 (4)
r7 = r6 + 2^1*B = -2 < 0 ==> q1 = 0 (2)
r8 = r7 + 2^0*B = 1 >= 0 ==> q0 = 1 (1)
==> Q = 0100001 = 33 as expected
Figure 6.11: Overview over all signals for the final estimator entity.

                                              used      available   utilization
flip-flops                                    1,236     49,152      2.5%
4-input look-up tables (4-LUT)                2,409     49,152      4.9%
slices (= two 4-LUT and two FF plus
  connections to adjacent slices)             1,644     24,576      6.7%
digital signal processing blocks (DSP48,
  used for multipliers)                       6         512         1.2%
total equivalent gate count for design        29,436

Table 6.4: Overview over the hardware costs for the implementation of the SNR
estimator.
7 Measurements
7.1 Measurements With Offline Testbed
In a first part, several measurements were made with the offline testbed. An image of
the testbed can be seen in Fig. 7.1.
Figure 7.1: A picture of the MIMO-OFDM testbed with 4 antennas.
The measurement setup was the following: one of the testbeds transmits a packet of
data. This packet is then sent over the channel simulator (simulating a TGn C channel)
and received by the second testbed. The received data points are read out by a software
environment and the SNR is calculated in that software. In a next step, the data is
decoded separately for the constant SNR estimators and the proposed estimator. The
BER is calculated for each case. This step is repeated for several output power settings
of the channel simulator. Changing the output power is equivalent to changing
the channel SNR. The whole procedure was repeated for 200 different TGn C channels
(1000 bits of data each). At the end, the BER values were averaged. A plot with the results
can be seen in Fig. 7.2.
As can be seen, the proposed algorithm performs better than the constant 60 dB SNR
estimator over the whole range. In the low SNR range, the proposed algorithm is
superior to the constant 30 dB estimator. In the high SNR range, the constant 30 dB
estimator seems to perform slightly better than the proposed algorithm.
The main reason for this behavior can be found in Fig. 7.3: the estimated SNR curve
flattens in the high SNR region and the estimation is therefore too low. The reasons for
this loss in performance are investigated in the following sections.
7.1.1 DC Carrier Removal
Inspection of the received signal showed that there was a slight offset that seemed to
be slowly time-varying. This offset was removed by the use of a high-order digital
high-pass filter. It was necessary to periodically extend the received signal so that
border effects can be neglected. Measurements showed that the use of such a high-pass
filter had no visible influence on the performance of the SNR estimator.
7.1.2 Four SNR Values Estimated but Only One Required
Each receiving antenna calculates one SNR. But which of those SNRs should be
forwarded to the decoding components? The most obvious idea would be to take the
average. This does not seem to be the best choice. A few measurements indicate that
taking the highest of all four SNRs leads to a better performance. This option was
used in the measurement from Fig. 7.2. One explanation could be that it is safer to
overestimate the SNR than to underestimate it. Taking the largest of the four values
might also suppress some hardware nonidealities. But there are also other
possibilities, e.g. taking the average of the two highest values. This topic needs to be
investigated further and possibly also depends on the employed hardware platform.
Figure 7.2: Measurement of the BER with the offline testbed. (Axes: output power of the
channel simulator [dB] versus BER on a logarithmic scale; one curve per estimator.)
Figure 7.3: Estimated SNR values with offline testbed compared to expected SNR
values. The expected values were approximated by taking the best performing curves
from Fig. 7.2 for each output setting. (Axes: output power of the channel simulator [dB]
versus estimated SNR [dB]; curves: proposed algorithm (HW) and optimal guess (rough
estimation).)
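The three combination rules mentioned above can be written down compactly. The Python
sketch below is illustrative only: the function name `combine_snrs` and the mode names
are hypothetical, and it assumes that the four SNR values are given as linear power
ratios (averaging dB values would give different results).

```python
def combine_snrs(snrs, mode="max"):
    """Reduce the per-antenna SNR estimates (linear values) to a single SNR."""
    ordered = sorted(snrs, reverse=True)
    if mode == "max":          # rule used for the measurement in Fig. 7.2
        return ordered[0]
    if mode == "mean":         # the most obvious idea
        return sum(ordered) / len(ordered)
    if mode == "mean_top2":    # average of the two highest values
        return sum(ordered[:2]) / 2
    raise ValueError("unknown mode: " + mode)


per_antenna = [80.0, 120.0, 100.0, 60.0]
combined = {m: combine_snrs(per_antenna, m) for m in ("max", "mean", "mean_top2")}
```

For the example values, the three rules return 120, 90 and 110, which shows how far the
forwarded SNR can differ depending on the chosen rule.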
7.1.3 Scaling All Streams to Equal Noise
It is possible to scale each receiving stream such that all of the streams have the same
noise power. There are implementation issues with this idea, as a signal overflow
due to upscaling and a loss in precision due to downscaling have to be avoided. First
measurements suggest that there is no increase in performance due to the scaling.
Further measurements would be necessary to reliably determine the effects of scaling
the streams to equal noise power.
7.1.4 Transmit Noise
Another issue is transmit noise. A model describing transmit noise can be found in
Fig. 7.4.
Figure 7.4: A transmit noise model. (Signal chain: transmit data s, transmit noise n1,
channel H, channel noise n2, AGC, receive noise n3, received data y.)
The received signal can be described in the following way:
$y = (H (s + n_1) + n_2) + n_3$
It has to be noted that even if the transmit noise $n_1$ is assumed to be AWGN, the
corresponding noise at the receiver is no longer white.
The influence of transmit noise was investigated by means of a simulation (footnote 1).
The following assumptions were made: $n_1$ was set such that a desired transmit SNR is
reached. The AGC gain was set to one and $n_3$ was set to zero. $n_2$ was set such that
the desired channel SNR was achieved. The estimated SNR for several given transmit SNRs
can be seen in Fig. 7.5. The flattening in the high SNR region that was observed in the
measurement can also be seen in the simulation. Comparing the measured and the simulated
SNR curves, it seems that the testbed has a transmit SNR of approximately 27 dB, which is
approximately the value that was known beforehand. Transmit noise therefore seems
to be sufficient to explain the flattening of the SNR curve in the high SNR region.
Footnote 1: SNR range: 0-30 dB (step: 1 dB), number of sweeps: 20000 (seed=0..9),
channel model: TGn C, transmitting antennas: 4, receiving antennas: 4, number of tones: 64,
channel estimator: FDMLE/ideal, demapper: MMSE, modulation: QPSK.
It is obvious that one cannot achieve a total SNR that is higher than the transmit SNR.
As the simulated curves are still visibly rising at the point where the channel SNR
reaches the transmit SNR, one could compensate for the flattening by the use of a LUT.
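The limit imposed by the transmit SNR can be made plausible with a strongly simplified
model. The Python sketch below is not the TGn C simulation used in this chapter: it
assumes a scalar, flat channel, independent noise sources and no receive noise, in which
case the noise contributions add as $1/\mathrm{SNR}_{total} = 1/\mathrm{SNR}_{transmit} +
1/\mathrm{SNR}_{channel}$. All function names are hypothetical, and `compensate_db` only
illustrates what the entries of a compensation LUT could look like under this model.

```python
import math


def db_to_lin(db):
    return 10 ** (db / 10)


def lin_to_db(ratio):
    return 10 * math.log10(ratio)


def total_snr_db(channel_db, transmit_db):
    """Total SNR if transmit noise and channel noise add up (flat scalar channel)."""
    inverse = 1 / db_to_lin(channel_db) + 1 / db_to_lin(transmit_db)
    return lin_to_db(1 / inverse)


def compensate_db(estimated_db, transmit_db):
    """Invert the model: recover the channel SNR from the flattened estimate."""
    inverse = 1 / db_to_lin(estimated_db) - 1 / db_to_lin(transmit_db)
    if inverse <= 0:
        raise ValueError("estimate must be below the transmit SNR")
    return lin_to_db(1 / inverse)


at_equal = total_snr_db(30, 30)   # about 3 dB below the transmit SNR
at_high = total_snr_db(50, 30)    # saturates just below 30 dB
```

In this model the total SNR is about 27 dB when channel SNR and transmit SNR are both
30 dB, and it still grows slowly beyond that point, which matches the observation that
the curves are still rising and can in principle be inverted.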
It remains to be investigated whether this flattening is responsible for the increased
BER. A simulation (footnote 2) with 30 dB transmit SNR can be seen in Fig. 7.6. It is
clearly visible that the proposed estimator is as good as optimal in the low SNR region.
In the high SNR region, the constant estimators that estimate a higher SNR than the
proposed estimator perform slightly better. The flattening due to the transmit noise
therefore seems to be sufficient to explain the loss in performance of the proposed
estimator in the high SNR region.
7.2 Measurements With Online Testbed
The estimator block was inserted into the testbed once for each receiving antenna.
Measurements show that the SNR value from the hardware estimator is significantly
lower than the value obtained from the offline testbed. There are basically two issues:
the beginning and the end of the valid data are not easily detected, and the frequency
offset between the clocks seems to be a major problem. The first problem is a timing
problem that can be solved by inserting the appropriate delays. The second problem
can be reproduced with the offline testbed by switching off the frequency offset
compensation. It therefore seems that the frequency offset is sufficient to explain the
loss in performance of the online testbed estimator.
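One possible way to see why a frequency offset lowers the estimate is to feed a
noise-free, periodically repeated signal with a small phase rotation into a software
model of the estimator. The Python sketch below makes several assumptions: it uses the
arithmetic described in Section 6.4.1 in floating point, an arbitrary test signal
instead of the real short preamble, and hypothetical function names.

```python
import cmath
import math


def estimated_snr(subsignals):
    """Floating-point sketch of the estimator arithmetic (linear SNR)."""
    m, n = len(subsignals), len(subsignals[0])
    total = m * sum(abs(x) ** 2 for sub in subsignals for x in sub)
    summed = [sum(sub[k] for sub in subsignals) for k in range(n)]
    signal = sum(abs(s) ** 2 for s in summed)
    noise = total - signal
    # Treat rounding residue as zero noise.
    return signal / noise if noise > 1e-9 * total else float("inf")


def rotated_subsignals(m, n, offset):
    """m repetitions of an n-sample test signal; offset in cycles per sample."""
    base = [cmath.exp(2j * math.pi * 3 * k / n) for k in range(n)]
    return [
        [base[k] * cmath.exp(2j * math.pi * offset * (i * n + k)) for k in range(n)]
        for i in range(m)
    ]


snr_without_offset = estimated_snr(rotated_subsignals(10, 16, 0.0))
snr_with_offset = estimated_snr(rotated_subsignals(10, 16, 0.001))
```

Without offset the repetitions add up coherently and no noise is detected. With an
offset of 0.001 cycles per sample, the repetitions are rotated against each other, part
of the signal power is counted as noise, and the noise-free input is reported with an
SNR of only about 11 (roughly 10.6 dB) in this example.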
Footnote 2: SNR range: 0-30 dB (step: 1 dB), number of sweeps: 20000 (seed=0..9),
channel model: TGn C, transmitting antennas: 4, receiving antennas: 4, number of tones: 64,
channel estimator: FDMLE/ideal, demapper: MMSE, modulation: QPSK.
Figure 7.5: Estimated SNR for several transmit SNR values. (Axes: SNR (channel) [dB]
from 0 to 50 versus estimated SNR [dB] from 0 to 40; curves: proposed algorithm with
20 dB, 30 dB and 50 dB transmit SNR, and the reference line channel SNR = estimated SNR.)
Figure 7.6: Simulation showing the BER for several estimators with 30 dB transmit SNR.
(Axes: SNR (channel) [dB] from 0 to 30 versus BER on a logarithmic scale; curves:
proposed algorithm and constant estimators with 10 dB, 15 dB, 20 dB, 25 dB and 30 dB.)
The problem with the frequency offset can be solved by placing the estimator at a
different position in the testbed. Instead of directly using the downsampled data streams,
the estimator could be placed further back, after the synchronization block where the
frequency offset is compensated. A block diagram of the online testbed can be seen in
Fig. 7.7.
Figure 7.7: Left image: Block diagram of the online testbed without frequency offset
compensation. Right image: Alternative block diagram of the online testbed that solves
the frequency offset problem. (Blocks shown: RF front ends with AGC, up- & downsampling,
noise estimator, synchronization, modulation, FFT/IFFT, demodulation, MIMO processing,
channel coding and decoding, buffers, PowerPC subsystem and ethernet subsystem,
distributed over FPGA 1 and FPGA 2 (Virtex-2 Pro) and FPGA 3 (Virtex-4).)
8 Summary, Conclusion and Outlook
8.1 Summary
The first chapter of this thesis presents the official task description for this semester
thesis. The aim is to implement a noise variance estimator (or equivalently an SNR
estimator) for a MIMO-OFDM testbed. In the following, the term SNR estimator is used
instead of noise variance estimator, as the SNR is a value that is easier to understand.
The two can be converted into each other by the following equation:
$\mathrm{SNR} = \frac{\text{signal power}}{\text{noise variance}}$
In the second chapter of this thesis, the basics of MIMO-OFDM communication are
presented. It is further justified why it is worth increasing the hardware costs in order
to use MIMO-OFDM instead of a simple SISO system. The principal arguments are
a higher throughput and more robustness against noise. Several channel models are
presented, including the TGn channels.
In the third chapter, an overview of the current state of research on noise
estimation is presented. A large number of different algorithms exist. Most
of the presented algorithms are not directly applicable to MIMO-OFDM. Some of the
remaining ones have high hardware costs or exhibit poor performance. There are a few
algorithms that look promising for estimating the SNR in a MIMO-OFDM system.
The fourth chapter introduces the simulation environment. It is justified why
implementing a good SNR estimator is worth the effort. It is shown that a good SNR estimator
can lower the BER in a MIMO-OFDM system compared to the constant SNR estimator
that was used beforehand. It is also elaborated why the best performance is attained by
estimating the SNR two decibels lower than the actual channel SNR.
At the beginning of the fifth chapter, two of the most promising SNR estimation
algorithms are implemented and tested. It is found that algorithms working in the
frequency domain are highly dependent on the quality of the channel estimator. This
is not desired for two reasons: firstly, it is not desirable to depend on another
component. Secondly, and even more importantly, a high quality channel estimator
significantly decreases the throughput of the system. Based on these considerations, the
author proposes a novel algorithm working in the time domain. The mathematical description
of the algorithm is elaborated. In a next step, the algorithm is simulated on a TGn C channel
using several scenarios. The algorithm seems to perform quite well in situations that
might be found in a real system.
In the sixth chapter, the proposed algorithm is implemented on an FPGA. The different
blocks of the final version are described in detail. One of the most critical blocks was
the divider, as cell libraries usually do not provide dividers. The Non-Restoring digital
division algorithm was found to be the best suited solution. At the end, the algorithm
was successfully mapped onto an FPGA.
The seventh chapter presents measurements with the offline testbed. The results show
that the presented SNR estimator is superior to the previously used constant SNR. There
is one issue with the flattening of the estimated SNR curve in the high SNR region.
The reason for this behavior is the transmit noise present in the transmitter hardware.
An idea for a solution using a LUT is presented. In a next step, measurements with
the online testbed were conducted. The estimated SNR is significantly lower than
expected. The main issue here is the frequency offset between the transmitter and the
receiver. This problem should be solvable by putting the SNR estimator block behind
the synchronization block already present on the testbed.
8.2 Conclusion and Outlook
The proposed algorithm performs visibly better in the offline testbed than the previous
constant 30 dB SNR estimator. If one assumes that the region of interest starts
somewhere above 0 dB SNR and ends at around 27 dB SNR (limited by hardware noise), the
algorithm has an acceptable performance over the whole range. As already mentioned,
one could further try to compensate for the flattening in the high SNR region.
In this work, it was not investigated how the algorithm performs if the SNR is below
0 dB. It seems that this region is not of interest in the given case. If SNRs between zero
and ten decibels were expected regularly, it would probably be worth supplementing
the hardware implementation with some extra precision. It could also be interesting
to further decrease the hardware costs by sharing one divider between the two outputs (or
omitting one of the dividers) or by removing some precision in the high SNR region.
The hardware implementation still needs to be fully incorporated into the online testbed
and further measurements have to be conducted in order to evaluate the exact benet
of implementing the SNR estimator.
Further, the algorithm could still be enhanced. It has not yet been investigated thoroughly
which combination of the four estimated SNRs should be used. It also remains to be
investigated what effects the scaling of all streams to equal noise power has. It
could further be interesting to investigate whether it would be worthwhile to subtract
the two decibels of noise that are added by the channel estimator. There are also some
ideas about how to further increase the performance of the algorithm itself: scaling
all samples to the same area, removing the offset, or filtering them are only a few
of them. One has to take care not to destroy one of the strong points of the algorithm:
its simplicity, which rests on the small hardware costs as well as on the use of the
already present short preambles.
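To make the open question about combining the four estimated SNRs more concrete, the following minimal sketch compares three candidate combination rules. The function name combine_snr_db, the three rules (minimum, mean of the dB values, mean in the linear domain) and the example values are assumptions chosen for illustration only. They are not taken from the implementation described in this work, and no claim is made about which rule is preferable.

```python
import math


def combine_snr_db(snrs_db):
    """Return candidate combinations of per-stream SNR estimates given in dB.

    Hypothetical helper for illustration; the rules below are assumptions.
    """
    if not snrs_db:
        raise ValueError("at least one SNR estimate is required")
    # Convert each dB value to a linear power ratio.
    linear = [10.0 ** (s / 10.0) for s in snrs_db]
    return {
        # Most pessimistic stream.
        "min_db": min(snrs_db),
        # Arithmetic mean of the dB values (a geometric mean in linear terms).
        "mean_db": sum(snrs_db) / len(snrs_db),
        # Arithmetic mean of the linear ratios, converted back to dB.
        "mean_linear_db": 10.0 * math.log10(sum(linear) / len(linear)),
    }


# Hypothetical estimates for four receive streams, in dB.
estimates = [10.0, 10.0, 20.0, 20.0]
for name, value in combine_snr_db(estimates).items():
    print(f"{name}: {value:.2f} dB")
```

The mean in the linear domain is dominated by the strongest streams, whereas the minimum follows the weakest one, so the rules can differ by several decibels when the streams are unbalanced.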
There are also further possibilities for simulations: the effects of having different
noise and signal powers on each of the receiving streams, or the effects of a slowly
fading channel, could be investigated further.
Bibliography
[1] Helmut Bölcskei, MIMO-OFDM Wireless Systems: Basics, Perspectives and
Challenges, IEEE Wireless Communications, Volume 13, Issue 4, August 2006.
[2] LANCOM Systems GmbH, LANCOM Techpaper: 802.11n im Überblick,
www.lancom.de, 2008.
[3] Luis Litwin and Michael Pugel, The Principles of OFDM, RF signal processing,
January 2001.
[4] S. B. Weinstein, Data Transmission by Frequency-Division Multiplexing Using
the Discrete Fourier Transform, IEEE Transactions on Communication Technology,
Volume 19, Issue 5, October 1971.
[5] Cisco Systems GmbH, Überblick über die Wireless-Technologie 802.11n,
www.cisco.com, 2007.
[6] Vinko Erceg, Laurent Schumacher, Persefoni Kyritsi, et al., TGn Channel Models,
doc.: IEEE 802.11-03/940r4, May 2004.
[7] Carlos H. Aldana, Atul A. Salvekar, Jose Tellado and John Cioffi, Accurate Noise
Estimates in Multicarrier Systems, IEEE Vehicular Technology Conference, 2000.
[8] Doukas Athanasios and Grigorios Kalivas, SNR Estimation Algorithms in AWGN
for HiperLAN/2 Transceiver, Applied Electronics Laboratory, Department of
Electrical and Computer Engineering, University of Patras, 2005.
[9] Doukas Athanasios and Grigorios Kalivas, SNR Estimation for Low Bit Rate OFDM
Systems in AWGN Channel, Proceedings of the International Conference on Networking,
International Conference on Systems and International Conference on Mobile
Communications and Learning Technologies, 2006.
[10] Norman C. Beaulieu, Comparison of Four SNR Estimators for QPSK Modulations,
IEEE Communications Letters, Volume 4, Issue 2, February 2000.
[11] Dae-Ki Hong, Cheol-Hee Park, Min-Chul Ju, Kyu-Jung Youn, Sun-Do Jun and
Jin-Woong Cho, SNR Estimation in Frequency Domain Using Circular Correlation,
IEEE Electronics Letters, Volume 38, Issue 25, December 2002.
[12] Sandrine Boumard, Novel Noise Variance and SNR Estimation Algorithm for
Wireless MIMO OFDM Systems, Global Telecommunications Conference, GLOBECOM
'03, Volume 3, 2003.
[13] David R. Pauluzzi and Norman C. Beaulieu, A Comparison of SNR Estimation
Techniques for the AWGN Channel, IEEE Transactions on Communications, Volume
48, Issue 10, October 2000.
[14] Bin Li, Robert DiFazio and Ariela Zeira, A Low Bias Algorithm to Estimate
Negative SNRs in an AWGN Channel, IEEE Communications Letters, Volume 6,
Issue 11, November 2002.
[15] Guang-Liang Ren, Yi-Lin Chang and Hui Zhang, A New SNR's Estimator for QPSK
Modulations in an AWGN Channel, IEEE Transactions on Circuits and Systems II:
Express Briefs, Volume 52, Issue 6, June 2005.
[16] Guang-Liang Ren, Yi-Lin Chang and Hui-Ning Zhang, SNR Estimation Algorithm
Based on the Preamble for Wireless OFDM Systems, Science in China Series F:
Information Sciences, Volume 51, Issue 7, July 2008.
[17] Timothy M. Schmidl and Donald C. Cox, Robust Frequency and Timing
Synchronization for OFDM, IEEE Transactions on Communications, Volume 45, Issue 12,
December 1997.
[18] Dong-Joon Shin, Wonjin Sung and In-Kyung Kim, Simple SNR Estimation
Methods for QPSK Modulated Short Bursts, Global Telecommunications Conference,
GLOBECOM '01, 2001.
[19] Xiaodong Xu, Ya Jing, Xiaohu Yu, Subspace-Based Noise Variance and SNR
Estimation for OFDM Systems, IEEE Wireless Communications and Networking
Conference, 2005.
[20] Huilin Xu, Guo Wei and Jinkang Zhu, A Novel SNR Estimation Algorithm for
OFDM, IEEE Vehicular Technology Conference, 2005.
[21] Tevfik Yücek and Hüseyin Arslan, MMSE Noise Power and SNR Estimation for
OFDM Systems, IEEE Transactions on Vehicular Technology, Volume 56, Issue 6,
2006.
[22] Tevfik Yücek and Hüseyin Arslan, Noise Plus Interference Power Estimation in
Adaptive OFDM Systems, IEEE Transactions on Vehicular Technology, Volume 56,
Issue 6, 2005.
[23] Nader S. Alagha, Cramer-Rao Bounds of SNR Estimates for BPSK and QPSK
Modulated Signals, IEEE Communications Letters, Volume 5, Issue 1, January
2001.
[24] Thomas R. Benedict, The Joint Estimation of Signal and Noise From the Sum
Envelope, IEEE Information Theory, Volume 13, Issue 3, July 1967.
[25] Shousheng He and Mats Torkelson, Effective SNR Estimation in OFDM System
Simulation, Global Telecommunications Conference, GLOBECOM '98, Volume 2,
1998.
[26] M. C. Jeruchim and R. J. Wolfe, Estimation of the Signal-to-Noise Ratio (SNR) in
Communication Simulation, Global Telecommunications Conference, GLOBECOM
'89, 1989.
[27] R. B. Kerr, On Signal and Noise Level Estimation in a Coherent PCM Channel,
IEEE Transactions on Aerospace and Electronic Systems, Volume 2, Issue 4, March
1966.
[28] Mustafa Türkboylari and Gordon L. Stüber, An Efficient Algorithm for Estimating
the Signal-to-Interference Ratio in TDMA Cellular Systems, IEEE Transactions on
Communications, Volume 46, Issue 6, June 1998.
[29] Ami Wiesel, Jason Goldberg and Hagit Messer, Data-Aided Signal-to-Noise-Ratio
Estimation in Time Selective Fading Channels, IEEE International Conference on
Acoustics, Speech and Signal Processing, 2002.
[30] Ami Wiesel, Jason Goldberg and Hagit Messer, Non-Data-Aided
Signal-to-Noise-Ratio Estimation, IEEE International Conference on Communications, 2002.
[31] Ami Wiesel, Jason Goldberg and Hagit Messer-Yaron, SNR Estimation in Time
Varying Fading Channels, IEEE Transactions on Communications, Volume 54, Issue
5, May 2006.
[32] Xilinx Inc., Virtex-4 Family Overview, Product Specification, 2007.
[33] Volker Jungnickel and Eduard Jorswiek et al., White Paper: MIMO-OFDM in the
TDD Mode, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut,
2005.
[34] A. Strey, Computer Arithmetik, SS 2005, Lecture Slides, Universität Ulm, April
1998.
[35] Peter Markstein, Software Division and Square Root Using Goldschmidt's
Algorithm, Real Numbers and Computers 6, 146-157, November 2004.
[36] Reto Zimmermann, Computer Arithmetic: Principles, Architectures and VLSI
Design, Lecture Notes; Integrated Systems Laboratory, Swiss Federal Institute of
Technology (ETH), 1999.
[37] Taek-Jun Kwon, Jeff Draper, Floating-Point Division and Square Root
Implementation Using a Taylor-Series Expansion Algorithm With Reduced Look-Up Tables,
Symposium on Circuits and Systems, 2008.