
Cohen, A.

“Biomedical Signals: Origin and Dynamic Characteristics; Frequency-Domain Analysis”


The Biomedical Engineering Handbook: Second Edition.
Ed. Joseph D. Bronzino
Boca Raton: CRC Press LLC, 2000
52
Biomedical Signals: Origin and Dynamic Characteristics; Frequency-Domain Analysis

Arnon Cohen
Ben-Gurion University

52.1 Origin of Biomedical Signals
52.2 Classification of Biosignals
52.3 Stochastic Signals
52.4 Frequency-Domain Analysis
52.5 Discrete Signals
52.6 Data Windows
52.7 Short-Time Fourier Transform (STFT)
52.8 Spectral Estimation
  The Blackman-Tukey Method • The Periodogram • Time-Series Analysis Methods
52.9 Signal Enhancement
52.10 Optimal Filtering
  Minimization of Mean Squared Error: The Wiener Filter • Maximization of the Signal-to-Noise Ratio: The Matched Filter
52.11 Adaptive Filtering
52.12 Segmentation of Nonstationary Signals

A signal is a phenomenon that conveys information. Biomedical signals are signals used in the biomedical
field, mainly for extracting information about a biologic system under investigation. The complete process
of information extraction may be as simple as a physician estimating the patient’s mean heart rate by
feeling, with the fingertips, the blood pressure pulse, or as complex as analyzing the structure of internal
soft tissues by means of a CT scanner.
Most often in biomedical applications (as in many other applications), the acquisition of the signal is
not sufficient; the acquired signal must be processed to extract the relevant information “buried” in it.
This may be because the signal is noisy and has to be “cleaned” (in more professional terminology, the
signal has to be enhanced) or because the relevant information is not “visible” in the signal. In the latter
case, we usually apply some transformation to enhance the required information.

The processing of biomedical signals poses some unique problems. The reason for this is mainly the
complexity of the underlying system and the need to perform indirect, noninvasive measurements. A
large number of processing methods and algorithms are available. In order to apply the best method, the
user must know the goal of the processing, the test conditions, and the characteristics of the underlying
signal. In this chapter, the characteristics of biomedical signals will be discussed [Cohen, 1986]. Biomed-
ical signals will be divided into characteristic classes, requiring different classes of processing methods.
Also in this chapter, the basics of frequency-domain processing methods will be presented.

52.1 Origin of Biomedical Signals


From the broad definition of the biomedical signal presented in the preceding section, it is clear that
biomedical signals differ from other signals only in terms of the application—signals that are used in the
biomedical field. As such, biomedical signals originate from a variety of sources. The following is a brief
description of these sources:
• Bioelectric signals. The bioelectric signal is unique to biomedical systems. It is generated by nerve
cells and muscle cells. Its source is the membrane potential, which under certain conditions may
be excited to generate an action potential. In single cell measurements, where specific microelec-
trodes are used as sensors, the action potential itself is the biomedical signal. In more gross
measurements, where, for example, surface electrodes are used as sensors, the electric field gen-
erated by the action of many cells, distributed in the electrode’s vicinity, constitutes the bioelectric
signal. Bioelectric signals are probably the most important biosignals. The fact that most important
biosystems use excitable cells makes it possible to use biosignals to study and monitor the main
functions of the systems. The electric field propagates through the biologic medium, and thus the
potential may be acquired at relatively convenient locations on the surface, eliminating the need
to invade the system. The bioelectric signal requires a relatively simple transducer for its acquisi-
tion. A transducer is needed because the electric conduction in the biomedical medium is done
by means of ions, while the conduction in the measurement system is by electrons. All these lead
to the fact that the bioelectric signal is widely used in most fields of biomedicine.
• Bioimpedance signals. The impedance of the tissue contains important information concerning its
composition, blood volume, blood distribution, endocrine activity, autonomic nervous system
activity, and more. The bioimpedance signal is usually generated by injecting into the tissue under
test sinusoidal currents (frequency range of 50 kHz to 1 MHz, with low current densities of the
order of 20 µA to 20 mA). The frequency range is chosen to minimize electrode polarization
problems, and the low current densities are chosen to avoid tissue damage mainly due to heating
effects. Bioimpedance measurements are usually performed with four electrodes. Two source
electrodes are connected to a current source and are used to inject the current into the tissue. The
two measurement electrodes are placed on the tissue under investigation and are used to measure
the voltage drop generated by the current and the tissue impedance.
• Bioacoustic signals. Many biomedical phenomena create acoustic noise. The measurement of this
acoustic noise provides information about the underlying phenomenon. The flow of blood in the
heart, through the heart’s valves, or through blood vessels generates typical acoustic noise. The
flow of air through the upper and lower airways and in the lungs creates acoustic sounds. These
sounds, known as coughs, snores, and chest and lung sounds, are used extensively in medicine.
Sounds are also generated in the digestive tract and in the joints. It also has been observed that
the contracting muscle produces an acoustic noise (muscle noise). Since the acoustic energy
propagates through the biologic medium, the bioacoustic signal may be conveniently acquired on
the surface, using acoustic transducers (microphones or accelerometers).
• Biomagnetic signals. Various organs, such as the brain, heart, and lungs, produce extremely weak
magnetic fields. The measurement of these fields provides information not included in other
biosignals (such as bioelectric signals). Due to the low level of the magnetic fields to be measured,
biomagnetic signals are usually of very low signal-to-noise ratio. Extreme caution must be taken
in designing the acquisition system of these signals.
• Biomechanical signals. The term biomechanical signals includes all signals used in the biomedical
field that originate from some mechanical function of the biologic system. These signals include
motion and displacement signals, pressure and tension and flow signals, and others. The mea-
surement of biomechanical signals requires a variety of transducers, not always simple and inex-
pensive. The mechanical phenomenon does not propagate, as do the electric, magnetic, and
acoustic fields. The measurement therefore usually has to be performed at the exact site. This very
often complicates the measurement and forces it to be an invasive one.
• Biochemical signals. Biochemical signals are the result of chemical measurements from the living
tissue or from samples analyzed in the clinical laboratory. Measuring the concentration of various
ions inside and in the vicinity of a cell by means of specific ion electrodes is an example of such
a signal. Partial pressures of oxygen (pO2) and of carbon dioxide (pCO2) in the blood or respiratory
system are other examples. Biochemical signals are most often very low frequency signals. Most
biochemical signals are actually dc signals.
• Biooptical signals. Biooptical signals are the result of optical functions of the biologic system,
occurring naturally or induced by the measurement. Blood oxygenation may be estimated by
measuring the transmitted and backscattered light from a tissue (in vivo and in vitro) at several
wavelengths. Important information about the fetus may be acquired by measuring fluorescence
characteristics of the amniotic fluid. Estimation of cardiac output may be performed by the dye
dilution method, which requires the monitoring of the appearance of recirculated dye in the
bloodstream. The development of fiberoptic technology has opened vast applications of biooptical
signals.
Table 52.1 lists some of the more common biomedical signals with some of their characteristics.

52.2 Classification of Biosignals


Biosignals may be classified in many ways. The following is a brief discussion of some of the most
important classifications.
• Classification according to source. Biosignals may be classified according to their source or physical
nature. This classification was described in the preceding section. This classification may be used
when the basic physical characteristics of the underlying process are of interest, e.g., when a model
for the signal is desired.
• Classification according to biomedical application. The biomedical signal is acquired and processed
with some diagnostic, monitoring, or other goal in mind. Classification may be constructed
according to the field of application, e.g., cardiology or neurology. Such classification may be of
interest when the goal is, for example, the study of physiologic systems.
• Classification according to signal characteristics. From the point of view of signal analysis, this is the
most relevant classification method. When the main goal is processing, the source of the signal and
the biomedical system to which it belongs are not relevant; what matters are the signal
characteristics.
We recognize two broad classes of signals: continuous signals and discrete signals. Continuous signals
are described by a continuous function s(t) which provides information about the signal at any given
time. Discrete signals are described by a sequence s(m) which provides information at discrete
points on the time axis. Most biomedical signals are continuous. Since current technology provides
powerful tools for discrete signal processing, we most often transform a continuous signal into a discrete
one by a process known as sampling. A given signal s(t) is sampled into the sequence s(m) by

s(m) = s(t)|t=mTs ,   m = …, −1, 0, 1, …    (52.1)

where Ts is the sampling interval and fs = 1/Ts is the sampling frequency (the corresponding angular
sampling frequency is ωs = 2π/Ts). Further characteristic classification, which applies to continuous as
well as discrete signals, is described in Fig. 52.1.

FIGURE 52.1 Classification of signals according to characteristics.

TABLE 52.1 Biomedical Signals

| Classification | Acquisition | Frequency Range | Dynamic Range | Comments |
|---|---|---|---|---|
| Bioelectric | | | | |
| Action potential | Microelectrodes | 100 Hz–2 kHz | 10 µV–100 mV | Invasive measurement of cell membrane potential |
| Electroneurogram (ENG) | Needle electrode | 100 Hz–1 kHz | 5 µV–10 mV | Potential of a nerve bundle |
| Electroretinogram (ERG) | Microelectrode | 0.2–200 Hz | 0.5 µV–1 mV | Evoked flash potential |
| Electro-oculogram (EOG) | Surface electrodes | dc–100 Hz | 10 µV–5 mV | Steady corneal-retinal potential |
| Electroencephalogram (EEG) | | | | |
| Surface | Surface electrodes | 0.5–100 Hz | 2–100 µV | Multichannel (6–32) scalp potential |
| Delta range | | 0.5–4 Hz | | Young children, deep sleep, and pathologies |
| Theta range | | 4–8 Hz | | Temporal and central areas during alert states |
| Alpha range | | 8–13 Hz | | Awake, relaxed, closed eyes |
| Beta range | | 13–22 Hz | | |
| Sleep spindles | | 6–15 Hz | 50–100 µV | Bursts of about 0.2 to 0.6 s |
| K-complexes | | 12–14 Hz | 100–200 µV | Bursts during moderate and deep sleep |
| Evoked potentials (EP) | Surface electrodes | | 0.1–20 µV | Response of brain potential to stimulus |
| Visual (VEP) | | 1–300 Hz | 1–20 µV | Occipital lobe recordings, 200-ms duration |
| Somatosensory (SEP) | | 2 Hz–3 kHz | | Sensory cortex |
| Auditory (AEP) | | 100 Hz–3 kHz | 0.5–10 µV | Vertex recordings |
| Electrocorticogram | Needle electrodes | 100 Hz–5 kHz | | Recordings from exposed surface of brain |
| Electromyography (EMG) | | | | |
| Single-fiber (SFEMG) | Needle electrode | 500 Hz–10 kHz | 1–10 µV | Action potentials from single muscle fiber |
| Motor unit action potential (MUAP) | Needle electrode | 5 Hz–10 kHz | 100 µV–2 mV | |
| Surface EMG (SEMG) | Surface electrodes | | | |
| Skeletal muscle | | 2–500 Hz | 50 µV–5 mV | |
| Smooth muscle | | 0.01–1 Hz | | |
| Electrocardiogram (ECG) | Surface electrodes | 0.05–100 Hz | 1–10 mV | |
| High-frequency ECG | Surface electrodes | 100 Hz–1 kHz | 100 µV–2 mV | Notches and slurs superimposed on the ECG |

We divide signals into two main groups: deterministic and stochastic signals. Deterministic signals are
signals that can be exactly described mathematically or graphically. If a signal is deterministic and its
mathematical description is given, it conveys no information. Real-world signals are never deterministic.
There is always some unknown and unpredictable noise added, or some unpredictable change in the
parameters or underlying characteristics of the signal, that renders it nondeterministic. It is, however,
very often convenient to approximate or model the signal by means of a deterministic function.
An important family of deterministic signals is the periodic family. A periodic signal is a deterministic
signal that may be expressed by

s(t) = s(t + nT)    (52.2)

where n is an integer, and T is the period. The periodic signal consists of a basic wave shape with a
duration of T seconds. The basic wave shape repeats itself an infinite number of times on the time axis.
The simplest periodic signal is the sinusoidal signal. Complex periodic signals have more elaborate wave
shapes. Under some conditions, the blood pressure signal may be modeled by a complex periodic signal,
with the duration of the cardiac cycle as its period and the blood pressure wave shape as its basic wave shape. This is, of
course, a very rough and inaccurate model.
Most deterministic functions are nonperiodic. It is sometimes worthwhile to consider an “almost
periodic” type of signal. The ECG signal can sometimes be considered “almost periodic.” The ECG’s RR
interval is never constant; in addition, the PQRST complex of one heartbeat is never exactly the same as
that of another beat. The signal is definitely nonperiodic. Under certain conditions, however, the RR
interval is almost constant, and one PQRST is almost the same as the other. The ECG may thus sometimes
be modeled as “almost periodic.”

52.3 Stochastic Signals


The most important class of signals is the stochastic class. A stochastic signal is a sample function of a
stochastic process. The process produces sample functions, the infinite collection of which is called the
ensemble. Each sample function differs from the others in its fine details; however, they all share the same
distribution probabilities. Figure 52.2 depicts three sample functions of an ensemble. Note that at any
given time, the values of the sample functions are different.

FIGURE 52.2 The ensemble of the stochastic process s(t).

Stochastic signals cannot be expressed exactly; they can be described only in terms of probabilities
which may be calculated over the ensemble.
Assuming a signal s(t), the Nth-order joint probability function

P[s(t1) ≤ s1, s(t2) ≤ s2, …, s(tN) ≤ sN] = P(s1, s2, …, sN)    (52.3)

is the joint probability that the signal at time ti will be less than or equal to si and at time tj will be less
than or equal to sj , etc. This joint probability describes the statistical behavior and intradependence of
the process.
It is very often useful to work with the derivative of the joint probability function; this derivative is
known as the joint probability density function (PDF):

p(s1, s2, …, sN) = ∂^N P(s1, s2, …, sN) / (∂s1 ∂s2 ⋯ ∂sN)    (52.4)

Of particular interest are the first- and second-order PDFs.


The expectation of the process s(t), denoted by E{s(t)} or by ms , is a statistical operator defined as

E{s(t)} = ∫_{−∞}^{∞} s p(s) ds = ms    (52.5)

The expectation of the function sn(t) is known as the nth-order moment. The first-order moment is
thus the expectation of the process. The nth-order moment is given by

E{s^n(t)} = ∫_{−∞}^{∞} s^n p(s) ds    (52.6)

Another important statistical operator is the nth central moment:

µn = E{(s − ms)^n} = ∫_{−∞}^{∞} (s − ms)^n p(s) ds    (52.7)

The second central moment is known as the variance (the square root of which is the standard
deviation). The variance is denoted by σ²:

σ² = µ2 = E{(s − ms)²} = ∫_{−∞}^{∞} (s − ms)² p(s) ds    (52.8)

The second-order joint moment is defined by the joint PDF. Of particular interest is the autocorrelation
function rss :

rss(t1, t2) = E{s(t1) s(t2)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} s1 s2 p(s1, s2) ds1 ds2    (52.9)

The cross-correlation function is defined as the second joint moment of the signal s at time t1, s(t1),
and the signal y at time t2, y(t2):

rsy(t1, t2) = E{s(t1) y(t2)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} s1 y2 p(s1, y2) ds1 dy2    (52.10)

Stationary stochastic processes are processes whose statistics do not change in time. The expectation
and the variance (as well as any other statistical measure) of a stationary process are thus time-independent. The
autocorrelation function, for example, of a stationary process will thus be a function of the time difference
τ = t2 – t1 (one-dimensional function) rather than a function of t2 and t1 (two-dimensional function).
Ergodic stationary processes possess an important characteristic: Their statistical probability distribu-
tions (along the ensemble) equal those of their time distributions (along the time axis of any one of its
sample functions). For example, the correlation function of an ergodic process may be calculated by its
definition (along the ensemble) or along the time axis of any one of its sample functions:

rss(τ) = E{s(t) s(t − τ)} = lim_{T→∞} (1/2T) ∫_{−T}^{T} s(t) s(t − τ) dt    (52.11)

The right side of Eq. (52.11) is the time autocorrelation function.


Ergodic processes are convenient because one does not need the ensemble in order to calculate the
distributions; a single sample function is sufficient. From the point of view of processing, it is desirable
to model the signal as an ergodic one. Unfortunately, almost all signals are nonstationary (and hence nonergodic).
One must therefore use nonstationary processing methods (such as, for example, wavelet transformation)
which are relatively complex or cut the signals into short-duration segments in such a way that each may
be considered stationary.
The sleep EEG signal, for example, is a nonstationary signal. We may consider segments of the signal,
in which the subject was at a given sleep state, as stationary. In order to describe the signal, we need to
estimate its probability distributions. However, the ensemble is unavailable. If we further assume that
the process is ergodic, the distributions may be estimated along the time axis of the given sample function.
Most of the standard processing techniques assume the signal to be stationary and ergodic.
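As a minimal sketch of what the ergodicity assumption buys in practice (an assumed example, not taken from the chapter; the signal and lag count are invented), the time average on the right side of Eq. (52.11) can be computed from a single sample function:

```python
# A minimal sketch (assumed example): estimating the autocorrelation of an
# assumed-ergodic process from one sample function by replacing the ensemble
# average of Eq. (52.11) with a time average.
import numpy as np

def time_autocorrelation(x, max_lag):
    """Biased time-average estimate r(tau) = (1/N) * sum x(t) x(t - tau)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean (first moment)
    N = len(x)
    return np.array([np.dot(x[lag:], x[:N - lag]) / N
                     for lag in range(max_lag + 1)])

# Example: white noise has an autocorrelation close to a delta function.
rng = np.random.default_rng(0)
r = time_autocorrelation(rng.standard_normal(10_000), max_lag=20)
print(r[0], r[1:5])                       # r[0] ~ variance, other lags ~ 0
```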

52.4 Frequency-Domain Analysis


Until now we have dealt with signals represented in the time domain, that is to say, we have described
the signal by means of its value on the time axis. It is possible to use another representation for the same
signal: that of the frequency domain. Any signal may be described as a continuum of sine waves having
different amplitudes and phases. The frequency representation describes the signals by means of the
amplitudes and phases of the sine waves. The transformation between the two representations is given
by the Fourier transform (FT):

S(ω) = ∫_{−∞}^{∞} s(t) e^{−jωt} dt = F{s(t)}    (52.12)

where ω = 2πf is the angular frequency, and F{·} is the Fourier operator.
The inverse Fourier transform (IFT) is the operator that transforms a signal from the frequency domain
into the time domain:

s(t) = (1/2π) ∫_{−∞}^{∞} S(ω) e^{jωt} dω = F⁻¹{S(ω)}    (52.13)

The frequency domain representation S(ω) is complex; hence

S(ω) = |S(ω)| e^{jθ(ω)}    (52.14)

where |S(ω)|, the absolute value of the complex function, is the amplitude spectrum, and θ(ω), the phase
of the complex function, is the phase spectrum. The square of the absolute value, |S(ω)|², is termed the
power spectrum. The power spectrum of a signal describes the distribution of the signal’s power on the
frequency axis. A signal in which the power is limited to a finite range of the frequency axis is called a
band-limited signal. Figure 52.3 depicts an example of such a signal.
The signal in Fig. 52.3 is a band-limited signal; its power spectrum is limited to the frequency range
–ωmax ≤ ω ≤ ωmax . It is easy to show that if s(t) is real (which is the case in almost all applications), the
amplitude spectrum is an even function and the phase spectrum is an odd function.
Special attention must be given to stochastic signals. Applying the FT to a sample function would
provide a sample function on the frequency axis. The process may be described by the ensemble of
spectra. Another alternative to the frequency representation is to consider the correlation function of the
process. This function is deterministic. The FT may be applied to it, yielding a deterministic frequency
function. The FT of the correlation function is defined as the power spectral density function (PSD):

PSD[s(t)] = Sss(ω) = F{rss(τ)} = ∫_{−∞}^{∞} rss(τ) e^{−jωτ} dτ    (52.15)

FIGURE 52.3 Example of a signal described in the time and frequency domains.

The PSD is used to describe stochastic signals; it describes the density of power on the frequency axis.
Note that since the autocorrelation function is an even function, the PSD is real; hence no phase spectrum
is required.
The EEG signal may serve as an example of the importance of the PSD in signal processing. When
processing the EEG, it is very helpful to use the PSD. It turns out that the power distribution of the EEG
changes according to the physiologic and psychological states of the subject. The PSD may thus serve as
a tool for the analysis and recognition of such states.
Very often we are interested in the relationship between two processes. This may be the case, for
example, when two sides of the brain are investigated by means of EEG signals. The time-domain
expression of such relationships is given by the cross-correlation function (Eq. 52.10). The frequency-
domain representation of this is given by the FT of the cross-correlation function, which is called the
cross-power spectral density function (C-PSD) or the cross-spectrum:

Ssy(ω) = F{rsy(τ)} = |Ssy(ω)| e^{jθsy(ω)}    (52.16)

Note that we have assumed the signals s(t) and y(t) are stationary; hence the cross-correlation function
is not a function of time but of the time difference τ. Note also that unlike the autocorrelation function,
rsy(τ) is not even; hence its FT is not real. Both absolute value and phase are required.
It can be shown that the absolute value of the C-PSD is bounded:

|Ssy(ω)|² ≤ Sss(ω) Syy(ω)    (52.17)

The absolute value information of the C-PSD may thus be normalized to provide the coherence function:

γsy²(ω) = |Ssy(ω)|² / (Sss(ω) Syy(ω)) ≤ 1    (52.18)

The coherence function is used in a variety of biomedical applications. It has been used, for example, in
EEG analysis to investigate brain asymmetry.
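As a minimal sketch of how such a coherence estimate might be obtained in practice (an assumed example; the two "channels," the sampling rate, and all parameters are invented), SciPy's Welch-based estimator implements Eq. (52.18):

```python
# A minimal sketch (assumed example): estimating the coherence function of
# Eq. (52.18) between two noisy channels that share a common 10-Hz component.
import numpy as np
from scipy.signal import coherence

fs = 250.0                                 # assumed sampling rate, Hz
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(1)
common = np.sin(2 * np.pi * 10 * t)        # activity shared by both channels
x = common + 0.5 * rng.standard_normal(t.size)
y = common + 0.5 * rng.standard_normal(t.size)

f, Cxy = coherence(x, y, fs=fs, nperseg=512)
print(f[np.argmax(Cxy)])                   # peak coherence near 10 Hz
```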

FIGURE 52.4 Amplitude spectrum of a sampled signal with sampling frequency above the Nyquist frequency (upper
trace) and below the Nyquist frequency (lower trace).

52.5 Discrete Signals


Assume now that the signal s(t) of Fig. 52.3 was sampled using a sampling frequency of fs = ωs/(2π) =
1/Ts . The sampled signal is the sequence s(m). The representation of the sampled signal in the
frequency domain is given by applying the Fourier operator:

Ss(ω) = F{s(m)} = |Ss(ω)| e^{jθs(ω)}    (52.19)

The amplitude spectrum of the sampled signal is depicted in Fig. 52.4. It can be shown that the
spectrum of the sampled signal is the spectrum of the original signal repeated an infinite number of times,
at the frequencies nωs . The spectrum of a sampled signal is thus periodic in the frequency domain. It can be
observed, in Fig. 52.4, that provided the sampling frequency is large enough, the wave shapes of the
spectrum do not overlap. In such a case, the original (continuous) signal may be extracted from the
sampled signal by low-pass filtering. A low-pass filter with a cutoff frequency of ωmax will yield at its
output only the first period of the spectrum, which is exactly the continuous signal. If, however, the
sampling frequency is low, the wave shapes overlap, and it will be impossible to regain the continuous
signal.
The sampling frequency must obey the inequality

ωs ≥ 2ωmax    (52.20)

Equation (52.20) is known as the sampling theorem, and the lowest allowable sampling frequency is called
the Nyquist frequency. When overlapping does occur, there are errors between the sampled and original
signals. These errors are known as aliasing errors. In practical applications, the signal does not possess a
finite bandwidth; we therefore limit its bandwidth by an antialiasing filter prior to sampling.
The discrete Fourier transform (DFT) [Proakis & Manolakis, 1988] is an important operator that maps
a finite sequence s(m), m = 0, 1, … , N – 1, into another finite sequence S(k), k = 0, 1, … , N – 1. The
DFT is defined as

S(k) = DFT{s(m)} = Σ_{m=0}^{N−1} s(m) e^{−j(2π/N)km}    (52.21)

An inverse operator, the inverse discrete Fourier transform (IDFT), is an operator that transforms the
sequence S(k) back into the sequence s(m). It is given by

s(m) = IDFT{S(k)} = (1/N) Σ_{k=0}^{N−1} S(k) e^{j(2π/N)km}    (52.22)

It can be shown that if the sequence s(m) represents the samples of the band-limited signal s(t), sampled
under Nyquist conditions with sampling interval of Ts , the DFT sequence S(k) (neglecting windowing
effects) represents the samples of the FT of the original signal:

S(k) = Ss(ω)|ω=kωs/N ,   k = 0, 1, …, N − 1    (52.23)

Figure 52.5 depicts the DFT and its relation to the FT. Note that the N samples of the DFT span one
period of the (periodic) frequency representation. Since the amplitude spectrum is even, only half the DFT
samples carry the information; the other half is composed of the complex conjugates of the first half.

FIGURE 52.5 The sampled signal s(m) and its DFT.

The DFT may be calculated very efficiently by means of the fast (discrete) Fourier transform (FFT)
algorithm. It is this fact that makes the DFT an attractive means for FT estimation. The DFT provides
an estimate for the FT with frequency resolution of

∆ω = 2πfs/N = 2π/T    (52.24)
where T is the duration of the data window. The resolution may be improved by using a longer window.
In cases where it is not possible to have a longer data window, e.g., because the signal is not stationary,
zero padding may be used. The sequence may be augmented with zeroes:

sA(m) = {s(0), s(1), …, s(N − 1), 0, …, 0}    (52.25)

The zero-padded sequence sA(m), m = 0, 1, …, L – 1, contains the N elements of the original sequence
followed by L – N zeroes. It can be shown that its DFT represents the samples of the FT on a denser grid,
with resolution ∆ω = 2πfs/L.
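A short sketch illustrates the point (an assumed example; the signal, record length, and padded length are invented). Padding refines the frequency grid of the DFT, although the true resolution is still set by the original window length:

```python
# A minimal sketch (assumed example): zero padding a short record before the
# FFT, as in Eq. (52.25). Padding interpolates the spectrum on a denser grid.
import numpy as np

fs = 100.0
m = np.arange(64)                     # N = 64 samples
s = np.sin(2 * np.pi * 12.7 * m / fs)  # tone not on a DFT bin

S = np.fft.rfft(s)                    # N-point DFT: bins fs/N ~ 1.56 Hz apart
S_pad = np.fft.rfft(s, n=512)         # L = 512: bins fs/L ~ 0.195 Hz apart

print(np.fft.rfftfreq(64, 1 / fs)[np.argmax(np.abs(S))])      # coarse peak
print(np.fft.rfftfreq(512, 1 / fs)[np.argmax(np.abs(S_pad))]) # refined peak
```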

52.6 Data Windows


Calculation of the various functions previously defined, such as the correlation function, requires knowl-
edge of the signal from minus infinity to infinity. This is, of course, impractical: the signal is not available
for infinite durations, and the results of the calculations are expected within a reasonable time. We
therefore do not use the signal itself but the windowed signal.
A window w(t) is defined as a real and even function that is also time-limited:

w(t) = 0,   ∀ |t| > T/2

The FT of a window W(ω) is thus real and even and is not band-limited.
Multiplying a signal by a window will zero the signal outside the window duration (the observation
period) and will create a windowed, time-limited signal sw(t):

sw(t) = s(t) w(t)    (52.26)

In the frequency domain, the windowed signal will be

Sw(ω) = S(ω) ∗ W(ω)    (52.27)

where (*) is the convolution operator. The effect of windowing on the spectrum of the signal is thus the
convolution with the FT of the window. A window with very narrow spectrum will cause low distortions.
A practical window has an FT with a main lobe, where most of its energy is located, and sidelobes, which
cover the frequency axis. The convolution of the sidelobes with the FT of the signal causes distortions
known as spectral leakage. Many windows have been suggested for a variety of applications.
The simplest window is the rectangular (Dirichlet) window; in its discrete form it is given by w(m) =
1, m = 0, 1, … , N – 1. A more useful window is the Hamming window, given by

w(m) = 0.54 − 0.46 cos(2πm/N),   m = 0, 1, …, N − 1    (52.28)

The Hamming window was designed to minimize the effects of the first sidelobe.
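The leakage reduction can be demonstrated with a short sketch (an assumed example; the tone frequency, record length, and the bin used to probe the sidelobes are invented), comparing the rectangular window with the Hamming window of Eq. (52.28):

```python
# A minimal sketch (assumed example): spectral leakage of the rectangular
# (Dirichlet) window versus the Hamming window of Eq. (52.28).
import numpy as np

N, fs = 256, 1000.0
m = np.arange(N)
s = np.sin(2 * np.pi * 100.3 * m / fs)          # tone not on a DFT bin

hamming = 0.54 - 0.46 * np.cos(2 * np.pi * m / N)

S_rect = np.abs(np.fft.rfft(s))
S_hamm = np.abs(np.fft.rfft(s * hamming))

# Sidelobe level far from the tone (~390 Hz): Hamming leaks far less.
print(20 * np.log10(S_rect[100] / S_rect.max()))
print(20 * np.log10(S_hamm[100] / S_hamm.max()))
```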

52.7 Short-Time Fourier Transform (STFT)
The Fourier analysis discussed in preceding sections assumed that the signal is stationary. Unfortunately,
most signals are nonstationary. A relatively simple way to deal with the problem is to divide the signal
into short segments. The segments are chosen such that each one by itself can be considered a windowed
sample of a stationary process. The duration of the segments has to be determined either by having some
a priori information about the signal or by examining its local characteristics. Depending on the signal
and the application, the segments may be of equal or different duration.
We want to represent such a segmented signal in the frequency domain. We define the short-time
Fourier transform (STFT):

STFTs(ω, τ) = F{s(t) w(t − τ)} = ∫_{−∞}^{∞} s(t) w(t − τ) e^{−jωt} dt    (52.29)

The window is shifted on the time axis to t = τ so that the FT is performed on a windowed segment in
the range τ – (T/2) ≤ t ≤ τ + (T/2). The STFT describes the amplitude and phase-frequency distributions
of the signal in the vicinity of t = τ.
In general, the STFT is a two-dimensional, time-frequency function. The resolution of the STFT on
the time axis depends on the duration T of the window. The narrower the window, the better the time
resolution. Unfortunately, choosing a short-duration window means a wider-band window. The wider
the window in the frequency domain, the larger the spectral leakage and hence the deterioration of the
frequency resolution. One of the main drawbacks of the STFT method is the fact that the time and
frequency resolutions are linked together. Other methods, such as the wavelet transform, are able to better
deal with the problem.
In highly nonstationary signals, such as speech signals, equal-duration windows are used. Window
duration is on the order of 10 to 20 ms. In other signals, such as the EEG, variable-duration windows
are used. In the EEG, windows on the order of 5 to 30 s are often used.
A common way for representing the two-dimensional STFT function is by means of the spectrogram.
In the spectrogram, the time and frequency axes are plotted, and the STFT PSD value is given by the
gray-scale code or by a color code. Figure 52.6 depicts a simple spectrogram. The time axis is quantized
to the window duration T. The gray scale codes the PSD such that black denotes maximum power and
white denotes zero power. In Figure 52.6, the PSD is quantized into only four levels of gray. The
spectrogram shows a signal that is nonstationary in the time range 0 to 8T. In this time range, the PSD
possesses a peak that is shifted from about 0.6fs to about 0.1fs at time 0.7T. From time 0.8T, the signal
becomes stationary with a PSD peak power in the low-frequency range and the high-frequency range.

FIGURE 52.6 A spectrogram.
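In practice, a spectrogram of this kind might be computed along the following lines (a minimal sketch with an invented test signal; SciPy's spectrogram routine implements the windowed, segmented STFT PSD):

```python
# A minimal sketch (assumed example): an STFT-based spectrogram, Eq. (52.29),
# for a chirp whose spectral peak drifts downward in time, as in Fig. 52.6.
import numpy as np
from scipy.signal import spectrogram, chirp

fs = 1000.0
t = np.arange(0, 8, 1 / fs)
x = chirp(t, f0=300, t1=8, f1=50)           # frequency sweeps from 300 to 50 Hz

# nperseg sets the window duration T: time and frequency resolution trade off.
f, tau, Sxx = spectrogram(x, fs=fs, window='hamming', nperseg=256, noverlap=128)
print(Sxx.shape)                            # (frequency bins, window positions)
```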

52.8 Spectral Estimation
The PSD is a very useful tool in biomedical signal processing. It is, however, impossible to calculate, since
it requires infinite integration time. Estimation methods must be used to acquire an estimate of the PSD
from a given finite sample of the process under investigation. Many algorithms for spectral estimation
are available in the literature [Kay, 1988], each with its advantages and drawbacks. One method may be
suitable for processes with sharp spectral peaks, while another will perform best for broad, smoothed
spectra. A priori knowledge of the type of PSD under investigation helps in choosing the proper
spectral estimation method. Some of the PSD estimation methods will be discussed here.

The Blackman-Tukey Method


This method estimates the PSD directly from its definition (Eq. 52.15) but uses finite integration time
and an estimate rather than the true correlation function. In its discrete form, the PSD estimation is

Ŝxx(ω) = Ts Σ_{m=−M}^{M} r̂xx(m) e^{−jωmTs}    (52.30)

r̂xx(m) = (1/N) Σ_{i=0}^{N−m−1} x(m + i) x(i)

where N is the number of samples used for the estimation of the correlation coefficients, and M is the
number of correlation coefficients used for estimation of the PSD. Note that a biased estimation of the
correlation is employed. Note also that once the correlations have been estimated, the PSD may be
calculated by applying the FFT to the correlation sequence.
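A minimal sketch of such an estimator follows (an assumed implementation; the lag count M, the FFT size, and the use of the FFT over the two-sided correlation sequence are choices of this sketch, not prescriptions of the chapter):

```python
# A minimal sketch (assumed example) of the Blackman-Tukey estimator of
# Eq. (52.30): estimate M biased correlation coefficients, then FFT them.
import numpy as np

def blackman_tukey_psd(x, M, Ts=1.0, nfft=1024):
    """Requires 2*M + 1 <= nfft."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    # Biased correlation estimate r(m) = (1/N) * sum x(m+i) x(i), m = 0..M
    r = np.array([np.dot(x[m:], x[:N - m]) / N for m in range(M + 1)])
    r_full = np.concatenate([r[:0:-1], r])   # r(-M), ..., r(0), ..., r(M)
    # abs() discards the linear phase caused by the non-centered ordering;
    # the PSD of a real, even correlation sequence is real.
    psd = Ts * np.abs(np.fft.rfft(r_full, n=nfft))
    freqs = np.fft.rfftfreq(nfft, Ts)
    return freqs, psd
```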

The Periodogram
The periodogram estimates the PSD directly from the signal without the need to first estimate the
correlation. It can be shown that

 2
() 1
∫ ()

T
− jωt
S xx ω = lim E  xte dt  (52.31)
T→ ∞
 2T −T


The PSD presented in Eq. (52.31) requires infinite integration time. The periodogram estimates the PSD
from a finite observation time by dropping the lim operator. It can be shown that in its discrete form,
the periodogram estimator is given by

Ŝxx(ω) = (Ts/N) |DFT{x(m)}|²    (52.32)

The great advantage of the periodogram is that the DFT operator can very efficiently be calculated by
the FFT algorithm.
A modification to the periodogram is weighted overlapped segment averaging (WOSA). Rather than
using one segment of N samples, we divide the observation segment into shorter subsegments, perform
a periodogram for each one, and then average all periodograms. The WOSA method provides a smoother
estimate of the PSD.
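A sketch of WOSA as it might be run in practice (an assumed example; SciPy's welch routine implements exactly this segment-averaging scheme, and the segment length and overlap here are invented):

```python
# A minimal sketch (assumed example): WOSA as implemented by Welch's method —
# windowed, overlapped segments, one periodogram each, then averaging.
import numpy as np
from scipy.signal import welch, periodogram

fs = 500.0
rng = np.random.default_rng(2)
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + rng.standard_normal(t.size)

f_p, P_raw = periodogram(x, fs=fs)                    # single-segment estimate
f_w, P_wosa = welch(x, fs=fs, nperseg=512, noverlap=256, window='hamming')
# P_wosa is a much smoother estimate of the same PSD than P_raw.
```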

FIGURE 52.7 Time-series model for the signal s(m).

Time-Series Analysis Methods


Time-series analysis methods model the signal as an output of a linear system driven by a white source.
Figure 52.7 depicts this model in its discrete form. Since the input is a white noise process (with zero
mean and unity variance), the PSD of the signal is given by

Sss(ω) = |H(ω)|²    (52.33)

The PSD of the signal may thus be represented by the system’s transfer function. Consider a general
pole-zero system with p poles and q zeros [ARMA(p, q)]:

H(z) = ( Σ_{i=0}^{q} bi z^{−i} ) / ( 1 + Σ_{i=1}^{p} ai z^{−i} )    (52.34)

Its absolute value evaluated on the frequency axis is

|H(ω)|² = | Σ_{i=0}^{q} bi z^{−i} |² / | 1 + Σ_{i=1}^{p} ai z^{−i} |² ,   z = e^{−jωTs}    (52.35)

Several algorithms are available for the estimation of the model’s coefficients. The estimation of the
ARMA model parameters requires the solution of a nonlinear set of equations. The special case of q =
0, namely, an all-pole model [AR(p)], may be estimated by means of linear equations. Efficient AR
estimation algorithms are available, making it a popular means for PSD estimation. Figure 52.8 shows
the PSD of a surface EMG signal estimated by several of these methods.

FIGURE 52.8 PSD of surface EMG. (Upper trace) Blackman-Tukey (256 correlation coefficients and 256 padding zeroes). (Middle trace) Periodogram (512 samples and 512 padding zeroes). (Lower trace) AR model (p = 40).
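A minimal sketch of AR spectral estimation via the Yule-Walker equations follows (an assumed implementation: the biased correlation estimate, the Toeplitz solver, and the evaluation grid are choices of this sketch):

```python
# A minimal sketch (assumed example): AR(p) PSD estimation. The Yule-Walker
# equations are linear in the AR coefficients, unlike the full ARMA problem.
import numpy as np
from scipy.linalg import solve_toeplitz

def ar_psd(x, p, Ts=1.0, nfft=1024):
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    r = np.array([np.dot(x[m:], x[:N - m]) / N for m in range(p + 1)])
    a = solve_toeplitz(r[:p], r[1:p + 1])        # Yule-Walker solution
    sigma2 = r[0] - np.dot(a, r[1:p + 1])        # driving-noise variance
    # All-pole case of Eq. (52.35): sigma2 * Ts / |1 - sum a_i e^{-jwi}|^2
    w = np.linspace(0, np.pi, nfft)              # rad/sample, up to Nyquist
    denom = np.abs(1 - np.exp(-1j * np.outer(w, np.arange(1, p + 1))) @ a) ** 2
    return w / (2 * np.pi * Ts), sigma2 * Ts / denom

# Usage (assumed): f, psd = ar_psd(x, p=40, Ts=1/fs)
```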

52.9 Signal Enhancement


The biomedical signal is very often a weak signal contaminated by noise. Consider, for example, the
problem of monitoring the ECG signal. The signal is acquired by surface electrodes that pick up the electric
potential generated by the heart muscle. In addition, the electrodes pick up potentials from other active
muscles. When the subject is at rest, this type of noise may be very small, but when the subject is an athlete

© 2000 by CRC Press LLC


FIGURE 52.8 PSD of surface EMG. (Upper trace) Blackman-Tukey (256 correlation coefficients and 256 padding
zeroes). (Middle trace) Periodogram (512 samples and 512 padding zeroes). (Lower trace) AR model (p = 40).

performing some exercise, the muscle noise may become dominant. Additional noise may enter the system
from electrode motion, from the power lines, and from other sources. The first task of processing is
usually to enhance the signal by “cleaning” the noise without (if possible) distorting the signal.
Assume a simple case where the measured signal x(t) is given by

x(t) = s(t) + n(t),    X(ω) = S(ω) + N(ω)    (52.36)

where s(t) is the desired signal and n(t) is the additive noise. For simplicity, we assume that both the
signal and the noise are band-limited, namely, for the signal, S(ω) = 0 for ω ≥ ωmax and for ω ≤ ωmin.

FIGURE 52.9 Noisy signal in the frequency domain.

Figure 52.9 depicts the PSD of the signal in two cases, the first where the PSDs of the signal and noise do not overlap
and the second where they do overlap (for the sake of simplicity, only the positive frequency axis was
plotted). We want to enhance the signal by means of linear filtering. The problem is to design the linear
filter that will provide best enhancement. Assuming we have the filter, its output, the enhanced signal,
is given by

y(t) = x(t) ∗ h(t),    Y(ω) = X(ω) H(ω)    (52.37)

where y(t) = ŝ(t) + no(t) is the enhanced output, and h(t) is the impulse response of the filter. The solution
for the first case is trivial; we need an ideal bandpass filter whose transfer function H(ω) is

1
()
H ω =
0
ω min < ω < ω max
otherwise
(52.38)

Such a filter and its output are depicted in Fig. 52.10.


As is clearly seen in Fig. 52.10, the desired signal s(t) was completely recovered from the given noisy
signal x(t). Practically, we do not have ideal filters, so some distortions and some noise contamination
will always appear at the output. With the correct design, we can approximate the ideal filter so that the
distortions and noise may be as small as we desire. The enhancement of overlapping noisy signals is far
from being trivial.
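A practical approximation of the ideal filter of Eq. (52.38) might look as follows (an assumed example: the Butterworth design, its order, the sampling rate, and the band edges are all invented; zero-phase filtering is used to avoid distorting the signal's phase):

```python
# A minimal sketch (assumed example): approximating the ideal bandpass filter
# of Eq. (52.38) with a Butterworth design applied forward and backward.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 500.0                                 # assumed sampling rate, Hz
w_min, w_max = 0.5, 40.0                   # assumed signal band, Hz (ECG-like)
b, a = butter(4, [w_min, w_max], btype='bandpass', fs=fs)

t = np.arange(0, 5, 1 / fs)
rng = np.random.default_rng(3)
x = (np.sin(2 * np.pi * 8 * t) + np.sin(2 * np.pi * 90 * t)
     + 0.3 * rng.standard_normal(t.size))
y = filtfilt(b, a, x)                      # out-of-band 90-Hz tone suppressed
```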

52.10 Optimal Filtering


When the PSD of signal and noise overlap, complete, undistorted recovery of the signal is impossible.
Optimal processing is required, with the first task being definition of the optimality criterion. Different
criteria will result in different solutions to the problem. Two approaches will be presented here: the Wiener
filter and the matched filter.

FIGURE 52.10 (a) An ideal bandpass filter. (b) Enhancement of a nonoverlapping noisy signal by an ideal bandpass
filter.

Minimization of Mean Squared Error: The Wiener Filter


Assume that our goal is to estimate, at time t + ξ, the value of the signal s(t + ξ), based on the observations
x(t). The case ξ = 0 is known as filtering (sometimes loosely called smoothing), while the case ξ > 0 is called prediction.
We define an output error ε(t) as the error between the filter’s output and the desired output. The
expectation of the square of the error is given by

E{ε²(t)} = E{ [s(t + ξ) − y(t + ξ)]² }
         = E{ [s(t + ξ) − ∫_{−∞}^{∞} h(τ) x(t − τ) dτ]² }    (52.39)

The integral term on the right side of Eq. (52.39) is the convolution integral expressing the output of
the filter.
The minimization of Eq. (52.39) with respect of h(t) yields the optimal filter (in the sense of minimum
squared error). The minimization yields the Wiener-Hopf equation:

rsx(τ + ξ) = ∫_{−∞}^{∞} h(η) rxx(τ − η) dη    (52.40)

In the frequency domain, this equation becomes

Ssx(ω) e^{jωξ} = Hopt(ω) Sxx(ω)    (52.41)

from which the optimal filter Hopt(ω) can be calculated:

Hopt(ω) = [Ssx(ω)/Sxx(ω)] e^{jωξ} = [Ssx(ω)/(Sss(ω) + Snn(ω))] e^{jωξ}    (52.42)

If the signal and noise are uncorrelated and either the signal or the noise has zero mean, the last equation
becomes

Hopt(ω) = [Sss(ω)/(Sss(ω) + Snn(ω))] e^{jωξ}    (52.43)

The optimal filter requires a priori knowledge of the PSD of noise and signal. These are very often not
available and must be estimated from the available signal. The optimal filter given in Eqs. (52.42) and
(52.43) is not necessarily realizable. In performing the minimization, we have not introduced a constraint
that will ensure that the filter is causal. This can be done, yielding the realizable optimal filter.
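As a minimal sketch of Eq. (52.43) with ξ = 0 (an assumed example: the PSDs are taken as known inputs on the FFT grid, whereas in practice they would themselves have to be estimated, and the resulting noncausal filter is applied in the frequency domain):

```python
# A minimal sketch (assumed example) of the noncausal Wiener smoother of
# Eq. (52.43): a zero-phase frequency-domain gain Sss / (Sss + Snn).
import numpy as np

def wiener_smooth(x, S_ss, S_nn):
    """S_ss, S_nn: signal and noise PSDs sampled on the grid
    np.fft.rfftfreq(len(x)); both arrays of length len(x)//2 + 1."""
    H = S_ss / (S_ss + S_nn)              # Eq. (52.43) with xi = 0
    return np.fft.irfft(H * np.fft.rfft(x), n=len(x))
```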

Maximization of the Signal-to-Noise Ratio: The Matched Filter


The Wiener filter was optimally designed to yield an output as close as possible to the signal. In many
cases we are not interested in the fine details of the signal but only in the question whether the signal
exists at a particular observation or not. Consider, for example, the case of determining the heart rate
of a subject under noisy conditions. We need to detect the presence of the R wave in the ECG. The exact
shape of the wave is not important. For this case, the optimality criterion used in the last section is not
suitable. To find a more suitable criterion, we define the output signal-to-noise ratio: Let us assume that
the signal s(t) is a deterministic function. The response of the filter, ŝ(t) = s(t) * h(t), to the signal is also
deterministic. We shall define the output signal-to-noise ratio

SNRo(t) = ŝ²(t) / E{no²(t)}    (52.44)

as the optimality criterion. The optimal filter will be the filter that maximizes the output SNR at a certain
given time t = to . The maximization yields the following integral equation:

∫_0^T h(ξ) rnn(τ − ξ) dξ = α s(to − τ),   0 ≤ τ ≤ T    (52.45)
0

where T is the observation time and α is any constant. This equation has to be solved for any given noise
and signal.

A special important case is the case where the noise is a white noise so that its autocorrelation function
is a delta function. In this case, the solution of Eq. (52.45) is

h(τ) = (1/N) s(to − τ)    (52.46)

where N is the noise power. For this special case, the impulse response of the optimal filter has the form
of the signal run backward, shifted to the time to . This type of filter is called a matched filter.
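Since the matched filter's impulse response is the time-reversed template, filtering reduces to correlating the observation with the template. A minimal sketch (an assumed example; the normalization of the template is a choice of this sketch, not part of Eq. (52.46)):

```python
# A minimal sketch (assumed example): matched filtering for detecting a known
# waveform (e.g., the QRS complex) in white noise, per Eq. (52.46).
import numpy as np

def matched_filter(x, template):
    template = (template - template.mean()) / np.linalg.norm(template)
    # Correlation slides the template along x; this is equivalent to
    # convolving x with the time-reversed template s(to - tau).
    return np.correlate(x, template, mode='same')

# Peaks of the output mark the most likely template (e.g., R-wave) locations.
```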

52.11 Adaptive Filtering


The optimal filters discussed in the preceding section assumed the signals to be stationary with known
PSD. Both assumptions rarely occur in reality. In most biomedical applications, the signals are nonsta-
tionary with unknown PSD. To enhance such signals, we require a filter that will continuously adjust
itself to perform optimally under the changing circumstances. Such a filter is called an adaptive filter
[Widrow & Stearns, 1985].
The general description of an adaptive filter is depicted in Fig. 52.11. The signal s(t) is to be corrected
according to the specific application. The correction may be enhancement or some reshaping. The signal
is given in terms of the noisy observation signal x(t). The main part of the system is a filter, and the
parameters (gain, poles, and zeroes) are controllable by the adaptive algorithm. The adaptive algorithm
has some a priori information on the signal and the noise (the amount and type of information depend
on the application). It also has a correction criterion, according to which the signal is operating. The
adaptive algorithm also gets the input and output signals of the filter so that its performance can be
analyzed continuously.
The adaptive filter requires a correction algorithm. This can best be implemented digitally. Most
adaptive filters therefore are implemented by means of computers or special digital processing chips.
An important class of adaptive filters requires a reference signal. The knowledge of the noise required
by this type of adaptive filter is a reference signal that is correlated with the noise. The filter thus has two
inputs: the noisy signal x(t) = s(t) + n(t) and the reference signal nR(t). The adaptive filter, functioning

as a noise canceler, estimates the noise n(t) and, by subtracting it from the given noisy input, gets an
estimate for the signal. Hence

y(t) = x(t) − n̂(t) = s(t) + [n(t) − n̂(t)] = ŝ(t)    (52.47)

The output of the filter is the enhanced signal. Since the reference signal is correlated with the noise,
the following relationship exists:

NR(ω) = G(ω) N(ω)    (52.48)

which means that the reference noise may be represented as the output of an unknown filter G(ω). The
adaptive filter estimates the inverse of this unknown filter and from it estimates the noise:

n̂(t) = F⁻¹{Ĝ⁻¹(ω) NR(ω)}    (52.49)

The estimation of the inverse filter is done by the minimization of some performance criterion. There
are two dominant algorithms for the optimization: the recursive least squares (RLS) and the least mean
squares (LMS). The LMS algorithm will be discussed here.
Consider the mean square error

E{ε²(t)} = E{y²(t)} = E{ [s(t) + (n(t) − n̂(t))]² }
         = E{s²(t)} + E{ [n(t) − n̂(t)]² }    (52.50)

The right side of Eq. (52.50) is correct, assuming that the signal and noise are uncorrelated. We are
searching for the estimate Ĝ–1(ω) that will minimize the mean square error: E{[n(t) – n̂(t)]2}. Since the
estimated filter affects only the estimated noise, the minimization of the noise error is equivalent to the
minimization of Eq. (52.50). The implementation of the LMS filter will be presented in its discrete form
(see Fig. 52.12).
The estimated noise is

n̂(m) = Σ_{i=0}^{p} vi wi = vmᵀ w    (52.51)

where

vmᵀ = [v0, v1, …, vp] = [1, nR(m − 1), …, nR(m − p)]
wᵀ = [w0, w1, …, wp]    (52.52)

The vector w represents the filter. The steepest descent minimization of Eq. (52.50) with respect to the
filter’s coefficients w yields the iterative algorithm

wj+1 = wj + 2µ εj vj    (52.53)

where µ is a scalar that controls the stability and convergence of the algorithm. In the evaluation of
Eq. (52.53), the assumption

∂E{εj²}/∂wk ≅ ∂εj²/∂wk    (52.54)

was made. This is indeed a drastic approximation; the results, however, are very satisfactory. Figure 52.12
depicts the block diagram of the LMS adaptive noise canceler.

FIGURE 52.12 Block diagram of LMS adaptive noise canceler.

The LMS adaptive noise canceler has been applied to many biomedical problems, among them
cancellation of power-line interferences, elimination of electrosurgical interferences, enhancement of fetal
ECG, noise reduction for the hearing impaired, and enhancement of evoked potentials.
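A minimal sketch of the LMS noise canceler of Fig. 52.12 follows (an assumed implementation; the filter order p and step size µ are invented, and µ must be chosen small enough for the update to remain stable):

```python
# A minimal sketch (assumed example) of the LMS adaptive noise canceler:
# x = s + n is the primary input, n_ref a reference correlated with n.
import numpy as np

def lms_cancel(x, n_ref, p=8, mu=0.01):
    w = np.zeros(p + 1)                   # weights w0..wp, as in Eq. (52.52)
    y = np.zeros(len(x))                  # enhanced output, Eq. (52.47)
    for m in range(p, len(x)):
        # v = [1, nR(m-1), ..., nR(m-p)], as in Eq. (52.52)
        v = np.concatenate(([1.0], n_ref[m - p:m][::-1]))
        n_hat = w @ v                     # noise estimate, Eq. (52.51)
        y[m] = x[m] - n_hat               # error signal = enhanced signal
        w = w + 2 * mu * y[m] * v         # LMS update, Eq. (52.53)
    return y
```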

52.12 Segmentation of Nonstationary Signals


Most biomedical signals are nonstationary, yet the common processing techniques (such as the FT) deal
with stationary signals. The STFT is one method of processing nonstationary signals, but it requires
the segmentation of the signal into “almost” stationary segments. The signal is thus represented
as a piecewise-stationary signal.
An important problem in biomedical signal processing is efficient segmentation. In very highly non-
stationary signals, such as the speech signal, short, constant-duration (of the order of 15 ms) segments
are used. The segmentation processing in such a case is simple and inexpensive. In other cases such as
the monitoring of nocturnal EEG, a more elaborate segmentation procedure is called for because the
signal may consist of “stationary” segments with very wide duration range. Segmentation into a priori
fixed-duration segments will be very inefficient in such cases.

FIGURE 52.13 Adaptive segmentation of simulated EEG. The first 2.5 seconds and the last 2.5 seconds were simulated by
means of different AR models. (Lower trace) SEM calculated with a fixed reference window. A new segment has been
detected at t = 2.5. [From Cohen, 1986, with permission.]

Several adaptive segmentation algorithms have been suggested. Figure 52.13 demonstrates the basic
idea of these algorithms. A fixed reference window is used to define an initial segment of the signal. The
duration of the reference window is determined such that it is long enough to allow a reliable PSD
estimate yet short enough so that the segment may still be considered stationary. Some a priori infor-
mation about the signal will help in determining the reference window duration. A second, sliding window
is shifted along the signal. The PSD of the segment defined by the sliding window is estimated at each
window position. The two spectra are compared using some spectral distance measure. As long as this
distance measure remains below a certain decision threshold, the reference segment and the sliding
segment are considered close enough and are related to the same stationary segment. Once the distance
measure exceeds the decision threshold, a new segment is defined. The process continues by defining the
last sliding window as the reference window of the new segment.
Let us define a relative spectral distance measure

Dt = ∫_0^{ωM} [ (SR(ω) − St(ω)) / SR(ω) ]² dω    (52.55)

where SR(ω) and St(ω) are the PSD estimates of the reference and sliding segments, respectively, and ωM
is the bandwidth of the signal. A normalized spectral measure was chosen, since we are interested in
differences in the shape of the PSD and not in the gain.
Some of the segmentation algorithms use growing reference windows rather than fixed ones. This is
depicted in the upper part of Fig. 52.13. The various segmentation methods differ in the way the PSDs
are estimated. Two of the more well-known segmentation methods are the auto-correlation measure
method (ACM) and the spectral error measure (SEM).
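A minimal sketch of such a segmentation loop follows (an assumed implementation: the window lengths, the Welch PSD estimator, the discrete form of Eq. (52.55), and the decision threshold are all choices of this sketch, not those of the ACM or SEM algorithms):

```python
# A minimal sketch (assumed example): adaptive segmentation by comparing a
# sliding window's PSD against a fixed reference window's PSD, Eq. (52.55).
import numpy as np
from scipy.signal import welch

def segment(x, fs, win=512, step=256, threshold=2.0):
    boundaries, ref_start = [0], 0
    _, S_ref = welch(x[ref_start:ref_start + win], fs=fs, nperseg=win // 2)
    for start in range(step, len(x) - win, step):
        _, S_t = welch(x[start:start + win], fs=fs, nperseg=win // 2)
        D = np.sum(((S_ref - S_t) / S_ref) ** 2)   # discrete Eq. (52.55)
        if D > threshold:
            boundaries.append(start)               # new segment detected
            ref_start = start                      # sliding window becomes reference
            _, S_ref = welch(x[ref_start:ref_start + win], fs=fs,
                             nperseg=win // 2)
    return boundaries
```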

References
Cohen A. 1986. Biomedical Signal Processing. Boca Raton, Fla, CRC Press.
Kay SM. 1988. Modern Spectral Estimation: Theory and Application. Englewood Cliffs, NJ, Prentice-Hall.
Proakis JG, Manolakis DG. 1988. Introduction to Digital Signal Processing. New York, Macmillan.
Weitkunat R (ed). 1991. Digital Biosignal Processing. Amsterdam, Elsevier.
Widrow B, Stearns SD. 1985. Adaptive Signal Processing. Englewood Cliffs, NJ, Prentice-Hall.
