CHAPTER 2

REVIEW OF LITERATURE

The literature has been reviewed under the following sections:


Electroencephalogram (EEG), Electrooculogram (EOG), Electromyogram (EMG),
Polygraph analysis, Fuzzy Inference System (FIS), Neural Network Analysis and
Adaptive Neuro-Fuzzy Inference System (ANFIS).

2.1 Electroencephalogram (EEG)


The electrical activity of the brain and cerebral cortex is very complicated and
is produced by different types of neurons, nerves and nerve fibers. It arises because a
large number of neurons and synapses, with their various properties such as
inhibition, summation and facilitation, are integrated together to give rise to rhythmic
electrical potential changes. EEGs are recordings of these minute rhythmic electrical
potentials produced by the discharge of cortical cells of the brain. Cortical dendrites
form a dense forest of units in the superficial layers of the cerebral cortex and are the
sites of non-propagated hypopolarising local potential changes at the excitatory and
inhibitory axo-dendritic synapses. When the excitatory axo-dendritic synapses are
activated, current flows in and out between the cell body and the axo-dendritic
endings, causing a wave-like potential fluctuation.
Studies correlating surface events and intracellular events in cortical neurons
show a direct relationship of Post Synaptic Potentials (PSPs) and surface potentials.
The release of neurotransmitter at a synapse allows selective movements of ions
through the postsynaptic membrane. These transmembrane currents result in local
changes in ionic concentrations, both intracellularly and extracellularly, which result
in the formation of dipoles. These dipoles are responsible for the flow of extracellular
and intracellular currents and hence give rise to the surface-recorded EEG. Axons are
multidirectionally distributed; thus, the net effect of action potentials on a surface
electrode is zero. In the case of evoked potentials, however, action potentials can contribute
to the recordings because the stimulus evokes synchronous activity in a large number
of axons simultaneously [40]. EEG potentials are field potentials behaving in an ‘on-
off’ fashion due to dipoles. Thus they contrast with nerve or muscle action potentials
in which small electrically active areas of axons or muscle fibers move toward, pass
by and move away from the recording electrodes [41].
The pyramidal cells of the cerebral cortex are the most important neuronal
source of the EEG because their dendrites are long and arranged in parallel, thus PSPs
can occur in one part of a cell while other, relatively remote parts are quiescent; the
dipoles so formed cause currents to flow, which will have a greater effect on a surface
electrode [40]. Studies in cats indicate that potentials originating on the surfaces
of the gyri oriented parallel to the skull have maximal effects on surface electrodes,
whereas those arising in the depths of the convolutions have minimal effects.
Intermediate sources have intermediate effects [42]. The glial cell contribution to the
EEG results from potential changes in cortical neurons. These cells have close
electrical coupling and this may allow them to spread the potentials to other cells,
amplifying those initiated by neurons and thus contributing to the DC potentials [43].
A model in which afferent action potentials arrive in synchronous volleys at
presynaptic terminals within a given area explains the rhythms of the EEG. Activity
in other similar areas is not necessarily synchronous; therefore, recordings made from
different parts of the scalp differ in appearances. Short volleys arriving in a given area
initiate PSPs, which overlap in time and appear at the surface as EEG waves of
corresponding duration and frequency. Longer volleys elicit longer PSPs and slower
waves.
2.1.1 Interpretation of the EEG by visual inspection
The ongoing, continuous voltage changes that comprise the EEG consist of
Back-Ground Rhythms (BGR), transient events and artifacts. These characteristics
can be focal, multifocal or generalized.


2.1.2 Background rhythms (BGR)


BGR are the ongoing, continuous EEG activity, which varies significantly with
changes in the state of sleep or arousal in normal animals. Both its amplitude (20 to
200 µV, depending upon the vigilance state) and its frequency stay within relatively
narrow limits (about 0.5 Hz to 30 Hz) [44]. Some normal BGR are mentioned below.
Full arousal: Full arousal is recognizable by alert behavior, the animal responding
to environmental disturbances, making searching movements with the head, eyes or
ears, or even by attempting to arise from the recumbent state usually assumed during
recording. BGR during full arousal are usually of very low amplitude (< 20 µV), with
higher frequencies (15-25 Hz) predominating. Muscle artifact is common during
arousal.
Alpha rhythm: It can be recorded if the animal is held in a quiet ambience and
closes its eyes. Typically, a period of stable alpha rhythm has a duration of about 8-15
seconds, after which it is replaced by higher frequencies when the animal opens its
eyes or is aroused, or by lower frequencies as the animal becomes sleepy [45, 46,
47].
Drowsiness: During drowsiness, animals are recumbent with their eyes partially or
completely closed. At this time they can be quite readily aroused by minimal
stimulation or small disturbances in the environment. The BGR consist of
predominant 6-8 or 6-10 Hz waves, with superimposed random slower and higher
frequencies. If undisturbed during drowsiness, most animals soon enter non-REM
sleep.
Non-REM sleep: During non-REM sleep, the eyes are closed or nearly closed, and
stronger stimuli are needed to arouse the animal than during drowsiness. BGR at this time
are predominantly 2-4 Hz with lower amplitude superimposed activity, predominantly
in the 6-10 Hz range. Stages of light and deep non-REM sleep are often recognizable
by the higher amplitudes and lower frequencies during deep non-REM sleep.


REM sleep: During REM sleep, animals are recumbent with their eyes partially or
completely closed. With the onset of REM sleep, BGR change rapidly, losing
amplitude and increasing in frequency, so they resemble those of arousal.

2.1.3 Transient events


Transient events occur for brief periods and consist of distinctive waveforms
superimposed on the background EEG rhythms. Paroxysmal discharges (PD) are
abnormal transient events that are associated with seizure disorders, ictally or
interictally. Spindles, K-complexes and V waves are normal EEG transients that are
of highest amplitude on the midline but are usually quite evident in electrodes located
laterally, where they are symmetrical in amplitude and frequency.
Spindles: Sleep spindles are repetitive sinusoidal waves. They begin at somewhat
lower amplitude, increase and then decrease in amplitude thus giving the outline of
the event a spindle shape. Spindles are often recorded during non-REM sleep. Sleep
spindles are bilaterally symmetrical and of highest amplitude in midline derivations,
but are visible in lateral derivations as well.
K-complexes: A K-complex consists of a spindle associated with a slow wave; the
spindle may appear before, during or after the slow wave, thus there is considerable
variation in the appearance of K-complexes [48]. K-complexes occur during non-
REM sleep. They are of highest amplitude and most clearly developed on the midline,
but are visible in lateral derivations.
Vertex waves: Vertex sharp transients (V waves) are responses to stimulation,
usually during drowsiness or sleep. V-waves can be differentiated from PD by the fact
that they do not have spike components, occur only during drowsiness, and are
always of highest amplitude on the midline and symmetrically distributed.


2.1.4 Artifacts
The most frequent artifacts encountered are those associated with muscle
potentials; head, limb or body movements; and respiration. Less often, swallowing
or mandibular movements cause artifacts.

Muscle artifact: Muscle potentials are the most common and often the most
troublesome artifacts. They arise from the facial or masticatory muscles and cause
intermittent or continuous relatively high frequency activity that can partially or
completely obscure BGR.
Movement artifacts: The most common of these are associated with movements
of the head or other body parts, and respiratory movements. These artifacts arise from
movements that cause the electrodes or their connecting wires to move through the
ambient electromagnetic fields. Eye movements cause artifacts by this means and also
by a very different mechanism.
Respiratory artifact: It consists of slow waves that are synchronized with
movements of the body and head associated with inspiration and/or expiration. Head
movements, swallowing, and movements of the mandibles also introduce artifacts.
Eye movements: These produce relatively slow waves in the EEG by volume
conduction of the EOG to the EEG electrodes. Recording the EOG helps in verifying
eye movement artifacts. Eye movements are often accompanied by eyelid movements
(e.g., blinking) and/or by movements of the scalp, either of which may be
associated with muscle potentials and/or electrode/wire movement artifacts.

2.1.5 Automated EEG analysis


Soon after the discovery of the human EEG by Berger, attempts were made at
developing objective, quantitative methods to aid the interpretation of these complex,
noise-like waves [49]. Its use was first reported in veterinary medicine over
30 years ago, and since then the EEG characteristics of a variety of diseases have been
reported [50, 51, 52]. With the advent of automated quantitative computer analysis,
EEG is now finding valuable applications in medicine in monitoring anesthetized or
critical-care patients, who have cerebral cortex damage or are at risk for it [53, 54,
55]. Initially analog devices were used, but digital methods quickly took over after
the introduction of computers. A number of research-oriented laboratories now
employ EEG monitoring for epileptogenic spikes, sleep staging or monitoring
during anesthesia [56, 57, 58]. The single area where computerized EEG analysis has
made an impact, and indeed changed the field, is the analysis of evoked responses.
Averaging techniques have been developed to improve the signal to noise ratio of the
ERPs, which are buried in the much larger, ongoing EEG activity [59].
Statistical pattern recognition was one of the first methods used to compare
EEG intervals and to extract quantitative features from them. For this purpose,
epochs were selected through an optimal feature selection technique to reduce the large
number of a priori defined features to a small set of descriptors. It has been reported
that the proportion of EEG intervals that can be regarded as stationary decreases from
90% to 10% when the interval length is increased from 1 to 10 seconds [60].
There are two basic approaches to control the unwanted source of variability due
to the nonstationarity of the EEG:

1. Time-varying modeling: This approach involves the use of autoregressive (AR)
modeling techniques. An AR model relates the present EEG sample value y(n) to a
weighted sum of the p (the model order) previous sample values y(n-1) through y(n-p);
a least-squares fitting sketch follows this list.
2. Segmentation: Important work in this area was done using adaptive
segmentation techniques [61, 62]. Controlling the EEG variability relies on segmentation
techniques [63], which divide a longer EEG interval into stationary segments.
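
As an illustration of point 1, the sketch below fits an AR(p) model to a surrogate
signal by ordinary least squares; the model order, the data, and the helper name
fit_ar are illustrative assumptions, not a method prescribed by the cited works.

import numpy as np

def fit_ar(y, p):
    """Least-squares fit of y(n) = sum_{k=1..p} a_k * y(n-k)."""
    # Each row holds the p previous samples for one target value y(n)
    Y = np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(Y, y[p:], rcond=None)
    return a

y = np.cumsum(np.random.randn(1000)) * 0.1   # surrogate slowly varying signal
a = fit_ar(y, p=4)
print("AR coefficients:", a)
pred = float(np.dot(a, y[-1:-5:-1]))         # one-step-ahead prediction
print("next-sample prediction:", pred)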
Electroencephalographers base their diagnosis (mainly of sleep) on records of
long EEG intervals. Therefore, the short segments produced by the adaptive segmentation
technique [28] are ill suited where a more global interpretation is required. A piece-wise
segmentation and clustering technique attempted to address this point. This method
was based on the assumption that an EEG consists of a finite number of elementary
patterns. These patterns may be determined by dividing the EEG into segments of 1 s each,
followed by a clustering procedure. The disadvantage of the piece-wise segmentation
and clustering approach is that the temporal relationship between elementary patterns
is lost, although context, in both a spatial and a temporal sense, plays an important
role in visual EEG interpretation [64]. Temporal context may be preserved by
representing EEG intervals by means of transition matrices rather than profiles,
though long EEGs may be required to obtain reliable transition estimates [65].
Alternative techniques that utilize spatio-temporal contextual information for
automatic EEG analysis are based on syntactic analysis techniques [66]. These
require that an EEG be represented as a series of elementary patterns, referred to as
tokens. The drawback of the syntactic approach is the amount of heuristics involved
in its grammar, which makes it virtually impossible for electroencephalographers to
inspect the grammar and to make modifications where appropriate.

2.1.5.1 FFT analysis of EEG


The shapes of complex waves and responses may depend on the time relationships,
or phases, of their components as well as upon their amplitudes [10]. Spectral analysis
using the FFT provides this information. Spectral analysis permits the presentation of
a large amount of data in a comprehensive manner, and selection of components
for further processing can result in significant data reduction. Sometimes, however, the loss
of the second-to-second variations in the amplitude and frequency of the signal becomes a
limitation [67, 68]; for this reason time-domain and frequency-domain techniques are
frequently used together.
Recorded EEG responses consist of short realizations of voltage against time
of length T seconds. These are converted to the frequency domain by determining the
discrete values of voltage at regular sampling intervals and processing them either by
computer software or by dedicated digital electronics [69]. The conversion is carried
out using the discrete Fourier transform (DFT), which transforms N discrete values to
N discrete complex DFT amplitude and phase values. Each complex value
corresponds to a particular harmonic frequency component, whose frequency is an
integral multiple of the first harmonic frequency.


    f_1 = \frac{1}{(N-1)T_s}

where T_s is the sampling interval, and therefore (N-1)T_s = T.
Each frequency component is separated from its neighbor by the frequency
interval 1/(N - 1)Ts. Multiplication of the DFT components by Ts/N converts them to
Fourier transform components, the amplitude of which has dimensions of volts per
hertz. A plot of the square of these amplitudes against frequency is referred to as an
energy spectrum or power spectrum. The FFT has been developed [70, 71, 72] to
calculate DFT efficiently and rapidly. The term FFT applies to one of a number of
computational algorithms by which the DFT can be evaluated for a signal consisting
of N equispaced samples, with N, a highly composite number, usually an integer
power of two. The principle behind all FFT methods is that the transform of the
sequence of N samples is decomposed into a number of transforms of shorter
sequences.
In the case of EEG responses, the response length T = (N-1)T_s is
comparatively short, so that the frequency resolution of the spectrum is coarse. This
situation can be improved by adding N1 augmenting zeros to the sampled data
values, which refines the frequency resolution from 1/((N-1)T_s) to 1/((N+N1-1)T_s).
Addition of these augmenting zeros also helps to avoid errors. A potential
disadvantage is that the increased frequency resolution increases the variance of the
estimated amplitude [73].
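
By way of illustration, the following sketch computes the power spectrum of a
surrogate EEG epoch with NumPy, with augmenting zeros appended to refine the
frequency grid; the sampling rate, epoch length and padding factor are assumptions
chosen only for the example.

import numpy as np

fs = 256.0                        # assumed sampling rate (Hz)
N = int(2.0 * fs)                 # samples in an assumed 2 s epoch
t = np.arange(N) / fs
# Surrogate "EEG": a 10 Hz alpha-like rhythm buried in noise
x = 20e-6 * np.sin(2 * np.pi * 10.0 * t) + 5e-6 * np.random.randn(N)

N1 = 3 * N                        # number of augmenting zeros
xp = np.concatenate([x, np.zeros(N1)])

X = np.fft.rfft(xp)               # DFT of the padded epoch
f = np.fft.rfftfreq(len(xp), d=1.0 / fs)
power = np.abs(X) ** 2            # energy (power) spectrum

print("frequency resolution: %.3f Hz" % (f[1] - f[0]))
print("spectral peak near %.2f Hz" % f[1 + np.argmax(power[1:])])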
The faithful representation of a continuous waveform by a number of evenly
spaced discrete values requires a sampling rate fs, known as the Nyquist rate, which
must be at least twice the highest frequency present in the signal fmax, i.e. fs = 2 fmax. The
computed energy spectrum is symmetrical on opposite sides of fs/2. Amplitude
components at frequencies in excess of fs/2 will be folded about the component at fs/2,
the folding frequency, to appear at frequencies below fs/2, thus distorting the spectrum.
This phenomenon is known as 'aliasing'. Aliasing must be avoided in practice by either
sampling at or above the Nyquist rate, or by using a low-pass anti-aliasing filter.


The data, which represent a length of T seconds of the signal, are obtained by
multiplying all the sampled values in the interval T by unity, while zero multiplies all
values outside this interval. This is equivalent to multiplying, or windowing, the
signal by a rectangular pulse, or window, of width T and height 1. In this case the
sampled data values v(i) are given by the product of the data values s(i) and the window
function values w(i):

    v(i) = w(i)\, s(i)
This time-domain product is equivalent to a convolution in the frequency
domain, so that the FFT value for the nth harmonic is given by:

    V(n) = \sum_{k=-N}^{N} W(n-k)\, S(k)

where n denotes the nth harmonic frequency, V(n) is the complex DFT component at
frequency n, W(n) is the amplitude of the spectrum of the window at frequency n, and
S(k) is the true amplitude of the spectrum of the signal at frequency k.
The important features of the FFT are:
• The FFT may show spurious frequency components that are not contained in the
raw EEG data. The frequency and the relative magnitudes of these bands depend on
the central frequency and the epoch length. Reduction in the spurious frequency
components may be accomplished with various mathematical functions called
'windows'; hence the procedure is also called the Windowed Discrete Fourier
Transform (WDFT).
• The power spectrum and the phase spectrum can be obtained at discrete
frequencies with good frequency resolution.
• The highest frequency that can be observed is limited to half the sampling
frequency (fs/2).
• Low-pass analog filters in the EEG machine and a high sampling frequency can
avoid the 'aliasing' problem.
• The choice of the epoch length involves a compromise between the considerations
of frequency resolution and the stationarity of EEG signals. Longer epoch lengths
yield higher resolution, but the EEG characteristics are more likely to change during
this period, which makes the interpretation of a changing EEG difficult. Wavelet
analysis can overcome this difficulty [74].
Recently, spectral analysis of logarithmic power (cepstrum and bicepstrum)
has also been tried for the EEG [75]. The 'bispectrum', the Fourier transform of the
third-order cumulant sequence, was used [76] to

• extract information in the signal pertaining to deviations from Gaussianity, and
• detect the presence of nonlinear properties and quadratic phase coupling.
In another report, an improved procedure for complex demodulation in the
frequency domain by means of the FFT has been applied to the frequency analysis of
the EEG.

2.1.5.2 Wavelet transform


The Fourier transform (FT) tells whether a certain frequency component exists
or not, but this information is independent of where in time the component appears.
The FT has perfect frequency resolution in the frequency domain, and the raw signal
has perfect time resolution in the time domain, since the value of the signal is known
at every instant of time. Conversely, the FT provides no time resolution, and the
time-domain representation provides no frequency resolution. Therefore, the FT is not
suitable for analyzing non-stationary signals, whose frequency content changes with
time. Since EEG signals are non-stationary in nature, the FT may not be sufficient to
reveal small changes, and the analysis may change depending on the length of the
data. For spectral analysis, therefore, the wavelet transform (WT) is more suitable
than the FT; the reason for this success lies in the scaling and shifting properties of
the mother wavelet.
Interest in wavelet analysis remained within a small, mainly mathematical
community during the rest of the 1980s with only a handful of scientific papers
coming out each year. The application of wavelet transform analysis in science and
engineering really began to take off at the beginning of the 1990s, with a rapid growth
in the number of researchers turning their attention to wavelet analysis during that
decade. The last few years have each seen the publication of over one thousand
refereed journal papers concerning applications of the wavelet transform, covering
several disciplines.
Time-frequency signal analysis methods offer simultaneous interpretation
of the signal in both time and frequency, which allows local, transient or intermittent
components to be elucidated. Many of the ideas behind wavelet transforms have been
in existence for a long time. However, wavelet transform analysis as we now know it
really began in the mid 1980s, when it was developed to interrogate seismic signals
[77]. The activity in wavelets was initiated by Morlet's work in geophysical signal
processing. A strong mathematical framework was built around the basic wavelet
idea and is documented in the book by Meyer [78, 79], which also shows
the connections to earlier results in operator theory.
Wavelet transform analysis has now been applied to a wide variety of
biomedical signals including the EMG, EEG, clinical sounds, respiratory patterns,
blood pressure trends and DNA sequences [80, 81].
The WT is designed to give good time resolution and poor frequency resolution at
high frequencies, and good frequency resolution and poor time resolution at low
frequencies. This approach makes sense especially when the signal at hand has
high-frequency components of short duration and low-frequency components of long
duration. An advantage of using the WT is the three-dimensional representation of
signals in terms of amplitude, frequency and time. This 3-D representation of wavelet
sub-spectral components is particularly convenient for pathological cases.
Due to its ability to elucidate simultaneously local spectral and
temporal information from a signal in a more flexible way than the short-time Fourier
transform (STFT), by employing a window of variable width, the WT has emerged over
recent years as the favoured tool of researchers for analyzing problematic
signals across a wide variety of areas in science, engineering and medicine [82]. Thus
the wavelet transform produces a time-frequency decomposition of the signal which
separates individual signal components more effectively than the traditional STFT.


Wavelet transforms as they are in use today come in essentially two distinct
classes: the continuous wavelet transform and the discrete wavelet transform; these
are reviewed separately below. The continuous wavelet transform is computed by
changing the scale of the analysis window, shifting the window in time, multiplying
by the signal, and integrating over all times. In the discrete case, filters of different
cut-off frequencies are used to analyze the signal at different scales. The signal is
passed through a series of high-pass filters to analyze the high frequencies, and
through a series of low-pass filters to analyze the low frequencies. The resolution of
the signal, which is a measure of the amount of detail information in the signal, is
changed by the filtering operations, and the scale is changed by subsampling and
upsampling operations.
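
A minimal sketch of one stage of this filter-bank scheme, assuming the PyWavelets
package is available and using the Daubechies-4 wavelet purely as an example:

import numpy as np
import pywt

fs = 256.0
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

# One filter-bank stage: low-pass/high-pass split, each downsampled by two
cA, cD = pywt.dwt(x, 'db4')       # approximation and detail coefficients
print(len(x), len(cA), len(cD))   # each branch holds roughly half the samples

# Repeating the split on the approximation gives the next, coarser scale
cA2, cD2 = pywt.dwt(cA, 'db4')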

2.1.5.3 Continuous wavelet transform (CWT)


The continuous wavelet transform (CWT) is a time-frequency analysis
method, which differs from the more traditional short time Fourier transform (STFT)
by allowing arbitrarily high localization in time of high-frequency signal features. The
CWT does this by having a variable window width, which is related to the scale of
observation, a flexibility that allows for the isolation of high-frequency features.
Another important distinction from the STFT is that the CWT is not limited to using
sinusoidal analyzing functions. Rather, a large selection of localized waveforms can
be employed as long as they satisfy predefined mathematical criteria (described
below). The wavelet transform of a continuous time signal, x(t), is defined as:

1 t b
 x(t )

T ( a, b)   dt (1)
a   a 
where ψ*(t) is the complex conjugate of the analyzing wavelet function ψ(t),
‘a’ is the dilation parameter of the wavelet and ‘b’ is the location parameter of the
wavelet. In order to be classified as a wavelet, a function must satisfy certain
mathematical criteria. These are:


(1) It must have finite energy:

    E = \int_{-\infty}^{\infty} \left| \psi(t) \right|^{2} dt < \infty        (2)


(2) If \hat{\psi}(f) is the Fourier transform of \psi(t), i.e.

    \hat{\psi}(f) = \int_{-\infty}^{\infty} \psi(t)\, e^{-i(2\pi f)t}\, dt        (3)


then the following condition must hold:

    C_g = \int_{0}^{\infty} \frac{\left| \hat{\psi}(f) \right|^{2}}{f}\, df < \infty        (4)

where, Cg is called the admissibility constant. The value of Cg depends on the chosen
wavelet.
The contribution to the signal energy at the specific scale 'a' and location 'b'
is given by the two-dimensional wavelet energy density function known as the
scalogram (analogous to the spectrogram, the energy density surface of the STFT):

    E(a,b) = \left| T(a,b) \right|^{2}        (5)

The scalogram can be integrated across 'a' and 'b' to recover the total energy in the
signal using the admissibility constant, C_g, as follows:

    E = \frac{1}{C_g} \int_{0}^{\infty} \int_{-\infty}^{\infty} \left| T(a,b) \right|^{2} \frac{da\, db}{a^{2}}        (6)

The relative contribution to the total energy contained within the signal at a specific
scale 'a' is given by the scale-dependent energy distribution:

    E(a) = \frac{1}{C_g} \int_{-\infty}^{\infty} \left| T(a,b) \right|^{2} db        (7)

The spectral components are inversely proportional to the dilation, i.e. f \propto 1/a;
the frequency associated with a wavelet of arbitrary scale 'a' is given by

    f = \frac{f_c}{a}        (8)
where f_c, the characteristic frequency of the mother wavelet (the archetypal wavelet
at scale a = 1 and location b = 0), becomes a scaling constant, and f is the
representative frequency of the wavelet at arbitrary scale 'a'.
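
A small numerical illustration of equation (8); the centre frequency and sampling
rate below are assumed values for a Morlet-like wavelet, not constants fixed by the
text:

import numpy as np

fs = 256.0                  # assumed sampling rate (Hz)
fc = 0.8125                 # assumed centre frequency of the mother wavelet
scales = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])

freqs = fc * fs / scales    # f = fc / a, rescaled to hertz by fs
for a, f in zip(scales, freqs):
    print("scale a = %5.1f  ->  f = %7.2f Hz" % (a, f))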
Finally, as with the Fourier transform, the original signal may be reconstructed
using an inverse transform:

    x(t) = \frac{1}{C_g} \int_{0}^{\infty} \int_{-\infty}^{\infty} T(a,b)\, \psi_{a,b}(t)\, \frac{da\, db}{a^{2}}        (9)

In practice, a fine discretization of the CWT is computed, where usually the location
'b' is discretized at the sampling interval and the scale 'a' is discretized
logarithmically.
As the wavelet transform given by equation (1) is a convolution of the signal
with a wavelet function we can use the convolution theorem to express the integral as
a product in Fourier space, i.e.

    T(a,b) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{x}(\omega)\, \hat{\psi}_{a,b}(\omega)\, d\omega        (10a)

where

    \hat{\psi}_{a,b}(\omega) = \sqrt{a}\, \hat{\psi}^{*}(a\omega)\, e^{ib\omega}        (10b)

is the Fourier spectrum of the analyzing wavelet at scale ‘a’ and location ‘b’. In this
way, a fast Fourier transform (FFT) algorithm can be employed in practice to speed
up the computation of the wavelet transform.
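
A sketch of this FFT-based evaluation of equations (10a) and (10b), assuming a
simple Morlet wavelet spectrum; the function names and parameters are illustrative
choices, not a standard library API:

import numpy as np

def morlet_hat(omega, w0=6.0):
    # Fourier spectrum of a simple (analytic) Morlet wavelet
    return np.pi ** -0.25 * np.exp(-0.5 * (omega - w0) ** 2) * (omega > 0)

def cwt_fft(x, scales, dt=1.0):
    n = len(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    x_hat = np.fft.fft(x)
    out = np.empty((len(scales), n), dtype=complex)
    for i, a in enumerate(scales):
        psi_hat = np.sqrt(a) * morlet_hat(a * omega)    # eq. (10b)
        out[i] = np.fft.ifft(x_hat * np.conj(psi_hat))  # eq. (10a)
    return out

fs = 256.0
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 10.0 * t)
T = cwt_fft(x, scales=np.arange(1, 64, dtype=float), dt=1.0 / fs)
scalogram = np.abs(T) ** 2                              # eq. (5)
print(scalogram.shape)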
A vast amount of repeated information is contained within this redundant
representation of the continuous wavelet transform T(a,b). This can be condensed
considerably by considering only local maxima and minima of the transform. Two
definitions of these maxima are commonly used in wavelet analysis practice, these
are:
(1) Wavelet ridges, defined as

    \frac{d\left( \left| T(a,b) \right|^{2} / a \right)}{da} = 0        (11)

are used for the determination of instantaneous frequencies and amplitudes of signal
components [83, 84].
(2) Wavelet modulus maxima, defined as

    \frac{d\left| T(a,b) \right|^{2}}{db} = 0        (12)

are used for locating and characterizing singularities in the signal.

2.1.5.4 Discrete wavelet transform (DWT)


In its most common form, the DWT employs a dyadic grid (integer power of
two scaling in ‘a’ and ‘b’) and orthonormal wavelet basis functions and exhibits zero
redundancy. The transform integral remains continuous for the DWT but is
determined only on a discretized grid of ‘a’ scales and ‘b’ locations. In practice, the
input signal is treated as an initial wavelet approximation to the underlying
continuous signal from which, using a multiresolution algorithm, the wavelet
transform and inverse transform can be computed discretely, quickly and without loss
of signal information.
A natural way to sample the parameters 'a' and 'b' is to use a logarithmic
discretization of the 'a' scale and link this, in turn, to the size of the steps taken
between 'b' locations. To link 'b' to 'a', we move in discrete steps to each location 'b',
which are proportional to the 'a' scale. This kind of discretization of the wavelet has
the form

    \psi_{m,n}(t) = \frac{1}{\sqrt{a_0^{m}}}\, \psi\!\left( \frac{t - n b_0 a_0^{m}}{a_0^{m}} \right)        (13)
where, the integers m and n control the wavelet dilation and translation respectively;
a0 is a specified fixed dilation step parameter set at a value greater than 1, and b0 is the
location parameter, which must be greater than zero. A common choice for discrete
wavelet parameters a0 and b0 are 2 and 1 respectively. This power-of-two logarithmic
scaling of both the dilation and translation steps is known as the dyadic grid
arrangements. Substituting a0=2 and b0=1 into equation (13) we see that the dyadic
grid wavelet can be written compactly, as
    \psi_{m,n}(t) = 2^{-m/2}\, \psi(2^{-m} t - n)        (14)

From here on, \psi_{m,n}(t) will be used only to denote dyadic grid scaling with a_0 = 2 and
b_0 = 1. Discrete dyadic grid wavelets are usually chosen to be orthonormal, i.e. they are
both orthogonal to each other and normalized to have unit energy. This is
expressed as

1 if m  m' and n  n'
 m,n (t ) m ,n (t )dt  

' '

0 otherwise.
(15)

This means that the information stored in the wavelet coefficient T_{m,n} obtained from
the wavelet transform is not repeated elsewhere, which allows for the complete
regeneration of the original signal without redundancy.
Orthonormal dyadic discrete wavelets are associated with scaling functions and their
dilation equations. The scaling function is associated with the smoothing of the signal
and has the same form as the wavelet, given by

    \phi_{m,n}(t) = 2^{-m/2}\, \phi(2^{-m} t - n)        (16)
They have the property

    \int_{-\infty}^{\infty} \phi_{0,0}(t)\, dt = 1        (17)

where, 0,0 (t )   (t ) is sometimes referred to as the mother scaling function or mother

wavelet. The scaling function can be convolved with the signal to produce
approximation coefficients as follows

Sm, n   x(t )

m, n (t )dt (18)

The approximation coefficients at a specific scale m are collectively known as the
discrete approximation of the signal at that scale. A continuous approximation of the
signal at scale m can be generated by summing a sequence of scaling functions at this
scale weighted by the approximation coefficients, as follows:

    x_m(t) = \sum_{n=-\infty}^{\infty} S_{m,n}\, \phi_{m,n}(t)        (19)
where x_m(t) is a smooth, scaling-function-dependent version of the signal x(t) at scale
index m. A signal x(t) can then be represented using a combined series expansion
involving both the approximation coefficients and the wavelet (detail) coefficients, as
follows:

    x(t) = \sum_{n=-\infty}^{\infty} S_{m_0,n}\, \phi_{m_0,n}(t) + \sum_{m=-\infty}^{m_0} \sum_{n=-\infty}^{\infty} T_{m,n}\, \psi_{m,n}(t)        (20)

The signal detail at scale m is defined as

    d_m(t) = \sum_{n=-\infty}^{\infty} T_{m,n}\, \psi_{m,n}(t)        (21)

hence equation (20) may be written as

    x(t) = x_{m_0}(t) + \sum_{m=-\infty}^{m_0} d_m(t)        (22)

From this equation it is easy to show that

    x_{m-1}(t) = x_m(t) + d_m(t)        (23)

which tells us that if we add the signal detail at an arbitrary scale (index m) to the
approximation at that scale, we get the signal approximation at an increased
resolution. This is called a multiresolution representation [85].
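
The identity in equation (23) can be checked numerically; the sketch below assumes
PyWavelets with an orthogonal 'db4' wavelet and verifies that the level-3
approximation plus the level-3 detail reproduces the level-2 approximation:

import numpy as np
import pywt

x = np.random.randn(1024)

def reconstruct_part(coeffs, keep, wavelet='db4'):
    # Zero all coefficient arrays except the one at index `keep`
    parts = [c if i == keep else np.zeros_like(c) for i, c in enumerate(coeffs)]
    return pywt.waverec(parts, wavelet)

c3 = pywt.wavedec(x, 'db4', level=3)      # [cA3, cD3, cD2, cD1]
x3 = reconstruct_part(c3, 0)              # approximation at scale index 3
d3 = reconstruct_part(c3, 1)              # detail at scale index 3

c2 = pywt.wavedec(x, 'db4', level=2)      # [cA2, cD2, cD1]
x2 = reconstruct_part(c2, 0)              # approximation one level finer

print(np.allclose(x2, x3 + d3))           # x_{m-1}(t) = x_m(t) + d_m(t)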

2.1.5.5 Wavelet applications in EEG


Application of the wavelet transform has been popular for computer vision
problems from range detection to motion estimation [86]. An important application to
image coding, called a pyramid [87], is closely related both to subband coding and to
wavelets. Mallat used the concept of multiresolution analysis to define wavelets [88, 85,
89], and Daubechies constructed compactly supported orthonormal wavelets based on
iterations of discrete filters [90]. The relation of these filters to classical maximally
flat designs [91] was noted by Shensa [92]. In the signal processing
literature, work on filter banks goes back to subband coding of speech [93, 89].
Orthogonal filter banks were first derived by Smith and Barnwell [94] and
Mintzer [95] and were systematically studied by Vaidyanathan [96, 97]. The
biorthogonal case, especially the linear-phase case, was also studied [98, 99, 100].


The features of the waveform at each scale obtained using the wavelet transform
reflect the states of the EEG signal [101, 102, 103]. A comparison was made between
the quality of feature extraction of continuous wavelets using standard numerical
techniques and more rapid algorithms utilizing both polynomial splines and multi-
resolution frameworks [104]. That study also contrasts filtering with and without the
use of surrogate data to model background noise, demonstrates the preservation of
feature extraction with critical versus redundant sampling, and performs the analysis
with wavelets of different shapes. Another study [105] suggested a new processing
system in which the EEG signals obtained from 31 patients were analyzed using the
wavelet transform.
The wavelet transform has been applied successfully to the classification of three
patterns of EEG signals: normal, schizophrenia and obsessive-compulsive disorder
[106]. The artificial neural network used in the classification was a three-layered
feedforward network implementing the backpropagation algorithm. Wavelet
coefficients were used to train the network, which correctly classified over 66% of
the normal class and 71% of the schizophrenia class.
The work in [107] describes two feature extraction methods considered for neural
network classifiers: the first was based on the translation-invariant wavelet transform,
and the second on tree-structured multirate filter banks [108, 109, 110]. Another
method [111] for the analysis of physiological time series uses the wavelet transform
to analyze heart rhythm, chest volume and blood oxygen saturation data from a
patient suffering from sleep apnea. A wavelet-transform-based brain-state
identification method, which could form the basis for forecasting a generalized
epileptic seizure, has also been reported [112]. This method relies on the existence in
the EEG of a pre-seizure state with extractable unique features.
In the analysis of the rat electroencephalogram under slow wave sleep using
the wavelet transform [31], component powers in different frequency bands were
found to vary with time, and in about a quarter of the epochs the delta power
percentage was less than 50%. Dynamic state recognition and event prediction are
fundamental tasks in biomedical signal processing.
Use of wavelet decomposition enables segmentation of EEG into standard
clinical bands [113]. The entropy of the wavelet coefficients in each level of
decomposition reflects the underlying statistics and the degree of bursting activity
associated with the recovery phenomena.
With the help of the wavelet transform, the time-frequency structure of
spike-wave discharges (SWDs) in rats, a model of genetic absence epilepsy [114],
was analyzed, and the frequency spectrum of the EEG records was determined within
the range from 1 to 20 Hz. The time dynamics of the SWDs was analyzed using
fragments of record ranging from several seconds to more than one minute in length.
Several studies have been performed using wavelets to analyze EEG signals in
an attempt to find a biomarker for Alzheimer's disease, with varying degrees of
success. Early diagnosis of Alzheimer's disease by this route [115] is still
preliminary, yet very promising.
The wavelet transform has also proved very useful in the automatic recognition of
vigilance state [32], where prediction of the level of drowsiness was examined: the
delta, theta, alpha and beta sub-frequencies of the EEG signals were extracted using
the discrete wavelet transform, and the wavelet spectra of the EEG signals were used
as input to a multilayer perceptron neural network.
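
A sketch of such a sub-band feature extraction, assuming a 128 Hz sampling rate and
a 5-level 'db4' decomposition, so that the detail levels align only approximately with
the clinical bands; all parameters are illustrative, not those of the cited study:

import numpy as np
import pywt

fs = 128.0                           # assumed sampling rate (Hz)
x = np.random.randn(int(30 * fs))    # one 30 s epoch (surrogate data)

cA5, cD5, cD4, cD3, cD2, cD1 = pywt.wavedec(x, 'db4', level=5)

bands = {
    'delta (~0-4 Hz)':  np.concatenate([cA5, cD5]),  # 0-2 and 2-4 Hz
    'theta (~4-8 Hz)':  cD4,
    'alpha (~8-16 Hz)': cD3,
    'beta (~16-32 Hz)': cD2,
}
features = {name: float(np.sum(c ** 2)) for name, c in bands.items()}
for name, e in features.items():
    print("%-18s energy = %.1f" % (name, e))
# The resulting feature vector would feed a multilayer perceptron.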


2.2 Electrooculogram (EOG)


Electrooculography is a technique for the measurement of the resting potential of the
eye. A resting potential appears because of the potential difference generated by an
electric dipole formed by the positive cornea and the negative retina. The resulting
signal is called the EOG; it is essentially a record of the differences in voltage
between the front and back of the eye, is correlated with the eyeball movements, and
is obtained from electrodes placed on the skin near the eye. The EOG signal varies
from 0.001 to 0.3 mV in amplitude and from 0.1 to 10 Hz in frequency. The EOG
signal is measured by placing Ag/AgCl electrodes around the eyes. Silver-silver
chloride (Ag/AgCl) electrodes have been used because they produce low levels of
junction potential, motion artifact and drift in the direct-current signal [236]. Two
channels of bipolar EOG signal are acquired for analysis: a horizontal channel and a
vertical channel. The horizontal channel EOG reflects horizontal eyeball movements
while the vertical channel EOG reflects vertical eyeball movements. Two disposable
Ag/AgCl electrodes are placed above and below the right eye to measure the vertical
EOG, while two other electrodes are placed at the outer canthi to measure the
horizontal EOG. While recording the signal, the PSNR (peak signal-to-noise ratio) is
maintained as high as possible.
Various features like energy entropy, Shannon's entropy, amplitude, RMS value and
mean value have been used to determine the activity level of the EOG. Since
frequency-domain analysis is not required, these features can suitably represent the
changes that distinguish the various sleep stages. In certain cases, attempts have even
been made to classify sleep using the EOG alone. The EOG is also closely related to
the Rapid Eye Movement (REM) stage, where eye movement is maximal and the
corresponding EOG activity is also high.
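
A sketch of how these per-epoch features could be computed for an EOG (or,
identically, a chin-EMG) channel; the bin count, sub-window count and epoch length
are illustrative assumptions, since the text does not fix them here:

import numpy as np

def activity_features(x, n_bins=32):
    feats = {
        'amplitude': np.ptp(x),                 # peak-to-peak amplitude
        'mean': np.mean(x),
        'rms': np.sqrt(np.mean(x ** 2)),
    }
    # Shannon entropy of the amplitude histogram
    p, _ = np.histogram(x, bins=n_bins)
    p = p / p.sum()
    p = p[p > 0]
    feats['shannon_entropy'] = -np.sum(p * np.log2(p))
    # Energy entropy: entropy of normalized energy across sub-windows
    segs = np.array_split(x, n_bins)
    e = np.array([np.sum(s ** 2) for s in segs])
    e = e / e.sum()
    e = e[e > 0]
    feats['energy_entropy'] = -np.sum(e * np.log2(e))
    return feats

eog_epoch = np.random.randn(3000) * 0.1        # surrogate 30 s epoch
print(activity_features(eog_epoch))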


2.3 Electromyogram (EMG)


EMG serves the purpose of evaluating the physiological properties of muscular tissue
either at rest or during activity. By detecting the electrical potential of muscular tissue,
one can tell whether it is relaxed or contracted. The data available from this signal are
of the order of millivolts. In the frequency domain, the signal is considered to lie
mainly within the 50-150 Hz band; knowing this bandwidth helps restrict the
evaluation to the relevant frequencies and avoid unnecessary frequencies or noise,
since EMG frequency analysis is not a common procedure [238]. In manipulating the
EMG data it is necessary to pay special attention to the fidelity of the signal. One
must avoid electrical noise, which is extremely common in EMG setups. For this
purpose, the PSNR (peak signal-to-noise ratio) must be maximized with minimal
distortion of the signal.

In our work an electrode is placed on the sub-mental (chin) muscles in order to detect
the signal, evaluate its activity and determine whether the muscle is with or without
tonus. Various features like energy entropy, Shannon's entropy, amplitude, RMS
value and mean value have been used to determine the activity level of the EMG.
Since frequency-domain analysis is not required, these features can suitably represent
the changes that distinguish the various sleep stages. Deep sleep stages lead to
muscular relaxation and change the EMG values. These changes can help classify
sleep stages and also help in reinforcing the sleep-stage results obtained using the EEG.


2.4 Polygraph readings for sleep stage classification


Physiologically, sleep can be identified when there is a systemic change from
waking homeostasis. The following changes can be considered markers for
sleep-onset detection:

• Cardiovascular
Depending on the sleep stage, there can be a generalized vasodilatation leading to
reductions in heart rate, cardiac output and even blood pressure (commonly associated
with the non-REM sleep stage). However, changes can also occur in the opposite
direction, driven by generalized event variability.

• Respiration
A few neurons related to breathing stop firing in deep sleep, and there is
slight hypercapnia, a decrease in overall ventilation, and a decreased sensitivity to
inhaled CO2. In the NREM stage there is slight hypoventilation because of
relaxation of the upper airway muscles and a decrease in the firing of inspiratory
neurons, which show a decreased sensitivity to stimuli. Accordingly, pCO2 levels rise
while pO2 levels fall. At this stage breathing is under mechanical and chemical
feedback control.

During the Rapid Eye Movement (REM) sleep stage the respiratory rate is higher
and more variable, and it appears as if different processes maintain breathing during
REM sleep: it is not driven by vagal signals or by peripheral or central
chemoreceptors, but may be driven by higher cortical control, which may explain the
variable rate. As REM sleep is associated with a loss of muscle tone, there is an
increased resistance in the upper airway.

• Nervous System
Neuronal wave patterns vary in terms of frequency and amplitude, as well as in
the depolarization origin, suggesting sporadic and different activation areas. Further,
postsynaptic inhibition of motor neurons also takes place, affecting the muscular
functions like contraction and tonus. More specifically, brain metabolism and the
neuronal discharge rate are decreased during NREM sleep. During SWS or NREM
sleep, an active inhibition of the reticular activating system can be seen. The most
relevant neurons in this inhibition are located in the basal forebrain (anterior
hypothalamus and adjacent forebrain areas); a lesion of the basal forebrain leads to
insomnia, while electrical stimulation can lead a subject to fall asleep. The dorsal
raphe, thalamus and nucleus tractus solitarius are also important areas in NREM
sleep. An increased parasympathetic activity, similar to that of relaxed wakefulness,
can also be seen; the sympathetic drive remains at about the same level as during
relaxed wakefulness.

During the REM sleep stage, many parts of the brain (limbic lobe, visual cortex)
show increased firing rate and metabolism. Brain transection studies have established
that the pons is necessary and sufficient to generate the basic phenomena of REM
sleep. During tonic REM sleep, parasympathetic activity remains the same as during
NREM sleep, but sympathetic activity tends to decrease, resulting in an overall
predominance of parasympathetic activity. However, during phasic REM sleep, both
sympathetic and parasympathetic activity tend to increase, with sympathetic
activation mostly favored.

• Endocrinology
Deep sleep stages are also associated with elevated secretion of growth hormone,
predominantly in children. Different hormones are regulated differently during sleep;
e.g., cortisol decreases while prolactin increases. Thermoregulation also changes at
sleep onset: the body temperature set point is lowered and body temperature falls.
The body thus activates its heat-loss mechanisms, i.e. sweating, to cool down and
reduce its temperature to the new set point.


With the main physiological manifestations of sleep identified, the methods to
analyze the relevant information can be selected. Primarily, brain activity is monitored
through the EEG, along with other sensors focusing on easily measurable correlated
variables like the EMG, EOG and ECG. For this work the EEG, EMG and EOG were
chosen. Others, like the ECG and pulse oximetry, could probably also be used;
however, they were not implemented in the present study, as their data are known to
exhibit a higher inter-subject variability when compared with EEG, EMG and EOG data.


2.5 Fuzzy Inference System (FIS)


The most important contribution of fuzzy logic is a methodology for computing with
words, which can deal with imprecision and granularity easily. Our brain can
interpret and process the imprecise and incomplete sensory information received
from the perceptive organs. Analogously, fuzzy set theory provides a systematic
approach to deal with such information linguistically, in words. Fuzzy theory can also
be used to perform numerical computation by using membership functions for the
specified linguistic labels.

A Fuzzy Inference System (FIS) is based on the concepts of fuzzy set theory, fuzzy
if-then rules and fuzzy logic reasoning. The framing of the fuzzy rules forms the
pivotal component of an FIS. Fuzzy logic is a very popular technique which has been
widely applied in different fields like robotics, data classification, expert systems,
automatic control, decision making, time-series analysis, pattern classification,
system identification etc. [112, 131]. The basic structure of an FIS consists of three
principal components, viz. a rule base comprising the stipulated fuzzy rules; a
database which defines the membership functions used in the fuzzy rules; and a
reasoning mechanism which performs fuzzy inference with respect to the rules so as
to derive a reasonable output or conclusion.

2.5.1 Fuzzy Inference System (FIS) steps


In order to analyze a fuzzy system whose outputs and inputs are described
using linguistic variables, the following steps have to be carried out (a numerical
sketch of the full pipeline follows the list):

• Fuzzification:
The linguistic variables of the fuzzy rules can be expressed in terms of
fuzzy sets, where these variables are defined in terms of the degree of association with
the stipulated membership functions. This method of evaluating the degree of
belongingness, or association, of the crisp input with a fuzzy set is called
fuzzification.
The membership functions may be trapezoidal, triangular, Gaussian or bell-shaped.
As only the degree of membership is used for further processing, a considerable
amount of information may be lost during fuzzification; the procedure can be seen as
a nonlinear transformation of the inputs. For example, in the case of trapezoidal or
triangular membership functions, information is lost in the regions where the slope of
the membership function is zero: the resulting membership there is constant, and at
the corner points the functions are not differentiable.
Therefore fuzzy systems having trapezoidal or triangular membership functions can
encounter problems when learning from data. Smoother membership functions like
the Gaussian or generalized bell function may be used to overcome this difficulty of
learning.

• Aggregation:
After evaluating the degree of each linguistic statement, the degrees are
combined by logical operators such as OR and AND. The conjunction and
disjunction of linguistic statements are carried out with the help of the t-norm and
t-conorm operators, which extend to a large number of linguistic statements. Min and
max operators are used for classification tasks. For the purposes of identification and
approximation, the product and algebraic-sum operators are better suited due to their
smoothness and differentiability. Similarly, the difference operators and the bounded
sum offer several advantages to some neuro-fuzzy learning schemes.

• Activation:
Here the degree of fulfillment of each rule is used to evaluate the output
activations of the rules.


• Accumulation:
In this step the output activations of all the rules are accumulated together
to give the fuzzy output of the system.

• Defuzzification:
If a crisp output of the system is required, the final fuzzy output has to
be defuzzified. This can be achieved by different methods like bisector of area,
centre of gravity, mean of maximum (MOM), smallest (absolute) of maximum (SOM)
and largest (absolute) of maximum (LOM).
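
The promised numerical sketch of the five steps, for an invented two-rule,
Mamdani-style system ("IF activity IS low AND variance IS low THEN depth IS
deep", and its converse); all universes, membership functions and rules are
assumptions made up purely for illustration:

import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

y = np.linspace(0.0, 1.0, 201)           # output universe ("sleep depth")
deep    = trimf(y, 0.5, 1.0, 1.5)        # output fuzzy sets
shallow = trimf(y, -0.5, 0.0, 0.5)

def infer(activity, variance):
    # 1. Fuzzification: degrees of membership of the crisp inputs
    act_low  = trimf(activity, -0.5, 0.0, 0.5)
    act_high = trimf(activity,  0.5, 1.0, 1.5)
    var_low  = trimf(variance, -0.5, 0.0, 0.5)
    var_high = trimf(variance,  0.5, 1.0, 1.5)
    # 2. Aggregation of premises: AND realized by the min t-norm
    w_deep    = min(act_low,  var_low)
    w_shallow = min(act_high, var_high)
    # 3. Activation: clip each rule's output set at its firing degree
    out_deep    = np.minimum(deep,    w_deep)
    out_shallow = np.minimum(shallow, w_shallow)
    # 4. Accumulation: combine rule outputs with the max t-conorm
    agg = np.maximum(out_deep, out_shallow)
    # 5. Defuzzification: centre of gravity of the accumulated set
    return np.sum(y * agg) / np.sum(agg) if agg.sum() > 0 else np.nan

print(infer(activity=0.2, variance=0.3))  # mostly "deep" -> output near 1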

2.5.2 Type of FIS system


Fuzzy systems are of three principal types, namely:

• Mamdani fuzzy system:

This type of system is also known as the linguistic fuzzy system. The
complexity of defuzzification of this system is higher compared with the other types
of fuzzy system. This fuzzy system can use all types of membership function.

• Singleton fuzzy system:

The complexity of defuzzification of a Mamdani fuzzy system can be
reduced by restricting the output of the system to singleton membership functions.
Since no integration needs to be carried out numerically, this results in a reduction of
the computational demand for the evaluation and learning of the fuzzy system.
For this reason, the singleton fuzzy system is the type most widely applied in industry.

• Takagi-Sugeno fuzzy system:

This system may be considered an extension of the singleton fuzzy
system. Here the consequent function f is not a fuzzy set, but the premise of a
Takagi-Sugeno fuzzy system is still linguistically interpretable. For dynamic process
modelling the Takagi-Sugeno models possess an excellent interpretation. A singleton
fuzzy system can be recovered from a Takagi-Sugeno fuzzy system if the function f is
chosen to be a constant; as the constant can be interpreted as a zeroth-order Taylor
series expansion of the function f, this case is also called the zeroth-order Takagi-
Sugeno fuzzy system. In most applications, however, the first-order Takagi-Sugeno
fuzzy system is used, as sketched below. Certain membership functions like the
trapezoidal and triangular may fail to learn from the input because of
non-differentiability at the zero-slope regions of the functions; thus, mostly Gaussian
functions are used in this type of fuzzy system.
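
A compact sketch of a first-order Takagi-Sugeno system with two rules; the crisp
output is the firing-strength-weighted average of linear consequents f_i(x) = p_i x + r_i,
and all constants below are invented for illustration:

import numpy as np

def gaussmf(x, c, sigma):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def ts_output(x):
    w1 = gaussmf(x, c=0.0, sigma=0.5)     # premise: "x is small"
    w2 = gaussmf(x, c=1.0, sigma=0.5)     # premise: "x is large"
    f1 = 0.2 * x + 0.1                    # first-order consequent of rule 1
    f2 = -0.5 * x + 1.0                   # first-order consequent of rule 2
    return (w1 * f1 + w2 * f2) / (w1 + w2)

print(ts_output(0.3))
# Choosing f1 and f2 constant would recover a zeroth-order (singleton) system.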

Since the introduction of the fuzzy logic concept by Zadeh, research has continued on
the application of fuzzy system theory to system identification, because in many
complex and ill-defined systems, where precise mathematical models are difficult to
arrive at, fuzzy models can be constructed easily and can reflect the uncertainty of the
system in a proper way.


2.6 Artificial Neural Networks (ANN)


Artificial neural network (ANN) models have been studied for many years with the
hope of achieving human-like performance in the fields of speech and image
recognition. Presently, these models represent an emerging technology rooted in
many disciplines including science and engineering. They are endowed with some
unique attributes: universal approximation (input-output mapping), the ability to learn
from and adapt to their environment, and the ability to invoke weak assumptions
about underlying physical phenomena responsible for the generation of the input data.
The research work on ANNs has been motivated, right from its inception, by
the recognition that the brains of humans and other animals process data in an entirely
different way from the conventional digital computer. Typically, brain cells, i.e.
neurons, are five to six orders of magnitude slower than silicon logic gates: events
in a silicon chip happen in the nanosecond range, whereas neural events happen in the
millisecond range. However, the brain makes up for the slow rate of operation of a
neuron by having a staggering number of neurons with massive interconnections
between them. It is estimated that the human brain consists of about one hundred
billion neural cells, about the same number as the stars in our galaxy. Each neuron, on
average, receives information from about ten thousand neighboring neurons, giving
over 10^15 connections (synapses) in the brain.
The brain is a highly complex, nonlinear and parallel information-
processing system. The neuron, the basic information-processing element of the
central nervous system, plays an important and diverse role in human sensory
processing, locomotion, control and cognition (thinking, learning, adaptation,
perception etc.). The brain routinely accomplishes perceptual recognition tasks
(e.g., recognizing a familiar face embedded in an unfamiliar scene) in about
100-200 ms, whereas a task of much lower complexity can take days on a large
conventional computer [116]. The human brain is able to perform complex tasks
through its ability to build up its own rules through experience from early childhood.
In its most general form, an artificial neural network is a machine that is designed to
model the way in which the brain performs a particular task or function of interest.


The ANN is usually implemented using electronic components (digital or
analog), or simulated in software on a digital computer. To achieve good
performance, ANNs employ a massive interconnection of simple computing cells
called 'neurons', 'processing units' or 'processing elements'.
“A neural network is a massively parallel distributed processor that has a
natural propensity for storing experiential knowledge and making it available for
use”. It resembles the brain in two respects: (a) knowledge is acquired by the network
through a learning process; (b) inter-neuron connection strengths, known as synaptic
weights, are used to store the knowledge.
The modification of synaptic weights provides the traditional method for the
design of ANNs. Such an approach is closest to linear adaptive filter theory, which is
already well established and successfully applied in many diverse fields such as
communications, control, radar, sonar, seismology, and biomedical engineering
[117, 118, 119]. The computational process carried out in the ANNs is as follows:
An artificial neuron or simply neuron receives inputs from a number of other
neurons or from an external stimulus. A weighted sum of these inputs constitutes the
argument to a nonlinear activation function. The three most often used functions are
hard limiting, threshold logic, and sigmoidal. The resulting value of the activation
function is the output of the neuron. This output gets distributed (or fanned out) along
weighted connections to other neurons. The actual manner in which these connections
are made (the topology) defines the flow of information in the network and is called
the architecture of the ANN. The method used to adjust the weights in the process of
training the network is called the learning rule. That is, ANN systems are not
programmed; rather, they are taught. The learning may be supervised or unsupervised.
In summary, the essential ingredients of a computational system based on ANNs are
the activation function, the architecture and the learning rule. It should be emphasized
that the computational models of this kind have only a metaphorical resemblance to
real brain.
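
The computation just described, for a single neuron with a sigmoidal activation, can
be sketched as follows (the weights and inputs are arbitrary illustrative values):

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

x = np.array([0.5, -1.2, 3.0])      # inputs from other neurons / stimulus
w = np.array([0.4, 0.1, -0.6])      # synaptic weights
b = 0.2                             # bias (threshold) term

v = np.dot(w, x) + b                # weighted sum: the activation argument
y = sigmoid(v)                      # sigmoidal activation -> neuron output
print(y)                            # fanned out along weighted connections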
Due to differences in one or all of these three ingredients, different structures
of ANNs are being explored for various applications. Computational differences in
ANNs arise from the different types of synaptic connections that are assumed to exist
among the neurons. These connections can be strictly feed-forward, laterally
connected, topologically ordered, feedforward/feedback, and hybrid. Some of the
important ANN structures include McCulloch-Pitts’ nerve nets [120], Rosenblatt’s
perceptron [121], adaptive resonance theory (ART) developed by Carpenter and
Grossberg [122], Fukushima’s Neocogntron [123,], cellular neural network of Chua
[124], multilayer perceptron [125], time-delay neural networks of Waibel [126],
counterpropagation networks of Nielsen [127], radial basis function networks [128],
bidirectional associative memory [129], Hopfield’s network [130], fuzzy multilayer
perceptron [131], and Kohonen’s associative memory [132].

2.6.1 Potential benefits of ANNs


It is apparent that an ANN derives its computing power through, first, its
massive parallel distributed structure and, second, its ability to learn and therefore
generalize; generalization refers to the ANN producing reasonable outputs for inputs
not encountered during training (i.e. learning). These two information processing
capabilities make it possible for ANNs to solve complex problems that are currently
intractable. Some of the useful properties and capabilities of ANNs are
given below.
1. Nonlinearity: A neuron is basically a nonlinear device. Consequently, an ANN,
made up of interconnection of neurons, is itself nonlinear. Moreover, the nonlinearity
is of a special kind in the sense that it is distributed throughout the network.
Nonlinearity is a highly important property, particularly if the physical mechanism
responsible for the generation of an input signal (e.g., a speech signal) is inherently
nonlinear.
2. Input-Output mapping: In the ANN structure employing supervised learning,
the modification of the synaptic weights is carried out by applying a set of labeled
training examples. Each example consists of a unique input signal and the
corresponding desired response. The modification of weights is carried out so as to
minimize the difference between the desired response and the actual output of the

41
Review of Literature

network produced by the input signal in accordance with appropriate statistical


criterion. Thus, the network learns from the examples by constructing an input-output
mapping for the problem at hand. ANNs are capable of forming complex input-output
mapping because of which these networks can be applied effectively in pattern
classification problem involving highly nonlinear decision boundaries. Further, the
ANNs have the ability to approximate any nonlinear continuous function to the
desired degree of accuracy. This ability has made these networks very useful in
modeling nonlinear systems.
3. Adaptability: ANNs have a built-in capability to adapt their synaptic weights to
changes in the surrounding environment. In particular, an ANN trained to operate in a
specific environment can easily be retrained to deal with changed operating
conditions. Moreover, when operating in a non-stationary environment, an ANN can
be designed to change its synaptic weights in real time. This property makes the ANN
an ideal tool for use in adaptive pattern classification, signal processing and control.
4. Fault tolerance: An ANN implemented in hardware form has the potential to
be inherently fault tolerant, in the sense that its performance degrades gracefully
under adverse operating conditions. For example, if a neuron or its connecting links
are damaged, recall of a stored pattern is impaired in quality. However, due to the
distributed nature of information in the network, the damage has to be extensive
before the overall response of the network is seriously degraded.
5. VLSI Implementability: Its massively parallel nature makes an ANN ideally
suited for implementation using VLSI technology. VLSI technology provides a
means of capturing truly complex behavior in a highly hierarchical fashion, which
makes it possible to use an ANN as a tool for real-time applications involving pattern
recognition, signal processing and control.

2.6.2 Learning process

Learning is one of the most important features of an ANN. All the knowledge
in the ANN is encoded in the interconnection weights, and the learning process
determines the weights. A weight represents the strength of association, that is, the
co-occurrence of connected features, concepts, propositions, or events during a
training period.
of the schemes, the learning algorithms are divided into supervised, reinforcement
and unsupervised learning. In supervised learning, a teacher specifies the desired
output of the network, and the training data consists of input-output pairs. The most
popularly used backpropagation (BP) algorithm falls in this category. In
reinforcement learning, the feedback is not a precise teaching input but rather only a
‘good’ or ‘bad’ performance rating. In other words, reinforcement learning is like supervised
learning, except that in supervised learning, the feedback provided to the network is
instructive, whereas in reinforcement learning, it is evaluative. In unsupervised
learning, the network attempts to develop internal models to capture the patterns of
regularity in the input signal. A representative of this class is competitive learning, in
which the input vectors are classified into disjoint clusters such that elements of a
cluster are similar to each other in some sense. The method is called competitive,
because during training, a set of hidden units compete with each other to become
active and perform weight changes. The winning unit increases its weights on those
links with high input values and decreases them on those links with low input value.
Because there is usually the constraint that the sum of the weights of the network is to
be a constant, this process allows the winning unit to be selective to some input
patterns. Carpenter and Grossberg’s Adaptive Resonance Theory (ART) may be
thought of as a form of competitive learning algorithm.
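A minimal sketch of one competitive-learning step as just described is given below; the winner-take-all update and the renormalization that keeps the weight sum constant are written in one common form, assumed here purely for illustration.

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """One step of competitive learning: the hidden unit whose weight vector
    lies closest to the input wins and moves its weights toward the input."""
    winner = np.argmin(np.sum((W - x) ** 2, axis=1))   # units compete to become active
    W[winner] += eta * (x - W[winner])                  # winner updates its weights
    W[winner] /= W[winner].sum()                        # keep the weight sum constant
    return winner

rng = np.random.default_rng(1)
W = rng.random((3, 4))                  # 3 competing units, 4 input links each
W /= W.sum(axis=1, keepdims=True)       # constant-sum constraint at the start
print(competitive_step(W, np.array([0.7, 0.1, 0.1, 0.1])))
```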

Training of network: In most applications, both a learning mode and an operational
mode are present. The ANN is first run in learning mode, with training continuing until
the weights are properly adjusted for the particular application. It is then used in
operational mode.
Neural networks can be divided into two main classes, based on the learning
algorithms for weight adjustment as supervised or unsupervised. Hybrid systems
using both strategies have also been developed.

Supervised training: In this type of training, both inputs and outputs are provided.
The network then processes the inputs and compares its resulting outputs against the
desired outputs. Errors are then propagated back through the system, causing it to
adjust the weights, which control the network. This process occurs over and over as
the weights are continually adjusted. The set of data, which enables the training, is

called the ‘training set’. During the training of the network, the same set of data is
processed many times as the connection weights are progressively refined. One example of
the supervised learning algorithm is the backpropagation [125]. Backpropagation
networks use a sigmoidal transfer function rather than a simple threshold type of
function, which introduces two important properties.
 The sigmoid is nonlinear, allowing the network to perform complex mappings of
input to output vector spaces.
 It is continuous and differentiable, which allows the gradient of the error to be
used in updating the weights (a one-line sketch of this derivative follows).
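For the logistic sigmoid, this derivative is particularly convenient because it can be expressed in terms of the function value itself, a standard identity sketched below.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(v):
    # phi'(v) = phi(v) * (1 - phi(v)): continuous everywhere and cheap to
    # evaluate, since it reuses the already-computed neuron output.
    s = sigmoid(v)
    return s * (1.0 - s)
```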

Using the backpropagation algorithm, it is possible to train networks containing
hidden layers and so perform complex nonlinear mappings that are not possible with
single-layer perceptron networks. Networks of this type are sometimes referred to as
‘multilayer perceptrons employing backpropagation learning’.

Unsupervised training: In this type of training, the network is provided with
inputs but not with desired outputs. Each node of the network represents an important
aspect of the data used for learning. In some cases, each node’s weight vector is in the
same vector space as the input data, and each input sample is associated with the
nearest weight vector (with the smallest Euclidean distance). Such networks have
been used for clustering, approximating probability distributions, and compressing
voluminous input data into a small number of code words. In other cases, each node
computes a function of the inputs such that each output value for this node represents
the extent to which the corresponding input vector contains an important feature of
the input data.

Hybrid systems: Hybrid systems combine more than one intelligence paradigm in
a synergistic framework. Expert systems that were hitherto purely symbolic, now
employ neural networks to support their decisions. The interest in such synergistic
systems is certainly not new. Signal processing models have been extended through
intelligent pre/post-processing of data. During the past decade, we have increasingly
seen neural network approaches being combined with existing computational
techniques to produce a variety of applications. More recently, the soft computing
paradigm, which includes neural networks, fuzzy systems and genetic algorithms as its
principal components, has found its way into numerous important commercial
applications. Considerable stress is being laid on the seamless integration of these
three technologies in order to exploit the advantages of each in the design of intelligent
systems.
Counterpropagation networks are an example of a network architecture employing
both supervised and unsupervised learning [127]. The network consists of
three layers and is trained in two stages. First, the Kohonen unsupervised learning
algorithm is employed to adjust the weights between input and hidden layers and in
the second phase of training, the Grossberg Outstar supervised learning method is
used to set the weights between the hidden layer and the output layer.

2.6.3 Multilayer perceptron neural network (MLPNN)


Some learning rules, such as the perceptron learning rule and the least-mean-square
algorithm, are designed to train single-layer (perceptron-like) neural networks. These
have the drawback that they can only solve linearly separable classification problems.
Both Rosenblatt and Widrow were aware of this limitation and knew that going to
multiple layers could overcome it, but they were not able to generalize their
algorithms to train these more powerful networks. In the mid-1980s the
backpropagation algorithm was advanced [136] and soon became the most widely
used algorithm for training the multilayer perceptron; it still is today. Typically, the
multilayer perceptron neural network (MLPNN) consists of a set of sensory units or
source nodes that constitute the input layer, one or more hidden layers of computing
nodes and an output layer of computing nodes. The feedforward structure of the
MLPNN refers to a network in which all nodes of a layer are fully connected through
synaptic weights to all the
nodes of the layer just above it. The input signal propagates in a forward direction, on
a layer-by-layer basis. The learning of the network is carried out in two phases. In the
forward phase, an input pattern is applied to the input layer of the network, and its
effect propagates through the network layer by layer. The set of outputs of the output
layer constitutes the actual response of the network. During the forward phase, the
weights of the network are all fixed. In the backward phase, on the other hand, the
synaptic weights are all adjusted in accordance with the error-correction rule, most
popularly known as the backpropagation (BP) algorithm. The MLPNNs have been
applied successfully to solve many difficult and highly nonlinear problems of
engineering and science using the BP algorithm.
The MLPNN has three distinctive characteristics. First, each neuron in this
network includes a smooth nonlinearity (i.e., differentiable everywhere). Second, the
network contains one or more hidden layers that are not part of the input or the output
of the network. Finally, this network exhibits a high degree of connectivity
determined by the synaptic weights. Indeed, it is through the combination of these
characteristics together with the ability to learn from experience through training that
the MLPNN derives its computing power. In the following we describe the BP
algorithm used to train the MLPNN.

Artificial Neural Networks are mathematical models inspired by biological neural
networks, which can learn from inputs, approximate functions and classify patterns.
The feed-forward neural network, shown in Figure 2.1, is the simplest type of neural
network. There are three layers in this network: the input layer, the hidden layer and
the output layer.

Figure-2.1: Multilayer feed forward network.
Inputs or patterns are presented to the network from the input layer. The hidden layer
is where the actual processing takes place: the weights of the different neurons are
altered, and the processed output is communicated to the output layer. The output
layer evaluates the output of the system, which may be sent back for error correction
or feedback evaluation, as in the case of a back-propagation network. The
back-propagation algorithm is the most common and simplest way of training an
ANN.
The feed forward back propagation networks emerged as a most significant
result in the field of neural networks. The backpropagation learning involves
propagation of the error backwards from the output layer to the hidden layers in order
to determine the update for the weights leading to the units in a hidden layer (Figure-
2.1). It does not have feedback connections, but errors are backpropagated during
training by using the LMS error. The error in the output determines measures of the
hidden-layer output errors, which are used as a basis for adjustment of the connection
weights between the input and hidden layers. Adjusting the two sets of weights between the pair of
layers and recalculating the outputs is an iterative process that is carried on until the
error falls below a tolerance level. Learning rate parameters scale the adjustments to
the weights. Once training is completed, the weights are set and the network can be
used to find outputs for new inputs. Supervised training is used for the training of the
network. The input of a particular element is calculated as the sum of the input values
multiplied by connection strengths (synaptic weights) [136]. The back-propagation
algorithm, according to the generalized delta rule, can be described in the following
steps [137]:

Notations
 The indices i, j, and k refer to different neurons in the network; with the signals
propagating through the network from left to right, neuron j lies in a layer to the
right of neuron i, and neuron k lies in a layer to the right of neuron j, when neuron
j is a hidden unit.
 The iteration n refers to the nth training pattern (example) presented to the
network.
 The symbol ε(n) refers to the instantaneous sum of squared errors at iteration n.
The average of ε(n) over all values of n yields the average squared error εav.
 The symbol ej(n) refers to the error signal at the output of neuron j for iteration n.
 The symbol dj(n) refers to the desired response for neuron j and is used to
compute ej(n).
 The symbol yj(n) refers to the function signal appearing at the output of neuron j
at iteration n.
 The symbol wji(n) denotes the synaptic weight connecting the output of neuron i
to the input of neuron j at iteration n. The correction applied to this weight at
iteration n is denoted by Δwji(n).
 The net internal activity level of neuron j at iteration n is denoted by vj(n); it
constitutes the signal applied to the nonlinearity associated with neuron j.
 The activation function describing the input-output functional relationship of the
nonlinearity associated with neuron j is denoted by φj(·).

 The threshold applied to neuron j is denoted by θj; its effect is represented by a
synapse of weight wj0 = θj connected to a fixed input equal to −1.
 The ith element of the input vector (pattern) is denoted by xi (n).
 The kth element of the overall output vector (pattern) is denoted by ok (n).
 The learning-rate parameter is denoted by η.

The error signal at the output of neuron j at iteration n is defined by

ej(n) = dj(n) − yj(n)      (1)

where neuron j is an output node.

The instantaneous sum of squared errors of the network is thus written as

ε(n) = (1/2) Σj∈C ej²(n)      (2)

where the set C includes all the neurons in the output layer. The average squared
error is obtained by summing ε(n) over all n and then normalizing with respect to the
set size N, as shown by
εav = (1/N) Σn=1..N ε(n)      (3)

The net internal activity level vj(n) produced at the input of the nonlinearity
associated with neuron j is therefore

vj(n) = Σi=0..p wji(n) yi(n)      (4)

Hence, the function signal yj(n) appearing at the output of neuron j at iteration n is

yj(n) = φj(vj(n))      (5)

According to the chain rule, the gradient may be expressed as follows:

∂ε(n)/∂wji(n) = [∂ε(n)/∂ej(n)]·[∂ej(n)/∂yj(n)]·[∂yj(n)/∂vj(n)]·[∂vj(n)/∂wji(n)]      (6)

Differentiating both sides of equation (2) with respect to ej(n), we get

∂ε(n)/∂ej(n) = ej(n)      (7)

Differentiating both sides of equation (1) with respect to yj(n), we get

∂ej(n)/∂yj(n) = −1      (8)

Next, differentiating equation (5) with respect to vj(n), we get

∂yj(n)/∂vj(n) = φ′j(vj(n))      (9)

Differentiating equation (4) with respect to wji(n) yields

∂vj(n)/∂wji(n) = yi(n)      (10)

Hence, the use of equations (7)-(10) in (6) yields

∂ε(n)/∂wji(n) = −ej(n) φ′j(vj(n)) yi(n)      (11)

The correction Δwji(n) applied to wji(n) is defined by the delta rule
Δwji(n) = −η ∂ε(n)/∂wji(n)      (12)

Using equations (11) and (12), we get

Δwji(n) = η δj(n) yi(n)      (13)

where the local gradient δj(n) is itself defined by

δj(n) = −∂ε(n)/∂vj(n) = ej(n) φ′j(vj(n))

Similarly, if neuron j is a hidden node, the local gradient δj(n) may be written as

δj(n) = φ′j(vj(n)) Σk δk(n) wkj(n)      (14)

where the sum runs over the neurons k in the layer to the right of neuron j.

Therefore, the correction Δwji(n) applied to the synaptic weight connecting neuron i
to neuron j is defined by the delta rule:

(weight correction Δwji(n)) = (learning-rate parameter η) × (local gradient δj(n)) × (input signal of neuron j, yi(n))

The given patterns are applied one by one, possibly several times and in some random
order, and the weights are updated until the total error is reduced to an acceptable
value. The basic procedure for training the feedforward error-backpropagation neural
network can be described in the following steps:

1. Apply the input vector to the input units.
2. Calculate the net input values to the hidden layer units.
3. Calculate the outputs to the hidden layer.
4. Move to the output layer. Calculate the net input values to each unit.
5. Calculate the outputs.
6. Calculate the error terms for the output units.
7. Calculate the error terms for the hidden units.
8. Update the weights of the output layer.
9. Update the weights of the hidden layer. When the error is acceptably small for
each training-vector pair, training can be discontinued. A minimal sketch of this
procedure is given below.
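The sketch below walks through these nine steps for a small feedforward network trained in pattern mode; the XOR data, the layer sizes, the learning rate and the number of epochs are illustrative assumptions, not values taken from the cited works.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Toy training set (XOR), standing in for real feature vectors.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 4))    # input -> hidden weights
b1 = np.zeros(4)                           # hidden thresholds
W2 = rng.normal(scale=0.5, size=(4, 1))    # hidden -> output weights
b2 = np.zeros(1)                           # output threshold
eta = 0.5                                  # learning-rate parameter

for epoch in range(10000):
    for x, d in zip(X, D):                 # pattern-mode training
        # Steps 1-5: forward pass, layer by layer
        y1 = sigmoid(x @ W1 + b1)          # hidden-layer outputs
        y2 = sigmoid(y1 @ W2 + b2)         # output-layer outputs
        # Step 6: local gradients of the output units, e_j * phi'(v_j)
        delta2 = (d - y2) * y2 * (1 - y2)
        # Step 7: local gradients of the hidden units, eq. (14)
        delta1 = y1 * (1 - y1) * (W2 @ delta2)
        # Steps 8-9: delta-rule updates, eq. (13)
        W2 += eta * np.outer(y1, delta2); b2 += eta * delta2
        W1 += eta * np.outer(x, delta1);  b1 += eta * delta1

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # should approach [0, 1, 1, 0]
```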

Momentum parameter: The backpropagation algorithm provides an
approximation to the trajectory in weight space computed by the method of steepest
descent. The smaller we make the learning-rate parameter η, the smaller will be the
changes to the synaptic weights in the network from one iteration to the next, and the
smoother will be the trajectory in weight space. This improvement, however, is
attained at the cost of a slower rate of learning. If, on the other hand, we make the
learning-rate parameter η too large so as to speed up the rate of learning, the resulting
large changes in the synaptic weights may make the network unstable. A simple
method of increasing the rate of learning while avoiding the danger of instability is to
modify the delta rule of equation (13) by including a momentum term, as given below:

Δwji(n) = α Δwji(n−1) + η δj(n) yi(n)      (15)

where α is usually a positive number called the momentum constant.

Variable learning rate: The error surface for the multilayer perceptron is
convoluted, consisting of many local minima as well as a global minimum, and of
many flat regions as well as many steep ones. The speed of convergence can be
enhanced if the learning rate is allowed to increase on flat parts of the error surface
and to decrease on steep parts, and this can be done while still maintaining stability.
The trick is to determine when to adjust the learning rate and by how much. For a
very simple variable-learning-rate adaptive algorithm, the learning rate is adjusted
according to the following rules [138] (a sketch of these rules follows the list):
 If the mean square error increases by more than some set percentage after a
weight update, then the weight update is discarded, the learning rate is reduced by
some fixed amount, and the momentum coefficient α is set to zero (if it is used at
all).
 If the mean square error decreases after a weight update, then the weight update is
accepted, the learning rate is multiplied by some factor greater than one, and α is
restored to its previous value if it had been set to zero.
 If the mean square error increases by less than the set percentage in the first rule
above, then the weight update is accepted but the learning rate and the momentum
coefficient are left unchanged.
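A minimal sketch of these three rules follows; the tolerance and the increase/decrease factors are illustrative assumptions (values of this order are common in practice), not figures prescribed by [138].

```python
def adapt_learning_rate(mse_new, mse_old, eta, alpha, alpha_prev,
                        tol=1.04, eta_down=0.7, eta_up=1.05):
    """Apply the three rules after a weight update; returns (accept, eta, alpha).
    The caller keeps a copy of the weights so a rejected update can be undone."""
    if mse_new > tol * mse_old:
        # Rule 1: error rose too much -- reject the update, shrink eta, zero momentum
        return False, eta * eta_down, 0.0
    if mse_new < mse_old:
        # Rule 2: error fell -- accept the update, grow eta, restore momentum
        return True, eta * eta_up, alpha_prev
    # Rule 3: error rose, but within tolerance -- accept, leave eta and alpha alone
    return True, eta, alpha
```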

Pattern versus batch modes of training: In a practical application of the
backpropagation algorithm, learning results from the many presentations of a
prescribed set of training examples to the multilayer perceptron. One complete
presentation of the entire training set during the learning process is called an epoch.
The learning process proceeds from epoch to epoch until the synaptic weights and
biases stabilize and the mean square error over the entire training set converges to
some minimum value. For a given training set, backpropagation learning may proceed
in one of two basic ways:
In pattern-mode training, the synaptic weights and biases are updated after the
presentation of each training example. In batch-mode training, updating of the
synaptic weights and biases is held off until all input/output pairs forming an epoch
have been presented to the network. Pattern-mode training is preferred from an online
training/operation point of view because it requires less local storage for each
synaptic connection. What is more, if the input patterns are presented in random
order, the search in weight space becomes stochastic in nature, which makes it less
likely for the backpropagation algorithm to be trapped in a local minimum. On the
other hand, batch-mode training does provide a more accurate estimate of the gradient
vector. The final choice between pattern and batch modes of training is highly
problem specific [139].
2.6.4 Levenberg-Marquardt algorithm


ANN training is usually formulated as a nonlinear least-squares problem.
Essentially, the Levenberg-Marquardt algorithm is a least-squares estimation
algorithm based on the maximum neighborhood idea. Consider the sum-of-squares
error function in the form

E = (1/2) Σn (e(n))² = (1/2) ‖e‖²      (16)

where e(n) is the error for the nth pattern. If the displacement wnew − wold in weight
space is small, then the error vector e can be expanded to first order in a Taylor series

e(wnew) = e(wold) + Z (wnew − wold)      (17)
where the matrix Z is defined with elements

(Z)ni = ∂e(n)/∂wi      (18)

The error function (16) can then be written as

E = (1/2) ‖e(wold) + Z (wnew − wold)‖²      (19)

If this error is minimized with respect to the new weights wnew, then

wnew = wold − (ZᵀZ)⁻¹ Zᵀ e(wold)      (20)

For the sum-of-squares error function, the elements of the Hessian matrix take the
form

(H)ik = ∂²E/∂wi∂wk = Σn [ (∂e(n)/∂wi)(∂e(n)/∂wk) + e(n) ∂²e(n)/∂wi∂wk ]      (21)

If the second term is neglected, the Hessian can be written in the form

H ≈ ZᵀZ      (22)

For a linear network, equation (22) is exact; for nonlinear networks it
represents an approximation. In principle, the update formula (20) could be
applied iteratively in order to try to minimize the error function. The problem with
such an approach is that the step size given by equation (20) could turn out
to be relatively large, in which case the linear approximation on which it is based
would no longer be valid. In the Levenberg-Marquardt algorithm [140, 141], the step
size is kept under control by considering a modified error function of the form

Ẽ = (1/2) ‖e(wold) + Z (wnew − wold)‖² + λ ‖wnew − wold‖²      (23)

where the parameter λ governs the step size. Minimizing this modified error with
respect to wnew yields (up to a rescaling of λ) the update

wnew = wold − (ZᵀZ + λI)⁻¹ Zᵀ e(wold)

For large λ the step reduces to a small move along the gradient direction, while for
λ → 0 the Gauss-Newton update of equation (20) is recovered.
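The following is a minimal sketch of this scheme on a toy curve-fitting problem; the finite-difference Jacobian and the tenfold raise/lower schedule for λ are illustrative assumptions rather than details fixed by [140, 141].

```python
import numpy as np

def jacobian(err_fn, w, eps=1e-6):
    """Finite-difference approximation of Z, with (Z)ni = d e(n) / d w_i."""
    e0 = err_fn(w)
    Z = np.zeros((e0.size, w.size))
    for i in range(w.size):
        dw = np.zeros_like(w)
        dw[i] = eps
        Z[:, i] = (err_fn(w + dw) - e0) / eps
    return Z

def lm_step(err_fn, w, lam):
    """One update w_new = w - (Z'Z + lam*I)^(-1) Z'e(w), with lam adapted so
    that a step is kept only if it actually reduces the sum-of-squares error."""
    e = err_fn(w)
    Z = jacobian(err_fn, w)
    for _ in range(50):
        step = np.linalg.solve(Z.T @ Z + lam * np.eye(w.size), Z.T @ e)
        w_new = w - step
        if np.sum(err_fn(w_new) ** 2) < np.sum(e ** 2):
            return w_new, lam * 0.1          # success: drift toward Gauss-Newton
        lam = min(lam * 10.0, 1e12)          # failure: drift toward gradient descent
    return w, lam                            # no improving step found

# Example: fit y = a * exp(b * x) by nonlinear least squares
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x)
residual = lambda w: w[0] * np.exp(w[1] * x) - y

w, lam = np.array([1.0, 1.0]), 1e-2
for _ in range(20):
    w, lam = lm_step(residual, w, lam)
print(w)   # should approach [2.0, 1.5]
```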
2.7 Adaptive Neuro-Fuzzy Inference System (ANFIS)


ANFIS is a technique developed in the early 1990s, based on the Takagi-Sugeno
fuzzy inference system. Fuzzy systems suffer from the limitation that they cannot
learn from patterns and are simply based on the rules provided; there is also no
method to evaluate a fuzzy system’s output if the actual output is not known. Neural
networks, on the other hand, suffer from the problem that they cannot handle the
imprecision of a system. This is where ANFIS comes to the rescue: ANFIS possesses
the advantages of both ANN and FIS and is widely used in artificial intelligence
fields. Figure 2.2 shows a simple ANFIS network.

Fig 2.2: ANFIS layered structure

For the sake of simplicity, it can be assumed that the fuzzy inference
system under consideration has two inputs and one output. The rule base contains the
fuzzy if-then rules of Takagi and Sugeno’s type as follows:
If x is A and y is B then z is f(x, y)

where A and B are the fuzzy sets in the antecedent and z = f(x, y) is a crisp function
in the consequent. Usually f(x, y) is a polynomial in the input variables x and y, but it
can also be any other function that can approximately describe the output of the
system within the fuzzy region specified by the antecedent. When f(x, y) is a
constant, a zero-order Sugeno fuzzy model is formed, which may be considered a
special case of the Mamdani fuzzy inference system in which each rule consequent is
specified by a fuzzy singleton. If f(x, y) is taken to be a first-order polynomial, a
first-order Sugeno fuzzy model is formed. For a first-order, two-rule Sugeno fuzzy
inference system, the two rules may be stated as:

Rule 1: If x is A1 and y is B1 then f1 = p1x + q1y + r1
Rule 2: If x is A2 and y is B2 then f2 = p2x + q2y + r2

Here, the type-3 fuzzy inference system proposed by Takagi and Sugeno is used. In
this inference system, the output of each rule is a linear combination of the input
variables plus a constant term. The final output is the weighted average of each rule’s
output. The corresponding equivalent ANFIS structure is shown in Fig. 2.2.
The individual layers of this ANFIS structure are described below:
 Layer 1:
Every node i in this layer is adaptive in nature, with a node function

oi¹ = μAi(x)      (24)

where x is the input to node i, Ai is the linguistic variable (also a fuzzy set)
associated with this node, and μAi is the membership function of Ai.

 Layer 2:
Each node of this layer is a fixed node which calculates the firing strength wi of a
rule. The output of each node is the product of all the incoming signals to it and is
given by

oi² = wi = μAi(x) · μBi(y)      (25)

 Layer 3:
Every node in this layer is a fixed node. Each ith node calculates the ratio of the ith
rule’s firing strength to the sum of the firing strengths of all the rules. The output
from the ith node is the normalized firing strength given by

oi³ = w̄i = wi/(w1 + w2),  i = 1, 2      (26)
 Layer 4:
Every node in this layer is an adaptive node with a node function given by

oi⁴ = w̄i fi = w̄i (pi x + qi y + ri),  i = 1, 2      (27)

where w̄i is the output of Layer 3 and {pi, qi, ri} is the consequent parameter set.
 Layer 5:
This layer comprises only one fixed node, which calculates the overall output as the
summation of all incoming signals, i.e.

o⁵ = overall output = Σi w̄i fi = (Σi wi fi)/(Σi wi)      (28)

In the ANFIS structure, it is observed that, given the values of the premise
parameters, the final output can be expressed as a linear combination of the
consequent parameters. The output f in Fig. 2.2 can be written as

f = (w̄1x)p1 + (w̄1y)q1 + (w̄1)r1 + (w̄2x)p2 + (w̄2y)q2 + (w̄2)r2      (29)

Thus f is linear in the consequent parameters (p1, q1, r1, p2, q2, r2). In the forward
pass of the hybrid learning algorithm, the consequent parameters are identified by the
least-squares estimate. In the backward pass, the error signals, which are the
derivatives of the squared error with respect to each node output, propagate backward
from the output layer to the input layer. In this backward pass, the premise parameters
are updated by the gradient descent algorithm.
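A minimal sketch of this two-input, two-rule forward pass is given below; the Gaussian membership functions and every parameter value are illustrative assumptions (in a trained ANFIS, the premise and consequent parameters would come from the hybrid learning just described).

```python
import numpy as np

def gauss_mf(x, c, s):
    """Layer 1: Gaussian membership function with center c and width s."""
    return np.exp(-((x - c) / s) ** 2)

def anfis_forward(x, y, premise, consequent):
    # Layer 1: membership grades of x in A1, A2 and of y in B1, B2, eq. (24)
    muA = [gauss_mf(x, c, s) for c, s in premise["A"]]
    muB = [gauss_mf(y, c, s) for c, s in premise["B"]]
    # Layer 2: firing strengths w_i, eq. (25)
    w = np.array([muA[0] * muB[0], muA[1] * muB[1]])
    # Layer 3: normalized firing strengths, eq. (26)
    wbar = w / w.sum()
    # Layer 4: weighted rule outputs wbar_i * (p_i x + q_i y + r_i), eq. (27)
    f = np.array([p * x + q * y + r for p, q, r in consequent])
    # Layer 5: overall output, eq. (28)
    return np.sum(wbar * f)

premise = {"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(0.0, 1.0), (1.0, 1.0)]}  # (center, width)
consequent = [(1.0, 2.0, 0.5), (-1.0, 0.5, 1.0)]                          # (p_i, q_i, r_i)
print(anfis_forward(0.3, 0.7, premise, consequent))
```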

2.8 Radial Basis Function Network (RBFN)

A Radial Basis Function Network (RBFN) is a kind of neural network. Mostly, when
people talk about neural networks or ANNs, they are referring to the Multilayer
Perceptron (MLP). Each neuron in an MLP takes the weighted sum of all its input
values; that is, each input value is multiplied by a coefficient, and the results are
summed together. A single MLP neuron is a simple linear classifier, but complex
nonlinear classifiers can be built by combining many such neurons into a network.

The RBFN approach is arguably more intuitive than the MLP. An RBFN performs
classification by measuring the test input’s similarity to examples from the training
set. Each RBFN neuron stores a “prototype”, which is just one of the examples
provided in the training set. When a new input is given to the RBFN for
classification, each neuron computes the Euclidean distance between the given input
and its prototype. If the input more closely resembles the class A prototypes than the
class B prototypes, it is classified as class A; this resemblance is reflected in the
Euclidean distance.
Fig 2.3: RBFN layered structure

Fig 2.3 shows the typical architecture of an RBF Network. It consists of an input
layer, a layer of RBF neurons (also known as the pattern layer), a summation layer,
and an output layer with one node per category or class of data. The structure closely
resembles that of a typical MLPNN.

 The Input Vector

The input vector is the n-dimensional vector that is being classified. The entire input
vector is shown to each of the RBF neurons.

 The RBF Neurons or Pattern Layer

Each RBF neuron stores a “prototype” or pattern vector, which is one of the vectors
from the training set. Each RBF neuron compares the input vector to its prototype
and outputs a value between 0 and 1 that is a measure of similarity. If the input is
equal to the prototype, the output of that RBF neuron will be 1. As the distance
between the input and the prototype grows, the output response falls off
exponentially toward 0. The shape of the RBF neuron’s response is a bell curve, as
illustrated in the network architecture diagram. The neuron’s response value is also
known as its “activation” value, and the prototype vector is also known as the
neuron’s “center”, since it is the value at the center of the bell curve.

 The Output Layer

The output layer of the network consists of a set of nodes, one per category that we
are trying to classify. Each output node computes a score for the associated category,
and a classification decision is made by assigning the input to the category with the
highest score. The score is a measure of how close the given input is to a given
pattern.

The score is computed by taking a weighted sum of the activation values from all the
RBF neurons. Weighted sum means that an output node associates a weight value
with each of the RBF neurons and multiplies each neuron’s activation by this weight
before adding it to the total response.

As each output node computes a score for a different category, every output node has
its own set of weights. An output node will typically give a positive weight to the
RBF neurons that belong to its category, and a negative weight to those that do not.

Each RBF neuron computes a measure of the similarity between the test input and its
prototype vector (taken from the training set). Test input vectors that are more similar
to a prototype return a result closer to 1. There are different possible similarity
functions, but the most popular is based on the Gaussian curve. The equation for a
Gaussian with a one-dimensional input is

f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²))

where x is the input, μ is the mean, and σ is the standard deviation. This produces the
familiar bell curve shown in Fig 2.4, which is centered at the mean μ (here, the mean
is 5 and σ is 1).

Fig 2.4: A typical Gaussian Function

The RBF neuron activation function is slightly different, and is typically written as

φ(x) = exp(−β ‖x − μ‖²)

In the Gaussian distribution, μ refers to the mean of the distribution; here, it is the
prototype (pattern) vector, which lies at the center of the bell curve. For the activation
function φ we are not directly interested in the value of the standard deviation σ, so a
couple of simplifying modifications are made. The first change is that the outer
coefficient, 1/(σ√(2π)), is removed. This term normally controls the height of the
Gaussian; here it is redundant because of the weights applied by the output nodes.
During training, the output nodes learn the correct coefficient or “weight” to apply to
the neuron’s response.
The second change is that the inner coefficient, 1/(2σ²), is replaced with a single
parameter β. This β coefficient controls the width of the bell curve. Again, the exact
value of σ is not of interest here; all that matters is that some coefficient controls the
width of the bell curve, so the equation can be simplified by replacing the term with a
single variable.

Fig 2.5: RBF Neuron activation for different values of beta

There is also a slight change in the notation when the equation is applied to an
n-dimensional vector. The double-bar notation in the activation equation indicates
that we are taking the Euclidean distance between x and μ and squaring the result; for
the one-dimensional Gaussian, this simplifies to (x − μ)². It is worth emphasizing that
the underlying metric for evaluating the similarity between an input vector and a
prototype is the Euclidean distance between the two vectors.

Also, each RBF neuron produces its largest response, 1, when the input is equal to the
prototype vector. This allows the activation to be taken as a measure of similarity and
the results from all the RBF neurons to be summed. As the input moves away from
the prototype vector, the response falls off exponentially, so neurons whose
prototypes are far from the input vector contribute very little to the result.
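A minimal sketch of the whole scheme follows; the prototypes, the β value and the output weights are illustrative assumptions (in practice the output weights would be learned from the training set).

```python
import numpy as np

def rbf_activation(x, mu, beta):
    """phi(x) = exp(-beta * ||x - mu||^2): equals 1 at the prototype and falls
    off exponentially with squared Euclidean distance from it."""
    return np.exp(-beta * np.sum((x - mu) ** 2))

def rbfn_classify(x, prototypes, beta, W):
    """Score each category as a weighted sum of RBF activations and assign
    the input to the category with the highest score."""
    phi = np.array([rbf_activation(x, mu, beta) for mu in prototypes])
    scores = W @ phi                    # one row of output weights per category
    return int(np.argmax(scores)), scores

prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])   # one stored training example per class
W = np.array([[1.0, -0.5],                        # class 0: + its own neuron, - the other
              [-0.5, 1.0]])                       # class 1
print(rbfn_classify(np.array([0.2, 0.1]), prototypes, beta=2.0, W=W))
```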
2.9 Probabilistic Neural Network (PNN)

A probabilistic neural network (PNN) is similar to a multilayer perceptron neural
network and predominantly serves as a classifier. A PNN consists of an RBF network
that associates inputs with stored prototypes or patterns for classification. It can be
used to map any input pattern to one of a number of classifications, and it can also be
used as a more general approximator for function approximation and for regression
analysis.

Fig 2.6: Generalized PNN

A PNN is an implementation of a statistical algorithm called kernel discriminant
analysis, in which the operations are organized into a multilayered feed-forward
network with four layers:
 Input layer
 Pattern layer
 Summation layer
 Output layer

The four layers are illustrated in Fig 2.6. The input layer receives the input. The
pattern (radial basis) layer stores the patterns or prototypes, as described in the
previous section. The summation (competitive) layer decides the best-matching
prototype for any given input based on the similarity score (in the 0-1 range), and the
output layer delivers the final classification. The PNN has a very fast training
process, significantly faster than backpropagation training. The network is also
guaranteed to converge to an optimal classifier as the size of the representative
training set increases, and it is devoid of local-minima issues. Because of the inherent
parallel nature of the network, training samples can be added or removed without
much retraining. However, it also has a few drawbacks: a lack of generalization
compared with backpropagation networks, large memory requirements, and slow
execution compared with the MLPNN. It also requires a representative training set
for classification of inputs.

2.9.1 Classification theory


If the probability density function (pdf) of each of the populations is known, then an
unknown sample X belongs to class i if fi(X) > fj(X) for all j ≠ i. Other parameters
may be included, such as the prior probability h and the misclassification cost c (the
cost of incorrectly classifying an unknown). The classification decision then
becomes: assign X to class i if hi ci fi(X) > hj cj fj(X) for all j ≠ i (the Bayes optimal
decision rule), where fk is the pdf for class k. The probability density function
estimation for a class has been shown in the last section.
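A minimal sketch of this decision rule with Parzen-window (Gaussian-kernel) pdf estimates, which is essentially what the pattern and summation layers of a PNN compute, is given below; the smoothing parameter σ and the sample data are illustrative assumptions.

```python
import numpy as np

def class_pdf(x, samples, sigma):
    """Parzen-window pdf estimate: the pattern layer evaluates a Gaussian kernel
    at every stored training sample and the summation layer averages them.
    (The normalizing constant is omitted: with a shared sigma it is common to
    all classes and cannot change the argmax.)"""
    d2 = np.sum((samples - x) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))

def pnn_classify(x, classes, sigma, priors=None, costs=None):
    """Bayes optimal decision: pick the class i maximizing h_i * c_i * f_i(x)."""
    k = len(classes)
    h = priors if priors is not None else np.ones(k) / k
    c = costs if costs is not None else np.ones(k)
    scores = [h[i] * c[i] * class_pdf(x, classes[i], sigma) for i in range(k)]
    return int(np.argmax(scores))

class0 = np.array([[0.0, 0.2], [0.1, -0.1], [-0.2, 0.0]])
class1 = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8]])
print(pnn_classify(np.array([0.8, 0.9]), [class0, class1], sigma=0.5))   # expect class 1
```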

2.9.2 Training of PNN


The training set should be thoroughly representative of the actual population for
effective classification. PNNs are more demanding than most NNs with respect to the
training input sets, although sparse sets can also be sufficient. The PNN also handles
erroneous samples and outliers, which makes it suitable for most applications.
Adding and removing training samples in a PNN simply involves adding or removing
neurons in the pattern layer. As the training set increases in size, the PNN
asymptotically converges to the Bayes optimal classifier, and the estimated pdf
approaches the true pdf, assuming the true pdf is smooth.
The training process of a PNN is essentially the act of determining the value of the
smoothing parameter, sigma. An educated guess based on knowledge of the data may
be used; a heuristic method for determining sigma is known as
jackknifing. It involves systematically testing values of sigma over some range and
bounding the optimal value to some interval; jackknifing is also used to grade the
performance of each sigma.

2.10 Applications of Fuzzy Logic in EEG


Fuzzy logic has now become one of the most versatile and promising fields of
computer science. It is applied across prolific research areas and for multiple
purposes, from simple classification to forecasting; applications of fuzzy logic also
include robotics, control systems and error handling. The main features of fuzzy logic
are its tolerance of imprecision and its ability to handle fuzzy sets. These techniques
are now finding new application areas within the biomedical sciences, and soft
computing tools such as ANN and ANFIS occupy a dominant place in the field of
bioinformatics.
Bioinformatics can be seen as a branch of science that combines multidisciplinary
areas such as computer science, biology, and physical and chemical principles with
the design of tools for the analysis and modeling of large biological data sets, the
management of chronic diseases, and the study of molecular computing and cloning.
Bioinformatics is intensifying as a field for research and the development of new
technology, and fuzzy inference technologies are now predominantly applied within
it. Examples include: using fuzzy methods to increase the flexibility of protein motifs
and to study the distinctions that can occur among polynucleotides; using fuzzy
adaptive resonance theory for the holistic analysis of expression data; applying
dynamic programming algorithms for the alignment of sequences based on fuzzy
recasting; using algorithms such as the fuzzy k-nearest neighbors algorithm to
identify the sub-cellular locations of proteins from their dipeptide composition; using
fuzzy c-means and partitioning methods for the characteristic cluster-membership
values of genes and the analysis of gene expression data; studying ancestral and
functional relationships between amino acids with the help of fuzzy alignment
methods; generating fuzzy classification rules with neural network architectures for
the analysis of relationships between genes and the deciphering of a genetic set-up to
process microarray images; and using a fuzzy vector filtering framework in the
classification of amino acid sequences into different superfamilies. In addition to
these widespread applications in diagnosis, much developmental work is being
undertaken in the field of signal processing and the analysis of bioelectric signals.
One major application of fuzzy logic in the field of bioinformatics is the
analysis of EEG. Intelligent automated systems are needed to assist the tedious visual
analysis of polygraphic recordings. Most biomedical systems need detection of
different electroencephalogram (EEG) waveforms. The problem in automated
detection of alpha activity is the large inter-individual variability of its amplitude and
duration. The intelligence of a method can be attributed to the features extracted and
the way they are selected. In most applications, the ranges of the fuzzy rules are
determined based on feature statistics.
Another important application is a multistage fuzzy rule-based algorithm for
epileptic seizure onset detection. Various features based on amplitude, frequency and
entropy are extracted from intracranial electroencephalogram (iEEG) recordings and
taken as the inputs of a fuzzy system. These features, extracted from multichannel
iEEG signals, are combined using fuzzy algorithms both in the feature domain and in
the spatial domain. The fuzzy rules were formulated based on experts’ knowledge
and reasoning. An adaptive fuzzy subsystem is used for combining the characteristic
features extracted from the iEEG. For spatial combination, three channels from the
epileptogenic zone and one from a remote zone are fed into another fuzzy subsystem.
Finally, a threshold procedure is applied to the fuzzy output derived from the final
fuzzy subsystem.
In infants, the neural activity of the human brain starts between the seventeenth and
twenty-third week of prenatal development. It is believed that from this stage, and
throughout the span of life, the electrical signals generated by the brain represent not
only the brain function but also the status of the whole body. An understanding of the
neurophysiologic properties and neuronal functions of the brain, together with the
mechanisms underlying the generation of its signals and their recording, is therefore
vital for those who deal with these signals for the detection, diagnosis and treatment
of brain disorders
and the related diseases. Adaptive fuzzy techniques also find a major application at
this stage.
The prediction of sleep stages on the basis of wave-band counts collected by a data
acquisition system was carried out in cats [15], using delta waves, spindle bursts,
Ponto-Geniculo-Occipital (PGO) waves, EOG, basal EMG amplitude and movement
artifact amplitude to train the network, which was used to score the states of Quiet
Awake (QA), SWS and Desynchronized Sleep (DSS). The ANFIS agreed with
manual scoring for 93.3% of all epochs scored. In another report, two types of ANNs,
a multilayer perceptron and a learning vector quantizer, were used to classify the
sleep stages in infants [163]. Signals from each infant were recorded, digitized and
stored in a computer. Subsets of these signals and additional calculated parameters
were used to obtain data vectors. Human experts provided the teaching inputs for
both networks for the six sleep classes, and a 65% to 80% rate of correct
classification was obtained.
In the area of heat stress detection and sleep-wake stage recognition in an animal
model (rats), the application of a backpropagation learning scheme and EEG power
spectra has been reported [39, 4]. A method has been presented for the effective use
of ANFIS in establishing EEG power spectra, EOG and EMG activity as an index of
stress in a hot environment [4]. The power spectrum data for slow wave sleep, rapid
eye movement sleep and awake states were acquired from three groups of rats (acute
heat stress, chronic heat stress and normal). The ANFIS was found effective in
recognizing the heat stress level with an average of 89% accuracy.
The effect of acute and chronic heat exposure on the frequency of EEG components
in different sleep-wake states in young moving rats has been investigated [165, 166].
The observations suggest that the higher-frequency components of the EEG power
spectrum are very sensitive to a hot environment and change significantly in all three
sleep-wake states, in comparison with control subjects, following acute as well as
chronic exposure to heat stress. The study [165] demonstrated that the cortical EEG
is sensitive to environmental heat, and that alterations in EEG frequencies in different
states of mental consciousness due to high heat can be differentiated efficiently by
EEG power spectrum analysis. With the features extracted from EEG
power spectra of stressed and normal rats, a neuro-fuzzy approach [39] was found to
differentiate stressed from normal patterns following acute (95.79% in SWS, 93.8%
in REM sleep, 81% in the AWAKE state) as well as chronic heat exposure (95.24%
in SWS, 84.62% in REM sleep, 83.33% in the AWAKE state).

2.11 Heat stress


Environmental heat is one of the natural stress conditions that produces stress at all
levels of biological organization. It creates complex changes in the physiology of
nervous, endocrine, neurohumoral and motor functions of homoeothermic animals,
which are essential to restore constant body temperature and to adjust fluid balance,
energy metabolism and behavior to the needs of survival in a hot environment.
Animals, when subjected to a hot environment, respond by activating different
physiological processes. The intensity and duration of the exposure, and the
adaptations to the hot environment, play an important role in changing many
physiological processes and determine the level of thermoregulatory activity [177],
which influences the performance of all animals including man. These adaptational
adjustments can also give way to more prolonged metabolic changes associated with
growth and reproduction during life in a hot climate. However, the primary response
to heat is the immediate rise in body temperature, which plays the central role in
stimulating the mechanisms necessary for heat dissipation, vasodilation and
sweating. The relationship of the temperature rise to the regulatory system and the
control feedback mechanism has already been analyzed very well. The effects of a
hot environment on different stress markers in the same animal model have been
demonstrated recently [4].
The hypothalamus is the chief center in the brain having regulatory action over
the body temperature [178, 179]. It utilizes sensory information from core, muscle,
skin and chemoreceptors to control sweating mechanisms, vasomotor changes in
blood vessels and motor neurons of the muscles, which in turn affect the temperature
in the body itself. The thermal environment serves as an external driver of this
regulating system. The hypothalamus is responsible for further heat exchange with the
environment by increasing the heart rate in order to increase the blood flow to skin
and sweating is initiated in order to enhance evaporative heat loss. The strain of heat
exposure is related quantitatively to the hypothalamus in the equilibrium temperature
attained and in the increases in thermal conductance and sweat output for evaporative
loss. Re-establishment of body temperature in the face of heat gain depends only to a
minor extent on depression of metabolic heat production.
Lechin et al. [180] gave additional support favoring the diagnosis of stress in acute
heat conditions. Bedrak et al. [181] reported that a single acute exposure of normal
resting subjects to environmental heat stress caused fibrinolytic activity in both
whole-blood and plasma systems. However, the balance of body fluids and salts
plays a major role in heat-related illnesses called heat disorders, a group of physically
related illnesses caused by exposure to hot temperatures, restricted fluid intake, or
failure of the body’s temperature regulation mechanisms.
Automatic sleep stage scoring in humans was also attempted using a multilayer
feedforward network, with all-night spectral analysis for the background activity of
the EEG and sleep pattern detectors for the transient activity [182, 183]. By adding
expert supervision for ambiguous and unknown epochs, detected by computation of
an uncertainty index and unknown rejection, the automatic/expert agreement
improved from 82.3% to 90%. An automatic procedure for online recognition of
REM sleep in depressive patients was discussed, applying a generalized
backpropagation ANN to preprocessed single-channel EEG activity [184]. EOG and
EMG information was not provided as input to the network. The sleep profile was
scored manually and served as the desired output during the training period and as
the standard for judging the network output during the working mode. Between
84.9% and 88.6% of continuous EEG activity was correctly classified.
In another effort to classify sleep patterns, five types of characteristic waves
(spindles, humps, alpha waves, hump trains and background waves) were recognized
with the help of a new type of neural network model referred to as a sleep-EEG
recognition neural network (SRNN) [21]. However, this network was not tried on
several other kinds of important characteristic waves in sleep, which are necessary
for diagnosing sleep stages.
The review of sleep classification results shows that the correct recognition rates
were good in recognizing different sleep patterns. The computational and learning
ability of ANFIS, however, indicates much greater potential in recognizing different
sleep patterns. The review of literature suggests that no work has been reported that
classifies heat-stressed conditions against normal subjects with the help of ANFIS for
sleep stage classification, followed by fuzzy logic for heat stress level detection and
analysis using frequency analysis of EEG. The works reported so far used supervised
learning schemes for heat stress detection and sleep-EEG analysis of an animal model
[39, 4]. A similar animal model was used to classify EEG power spectra in
psychological conditions and to compare depressed with control rats [23, 24, 25].

2.11.1 EEG changes in hot environment


A number of reports on the effect of artificially induced heat on brain electrical
activities have been published in the recent past, demonstrating overall pattern
changes in EEG activity as a function of ambient temperature. Ambient
environmental heat increases the body temperature, whether spontaneously or
artificially, producing an initial increase in EEG frequencies. With a continued rise in
heat, the wave amplitude was observed to increase with a decrease in the dominant
frequency [185]. In hyperthermia-induced fever, convulsive patterns in the EEG
waves were recorded, and the EEG abnormalities were observed to persist even after
body temperature returned to normal. Frantzen et al. [186] reported that the changes
due to hyperthermia alone last no longer than the elevated temperature itself. If the
elevation of temperature was maintained for a long duration at higher ambient
temperature, a transient major reduction in EEG activity was observed [187].
Morimoto et al. [188] showed that in freely moving rats hyperthermia initially
induces theta bursts, after some time accompanied by small spikes and finally typical
spike-and-wave bursts. These sequential changes in EEG suggest that brain function
varies in a stepwise manner during hyperthermia. Hot water poured over the head
causes spontaneous seizures initiated by hyperthermic stimulation; the EEG
recordings show febrile convulsions similar to those in hyperthermia, differing only
with respect to the stimulus and the rate of rise in temperature
in susceptible subjects [189, 190, 191]. Studies on the effect of hyperthermia on brain
auditory evoked potentials demonstrate that hyperthermia in conscious animals
produces potentially damaging effects on the CNS when a critical brain temperature
is exceeded [192]. Other observations show that pattern-reversal evoked potential
latencies are strongly influenced, with changes in the evoked potentials [193, 194],
following exposure to high environmental heat.
Spectral analyses of EEG in high environmental heat showed the fall and rise of
cortical activities with alterations in the temperature [195]. An increase in fast waves
(18-25 Hz), sometimes irregular and dysrhythmic, over a background of low-voltage
activities has been recorded as the body temperature increases. Later during
continuous exposure, the EEG frequencies slowed to 7-10 Hz, and at higher
temperature waves of 0.5-3 Hz with variable amplitude were observed. Yamada
[196] reported that heating of the whole body, or of the head alone, up to 45°C
increased the peak frequency of the EEG power spectra. At higher temperatures,
high-amplitude rhythmic slow-wave bursts were also reported.
The states of and the changes in EEG are of great importance in a wide range of
normal and pathological conditions. In psychiatry, EEG has been established as a
widely used tool, especially in spike detection. The detection and analysis of EEG in
sleep have also been considered important, as they provide suggestions of the origin
and mechanism of EEG in different sleep stages. EEG has already been characterized
visually and regarded as an important index for the classification of sleep [197] and
in the study of aging [198]. Computer analysis of EEG offers advantages in the
information it provides about the spectral components of the waveforms. Since
typical long-term EEG records extend over several hours, while epilepsy may be
characterized by occasional events and sleep represents cyclic activity, the FFT
provides an important data reduction tool for electroencephalographers. The
frequency components of EEG signals are very important, and spectral estimates by
FFT are most widely used to study the pattern features of EEG signals and to analyze
background activities [199]. The location and power of resonant peaks are a
quantitative presentation of the main frequency components that constitute the
individual EEG [200]. Thus, the clinical
value of EEG power spectra as a noninvasive tool has been well established for the
investigation of epilepsy, spikes and wave bursts [201, 202], and the results reported
are invariably very good. Work on the analysis of EEG power spectra in different
sleep-wake states has been reported only sparsely. However, it is now well
established that stressful conditions under a hot environment are responsible for
changes in brain functions that alter the brain electrical activities, or EEG, as well.
The changes in EEG power spectra due to stress, especially under natural conditions
such as high environmental heat, have not been well analyzed so far and thus need to
be studied in a systematic manner.
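As an illustration of this kind of data reduction, the following is a minimal sketch of an FFT-based power-spectrum estimate and band-power summary for a single EEG epoch; the sampling rate and the band edges are conventional values assumed here, not parameters from the studies cited above.

```python
import numpy as np

def band_powers(epoch, fs=256.0):
    """Estimate EEG band powers for one epoch from the FFT periodogram."""
    freqs = np.fft.rfftfreq(epoch.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(epoch)) ** 2 / (fs * epoch.size)   # periodogram
    bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in bands.items()}

# Example: 2 s of synthetic "EEG" dominated by a 10 Hz (alpha) rhythm
t = np.arange(0, 2, 1 / 256.0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(t.size)
print(band_powers(eeg))   # the alpha band should dominate
```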

2.11.2 Changes in sleep stages in hot environment


Some research has been published on changes in sleep stages following exposure to
heat in humans as well as in rats. Haskell et al. [203] studied the effect of ambient
heat on human sleep stages at different temperatures (21, 24, 29, 34 and 37°C). The
report displayed a significant quadratic trend for nearly every sleep variable, and
marked individual differences in the sensitivity of sleep to ambient environmental
temperature were observed. The results show that REM sleep was the only stage
significantly reduced by ambient heat, probably due to a general disruption of sleep
processes rather than anything specifically related to the status of the
thermoregulatory system during REM sleep.
Horne & Reid [204] reported the effect of warm bath on sleep EEG on
women, who took a bath for 90 minutes. After hot bath, there was significant increase
in sleepiness at bedtime and SWS observed with reduction in REM sleep. The EEG
sleep patterns were also compared following body heating (1 hour immersion in water
at 41°C) at each of four times a day: morning, afternoon, early evening and late
evening, ending just prior to sleep [205]. Effects on manually stage indices of SWS
were confined to increase following late evening heating and heating just prior to
sleep, resulted in substantial reduction in latency of REM sleep period. Sleep onset
time was also found reduced by heating. The results indicate that body heating

73
Review of Literature

induces temporary changes that affect sleep propensity and both the quantity and
temporal distribution of the sleep EEG.
Libert et al. [206] showed sleep disturbances at high ambient temperature (35°C) in
young men. Subjects were exposed to a thermoneutral environment at 20°C for five
days and nights, followed by an acclimation period of five days and nights at 35°C
and two recovery days and nights at 20°C. It was observed that this chronic exposure
to environmental heat stress reduced the total sleep time, while the amount of
wakefulness increased, and the subjects exhibited fragmented sleep patterns. In the
acclimation period, there was no change in sleep pattern from night to night. The
protective mechanisms of deep body temperature were not activated, as the heat
adaptation did not interact with sleep processes.
Reports have also been published on the effect of high environmental heat on the
sleep stages of rats. Obal Jr. et al. [207] reported a reduction in waking episodes and a
shift toward more SWS and REM sleep in adult rats after elevation of the ambient
temperature from 22°C to 29°C. Acute heat exposure at 32°C elicited an increase in
rectal temperature, and long-term heat load induced persistent, albeit slight,
enhancements of non-REM sleep in young rats (26 days old) [208]. REM sleep
increased with a 12-hour delay during the 24-hour heat load. Heat elicited an
immediate large increase in SWS, which was not followed by subsequent alterations
in sleep when the ambient temperature returned to normal; this was interpreted as
suggesting that heat increases non-REM sleep in young rats. When normal rats were
exposed to different temperatures (18, 24 and 30°C), increased SWS was observed at
the higher temperature compared with the lower temperatures. The increases in the
total amount of REM sleep, in the number of REM episodes and in their mean
duration were also higher at the higher temperature [209]. The increase in sleep with
increasing temperature might be considered an adaptation to thermal load aimed at
energy conservation.
To assess the effect of high environmental heat or other stress on sleep variables,
long-term polygraphic recording is essential. But the conventional form of recording
is not very useful for definitive diagnosis, as analyses of the sleep records are
laborious, time consuming and demand an electroencephalographer’s skill.
Computers, digital filters and several other signal processing techniques are applied
to quantify polygraphic sleep recordings and thereby ease clinical utility. Attempts to
develop automated systems using computational and signal-processing tools have
highlighted the difficulties faced by electroencephalographers, particularly in the
determination of the parameters for sleep classification. However, in the most
fundamental terms, sleep stages are manifested by sequential changes in the
frequency and amplitude of polygraphic bioelectric signals. Sleep-EEG classification
using frequency-amplitude features is hardly a new concept, and virtually all
computational methods try to capitalize on this notion. ANFIS is an area of computer
science that has been used very efficiently in several pattern recognition tasks and
may be helpful in the development of an automated system. Recently, it has also been
applied to classify different sleep-wake patterns, but the recognition rate was not
very good. Various architectures of ANFIS have therefore been tried to achieve a
better performance in the classification of sleep stages and heat stress conditions. On
the other hand, no work has been reported that classifies stressful events by ANFIS,
mainly by means of sleep-EEG changes.
A method for computerized detection of heat stress is presented here and tested on
pre-recorded data from a range of subjects. The physiological changes that occur in
the subjects are incorporated as fuzzy logic to distinguish the stress level of the
subjects as chronic or acute stress. First, sleep stage classification is performed with
the help of rules established from the AASM sleep manual. After the sleep stage
classification is done, the data are further classified as chronic or acute stress with
respect to their control states. The proposed algorithm achieves an average of 89%
accuracy for different sleep stages and stress levels.