
INTRODUCTION

Image fusion is the process by which two or more images are combined into
a single image that retains the important features from each of the original images.
The fusion of images is often required for images of the same scene or objects
acquired from different instrument modalities or capture techniques.
Important applications of image fusion include medical imaging,
microscopic imaging, remote sensing, computer vision, and robotics. Fusion
techniques range from the simplest method of pixel averaging to more complicated
methods such as principal component analysis and wavelet transform fusion.
Several approaches to image fusion can be distinguished, depending on whether
the images are fused in the spatial domain or transformed into another
domain and their transforms fused.

With the development of new imaging sensors arises the need for a
meaningful combination of all employed imaging sources. The actual fusion
process can take place at different levels of information representation; a generic
categorization is to consider the levels, in ascending order of abstraction, as the
signal, pixel, feature and symbolic level. This project focuses on the so-called
pixel-level fusion process, where a composite image has to be built from several input
images. To date, the result of pixel-level image fusion is intended primarily for
presentation to a human observer, especially in image sequence fusion (where the
input data consists of image sequences). A possible application is the fusion of
forward-looking infrared (FLIR) and low-light visible (LLTV) images obtained by an
airborne sensor platform to help a pilot navigate in poor weather conditions or
darkness. In pixel-level image fusion, some generic requirements can be imposed
on the fusion result. The fusion process should preserve all relevant information of
the input imagery in the composite image (pattern conservation). The fusion
scheme should not introduce any artifacts or inconsistencies which would distract
the human observer or subsequent processing stages. The fusion process should be
shift and rotation invariant, i.e. the fusion result should not depend on the
location or orientation of an object in the input imagery. In the case of image sequence
fusion, the additional problem of temporal stability and consistency of the
fused image sequence arises. The human visual system is primarily sensitive to moving
light stimuli, so moving artifacts or time-dependent contrast changes introduced by
the fusion process are highly distracting to the human observer. So, in the case of
image sequence fusion, two additional requirements apply. Temporal stability:
the fused image sequence should be temporally stable, i.e. gray level changes in
the fused sequence must only be caused by gray level changes in the input
sequences, and must not be introduced by the fusion scheme itself. Temporal
consistency: gray level changes occurring in the input sequences must be present
in the fused sequence without any delay or contrast change.

1.1 FUSION METHODS

1.1.1 Introduction

The following subsections summarize several approaches to the pixel-level fusion of
spatially registered input images. Most of these methods have been developed for
the fusion of stationary input images (such as multispectral satellite imagery). Due
to the static nature of the input data, temporal aspects arising in the fusion of
image sequences, e.g. stability and consistency, are not addressed.
A generic categorization of image fusion methods is the following:

 linear superposition
 nonlinear methods
 optimization approaches
 artificial neural networks
 image pyramids
 wavelet transform
 generic multiresolution fusion scheme

1.1.2 Linear Superposition

Probably the most straightforward way to build a fused image from several
input frames is to perform the fusion as a weighted superposition of all input
frames.
The optimal weighting coefficients, with respect to information content and
redundancy removal, can be determined by a principal component analysis (PCA)
of all input intensities: performing a PCA of the covariance matrix of the input
intensities, the weighting for each input frame is obtained from the eigenvector
corresponding to the largest eigenvalue.
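
As a minimal sketch of this procedure (assuming two registered grayscale inputs; the file names 'im1.png' and 'im2.png' are placeholders):

```matlab
% Minimal PCA-weighted superposition sketch. 'im1.png'/'im2.png' are
% placeholder file names for two registered grayscale inputs.
A = im2double(imread('im1.png'));
B = im2double(imread('im2.png'));

C = cov([A(:) B(:)]);                   % 2x2 covariance of input intensities
[V, D] = eig(C);
[~, k] = max(diag(D));                  % eigenvector of the largest eigenvalue
w = abs(V(:,k)) / sum(abs(V(:,k)));     % normalize into superposition weights

F = w(1)*A + w(2)*B;                    % weighted superposition
imshow(F)
```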
A similar procedure is the linear combination of all inputs in a pre-chosen
colorspace (e.g. R-G-B or H-S-V), leading to a false color representation of the
fused image.

1.1.3 Nonlinear Methods

Another simple approach to image fusion is to build the fused image by the
application of a simple nonlinear operator such as max or min. If the bright objects
are of interest in all input images, a good choice is to compute the fused image by
a pixel-by-pixel application of the maximum operator.
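
A sketch of this rule, under the same assumptions and placeholder file names as above:

```matlab
% Pixel-by-pixel maximum: keeps the brighter input at every pixel
% (min would favour dark objects instead); placeholder file names.
A = im2double(imread('im1.png'));
B = im2double(imread('im2.png'));
F = max(A, B);
imshow(F)
```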

An extension of this approach follows from the introduction of morphological
operators such as opening or closing. One application is the use of conditional
morphological operators: highly reliable 'core' features present in both images and
a set of 'potential' features present only in one source are defined, and the actual
fusion is performed by the application of conditional erosion and dilation
operators.

A further extension of this approach is image algebra, a high-level
algebraic extension of image morphology designed to describe all image
processing operations. The basic types defined in image algebra are value sets,
coordinate sets (which allow the integration of different resolutions and
tessellations), images and templates. For each basic type, binary and unary
operations are defined, ranging from basic set operations to more complex
ones for operations on images and templates. Image algebra has been used in
a generic way to combine multisensor images.

1.1.4 Optimization Approaches

In this approach to image fusion, the fusion task is expressed as a
Bayesian optimization problem. Using the multisensor image data and an a priori
model of the fusion result, the goal is to find the fused image which maximizes the
a posteriori probability. Since this problem cannot be solved in general, some
simplifications are introduced: all input images are modeled as Markov random
fields to define an energy function which describes the fusion goal. Due to the
equivalence of Gibbs random fields and Markov random fields, this energy
function can be expressed as a sum of so-called clique potentials, where only
pixels in a predefined neighborhood affect the actual pixel.

The fusion task then consists of a minimization of the energy function
(equivalently, a maximization of the a posteriori probability). Since this energy
function will be non-convex in general, stochastic optimization procedures such as
simulated annealing, or modifications like iterated conditional modes, are typically
used.

1.1.5 Artificial Neural Networks

Inspired by the fusion of different sensor signals in biological systems, many
researchers have employed artificial neural networks for pixel-level image fusion.
The most popular example of the fusion of different imaging sensors in biological
systems was described by Newman and Hartline in the 1980s: rattlesnakes (and
the general family of pit vipers) possess so-called pit organs which are sensitive to
thermal radiation through a dense network of nerve fibers. The output of these pit
organs is fed to the optic tectum, where it is combined with the nerve signals
obtained from the eyes. Newman and Hartline distinguished six different types of
bimodal neurons merging the two signals based on a sophisticated combination of
suppression and enhancement. Several researchers have modeled this fusion
process.

1.1.6 Image Pyramids

Image pyramids were initially described for multiresolution image
analysis and as a model for binocular fusion in human vision. A generic image
pyramid is a sequence of images in which each image is constructed by low-pass
filtering and subsampling its predecessor. Due to the sampling, the image size is
halved in both spatial directions at each level of the decomposition process,
leading to a multiresolution signal representation. The difference between the
input image and the filtered image is necessary to allow an exact reconstruction
from the pyramidal representation. The image pyramid approach thus leads to a
signal representation with two pyramids: the smoothing pyramid containing the
averaged pixel values, and the difference pyramid containing the pixel differences,
i.e. the edges. The difference pyramid can therefore be viewed as a multiresolution
edge representation of the input image.
The actual fusion process can be described by a generic multiresolution
fusion scheme which is applicable both to image pyramids and the wavelet
approach. There are several modifications of the generic pyramid construction
method described above. Some authors propose the computation of nonlinear
pyramids, such as the ratio and contrast pyramids, where the multiscale edge
representation is computed by a pixel-by-pixel division of neighboring resolutions.
A further modification is to substitute the linear filters with morphological nonlinear
filters, resulting in the morphological pyramid. Another type of image pyramid, the
gradient pyramid, results if the input image is decomposed into its directional
edge representation using directional derivative filters.
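
The two-pyramid representation and its exact reconstruction can be sketched as follows; imresize stands in for the usual binomial low-pass filter and resampling steps, and the file name is a placeholder:

```matlab
% Smoothing and difference pyramids with exact reconstruction. imresize
% stands in for the usual binomial low-pass filter plus resampling.
G  = {im2double(imread('im1.png'))};     % level 1 of the smoothing pyramid
K  = 4;                                  % number of decomposition levels
Dp = cell(1, K);                         % difference (edge) pyramid
for k = 1:K
    G{k+1} = imresize(G{k}, 0.5);                  % filter + subsample
    Dp{k}  = G{k} - imresize(G{k+1}, size(G{k}));  % edge (difference) level
end

R = G{K+1};                              % reconstruct, coarse to fine
for k = K:-1:1
    R = Dp{k} + imresize(R, size(Dp{k}));
end
max(abs(R(:) - G{1}(:)))                 % 0: reconstruction is exact
```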

1.1.7 Wavelet Transform

A signal analysis method similar to image pyramids is the discrete wavelet
transform. The main difference is that while image pyramids lead to an
overcomplete set of transform coefficients, the wavelet transform results in a
nonredundant image representation. The discrete two-dimensional wavelet transform is
computed by the recursive application of low-pass and high-pass filters in each
direction of the input image (i.e. rows and columns), followed by subsampling.
Details on this scheme can be found in the references. One major
drawback of the wavelet transform when applied to image fusion is its well-known
shift dependency, i.e. a simple shift of the input signal may lead to completely
different transform coefficients. This results in inconsistent fused images when
used for image sequence fusion. To overcome the shift dependency of the
wavelet fusion scheme, the input images must be decomposed into a shift
invariant representation. There are several ways to achieve this. The
straightforward way is to compute the wavelet transform for all possible circular
shifts of the input signal; in this case not all shifts are necessary, and it is possible
to develop an efficient computation scheme for the resulting wavelet
representation. Another simple approach is to drop the subsampling in the
decomposition process and instead modify the filters at each decomposition level,
resulting in a highly redundant signal representation.
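
The second route can be sketched with the stationary (undecimated) wavelet transform of the Matlab Wavelet Toolbox; cameraman.tif is a stock test image, and the image size must be divisible by 2^N:

```matlab
% Undecimated decomposition via the stationary wavelet transform
% (swt2/iswt2, Wavelet Toolbox); image size must be divisible by 2^N.
X = im2double(imread('cameraman.tif'));   % 256x256 stock test image
N = 2;                                    % decomposition levels
[A, H, V, D] = swt2(X, N, 'db2');         % subbands keep the full image size
Xr = iswt2(A, H, V, D, 'db2');            % reconstruction
max(abs(X(:) - Xr(:)))                    % ~0: the representation is invertible
```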

The actual fusion process can be described by a generic multiresolution
fusion scheme which is applicable both to image pyramids and the wavelet
approach.

1.1.8 Generic Multiresolution Fusion Scheme

The basic idea of the generic multiresolution fusion scheme is motivated by
the fact that the human visual system is primarily sensitive to local contrast
changes, i.e. edges. Motivated by this insight, and bearing in mind that both image
pyramids and the wavelet transform result in a multiresolution edge
representation, it is straightforward to build the fused image as a fused multiscale
edge representation. The fusion process is summarized as follows: in the first
step the input images are decomposed into their multiscale edge representation,
using either an image pyramid or a wavelet transform. The actual fusion
takes place in the difference (resp. wavelet) domain, where the fused
multiscale representation is built by a pixel-by-pixel selection of the coefficients
with maximum magnitude. Finally, the fused image is computed by applying
the appropriate reconstruction scheme.
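
A compact sketch of this scheme in its wavelet variant (placeholder file names; registered inputs of equal size assumed):

```matlab
% Generic multiresolution fusion, wavelet variant: decompose, select the
% maximum-magnitude coefficient at every position, reconstruct.
A = im2double(imread('im1.png'));
B = im2double(imread('im2.png'));
wname = 'db2';  K = 4;

[C1, S] = wavedec2(A, K, wname);          % multiscale edge representations
[C2, ~] = wavedec2(B, K, wname);

pick = abs(C1) >= abs(C2);                % pixel-by-pixel maximum magnitude
Cf   = C1 .* pick + C2 .* ~pick;          % (the coarsest approximation band
                                          % is often averaged instead)
F = waverec2(Cf, S, wname);               % reconstruct the fused image
imshow(F)
```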

Fig. 1 Block diagram of the basic image fusion process

AIM OF THE PROJECT

2.1 NEW IMAGE FUSION ALGORITHM

This project adopts a multiresolution discrete wavelet frame
transform together with a fuzzy region-feature fusion scheme to implement the selection of
source image wavelet coefficients. Fig. 1 shows the framework of the proposed image
fusion algorithm. The first step is to choose as the object image the image that
reflects the object and background more clearly than the other image. The second step is
to decompose the source images into a multiresolution representation; at each
level, the low frequency band is passed on to the next level of decomposition. The low
frequency bands of the object image are segmented into region images. The third
step is to define the attributes of the regions by region features, such as the
mean gray level in a region; each pixel then has its own membership
value. Then, using an attribute-based region fusion scheme combined with the
membership value of each pixel, the multiresolution representation of the fusion
result is obtained through a defuzzification process. The final step is to apply the inverse
discrete wavelet frame transform, which yields the final fusion result.

The fusion of images is the process of combining two or more images into a
single image that retains the important features from each. Fusion is an important
technique within many disparate fields such as remote sensing, robotics and
medical applications. Wavelet-based fusion techniques have been reasonably
effective in combining perceptually important image features. Shift invariance of
the wavelet transform is important in ensuring robust subband fusion. Therefore
the novel application of the shift invariant and directionally selective Dual Tree
Complex Wavelet Transform (DT-CWT) to image fusion is now introduced. This
technique provides improved qualitative and quantitative results compared to
previous wavelet fusion methods.

The goals for this Project have been the following.

One goal has been to compile an introduction to the subject of image
fusion. There exist a number of studies on various algorithms, but complete
treatments on a technical level are not as common. Material from the papers, journals,
and conference proceedings that best describe the various parts is used.

Another goal has been to search for algorithms that can be implemented
for image fusion in various applications.

A third goal is to evaluate their performance with different image quality
metrics. These metrics were chosen because they have the greatest impact on
the evaluation of image fusion algorithms.

A final goal has been to design and implement the wavelet-based fuzzy and
neural approaches using Matlab.

2.2 SCOPE OF THE PROJECT

2.2.1 DWT versus DT-CWT

Figures 2.1(a) and 2.1(b) show a pair of multifocus test images that were fused
for a closer comparison of the DWT and DT-CWT methods. Figures 2.1(d) and 2.1(e)
show the results of a simple MS (maximum selection) method using the DWT and DT-CWT,
respectively. These results are clearly superior to the simple pixel averaging result
shown in 2.1(c). They both retain a perceptually acceptable combination of the two
"in focus" areas from each input image. An edge fusion result is also shown for
comparison (figure 2.1(f)) [8]. Upon closer inspection, however, there are residual
ringing artifacts in the DWT fused image that are not found in the DT-CWT
fused image. Using more sophisticated coefficient fusion rules (such as WBV or
WA), the DWT and DT-CWT results were much more difficult to distinguish.
However, the above comparison using a simple MS method reflects the
ability of the DT-CWT to retain edge details without ringing.

Figure 2.1: (a) First image of the multifocus test set. (b) Second image of the
multifocus test set. (c) Fused image using average pixel values. (d) Fused
image using DWT with an MS fuse rule. (e) Fused image using DT-CWT with
an MS fuse rule. (f) Fused image using multiscale edge fusion
(point representations).

2.2.2 Quantitative Comparisons

Often the perceptual quality of the resulting fused image is of prime
importance. In these circumstances comparisons of quantitative quality can often
be misleading or meaningless. However, a few authors [1, 7, 10] have attempted
to generate such measures for applications where their meaning is clearer. Figures
2.1(a) and 2.1(b) reflect such an application: fusion of two images of differing focus to
produce an image of maximum focus. Firstly, a "ground truth" image needs to be
created that can be quantitatively compared to the fusion result images. This is
produced using a simple cut-and-paste technique, physically taking the "in focus"
areas from each image and combining them. The quantitative measure used to
compare the cut-and-paste image to each fused image was taken from [1]:

Figure 2.2: (a) First image (MR) of the medical test set. (b) Second image
(CT) of the medical test set. (c) Fused image using average pixel values. (d)
Fused image using DWT with an MS fuse rule. (e) Fused image using DT-CWT
with an MS fuse rule. (f) Fused image using multiscale edge fusion
(point representations).

ε = sqrt( (1/N) · Σ_{i,j} [ Igt(i,j) − Ifd(i,j) ]² )

where Igt is the cut-and-paste "ground truth" image, Ifd is the fused image and N is
the size (number of pixels) of the image. Lower values of ε indicate greater
similarity between the images Igt and Ifd and therefore more successful fusion in
terms of quantitatively measurable similarity. Table 2.1 shows the results for the
various methods used. The average pixel value method gives a baseline result. The
PCA method gave an equivalent but slightly worse result. These methods give poor
results relative to the others; this was expected, as they have no scale selectivity.
Results were obtained for the DWT methods using all the biorthogonal wavelets
available within the Matlab (5.0) Wavelet Toolbox. Similarly, results were obtained
for the DT-CWT methods using all the shift invariant wavelet filters described in [3].
Results were also calculated for the SIDWT using the Haar wavelet and the
biorthogonal bior2.2 wavelet. Table 2.1 shows the best results over all filters for
each method. For all filters, the DWT results were worse than their DT-CWT
equivalents. Similarly, all the DWT results were worse than their SIDWT
equivalents. This demonstrates the importance of shift invariance in wavelet
transform fusion. The DT-CWT results were also better than the equivalent results
using the SIDWT. This indicates the improvement gained from the added
directional selectivity of the DT-CWT over the SIDWT. The WBV and WA methods
performed better than MS with equivalent transforms, as expected, with WBV
performing best in both cases. All of the wavelet transform results were
decomposed to four levels. In addition, the residual low-pass images were fused
using simple averaging, and the windows for the WA and WBV methods were all set
to 3×3.

Table 2.1: Quantitative results for various fusion methods.

2.3 EFFECT OF WAVELET FILTER CHOICE FOR DWT AND DT-CWT
BASED FUSION

There are many different choices of filters with which to effect the DWT. In
order not to introduce phase distortions, using filters with a linear phase
response is a sensible choice. To retain the perfect reconstruction property, this
necessitates the use of biorthogonal filters. MS fusion results were compared for
all the images in figures 2.1 and 2.2 using all the biorthogonal filters included in the
Matlab (5.0) Wavelet Toolbox. Likewise, there are also many different choices of
filters with which to effect the DT-CWT. MS fusion results were compared for the
same image pairs using all the specially designed filters given in [3].
Qualitatively, all the DWT results gave more ringing artifacts than the equivalent
DT-CWT results. Different choices of DWT filters gave ringing artifacts at different
image locations and scales. The choice of filters for the DT-CWT did not seem to
alter or move the ringing artifacts found within the fused images. The perceived
higher quality of the DT-CWT fusion results compared to the DWT fusion results
was also reflected by the quantitative comparison.

WAVELET TRANSFORM OVERVIEW

3.1 WAVELET TRANSFORM

Wavelets are mathematical functions, defined over a finite interval and
having an average value of zero, that transform data into different frequency
components and represent each component with a resolution matched to its scale.

The basic idea of the wavelet transform is to represent any arbitrary
function as a superposition of a set of such wavelets or basis functions. These
basis functions, or baby wavelets, are obtained from a single prototype wavelet
called the mother wavelet by dilations or contractions (scaling) and translations
(shifts). They have advantages over traditional Fourier methods in analyzing
physical situations where the signal contains discontinuities and sharp spikes.
Many new wavelet applications, such as image compression, turbulence, human
vision, radar, and earthquake prediction, have been developed in recent years. In the
wavelet transform the basis functions are wavelets. Wavelets tend to be irregular and
asymmetric. All wavelet functions, w(2^k t − m), are derived from a single mother
wavelet, w(t). This wavelet is a small wave or pulse like the one shown in Fig. 3.1.

Fig. 3.1 Mother wavelet w(t)

Normally it starts at time t = 0 and ends at t = T. The shifted wavelet w(t − m)
starts at t = m and ends at t = m + T. The scaled wavelets w(2^k t) start at t = 0 and
end at t = T/2^k. Their graphs are w(t) compressed by a factor of 2^k, as shown in
Fig. 3.2. For example, the wavelet for k = 1 is shown in Fig. 3.2(a); for k = 2 and
3, see (b) and (c), respectively.

Fig. 3.2 Scaled wavelets: (a) w(2t), (b) w(4t), (c) w(8t)

The wavelets are called orthogonal when their inner products are zero. The
smaller the scaling factor, the wider the wavelet. Wide wavelets are
comparable to low-frequency sinusoids, and narrow wavelets are comparable to
high-frequency sinusoids.

3.1.1 Scaling

Wavelet analysis produces a time-scale view of a signal. Scaling a wavelet
simply means stretching (or compressing) it. The scale factor is used to express
the compression of wavelets and is often denoted by the letter a. The smaller the
scale factor, the more "compressed" the wavelet. The scale is inversely related to
the frequency of the signal in wavelet analysis.

3.1.2 Shifting

Shifting a wavelet simply means delaying (or hastening) its onset.
Mathematically, delaying a function f(t) by k is represented by f(t − k); the
schematic is shown in Fig. 3.3.

Fig. 3.3 Shifted wavelets: (a) wavelet function Ψ(t), (b) shifted wavelet function Ψ(t − k)

3.1.3 Scale and Frequency

The higher scales correspond to the most "stretched" wavelets. The more
stretched the wavelet, the longer the portion of the signal with which it is being
compared, and thus the coarser the signal features being measured by the
wavelet coefficients. The relation between scale and frequency is shown in
Fig. 3.4.

Fig. 3.4 Scale and frequency (low scale at left, high scale at right)

Thus, there is a correspondence between wavelet scales and frequency as
revealed by wavelet analysis:
• Low scale a ⇒ compressed wavelet ⇒ rapidly changing details ⇒ high frequency.
• High scale a ⇒ stretched wavelet ⇒ slowly changing, coarse features ⇒ low frequency.

3.2 DISCRETE WAVELET TRANSFORM

Calculating wavelet coefficients at every possible scale is a fair amount of
work, and it generates an awful lot of data. If the scales and positions are chosen
based on powers of two, the so-called dyadic scales and positions, then
calculating the wavelet coefficients is efficient and just as accurate. This is the
basis of the discrete wavelet transform (DWT).

3.2.1 One-Stage Filtering

For many signals, the low-frequency content is the most important part; it
gives the signal its identity. The high-frequency content, on the other hand, imparts
detail to the signal. In wavelet analysis, the approximations and details are
obtained after filtering. The approximations are the high-scale, low-frequency
components of the signal. The details are the low-scale, high-frequency
components. The filtering process is represented schematically in Fig. 3.5.

Fig. 3.5 Single stage filtering

The original signal, S, passes through two complementary filters and
emerges as two signals. Unfortunately, this may result in a doubling of samples;
to avoid this, downsampling is introduced. The process on the right, which
includes downsampling, produces the DWT coefficients. The schematic diagram with
real signals inserted is shown in Fig. 3.6.

Fig. 3.6 Decomposition and decimation
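
A minimal illustration of this single decomposition/decimation stage and its inversion, using the Wavelet Toolbox dwt/idwt pair on a synthetic test signal:

```matlab
% One level of complementary filtering with downsampling, then the
% inverse step (upsample and filter) recovering the signal.
s = sin(linspace(0, 8*pi, 1024)) + 0.1*randn(1, 1024);  % example signal
[cA, cD] = dwt(s, 'db4');    % approximation (low-pass) and detail (high-pass)
sr = idwt(cA, cD, 'db4');    % upsample and filter back to full length
max(abs(s - sr))             % ~1e-15: perfect reconstruction
```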

3.2.2 Multiple-Level Decomposition

The decomposition process can be iterated, with successive
approximations being decomposed in turn, so that one signal is broken down into
many lower-resolution components. This is called the wavelet decomposition tree
and is depicted in Fig. 3.7.

Fig. 3.7 Multilevel decomposition

3.2.3 Wavelet Reconstruction

The reconstruction of the image is achieved by the inverse discrete wavelet
transform (IDWT). The values are first upsampled and then passed through the filters.
This is represented as shown in Fig. 3.8.

Fig. 3.8 Wavelet Reconstruction

Wavelet analysis involves filtering and downsampling, whereas the
wavelet reconstruction process consists of upsampling and filtering. Upsampling is
the process of lengthening a signal component by inserting zeros between
samples, as shown in Fig. 3.9.

Fig. 3.9 Reconstruction using upsampling

3.2.4 Reconstructing Approximations and Details

It is possible to reconstruct the original signal from the coefficients of the
approximations and details. The process yields a reconstructed approximation
which has the same length as the original signal and which is a real approximation
of it.

The reconstructed details and approximations are true constituents of the
original signal. Since the raw details and approximations are produced by
downsampling, they are only half the length of the original signal and cannot be
directly combined to reproduce it. It is necessary to reconstruct the approximations
and details before combining them. The reconstructed signal is represented
schematically in Fig. 3.10.

Fig. 3.10 Reconstructed signal components

3.2.5 1-D Wavelet Transform

The generic form of a one-dimensional (1-D) wavelet transform is shown in
Fig. 3.11. Here the signal is passed through a lowpass filter h and a highpass
filter g, then downsampled by a factor of two, constituting one level of the
transform.

Fig. 3.11 1D Wavelet Decomposition.

Repeating the filtering and decimation process on the lowpass branch
outputs produces the multiple levels or "scales" of the wavelet transform. The process
is typically carried out for a finite number of levels K, and the resulting coefficients
are called wavelet coefficients.

The one-dimensional forward wavelet transform is defined by a pair of filters
s and t that are convolved with the data at either the even or the odd locations. The
filters s and t used for the forward transform are called analysis filters:

l_i = Σ_{j=−nL..nL} s_j · x_{2i+j}    and    h_i = Σ_{j=−nH..nH} t_j · x_{2i+1+j}

Although l and h are two separate output streams, together they have the
same total number of coefficients as the original data. The output stream l, commonly
referred to as the low-pass data, may then have the identical process
applied to it again repeatedly. The other output stream, h (the high-pass data), generally
remains untouched. The inverse process expands the two separate low- and high-
pass data streams by inserting zeros between every other sample, convolves the
resulting data streams with two new synthesis filters s′ and t′, and adds them
together to regenerate the original double-size data stream:

y_i = Σ_{j=−nL..nL} s′_j · l′_{i+j} + Σ_{j=−nH..nH} t′_j · h′_{i+j},   where l′_{2i} = l_i, l′_{2i+1} = 0 and h′_{2i+1} = h_i, h′_{2i} = 0

To meet the definition of a wavelet transform, the analysis and synthesis
filters s, t, s′ and t′ must be chosen so that the inverse transform perfectly
reconstructs the original data. Since the wavelet transform maintains the same
number of coefficients as the original data, the transform itself does not provide
any compression. However, the structure provided by the transform and the
expected values of the coefficients give a form that is much more amenable to
compression than the original data. Since the filters s, t, s′ and t′ are chosen to be
perfectly invertible, the wavelet transform itself is lossless. Later application of a
quantization step will cause some data loss and can be used to control the degree
of compression. The forward wavelet-based transform uses a 1-D subband
decomposition process; here a 1-D set of samples is converted into the low-pass
subband (Li) and the high-pass subband (Hi). The low-pass subband represents a
downsampled low-resolution version of the original image. The high-pass
subband represents residual information of the original image, needed for the
perfect reconstruction of the original image from the low-pass subband.

3.3 2-D TRANSFORM HIERARCHY

The 1-D wavelet transform can be extended to a two-dimensional (2-D)
wavelet transform using separable wavelet filters. With separable filters, the 2-D
transform can be computed by applying a 1-D transform to all the rows of the
input and then repeating on all of the columns.
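
This row-column procedure can be written out explicitly with the 1-D dwt (cameraman.tif is a stock test image; the subband naming follows the convention used in this section):

```matlab
% Separable 2-D DWT written out explicitly: 1-D transform along every
% row, then along every column of the row-transformed result.
X = im2double(imread('cameraman.tif'));   % 256x256 stock test image
[M, N] = size(X);
wname = 'haar';

rowsL = zeros(M, N/2);  rowsH = zeros(M, N/2);
for r = 1:M                               % 1-D transform along the rows
    [rowsL(r,:), rowsH(r,:)] = dwt(X(r,:), wname);
end

T  = [rowsL rowsH];                       % row-transformed image
LL = zeros(M/2, N/2); LH = LL; HL = LL; HH = LL;
for c = 1:N/2                             % 1-D transform along the columns
    [LL(:,c), LH(:,c)] = dwt(T(:,c),       wname);  % low-pass half
    [HL(:,c), HH(:,c)] = dwt(T(:,c + N/2), wname);  % high-pass half
end
% LL: low-resolution, LH: horizontal, HL: vertical, HH: diagonal subband
```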

Fig. 3.12 Subband labeling scheme for a one-level, 2-D wavelet transform
(LL1, HL1, LH1, HH1)

The subband notation for a one-level (K = 1), 2-D wavelet transform is shown in
Fig. 3.12. The example is repeated for a three-level (K = 3) wavelet expansion
in Fig. 3.13. In all of the discussion, K represents the highest level of the
decomposition of the wavelet transform.


Fig. 3.13 Subband labeling scheme for a three-level, 2-D wavelet transform

The 2-D subband decomposition is just an extension of the 1-D subband
decomposition. The entire process is carried out by executing the 1-D subband
decomposition twice, first in one direction (horizontal), then in the orthogonal
(vertical) direction. For example, the low-pass subband (Li) resulting from the
horizontal direction is further decomposed in the vertical direction, leading to the LLi
and LHi subbands.

Similarly, the high-pass subband (Hi) is further decomposed into HLi and
HHi. After one level of transform, the image can be further decomposed by
applying the 2-D subband decomposition to the existing LLi subband. This iterative
process results in multiple "transform levels". In Fig. 3.13 the first level of transform
results in LH1, HL1, and HH1, in addition to LL1, which is further decomposed into
LH2, HL2, HH2 and LL2 at the second level, and the information of LL2 is used for the
third-level transform. The subband LLi is a low-resolution subband, and the high-pass
subbands LHi, HLi and HHi are the horizontal, vertical, and diagonal subbands respectively,
since they represent the horizontal, vertical, and diagonal residual information of
the original image. An example of a three-level decomposition into subbands of the
image CASTLE is illustrated in Fig. 3.14.


Fig. 3.14 The process of 2-D wavelet transform applied through three
transform levels

To obtain a two-dimensional wavelet transform, the one-dimensional
transform is applied first along the rows and then along the columns to produce
four subbands: low-resolution, horizontal, vertical, and diagonal. (The vertical
subband is created by applying a horizontal high-pass filter, which yields vertical
edges.) At each level, the wavelet transform can be reapplied to the low-resolution
subband to further decorrelate the image. Fig. 3.15 illustrates the image
decomposition, defining the level and subband conventions used in the AWIC
algorithm. The final configuration contains a small low-resolution subband. In
addition to the various transform levels, the phrase "level 0" is used to refer to the
original image data. When the user requests zero levels of transform, the original
image data (level 0) is treated as a low-pass band and processing follows its
natural flow.

Fig. 3.15 Image decomposition using wavelets: low-resolution subband,
horizontal subband (LH), vertical subband (HL) and diagonal subband (HH)
across levels 1 and 2

The wavelet transform is first performed on each source image, then a fusion
decision map is generated based on a set of fusion rules. The fused wavelet
coefficient map can be constructed from the wavelet coefficients of the source
images according to the fusion decision map. Finally, the fused image is obtained
by performing the inverse wavelet transform.
From the above diagram, we can see that the fusion rules play a
very important role in the fusion process. Some frequently used fusion rules from
previous work are described below.

When constructing each wavelet coefficient for the fused image, we have
to determine which source image describes this coefficient better. This
information is kept in the fusion decision map, which has
the same size as the original image. Each value is the index of the source image
which may be more informative for the corresponding wavelet coefficient. Thus, we
actually make a decision on each coefficient. There are two frequently used
methods in previous research. In order to make the decision on one of the
coefficients of the fused image, one way is to consider only the corresponding
coefficients in the source images, as illustrated by the red pixels. This is called the
pixel-based fusion rule. The other way is to consider not only the corresponding
coefficients but also their close neighbors, say a 3x3 or 5x5 window, as
illustrated by the blue and shaded pixels. This is called the window-based fusion
rule; it exploits the fact that there is usually high correlation among
neighboring pixels.
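
A sketch of such a window-based rule on a single detail subband (placeholder file names; "activity" here is local energy in a 3x3 window):

```matlab
% Window-based rule on one detail subband: take each coefficient from
% the source whose 3x3 neighbourhood has the larger energy (activity).
% Placeholder file names; repeat for the other detail subbands.
[cA1, cH1, cV1, cD1] = dwt2(im2double(imread('im1.png')), 'db2');
[cA2, cH2, cV2, cD2] = dwt2(im2double(imread('im2.png')), 'db2');

w    = ones(3) / 9;                       % 3x3 averaging window
act1 = conv2(cH1.^2, w, 'same');          % local energy around each position
act2 = conv2(cH2.^2, w, 'same');

m   = act1 >= act2;                       % window-based decision map
cHf = cH1 .* m + cH2 .* ~m;               % fused horizontal detail subband
cAf = (cA1 + cA2) / 2;                    % approximation band: average
```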

In our research, we consider that objects carry the information of interest; each
pixel, or small neighborhood of pixels, is just one part of an object. Thus, we propose
a region-based fusion scheme. When making the decision on each coefficient, we
consider not only the corresponding coefficients and their close neighborhoods,
but also the regions the coefficients are in. We regard the regions as representing the
objects of interest. More details of the scheme are provided in the following.

3.4 PROPOSED SCHEME

Neural network and fuzzy logic approaches can be used for sensor fusion.
Such a scheme belongs to the class of sensor fusion in which the features are the
inputs and the decision is the output. Sensor fusion can be achieved with the help
of neuro-fuzzy or fuzzy systems. The system can be trained on the
input data obtained from the sensors. The basic concept is to associate the given
sensory inputs with some decision outputs. After developing the system, another
group of input data is used to evaluate its performance.

The following algorithm and .M file for pixel-level image fusion using fuzzy
logic illustrate the process of defining membership functions and rules for the
image fusion process using the FIS (Fuzzy Inference System) editor of the Fuzzy Logic
Toolbox in Matlab.

3.5 PROPOSED ALGORITHM

STEP 1

 Read the first image into variable M1 and find its size (rows: z1, columns: s1).
 Read the second image into variable M2 and find its size (rows: z2, columns: s2).
 Variables M1 and M2 are images in matrix form where each pixel value is in
the range 0-255. Use a gray colormap.
 Compare the rows and columns of both input images. If the two images are not
of the same size, select portions which are of the same size.

STEP 2
 Apply wavelet decomposition and form the spatial decomposition trees.
 Convert the images to column form with C = z1*s1 entries.

STEP 3
Create a fuzzy inference system of type Mamdani with the following
specifications:
Name: 'c7'
Type: 'mamdani'
AndMethod: 'min'
OrMethod: 'max'
DefuzzMethod: 'centroid'
ImpMethod: 'min'
AggMethod: 'max'

STEP 4
 Decide the number and type of membership functions for both input
images by tuning the membership functions.
 Input images in the antecedent are resolved to a degree of membership
ranging from 0 to 255.
 Make rules for the input images which resolve the two antecedents to a
single number from 0 to 255.

STEP 5
For num = 1 to C in steps of one, apply fuzzification using the rules developed
above to the corresponding pixel values of the input images; this gives a fuzzy
set represented by a membership function and results in the output image in column
format.

Check the rules using the rule viewer and surface viewer.

STEP 6
Convert the column form to matrix form and display the fused image.
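
A hedged sketch of Steps 3-6 using the classic Fuzzy Logic Toolbox API; the membership functions and rules below are illustrative stand-ins for the tuned ones described above, and the input file names are placeholders:

```matlab
% Steps 3-6 with the classic Fuzzy Logic Toolbox API. The two membership
% functions and four rules are illustrative stand-ins for the tuned ones
% described above; 'im1.png'/'im2.png' are placeholder files.
M1 = imread('im1.png');  M2 = imread('im2.png');   % grayscale, equal size
[z1, s1] = size(M1);
v1 = double(M1(:));  v2 = double(M2(:));           % Step 2: column form

fis = newfis('c7');                      % Mamdani; min/max/centroid defaults
fis = addvar(fis, 'input',  'img1', [0 255]);
fis = addmf(fis, 'input', 1, 'dark',   'gaussmf', [60 0]);
fis = addmf(fis, 'input', 1, 'bright', 'gaussmf', [60 255]);
fis = addvar(fis, 'input',  'img2', [0 255]);
fis = addmf(fis, 'input', 2, 'dark',   'gaussmf', [60 0]);
fis = addmf(fis, 'input', 2, 'bright', 'gaussmf', [60 255]);
fis = addvar(fis, 'output', 'fused', [0 255]);
fis = addmf(fis, 'output', 1, 'dark',   'gaussmf', [60 0]);
fis = addmf(fis, 'output', 1, 'bright', 'gaussmf', [60 255]);

% rule rows: [mf(img1) mf(img2) mf(out) weight connective(1=AND, 2=OR)]
fis = addrule(fis, [1 1 1 1 1; 2 2 2 1 1; 1 2 2 1 2; 2 1 2 1 2]);

fused = reshape(evalfis([v1 v2], fis), z1, s1);    % Steps 5-6
imshow(uint8(fused))
```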

3.7 ALGORITHM USING NEURO FUZZY

STEP 1

 Read the first image into variable M1 and find its size (rows: z1, columns: s1).
 Read the second image into variable M2 and find its size (rows: z2, columns: s2).
 Variables M1 and M2 are images in matrix form where each pixel value is in
the range 0-255. Use a gray colormap.
 Compare the rows and columns of both input images. If the two images are not
of the same size, select portions which are of the same size.

STEP 2
 Apply wavelet decomposition and form the spatial decomposition trees.
 Convert the images to column form with C = z1*s1 entries.

STEP 3

 Form the training data, which is a matrix with three columns whose
entries in each column run from 0 to 255 in steps of 1.
 Form the check data, which is a matrix of the pixels of the two input images
in column format.
 Decide the number and type of membership functions.

 Create a fuzzy inference system of type Mamdani with the following
specifications:
Name: 'c7'
Type: 'mamdani'
AndMethod: 'min'

OrMethod: 'max'
DefuzzMethod: 'centroid'
ImpMethod: 'min'
AggMethod: 'max'

STEP 4
 Decide the number and type of membership functions for both input
images by tuning the membership functions.
 Input images in the antecedent are resolved to a degree of membership
ranging from 0 to 255.
 Make rules for the input images which resolve the two antecedents to a
single number from 0 to 255.

STEP 5
For num = 1 to C in steps of one, apply fuzzification using the rules developed
above to the corresponding pixel values of the input images; this gives a fuzzy
set represented by a membership function and results in the output image in column
format.

STEP 6
 Start training the generated fuzzy inference system with ANFIS, using the
training data.

 Apply fuzzification using the trained FIS and the check data.

 Convert the column form to matrix form and display the fused image.
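
A hedged sketch of this neuro-fuzzy variant with the classic ANFIS API. Note that anfis trains a Sugeno-type FIS (generated here by genfis1), so this is a stand-in for the Mamdani system specified above; the identity training target and the file names are illustrative placeholders only:

```matlab
% Neuro-fuzzy variant with the classic ANFIS API. anfis trains a
% Sugeno-type FIS generated by genfis1, standing in for the Mamdani
% system above; the identity training target and the file names are
% illustrative placeholders only.
M1 = double(imread('im1.png'));  M2 = double(imread('im2.png'));
checkData = [M1(:) M2(:)];                % Step 3: check data, column form

g = (0:255)';                             % Step 3: training data, 0..255
trnData = [g g g];                        % two inputs plus a target column;
                                          % a real target encodes the
                                          % desired fusion behaviour

initFis    = genfis1(trnData, 3, 'gaussmf');   % 3 MFs per input (Step 4)
trainedFis = anfis(trnData, initFis, 20);      % 20 training epochs

fused = reshape(evalfis(checkData, trainedFis), size(M1));  % Steps 5-6
imshow(uint8(fused))
```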

QUANTITATIVE COMPARISONS

4.1 PERFORMANCE EVALUATION OF FUSION

It has been common to evaluate the result of fusion visually. In visual
evaluation, human judgment determines the quality of the image: some
independent and objective observers grade the corresponding image, and the
final grade is obtained by taking the average or a weighted mean of the individual
grades. Obviously this evaluation method has some drawbacks, namely that it is not
accurate and depends on the observers' experience. For an accurate and truthful
assessment of the fusion product, some quantitative measures (indicators) are
required. Two different measures are used in this project to evaluate the results of
the fusion process: information entropy and root mean square error.

4.2 ENTROPY

One of the quantitative measures in digital image processing is entropy.
Claude Shannon introduced the entropy concept in the quantification of the
information content of messages. Although he used entropy in communication, it can
also be employed to measure and quantify the information content of digital images. A
digital image consists of pixels arranged in rows and columns, each pixel
defined by its position and its grey-scale level. For an image consisting of L grey
levels, the entropy is defined as:

E = − Σ_{i=0..L−1} p_i · log2(p_i)

where p_i is the probability (here, the relative frequency) of each grey-scale level. As an
example, a digital image of type uint8 (unsigned 8-bit integer) has 256 different levels,
from 0 (black) to 255 (white). It must be noted that in combined images the number of
levels is very large and the grey-level intensity of each pixel is a decimal (double)
number, but the entropy equation is still valid. For images with
high information content the entropy is large. Larger alternations and changes
in an image give larger entropy, and sharp, focused images have more
changes than blurred, misfocused images. Hence, the entropy is a measure by which to
assess the quality of different aligned images of the same scene.

The root mean square error between the reference image I and the fused
image F is defined as:

RMSE = sqrt( (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} [ I(i,j) − F(i,j) ]² )

where (i, j) denotes the spatial position of a pixel and M and N are the dimensions of
the images. This measure is appropriate for a pair of images containing two
objects. First a reference, everywhere-in-focus image I is taken. Then two images
are produced from this original image: in one image the first object is in focus and
the second is blurred; in the other image the first object is blurred and the second
remains in focus. The fused image should contain both well-focused
objects.
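
Both measures follow directly from their definitions; a short sketch (placeholder file names, uint8 images of equal size assumed):

```matlab
% Entropy and RMSE computed directly from their definitions; the file
% names are placeholders for uint8 images of equal size.
I = imread('reference.png');  F = imread('fused.png');

p = imhist(F) / numel(F);            % grey-level histogram as probabilities
p = p(p > 0);                        % drop empty bins (0*log 0 := 0)
E = -sum(p .* log2(p));              % entropy of the fused image

d    = double(I) - double(F);
rmse = sqrt(mean(d(:).^2));          % RMSE against the reference image
```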

Often the perceptual quality of the resulting fused image is of prime
importance. In these circumstances, comparisons of quantitative quality can often
be misleading or meaningless. However, a few authors [1, 8, 9] have attempted to
generate such measures for applications where their meaning is clearer. Figures
2.1(a) and 2.1(b) reflect such an application: fusion of two images of differing focus
to produce an image of maximum focus. Firstly, a "ground truth" image needs to be
created that can be quantitatively compared to the fusion result images. This is
produced using a simple cut-and-paste technique, physically taking the "in focus"
areas from each image and combining them. The quantitative measure used to
compare the cut-and-paste image to each fused image was taken from [1]:

ε = sqrt( (1/N) · Σ_{i,j} [ Igt(i,j) − Ifd(i,j) ]² )

where Igt is the cut-and-paste "ground truth" image, Ifd is the fused image and N
is the size of the image. Lower values of ε indicate greater similarity between the
images Igt and Ifd and therefore more successful fusion in terms of quantitatively
measurable similarity. Table 2.1 shows the results for the various methods used. The
average pixel value method, the pixel-based PCA and the DWT methods give poor
results relative to the others, as expected. The DT-CWT methods give roughly
equivalent results, although the New-CWT method gave slightly worse results. The
results were, however, very close and should not be taken as indicative, as this is
just one experiment and the transforms produce essentially the same
subband forms. The WBV and WA methods performed better than MS with
equivalent transforms, as expected in most cases. The residual low-pass images
were fused using simple averaging, and the windows for the WA and WBV methods
were all set to 3×3. Table 2.1 shows the best results over all filters available for
each method.

4.3 APPLICATIONS AND TRENDS

4.3.1 Navigation Aid

To allow helicopter pilots to navigate under poor visibility conditions (such as
fog or heavy rain), helicopters are equipped with several imaging sensors, which
can be viewed by the pilot on a helmet-mounted display. A typical sensor suite
includes both a low-light-television (LLTV) sensor and a thermal imaging forward-
looking-infrared (FLIR) sensor. In the current configuration, the pilot can choose one
of the two sensors to watch on his display. A possible improvement is to combine both
imaging sources into a single fused image which contains the relevant image
information from both imaging devices. The images in result 1.1
illustrate this application.

4.3.2 Merging Out-Of-Focus Images

Due to the limited depth of focus of optical lenses (especially those with long
focal lengths), it is often not possible to obtain an image which contains all relevant
objects 'in focus'. One possibility to overcome this problem is to take several
pictures with different focus points and combine them into a single frame
which contains the focused regions of all input images. The images
in result 1.2 illustrate this approach.

4.3.3 Medical Imaging

With the development of new imaging methods in medical diagnostics
arises the need for a meaningful (and spatially correct) combination of all available
image datasets. Imaging devices include computed tomography (CT),
magnetic resonance imaging (MRI) and the newer positron emission tomography
(PET). The images in result 1.3 illustrate the fusion of a CT and an
MRI image.

4.3.4 Remote Sensing

Remote sensing is a typical application of image fusion: modern spectral
scanners gather up to several hundred spectral bands, which can be either
visualized and processed individually or fused into a single image,
depending on the image analysis task. The images in result 1.4 illustrate
the fusion of two bands of a multispectral scanner.

4.4 RESULTS

Fig 4.1 Fusion by Averaging

Fig 4.2 Fusion by Maximum

Fig 4.3 Fusion by Minimum

Fig 4.4 Fusion by PCA

Fig 4.5 Fusion by Averaging

Fig 4.6 Fusion by Averaging

Comparison montage: fusion by averaging, fusion by maximum, fusion by
minimum, and fusion by PCA

CONCLUSION

In this project, the use of the Discrete Wavelet Transform (DWT) and of fuzzy
and neuro-fuzzy approaches for the fusion of images taken by a digital camera was
studied. The pixel-level fusion mechanism was applied to sets of images. All the
results obtained by these methods are valid only when using aligned source images
of the same scene. In order to evaluate the results and compare the methods, two
quantitative assessment criteria, information entropy and root mean square error,
were employed. Experimental results indicated that there are no considerable
differences in performance between the two methods. In fact, if the result of
fusion at each level of decomposition is separately evaluated, visually and
quantitatively in terms of entropy, no considerable difference is observed (Figs.
5, 6, 7, 9 and 11 and Tables 2 and 4). Although some differences were identified at
lower levels, the DWT and the LPT (Laplacian pyramid transform) demonstrated
similar results from decomposition level three onward. Both techniques reach their
best result in terms of information entropy at a decomposition level of three. The
experimental results in Tables 2 and 4 also indicate that the LPT algorithm reaches
its best quality in terms of entropy at lower levels than the DWT. The RMSE values
in Table 6 show that neither the LPT nor the DWT has better performance at all
levels, although the best single result belongs to the LPT method. However,
comparing the RMSE results with the quality and entropy of the fused images
indicates that RMSE cannot be used as a proper criterion to evaluate and compare
fusion results. Finally, the experiments showed that the LPT approach runs faster
than the DWT; LPT takes less than half the time of the DWT and, given the
approximately similar performance, is preferred in real-time applications.
Fuzzy and neuro-fuzzy algorithms have been implemented to fuse a variety of
images; the results of the proposed fusion process are given in terms of entropy and
variance. The fusion has been implemented for medical images and remote
sensing images. It is hoped that the techniques can be extended to color
images and to the fusion of multiple sensor images.

5.1 DWT Fusion

The DWT fusion methods provide computationally efficient image fusion
techniques. Various fusion rules for the selection and combination of subband
coefficients increase the quality (perceptual and quantitatively measurable) of
image fusion in specific applications.

5.2 DT-CWT Fusion

The developed DT-CWT fusion techniques provide better quantitative and
qualitative results than the DWT at the expense of increased computation. The DT-
CWT method is able to retain edge information without significant ringing artifacts.
It is also good at faithfully retaining textures from the input images. All of these
features can be attributed to the increased shift invariance and orientation
selectivity of the DT-CWT compared to the DWT. A previously developed
shift invariant wavelet transform (the SIDWT) has been used for image fusion [7].
However, the SIDWT suffers from excessive redundancy, and it also lacks
the directional selectivity of the DT-CWT. This is reflected in the superior
quantitative results of the DT-CWT (see Table 2.1). Various fusion rules for the
selection and combination of subband coefficients increase the quality (perceptual
and quantitatively measurable) of image fusion in specific applications. The DT-CWT
has the further advantage that phase information is available for analysis. However,
after an initial set of experiments using the notion of phase coherence, no
improvement in fusion performance was achieved.

REFERENCES

[1] Shutao Li, James T. Kwok, Ivor W. Tsang, Yaonan Wang, "Fusing images with
different focuses using support vector machines", IEEE Transactions on Neural
Networks, 15(6):1555-1561, Nov. 2004.

[2] P. J. Burt and R. J. Kolczynski, "Enhanced image capture through fusion", in
Proc. 4th Intl. Conf. on Computer Vision, pages 173-182, Berlin, Germany, May
1993.

[3] Z. Zhang and R. Blum, "A categorization of multiscale-decomposition-based
image fusion schemes with a performance study for a digital camera application",
Proceedings of the IEEE, pages 1315-1328, August 1999.

[4] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code",
IEEE Transactions on Communications, 31(4), pp. 532-540, April 1983.

[5] Shutao Li, James T. Kwok, Yaonan Wang, "Combination of images with diverse
focuses using the spatial frequency", Information Fusion, 2(3):169-176, 2001.

[6] Z. Zhang and R. S. Blum, "Multisensor image fusion using a region-based
wavelet transform approach", Proc. of the DARPA IUW, pp. 1447-1451, 1997.

[7] Pajares, G. and De La Cruz, J. M., "A wavelet-based image fusion tutorial",
Pattern Recognition, 37, pp. 1855-1872, 2004.

[8] MATLAB Wavelet Toolbox User's Guide, The MathWorks, Inc.,
http://www.mathworks.com, August 2005.

[9] H. Wang, J. Peng and W. Wu, "Fusion algorithm for multisensor images based
on discrete multiwavelet transform", IEE Proceedings - Vision, Image and Signal
Processing, vol. 149, no. 5, October 2002.
