
Diffusion-based Image Filtering

Technical Report
Advanced Mining Technology - University of Chile

Alvaro Egaña

October 24, 2012


1 Introduction

When you are looking for patterns in an image, you would like to have some kind of procedure to reduce both redundant and spurious information. In this way you isolate the target object as much as possible from its environment, focusing the search on the object's properties rather than on its relationships with other objects.

1.1 Kernel filters

Spurious information can be defined as random fluctuations in pixel values and is often referred to as noise. There are many techniques to eliminate noise from an image; the most common is neighbourhood averaging, which in most cases means convolving the original image I(x, y) with a kernel K(x, y) to obtain a noise-free image I_c(x, y):

I_c(x, y) = K(x, y) \star I(x, y)    (1.1.1)

In general terms, in the space domain, if the image and the kernel are regarded as a matrix I(x, y) : h × w and a matrix K(x, y) : n × n respectively (n ≪ h, w), the operation can be defined as

I_c(i, j) = \sum_{a=0}^{n-1} \sum_{b=0}^{n-1} I(i - (d - a),\, j - (d - b))\, K(a, b)    (1.1.2)

where

• n is an odd integer.

• d = (n − 1)/2

• d ≤ i ≤ (h − d) and d ≤ j ≤ (w − d)

The resulting matrix I_c(x, y) : (h − n + 1) × (w − n + 1) is smaller than the original one, and there are many approaches, each with pros and cons, to correct this in order to preserve the original image dimensions. One of them is to replicate the borders of the original image to increase its size, so that when it is trimmed by operation 1.1.2 the desired size is obtained.

In practice you want to keep pixel values within the original image domain. To accomplish this, equation 1.1.1 is modified to include a normalisation, as follows:

\tilde{I}_c(x, y) = \frac{I_c(x, y)}{S(K)}    (1.1.3)

where (if K(x, y) : n × n)

S(K) = \sum_{a=0}^{n-1} \sum_{b=0}^{n-1} K(a, b)    (1.1.4)

From now on, unless otherwise indicated, the term convolution (using a kernel) will refer to equation 1.1.3.
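As a minimal sketch of this operation, the fragment below applies a normalised kernel filter with SciPy; the function name, the synthetic image and the border mode are illustrative assumptions, not part of the report.

import numpy as np
from scipy.ndimage import convolve

def normalised_kernel_filter(image, kernel):
    # Equation 1.1.3: convolve with K and divide by S(K) (for smoothing kernels with positive sum).
    kernel = np.asarray(kernel, dtype=float)
    filtered = convolve(image.astype(float), kernel, mode='nearest')  # 'nearest' replicates the borders
    return filtered / kernel.sum()

# Example: the 3x3 mean kernel A from the text applied to a synthetic noisy image.
A = np.ones((3, 3))
noisy = np.random.default_rng(0).normal(128.0, 20.0, size=(64, 64))
smoothed = normalised_kernel_filter(noisy, A)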

Of course the smoothing - i.e., noise reduction - properties depend on the kernel type. For
example, if you apply the kernel

A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}

you are replacing every pixel value by the average of the pixel values in its 3 × 3 neighbourhood. This filter is called the mean filter and produces an effective but very rough smoothing effect. You can control the blurring level by changing the kernel weights; for instance, by applying the kernel Ã you will also produce a smoothed image, but with a different amount of blurring.

\tilde{A} = \begin{pmatrix} 2 & 4 & 2 \\ 4 & 8 & 4 \\ 2 & 4 & 2 \end{pmatrix}

On the contrary, if you apply a kernel like this:

L = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}

which is a second-order-derivative based kernel, you are subtracting the sum of the eight neighbouring pixel values from eight times the central pixel value. This way, in an image region where pixel values are uniform (or have a uniform gradient) this kernel will reduce the pixel value to zero. If there is a discontinuity the result will be non-zero and consequently edges will be enhanced. Figure 1 shows the result of applying these kernels to a sample image.


Figure 1: (a) Original image, (b) After applying kernel A, (c) After applying kernel Ã, (d)
After applying kernel L.

On the other hand, the smoothing level will also depend on the kernel size, since too small kernels will tend to reproduce local variations - resulting in a smoothed version very similar to the original image - while larger kernels will accumulate global values - for instance, a large enough kernel will calculate the average of all image pixels at every pixel. Figure 2 shows the result of applying kernels with the same shape but at different sizes.

1.2 Gaussian filters

Convolution can also be defined in frequency domain and kernels can be best analysed
to understand their smoothing properties in that context. It turns out from this analysis
that one of the most useful shapes for a kernel is that of a Gaussian. For instance, if you
consider a 1-D digital signal f , the convolution with a Gaussian function Kσ is

(K_\sigma \star f)(x) \equiv \int_{\mathbb{R}} K_\sigma(x - y)\, f(y)\, dy    (1.2.1)



Figure 2: (a) Original image, (b) After applying kernel A (size 3x3), (c) After applying
kernel A (size 13x13).

where K_\sigma is a Gaussian function characterised by a standard deviation σ,

K_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}    (1.2.2)

If you define the Fourier transform \mathcal{F} by

(\mathcal{F} f)(w) \equiv \int_{\mathbb{R}} f(x)\, e^{-i 2\pi w x}\, dx    (1.2.3)

then, by the convolution theorem, in the frequency domain you have that

(\mathcal{F}(K_\sigma \star f))(w) = (\mathcal{F} K_\sigma)(w) \cdot (\mathcal{F} f)(w)    (1.2.4)

and since the Fourier transform of a Gaussian is also Gaussian-shaped,

(\mathcal{F} K_\sigma)(w) = e^{-\frac{w^2}{2/\sigma^2}}    (1.2.5)

you can conclude that a Gaussian function is a low-pass filter in the frequency domain. Thus, again by the convolution theorem, you should expect in advance excellent smoothing properties in the space domain. The same analysis can be extended to 2-D signals, i.e. images, in the frequency domain - further details can be found in [8, chapter 6] or in [4, chapter 4]. One way to use these Gaussian filters in the frequency domain would be to apply the discrete Fourier transform (DFT) to the image, multiply the transformed image by the (Gaussian-shaped) transform of the kernel, as equation 1.2.4 suggests, and then go back to the space domain by applying the inverse DFT to the result. But this adds too many steps to the process, and for relatively small kernels it is much more (computationally) efficient to use the process described in section 1.1 in the space domain directly, as shown below.

In space domain, a Gaussian kernel is composed of a set of (ideally integer) weights that
approximate the profile of a Gaussian function along any column, row, or diagonal through
the centre. The 2-D function is also characterized by a standard deviation and calculated
as

G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}    (1.2.6)

where x and y are the distances in pixels from the kernel centre. The following kernel G is calculated for σ = 0.5, using 100 as the central weight:

\sigma = 0.5 \;\Longrightarrow\; G = \begin{pmatrix} 2 & 14 & 2 \\ 14 & 100 & 14 \\ 2 & 14 & 2 \end{pmatrix}

The relationship between σ and the kernel size is largely a practical matter: the size is generally made large enough that adding another row (or column) of terms would only add very small values (ideally zero). Normally, by using this criterion, you will still have zeros at the corners, but modifying equation 1.2.6 to avoid that is very simple. There is no rule to choose the kernel centre value, but you will normally want to keep it small to facilitate computer arithmetic - and therefore to reduce computation time. Of course, if you want better results you should use a floating point centre value, but the process could be rather slower - depending on the value of σ.
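A small sketch of how such an integer kernel can be generated is shown below; it reproduces the σ = 0.5 example above. The function name and the simple rounding choice are assumptions made only for illustration.

import numpy as np

def integer_gaussian_kernel(sigma=0.5, size=3, centre_weight=100):
    # Sample equation 1.2.6 on a size x size grid and scale so the centre equals centre_weight.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))   # the 1/(2*pi*sigma^2) factor cancels after scaling
    return np.round(centre_weight * g).astype(int)

print(integer_gaussian_kernel())
# [[  2  14   2]
#  [ 14 100  14]
#  [  2  14   2]]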

Regarding the latter, one of the most useful things about Gaussian kernels is that there is a strategy to drastically reduce the time cost of the method described by expression 1.1.2. The key concept is that you can separate the function 1.2.6 into its components:

G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}}_{G_\sigma(x)} \cdot \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{y^2}{2\sigma^2}}}_{G_\sigma(y)}    (1.2.7)

This way, with Gσ (x) and Gσ (y), you build two one-dimensional kernels that can be
applied along rows and columns respectively - in general, kernels that allow this strategy
are called separable.
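The separable strategy can be sketched as follows: a 1-D kernel is sampled from G_σ(x) and convolved first along the columns and then along the rows. The truncation rule and the helper names are illustrative assumptions.

import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel_1d(sigma, truncate=3.0):
    # Sample G_sigma(x) out to about +/- truncate*sigma and normalise the weights.
    radius = int(np.ceil(truncate * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def separable_gaussian_filter(image, sigma):
    k = gaussian_kernel_1d(sigma)
    tmp = convolve1d(image.astype(float), k, axis=0, mode='nearest')  # along columns
    return convolve1d(tmp, k, axis=1, mode='nearest')                 # along rows

For comparison, scipy.ndimage.gaussian_filter applies the same separable strategy internally.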

In terms of noise reduction, Gaussian kernels have a very interesting property: because most of the kernel weight is concentrated at the kernel centre and decreases drastically - but smoothly - towards the kernel borders, they tend to preserve edges, or contact zones between objects, better. This is a fundamental property when you are looking for patterns in a noise-reduced image. Figure 3 shows the result of applying a Gaussian kernel of the same size as a mean kernel (kernel A).


Figure 3: (a) Original image, (b) After applying kernel A (size 13x13) , (c) After applying
a Gaussian kernel (size 13x13).

A very concise description of noise reduction, or smoothing, using kernels has been presented so far - further details about this topic can be found either in [8] or in [4].

1.3 The Scale-Space

1.3.1 General concept

As mentioned above, the other important topic when you are looking for patterns in an image is being able to reduce redundant information in some way. This is a rather foggy concept because in principle there is no formal definition of what redundant information is. But a well-known fact is that structures, which can be associated with different information levels, may appear at a large variety of scales within a single image. As pointed out in [11, chapter 1], in most cases it is not possible to know in advance what the right scale to get the target information is. Thus, having a multi-scale image representation arises as a natural need. Moreover, the latter concept gives the possibility of having a hierarchy of structures at different scales, so that information that is redundant at finer levels is not present at coarser levels.

This multi-scale representation of images is called scale-space and a formalism to define it was first introduced by Witkin [14] in 1983. The idea is quite simple and consists of embedding the original image I(x, y) into a family of images \{I_k(x, y) \mid k \ge 0\} such that

I_k(x, y) = \begin{cases} F_k(I(x, y)) & (k > 0) \\ I(x, y) & (k = 0) \end{cases}    (1.3.1)

where \{F_k \mid k > 0\} is a filter bank. Observe that F_0 could be included in the family under the condition F_0(I(x, y)) = I(x, y). Thus, the greater k is, the coarser the level within the scale-space.

One of the most important assumptions about the filter bank is that it must satisfy

Ft+s (I(x, y)) = Ft (Fs (I(x, y))), ∀ t, s > 0 (1.3.2)

This property is called recursivity (other properties can be found in [1, section 2]) and
means that the filtering may be split into a recursive sequence of filter banks. In other
words, relevant information from one level must be carried to coarser levels while redundant
information must be discarded, but target structures must be present at every scale.

In the same way as for noise reduction, the smoothing properties of the filter bank are fundamental, because the filtering should not add new structures or artifacts at coarser levels. For instance, if a mean filter bank were used, you would rapidly have “new boundaries” produced by the blending of finer-level objects, as shown in figure 3(b).


1.3.2 Gaussian Scale-Space

The first and best studied scale-space is obtained when the filter bank is a family of
Gaussian kernels. Thus, the formal definition is:

I_k(x, y) = \begin{cases} G_k(x, y) \star I(x, y) & (k > 0) \\ I(x, y) & (k = 0) \end{cases}    (1.3.3)

In this case there is an explicit relation between σ and k. It is not so hard to prove that this scale-space satisfies the properties mentioned above (among others) in the frequency domain - you can find more details in [11, pp. 7-9]. It has been applied in many areas of image processing with rather good results. Its main limitation is that at coarser levels, i.e. when using larger σ's, you still have the problem of object borders blending. This is the motivation to study other kinds of filters that improve on this drawback and that will be treated in the coming sections.
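A Gaussian scale-space can be sketched by repeatedly filtering the original image with kernels of increasing standard deviation; the mapping from the level index k to σ used below is only an assumption for illustration (the exact relation is made precise in section 3).

import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_scale_space(image, levels, sigma_of_k=lambda k: np.sqrt(2.0 * k)):
    # Returns [I_0, I_1, ..., I_levels], where I_0 is the original image (equation 1.3.3).
    image = image.astype(float)
    family = [image]
    for k in range(1, levels + 1):
        family.append(gaussian_filter(image, sigma=sigma_of_k(k)))
    return family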


2 General Background

Before getting into further details it is worth keeping in mind the “good things” that you expect from a smoothing filter. Perona and Malik [7] enunciated some criteria that they considered to be essential for a multi-scale image representation, and that are in accordance with what was mentioned in the previous section. Thus, a scale-space representation (or a smoothing process) should:

(a) Not add spurious details when passing from finer to coarser scales.

(b) Ensure that at each resolution, region boundaries or contact zones remain sharp and
coincide with semantically meaningful boundaries at that resolution.

(c) Ensure that at all scales, intra-region smoothing occurs preferentially over inter-region smoothing.

In other words, you would like each smoothing filter to be adaptive, in the sense that it is able to detect the local properties of the image in order to decrease the blurring level when an edge or boundary is found and increase it otherwise. Notice that the ideal candidate to be that “local properties detector” for boundaries is ∇I(x, y). Thus, if F_k is a filter at scale k you would like that

Fk (I(x, y)) = fk (∇I(x, y)) (2.0.4)

where

1. f_k → 0 when |∇I| increases. This way you could meet both (b) and (c).

2. f_k increases when |∇I| → 0. This way you could meet (a).

In general, when a function F_k(h) - where h is another function - changes as ∇h changes, it is said to be differential structure-dependent.

2.0.3 Space Regularization

It is worth mentioning here - and the reason will become clear in the next sections - that the main problem of using ∇I as a local predictor is that the differentiation operator is very sensitive to small perturbations in the original data, which can lead to large fluctuations in the derivatives. A problem that has this behaviour is said to be ill-posed - otherwise, it is said to be well-posed. Hence, the need for space regularisation (somewhere during the process of calculating f_k) arises naturally - for instance, a low-pass filter could be applied to I before calculating ∇I. But this strongly depends on the chosen method, and in this report it will be treated in more detail for the case of nonlinear diffusion filters.

2.1 The Diffusion Equation

This topic may sound odd, but it is necessary in order to see how you can find a suitable alternative for expression 2.0.4. Surprisingly, through the study of diffusion process equations, partial differential equation (PDE) theory provides a very elegant framework to find such an expression. That is why, just as a reminder, a concise review of the physical diffusion process is needed here.

As pointed out in [5, chapter 14], heat and mass are transferred if there is a difference of temperature in a medium, or in the concentration of some chemical species in a mixture, respectively. These transfers may occur in two different modes: convection or diffusion. The diffusion process is rather intuitive because it reflects the fact that when such concentration differences are present, a physical process occurs which equilibrates the system by transferring heat or mass (without destroying mass) from the high concentration zones to the low concentration zones.

This equilibration process is modelled by a rate equation called Fick’s Law of Diffusion:

j = −D · ∇u (2.1.1)

where j is the diffusive flux, produced by the gradient ∇u, which represents the amount of heat or mass per unit time and per unit area that is being transferred. The matrix D is positive definite and symmetric and is called the diffusion tensor. Notice that if you replace D by a constant scalar g - called the diffusivity - you have that j and ∇u are parallel (moreover, if ∇u = ∇T you get Fourier's law for heat transfer, where g becomes the thermal conductivity). In this case, the diffusion process is said to be isotropic. But Fick's law is more general than that and j and ∇u do not always need to be parallel - the angle between them depends on the diffusion tensor. This general case is said to be anisotropic.


On the other hand, you have to take into account that, in the whole process, mass must not be destroyed. This can be expressed by the continuity equation:

\frac{\partial u}{\partial t} = -\nabla \cdot j    (2.1.2)

Finally, if you combine equations 2.1.1 and 2.1.2 you end up with the so-called diffusion equation:

\frac{\partial u}{\partial t} = \nabla \cdot (D \cdot \nabla u)    (2.1.3)

Expression 2.1.3 is a PDE that is used to model many physical transport processes. In
image processing you can associate the concentration u with the image value I(x, y) (which
can be either a grey value or a specific channel value) while concentration dependency on
time can be associated to the concept of scale within a scale-space context. This way you
have that if

\frac{\partial I(x, y, k)}{\partial k} = \nabla \cdot (D \cdot \nabla I(x, y, k))    (2.1.4)

then Fk (I(x, y)) would be the solution of this equation (2.1.4). Therefore, the key to get
the behaviour described in 2.0.4 would be to adjust the diffusion tensor D to reproduce
the behaviour of fk . In general, when D remains constant over the whole image domain
the diffusion process is said to be homogeneous; otherwise you have an inhomogeneous diffusion process¹.

The last distinction we need is, for many reasons, the most important. It concerns the diffusion tensor D adapting to the differential structure of the image along the diffusion process (or scale-space), i.e., depending on ∇I(x, y, k) at every time step (or scale) k. When the latter happens, the diffusion process is said to be non-linear. Otherwise, when D remains constant along the time period (or scale-space), the diffusion process is said to be linear. The next sections provide more details about these two key concepts.

¹ Although in the literature there are some divergences about the latter definition, because some authors call isotropic processes homogeneous and anisotropic processes inhomogeneous. In this report, the definitions enunciated in [11] are adopted.


3 Linear Diffusion Filters

Linear diffusion filtering is probably the best investigated PDE-based method for smoothing images and it has been applied in numerous fields in image processing [11, pp. 11-12]. Its most surprising characteristic is that it is equivalent to Gaussian smoothing. This is very interesting because it in some way supports the intuitive idea that Gaussian smoothing “diffuses” (fades) objects into each other in the image. But at the same time, from this point of view, this method has the same problem as Gaussian filters of losing edges when passing to coarser levels in a scale-space. However, unlike Gaussian filtering, linear diffusion filtering provides a rich framework in which to look for a suitable alternative for expression 2.0.4.

3.1 Equivalence to Gaussian Smoothing

When you consider the following 2-D isotropic homogeneous version of equation 2.1.4 - and assume that the initial condition is any bounded function f \in C(\mathbb{R}^2):

\frac{\partial I(x, y, t)}{\partial t} = \nabla \cdot (\nabla I(x, y, t))    (3.1.1)

I(x, y, 0) = f(x, y)    (3.1.2)

it happens that a classical result [3, pp. 238-256] is that this equation (called the equation of heat in two dimensions in the PDE theory literature²) has the formal solution:

I(x, y, t) = \frac{1}{4\pi t} \int_{\mathbb{R}^2} f(\xi, \eta)\, e^{-\frac{(x - \xi)^2 + (y - \eta)^2}{4t}}\, d\xi\, d\eta \quad (t > 0)    (3.1.3)

If you conveniently define:

k(x, y, t) = \frac{1}{4\pi t}\, e^{-\frac{x^2 + y^2}{4t}}    (3.1.4)
2 Notation for ∇ · (∇u) is often ∆u or ∇2 u in image processing literature.


you have that expression 3.1.3 is actually:

I(x, y, t) = \int_{\mathbb{R}^2} f(\xi, \eta)\, k(x - \xi, y - \eta, t)\, d\xi\, d\eta \quad (t > 0)    (3.1.5)

On the other hand, Witkin [14] observed that if you revisit equations 1.2.1 and 1.2.2 to
adapt them to the 2-D case you will have that 2-D Gaussian convolution is:

(K_\sigma \star f)(x, y) \equiv \int_{\mathbb{R}^2} K_\sigma(x - \xi, y - \eta)\, f(\xi, \eta)\, d\xi\, d\eta    (3.1.6)

where,

K_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}    (3.1.7)

From equations 3.1.4 and 3.1.7, you can notice that

k(x, y, t) = K_{\sqrt{2t}}(x, y)    (3.1.8)

This, combined with expressions 3.1.5 and 3.1.6, leads you to see that the solution of the linear isotropic diffusion process (3.1.1) is actually:

I(x, y, t) = \begin{cases} (K_{\sqrt{2t}} \star f)(x, y) & (t > 0) \\ f(x, y) & (t = 0) \end{cases}    (3.1.9)

This solution:

(a) Is unique.

(b) Continuously depends on f with respect to ‖·‖_∞.

(c) Fulfills:

\inf_{\mathbb{R}^2} f \;\le\; I(x, y, t) \;\le\; \sup_{\mathbb{R}^2} f \quad \text{on } \mathbb{R}^2 \times [0, \infty)    (3.1.10)


Thus, an image I(x, y) can be regarded as the function f . This way you can solve equation
3.1.1 and build the following scale-space:

F_k(I(x, y)) = \begin{cases} I(x, y, k) = (K_{\sqrt{2k}} \star I)(x, y) & (k > 0) \\ I(x, y) & (k = 0) \end{cases}    (3.1.11)

which would be the scale space generated by a diffusion process.

On the other hand, if instead you would like to solve equation 3.1.1 numerically, applying it to an image as a single filter and getting a smoothing effect equivalent to that of a Gaussian filter with standard deviation σ, you would have to stop the diffusion process at time

T = \frac{\sigma^2}{2}    (3.1.12)

The latter does not sound like an attractive idea, since expression 3.1.9 provides an analytical solution that can be applied directly, but it gives you an idea of what is happening in terms of the diffusion process in the nonlinear and/or anisotropic case.
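The equivalence can be checked numerically with a small sketch: iterate an explicit scheme for equation 3.1.1 up to T = σ²/2 and compare with a direct Gaussian filtering of the same image. The grid size, time step and boundary handling are assumptions of the example, so the two results only agree approximately.

import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def linear_diffusion(image, sigma, tau=0.2):
    # Explicit heat-equation iterations up to T = sigma^2 / 2 (unit grid; tau <= 0.25 for stability).
    I = image.astype(float)
    steps = int(round((sigma**2 / 2.0) / tau))
    for _ in range(steps):
        I = I + tau * laplace(I, mode='nearest')
    return I

rng = np.random.default_rng(1)
img = rng.normal(128.0, 30.0, size=(64, 64))
diffused = linear_diffusion(img, sigma=3.0)
blurred = gaussian_filter(img, sigma=3.0, mode='nearest')
print(np.max(np.abs(diffused - blurred)))   # the two results should be close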

Finally, equation 3.1.1 was solved for a unitary diffusivity (g = 1). But the same procedure
can be applied for the equation

\frac{\partial I(x, y, t)}{\partial t} = \nabla \cdot (g\, \nabla I(x, y, t))    (3.1.13)

I(x, y, 0) = I(x, y)    (3.1.14)

where g > 0. In this case, unlike Gaussian smoothing, you have an extra parameter to control the blurring level - figure 4 shows the result of this effect. The limitations, as can be seen in figure 4, include that a) it blends borders and b) it dislocates edges from finer to coarser levels. This motivates further improvement of this technique in order to fulfil the “good things” expected from a filter enumerated in section 2.



Figure 4: Linear diffusion filtering (σ = 5). (a) Original image, (b) g = 1, (c) g = 2, (d)
g = 5. Observe that the blurring level can also be controlled with the diffusivity g.

4 Nonlinear Diffusion Filters

The central idea of this kind of filtering method is that:

• The smoothing process is differential structure dependent - i.e., it adapts to the local image conditions. For this reason this method belongs to the category of adaptive smoothing methods, which are well known in image processing.

• The diffusion process is also differential structure dependent. That is, the diffusivity (or the diffusion tensor) adapts to the local image conditions at every time step.

4.1 The Perona-Malik Model

The first formulation to achieve the idea behind the desired properties for fk in expression
2.0.4 using a PDE-based adaptive method was given by Perona and Malik [7] in 1987.
They proposed to replace equation 3.1.13 by a non-linear version:

\frac{\partial I(x, y, t)}{\partial t} = \nabla \cdot \big(g(|\nabla I(x, y, t)|)\, \nabla I(x, y, t)\big)    (4.1.1)

I(x, y, 0) = I(x, y)    (4.1.2)

where the function g : R → R is such that:

• It is a smooth non-increasing function.

• g(0) = 1 and g(s) ≥ 0.

• g(s) → 0 as s → ±∞.

The idea is that if |∇I(x, y, t)| is large then the diffusion, and therefore the smoothing, will be low, thus contact zones (or edges) will be kept, while if |∇I(x, y, t)| is small then it will tend to smooth more around (x, y). You may notice that Perona and Malik refer to their method as anisotropic diffusion but, since the scalar function g does not affect the angle of ∇I(x, y, t), in the terminology of this report it is regarded as an isotropic process. They tested their method using two versions for g:

g(s) = e^{-\left(\frac{s}{\lambda}\right)^2}    (4.1.3)

and

g(s) = \frac{1}{1 + \frac{s^2}{\lambda^2}}    (4.1.4)
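A minimal sketch of the two diffusivities and of the flux function defined next (equation 4.1.5) is given below; the value of λ is an arbitrary choice for illustration.

import numpy as np

def g_exp(s, lam):
    # Diffusivity 4.1.3.
    return np.exp(-(s / lam) ** 2)

def g_rational(s, lam):
    # Diffusivity 4.1.4.
    return 1.0 / (1.0 + (s / lam) ** 2)

def flux(s, g, lam):
    # Flux function Phi(s) = s * g(s) (equation 4.1.5).
    return s * g(s, lam)

s = np.linspace(0.0, 50.0, 501)
lam = 10.0
# Both fluxes grow for small gradients and decay for large ones; lam controls where the turnover occurs.
phi_exp, phi_rational = flux(s, g_exp, lam), flux(s, g_rational, lam)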

To see what role λ plays, it is necessary to define, in general, a flux function:

Φ(s) = sg(s) (4.1.5)

If you consider, without loss of generality (just to simplify notation), the 1-D case and
define:

f'(x, t) = \frac{\partial f(x, t)}{\partial x}

Equation 4.1.1 can be rewritten as:

\frac{\partial I(x, t)}{\partial t} = \big(g(I'(x, t))\, I'(x, t)\big)'
= g'(I'(x, t))\, I''(x, t)\, I'(x, t) + g(I'(x, t))\, I''(x, t)
= \big[g'(I'(x, t))\, I'(x, t) + g(I'(x, t))\big]\, I''(x, t)


Thus³,

\frac{\partial I(x, t)}{\partial t} = \Phi'(I'(x, t))\, I''(x, t)    (4.1.6)

On the other hand, for both expressions (4.1.3 and 4.1.4) the flux function Φ increases up to a maximum controlled by λ and then decreases. Hence, λ plays the role of a contrast parameter. The same procedure can be extended to the 2-D case, so that if |∇I(x, y, t)| ≤ λ the neighbourhood of (x, y) will be regarded as belonging to an interior region, while if |∇I(x, y, t)| > λ the neighbourhood of (x, y) will be considered as an edge.

Perona and Malik's initial results were visually very impressive. The main merit of this original method was that, for the first time in an image smoothing method, edges remained much more stable across the scale-space. But it had two difficulties that could not be bypassed and attracted the attention of many researchers:

A. The method itself is ill-posed because it relies on the image differential structure. This becomes critical when, for instance, the image is noisy - in that case you would have local zones where the gradient is particularly large without being an edge. Perona and Malik were aware of this situation and proposed to smooth the image with a low-pass filter before applying the diffusion process. As Catté et al. [2] pointed out, despite working in practice, this workaround seems to reintroduce the problem it was meant to avoid, since this low-pass filter may cause some edges to disappear in advance.

B. In expression 4.1.6, when |∇I(x, y, t)| > λ you have that Φ' is negative. This makes the PDE backward parabolic, which is known to be not necessarily well-posed.

4.1.1 Regularisation model

In spite of the intrinsic ill-posedness of the Perona and Malik method, Catté et al. [2] proposed to use so-called Gaussian derivatives to smooth the image before calculating the gradient, but only where the latter is used as the argument of g(s). Thus, they modified equation 4.1.1 to be:

³ Because Φ'(s) = s g'(s) + g(s).


\frac{\partial I(x, y, t)}{\partial t} = \nabla \cdot \big(g(|\nabla I_\sigma(x, y, t)|)\, \nabla I(x, y, t)\big)    (4.1.7)

I(x, y, 0) = I(x, y)    (4.1.8)

where \nabla I_\sigma(x, y, t) = K_\sigma \star \nabla I(x, y, t). They also proved that this equation has a solution which is unique and regular.
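The following sketch performs one explicit time step of the regularised model 4.1.7, computing ∇I_σ with Gaussian derivatives and a simple central-difference divergence; it is not the exact discretisation of section 5, and the step size, λ and σ values are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def regularised_perona_malik_step(I, tau=0.2, lam=5.0, sigma=1.0):
    I = I.astype(float)
    # Gaussian derivatives give the regularised gradient of Catte et al.
    Iy_s = gaussian_filter(I, sigma, order=(1, 0))
    Ix_s = gaussian_filter(I, sigma, order=(0, 1))
    g = np.exp(-((Ix_s**2 + Iy_s**2) / lam**2))     # diffusivity 4.1.3 evaluated on |grad I_sigma|
    # Divergence of g * grad(I) on the unsmoothed image, with central differences.
    Iy, Ix = np.gradient(I)
    div = np.gradient(g * Iy, axis=0) + np.gradient(g * Ix, axis=1)
    return I + tau * div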

4.2 Anisotropic Models

Although the Perona and Malik model adapts the diffusivity to the local image properties, there is something that still remains constant so far: the direction of the flux j = -g\nabla I(x, y, t) (because g is scalar-valued, j is parallel to ∇I(x, y, t)). But sometimes you may want to change that flux direction towards the direction of certain features (such as corners, edges, etc.) - for instance, the Perona and Malik model behaves as a linear diffusion filter in intra-regions while the smoothing process is almost completely inhibited at edges, causing a problem when the edges are noisy themselves; therefore, you would want a model that allows you to de-noise edges as well. The only way to get that behaviour is to no longer use a scalar-valued diffusivity, going back to the general diffusion equation 2.1.4 to solve:

\frac{\partial I(x, y, k)}{\partial k} = \nabla \cdot \big(D(\nabla I(x, y)) \cdot \nabla I(x, y, k)\big)    (4.2.1)

I(x, y, 0) = I(x, y)    (4.2.2)

The issue now is to find a suitable form for the diffusion tensor D(∇I(x, y))⁴ in order to change the filter behaviour at contact zones.

4.2.1 The Structure Tensor

If you have in mind that you not only want to detect local properties but also want to identify interesting features, you need a method that takes into account the predominant directions of the gradient - and not only the gradient or the gradient modulus - in the neighbourhood of a specific point. Fortunately there is a mathematical structure called the structure tensor that is designed to do this - further details can be found in [6, chapter 8].

⁴ The expression D(∇I(x, y)) is just a notation to say that D is calculated using the local differential structure of I(x, y).

If you conveniently write

∇I(x, y) = (Ix , Iy ) (4.2.3)

the idea is to build a kind of averaged gradient matrix (very similar to a Jacobian) to look at the gradient behaviour around a point (x, y). First of all, you need to define a window

W(x, y) = \{(x - \varepsilon_x,\, y - \varepsilon_y) \mid (\varepsilon_x, \varepsilon_y) \in P\} \subseteq I(x, y)

where

P \subseteq \{-s_x, \ldots, 0, \ldots, s_x\} \times \{-s_y, \ldots, 0, \ldots, s_y\}, \quad s_x, s_y \in \mathbb{N}

which surrounds (x, y), and a set of window position weights

\{w(\varepsilon_x, \varepsilon_y) \in \mathbb{R} \mid (\varepsilon_x, \varepsilon_y) \in P\} \quad \text{such that} \quad \sum_{(\varepsilon_x, \varepsilon_y) \in P} w(\varepsilon_x, \varepsilon_y) = 1

If you consider the function \varepsilon(p) = (\varepsilon_x, \varepsilon_y) for any point p = (x - \varepsilon_x, y - \varepsilon_y) within a window W(x, y), the general structure tensor is defined as:

J_W(x, y) = \begin{pmatrix} \sum\limits_{p \in W(x,y)} w(\varepsilon(p))\, I_x(p)^2 & \sum\limits_{p \in W(x,y)} w(\varepsilon(p))\, I_x(p)\, I_y(p) \\[6pt] \sum\limits_{p \in W(x,y)} w(\varepsilon(p))\, I_x(p)\, I_y(p) & \sum\limits_{p \in W(x,y)} w(\varepsilon(p))\, I_y(p)^2 \end{pmatrix}    (4.2.4)

or alternatively5 as

J_W(x, y) = \sum_{p \in W(x,y)} w(\varepsilon(p))\, J_0(p)    (4.2.5)

⁵ Both forms are commonly found in the literature.


where

" #
Ix (x, y)2 Ix (x, y)Iy (x, y)
J0 (x, y) = (4.2.6)
Ix (x, y)Iy (x, y) Iy (x, y)2

Application to Filter Design

The core concept relating the matrix J_W to filter design is that its eigenvalues, and their corresponding eigenvectors, completely describe the behaviour of ∇I(x, y, t) within the window W(x, y) for any point (x, y).

If µ1 and µ2 are the eigenvalues - and e1 and e2 are the eigenvectors - it happens that:

• µ_1 ≥ µ_2 ≥ 0 because J_W is positive semidefinite⁶.

• In general, the relative difference between µ_1 and µ_2 indicates the degree of local spatial anisotropy within the window. This attribute is measured with a parameter C_W known as coherence. There are several expressions proposed to calculate C_W. Among them are:

C_W = \left(\frac{\mu_1 - \mu_2}{\mu_1 + \mu_2}\right)^2

or simply

C_W = (\mu_1 - \mu_2)^2

because the former has problems when the gradient has no predominant direction, as mentioned below.

• If µ_1 > µ_2 then e_1 - or −e_1 - represents the direction that is maximally aligned with ∇I(x, y, t) within the window. In particular, if µ_2 = 0 then ∇I(x, y, t) is parallel to e_1 - it is actually a multiple of it.

• If µ_1 = µ_2, the gradient has no predominant direction within the window. The case µ_1 = µ_2 = 0 occurs if and only if ∇I(x, y, t) = 0 within the window.

Regularised Version
⁶ If z = (z_1, z_2)^t \in \mathbb{C}^2 then
z^* \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} z = |z_1 I_x + z_2 I_y|^2 \ge 0


If you just define D = JW (x, y) in equation 4.2.1 you will notice that it is ill-posed -
because it is again differential structure dependent. But fortunately you can use the
same regularisation strategy as for the Perona and Malik model. Firstly, just to simplify
notation, observe that J0 (x, y) can also be expressed as a tensor product of ∇I(x, y):

J0 (x, y) = ∇I(x, y) ⊗ ∇I(x, y) = ∇I(x, y)∇I(x, y)t (4.2.7)

Weickert [10] proposed to use ∇Iσ (x, y) instead of ∇I(x, y) as regularisation scheme. This
way,

J0 (x, y) = ∇Iσ (x, y) ⊗ ∇Iσ (x, y) (4.2.8)

With this change, J_0(x, y) has eigenvectors e_1 and e_2 such that e_1 ∥ ∇I_σ(x, y) and e_2 ⊥ ∇I_σ(x, y). Observe that you already knew that e_1 and e_2 are orthonormal, because J_0(x, y) - and therefore J_W(x, y) - is a real-valued symmetric matrix.

On the other hand, in the definition of J_W(x, y) the window shape has no restrictions. But you may notice that if you replace W by a mean kernel defined by a square matrix W_k : k × k, then the averaging operation is really a convolution in the space domain. Therefore, the structure tensor can also be defined as

J_k(x, y) = W_k \star \big(\nabla I_\sigma(x, y) \otimes \nabla I_\sigma(x, y)\big)    (4.2.9)

Finally, as it was defined, window weights do not need to be constant. That is why a
Gaussian kernel Gρ is often used instead of Wk . Thus, this is the common regularised
version of the structure tensor:

J_{\sigma,\rho}(x, y) = G_\rho \star \big(\nabla I_\sigma(x, y) \otimes \nabla I_\sigma(x, y)\big)    (4.2.10)

Then J_{\sigma,\rho}(x, y) = \begin{pmatrix} j_{11} & j_{12} \\ j_{12} & j_{22} \end{pmatrix} has:


1. Eigenvectors e_1 and e_2 such that

e_1 \parallel \begin{pmatrix} 2 j_{12} \\ j_{22} - j_{11} + \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \end{pmatrix}    (4.2.11)

2. Eigenvalues µ_1 and µ_2 such that

\mu_1 = \frac{1}{2}\left(j_{11} + j_{22} + \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2}\right)    (4.2.12)

\mu_2 = \frac{1}{2}\left(j_{11} + j_{22} - \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2}\right)    (4.2.13)

As pointed out in [11, chapter 2], these eigenvalues integrate (summarise) the variation of grey values within a neighbourhood whose size is now characterised by ρ. Therefore, they also describe the average contrast in the eigen-directions. This parameter ρ is called the integration scale. The pre-smoothing applied when ∇I_σ(x, y) is calculated makes the structure tensor insensitive to noise - and irrelevant details - at scales smaller than σ. That is why the parameter σ is called the local scale or noise scale.
Furthermore, e_1 is the orientation with the highest grey value fluctuations, while e_2 gives the local spatial anisotropy orientation, called the coherence direction.
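A sketch of the regularised structure tensor 4.2.10 and of its eigenvalues (equations 4.2.12 and 4.2.13) is shown below; the particular σ and ρ values and the function name are placeholders for illustration.

import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(I, sigma=1.0, rho=3.0):
    I = I.astype(float)
    # Gradient at the noise scale sigma (Gaussian derivatives).
    Iy = gaussian_filter(I, sigma, order=(1, 0))
    Ix = gaussian_filter(I, sigma, order=(0, 1))
    # Componentwise smoothing with G_rho gives J_{sigma,rho} (equation 4.2.10).
    j11 = gaussian_filter(Ix * Ix, rho)
    j12 = gaussian_filter(Ix * Iy, rho)
    j22 = gaussian_filter(Iy * Iy, rho)
    # Closed-form eigenvalues (equations 4.2.12 and 4.2.13) and the coherence (mu1 - mu2)^2.
    root = np.sqrt((j11 - j22) ** 2 + 4.0 * j12 ** 2)
    mu1 = 0.5 * (j11 + j22 + root)
    mu2 = 0.5 * (j11 + j22 - root)
    coherence = (mu1 - mu2) ** 2
    return j11, j12, j22, mu1, mu2, coherence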

4.2.2 Anisotropic Diffusion Equation

In equation 4.2.14 the structure tensor Jσ,ρ (x, y) is used instead of ∇Iσ (x, y) to build the
diffusion tensor D. Thus,

\frac{\partial I(x, y, k)}{\partial k} = \nabla \cdot \big(D(J_{\sigma,\rho}(x, y)) \cdot \nabla I(x, y, k)\big)    (4.2.14)

I(x, y, 0) = I(x, y)    (4.2.15)

is the anisotropic diffusion PDE to solve.

The general idea to build D is to take into account that it is a linear transformation that can be described by its eigenvalues and eigenvectors. If you define them according to a certain expected structure tensor behaviour, you make sure that you will detect the features of interest. Thus, since D should reflect the local image structure, its eigenvectors should be the same as those of the structure tensor J_{\sigma,\rho}(x, y). But the eigenvalues λ_1 and λ_2 of D must be chosen depending on the desired filter goal. Once you have defined λ_1 and λ_2 you can write:

D(J_{\sigma,\rho}(x, y)) = (e_1\; e_2) \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} (e_1\; e_2)^t = \lambda_1\, e_1 \otimes e_1 + \lambda_2\, e_2 \otimes e_2    (4.2.16)

where e_1 and e_2 are the eigenvectors of J_{\sigma,\rho}(x, y). This way D is positive definite (since its eigenvalues are positive) and symmetric, because e_1 and e_2 are orthonormal (due to J_{\sigma,\rho}(x, y) being symmetric).

It is also worth recalling here that the purpose of D is actually to modify the direction in which diffusion is performed (further details in [6]). Thus, depending on the λ_1 and λ_2 values you have that:

• If λ_1 ≈ λ_2 ≈ k ⇒ D ≈ k(e_1 ⊗ e_1 + e_2 ⊗ e_2). This means that diffusion will be in all directions, i.e. isotropic, with diffusivity k - because e_1 ⊗ e_1 + e_2 ⊗ e_2 = Id when e_1 and e_2 are orthonormal; hence the gradient will not be forced to have any predominant direction.

• If λ_1 ≫ λ_2 ⇒ D ≈ λ_1 e_1 ⊗ e_1. In this case diffusion (with diffusivity λ_1) will be in the direction of the eigenvector whose eigenvalue is the largest.

In the following subsections two alternatives for D proposed by Weickert [11, chapter 5]
are presented.

4.2.3 Edge Enhancing Diffusion

The key point to enhance edges is to smooth preferentially within regions and inhibit the diffusion process across edges. Since e_1 ∥ ∇I_σ(x, y) - and therefore it is perpendicular to the edges⁷ - the needed contrast behaviour can be controlled by λ_1 if you keep λ_2 constant. Thus, if you define λ_2 = 1 and assume that µ_1 ≥ µ_2,

• Edges are detected by the structure tensor when µ_1 is large. Hence,

µ_1 → ∞ ⇒ λ_1 → 0 ⇒ D → e_2 ⊗ e_2

⁷ Keep in mind that the structure tensor can estimate edges by using two orthonormal eigenvectors e_1 and e_2 such that e_1 ∥ ∇I_σ(x, y) and e_2 ⊥ ∇I_σ(x, y).


That is, since e_2 ⊥ ∇I_σ(x, y), if λ_1 → 0 the diffusion process will be in the direction of the edges. Assuming that the edge width is not significant, you have that diffusion is actually stopped at edges.

• Smaller µ_1 values will mean that intra-regions are being detected. Then,

µ_1 → 0 ⇒ λ_1 → 1 ⇒ D → e_1 ⊗ e_1 + e_2 ⊗ e_2

and therefore diffusion will be in all directions.

This behaviour can be obtained with:

\lambda_1(\mu_1) = g(\mu_1)    (4.2.17)

\lambda_2 = 1    (4.2.18)

where (m \in \mathbb{N},\; C_m > 0,\; \lambda > 0)

g(s) = 1 - e^{-\frac{C_m}{(s/\lambda)^m}}    (4.2.19)

The constant C_m is calculated such that the flux Φ(s) = s\,g(s) is increasing in [0, λ] and decreasing in (λ, ∞). Since the edge width is not significant you can set ρ = 0.
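As a sketch of how the edge-enhancing diffusion tensor can be assembled pixel by pixel from the structure tensor (equations 4.2.16 to 4.2.19), the fragment below reuses the hypothetical structure_tensor helper shown earlier; the values of m, C_m and λ are assumptions of the example.

import numpy as np

def eed_diffusion_tensor(j11, j12, j22, mu1, lam=3.0, m=4, Cm=3.315):
    # Entries (d11, d12, d22) of D = lam1*e1(x)e1 + lam2*e2(x)e2, with lam1 = g(mu1) and lam2 = 1.
    ratio = np.maximum(mu1, 1e-12) / lam
    lam1 = np.where(mu1 > 0, 1.0 - np.exp(-Cm / ratio**m), 1.0)   # diffusivity 4.2.19
    lam2 = np.ones_like(mu1)
    # Orientation of the principal eigenvector e1 of the structure tensor.
    theta = 0.5 * np.arctan2(2.0 * j12, j11 - j22)
    c, s = np.cos(theta), np.sin(theta)
    d11 = lam1 * c**2 + lam2 * s**2
    d12 = (lam1 - lam2) * c * s
    d22 = lam1 * s**2 + lam2 * c**2
    return d11, d12, d22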

4.2.4 Coherence Enhancing Diffusion

So far, the width of edges or contact zones has not been regarded as an important parameter. But what would happen if you have a flow-like texture where you observe many 1-D structures? If you just apply a linear, or non-linear, diffusion process that enhances edges, you would see that smoothing within those 1-D structures would be quite poor and therefore they would remain noisy. In that case you need a method not only to smooth in intra-regions but also to smooth within contact zones.

If you follow the same kind of reasoning as in the previous section, you can see that to get this behaviour you just need λ_1 to be constant and λ_2 to depend on the coherence C_ρ = (µ_1 − µ_2)² (because e_2 is aligned with the edges and, therefore, with the coherence direction; assuming that µ_1 ≥ µ_2 again). Since the coherence measures the local anisotropy level within the integration window (characterised by ρ), you can see that for a γ ∈ (0, 1) small enough (γ ≪ 1) and λ_1 = γ,

• C_ρ → 0 ⇒ λ_2 → γ ⇒ D → γ(e_1 ⊗ e_1 + e_2 ⊗ e_2). In other words, when coherence is small within the integration window, diffusion will be in all directions with diffusivity γ. This is another way to detect intra-regions. But unlike the previous methods, regions within edges, when their width is significant, are also considered.

• C_ρ → ∞ ⇒ λ_2 → 1 ⇒ D → γ e_1 ⊗ e_1 + e_2 ⊗ e_2. Since γ ≪ 1, diffusion will be along e_2 when coherence is large. This will stop diffusion across contact zone edges only.

Finally, this can be accomplished by defining (m \in \mathbb{N},\; C > 0):

\lambda_1 = \gamma    (4.2.20)

\lambda_2 = \begin{cases} \gamma & \text{if } \mu_1 = \mu_2 \\ \gamma + (1 - \gamma)\, e^{-\frac{C}{(\mu_1 - \mu_2)^{2m}}} & \text{else} \end{cases}    (4.2.21)
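The corresponding eigenvalue assignment can be sketched as below, reusing the structure tensor eigenvalues µ_1 and µ_2 from the earlier sketch; the values of γ, C and m are illustrative assumptions.

import numpy as np

def ced_eigenvalues(mu1, mu2, gamma=0.01, C=1.0, m=1):
    # lambda_1 and lambda_2 of equations 4.2.20 and 4.2.21.
    lam1 = np.full_like(mu1, gamma)
    kappa = (mu1 - mu2) ** (2 * m)                    # coherence measure
    lam2 = np.where(kappa > 0,
                    gamma + (1.0 - gamma) * np.exp(-C / np.maximum(kappa, 1e-12)),
                    gamma)
    return lam1, lam2

These eigenvalues can then be combined with the eigenvector orientation exactly as in the edge-enhancing sketch to assemble D.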


5 Numerical Aspects

5.1 Isotropic Case

As has been shown in previous sections, neither linear (isotropic) diffusion nor isotropic nonlinear diffusion alters the orientation of the diffusion flux j = −g∇I, which remains parallel to the gradient. This way, you can apply the same numerical approach to solve both equation 4.1.1 and equation 4.1.7. If you observe that equation 4.1.7 is actually the general case, then the problem now is to find a suitable discretisation model.

Both Weickert [13] and Catté et al. [2] have used the following model:

\frac{I^{k+1}_{(i,j)} - I^{k}_{(i,j)}}{\tau} = \sum_{l=1}^{2} \sum_{\alpha \in N_l(i,j)} \frac{g^{k}_{(i,j)} + g^{k}_{\alpha}}{2 h_l^2}\, \big(I^{k}_{\alpha} - I^{k}_{(i,j)}\big)    (5.1.1)

where,

• The image is regarded as a matrix I \in \mathbb{R}^{N \times M} whose components I_{i,j} ((i, j) \in \{1, \ldots, N\} \times \{1, \ldots, M\}) represent the intensity value (or the grey value in case of working with grey-scale images only).

• In the discrete grid:

  – The pixel (i, j) represents the location (x_i, y_j).

  – h_l is the grid size in the direction l \in \{1, 2\}. Theoretically you could consider an arbitrary number of directions, but in this case direction 1 goes along columns and direction 2 goes along rows.

• If you consider a time step size τ, the discrete times would be t_k = kτ, where k \in \mathbb{N}.

• I^k_{(i,j)} and g^k_{(i,j)} denote:

  I^k_{(i,j)} = I(x_i, y_j, k)
  g^k_{(i,j)} = g(|\nabla I_\sigma(x_i, y_j, k)|)

  ∇I_σ(x_i, y_j, k) is calculated using central differences.

• N_l(i, j) consists of the two neighbours of pixel (i, j) along the direction l.


Before going into further details, it is worth simplifying the notation here by fixing one direction, say direction 1 - that is, treating each dimension separately. This way, equation 5.1.1 becomes:

\frac{I^{k+1}_i - I^{k}_i}{\tau} = \sum_{j \in N_1(i)} \frac{g^{k}_i + g^{k}_j}{2 h_1^2}\, \big(I^{k}_j - I^{k}_i\big)    (5.1.2)

Now i represents the pixel position and j represents the neighbour pixel positions. This makes it possible to see that equation 5.1.2 can be rewritten in matrix-vector notation as

\frac{I^{k+1} - I^{k}}{\tau} = A_1(I^k)\, I^k    (5.1.3)

where A_1(I^k) = (a^{(1)}_{ij}(I^k)) is a matrix whose components are given by

a^{(1)}_{ij}(I^k) = \begin{cases} \dfrac{g^k_i + g^k_j}{2 h_1^2} & j \in N_1(i) \\[6pt] -\sum\limits_{a \in N_1(i)} \dfrac{g^k_i + g^k_a}{2 h_1^2} & j = i \\[6pt] 0 & \text{else} \end{cases}    (5.1.4)

If you fix direction 2 you can easily obtain A2 (I k ) by symmetry and finally check that
equation 5.1.1 can be rewritten as

\frac{I^{k+1} - I^{k}}{\tau} = \sum_{l=1}^{2} A_l(I^k)\, I^k    (5.1.5)

Equation 5.1.5 is particularly useful because it allows one to derive the three main approaches - which have been studied in depth by Weickert [13] - to actually solve the system:

1. Explicit Scheme. If you rearrange equation 5.1.5 you can see that:

I^{k+1} = \Big(Id + \tau \sum_{l=1}^{2} A_l(I^k)\Big)\, I^k    (5.1.6)

where Id is the identity matrix. Every I^{k+1} is obtained directly from the previous I^k without solving a linear system in this scheme.


At first sight it looks computationally very cheap because it only involves matrix sums and multiplications. But it has been proven [13] that this scheme is stable only if τ satisfies:

\tau \le \left(\frac{2}{h_1^2} + \frac{2}{h_2^2}\right)^{-1}

In practice⁸, this means that the process requires a high number of iterations with very small step sizes, turning the whole scheme into a quite slow option for many cases.

2. Semi-implicit Scheme. Following the same procedure shown above, a slightly more complicated variation of the model given by equation 5.1.1 leads to the expression

\frac{I^{k+1} - I^{k}}{\tau} = \sum_{l=1}^{2} A_l(I^k)\, I^{k+1}    (5.1.7)

Rewriting equation 5.1.7 you obtain

I^{k+1} = \Big(Id - \tau \sum_{l=1}^{2} A_l(I^k)\Big)^{-1} I^k    (5.1.8)

where each I^{k+1} is obtained by solving a linear system. Weickert [13] proved that this scheme is unconditionally stable. But, because every iteration involves solving a linear system (a matrix inversion), it is not superior to the explicit scheme in computation time - despite allowing a larger τ (which now has no restrictions) and therefore significantly fewer iterations.

3. Additive Operator Splitting (AOS) Scheme. The idea here is to preserve the unconditional stability of the semi-implicit scheme while at the same time finding a mechanism to speed up the matrix inversion operation. In this sense, using the approximation (A + B)^{-1} ≈ A^{-1} + B^{-1}, you can check that equation 5.1.8 can be expressed as (details in [11, chapter 4])

I^{k+1} = \frac{1}{2} \sum_{l=1}^{2} \big(Id - 2\tau A_l(I^k)\big)^{-1} I^k    (5.1.9)

⁸ If you are considering more than two directions (m > 2) the expression would be \tau \le \left(\frac{2}{h_1^2} + \cdots + \frac{2}{h_m^2}\right)^{-1}


Therefore, the matrix inversion operation is performed for each direction separately. This may sound computationally more expensive than the semi-implicit scheme because you are increasing the number of matrix inversions. The point is that if you observe the definition of every matrix (equation 5.1.4) you will notice that the A_l(I^k) are symmetric tridiagonal matrices (once the pixels are ordered along the direction l), which leads to tridiagonal systems of equations that can be efficiently solved using the classical Thomas algorithm - which can be found in any elementary linear algebra textbook. This algorithm is O(n), instead of Gaussian elimination which is O(n³), making the AOS scheme roughly 10 times faster than both the explicit and the semi-implicit schemes - a sketch of this scheme is given after this list.
The only drawback of this scheme is that, since it is based on an approximation, it may be less accurate than the previous schemes. But experiments ([13], [12] and [9]) have shown that the added error is quite acceptable when compared to the efficiency gain.
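The sketch below performs one AOS step: it assembles the tridiagonal system of equation 5.1.4 for each image row and column and solves it with a banded solver (a stand-in for the Thomas algorithm). The diffusivity, parameter values and boundary handling are assumptions of the example.

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.linalg import solve_banded

def aos_step(I, tau=5.0, lam=5.0, sigma=1.0, h=1.0):
    # One AOS step, I^{k+1} = (1/2) sum_l (Id - 2*tau*A_l)^{-1} I^k (equation 5.1.9).
    I = I.astype(float)
    Iy = gaussian_filter(I, sigma, order=(1, 0))
    Ix = gaussian_filter(I, sigma, order=(0, 1))
    g = 1.0 / (1.0 + (Ix**2 + Iy**2) / lam**2)        # diffusivity 4.1.4 on |grad I_sigma|
    out = np.zeros_like(I)
    for axis in (0, 1):                                # treat each direction separately
        A = I if axis == 0 else I.T
        G = g if axis == 0 else g.T
        n = A.shape[0]
        sol = np.empty_like(A)
        # Couplings (g_i + g_j) / (2 h^2) between neighbouring pixels along this direction.
        off = (G[:-1, :] + G[1:, :]) / (2.0 * h * h)
        for c in range(A.shape[1]):                    # one tridiagonal solve per line
            lower = -2.0 * tau * off[:, c]
            diag = 1.0 + 2.0 * tau * np.concatenate((
                [off[0, c]], off[:-1, c] + off[1:, c], [off[-1, c]]))
            ab = np.zeros((3, n))
            ab[0, 1:] = lower                          # super-diagonal
            ab[1, :] = diag                            # main diagonal
            ab[2, :-1] = lower                         # sub-diagonal
            sol[:, c] = solve_banded((1, 1), ab, A[:, c])
        out += 0.5 * (sol if axis == 0 else sol.T)
    return out

Because of the unconditional stability mentioned above, τ can be chosen much larger here than in the explicit scheme.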

5.2 Anisotropic Case

The idea here is to solve equation 4.2.14. This can be done directly using finite differences. Weickert [11, chapter 3] showed that if you use central differences for the spatial derivatives and a forward difference for the temporal derivative, you get this explicit scheme:

\frac{I^{k+1}_{(i,j)} - I^{k}_{(i,j)}}{\tau} = A^{k}_{(i,j)} \star I^{k}_{(i,j)} \;\Longleftrightarrow\; I^{k+1}_{(i,j)} = \big(Id + \tau A^{k}_{(i,j)}\big) \star I^{k}_{(i,j)}    (5.2.1)
τ

where A^{k}_{(i,j)} \star I^{k}_{(i,j)} is the discretisation of ∇ · (D∇I). This notation is not arbitrary, because it happens that the explicit scheme is actually a convolution of the image step I^{k}_{(i,j)} with a spatially and temporally varying kernel. The standard 3 × 3 discretisation stencil for this kernel is:

\begin{pmatrix}
\dfrac{-b_{(i-1,j)} - b_{(i,j+1)}}{4} & \dfrac{c_{(i,j+1)} + c_{(i,j)}}{2} & \dfrac{b_{(i+1,j)} + b_{(i,j+1)}}{4} \\[8pt]
\dfrac{a_{(i-1,j)} + a_{(i,j)}}{2} & -\dfrac{a_{(i-1,j)} + 2a_{(i,j)} + a_{(i+1,j)}}{2} - \dfrac{c_{(i,j-1)} + 2c_{(i,j)} + c_{(i,j+1)}}{2} & \dfrac{a_{(i+1,j)} + a_{(i,j)}}{2} \\[8pt]
\dfrac{b_{(i-1,j)} + b_{(i,j-1)}}{4} & \dfrac{c_{(i,j-1)} + c_{(i,j)}}{2} & \dfrac{-b_{(i+1,j)} - b_{(i,j-1)}}{4}
\end{pmatrix}

where a_{(i,j)}, b_{(i,j)}, c_{(i,j)} denote the finite-difference approximations of the diffusion tensor entries a(J_ρ(∇I_σ)), b(J_ρ(∇I_σ)) and c(J_ρ(∇I_σ)) at the grid point (x_i, y_j), respectively (the stencil above is written for a unit grid size). You can find other discretisation stencils in [11, chapter 3].
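A sketch of one explicit anisotropic step is given below; instead of the exact stencil above, it evaluates ∇ · (D∇I) with central differences via np.gradient, and it assumes the diffusion tensor entries d11, d12, d22 have already been computed, for instance with the hypothetical helpers sketched in section 4.

import numpy as np

def anisotropic_step(I, d11, d12, d22, tau=0.1):
    # One explicit step of equation 4.2.14: I_new = I + tau * div(D grad I).
    I = I.astype(float)
    Iy, Ix = np.gradient(I)                  # axis 0 = rows, axis 1 = columns
    # Flux components j = D * grad I, with D = ((d11, d12), (d12, d22)).
    jx = d11 * Ix + d12 * Iy
    jy = d12 * Ix + d22 * Iy
    div = np.gradient(jy, axis=0) + np.gradient(jx, axis=1)
    return I + tau * div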


6 Some Examples

6.1 Image Restoration

6.1.1 Salt and Pepper Noise

Probably the most annoying noise type is salt and pepper noise, which consists of sparse random disturbances in pixel values that have no relation to the surrounding pixels. Figures 5[a] and 6[a] show the effect of this kind of noise. The most commonly used technique to eliminate this noise is the median filter - a nonlinear kernel filter that replaces every pixel (x, y) with the median value within a kernel window around (x, y). Figures 5[b to f] and 6[b to f] show the result of applying different filters up to the point where you no longer see noise spots.
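For reference, a median filtering pass like the one used for figures 5[b] and 6[b] can be written with SciPy in one call; the synthetic image and the window size below are assumptions of the example.

import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = np.full((64, 64), 128.0)
mask = rng.random(img.shape) < 0.05                       # disturb 5% of the pixels
img[mask] = rng.choice([0.0, 255.0], size=mask.sum())     # salt and pepper values
denoised = median_filter(img, size=3)                     # 3x3 median window around every pixel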

The median filter requires the window to contain enough information, otherwise you soon start losing details at edges (figure 5[b]). Both figures 5[e] and 5[f] show that anisotropic filters have the same drawback as the median filter on binary images - coherence-enhancing filters, for instance, rely on the amount of information that you can collect within the integration window; if the latter is too large then noise could be considered as an anisotropy pattern and, in the contrary case, you get a linear diffusion behaviour. The performance of the non-linear isotropic diffusion filter is quite remarkable (figure 5[d]).

In the case of the grey-scale image you get better results with the median filter, although the non-linear diffusion filters give quite acceptable results (6[d to f]).

6.1.2 Quality Improvement

In this case you want to detect patterns in poor quality images. Figure 7[a] shows a
low quality mineralogical microscopy image where you want to eliminate impurity spots
carried in the contrast medium, for example. Both isotropic and anisotropic non-linear
diffusion (figures 7[c to e]) give rather good results.

Figure 8[a] shows an example where you want to improve quality within the contact zones because they actually are the patterns of interest. Observe that both the mean filter (figure 8[b]) and the non-linear isotropic diffusion filter (8[d]) keep impurities within the black lines. The edge-enhancing filter (figure 8[e]) improves that situation, but the coherence-enhancing behaviour (figure 8[f]) is outstanding.


6.1.3 Removing Cracks

This is the area where diffusion filters are probably the most useful. Figure 9[a] shows an old painting that has very visible cracks that you may want to remove. Figures 9[b to e] show the effect of applying different filters up to the point where you no longer see the cracks. In this case you can see how the isotropic filter (9[c]) keeps some bumpy texture at edges, while the behaviour of the anisotropic filters (9[d and e]) solves that problem.

The same thing happens for figure 10[a]. Notice that the coherence-enhancing filter (10[e]) stands out due to the flow-like texture of the painting technique.

6.2 Image Multiscale View

Figures 11 to 15 show instances of different scale-spaces. In every case the first row is scale 0 (the original image), the next three rows represent the same small variations within fine scales, while the last three rows show the same larger variations to get to coarser levels. Every column represents a scale-space generated using: (a) linear diffusion, (b) non-linear isotropic diffusion, (c) edge-enhancing (anisotropic) diffusion and (d) coherence-enhancing (anisotropic) diffusion.


References

[1] Luis Alvarez, Frédéric Guichard, Pierre-Louis Lions, Jean-Michel Morel, and Tomeu Coll. Axioms and fundamental equations of image processing. Arch. Rational Mech. Anal., Vol. 123, 1993.

[2] Francine Catté, Pierre-Louis Lions, Jean-Michel Morel, and Tomeu Coll. Image se-
lective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal.,
Vol. 29, February 1992.

[3] Edward T. Copson. Partial Differential Equations. Cambridge University Press, The
Pitt Building, Trumpsington Street, Cambridge CB2 1RP, Bentley House, 200 Euston
Road, London NW1 2DB, first edition, 1975.

[4] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Pearson Prentice
Hall, Pearson Education Inc., Upper Saddle River, New Jersey 07458, third edition,
2008.

[5] Frank P. Incropera, David P. Dewitt, Theodore L. Bergman, and Adrienne S. Lavine.
Fundamentals of Heat and Mass Transfer. John Wiley & Sons, 111 River Street,
Hoboken, NJ 07030-5774, sixth edition, 2007.

[6] Bernd Jähne. Spatio-Temporal Image Processing: Theory and Scientific Applications. Springer-Verlag, Berlin, Vincenz-Prießnitz-Straße 1, D-76131 Karlsruhe, Germany, first edition, 1993.

[7] Pietro Perona and Jitendra Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, July 1990.

[8] John C. Russ. Image Processing Handbook. CRC Press, Taylor & Francis Group, 6000
Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742, fifth edition,
2007.

[9] S. K. Weeratunga and C. Kamath. PDE-based nonlinear diffusion techniques for denoising scientific and industrial images: an empirical study. In Electronic Imaging Symposium, San Jose, January 2002.

[10] Joachim Weickert. Coherence-enhancing diffusion filtering. International Journal of Computer Vision, Vol. 31, 1999.

[11] Joachim Weickert. Anisotropic Diffusion in Image Processing. B.G. Teubner, Stuttgart, first edition, 1998.


[12] Joachim Weickert. Recursive separable schemes for nonlinear diffusion filters. In Bart M. ter Haar Romeny, L. Florack, J. Koenderink, and Max A. Viergever (Eds.), Scale-Space Theory in Computer Vision, Lecture Notes in Computer Science, Vol. 1252, Springer, Berlin, 1997.

[13] Joachim Weickert, Bart M. ter Haar Romeny, and Max A. Viergever. Efficient and
reliable schemes for nonlinear diffusion filtering. IEEE Transactions on Image Pro-
cessing, Vol. 7, NO. 3, March 1998.

[14] Andrew P. Witkin. Scale-space filtering. Int. Joint Conf. Artificial Intelligence,
Karlsruhe, West Germany, pages 1019–1021, 1983.


A Figures


Figure 5: Salt and Pepper noise reduction. (a) Original image. Using (b) a median filter,
(c) linear diffusion, (d) non-linear isotropic diffusion, (e) edge enhancing (anisotropic)
diffusion, (f) coherence enhancing (anisotropic) diffusion.



Figure 6: Salt and Pepper noise reduction. (a) Original image. Using (b) a median filter,
(c) linear diffusion, (d) non-linear isotropic diffusion, (e) edge enhancing (anisotropic)
diffusion, (f) coherence enhancing (anisotropic) diffusion.



Figure 7: Quality improvement. (a) Original image. Using (b) linear diffusion, (c) non-
linear isotropic diffusion, (d) edge enhancing (anisotropic) diffusion, (e) coherence enhanc-
ing (anisotropic) diffusion.



Figure 8: Quality improvement. (a) Original image. Using (b) a mean filter, (c) linear
diffusion, (d) non-linear isotropic diffusion, (e) edge enhancing (anisotropic) diffusion, (f)
coherence enhancing (anisotropic) diffusion.



Figure 9: Removing cracks. (a) Original image. Using (b) linear diffusion, (c) non-linear
isotropic diffusion, (d) edge enhancing (anisotropic) diffusion, (e) coherence enhancing
(anisotropic) diffusion.



Figure 10: Removing cracks. (a) Original image. Using (b) linear diffusion, (c) non-linear
isotropic diffusion, (d) edge enhancing (anisotropic) diffusion, (e) coherence enhancing
(anisotropic) diffusion.



Figure 11: Multiscale view within a scale-space generated with (a) linear diffusion, (b)
non-linear isotropic diffusion, (c) edge enhancing (anisotropic) diffusion, (d) coherence
enhancing (anisotropic) diffusion.


Figure 12: Multiscale view within a scale-space generated with (a) linear diffusion, (b)
non-linear isotropic diffusion, (c) edge enhancing (anisotropic) diffusion, (d) coherence
enhancing (anisotropic) diffusion.


Figure 13: Multiscale view within a scale-space generated with (a) linear diffusion, (b)
non-linear isotropic diffusion, (c) edge enhancing (anisotropic) diffusion, (d) coherence
enhancing (anisotropic) diffusion.


Figure 14: Multiscale view within a scale-space generated with (a) linear diffusion, (b)
non-linear isotropic diffusion, (c) edge enhancing (anisotropic) diffusion, (d) coherence
enhancing (anisotropic) diffusion.



Figure 15: Multiscale view within a scale-space generated with (a) linear diffusion, (b)
non-linear isotropic diffusion, (c) edge enhancing (anisotropic) diffusion, (d) coherence
enhancing (anisotropic) diffusion.
