Image Pre-Processing
Definition
Pre-processing refers to operations on images at the lowest level of abstraction.
Both input and output are intensity images.
Such images are usually of the same kind as the original sensor data.
Intensity images are typically represented as a matrix/matrices of brightness values.
Key Characteristics
Pre-processing does not increase image information content.
If information is measured using entropy, pre-processing usually decreases it.
From an information-theoretic perspective, the best pre-processing is no pre-processing.
The best way to avoid elaborate pre-processing is to ensure high-quality image acquisition.
Why Pre-processing is Still Useful
Suppresses irrelevant information for a specific image-processing or analysis task.
Improves image data by:
o Suppressing undesired distortions
o Enhancing features important for further processing
Includes geometric transformations (rotation, scaling, translation).
Classification of Pre-processing Methods
Based on pixel neighborhood size used to calculate new pixel brightness:
1. Pixel Brightness Transformations (Section 5.1)
2. Geometric Transformations (Section 5.2)
3. Local Neighborhood Methods (Section 5.3)
4. Image Restoration using Entire Image Knowledge (Section 5.4)
Use of Redundancy in Images
Most images have redundant information.
Pre-processing can exploit statistical image characteristics to:
o Suppress degradations like noise
o Enhance image details
Neighboring pixels in real images usually have similar brightness values.
Distorted pixels can be restored by replacing them with the average value of neighbors.
Pre-processing for Image Degradation Correction
Methods are classified by how much a priori knowledge about the degradation they use:
1. Methods using little or no knowledge about degradation.
2. Methods using knowledge of acquisition device properties and acquisition conditions.
o Sometimes the noise characteristics (spectral properties) are known.
3. Methods using knowledge about objects sought in the image.
o If object knowledge is not known in advance, it can be estimated during processing.
5.1 Pixel Brightness Transformations
Definition
A brightness transformation modifies pixel brightness.
The transformation depends only on the properties of a pixel itself.
Two main classes:
1. Brightness Corrections
2. Gray-Scale Transformations
5.1.1 Position-Dependent Brightness Correction
In theory, image acquisition and digitization should not depend on position.
In practice, this assumption fails due to:
o The lens attenuates light more at points farther from the optical axis.
o Non-uniform sensitivity of the sensor.
o Uneven object illumination.
Systematic degradation can be suppressed by brightness correction.
A multiplicative error coefficient e(i,j) models degradation:
f(i,j) = e(i,j) \, g(i,j)
where:
o g(i,j) = ideal undegraded image
o f(i,j) = degraded version
To compute e(i,j):
o Capture a reference image g(i,j) with known brightness (e.g., constant brightness c).
o Its degraded version is f_c(i,j).
o Then:
e(i,j) = \frac{f_c(i,j)}{c}, \qquad g(i,j) = \frac{f(i,j)}{e(i,j)} = \frac{c \, f(i,j)}{f_c(i,j)}
Conditions:
o The degradation process must be stable.
o Assumes linearity of transformation (not fully true, since brightness scale is limited).
o Overflow can occur → brightness must be chosen carefully.
Example:
o If grayscale has 256 levels → ideal reference image brightness = 128 (midpoint).
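As an illustration, here is a minimal NumPy sketch of this multiplicative correction; the function name, the 8-bit clipping, and the epsilon guard are my own assumptions, not from the text:

    import numpy as np

    def brightness_correction(f, f_c, c=128.0):
        """Suppress position-dependent degradation using a reference image.

        f   : degraded image of the scene, f(i,j) = e(i,j) * g(i,j)
        f_c : degraded image of a constant-brightness reference (value c)
        c   : known brightness of the reference image
        """
        f = f.astype(np.float64)
        e = f_c.astype(np.float64) / c               # e(i,j) = f_c(i,j) / c
        g = f / np.maximum(e, 1e-6)                  # g(i,j) = f(i,j) / e(i,j)
        return np.clip(g, 0, 255).astype(np.uint8)   # guard against overflow (8-bit scale assumed)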
5.1.2 Gray-Scale Transformations
Independent of pixel position.
General transformation:
q = T(p)
where:
o p = original brightness (scale [p_0, p_k])
o q = transformed brightness (new scale [q_0, q_k])
Types of Gray-Scale Transformations
1. Contrast Enhancement
o A piecewise linear function enhances contrast between brightness values p_1 and p_2.
2. Thresholding
o Converts image into black-and-white.
3. Negative Transformation
o Produces a negative image by inverting brightness.
Implementation
Digital images have limited gray levels.
Easy to implement using look-up tables (both in hardware & software).
Image signals usually pass through LUTs in displays → real-time transformation possible.
Works for color images too:
o Color = RGB (3 components).
o Separate LUT for each channel allows any color-scale transformation.
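A minimal sketch of LUT-based gray-scale transformations; the particular parameter choices (T = 100, p1 = 50, p2 = 200) are illustrative only:

    import numpy as np

    levels = np.arange(256, dtype=np.float64)        # 8-bit gray scale assumed

    negative_lut  = (255 - levels).astype(np.uint8)                   # negative transformation
    threshold_lut = np.where(levels > 100, 255, 0).astype(np.uint8)   # thresholding at T = 100

    # Piecewise-linear contrast enhancement between p1 = 50 and p2 = 200
    p1, p2 = 50, 200
    stretch_lut = np.clip((levels - p1) * 255.0 / (p2 - p1), 0, 255).astype(np.uint8)

    def apply_lut(image, lut):
        """Apply a gray-scale transformation stored in a 256-entry look-up table."""
        return lut[image]        # NumPy fancy indexing plays the role of the hardware LUT

    # out = apply_lut(img, stretch_lut)   # img: 2D uint8 array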
Applications
Used mainly for human observation:
o Improves visual quality.
o Enhances contrast (e.g., medical images).
Automatic contrast enhancement often achieved by Histogram Equalization.
Histogram Equalization
Goal: Spread brightness levels uniformly across entire scale.
Contrast increases near histogram maxima, decreases near minima.
Mathematical Formulation
Input histogram = H(p).
Desired uniform output histogram = G(q).
Transformation condition:
\sum_{i=0}^{k} G(q_i) = \sum_{i=0}^{k} H(p_i)
Equalized histogram corresponds to uniform probability density:
f = \frac{N^2}{q_k - q_0}
(for an N × N image).
Transformation function:
q = T(p) = \frac{q_k - q_0}{N^2} \int_{p_0}^{p} H(s)\, ds + q_0
The integral = cumulative histogram.
In practice → discrete approximation (not perfectly equalized).
Algorithm 5.1: Histogram Equalization
1. For an N×M image with G gray levels, initialize array H[G] = 0.
2. Build histogram: For each pixel of intensity gp, update
H[g_p] = H[g_p] + 1
Find g_min (lowest gray level present).
3. Build cumulative histogram Hc:
o Hc[0] = H[0]
o Hc[g] = Hc[g-1] + H[g] for g = 1...G-1
Let H_min = Hc[g_min].
4. Compute transformation:
T[g] = \text{round}\left( \frac{H_c[g] - H_{min}}{MN - H_{min}} \cdot (G - 1) \right)
5. Rescan image: Assign
g_q = T[g_p]
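A minimal NumPy implementation of Algorithm 5.1 (assumes an 8-bit, non-constant input image):

    import numpy as np

    def histogram_equalize(image, G=256):
        """Histogram equalization following Algorithm 5.1 (sketch)."""
        N, M = image.shape
        H = np.bincount(image.ravel(), minlength=G)     # steps 1-2: histogram
        g_min = np.nonzero(H)[0][0]                     # lowest gray level present
        Hc = np.cumsum(H)                               # step 3: cumulative histogram
        H_min = Hc[g_min]
        # Step 4: transformation T[g]
        T = np.round((Hc - H_min) / (N * M - H_min) * (G - 1)).astype(np.uint8)
        return T[image]                                 # step 5: rescan the image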
Examples
Applied to X-ray CT lung image:
o Original vs. Equalized images (Fig. 5.3).
o Original vs. Equalized histograms (Fig. 5.4).
Other Gray-Scale Transformations
1. Logarithmic Transformation
o Often used to compensate for the exponential γ-correction applied in cameras.
2. Pseudo-Color Transformation
o Maps different brightness values in a monochrome image to colors.
o Exploits high human sensitivity to color changes.
o Helps perceive more details.
5.2 Geometric Transformations
Purpose:
o Eliminate geometric distortions in images.
o Example: Matching remotely sensed images of the same area taken at different times or from different positions.
o After alignment, images can be compared or subtracted.
o Only 2D transformations considered (sufficient for most digital images).
Definition
A geometric transform is a vector function T mapping a pixel (x, y) to a new position (x', y'):
o x' = T_x(x, y)
o y' = T_y(x, y)
Transformation functions T_x, T_y can be:
o Known in advance (e.g., rotation, translation, scaling).
o Derived from known original & transformed images using corresponding points.
Two Basic Steps of a Geometric Transform
1. Pixel co-ordinate transformation
o Maps input pixel coordinates (x, y) to a new point in the output image.
o Output coordinates are usually continuous (real numbers).
2. Brightness interpolation
o Assigns a brightness value to each integer raster point of the output image.
o Usually computed as an interpolation from the brightnesses of neighboring pixels.
5.2.1 Pixel Co-ordinate Transformations
General polynomial transformation:
x' = \sum_{r=0}^{m} \sum_{k=0}^{m-r} a_{rk} x^r y^k, \qquad y' = \sum_{r=0}^{m} \sum_{k=0}^{m-r} b_{rk} x^r y^k
Properties:
o Linear in the coefficients a_{rk}, b_{rk}.
o Coefficients found by solving system of linear equations using corresponding point
pairs.
o For slow-varying transforms:
Low-order polynomials (m = 2 or m = 3) suffice.
Require at least 6 (m = 2) or 10 (m = 3) corresponding point pairs.
Corresponding points should be well-distributed across the image.
Special Cases
1. Bilinear Transformation (requires 4 pairs):
o x' = a_0 + a_1 x + a_2 y + a_3 x y
o y' = b_0 + b_1 x + b_2 y + b_3 x y
2. Affine Transformation (requires 3 pairs):
o x' = a_0 + a_1 x + a_2 y
o y' = b_0 + b_1 x + b_2 y
o Includes rotation, translation, scaling, skewing.
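The coefficients of the affine (or bilinear) case are found by solving a small linear system built from the corresponding point pairs; a least-squares sketch follows (the function name and sample points are hypothetical):

    import numpy as np

    def fit_affine(src, dst):
        """Estimate a0..a2, b0..b2 from at least 3 corresponding point pairs.

        src, dst : arrays of shape (n, 2) with (x, y) and (x', y') coordinates.
        Returns (a, b) such that x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y.
        """
        src = np.asarray(src, dtype=np.float64)
        dst = np.asarray(dst, dtype=np.float64)
        A = np.column_stack([np.ones(len(src)), src[:, 0], src[:, 1]])  # rows [1, x, y]
        a, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)   # least-squares fit for x'
        b, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)   # least-squares fit for y'
        return a, b

    # a, b = fit_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (3, 3), (2, 4)])

With more than the minimum number of pairs, the same call returns a least-squares estimate, which is how low-order polynomial transforms are usually fitted in practice.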
Jacobian Determinant
Measures effect of transformation on coordinate system:
J = \frac{\partial(x', y')}{\partial(x, y)} = \begin{vmatrix} \partial x'/\partial x & \partial x'/\partial y \\ \partial y'/\partial x & \partial y'/\partial y \end{vmatrix}
The transform is singular (has no inverse) where J = 0.
Area is invariant under the transform if |J| = 1.
Examples:
o Bilinear: J = a_1 b_2 - a_2 b_1 + (a_1 b_3 - a_3 b_1) x + (a_3 b_2 - a_2 b_3) y
o Affine: J = a_1 b_2 - a_2 b_1
Important Transformations
Rotation (angle φ, about origin):
o x' = x cos φ + y sin φ
o y' = -x sin φ + y cos φ
o J = 1
Scaling (factors a, b):
o x' = a x
o y' = b y
o J = a b
Skewing (angle φ):
o x' = x + y tan φ
o y' = y
o J = 1
Partitioning for Complex Transformations
Complex distortions can be approximated:
o Divide image into smaller subimages.
o Estimate simple transformations for each.
o Correct each subimage separately.
5.2.2 Brightness Interpolation
After transformation, coordinates (x', y') are usually non-integer.
Need integer grid values for raster pixels.
Achieved via brightness interpolation.
General approach:
o Inverse transform: (x, y) = T^{-1}(x', y').
o Real coordinates may not match raster → resample input image.
Convolution formulation:
f_n(x, y) = \sum_{l=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} g_s(l \Delta x, k \Delta y)\, h_n(x - l \Delta x, y - k \Delta y)
o h_n = interpolation kernel.
o Small neighborhood used (outside → kernel = 0).
Interpolation Methods
1. Nearest Neighbor
o Assigns brightness of nearest raster point:
f_1(x, y) = g_s(\text{round}(x), \text{round}(y))
o Position error ≤ 0.5 pixel.
o Causes step-like edges on straight boundaries.
2. Linear Interpolation
o Uses 4 nearest neighbors.
o Brightness assumed linear:
f_2(x, y) = (1-a)(1-b)\, g_s(l, k) + a(1-b)\, g_s(l+1, k) + b(1-a)\, g_s(l, k+1) + ab\, g_s(l+1, k+1)
where l = floor(x), a = x - l, k = floor(y), b = y - k (see the sketch after this list).
o Smoother than nearest neighbor.
o May cause slight blurring and resolution loss.
3. Bi-Cubic Interpolation
o Uses 16 neighboring points.
o Kernel (“Mexican hat”):
h_3(x) = \begin{cases} 1 - 2|x|^2 + |x|^3, & 0 \le |x| < 1 \\ 4 - 8|x| + 5|x|^2 - |x|^3, & 1 \le |x| < 2 \\ 0, & \text{otherwise} \end{cases}
o Preserves fine details.
o Avoids both step-like edges (nearest neighbor) and blurring (linear).
o Common in raster displays with zooming.
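A minimal sketch of the linear (bilinear) interpolation step described above; the array layout g[row, col] and the omission of boundary handling are assumptions:

    import numpy as np

    def bilinear_sample(g, x, y):
        """Linear brightness interpolation at a non-integer point (x, y).

        g : 2D array indexed as g[row, col]; the caller must keep
            0 <= x < width - 1 and 0 <= y < height - 1.
        """
        l, k = int(np.floor(x)), int(np.floor(y))
        a, b = x - l, y - k
        return ((1 - a) * (1 - b) * g[k, l]       # g_s(l, k)
                + a * (1 - b) * g[k, l + 1]       # g_s(l+1, k)
                + b * (1 - a) * g[k + 1, l]       # g_s(l, k+1)
                + a * b * g[k + 1, l + 1])        # g_s(l+1, k+1)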
✅ That’s a complete, point-wise extraction of Section 5.2 (Geometric Transformations + Brightness
Interpolation) without missing any detail.
5.4 Image Restoration
Definition
Pre-processing methods that suppress degradation using knowledge of its nature are called
image restoration.
Most methods are based on global convolution across the whole image.
Goal: Reconstruct the original image from its degraded version.
Causes of Image Degradation
Defective optical lenses
Non-linearity of electro-optical sensors
Film material graininess
Relative motion (object ↔ camera)
Incorrect focus
Atmospheric turbulence (remote sensing, astronomy)
Scanning of photographs
Types of Restoration Techniques
1. Deterministic
o Applied when noise is low and degradation function is known.
o Original image obtained by applying the inverse function.
2. Stochastic
o Used when degradation is noisy.
o Finds restoration based on statistical criteria (e.g., least squares).
Typical Simple Degradations
Relative constant speed motion of object vs. camera
Wrong lens focus
Atmospheric turbulence
Knowledge About Degradation
1. A priori knowledge (known beforehand)
o Example: If degradation is motion blur → only speed & direction need estimating.
o Device degradation (e.g., TV camera, digitizer) can be modeled using known sample
images.
2. A posteriori knowledge (estimated from degraded image)
o Example: Find interest points (corners, lines) and guess original look.
o Use spectral characteristics of homogeneous regions.
Mathematical Model of Image Degradation
General model:
g(i,j) = \iint f(a,b)\, h(a,b,i,j)\, da\, db + \nu(i,j)
Simplified (space-invariant h):
g(i,j) = (f * h)(i,j) + \nu(i,j)
If noise is negligible → Inverse Convolution (Deconvolution).
If noise significant → solved as overdetermined linear system (Least Squares, Wiener filter,
Kalman filter).
5.4.1 Degradations Easy to Restore
Fourier domain representation:
G = H F
Examples:
1. Relative Motion Blur (constant-speed relative motion between object and camera during shutter time T):
H(u,v) = \frac{\sin(\pi V T u)}{\pi V u}
where V is the relative speed (motion assumed along the u direction).
2. Wrong Lens Focus (thin lens):
H(u,v) = \frac{J_1(a r)}{a r}, \qquad r^2 = u^2 + v^2
where J_1 is the first-order Bessel function and a depends on the degree of defocus.
3. Atmospheric Turbulence:
H(u,v) = e^{-c (u^2 + v^2)^{5/6}}
where c depends on the severity of the turbulence.
5.4.2 Inverse Filtering
Assumes degradation by a linear function h(i,j).
Fourier domain:
G(u,v) = F(u,v) H(u,v) + N(u,v)
Restoration filter:
F(u,v) = G(u,v) H^{-1}(u,v) - N(u,v) H^{-1}(u,v)
Works well only when noise is negligible.
Problems with noise:
o Noise dominates at frequencies where H(u,v) is small.
o Solution: Restrict restoration to frequency regions where H(u,v) is strong (typically low frequencies).
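A minimal FFT-based sketch of inverse filtering with the restriction to strong H(u,v); the eps cutoff and the convention that h is zero-padded to the image size are assumptions:

    import numpy as np

    def inverse_filter(g, h, eps=1e-3):
        """Naive inverse filtering in the Fourier domain (assumes negligible noise).

        g   : degraded image
        h   : point-spread function, zero-padded to the same shape as g
        eps : frequencies with |H| <= eps are left unrestored to limit noise blow-up
        """
        G = np.fft.fft2(g)
        H = np.fft.fft2(h)
        strong = np.abs(H) > eps         # restore only where H(u, v) is strong
        F = G.copy()
        F[strong] = G[strong] / H[strong]
        return np.real(np.fft.ifft2(F))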
5.4.3 Wiener Filtering
Wiener filter = Least Mean Square (LMS) filter.
Tries to incorporate noise statistics into restoration.
Minimizes Mean Square Error (MSE):
e^2 = E\{ (f(i,j) - \hat{f}(i,j))^2 \}
Fourier domain estimate:
\hat{F}(u,v) = H_W(u,v)\, G(u,v)
where
H_W(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + S_{\nu\nu}(u,v) / S_{ff}(u,v)}
Requires knowledge of:
o Degradation function H
o Noise spectral density S_{\nu\nu}
o Undegraded image spectrum S_{ff}
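A minimal Fourier-domain sketch of the Wiener filter; approximating the ratio S_{\nu\nu}/S_{ff} by a constant noise-to-signal ratio is a common simplification and an assumption here, not part of the text:

    import numpy as np

    def wiener_filter(g, h, nsr=0.01):
        """Wiener restoration in the Fourier domain (sketch).

        g   : degraded image
        h   : point-spread function, zero-padded to the same shape as g
        nsr : constant stand-in for S_vv(u,v) / S_ff(u,v)
        """
        G = np.fft.fft2(g)
        H = np.fft.fft2(h)
        H_W = np.conj(H) / (np.abs(H) ** 2 + nsr)   # Wiener transfer function H_W(u,v)
        return np.real(np.fft.ifft2(H_W * G))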
Limitations of Wiener Filtering
1. MSE criterion is not perceptually optimal:
o Humans perceive errors differently across regions (more sensitive in flat/bright
areas, less in dark or high-gradient regions).
2. Cannot handle spatially variant degradations.
3. Images are non-stationary (large flat regions + edges), but Wiener assumes stationarity.
More advanced methods:
o Power Spectrum Equalization
o Geometric Mean Filtering
o Other specialized restoration techniques
Perfect 👍 Let’s go one by one in depth.
I’ll start with Thresholding Segmentation (Topic 1). After we finish it fully, I’ll continue to the next
topic in the following messages.
1. Thresholding Segmentation (Detailed Explanation)
Definition
Thresholding is the simplest image segmentation technique where pixels are divided into regions
based on their intensity values (brightness or gray level). The idea is:
Objects (foreground) usually have different intensity values compared to the background.
By choosing a threshold value (T), we can separate the object from the background.
Mathematical Representation
For a pixel at position (x, y) with intensity value f(x, y):
g(x,y) = \begin{cases} 1 & \text{if } f(x,y) > T \text{ (object/foreground)} \\ 0 & \text{if } f(x,y) \le T \text{ (background)} \end{cases}
Here,
g(x,y) → segmented binary image
f(x,y) → original pixel intensity
T → threshold
Types of Thresholding
1. Global Thresholding
o A single threshold T is applied to the entire image.
o Works well when there is a clear intensity difference between object and
background.
Example: Separating black text from a white page.
2. Local (Adaptive) Thresholding
o Threshold value T changes depending on local neighborhoods.
o Useful for images with uneven lighting/illumination.
Example: Detecting handwritten text on a page where some parts are darker than others.
3. Otsu’s Method
o An automatic thresholding method.
o It selects the threshold T by maximizing the variance between foreground and
background classes.
o Steps:
1. Compute image histogram.
2. Try all possible thresholds.
3. Select the one with maximum between-class variance.
→ Very popular in medical imaging (e.g., separating tumor regions).
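A minimal sketch of Otsu's method for an 8-bit image (brute-force search over all thresholds, following the steps above; variable names are my own):

    import numpy as np

    def otsu_threshold(image):
        """Return the threshold T that maximizes the between-class variance."""
        hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
        p = hist / hist.sum()                       # normalized histogram
        levels = np.arange(256)
        best_t, best_var = 0, -1.0
        for t in range(1, 256):                     # try all possible thresholds
            w0, w1 = p[:t].sum(), p[t:].sum()       # class probabilities
            if w0 == 0 or w1 == 0:
                continue
            mu0 = (levels[:t] * p[:t]).sum() / w0   # class means
            mu1 = (levels[t:] * p[t:]).sum() / w1
            var_between = w0 * w1 * (mu0 - mu1) ** 2
            if var_between > best_var:
                best_t, best_var = t, var_between
        return best_t

    # binary = (image > otsu_threshold(image)).astype(np.uint8)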
Advantages
Very simple and fast (low computational cost).
Works well when object and background intensities differ clearly.
Suitable for binary segmentation problems.
Disadvantages
Fails when object and background intensities overlap.
Sensitive to noise and illumination changes.
Not suitable for complex, multi-object images.
Applications
Document image analysis → separating text/characters from background.
Medical imaging → segmenting tumors, bones, or organs (if contrast is high).
Industrial inspection → detecting defective parts or cracks in materials.
Remote sensing → separating land vs. water, urban vs. vegetation regions.
2. Edge-Based Segmentation (Detailed Explanation)
Definition
Edge-based segmentation is a technique where objects are segmented by detecting their edges
(boundaries).
An edge is a set of connected pixels where there is a sharp change in intensity/brightness.
The main idea is: Objects are different from their background, and the transition region
between them forms edges.
Mathematical Idea
Edges correspond to discontinuities in:
First derivative → Gradient (change in intensity).
Second derivative → Laplacian (rate of gradient change).
For an image intensity function f(x, y):
Gradient:
\nabla f(x,y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)
Magnitude of gradient:
|\nabla f(x,y)| = \sqrt{ \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 }
High gradient values indicate edges.
Steps in Edge-Based Segmentation
1. Preprocessing (Noise Removal)
o Apply smoothing filter (e.g., Gaussian filter) to reduce noise.
o This prevents false edges.
2. Edge Detection
o Use operators to calculate gradient or Laplacian.
o Popular detectors:
Sobel Operator (computes gradient using convolution masks).
Prewitt Operator.
Roberts Operator.
Canny Edge Detector (multi-stage, most accurate).
3. Edge Linking and Boundary Detection
o Detected edges are often fragmented.
o Link them to form continuous boundaries of objects.
o Techniques: Hough Transform, graph search, contour following.
Popular Edge Detectors
Sobel Operator
o Uses two 3×3 convolution kernels to approximate gradients in x and y.
Prewitt Operator
o Similar to Sobel but simpler weights.
Roberts Cross Operator
o Works on diagonal gradients (2×2 kernel).
Canny Edge Detector (best one)
o Multi-step: smoothing, gradient calculation, non-maximum suppression, double
threshold, edge tracking.
o Produces thin, accurate edges.
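A minimal sketch of gradient-magnitude edge detection with the Sobel kernels (assumes SciPy is available for the convolution; the final threshold value is illustrative):

    import numpy as np
    from scipy.ndimage import convolve

    def sobel_magnitude(image):
        """Approximate |grad f| with the two 3x3 Sobel convolution kernels."""
        kx = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]], dtype=np.float64)   # horizontal derivative
        ky = kx.T                                       # vertical derivative
        gx = convolve(image.astype(np.float64), kx)
        gy = convolve(image.astype(np.float64), ky)
        return np.hypot(gx, gy)                         # sqrt(gx^2 + gy^2)

    # edges = sobel_magnitude(img) > 100   # threshold chosen for illustration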
Advantages
Highlights object boundaries clearly.
Works well when objects are well-defined by edges.
Canny gives thin, continuous, accurate edges.
Disadvantages
Sensitive to noise → false edges may appear.
Cannot segment objects with weak or blurred boundaries.
Produces edges, not full regions (so post-processing is required).
Applications
Medical imaging → Detecting bone edges, tumors, blood vessels.
Object detection → Shape analysis, face recognition.
Industrial inspection → Identifying defects, cracks, scratches.
Remote sensing → Detecting roads, rivers, field boundaries.
✅ That’s the complete Edge-based Segmentation explanation.
Would you like me to continue with Region-based Segmentation (Topic 3) next in the same style?
Perfect 👍 Let’s now move to the third segmentation technique.
3. Region-Based Segmentation (Detailed Explanation)
Definition
Region-based segmentation groups pixels into homogeneous regions based on similarity criteria
such as:
Intensity (brightness)
Color
Texture
Statistical properties
Unlike edge-based segmentation (which looks for boundaries), region-based segmentation grows
areas by finding pixels that are similar and belong together.
Basic Principle
A good segmentation must satisfy two conditions:
1. Homogeneity → All pixels inside a region are similar according to some criterion.
2. Discontinuity → Neighboring regions are different from each other.
Techniques in Region-Based Segmentation
1. Region Growing
Start with seed points (selected manually or automatically).
Add neighboring pixels that are similar in intensity/color to the seed.
Repeat until no more pixels can be added.
Advantages:
Simple, produces connected regions.
Works well when seed points are chosen correctly.
Disadvantages:
Sensitive to seed selection.
Sensitive to noise → may grow into unwanted areas.
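A minimal sketch of intensity-based region growing from a single seed; the 4-connectivity and the fixed tolerance are assumptions, and practical criteria can also use color, texture, or running region statistics:

    import numpy as np
    from collections import deque

    def region_grow(image, seed, tol=10):
        """Grow a region from `seed` (row, col), adding 4-neighbours whose
        intensity differs from the seed value by at most `tol`."""
        h, w = image.shape
        seed_val = float(image[seed])
        mask = np.zeros((h, w), dtype=bool)
        mask[seed] = True
        queue = deque([seed])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                    if abs(float(image[ny, nx]) - seed_val) <= tol:
                        mask[ny, nx] = True
                        queue.append((ny, nx))
        return mask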
2. Region Splitting and Merging
Splitting: Start with the whole image → split into smaller regions if not homogeneous.
Merging: Combine adjacent regions if they are similar.
Often implemented with quadtree decomposition.
Advantages:
Systematic, less dependent on seed points.
Disadvantages:
May produce blocky boundaries.
3. Watershed Segmentation (special case of region-based)
Treats image like a topographic surface.
Bright pixels = peaks, dark pixels = valleys.
Water is "poured" into valleys → divides image into catchment basins separated by
watershed lines (boundaries).
Works well on gradient images.
Problem: Over-segmentation (too many tiny regions).
Solution: Marker-controlled watershed (using markers to guide segmentation).
Advantages
Produces complete regions (not just edges).
Can incorporate multiple similarity features (intensity, color, texture).
Watershed works well for medical images (e.g., separating cells).
Disadvantages
Computationally expensive compared to thresholding.
Sensitive to noise and illumination.
Region growing depends heavily on seed point selection.
Watershed tends to over-segment if not controlled.
Applications
Medical imaging → Organ segmentation, tumor detection, separating overlapping cells.
Remote sensing → Land cover classification (vegetation, water, urban).
Object detection → Finding connected components in objects.
Industrial inspection → Identifying defective areas on surfaces.
✅ That’s the detailed Region-Based Segmentation explanation.
Would you like me to continue with Matching-Based Segmentation (Topic 4) next, in the same
detailed way?
Awesome 👍 Let’s go step by step into the fourth technique: Matching-Based Segmentation.
4. Matching-Based Segmentation (Detailed Explanation)
Definition
Matching-based segmentation is a model-driven approach, where objects are segmented by
matching parts of the image to predefined patterns or templates.
Instead of just relying on intensity, edges, or regions, this method compares parts of the image to a
reference model (template) of the object to be detected.
It is commonly used when we know what object we are looking for and have some prior knowledge
about its shape, size, or texture.
Basic Principle
1. Define a template/model of the object.
o Could be shape (circle, rectangle), texture, or structural pattern.
2. Slide/compare the template over the image.
3. Measure similarity between template and image patch.
4. Segment regions where similarity is highest.
Types of Matching Methods
1. Template Matching
Predefined template (e.g., edge pattern, shape) is compared to regions in the image.
Similarity measures:
o Cross-correlation
o Sum of Squared Differences (SSD)
o Normalized Cross-Correlation (NCC)
Example: Detecting traffic signs in road images by comparing them to circle/triangle templates.
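A minimal brute-force sketch of template matching with normalized cross-correlation (NCC); production code would normally use an FFT-based or library implementation, so this is for illustration only:

    import numpy as np

    def match_template_ncc(image, template):
        """Return the NCC score of the template at every valid position."""
        th, tw = template.shape
        t = template.astype(np.float64)
        t = (t - t.mean()) / (t.std() + 1e-12)          # zero-mean, unit-variance template
        out_h = image.shape[0] - th + 1
        out_w = image.shape[1] - tw + 1
        scores = np.zeros((out_h, out_w))
        for y in range(out_h):
            for x in range(out_w):
                patch = image[y:y + th, x:x + tw].astype(np.float64)
                patch = (patch - patch.mean()) / (patch.std() + 1e-12)
                scores[y, x] = (patch * t).mean()       # NCC in [-1, 1]
        return scores

    # scores = match_template_ncc(img, tmpl)
    # best = np.unravel_index(scores.argmax(), scores.shape)   # best match position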
2. Feature-Based Matching
Instead of full templates, use features such as:
o Corners (Harris, Shi-Tomasi)
o Keypoints (SIFT, SURF, ORB)
Match features between reference object and image using distance metrics.
Example: Detecting logos, faces, or objects in cluttered backgrounds.
3. Structural Matching
Used when objects are defined by geometric structures.
Represent object as a graph (nodes = features, edges = spatial relations).
Match image graph with model graph → segmentation.
Example: Handwritten character recognition, medical imaging of known organ shapes.
Advantages
Works well when we know the object’s shape/appearance beforehand.
Can detect specific objects even in cluttered images.
Feature-based matching is robust to scaling, rotation, and partial occlusion.
Disadvantages
Requires a predefined model/template.
Template matching is computationally expensive for large images.
Sensitive to illumination and noise (unless robust features are used).
Doesn’t generalize well to unknown objects.
Applications
Medical Imaging → Matching known anatomical structures (e.g., ventricles, tumors, bones).
Face Recognition → Matching face templates/features to detect/segment faces.
Industrial Vision → Detecting defects by comparing with a reference template.
Remote Sensing → Matching geographical structures like roads or buildings.
Robotics/Autonomous Driving → Detecting traffic signs, lane markers using shape matching.
✅ That’s the detailed Matching-Based Segmentation explanation.
Would you like me to continue with Mean Shift Segmentation (Topic 5) next, in the same style?
Great 👍 Let’s dive into Mean Shift Segmentation (Topic 5) in full detail.
5. Mean Shift Segmentation (Detailed Explanation)
Definition
Mean Shift Segmentation is a clustering-based, non-parametric segmentation technique.
It works by finding clusters in feature space (not just pixel intensity).
Features can be:
o Pixel color (RGB, HSV, etc.)
o Spatial position (x, y)
o Texture features
The algorithm iteratively shifts data points toward the densest region in the feature space (the
"mode" of the distribution). Each cluster corresponds to a segmented region.
Basic Principle
1. Each pixel is represented in a multidimensional feature space (e.g., color + spatial
coordinates).
2. A kernel (window) is placed around the point.
3. Compute the mean of data points inside the window.
4. Shift the window center to this mean.
5. Repeat until convergence (no significant shift).
6. All points converging to the same mode → form one cluster/region.
Mathematical Formulation
For a point x, the mean shift vector is:
m(x) = \frac{\sum_{i=1}^{n} K(x_i - x)\, x_i}{\sum_{i=1}^{n} K(x_i - x)} - x
Where:
x = current position
x_i = neighboring data points
K = kernel function (e.g., Gaussian, flat kernel)
The algorithm shifts x in the direction of m(x) until convergence.
Steps of Mean Shift Segmentation
1. Map image pixels into feature space (e.g., [x, y, R, G, B]).
2. Select a kernel bandwidth (window size).
3. For each pixel:
o Place window at its feature location.
o Compute mean of points inside.
o Shift center to mean.
o Repeat until convergence.
4. Merge points that converge to the same mode into one cluster → segment.
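A minimal feature-space sketch of the mean shift iteration with a flat kernel (O(n^2) per iteration; the bandwidth and iteration count are illustrative choices):

    import numpy as np

    def mean_shift_modes(points, bandwidth=1.0, n_iter=20):
        """Shift each point toward the mean of the data inside its window.

        points : array of shape (n, d), e.g. one row [x, y, R, G, B] per pixel.
        Points that end up (nearly) coincident belong to the same mode/cluster.
        """
        data = points.astype(np.float64)
        shifted = data.copy()
        for _ in range(n_iter):
            for i, x in enumerate(shifted):
                dist = np.linalg.norm(data - x, axis=1)
                window = data[dist <= bandwidth]      # flat kernel: 1 inside, 0 outside
                if len(window):
                    shifted[i] = window.mean(axis=0)  # move to the local mean (x + m(x))
        return shifted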
Key Properties
Non-parametric: No need to specify the number of clusters (unlike k-means).
Mode-seeking: Finds clusters by detecting density peaks.
Bandwidth (window size) controls segmentation:
o Large bandwidth → fewer, larger regions.
o Small bandwidth → more, smaller regions.
Advantages
Does not assume any prior shape/distribution of clusters.
Can handle complex, arbitrary-shaped regions.
Works in both color and spatial domains.
Robust to noise compared to thresholding.
Disadvantages
Computationally expensive (iterative shifting for each pixel).
Sensitive to bandwidth parameter (poor choice → under/over segmentation).
May struggle with very high-dimensional features.
Applications
Image segmentation → separating natural scenes into sky, water, vegetation, etc.
Medical imaging → clustering tissues with similar intensities and textures.
Tracking in videos → Mean shift tracker (used in object tracking like face tracking).
Remote sensing → land cover classification.
Computer vision → feature clustering, motion segmentation.
✅ That’s the full Mean Shift Segmentation explanation.
Would you like me to continue with Active Contour Models (Topic 6) next, in the same detailed
style?
Perfect 👍 Now let’s dive into the sixth segmentation technique: Active Contour Models (Snakes).
6. Active Contour Models (Detailed Explanation)
Definition
Active Contour Models (also called Snakes) are deformable curves that move within an image to fit
object boundaries.
The curve evolves under the influence of internal forces (smoothness constraint) and
external forces (image features such as edges, gradients).
The goal is to find the object boundary by minimizing an energy function.
Introduced by Kass, Witkin, and Terzopoulos (1988).
Basic Principle
Start with an initial contour (a curve drawn near the object).
The contour automatically moves/shrinks/expands to align with the object boundary.
The movement is driven by energy minimization.
Mathematical Formulation
The contour is v(s) = (x(s), y(s)), where s is the curve parameter.
The energy functional is:
E = \int \left[ E_{\text{int}}(v(s)) + E_{\text{img}}(v(s)) + E_{\text{ext}}(v(s)) \right] ds
Where:
E_{\text{int}} = Internal energy (keeps contour smooth, continuous).
E_{\text{img}} = Image energy (attracts contour to edges, lines, corners).
E_{\text{ext}} = External constraints (user-defined forces, e.g., balloon force).
Forces Explanation
1. Internal Energy (Smoothness Constraint)
o Prevents the curve from breaking apart.
o Two terms:
Elasticity (controls stretching).
Rigidity (controls bending).
2. Image Energy
o Attracts contour to object features.
o Derived from image gradients, edge maps, or region descriptors.
3. External Energy
o User can apply forces (e.g., to push/pull contour toward expected boundary).
Variants of Active Contour Models
1. Edge-Based Active Contours (Classical Snakes)
o Contours move toward edges detected by gradients.
o Sensitive to weak edges → may stop early.
2. Region-Based Active Contours (Chan–Vese Model, 2001)
o Instead of edges, uses region statistics (intensity inside vs. outside contour).
o Works even when edges are weak or missing.
Advantages
Provides smooth, continuous boundaries.
Flexible: can handle irregular object shapes.
Region-based versions handle low-contrast images well.
Can be extended to 3D surfaces (for volumetric segmentation).
Disadvantages
Requires good initial contour placement (otherwise may converge to wrong boundary).
Sensitive to noise and weak edges (especially edge-based snakes).
Computationally expensive for large images.
Applications
Medical Imaging → Segmenting organs, tumors, blood vessels.
Object Tracking → Contour evolves frame by frame.
Computer Vision → Shape detection (faces, objects).
Remote Sensing → Extracting boundaries of lakes, rivers, roads.
✅ That’s the full explanation of Active Contour Models (Snakes).
Would you like me to continue with Geometric Deformable Models (Topic 7) next in the same
detailed style?
Great 👍 Let’s now move to the seventh segmentation method: Geometric Deformable Models
(GDMs).
7. Geometric Deformable Models (Detailed Explanation)
Definition
Geometric Deformable Models (GDMs) are an extension of active contour models, but instead of
representing contours explicitly (as parametric curves), they are represented implicitly using a
mathematical framework called the Level Set Method.
The contour is embedded as the zero level set of a higher-dimensional function (usually 2D
curve in 3D function space).
This allows the contour to automatically handle topological changes, such as splitting and
merging, which classical snakes cannot easily handle.
Basic Principle
Represent the contour as a level set function φ(x, y).
The contour is the set of points where:
φ(x, y) = 0
As the curve evolves, the level set function is updated using partial differential equations
(PDEs).
This allows:
Smooth evolution of curves.
Handling merging, splitting, and complex shapes automatically.
Mathematical Formulation
The evolution of the contour is governed by:
\frac{\partial \phi}{\partial t} + F |\nabla \phi| = 0
Where:
φ(x, y, t) = level set function at time t.
F = speed function (depends on image features like edges, intensity).
|∇φ| = gradient magnitude of φ.
If F > 0, the contour expands outward.
If F < 0, the contour shrinks inward.
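A minimal sketch of one explicit time step of this evolution equation (real implementations use upwind finite differences, narrow bands, and periodic reinitialization; this simplification is an assumption):

    import numpy as np

    def level_set_step(phi, F, dt=0.1):
        """One explicit update of  d(phi)/dt + F * |grad(phi)| = 0.

        phi : 2D level-set function, contour = zero level set of phi
        F   : speed (scalar or array of phi's shape); F > 0 expands, F < 0 shrinks
        """
        gy, gx = np.gradient(phi)                 # central-difference gradient
        grad_mag = np.sqrt(gx ** 2 + gy ** 2)
        return phi - dt * F * grad_mag            # phi_new = phi - dt * F * |grad phi|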
Advantages over Classical Snakes
1. Handles Topological Changes Naturally
o If one contour splits into two (e.g., two nearby objects), GDM handles it
automatically.
o Classical snakes fail here because they are parametric.
2. Numerical Stability
o PDE-based formulation is robust and mathematically elegant.
3. 3D Extension
o Works not only for 2D images but also for 3D volumes (organs in CT/MRI scans).
Types of Geometric Models
Edge-Based Level Set Models
o Drive contour to object boundaries using gradients.
Region-Based Level Set Models (e.g., Chan–Vese Model in level set form)
o Use regional intensity differences → works with weak edges.
Advantages
Handles complex shapes, holes, merging, and splitting.
Works well for multi-object segmentation.
Easily extended to 3D segmentation (surfaces instead of curves).
More robust than parametric snakes.
Disadvantages
Computationally expensive (solving PDEs iteratively).
Sensitive to parameter tuning (speed function, regularization).
Requires careful initialization in some cases.
Applications
Medical Imaging → Brain tumor segmentation, vessel extraction, organ boundaries in
MRI/CT.
Computer Vision → Segmenting overlapping objects, shape analysis.
Remote Sensing → Extracting rivers, lakes, urban structures.
Biological Imaging → Cell tracking and shape evolution.
8. Fuzzy Connectivity Segmentation (Detailed Explanation)
Definition
Fuzzy Connectivity (FC) segmentation is a region-based image segmentation method that uses fuzzy
logic to determine how strongly a pixel belongs to a region (object).
Instead of classifying pixels as strictly inside or outside a region (like in hard thresholding), FC assigns
a degree of belonging between 0 and 1.
This allows the algorithm to deal with:
Ambiguous boundaries
Gradual intensity transitions
Noisy images
Basic Principle
Define a measure of fuzzy affinity between neighboring pixels.
Affinity indicates how similar two pixels are (based on intensity, color, texture, spatial
closeness).
A region is defined as a set of pixels that are strongly connected to a seed pixel (user-
provided or automatically chosen).
Connectivity strength is measured using fuzzy logic rules.
Mathematical Formulation
1. Fuzzy Affinity (κ)
o For two adjacent pixels p and q:
κ(p, q) ∈ [0, 1]
o κ = 1 → pixels are fully similar (high affinity).
o κ = 0 → pixels are completely dissimilar.
2. Fuzzy Connectedness (FC)
o Connectivity between pixel p and region R is defined as the maximum-affinity path connecting p to R.
o This is computed using graph-based path algorithms.
3. Segmentation Rule
o A pixel is included in the region if its fuzzy connectivity to the seed pixel is above a
threshold.
Steps in Fuzzy Connectivity Segmentation
1. Select seed point(s) (representing the object of interest).
2. Define fuzzy affinity function based on similarity (intensity, texture, spatial closeness).
3. Compute fuzzy connectivity of every pixel to the seed.
4. Generate the segmented region by thresholding connectivity values.
Advantages
Handles uncertain/ambiguous boundaries (e.g., medical images with fuzzy tissue edges).
More robust to noise than simple region growing.
Allows interactive segmentation (user can adjust seed points and thresholds).
Provides soft segmentation (with degrees of membership), useful in analysis.
Disadvantages
Computationally expensive for large images (needs graph-based computations).
Requires good seed selection.
Parameters (affinity function, threshold) need careful tuning.
Applications
Medical Imaging →
o Brain tumor segmentation in MRI.
o Vessel/organ segmentation in CT scans.
o Useful when tissue boundaries are not clear (e.g., diffuse tumors).
Biological Imaging → Identifying cells with fuzzy/uncertain borders.
Remote Sensing → Land cover classification when regions overlap.
General Vision → Image segmentation with uncertain object-background separation.
✅ That’s the full Fuzzy Connectivity Segmentation explanation.
Would you like me to continue with Toward 3D Graph-Based Image Segmentation (Topic 9) next in
the same detailed style?
Perfect 👍 Now let’s move to Topic 9: Toward 3D Graph-Based Image Segmentation.
9. Toward 3D Graph-Based Image Segmentation (Detailed
Explanation)
Definition
3D Graph-Based Segmentation is an extension of graph-based segmentation techniques (like Graph
Cuts, Normalized Cuts) to 3D volumetric data.
In 2D, an image is represented as a graph:
o Nodes = pixels
o Edges = similarity between pixels (e.g., intensity, color, texture, spatial closeness).
In 3D, nodes represent voxels (volume pixels), and edges connect neighboring voxels.
Segmentation = dividing the graph into regions by cutting edges while preserving strong
internal connectivity.
This approach is particularly important in medical imaging (MRI, CT, PET) and 3D computer vision.
Basic Principle
Construct a weighted graph G = (V, E):
o V: set of voxels (3D pixels).
o E: edges between neighboring voxels.
o w(p, q): weight of edge = similarity between voxels p and q.
Segmentation is performed by partitioning the graph into subsets (regions) such that:
o High similarity inside regions.
o Low similarity between regions.
Mathematical Framework
1. Graph Construction
o Each voxel → node in graph.
o Edges connect voxels within a neighborhood (e.g., 6-, 18-, or 26-neighborhood in
3D).
o Edge weights depend on voxel similarity:
w(p, q) = e^{-\frac{\|I(p) - I(q)\|^2}{\sigma^2}}
where I(p) = intensity at voxel p and σ = scale factor.
2. Graph Partitioning
o The goal is to minimize the cost of the cut:
\text{Cut}(A, B) = \sum_{p \in A,\, q \in B} w(p, q)
o Different strategies:
Minimum Cut
Normalized Cut
Random Walks
3. 3D Extension
o Instead of pixels (2D), voxels (3D) are considered.
o Handles volumetric data → produces surfaces instead of curves.
Advantages
Works for complex 3D structures.
Provides globally optimal segmentation (in some methods like graph cuts).
Handles multiple objects and weak boundaries well.
Suitable for medical imaging where 3D continuity is essential.
Disadvantages
High computational cost (graphs in 3D are huge).
Requires efficient algorithms (multi-scale, parallel GPU implementations).
Sensitive to parameter tuning (similarity functions, cut criteria).
Applications
Medical Imaging
o Brain tumor segmentation in 3D MRI.
o Vessel segmentation in CT angiography.
o 3D organ segmentation (liver, heart, lungs).
Computer Vision & Graphics
o 3D object recognition.
o Shape analysis.
Robotics
o 3D scene understanding for navigation.
Remote Sensing & Geoscience
o Geological volume segmentation (rock layers, mineral deposits).
✅ That’s the detailed 3D Graph-Based Segmentation explanation.
Would you like me to continue with Graph Cut Segmentation (Topic 10) next in the same style?
Perfect 👍 Let’s move to the last topic in your list:
Optimal Single and Multiple Surface Segmentation
This is an advanced segmentation method often used in medical imaging, 3D image analysis, and
computer vision, where the goal is to extract one or more meaningful surfaces (boundaries) from
image volumes.
1. What is Surface Segmentation?
Unlike simple thresholding or region-growing that segment regions, surface segmentation
aims to detect continuous 2D surfaces embedded in 3D image volumes.
Example:
o Extracting the inner and outer surface of a brain cortex in MRI scans.
o Finding the boundary of an organ like the heart, lungs, or bones in CT scans.
2. Optimal Single Surface Segmentation
Here, the goal is to find one continuous surface that best separates two regions.
The method usually relies on energy minimization:
o Energy function = Data term (fitting surface to image) + Smoothness term (penalizing
irregular surfaces).
Algorithms used:
o Graph cut optimization (finding minimum cut corresponding to surface).
o Dynamic programming in 2D (for optimal contour extraction).
o Level set methods (evolving a surface to minimize energy).
Example: Segmenting the boundary of the left ventricle in cardiac MRI.
3. Optimal Multiple Surface Segmentation
Many real-world problems require segmenting multiple related surfaces simultaneously.
Example:
o The inner and outer surfaces of a blood vessel wall.
o The layers of the retina in optical coherence tomography (OCT).
These surfaces are not independent; they often:
o Must remain non-intersecting.
o Have a fixed order (inner surface should always lie inside the outer surface).
Coupled optimization techniques are used:
o Graph cuts with constraints on multiple surfaces.
o Dynamic programming extended to multiple surfaces.
o Variational formulations with constraints.
4. Advantages
Captures globally optimal solutions (avoids local minima).
Works well in 3D medical imaging where accurate boundaries are needed.
Multiple surface segmentation respects anatomical constraints.
5. Applications
Medical Imaging:
o Brain structure segmentation (white matter, gray matter, ventricles).
o Retinal layer segmentation in OCT.
o Heart chamber wall extraction.
o Tumor boundary detection.
Computer Vision:
o Object boundary extraction in 3D point clouds.
o Industrial inspection (detecting surfaces of manufactured parts).
6. Challenges
Computationally expensive (especially in 3D volumes).
Sensitive to noise if energy terms are not well defined.
Requires domain-specific constraints (medical knowledge, anatomical structure rules).
Summary:
Optimal Single Surface Segmentation → finds one best boundary surface.
Optimal Multiple Surface Segmentation → extracts multiple coupled boundaries while
ensuring constraints (non-crossing, ordering).
Key tools: Graph cuts, dynamic programming, variational/level-set methods.
Applications: Mostly in medical imaging, but also in general 3D vision tasks.