Václav Hlaváč, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Center for Machine Perception, http://cmp.felk.cvut.cz/hlavac, hlavac@fel.cvut.cz. Outline of the lecture:
Digital image processing, image analysis, computer vision. Interpretation and its significance for images. Image, image function f(x, y). Image digitization: sampling + quantization. Distance in the image, the relation "to be contiguous", region. Convex set, brightness histogram.
Computer vision is the science and technology of machines that see. As a scientific discipline: the theory for building artificial systems that obtain information from images. As a technological discipline: the construction of computer vision systems. Computer vision = Camera + Computer + ? Images (e.g.): views from multiple cameras, a video sequence, multi-dimensional data from a medical scanner.
Human Vision
Computer Science
Controlling processes (e.g., an industrial robot or an autonomous vehicle). Detecting events (e.g., visual surveillance, people counting, detecting the launch of a ballistic missile from a satellite). Organizing information (e.g., indexing databases of images and image sequences). Modeling objects or environments (e.g., industrial inspection, medical image analysis, or topographical modeling). Interaction (e.g., as the input to a device for human-computer interaction).
...
Perception
The process of attaining awareness or understanding of sensory information. The task is far more complex than was imagined in the 1950s and 1960s, when it was believed that building perceiving machines would take about a decade. It is still very far from reality. Aristotle's five senses are: sight, hearing, touch, smell, taste. Perception conjectures a dynamic relationship between: description (in the brain), senses, surroundings, memory.
Human vision
Vision is really hard. The visual cortex occupies about 50% of the macaque brain. More of the human brain is devoted to vision than to anything else.
Vision allows both humans and animals to perceive and understand the world surrounding them. Cognitive science investigates vision in biological systems: It seeks empirical models which adequately describe biological vision. It sometimes describes vision as a computational system. Computer vision aims at engineering solutions, but its research is also interested in biological vision: Biological vision systems cope with tasks not yet solved in computer vision. They provide ideas for engineering solutions. Technical requirements for vision systems often match requirements for biological vision. Caution: Mimicking biological vision does not necessarily provide the best solution for a technical problem.
(Continuous) image = the input (understood intuitively), e.g., on the retina or captured by a TV camera. Let us assume a gray-level image for simplicity. The continuous image function f(x, y), or a matrix of picture elements (pixels) after digitization. (x, y) are the spatial coordinates of a pixel. The value f(x, y) usually corresponds to brightness. f(x, y, t) in the case of an image sequence, where t corresponds to time.
[Figure: a discrete pixel grid with coordinate axes running 1-10.]
Inherent presence of noise, as each real-world measurement is corrupted by noise. A lot of data: an A4 sheet at 300 dpi, 8 bits per pixel = 8.5 MB; non-interlaced 512 × 768 RGB (24 bit) video = 225 Mbit/s. Interpretation needed (to be discussed soon). Local window vs. the need for a global view.
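The quoted sizes can be checked with a quick back-of-the-envelope computation (a sketch; the 25 frames/s video rate is our assumption, and rounding accounts for the small differences from the numbers above):

```python
# Back-of-the-envelope check of the data sizes quoted above.
# A4 paper is 210 x 297 mm, i.e. about 8.27 x 11.69 inches.
width_px = round(8.27 * 300)         # ~2481 pixels at 300 dpi
height_px = round(11.69 * 300)       # ~3507 pixels
a4_bytes = width_px * height_px * 1  # 8 bits = 1 byte per pixel
print(f"A4 at 300 dpi, 8 bpp: {a4_bytes / 2**20:.1f} MiB")   # ~8.3 MiB

frame_bits = 512 * 768 * 24          # one 24-bit RGB frame
video_bps = frame_bits * 25          # 25 frames/s assumed (PAL rate)
print(f"512x768 RGB video: {video_bps / 1e6:.0f} Mbit/s")    # ~236 Mbit/s
```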
Interpretation: Observation, Model, Syntax, Semantics. Examples: Looking out of the window: {rains, does not rain}. An apple on the conveyor belt: {class 1, class 2, class 3}. Traffic scene: seeking the number plate of a car. Theoretical background: mathematical logic, theory of formal languages. Deep philosophical problem: Gödel's incompleteness theorems, informally: any sufficiently expressive consistent logical system contains propositions that can be neither proved nor disproved within the system.
From low-level to high-level processing, from the a priori knowledge point of view.
Low level of knowledge (or none) = digital image processing. Images are not interpreted. Methods are independent of a specific application area. Signal processing methods are used, e.g., the 2D Fourier transform.
Middle level of knowledge = image analysis. Often 2D images only, e.g., cell images in an optical microscope. Interpretation exploits important additional knowledge that allows solving tasks otherwise unsolvable. High level of knowledge = computer vision, e.g., understanding the content of a 3D scene from images and videos. The most general task formulations: 3D world, changing scenes. Complicated; interpretation and feedback are exploited, artificial intelligence methods are used. Goals are overambitious. The involved tasks are underconstrained and too ambitious; tasks have to be radically simplified.
A priori knowledge about our world enables humans to understand ambiguous images. Of course, a priori assumptions can mislead the human too. Counterexample: the Ames chair. We can see chairs.
1960s: Interpretation of synthetic worlds, e.g., the blocks world for robots. 1966: M. Minsky assigns computer vision as an undergraduate summer project. 1970s: Some progress on interpreting selected images. 1980s: Artificial neural nets come and go; shift toward geometry and increased mathematical rigor; inspiration from biological vision (D. Marr et al.). 1990s: Face recognition; statistical analysis in vogue; geometry of vision. 2000s: Broader recognition; large annotated datasets available; video processing starts.
[Figure: the hierarchy of representations, from a 2D image and its digital image to edgels, regions, scale, orientation, texture, and finally objects.]
An image is understood intuitively as the visual response on the retina or on the light-sensitive chip in a camera, TV camera, etc. The image function f(x, y), or f(x, y, t), is the result of perspective projection.
[Figure: perspective projection; a point P(x, y, z) in the 3D scene is projected through the focal point (focal length f) onto the image plane at (x', y').]

x' = x f / z ,   y' = y f / z .
Monochromatic static image f(x, y), where (x, y) are coordinates in a plane with the range R = {(x, y), 1 ≤ x ≤ x_m, 1 ≤ y ≤ y_n}; f is the value of the image function (brightness, optical density for a transparent original, distance to the observer, temperature in thermovision, etc.). (Natural) 2D images: a thin sample in an optical microscope, the image of a letter (character) on a piece of paper, a fingerprint, one slice from a tomograph, etc.
Digitization
Sampling & quantization of the image function value (also called intensity). A digital image is often represented as a matrix. Pixel = abbreviation of "picture element".
Image sampling
2. Distance between samples (Nyquist-Shannon sampling theorem). The sampling frequency must be more than twice the maximal frequency present in the signal (in the sense: which it should be possible to reconstruct from the sampled signal). Informally, in images the sample size (pixel size) has to be at most half the size of the smallest detail of interest.
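The theorem is easiest to see in 1D. The toy sketch below (our own illustration, not from the lecture) samples a 5 Hz sine above and below the Nyquist rate and reads off the dominant frequency from the spectrum:

```python
import numpy as np

def dominant_freq(signal_hz, fs, n=1024):
    """Frequency with maximal spectral magnitude for a sine sampled at fs."""
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * signal_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(n, 1 / fs)[np.argmax(spectrum)]

# Sampled at 11 Hz (> 2 * 5 Hz): the 5 Hz sine is preserved.
print(dominant_freq(5, 11))  # ~5 Hz
# Sampled at 8 Hz (< 2 * 5 Hz): it aliases to |8 - 5| = 3 Hz.
print(dominant_freq(5, 8))   # ~3 Hz
```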
[Figure: the same image sampled at 128 × 128, 64 × 64, and 32 × 32, and quantized to 64, 16, 4, and 2 gray levels.]
Manhattan distance (distance in a city with a rectangular street layout):
D4((x, y), (h, k)) = |x − h| + |y − k| .
Chessboard distance (from the point of view of the king in chess):
D8((x, y), (h, k)) = max{|x − h|, |y − k|} .
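The two metrics, with the Euclidean distance D_E for comparison, can be written out directly (a minimal sketch):

```python
def d4(p, q):
    """Manhattan / city-block distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance (king's moves)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def de(p, q):
    """Euclidean distance."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

p, q = (0, 0), (3, 4)
print(d4(p, q), d8(p, q), de(p, q))  # 7 4 5.0
```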
[Figure: pixels at distances 0-4 from a reference pixel under D_E, D4, and D8.]
A set consisting of a pixel (called, e.g., the representative pixel or point) and its neighbors at distance 1 (the 4-neighborhood for D4, the 8-neighborhood for D8).
A note for the curious: the Japanese kanji symbol means "near to here". Introducing the concept of an object allows selecting those pixels on the grid which have some particular meaning (recall the discussion about interpretation). Here, black pixels belong to the object, a character. Neighboring pixels are contiguous. More generally, two pixels are contiguous if and only if there is a path of mutually neighboring pixels between them.
The relation "x is contiguous to y" is reflexive (x ≡ x), symmetric (x ≡ y ⟹ y ≡ x) and transitive ((x ≡ y) & (y ≡ z) ⟹ x ≡ z). Thus it is an equivalence relation. Any equivalence relation decomposes a set into subsets called equivalence classes. These are regions in our particular case of the relation "to be contiguous". In the image below, different regions are labeled by different colors.
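Decomposing an image into these equivalence classes is exactly connected-component labeling. A minimal 4-connectivity sketch (our own illustration, not an algorithm from the lecture):

```python
from collections import deque

def label_regions(img):
    """img: 2-D list of 0/1. Returns a same-shaped list of region labels,
    where contiguous (4-connected) object pixels share one label."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not labels[sy][sx]:
                next_label += 1                  # new equivalence class
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:                     # flood the whole region
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

img = [[1, 1, 0],
       [0, 0, 0],
       [0, 1, 1]]
print(label_regions(img))  # [[1, 1, 0], [0, 0, 0], [0, 2, 2]]
```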
Region boundary
The region boundary (also border) of a region R is the set of pixels within the region that have one or more neighbors outside R. Theoretically, for the continuous image function the boundary is infinitesimally thin. In a digital image, the boundary always has a finite width. Consequently, it is necessary to distinguish the inner and the outer boundary.
Convex set = any two of its points can be connected by a straight line segment which lies entirely inside the set.
[Figure: a convex and a non-convex set; a region, its convex hull, and the region's lakes and bays.]
Distance transform, DT
Also called: distance function, chamfering algorithm (an analogy to woodcarving). The DT provides, at each pixel, the distance from some image subset (perhaps describing objects). In the resulting DT image, pixels of the relevant subset have value 0, nearby pixels have low values, and pixels remote from the subset have high values. For a binary image, the DT provides the distance from each pixel to the nearest non-zero (object) pixel.
[Figure: a binary input image and its distance-transform result.]
The famous two-pass algorithm for calculating the DT by Rosenfeld and Pfaltz (1966), for distances D4 and D8. The idea is to traverse the image with a small local mask. The first pass starts from the top-left corner of the image and moves row-wise, left to right. The second pass goes from the bottom-right corner in the opposite, bottom-up manner, right to left.
[Figure: masks AL (pixels above and to the left of p) for the top-down pass and BR (pixels below and to the right of p) for the bottom-up pass, in 4-neighborhood and 8-neighborhood variants.]
The effectiveness of the algorithm comes from propagating values from the previously visited part of the image in a wave-like manner.
1. To calculate the distance transform for a subset S of an image of dimension M × N with respect to a distance metric D, where D is one of D4 or D8, construct an M × N array F with elements corresponding to the set S set to 0, and all other elements set to infinity.
2. Pass through the image row by row, from top to bottom and left to right. For each pixel p, considering the neighbors above and to the left (the mask AL), set
F(p) = min( F(p), min_{q ∈ AL} [ D(p, q) + F(q) ] ).
3. Pass through the image row by row, from bottom to top and right to left. For each pixel p, considering the neighbors below and to the right (the mask BR), set
F(p) = min( F(p), min_{q ∈ BR} [ D(p, q) + F(q) ] ).
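The steps above can be sketched as follows for D4 (a minimal illustration; adding the diagonal neighbors to each mask gives D8):

```python
INF = float("inf")

def distance_transform_d4(binary):
    """Two-pass Rosenfeld-Pfaltz DT for D4.
    binary: 2-D list of 0/1; returns D4 distance to the nearest 1-pixel."""
    h, w = len(binary), len(binary[0])
    # Step 1: 0 on the subset S, infinity elsewhere.
    F = [[0 if binary[y][x] else INF for x in range(w)] for y in range(h)]
    # Step 2: top-down, left-to-right; mask AL = {above, left}, D(p, q) = 1.
    for y in range(h):
        for x in range(w):
            if y > 0:
                F[y][x] = min(F[y][x], F[y-1][x] + 1)
            if x > 0:
                F[y][x] = min(F[y][x], F[y][x-1] + 1)
    # Step 3: bottom-up, right-to-left; mask BR = {below, right}.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                F[y][x] = min(F[y][x], F[y+1][x] + 1)
            if x < w - 1:
                F[y][x] = min(F[y][x], F[y][x+1] + 1)
    return F

img = [[0, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
print(distance_transform_d4(img))
# [[2, 1, 2, 3], [1, 0, 1, 2], [2, 1, 2, 3]]
```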
[Figure: distance transforms of the same input with distances D_E (Euclidean), D4, and D8.]
Quasi-Euclidean distance
The Euclidean DT cannot be easily computed in only two passes. The quasi-Euclidean distance approximation, which can be obtained in two passes, is therefore often used:

D_QE((i, j), (h, k)) = |i − h| + (√2 − 1) |j − k|   for |i − h| > |j − k| ,
D_QE((i, j), (h, k)) = (√2 − 1) |i − h| + |j − k|   otherwise.
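Written out directly (a sketch; `math.hypot` gives the true Euclidean distance for comparison):

```python
from math import sqrt, hypot

def d_qe(p, q):
    """Quasi-Euclidean distance between pixels p and q."""
    di, dj = abs(p[0] - q[0]), abs(p[1] - q[1])
    if di > dj:
        return di + (sqrt(2) - 1) * dj
    return (sqrt(2) - 1) * di + dj

# Exact on the diagonal: d_qe = 3*sqrt(2), same as Euclidean.
print(round(d_qe((0, 0), (3, 3)), 3), round(hypot(3, 3), 3))  # 4.243 4.243
# Slight overestimate elsewhere.
print(round(d_qe((0, 0), (5, 1)), 3), round(hypot(5, 1), 3))  # 5.414 5.099
```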
[Figure: an input image (color, grayscale, binary) and its distance transforms with distances D_E (Euclidean), D_QE (quasi-Euclidean), D4, and D8.]
Brightness histogram
The histogram of brightness values serves as an estimate of the probability density that a pixel has a given brightness.
[Figure: an input image and its brightness histogram.]
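The histogram as a probability estimate can be sketched as follows (a toy example; the random stand-in image is our assumption, used in place of a real photograph):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))   # stand-in 8-bit image

# One bin per brightness value 0..255.
hist, _ = np.histogram(img, bins=256, range=(0, 256))
p = hist / img.size                          # relative frequencies

# p[v] estimates the probability that a randomly chosen pixel
# has brightness v; the estimate sums to 1 by construction.
print(p.sum())  # 1.0
```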