
Introduction: digital image processing or computer vision; the digital image

Václav Hlaváč, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Center for Machine Perception. http://cmp.felk.cvut.cz/hlavac, hlavac@fel.cvut.cz. Outline of the lecture:
Digital image processing, image analysis, computer vision. Interpretation and its significance for images. Image, image function f(x, y). Image digitization: sampling + quantization. Distance in the image, the relation "to be contiguous", region. Convex set, brightness histogram.

What is computer vision?


Computer vision is the science and technology of machines that see. As a scientific discipline: the theory for building artificial systems that obtain information from images. As a technological discipline: the construction of computer vision systems. Computer vision = camera + computer + ? Images (e.g.): views from multiple cameras, a video sequence, multi-dimensional data from a medical scanner.

Why study image processing, analysis and computer vision?


Computer vision has grown on (at least) four pillars: (1) computer science; (2) signal processing; (3) pattern recognition; (4) human vision. A history of more than 40 years. A rich methodology. Interesting interdisciplinary ties. Exciting insights into human vision. An important information source and modality in the information age.
[Diagram: computer vision supported by its four pillars: computer science, signal processing, pattern recognition, and human vision.]
What are computer vision systems used for?


Controlling processes (e.g., an industrial robot or an autonomous vehicle). Detecting events (e.g., for visual surveillance, people counting, detecting a launching ballistic missile from a satellite). Organizing information (e.g., for indexing databases of images and image sequences). Modeling objects or environments (e.g., industrial inspection, medical image analysis, or topographical modeling). Interaction (e.g., as the input to a device for human-computer interaction).

...

Perception
The process of attaining awareness or understanding of sensory information. The task is far more complex than it was imagined in the 1950s and 1960s, when building perceiving machines was expected to take about a decade. However, it is still very far from reality. Aristotle's five senses are: sight, hearing, touch, smell, taste. Perception conjectures a dynamic relationship between: description (in the brain), senses, surroundings, memory.

Human vision
Vision is really hard. The visual cortex occupies about 50% of the macaque brain. More of the human brain is devoted to vision than to anything else.

Human vision as opposed to computer vision



Vision allows both humans and animals to perceive and understand the world surrounding them. Cognitive science investigates vision in biological systems: It seeks empirical models which adequately describe biological vision. It sometimes describes vision as a computational system. Computer vision aims at engineering solutions, but its research is also interested in biological vision: Biological vision systems cope with tasks not yet solved in computer vision. They provide ideas for engineering solutions. Technical requirements for vision systems often match requirements for biological vision. Caution: Mimicking biological vision does not necessarily provide the best solution for a technical problem.

Image, digital image, pixel



(Continuous) image = the input (understood intuitively), e.g., on the retina or captured by a TV camera. Let us assume a gray-level image for simplicity. The continuous image function f(x, y), or a matrix of picture elements, pixels (after digitization). (x, y) are the spatial coordinates of a pixel. The value f(x, y) usually corresponds to brightness. f(x, y, t) in the case of an image sequence, where t corresponds to time.

[Figure: a digital image as a grid of pixels with integer coordinates 1 to 10 on each axis.]

Examples of input images



Why is computer vision hard? Let us find six reasons (at least).



Loss of information in the 3D → 2D mapping due to perspective transformation (mathematical abstraction = the pinhole model). Measured brightness is given by complicated image-formation physics. Radiance (≈ brightness) depends on the light sources' intensity and positions, the observer position, the local surface geometry, and albedo. The inverse task is ill-posed.


Inherent presence of noise, as each real-world measurement is corrupted by noise. A lot of data: an A4 sheet at 300 dpi, 8 bits per pixel = 8.5 Mbytes; non-interlaced video 512 × 768, RGB (24 bit) = 225 Mbits/second. Interpretation is needed (to be discussed soon). Local window vs. the need for a global view.

Insufficiency of the local view, illustration

Interpretation and its role, semantics


Interpretation: observation → model. Syntax → semantics. Examples: Looking out of the window → {rains, does not rain}. An apple on the conveyor belt → {class 1, class 2, class 3}. Traffic scene → seeking the number plate of a car. Theoretical background: mathematical logic, theory of formal languages. Deep philosophical problem: Gödel's incompleteness theorems; informally, a logic system contains propositions that can be neither proved nor disproved.

From low-level to high-level processing, from the a priori knowledge point of view
Low level of knowledge (or none) = digital image processing. Images are not interpreted. Methods are independent of a specific application area. Signal processing methods are used, e.g., the 2D Fourier transform.


Middle level of knowledge = image analysis. Often 2D images only, e.g., cell images in an optical microscope. Interpretation exploits important additional knowledge, allowing tasks to be solved that would otherwise be unsolvable. High level of knowledge = computer vision, e.g., understanding the content of a 3D scene from images and videos. The most general task formulations: 3D world, changing scenes. Complicated; interpretation is exploited, feedback is exploited, artificial intelligence methods are used. Goals are overambitious. The involved tasks are underconstrained and too ambitious; tasks have to be radically simplified.

Role of a priori knowledge, a counterexample

A priori knowledge about our world enables humans to understand multi-meaning images. Of course, a priori assumptions can mislead the human too . . . Counterexample: the Ames chair. We can see chairs. Actually, there are no chairs.

The ultra brief history of computer vision


1966: M. Minsky assigns computer vision as an undergraduate summer project. 1960s: Interpretation of synthetic worlds, e.g., the blocks world for robots. 1970s: Some progress on interpreting selected images. 1980s: Artificial neural nets come and go; a shift toward geometry and increased mathematical rigor; inspiration from biological vision (D. Marr et al.). 1990s: Face recognition; statistical analysis in vogue; geometry of vision. 2000s: Broader recognition; large annotated datasets available; video processing starts.

Image-based recognition, hierarchy of representations


[Diagram: hierarchy of representations. From objects to images: object or scene → 2D image → digital image. From images to features: digital image → image with features (regions, edgels, scale, orientation, texture). From features to objects, understanding objects: features → objects.]

Image
The image is understood intuitively as the visual response on the retina or on a light-sensitive chip in a camera, TV camera, etc. The image function f(x, y), or f(x, y, t), is the result of perspective projection.
[Figure: the pinhole model. A point P(x, y, z) in the 3D scene is projected through the focal point onto the image plane at (x', y'); f is the focal length.]

x' = x f / z ,    y' = y f / z .
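The projection equations above can be written as a minimal sketch (the function name is illustrative; this assumes NumPy, points in front of the camera with z > 0, and the sign convention of the formulas above):

```python
import numpy as np

def project(points, f):
    """Pinhole projection: map 3D points (x, y, z) to image points
    (x', y') = (x * f / z, y * f / z)."""
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z, f * y / z], axis=1)

# A point twice as far away projects to an image point half the size:
print(project([[1.0, 2.0, 4.0], [1.0, 2.0, 8.0]], f=2.0))
```

This illustrates the loss of information mentioned earlier: depth z is divided away, so many 3D points map to the same image point.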

Image function = 2D signal


Monochromatic static image f(x, y), where (x, y) are coordinates in a plane with the range R = {(x, y) : 1 ≤ x ≤ xm, 1 ≤ y ≤ yn}; f is the value of the image function (brightness, optical density for a transparent original, distance to the observer, temperature in thermovision, etc.). (Natural) 2D images: a thin sample in an optical microscope, an image of a letter (character) on a piece of paper, a fingerprint, one slice from a tomograph, etc.

Example of a digital image: a single slice from an X-ray tomograph


Digitization
Sampling and quantization of the image function value (also called intensity). A digital image is often represented as a matrix. Pixel = an acronym for picture element.

Image sampling
Consists of two tasks:

1. Arrangement of sampling points into a raster.

[Figure: (a), (b) two arrangements of sampling points.]

2. Distance between samples (Nyquist-Shannon sampling theorem). The sampling frequency must be more than twice the maximal frequency (in the sense: the highest frequency that should still be reconstructible from the sampled signal). Informally, in images the sample size (pixel size) has to be at most half the size of the smallest detail of interest.
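The sampling examples that follow can be reproduced with a naive subsampling sketch (function name is illustrative, assuming a NumPy array; a real resampler would low-pass filter first, exactly to respect the sampling theorem):

```python
import numpy as np

def subsample(img, k):
    """Keep every k-th pixel in both directions. Without prior low-pass
    filtering, details finer than about 2*k pixels will alias."""
    return img[::k, ::k]

img = np.zeros((256, 256), dtype=np.uint8)
print(subsample(img, 2).shape)  # (128, 128)
```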

First image scanner, 1956



Image sampling, example 1


Original 256 × 256

128 × 128

Image sampling, example 2


Original 256 × 256

64 × 64

Image sampling, example 3


Original 256 × 256

32 × 32

Image quantization, example 1



Original 256 gray levels

64 gray levels

Image quantization, example 2



Original 256 gray levels

16 gray levels

Image quantization, example 3



Original 256 gray levels

4 gray levels

Image quantization, example 4 (binary image)



Original 256 gray levels

2 gray levels

The distance, mathematically



A function D is called a distance if and only if:

D(p, q) ≥ 0, and in particular D(p, p) = 0 (identity),
D(p, q) = D(q, p) (symmetry),
D(p, r) ≤ D(p, q) + D(q, r) (triangle inequality).

Several distance definitions in the square grid


Euclidean distance
DE((x, y), (h, k)) = sqrt((x - h)^2 + (y - k)^2) .

Manhattan distance (distance in a city with a rectangular street layout)
D4((x, y), (h, k)) = |x - h| + |y - k| .

Chessboard distance (from the king's point of view in chess)
D8((x, y), (h, k)) = max{|x - h|, |y - k|} .
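The three definitions translate directly into code (function names are illustrative):

```python
import math

def d_e(p, q):
    """Euclidean distance DE."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def d_4(p, q):
    """Manhattan (city-block) distance D4."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_8(p, q):
    """Chessboard distance D8."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_e(p, q), d_4(p, q), d_8(p, q))  # 5.0 7 4
```

Note the ordering D8 ≤ DE ≤ D4 that always holds between the three metrics.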

[Figure: pixels at a constant distance from a point under DE, D4, and D8.]

4-neighborhood and 8-neighborhood


A set consisting of a pixel (called, e.g., the representative pixel or point) and its neighbors at distance 1.

Paradox of crossing line segments



Binary image & the relation "to be contiguous"


black = objects, white = background

A note for the curious: the Japanese kanji symbol shown means "near to here". Introducing the concept of an object allows us to select those pixels on the grid which have some particular meaning (recall the discussion about interpretation). Here, black pixels belong to the object, a character. Neighboring pixels are contiguous; more generally, two pixels are contiguous if and only if there is a path of neighboring pixels connecting them.

Region = contiguous set


The relation "x is contiguous to y" is reflexive (x ~ x), symmetric (x ~ y ⇒ y ~ x) and transitive ((x ~ y) & (y ~ z) ⇒ x ~ z). Thus it is an equivalence relation. Any equivalence relation decomposes a set into subsets called equivalence classes; in our particular case of the relation "to be contiguous", these are the regions. In the image below, different regions are labeled with different colors.
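Decomposing a binary image into its equivalence classes (regions) can be sketched with a standard breadth-first labeling; the function name and the 4-/8-connectivity switch are illustrative, not from the lecture:

```python
from collections import deque

def label_regions(binary, neighbors=4):
    """Label the 4- or 8-connected regions of 1-pixels with 1, 2, ...
    Returns the label image and the number of regions found."""
    h, w = len(binary), len(binary[0])
    offs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if neighbors == 8:
        offs += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not labels[r][c]:
                count += 1              # a new equivalence class starts here
                labels[r][c] = count
                q = deque([(r, c)])
                while q:                # flood the whole contiguous region
                    y, x = q.popleft()
                    for dy, dx in offs:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            q.append((ny, nx))
    return labels, count

# Two diagonal pixels: separate regions under D4, one region under D8.
print(label_regions([[1, 0], [0, 1]], neighbors=4)[1])  # 2
print(label_regions([[1, 0], [0, 1]], neighbors=8)[1])  # 1
```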

Region boundary
The region boundary (also border) of a region R is the set of pixels within the region that have one or more neighbors outside R. Theoretically, the continuous image function implies an infinitesimally thin boundary. In a digital image, however, the boundary always has a finite width; consequently, it is necessary to distinguish inner and outer boundaries.

Related notions: boundary (border) of a region, edge in the image, edge element (edgel).

Convex set, convex hull


Convex set = a set in which any two of its points can be connected by a straight line segment lying entirely inside the set.

[Figure: a convex set and a non-convex set.]

Convex hull, lake, bay.

[Figure: a region, its convex hull, and its lakes and bays.]

Distance transform, DT
Also called: distance function, chamfering algorithm (an analogy to wood carving). The DT provides, at each pixel, the distance from some image subset (perhaps describing objects). The resulting DT image has pixel values of 0 for elements of the relevant subset, low values for nearby pixels, and high values for pixels remote from it. For a binary image, the DT provides the distance from each pixel to the nearest non-zero pixel (object).
[Figure: a small binary input image (left) and its distance-transform result (right).]

Distance transform algorithm informally


The famous two-pass algorithm for calculating the DT, by Rosenfeld and Pfaltz (1966), for the distances D4 and D8. The idea is to traverse the image with a small local mask. The first pass starts from the top-left corner of the image and moves row-wise, left to right. The second pass goes from the bottom-right corner in the opposite, bottom-up manner, right to left.
[Figure: the local masks. In the top-down pass the mask contains the pixels AL above and to the left of the current pixel p; in the bottom-up pass it contains the pixels BR below and to the right of p; both in 4-neighborhood and 8-neighborhood variants.]

The effectiveness of the algorithm comes from propagating the values from the previous image investigation in a wave-like manner.

Distance transform algorithm


1. To calculate the distance transform for a subset S of an image of dimension M × N with respect to a distance metric D, where D is one of D4 or D8, construct an M × N array F with the elements corresponding to the set S set to 0 and all other elements set to infinity.

2. Pass through the image row by row, from top to bottom and left to right. For each pixel p, considering the neighboring pixels above and to the left (the set AL of the masks above), set
F(p) = min{ F(p), D(p, q) + F(q) },  q ∈ AL.

3. Pass through the image row by row, from bottom to top and right to left. For each pixel p, considering the neighboring pixels below and to the right (the set BR), set
F(p) = min{ F(p), D(p, q) + F(q) },  q ∈ BR.

4. The array F now holds a chamfer of the subset S.
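The four steps above can be sketched for the D4 metric, where D(p, q) = 1 for each 4-neighbor (the 8-neighbor variant would also scan the diagonal neighbors; the function name is illustrative):

```python
INF = float("inf")

def distance_transform(binary):
    """Two-pass Rosenfeld-Pfaltz chamfer DT for the D4 metric.
    binary[r][c] == 1 marks the subset S; the result holds D4 distances to S."""
    h, w = len(binary), len(binary[0])
    # Step 1: 0 on the subset S, infinity elsewhere.
    F = [[0 if binary[r][c] else INF for c in range(w)] for r in range(h)]
    # Step 2, forward pass: neighbors above and to the left (mask AL).
    for r in range(h):
        for c in range(w):
            if r > 0:
                F[r][c] = min(F[r][c], F[r - 1][c] + 1)
            if c > 0:
                F[r][c] = min(F[r][c], F[r][c - 1] + 1)
    # Step 3, backward pass: neighbors below and to the right (mask BR).
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                F[r][c] = min(F[r][c], F[r + 1][c] + 1)
            if c < w - 1:
                F[r][c] = min(F[r][c], F[r][c + 1] + 1)
    return F  # Step 4: F is the chamfer of S.

print(distance_transform([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))
```

A single object pixel in the center yields the expected D4 "diamond" of distances around it.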

DT illustration for three distance definitions



Euclidean

D4

D8

Quasi-Euclidean distance

The Euclidean DT cannot easily be computed in only two passes. Therefore the quasi-Euclidean distance approximation, which can be obtained in two passes, is often used:

DQE((i, j), (h, k)) = |i - h| + (sqrt(2) - 1) |j - k|   for |i - h| > |j - k| ,
DQE((i, j), (h, k)) = (sqrt(2) - 1) |i - h| + |j - k|   otherwise.
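The approximation as a small sketch (function name illustrative):

```python
import math

def d_qe(p, q):
    """Quasi-Euclidean distance DQE: an approximation of the Euclidean
    distance that a two-pass chamfer algorithm can propagate."""
    di, dj = abs(p[0] - q[0]), abs(p[1] - q[1])
    c = math.sqrt(2) - 1
    return di + c * dj if di > dj else c * di + dj

print(d_qe((0, 0), (1, 1)))  # exactly sqrt(2) for a diagonal step
print(d_qe((0, 0), (3, 0)))  # exactly 3 along an axis
```

It is exact along the axes and diagonals and only deviates from the Euclidean distance in between.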

Euclidean

quasi-Euclidean

DT, starfish example, input image



Input color image of a starfish

Starfish converted to grayscale image

Segmented to a logical image; 0 = object, 1 = background


color

grayscale

binary

DT, starfish example, results


[Figure: distance-transform results for the starfish image under four metrics: D4 (city-block), D8 (chessboard), DQE (quasi-Euclidean), and DE (Euclidean).]

Brightness histogram
The histogram of brightness values serves as an estimate of the probability density that a pixel has a particular brightness.
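A sketch of the computation for an 8-bit gray-level image (function name illustrative); dividing each count by the total number of pixels turns the histogram into the probability estimate mentioned above:

```python
def brightness_histogram(img, levels=256):
    """Count how many pixels take each gray level 0 .. levels-1."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    return hist

img = [[0, 1], [1, 255]]
h = brightness_histogram(img)
print(h[0], h[1], h[255])  # 1 2 1
```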
[Figure: an input image and its brightness histogram.]
