Václav Hlaváč, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Center for Machine Perception, http://cmp.felk.cvut.cz/hlavac, hlavac@fel.cvut.cz. Outline of the lecture:
Digital image processing, image analysis, computer vision. Interpretation and its significance for images. Image, image function f(x, y). Image digitization: sampling + quantization. Distance in the image, the relation "to be contiguous", region. Convex set, brightness histogram.
Computer vision is the science and technology of machines that see. As a scientific discipline: the theory for building artificial systems that obtain information from images. As a technological discipline: the construction of computer vision systems. Computer vision = Camera + Computer + ? Images (e.g.): views from multiple cameras, a video sequence, multi-dimensional data from a medical scanner.
Human Vision
Computer Science
Controlling processes (e.g., an industrial robot or an autonomous vehicle). Detecting events (e.g., visual surveillance, people counting, detecting the launch of a ballistic missile from a satellite). Organizing information (e.g., indexing databases of images and image sequences). Modeling objects or environments (e.g., industrial inspection, medical image analysis, or topographical modeling). Interaction (e.g., as the input to a device for human-computer interaction).
...
Perception
The process of attaining awareness or understanding of sensory information. The task is far more complex than was imagined in the 1950s and 1960s, when it was believed that building perceiving machines would take about a decade. It is still very far from reality. Aristotle's five senses are: sight, hearing, touch, smell, taste. Perception conjectures a dynamic relationship between: description (in the brain), senses, surroundings, memory.
Human vision
Vision is really hard. The visual cortex occupies about 50% of the macaque brain. More of the human brain is devoted to vision than to anything else.
Vision allows both humans and animals to perceive and understand the world surrounding them. Cognitive science investigates vision in biological systems: It seeks empirical models which adequately describe biological vision. It sometimes describes vision as a computational system. Computer vision aims at engineering solutions, but its research is also interested in biological vision: Biological vision systems cope with tasks not yet solved in computer vision. They provide ideas for engineering solutions. Technical requirements for vision systems often match requirements for biological vision. Caution: Mimicking biological vision does not necessarily provide the best solution for a technical problem.
(Continuous) image = the input (understood intuitively), e.g., on the retina or captured by a TV camera. Let us assume a gray-level image for simplicity. The continuous image function f(x, y), or a matrix of picture elements (pixels) after digitization. (x, y) are the spatial coordinates of a pixel. The value f(x, y) usually corresponds to brightness. f(x, y, t) in the case of an image sequence, where t corresponds to time.
[Figure: a discrete pixel grid with coordinate axes running 1-10.]
Inherent presence of noise, as each real-world measurement is corrupted by noise. A lot of data: an A4 sheet at 300 dpi, 8 bits per pixel = 8.5 MB; non-interlaced 512 × 768 RGB (24 bit) video = 225 Mbit/s. Interpretation needed (to be discussed soon). Local window vs. the need for a global view.
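The quoted sizes can be checked with a quick back-of-the-envelope computation (a sketch; the 25 frames/s video rate is our assumption, and rounding accounts for the small differences from the numbers above):

```python
# Back-of-the-envelope check of the data sizes quoted above.
# A4 paper is 210 x 297 mm, i.e. about 8.27 x 11.69 inches.
width_px = round(8.27 * 300)         # ~2481 pixels at 300 dpi
height_px = round(11.69 * 300)       # ~3507 pixels
a4_bytes = width_px * height_px * 1  # 8 bits = 1 byte per pixel
print(f"A4 at 300 dpi, 8 bpp: {a4_bytes / 2**20:.1f} MiB")   # ~8.3 MiB

frame_bits = 512 * 768 * 24          # one 24-bit RGB frame
video_bps = frame_bits * 25          # 25 frames/s assumed (PAL rate)
print(f"512x768 RGB video: {video_bps / 1e6:.0f} Mbit/s")    # ~236 Mbit/s
```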
Interpretation: Observation, Model, Syntax, Semantics. Examples: Looking out of the window: {rains, does not rain}. An apple on the conveyor belt: {class 1, class 2, class 3}. Traffic scene: seeking the number plate of a car. Theoretical background: mathematical logic, theory of formal languages. Deep philosophical problem: Gödel's incompleteness theorems, informally: any sufficiently expressive consistent logical system contains propositions that can be neither proved nor disproved within the system.
From low-level to high-level processing, from the a priori knowledge point of view.
Low level of knowledge (or none) = digital image processing. Images are not interpreted. Methods are independent of a specific application area. Signal processing methods are used, e.g., the 2D Fourier transform.
Middle level of knowledge = image analysis. Often 2D images only, e.g., cell images in an optical microscope. Interpretation exploits important additional knowledge that allows solving tasks otherwise unsolvable. High level of knowledge = computer vision, e.g., understanding the content of a 3D scene from images and videos. The most general task formulations: 3D world, changing scenes. Complicated; interpretation and feedback are exploited, artificial intelligence methods are used. Goals are overambitious. The involved tasks are underconstrained and too ambitious; tasks have to be radically simplified.
A priori knowledge about our world enables humans to understand ambiguous images. Of course, a priori assumptions can mislead the human too. Counterexample: the Ames chair. We can see chairs.
1960s: Interpretation of synthetic worlds, e.g., the blocks world for robots. 1966: M. Minsky assigns computer vision as an undergraduate summer project. 1970s: Some progress on interpreting selected images. 1980s: Artificial neural nets come and go; shift toward geometry and increased mathematical rigor; inspiration from biological vision (D. Marr et al.). 1990s: Face recognition; statistical analysis in vogue; geometry of vision. 2000s: Broader recognition; large annotated datasets available; video processing starts.
[Figure: the hierarchy of representations, from a 2D image and its digital image to edgels, regions, scale, orientation, texture, and finally objects.]
An image is understood intuitively as the visual response on the retina or on the light-sensitive chip in a camera, TV camera, etc. The image function f(x, y), or f(x, y, t), is the result of perspective projection.
[Figure: perspective projection; a point P(x, y, z) in the 3D scene is projected through the focal point (focal length f) onto the image plane at (x', y').]

x' = x f / z ,   y' = y f / z .
Monochromatic static image f(x, y), where (x, y) are coordinates in a plane with the range R = {(x, y), 1 ≤ x ≤ x_m, 1 ≤ y ≤ y_n}; f is the value of the image function (brightness, optical density for a transparent original, distance to the observer, temperature in thermovision, etc.). (Natural) 2D images: a thin sample in an optical microscope, the image of a letter (character) on a piece of paper, a fingerprint, one slice from a tomograph, etc.
Digitization
Sampling & quantization of the image function value (also called intensity). A digital image is often represented as a matrix. Pixel = abbreviation of "picture element".
Image sampling
2. Distance between samples (Nyquist-Shannon sampling theorem). The sampling frequency must be more than twice the maximal frequency present in the signal (in the sense: which it should be possible to reconstruct from the sampled signal). Informally, in images the sample size (pixel size) has to be at most half the size of the smallest detail of interest.
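The theorem is easiest to see in 1D. The toy sketch below (our own illustration, not from the lecture) samples a 5 Hz sine above and below the Nyquist rate and reads off the dominant frequency from the spectrum:

```python
import numpy as np

def dominant_freq(signal_hz, fs, n=1024):
    """Frequency with maximal spectral magnitude for a sine sampled at fs."""
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * signal_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(n, 1 / fs)[np.argmax(spectrum)]

# Sampled at 11 Hz (> 2 * 5 Hz): the 5 Hz sine is preserved.
print(dominant_freq(5, 11))  # ~5 Hz
# Sampled at 8 Hz (< 2 * 5 Hz): it aliases to |8 - 5| = 3 Hz.
print(dominant_freq(5, 8))   # ~3 Hz
```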
[Figure: the same image sampled at 128 × 128, 64 × 64, and 32 × 32, and quantized to 64, 16, 4, and 2 gray levels.]
Manhattan distance (distance in a city with a rectangular street layout):
D4((x, y), (h, k)) = |x − h| + |y − k| .
Chessboard distance (from the point of view of the king in chess):
D8((x, y), (h, k)) = max{|x − h|, |y − k|} .
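The two metrics, with the Euclidean distance D_E for comparison, can be written out directly (a minimal sketch):

```python
def d4(p, q):
    """Manhattan / city-block distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance (king's moves)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def de(p, q):
    """Euclidean distance."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

p, q = (0, 0), (3, 4)
print(d4(p, q), d8(p, q), de(p, q))  # 7 4 5.0
```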
[Figure: pixels at distances 0-4 from a reference pixel under D_E, D4, and D8.]
A set consisting of a pixel (called, e.g., the representative pixel or point) and its neighbors at distance 1 (the 4-neighborhood for D4, the 8-neighborhood for D8).
A note for the curious: the Japanese kanji symbol means "near to here". Introducing the concept of an object allows selecting those pixels on the grid which have some particular meaning (recall the discussion about interpretation). Here, black pixels belong to the object, a character. Neighboring pixels are contiguous. More generally, two pixels are contiguous if and only if there is a path of mutually neighboring pixels between them.
The relation "x is contiguous to y" is reflexive (x ≡ x), symmetric (x ≡ y ⟹ y ≡ x) and transitive ((x ≡ y) & (y ≡ z) ⟹ x ≡ z). Thus it is an equivalence relation. Any equivalence relation decomposes a set into subsets called equivalence classes. These are regions in our particular case of the relation "to be contiguous". In the image below, different regions are labeled by different colors.
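Decomposing an image into these equivalence classes is exactly connected-component labeling. A minimal 4-connectivity sketch (our own illustration, not an algorithm from the lecture):

```python
from collections import deque

def label_regions(img):
    """img: 2-D list of 0/1. Returns a same-shaped list of region labels,
    where contiguous (4-connected) object pixels share one label."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not labels[sy][sx]:
                next_label += 1                  # new equivalence class
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:                     # flood the whole region
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

img = [[1, 1, 0],
       [0, 0, 0],
       [0, 1, 1]]
print(label_regions(img))  # [[1, 1, 0], [0, 0, 0], [0, 2, 2]]
```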
Region boundary
The region boundary (also border) of a region R is the set of pixels within the region that have one or more neighbors outside R. Theoretically, for the continuous image function the boundary is infinitesimally thin. In a digital image, the boundary always has a finite width. Consequently, it is necessary to distinguish the inner and the outer boundary.
Convex set = any two of its points can be connected by a straight line segment which lies entirely inside the set.
[Figure: a convex and a non-convex set; a region, its convex hull, and the region's lakes and bays.]
Distance transform, DT
Also called: distance function, chamfering algorithm (an analogy to woodcarving). The DT provides, at each pixel, the distance from some image subset (perhaps describing objects). In the resulting DT image, pixels of the relevant subset have value 0, nearby pixels have low values, and pixels remote from the subset have high values. For a binary image, the DT provides the distance from each pixel to the nearest non-zero (object) pixel.
[Figure: a binary input image and its distance-transform result.]
The famous two-pass algorithm for calculating the DT by Rosenfeld and Pfaltz (1966), for distances D4 and D8. The idea is to traverse the image with a small local mask. The first pass starts from the top-left corner of the image and moves row-wise, left to right. The second pass goes from the bottom-right corner in the opposite, bottom-up manner, right to left.
[Figure: masks AL (pixels above and to the left of p) for the top-down pass and BR (pixels below and to the right of p) for the bottom-up pass, in 4-neighborhood and 8-neighborhood variants.]
The effectiveness of the algorithm comes from propagating values from the previously visited part of the image in a wave-like manner.
1. To calculate the distance transform for a subset S of an image of dimension M × N with respect to a distance metric D, where D is one of D4 or D8, construct an M × N array F with elements corresponding to the set S set to 0, and all other elements set to infinity.
2. Pass through the image row by row, from top to bottom and left to right. For each pixel p, considering the neighbors above and to the left (the mask AL), set
F(p) = min( F(p), min_{q ∈ AL} [ D(p, q) + F(q) ] ).
3. Pass through the image row by row, from bottom to top and right to left. For each pixel p, considering the neighbors below and to the right (the mask BR), set
F(p) = min( F(p), min_{q ∈ BR} [ D(p, q) + F(q) ] ).
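The steps above can be sketched as follows for D4 (a minimal illustration; adding the diagonal neighbors to each mask gives D8):

```python
INF = float("inf")

def distance_transform_d4(binary):
    """Two-pass Rosenfeld-Pfaltz DT for D4.
    binary: 2-D list of 0/1; returns D4 distance to the nearest 1-pixel."""
    h, w = len(binary), len(binary[0])
    # Step 1: 0 on the subset S, infinity elsewhere.
    F = [[0 if binary[y][x] else INF for x in range(w)] for y in range(h)]
    # Step 2: top-down, left-to-right; mask AL = {above, left}, D(p, q) = 1.
    for y in range(h):
        for x in range(w):
            if y > 0:
                F[y][x] = min(F[y][x], F[y-1][x] + 1)
            if x > 0:
                F[y][x] = min(F[y][x], F[y][x-1] + 1)
    # Step 3: bottom-up, right-to-left; mask BR = {below, right}.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                F[y][x] = min(F[y][x], F[y+1][x] + 1)
            if x < w - 1:
                F[y][x] = min(F[y][x], F[y][x+1] + 1)
    return F

img = [[0, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
print(distance_transform_d4(img))
# [[2, 1, 2, 3], [1, 0, 1, 2], [2, 1, 2, 3]]
```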
[Figure: distance transforms of the same input with distances D_E (Euclidean), D4, and D8.]
Quasi-Euclidean distance
The Euclidean DT cannot be easily computed in only two passes. The quasi-Euclidean distance approximation, which can be obtained in two passes, is therefore often used:

D_QE((i, j), (h, k)) = |i − h| + (√2 − 1) |j − k|   for |i − h| > |j − k| ,
D_QE((i, j), (h, k)) = (√2 − 1) |i − h| + |j − k|   otherwise.
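Written out directly (a sketch; `math.hypot` gives the true Euclidean distance for comparison):

```python
from math import sqrt, hypot

def d_qe(p, q):
    """Quasi-Euclidean distance between pixels p and q."""
    di, dj = abs(p[0] - q[0]), abs(p[1] - q[1])
    if di > dj:
        return di + (sqrt(2) - 1) * dj
    return (sqrt(2) - 1) * di + dj

# Exact on the diagonal: d_qe = 3*sqrt(2), same as Euclidean.
print(round(d_qe((0, 0), (3, 3)), 3), round(hypot(3, 3), 3))  # 4.243 4.243
# Slight overestimate elsewhere.
print(round(d_qe((0, 0), (5, 1)), 3), round(hypot(5, 1), 3))  # 5.414 5.099
```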
[Figure: an input image (color, grayscale, binary) and its distance transforms with distances D_E (Euclidean), D_QE (quasi-Euclidean), D4, and D8.]
Brightness histogram
The histogram of brightness values serves as an estimate of the probability density that a pixel has a given brightness.
[Figure: an input image and its brightness histogram.]
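The histogram as a probability estimate can be sketched as follows (a toy example; the random stand-in image is our assumption, used in place of a real photograph):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))   # stand-in 8-bit image

# One bin per brightness value 0..255.
hist, _ = np.histogram(img, bins=256, range=(0, 256))
p = hist / img.size                          # relative frequencies

# p[v] estimates the probability that a randomly chosen pixel
# has brightness v; the estimate sums to 1 by construction.
print(p.sum())  # 1.0
```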