Lecture 1 - Introduction

CSE3018 Content Based Image and
Video Retrieval

Dr.S.Geetha
Professor
School of Computing Science and Engineering
VIT University, Chennai Campus, Chennai
AB1-205 – geetha.s@vit.ac.in
SYLLABUS
• Fundamentals of content-based image and video
retrieval
• Image content descriptors – key frame features: color
• Image content descriptors – key frame features: texture,
shape
• Motion features
• Similarity measures and indexing schemes; feature
extraction techniques
• Computer Vision System Toolbox
Recommended Books
• MOODLE
– Lecture Slides
– Reading materials
– Web links
– Tutorials
– Research works, Recent Trends etc.
• Digital Image Processing (3rd Edition)– Gonzalez
• Research and White Papers
Grading Policy
• CAT – 1 – 15%
• CAT – 2 – 15%
• FAT – 40%
• Digital Assignment, Quiz, Programming
Exercise – 30%
• Project
– Team of maximum 3 members
COURSE OBJECTIVES
• To understand the fundamentals of content-based
image and video retrieval.
• To understand the role of key frame features in
content-based image retrieval.
• To understand the motion features of videos.
• To give exposure to the importance of similarity
measures in content-based image retrieval.
• To design algorithms for content-based image
retrieval using machine learning.
Lecture 1
Introduction to Image Retrieval
Applications
Lecture 1: Outline
• What is image retrieval, and why?
• How to compare and retrieve images?
– Digital image representation
– Common components of the CBIR systems
– Main problems and research directions
• What are the applications?
What is image retrieval?
• Description-Based Image Retrieval (DBIR)
• Content-Based Image Retrieval (CBIR)
Query types:
– Textual query: text
– Query by example: image or sketch
DBIR vs. CBIR
Advantages (+):
• DBIR: full-text search algorithms are applicable; search results
correspond to image semantics
• CBIR: automatic index construction; the index is objective
Disadvantages (–):
• DBIR: manual annotation is hardly feasible; manual annotations
are subjective
• CBIR: semantic gap; querying by example is not convenient for a user
Why image retrieval?
• Huge amounts of images are everywhere: how
to manage this data?
• “A picture is worth a thousand words”
• Not everything can be described in text
Why is Image Retrieval Hard?
• What is the topic of this image?
• What are the right keywords to index this image?
• What words would you use to retrieve this image?
The Semantic Gap
How similar are these two images?
Complexities for Image Retrieval
• Images were traditionally managed by first annotating their
contents and then using text-retrieval techniques to index
them.
• However, with the growth of information in digital image
format, some drawbacks of this technique were revealed:
– Manual annotation requires a vast amount of labor
– Different people may perceive the contents of an image differently; thus no objective
keywords for search are defined
• A new research field was born in the 1990s: Content-Based
Image Retrieval (CBIR) aims at indexing and retrieving images
based on their visual contents
Image Database Examples
• IBM: Query by Image Content (QBIC)
– Retrieves images based on visual content, including such
properties as color percentage, color layout, and texture.
• Virage, Inc.
– Virage search engine can retrieve images based on color
composition, texture, and structure.
• Google Image search.
• The National Library of Medicine provides a database of X-rays,
CT scans, MRI images, and color cross-sections, taken at very
small intervals along the bodies of a male and a female cadaver.
• NASA collects huge databases of images from its satellites
and makes them freely available to the public.
DB2 Image Extender
• DB2 Image Extender defines the distinct data type
DB2IMAGE with associated user-defined functions
for storing and manipulating image files
– (http://www-306.ibm.com/software/data/db2/extenders/).
• The DB2 Image Extender provides similarity search
functionality based on the QBIC technology
– (http://wwwqbic.almaden.ibm.com/ )
Query By Example (QBE)
• The image DB user should be able to:
– show the system a sample image, or
– paint one interactively on the screen, or
– just sketch the outline of an object.
• The system should then be able to return
similar images or images containing similar
objects.
IBM-QBIC
• The Hermitage Web site was voted the best in
Russia. It uses the QBIC engine for searching
archives of world-famous art.
– http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
– Color percentage
– Color layout
A sample query
• SELECT CONTENTS(image),
    QbScoreFromStr('averageColor=<255,0,0>', image) AS SCORE
  FROM signs ORDER BY SCORE
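To make the idea behind an average-color query concrete, here is a minimal Python sketch. It is not QBIC's or DB2's implementation: the names `average_color` and `color_score`, the toy `signs` data, and the use of Euclidean distance as the score are all assumptions made for illustration. A lower score means the average color is closer to the target, which matches the ascending ORDER BY in the query.

```python
# Illustrative only: the real QBIC scoring function is not given in the slides.

def average_color(pixels):
    """Mean (R, G, B) of a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_score(pixels, target=(255, 0, 0)):
    """Euclidean distance (an assumption) from the average color to target."""
    avg = average_color(pixels)
    return sum((a - t) ** 2 for a, t in zip(avg, target)) ** 0.5


# Hypothetical stand-in for the "signs" table: image name -> pixel list.
signs = {
    "stop": [(250, 5, 5), (255, 0, 0)],
    "yield": [(255, 255, 0), (250, 250, 10)],
}

# Rank images so that the most "red on average" image comes first.
ranked = sorted(signs, key=lambda name: color_score(signs[name]))
```

Here `ranked` plays the role of the ordered result set: the mostly red image sorts before the mostly yellow one.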
Photobook System
Figure 1. Texture retrieval in the Photobook system
(http://web.media.mit.edu/~tpminka/photobook/).
Search Using Sketch
• Sketch entry
• Results of search
ImageScape System
Figure 2.5. The interface of the ImageScape visual query
system (http://skynet.liacs.nl/imagescape/).
Relevance Feedback in CBIR
• Motivation:
– Human perception of image similarity is subjective,
semantic, and task-dependent.
– CBIR results based purely on the similarity of visual
features are not necessarily perceptually and
semantically meaningful.
• Each type of visual feature tends to capture only one aspect
of an image’s properties, and it is usually hard for a user to specify
clearly how different aspects are combined …
– Relevance Feedback is introduced to address these
problems.
• It is possible to establish the link between high-level
concepts and low-level features.
Relevance Feedback RF (cont.)
• RF is a supervised active learning technique
used to improve the effectiveness of
information systems.
– Main idea: use positive and negative examples
from the user to improve system performance.
Figure: relevance feedback loop. Query image → initial query results →
collect the user’s feedback → real-time learning from user relevance
feedback → refined query results after user feedback.
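One classical way to use positive and negative examples is a Rocchio-style update of the query feature vector. The slides do not name a specific method, so the sketch below is only an illustration: the function names and the weights `alpha`, `beta`, `gamma` are assumptions, not part of the lecture.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refine_query(query, positives, negatives,
                 alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style refinement (assumed technique, hypothetical weights).

    Moves the query towards the mean of the images the user marked as
    relevant and away from the mean of those marked as irrelevant.
    """
    dim = len(query)
    pos = mean_vector(positives) if positives else [0.0] * dim
    neg = mean_vector(negatives) if negatives else [0.0] * dim
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]
```

The refined vector is then used to run the search again, which closes the loop of collecting feedback and refining the query results.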
Content-Based Image Retrieval
• Content-Based Image Retrieval (CBIR)
– Image databases can be huge, containing hundreds of
thousands or millions of images.
– In most cases they are only indexed by keywords that have
to be decided upon and entered into the database system
by a human categorizer.
– However, images can also be retrieved according to their
content, where content might refer to color distributions,
texture, region shapes, or object classification.
Why content-based image retrieval?
• Automatic generation of textual annotations for a wide
spectrum of images is not feasible.
• Annotating images manually is a cumbersome and expensive
task for large image databases.
• Manual annotations are often subjective, context-sensitive,
and incomplete.
• Google, Yandex, and others use text-based search. Results are
not perfect, although they are much better now than a couple of
years ago.
Content-based Image Retrieval (CBIR)
Searching a large database for images that match a query:
– What kinds of databases?
– What kinds of queries?
– What constitutes a match?
– How do we make such searches efficient?
Applications
– Art Collections
e.g. Fine Arts Museum of San Francisco
– Medical Image Databases
CT, MRI, Ultrasound, The Visible Human
– Scientific Databases
e.g. Earth Sciences
– General Image Collections for Licensing
Corbis, Getty Images
– The World Wide Web
What is a Query?
– an image you already have
How did you get it?
– a rough sketch you draw
How might you draw it?
– a symbolic description of what you want
What’s an example?
Some Systems You Can Try
Corbis Stock Photography and Pictures
http://pro.corbis.com/
• Corbis sells high-quality images for use in advertising,
marketing, illustrating, etc.
• Search is entirely by keywords.
• Human indexers look at each new image and enter keywords.
• A thesaurus constructed from user queries is used.
QBIC
IBM’s QBIC (Query by Image Content)
http://wwwqbic.almaden.ibm.com
• The first commercial system.
• Uses or has used color percentages, color layout,
texture, shape, location, and keywords.
Blobworld
UC Berkeley’s Blobworld
http://elib.cs.berkeley.edu/blobworld
• Images are segmented on color plus texture
• User selects a region of the query image
• System returns images with similar regions
• Works really well for tigers and zebras

Image Features / Distance Measures
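Figure: retrieval pipeline. Feature extraction maps the database images
and the user’s query image into a feature space; a distance measure
compares the query’s features with those of the database images, and the
closest images are returned to the user as retrieved images.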
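The pipeline of feature extraction, feature space and distance measure can be sketched in a few lines of Python. This is a minimal illustration, assuming the features have already been extracted; the function names, the Euclidean distance and the toy database are assumptions, not part of any system named in the slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query_vec, database, distance=euclidean, top_k=3):
    """Return the names of the top_k database images closest to the query.

    `database` maps an image name to its precomputed feature vector.
    """
    scored = sorted((distance(query_vec, vec), name)
                    for name, vec in database.items())
    return [name for _, name in scored[:top_k]]


# Hypothetical feature database with 2-D feature vectors.
features = {"a.png": [0.0, 0.0], "b.png": [1.0, 1.0], "c.png": [5.0, 5.0]}
retrieved = rank_by_distance([0.9, 0.9], features, top_k=2)
```

Swapping `distance` for another measure changes what counts as a match without touching the rest of the pipeline.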
Levels of image retrieval
• Level 1: Based on color, texture, shape features
– Images are compared based on low-level features,
no semantics involved
– Much research has been done; this is a feasible task
• Level 2: Bring semantic meanings into the search
– E.g. identifying human beings, horses, trees, beaches
– Requires retrieval techniques of level 1
– Very active and challenging research area
• Level 3: Retrieval with abstract and subjective attributes
– Find pictures of a particular birthday celebration
– Find a picture of a happy beautiful woman
– Requires retrieval techniques of level 2 and very complex logic
– Far from achievable with the technology available now
Image retrieval by Google
Image retrieval by Yandex
Digital image representation
Vector image
draw circle
center 0.5, 0.5
radius 0.4
fill-color yellow
stroke-color black
stroke-width 0.05
draw circle
center 0.35, 0.4
radius 0.05
fill-color black
draw circle
center 0.65, 0.4
radius 0.05
fill-color black
draw line
start 0.3, 0.6
end 0.7, 0.6
stroke-color black
stroke-width 0.1
Bitmap (raster) image
• A bitmap image is an array of pixels
• The value of each array element corresponds
to the color of the corresponding pixel
• Pixel values satisfy $0 \le f(x, y) \le L$, and typically $L = 255$
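As a small illustration of a bitmap as an array of pixels, the following Python sketch stores a grayscale image as a list of rows. The helper names `pixel` and `is_valid`, and the choice of x as column and y as row, are assumptions made for this example.

```python
L_MAX = 255  # assumed maximum intensity L (8-bit samples)

# A tiny 4x3 grayscale bitmap: each inner list is one row of pixels.
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]


def pixel(img, x, y):
    """Return f(x, y), where x is the column index and y the row index."""
    return img[y][x]


def is_valid(img, l_max=L_MAX):
    """Check that every sample satisfies 0 <= f(x, y) <= L."""
    return all(0 <= v <= l_max for row in img for v in row)


height, width = len(image), len(image[0])  # raster dimensions
```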
Important parameters of raster image:
• Raster dimensions
• Resolution (ppi)
• Sample depth (usually $2^k$ levels)
Fixed resolution, varying dimension
Fixed dimensions, varying resolution

The same image with varying sample depths:
16 levels 8 levels 4 levels 2 levels
Typical sample depths: 8 bit (256 levels); 16 bit (PNG, TIFF)
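Reducing the sample depth can be sketched as requantizing each pixel to a smaller number of evenly spaced gray levels. The function below is a minimal illustration under that assumption; the name `reduce_depth` and the uniform binning are choices made for this example, not a method given in the slides.

```python
def reduce_depth(img, levels, l_max=255):
    """Requantize samples in [0, l_max] to `levels` gray levels (levels >= 2)."""
    step = (l_max + 1) / levels  # width of one quantization bin
    out = []
    for row in img:
        new_row = []
        for v in row:
            bin_index = min(int(v / step), levels - 1)
            # Spread the bin indices back over [0, l_max] for display.
            new_row.append(round(bin_index * l_max / (levels - 1)))
        out.append(new_row)
    return out
```

With `levels=2` every pixel becomes either 0 or `l_max`, which corresponds to the 2-level image in the comparison.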
Bitmap (raster) image: color
• RGB – the most common color model (CRT monitors, LCD
screens/projectors)
• Each pixel represented by 3 values: red, green, blue
RGB bands:
a color image is built up of red, green, and blue bands
Bitmap (raster) image: color
• Pixel-interleaved (chunky) format: a common choice
• Color-interleaved (planar) format
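The difference between the two layouts is only the order in which the samples are stored. A minimal Python sketch, assuming a flat list of samples and three channels (the function names are made up for this example):

```python
def chunky_to_planar(data, channels=3):
    """[r0, g0, b0, r1, g1, b1, ...] -> [[r0, r1, ...], [g0, ...], [b0, ...]]."""
    return [data[c::channels] for c in range(channels)]


def planar_to_chunky(bands):
    """[[r0, r1, ...], [g0, ...], [b0, ...]] -> [r0, g0, b0, r1, g1, b1, ...]."""
    return [value for px in zip(*bands) for value in px]


# Two pixels: pure red followed by pure green, stored pixel-interleaved.
chunky = [255, 0, 0, 0, 255, 0]
planar = chunky_to_planar(chunky)
```

Converting back with `planar_to_chunky(planar)` gives the original sequence, so no information is lost in either layout.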
Common components of a CBIR system
• Indexing: image → feature extraction → database
• Retrieval: query → feature extraction → comparison with the
database → result
• Relevance feedback: query refinement
Problems and directions
• Low-level feature extraction
– How to represent an image in a compact and descriptive
way?
– How to compare features, and, thus, images?
• High-dimensional indexing
– How to index huge amounts of high-dimensional data?
• Visual interface for image browsing
– How to visualize the results?
How to: Image features
Levels of image content, from highest to lowest:
• Semantics: textual/metadata features
• Shape, texture, color, lightness: low-level features / visual features
(signatures, descriptors)
Image features
Textual: annotations and metadata
− tags/keywords;
− creation date;
− geo tags;
− name of the file;
− photography conditions (exposure, aperture, flash…).
Visual (low-level): features extracted from pixel values
− color descriptors;
− texture descriptors;
− shape descriptors;
− spatial layout descriptors.
Low-level features
Global: describes the whole image
− average intensity;
− average amount of red;
− …
All pixels of the image are processed.
Local: describes one part of the image
− average intensity for the upper-left part;
− average amount of red in the center of the image;
− …
Segmentation of the image is performed; pixels of a particular
segment are processed to extract features.
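The distinction between a global and a local feature can be illustrated with average intensity. The sketch below is a minimal example on a grayscale image stored as a list of rows; the function name `mean_intensity` and the rectangular region parameters are assumptions for illustration.

```python
def mean_intensity(img, x0=0, y0=0, x1=None, y1=None):
    """Average intensity over columns [x0, x1) and rows [y0, y1).

    With the default arguments the whole image is processed (global
    feature); with a sub-rectangle only that region is used (local feature).
    """
    x1 = len(img[0]) if x1 is None else x1
    y1 = len(img) if y1 is None else y1
    values = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


image = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [200, 200, 50, 50],
    [200, 200, 50, 50],
]
global_feature = mean_intensity(image)             # all pixels
local_feature = mean_intensity(image, 0, 0, 2, 2)  # upper-left part only
```

The global value hides the fact that the upper-left part is completely dark, which is exactly the kind of detail a local feature keeps.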
How to: Feature spaces
• Feature vector – a vector of features, representing one image.
• Feature space – the set of all possible feature vectors with
a defined similarity measure.
Figure: images A and B are each described by several feature vectors,
$(x_{A1}, \dots, x_{AN})$, $(y_{A1}, \dots, y_{AM})$, $(z_{A1}, \dots, z_{AK})$ for A
and likewise for B. A similarity measure compares either each pair of
corresponding vectors or the concatenated vectors as a whole.
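As a concrete example of a feature vector and a similarity measure, the sketch below builds a normalized gray-level histogram for each image and compares two histograms with the L1 (city-block) distance. Both choices are assumptions made for illustration; the slides do not prescribe a particular descriptor or measure here.

```python
def gray_histogram(img, bins=4, l_max=255):
    """Feature vector: fraction of pixels falling into each intensity bin."""
    counts = [0] * bins
    step = (l_max + 1) / bins
    total = 0
    for row in img:
        for v in row:
            counts[min(int(v / step), bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def l1_distance(a, b):
    """Similarity measure: sum of absolute differences (0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


dark = gray_histogram([[0, 0], [0, 0]], bins=2)
mixed = gray_histogram([[0, 255], [0, 255]], bins=2)
d = l1_distance(dark, mixed)
```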
How to: Combine results
Figure: for images A and B, a similarity measure is applied to each pair
of corresponding feature vectors, giving the distances $d_1$ (x features),
$d_2$ (y features), $d_3$ (z features), … The overall distance is the
weighted sum
$D = \sum_i c_i d_i$
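The combination step is a weighted sum of the per-feature distances, $D = \sum_i c_i d_i$. A minimal Python sketch follows; the function name and the example weights are hypothetical.

```python
def combined_distance(distances, weights):
    """Overall distance D = sum_i c_i * d_i.

    `distances` holds d_1, d_2, ... (one per feature type) and `weights`
    holds the matching coefficients c_1, c_2, ...
    """
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per distance")
    return sum(c * d for c, d in zip(weights, distances))


# Hypothetical example: color, texture and shape distances with weights.
D = combined_distance([0.2, 0.5, 1.0], [0.5, 0.3, 0.2])
```

Changing the weights shifts the emphasis between feature types without recomputing the individual distances.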
How to: Image segmentation
• Fixed regions
– The same region boundaries for all images.
• Segmentation
– Boundaries depend on image content.
• Key point (point of interest) detection
– Points of particular interest in the image; features are extracted from areas
around key points.
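Of the three options, fixed regions are the simplest to sketch: every image is cut by the same grid, and one local feature is computed per cell. The example below uses the mean intensity per cell; the function name `grid_features` and the 2×2 default grid are assumptions for illustration, and the image is assumed to have at least as many rows and columns as the grid.

```python
def grid_features(img, rows=2, cols=2):
    """Mean intensity of each cell of a fixed rows x cols grid, row by row."""
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            # Cell boundaries are the same for every image of this size.
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell))
    return feats


image = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [200, 200, 50, 50],
    [200, 200, 50, 50],
]
features = grid_features(image)
```

Content-dependent segmentation and key point detection replace the fixed cell boundaries with regions chosen per image, but the per-region feature extraction step stays the same.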
Problems: semantic gap
Levels of image content:
• Objects (regions): semantics
• Texture (local regions): low-level features
• Color, brightness (one pixel): low-level features
The semantic gap separates the low-level features from the semantics.
How to understand what’s on the images?

Problems: what’s on the images?
• Sometimes it is not easy to understand the image even for humans!

• What do we want from machines?
Problems: what’s on the images?
• How do we know that all these objects are lamps?

Problems: subjectivity of
perception
Let’s compare our perception!
• Test a CBIR system
Problems: high dimensional data
• More information in feature vectors means better search results.
• Local features are usually more precise than global ones, but they
produce more feature vectors.
• The dimensionality of the feature vectors is normally of the
order of $10^2$.
• ~200–500 keypoints per image
• Non-Euclidean similarity measures
How to: high dimensional
indexing
• Perform dimension reduction
– The dimension of the feature vectors is normally
very high, but the embedded dimension is much lower.
• Use appropriate multi-dimensional indexing
techniques that are capable of supporting
non-Euclidean similarity measures
– Trees (k-d tree, VP-tree and others)
– Hashing
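A vantage-point tree (VP-tree) only needs a distance function that obeys the triangle inequality, which is why it suits non-Euclidean measures. The sketch below is a minimal, unoptimized illustration: the dict-based node layout, the choice of the first point as vantage point, and the L1 distance are all assumptions made for this example.

```python
def l1(a, b):
    """City-block (L1) distance; any true metric would work here."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_vp_tree(points, dist):
    """Recursively build a vantage-point tree as nested dicts."""
    if not points:
        return None
    vp, rest = points[0], points[1:]  # first point serves as vantage point
    if not rest:
        return {"vp": vp, "mu": 0, "inside": None, "outside": None}
    dists = [dist(vp, p) for p in rest]
    mu = sorted(dists)[len(dists) // 2]  # median distance splits the rest
    inside = [p for p, d in zip(rest, dists) if d < mu]
    outside = [p for p, d in zip(rest, dists) if d >= mu]
    return {
        "vp": vp,
        "mu": mu,
        "inside": build_vp_tree(inside, dist),
        "outside": build_vp_tree(outside, dist),
    }


def nearest(node, query, dist, best=None):
    """Return (distance, point) for the stored point nearest to query."""
    if node is None:
        return best
    d = dist(query, node["vp"])
    if best is None or d < best[0]:
        best = (d, node["vp"])
    if d < node["mu"]:
        near, far = node["inside"], node["outside"]
    else:
        near, far = node["outside"], node["inside"]
    best = nearest(near, query, dist, best)
    # Triangle inequality: the far side can hold a closer point only if the
    # current best radius reaches across the splitting sphere.
    if abs(d - node["mu"]) <= best[0]:
        best = nearest(far, query, dist, best)
    return best


points = [(0, 0), (1, 1), (5, 5), (9, 9), (2, 3), (7, 1)]
tree = build_vp_tree(points, l1)
```

Calling `nearest(tree, (2, 2), l1)` returns the closest stored point together with its distance. The pruning test lets the search skip whole subtrees, which is the point of using an index instead of a linear scan.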
Problems: visualization
• Image content is very rich and its interpretation is
very contextual and subjective.
• Many independent similarity measures are
commonly used. Why not let the user influence
the choice of these parameters?
• Which images to show as a result (result diversity)?
• Interactive search and relevance feedback.
How to: visual interfaces
• 1-D visualizations
– As a list (standard way)
• 2-D visualizations
– Based on dimension reduction techniques
• “3-D” visualizations
– Fish eye
Neighbouring research areas
• Image processing
– Feature extraction
– Pattern recognition and machine learning
• Faces, handwriting, thumbprints, …
• Classification tools
– Image enhancement
– Image classification
• The same features are used
• Classification helps retrieval
• Information retrieval
– Scalability
– Performance measurement
– Fusion of multiple sources of evidence
What are applications? – Image Archives.
• Manage image archives
– Personal photo collections (many thousands of photos in mine)
– Professional photograph archives (millions of photos)
– Art collections (millions of photos)
• Browse images
• Organize image collection: delete duplicates, classify images, select “the best”
from the group of similar images
• Poster creation, auto cropping, album creation (www.snapfishlab.hpl.hp.com)
• Better organization of search-by-text results
What are applications? – Image Archives.
• Manage image archives
• …
• Search for particular image (by its smaller version, by its fragment)
• Search for similar images (landscape paintings, sea views, paintings by the
same author)
• Search for a painting with particular colors (“I want a sea view painting for my
bedroom with an orange carpet and yellow walls”)
• Search for group photos of my family
• Search for an image that will be a good illustration for my article/presentation
• … a lot of other use cases
What are applications? – Copyrights.
• Trademark and copyright application
– World Wide Web
– Enterprise network
• Copyright detection without watermarking, to protect intellectual
property
• Forged image detection and sub-image retrieval
• Trademark image registration: a new candidate is compared with
existing marks to ensure no risk of confusing property ownership
• Check whether confidential images are included in public presentations
What are applications? – Medical.
• Medical diagnosis
– Collection of X-ray images
• Search for similar past cases

• Is it similar to the “healthy” case?
• Classification of X-ray images
What are applications? – Security.
• Security issues
– Video surveillance material
– Faces, fingerprints, retina images
• Detect suspicious objects during
video surveillance
• Detect “wanted” faces during video
surveillance
• Grant or deny access based on
fingerprint/retina scanning
What are applications? – In industry.
• Quality assurance
– (a) CD-ROM controller: check that all parts of the product are in place
– (b) Pack of pills: check that all places in the pill pack are filled
– (c) Level of liquid: check the level of liquid in bottles
– (d) Air bladders in plastic: check the quality of plastic parts
– (e) Corn flakes: and even inspect the corn flakes!
What are applications? – Others.
• Military-related issues
– Auto aiming, tracking systems
• Image-based modeling and 3-D reconstruction

– Medical imaging
– Indoor scene reconstruction from multiple images
– Outdoor scene reconstruction from aerial photography
• Geographical information and remote sensing

– Process satellite data: climate variability, sea surface
temperatures, storm watch.
Lecture 1: Summary
• CBIR is a relevant problem and an active research area
• Main research directions are:
– Feature extraction
– Multidimensional indexing
– Visualization
• CBIR combines research results of the image processing,
information retrieval, and database communities
• CBIR has many applications in various areas
Lecture 1: Bibliography
• Gonzalez R, Woods R. Digital Image Processing, published by
Pearson Education, Inc, 2002.
• Rui Y., Huang T.S., Chang S.-F. Image Retrieval: Past, Present
and Future. In Proc. of Int. Symposium on Multimedia
Information Processing, Dec. 1997.
• Links:
– http://videolectures.net/russir08_vassilieva_cbir/
– http://videolectures.net/kdd08_malik_fis/
– http://imagedatabase.cs.washington.edu/demo/cbir.html
– courses.cs.vt.edu/~cs4624/cache/qbic.htm
– http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG363
– http://www.ux.uis.no/~tranden/brodatz.html
– http://www.cs.washington.edu/research/imagedatabase/groundtruth/
– http://peipa.essex.ac.uk/benchmark/databases/
Thank You!
