Professional Documents
Culture Documents
Image Retrieval
A big part of the data that we work with is presented as an image, drawing, or photo.
In this chapter, we will implement a similarity-based image retrieval without the
use of any metadata or concept-based image indexing. We will work with distance
metric and dynamic warping to retrieve the most similar images.
The distance metric measures how far are two points A and B from each other in
a geometric space. We commonly use the Euclidian distance which draws a direct
line between the pair of points. In the following figure, we can see different kinds of
paths between the points A and B such as the Euclidian distance (with the arrow)
but also we see the Manhattan (or taxicab) distance (with the dotted lines), which
simulate the way a New York taxi moves through the buildings.
[ 94 ]
Chapter 5
A Euclidean Distance
Manhattan Distance
DTW is used to define similarity between time series for classification, in this
example, we will implement the same metric with sequences of pixels. We can say
that if the distance between the sequence A and B is small, these images are similar.
We will use Manhattan distance between the two series to sum of the squared
distances. However, we can use other distances metrics such as Minkowski or
Euclidean, depending on the problem at hand.
In the following figure we can observe warping between two time series:
[ 95 ]
Similarity-based Image Retrieval
The example of this chapter will use mlpy, which is a Python module for Machine
Learning, built on top of numPy and sciPy. The mlpy library implements a version of
DTW that can be found at http://mlpy.sourceforge.net/docs/3.4/dtw.html.
In the paper Direct Image Matching by Dynamic Warping, Hansheng Lei and
Venu Govindaraju implement a DTW for image matching, finding an optimal
pixel-to-pixel alignment, and prove that DTW is very successful in the task.
In the following figure we can observe a cost matrix with the minimum distance
warp path traced through it to indicate the optimal alignment:
[ 96 ]
Chapter 5
In order to implement the DTW, first we need to extract a time series (pixel sequences)
from each image. The time series will have a length of 768 values adding the 256 values
of each color in the RGB (Red, Green, and Blue) color model of each image. The
following code implements the Image.open("Image.jpg") function and cast into an
array, then simply add the three vectors of color in the list:
from PIL import Image
img = Image.open("Image.jpg")
arr = array(img)
list = []
for n in arr: list.append(n[0][0]) #R
for n in arr: list.append(n[0][1]) #G
for n in arr: list.append(n[0][2]) #B
Pillow is a PIL fork by Alex Clark, compatible with Python 2.x and 3.x.
PIL is the Python Imaging Library by Fredrik Lundh. In this chapter,
we will use Pillow due to its compatibility with Python 3.2 and can be
downloaded from https://github.com/python-imaging/Pillow.
Implementing DTW
In this example, we will look for similarity in 684 images from 8 categories. We will
use four imports of PIL, numpy, mlpy, and collections:
from PIL import Image
from numpy import array
import mlpy
from collections import OrderedDict
[ 97 ]
Similarity-based Image Retrieval
First, we need to obtain the time series representation of the images and store
it in a dictionary (data) with the number of the image and its time series as
data[fn] = list.
data = {}
for fn in range(1,685):
img = Image.open("ImgFolder\\{0}.jpg".format(fn))
arr = array(img)
list = []
for n in arr: list.append(n[0][0])
for n in arr: list.append(n[0][1])
for n in arr: list.append(n[0][2])
data[fn] = list
Then, we need to select an image for the reference, which will be compared with all
the other images in the data dictionary:
reference = data[31]
Now, we need to apply the mlpy.dtw_std function to all the elements and store the
distance in the result dictionary:
result ={}
for x, y in data.items():
#print("{0} --------------- {1}".format(x,y))
dist = mlpy.dtw_std(reference, y, dist_only=True)
result[x] = dist
Finally, we need to sort the result in order to find the closest elements with the
OrderedDict function and print the ordered result:
[ 98 ]
Chapter 5
In the following screenshot, we can see the result and we can observe that the result
is accurate with the first element (reference time series). The first result presents a
distance of 0.0 because it's exactly similar to the image we used as a reference.
[ 99 ]
Similarity-based Image Retrieval
data = {}
for fn in range(1,685):
img = Image.open("ImgFolder\\{0}.jpg".format(fn))
arr = array(img)
list = []
for n in arr: list.append(n[0][0])
for n in arr: list.append(n[0][1])
for n in arr: list.append(n[0][2])
data[fn] = list
reference = data[31]
result ={}
for x, y in data.items():
#print("{0} --------------- {1}".format(x,y))
dist = mlpy.dtw_std(reference, y, dist_only=True)
result[x] = dist
[ 100 ]
Chapter 5
In the following figure, we can see the first three searches and can observe a good
accuracy in the result, even in case of the bus the result displays the result elements
in different angles, rotation, and colors:
[ 101 ]
Similarity-based Image Retrieval
In the following figure, we see the fourth, fifth, and sixth search, and we can observe
that the algorithm performs well with an image that has a good contrast in colors.
In case of the seventh search the result is poor, and in similar cases when the
references time series is a landscape or a building, the result are images that are not
related to the search criteria. This is because the RGB color model of the time series is
very similar to other categories. In the following figure, we can see that the reference
image and the first result share a big saturation of the color blue. Due to this, their
time series (sequences of pixels) are very similar. We may overcome this problem
by using a filter such as Find Edges on the images before the search. In Chapter 14,
Online Data Analysis with IPython and Wakari, we present the use of filters, operations,
and transformations for image processing using PIL.
[ 102 ]
Chapter 5
In the following table we can see the result of the complete set of tests:
Summary
In this chapter, we introduced the dynamic time warping (DTW) algorithm, which is
an excellent tool to find similarity between time series without any previous training.
We presented an implementation of DTW to find similarity between a set of images,
which worked very well in most cases. This method can be used for several other
problems in a variety of areas such as robotics, computer vision, speech recognition,
and time series analysis. We also saw how to turn an image into a time series with
the PIL library. Finally we learned how to implement DTW with the mlpy library. In
the next chapter, we will present how simulation can help us in the data analysis and
how to model pseudo-random events.
[ 103 ]