
SHANGHAI UNIVERSITY UNDERGRADUATE PROJECT (THESIS)

06122604
2010.03.29 to 2010.07.09

Contents

ABSTRACT
Chapter 1  Introduction
    1.1  Background
    1.2  Research status of content-based image retrieval
        1.2.1  Visual features and their extraction
        1.2.2  Similarity measurement
        1.2.3  Key open problems
    1.3  Typical CBIR systems
Chapter 2  Background and related work
    2.1  Background
    2.2  Related work on shape-based retrieval
Chapter 3  Shape feature extraction
    3.1  Moment invariants
    3.2  Fourier descriptors
    3.3  Similarity measurement and feature combination
Chapter 4  System design and implementation
    4.1  Development environment
    4.2  System workflow
    4.3  Key data structures and algorithms
        4.3.1  Fourier-descriptor node
        4.3.2  Hu-moment node
        4.3.3  Fourier descriptor extraction
        4.3.4  Hu moment extraction
    4.4  Experiments
        4.4.1  Retrieval results
        4.4.2  Edge detection
Chapter 5  Evaluation of CBIR systems
    5.1  Overview
    5.2  Difficulties in CBIR evaluation
    5.3  Common CBIR evaluation measures
        5.3.1  Precision and recall
        5.3.2  Threshold-based measures
        5.3.3  Rank-based measures
        5.3.4  Retrieval efficacy
        5.3.5  Normalized rank measures
    5.4  Evaluation of this system
Conclusions and outlook
References
Appendix  System source code
Appendix  Foreign-language paper


ABSTRACT

Content-based image retrieval (CBIR), also called content-based visual information retrieval, is a research field within image analysis. Its goal is to search an image database for the images that best match a query expressed through the visual content of an example image or through user-specified search criteria.

In recent years, with the rapid development of multimedia and network technology, traditional keyword-based information retrieval can no longer meet users' requirements. Content-based image retrieval therefore came into being and has attracted wide attention; a growing number of CBIR systems have been developed, such as QBIC (Query By Image Content), VisualSEEk and WebSEEK.

CBIR technology is beginning to be applied to practical image retrieval, but for different application areas many technical issues remain unresolved. Network technology is now mature, and carrying out science education through the network is an effective way to improve the public's scientific literacy. Shanghai holds rich collections of historical relics; building a digital museum system on these resources helps popularize museum knowledge, share and protect valuable museum resources, and raise public interest in and understanding of the history behind the relics, which is of great importance.

Current digital museums can look up information from user-entered keywords, but this interaction is often unnatural: a user may have only a general visual impression of a relic without knowing its dynasty or its meaning, so a keyword cannot be formulated. Helping people turn a rough impression of a relic into accurate museum information, and thereby understand the relevant history, is therefore meaningful for both education and the promotion of cultural heritage.

Using Visual Studio 2008 under Windows, we developed a practical content-based retrieval system for historical relics. The user supplies a photograph of a relic or a hand-drawn sketch; the system automatically extracts shape features, retrieves the relics with similar shapes, and returns the related information.

Keywords: Shape, Image Retrieval, Fourier Descriptors, Hu Moments, Historical Relic

Chapter 1  Introduction

1.1  Background

Image retrieval research dates back to the 1970s. Early systems followed the text-based image retrieval paradigm: images were annotated manually with keywords and then retrieved by text matching. As image collections grew, the weaknesses of this approach became apparent: manual annotation is expensive, subjective and inconsistent, and keywords cannot fully describe image content.

Content-based image retrieval (CBIR, also called content-based visual information retrieval) emerged in the early 1990s to address these problems: images are indexed by visual features such as color, texture and shape, extracted automatically from the image data itself. General web search engines such as Google, Yahoo and MSN still retrieve images mainly through the surrounding text, whereas a CBIR system supports query by image example: the user supplies an example image or a sketch, and the system returns the visually most similar images.

A typical CBIR session proceeds in three steps:
1) visual features are extracted offline from every image in the database;
2) the same features are extracted from the user's query image;
3) the database images are ranked by feature similarity and the best matches are returned.

The remainder of this thesis is organized as follows. Chapter 2 introduces the background and related work on shape-based retrieval; Chapter 3 describes the shape features used in this work; Chapter 4 presents the design and implementation of the relic retrieval system; Chapter 5 discusses the evaluation of CBIR systems and of this system.

1.2  Research status of content-based image retrieval

1.2.1  Visual features and their extraction

The core of a CBIR system is the extraction of visual features. The features in common use fall into color, texture and shape features.

1.2.1.1  Color features

Commonly used color features include:
(1) Color histogram (Color Histogram). The histogram records the proportion of image pixels that fall into each color bin and is the most widely used color feature. It can be computed in different color spaces, such as RGB, CIE, HSI and HSV, and is insensitive to rotation, translation and changes of scale [5]. Its weakness is that it carries no spatial information, so images with very different content can produce similar histograms.

(2) Color correlogram (Color Correlogram). The correlogram records how often pairs of colors co-occur at a given spatial distance, so it captures the spatial correlation of colors that the histogram discards, at a higher computational cost.

(3) Color moments (Color Moment). The color distribution of each channel is summarized by its low-order moments (mean, variance and skewness). The representation is extremely compact, but its discriminating power is correspondingly limited.

(4) Color coherence vector (Color Coherence Vectors, CCV). Each histogram bin is split into coherent pixels, which belong to large uniformly colored regions, and incoherent pixels, which do not; this adds a degree of spatial coherence information to the histogram.
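To make the histogram idea concrete, the following is a minimal sketch in C# (our illustration, not part of the thesis system; it quantizes each RGB channel to 4 levels, giving a 64-bin histogram normalized by image size):

using System.Drawing;

static double[] RgbHistogram(Bitmap img)
{
    // 4 levels per channel -> 4 * 4 * 4 = 64 bins.
    double[] hist = new double[64];
    for (int y = 0; y < img.Height; y++)
        for (int x = 0; x < img.Width; x++)
        {
            Color c = img.GetPixel(x, y);
            int bin = (c.R / 64) * 16 + (c.G / 64) * 4 + (c.B / 64);
            hist[bin]++;
        }
    // Normalize so the histogram does not depend on image size.
    int n = img.Width * img.Height;
    for (int i = 0; i < 64; i++)
        hist[i] /= n;
    return hist;
}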
1.2.1.2  Texture features

Texture is an innate property of object surfaces and an important visual cue. Texture description methods fall into four broad categories:

(1) Statistical methods. These characterize texture by statistics of the gray-level distribution. Haralick's gray-level co-occurrence matrix [9] is the classical example, from which measures such as energy, entropy and contrast are computed. Tamura et al. proposed a set of texture features designed to correspond to human visual perception [10]: coarseness (Coarseness), contrast (Contrast), directionality (Directionality), line-likeness (Linelikeness), regularity (Regularity) and roughness (Roughness). The first three Tamura features are the most effective and have been adopted in retrieval systems such as QBIC and MARS.

(2) Structural methods. These treat a texture as texture primitives arranged by placement rules; examples include Carlucci's texture grammar [9] and the tree-grammar method of Lu and Fu [9]. Structural methods work well only for regular, man-made textures.

(3) Model-based methods. These assume the texture is a realization of a parametric random-field model and use the estimated model parameters as features; a representative example is the multi-resolution simultaneous autoregressive model (Multi-Resolution Simultaneous Autoregressive, MRSA) [13]. MRSA features perform well but are costly to estimate.

(4) Signal-processing methods. These compute features from filtered or transformed versions of the image, for example Gabor filters [14], the pyramid wavelet transform (Pyramid Wavelet Transform, PWT) [11] and the tree wavelet transform (Tree Wavelet Transform, TWT) [11]. In the comparison of Manjunath and Ma [14], Gabor features performed best, followed by TWT and PWT, and then MRSA.
1.2.1.3  Shape features

Shape is among the most direct cues by which people recognize objects. Shape features are usually divided into two classes, contour-based and region-based, according to whether they use only the object boundary or the whole shape region, and a good shape feature should be invariant to translation, rotation and scaling.

(1) Contour-based features describe the object boundary. Representative descriptors include chain codes, shape signatures and Fourier descriptors [16]; descriptors built from the Delaunay triangulation of boundary feature points have also been proposed [20], as well as curvature-based boundary descriptors [17]. Contour-based features are compact, but they require an accurately extracted, closed boundary.

(2) Region-based features use all pixels inside the shape. The best-known examples are the moment invariants [18]; grid-based and area-based region descriptors have also been used [19]. Region features are more robust to boundary noise, at a somewhat higher computational cost.
1.2.1.4  Comparison of the features

Each class of features has its own strengths. Color features are the simplest and most robust, but say nothing about spatial layout; texture features such as the Tamura features and Gabor features describe surface patterns well; shape features correspond most closely to how people describe objects, but depend on good segmentation [17][29]. In practice, retrieval systems therefore combine several complementary features and weight them according to the application. In this work, where relics are distinguished mainly by their outlines, shape features are the natural choice.

1.2.2  Similarity measurement

Given feature vectors, retrieval reduces to measuring the similarity, or distance, between the query's features and those of each database image, for example with Euclidean, histogram-intersection or quadratic-form distances, and ranking the database by that distance.
1.2.3  Key open problems

The fundamental difficulty of CBIR is the semantic gap (semantic gap): low-level visual features extracted automatically from pixels do not correspond directly to the high-level concepts that users have in mind. Approaches proposed to narrow the gap include relevance feedback (relevance feedback), which lets the user mark intermediate results as relevant or not and refines the query accordingly; image segmentation (image segmentation), which isolates the objects of interest before features are computed; and machine-learning methods such as the Support Vector Machine, which learn the mapping from features to semantic categories.
1.3  Typical CBIR systems

The best-known CBIR system is QBIC, developed by IBM; it was the first commercial content-based image retrieval system, and its framework and techniques have deeply influenced later systems. Representative research systems include MARS from UIUC, Photobook from MIT, the UC Berkeley Digital Library Project, and VisualSEEk from Columbia University. Beyond general-purpose retrieval, CBIR techniques are also applied in specific domains such as medical imaging (for example, retrieval of CT images).

Against this background, the task of this project is to build a practical content-based retrieval system for historical relics. Its intended functions are:

(1) accepting a photograph of a relic as the query;

(2) accepting a rough sketch of a relic drawn by the user as the query;

(3) automatically extracting shape features from the query and from the database images;

(4) retrieving the database relics whose shapes are most similar to the query;

(5) returning the information associated with the retrieved relics.

The main difficulties are:

(1) relic images may differ from the query in position, orientation and size, so the shape features must be invariant to translation, rotation and scaling;

(2) a hand-drawn sketch captures only the rough outline of a relic, so the features must emphasize the overall shape rather than fine detail, and the matching must tolerate the inaccuracy of drawing;

(3) the features must be compact and fast to compare, so that retrieval over the whole database remains interactive.


Chapter 2  Background and related work

2.1  Background

With the rapid development of multimedia and network technology, traditional keyword-based retrieval no longer meets users' needs, so content-based image retrieval has emerged and attracted wide attention; a series of systems have been built, such as QBIC (Query By Image Content), VisualSEEk and WebSEEK. At the same time, digital museums currently find relic information from user-entered keywords, which is often unnatural: a visitor may remember only the rough appearance of a relic, not its dynasty or its name. A retrieval system that accepts a picture or a hand-drawn sketch of a relic therefore fits the digital-museum scenario much better, and building such a system is the goal of this project.

2.2  Related work on shape-based retrieval

Jain and Vailaya studied image retrieval using color and shape features [10], reporting effective retrieval on a test database of some 400 images. In 1972, Zahn and Roskies proposed the normalized Fourier descriptors (Normalized Fourier Descriptors) for plane closed curves [11]: a closed contour is expanded into a Fourier series, and after normalization the coefficients are invariant to translation, rotation, scaling and the choice of starting point, so they can be compared directly. Rui et al. later proposed a modified Fourier descriptor that is more practical for retrieval [12].

Shape representations can also be divided into local (Local) and global (Global) ones: local features describe individual boundary segments or salient points and tolerate partial occlusion, while global features such as moments and Fourier descriptors summarize the whole shape in a compact vector that is cheap to store and match. This thesis adopts two global descriptors, Hu moment invariants and Fourier descriptors; they are described in the next chapter.


Chapter 3  Shape feature extraction

3.1  Moment invariants

Moment invariants (Moment Invariants) describe a shape region R by algebraic moments of its image function. The (p+q)-th order geometric moment of R and the corresponding central moment, taken about the region centroid (xC, yC), remove the dependence on position; normalizing the central moments by a power of the zeroth-order moment removes the dependence on scale. From the second- and third-order normalized central moments, Hu [13] constructed seven combinations, the Hu moment invariants, that remain unchanged under translation, rotation and scaling of the shape.

Direct computation of the moments is expensive, and later work improved both the computation and the theory: Yang and Albregtsen gave a fast method for computing the moments via Green's theorem [14]; Kapur et al. derived invariants using elimination methods [15]; and Gross and Latecki, and Cooper and Lei, studied invariant recognition for digitized and complex objects [15][16].
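In standard notation, the moments referred to above are, for an image function f(x, y):

\[
m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x,y), \qquad
\mu_{pq} = \sum_{x}\sum_{y} (x - x_C)^{p} (y - y_C)^{q} f(x,y),
\]
\[
x_C = \frac{m_{10}}{m_{00}}, \qquad y_C = \frac{m_{01}}{m_{00}}, \qquad
\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1 + (p+q)/2}},
\]

and the first two of Hu's seven invariants are

\[
\phi_1 = \eta_{20} + \eta_{02}, \qquad
\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 .
\]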

3.2  Fourier descriptors

Fourier shape descriptors (Fourier shape descriptors) describe a closed contour by the Fourier transform of a one-dimensional signature derived from its boundary points (xs, ys), s = 0, 1, ..., N-1, where N is the number of boundary points. Commonly used signatures include the curvature function (curvature function) K(s), the cumulative angular function, the centroid distance (centroid distance), i.e. the distance from each boundary point to the shape centroid (xC, yC), and the complex coordinates function (complex coordinates function) z(s) = (xs - xC) + j(ys - yC).

Applying the discrete Fourier transform to the signature yields coefficients Fi, where Fi denotes the i-th coefficient. Translation of the shape affects only the DC component F0; rotation and a change of starting point affect only the phases; and scaling multiplies all coefficients by a constant. Dividing the magnitudes |Fi| by the DC magnitude therefore yields a descriptor that is invariant to translation, rotation, scaling and starting point. For the centroid-distance signature, which is real-valued, the spectrum is symmetric, |F-i| = |Fi|, so only half of the coefficients are needed. Because the low-frequency coefficients capture the overall shape while the high-frequency ones encode only fine detail, a small number of coefficients suffices: the boundary is resampled to a fixed number of points (2^6 = 64), and the first few normalized magnitudes are used as the feature vector (this system keeps 8 of them, see Chapter 4).
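In the centroid-distance formulation, the signature and its descriptor can be written as:

\[
r(s) = \sqrt{(x_s - x_C)^2 + (y_s - y_C)^2}, \qquad
F_i = \frac{1}{N} \sum_{s=0}^{N-1} r(s)\, e^{-j 2 \pi i s / N},
\]

with the invariant feature vector taken as

\[
f = \left( \frac{|F_1|}{|F_0|},\; \frac{|F_2|}{|F_0|},\; \ldots,\; \frac{|F_m|}{|F_0|} \right).
\]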

3.3  Similarity measurement and feature combination

The similarity between two images is measured by the Euclidean distance between their feature vectors. When Fourier descriptors and Hu moments are combined, the two distances are computed separately and merged as a weighted sum; weights such as 0.2 and 0.8 can be used to emphasize one feature over the other, and the weights actually adopted in the implementation are given in Chapter 4.
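The combined distance has the form

\[
D(q, x) = w_{FD}\, d_{FD}(q, x) + w_{Hu}\, d_{Hu}(q, x), \qquad w_{FD} + w_{Hu} = 1,
\]

where each d is the Euclidean distance between the corresponding feature vectors; the implementation in Chapter 4 uses w_FD = 0.9 and w_Hu = 0.1.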


Chapter 4  System design and implementation

4.1  Development environment

The system is developed in Visual C# 2008 on Windows, calling OpenCV through the Emgu CV .NET wrapper. OpenCV is an open-source computer vision library written in C and C++ and originally developed by Intel; it provides efficient implementations of common image processing and analysis operations.
4.2  System workflow

The system works in two phases, offline feature extraction and online retrieval. The overall flow is:

1. Build the relic image database.

2. Compute the shape features (Fourier descriptors and Hu moments) of every database image offline and store them in feature files.

3. The user submits a query image: either a photograph of a relic or a sketch drawn by hand (for example in a paint program).

4. The system extracts the same shape features from the query image.

5. The distance between the query features and each database entry is computed, using Fourier descriptors, Hu moments, or their weighted combination.

6. The database images are sorted by ascending distance.

7. The best matches are displayed together with their related information; clicking an image shows its edge map for visual comparison.
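In terms of the functions defined in Section 4.3 and the appendix, the online phase reduces to roughly the following sketch (GUI code omitted; imgNode is the result array declared in the appendix's button4_Click):

FDNode query = new FDNode();
query.filename = ImgSearch;          // path of the user's photo or sketch (step 3)
GetFD(ref query);                    // step 4: extract the query's features
int size = 0;
CalDatabaseDistFD(FDFeature, ref query, ref imgNode, 1.0, ref size);   // step 5
// Steps 6-7: sort imgNode[0..size) by ascending dist and display the first ten.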

4.3  Key data structures and algorithms

4.3.1  Fourier-descriptor node

Each image's Fourier-descriptor feature is held in an FDNode object:
public class FDNode
{
    public string filename;   // path of the image file
    public int size = 8;      // number of Fourier-descriptor components kept
    public double[] bin;      // normalized descriptor magnitudes
    public FDNode()
    {
        bin = new double[size];
    }
};

An FDNode has three members: filename stores the image file name, size is fixed to 8 (the number of Fourier coefficients kept), and the double[] bin array holds the normalized coefficient magnitudes.

4.3.2  Hu-moment node

The Hu-moment feature of an image is stored in the analogous HuNode class:
public class HuNode
{
    public string filename;   // path of the image file
    public int size = 7;      // Hu's seven moment invariants
    public double[] bin;      // log-transformed invariant values
    public HuNode()
    {
        bin = new double[size];
    }
};

Here size is 7 because Hu defined seven moment invariants; bin stores the seven (log-transformed) values for one image.

4.3.3  Fourier descriptor extraction

GetFD loads the image as a grayscale image with cvLoadImage, merges a real plane (image_Re, the pixel values) and an all-zero imaginary plane (image_Im) into a two-channel complex image, applies the discrete Fourier transform with cvDFT, and then divides the magnitudes of the first size coefficients by the DC component, so that the features do not depend on the overall intensity level:
public void GetFD(ref FDNode node)
{
    // Load the image as grayscale.
    IntPtr img = CvInvoke.cvLoadImage(node.filename,
        LOAD_IMAGE_TYPE.CV_LOAD_IMAGE_GRAYSCALE);
    // Real / imaginary planes and the complex input/output images for the DFT.
    IntPtr image_Re = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 1);
    IntPtr image_Im = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 1);
    IntPtr Fourier = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 2);
    IntPtr dst = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 2);
    CvInvoke.cvConvertScale(img, image_Re, 1, 0);   // real part = pixel values
    CvInvoke.cvZero(image_Im);                      // imaginary part = 0
    CvInvoke.cvMerge(image_Re, image_Im, IntPtr.Zero, IntPtr.Zero, Fourier);
    CvInvoke.cvDFT(Fourier, dst, CV_DXT.CV_DXT_FORWARD, 0);
    // The (0,0) coefficient is the DC component, used for normalization.
    double DFTfactor = CvInvoke.cvGet2D(dst, 0, 0).v0;
    for (int i = 0; i < node.size; i++)
    {
        double x = CvInvoke.cvGet2D(dst, i + 1, 0).v0;  // real part
        double y = CvInvoke.cvGet2D(dst, i + 1, 0).v1;  // imaginary part
        node.bin[i] = Math.Sqrt(x * x + y * y) / DFTfactor;  // normalized magnitude
    }
    CvInvoke.cvReleaseImage(ref img);
    CvInvoke.cvReleaseImage(ref image_Re);
    CvInvoke.cvReleaseImage(ref image_Im);
    CvInvoke.cvReleaseImage(ref dst);
    CvInvoke.cvReleaseImage(ref Fourier);
}
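A minimal usage sketch (the file name here is a hypothetical example):

FDNode node = new FDNode();
node.filename = "relic001.jpg";   // hypothetical image path
GetFD(ref node);
// node.bin now holds the 8 low-frequency magnitudes, each divided by the DC term.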

4.3.4  Hu moment extraction

Unlike the FD case, the Hu features come directly from OpenCV: GetHu computes the image moments and Hu's seven invariants, and then replaces each value by the negative logarithm of its absolute value, since the raw invariants differ by several orders of magnitude:
public void GetHu(ref HuNode node)
{
    IntPtr img = CvInvoke.cvLoadImage(node.filename,
        LOAD_IMAGE_TYPE.CV_LOAD_IMAGE_GRAYSCALE);
    // Image moments, then Hu's seven invariants derived from them.
    MCvMoments moments = new MCvMoments();
    CvInvoke.cvMoments(img, ref moments, 0);
    MCvHuMoments hu = new MCvHuMoments();
    CvInvoke.cvGetHuMoments(ref moments, ref hu);
    node.bin[0] = hu.hu1;
    node.bin[1] = hu.hu2;
    node.bin[2] = hu.hu3;
    node.bin[3] = hu.hu4;
    node.bin[4] = hu.hu5;
    node.bin[5] = hu.hu6;
    node.bin[6] = hu.hu7;
    // Log transform: the raw invariants span several orders of magnitude.
    for (int i = 0; i < node.size; i++)
        node.bin[i] = -Math.Log(Math.Abs(node.bin[i]));
    CvInvoke.cvReleaseImage(ref img);
}
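Written as a formula, the final loop applies

\[ \hat{\phi}_k = -\ln\left|\phi_k\right|, \qquad k = 1, \ldots, 7, \]

to Hu's invariants, which brings the seven components onto a comparable scale before Euclidean matching.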

4.4  Experiments

4.4.1  Retrieval results

The interface lets the user choose among three matching methods in a drop-down list: Hu Moments, Fourier Descriptors, and their weighted combination (Hu & Fourier). For each query the ten most similar database images are displayed next to the query image, so the three methods can be compared directly on the same queries.
4.4.2  Edge detection

To make the shape comparison visible, the system can display the edge map of any image, computed with OpenCV's Canny edge detector: clicking the query image or any retrieved image opens a window showing its Canny edges, so the user can compare outlines directly.
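The edge maps are produced by the canny helper listed in the appendix; its core is a single OpenCV call (img is the grayscale input loaded with cvLoadImage):

IntPtr edges = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
    IPL_DEPTH.IPL_DEPTH_8U, 1);
CvInvoke.cvCanny(img, edges, 50, 150, 3);   // hysteresis thresholds 50/150, 3x3 Sobel aperture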


Chapter 5  Evaluation of CBIR systems

5.1  Overview

How to evaluate CBIR systems is itself a research question. A literature survey for this chapter searched Web of Science (SCI from 1997, SSCI from 2000, A&HCI from 2001, CPCI from 1997) with the query TI=(image OR (image retrieval) OR (content based image retrieval) OR CBIR) AND TS=((performance evaluat*) OR (performance assess*)), together with databases such as LISA, ACM, EBSCOhost, Emerald and SpringerLink. The retrieved literature shows that work devoted specifically to CBIR evaluation began around 1996 and has grown steadily since about 2002.

Unlike text retrieval, CBIR evaluation is not yet standardized: different systems are evaluated on different image collections, with different query sets, different relevance judgments and different performance measures, so published results are rarely comparable. The most commonly used test collections are subsets of the Corel photo collection (for example a widely used subset of some 22,000 images) and the Brodatz and VisTex texture databases. Corel is the most popular, but because every group uses its own subset and its own ground truth, results reported on "Corel" by different papers still cannot be compared directly.

5.2  Difficulties in CBIR evaluation

The first difficulty is ground truth. Judging which database images are relevant to a query is subjective; on collections such as Corel, relevance is usually approximated by the predefined category labels, which only loosely matches human judgments of visual similarity. Guojun Lu and others have also pointed out that evaluation results depend heavily on the choice of query images and of the database itself. Finally, the measures must reflect ranking: if N database images are relevant to a query and the user inspects the top M results, a good measure must account both for how many of the N relevant images appear among the M inspected and for where they are ranked.

5.3  Common CBIR evaluation measures

Several families of evaluation measures have been proposed for CBIR; the most important ones are summarized below.

5.3.1  Precision and recall

The most common measures are precision and recall, carried over from text retrieval and applied to CBIR by Kam A. H. and others. Let R denote the set of images relevant to the query, A the set of retrieved images, and Ra their intersection. Precision is the fraction of retrieved images that are relevant, P = |Ra| / |A|, and recall is the fraction of relevant images that are retrieved, R = |Ra| / |R|; the complementary quantity (|A| - |Ra|) / |A| = 1 - P is the fraction of retrieved images that are irrelevant. Tan Kian-Lee et al. further proposed normalized measures, Pnormal and Rnormal, which also account for ranking: if the L relevant images appear at ranks ri (i = 1, ..., L) in a result list over a collection of N images, relevant images ranked near the top contribute more. A single precision or recall value says little by itself; a precision-recall curve, or precision at a fixed cutoff, gives a much better picture of a CBIR system's behavior.
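The normalized recall mentioned above is usually defined as follows (a standard form; the exact variant used by Tan Kian-Lee et al. may differ in detail):

\[
R_{\mathrm{norm}} = 1 - \frac{\sum_{i=1}^{L} r_i - \sum_{i=1}^{L} i}{L\,(N - L)},
\]

where r_i is the rank of the i-th relevant image, L the number of relevant images and N the size of the collection; R_norm equals 1 for a perfect ranking and decreases as relevant images are pushed down the list.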

5.3.2  Threshold-based measures

Sameer Antani et al. proposed evaluating a CBIR system with a distance threshold: for a query, an image is considered retrieved if its dissimilarity Dm to the query falls below a threshold Td. Counting the relevant and irrelevant images that fall below Td for each query, and sweeping Td over a range of values, yields performance curves for the system. The method makes the retrieval criterion explicit, but the results depend strongly on the choice of the threshold.

5.3.3  Rank-based measures

Rank-based measures look directly at the positions of the relevant images in the ranked result list. For a query q with target images Tk (k = 1, 2, ..., M) in a database of N images, the rank of each target in the returned list is recorded. Summary statistics include the precision Pj after the j-th returned image, and the recognition rate RR, the percentage of queries whose target appears among the top results (RR = 100% means every target is found). Rank-based measures suit target-search tasks, where the user is looking for one specific image.
5.3.4  Retrieval efficacy

Vishal Chitkara proposed summarizing CBIR performance with a single efficacy score E that combines the relevance of the retrieved images si with their positions in the result list, rewarding systems that place the most similar images near the top. A single score makes systems easy to rank against each other, but it hides the trade-off between precision and recall.
5.3.5  Normalized rank measures

Henning Müller et al. reviewed the measures used in CBIR evaluation and advocated rank-normalized ones. For a database of N images of which NR are relevant to the query, let Ri (i = 1, ..., NR) be the rank at which the i-th relevant image is retrieved (K1, K2, ..., KNR in their notation). The normalized average rank maps these ranks into [0, 1], where 0 corresponds to perfect retrieval and about 0.5 to random behavior. Müller et al. also discuss precision-recall (PR) graphs and task-oriented measures such as Success of Target Search (STS) and the average match percentile (AMP), among others.
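The normalized average rank alluded to above is commonly written as (a standard formulation matching the stated properties):

\[
\widetilde{\mathrm{Rank}} = \frac{1}{N \cdot N_R} \left( \sum_{i=1}^{N_R} R_i - \frac{N_R (N_R + 1)}{2} \right),
\]

where N is the database size, N_R the number of images relevant to the query, and R_i the rank at which the i-th relevant image is retrieved.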

5.4  Evaluation of this system

The relic retrieval system of Chapter 4 is evaluated in the query-by-example (query by example) style: a query image (query image) of a relic is submitted and the ten most similar database images are returned; a query counts as successful if the correct relic appears among these ten. Over a query set QS of N queries, the retrieval rate (retrieval rate) is defined as

Rate = N~ / N,

where N~ is the number of successful queries. This measure directly reflects what a digital-museum user experiences: whether the relic being sought appears on the first screen of results.

Conclusions and outlook

This thesis applied content-based image retrieval to historical relics: a retrieval system was built that matches a user's photograph or sketch against a relic image database using Hu moment invariants and Fourier descriptors. CBIR as a whole, however, is still far from mature, and several directions deserve further work:

(1) Relevance feedback. Low-level features cannot fully express the user's intention, so many CBIR systems add relevance feedback (Relevance Feedback): the user marks returned images as relevant or irrelevant, and the system refines the query or the feature weights over several iterations, progressively approaching the user's target.

(2) MPEG-7. The MPEG-7 standard of the MPEG group, formally the "Multimedia Content Description Interface" [21,22], standardizes descriptors for multimedia content, including visual descriptors (Visual Descriptor) for color, texture and shape. Building retrieval systems on MPEG-7 descriptors makes features and systems interoperable.

(3) Region-based retrieval. Region-based (Region-based) methods describe images at the level of segmented regions or objects (Object Level) [34], which is closer to human perception than global features and is a promising direction.

(4) Fusion of multiple features. No single feature suffices for all queries; combining color, texture and shape evidence, with weights adapted to the query, generally improves accuracy.

(5) Efficiency. As image databases grow, high-dimensional indexing and fast matching become essential for interactive retrieval.
References

[1] Chinese-language reference, 2003.12.
[2] Chinese-language reference, 2005.04.
[3] Chinese-language reference, 2001.10.
[4] Chinese-language reference, 2004.04.
[5] Chinese-language reference, 2004.07.
[6] Chinese-language reference, 2005.12.
[7] Chinese-language reference, 2004.07.
[8] T. Sikora. The MPEG-7 Visual Standard for Content Description - An Overview. IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 696-702, June 2001.
[9] John R. Smith and Shih-Fu Chang. VisualSEEk: a fully automated content-based image query system. ACM Multimedia, 1996, pp. 87-98.
[10] Jain A K, Vailaya A. Image Retrieval Using Color and Shape. Pattern Recognition, 1996, 29(8).
[11] Zahn C T, Roskies R Z. Fourier Descriptors for Plane Closed Curves. IEEE Transactions on Computers, 1972, (21).
[12] Rui Y et al. Modified Fourier Descriptor for Shape Representation: A Practical Approach. Proc. 1st International Workshop on Image Databases and Multimedia Search, Amsterdam, the Netherlands, 1996.
[13] Hu M K. Visual Pattern Recognition by Moment Invariants. IRE Transactions on Information Theory, 1962.
[14] L. Yang and F. Albregtsen. Fast computation of invariant geometric moments: A new method giving correct results. Proc. IEEE Int. Conf. on Image Processing, 1994.
[15] Deepak Kapur, Y. N. Lakshman, and Tushar Saxena. Computing invariants using elimination methods. Proc. IEEE Int. Conf. on Image Proc., 1995.
[16] David Cooper and Zhibin Lei. On representation and invariant recognition of complex objects based on patches and parts. In Springer Lecture Notes in Computer Science series, 3D Object Representation for Computer Vision, M. Hebert, J. Ponce, T. Boult, A. Gross, Eds. New York: Springer, 1995, pp. 139-153.

Appendix  System source code
using System;
using System.Collections.Generic;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Emgu.CV;
using Emgu.CV.Structure;
using Emgu.CV.CvEnum;
using Emgu.Util;
using System.IO;
namespace CBIR
{
public partial class Form1 : Form
{
string HuFeature;                  // path of the Hu-moment feature file (or image list)
string FDFeature;                  // path of the Fourier-descriptor feature file (or image list)
string ImgSearch;                  // path of the query image
string[] picImg = new string[10];  // paths of the ten retrieved images
public class FDNode
{
public string filename;
public int size = 8;
public double[] bin;
public FDNode()
{
bin = new double[size];
}
};
public class HuNode
{
public string filename;
public int size = 7;
public double[] bin;
public HuNode()
{
bin = new double[size];
}
};
public struct ImgNode      // a database image together with its distance to the query
{
    public string filename;
    public double dist;
};
public void GetFD(ref FDNode node)
{
    // Load the image as grayscale.
    IntPtr img = CvInvoke.cvLoadImage(node.filename,
        LOAD_IMAGE_TYPE.CV_LOAD_IMAGE_GRAYSCALE);
    // Real / imaginary planes and the complex input/output images for the DFT.
    IntPtr image_Re = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 1);
    IntPtr image_Im = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 1);
    IntPtr Fourier = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 2);
    IntPtr dst = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_64F, 2);
    CvInvoke.cvConvertScale(img, image_Re, 1, 0);
    CvInvoke.cvZero(image_Im);
    CvInvoke.cvMerge(image_Re, image_Im, IntPtr.Zero, IntPtr.Zero, Fourier);
    CvInvoke.cvDFT(Fourier, dst, CV_DXT.CV_DXT_FORWARD, 0);
    // Normalize the first coefficients by the DC component.
    double DFTfactor = CvInvoke.cvGet2D(dst, 0, 0).v0;
    for (int i = 0; i < node.size; i++)
    {
        double x = CvInvoke.cvGet2D(dst, i + 1, 0).v0;
        double y = CvInvoke.cvGet2D(dst, i + 1, 0).v1;
        node.bin[i] = Math.Sqrt(x * x + y * y) / DFTfactor;
    }
    CvInvoke.cvReleaseImage(ref img);
    CvInvoke.cvReleaseImage(ref image_Re);
    CvInvoke.cvReleaseImage(ref image_Im);
    CvInvoke.cvReleaseImage(ref dst);
    CvInvoke.cvReleaseImage(ref Fourier);
}
public void GetHu(ref HuNode node)
{
IntPtr img = CvInvoke.cvLoadImage(node.filename,
LOAD_IMAGE_TYPE.CV_LOAD_IMAGE_GRAYSCALE);
MCvMoments moments = new MCvMoments();
CvInvoke.cvMoments(img, ref moments, 0);
MCvHuMoments hu = new MCvHuMoments();
CvInvoke.cvGetHuMoments(ref moments, ref hu);
node.bin[0] = hu.hu1;
node.bin[1] = hu.hu2;
node.bin[2] = hu.hu3;
node.bin[3] = hu.hu4;
node.bin[4] = hu.hu5;
node.bin[5] = hu.hu6;
node.bin[6] = hu.hu7;
for (int i = 0; i < node.size; i++)
node.bin[i] = -Math.Log(Math.Abs(node.bin[i]));
CvInvoke.cvReleaseImage(ref img);
}
public string cannyName(string filename)
{
    // Derive an output name for the edge image: "<basename>_canny.jpg".
    string[] part = filename.Split('\\');
    string[] name = part[part.Length - 1].Split('.');
    string ret = name[0] + "_canny.jpg";
    return ret;
}
public void canny(string filename)
{
    // Show the Canny edge map of the given image in an OpenCV window.
    IntPtr img = CvInvoke.cvLoadImage(filename,
        LOAD_IMAGE_TYPE.CV_LOAD_IMAGE_GRAYSCALE);
    IntPtr imgCanny = CvInvoke.cvCreateImage(CvInvoke.cvGetSize(img),
        IPL_DEPTH.IPL_DEPTH_8U, 1);
    CvInvoke.cvCanny(img, imgCanny, 50, 150, 3);   // thresholds 50/150, 3x3 aperture
    //CvInvoke.cvSaveImage(cannyName(filename), imgCanny);
    CvInvoke.cvNamedWindow("image");
    CvInvoke.cvShowImage("image", imgCanny);
    CvInvoke.cvReleaseImage(ref img);
    CvInvoke.cvReleaseImage(ref imgCanny);
}
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
}
private void button1_Click(object sender, EventArgs e)
{
    // Choose the Hu-moment feature file (or the image list used to build it).
    OpenFileDialog fileDialog = new OpenFileDialog();
    if (fileDialog.ShowDialog() == DialogResult.OK)
    {
        HuFeature = fileDialog.FileName;
        textBox1.Text = HuFeature;
    }
}
private void button2_Click(object sender, EventArgs e)
{
    // Choose the Fourier-descriptor feature file (or the image list).
    OpenFileDialog fileDialog = new OpenFileDialog();
    if (fileDialog.ShowDialog() == DialogResult.OK)
    {
        FDFeature = fileDialog.FileName;
        textBox2.Text = FDFeature;
    }
}
private void button3_Click(object sender, EventArgs e)
{
    // Choose the query image.
    OpenFileDialog fileDialog = new OpenFileDialog();
    if (fileDialog.ShowDialog() == DialogResult.OK)
    {
        ImgSearch = fileDialog.FileName;
        textBox3.Text = ImgSearch;
    }
}
public double CalNodeDistFD(ref FDNode node1, ref FDNode node2)
{
    // Euclidean distance between two Fourier-descriptor vectors.
    double dist = 0;
    for (int i = 0; i < node1.size; i++)
        dist += (node1.bin[i] - node2.bin[i]) * (node1.bin[i] - node2.bin[i]);
    return Math.Sqrt(dist);
}
public double CalNodeDistHu(ref HuNode node1, ref HuNode node2)
{
    // Euclidean distance between two Hu-moment vectors.
    double dist = 0;
    for (int i = 0; i < node1.size; i++)
        dist += (node1.bin[i] - node2.bin[i]) * (node1.bin[i] - node2.bin[i]);
    return Math.Sqrt(dist);
}
public void CalDatabaseDistFD(string database, ref FDNode node1,
    ref ImgNode[] imgNode, double weight, ref int size)
{
    // Read the feature file line by line ("filename f1 ... f8") and add the
    // weighted distance to the query (node1) onto each database entry.
    StreamReader sr = new StreamReader(database);
    string line;
    FDNode node2 = new FDNode();
    while ((line = sr.ReadLine()) != null)
    {
        string[] wordline = line.Split();
        int i = -1;
        foreach (string word in wordline)
        {
            if (i == -1)
                imgNode[size].filename = word;      // first token: file name
            else
                node2.bin[i] = Double.Parse(word);  // remaining tokens: features
            i++;
        }
        imgNode[size++].dist += CalNodeDistFD(ref node1, ref node2) * weight;
    }
    sr.Close();
}
public void CalDatabaseDistHu(string database, ref HuNode node1,
    ref ImgNode[] imgNode, double weight, ref int size)
{
    // Same as CalDatabaseDistFD, but for the Hu-moment feature file.
    StreamReader sr = new StreamReader(database);
    string line;
    HuNode node2 = new HuNode();
    while ((line = sr.ReadLine()) != null)
    {
        string[] wordline = line.Split();
        int i = -1;
        foreach (string word in wordline)
        {
            if (i == -1)
                imgNode[size].filename = word;
            else
                node2.bin[i] = Double.Parse(word);
            i++;
        }
        imgNode[size++].dist += CalNodeDistHu(ref node1, ref node2) * weight;
    }
    sr.Close();
}
private void button4_Click(object sender, EventArgs e)
{
    // Run a query: extract features from the query image, compute distances
    // to every database image, sort, and display the ten nearest matches.
    Object selectedItem = comboBox1.SelectedItem;
    FDNode nodeFD = new FDNode();
    HuNode nodeHu = new HuNode();
    nodeFD.filename = nodeHu.filename = ImgSearch;
    pictureBox1.Image = Image.FromFile(ImgSearch);
    //canny(ImgSearch);
    int method = comboBox1.Items.IndexOf(selectedItem);  // 0: Hu, 1: FD, else: combined
    const int ImgSize = 10000;                           // capacity of the result array
    ImgNode[] imgNode = new ImgNode[ImgSize];
    for (int i = 0; i < ImgSize; i++)
        imgNode[i].dist = 0;
    int size = 0;
    if (method == 0)
    {
        GetHu(ref nodeHu);
        CalDatabaseDistHu(HuFeature, ref nodeHu, ref imgNode, 1, ref size);
    }
    else if (method == 1)
    {
        GetFD(ref nodeFD);
        CalDatabaseDistFD(FDFeature, ref nodeFD, ref imgNode, 1, ref size);
    }
    else
    {
        // Weighted combination: 0.9 * FD distance + 0.1 * Hu distance.
        GetFD(ref nodeFD);
        CalDatabaseDistFD(FDFeature, ref nodeFD, ref imgNode, 0.9, ref size);
        size = 0;   // reuse the same entries; distances accumulate
        GetHu(ref nodeHu);
        CalDatabaseDistHu(HuFeature, ref nodeHu, ref imgNode, 0.1, ref size);
    }
    // Simple exchange sort of the results by ascending distance.
    for (int i = 0; i < size; i++)
        for (int j = i + 1; j < size; j++)
            if (imgNode[i].dist > imgNode[j].dist)
            {
                ImgNode t = imgNode[i];
                imgNode[i] = imgNode[j];
                imgNode[j] = t;
            }
    // Show the ten best matches and remember them for the edge-map viewer.
    pictureBox2.Image = Image.FromFile(imgNode[0].filename);
    pictureBox3.Image = Image.FromFile(imgNode[1].filename);
    pictureBox4.Image = Image.FromFile(imgNode[2].filename);
    pictureBox5.Image = Image.FromFile(imgNode[3].filename);
    pictureBox6.Image = Image.FromFile(imgNode[4].filename);
    pictureBox7.Image = Image.FromFile(imgNode[5].filename);
    pictureBox8.Image = Image.FromFile(imgNode[6].filename);
    pictureBox9.Image = Image.FromFile(imgNode[7].filename);
    pictureBox10.Image = Image.FromFile(imgNode[8].filename);
    pictureBox11.Image = Image.FromFile(imgNode[9].filename);
    for (int i = 0; i < 10; i++)
        picImg[i] = imgNode[i].filename;
    /*
    for (int i = 0; i < 10; i++)
        canny(imgNode[i].filename);
    */
}

private void button6_Click(object sender, EventArgs e)
{
    // Build the Fourier-descriptor feature file: FDFeature is a text file
    // listing one image path per line; for each image the descriptors are
    // computed and written as "filename f1 ... f8" to FDFeature + "_FD".
    StreamReader sr = new StreamReader(FDFeature);
    StreamWriter sw = new StreamWriter(FDFeature + "_FD");
    string filename;
    FDNode node = new FDNode();
    while ((filename = sr.ReadLine()) != null)
    {
        node.filename = filename;
        GetFD(ref node);
        sw.Write(filename);
        for (int i = 0; i < node.size; i++)
        {
            sw.Write(" ");
            sw.Write(node.bin[i]);
        }
        sw.WriteLine();
    }
    sr.Close();
    sw.Close();
}
private void button5_Click(object sender, EventArgs e)
{
    // Build the Hu-moment feature file in the same format as button6_Click.
    StreamReader sr = new StreamReader(HuFeature);
    StreamWriter sw = new StreamWriter(HuFeature + "_Hu");
    string filename;
    HuNode node = new HuNode();
    while ((filename = sr.ReadLine()) != null)
    {
        node.filename = filename;
        GetHu(ref node);
        sw.Write(filename);
        for (int i = 0; i < node.size; i++)
        {
            sw.Write(" ");
            sw.Write(node.bin[i]);
        }
        sw.WriteLine();
    }
    sr.Close();
    sw.Close();
}
// Clicking a picture box shows the Canny edge map of the corresponding image.
private void pictureBox1_Click(object sender, EventArgs e) { canny(ImgSearch); }
private void pictureBox2_Click(object sender, EventArgs e) { canny(picImg[0]); }
private void pictureBox3_Click(object sender, EventArgs e) { canny(picImg[1]); }
private void pictureBox4_Click(object sender, EventArgs e) { canny(picImg[2]); }
private void pictureBox5_Click(object sender, EventArgs e) { canny(picImg[3]); }
private void pictureBox6_Click(object sender, EventArgs e) { canny(picImg[4]); }
private void pictureBox7_Click(object sender, EventArgs e) { canny(picImg[5]); }
private void pictureBox8_Click(object sender, EventArgs e) { canny(picImg[6]); }
private void pictureBox9_Click(object sender, EventArgs e) { canny(picImg[7]); }
private void pictureBox10_Click(object sender, EventArgs e) { canny(picImg[8]); }
private void pictureBox11_Click(object sender, EventArgs e) { canny(picImg[9]); }
private void button7_Click(object sender, EventArgs e)
{
    // Launch MS Paint so the user can draw a query sketch.
    System.Diagnostics.ProcessStartInfo Info = new System.Diagnostics.ProcessStartInfo();
    Info.FileName = "mspaint.exe";
    System.Diagnostics.Process.Start(Info);
}
}
}

Appendix  Foreign-language paper

Ontology-based Digital Photo Annotation using Multi-source Information

Yanmei Chai, Xiaoyan Zhu, Sen Zhou, Yiting Bian, Fan Bu, Wei Li and Jing Zhu
Department of Computer Science and Technology
Tsinghua University, 100084
Beijing, China

Abstract: The number of digital photos in the personal computer is exploding. In an effective photo management system, photo annotation is the most challenging task. The current photo annotation and management systems suffer from two crucial problems. One is the expression of semantic knowledge; the other is the way of photo annotation. Aiming at the former problem, this paper proposes to utilize ontology to organize the domain knowledge and provide formal, explicit and conceptual annotation. Meanwhile, a dual-level semi-automatic annotation approach is also proposed to resolve the latter problem. The rough annotation layer provides preliminary annotation by automatically extracting some semantic concepts from photo titles/texts, time concepts from EXIF metadata, and photo classification concepts from the result of face detection algorithms. The accurate annotation layer provides more detailed annotation by allowing users to modify, delete and add the annotation information freely. An ontology based photo management system, OntoAlbum, is implemented in this paper. Experimental results show that the proposed approach is very effective and promising.

Keywords: FamilyAlbum Ontology; OntoAlbum; Multi-source information

I. INTRODUCTION
Nowadays personal digital photos are becoming a common and valuable form of personal information. As a large number of family photos and other personal images pile up, users encounter severe difficulties with their management and retrieval, especially when they want to find a desired one among tens of thousands of photos using just a simple query. Traditional photo management based on file folders/albums falls far short of this requirement. As a result, effective management of these large personal photo collections is becoming indispensable.

II. RELATED WORK


Most current management systems use text/keywords as annotation information for photo management. When users want to find a desired photo, search engines match their queries with these text/keyword descriptions and present the best matches to them. However, there are still two crucial issues in such systems.

One issue is the expression of semantic knowledge. The text/keyword-based approach is intuitive and accurate. Early systems, such as PhotoFinder [1], used keywords to annotate the content of photos and names of people to help with retrieval. However, the keyword-based approach is only lexically motivated and the lexicons are not well organized, so the problems of semantic relevance and multiplicity cannot be well resolved. For example, a photo annotated with "puppy" would not appear in the search results for the keyword "animal" or "dog".

The other issue is the way of photo annotation. Traditionally, editing annotation information manually is a tedious and time-consuming task for most people. Though many researchers have devoted themselves to improving the convenience of annotation tools, such as Show&Tell [2], Shoebox [3], the work of O'Hare et al. [4] and that of Brendan et al. [5] etc., these ways of annotation are not appropriate for large-scale photo collections.

The content-based approach is another solution for photo annotation. In CBIR (Content-based Image Retrieval), key efforts have concentrated on using low-level features such as color, texture, and shapes to describe and compare image content. The advantage of this kind of approach is that it requires little or no manual intervention. In order to bridge the gap between low-level visual features and high-level semantics, researchers have done a lot of work based on computer vision and machine learning techniques, such as Joo-Hwee Lim et al. [6], Wu Yi [7] and Seungji Yang [8] etc. However, there are still many difficult problems in the fields of computer vision and pattern recognition, which lead to low accuracy of concept detection and a limited number of semantic concepts (about 50 at most) that can be recognized. So a purely concept-learning based approach cannot meet users' needs well. In a word, photo annotation remains a challenging task in photo management systems.

III. OUR METHOD


A. FamilyAlbum Ontology

In this paper, a dual-level semi-automatic photo annotation approach using multi-source information and ontology is proposed.

Ontology is "a formal, explicit specification of a shared conceptualization" [9]. Different from other data models, ontology focuses on providing an explicit conceptualization that describes the semantics of data by modeling domain concepts, their relationships and attribute associations. In our work, a domain model, FamilyAlbum, is built as a knowledge description framework for the personal digital photo management domain, which provides the vocabulary and background knowledge to describe the semantic concepts of a photo.

Ontology has 5 modeling elements: concepts, properties, relations, axioms and instances [10]. Ontology building is a process of defining and enriching these elements. User study and survey show that some semantic concepts, such as Time (e.g. Spring, Dusk, 1995), Site (e.g. Home, Disneyland, Museum), Person (e.g. Myself, Mary, Tom), and Event (e.g. Gathering, Sport, Visit) etc., are important cues for photo browsing and searching. Using the frequency statistics of tags appearing on the Flickr website, we chose some day-to-day vocabularies often used by people to mark their photos. About 120 initial concepts are included in the FamilyAlbum ontology, of which about 12 are regarded as core top-level concepts. By the bottom-up method, these concepts are arranged hierarchically in the ontology, referencing the structure of WordNet; in the hierarchy, the relationship between concepts can be subsumption. As shown in Fig. 1(a), the left diagram is a hierarchy of the Photo class. In the right diagram, PhotoID, TakeTime, Width etc. are DataType properties of the Photo class; these intrinsic attributes are assigned to each concept, identifying it as a unique one in the whole knowledge framework. Meanwhile, Object properties such as hasTarget and Event_occurTime are defined to brace the global knowledge network; these extrinsic attributes represent the semantic relationships between abstract concepts. The framework of FamilyAlbum is shown in Figure 1(b).

Figure 1. (a) The hierarchy of the Photo class and its DataType properties; (b) the framework of the FamilyAlbum ontology.

Besides the abovementioned class concepts, properties and relations, the system also allows users to create and edit instances in the ontology as they like. These elements of the ontology provide a formalized definition of the concepts in this domain, and also provide annotation information for photos. Ontology can be regarded as a bridge between actual grammatical expressions and the abstract conceptual model.

B. Text Based Automatic Annotation

Representing the content of a photo by texts is quite intuitive. Users may organize the photos into directories whose names can be used as text input. Some of the photos may have descriptive filenames that can be used as text input. Perhaps there are other text inputs for the photos. In order to take full advantage of these texts to automatically annotate photos, a conceptual matching algorithm based on text analysis techniques is proposed. The detailed process is shown in the following.
Step 1: Input the text.
Step 2: Extract key phrases W[] from the text.
Step 3: For each key phrase w in W[], match it against all existing ontology instances. If there is an appropriate match, go to Step 7; otherwise go to Step 4.
Step 4: Calculate the similarities, based on WordNet, between the key phrase and every class concept in the ontology.
Step 5: Find the best matching concept.
Step 6: Create an instance for the best matching concept using the key phrase.
Step 7: Connect the photo and the instance with properties defined in the FamilyAlbum ontology.
Step 8: If all key phrases have been processed, the annotation is complete; otherwise go to Step 3.
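A schematic rendering of this matching loop follows (our illustration, not the authors' code; the types Instance and Concept and the helpers matchInstance, wordnetSimilarity, createInstance and link are hypothetical names standing for the operations named in the steps):

foreach (string w in keyPhrases)                    // Step 3
{
    Instance inst = matchInstance(w);               // match against existing instances
    if (inst == null)                               // Steps 4-6: no instance matched
    {
        Concept best = null;
        double bestSim = 0;
        foreach (Concept c in ontologyConcepts)
        {
            double sim = wordnetSimilarity(w, c);   // WordNet-based similarity in [0, 1]
            if (sim > bestSim) { bestSim = sim; best = c; }
        }
        inst = createInstance(best, w);             // new instance for the best concept
    }
    link(photo, inst);                              // Step 7: connect photo and instance
}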
WordNet is a lexical reference system which organizes English nouns, verbs, adjectives and adverbs into synonym sets, each representing one underlying concept. We compute the similarity between a key phrase and a concept in the FamilyAlbum ontology using javasimlib, a Java-based tool that computes the similarity between words (or synsets) over the WordNet hierarchies based on an information-theoretic metric [11]. Given two words or synsets, javasimlib returns a value between 0 and 1, where 1 indicates the highest similarity.
C. Metadata and SVM Based Automatic Annotation

An image file created by a digital camera usually contains EXIF metadata in the file header, which includes the most important camera settings at the time the photograph was taken. Different camera manufacturers often produce different sets of EXIF parameters. The parameters we found to be useful for photo annotation are: date and time, f-stop, exposure time, flash and focal length.

The date and time parameter can be used to infer some time-related semantic concepts, such as the calendar date (e.g. April 5, 2007), the season (e.g. spring, autumn) and the time period (e.g. dawn, morning). The EXIF metadata together with the color moment (CM) features of images can also be utilized to produce the semantic concepts of scene classification. In our work, the libSVM library [12] with the RBF kernel is used to train and classify photos. Both the CM features and the EXIF features are extracted and combined as the input of the classifier. The photos are grouped into indoor, outdoor daytime and outdoor night automatically. Instances of these concepts are created automatically in the ontology for photo annotation when the photos are imported into the system. Each photo is allowed to own multiple annotations.
D. Face Detection Based Automatic Annotation

Images contain rich semantic information, and inferring semantic concepts from image content is another way to automatically annotate photos. Despite the limitations of computer vision technology, there are some comparatively mature techniques that can be used in photo management systems, such as face detection. In this paper, the face detection algorithm provided by Intel's OpenCV library is used to locate faces in a photo with reasonable accuracy. By counting the faces in a photo, the system can classify photos into "portrait", "group", "crowd" and "scenery" photos etc. Meanwhile, instances of these class concepts are created in the ontology and linked with the corresponding photo instance by the system automatically.
E. Semi-automatic Dual-level Annotation Approach

Though we can automatically detect and infer some important concepts from text, EXIF metadata and the results of face detection, it remains difficult to obtain more concepts with higher accuracy. As a result, a fully automatic photo annotation system is unrealistic for now.

A semi-automatic dual-level annotation approach is therefore proposed in our work. The first layer is a rough annotation layer, in which Natural Language Processing (NLP) techniques, face detection and EXIF information are used to extract concepts automatically. The other layer is the accurate annotation layer, in which users can modify inappropriate annotations and create new annotations for photos manually. The dual-level approach balances the tediousness of fully manual annotation against the inaccuracy of fully automatic annotation.

IV. EXPERIMENTAL RESULTS


A. Evaluation of Metadata and SVM Based Annotation

The image database in these experiments consists of 2376 consumer photographs provided by HP. They depict typical family and vacation scenes, and were taken by many different individuals, at all different times of the year. The database is quite diverse, and includes snow, bright sun, sea, sunset, night and silhouette scenes. Image types not in our database can easily be added to the training set without changing any algorithms. The images were hand-labeled by three independent people. Included are 911 labeled as outdoor daytime, 833 labeled as indoor and 420 labeled as outdoor night for training. Another 107 labeled as indoor, 145 labeled as outdoor daytime and 60 labeled as outdoor night are used for testing. The experimental results are shown in Table 1.

B. Evaluation of Text Based Annotation

We ran the proposed conceptual matching algorithm on three photo-title text datasets. The first one is from a personal photo collection. The second one is from http://dancephotography.com/, which describes many scenes of life. The last one is from http://www.twin-springs.com, which describes many natural landscapes. Key words are extracted from these text items and matched with the concepts in the ontology. Compared with manual matching, the precisions of the algorithm on the three databases are shown in Table 2.

C. Evaluation of Automatic Classification Based on Face Detection

We tested the performance of the face detection based automatic classification algorithm as well. The experiment is performed on a dataset that includes a total of 1433 personal photos. These photos are classified into "portrait", "group", "crowd" and "scenery" according to the number of automatically detected faces. The comparison of the automatic classification with the manual annotation is shown in Table 3.

D. Photo Management System: OntoAlbum

In order to verify the validity of the proposed approach, we developed an ontology based photo management system, OntoAlbum. The diagram of the semi-automatic annotation and management system is shown in Figure 2.

Users can browse, annotate and search their photos in OntoAlbum through the GUI (see Fig. 3). When selected photos are imported into the database, the automatic annotation processor creates and associates an ontology instance for each photo based on the above-mentioned algorithms. Moreover, improper annotation information can be modified at any time while users browse photos. These photo annotations are stored in a repository library in the OWL format.

Once the photos are well annotated, the query processor conducts the query using W3C's SPARQL language. The logical foundation of OWL is DL (Description Logic), which supports reasoning about instances. For example, suppose we want to search for photos of Anne. Clicking the instance "Anne" of the Person class (see the far right of Figure 3), all the photos related to Anne will be extracted explicitly from the database through ontology inference.
V. CONCLUSION AND FUTURE WORK
In this paper, we propose to utilize multi-source information and ontology to annotate and manage personal photo collections. Ontology can provide formal, explicit and conceptual annotation lexicons and inference queries. The semi-automatic dual-level annotation method alleviates the tediousness of manual annotation by automatically extracting semantic concepts from text, EXIF metadata and face detection results, and improves the performance of annotation by allowing users to freely edit the annotation information. Furthermore, the implemented OntoAlbum system shows that the proposed approach is very effective and promising. More automatic concept extraction algorithms and more complex ontology-based inference algorithms will be studied further.

ACKNOWLEDGMENT

The work is funded by China Postdoctoral Science Foundation No. 20080440262 and 20080440381. It is also supported by National 973 Foundation of China No. 2007CB311003 and the National University Students Innovative Pilot Scheme.

REFERENCES

[1] Kang, H., Shneiderman, B. Visualization Methods for Personal Photo Collections: Browsing and Searching in the PhotoFinder. In Proc. ICME 2000, New York City, New York, pp. 1539-1542, 2000.
[2] Timothy J. Mills, David Pye, David Sinclair, and Kenneth R. Wood. SHOEBOX: A Digital Photo Management System, 2000.
[3] R. K. Srihari et al. Multimedia Indexing and Retrieval of Voice-Annotated Consumer Photos. Proceedings of the Multimedia Indexing and Retrieval Workshop, pp. 1-16, 1999.
[4] O'Hare N., Gurrin C. et al. Using Text Search for Personal Photo Collections with the MediAssist System. 22nd Annual ACM Symposium on Applied Computing, Seoul, Korea, 11-15, 2007, pp. 880-881.
[5] Brendan Elliott, Z. Meral Ozsoyoglu. A Comparison of Methods for Semantic Photo Annotation Suggestion. 22nd International Symposium on Computer and Information Sciences, Ankara, Turkey, 2007.
[6] Joo-Hwee Lim, Qi Tian, Philippe Mulhem. Home Photo Content Modeling for Personalized Event-Based Retrieval. IEEE Multimedia, 2003, 10(4), pp. 28-37.
[7] Wu Yi, Jean-Yves Bouguet, Ara Nefian, Igor V. Kozintsev. Learning Concept Templates from Web Images to Query Personal Image Databases. The IEEE International Conference on Multimedia and Expo, Beijing, 2007, pp. 1986-1989.
[8] Seungji Yang, Sang-Kyun Kim, Yong Man Ro. Semantic Home Photo Categorization. IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17(3): 324-335.
[9] Thomas R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 1993, 5(2): 199-220.
[10] Alexander Maedche et al. Ontology Learning for the Semantic Web. Boston, Norwell: Kluwer Academic Publishers, 2002, pp. 18-20.
[11] N. Seco, T. Veale, and J. Hayes. An Intrinsic Information Content Metric for Semantic Similarity in WordNet. Proc. of European Conference on Artificial Intelligence, 2004.
[12] Wang, Y., Zhang, H. Content-Based Image Orientation Detection with Support Vector Machines. IEEE Workshop on Content-based Access of Image and Video Libraries, 2001, pp. 17-23.
