
CHAPTER 1

INTRODUCTION
1.1 ABOUT THE PROJECT

The main aim of this project is to effectively manage the video blogs
(vlogs) and make them more conveniently accessible for users.

A video blog (vlog) is a blog which uses video as the primary content, often
accompanied by supporting text, images, and additional metadata to provide
context. Compared with general videos, vlogs have several unique
characteristics. A vlog often provides textual content as a description of the
video. As a medium for communication, a vlog usually has comment entries
conveying viewers' opinions. Some unique but useful information about a vlog,
such as its submission time, number of views, number of comment entries, and
popularity rating, can be easily obtained.

We propose a novel vlog management model which comprises automatic vlog
annotation and user-oriented vlog search. For vlog annotation, we extract
informative keywords from both the target vlog itself and relevant external
resources; besides this semantic annotation, we perform sentiment analysis on
comments to obtain an overall evaluation. For vlog search, we present
saliency-based matching to simulate human perception of similarity, and
organize the results by personalized ranking and category-based clustering.

1.2 LITERATURE SURVEY

Blogging has been a textual activity, but text is only one of the diverse
skills which are needed in order to understand and manage different aspects of
modern communication. Broadband connections are likely to stimulate a rapid
increase in audio-visual services on the web, presumably changing the future
conditions for blogging. Videoblogs can facilitate practices which promote
media literacy and collaborative learning through the making of collective
documentaries. Videoblogs with wiki-like functions promise to turn users into
producing collectives rather than individual consumers of audiovisual content.

Arguably, textual blogging has an effect on journalism by making the process
of gathering information more dynamic, potentially involving the public before
the moment of print publication, or even as providers of content. When it
comes to audiovisual media, users are beginning to abandon broadcasting in
favour of broadband services. This development will probably continue as new
technologies transform the viewing experience into a more personalized
activity, potentially by-passing the traditional broadcaster completely.

1.2.1 VIDEOBLOGS

The importance of video as a very powerful medium, the increasing amount of
video material on the web, and the possibilities offered by weblogs when it
comes to collaboration all come together in "videoblogs", one of the most
promising tools for fostering media literacy.

Blogs began as a textual genre of personal publishing, but the genre has
developed visual expressions, like photoblogs, and has more recently adopted
sound and video. Most bloggers publish short posts, write quite often, and
make extensive use of hypertext linking. Linking and commenting make blogging
a kind of collaboration, where individual bloggers write in a context created
through collective, but often unorganised, effort.

Textual blogs have at least three characteristics, apart from usable and
easily accessible software, which have made them easy to use whether as a
"producer" (writer), a "consumer" (reader) or both: textblogs are based on
non-temporal media which is easily controllable, they are easy to cite, and
they are part of a long textual tradition, "re-mediating" many of the features
known from diaries and journals. Even though there are several substantial
differences, the easiest way to explain blogging is often to begin by
referring to "an online diary".

When it comes to audiovisual blogs, these are more difficult to explain:
audio- and videoblogs are based on temporal media and there is no established
tradition to which they are closely related. Audioblogs can hardly be compared
to radio or recorded sound, and videoblogs are not like television or private
filmmaking: in contrast to broadcasting, blogs are personal and at the same
time they are shared with people outside the private sphere. In production,
presentation and distribution, blogging promises to be close to the opposite
of broadcasting. Looking for the sources of a videoblogging language, we
therefore have to explore other aspects of audiovisual culture.

1.2.1.1 SEVERAL VIDEOBLOG-TRADITIONS

First, one has to come to terms with what characterizes a videoblog. We have
to distinguish between several significantly different technical solutions
which claim to be videoblogs, ranging from simply uploading unedited video
files, via play-lists, to edited sequences, sometimes with complex
interactivity. It becomes difficult to define distinct genres, but in general
there seem to be some major traditions, which of course blend into a number of
sub-genres. I make a distinction between "vogs", which are based on pre-edited
sequences with interactive features, video-"moblogs", consisting of relatively
short, autonomous video clips, and playlists, which are collections of
references to video files on different servers.

Some of the characteristics of vogs and "vogging" are formulated in a
manifesto written by Adrian Miles, inspired by the Danish filmmaking
initiative Dogme 95. In this tradition videoblogs are personal publishing of
video, exploring the potential of linking, using technology which is easily
available. Vogs are made of edited sequences which normally include
interactive elements. They are typically made with different kinds of software
on the producer's computer and posted to individual websites. The other major
tradition has emerged along with the introduction of mobile input devices with
internet connections (smartphones, PDAs, camera cellphones).

Blogging from mobile devices is particularly interesting in relation to
documentary filmmaking, trying to grasp moments in life provoked or captured
by the presence of a camera. A pioneer within the moblogging tradition is
Steve Mann, who has experimented with wearable cameras, posting images to the
Web since 1994 (Mann 1997). Today most moblogs are based on technology quite
similar to textual blogs, with posts containing uploaded pictures or video
clips and additional text. Moblogs are often hosted by professional service
providers where a large number of blogs share the same infrastructure. Videos
on moblogs normally contain individual video clips, not edited sequences. Vogs
and moblogs are thus quite different, both regarding the way they are produced
and the way they are consumed.

Miles makes a good distinction between the two traditions by emphasizing that
"a vog is a video blog where video in a blog must be more than video in a
blog". The posts in vogs are edited and may offer quite complex interactivity.
Therefore those who produce vogs have to combine skills in the use of software
with the ability to manipulate moving images and add hypertextual
interactivity. If we restrict vogs to video content there are not many voggers
around; even including those posting cinematic 2D- and 3D-animations, the
number is still relatively small.

Even though there are no major technical barriers, some skills are required,
which prevents most users from becoming "voggers". Moblogs, on the other hand,
are easy to use; in most cases it is just a matter of uploading video files to
a dedicated webserver. However, this is not only a question of ease of use;
possibly even more important is the time which the producer has to invest in
order to get his material on the web. When posting is not time-consuming,
bloggers are encouraged to post often, an aspect which has made a lot of
text-bloggers and blog-readers become avid users. The same criteria for
success apply to videoblogs.

Playlists are perhaps not videoblogs, but they are an interesting genre
because they use technologies which may bridge the gap between "vogs" and
"moblogs". Playlists address individual files on different servers and may
even provide a level of interactivity without manipulating the content in
these files. One way to achieve this is by using SMIL (Synchronized Multimedia
Integration Language), an established XML dialect defined by the World Wide
Web Consortium in order to control different media distributed through the
internet. SMIL seems to be an ideal platform for distributing various content
as "movies" without moving or manipulating the original source files (video
clips, pictures and text). Since SMIL files are based on an open format (XML)
stored as ASCII text, it is quite easy to make alternative versions of
playlists, taking advantage of server-side applications and the internet's
transparent nature.

1.2.2 COLLECTIVE DOCUMENTARIES

Looking for existing genres which videoblogs re-mediate, the closest we get
are some traditions known from documentary filmmaking. One of these is diary
films, which are personal first-person narratives. Another tradition is
found-footage films, which are often based on old private material filmed by
others than the filmmaker himself. Found-footage films are part of a larger
tradition known as compilation film, using material from a variety of sources,
including archived material. Any kind of film and video has the possibility of
ending up as found footage: your grandfather's Super-8 movies, old
commercials, parts of feature films, recorded television, etc. Quite a few
excellent filmmakers have made their first movies with material found as
leftovers in a studio or in a film school. William Wees discusses three
general ways in which found footage is most often used:

1. Compilation: film where the editor cuts together pieces of footage in order
to illustrate a point. The images are intended to represent "reality", as is
typical in television documentaries.

2. Collage: film which uses found footage to create metaphors, provoke
self-consciousness and encourage critical viewing. The viewer is able to read
the images critically, with attention to the metaphors.

3. Appropriation: film where images are reused in order to be decorative.
Representation is about surface rather than the creation of secondary
meanings. Wees relates this to postmodernism and to the loss of the film
material's historical meaning.

Videoblogs are related to both diary films and found footage: the material is
a combination of personal recordings and material provided by others. With the
introduction of mobile cameras and fast, easy and reliable ways of sharing
video clips, the concept of found footage is going to change. In the future,
video recorded by individuals promises to be a kind of found footage more
closely related to recent or ongoing events. This material is hardly going to
be used in the same way as traditional found footage, because the time between
recording and use will be much shorter. Instead, new patterns of use will
emerge: different aspects of voyeurism are quite obvious, but hopefully there
will also be ways of using online video in order to make individual statements
about issues not covered by today's mass media. The most successful online
environments seem to be those which are designed to make it possible to post
information at different levels, socializing new users into the system's
publishing culture. Blogs provide some of these socializing effects by
providing an individual base for entering a community, blurring the boundaries
between production, distribution and consumption.

1.2.2.1 THE VIDEOBLOGGING PROCESS

In order to make a videoblogging community of practice where the members
benefit from each other's creative effort, one has to take into consideration
the different stages of the production process. Trying to build a theoretical
and practical framework for a "digital cinematography", one has to acknowledge
that networked computers cause significant changes to some parts of the
production process while others remain almost the same as if the process were
done on an offline computer. After the video material is recorded, the
videoblogging process can be divided into five stages:
1. "Posting", 2. "Selecting", 3. "Editing", 4. "Storing" and 5. "Re-editing".

1. Posting

The success of blogging is partly a result of cheaper net connections and the
increasing number of computer-literate people using the web. Videoblogs may
become a genre where "href tracks", "sprites" and "interactive" elements
enhance the user's personal experience in ways that are unique to
computer-mediated communication. The considerable downside is that advanced
interactive features have consequences for the amount of work which each user
has to put into his posts. When posting becomes a task, the most important
advantage of blogging disappears and the "media literacy potential" will
decrease. Following the moblog tradition, emphasizing simplicity, posting
should be as easy as possible. The users should be encouraged to post short
clips of unedited video material which is transformed into a unified video
format on the server, becoming a common resource for future editing and
citation. Comments are a way of posting which is important when building an
online community, for at least two major reasons: first, comments are the
easiest way to become a part of a community without having to be among those
who provide the content in the first place. Secondly, comments help maintain a
community by making different members aware of each other. Frequent comments
give those who post an explicit confirmation of their public's presence in
addition to some substantial feedback. This is particularly important in
photo- and video-blogs because these are personal expressions, often even more
vivid than most textual blogs, which in many cases might be considered
varieties of content management.

2. Selecting

A flexible system must allow a media object to be assigned to multiple
categories, allowing hierarchy but without enforcing it. This will in most
cases be an excellent system usable for a large videoblogging environment,
perhaps even with features combining personal and global categories. Combined
with personal information, time and date of posting and possibly geographical
information, this provides metadata for searching or for automatic generation
of play-lists which may be re-edited manually. The problem with searching and
search results is that the information is shown out of context. In order to
get an idea of the quality of a specific clip it might be helpful to know how
many people have used it in a sequence. If a lot of people have used it, you
have an indication that it might be worth looking at. Sequences are in fact
the best way to view a clip, as they provide both context and an example of
how the clip might be trimmed (where to begin and where to end).

3. Editing

Both the edit and the link provide context to separated segments, but both
theory and practice addressing editing and linking tend to concentrate on the
segments. Editing techniques which have a duration, like dissolves and wipes,
are easier to identify, but they are almost never used unless the director
wants to call for the viewer's attention. In an analogous manner, theory and
practice in hypertext design emphasize that the nodes in a hypertext should be
designed to make navigation as intuitive as possible, stressing the importance
of a unified design in order not to confuse the user. The editing interface
has to display all the clips which the user has selected during a session. The
editing process results in a text document (SMIL), small enough to allow
storage of an almost infinite number of sequences, which can be made by
combining a limited number of video clips.

4. Storing

Collective editing capabilities rely on storage which consumes as little disk
space and bandwidth as possible. Downloading video clips in order to re-edit a
sequence and uploading a new version will not be effective. I would like to
propose an approach to videoblogs focusing on the simplicity known from
moblogs combined with an easy-to-use editing interface which makes it possible
to combine clips from different logs into sequences and store these as SMIL
documents (Synchronized Multimedia Integration Language). In order to make a
flexible system, all references to users, shots and sequences are stored in a
database, generating SMIL documents "on the fly". Because SMIL documents are
plain text files which can be played in QuickTime, it becomes an easy task to
generate customized movies using server-side applications. The online editor
must be capable of combining clips into sequences and storing these as SMIL
documents in the creator's blog. The SMIL document includes references to the
video clips, controlling order and duration, positioning, and additional text.
A minimal sketch of such server-side generation is shown below.
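As a rough illustration of this storage model, the sketch below builds a SMIL
playlist document from clip references that might be held in the database. The
Clip class, its field names and the timing values are our own assumptions for
the example; the real system's schema is not specified in this report.

// Hypothetical sketch: generating a SMIL playlist "on the fly" from database
// records. Clip is an assumed helper class; URLs and timings are illustrative.
import java.util.List;

public class SmilPlaylistBuilder {

    // A clip reference as it might be stored in the database.
    public static class Clip {
        final String url;        // location of the video file on some server
        final int durationSecs;  // how long to play this clip
        Clip(String url, int durationSecs) {
            this.url = url;
            this.durationSecs = durationSecs;
        }
    }

    // Concatenates the referenced clips into one SMIL sequence without
    // touching the original source files.
    public static String build(List<Clip> clips) {
        StringBuilder smil = new StringBuilder();
        smil.append("<smil>\n  <body>\n    <seq>\n");
        for (Clip clip : clips) {
            smil.append("      <video src=\"").append(clip.url)
                .append("\" dur=\"").append(clip.durationSecs).append("s\"/>\n");
        }
        smil.append("    </seq>\n  </body>\n</smil>\n");
        return smil.toString();
    }
}

Because the output is plain text, such a document can be regenerated whenever
a sequence is re-edited, without moving or copying the underlying video files.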

5. Re-editing
Re-editing in a videoblogging environment means that any user can take a
sequence or a number of individual video clips into the editor, make a re-edit
and store the result as a new sequence in his own blog. The idea of making
"collective documentaries" or fiction in this way is intriguing: those who are
unhappy with a version may comment on the original sequence or just make their
own version, possibly adding their own content.

1.2.2.2 CURRENT VIDEO BLOGS


Current video blogs are essentially text blogs with externally linked videos
for each entry. Though the fragments of video content form a cohesive diary,
they are always introduced and navigated to via text. A typical example of a
current video blog is that of Miles, who teaches the theory and practice of
hypermedia and interactive video at RMIT University, Australia, and uses his
blog to demonstrate some of the ideas. His video blog differs because he
includes timed hyperlinks to other Web resources inside his videos, and has
post-produced speech tracks with timed transcriptions of his speech inside the
QuickTime video files. It is an exceptional video blog, requiring skills and
tools not usually available to the more common blogger. Although those
pioneering the creation of video blogs are finding ample room for expression,
and they obviously enjoy pushing the limits of current technology, users cite
a few problems with current video blogs:
• there is no way to add comments in video form;
• video items cannot be easily found via search engines;
• video items cannot be aggregated easily;

• interesting clips cannot be viewed on their own; instead the video must be
played back in its entirety; and
• as with most multimedia on the Web, sufficiently high bandwidth for
reasonable-quality video is not widely available.

1.2.2.3 VIDEO BLOG SEARCH


The aim is to make video blogs as easily searchable by Web search engines
as normal Web pages. Web search engines increasingly support scanning of
RSS and Atom feeds, which allows more tightly coupled searches of actual
blog entries. In fact, the original blogging company Blogger was acquired
by Google to bring about such developments. Although a multimedia
syndication language would be just as amenable to scanning and indexing by
search engines, it would unfortunately be able to index only the metadata
referring to a video blog entry, not the actual video content itself. Ideally, the
search engine should index the video content itself, by scanning for
embedded transcriptions and timed metadata or even by performing
automated analysis of the video content directly.

1.2.2.4 VIDEO BLOG COMMENTS


That readers can add comments to blog entries is very popular, allowing
friendly advice and spontaneous discussions to take place in remote corners of
the Web. It is easy to imagine how lively these discussions would be in video
format, if any viewer could easily provide feedback for others to watch and
themselves respond to. The technology for such video forums is already being
explored in various areas of the telecommunications industry, but usually in
the context of developing vendor-specific applications for limited numbers of
users. In such environments, users are simply able to "point-and-shoot" to
reply to commentary posted by others.

1.2.3 WORDNET: A LEXICAL DATABASE FOR ENGLISH


WordNet provides a more effective combination of traditional lexicographic
information and modern computing. WordNet is an online lexical database
designed for use under program control. English nouns, verbs, adjectives,
and adverbs are organized into sets of synonyms, each representing a
lexicalized concept. Semantic relations link the synonym sets.

1.2.3.1 LANGUAGE DEFINITIONS


We define the vocabulary of a language as a set W of pairs (f,s), where a
form f is a string over a finite alphabet, and a sense s is an element from a
given set of meanings. Forms can be utterances composed of a string of
phonemes or inscriptions composed of a string of characters. Each form with
a sense in a language is called a word in that language. A dictionary is an
alphabetical list of words. A word that has more than one sense is
polysemous; two words that share at least one sense in common are said to
be synonymous. A word’s usage is the set C of linguistic contexts in which
the word can be used. The syntax of the language partitions C into syntactic
categories. Words that occur in the subset N are nouns, words that occur in
the subset V are verbs, and so on. Within each category of syntactic contexts
are further categories of semantic contexts—the set of contexts in which a
particular f can be used to express a particular s. The morphology of the
language is defined in terms of a set M of relations between word forms. For
example, the morphology of English is partitioned into inflectional,
derivational, and compound morphological relations. Finally, the lexical
semantics of the language is defined in terms of a set S of relations between
word senses. The semantic relations into which a word enters determine the
definition of that word.

In WordNet, a form is represented by a string of ASCII characters, and a sense
is represented by the set of (one or more) synonyms that have that sense.
WordNet contains more than 118,000 different word forms and more than 90,000
different word senses, or more than 166,000 (f,s) pairs. Many words in WordNet
are polysemous, and approximately 40% have one or more synonyms. WordNet
respects the syntactic categories noun, verb, adjective, and adverb, the
so-called open-class words. For example, word forms like "back", "right", or
"well" are interpreted as nouns in some linguistic contexts, as verbs in other
contexts, and as adjectives or adverbs in yet other contexts; each is entered
separately into WordNet. It is assumed that the closed-class categories of
English (some 300 prepositions, pronouns, and determiners) play an important
role in any parsing system, but they are given no semantic explication in
WordNet. WordNet includes the following semantic relations:
• Synonymy is WordNet’s basic relation, because WordNet uses sets of
synonyms (synsets) to represent word senses. Synonymy (syn same, onyma
name) is a symmetric relation between word forms.
• Antonymy (opposing-name) is also a symmetric semantic relation between
word forms, especially important in organizing the meanings of adjectives
and adverbs.
• Hyponymy (sub-name) and its inverse, hypernymy (super-name), are
transitive relations between synsets. Because there is usually only one
hypernym, this semantic relation organizes the meanings of nouns into a
hierarchical structure.

• Meronymy (part-name) and its inverse, holonymy (whole-name), are
complex semantic relations. WordNet distinguishes component parts,
substantive parts, and member parts.
• Troponymy (manner-name) is for verbs what hyponymy is for nouns,
although the resulting hierarchies are much shallower.
• Entailment relations between verbs are also coded in WordNet.
An X Windows interface to WordNet allows a user to enter a word form and to
choose, from a pull-down menu, the appropriate syntactic category. The menus
provide access to the semantic relations that have been coded into WordNet for
that word. A small data-structure sketch of synsets and their relations is
given below.
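To make the synset idea concrete, the following toy Java sketch models synsets
and hypernym links in memory. It is only an illustration; it is not the
WordNet database or its programming interface, and all class and field names
are invented.

// Toy model of WordNet-style synsets: a set of synonymous word forms standing
// for one lexicalized concept, linked to other synsets by hypernymy.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MiniWordNet {

    static class Synset {
        final Set<String> forms = new LinkedHashSet<>();
        final List<Synset> hypernyms = new ArrayList<>(); // "is a kind of" links

        Synset(String... words) {
            for (String w : words) forms.add(w);
        }

        void addHypernym(Synset parent) {
            hypernyms.add(parent);
        }
    }

    public static void main(String[] args) {
        Synset entity = new Synset("entity");
        Synset animal = new Synset("animal", "animate being", "beast");
        Synset dog = new Synset("dog", "domestic dog", "Canis familiaris");

        // Hyponymy/hypernymy organizes noun meanings into a hierarchy.
        animal.addHypernym(entity);
        dog.addHypernym(animal);

        // Two word forms are synonymous if they share a synset.
        System.out.println("dog synonyms: " + dog.forms);
        System.out.println("dog is a kind of: " + dog.hypernyms.get(0).forms);
    }
}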

1.2.3.2 CONTEXTUAL REPRESENTATIONS


In information retrieval, a query intended to elicit material relevant to one
sense of a polysemous word may elicit unwanted material relevant to other
senses of that word. For example, in computer-assisted instruction, a student
asking the meaning of a word should be given its meaning in that context,
not a list of alternative senses from which to pick. WordNet lists the
alternatives from which choices must be made. WordNet would be much
more useful if it incorporated the means for determining appropriate senses,
allowing the program to evaluate the contexts in which words are used. The
limits of a linguistic context can be defined arbitrarily, but we prefer to
define it in terms of sentences. That is to say, two words co-occur in the
same context if they occur in the same sentence. A semantic concordance is
a textual corpus and a lexicon combined so that every substantive word in
the text is linked to its appropriate sense in the lexicon.

1.2.4 WORD ASSOCIATION NORMS, MUTUAL
INFORMATION, AND LEXICOGRAPHY
1.2.4.1 MEANING AND ASSOCIATION
It is common practice in linguistics to classify words not only on the basis
of their meanings but also on the basis of their co-occurrence with other
words. Running through the whole Firthian tradition, for example, is the theme
that "You shall know a word by the company it keeps". On the one hand, bank
co-occurs with words and expressions such as money, notes, loan, account,
investment, clerk, official, manager, robbery, vaults, working in a, its
actions, First National, of England, and so forth. On the other hand, we find
bank co-occurring with river, swim, boat, east (and of course west and south,
which have acquired special meanings of their own), on top of the, and of the
Rhine.
The search for increasingly delicate word classes is not new. In
lexicography, for example, it goes back at least to the "verb patterns"
described in Hornby's Advanced Learner's Dictionary (first edition 1948).
What is new is that facilities for the computational storage and analysis of
large bodies of natural language have developed significantly in recent
years, so that it is now becoming possible to test and apply informal
assertions of this kind in a more rigorous way, and to see what company our
words do keep.

1.2.4.2 WORD ASSOCIATION AND PSYCHOLINGUISTICS


Word association norms are well known to be an important factor in
psycholinguistic research, especially in the area of lexical retrieval.
Generally speaking, subjects respond more quickly than normal to the word
nurse if it follows a highly associated word such as doctor. Some results and
implications are summarized from reaction-time experiments in which subjects
either (a) classified successive strings of letters as words and nonwords, or
(b) pronounced the strings.

1.2.4.3 PREPROCESSING WITH A PARSER


Hindle has found it helpful to preprocess the input with the Fidditch parser
to identify associations between verbs and arguments, and to postulate
semantic classes for nouns on this basis. Hindle's method is able to find some
very interesting associations. After running his parser over the 1988 AP
corpus (44 million words), Hindle found N = 4,112,943 subject/verb/object
(SVO) triples. The mutual information between a verb and its object was
computed from these 4 million triples by counting how often the verb and its
object were found in the same triple and dividing by chance. Thus, for
example, disconnect/V and telephone/O have a joint probability of 7/N. In this
case, chance is 84/N × 481/N, because there are 84 SVO triples with the verb
disconnect and 481 SVO triples with the object telephone. The mutual
information is log2(7N/(84 × 481)) ≈ 9.48. Similarly, the mutual information
for drink/V beer/O is log2(29N/(660 × 195)) ≈ 9.9 (drink/V and beer/O are
found in 660 and 195 SVO triples, respectively; they are found together in 29
of these triples). This application of Hindle's parser illustrates a second
example of preprocessing the input to highlight certain constraints of
interest. For measuring syntactic constraints, it may be useful to include
some part-of-speech information and to exclude much of the internal structure
of noun phrases. For other purposes, it may be helpful to tag items and/or
phrases with semantic labels such as *person*, *place*, *time*, *body part*,
*bad*, and so on. A small sketch of this computation appears below.
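To make the arithmetic above concrete, the following Java sketch recomputes
the two mutual information values directly from the triple counts quoted in
the text. The class and method names are ours; only the counts come from the
corpus figures above.

// Pointwise mutual information between a verb and its object:
// I(v; o) = log2( P(v, o) / (P(v) * P(o)) ), using the SVO triple counts above.
public class VerbObjectMutualInformation {

    static final double N = 4_112_943;  // total SVO triples in the 1988 AP corpus

    // jointCount: triples containing both the verb and the object;
    // verbCount / objectCount: triples containing each of them separately.
    static double mutualInformation(double jointCount, double verbCount, double objectCount) {
        double joint = jointCount / N;
        double chance = (verbCount / N) * (objectCount / N);
        return Math.log(joint / chance) / Math.log(2); // log base 2
    }

    public static void main(String[] args) {
        // disconnect/V telephone/O: 7 joint triples, 84 with disconnect, 481 with telephone
        System.out.printf("disconnect-telephone: %.2f%n",
                mutualInformation(7, 84, 481));    // ~9.48
        // drink/V beer/O: 29 joint triples, 660 with drink, 195 with beer
        System.out.printf("drink-beer:           %.2f%n",
                mutualInformation(29, 660, 195));  // ~9.86, rounded to 9.9 in the text
    }
}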

1.2.5 AUTOMATIC SEMANTIC ANNOTATION FOR VIDEO
BLOGS
Vlog annotation is essentially a multi-labeling process, as a vlog can usually
be annotated with multiple words. There exist many effective approaches for
multi-label image/video annotation, and it has become a trend that the
annotation should be extracted not only from the target image/video itself,
but also from other images/videos which are relevant to it.

1.2.5.1 AUTOMATIC VLOG ANNOTATION


In our vlog annotation model, the annotation of a vlog consists of two parts:
the intrinsic annotation extracted from the text of the target vlog and the
expanded annotation from relevant external resources.

1.2.5.2 INTRINSIC ANNOTATION EXTRACTION


Since a vlog often has supporting text in itself, we can extract informative
keywords as its intrinsic annotation. The textual content in a vlog mainly
comprises the title, description, and comments, among which the title and
description are closely related to the semantics of the vlog video, while the
comments are often filled with irrelevant words and are thus too noisy to be
used. As a result, only the title and description are kept for annotation
extraction. As the title indicates the main topic of the whole vlog, it is of
the greatest importance for understanding the semantics of the vlog. Therefore
we first extract annotation words from the title. After stop word removal, the
important words are kept in the word set Wtitle. For the textual description,
we also remove the stop words beforehand. Then, using standard text processing
techniques such as tf-idf, we can acquire the important words and create
another word set, Wdescription. Since not all the words in the description are
relevant to the semantics of the central video, Wdescription can be rather
noisy. Considering the fact that in an article keywords are usually used to
reveal the main subject, or the title, we assume that if an annotation word is
a good one, it should be highly correlated with at least one word in Wtitle.
Therefore, we delete from Wdescription the words which have low correlation
with all the words in Wtitle. The sketch below outlines this filtering step.
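The filtering step just described can be sketched as follows. The stop-word
list, the tf-idf ranking (omitted here) and the wordCorrelation measure are
placeholders we introduce for illustration; the report does not fix a
particular implementation for them.

// Hedged sketch of intrinsic annotation extraction: keep title words after
// stop-word removal, then keep only those description words that correlate
// with at least one title word.
import java.util.LinkedHashSet;
import java.util.Set;

public class IntrinsicAnnotationExtractor {

    private final Set<String> stopWords;        // assumed stop-word list
    private final double correlationThreshold;  // tuning parameter (assumption)

    public IntrinsicAnnotationExtractor(Set<String> stopWords, double correlationThreshold) {
        this.stopWords = stopWords;
        this.correlationThreshold = correlationThreshold;
    }

    public Set<String> extract(String title, String description) {
        Set<String> wTitle = removeStopWords(title);
        Set<String> wDescription = removeStopWords(description); // tf-idf ranking omitted

        Set<String> intrinsic = new LinkedHashSet<>(wTitle);
        for (String candidate : wDescription) {
            // Keep a description word only if it correlates with some title word.
            for (String titleWord : wTitle) {
                if (wordCorrelation(candidate, titleWord) >= correlationThreshold) {
                    intrinsic.add(candidate);
                    break;
                }
            }
        }
        return intrinsic;  // Wintrinsic = Wtitle plus the filtered Wdescription words
    }

    private Set<String> removeStopWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) words.add(token);
        }
        return words;
    }

    // Placeholder: in practice this could be a co-occurrence based measure,
    // e.g. the mutual information of Section 1.2.4.
    private double wordCorrelation(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;  // trivial stand-in
    }
}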

1.2.5.3 CONTEXT-BASED ANNOTATION EXPANSION


External annotation candidate extraction

Inspired by search-based annotation methods, we conduct annotation expansion
for the target vlog in a search-based mode, where a labeled database is
indispensable. As we know, YouTube is one of the most popular video sharing
websites and has by far the biggest collection of videos. Each video on
YouTube is labeled by one or more tags. Therefore, we use YouTube as our
labeled database. Given a keyword query, the text-based video search engine
(powered by Google) in YouTube can return rather good results, hence we can
use YouTube search to find semantically related videos. For the target vlog,
we submit each word w in Wintrinsic as a query to the YouTube search engine
and get the corresponding search results Rw (for simplicity, only the
top-ranked 20 results are included). For each result r in Rw, we extract the
video's representative frame fr (which is usually the first frame of the
video) and the corresponding tags. Then, among the semantically related
videos, visually related ones are selected through content-based similarity
between the vlog video and the result videos found on YouTube. We define the
visual similarity between a result video r and the vlog video v as the maximum
image similarity between the representative frame fr of r and the keyframe fv
of v.
After the above two search stages, we have obtained a batch of videos which
are relevant to the vlog both semantically and visually with regard to the
intrinsic annotation word w. We then gather the tags of all the reserved
videos into a tag set T(w), which is adopted as the external annotation
candidates for the vlog. This process is applied for each intrinsic annotation
word w in Wintrinsic. Finally, we obtain the word set Wexternal of external
annotation candidates, as written out in the formulas below.
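The two definitions that close the paragraphs above are left to figures in the
original report; written out in the notation used in this section, they would
take roughly the following form (our reconstruction):

Simvisual(r, v) = max Simimage(fr, fv),

the maximum image similarity between the representative frame fr of the result
video r and the keyframe(s) fv of the vlog video v, and

Wexternal = union of T(w) over all w in Wintrinsic,

the union of the tag sets T(w) gathered for each intrinsic annotation word.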
Context-based annotation refinement

Although the videos used for annotation expansion are all semantically and
visually relevant to the target vlog, it does not follow that all the tags of
those videos are also relevant to the vlog. In the process of annotation
expansion, we have to deal with the serious problem of semantic drift.
Therefore, we should refine the expanded annotation candidates and delete the
irrelevant words. We calculate the relevance between an annotation candidate c
and the vlog by comparing c with the words in Wintrinsic. As we know, when
comparing two words, we should consider not only their semantics but also the
specific contexts they are in. In this paper, we propose a novel context
histogram to depict the semantics of a word in a specific context. For a word
w, its context is substantially a set of words which confines its specific
semantics. We first calculate the one-to-one correlation between w and each of
the words in its context Wcontext. Then, we organize all the correlation
values into a histogram and get the context histogram for w with respect to
Wcontext. The problem of context comparison is now reduced to histogram
comparison. Here we simply use histogram intersection as a metric of context
histogram similarity. We perform the context-based external annotation
refinement as follows: for an intrinsic annotation word w of Wintrinsic, we
create its context histogram with respect to Wcontext = Wintrinsic − {w}; for
an annotation candidate c in Wexternal, we also build its context histogram
with respect to the same Wcontext = Wintrinsic − {w}. In order to compare c
with w, we calculate both their one-to-one word correlation Simword and their
contextual similarity Simcontext. The total correlation between c and w is
defined as

    Simtotal(c, w) = α · Simword(c, w) + β · Simcontext(c, w),

where α and β are adjustable parameters. Only those annotation candidates with
high relevance to Wintrinsic are kept in Wexternal. After the refinement, we
merge Wintrinsic and Wexternal to get the final annotation for the target
vlog. A code sketch of this refinement step follows.
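A minimal Java sketch of the refinement step is given below. The
wordCorrelation measure, the values of α and β, and the keep threshold are
placeholders chosen for illustration only.

// Hedged sketch: context-histogram based refinement of expanded annotation
// candidates, using histogram intersection for the contextual similarity.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ContextBasedRefinement {

    static final double ALPHA = 0.5, BETA = 0.5;  // adjustable parameters (illustrative)
    static final double KEEP_THRESHOLD = 0.6;     // relevance cut-off (illustrative)

    // Context histogram of a word: its correlation with each word of the context.
    static double[] contextHistogram(String word, List<String> context) {
        double[] h = new double[context.size()];
        for (int i = 0; i < context.size(); i++) {
            h[i] = wordCorrelation(word, context.get(i));
        }
        return h;
    }

    // Histogram intersection as the contextual similarity metric.
    static double histogramIntersection(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.min(a[i], b[i]);
        return sum;
    }

    // Keep a candidate c if Simtotal(c, w) is high for some intrinsic word w.
    static Set<String> refine(Set<String> intrinsic, Set<String> candidates) {
        Set<String> kept = new LinkedHashSet<>();
        for (String c : candidates) {
            for (String w : intrinsic) {
                List<String> context = new ArrayList<>(intrinsic);
                context.remove(w);                       // Wcontext = Wintrinsic - {w}
                double simWord = wordCorrelation(c, w);
                double simContext = histogramIntersection(
                        contextHistogram(c, context), contextHistogram(w, context));
                double simTotal = ALPHA * simWord + BETA * simContext;
                if (simTotal >= KEEP_THRESHOLD) { kept.add(c); break; }
            }
        }
        return kept;
    }

    // Placeholder correlation measure, e.g. co-occurrence based in the real system.
    static double wordCorrelation(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }
}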

1.2.6 ANNOSEARCH: IMAGE AUTO-ANNOTATION BY SEARCH


A novel solution to the image auto-annotation problem is proposed: rather than
training a concept model using supervised learning techniques, as most
previous works do, we propose a data-driven approach leveraging a Web-scale
image dataset and search technology to learn relevant annotations. In an ideal
case, if a well-annotated and unlimited-scale image database were available,
then for any query image we could find its duplicates in this database and
simply propagate their annotations to the query image. In the more realistic
case that the image database is of limited scale, we can still find a group of
very similar images in terms of either global features or local features,
extract salient phrases from their descriptions, and select the most salient
ones to annotate the query image. Thus, to by-pass the semantic gap, we can
divide and conquer the annotation problem in two steps:
1) find one accurate keyword for a query image;
2) given one keyword, find complementary annotations to describe the details
of this image.
The requirement in the first step is not as restrictive as it may first seem.
For example, in a desktop photo search, users usually provide a location or
event name in the folder name. Or, in a Web image search, we can choose one of
a Web image's surrounding keywords as the query keyword.

1.2.6.1 THE ANNOSEARCH SYSTEM


It contains three stages: the text-based search stage, the content-based search
stage and the annotation learning stage.

1.2.6.2 TEXT-BASED SEARCH


Jeon et al. recommend using high-quality training data to learn prediction
models, as it greatly affects annotation performance. Hence, in our approach
we collected about 2.4 million high-quality Web images associated with
meaningful descriptions from online photo forums. These descriptions capture
the corresponding images' contents to a certain degree.

1.2.6.3 CONTENT-BASED SEARCH


Because visual features are generally high-dimensional, similarity-oriented
search based on visual features is always an efficiency bottleneck for
large-scale image database retrieval. To overcome this problem, we adopt a
hash encoding algorithm to speed up this procedure; a rough sketch of the idea
is given below.
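The report does not specify which hash encoding algorithm is used; as an
illustration of the general idea only, the sketch below uses random-projection
(sign) hashing, one common way to map a high-dimensional feature vector to a
compact binary code so that candidate images can be compared by cheap Hamming
distance. The AnnoSearch system itself may use a different scheme.

// Illustration only: random-projection sign hashing of visual feature vectors.
import java.util.Random;

public class FeatureHashEncoder {

    private final double[][] hyperplanes;  // one random hyperplane per output bit

    // codeBits must be at most 64 so the code fits in a long.
    public FeatureHashEncoder(int featureDim, int codeBits, long seed) {
        Random rnd = new Random(seed);
        hyperplanes = new double[codeBits][featureDim];
        for (int b = 0; b < codeBits; b++)
            for (int d = 0; d < featureDim; d++)
                hyperplanes[b][d] = rnd.nextGaussian();
    }

    // Each bit records which side of a random hyperplane the feature falls on.
    public long encode(double[] feature) {
        long code = 0L;
        for (int b = 0; b < hyperplanes.length; b++) {
            double dot = 0;
            for (int d = 0; d < feature.length; d++) dot += hyperplanes[b][d] * feature[d];
            if (dot >= 0) code |= (1L << b);
        }
        return code;
    }

    // Visually similar images tend to have codes with small Hamming distance.
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}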

CHAPTER 2
SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

Traditional annotation models focus exclusively on the semantic aspect, while
the sentiment aspect is totally neglected. Most existing vlog search methods
employ traditional text-based retrieval techniques, which mainly rely on the
textual content of the vlogs.

2.2 PROPOSED SYSTEM

In the proposed vlog management model, when a user uploads a vlog to the
database, semantic annotation runs automatically using the vlog text and
relevant external resources, and a sentiment evaluation is obtained from the
vlog comments. After that, the vlog is stored in the database with the
corresponding annotation and evaluation. When a user submits a query to the
search engine, the vlog search module accesses the vlog database to obtain
relevant vlogs by saliency-based matching; then, using a user-specified
ranking strategy and clustering, the results are returned to the user in a
well-organized manner.

CHAPTER 3
REQUIREMENT SPECIFICATIONS

3.1 HARDWARE REQUIREMENTS

• Hard Disk : 10 GB and above
• RAM : 256 MB and above
• Processor : Pentium IV

3.2 SOFTWARE REQUIREMENTS

Windows Operating System
JDK 1.6
Web Browser – Internet Explorer
GlassFish Application Server
Apache Tomcat Server
Oracle 10g
JMF

3.3 SOFTWARE DESCRIPTION


3.3.1 JAVA

Java is a platform-independent, object-oriented programming language developed
initially by James Gosling and colleagues at Sun Microsystems. The language,
initially called Oak (named after the oak trees outside Gosling's office), was
intended to replace C++, although its feature set better resembles that of
Objective-C.

3.3.1.1 INTRODUCTION TO JAVA

Java has been around since 1991, developed by a small team of Sun
Microsystems developers in a project originally called the Green project.
The intent of the project was to develop a platform-independent software
technology that would be used in the consumer electronics industry. The
language that the team created was originally called Oak.

The first implementation of Oak was in a PDA-type device called Star Seven
(*7) that consisted of the Oak language, an operating system called
GreenOS, a user interface, and hardware. The name *7 was derived from the
telephone sequence that was used in the team's office and that was dialed in
order to answer any ringing telephone from any other phone in the office.

Around the time the First Person project was floundering in consumer
electronics, a new craze was gaining momentum in America; the craze was called
"Web surfing." The World Wide Web, a name applied to the Internet's millions
of linked HTML documents, was suddenly becoming popular for use by the masses.
The reason for this was the introduction of a graphical Web browser called
Mosaic, developed by NCSA. The browser simplified Web browsing by combining
text and graphics into a single interface, eliminating the need for users to
learn many confusing UNIX and DOS commands. Navigating around the Web was much
easier using Mosaic.

It has only been since 1994 that Oak technology has been applied to the Web.
In 1994, two Sun developers created the first version of HotJava, then called
WebRunner, which is a graphical browser for the Web that exists today. The
browser was coded entirely in the Oak language, by this time called Java. Soon
after, the Java compiler was rewritten in the Java language from its original
C code, thus proving that Java could be used effectively as an application
language. Sun introduced Java in May 1995 at the SunWorld 95 convention.

Web surfing has become an enormously popular practice among millions of
computer users. Until Java, however, the content of information on the
Internet had been a bland series of HTML documents. Web users are hungry for
applications that are interactive, that users can execute no matter what
hardware or software platform they are using, and that travel across
heterogeneous networks without spreading viruses to their computers. Java can
create such applications.

3.3.1.2 WORKING OF JAVA

For those who are new to object-oriented programming, the concept of a class
will be new. Simplistically, a class is the definition of a segment of code
that can contain both data and functions.

When the interpreter executes a class, it looks for a particular method by the
name of main, which will sound familiar to C programmers. The main method is
passed an array of strings as a parameter (similar to the argv[] of C), and is
declared as a static method.

To output text from the program, we execute the println method of System.out,
which is Java's output stream. UNIX users will appreciate the theory behind
such a stream, as it is actually standard output. For those who are instead
used to the Wintel platform, it will write the string passed to it to the
user's program. A minimal example follows.
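The points above (a class, a static main method that receives an array of
strings, and printing through System.out.println) can all be seen in the
classic minimal program:

// Smallest possible illustration of the structure described above.
public class HelloWorld {

    // The interpreter looks for this static main method and passes it the
    // command-line arguments as an array of strings.
    public static void main(String[] args) {
        // println writes to System.out, Java's standard output stream.
        System.out.println("Hello, world!");
    }
}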

3.3.2. JAVA MEDIA FRAMEWORK
The Java Media Framework (JMF) is an application programming interface (API)
for incorporating time-based media into Java applications and applets. This
guide is intended for Java programmers who want to incorporate time-based
media into their applications and for technology providers who are interested
in extending JMF and providing JMF plug-ins to support additional media types
and perform custom processing and rendering.

The JMF 1.0 API enables programmers to develop Java programs that present
time-based media. The JMF 2.0 API extends the framework to provide support for
capturing and storing media data and for controlling the type of processing
that is performed during presentation. In addition, JMF 2.0 defines a plug-in
API that enables advanced developers and technology providers to more easily
customize and extend JMF functionality.

JMF provides a unified architecture and messaging protocol for managing the
acquisition, processing and delivery of time-based media data. JMF is designed
to support most standard media content types, such as AIFF, AU, AVI, GSM,
MIDI, MPEG, QuickTime, RMF, and WAV.

By exploiting the advantages of the Java platform, JMF delivers the promise of
"Write Once, Run Anywhere" to developers who want to use media such as audio
and video in their Java programs. JMF provides a common cross-platform Java
API for accessing underlying media frameworks. JMF implementations can
leverage the capabilities of the underlying operating system, while developers
can easily create portable Java programs that feature time-based media by
writing to the JMF API.

With JMF, applets and applications that present, capture, manipulate and store
time-based media can be created. The framework enables advanced developers and
technology providers to perform custom processing of raw media data and to
seamlessly extend JMF to support additional content types and formats,
optimize handling of supported formats, and create new presentation
mechanisms.

HIGH-LEVEL ARCHITECTURE

Devices such as tape decks and VCRs provide a familiar model for recording,
processing and presenting time-based media. When playing a movie using a VCR,
you provide the media stream to the VCR by inserting a video tape. The VCR
reads and interprets the data on the tape and sends appropriate signals to the
television and speakers.

JMF uses this same basic model. A data source encapsulates the media stream,
much like a video tape, and a player provides processing and control
mechanisms similar to a VCR. Playing and capturing audio and video with JMF
requires the appropriate input and output devices, such as microphones,
cameras, speakers and monitors.

FIG: 3.1 JMF BASIC MODEL

Data sources and players are integral parts of JMF's high-level API for
managing the capture, presentation and processing of time-based media. JMF
also provides a lower-level API that supports the seamless integration of
custom processing components and extensions. This layering provides Java
developers with an easy-to-use API for incorporating time-based media into
Java programs, while maintaining the flexibility and extensibility required to
support advanced media applications and future media technologies.

To present time-based media such as audio or video with JMF, a player can be
used. Playback can be controlled programmatically, or the player can display a
control-panel component that enables the user to control playback
interactively. If several media streams are to be played, a separate player is
used for each one; to play them in sync, one Player object can be used to
control the operation of the others.

PLAYER

A player processes an input stream of media data and renders it at a precise
time. A data source is used to deliver the input media stream to the player.
The rendering destination depends on the type of media being presented.

FIG: 3.2 JMF PLAYER

A player does not provide any control over the processing that it performs or
over how it renders the media data. A minimal playback sketch is shown below.
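As a rough illustration of the player model, the sketch below creates and
starts a JMF Player for a local media file; the file name is an example only,
and error handling is reduced to the bare minimum. In a GUI application one
would also attach the player's visual component (getVisualComponent()) and
control panel (getControlPanelComponent()) to a window.

// Minimal JMF playback sketch: create a Player for a media file and start it.
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class SimplePlayback {
    public static void main(String[] args) throws Exception {
        // A MediaLocator identifies the media source (file, HTTP or RTP URL).
        MediaLocator locator = new MediaLocator("file:sample.mpg");

        // Manager picks a suitable Player implementation for the content type.
        Player player = Manager.createPlayer(locator);

        // start() realizes and prefetches the player if needed, then begins playback.
        player.start();
    }
}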

PROCESSORS

Processors can also be used to present media data. A processor is just a
specialized type of player that provides control over what processing is
performed on the input media stream. A processor supports all of the same
presentation controls as a player.

In addition to rendering media data to presentation devices, a processor can
output media data through a data source, so that it can be presented by
another player or processor, further manipulated by another processor, or
delivered to some other destination such as a file.

FIG: 3.3 JMF PROCESSORS

EXTENSIBILITY

JMF can be extended by implementing custom plug-ins, media handlers and data
sources. By implementing the JMF plug-in interfaces, the media data associated
with a processor can be accessed and manipulated directly:

• Implementing the Demultiplexer interface enables you to control how
individual tracks are extracted from a multiplexed media stream.

• Implementing the Codec interface enables you to perform the processing
required to decode compressed media data, convert media data from one format
to another, and encode raw media data into a compressed format.

3.3.3 APACHE TOMCAT SERVER

Apache Tomcat version 6.0 implements the Servlet 2.5 and Java Server
Pages 2.1 specifications from the Java Community Process, and includes
many additional features that make it a useful platform for developing and
deploying web applications and web services.

3.3.3.1 TOMCAT ARCHITECTURE

Tomcat is a container that is made up of pluggable components that fit
together in a nested manner. Tomcat is configurable: you can set it to use
specialized filters, change port numbers and IP address bindings, adjust
security settings, and so on. You should always change the default settings
when using it in a production environment, especially the security-related
ones.

3.3.3.2 TOMCAT DIRECTORY OVERVIEW

The main directories under the Tomcat installation, the files they contain,
and their purpose are as follows.

bin (bootstrap.jar, commons-daemon.jar, tomcat-juli.jar, startup.bat,
catalina.sh): this directory holds some of the JAR files that are required
when starting Tomcat, and it also holds the startup files themselves. The
startup.bat script is used to start Tomcat (commons-daemon.jar supports
running it as a daemon process), and catalina.sh can be used on a command line
to add additional parameters that change Tomcat's behaviour when starting.

conf: contains the main configuration files.
• catalina.policy: contains security policy statements that are implemented by
the Java SecurityManager. It replaces the java.policy file that comes with the
JVM and prevents rogue code or JSPs from executing damaging code that can
affect the container. It is only read once, when Tomcat is launched, so you
need to restart Tomcat if you change this file.
• catalina.properties: contains a list of Java packages that cannot be
overridden by executable Java code in servlets or JSPs, which could otherwise
be a security risk.
• context.xml: this file is used by all Web applications; it specifies where
the web.xml should be accessed.
• logging.properties: this file controls the logging within Tomcat. Two
default handlers are set up, a ConsoleHandler and a FileHandler, and you can
change the logging level using this file.
• server.xml: this is the main configuration file in Tomcat; it is used by the
"digester" to build the container on startup.
• tomcat-users.xml: used for security to allow access to the Administration
applications section; it is used with the default UserDatabase Realm as
referenced in server.xml.
• web.xml: the default web.xml file that is used by all Web applications. It
sets up the JspServlet to allow your applications to handle JSPs and a default
servlet to handle static resources and HTML files. It also sets up default
session timeouts, welcome files and MIME types.

lib (a number of JAR files): all the JAR files that the container uses are
located here, including Tomcat's own JARs and the servlet and JSP application
programming interfaces (APIs). Place your own JAR files here if they will be
used across all your Web applications.

logs (a number of log files): contains the log files produced by JULI logging,
which is discussed in a later topic. The logs are rotated each day, so you may
need to clear them down from time to time.

temp: used for scratch files and temporary use.

webapps (Web application files): this is where the Web application files
reside, including your own Web applications. This is where you place your Web
Application aRchive (WAR) files; Tomcat will then deploy them. Several default
Web applications come with Tomcat:
• ROOT - the welcome screen that you saw when you first installed Tomcat. This
is a special directory called "/", and it gets removed when you move into
production. From this application you can access all the Web applications
below.
• docs - contains the Tomcat documentation.
• examples - contains some JSP and servlet examples.
• host-manager - allows you to manage the hosts that run in your installation;
use the /host-manager/html URL to access it.
• manager - allows you to manage your applications in Tomcat; you can start,
stop, reload, deploy and undeploy your applications. Use the /manager/html URL
to access it.

work: used for temporary working files; it is used heavily during JSP
compilation, where JSPs are converted to Java servlets and accessed through
this directory.

3.3.3.3 TOMCAT ARCHITECTURE OVERVIEW

Tomcat 6 consists of a nested hierarchy of components; containers are
components that can contain a collection of other components.

FIG 3.4 ARCHITECTURE OF TOMCAT SERVER

Server: the server is Tomcat itself, an instance of the Web application
server; it owns a port that is used to shut down the server (port 8005). You
can set up multiple servers on one node provided they use different ports. The
server is an implementation of the Server interface, realised by the
StandardServer object.

Service: a service groups a container (usually an engine) with a set of
connectors. The service is responsible for accepting requests, routing them to
the specified Web application and specific resources, and then returning the
result of processing the request; services are the middle man between the
client's web browser and the container.

Connectors: connectors connect the applications to clients. They receive the
incoming requests, HTTP (port 8080) or AJP (port 8009) by default, from the
clients. The default connector is Coyote, which implements HTTP/1.1.

Engine: the engine is the top-level container; it cannot be contained by
another container, so it is the parent container for all the containers
beneath it. The engine is a request-processing component that represents the
Catalina servlet engine. It examines the HTTP headers to determine the virtual
host or context to which requests should be passed. An engine may contain
Hosts, representing groups of Web applications, and Contexts, each
representing a single Web application, i.e. a virtual host.

Realm: the realm for an engine manages user authentication and authorization.
Resources use roles to allow access, and the realm enforces the security
policies. A realm applies across the whole engine; however, this can be
overridden by using a realm at the Host level or the Context level, so it is
an object that can be superseded by its children objects.

Valves: valves are used to intercept a request and preprocess it. They are
similar to the filter mechanism of the servlet specification, but are specific
to Tomcat rather than part of that specification.
3.3.3.4 CONNECTOR ARCHITECTURE

All connectors work on the same principle: they have an Apache module end
(mod_jk or mod_proxy) that loads just like any other Apache module. On the
Tomcat end, each Web application instance has a connector module component
written in Java; in Tomcat 6 this is the org.apache.catalina.Connector class.
The constructor takes one of two connector types, HTTP/1.1 or AJP/1.3. You
call the constructor indirectly via the server.xml file using the connector
and protocol tags. Depending on what setup you have, different classes will be
used.

When the Apache Portable Runtime (APR) is supported:
• HTTP/1.1: org.apache.coyote.http11.Http11AprProtocol
• AJP/1.3: org.apache.coyote.ajp.AjpAprProtocol

When APR is not supported:
• HTTP/1.1: org.apache.coyote.http11.Http11Protocol
• AJP/1.3: org.apache.jk.server.JkCoyoteHandler

The Web server handles all the static content, but when it comes across
content intended for a servlet container, it passes it to the module in
question (mod_jk or mod_proxy); the Web server knows what content to pass to
the connector module because the directives in the Web server's configuration
specify this.

FIG 3.5 INTERACTION BETWEEN TOMCAT SERVER AND WEB
SERVER

The Apache JServ Protocol (AJP) uses a binary format for transmitting data
between the Web server and Tomcat, and a network socket is used for all
communication. An AJP packet consists of a packet header and a payload; the
structure of the packet is described below.

A binary packet sent towards Tomcat starts with the sequence 0x1234; this is
followed by the packet size (2 bytes) and then the actual payload. On the
return path the packets are prefixed by AB (the ASCII codes for A and B), the
size of the packet and then the payload. A small sketch of checking such a
header is given below.
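Purely as an illustration of the header layout described above (not a full AJP
implementation), the following sketch reads and validates the four header
bytes of a container-bound AJP packet:

// Toy sketch: read the 4-byte header of an AJP packet sent from the Web
// server to Tomcat (magic bytes 0x12 0x34, then a 2-byte payload size).
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AjpHeaderReader {

    // Returns the payload length announced by the packet header.
    public static int readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int b1 = data.readUnsignedByte();
        int b2 = data.readUnsignedByte();
        if (b1 != 0x12 || b2 != 0x34) {
            throw new IOException("Not an AJP packet: bad magic bytes");
        }
        // The next two bytes give the size of the payload that follows.
        return data.readUnsignedShort();
    }
}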

The HTTP connector is exactly what the name implies: it uses the HTTP protocol
to exchange messages. You can use HTTPS, but you require an SSL certificate
and must make a few changes to Tomcat's configuration.

3.3.3.5 LIFECYCLE

Tomcat starts and stops its components in nesting order: when starting, the
parent gets started first and then the children get started; stopping happens
in the reverse order. This is done through the Lifecycle interface together
with LifecycleEvent and LifecycleListener.

The Lifecycle interface has two key methods, start() and stop(). All major
components usually contain a LifecycleSupport object that manages all of the
LifecycleListener objects for that component; it is this object that
propagates and fires general events. A top-level component calls all of its
children's start() methods, and the reverse is true when stopping. This
mechanism allows you to stop and start Host components without affecting any
other Hosts.

A LifecycleListener can be added at any level in the Tomcat container to
execute specific code when a particular event is fired. By default there are
three listeners configured at the server level; they are configured in the
server.xml or context.xml file at the specific level. A small listener sketch
follows.
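As an illustration (not taken from the project code), a custom listener
implementing Tomcat's LifecycleListener interface could look like the sketch
below; it would then be registered with a <Listener> element in server.xml or
context.xml at the desired level.

// Hedged sketch of a Tomcat lifecycle listener that simply logs each event
// (start, stop, etc.) fired by the component it is attached to.
import org.apache.catalina.LifecycleEvent;
import org.apache.catalina.LifecycleListener;

public class LoggingLifecycleListener implements LifecycleListener {

    @Override
    public void lifecycleEvent(LifecycleEvent event) {
        // event.getType() is a string such as "start" or "stop";
        // event.getLifecycle() is the component that fired the event.
        System.out.println("Lifecycle event '" + event.getType()
                + "' fired by " + event.getLifecycle());
    }
}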

Configuration

The most important file in Tomcat is server.xml. When Tomcat starts, it uses a
version of the Apache Commons Digester to read the file; the digester is a
utility that reads an XML file and creates Java objects from a set of rules.
With what you have learned above, you can see that the structure of this file
follows the Tomcat architecture exactly.

3.3.3.6 WORKING WITH TOMCAT SERVER

Apache Tomcat is a well-known servlet container developed at the Apache
Software Foundation. This software is released under the Apache Software
License; anyone can use it for the development as well as the deployment of
applications. Tomcat is the official reference implementation of Java Servlets
and JavaServer Pages. Tomcat is very easy to install and configure; anyone can
learn it very quickly and start using the Tomcat server for the development
and deployment of web applications.

These days many web hosting companies provide Tomcat support on their servers.
So, if you develop an application in Java technology, you can get such a host
and then deploy the application on the internet. Earlier it was a difficult
task to get a good host for hosting.

3.3.3.7 DEPLOYING SERVLETS ON TOMCAT SERVER

To deploy servlets on the Tomcat server, the following steps are taken for
the example given below.

1. Create web application

To develop an application using servlets or JSP, the directory structure
described below must be maintained.

Step 1: Create a web application folder (servlets-examples) under the
Tomcat webapps directory. The path will be
C:\apache tomcat\webapps\servlets-examples.

Step 2: Create a WEB-INF folder under servlets-examples.

Step 3: Create the web.xml file and a classes folder under the WEB-INF
folder.

2. Compile the servlet program - Create a servlet program and compile it at
the command prompt. The procedure is no different from compiling any
Java program; the classes required for writing servlets are available in
servlet-api.jar, which must be on the CLASSPATH. (A minimal Hello
servlet is sketched after these steps.)

3. Copy the servlet class (Hello) into the classes folder, which is under the
WEB-INF folder.

4. Edit web.xml to include the servlet's name and URL pattern:

<servlet>
    <servlet-name>Hello</servlet-name>
    <servlet-class>Hello</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>Hello</servlet-name>
    <url-pattern>/Hello</url-pattern>
</servlet-mapping>

5. Run the Tomcat server and execute your servlet - To start the server,
double-click C:\apache tomcat\bin\startup.bat. After making sure the server
is running, execute your servlet by opening a web browser and typing the
URL you mapped in web.xml. The URL will look like this:

http://localhost:8080/servlets-examples/Hello
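For completeness, here is a minimal sketch of the Hello servlet referred to in step 2; the response body is illustrative. It can be compiled with servlet-api.jar (typically found in Tomcat's lib directory) on the classpath, e.g. javac -classpath servlet-api.jar Hello.java.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet matching the web.xml mapping shown above.
public class Hello extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body><h1>Hello from Tomcat</h1></body></html>");
        out.close();
    }
}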

3.3.4. GLASSFISH

3.3.4.1 ABOUT GLASSFISH

The GlassFish open-source application server is based on the Java Platform,


Enterprise Edition (Java EE) reference implementation and is built for
mission-critical enterprise deployments.

Sun GlassFish Enterprise Server enables customers to leverage the benefits
of open source with a subscription that provides support, training credits,
limited indemnification and more.

3.3.4.2 BENEFITS OF GLASSFISH

The Sun GlassFish Enterprise Server provides the foundation to develop


and deploy Java EE artifacts, including Web services. It provides value-
added services for management, monitoring, diagnostics, clustering,
transaction management, and high availability of mission-critical
services.

CHAPTER 4
SYSTEM DESIGN

4.1 ARCHITECTURE

FIG 4.1 ARCHITECTURE DIAGRAM FOR THE SYSTEM

When a user uploads a vlog to the database, semantic annotation is run
automatically using the vlog text and relevant external resources, and a
sentiment evaluation is obtained from the vlog comments. After that, the
vlog is stored in the database with the corresponding annotation and
evaluation. When a user submits a query to the search engine, the vlog
search module accesses the vlog database to obtain relevant vlogs by
saliency-based matching; then, using the user-specified ranking strategy and
category-based clustering, the results are returned to the user in a
well-organized manner.

4.2 SEQUENCE DIAGRAM

FIG 4.2 SEQUENCE DIAGRAM FOR THE SYSTEM

A sequence diagram shows an interaction between the system and its
environment, arranged in a time sequence. It shows the objects participating
in the interaction by their lifelines and the messages they exchange,
arranged in time order as shown in the figure. The sequence diagram is very
simple, has immediate visual appeal, and is an alternative way to understand
the overall flow of control.

4.3 USE CASE DIAGRAM

FIG 4.3 USECASE DIAGRAM FOR THE SYSTEM

The use case diagram shows the relationship between the actors and the use
cases within the system. The clients are the actors who upload and search
for videos on the server, and the results are returned to the user from the
database.

4.4 ACTIVITY DIAGRAM

FIG 4.4 ACTIVITY DIAGRAM FOR THE SYSTEM

An activity diagram provides a view of the flows and of what is going on
inside a use case or among several classes. An activity diagram is used to
represent a class's method implementation, as shown in the figure.

CHAPTER 5

MODULES

5.1 MODULES

 Semantic annotation
 Content Analysis
 Sentiment evaluation
 Saliency-based matching and Ranking

5.2 MODULE EXPLANATION:

Semantic annotation
Annotation is the process of extracting informative keywords from the text
of the vlog. This is necessary because the words used in vlog texts are
arbitrary and non-standard. When a user uploads a video, automatic semantic
vlog annotation is run. The textual content in a vlog mainly comprises the
title, description, and comments, among which the title and description are
closely related to the semantics of the vlog video, so we use the title and
description for the annotation process. The title indicates the main topic of
the whole vlog and is of the greatest importance for understanding its
semantics. Therefore, we first extract annotation words from the title and
then from the body of the textual content, i.e., the vlog's description. The
result is then stored in the database.
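As an illustration, the sketch below shows how candidate keywords could be gathered from the title and description using the montytag helper listed in Appendix 2. The class name AnnotationSketch and the combination logic are assumptions for this example; duplicate removal, stop-word filtering, and the use of external resources are not shown.

import java.util.Vector;

// Illustrative sketch: collect candidate annotation keywords from the title
// and description using the montytag POS-tagger wrapper from Appendix 2.
public class AnnotationSketch {

    public static Vector extractKeywords(String title, String description) {
        montytag tagger = new montytag();            // wraps JMontyTagger
        Vector keywords = new Vector();
        keywords.addAll(tagger.method(title));       // the title carries the main topic
        keywords.addAll(tagger.method(description)); // then the body of the textual content
        return keywords;
    }
}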

Content Analysis
In content analysis the video is split into a number of frames, which are
stored in the database in BLOB form. Each frame consists of objects.
Distinct objects from each frame are detected and stored in the database;
thus no two frames hold the same object, and this makes the searching
process efficient. Frame splitting is done with the help of the Framesplitter
class.
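A minimal sketch of storing a single extracted frame as a BLOB through JDBC is shown below. The table and column names (frames, videoname, frame_image), the connection string, and the credentials are placeholders, not the project's actual schema.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Illustrative sketch: store one extracted frame in the database as a BLOB.
// Table, columns, URL and credentials below are assumptions for this example.
public class FrameStoreSketch {

    public static void storeFrame(String framePath, String videoName) throws Exception {
        DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:XE", "system", "password"); // placeholders

        File frameFile = new File(framePath);
        InputStream in = new FileInputStream(frameFile);

        PreparedStatement ps = conn.prepareStatement(
                "insert into frames (videoname, frame_image) values (?, ?)");
        ps.setString(1, videoName);
        ps.setBinaryStream(2, in, (int) frameFile.length()); // frame bytes go into the BLOB column
        ps.execute();

        in.close();
        ps.close();
        conn.close();
    }
}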
Sentiment evaluation
The main purpose of the sentiment evaluation process is to extract sentiment
information from users' comments. Users can decide whether a vlog is worth
viewing based on the existing comments. Traditional annotation models
focus solely on the semantic aspect while the sentiment aspect is neglected.
For opinionated texts such as vlog comments, extra information can be
obtained through sentiment analysis. The users' comments are
predominantly text based, and sentiment evaluation is used to obtain an
overall evaluation of the vlog.
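As a simplified illustration of how a comment could be scored, the sketch below counts positive and negative words from tiny placeholder lexicons. It is not the sentiment model used by the project, only a sketch of the general idea.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Illustrative sketch of a simple lexicon-based sentiment score for comments.
// The word lists are tiny placeholders, not the project's actual lexicon.
public class SentimentSketch {

    static final Set POSITIVE = new HashSet(Arrays.asList(
            new String[] {"good", "great", "super", "nice", "awesome"}));
    static final Set NEGATIVE = new HashSet(Arrays.asList(
            new String[] {"bad", "poor", "boring", "worst", "awful"}));

    // Returns a positive value for favourable comments and a negative value otherwise.
    public static int score(String comment) {
        int score = 0;
        StringTokenizer st = new StringTokenizer(comment.toLowerCase());
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            if (POSITIVE.contains(word)) score++;
            if (NEGATIVE.contains(word)) score--;
        }
        return score;
    }
}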

Saliency-based matching and Ranking

We propose a novel saliency-based similarity matching approach for vlog
search, implemented with the help of the Canny edge detection algorithm.
Saliencies here are the edges of an image; even when two images look alike,
their saliencies will not be exactly the same, so when an image is given as
the input query this algorithm helps retrieve the exact video. After the
relevant vlogs are obtained using saliency-based matching, different ranking
strategies are adopted. Finally the ranked vlogs are clustered according to
their category information to further facilitate users' browsing.

5.3 DATA FLOW DIAGRAMS:

Level 0:

(Entities in the diagram: User, Login, Server, Database, Registration, Home Page)

FIG 5.3.1 LEVEL 0 DFD FOR THE SYSTEM

This is the Level 0 DFD for the system. When the user visits the login page,
the server checks the credentials against the database. If the user is
registered, the home page can be accessed; if not, the user has to register to
access the videos.

Level 1:

(Entities in the diagram: User, Upload, Server, Annotation process, Content Analysis, Database)

FIG 5.3.2 LEVEL 1 DFD FOR THE SYSTEM

This is the Level 1 DFD for the system. When a user uploads a video,
automatic semantic vlog annotation is run, and content analysis is performed:
the video is split into a number of frames which are stored in the database in
BLOB form.

Level 2:

(Entities in the diagram: User, Search request, Server, Database, Results)

FIG 5.3.3 LEVEL 2 DFD FOR THE SYSTEM

This is the Level 2 DFD for the system. When the user sends a request to
search for a video, the server checks the database and the results are
returned to the user in ranked order.

CHAPTER 6
TABLES

6.1 USERLOGIN

FIELD NAME      DATA TYPE
Username        Varchar2
Password        Varchar2

6.2 VIDEOUPLOAD

FIELD NAME      DATA TYPE
Videoname       Varchar2
Title           Varchar2
Description     Varchar2
Comments        Varchar2
Count           Number
Path            Varchar2
Video           Blob

CHAPTER 7

TESTING

7.1 SOURCE CODE TESTING

This examines the logic of the system. If the output required by the user is
obtained, we can say that the logic is correct.

7.2 MODULE LEVEL TESTING

Here errors are found in each individual module; this encourages the
programmer to find and rectify errors without affecting other modules.

7.3 UNIT TESTING

Unit testing is conducted to verify the functional performance of each
modular component of the software. Unit testing focuses on the smallest
unit of the software design, i.e., the module.

7.4 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program
structure while at the same time conducting tests to uncover errors
associated with interfacing. Individual modules, which are highly prone to
interface errors, should not be assumed to work correctly when put together.
The problem, of course, is "putting them together" - interfacing. Data may
be lost across an interface; one module's sub-functions, when combined,
may not produce the desired major function; individually acceptable
imprecision may be magnified to unacceptable levels; and global data
structures can present problems.

7.5 FUNCTIONAL TEST

Functional test cases involve exercising the code with nominal input values
for which the expected results are known, as well as with boundary values
and special values, such as logically related inputs, files of identical
elements, and empty files.

Three types of tests are used in functional testing:

 Performance Test
 Stress Test
 Structure Test

7.5.1 PERFORMANCE TEST

It determines the amount of execution time spent in various parts of the unit,
the program throughput, the response time, and device utilization by the
program unit.

7.5.2 STRESS TEST

Stress tests are designed to intentionally break the unit. A great deal can be
learned about the strengths and limitations of a program by examining the
manner in which a program unit breaks.

7.5.3 STRUCTURE TEST

Structure tests are concerned with exercising the internal logic of a program
and traversing particular execution paths. A white-box test strategy was
employed to ensure that the test cases could guarantee that all independent
paths within a module have been exercised at least once, and to:

 Exercise all logical decisions on their true and false sides.
 Execute all loops at their boundaries and within their operational bounds.
 Exercise internal data structures to assure their validity.
 Check attributes for their correctness.
 Handle end-of-file conditions, I/O errors, buffer problems and textual
errors in output information.
7.6 WHITE BOX TESTING

This testing is also called glass box testing. Here, knowing the internal
operation of a product, tests can be conducted to ensure that "all gears
mesh", that is, that the internal operation performs according to
specification and all internal components have been adequately exercised. It
is a test case design method that uses the control structure of the procedural
design to derive test cases. Basis path testing is a white box technique.

7.7 BLACK BOX TESTING

In this testing, knowing the specific functions that the product has been
designed to perform, tests can be conducted to demonstrate that each
function is fully operational while at the same time searching for errors in
each function. It fundamentally focuses on the functional requirements of
the software.

7.8 USER ACCEPTANCE TESTING

User acceptance of the system is a key factor in the success of any system.
The system under consideration was tested for user acceptance by constantly
keeping in touch with prospective system users at the time of development
and making changes whenever required. This is done with regard to the
following points.

• Input screen design.


• Output screen design.

CHAPTER 8
CONCLUSION AND FUTURE ENHANCEMENT

8.1 CONCLUSION:

In a vlog's annotation, we extract informative keywords not only from the
textual content of the target vlog itself but also from external resources
which are semantically and visually relevant to it; besides semantic
annotation, we obtain a sentiment evaluation from the comments as
guidance for vlog browsing. In the user-oriented vlog search, we adopt
saliency-based matching to make the search results more agreeable to users,
and different ranking strategies are adopted according to the user's specific
interest.

8.2 FUTURE ENHANCEMENTS

The proposed system returns a video given a frame of that video as input. It
can be further enhanced so that any object in the frame can be given as
input, resulting in the exact video.

APPENDIX 1

SCREEN SHOTS

This is the home page for the video blog, where the user must sign up to
access the videos.

This is the members login page.

This is the video upload page where the user uploads the video by giving a
title, description and comment. While uploading, the video is split into
frames which are stored in the database after automatic semantic annotation.

This is the page for searching for a video. The user browses for an image
and searches for the corresponding video.

This is the page for changing the password. If the user wishes to change his
password, he can do so by providing a new password.

APPENDIX 2

SAMPLE CODING

MONTY TAG:

import java.util.*;
import montytagger.JMontyTagger;

public class montytag


{
JMontyTagger mon=new JMontyTagger();
public montytag()
{
}

public static void main(String[] args)


{
//new montytag("this video is super");
String str="sachin&dravid";
//new montytag(str);
}
public Vector method(String strr)
{
Vector vv=new Vector();
try
{
String sr=mon.Tag(strr);
Vector v=new Vector();
StringTokenizer st=new StringTokenizer(sr);
while(st.hasMoreElements())
{
v.add(st.nextElement());
}
System.out.println(v);

for(int i=0;i<v.size();i++)
{
String s=(String)v.get(i);
if(s.contains("JJ")||s.contains("NNP")||
s.contains("NN")||s.contains("CD")||s.contains("VBD"))
{
StringTokenizer stt=new
StringTokenizer(s,"/");
String t1=stt.nextToken();

String t2=stt.nextToken();
vv.add(t1);
}
}
System.out.println("Final vecu"+vv);

}catch (Exception e)
{
e.printStackTrace();
}
return vv;
	}
}

VIDEO UPLOAD:
import java.io.*;
import java.sql.*;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.*;

/**
* servlet implementation class upload
*/
public class upload extends HttpServlet {
private static final long serialVersionUID = 1L;
public Connection conn;
public montytag mm=new montytag();
Video_FrameSplitter vfs;

public void doPost(HttpServletRequest request,HttpServletResponse


response) throws ServletException, IOException
{
try {
vfs=new Video_FrameSplitter();
Properties p=new Properties();

//System.out.println(request.getRealPath("/"));
FileInputStream fis=new FileInputStream("C:/Program Files/Apache Software Foundation/Tomcat 6.0/webapps/VideoBlogfull/src/Database.properties");
p.load(fis);
p.load(fis);

String system=p.getProperty("system");
String username=p.getProperty("username");
String password=p.getProperty("password");
String video=request.getParameter("Browse");
System.out.println("video path "+video);
vfs.Video_SplitterMethod("d:/f.avi");
/*String title=request.getParameter("Title");
Vector vt = mm.method(title);
String stit=vt.get(0).toString();
String desc=request.getParameter("Description");
Vector vd = mm.method(title);
String sdes=vd.get(0).toString();
String comm=request.getParameter("Comment");
Vector vc = mm.method(title);
String scomm=vc.get(0).toString();
DriverManager.registerDriver( new oracle.jdbc.driver.OracleDriver());
conn = DriverManager.getConnection("jdbc:oracle:thin:@"+system+"",username,password);
//System.out.println("in database");*/

} catch (Exception e) {
e.printStackTrace();
}
}
}

DBSTORE

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.Vector;

public class DBstore {

public DBstore( String video,Vector vec,String tit,String des,String
comm)
{
try
{
DriverManager.registerDriver( new
oracle.jdbc.driver.OracleDriver() );
Connection conn = DriverManager.getConnection("jdbc:oracle:thin:@ramarathinam","system","redhat");

System.out.println(video+vec+tit+des+comm);
System.exit(0); // note: this exits before the insert below runs; remove it to actually store the record
Statement stmt = conn .createStatement();

ByteArrayOutputStream baos = new ByteArrayOutputStream();


ObjectOutputStream objOstream = new
ObjectOutputStream(baos);
objOstream.writeObject(vec);
objOstream.flush();
objOstream.close();

byte[] bArray = baos.toByteArray();

System.out.println("*** bArray = " + bArray);

PreparedStatement objStatement = conn


.prepareStatement("insert into samp(video,frameobj) values (?,?)");
File newfile=new File("d:/f.avi");
String filename="d:/f.avi";
String finame=newfile.getName();

System.out.println(" Video File NAme & Path ::::::::::: "+


filename);

InputStream fis=new FileInputStream(filename);


System.out.println(" Video File Length :
"+newfile.length());

System.out.println(" File InputStream "+fis.available());

objStatement.setBinaryStream(1,fis,(int)newfile.length());
objStatement.setBytes(2, bArray);

objStatement.execute();
System.out.println("stored");
		}
		catch(Exception e){e.printStackTrace();}
	}
}

SEARCH IMAGE:

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.sql.*;
import java.util.*;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import javax.servlet.http.HttpSession;
public class Searchimg extends HttpServlet
{
public Connection conn;
HttpSession hs;
TreeMap tm1 ;
TreeMap tm2 ;
TreeMap tm3 ;
TreeMap tm4 ;
TreeMap tm5 ;
TreeMap tm6;
Vector v2 ;
Vector v3 ;
Vector fp ;
Vector trueval;
Vector resv;

ConPixel cp=new ConPixel();


public void doPost(HttpServletRequest request,HttpServletResponse
response)throws ServletException, IOException
{
try
{
tm1=new TreeMap();
tm2=new TreeMap();
tm3=new TreeMap();
tm4=new TreeMap();
tm5=new TreeMap();
tm6=new TreeMap();
v2=new Vector();
v3=new Vector();

fp=new Vector();
trueval=new Vector();
resv=new Vector();
hs=request.getSession(true);
String filepath=request.getParameter("Browse");
conncre();
String ip=filepath;
System.out.println("ip "+ip);
Statement st=conn.createStatement();
ResultSet rs=st.executeQuery("select * from
videoupload");
while(rs.next())
{
byte vdata[]=rs.getBytes(1);
String vpa=rs.getString(2);
int cnt=rs.getInt(3);
String tt=rs.getString(4);
String dd=rs.getString(5);
String cc=rs.getString(6);
String fname=rs.getString(7);
tm1.put(vpa,new Integer(cnt));
tm2.put(vpa,vdata);
tm4.put(vpa,fname);
tm5.put(fname,vdata);
fp.add(vpa+"*"+cnt);
}
System.out.println("tm1"+tm1);
System.out.println("tm2"+tm2);
System.out.println("tm4"+tm4);
System.out.println("tm5"+tm5);
System.out.println(fp);
for(int u=0;u<fp.size();u++)
{
String val=fp.get(u).toString();
StringTokenizer sto=new
StringTokenizer(val,"*");
String t1=sto.nextToken();
String t2=sto.nextToken();
File f=new File(ip);
CannyEdgeDetector detector = new
CannyEdgeDetector();
BufferedImage frame = ImageIO.read(f);
detector.setLowThreshold(0.5f);
detector.setHighThreshold(1f);
//apply it to an image
detector.setSourceImage(frame);

detector.process();
BufferedImage edges =
detector.getEdgesImage();
int c[]=cp.canpix(edges);
int cnn=Integer.parseInt(t2);
for(int i=1;i<=cnn;i++)
{
String ps=t1+"/"+"pic"+i+".jpg";
//String ps=t1;
File f1=new File(ps);
CannyEdgeDetector detector1= new
CannyEdgeDetector();
BufferedImage
frame1=ImageIO.read(f1);
detector1.setLowThreshold(0.5f);
detector1.setHighThreshold(1f);
//apply it to an image
detector1.setSourceImage(frame1);
detector1.process();
BufferedImage edges1 =
detector1.getEdgesImage();
int c1[]=cp.canpix(edges1);
boolean bb=comp(c,c1);
System.out.println(i+" "+bb);
if(bb)
{
System.out.println("*******
OH its matched *******");
trueval.add(t1);
i=cnn;
}
}
}
System.out.println("dddddddddd"+trueval);
// getvi(trueval,request,response);
dis(trueval,request,response);

}catch(Exception e)
{e.printStackTrace();}

}
public void conncre()throws Exception
{
DriverManager.registerDriver( new
oracle.jdbc.driver.OracleDriver());

conn = DriverManager.getConnection("jdbc:oracle:thin:@"+"ramarathinam"+"","system","redhat");
}
public boolean comp(int[] a,int[] b)
{
int t=0;
int f=0;
for(int k=0;k<a.length;k++)
{

if(a[k]==b[k])
{
t++;
}
else
{
f++;
}
}
int sum=t+f;
float per;
per=(t*100f)/sum; // multiply by 100f so the ratio is computed in floating point and not truncated

if(per>98.0)

return(true);
else
return(false);
}
public void dis(Vector dd,HttpServletRequest
request,HttpServletResponse response)throws Exception
{
Vector v1=new Vector();
v1.addAll(dd);
System.out.println("v11111111"+v1);
for(int b=0;b<v1.size();b++)
{
String cval=v1.get(b).toString();
v3.add(tm4.get(cval));
}
System.out.println(v3);

for(int r=0;r<v3.size();r++)
{
String co=v3.get(r).toString();

byte bb[]=(byte[]) tm5.get(co);

String filename=co;
String fpath="d:/Temp/"+filename;
//FileOutputStream fos=new
FileOutputStream("d:/Temp/"+filename);
//fos.write(bb);
tm6.put(co,fpath);

}
System.out.println(tm6);
RequestDispatcher view =
request.getRequestDispatcher("Results.jsp");
hs.setAttribute("result",tm6);
view.forward(request, response);
}
/* public void getvi(Vector v,HttpServletRequest
request,HttpServletResponse response)throws Exception
{
Vector v1=new Vector();

v1.addAll(v);
System.out.println(v1);
for(int rr=0;rr<v1.size();rr++)
{
String val=v1.get(rr).toString();
v2.add(tm2.get(val));
}
System.out.println(v2);
for(int r=0;r<v2.size();r++)
{
byte b[]=(byte[]) v2.get(0);
String filename="f"+r+".avi";
String fpath="d:/Temp/"+filename;
FileOutputStream fos=new
FileOutputStream("d:/Temp/"+filename);
fos.write(b);
tm3.put(filename,fpath);
resv.add(filename);
}
System.out.println("tm3"+tm3);
System.out.println("resv "+resv);
RequestDispatcher view =
request.getRequestDispatcher("Results.jsp");
hs.setAttribute("result",resv);
view.forward(request, response);}*/}

PLAY VIDEO:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
* servlet implementation class playvideos
*/
public class playvideos extends HttpServlet {
TypicalPlayerApplet tp;

private static final long serialVersionUID = 1L;

public void doPost(HttpServletRequest request, HttpServletResponse


response) throws ServletException, IOException {
response.setContentType("text/html");
System.out.print("rrrrrrrrrrrrrr");

String path=request.getParameter("vpath");
System.out.println("path "+path);
tp=new TypicalPlayerApplet(path);

}
}

LOGIN CHECK:

import java.io.*;
import java.sql.*;
import javax.servlet.*;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

/**
* servlet implementation class upload
*/
public class Logincheck extends HttpServlet
{
Connection conn;

public void doPost(HttpServletRequest request,HttpServletResponse
response) throws ServletException, IOException
{

try {

DriverManager.registerDriver( new
oracle.jdbc.driver.OracleDriver());
conn = DriverManager.getConnection("jdbc:oracle:thin:@"+"ramarathinam"+"","system","redhat");
String uname=request.getParameter("username");
String upass=request.getParameter("pass");
Statement st=conn.createStatement();
ResultSet rs=st.executeQuery("select * from USERLOGIN where username='"+uname+"' and password='"+upass+"'");
if(rs.next())
{
RequestDispatcher view =
request.getRequestDispatcher("Uploadvideos.jsp");
view.forward(request, response);
}else
{
System.out.println("invalid ");
RequestDispatcher
rq=request.getRequestDispatcher("HomePage.jsp");
request.setAttribute("msg","User name and password is invalid");
rq.forward(request,response);
}

} catch (Exception e) {
e.printStackTrace();
}
	}
}

APPENDIX 3
ALGORITHM USED: CANNY EDGE DETECTION ALGORITHM

INTRODUCTION

The purpose of edge detection in general is to significantly reduce the
amount of data in an image while preserving the structural properties to be
used for further image processing. Several algorithms exist, and this
appendix focuses on the one developed by John F. Canny (JFC) in 1986.
Even though it is quite old, it has become one of the standard edge detection
methods and is still used in research.

The aim of JFC was to develop an algorithm that is optimal with regard to
the following criteria:

1. Detection: The probability of detecting real edge points should be
maximized while the probability of falsely detecting non-edge points should
be minimized. This corresponds to maximizing the signal-to-noise ratio.

2. Localization: The detected edges should be as close as possible to the real
edges.

3. Number of responses: One real edge should not result in more than one
detected edge (one can argue that this is implicitly included in the first
requirement).

With JFC's mathematical formulation of these criteria, Canny's edge
detector is optimal for a certain class of edges (known as step edges). The
images used are generated using this implementation.

TEST IMAGE

The image in the figure is used throughout this appendix to demonstrate how
Canny edge detection works. It depicts a partially assembled pump from
Grundfos, and the edge detection is a step in the process of estimating the
pose (position and orientation) of the pump. The image has been
preprocessed as described in the worksheet "Ideas for Solution to the Pose
Estimation Problem". The preprocessing includes:
• Determining a ROI (Region of Interest) that includes only white
background besides the pump, and cropping the image to this region.
• Conversion to gray-scale to limit the computational requirements.
• Histogram stretching, so that the image uses the entire gray-scale. This
step may not be necessary, but it is included to counter-compensate for
automatic light adjustment in the web camera used.

THE CANNY EDGE DETECTION ALGORITHM

The algorithm runs in five separate steps:

1. Smoothing: Blurring of the image to remove noise.
2. Finding gradients: The edges should be marked where the gradients of the
image have large magnitudes.
3. Non-maximum suppression: Only local maxima should be marked as
edges.
4. Double thresholding: Potential edges are determined by thresholding.
5. Edge tracking by hysteresis: Final edges are determined by suppressing
all edges that are not connected to a very certain (strong) edge.

Each step is described in the following subsections.


1. Smoothing

It is inevitable that all images taken with a camera contain some amount of
noise. To prevent noise from being mistaken for edges, it must be reduced;
therefore the image is first smoothed by applying a Gaussian filter, here with
a standard deviation of σ = 1.4.

2. Finding gradients

The Canny algorithm basically finds edges where the grayscale intensity of
the image changes the most. These areas are found by determining the
gradients of the image. Gradients at each pixel in the smoothed image are
determined by applying what is known as the Sobel operator. The first step
is to approximate the gradient in the x- and y-directions respectively by
applying the corresponding kernels. The gradient magnitudes (also known as
the edge strengths) can then be determined as a Euclidean distance measure
by applying the law of Pythagoras; this is sometimes simplified by applying
the Manhattan distance measure to reduce the computational complexity.
The Euclidean distance measure has been applied to the test image, and the
computed edge strengths can be compared to the smoothed image.
|G| = sqrt(Gx^2 + Gy^2)        (Euclidean measure)
|G| ≈ |Gx| + |Gy|              (Manhattan approximation)

where Gx and Gy are the gradients in the x- and y-directions respectively.

An image of the gradient magnitudes often indicates the edges quite clearly.
However, the edges are typically broad and thus do not indicate exactly
where the edges are. To make this possible, the direction of the edges must
be determined and stored:

θ = arctan(|Gy| / |Gx|)
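A compact sketch of this step is shown below. It assumes the smoothed image is available as a 2-D array of gray values (0-255) and computes the Sobel responses, the Euclidean edge strength, and the edge direction for the interior pixels; the class name is illustrative.

// Illustrative sketch: Sobel gradients, edge strength, and edge direction
// for a grayscale image stored as a 2-D int array.
public class GradientSketch {

    public static void gradients(int[][] img, double[][] magnitude, double[][] direction) {
        int h = img.length, w = img[0].length;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // Sobel kernels applied to the 3x3 neighbourhood
                int gx = -img[y-1][x-1] + img[y-1][x+1]
                         - 2*img[y][x-1] + 2*img[y][x+1]
                         - img[y+1][x-1] + img[y+1][x+1];
                int gy = -img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1]
                         + img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1];
                magnitude[y][x] = Math.sqrt(gx * gx + gy * gy); // Euclidean edge strength
                direction[y][x] = Math.atan2(gy, gx);           // edge direction in radians
            }
        }
    }
}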

3. Non-maximum suppression

The purpose of this step is to convert the "blurred" edges in the image of the
gradient magnitudes into "sharp" edges. Basically this is done by preserving
all local maxima in the gradient image and deleting everything else. For each
pixel in the gradient image the algorithm is:
1. Round the gradient direction θ to the nearest 45°, corresponding to the
use of an 8-connected neighbourhood.
2. Compare the edge strength of the current pixel with the edge strength of
the pixels in the positive and negative gradient direction, i.e., if the gradient
direction is north (θ = 90°), compare with the pixels to the north and south.
3. If the edge strength of the current pixel is the largest, preserve the value
of the edge strength; if not, suppress (i.e., remove) the value.

A simple example of non-maximum suppression: almost all pixels have
gradient directions pointing north, so they are compared with the pixels
above and below. The pixels that turn out to be maximal in this comparison
are marked with white borders; all other pixels are suppressed.
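The sketch below illustrates the idea, assuming the magnitude and direction arrays from the previous sketch. The quantization of the direction into four bins is simplified, and the exact neighbour offsets may need to be adapted to the image coordinate convention.

// Illustrative sketch: non-maximum suppression. The gradient direction is
// rounded to the nearest 45 degrees and a pixel is kept only if its edge
// strength is not exceeded by its two neighbours along that direction.
public class NonMaxSuppressionSketch {

    public static double[][] suppress(double[][] mag, double[][] dir) {
        int h = mag.length, w = mag[0].length;
        double[][] out = new double[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // quantize the direction to one of four bins: 0, 45, 90, 135 degrees
                double angle = Math.toDegrees(dir[y][x]);
                if (angle < 0) angle += 180;
                int dx, dy;
                if (angle < 22.5 || angle >= 157.5) { dx = 1; dy = 0; }   // horizontal
                else if (angle < 67.5)              { dx = 1; dy = 1; }   // 45 degrees
                else if (angle < 112.5)             { dx = 0; dy = 1; }   // vertical
                else                                { dx = 1; dy = -1; }  // 135 degrees
                double a = mag[y + dy][x + dx];
                double b = mag[y - dy][x - dx];
                out[y][x] = (mag[y][x] >= a && mag[y][x] >= b) ? mag[y][x] : 0;
            }
        }
        return out;
    }
}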

4. Double thresholding

The edge pixels remaining after the non-maximum suppression step are
(still) marked with their strength pixel-by-pixel. Many of these will probably
be true edges in the image, but some may be caused by noise or color
variations, for instance due to rough surfaces. The simplest way to discern
between these would be to use a single threshold, so that only edges stronger
than a certain value would be preserved. The Canny edge detection
algorithm instead uses double thresholding: edge pixels stronger than the
high threshold are marked as strong; edge pixels weaker than the low
threshold are suppressed; and edge pixels between the two thresholds are
marked as weak.
(a) Edges after non-maximum suppression (b) Double thresholding
In the second image strong edges are white, while weak edges are grey;
edges with a strength below both thresholds are suppressed.
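A minimal sketch of this classification is shown below; the labels 0/1/2 for suppressed, weak and strong pixels are an arbitrary encoding chosen for this example.

// Illustrative sketch: classify each remaining edge pixel as strong (2),
// weak (1) or suppressed (0) using two thresholds.
public class DoubleThresholdSketch {

    public static int[][] classify(double[][] mag, double low, double high) {
        int h = mag.length, w = mag[0].length;
        int[][] labels = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (mag[y][x] >= high)     labels[y][x] = 2; // strong edge
                else if (mag[y][x] >= low) labels[y][x] = 1; // weak edge, decided later
                else                       labels[y][x] = 0; // suppressed
            }
        }
        return labels;
    }
}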

5. Edge tracking by hysteresis

Strong edges are interpreted as "certain edges" and can immediately be
included in the final edge image. Weak edges are included if and only if they
are connected to strong edges. The reasoning is of course that noise and
other small variations are unlikely to result in a strong edge (with proper
adjustment of the threshold levels); thus strong edges will (almost) only be
due to true edges in the original image. The weak edges can be due either to
true edges or to noise/color variations. The latter type will probably be
distributed independently of edges across the entire image, and thus only a
small amount will be located adjacent to strong edges; weak edges due to
true edges are much more likely to be connected directly to strong edges.
Edge tracking can be implemented by BLOB analysis (Binary Large
OBject): the edge pixels are divided into connected BLOBs using the
8-connected neighbourhood, and BLOBs containing at least one strong edge
pixel are preserved while the other BLOBs are suppressed.
(a) Double thresholding (b) Edge tracking by hysteresis (c) Final output
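The sketch below illustrates an iterative version of this step using a simple queue: starting from strong pixels, weak pixels reachable through the 8-connected neighbourhood are promoted to edges, and all other weak pixels are dropped. It reuses the 0/1/2 labels from the previous sketch; the class name is illustrative.

import java.util.LinkedList;

// Illustrative sketch: edge tracking by hysteresis. Weak pixels connected to
// strong pixels through an 8-connected neighbourhood become edges.
public class HysteresisSketch {

    public static boolean[][] track(int[][] labels) { // labels: 0 none, 1 weak, 2 strong
        int h = labels.length, w = labels[0].length;
        boolean[][] edge = new boolean[h][w];
        LinkedList queue = new LinkedList();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y][x] == 2) { edge[y][x] = true; queue.add(new int[]{y, x}); }

        while (!queue.isEmpty()) {
            int[] p = (int[]) queue.removeFirst();
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = p[0] + dy, nx = p[1] + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (labels[ny][nx] == 1 && !edge[ny][nx]) { // weak pixel touching an edge
                        edge[ny][nx] = true;
                        queue.add(new int[]{ny, nx});
                    }
                }
            }
        }
        return edge;
    }
}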

IMPLEMENTATION OF CANNY EDGE DETECTION

A few things should be noted with regard to the implementation:

1. The (source) image and the thresholds can be chosen arbitrarily.
2. Only a smoothing filter with a standard deviation of σ = 1.4 is supported.
3. The implementation uses the "correct" Euclidean measure for the edge
strengths.
4. The filters cannot be applied to the pixels closest to the image border,
which causes the output image to be 8 pixels smaller in each direction.
The last step in the algorithm, edge tracking, can be implemented as either
iterative or recursive BLOB analysis; a recursive implementation can use the
grass-fire algorithm. However, this implementation uses the iterative
approach: first all weak edges are scanned for neighbouring edges and joined
into groups, and at the same time it is marked which groups are adjacent.
Then all of these markings are examined to determine which groups of weak
edges are connected to strong edges (directly or indirectly). All weak edges
that are connected to strong edges are marked as strong edges themselves;
the rest of the weak edges are suppressed. This can be interpreted as BLOB
analysis where only BLOBs containing strong edges are preserved.

The figures show the complete edge detection process on the test image
including all intermediate results.

(a) Original (b) Smoothed (c) Gradient magnitudes (d) Edges after
non-maximum suppression (e) Double thresholding (f) Edge tracking by
hysteresis (g) Final output

REFERENCES

1. C. Parker and S. Pfeiffer, "Video blogging: Content to the max," IEEE
Multimedia, Los Alamitos, CA, 2005, pp. 4–8.

2. C. Wang, F. Jing, L. Zhang, and H. J. Zhang, "Scalable search-based
image annotation of personal images," in Proc. 8th ACM Int. Workshop on
Multimedia Information Retrieval, Santa Barbara, CA, 2006, pp. 269–278.

3. G. A. Miller, "WordNet: A lexical database for English," Commun. ACM,
pp. 39–41, 1995.

4. J. Hoem, "Videoblogs as 'collective documentary'," in BlogTalks 2.0: The
European Conf. on Weblogs, 2004, pp. 237–270.

5. J. Canny, "A computational approach to edge detection," IEEE Trans.
Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698, Nov. 1986.

6. K. W. Church and P. Hanks, "Word association norms, mutual
information, and lexicography," in Proc. 27th Annu. Conf. Association for
Computational Linguistics (ACL), BC, Canada, 1989, pp. 76–83.

7. X. J. Wang, L. Zhang, F. Jing, and W. Y. Ma, "AnnoSearch: Image
auto-annotation by search," in Proc. IEEE Computer Society Conf.
Computer Vision and Pattern Recognition, Washington, DC, 2006, pp.
1483–1490.

8. X. Zhang, C. Xu, J. Cheng, H. Lu, and S. Ma, "Automatic semantic
annotation for video blogs," in Proc. 2008 IEEE Int. Conf. Multimedia &
Expo, Hannover, Germany, 2008.
