
Common Sense and Folksonomy: Engineering an Intelligent Search System

Mohammad Nauman, Fida Hussain


City University of Science and Information Technology,
Peshawar, Pakistan
{recluze , fidamsse}@gmail.com

Abstract

The massive growth of the Internet has resulted in increased difficulty in organizing and searching through the information present on it. Several strategies have been employed to tackle this problem. With the coming of Web2.0, user-created data and sharing of information among peers, community-based tagging or folksonomy has become the norm for organization of data on the Internet. The loose structure of folksonomy has led to ease in categorization of the huge amounts of information but has also given rise to some serious problems. Users tag information based on their own experiences, preferences and common sense. This leads to difficulties in search and organization of information in the long run. In this paper, we argue that since user common sense has generated both folksonomy and a corpus of machine common sense, it seems appropriate that machine common sense be used for addressing the problem of search in folksonomy. We outline an architecture for such an operation, develop a prototype as a proof of concept and describe how this approach can be extended in the future.

1. Introduction

The Internet's massive growth in the past decade or so has given rise to an information revolution throughout the world. With the initial increase in the Internet's content came search engines, and search techniques grew more and more complicated to accommodate for the increased amount of information. With more and more businesses, organizations and even individuals posting on the web, it became extremely difficult to locate relevant information. Until recently, search engines and hand-crafted directories were the only available solution to this problem.

With the advent of Web2.0, this problem has increased manifold. Web2.0 applications include user-centric publishing and knowledge management platforms like Wikis, Blogs, and social resource sharing systems [1]. Now, everyone with an Internet connection has the ability to share information with peers and the global audience. Along with Web2.0 came a new technique (dubbed folksonomy) for organizing the information posted by the users. Folksonomy (from folk and taxonomy) means structural representations created by the general public [1]. It refers to the attachment of tags or labels to content for the purpose of organizing and searching. Folksonomy is a flat namespace with no hierarchy and no direct relationship between its elements [2]. An ever-increasing number of web services such as del.icio.us and flickr use folksonomy for the purpose of organizing content [2]. While tags serve as a very effective basis for organization of content, they are nonetheless difficult to search through because of variances in possible textual representation of concepts [3]. Because of these variances, it is difficult to automate the process of searching through folksonomy.

We argue that folksonomy, by its definition, is generated through the use of common sense of the general public. It seems appropriate to use common sense to search through it. We propose a solution in which machine common sense can be used to search through folksonomy in a Web2.0 service.

The rest of the paper is organized as follows: First, we give an overview of the problem of search in folksonomy. Then we describe what machine common sense is and shed some light on attempts to formalize it. We describe the Open Mind Common Sense Project in detail and see how it can be used for application of common sense. We then outline an architecture for using machine common sense for search in folksonomy. Finally, we discuss a prototype developed as a proof of concept along with ideas about future work regarding this framework.

2. Problem Overview

2.1. Folksonomy: Community-based Tagging

Several strategies have been used for searching for information on the web based on the scenario. On the World Wide Web, the amount of information present and the evolving nature of this information make it extremely difficult for a single authority to classify and sort through. Folksonomy or community-based tagging suggests itself for classification of information in such a scenario [3]. It does not, however, come without some drawbacks. The major problem with tagging information is caused by the way different individuals would link real-world concepts with textual representation of these concepts. This is the problem of polysemy (using the same words to describe different concepts), synonymy (using different words to describe the same concept) and variation in basic levels of categorization [3]. Not only is this a problem with different users, it is also a problem of recall. Users may not be able to recall correctly how they tagged the information a month earlier. Several strategies have been employed to address this issue including those based on synonyms and co-occurrence frequencies. Since these approaches are all based on lexical analysis of terms instead of contextual, they have had only moderate levels of success [4].

Another aspect of the problem is that of ontology. An ontology is a formal specification of sets of objects, concepts and other entities about which knowledge is being expressed and the relationships that hold among them [5], [6]. Folksonomy is a non-hierarchical and non-exclusive ontology [7]. While this is beneficial in many respects, it poses a serious problem for searching. It creates a fuzzy and ambiguous boundary between sets of different concepts. The tags created are heavily reflective of the author's personal experiences, preferences and skill level [3]. Searching in this scenario becomes extremely difficult. Search results ranking is biased towards content marked with popular tags.

Another problem with folksonomy (which it shares with traditional search systems) is that even with the automation of search, the human brain still has to do much work during the search process. Search engines/systems do not cater for important search sub-processes such as examining results, extraction of information and reflection and iteration on the search process [8].

A solution to all of these problems is to create a search system which is better representative of how the human brain works: a search system with common sense.

2.2. Common Sense

Common sense, in the context of Artificial Intelligence, is a technical term which refers to the millions of basic facts and understandings possessed by most people, knowledge that machines do not have [9]. It is this lack of common sense that has caused machines to fail when presented with an unexpected situation even though they have always been better than humans at solving problems which can be described by fixed parameters and strict rules [10]. Creating machines with common sense is a central problem of artificial intelligence but is very difficult for several reasons. These include the difficulty caused by diversity in varieties, huge amounts of information and the flexibility requirements posed by such knowledge [11]. There is also the difficulty in the broad range of domains involved in common sense reasoning. An architecture for common sense must encompass domains which require reasoning about temporal, spatial, physical, psychological and social matters [12]. Such an architecture must also include inference methods regarding negative expertise, knowledge retrieval and self-reflection [13]. Some of the attempts at formalizing common sense are discussed below in Related Work.

3. Related work

3.1. Formalization of Common Sense

The most famous attempts at formalizing common sense have been WordNet [14], Cyc [15] and the Open Mind Common Sense project [16].

1) WordNet: is a collection of English language nouns, verbs, adjectives and adverbs organized as synonym sets, each representing a lexical concept. These synonym sets are linked using relations described in natural language [14]. WordNet contains precise definitions of words and their relationships with each other. It does not, however, deal with concepts of common understanding. WordNet, for example, can be used to deduce the fact that a cat is a mammal but not that it is also usually a pet [11]. For this reason, it would not be suitable to use WordNet as a common sense engine for search on folksonomy.

2) Cyc: is a large-scale project aimed at compiling a universal schema of common sense information. It has a large collection of hand crafted and logically inferred axioms [15]. It currently has over 1.5 million rules and facts in a logical language called CycL [17]. The drawback of Cyc is that it relies on time and effort of experts to handle common sense knowledge. Anyone willing to use Cyc in practical textual
processing applications would have to invest their time in learning CycL and the Cyc ontology [18].

3) Open Mind Common Sense: is a project aimed at collecting common sense information from a non-expert global audience using the World Wide Web as an interface [16]. The motivating idea behind the Open Mind project was that common sense by its definition is information and understanding shared by most of the people. The general public therefore is the best source for gathering common sense information. This is accomplished by engaging thousands of users in an effort to collect this information through the OMCS project web site [18].

By analysing the first attempt and building on its strengths, the creators of the Open Mind Common Sense project designed a more elaborate means of collecting and processing common sense information: Open Mind Common Sense 2 (OMCS-2) [18].

An important lesson learned from OMCS is that it is not necessary to first collect complete common sense knowledge before embarking on the utilization of the available knowledge bases in applications [17].

Learning from this, several attempts have been made to utilize the common sense knowledge base gathered from OMCS. The most important of these are the ConceptNet [9] and LifeNet [19] projects at MIT Media Lab. ConceptNet is the first knowledge base extracted from the OMCS data. It consists of a semantic network of concepts associated together using links such as IsA, SubeventOf, MotivationOf and PartOf [11]. We shall return to a detailed discussion of ConceptNet in a moment.

In contrast with ConceptNet's rule-based reasoning methods, LifeNet's design is based on probabilistic techniques. It caters for uncertainty in both the knowledge and in the rules [11]. It is, however, not clear how these added capabilities of LifeNet can be put to use for solving the problem of search in the domain of folksonomy.

4) ConceptNet: ConceptNet is a freely available common sense reasoning tool kit comprising over 250,000 elements of common sense knowledge built using semi-structured natural language fragments. It builds on the simplicity of WordNet's ontology of semantic relations (described using natural language) and on Cyc's diversity of relations and concepts [20]. In essence, it combines the best of both worlds by creating a powerful and expressive ontology while still keeping the reasoning and inference logic simple.

The current version of ConceptNet is 2, consisting of 1.6 million assertions. There are twenty relation-types in ConceptNet grouped in K-LINES, THINGS, AGENTS, EVENTS, SPATIAL, CAUSAL, FUNCTIONAL and AFFECTIVE thematics. Scores are assigned to relations based on two criteria:

1) Number of utterances in OMCS corpus (denoted by the f value)
2) Inference from other facts in ConceptNet (denoted by the i value) [9]

Figure 1 shows an example of concepts and links as represented in ConceptNet.

OMCS and ConceptNet have successfully been applied on problems such as natural language processing [21], photo retrieval [22], and search.

[Figure: concept graph centred on the node Car. Other nodes shown: Mobility, Shift, Vehicle, Tire, Drive, Get direction, Travel, In garage, Use patrol, See world. Link labels shown: IsA, Part Of, Used for, Subevent Of, Capable Of, Location Of, Motivation Of.]

Figure 1 Excerpt from ConceptNet showing links from Car

3.2. Application of Common Sense in Search

The practicality of applying common sense in applications has been shown by several projects in the past. Here we give a few examples of how OMCS has been used to tackle the issue of search on the web.

ARIA is a search system which is used for retrieval of annotated photos. It uses ConceptNet to expand the concepts depicted by keywords in the user's query [22].

GOOSE is another search system based on OMCS which uses common sense to reformulate users' natural language queries into effective keywords [23].

ConSearch works on a different level in the search problem. It targets the results of a search engine and groups them together based on similarity of concepts [8].

While all these applications of common sense are innovative and robust, they lack one thing: they fail to utilize sources of information other than the common sense corpus itself. In the following section, we propose an approach which, in addition to common sense, utilizes another vast source of user-generated information: folksonomy.

4. Common Sense And Folksonomy

The simplicity of the OMCS project and the freely available tool kit for common sense reasoning in the form of ConceptNet provide ample opportunity to
anyone aiming to use textual processing in their applications.

On the other hand, community based content is another vast source of information. The tags of folksonomy are already being utilized for the purpose of search and organization. We believe that in combining these two sources of arrangement and inference, a powerful mix can be developed for an efficient search system for Web2.0.

Also, the source of the OMCS project's common sense information and that of tags in folksonomy is the same. Users of the World Wide Web have contributed to both knowledge bases. We believe it would be very beneficial to apply the common sense information present in OMCS through the ConceptNet tool kit for addressing the problem of searching through folksonomy.

4.1. Central Problems of Folksonomy

The central problems in folksonomy are two:

1) Lack of contextual information and
2) Limited inference capabilities due to its loose ontology

We propose a framework in which both these issues are addressed through the application of common sense ontology. Figure 2 summarises this mapping.

[Figure: block diagram. Labels shown: User; Generated by User Common Sense; Common Sense Ontology; Mapped by Machine Common Sense; Folksonomy; User Common Sense description; Content.]

Figure 2 Architecture of Proposed Framework

4.2. Flow of Control

An outline for using common sense on folksonomy is proposed below.

a) The user enters a search keyword and is presented with search options. These would decide the generality or specification of the concept association and the relation-types used for expansion of concepts.
b) ConceptNet tool kit is used for concept expansion based on the user's preferences.
c) A search is performed to get a set of result items for each resulting concept.
d) A score function is applied to the union of these sets to get a final listing of results.
e) The results are displayed to the user labelled with the associated concept and the user has the option of picking one of the expanded concepts for further search. Figure 3 describes this flow of control in the proposed solution.

[Figure: flowchart. Steps shown: Get User Query; Pick options; Get related concepts; Perform search; Apply score function; Display results. A return path is labelled "User picked another concept from matched concepts".]

Figure 3 Flow of Control in Proposed Framework

4.3. User Preferences

1) Generality/Specification: Let Conceptual Similarity (C) of a concept be the closeness of the concept to the original keyword. This closeness of concept is decided by the f and i values returned by ConceptNet. The i values are multiplied by 10 because they are usually an order of magnitude smaller than the f values but are still very important.

$$ C(x) = f(x) + (10 \cdot i(x)) $$

The search engine's score (S) is the score assigned by the search service to a specific search result. C and S are both normalized to lie between 0 and 1 to accommodate differences in ranking scale of different search services. Let $\gamma$ and $\sigma$ be normalized values of C and S respectively. Then:

$$ \gamma(x) = \frac{C(x)}{\max(C(x))} $$

$$ \sigma(x) = \frac{S(x)}{\max(S(x))} $$

The user enters the required level of generality (G) of the combined results. A more specific preference would give more weight to the items conceptually more similar to the original keyword, i.e. those with higher $\gamma$ values. A general preference would rank those items higher which have high ranking in the search service, i.e. those with higher $\sigma$ values. This preference is implemented through the score function.
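
The quantities defined in this subsection can be illustrated with a small Python sketch. It is not the prototype's code: the function names, the sample f, i and S values, and the symbols gamma and sigma (used here for the normalized values of C and S) are all assumptions made for illustration. The linear `blend` mirrors the simple linear function described for the prototype in Section 5, with G = 0 taken to mean the most specific preference and G = 1 the most general.

```python
# Illustrative sketch only: names and sample numbers are hypothetical,
# not taken from ConceptNet or from the prototype.

def conceptual_similarity(f, i):
    """C(x) = f(x) + 10 * i(x), as defined in the text."""
    return f + 10 * i


def normalize(scores):
    """Divide every score by the maximum so non-negative values lie in [0, 1]."""
    peak = max(scores.values())
    if peak <= 0:
        return {key: 0.0 for key in scores}
    return {key: value / peak for key, value in scores.items()}


def blend(gamma_x, sigma_x, generality):
    """Assumed linear trade-off: generality=0 favours conceptual
    similarity, generality=1 favours the search service's own score."""
    return generality * sigma_x + (1 - generality) * gamma_x


# Made-up (f, i) pairs for concepts expanded from the keyword "car".
raw = {"vehicle": (4, 0.2), "engine": (2, 0.0), "automobile": (1, 0.1)}
C = {concept: conceptual_similarity(f, i) for concept, (f, i) in raw.items()}
gamma = normalize(C)

# Made-up service scores S for one result item per concept.
S = {"vehicle": 30.0, "engine": 60.0, "automobile": 15.0}
sigma = normalize(S)

for concept in raw:
    specific = blend(gamma[concept], sigma[concept], generality=0.0)
    general = blend(gamma[concept], sigma[concept], generality=1.0)
    print(f"{concept}: gamma={gamma[concept]:.2f} sigma={sigma[concept]:.2f} "
          f"specific={specific:.2f} general={general:.2f}")
```

With a fully specific preference the item found through "vehicle" ranks first because its concept is closest to the keyword; with a fully general preference the item found through "engine" ranks first because the search service scored it highest. Intermediate values of G interpolate between the two orderings.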
2) Relation-types: The user can also specify the relation-types of ConceptNet for advanced control over concept expansion. The user may decide to expand concepts along a limited set of relations for better targeting of the results. For example, the user may prefer to search along the IsA relation-type along with PartOf and not along the MotivationOf relation-type.

4.4. Score Function

At the heart of the search results presentation is the score function. It is a function which assigns a relative score to each individual result item based on conceptual similarity and service rank of the item and on user preferences of generality.

$$ \mathrm{score}(x) = \sum_{i=1}^{n} \mathrm{inst\_score}(x_i) \qquad (1) $$

Where x is an individual item returned by the search service, $x_i$ are instances of the item x returned as responses to different concepts and n is the total number of these instances. The summation ensures that if an item appears as a result of more than one concept, it gets a higher score in the final listing.

The score of a single instance of a result item is given by inst_score. inst_score is a function of $\gamma$, $\sigma$ and G where $\gamma$ and $\sigma$ are normalized values of conceptual similarity and service rank respectively and G is the level of generality specified by the user.

$$ \mathrm{inst\_score}(x_i) = s(\gamma(x_i), \sigma(x_i), G) \qquad (2) $$

At this point, we leave s undefined so that it may be tailored according to specific situations. We provide an example of this in Proof of Concept.

5. Proof of Concept

To demonstrate the practical feasibility of the framework, a prototype was developed using the photo sharing service flickr (www.flickr.com) as a case study. The prototype was implemented using ConceptNet's knowledge base and flickr's API. For the inst_score function of Equation 2, we used a simple linear function:

$$ \mathrm{inst\_score}(x_i) = (G \cdot \sigma(x_i)) + ((1 - G) \cdot \gamma(x_i)) $$

Figures 4 and 5 show the user interface and ranked results generated by the prototype.

[Figure: screenshot. Labelled regions: Keyword and Generality; Relation Types.]

Figure 4 User Interface of the Prototype

[Figure: screenshot. Labelled regions: Expanded Concepts; Ranked Results. Expanded concepts shown: car(0.03:0.12), vehicle(1:1), automobile(0.24:0), engine(0.41:0), four wheel(0.28:0),]

Figure 5 Concepts and Results in the Prototype

6. Future Work

This paper has outlined a framework for utilization of machine common sense for addressing the problem of search in folksonomy. While the application of this technique is feasible as shown by the proof of concept, several variables will have to be adjusted to fit this approach to real life applications. Two aspects of the framework which could benefit from a thorough study are the concept expansion algorithm and the score function.

Concept expansion currently uses the naive approach of expansion and selection based on the f value. A more sophisticated concept expansion algorithm should result in concepts with higher conceptual similarity to the user's query.

The score function currently uses a summation function for assigning values to items appearing as results for different concepts. A statistical analysis of different functions may lead to a better ranking system for such result items.

Another important open issue in this framework is the use of a natural language interface as the front-end. ConceptNet comes with a powerful tool kit for natural language processing, MontyLingua [21]. Integrating a natural language front-end with this system may yield fruitful results.

7. Conclusion

The information overload caused by the coming of user-created data on Web2.0 can only be addressed
by utilizing all available resources for search and organization. User created organization of data has produced acceptable levels of results but still has problems because of variances in users creating this organization. A possible solution to this problem is the application of machine common sense to the problem of search. In this paper we have outlined a framework for using the Open Mind Common Sense project to address the issue. This is done through the use of ConceptNet, a freely available tool kit for machine common sense, for search in folksonomy. Practical feasibility is demonstrated by an application of this technique on a popular Web2.0 service. An outline for future work along this path has also been given.

References

[1] C. Schmitz, A. Hotho, R. Jaschke, and G. Stumme, Mining association rules in folksonomies, Proceedings of the IFCS 2006 Conference.
[2] A. Mathes, Folksonomies-Cooperative Classification and Communication Through Shared Metadata, Computer Mediated Communication, LIS590CMC (Doctoral Seminar), Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, December, 2004.
[3] S. Golder and B. Huberman, The Structure of Collaborative Tagging Systems, Arxiv preprint cs.DL/0508082, 2005.
[4] H. Lieberman and H. Liu, Adaptive Linking between Text and Photos Using Common Sense Reasoning, Conference on Adaptive Hypermedia and Adaptive Web Systems, 2002.
[5] T. Gruber, Toward principles for the design of ontologies used for knowledge sharing, International Journal of Human Computer Studies, vol. 43, no. 5-6, pp. 907-928, 1995.
[6] N. Noy and C. Hafner, The State of the Art in Ontology Design, AI Magazine, vol. 18, no. 3, pp. 53-74, 1997.
[7] Z. Xu, Y. Fu, J. Mao, and D. Su, Towards the Semantic Web: Collaborative Tag Suggestions, Collaborative Web Tagging Workshop at WWW2006, Edinburgh, Scotland, May, 2006.
[8] C. Lee and H. Lieberman, ConSearch: An Concept-Associating Search Interface using Commonsense.
[9] H. Liu and P. Singh, ConceptNet: A Practical Commonsense Reasoning Tool-Kit, BT Technology Journal, vol. 22, no. 4, 2004.
[10] P. Singh, M. Minsky, and I. Eslick, Computing Commonsense, BT Technology Journal, vol. 22, no. 4, pp. 201-210, 2004.
[11] P. Singh, B. Barry, and H. Liu, Teaching Machines about Everyday Life, BT Technology Journal, vol. 22, no. 4, pp. 227-240, 2004.
[12] P. Singh and M. Minsky, An architecture for combining ways to think, Integration of Knowledge Intensive Multi-Agent Systems, 2003. International Conference on, pp. 669-674.
[13] M. Minsky, Commonsense-based interfaces, Communications of the ACM, vol. 43, no. 8, pp. 66-73, 2000.
[14] C. Fellbaum, Wordnet: an electronic lexical database. MIT Press, 1998.
[15] D. Lenat, CYC: A Large-Scale Investment in Knowledge Infrastructure, Communications of the ACM, vol. 38, pp. 33-38, 1995.
[16] P. Singh, The public acquisition of commonsense knowledge, Proceedings of AAAI Spring Symposium: Acquiring (and Using) Linguistic (and World) Knowledge for Information Access, 2002.
[17] P. Singh and B. Barry, Collecting commonsense experiences, Proceedings of KCAP03, 2003.
[18] P. Singh, T. Lin, E. Mueller, G. Lim, T. Perkins, and W. Zhu, Open Mind Common Sense: Knowledge acquisition from the general public, Proceedings of the First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems, 2002.
[19] P. Singh and W. Williams, LifeNet: a propositional model of ordinary human activity, Proceedings of the Workshop on Distributed and Collaborative Knowledge Capture (DC-KCAP) at KCAP, 2003.
[20] H. Liu and P. Singh, Commonsense reasoning in and over natural language, Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES-2004), 2004.
[21] H. Liu, MontyLingua: Commonsense-Enriched NLP, Toolkit and API, Accessed at: http://web.media.mit.edu/~hugo/montylingua/, 2002.
[22] H. Liu and H. Lieberman, Robust photo retrieval using world semantics, Proceedings of LREC2002 Workshop: Using Semantics for IR, Canary Islands, 2002.
[23] H. Liu, H. Lieberman, and T. Selker, GOOSE: A Goal-Oriented Search Engine With Commonsense, Proceedings of the Second International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, pp. 253-263, 2002.
