On the TREC Blog Track
Iadh Ounis, Craig Macdonald and Ian Soboroff
University of Glasgow, UK and NIST, USA
{ounis,craigm}@dcs.gla.ac.uk and ian.soboroff@nist.gov
Abstract
The rise of blogging as a new grassroots publishing medium and the many interesting peculiarities that characterise blogs compared to other genres of documents have opened up several new interesting research areas in the information retrieval field. The Blog track was introduced in 2006 as part of the renowned Text REtrieval Conference (TREC) evaluation forum, to drive research on the blogosphere and to facilitate experimentation and evaluation of blog search techniques. This paper reports on two years of the Blog track at TREC. We describe the blog search tasks investigated at TREC, and discuss the main lessons we learnt from the track. We conclude the paper with discussions of the broader implications of the Blog track lessons and possible directions for the future, with the aim to uncover and explore the richness of information available in the blogosphere.
Introduction
The growth of interest in blogs, and the richness of information available on the blogosphere, has opened up several new interesting research areas in the information retrieval (IR) field. Indeed, the need for appropriate retrieval techniques to track and find out how bloggers react to products, trends and events as they unfold raises some challenging problems in IR. In particular, the problem of retrieving and analysing non-factual aspects of information such as opinions, sentiments, perspectives or personal experiences remains open.

The development of new and appropriate retrieval techniques for the blogosphere requires a suitable infrastructure for experimentation and evaluation along with realistic datasets. The IR community has a long-standing history in experimentation and evaluation (Voorhees 2007), as exemplified by the annual Text REtrieval Conference (TREC), an internationally acclaimed forum organised by the National Institute of Standards and Technology (NIST, USA) since 1992. TREC aims to support IR research by providing the infrastructure necessary for large-scale evaluation of text retrieval techniques and approaches. The idea behind TREC is to evaluate IR systems on standard and controlled test collections. A test collection consists of three components: a collection of documents; an associated set of information need statements called “topics”; and a set of human relevance judgements stating which documents are relevant for which topics. A reusable test collection with its associated topics and relevance judgements is very important in encouraging IR research. It allows for the reproducibility of results, and facilitates the further development of new retrieval techniques, the effectiveness of which can be compared to other existing techniques on the same collection.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Given the large size of the collections, the relevance judgements cannot be exhaustive. Instead, TREC uses a process called “pooling” (Spärck Jones & van Rijsbergen 1975), where for each topic the assessors do not judge all documents in the collection, but only the top-ranked documents (usually the top 100) returned by the participating search engines. Traditionally, TREC search tasks are described as “adhoc”, where the retrieval effectiveness of a system is assessed by its ability to rank, for each topic, as many as possible of the documents judged relevant by the human assessors above the rest of the documents. Adhoc search tasks are usually evaluated using mean average precision (MAP) over all topics. Average precision (AP) for one topic is the mean of the precision values calculated as each relevant document is retrieved.

TREC is organised in “tracks”, each addressing specific search tasks on a given collection of documents of different genres, ranging from newswire articles to Web pages, email messages and video clips. In TREC 2006, a new Blog track was introduced. It aims to support research in exploring information seeking behaviour in the blogosphere. The Blog track addressed two main search tasks: firstly, the opinion finding task addresses a key feature that distinguishes blog contents from the factual content traditionally used in other TREC search tasks, namely the subjective nature of blogs (Mishne & de Rijke 2006); secondly, the blog distillation task is concerned with the search of blogs rather than blog posts.

This paper draws conclusions from two years of the Blog track in TREC 2006 (Ounis et al. 2007) and TREC 2007 (Macdonald, Ounis, & Soboroff 2008), assessing the progress made so far, and proposing new research directions. The next section describes the TREC Blog track and its corresponding Blogs06 test collection. This is followed by a section that discusses the opinion finding task. We summarise the main effective opinion detection approaches deployed by the participating search engines and provide insights on how well the systems perform across a variety of topic categories. We also report the extent to which spam infiltrates the retrieved blog posts across the topic categories. The following section describes the blog distillation search task. We summarise the main retrieval techniques used for feed search, and also report the performance of the participating search engines and how spam infiltrates the retrieved blogs across a variety of topic categories. The penultimate section discusses the lessons learnt from the first two years of the Blog track, and proposes a set of possible interesting future search tasks. Finally, we conclude with the broader implications of the Blog track.
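As an illustration of the pooling and average precision definitions given in the Introduction, the following is a minimal sketch and not the official TREC evaluation code. The function names, the toy document identifiers, and the choice to divide by the total number of relevant documents (so that a relevant document that is never retrieved contributes a precision of zero) are assumptions made for this example.

```python
from typing import Dict, List, Set


def build_pool(runs: List[List[str]], depth: int = 100) -> Set[str]:
    """Union of the top `depth` documents from each system's ranking for one topic."""
    pool: Set[str] = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears.

    Assumption: the sum is divided by the total number of relevant documents,
    so relevant documents missing from the ranking count as precision 0.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of AP over all judged topics; a topic with no ranking scores 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(topic, []), rel) for topic, rel in qrels.items())
    return total / len(qrels)


# Toy data (hypothetical document and topic identifiers).
toy_run = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["x", "y"]}
toy_qrels = {"t1": {"d1", "d3"}, "t2": {"y"}}
toy_map = mean_average_precision(toy_run, toy_qrels)
```

For the ranking d1, d2, d3, d4 with d1 and d3 relevant, the precision values at the two relevant documents are 1/1 and 2/3, giving an AP of 5/6.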
Blog Track at TREC
Similar to all other TREC tracks, the Blog track aims to act as an incubator of new research work, creating the required infrastructure to facilitate research into the blogosphere. Indeed, since its creation, the TREC Blog track has aimed to define suitable search tasks on the blogosphere. The grassroots (“non-certified”) nature of the blogosphere brings with it a number of challenges. Some are technical, such as spam and widespread duplication. Others are more fundamental, such as the fact that subjectivity is a core aspect of blogs and blog queries (Mishne & de Rijke 2006). As a consequence, the Blog track has investigated how the subjective nature of blogs and blog queries can be incorporated and exploited in the retrieval context. The Blog track addressed the opinion finding task, which is an articulation of a user search task, where the information need could be of an opinion, or perspective-finding nature, rather than fact-finding. The opinion finding task was extended in TREC 2007 with an opinion polarity subtask, where the polarity (or orientation) of the opinions in the retrieved documents must also be returned. Some blog search engines allow users to search for authoritative feeds about a given topic. In TREC 2007, we introduced the blog distillation task, which evaluates systems on how good they are at finding useful and principal blogs relating to a given topic.

As mentioned in the introduction, an important aspect of the TREC paradigm is the creation of a suitable test collection, on which experiments can be conducted and a system’s retrieval performance can be evaluated. As a consequence, the Blog track created the Blogs06¹ test collection. The creation process of the collection and its main features are detailed in (Macdonald & Ounis 2006). The Blogs06 collection represents a large sample crawled from the blogosphere over an eleven-week period from 6th December 2005 until 21st February 2006. The collection is 148GB in size, with three main components consisting of 38.6GB of XML feeds, 88.8GB of permalink documents (i.e. a single blog post and all its associated comments) and 20.8GB of homepages (i.e. the corresponding blog entry page each time the feed was fetched).

¹ http://ir.dcs.gla.ac.uk/test_collections

Over 100,000 blogs were monitored for the eleven-week period, generating 3.2 million permalink documents (posts). The permalink documents are used as the retrieval unit for the opinion finding task and its associated polarity subtask. For the blog distillation task, the feeds are used as the retrieval unit. Table 1 shows the main statistics of the Blogs06 collection. Moreover, in order to ensure that the Blog track experiments are conducted in a realistic and representative setting, the collection also includes a significant portion of spam, non-English documents, and some non-blog documents such as RSS feeds. About 13% of the permalinks in the Blogs06 collection are non-English. In particular, about 6% of the collection is in Asian languages. Table 2 shows the breakdown of language statistics in the Blogs06 collection. It is of note that only English posts are assessed; posts in any other language are deemed non-relevant. Finally, during the creation of the collection, 17,969 presumed spam blogs (known as splogs) and their corresponding 509,137 blog posts were included in the Blogs06 collection to assess the impact of spam on the retrieval performance in such a controlled setting (Macdonald & Ounis 2006).

Quantity                      Value
Number of Unique Blogs        100,649
RSS                           62%
Atom                          38%
First Feed Crawl              06/12/2005
Last Feed Crawl               21/02/2006
Number of Feed Fetches        753,681
Number of Permalinks          3,215,171
Number of Homepages           324,880
Total Compressed Size         25GB
Total Uncompressed Size       148GB
Feeds (Uncompressed)          38.6GB
Permalinks (Uncompressed)     88.8GB
Homepages (Uncompressed)      20.8GB
Table 1: Statistics of the Blogs06 test collection.

Language     Nbr. Permalinks    Percentage (%)
English      2,794,762          86.9
Spanish      64,350             2.0
French       50,852             1.6
German       18,444             0.6
Italian      10,797             0.3
(other)      76,230             2.4
(unknown)    199,736            6.2
Table 2: Breakdown of language statistics of the Blogs06 collection. The languages labelled Unknown correspond almost entirely to Asian languages.

In its first pilot run in TREC 2006, the Blog track comprised the opinion finding task and an open task, which gave participants the opportunity to influence the choice of a suitable second search task for 2007 on other aspects of blogs besides their opinionated nature. TREC 2007 saw the addition of a new main task and a new subtask, namely the blog distillation task and a polarity subtask respectively, along with a second year of the opinion retrieval task. Table 3 provides an overview of the number of participating groups in the track since its inception.

Year    Task                      Participants
2006    Opinion Finding Task      14
        Open Task                 5
2007    Opinion Finding Task      20
        Polarity Subtask          11
        Blog Distillation Task    9
Table 3: Tasks run over the first two years of the TREC Blog track.

In the remainder of this paper, we present in detail the two main tasks that have run at the TREC Blog track. We describe the tasks, as well as the most effective retrieval approaches that the participating groups have deployed. We provide insights on the performance of search engines across a variety of topic categories, as well as how the topic categories were affected by spam.
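The collection figures quoted in this section can be cross-checked with some simple arithmetic. The sketch below copies its numbers from Tables 1 and 2 and from the splog counts given in the text; the variable names are illustrative only, and the splog shares are derived here rather than reported by the track.

```python
# Uncompressed component sizes in GB, as listed in Table 1.
component_gb = {"feeds": 38.6, "permalinks": 88.8, "homepages": 20.8}
total_gb = sum(component_gb.values())  # roughly 148.2, i.e. the reported 148GB

# Permalink counts per language, as listed in Table 2.
total_permalinks = 3_215_171
language_counts = {
    "English": 2_794_762,
    "Spanish": 64_350,
    "French": 50_852,
    "German": 18_444,
    "Italian": 10_797,
    "(other)": 76_230,
    "(unknown)": 199_736,
}

# Percentage of permalinks per language, rounded to one decimal as in Table 2.
language_share = {
    lang: round(100 * count / total_permalinks, 1)
    for lang, count in language_counts.items()
}

# Share of permalinks not identified as English (the "about 13%" in the text).
non_english_share = 100 * (total_permalinks - language_counts["English"]) / total_permalinks

# Derived (not reported) shares of presumed splogs and of their posts.
splog_blog_share = 100 * 17_969 / 100_649
splog_post_share = 100 * 509_137 / total_permalinks
```

The language counts sum exactly to the 3,215,171 permalinks of Table 1, and the rounded shares reproduce the percentage column of Table 2.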
Opinion Finding
Many authors create blogs as a mechanism for self-expression, encouraged by freely accessible blog software, and use them to communicate their opinions and thoughts on any topic of their choice. A study conducted in (Mishne & de Rijke 2006) shows that many queries received by blog search engines seem to be of an opinion, or perspective-finding nature, rather than fact-finding. The opinion finding task is an articulation of an information need that aims to uncover the public sentiment towards a given target entity such as a product, an organisation or a location. A retrieval engine allowing for effective opinion finding might naturally be used as a tool for supporting many business-intelligence tasks such as brand monitoring, consumer-generated views and feedback analysis, and more generally media analysis. It can also help users make an informed choice before buying a given product, attending an entertainment event, or taking a holiday in a given location.

Several commercial blog search engines aim to allow users to find out about the opinions and thoughts of other people, who happily share their thoughts on the blogosphere. These thoughts range from anger at some products, politicians or organisations, to good reviews of products or appraisal of cultural events.

In the Blog track, the opinion retrieval task involved locating blog posts that express an opinion about a given target (Ounis et al. 2007). The target can be a “traditional” named entity, e.g. a name of a person, location, or organisation, but also a concept (such as a type of technology), a product name, or an event. The task can be summarised as “What do people think about X”, X being a target. The topic of the post is not required to be the same as the target. However, for a post to be judged relevant, an opinion about the target had to be present in the post or one of the comments to the post, as identified by the permalink. To create a realistic setting where the topics are actual representations of real information needs, assessors selected queries from a query log of a commercial blog search engine, and expanded them into fully-described topics by making a reasonable interpretation of the query. This process was used to generate 50 topics for the 2006 Blog track and another 50 topics for the 2007 Blog track. Figure 1 shows an example topic.

<top>
<num> Number: 930 </num>
<title> ikea </title>
<desc> Description:
Find opinions on Ikea or its products.
</desc>
<narr> Narrative:
Recommendations to shop at Ikea are relevant opinions.
Recommendations of Ikea products are relevant opinions.
Pictures on an Ikea-related site that are not related
to the store or its products are not relevant.
</narr>
</top>
Figure 1: Blog track 2007, opinion retrieval task, topic 930.

The relevance assessment procedure had two levels (Ounis et al. 2007; Macdonald, Ounis, & Soboroff 2008). The first level assesses whether a given blog post, i.e. a permalink, contains information about the target and is therefore relevant. The second level assesses the opinionated nature of the blog post, if it was deemed relevant in the first assessment level. A workable definition of subjective or opinionated content was used. In particular, a post is assumed to have subjective content if it contains an explicit expression of opinion or sentiment about the target, showing a personal attitude of the writer. Rather than attempting to provide a formal definition, the human assessors were given a number of examples, which illustrated the two levels of assessment. Given a topic and a blog post, assessors were asked to judge the content of the blog post. The following scale was used for the assessment:
0 Not relevant. The post and its comments were examined, and do not contain any information about the target, or refer to it only in passing.
1 Relevant. The post or its comments contain information about the target, but do not express an opinion towards it. To be assessed as “Relevant”, the information given about the target should be substantial enough to be included in a report compiled about this entity.

If the post or its comments are not only on target, but also contain an explicit expression of opinion or sentiment about the target, showing some personal attitude of the writer(s), then the document had to be judged using one of three labels:

2 Negative opinionated. Contains an explicit expression of opinion or sentiment about the target, showing some personal attitude of the writer(s), and the opinion expressed is explicitly negative about, or against, the target.
3 Mixed opinionated. Same as (2), but contains both positive and negative opinions.
4 Positive opinionated. Same as (2), but the opinion expressed is explicitly positive about, or supporting, the target.

Posts that are opinionated, but for which the opinion expressed is ambiguous, mixed, or unclear, were judged simply as “mixed” (3 in the scale).

Following the TREC paradigm described in the previous section, the relevance assessments were formed using the