Crowdsourcing syntactic relatedness judgements for opinion mining in the study of information technology adoption
Asad B. Sayeed, Bryan Rusk, Martin Petrov, Hieu C. Nguyen, Timothy J. Meyer
Department of Computer Science
University of Maryland
College Park, MD 20742 USA
asayeed@cs.umd.edu, brusk@umd.edu, martin@martinpetrov.com, {hcnguyen88, tmeyer88}@gmail.com
Amy Weinberg
Center for the Advanced Study of Language and Department of Linguistics
University of Maryland
College Park, MD 20742 USA
aweinberg@casl.umd.edu
Abstract
We present an end-to-end pipeline, including a user interface, for the production of word-level annotations for an opinion-mining task in the information technology (IT) domain. Our pre-annotation pipeline selects candidate sentences for annotation using results from a small amount of trained annotation to bias the random selection over a large corpus. Our user interface reduces the need for the user to understand the "meaning" of opinion in our domain context, which is related to community reaction. It acts as a preliminary buffer against low-quality annotators. Finally, our post-annotation pipeline aggregates responses and applies a more aggressive quality filter. We present positive results using two different evaluation philosophies and discuss how our design decisions enabled the collection of high-quality annotations under subjective and fine-grained conditions.
1 Introduction
Crowdsourcing permits us to use a bank of anonymous workers with unknown skill levels to perform complex tasks, given a simple breakdown of these tasks and a user interface design that hides the full task complexity. Use of these techniques is growing in the areas of computational linguistics and information retrieval, particularly since these fields now rely on the collection of large datasets for use in machine learning. Considering the variety of applications, a variety of datasets is needed, but trained, known workers are in principle an expense that must be furnished for each one. Consequently, crowdsourcing offers a way to collect this data cheaply and quickly (Snow et al., 2008; Sayeed et al., 2010a).

We applied crowdsourcing to perform the fine-grained annotation of a domain-specific corpus. Our user interface design and our annotator quality control process allow these anonymous workers to perform a highly subjective task in a manner that correlates their collective understanding of the task with our own expert judgements about it. The path to success provides some illustration of the pitfalls inherent in opinion annotation. Our task is domain- and application-specific sentiment classification at the sub-sentence level, specifically at the word level.
1.1 Opinions
For our purposes, we define opinion mining (sometimes known as sentiment analysis) to be the retrieval of a triple {source, target, opinion} (Sayeed et al., 2010b; Pang and Lee, 2008; Kim and Hovy, 2006), in which the source is the entity that originated the opinionated language, the target is a mention of the entity or concept that is the opinion's topic, and the opinion is a value (possibly a structure) that reflects some kind of emotional orientation expressed by the source towards the target.

In much of the recent literature on automatic opinion mining, opinion is at best a gradient between positive and negative, or a binary classification thereof; further complexity affects the reliability of machine-learning techniques (Koppel and Schler, 2006).

We call opinion mining "fine-grained" when we are attempting to retrieve potentially many different {source, target, opinion} triples per document. This is particularly challenging when there are multiple triples even at a sentence level.
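As a minimal illustration of this representation (the Python field names and types below are assumptions made for the sketch, not a data model from the paper):

```python
from dataclasses import dataclass
from typing import Optional

# A sketch of the {source, target, opinion} triple defined above.
@dataclass
class OpinionTriple:
    source: str                        # entity that originated the opinionated language
    target: str                        # within-sentence mention of the entity or concept
    opinion: str                       # e.g. "positive", "negative", or a richer structure
    sentence_id: Optional[str] = None  # provenance, useful for fine-grained mining

# Fine-grained opinion mining may yield several triples per document,
# sometimes several within a single sentence:
triples = [
    OpinionTriple(source="columnist", target="enterprise resource planning", opinion="negative"),
    OpinionTriple(source="columnist", target="customer relationship management", opinion="positive"),
]
```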
1.2 Corpus-based social science
Our work is part of a larger collaboration with social scientists to study the diffusion of information technology (IT) innovations through society by identifying opinion leaders and IT-relevant opinionated language (Rogers, 2003). A key hypothesis is that the language used by opinion leaders causes groups of others to encourage the spread of the given IT concept in the market.

Since the goal of our exercise is to ascertain the correlation between the source's behaviour and that of others, it may be more appropriate to look at opinion analysis with the view that what we are attempting to discover are the views of an aggregate reader who may otherwise have an interest in the IT concept in question. We thus define an expression of opinion in the following manner:

A expresses opinion about B if an interested third party's actions towards B may be affected by A's textually recorded actions, in a context where actions have positive or negative weight.

This perspective runs counter to a widespread view (Ruppenhofer et al., 2008) which has assumed a treatment of opinionated language as an observation of a latent "private state" held by the source. This definition reflects the relationship of sentiment and opinion with the study of social impact and market prediction. We return to the question of how to define opinion in section 6.2.
1.3 Crowdsourcing in sentiment analysis
Paid crowdsourcing is a relatively new trend in computational linguistics. Work exists at the paragraph and document level, and it exists for the Twitter and blog genres (Hsueh et al., 2009).

A key problem in crowdsourcing sentiment analysis is the matter of quality control. A crowdsourced opinion mining task is an attempt to use untrained annotators on a task that is inherently very subjective. It is doubly difficult for specialized domains, since crowdsourcing platforms have no way of directly recruiting domain experts.

Hsueh et al. (2009) present results in quality control over snippets of political blog posts in a task classifying them by sentiment and political alignment. They find that they can use a measurement of annotator noise to eliminate low-quality annotations at this coarse level by reweighting snippet ambiguity scores with noise scores. We demonstrate that we can use a similar annotator quality measure alone to eliminate low-quality annotations on a much finer-grained task.
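To make the quality-control idea concrete, the following sketch filters out annotators whose agreement with the per-item majority label falls below a threshold. The scoring rule and data layout are assumptions for illustration; they are not the noise measure of Hsueh et al. (2009) or the filter described later in this paper.

```python
from collections import Counter, defaultdict

def filter_annotators(annotations, min_agreement=0.7):
    """annotations: iterable of (worker_id, item_id, label) tuples."""
    by_item = defaultdict(list)
    for worker, item, label in annotations:
        by_item[item].append((worker, label))

    # Majority label per item.
    majority = {
        item: Counter(lbl for _, lbl in votes).most_common(1)[0][0]
        for item, votes in by_item.items()
    }

    # Each worker's agreement rate with the majority.
    hits, totals = Counter(), Counter()
    for worker, item, label in annotations:
        totals[worker] += 1
        hits[worker] += int(label == majority[item])

    kept = {w for w in totals if hits[w] / totals[w] >= min_agreement}
    return [a for a in annotations if a[0] in kept]
```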
1.4 Syntactic relatedness
We have a downstream application for this annotation task which involves acquiring patterns in the distribution of opinion-bearing words and targets using machine learning (ML) techniques. In particular, we want to acquire the syntactic relationships between opinion-bearing words and within-sentence targets. Supervised ML techniques require gold standard data annotated in advance.

The Multi-Perspective Question-Answering (MPQA) newswire corpus (Wilson and Wiebe, 2005) and the J. D. Power & Associates (JDPA) automotive review blog post corpus (Kessler et al., 2010) are appropriate because both contain sub-sentence annotations of sentiment-bearing language as text spans. In some cases, they also include links to within-sentence targets. This is an example of an MPQA annotation:

That was the moment at which the fabric of compassion tore, and worlds cracked apart; when **the contrast and conflict of civilisational values** became so great as to *remove any sense of common ground* - even on which to do battle.

The italicized portion is intended to reflect a negative sentiment about the bolded portion. However, while it is the case that the whole italicized phrase represents a negative sentiment, "remove" appears to represent far more of the negativity than "common" and "ground". While there are techniques that depend on access to entire phrases, our project is to identify sentiment spans at the length of a single word.
2 Data source
Our corpus for this task is a collection of articles from the IT professional magazine Information Week, from the years 1991 to 2008. This consists of 33K articles of varying lengths, including news bulletins, full-length magazine features, and opinion columns. We obtained the articles via an institutional subscription and reformatted them in XML.[1]

Certain IT concepts are particularly significant in the context of the social science application. Our target list consists of 59 IT innovations and concepts. The list includes plurals, common variations, and abbreviations. Examples of IT concepts include "enterprise resource planning" and "customer relationship management". To avoid introducing confounding factors into our results, we only include explicit mentions and omit pronominal coreference.
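For illustration only, a simple way to keep sentences with explicit mentions of listed concepts is shown below; the matching rules and the sample list entries are assumptions, not the exact procedure or full 59-item list used here.

```python
import re

# A few sample entries; the real target list has 59 IT concepts
# plus plurals, common variations, and abbreviations.
IT_CONCEPTS = [
    "enterprise resource planning", "ERP",
    "customer relationship management", "CRM",
]

def explicit_mentions(sentence, concepts=IT_CONCEPTS):
    """Return the concepts explicitly mentioned in a sentence.

    Pronominal coreference (e.g. "it") is deliberately ignored,
    matching the decision to keep only explicit mentions.
    """
    found = []
    for concept in concepts:
        if re.search(r"\b" + re.escape(concept) + r"\b", sentence, re.IGNORECASE):
            found.append(concept)
    return found

# explicit_mentions("The firm adopted enterprise resource planning in 2003.")
# -> ["enterprise resource planning"]
```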
3 User interface
Our user interface (figure 1) uses a drag-and-drop process through which workers make decisions about whether particular highlighted words within a given sentence reflect an opinion about a particular mentioned IT concept or innovation. The user is presented with a sentence from the corpus surrounded by some before and after context. Underneath the text are four boxes: "No effect on opinion" (none), "Affects opinion positively" (positive), "Affects opinion negatively" (negative), and "Can't tell" (ambiguous).

The worker must drag each highlighted word in the sentence into one of the boxes, as appropriate. If the worker cannot determine the appropriate box for a particular word, she is expected to drag it to the ambiguous box. The worker is presented with detailed instructions which also remind her that most of the words in the sentence are not actually likely to be involved in the expression of an opinion about the relevant IT concept.[2] The worker is not permitted to submit the task without dragging all of the highlighted words to one of the boxes. When a word is dragged to a box, the word in context changes colour; the worker can change her mind by clicking an X next to the word in the box.
We used CrowdFlower to manage the task, with Amazon Mechanical Turk as its distribution channel. We set CrowdFlower to present three sentences at a time to users. Only users with USA-based IP addresses were permitted to perform the final task.

[1] We will likely be able to provide a sample of sentence data annotated by our process as a resource once we work out documentation and distribution issues.
[2] We discovered when testing the interface that workers can feel obliged to find an opinion about the selected IT concept. We reduced this by explicitly reminding them that most words do not express a relevant opinion and by placing the none box first.
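As a sketch of what each completed unit of work might yield downstream, one per-word response per worker could be recorded as below. The record layout and field names are assumptions for illustration; they are not CrowdFlower's schema or the format used in this study.

```python
from dataclasses import dataclass

# The four boxes from the interface described above.
LABELS = {"none", "positive", "negative", "ambiguous"}

@dataclass
class WordJudgement:
    worker_id: str    # anonymous crowd worker
    sentence_id: str  # sentence shown with surrounding context
    target: str       # the mentioned IT concept, e.g. "enterprise resource planning"
    word: str         # the highlighted word the worker dragged
    label: str        # one of LABELS

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
```

Aggregating such records per (sentence, word) pair is then a natural input for response aggregation and quality filtering of the kind described in the abstract.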
4 Procedure
In this section, we discuss the data processing pipeline (figure 3) through which we select candidates for annotation and the crowdsourcing interface we present to the end user for classifying individual words into categories that reflect the effect of the word on the worker.
4.1 Data preparation
4.1.1 Initial annotation
Two social science undergraduate students were hired to do annotations on Information Week, with the original intention of doing all the annotations this way. There was a training period where they annotated about 60 documents in sets of 20 in iterative consultation with one of the authors. Then they were given 142 documents to annotate simultaneously in order to assess their agreement after training.

Annotation was performed in Atlas.ti, an annotation tool popular with social science researchers. It was chosen for its familiarity to the social scientists involved in our project and because of their stated preference for using tools that would allow them to share annotations with colleagues. Atlas.ti has limitations, including the inability to create hierarchical annotations. We overcame these limitations using a special notation to connect related annotations. An annotator highlights a sentence that she believes contains an opinion about a mentioned target on one of the lists. She then highlights the mention of the target and, furthermore, highlights the individual words that express the opinion about the target, using the notation to connect related highlights.
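Since Atlas.ti cannot nest annotations, the connected highlights can be thought of as a small linked structure. The sketch below is only an illustration of that idea; it is not the actual notation or tooling the annotators used, and the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets within a document

@dataclass
class LinkedAnnotation:
    doc_id: str
    sentence_span: Span                # sentence judged to contain an opinion
    target_span: Span                  # mention of the IT concept
    opinion_word_spans: List[Span] = field(default_factory=list)  # individual opinion-bearing words
```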
4.1.2 Candidate selection
While the use of trained annotators did not produce reliable results (section 6.2) in acceptable time frames, we decided to use the annotations in a process for selecting candidate sentences for crowdsourcing. All 219 sentences that the annotators selected as having opinions about within-sentence IT
