I'm A Belieber: Social Roles Via Self-Identification and Conceptual Attributes

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 181-186, Baltimore, Maryland, USA, June 23-25 2014. (c) 2014 Association for Computational Linguistics
[Table 1: Example self-identifying tweets per role, e.g. artist: "I'm an Artist..... the last of a dying breed".]

Study 2  In the second study we exploit a complementary signal based on characteristic conceptual attributes of a social role, or concept class (Schubert, 2002; Almuhareb and Poesio, 2004; Paşca and Van Durme, 2008). We identify typical attributes of a given social role by collecting terms in the Google n-gram corpus that occur frequently in a possessive construction with that role. For example, with the role doctor we extract terms matching the simple pattern "doctor's ___".

2 Self-identification

All role-representative users were drawn from the free public 1% sample of the Twitter Firehose, over the period 2011-2013, from the subset that selected English as their native language (85,387,204 unique users). To identify users of a particular role, we performed a case-agnostic search of variants of a single pattern: I am a(n) ___, and I'm a(n) ___, where all single tokens filling the slot were taken as evidence of the author self-reporting for the given "role". Example tweets can be seen in Table 1, examples of frequency per role in Table 2. This resulted in 63,858 unique roles identified, of which 44,260 appeared only once.[1]

Table 2: Number of self-identifying users per "role". While rich in interesting labels, cases such as very highlight the purposeful simplicity of the current approach.

  29,924  little     5,694  man         564  champion
  21,822  big          ...  ...         559  teacher
  18,957  good       4,007  belieber    556  writer
  13,069  huge       3,997  celebrity   556  awful
  13,020  bit        3,737  virgin      ...  ...
  12,816  fan        3,682  pretty      100  cashier
  10,832  bad          ...  ...         100  bro
  10,604  girl       2,915  woman       ...  ...
   9,981  very       2,851  beast        10  linguist
     ...  ...          ...  ...         ...  ...

We manually selected a set of roles for further exploration, aiming for a diverse sample across: occupation (e.g., doctor, teacher), family (mother), disposition (pessimist), religion (christian), and "followers" (belieber, directioner).[2] We filtered users via language ID (Bergsma et al., 2012) to better ensure English content.[3]

For each selected role, we randomly sampled up to 500 unique self-reporting users and then queried Twitter for up to 200 of their recent publicly posted tweets.[4] These tweets served as representative content for that role, with any tweet matching the self-reporting patterns filtered. Three sets of background populations were extracted based on randomly sampling users that self-reported English (post-filtered via LID).

Twitter users are empowered to at any time delete, rename or make private their accounts. Any given user taken to be representative based on a previously posted tweet may no longer be available to query on. As a hint of the sort of user studies one might explore given access to social role prediction, we see in Figure 1 a correlation between self-reported role and the chance of an account still being publicly visible, with roles such as belieber and directioner on the one hand, and doctor and teacher on the other.

[Figure 1: Success rate for querying a user (y-axis: Chance of Success, 0.60-0.80). Random.0,1,2 are background draws from the population, with the mean of those three samples drawn horizontally. Tails capture 95% confidence intervals.]

The authors examined the self-identifying tweet of 20 random users per role. The accuracy of the self-identification pattern varied across roles and is attributable to various factors including quotes, e.g. @StarTrek Jim, I'm a DOCTOR not a download!. While these samples are small (and thus estimates of quality come with wide variance), it is noteworthy that a non-trivial number for each were judged as actually self-identifying.

[1] Future work should consider identifying multi-word role labels (e.g., Doctor Who fan, or dog walker).
[2] Those that follow the music/life of the singer Justin Bieber and the band One Direction, respectively.
[3] This removes users that selected English as their primary language, used a self-identification phrase, e.g. I am a belieber, but otherwise tended to communicate in non-English.
[4] Roughly half of the classes had less than 500 self-reporting users in total; in those cases we used all matches.
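The self-identification search described above can be sketched as a regular expression. This is a minimal illustration, not the authors' implementation: the pattern variants, the function name, and the handling of the captured token are our assumptions.

```python
import re

# Sketch of the case-agnostic self-identification search:
# "I am a(n) <token>" / "I'm a(n) <token>", where the single token
# filling the slot is taken as the candidate "role".
ROLE_PATTERN = re.compile(r"\b(?:i am|i'm) an? (\w+)", re.IGNORECASE)

def extract_role(tweet):
    """Return the self-reported role in a tweet, lowercased, or None."""
    match = ROLE_PATTERN.search(tweet)
    return match.group(1).lower() if match else None
```

For example, `extract_role("I'm an Artist..... the last of a dying breed")` yields `artist`; note that, as the paper's @StarTrek example shows, a match is only weak evidence of genuine self-identification.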
[Figure 2: Valid self-identifying tweets from sample of 20, per role (x-axis: count, 0-15). Roles include writer, woman, waitress, vegetarian, teacher, student, sophomore, soldier, smoker, singer, rapper, poet, pessimist, optimist, nurse, mother, man, lawyer, hipster, grandma, geek.]

Indicative Language  Most work in user classification relies on featurizing language use, most simply through binary indicators recording whether a user did or did not use a particular word in a history of n tweets. To explore whether language provides signal for future work in fine-grained social role prediction, we constructed a set of experiments, one per role, where training and test sets were balanced between users from a random background sample and self-reported users. Baseline accuracy in these experiments was thus 50%.

Each training set had a target of 600 users (300 background, 300 self-identified); for those roles with fewer than 300 self-identifying users, all were used, with an equal number of background users. We used the Jerboa (Van Durme, 2012a) platform to convert data to binary feature vectors over a unigram vocabulary filtered such that the minimum frequency was 5 (across unique users). Training and testing was done with a log-linear model via LibLinear (Fan et al., 2008). We used the positively annotated data to form test sets, balanced with data from the background set. Each test set had a theoretical maximum size of 40, but for several classes it was in the single digits (see Figure 2). Despite the varied noisiness of our simple pattern-bootstrapped training data, and the small size of our annotated test set, we see in Figure 3 that we are able to achieve statistically significant predictions of social role for the majority of our selected examples.

[Figure 3: Accuracy in classifying social roles (y-axis: Accuracy, 0.2-1.0), per role.]

Table 3: Most-positively weighted features per role, along with select features within the top 20 (parenthesized numbers give a feature's rank). Surprising mother features come from ambiguous self-identification, as seen in tweets such as: I'm a mother f!cking starrrrr.

artist: morning, summer, life, most, amp, studio
atheist: fuck, fucking, shit, makes, dead, ..., religion(19)
athlete: lol, game, probably, life, into, ..., team(9)
belieber: justin, justinbeiber, believe, beliebers, bieber
cheerleader: cheer, best, excited, hate, mom, ..., prom(16)
christian: lol, ..., god(12), pray(13), ..., bless(17), ..., jesus(20)
dancer: dance, since, hey, never, been
directioner: harry, d, follow, direction, never, liam, niall
doctor: sweet, oh, or, life, nothing
engineer: (, then, since, may, ), test(9), -(17), =(18)
freshman: summer, homework, na, ..., party(19), school(20)
geek: trying, oh, different, dead, been
grandma: morning, baby, around, night, excited
hipster: fucking, actually, thing, fuck, song
lawyer: did, never, his, may, pretty, law, even, office
man: man, away, ai, young, since
mother: morning, take, fuck, fucking, trying
nurse: lol, been, morning, ..., night(10), nursing(11), shift(13)
optimist: morning, enough, those, everything, never
poet: feel, song, even, say, yo
rapper: fuck, morning, lol, ..., mixtape(8), songs(15)
singer: sing, song, music, lol, never
smoker: fuck, shit, fucking, since, ass, smoke, weed(20)
soldier: ai, beautiful, lol, wan, trying
sophomore: summer, >, ..., school(11), homework(12)
student: anything, summer, morning, since, actually
teacher: teacher, morning, teach, ..., students(7), ..., school(20)
vegetarian: actually, dead, summer, oh, morning
waitress: man, try, goes, hate, fat
woman: lol, into, woman, morning, never
writer: write, story, sweet, very, working

Table 3 highlights examples of language indicative of role, as determined by the most positively weighted unigrams in the classification experiment. These results qualitatively suggest that many roles under consideration may be teased out from a background population by focusing on language that follows expected use patterns: for example, the use of the term game by athletes, studio by artists, mixtape by rappers, or jesus by Christians.
3 Characteristic Attributes

Bergsma and Van Durme (2013) showed that the task of mining attributes for conceptual classes can relate straightforwardly to author attribute prediction. If one views a role, in their case gender, as two conceptual classes, male and female, then existing attribute extraction methods for third-person content (e.g., news articles) can be cheaply used to create a set of bootstrapping features for building classifiers over first-person content (e.g., tweets). For example, if we learn from news corpora that a man may have a wife, then a tweet saying ...my wife... can be taken as potential evidence of membership in the male conceptual class.

In our second study, we test whether this idea extends to our wider set of fine-grained roles. For example, we aimed to discover that a doctor may have a patient, while a hairdresser may have a salon; these properties can be expressed in first-person content as possessives like my patient or my salon. We approached this task by selecting target roles from the first experiment and ranking characteristic attributes for each using pointwise mutual information (PMI) (Church and Hanks, 1990).

First, we counted all terms matching a target social role's possessive pattern (e.g., doctor's ___) in the web-scale n-gram corpus Google V2 (Lin et al., 2010).[5] We ranked the collected terms by computing PMI between classes and attribute terms. Probabilities were estimated from counts of the class-attribute pairs along with counts matching the generic possessive patterns his ___ and her ___, which serve as general background categories. Following suggestions by Bergsma and Van Durme, we manually filtered the ranked list.[6] We removed attributes that were either (a) not nominal, or (b) not indicative of the social role. This left fewer than 30 attribute terms per role, with many roles having fewer than 10.

We next performed a precision test to identify potentially useful attributes in these lists. We examined tweets with a first-person possessive pattern for each attribute term from a small corpus of tweets collected over a single month in 2013, discarding those attribute terms with no positive matches. This precision test is useful regardless of how attribute lists are generated. The attribute term chart, for example, had high PMI with doctor; but a precision test on the phrase my chart yielded a single tweet which referred not to a medical chart but to a top-ten list (prompting removal of this attribute). Using this smaller high-precision set of attribute terms, we collected tweets from the Twitter Firehose over the period 2011-2013.

4 Attribute-based Classification

Attribute terms are less indicative overall than self-ID, e.g., the phrase I'm a barber is a clearer signal than my scissors. We therefore include a role verification step in curating a collection of positively identified users. We use the crowdsourcing platform Mechanical Turk[7] to judge whether the person tweeting held a given role. Tweets were judged 5-way redundantly. Mechanical Turk judges ("Turkers") were presented with a tweet and the prompt: Based on this tweet, would you think this person is a BARBER/HAIRDRESSER? along with four response options: Yes, Maybe, Hard to tell, and No.

We piloted this labeling task on 10 tweets per attribute term over a variety of classes. Each answer was associated with a score (Yes = 1, Maybe = .5, Hard to tell = No = 0) and aggregated across the five judges. We found in development that an aggregate score of 4.0 (out of 5.0) led to an acceptable agreement rate between the Turkers and the experimenters, when the tweets were randomly sampled and judged internally. We found that making conceptual class assignments based on a single tweet was often a subtle task. The results of this labeling study are shown in Figure 4, which gives the percent of tweets per attribute that were 4.0 or above. Attribute terms shown in red were manually discarded as being inaccurate (low on the y-axis) or non-prevalent (small shape).

From the remaining attribute terms, we identified users with tweets scoring 4.0 or better as positive examples of the associated roles. Tweets from those users were scraped via the Twitter API to construct corpora for each role. These were split into train and test, balanced with data from the same background set used in the self-ID study. Test sets were usually of size 40 (20 positive, 20 background), with a few classes being sparse (the smallest had only 16 instances). Results are shown in Figure 5. Several classes in this balanced setup can be predicted with accuracies in the 70-90% range.

[5] In this corpus, follower-type roles like belieber and directioner are not at all prevalent. We therefore focused on occupational and habitual roles (e.g., doctor, smoker).
[6] Evidence from cognitive work on memory-dependent tasks suggests that such relevance-based filtering (recognition) involves less cognitive effort than generating relevant attributes (recall); see (Jacoby et al., 1979). Indeed, this filtering step generally took less than a minute per class.
[7] https://www.mturk.com/mturk/
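The PMI ranking described above can be illustrated with toy counts. All numbers below, including the his/her background totals, are invented for the sketch and are not taken from Google V2; the helper names are ours.

```python
import math
from collections import Counter

# Invented toy counts of possessive n-grams, e.g. c("doctor's patient").
pair_counts = Counter({
    ("doctor", "patient"): 40, ("doctor", "chart"): 25, ("doctor", "car"): 2,
    ("hairdresser", "salon"): 30, ("hairdresser", "car"): 3,
})
# Background attribute counts from the generic patterns "his ___" / "her ___".
attr_counts = Counter({"patient": 45, "chart": 60, "car": 500, "salon": 35})

total = sum(pair_counts.values())
class_counts = Counter()
for (role, _attr), count in pair_counts.items():
    class_counts[role] += count

def pmi(role, attr):
    """PMI(role, attr) = log [ P(role, attr) / (P(role) * P(attr)) ]."""
    p_pair = pair_counts[(role, attr)] / total
    p_role = class_counts[role] / total
    p_attr = attr_counts[attr] / sum(attr_counts.values())
    return math.log(p_pair / (p_role * p_attr))

# Rank a role's candidate attributes by PMI, highest first.
ranked = sorted(
    (a for (r, a) in pair_counts if r == "doctor"),
    key=lambda a: pmi("doctor", a),
    reverse=True,
)
```

With these toy counts, patient outranks car for doctor because car is common under the generic his/her background while patient is concentrated with doctor, which is the intuition behind the paper's background normalization.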
[Figure 4: Turker-judged quality of attributes selected as candidate features for bootstrapping positive instances of the given social role (y-axis: Above Threshold; x-axis: Keyword; legend Keep: FALSE/TRUE). Panels by role: Actor/Actress, Athlete, Barber/Hairdresser, Bartender, Blogger, Cheerleader, Christian, College Student, Dancer, Doctor/Nurse, Drummer, Hunter, Jew, Mom, Musician, Photographer, Professor, Rapper/Songwriter.]

Using the same collection as the previous experiment, we [...] given attribute term. Positive instances were taken to be those with a score of 4.0 or higher, with negative instances taken to be those with scores of 1.0 or lower (strong agreement by judges that the original tweet did not provide evidence of the given role). Classification results are shown in Figure 6.

[Figure 6: Results of positive vs negative by attribute term. Given that a user tweets ...my lines... we are nearly 80% [...]]
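The scoring scheme behind the 4.0 and 1.0 thresholds can be sketched directly. The score map and thresholds come from the paper; the function names and the discard-as-None convention are our assumptions.

```python
# Score map from the labeling scheme: Yes = 1, Maybe = .5,
# Hard to tell = No = 0, aggregated over five redundant judgments.
SCORES = {"Yes": 1.0, "Maybe": 0.5, "Hard to tell": 0.0, "No": 0.0}

def aggregate(judgments):
    """Sum the scores of the (typically five) Turker judgments for a tweet."""
    return sum(SCORES[j] for j in judgments)

def label(judgments, pos=4.0, neg=1.0):
    """'positive' at >= 4.0, 'negative' at <= 1.0, else None (discarded)."""
    score = aggregate(judgments)
    if score >= pos:
        return "positive"
    if score <= neg:
        return "negative"
    return None  # ambiguous middle band; not used as an instance
```

For example, four Yes votes and one Maybe aggregate to 4.5 and yield a positive instance, while five Maybe votes (2.5) fall in the discarded middle band.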
[Figure 5: Per-role accuracy (y-axis: Accuracy, 0.5-0.8). Roles include mom, soldier, actor, barber, doctor, waiter, writer, reporter, blogger, smoker, student, athlete, musician, drummer, professor, christian, cheerleader.]

5 Conclusion

We have shown that Twitter contains sufficiently robust signal to support more fine-grained author attribute prediction tasks than have previously been attempted. Our results are based on simple, intuitive search patterns with minimal additional filtering: this establishes the feasibility of the task, but leaves wide room for future work, both in the [...]
References

Abdulrahman Almuhareb and Massimo Poesio. 2004. Attribute-based and value-based clustering: an evaluation. In Proceedings of EMNLP.

Shane Bergsma and Benjamin Van Durme. 2013. Using Conceptual Class Attributes to Characterize Social Media Users. In Proceedings of ACL.

Shane Bergsma, Paul McNamee, Mossaab Bagdouri, Clay Fink, and Theresa Wilson. 2012. Language identification for creating language-specific twitter collections. In Proceedings of the NAACL Workshop on Language and Social Media.

John D. Burger, John Henderson, George Kim, and Guido Zarrella. 2011. Discriminating gender on twitter. In Proceedings of EMNLP.

Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22-29.

Michael Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In ICWSM.

Jacob Eisenstein, Brendan O'Connor, Noah Smith, and Eric P. Xing. 2010. A latent variable model of geographical lexical variation. In Proceedings of EMNLP.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, (9).

Nikesh Garera and David Yarowsky. 2009. Modeling latent biographic attributes in conversational genres. In Proceedings of ACL.

Larry L. Jacoby, Fergus I. M. Craik, and Ian Begg. 1979. Effects of decision difficulty on recognition and recall. Journal of Verbal Learning and Verbal Behavior, 18(5):585-600.

Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences.

Alex Lamb, Michael J. Paul, and Mark Dredze. 2013. Separating fact from fear: Tracking flu infections on twitter. In Proceedings of NAACL.

Saif M. Mohammad, Svetlana Kiritchenko, and Joel Martin. 2013. Identifying purpose behind electoral tweets. In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, WISDOM '13, pages 1-9.

Dong Nguyen, Noah A. Smith, and Carolyn P. Rosé. 2011. Author age prediction from text using linear regression. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 115-123. Association for Computational Linguistics.

Marius Paşca and Benjamin Van Durme. 2008. Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs. In Proceedings of ACL.

Michael J. Paul and Mark Dredze. 2011. You are what you tweet: Analyzing twitter for public health. In ICWSM.

Marco Pennacchiotti and Ana-Maria Popescu. 2011. Democrats, Republicans and Starbucks afficionados: User classification in Twitter. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 430-438. ACM.

Delip Rao, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta. 2010. Classifying latent user attributes in twitter. In Proceedings of the Workshop on Search and Mining User-generated Contents (SMUC).

Lenhart K. Schubert. 2002. Can we derive general world knowledge from texts? In Proceedings of HLT.

Benjamin Van Durme. 2012a. Jerboa: A toolkit for randomized and streaming algorithms. Technical Report 7, Human Language Technology Center of Excellence, Johns Hopkins University.

Benjamin Van Durme. 2012b. Streaming analysis of discourse participants. In Proceedings of EMNLP.

Jennifer Wortman. 2008. Viral marketing and the diffusion of trends on social networks. Technical Report MS-CIS-08-19, University of Pennsylvania, May.

Faiyaz Al Zamal, Wendy Liu, and Derek Ruths. 2012. Homophily and latent attribute inference: Inferring latent attributes of Twitter users from neighbors. In Proceedings of ICWSM.