
Ostrich Sensitive Text Categorization

Last updated: 2022-03-30

Contents
 Summary of Task
 Well-being
 Workflow
 Labels & Attributes Taxonomy
 Annotation Rules, Descriptions, and Examples
 Appendix
 Annotation Examples 

Summary of Task
Thank you for participating in the Ostrich Sensitive Text Categorization project.
The goal of this project is to categorize pieces of text that may or may not contain harmful or
sensitive content. With your input, we want to help our customer train machine learning
algorithms to detect and remove harmful content on the internet.
This project involves harmful and sensitive content. That includes speech and text with
harassing, hateful, sexual, violent, and self-harm content, including graphic depictions. We are
also asking you to flag Child Sexual Abuse Material (CSAM) if you see it.
Summary of the Task

Each task contains some text.
1. Read the text.
2. Answer all the questions in the right window.

Example: (task screenshot)

Well-Being
Your well-being is very important to us. You may at any time opt out of the project via the
Instructions page. You will also have access to counseling and daily well-being check-ins
throughout the course of the project.
Workflow
1. Read the text in the task.
2. Go to the right window.
3. Read the Section 1 question.
4. Complete the translation task (if required).
5. Review taxonomy, examples, and edge cases to find the best category.
6. Answer the question.
7. Move to the next step.

 Important: DO NOT click any links in the task text.

Section 1: Brief explanation

Is the passage in a foreign language?
Click No if it's in English.
Click Yes, and use Google Translate to translate the entire passage into English, if it's not in
English.

Section 2: Brief explanation

Is it gibberish or nonsensical text?
Click Yes if the passage is in symbols, or if Google Translate does not return readable English
text.
Click No if the text is readable in English, or if Google Translate renders it into readable
English.

Section 3: Brief explanation

Does this passage contain sexual or non-erotic sexual content?
Click Non-Erotic Sexual Content if the passage refers to sexual content in a non-erotic and
non-explicit way.
Click None of the above if the content does not contain references to sexual and/or erotic
content.
Click Sexual Content and select the most severe option that fits the passage from the list
below.

Section 4: Brief explanation

Select Contextualized hate speech or harassment when the passage contains language that
would qualify as hate speech if taken out of context.
Select Neutral/Factual statement referring to group identity when the passage refers to a
specific group of people in a negative way but is rationalized by facts.
Select None of the above when the passage does not contain hate speech.
Select Hateful Content if the passage contains hate speech, and select all the options that
apply to the passage.
Section 5: Brief explanation

Select Yes if the text supports harassment and/or creates uncomfortable situations for the
target individual in real life.
Select No if the text does not support harassment.

Section 6: Brief explanation

Select Neutral depiction of or contextualized violence or self-harm if the passage describes
acts of violence or self-harm without promoting them or depicting them graphically. Examples
are newspaper articles about violent acts.
Select None of the above when the passage does not contain violence or self-harm content.
Select Contains Violence or Self-Harm if the passage contains and/or promotes violence or
self-harm content.

Labels & Attributes Taxonomy


You will see some text. You should select all applicable labels. The labels are as follows:

[C] Sexual Content (has sub-categories)
  [C4] Sexual content involving minors (most severe, need to flag)
  [C3] Illegal sexual content
  [C2] Erotic sexual content
  [C1] Non-erotic sexual content (least severe)

[H] Hate Speech (has sub-categories)
  [H1.a] Call for violence / threatening
  [H1.b] Other kinds of hate (derogatory, stereotypes, etc.)
  [H1.c] Target (nationality, race, gender, etc.)
  [H0.a] Neutral / factual statement referring to group identity
  [H0.b] Contextualized hate speech or harassment

[V] Violence (has sub-categories)
  [V3] Extremely graphic violence
  [V1] Threats or support for violence
  [N1] Neutral depiction of or contextualized violence or self-harm

[HR] Harassment
[SH] Self-Harm

If a text does not fall into any of the above categories, select from the following:

[X] Invalid
[N0] None of the Above
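For downstream tooling, the taxonomy above is essentially a set of label groups, each with an
internal severity ordering. The Python sketch below is one possible machine-readable
encoding; the names (TAXONOMY, most_severe) and the structure are our own illustration, not
part of the customer's schema. Note that [H1.c] is an optional target tag rather than a severity
level, so it is kept out of the ordered lists.

    # Hypothetical encoding of the label taxonomy above.
    # Each group maps to its sub-labels, ordered from most to least severe.
    TAXONOMY = {
        "C": ["C4", "C3", "C2", "C1"],          # Sexual Content
        "H": ["H1.a", "H1.b", "H0.a", "H0.b"],  # Hate Speech (H1.c is an optional tag)
        "V": ["V3", "V1", "N1"],                # Violence
        "HR": ["HR"],                           # Harassment
        "SH": ["SH"],                           # Self-Harm
    }
    FALLBACK = ["X", "N0"]                      # Invalid / None of the Above

    def most_severe(selected, group):
        """Return the most severe label of `group` present in `selected`,
        mirroring the 'highest severity' rule in Q3 and Q6."""
        for label in TAXONOMY[group]:
            if label in selected:
                return label
        return None

    # A passage tagged both C1 and C3 is reported at severity C3.
    assert most_severe({"C1", "C3"}, "C") == "C3"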

Annotation Rules, Descriptions, and Examples

Q1: Foreign Language


Is the passage in a foreign language?

1. Determine whether the text is in a foreign language (yes/no).
   1. A foreign language is considered any language that is not English.
   2. Please use Google Translate to translate the text to English and then label accordingly.
2. Proceed to Q2.

Q2: [X] Gibberish / Nonsensical Text

Does the passage contain gibberish or nonsensical text?

1. Determine if the text is gibberish or nonsensical (yes/no).
   1. Check YES if:
      1. It is not in any natural language identifiable on Google Translate.
      2. The text is a string of characters (numbers and symbols) that alone do not have
         meaning (e.g. 1@@@@@@@@@).
   2. Check NO for the edge cases in the table:
      1. If the text is not understandable overall but has readable words, categorize it
         according to those words.

Gibberish examples

  The text is a mix of numbers, letters and symbols that does not make sense:
    @66t95Po
    1*****#######
    Sgdsln;gsnr
    #77ff00

Not gibberish examples

  Numbers alone: 123456
  Phone numbers: 7222789; (55) 66117698
  Passages that are math representations, even if they include letters: 2 + 3 = 5; x+y=z;
  2x+3y = 0
  Code/scripts related to a possible computer language: {print('hello')}
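The "not gibberish" edge cases above are mechanical enough that they can be pre-checked
automatically. The Python sketch below is a rough, hypothetical pre-filter built only from the
table's whitelist; the function name and patterns are our own assumption, and anything it does
not whitelist still needs the human/Google Translate check described in Q2.

    import re

    def q2_prefilter(text):
        """Apply the 'Check NO' whitelist from the Q2 table.
        Returns 'not_gibberish' for numbers, phone numbers, math,
        and code-like text; otherwise 'needs_review', meaning the
        annotator should apply the Google Translate check."""
        s = text.strip()
        # Numbers alone and phone numbers, e.g. 123456 or (55) 66117698.
        if re.fullmatch(r"[\d\s()+\-]+", s):
            return "not_gibberish"
        # Math representations, even with letters, e.g. 2x+3y = 0.
        if "=" in s and re.fullmatch(r"[\w\s+\-*/^=.]+", s):
            return "not_gibberish"
        # Code-like call syntax, e.g. print('hello').
        if re.search(r"\w+\s*\(.*\)", s):
            return "not_gibberish"
        return "needs_review"

    # Examples from the table:
    assert q2_prefilter("(55) 66117698") == "not_gibberish"
    assert q2_prefilter("2x+3y = 0") == "not_gibberish"
    assert q2_prefilter("{print('hello')}") == "not_gibberish"
    assert q2_prefilter("1*****#######") == "needs_review"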


Q3: [C] Sexual Content

Does this passage include sexual content?

1. Determine whether the text contains sexual content (yes/no).
2. If yes:
   1. Determine the highest severity of the content displayed (see the table below for
      definitions and examples):
      1. [C4] Sexual content involving minors (most severe, need to flag)
      2. [C3] Illegal sexual content
      3. [C2] Erotic sexual content
      4. [C1] Non-erotic sexual content (least severe)
   2. Proceed to Q4.
3. If no:
   1. Proceed to Q4.

Note on C4/C3 additional labels: Content depicting physical harm to one party in a
non-consensual sexual scenario, or non-consensual acts such as rape and pedophilia, should
be considered sexualized violence. It should therefore also be tagged with a label from the
[V] Violence category.
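The note above amounts to a co-occurrence rule between the [C] and [V] categories. Below is
a hypothetical consistency check (the function is our own sketch; deciding whether a C3
scenario is actually non-consensual remains a human judgment).

    def check_sexualized_violence(labels):
        """Warn when C4/C3 appears without any [V] Violence label,
        per the note on non-consensual sexual scenarios above.
        `labels` is the set of selected label strings."""
        if {"C4", "C3"} & labels and not {"V3", "V1", "N1"} & labels:
            return ["C4/C3 selected: if the scenario is non-consensual, "
                    "also tag a label from the [V] Violence category."]
        return []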

[C] Sexual Content

[C4] Sexual content involving minors (most severe, need to flag)

Definition:
  Sexual Content Involving Minors (SCIM), or child pornography, is any representation of a
  child engaged in real or simulated sexual activities, or any representation of the sexual
  parts of a child for primarily sexual purposes.[1]
  The content depicts a child (under 18 years old) engaged in explicit sexual activity.
  [TODO: Escalation path]

Examples / keywords / key notes:
  Implicitly mentioning a kid, little boy, little girl, young boy, young girl, daughter, son,
  teacher/student, nephew/aunt, or niece/uncle in sexual content should be considered C4
  unless the text makes it explicit that both parties are above age.
  If the text says "teen" but gives no age, we assume the person is under 18, and therefore C4.
  "Boy" or "girl" alone (without the "young" descriptor) is not necessarily C4.

[C3] Illegal sexual content

Definition:
  Erotic sexual content that DOES depict activities which could be illegal if they happened in
  real life.
  For the purposes of this task, prostitution, escorting, and other consensual selling of sexual
  services are not considered illegal. Prostitution IS NOT illegal unless it involves sexual
  trafficking.

Examples:
  Incest, bestiality
  Rape, violence
  Non-consensual sexual activity
  Sexual trafficking
  Sexual slavery

[C2] Erotic sexual content

Definition:
  Erotic sexual content that DOES NOT depict activities which would be illegal if they
  happened in real life.
  This is the broadest category of erotic or pornographic sexual content, referring to sexual
  content without deliberately obscuring or censoring it. It covers the majority of erotic sexual
  content in existing erotica on the Internet today.

Examples:
  Explicit sexual actions with another party, e.g. describing mouth/genitals approaching
  contact with another person's genitals/anus, for the purpose of sexual arousal or
  stimulation, even if the contact area is clothed.
  Explicit sexual actions with oneself, e.g. describing stimulation of one's genitals with hands,
  sex toys, or other objects, even if the contact area is clothed.
  Squeezing or stimulating body parts, with sexual intent, that are commonly understood to be
  erogenous zones, such as breasts, buttocks, and the neck.
  Fetish content which describes or is meant to arouse sexual interest, such as involving
  bodily fluids (e.g., urine, feces), other body parts (e.g., feet), or situations of sexual control
  (e.g., BDSM dynamics, master-slave).
  Conveying sexual interest in graphic, evocative terms, e.g. describing anatomy, physical
  positioning, or other details meant to contribute to sexual arousal.
  Sex chat described in graphic, evocative terms.
  Describing people in graphic, evocative terms, e.g. describing anatomy, clothing, or other
  details meant to contribute to sexual arousal.
  Describing by-products of sexual activity, e.g. describing a state of sexual arousal, such as
  "wetness" or "hardness," or of sexual fluids.
  Requests / instructions for the writing of graphic content.

[C1] Non-erotic sexual content

Definition:
  Any sexual content that is not erotic in nature. It covers both non-pornographic sexualized
  content (e.g. of literary or artistic value) and contextualized sexual content (e.g. medical).
  Non-erotic romantic scene descriptions (e.g. kissing, hugging) should be counted as C1.
  Note: Categorization / classification tasks that involve graphic content should be considered
  contextualized and thus non-erotic sexual content.

Examples:
  Description of the human body in a medical use case.
  Sex education.
  Non-erotic implicit sexualized romance in literature, etc.
  Non-erotic romantic scene description (e.g. hugging, kissing).
  Non-erotic and implied sexual scenes.
  Implication of "off-screen" sexual activity in non-graphic terms, meaning the context might
  provide an understanding of a situation with sexual activity involved, but it is not explicitly
  said.
  Conveying sexual interest in plain terms.
  Fictional characters whose "offscreen" career is in sexual services, as long as the
  description is non-graphic.

Q4: [H] Hateful Content

Does this passage contain hateful content?

1. Determine if the text contains hateful content or hate speech (yes/no).
   1. Hate speech, for the purpose of this taxonomy, is demeaning, threatening, insulting,
      derogatory, or otherwise abusive content targeting specific chosen groups, or members
      of those groups, because of their membership in the group. The chosen groups
      currently are:
      1. Race, Gender, Ethnicity, Religion, Nationality, Sexual Orientation, Disability Status,
         Caste (based on prior literature and protected classes)
   2. If the text is making reference to a NON-protected group, it is not hate speech; it could
      be Harassment if it supports harm to an individual or group.
      1. Examples of non-protected groups: socioeconomic status, political preference,
         hobby preferences.
2. If yes:
   1. Determine the type of hate speech displayed (see the table below for definitions and
      examples). SELECT ALL THAT APPLY and, if applicable, select an option for the H1.c
      Target tag:
      1. [H1.a] Call for violence / threatening[2]
      2. [H1.b] Other kinds of hate (derogatory stereotypes, etc.)
      3. [H0.a] Neutral / factual statements referring to group identity
      4. [H0.b] Contextualized hate speech or harassment
      5. [H1.c; optional tag] Target (non-exclusive)
         1. Ethnicity, Gender, Religion, Nationality, Sexual Orientation, Ability Status, Race,
            Caste, Other, Neither
   2. Proceed to Q5.
3. If no:
   1. Proceed to Q5.

[H] Hateful Content

[H1.a] Call for violence / threatening

Definition:
  The passage contains hateful content towards a protected group.
  Any language that is either threatening or inciting violence against a chosen group, or
  wishing them harm.
  This is the severest category of hate speech.
  Whenever H1.a is selected, also select Violence and the appropriate degree; this tag always
  goes together with a Violence category.

Examples:
  "Throwing stones to paraplegic kids is my favorite hobby."
  "Really, I hope she get raped in the future. Raping women should be legal I think"
[H1.b] Other kinds of hate (derogatory stereotypes, etc.)

Any language that is humiliating, “Islam is a religion of hate. These propaganda


demeaning or otherwise insulting a chosen type videos are dishonest. Simply read the
group. Quran and Hadith.”
It contains derogatory stereotypes, includes “Women are our pb?
support for hateful entities etc. roperty. The sooner they realize that, the
better.”
“Go bleach yourself black pussy doll and then
come talk to me”

[H1.c; optional tag] Target (non-exclusive)

Definition:
  Additional sub-labels to contextualize the hate speech text.
  If the text is making reference to a NON-protected group, it is not hate speech; it could be
  Harassment if it supports actions that damage an individual or group.

Examples:
  Race: White, Black, Asian, Hispanic, Native American
  Gender: Male, Female, Transgender
  Sexual Orientation: Heterosexual, Homosexual, Gay, Lesbian, Queer
  Religion: Islam, Christianity, Hinduism, Buddhism, Judaism
  Age: Elderly, Younger
  Ability Status: Disabled
  Nationality

[H0.a] Neutral / factual statements referring to group identity

Definition:
  The statement must refer to a specific chosen identity group in order to qualify here.
  Some of these factual statements could be referring to group identity in a negative light.
  Unless the factually phrased statement is clearly asserting something very derogatory, it
  should fall here regardless of truthfulness.

Example:
  "African-Americans have been observed to perform worse on IQ tests than other
  demographic groups and this is because IQ tests are themselves biased"

[H0.b] Contextualized hate speech or harassment

Definition:
  Contains hate speech, but as part of a passage that is explaining or relating a whole
  situation that is not hate content by itself.
  One way to think about contextualized hate is to ask whether there are some sentences or
  sub-sentences in the text that would qualify as hate speech if they were taken out of
  context.
  Things such as old laws (if the old law is hateful), news stories that include quotes
  containing hateful speech, fictional stories with instances of characters saying hateful
  things, etc. could also qualify as contextualized hate.
  Note: When hateful content targets an individual (i.e. harassment), but it is contextualized,
  we label it as H0.b as well.

Examples:
  "I don't care if he's Black white green Or Purple. I don't support Police Shooting people
  UNLESS there's an extreme situation  White Lives DON'T MATTER."
  [target: black | targeting type: race | violence: yes | insult: yes | rejection: yes |
  accusation: yes]
  ---
  Text: When Sir Syed and his <PersonType> were arrested, he said 'they can kill me but they
  cannot stop Islam'
  Category: Offensive

Q5: [HR] Harassment

Does this content support harassment?

1. Determine whether the text contains harassing material or supports harassment (yes/no).
   1. Harassing material or support for harassment is content that may be used to torment or
      annoy individuals or groups in real life, to harm them directly or indirectly, or to make
      harassment (including sexual harassment) more likely to occur, by encouraging
      harassment, making it easier to achieve, or endorsing it, even in the context of fiction.
   2. Insulting someone is classified as HR (e.g. "you're too stupid to comprehend this" or
      "you're fat and ugly"). If the insult is targeted at a protected class, then also select H1.b
      (e.g. "you're just a greedy slimy jew like all the rest of you" is both H1.b and HR).
   3. Notes:
      1. If a derogatory statement towards an individual is based on their group identity (e.g.
         gender, race, religion), it should be counted as both [H1.b] Other kinds of hate and
         [HR] Harassment.
      2. Contextualized harassment content (e.g. describing harassment activity without
         supporting it or harming others) should be labeled as [H0.b] Contextualized hate
         speech or harassment.
   4. Examples:
      1. Content that creates uncomfortable situations for a real-world individual by
         badgering, tormenting or annoying said person.
      2. Content that threatens a real-world individual, named or unnamed, with exclusionary
         or stigmatizing social consequences, or promotes or wishes such consequences
         upon them.
         1. "John Doe is" or "You are"
         2. "Nobody likes you and I hope your family abandons you"
         3. "I hope you lose all your friends"
      3. Sexual harassment of a real-world individual.
      4. Offensive content.
      5. Content expressing gratitude/joy that someone else is being harassed.
      6. Positive depictions of harassment acts or harassers, or content that portrays victims
         of harassment as responsible for being harassed.
         1. Portrayals of jocks stuffing nerds into lockers that are sympathetic towards the
            jocks.
      7. Instructions on how to harass individuals for optimum meanness, to get away with
         harassment, or otherwise to harass more effectively.

Q6: [V] Violence

Does this contain violence or self-harm?

1. Determine if the text contains violence or self-harm (yes/no).
   1. Globally, we categorize content as violence if:
      1. It contains elements of violence (violence towards others), and
      2. The content is not hyperbolic or a neutral depiction of violence.
2. If yes:
   1. Determine the severity of the violence displayed (see the table below for definitions and
      examples):
      1. [V3] Extremely graphic violence
      2. [V1] Threats or support for violence
      3. [N1] Neutral depictions of or contextualized violence or self-harm

[V] Violence

[V3] Extremely graphic violence

Definition:
  Content that depicts death, violence or serious physical injury in extreme, graphic detail,
  even if it results from natural causes or involves cartoon characters.
  Note: Many V3 materials are likely to have a V1 tag too, but gory depictions in V3 are not
  necessarily always V1. For example, descriptions of a natural disaster in which descriptions
  of fatalities are extremely graphic (talking about exposed bones or organs, or other horrific
  injuries) are V3 but not V1.

Examples:
  Gratuitous gore, which is excessively graphic or gruesome content related to blood, death,
  violence or severe physical harm.
  Graphic depictions of:
    human or animal torture or abuse (including physical abuse, and humans forcing animals
    into highly dangerous situations such as fighting)
    severely injured or mutilated humans or animals
    human remains; exposed internal organs or bones
  Gratuitous or excessive depictions of blood, violence, or injury.

[V1] Threats or support for violence

Definition:
  Non-graphic content that demonstrates support for violent acts against others[3].
  Non-graphic content that credibly[4] states an intention to inflict violence (death or serious
  physical harm) on an individual or a group.
  Note: Sexual violence should be tagged with an additional label from the [C] Sexual Content
  category.

Examples:
  Threats of violence with:
    explicit language
    implicit violence (e.g. burning a cross in someone's yard)
    use of symbols or symbolism (e.g. using Nazi symbols to threaten others)
  Incitement to violence:
    Content that encourages, urges or naturally leads others to riot, commit violence, or
    commit acts of insurrection. Note that this may include things that might be considered
    acceptable within a subjective context, e.g., political riots.
  General support for violence:
    Personal support for violent acts by individuals or organizations (including
    violent/terrorist organizations) towards others.

[N1] Neutral depictions of or contextualized violence or self-harm

Definition:
  Non-graphic content that describes acts of violence, self-harm or suicide without supporting
  them. This includes news articles, historical accounts, educational material, etc.

Examples:
  Reporting on aggressive physical contact sports (boxing, wrestling, martial arts, etc.).
  Scenarios where blood, organs, bones or body parts are mentioned in a surgical/medical
  procedure context.
  News reports on violent/traumatic events. This may include direct quotes depicting violent
  intent, as long as the context is clear.
  Non-graphic depictions of human suffering during a war.

Q7: [SH] Self-Harm

Does this content describe or support self-harm?

1. Determine whether the text describes or supports self-harm (yes/no).
   1. This includes graphic depictions of, instructions for, or support for self-harm/suicide:
      content that depicts acts of self-harm or suicide in detail, or supports acts of self-harm
      or the results of said acts.
2. Notes:
   1. If the description contains violent content, it should additionally be tagged with a label
      from the [V] Violence category.
   2. Contextualized self-harm content should be labeled as [N1] Neutral depictions of or
      contextualized violence or self-harm. For example, someone sharing their experience
      with an eating disorder, depression, or suicidal thoughts while now in recovery, or
      depictions of someone expressing a willingness to die when ill, should not be
      considered self-harm.
3. Examples:
   1. Instructions for suicide or various forms of self-harm, including drastic and unhealthy
      weight loss associated with eating disorders.
   2. Encouragement to commit self-harm, e.g. self-harm presented as a solution to a
      problem.
   3. Promoting acts of self-harm, e.g. suicide, cutting, eating disorders, or consumption of
      substances (drugs/alcohol) with the intention of self-harm (not recreational use, even
      when the consumption might be illegal).
   4. Content that mocks victims or survivors of suicide, self-harm, or eating disorders.
   5. Depictions of fantasies about self-harm/suicide.
   6. Content that focuses on the depiction of ribs, collar bones, thigh gaps, hips, concave
      stomachs, or protruding spines, containing terms associated with eating disorders or
      other unhealthy [weight-loss] behaviors.

Appendix

[N0] None of the Above

The label will default to N0 if no other categories are selected. This means the text does not
contain any sexual material, hate speech, harassment, self-harm or violence content. By
process of elimination, such texts should be valid natural language and may or may not be in
English.
This category includes clearly hyperbolic and untargeted "violent" content.

Notes:
  If the content is not obviously hyperbole, mark it as harmful.
  There is a distinction between using "rapey" as an adjective (perhaps as derogation) versus
  supporting rape. If violent adjectives appear in a neutral context that does not promote or
  encourage violence, harassment or self-harm, e.g. "the rapey detectives are the ones who
  keep their jobs," mark the text as [N0].

Examples:
  "Ugh, if you don't stop being so cute, I will eat you up!" (expressing an unrealistic threat)
  "I'm going to beat the crap out of my test." (metaphor)
  "Yes, I said it and I will fight anyone who disagrees with me!" (intentionally offensive or
  provocative content that does not express a realistic commitment to violence)
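Because N0 is a default rather than a positive judgment, the final label set can be derived
mechanically once all other questions are answered. A one-line hypothetical sketch of that
defaulting rule (the function name is ours):

    def finalize(selected):
        """Default to N0 when no other category was selected."""
        return selected if selected else {"N0"}

    assert finalize(set()) == {"N0"}
    assert finalize({"V1"}) == {"V1"}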

When in Doubt…

How do I determine if something is hate speech?

1. Is the text targeting a chosen group, or members of a chosen group, because of their
   membership?
   (Race, Gender, Ethnicity, Religion, Nationality, Sexual Orientation, Disability Status, or
   Caste)

2. Is the text making a hateful claim?
   1. Calling for violence or other kinds of harm against a protected group?
      1. [H1.a] Call for violence / threatening
      2. [V1] Threats or support for violence
   2. Supporting violence against, or harm to, the members of the group?
      1. [H1.a] Call for violence / threatening
      2. (maybe) [V1] Threats or support for violence
   3. Supporting harmful stereotypes and/or hateful entities?
      1. [H1.b] Other kinds of hate
   4. Dehumanizing speech that compares people to, or accuses them of, sexual predatory
      behaviour, violent crime and other criminal activity solely on the basis of belonging to
      one of these groups?
      1. [H1.b] Other kinds of hate
   5. Attributing negative, hateful qualities such as misogyny, tendencies to violence,
      incompetence, etc. solely because of membership in one of the above groups?
      1. [H1.b] Other kinds of hate
   6. Comparing members of chosen groups to inhuman imagery/things such as animals,
      objects, etc.?
      1. [H1.b] Other kinds of hate
   7. Calling for the exclusion of people because of membership in a chosen group?
      1. [H1.a] Call for violence / threatening
   8. Making other derogatory/demeaning claims against people because of membership in
      chosen groups?
      1. [H1.b] Other kinds of hate
   9. If it's making a hateful claim but in a factual manner: is the sample neutral, or is it
      emotionally charged / accusatory?
      1. [H1.b] Other kinds of hate

Hate speech example

"Here's your history: White Christians committed innumerable atrocities against us: invade,
sexual violence, genocide, enslave, drug up, plunder, bankrupt, nuclear/biological/chemical
warfare, etc. Victims: China, Philippines, Indonesia, Malaysia, East Timor, Papua New Guinea,
Laos, Cambodia, Vietnam, Myanmar (formerly Burma), Korea, and Japan"

Question 1 | Question 2 | Label

YES | YES | If it is contextualized (attributing the speech to someone else), it is likely [H0.b]
Contextualized hate speech or harassment. Otherwise, it is likely hate speech, [H1.?].

YES | NO | [H1.a] Call for violence / threatening

NO | NO | [N0] None of the Above
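Read as pseudocode, the table maps the two questions above onto labels. Below is a
hypothetical Python rendering; the argument names are ours, and "H1.?" stands for the H1
sub-label chosen via the checklist.

    def when_in_doubt(targets_chosen_group, makes_hateful_claim, is_contextualized):
        """Mirror the hate-speech decision table above."""
        if targets_chosen_group and makes_hateful_claim:
            # YES / YES row: contextualized speech is H0.b, otherwise H1.?
            return "H0.b" if is_contextualized else "H1.?"
        if targets_chosen_group:
            return "H1.a"  # YES / NO row, per the table
        return "N0"        # NO / NO row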

Annotation Examples

SAMPLE / LABELS

Sample (labels: C1):
  Prostitution and the operation of brothels are legal in some countries, but illegal in others.
  For instance, there are legal brothels in Nevada, USA, due to the legalization of prostitution
  in some areas of the state. In countries where prostitution and brothels are legal, brothels
  may be subject to many and varied restrictions.

Sample (labels: V1):
  Stand the fuck up and take them out all you fucking lazy white males! fuck being called a
  racist protect your woman and children. If you find it in the street kill it and bury it right there
  these animals will wipe you out if your not careful

Sample (labels: SH):
  I've said before I only do on my hands because I have some really good excuses for fucking
  up my hands and it's hard to scar my hands. but I think, for you that it's more likely they are
  just not saying anything about it. and I'm really self conscious so I don't want anyone to see
  cuts on my arms.

Sample (labels: H1.b, HR):
  Q: What do you think about me?
  A: Fuck chinese people like you, duh.
  Q: Based operating systems?
  A: FreeBSD, Linux, TempleOS. Also, fuck <Organization>! They're fucking retards.

Sample (labels: C3, V2; French, with English translations in brackets):
  loockeeer: Coucou eddy### [Hey eddy]
  Eddy: Ce sera moi qui te coupera le cou### [I'll be the one to cut your neck]
  loockeeer: Ah ! Mais je veux pas... Je l'aime <DateTime> cou !### [Ah! But I don't want
  that... I love my neck!]
  Eddy: Tant pis pour toi### [Too bad for you]
  loockeeer: Viens je te nique ta mère### [Come on, I'll fuck your mother]
  Eddy: D'accord### [Okay]
  loockeeer: Avoue t'es un violeur### [Admit it, you're a rapist]
  Eddy: ?????###
  loockeeer: Fait pas genre tu sais pas### [Don't act like you don't know]
  Eddy: Mais je sais pas !### [But I don't know!]
  loockeeer: Mouais...### [Yeah, right...]
  Eddy: Mais je sais pas mais j'ai une tête de violeur, donc je viole### [But I don't know, but I
  have the face of a rapist, so I rape]

[1] link.
[2] https://ec.europa.eu/info/sites/default/files/code_of_conduct_hate_speech_en.pdf
[3] Support for violence does not include encouragement for self-harm or encouragement for harassment.
[4] “Credibly” here implies that a reasonable person would not interpret it as a metaphor or hyperbole.
