
WebImagesSingleSideImageSatisfaction

Thursday, January 12, 2023 from IP 172.16.254.9 by Tung Vu Hoang

Instructions

In this task, you'll be shown (1) a query that seeks image results and (2) two lists of images that are intended to satisfy the query. Every image will have a single associated host page, which is defined as a webpage that contains the image. You'll be asked to provide (1) applicable image flags and an image satisfaction grade for each individual image result, and (2) an overall side-by-side preference grade for the two lists of results.

Understanding the Query

Your first job will be to research the query in an effort to get a better
understanding of the possible user intent(s)—i.e., what content might a user
who issued this query be looking for?

• Some queries might have just a single intent and a single visual aspect, such as [beyonce lemonade album cover].
• Other queries might have multiple intents, such as [mercury], [italy], and [jaguar], or an intent with multiple visual aspects, such as [nissan car] and [spiderman].
• The query [mercury] could refer to the planet, the chemical element, or the car.
• The query [italy] definitely refers to the country, but users could be seeking a wide variety of different aspects, such as a map of the country, famous Italian landmarks, or the Italian flag.
• The query [jaguar] could refer to the Jaguar vehicle or to the jaguar animal.
• The query [nissan car] could refer to images of different car colors, angles, backgrounds, etc.
• The query [spiderman] could refer to images of movie posters, different poses, etc.

While you’re researching the query, use your best judgment to figure out what the possible user intent(s) could be. If you find that there are multiple intents, try to focus on the ones that could be considered “reasonable intents,” and focus less on the “very unlikely intents.”

Grading Individual Image Results

Your next job will be to look at individual image results for the query, and determine how satisfying each image would be for the user intent(s). Note that, if you were to only look at a single image in isolation, you might not completely recognize or understand the content of the image. For this reason, for each image, you’ve been provided with a link to that image’s host page, which, as mentioned above, is a webpage that contains the image. It’s generally a good idea to visit each image’s host page to get a better understanding of what’s depicted in the image. This is especially true for images whose content you don’t fully recognize or understand.

Only after you’ve fully understood the content of a given image should you proceed to answer the grading questions pertaining to that image.

Image flags
Please select the appropriate flags for each image. Flags include:

• Did not load - Select this flag for images that (for whatever reason) aren’t
loading in the task.
• Unsafe - Select this flag for images that depict any of the following
content:
o Pornography or Nudity: Any images showing exposed private parts or sexual
content, regardless of whether the images are real photos or are animated.
Images of animals having sex also belong to this category.
o Violence, Weapons, or Gore: Any images depicting violence, weapons, or gore.
o Hate or Harm: Images depicting or endorsing hate or harm towards groups based on gender, gender identity, sexual orientation, race, age, disability, color, creed, national origin, or religion. Note: Sometimes an image is not hateful on its own, but can be considered hateful in the context of a specific query. For example, an image of a monkey, by itself, is not hateful, but if it is shown for the query [black people], the image can be considered hateful; in this situation, the image should be classified as “Hate or Harm”.
o Medically Explicit/Gruesome Content: Any images showing blood, diseased body
parts, medical operations or procedures, etc.
o Drugs, Alcohol, or Tobacco: Any images depicting drugs, alcohol, or tobacco, or
the use of any such substances.
o Potentially Offensive or Upsetting Content: Any images that contain content that
is potentially offensive or upsetting.
• Near duplicate - Select this flag if the image is a near duplicate of at least
one other image that you already saw higher in the list. Two images
should be considered near duplicates if ANY of the following apply:
o The two images are completely identical.
o One image is a cropped version of the other.
o One image is a slightly transformed (mirrored, rotated, or resized) version of the other.
o The two images are the same except for minor color scheme differences, such as filters, background colors, lightness/darkness, etc.
o The two images are the same except for the presence of minor objects, such as logos, watermarks, text, borders, etc.

Near Duplicate Examples

[Example images not reproduced.]

Important: note that for any cluster of 2 (or more) images that are Near Duplicates of each other, the highest-ranking image in that cluster should NOT be flagged as a duplicate, but the rest of the images in the cluster SHOULD be flagged as a Near Duplicate.
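
The cluster rule above amounts to a single pass over the ranked list. The Python sketch below is purely illustrative and not part of the grading tool: it assumes each image has already been given a hypothetical cluster label (images that are near duplicates of each other share a label), and it flags every image whose label has already appeared higher in the list.

```python
from typing import Hashable, List, Sequence, Set


def near_duplicate_flags(cluster_ids: Sequence[Hashable]) -> List[bool]:
    """Return a Near duplicate flag for each image, in ranked order.

    cluster_ids[i] is a hypothetical label for the near-duplicate cluster of
    the image at rank i. The first (highest-ranking) image of each cluster is
    left unflagged; every later member of that cluster is flagged.
    """
    seen: Set[Hashable] = set()
    flags: List[bool] = []
    for cid in cluster_ids:
        # Flag the image if a member of its cluster already appeared higher up.
        flags.append(cid in seen)
        seen.add(cid)
    return flags


if __name__ == "__main__":
    # Ranks 1, 3 and 4 belong to the same cluster "a".
    print(near_duplicate_flags(["a", "b", "a", "a", "c"]))
    # -> [False, False, True, True, False]
```

Deciding which images belong to the same cluster is still your judgment call under the criteria above; the sketch only covers the flagging step once the clusters are known.
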

Image Satisfaction

For each image, if you have indicated that the image is neither Did Not Load nor Unsafe, you’ll be asked to provide an Image Satisfaction rating.

If you have flagged an image as a Near Duplicate, you should still rate its image satisfaction independently of the other images. The Image Satisfaction score is based on how satisfying the image would be to users who issued the query, regardless of whether it is a duplicate.
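
As a hypothetical illustration of when the Image Satisfaction question applies, the Python sketch below represents an image's flags as a set of the flag names used in these guidelines (the set-of-strings representation and the function name are assumptions, not part of the task interface). Only Did not load and Unsafe skip the rating; Near duplicate does not.

```python
from typing import AbstractSet

# Flags that skip the Image Satisfaction question. Names follow the
# guidelines; representing flags as strings is a hypothetical choice.
BLOCKING_FLAGS = frozenset({"Did not load", "Unsafe"})


def asks_for_satisfaction(flags: AbstractSet[str]) -> bool:
    """Return True if an Image Satisfaction rating should be given.

    Near duplicate is intentionally not in BLOCKING_FLAGS: duplicates are
    still rated on how satisfying they are, independently of other images.
    """
    return BLOCKING_FLAGS.isdisjoint(flags)


if __name__ == "__main__":
    examples = {
        "no flags": set(),
        "duplicate": {"Near duplicate"},
        "broken": {"Did not load"},
        "unsafe duplicate": {"Unsafe", "Near duplicate"},
    }
    for name, flags in examples.items():
        print(f"{name}: {asks_for_satisfaction(flags)}")
```
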

To answer this question, you’ll need to consider how satisfying the image
would be to users who issued the query. The answer options are on the
following spectrum:

• Not Satisfying — The image has nothing to do with the query. Overall, the image would
be helpful to virtually no users who issued this query.
• Slightly Satisfying — The image is connected with the query, but there are major
issues. For queries that are ambiguous/broad, with many possible intents, the image may
only satisfy a very unlikely intent. For queries that are precise and specific, the query may
be only partially addressed, with the image missing some important aspect(s) of the
query. Alternatively, the image may address the query completely, but the image quality
is poor (e.g., blurry, small, poor resolution, or poor cropping). Overall, the image would be
helpful to only some users who issued this query.
o Examples: the query is 'Patrick Swayze's wife' and the result image shows only Patrick Swayze; the query is 'spider bite' and the result image shows only a spider.
• Moderately Satisfying — The image is connected with the query, and the image is
decent. For queries that are ambiguous/broad, with many possible intents, the image
satisfies a reasonable intent. For queries that are precise and specific, the image either
mostly or completely addresses the query. The image quality is solid, and there isn’t
exactly anything wrong with the image; however, the image is not particularly beautiful or
inspiring. Overall, the image would be helpful to most users who issued this query.
o Examples: the query is 'Patrick Swayze's wife' and the result image shows Patrick Swayze and his wife; the query is 'an iguana' and the result image is a graphic or drawing rather than a photo; the query is 'lipstick' and the result is a collage of photos of lipsticks.
• Highly Satisfying — The image is connected with the query, and the image is excellent. For queries that are ambiguous/broad, with many possible intents, the image satisfies a reasonable intent. For queries that are precise and specific, the image fully addresses the query. The image quality is excellent. As well, the image is beautiful and inspiring. Overall, the image would be helpful to virtually all users who issued this query.

Guidelines on whether to demote results when text, watermarks or other special effects are added to an image

• For image queries looking for memes or text specifically, text overlay on the image is okay; there is no need to demote the results. Examples: “stranger things title fonts”, “minion meme”.

• For queries looking for entities, products, or concepts where text overlay does not add any value, images with text overlay should be demoted.

• For image results with watermarks or other special effects added, as long as they do not impact the image quality, there is no need to demote the result.


Guidelines on how much the size of the image affects the rating

• When evaluating the relevance of the image, please take the image size/resolution into account in the context of the intent of the user issuing the query. For example, users would be satisfied by a small image while searching for an emoji, but a low-resolution image may not be so satisfactory if the user is searching for a wallpaper.

• After you finish rating every image in the list, you’ll have the opportunity to leave a comment. Feel free to use the comment box to explain any parts of the task that you found to be confusing or difficult, or to provide any miscellaneous comments you might have.

Grading Overall Side-by-Side Preference

At the bottom of the grading task, you’ll see a question about your overall
preference between the two lists of search results, with a set of 7 options to choose
from. Looking at the two lists as a whole, you’ll decide which side you think
users who issue this query would prefer, and why. Click the button that
indicates your preference, and write down the reason why you preferred that
list in the comment box. This comment is required.

• Note: Occasionally you may see cases where both lists are identical. If this happens,
choose About the Same but also add a comment saying “Identical.”

General Principles
When you provide your overall preference grading, keep in mind the following
general principles:

• Users generally prefer a side containing better results (e.g. ones that are more satisfying).
• Users generally prefer a side where better results are ranked as high as possible. If the two sides have the same results, but the sequence in which those results are presented is different, prefer the side with the better order.
• Users generally prefer a side that contains fewer results flagged as Did Not Load or Unsafe.
• If you're having trouble deciding which side is better, then you should choose About the
Same.
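
One way to read the ordering principle, offered only as a sketch under stated assumptions: when both sides contain the same results, compare their satisfaction ratings position by position from the top, and let the first position where they differ decide. The numeric scale, the function name, and the "Left"/"Right" labels below are hypothetical, and the sketch only picks a direction; it does not choose among the 7 preference options.

```python
from collections import Counter
from typing import Sequence

# Hypothetical numeric scale for the four Image Satisfaction levels.
SCORE = {"Not": 1, "Slightly": 2, "Moderately": 3, "Highly": 4}


def better_order(left: Sequence[str], right: Sequence[str]) -> str:
    """Say which side ranks its better results higher.

    Both arguments are satisfaction ratings in ranked order. The sides are
    assumed to hold the same results, so their ratings must match as a
    multiset; a ValueError is raised otherwise.
    """
    if Counter(left) != Counter(right):
        raise ValueError("the two sides do not hold the same ratings")
    for a, b in zip(left, right):
        if SCORE[a] != SCORE[b]:
            # First position where the sides differ: whichever side shows
            # the more satisfying result here has the better order.
            return "Left" if SCORE[a] > SCORE[b] else "Right"
    return "About the Same"


if __name__ == "__main__":
    print(better_order(["Highly", "Slightly", "Moderately"],
                       ["Slightly", "Highly", "Moderately"]))
```

In practice the other principles and factors in this section still apply; the sketch isolates the ordering rule for identical result sets.
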

Factors to Consider

When deciding which side is better, you’ll want to take many factors into account. Here are some you might consider:

• Image Satisfaction. How satisfying would the results be to users who issued the query? Do they address the “reasonable intents,” or focus on the “very unlikely intents”?
g
m
3 fro ng
02 Hoa
• Diversity. Does 1each 2, 2 of
Vuthe results offer something difference for the user (different
information,udifferent
y g T
n perspective, etc.)? Or are there results
IP huthat
rsd are so similar to each
a r
a n y Tu
other that y, one
J
.9 or more is completely redundant, and the search
b 172 ayengine might as well not
d a 5 4 .16 , Ja
haver s included
.2 them all? . 25 n
• Freshness. If the query is time-sensitive (for example, refers to a subject in the news) or mentions a recurring event, does the result have the latest information?
• Other. Depending on the query, you may want to include other factors that you think may be important. For example, if the user is searching for a product, you might consider a result for a popular brand of that product to be better than an obscure one.
