4 Guidelines
Table of Contents
Purpose
Disqualification Criteria
Common Cases for Cross Component Comparisons
Rating Process
Examples
Purpose
In this project you will perform pairwise content matching. For each job, you will be
presented with two examples of content. You must review each content pair, think
about the main point of each example, then tell us if they match with one another.
Your job is to tell us the true relationship between each Source and Match
Candidate. It is imperative you follow the process closely so we can measure the
precision of our systems and ultimately improve them.
Same Component Comparison: Jobs where both the Match Candidate and the Source
Content share the same matching components. For instance, a Match Candidate’s
image matches to a Source Content’s image.
• Image to Image Comparison
• Caption to Caption Comparison
• URL to URL Comparison
• Video to Video Comparison
Cross Component Comparison: Jobs that have different matching components. For
instance, a Match Candidate’s overlaid text from an image matches a Source
Content’s caption.
Glossary of Key Terms
Rating Process Overview
Step 1
Question: Is this Content Qualified for Review?
Determine if the job meets the criteria to be rated.
• If No, disqualify and move to next job
• If Yes, see below
Step 2
What Component(s) of the Match Candidate contains the match?
Here is where you fully evaluate the job. First look at all of the components in the
Match Candidate, then evaluate the Source Content. Do any components from
the Match Candidate match the Source Content? If so, select all of these
components. For instance, if you see that the Match Candidate has an Image and
Overlaid Text that match the Image and Overlaid Text of the Source Content,
select Image and Overlaid Text.
Step 3
[Selected Component] COMPARISON: Compare the Match Candidate
[SELECTED COMPONENT] and Source Content: how would you describe the
relationship between the [Selected Component] and the Source Content?
*Note*: Only the components that you selected in the previous question will appear in
the rest of the question flow in SRT. Following the previous example, if you
selected Image and Overlaid Text in Step 2, then only the Image and Overlaid Text
questions will appear. When focusing on the selected Match Candidate component
and the matching Source Content component, use the criteria for what is considered
a match.
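Steps 1 to 3 can be pictured as a small decision flow. The sketch below is only an illustration in Python; the names `Component`, `Label`, and `rate_job` are hypothetical and are not part of SRT, and the comparison itself is still a human judgment passed in as a function.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Component(Enum):
    IMAGE = "Image"
    OVERLAID_TEXT = "Overlaid Text"
    CAPTION = "Caption"
    URL = "URL"
    VIDEO = "Video"


class Label(Enum):
    NEAR_DUPLICATE = "Near Duplicate"
    NEAR_MATCH = "Near Match"
    DO_NOT_MATCH = "Do Not Match"
    UNSURE = "Unsure"


def rate_job(
    qualified: bool,
    selected: List[Component],
    compare: Callable[[Component], Label],
) -> Optional[Dict[Component, Label]]:
    """Return None for a disqualified job, else one label per selected component."""
    # Step 1: a job that is not qualified is disqualified and skipped.
    if not qualified:
        return None
    # Steps 2 and 3: only the components selected in Step 2 get a comparison question.
    return {component: compare(component) for component in selected}


# Usage: the rater selected Image and Overlaid Text in Step 2.
labels = rate_job(
    True,
    [Component.IMAGE, Component.OVERLAID_TEXT],
    lambda component: Label.NEAR_DUPLICATE,  # stand-in for the rater's judgment
)
```

Components that were not selected in Step 2 never appear in the result, which mirrors how the question flow hides their comparison questions.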
Step 4
Disqualification Criteria
Once you have determined that the job meets the criteria to be rated (SRT Question
1), you will then:
• Determine how well the content matches on a component level (SRT
Question 2)
• Determine how well the content agrees holistically (SRT Question 3).
The instructions for component-level matching are different for each content type
(image, caption, URL and video). Read through the instructions below and use the
SRT’s tooltips while rating, to refresh your memory when needed.
NOTE: As stated above, different components can match one another (for example, a
caption can match overlaid text). This is explained after the general Content <> Content
Comparison sections (see page X for more).
Once you have determined the job meets the criteria to be rated, you will be asked
to determine how well the images match on an individual component level.
Note: Do not consider the overall meaning of the Source or Match Candidate
when
When you are evaluating this part of the question, it is important to understand the
difference between the text captions and text overlay. NOTE: these two are
considered DIFFERENT components in SRT. This means that you will evaluate
them separately in the SRT question flow.
1. Text Caption – Any text that is added to enhance the post. This can be placed
above or below the image and is not part of the image.
2. Text Overlay – Any text placed within the image box, that acts as part of the
image.
Tip: If you hover over the image, the entire image will be outlined in blue. By doing
this, you can tell if text is a part of the image (overlaid text) or the post (caption).
Based on your visual assessment of the images, determine the correct label from the
list below:
Near Duplicates: These are identical or almost identical with the following trivial
differences:
• cropping, tint, color, brightness
• screenshots
• rotations
• stretching
• padding
• pixelation
• Trivial Imagery: These are differences that include added or subtracted
imagery that is trivial in amount, such as watermarks, arrows or circles.
NOTE: These differences do not change the meaning of the component.
Near Match: The Match Candidate and Source Content are a near match when:
• The differences are greater than the criteria for Near Duplicates, but the
components share a similar message.
They Do Not Match: The Match Candidate and Source Content do not match when:
• They don’t refer to the same subject matter
• They make different claims
Unsure: If you are unsure whether something should be labeled as Near Match or
Do Not Match, then choose Unsure.
Caption to Caption Comparison
Once you have determined the job meets the criteria to be rated, you will be asked
to determine how well the captions match on an individual component level.
Remember: Use the same criteria here when evaluating overlaid text.
Read through the text components for both the Source Content and Match Candidate
and determine how well they match. For overly long texts, read through the first five
paragraphs and skim the rest.
• You may stop reading long texts when it becomes clear the components are
unrelated/do not match.
• When coming across a job where the Match Candidate does not have an
image, but the Source Content does, compare the captions and ignore the
single image.
Based on your assessment, determine the correct description from the list of choices
below:
Near Duplicates: These are both identical or almost identical with the following
trivial differences:
• Trivial Differences in Variance: up to 10% variance in text.
o Example: 100 words of text with a candidate that has 10 words that
are different.
▪ The captions can be the same length but differ in up to 10% of
the overall text, or
▪ The captions can differ in length by up to 10%.
• Trivial Differences in Formatting: These are differences that include:
o spacing or text formatting
o punctuation
o the addition or subtraction of citations or copyright claims
o linking strategies.
• Trivial Differences to text: These are differences that include:
o Different spellings
o Character substitutions (deliberate or not)
▪ Examples:
• ✅ The cat's fur really soft | The kats fur is very sofft
• ✅ The cat's fur really soft | Thè cätš für is vêrÿ søft
o The addition or subtraction of emojis for emphasis.
▪ Examples:
• Adding an emoji after a joke.
• Adding an emoji after a question.
NOTE: These differences do not change the meaning of the component.
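The 10% variance rule and the formatting differences listed above can be approximated with a word-level comparison. The following is a rough sketch, not the method used by SRT: the function names are hypothetical, ignoring letter case is an added assumption, and only ASCII punctuation and decomposable accents are stripped. Misspellings such as "kats" still count as different words, so the rater's judgment about meaning remains the deciding factor.

```python
import difflib
import string
import unicodedata
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, strip accents and ASCII punctuation, and split into words."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = no_accents.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()  # split() also collapses spacing differences


def word_variance(source: str, candidate: str) -> float:
    """Fraction of words that differ, relative to the longer of the two texts."""
    a, b = normalize(source), normalize(candidate)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return (longest - matched) / longest


def within_trivial_variance(source: str, candidate: str, limit: float = 0.10) -> bool:
    """True when the word-level variance is at or below the 10% guideline."""
    return word_variance(source, candidate) <= limit
```

Measuring against the longer text covers both cases in the guideline: same-length captions with replaced words, and captions that differ only in length.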
Near Match: The Match Candidate and Source Content are a near match when:
• The differences are greater than the criteria for Near Duplicates, but the
components share a similar message.
They Do Not Match: The Match Candidate and Source Content do not match when:
• They don’t refer to the same subject matter
• They make different claims
Unsure: If you are unsure whether something should be labeled as Near Match or
Do Not Match, then choose Unsure.
URL to URL Comparison
Once you have determined the job meets the criteria to be rated, you will be asked
to determine how well the URL Links match on an individual component level.
Compare the Match Candidate and Source Content; how would you describe the
relationship between the two URL links?
1. Open each link in a new tab; then review the body of the destination page. The
SRT links are not currently clickable. You must manually copy and paste each URL
into a new browser tab. Please take great care to copy the entire URL string.
• Ignore advertisements, menu bars, comments, and related articles. Focus
solely on the title and body of the URL’s destination (blog post, article, etc.)
• If the focal point of the target page is a video (e.g. the URL is a YouTube link),
watch the first 2 minutes and skim through the remainder.
2. Read through the first 5 paragraphs, then skim through the rest.
• You may stop reading long texts when it becomes clear the components are
unrelated/do not match.
3. Decide how well the Match Candidate matches the Source Content, using the
labels outlined below.
Note: It is easy to accidentally paste the same link into both tabs: ensure you
are comparing the two distinct URLs provided.
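Because the links must be copied by hand, a quick self-check can catch a link pasted twice or a truncated copy. The sketch below is a hypothetical helper, not an SRT feature; the function names and the example.com and example.org addresses are placeholders. It only compares the strings and does not open or resolve any link.

```python
from urllib.parse import urlsplit


def canonical(url: str) -> str:
    """Trim stray whitespace and lowercase the host; the path is left untouched."""
    parts = urlsplit(url.strip())
    return parts._replace(netloc=parts.netloc.lower()).geturl()


def pasted_both_links(source_url: str, candidate_url: str,
                      first_tab: str, second_tab: str) -> bool:
    """True when the two tabs hold the Source and Match Candidate links, one each."""
    expected = sorted([canonical(source_url), canonical(candidate_url)])
    actual = sorted([canonical(first_tab), canonical(second_tab)])
    return expected == actual


# Usage with placeholder links: the same link pasted twice is flagged.
source_link = "https://example.com/article-one"
candidate_link = "https://example.org/article-two"
ok = pasted_both_links(source_link, candidate_link, candidate_link, source_link)
same_twice = pasted_both_links(source_link, candidate_link, source_link, source_link)
```

The path is compared case-sensitively on purpose, since short-link codes usually depend on exact capitalization.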
Near Duplicates: These are both identical or almost identical with the following
trivial differences:
• Trivial Differences in Variance: up to 10% variance in text.
o Example: 100 words of text with a candidate that has 10 words that
are different.
▪ The texts can be the same length but differ in up to 10% of
the overall text, or
▪ The texts can differ in length by up to 10%.
• Trivial Differences in Formatting: Differences that include:
o spacing or text formatting
o punctuation
o the addition or subtraction of citations or copyright claims
o linking strategies.
▪ Examples:
• The link https://nyti.ms/3cXHiDa is equivalent to
https://nytimes.com
• Spelling out a link after a hypertext reference is trivial;
you may ignore differences such as these:
o Visit this link
o Visit this link (http://nytimes.com)
• Trivial Differences to Text: Differences that include:
o Different spellings
o Character substitutions (deliberate or not)
▪ Examples:
• ✅ The cat's fur really soft | The kats fur is very sofft
• ✅ The cat's fur really soft | Thè cätš für is vêrÿ søft
o The addition or subtraction of emojis for emphasis.
▪ Examples:
• Adding an emoji after a joke.
• Adding an emoji after a question.
NOTE: These differences do not change the meaning of the component.
Near Match: The Match Candidate and Source Content are a near match when:
• The differences are greater than the criteria for Near Duplicates, but the
components share a similar message.
They Do Not Match: The Match Candidate and Source Content do not match when:
• They don’t refer to the same subject matter
• They make different claims
Unsure: If you are unsure whether something should be labeled as Near Match or
Do Not Match, then choose Unsure.
Video to Video Comparison
Once you have determined the job meets the criteria to be rated, you will be asked
to determine how well the videos match on an individual component level.
Note: Do not consider the overall meaning of the Source or Match Candidate
when
Near Duplicates: These are identical or almost identical videos with the following
trivial differences:
• Trivial differences in Formatting: These are differences that include:
o cropping, tint, color, brightness
o screenshots
o rotation
o stretching
o padding
o pixelation
• Trivial differences in Imagery: These are differences that include added or
subtracted imagery that is trivial in amount, such as watermarks, arrows or
circles.
• Trivial difference to overlaid text: These are differences that include added or
subtracted overlay text that is trivial in amount, such as:
o Different spellings or character substitutions (deliberate or not)
▪ Examples:
• ✅ The cat's fur really soft | The kats fur is very sofft
• ✅ The cat's fur really soft | Thè cätš für is vêrÿ søft
o The addition or subtraction of emojis for emphasis
o Accurate closed captioning (in English)
o Comments that don’t change the meaning of the component
▪ Examples: “Wow”, “Amazing!”
• Trivial difference in Content or Length: Differences include:
o A difference in length between the candidate video and the source
video of either a couple of seconds or up to 10% of the length in
seconds, whichever is longer.
▪ Example: a 3-minute video with a candidate that is ~18
seconds different
▪ Example: 30-second video with a 36-second video
o 10% difference in content
▪ Example: The length of the video is the same, but 10% of it is
different content
• Trivial Differences in Audio:
o The audio is compressed (playback is of slightly lower or higher
quality)
o Has a different volume level
o Music has been changed, added, or removed in a way that does not
affect meaning
o The audio track has been silenced without affecting meaning
o The audio tracks feature different speakers, but the original meaning
is not affected
o The audio contains sound-effects that do not affect the meaning of the
video
NOTE: These differences do not change the meaning of the component.
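The length rule above (a couple of seconds or 10%, whichever is longer) is simple arithmetic. The sketch below assumes that "a couple of seconds" means 2 seconds and that the 10% is taken from the source video's length; both are assumptions for illustration, and the function names are hypothetical.

```python
def length_tolerance_seconds(source_seconds: float,
                             floor_seconds: float = 2.0,  # assumed "couple of seconds"
                             ratio: float = 0.10) -> float:
    """Largest length difference still treated as trivial for this source video."""
    return max(floor_seconds, ratio * source_seconds)


def is_trivial_length_difference(source_seconds: float, candidate_seconds: float) -> bool:
    """True when the two videos differ in length by no more than the tolerance."""
    difference = abs(candidate_seconds - source_seconds)
    return difference <= length_tolerance_seconds(source_seconds)


# Usage: a 3-minute source allows roughly 18 seconds of difference,
# while a 10-second source falls back to the 2-second floor.
three_minute_tolerance = length_tolerance_seconds(180)
short_clip_tolerance = length_tolerance_seconds(10)
```

A length within tolerance says nothing about meaning, so the other Near Duplicate conditions still have to hold.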
Near Match: The Match Candidate and Source Content are a near match when:
• The differences are greater than the criteria for Near Duplicates, but
the components share a similar message.
They Do Not Match: The Match Candidate and Source Content do not match when:
• They don’t refer to the same subject matter
• They make different claims
• Differences in text or formatting change the meaning of the
component
Unsure: If you are unsure whether something should be labeled as Near Match or
Do Not Match, then choose Unsure.
Cross Component Comparison
Often, when going through jobs in the queue, you will come across jobs where the
Match Candidate and Source Content match, but through different components.
These jobs are considered Cross Component Jobs. Don’t fret, this is expected! The
following examples describe these types of jobs in more detail:
1. The Match Candidate and Source Content have the same components, but
different components are considered a match according to criteria outlined
in these guidelines.
a. Example: Both the Match Candidate and the Source Content have a
caption and an image. However, the Match Candidate’s caption
matches to the Source Content’s overlaid text within the image.
2. The Match Candidate and Source Content have different components
altogether, but those different components are considered a match according
to criteria outlined in these guidelines.
a. Example: A Match Candidate consists of only an image. A Source
Content consists of only a video. Although they are different
components and don’t share any components, the image in the Match
Candidate matches one of the images within the Source Content's
video.
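The two cases above share one pattern: a match is recorded as a pair of components, one from each side, and the job is cross component whenever the two differ. The sketch below is a minimal illustration; `ComponentMatch` and `comparison_type` are hypothetical names, not SRT fields.

```python
from typing import NamedTuple


class ComponentMatch(NamedTuple):
    candidate_component: str  # component of the Match Candidate that contains the match
    source_component: str     # component of the Source Content it matches to


def comparison_type(match: ComponentMatch) -> str:
    """Classify a matched pair as a same component or cross component comparison."""
    if match.candidate_component == match.source_component:
        return "Same Component"
    return "Cross Component"


# Usage: the two examples from the list above, plus a same component pair.
caption_to_overlay = ComponentMatch("Caption", "Overlaid Text")
image_to_video = ComponentMatch("Image", "Video")
image_to_image = ComponentMatch("Image", "Image")
```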
Detailed walkthrough of an example job
CAPTION MATCH TYPE: What component from the Source Content does the
Match Candidate's caption match to?
Answer: As discussed above, the Caption from the Match Candidate matches to the
Source Content’s Overlaid text. Choose Overlaid Text.
Common Cases for Cross Component Comparisons
Overlaid Text Matches to Caption: These are jobs where the overlaid text of an
image matches a caption.
How to Handle: Use the matching criteria for captions described above. These
criteria include trivial differences in variance, formatting, and text.
Overall Holistic Comparison
Looking at all of the components together, you will then decide how well the Match
Candidate aligns with the Source Content with respect to the “central claim” being
expressed.
When you are evaluating this part of the question, it is important to understand the
“central claim” and “claim under review”.
Rating Process
1. Review the Claim Under Review
a. If the Claim Under Review is ambiguous, vague, or not well written,
click the link to skim the headline and the first couple of sentences of
the Fact Check Article.
2. Identify the central claim for the Source Content
a. Recall: If the Source Content does not relate to the Claim Under
Review, disqualify the job.
3. Make sure to take all components into account and look at the content
holistically (for example, take into account how the caption, the image, and
the overlaid text combined tell the intended meaning of the content).
4. Determine if the claim made in the Match Candidate matches the main claim
being made in the Source Content.
How to Use Link to Fact Check
The link to the Fact Check Article is the source of truth for the associated Claim
Under Review or the Central Claim within the Source Content. The link to the Fact
Check Article should first be used to determine if it is related to the Source Content
and the Claim Under Review. Afterwards, it should be used if you need further
clarification about the Claim Under Review while determining if the Source Content
and Match Candidate are holistic matches.
Common Cases for Holistic Matching
1. Debunking: This is when a Match Candidate debunks the claim made in the
Source Content or vice versa.
a. How to handle: If the Match Candidate debunks the Source Content’s
central claim, it therefore has a different meaning and should be
labeled as Do Not Match.
2. Ambiguous Meaning:
a. How to handle: If it is unclear whether the Match Candidate agrees
with the Source Content or vice versa, choose Unsure.
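The two common cases above reduce to a short rule. The sketch below is illustrative only: `common_case_label` is a hypothetical name, the two inputs are the rater's own judgments, and checking debunking before ambiguity is an assumption rather than something the guidelines state.

```python
from typing import Optional


def common_case_label(one_debunks_other: bool, agreement_unclear: bool) -> Optional[str]:
    """Return the holistic label for the two common cases, or None if neither applies."""
    # Debunking gives the content a different meaning, so it cannot be a match.
    if one_debunks_other:
        return "Do Not Match"
    # If it is unclear whether the two agree, the guideline is to choose Unsure.
    if agreement_unclear:
        return "Unsure"
    # Neither common case applies: compare the central claims as usual.
    return None
```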
Examples
Same Component Comparison Jobs
The different spacing and punctuation would be considered a trivial difference (and
not change the meaning of the component).
2) Trivial Differences in Formatting: Text
In this example, the additional sentence at the end of the Match Candidate does not
make up more than 10% of the total text and it does not change the meaning of the
text. Therefore, it would be considered a Near Duplicate.
3) Multiple Differences in Text
Spacing and punctuation: The spacing and punctuation are different, but the
differences are trivial since they don’t affect the meaning of the text.
Difference in Length: The additional sentences at the beginning of the Match
Candidate do not change the meaning.
Attribution: The quotation marks are not correctly used in the Match Candidate,
but it’s still clear the statement is attributed to Trey Gowdy. Therefore, both of the
captions are attributed to the same person.
Copy and Paste: The post author adds “Copy and paste if you dare.” at the end.
This does not affect the meaning.
4) Identical Images
Exact same: these images are exactly the same or near-exactly the same upon
observation.
Component Label: Near Duplicate
5) Identical Images
Exact same: these images are exactly the same or near-exactly the same upon
observation.
6) Trivial Differences in Formatting: Image
Trivial formatting: this is trivial because the full text is included in both images and
any cropping does not change the image’s meaning. There are no substantive
changes as cropping the bottom only removes whitespace and does not change the
user’s understanding of the content.
7) Trivial Differences in Formatting: Image
Trivial formatting: Buffer or whitespace added that does not change the meaning
or substance of the image.
8) Trivial Differences in Formatting: Image
Trivial overlay/text: the overlay does not make a substantive change to the
meaning of the image. The added overlays are related to the original text and do not
change user understanding. If the overlays were a politician’s face, company
slogan, etc., then the meaning would be changed.
9) Trivial Differences in Formatting: Image
10) Trivial Differences in Formatting: Images
11) Trivial Differences in Formatting: Images
Trivial formatting: cropping and trivial overlay that don’t change the meaning or
message of the image. The reshare text (generic account) is not indicating
endorsement or comment by a person of interest that changes the image’s message.
12) Substantive Differences in Formatting: Image
Substantive Overlay/Text: If we ignore the fact that the text is non-English, the
overlays and text on the image add a value judgment and change the meaning of the
photo. This particular example is a quasi-political statement added to the original
image.
13) Substantive Differences in Formatting: Image
Substantive formatting: the cropping removes key elements of the image that
change its meaning. Cropping to remove an individual that is key to the context
(here holding the chain) influences user perception and does not qualify as a match.
14) Substantive Differences in Formatting: Image
15) Common Cases for Captions: Small Remarks of Agreement
Small remarks of agreement: This is when someone agrees with the post by adding a
short remark before or after it. The remark usually accounts for less than a 10%
difference in length and should NOT change the meaning.
• Example: In this case, the first couple of sentences give an additional sense of
agreement to the caption that follows but do not change or add to the
meaning.
Label: Near Duplicate
16) Common Cases for Captions: Copy and Paste
This is when the post author urges others to reshare by copying and pasting the
content.
Link to Image: This is when a link appears instead of a photo, but by looking at the
questions given in SRT, it’s a photo comparison.
Given that the photos are the same, with only trivial differences (cropping), when you
paste the links into the browser, the photos are considered Near Duplicates.
18) Common Cases for Images: Photo to Album (Multiple photos) Comparison
Cross Component Jobs
Though the Source Content here is a photo, it’s a Near Duplicate match according to
the caption to caption matching criteria. Remember, overlaid text would be treated
with the same criteria as a caption for cross component jobs.