Updated 2023-08-30
If you have read previous versions of these guidelines, you should read Section 6 (“Version History”) to
learn what has changed. That way, you will not have to re-read the entire guideline. Most recent updates
are highlighted.
Training examples are in English for cross-language consistency and standardization. However, spam
score will also be based on HITs in your language / country.
Warning: a tiny number of queries in this HIT app may have adult content. When you see such queries,
we ask that you classify them as “Cannot Judge”. If you are not an adult, and/or are not willing to judge
such queries, you should stop now.
N.B. HIT app is best viewed with browser set to low zoom level, e.g. 50%.
The query is shown on the top left. Please use the research page to learn more about it. By default, the
highlighted location words in the query will be automatically entered into the research pane for
whichever search engine you prefer (you can change your preference in the drop-down menu). If the
research pane says it cannot show results due to “unusual traffic”, try a different search engine for a
while (the problem may be temporary).
N.B. This is an early version of this HIT app. If you experience any bugs (such as inability to click a button
or see results), they can usually be resolved by refreshing the web page. We will make sure any such bugs
are fixed ASAP. Please feel free to report them.
The HIT app will also show a list of results further below. You should indicate any results that are an
Excellent match to the locations and the search intent inherent in the query. Clicking a result’s Excellent
button will add it to the map.
Results most relevant to the highlighted words are usually in the Primary list. However, sometimes the
best results will be in the Secondary list. If you find Excellent results in the Primary list, you will not need
to look in the Secondary list. In fact, you can only select results from one of the lists. But to do well in
this HIT app, you’ll need to consider both lists for some queries.
Choose the best result(s), or indicate that there were No Excellent Results. Then give a reason for your
judgment (options will depend on your judgment), and enter an optional comment. Or, if you cannot
judge the result for any reason (e.g. broken HIT app, problems in the query or results, no location terms
in the query, extreme adult themes, or locations that are not actually places), select Cannot Judge and
enter a mandatory explanatory comment. Comments should be in English (as much as possible).
Do NOT select “Cannot Judge” if the result is in a foreign language. Please use the “Translate” research
option to translate the results to your preferred language. If you need to translate multiple location
results, it may be easier to set your default search engine to “Translate” under “Auto load research page
in right pane” and then utilize the search buttons next to each location result.
For example, the query “Weather in Seattle WA” indicates the user is looking for weather results for the
“Seattle” area. If a result of “Seattle, King County, Washington, USA” is shown below, you should click its
Excellent button. In the options below, choose “No Problems”, and click Submit.
If, however, the only results shown are “Bellevue, King County, Washington, USA”, “King County,
Washington, USA”, “Washington, USA”, and “Seattle, Zapopan, Jalisco, Mexico”, then click No Excellent
Results.
Research in this HIT app is optimized for key usage. Future versions of the HIT app will be fully optimized
for keyboard usage for efficiency. There are also buttons for special cases: No Excellent Results and
Cannot Judge. Selecting any of these will reset selection of results. You must either select results, or
choose No Excellent Results, or choose Cannot Judge.
In Step 4, you will be asked to indicate any Problems that you experienced. The choices will depend on
whether you selected one or more results, or if you chose No Excellent Results or Cannot Judge.
- No problems: you saw no problems in query or results.
- No relevant results: None of the listed results are relevant to the query location words.
- Incomplete geo-chain: Geo-chain of one or more results is missing required information. Do not
select if the list of results / geo-chains is incomplete. For that, choose No relevant results.
- Incorrect Entity Type: the entity type of one or more results is incorrect.
- Spelling errors in geo-chain: Spelling errors in the geo-chain of one or more results.
- Incorrect words selected in query: the words highlighted in query are incorrect or incomplete.
- Spelling error in the highlighted location in query: Please ignore non-highlighted parts of query
to consider only highlighted location words for spelling errors.
Finally, please provide a comment to explain your rating and click Submit to complete the HIT.
2. Definitions
Geo-entity
A specific location which can be uniquely described by a geo-chain, a geo-entity type, and either a single
latitude / longitude point or a larger geographic area. A geo-entity could be small (e.g. a single address, a
building, a neighborhood), or very large (an ocean or a continent). For example, <Empire State Building,
20 W 34th St, New York, NY 10001, USA>, <King County, Washington, USA, North America>, and <Pacific
Ocean> are all valid geo-entities.
Geo-chain
Geo-entities have hierarchical relationships with other geo-entities (e.g. city → state → country). These
relationships represent “belongs-to”, or parent-child, relationships. These form multiple connections
between the most specific and the least specific parts, and thus are known as geo-chains. Unlike what is
used in a postal address, geo-chains often include rich detail, including multiple administrative levels
(e.g. district, county / prefecture / region, state/province, etc.), and continent (e.g. South America, Asia).
N.B. Australia is both a country and a continent, so both may appear.
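The “belongs-to” relationships described above can be pictured as a linked structure. The following is only a minimal sketch, not how the HIT app stores its data: the GeoEntity class, its field names, and the geo_chain helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GeoEntity:
    """Hypothetical model of a geo-entity with a link to its parent."""
    name: str
    entity_type: str                      # e.g. "Populated Place", "Admin_2", "Admin_1"
    parent: Optional["GeoEntity"] = None  # the entity this one "belongs to"


def geo_chain(entity: GeoEntity) -> List[str]:
    """Follow parent links from the most specific to the least specific entity."""
    chain = []
    current: Optional[GeoEntity] = entity
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return chain


north_america = GeoEntity("North America", "Continent")
usa = GeoEntity("USA", "Country", north_america)
washington = GeoEntity("Washington", "Admin_1", usa)
king_county = GeoEntity("King County", "Admin_2", washington)
seattle = GeoEntity("Seattle", "Populated Place", king_county)

print(", ".join(geo_chain(seattle)))
# prints: Seattle, King County, Washington, USA, North America
```

Each entity stores only its direct parent, so the full geo-chain is recovered by walking upward until no parent remains.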
Point of Interest
Entity with important cultural, historical, tourist, or other major public interest. Includes well-known
locations such as landmarks, monuments, airports, tourist sites, and other places commonly known to
the public. Examples: Eiffel Tower, Harvard University, Geneva Airport, Taj Mahal, Roman Coliseum,
Louvre Museum, Grand Central Station, Wembley Stadium, Six Flags Amusement Park, Rockefeller
Center.
In this HIT app, Point of Interest also includes Nature AREAS such as parks (national, state, city, regional
including man-made parks e.g. Central Park in New York City), nature preserves, wildlife refuges, and
forests (National Forest, State Forest, etc.), and other Natural geographic or geological features
including mountains, volcanoes, rivers, lakes, oceans, islands, canyons, gorges, and deserts.
User Location
The location (latitude, longitude) of the user at the time the user made the query. It can be used to
determine the distance from the user to different possible results. Many user locations will also have a
circle around them indicating their accuracy. The smaller the circle is, the higher the accuracy, and the
more precisely it can be used to disambiguate locations.
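To illustrate how a distance from the user to a result can be derived from two latitude / longitude points, here is a minimal sketch using the standard haversine (great-circle) formula. It is an assumption that the HIT app computes “User Distance” this way; the function name and the coordinates are illustrative and approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
user = (33.66, -95.56)          # a user located in Paris, Texas
paris_texas = (33.66, -95.56)
paris_france = (48.86, 2.35)

print(round(haversine_km(*user, *paris_texas)), "km to Paris, Texas")
print(round(haversine_km(*user, *paris_france)), "km to Paris, France")
```

A large accuracy circle means the user's true position could be far from the reported point, so small differences between such distances should not be relied upon.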
Ambiguous Location
A location that is not uniquely defined, often because it is incompletely described. For example, a
location of “Springfield USA” could be <Springfield, OR, USA>, <Springfield, MA, USA>, <Springfield, IL,
USA>, or possibly even <Springfield Township, MI, USA>, among many, many others.
Unambiguous Location
A location that is uniquely defined, often because it is completely described. For example, a location of
“Springfield, MA” could only be for <Springfield, MA, USA>. The location of “Tlaquepaque” is not fully
described, but there is only one Tlaquepaque in the world: <Tlaquepaque, Jalisco, Mexico>. Thus, it is also
unambiguous.
3. Judging results
Always start by looking at the research results. By default, the highlighted location words in the query
will be automatically entered into the research pane for whichever search engine you prefer (you can
change your preference in the drop-down menu). If intent is unclear, check “[S]earch entire query (not
just highlighted tokens)” (or press the “S” key). Finally, you can also modify the query directly in the
research pane or in the full-size window.
The intended location is the most specific location entity in the query, within reason. For query “SOHO
New York”, intended location is the Neighborhood of SOHO in the city of New York, and not the larger
city. But for query “Hudson Furniture in SOHO New York”, the intended location is still SOHO in New
York City, not the more specific furniture store (a store is not considered a “location” in this HIT app,
unless it is very famous and is thus a POI).
Also, some “spatial words” (e.g. in, at, near etc.) may be highlighted to help with context setting. These
words are fine in the query but they are NOT expected to be present in results. So while judging, please
don’t penalize results for not having spatial words or for the query having these words.
If the query is for a town and a postal code, the most specific part will be the one that is contained by
the other (in some cases, the postal code is larger than the town, yet sometimes the opposite is true).
If the location is unambiguous, the intended location is almost always very clear. If it is ambiguous, you
may need to consider dominant search intent, user location (indicating distance of result to user), geo-
entity type, and perhaps the context of the query.
For example, the location word “Paris” almost always refers to <Paris, France> and not to <Paris, Texas,
USA>. A search for the available location word, e.g. “Paris”, on a search engine will reveal overwhelming
dominant search intent for the former.
However, if user location was in or very near Paris, Texas at the time of the query, then “Paris” is more
likely to refer to Paris, Texas. Even so, because Paris, France has overwhelming dominant search intent,
a query for “Paris” when the user is in or near Paris, Texas could still be for either <Paris, Texas,
USA> or <Paris, France>. Both would be valid.
Be sure to consider context, including other information in the query. For example, consider query “Phat
Phil's BBQ Paris”. The location word is Paris. Normally, Paris, France has globally dominant intent.
However, “Phat Phil's BBQ” is a restaurant in Paris, Texas that has no equivalent in Paris, France.
Therefore, regardless of user location, the dominant intent of “Paris” is for Paris, Texas.
When dominant search intent does not exist, user location can still be used to disambiguate between
ambiguous locations, especially if it is much closer to one than others (and especially if the UL accuracy
circle is small). As described above in the definition of Ambiguous Location, the location word
“Springfield” might refer to <Springfield, OR, USA>, <Springfield, MA, USA>, <Springfield, IL, USA>, or
many other places by that name. But if the user location is in Oregon, it most likely refers to
<Springfield, OR, USA>. If it is roughly equally close to more than one, then it does not disambiguate
between those entities, though it may rule out others. Please select equally Excellent results when there
is no dominant intent and user location doesn’t help disambiguate.
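The user-location reasoning above (when there is no dominant search intent) can be sketched as a simple rule. This is only an illustration of the idea, not a rule used by the HIT app: the function name, the tolerance value that stands in for “roughly equally close”, and the distances are all assumptions.

```python
from typing import Dict, List


def closest_candidates(distances_km: Dict[str, float], tolerance: float = 1.25) -> List[str]:
    """Return every candidate whose distance is within `tolerance` times the nearest one.

    A single returned candidate means user location disambiguates; several
    returned candidates are roughly equally close and remain equally plausible.
    """
    if not distances_km:
        return []
    nearest = min(distances_km.values())
    return [name for name, d in distances_km.items() if d <= nearest * tolerance]


# Illustrative distances only (not measured values).
user_in_oregon = {
    "Springfield, OR, USA": 5.0,
    "Springfield, IL, USA": 2800.0,
    "Springfield, MA, USA": 4000.0,
}
print(closest_candidates(user_in_oregon))
# prints: ['Springfield, OR, USA']

user_between_two = {
    "Springfield, OR, USA": 1500.0,
    "Springfield, IL, USA": 1600.0,
    "Springfield, MA, USA": 4000.0,
}
print(closest_candidates(user_between_two))
# prints: ['Springfield, OR, USA', 'Springfield, IL, USA']
```

In the second case user location rules out the Massachusetts result but does not separate the other two, so both would be selected if they are otherwise equally Excellent. The sketch deliberately ignores dominant search intent, which must still be weighed separately.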
Similarly, for users in the USA or Canada, location word “London” probably refers to <London, UK>,
<London, Ontario, Canada>, or <London, KY, USA>. If user location is in or near London, Ontario, then
<London, Ontario, Canada> is the most relevant result (especially if the UL accuracy circle is small). If
user location is in New York City, then user location does not disambiguate as well. Although <London,
Ontario, Canada> and <London, KY> would still be closer to the user, global dominant search intent for
<London, UK> is more important since the alternatives are not very close.
Some results may seem nearly identical, though they have different geo-entity types. In general, favor
Populated Place over other geo-entity types. For example, if location word is “New Rochelle NY” and
results include a POI (train station) and a Populated Place (city), pick only Populated Place.
            Result #1                                   Result #2
Geo-chain   New Rochelle, Westchester County, New York  New Rochelle, New Rochelle, Westchester County, New York
EntityType  Populated Place                             POI
However, in many cases, you should NOT favor Populated Place over other geo-entity types. Instead,
you may need to consider a combination of dominant search intent, user location, geo-entity type,
context of the query, or other factors to determine the likely intended location. In other cases, there
may simply be multiple reasonable interpretations, even when weighing all these aspects.
For example, in query for “Ohio”, dominant search intent would be for Admin_1 (USA state of Ohio)
rather than the small town of Ohio – a Populated Place which is located within the Admin_1 state of
Ohio.
An interesting variation: “Manhattan” refers formally to the island of Manhattan, but popular use of
that name refers to the most famous borough of New York City. Therefore, intended location would be
the Populated Place (New York) rather than the POI (island of Manhattan).
By contrast, a query for “Long Island” overwhelmingly refers to the large island of Long Island (POI) in
New York state and not to Long Island City (Populated Place), which is a relatively small part of the
island.
You may also need to weigh various factors when deciding intended location for queries that could be
answered with either a Country or an island (POI), for example Iceland, Greenland, and Madagascar.
Unless dominant search intent indicates otherwise, you should choose the Country.
If the entity type is not listed correctly, the geo-chain should NOT be marked as EXCELLENT. For
example:
- Washington, United States, North America: entity type should be Admin_1. If not, it should NOT
be selected as the EXCELLENT location
- Madhapur, Hyderabad, Andhra Pradesh, India, Asia: The entity type should be Populated Place.
If not, it should NOT be selected as the EXCELLENT location.
Some of these geo-chains may look very similar. In such cases, zoom in to learn more about the results
to determine which one is most relevant to the location words highlighted in the query.
On closer inspection, one is an “Admin_2” (District / County), while the other is a Populated Place (City)
contained within the given Admin_2.
Since an Admin_2 geo-entity is slightly less specific than a Populated Place, the judges should
understand the query-context and base their selection on the location-intent expressed in the query.
For example: if the query were “realtors Los Angeles”, then the result should be the Populated Place
(Result #2). But for the query “realtors Los Angeles County”, selecting the Admin_2 geo-entity (Result #1)
would be correct.
Sometimes, a geo-chain may have duplicate / redundant information. These results should NOT be
selected as Excellent. For example:
<Seattle, Seattle, Seattle, WA, USA>. Seattle is the name of a city (Seattle, WA), but there
are no counties or states called Seattle, so it is redundant information.
On the other hand, take the case of: <New York, New York, USA>. In this case, “New York” is the
name of both the city and the state. The first “New York” thus refers to the city and the
second “New York” refers to the state. Therefore, it is NOT redundant information.
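One way to think about the difference between the two cases above: a name is redundant only when it repeats more often than there are distinct real entities with that name. The sketch below is a simplification under that assumption. The KNOWN_ENTITIES set is a hypothetical, hand-made stand-in for a gazetteer, and the function name is illustrative.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical gazetteer: (name, entity type) pairs assumed to exist.
KNOWN_ENTITIES: Set[Tuple[str, str]] = {
    ("Seattle", "Populated Place"),
    ("New York", "Populated Place"),
    ("New York", "Admin_1"),
    ("WA", "Admin_1"),
    ("USA", "Country"),
}


def redundant_names(chain: List[str]) -> List[str]:
    """Names repeated more often than there are known entities with that name."""
    known_counts = Counter(name for name, _ in KNOWN_ENTITIES)
    chain_counts = Counter(chain)
    return [name for name, n in chain_counts.items()
            if n > 1 and n > known_counts[name]]


print(redundant_names(["Seattle", "Seattle", "Seattle", "WA", "USA"]))
# prints: ['Seattle']
print(redundant_names(["New York", "New York", "USA"]))
# prints: []
```

This only counts repetitions; it does not check that each name sits at the right level of the chain, which still requires your own research.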
Many times you may need to consider a combination of the cases above, using entity types and query
context to make the correct decision. For example, consider the query “bernat pop yarn dubai” with
results as a Populated Place (Dubai) and its parent administrative entity (Dubai, the Emirate). Query
context suggests that user is searching to buy a product within a city, and NOT within the Emirate /
Admin_1. Hence correct answer is the Populated Place “Dubai, Dubai, United Arab Emirates, Middle
East, Asia”.
            Result #1                                             Result #2
Geo-chain   Dubai, Dubai, United Arab Emirates, Middle East, Asia  Dubai, United Arab Emirates, Middle East, Asia
EntityType  Populated Place                                       Admin_1
If none of the listed results have correct geo-chains, then do not select any as Excellent. Instead, choose
No Excellent Results.
1. Location result must be highly relevant to the highlighted location terms in the query.
2. There is no better location result than this result (either in the list or in research).
3. There can be multiple location results rated as “Excellent” for a query.
If multiple results are equally EXCELLENT and relevant, and their geo-chains are correct, then
you should select ALL of those.
If multiple results are equally EXCELLENT and relevant, but only some of their geo-chains are
correct, then you should select only those results that are both relevant and with correct geo-
chains.
But if no results are shown, or there are not any EXCELLENT results with correct geo-chains, do
NOT select any result. Instead choose No Excellent Results.
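The selection rules above can be summarized as a small decision sketch. It is illustrative only: the Result class, its field names, and the choose function are hypothetical, and the two boolean fields stand for judgments that you make yourself through research.

```python
from dataclasses import dataclass
from typing import List, Union

NO_EXCELLENT_RESULTS = "No Excellent Results"


@dataclass
class Result:
    geo_chain: str
    is_excellent_match: bool  # highly relevant, and no better result exists
    geo_chain_correct: bool   # geo-chain and entity type are correct


def choose(results: List[Result]) -> Union[List[Result], str]:
    """Select every result that is both Excellent and has a correct geo-chain."""
    selected = [r for r in results if r.is_excellent_match and r.geo_chain_correct]
    return selected if selected else NO_EXCELLENT_RESULTS


# The "bernat pop yarn dubai" case: only the Populated Place matches the intent.
dubai_city = Result("Dubai, Dubai, United Arab Emirates, Middle East, Asia", True, True)
dubai_emirate = Result("Dubai, United Arab Emirates, Middle East, Asia", False, True)

print([r.geo_chain for r in choose([dubai_city, dubai_emirate])])
# prints: ['Dubai, Dubai, United Arab Emirates, Middle East, Asia']
print(choose([]))
# prints: No Excellent Results
```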
4. Examples
Example 1. To be or not to be
Query to be or not to be
User Location N/A
Judgment Cannot Judge
This query clearly does not contain a location intent. The highlighted location term “or” could be an
abbreviation of “Oregon”, but not in this context. There are no location terms in this query. Therefore,
select Cannot Judge.
Example 2. www.hyderabad-house.com
Query www hyderabad house com
User Location N/A
Judgment Cannot Judge
This query is looking for a website of Hyderabad House – which is a chain of food outlets in India. The
user has not specified “where” s/he wants to search. The query does contain the word “Hyderabad”, but
Example 4. Boots UK
Query             Boots UK
User Location     Oberursel, Hessen, Germany
Location Results  Entity Chain                  Entity Type  User Distance
                  Vereinigtes Königreich        Country      1300 km
                  Uttarakhand, Indien, Asien    Admin_1      8157 km
                  Boot Cove, Lubec, Maine, USA  POI          3413 km
Step 4 Problems   No problems
The user is looking for “Boots” chain stores in the United Kingdom (UK). The user is in Oberursel,
Germany, and should expect a single result: the country of United Kingdom. For this example, please
pretend you are judging this as part of an English-speaking judge pool (even if you are actually in a
different language judge pool). As an English-speaking judge, you would be able to read the query
because it is in English. However, you would need to translate these results because they are in a foreign
language (German).
5. Final notes
1. Judges in ZH HIT apps may encounter queries from different dialects. If they are in a dialect that
you cannot understand, consider skipping the HIT. If you are certain that people in the ZH judge
group would not be able to judge the HIT, then please try to translate the results to your
language.
THANK YOU!
Thank you for reading this far. You should now be ready to complete training and take the qualification
test. It is good practice to keep these guidelines always open during judgment in case you need to
double check the rules.
6. Version history
If you have read previous versions of these guidelines, you should pay close attention to the changes
listed below. That way, you will not have to re-read the entire guidelines. However, be sure to refer to
the relevant sections, especially for new examples.
2023-08-30:
- Section 1 (“HIT app overview”)
o Do not mark results in foreign language as CANNOT JUDGE. Instead, try to translate
these results and judge normally.
2021-02-09:
- Training examples are in English for cross-language consistency and standardization. However,
spam score will also be based on HITs in your language / country.
- Section 1 (“HIT app overview”)
o What to do if map does not load automatically
o What to do if research pane displays “unusual traffic” error
o Option to “[S]earch entire query (not just highlighted tokens)”.
- Section 2 (“Definitions”):
o Defined words are now listed in table of contents (easier to find)
o Definition of geo-chain now describes its inclusion of rich detail (i.e. unlike what is used
in a postal address).
o Updated definitions of Neighborhood and Populated Place (contrasting formal
boundaries and organization by local government).
o Updated Geo-entity type table in Section 2 (Definitions): [TODO add link to table]
POI: added natural features
Other: added “regions”. More examples.
Added “Australia” (as both Country and Continent)
o Added example of Australia to definition of Dominant Search Intent: A query for
“Australia” could be for either the country or the continent, but dominant search intent
is for the country.
- Section 3.1 (“Validate query and listed results”):
o Subsection 1(e): changed “Location in Entity” to example of “Kentucky Fried Chicken”:
2020-12-22:
- Clarified in Section 1 that during Step 4 of the HIT app, the ‘spelling errors’ checkbox is to be marked
only when highlighted location words in the query have errors, i.e. please ignore non-highlighted words.
- Updated Section 3.2 to emphasize that multiple excellent results should be selected for
ambiguous locations when there is no dominant intent and user location does not help disambiguate.
Added example 17 in Section 4 to further explain.
- Updated Section 3.2 to emphasize that the Populated Place should be preferred over POI (Point
of Interest) when both exist with same name. (e.g. New Rochelle NY)
- Updated Section 3.2 to clarify that spatial words (e.g. in, at, near) could be highlighted in query
but they are NOT expected to be present in result. Hence please don’t penalize for the same.
- Updated Section 3.3 to emphasize that for child and parent results having same name, query
context should be used to identify excellent result between them (e.g. Dubai).
2020-11-26:
- Don’t penalize for query misspellings, abbreviations, aliases or alternative names. See Section
3.1(f).
- Try to understand user’s true search intent as expressed in the query text. Users make mistakes,
and can be confused. Try to understand their true meaning. See Section 3.2.
- Intended location is the most specific location in the query (within reason). See Section 3.2
(Determine intended location).
- Consider full query text for context. See example of “Phat Phil’s BBQ Paris” in Section 3.2
(Determine intended location).
- Dominant search intent: added example of query for Qishan / 旗山. See definition of Dominant
Search Intent in Section 2.
- Updated Geo-entity type list and descriptions. See definition of Geo-entity type in Section 2.
- Foreign Language: strengthened language to choose Cannot Judge if result geo-chains are in a
different language group AND not in English. Applies to both query and results. See Section
3.1(a).
- Section 4 (Examples): improved appearance of tables, added “Step 4 Problems”, and added /
modified examples, especially: #4, #10, #15, #16.