
Primary Content 

Quality Guidelines  
New in Version 7.1. Pay attention to:
- New judging option: Gray area. Select it when the highlight is neither typical primary nor typical non-primary.
- Definition changes
o Reviews and site notifications are back to being typical non-primary content.
- Example changes
o Reviews are no longer gray area on the tourism review page.
o Reviews are no longer gray area on the food/service review page.
o Reviews are no longer gray area on the product page.
o Categories at the bottom are not gray area on the wiki page.
o Most content except header/footer is primary on the Home page.

Your Task
In this HitApp, you will be given a webpage and a text block with a green highlight. Select the check box
that indicates whether the content in the text block belongs to the primary content of the webpage.
 
We already highlighted the text for you on the page. You just need to scroll down and find the green
highlight and make the decision:

- If the highlight is typical primary content, select Yes, this is a typical primary content.
- If the highlight is typical non-primary content, select No, this is a typical non-primary.
- If the highlight falls somewhere between typical primary and non-primary, or the guideline doesn’t clearly cover it and you are not sure which side it belongs to, select This is Gray area.
- If the page doesn’t contain any green highlight like below, or the highlight only appears after you click something (e.g. open a drop-down list or expand a collapsed section), select “it’s invisible”.
- If the page can’t load, or a big pop-up blocks the whole webpage behind it, select Skip.
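The decision rules above can be summarized as a small sketch. The function name `choose_answer` and its parameters are hypothetical, introduced only for illustration; the returned strings are the option labels used in this guideline, and the order of the checks is an assumption rather than something the guideline prescribes.

```python
from typing import Optional


def choose_answer(page_loads: bool,
                  popup_blocks_page: bool,
                  highlight_visible: bool,
                  judgment: Optional[str]) -> str:
    """Map what the judge observes to one of the five options.

    `judgment` is "primary", "non-primary", or None when the judge is
    unsure or the guideline does not cover the case.
    """
    # The page cannot be judged at all: skip it.
    if not page_loads or popup_blocks_page:
        return "Skip"
    # No green highlight is visible without extra clicks
    # (drop-down list, collapsed section, ...).
    if not highlight_visible:
        return "it's invisible"
    if judgment == "primary":
        return "Yes, this is a typical primary content"
    if judgment == "non-primary":
        return "No, this is a typical non-primary"
    # In between, unsure, or not covered by the guideline.
    return "This is Gray area"
```

For example, a page that loads normally but only shows the highlight after expanding a collapsed section would be passed in with `highlight_visible=False`.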
Prerequisite: Instructions for Invoking Chrome 
For this HitApp, we suggest always using the Chrome browser. Please follow the instructions below to set up
and launch Chrome correctly, which helps you automatically skip the spam/security screen alert.
1. If you don’t have Chrome browser, please download Chrome first and add a shortcut on your
desktop  
2. Depending on your local desktop configuration, you might need Admin permissions to make the
below changes 
3. Right click on the shortcut on the desktop and select “Properties”  
4. In the Properties window, ensure that you are on the “Shortcut” tab (see image below) 
5. In the “Target:” property, you will see the path to “chrome.exe” listed. Click in this path,
move to the end, add a whitespace, and then add the switch below:
                        "  --allow-running-insecure-content"
a. The quotes are intentional and should be included
b. Notice that there is a whitespace after the starting quote
6. Click on “Apply”, then “OK” and close the Properties window 
7. Always launch Chrome by double clicking on this updated shortcut when using this HitApp 
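As a rough illustration of step 5, the sketch below builds the text that the “Target:” field would contain after the edit. The function name `build_shortcut_target` and the example installation path are assumptions made for this sketch; the switch text is copied from step 5, including the quotes and the whitespace after the starting quote.

```python
# Switch text as written in step 5 (quotes included, whitespace after
# the starting quote).
SWITCH = '"  --allow-running-insecure-content"'


def build_shortcut_target(existing_target: str) -> str:
    """Return the "Target:" text after step 5: path, a whitespace, the switch."""
    target = existing_target.rstrip()
    # Do not append the switch a second time if it is already there.
    if target.endswith(SWITCH):
        return target
    return f"{target} {SWITCH}"


# Example path only; the real path depends on where Chrome is installed.
example_target = r'"C:\Program Files\Google\Chrome\Application\chrome.exe"'
print(build_shortcut_target(example_target))
```

The helper only produces the text to paste; the shortcut itself is still edited by hand in the Properties window as described in steps 3 to 6.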
 

 
Primary Content Definition 
Primary Content is the content of a webpage that is the reason a user wants to visit the webpage.
If you are a user who visits a webpage on SeattleTimes.com, you want to read the text of the article that
a journalist wrote, not the ads that are interspersed throughout the article, or the related articles, or the
navigation bar on top. All that content might distract you and cause you to click on it, but the main
reason you visit an article page is to read the actual article. Primary Content is that content, generalized
to any page on the web. Similarly, for forums, product pages, question-and-answer sites, and search pages,
the Primary Content is the main reason you are visiting the page.

Secondary Content is content that can be removed from the page while the web page
remains valuable. In other words, secondary content is not the reason you visit the page.
It doesn’t contribute to the main information on the page; for example, ads and page headers are the same
across the website. Even without ads, header, footer, and related articles, the core value of the page is not
affected.

Typical Primary Content SHOULD include:  


- Article, title, sub-title, date, author, and author introduction of Articles. E.g. NewYorkTimes
o SeattleTimes Example
- Forum/QnA post content in the case of Forums. E.g. Reddit, Quora.
o Reddit Example
- Product names and details for each product on Product Listing pages.
o Amazon Example
- Product descriptions and price on Product pages. E.g. Amazon.
o Amazon Example 2
- Movie/Housing/Tourism/Restaurant info. Reviews/Comments are no longer primary. E.g. Trulia, Redfin, Zillow, TripAdvisor, Yelp, IMDB
o TripAdvisor Example
o Yelp Example
- Image title, description, and any alt text on Image browsing pages. E.g. Pinterest.
- Video title, caption, author, and any transcript or subtitles on video pages. E.g. Youtube
o Youtube Example
- Almost everything except header/footer on the Home page.
o University Example
- Reference section of Articles/Research papers/Wiki pages. E.g. Wikipedia
o Wiki Example
- Breadcrumbs sections
- Table of Contents, “External links” and “See also” sections of Wiki pages. E.g. Wikipedia.
o Wiki Example
- Content on the side of Map pages. E.g. Redfin, Zillow.
o Zillow Example
- Opening/Working time. E.g. opening time of a restaurant.
Typical Primary Content SHOULD NOT include:

Content that adds little value compared to the core information of the page. In other words, removing it doesn’t affect the core content of the page:
- Buttons are usually not Primary.
o E.g. shopping cart button, login button, search button, buy button
- Headers and footers
o Usually the very top and very bottom of a web page
- Web site notifications
o E.g. Wiki donation banner, notifications about changes to the web site’s terms of use policy.
- User review/comment sections
- Navigation links/sections:
o This includes the top or side menus on homepages or other webpages that repeat across many documents on the website
o This excludes “External links” on Wikipedia, which should be considered Primary Content, since this is unique content for this document.
- Related, trending, recommended or sponsored articles/videos/products etc. sections:
o This includes sections that are secondary to the main content of the webpage and contain a list of similar or recommended articles, videos, products or other types of content.
o This excludes the “See also” section on Wikipedia, which should be considered Primary Content, since this is unique content for this document.
- Sidebars that are not related to article content. Usually advertisements.
o This excludes sites such as Zillow and Redfin, where most of the content is a map.
o This excludes Opening/Working Time in a sidebar, which is related to the primary content
- Site-specific copyright information
o Usually at the very bottom, or at the end of articles
- Search bars and Forms: comment forms, site search forms, login forms, etc.
o Search bars are usually at the beginning of the page. Comment forms are usually together with comment/review sections.
- Ads
- Social media links/panels

Gray area usually includes:


If it’s neither typical primary nor typical non-primary, it is gray area.

When it’s hard to make the decision, or the guideline doesn’t cover the case, just select Gray area.
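The three categories above can be pictured as a lookup with a gray-area fallback. This is only a sketch: the function name `classify`, the set names, and the shortened content labels are hypothetical and cover just a few entries from the lists; a real judgment still depends on looking at the page.

```python
from typing import Optional

# Shortened labels chosen for this sketch; the guideline lists are longer.
PRIMARY = {
    "article", "title", "author", "date", "breadcrumbs", "references",
    "table of contents", "product description", "price", "opening time",
}
NON_PRIMARY = {
    "header", "footer", "button", "site notification", "review", "comment",
    "navigation", "related content", "copyright", "search bar", "form",
    "ads", "social media links",
}
# Page-specific exceptions named in the guideline: (page type, content type).
PRIMARY_EXCEPTIONS = {
    ("wiki", "external links"),
    ("wiki", "see also"),
    ("map", "sidebar"),
}


def classify(content_type: str, page_type: Optional[str] = None) -> str:
    """Return "primary", "non-primary", or "gray area" for a content label."""
    key = content_type.strip().lower()
    page = page_type.strip().lower() if page_type else None
    # Exceptions win over the general lists.
    if (page, key) in PRIMARY_EXCEPTIONS:
        return "primary"
    if key in PRIMARY:
        return "primary"
    if key in NON_PRIMARY:
        return "non-primary"
    # Neither list covers it: treat it as gray area.
    return "gray area"
```

Under these assumptions, `classify("see also", "wiki")` gives "primary", while a label that appears in neither list falls through to "gray area".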
Typical Article Page Example
WebPage: Seattle times
Breadcrumbs, Article, Title, Author, Date, Author intro, and image description are all primary.
Header, Footer, Comments, Ads, and Related/Recommended content are not primary.
Tourism Review Page Example
Example WebPage
The product/service/place intro, price, spec, location, and contact are primary.
Header, Footer, Ads, and sidebars used for filters are not primary.
Food/Service Review Page Example
Example: https://www.yelp.com/biz/pinecrest-bakery-pinecrest-pinecrest
Restaurant/Service intro and menus are primary.
Header, Footer, Ads, Other recommended Restaurants/Services are not primary.
Forum Page Example
Example WebPage:
https://www.reddit.com/r/deeplearning/comments/jgnokl/is_the_practical_deep_learning_for_coders_course/

The main content is the post and its replies.

Header, Footer, Related/More recommended posts, and Ads are not primary.


Product Listing Example
Webpage: https://www.amazon.com/s?k=weights&i=sporting&ref=nb_sb_noss_2
The main content of such a page is the listed products.
Header, Footer, Search bar, and sidebar are not primary.
Product page
Example Webpage: Amazon.com
Product info, price, size, specs are primary.
Header, footer, ads, related/recommended contents are not primary.
Gray area: useful info in sidebar.
Wiki Page Example
Webpage: https://en.wikipedia.org/wiki/United_States_Armed_Forces

Article, summary, table of contents, references, and profile are primary content.

>>> Special for wiki pages: See also and References are Primary as well. <<<

Header, footer, and sidebar are secondary.


Video Page Example
Example https://www.youtube.com/watch?v=-9gsHHFVVHI
Video description and like/dislike are primary.
Header, Footer, Recommended/related/see also, search bar, and buttons are not primary.
Home Page/Image Page Example
Example: http://www.uky.edu/
Most of the components are primary, except Header/Footer.
More Examples
Dictionary example
Map Example
