

Chen, T. (2010). Web filtering and censoring. IEEE Computer, 43(3), pp. 94-97. doi: 10.1109/MC.2010.84

Permanent City Research Online URL: http://openaccess.city.ac.uk/8211/
WEB TECHNOLOGIES

Web Filtering
and Censoring
Thomas M. Chen and Victoria Wang,
Swansea University, Wales

Information on the Web is not as uncontrolled as it may appear.

In May 2009, the Chinese government mandated installation of the Green Dam Youth Escort Web filter on all PCs sold in China starting in July. The filter's official purpose was to protect children from pornographic, violent, and other "unhealthy" websites, but it immediately aroused opposition from several quarters.

International security researchers argued that the filter was inaccurate and contained vulnerabilities potentially exposing PCs to security threats. Free speech advocates expressed concern that the government could use the filter to monitor users' online activities and block politically sensitive websites. And the US government urged the Chinese Ministry of Industry and Information Technology and Ministry of Commerce to revoke the Green Dam requirement on the basis of free trade.

In response to the controversy, the Chinese government indefinitely "delayed" the requirement except for PCs used in schools, cyber cafes, and other public access locations.

Around the same time, following the controversial reelection of President Mahmoud Ahmadinejad in Iran, critics accused the regime of blocking certain websites such as Facebook and YouTube—which had been used to post confrontations with the police—as well as sites affiliated with the opposition leader. The Iranian government was also suspected of monitoring Internet usage to track down election protesters.

In January 2010, public attention was again drawn to the issue of Web censoring when Google.cn decided to stop complying with Chinese government requirements to censor search results related to politically and socially sensitive issues. Google reached its decision, considered long overdue by some, in response to attacks by Chinese hackers on Gmail accounts of Chinese human rights activists during the previous month. US Secretary of State Hillary Clinton publicly praised Google and called for a global end to Web censoring, prompting a critical response from the Chinese government.

As a mass medium, the Web is unique. Traditional print media such as books and newspapers require time to publish and distribute physically, while electronic mass media, namely radio and television, typically depend on official licenses for operation and are either state-regulated or self-censored in exchange for fewer regulations. In contrast, the Web lets individuals easily publish a rich range of content—such as multimedia, hyperlinks, and JavaScript—globally in real time to PCs and various mobile devices.

The public perceives the Web as a wildly uncensored medium, the embodiment of complete freedom of expression (for good or bad). In reality, information on the Web is not as uncontrolled as it may appear. According to the OpenNet Initiative (www.opennet.net), more than 40 countries actively filter Web access—selectively blocking content to a billion Internet users worldwide—and many others are considering doing so. These countries use various methods to apply different levels of filtering.

HOW WEB FILTERING WORKS

Web filters differ in complexity, granularity, accuracy, location, and transparency. Generally, simpler Web filters are easy to implement but work at a coarser granularity. Just as importantly, motivated individuals tend to discover and circumvent them more readily.

IP and URL blacklists

The simplest Web filters depend on blacklists of IP addresses. The main advantage of blacklists is speed—essentially a fast table lookup. Speed allows Web filtering at "choke points" in the network where traffic is aggregated, such as gateways between neighboring national networks. However, the simplicity of blacklists has two major drawbacks. First, continual blacklist updating requires enormous effort and resources. Second, IP blacklists work at a coarse granularity—the filter either blocks or allows all Web content from an IP address.
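A minimal sketch of such a blacklist filter, with hypothetical documentation-range addresses, shows why the lookup is fast: membership testing in a hash set is effectively a constant-time operation.

```python
# Hypothetical IP blacklist; 203.0.113.0/24 and 198.51.100.0/24 are
# reserved documentation ranges, used here only as placeholders.
BLACKLIST = {"203.0.113.7", "198.51.100.25"}

def allow(dest_ip: str) -> bool:
    """Return True if a request to dest_ip may be forwarded."""
    # Set membership is a hash lookup, fast enough for a busy gateway.
    return dest_ip not in BLACKLIST
```

Note that the check is all-or-nothing per address, which is exactly the granularity drawback described above: blocking one address blocks every site hosted there.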

94 COMPUTER Published by the IEEE Computer Society 0018-9162/10/$26.00 © 2010 IEEE


URL blacklists offer a finer level of granularity. These are often implemented at DNS servers that resolve domain names to IP addresses. When a DNS server receives a resolution request, it checks the URL against the blacklist. If the requested URL is on the list, the DNS server will return an incorrect or default IP address. URL blacklists suffer the same drawback as IP blacklists—maintaining and updating them requires considerable effort.
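A DNS-based URL blacklist can be sketched the same way; the domains, records, and default address below are invented for illustration, not taken from any real deployment.

```python
# Toy resolver table; a real DNS server consults zone data or upstream
# resolvers rather than a dict.
REAL_RECORDS = {"example.com": "93.184.216.34"}

# Hypothetical blacklist of blocked domain names.
URL_BLACKLIST = {"blocked.example.net"}

DEFAULT_IP = "0.0.0.0"  # could instead point at a blockpage server

def resolve(domain: str) -> str:
    """Answer a resolution request, lying for blacklisted names."""
    if domain in URL_BLACKLIST:
        return DEFAULT_IP  # the incorrect or default address described above
    return REAL_RECORDS.get(domain, DEFAULT_IP)
```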
IP and URL blacklists can be deployed at proxy-based filters. Organizations commonly use proxy servers to cache Web content locally: the proxy cache keeps recently requested content, and if users request the same content again, it is served from the proxy instead of the origin server. Because all Web content goes through the proxy server, it is an attractive location for filtering.

A proxy-based filter checks the IP addresses or URLs in all Web requests against a blacklist. If it detects a blacklisted IP address or URL, the proxy filter can return a "blockpage" with an error message or explanation that the content was blocked. Some nations such as China and Iran go further and block URLs containing prohibited keywords in the URL path. However, keywords in the URL do not necessarily reflect the webpage contents accurately. For example, the word "sex" in a URL would match many other types of websites besides pornographic ones, and blocking URLs containing the word would not catch all pornographic sites.
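Combining the checks described above, a proxy-based filter might look roughly like the following sketch; the blacklists, keyword list, and blockpage text are all hypothetical.

```python
from urllib.parse import urlparse

IP_BLACKLIST = {"203.0.113.7"}                       # placeholder address
URL_BLACKLIST = {"http://blocked.example.net/page"}  # placeholder URL
BANNED_KEYWORDS = {"sex"}  # crude: also matches "Sussex", "sextet", ...

BLOCKPAGE = "<html><body>This content has been blocked.</body></html>"

def filter_request(url: str, resolved_ip: str) -> str:
    """Return a blockpage, or 'FORWARD' to pass the request along."""
    path = urlparse(url).path.lower()
    if (resolved_ip in IP_BLACKLIST
            or url in URL_BLACKLIST
            or any(word in path for word in BANNED_KEYWORDS)):
        return BLOCKPAGE
    return "FORWARD"
```

The keyword test reproduces the overblocking problem noted above: a URL path such as /sussex-news would be blocked even though the page has nothing to do with pornography.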
Content filtering

Real-time content filtering at a proxy or the Web client offers two important advantages over IP or URL blacklists. First, the filter examines Web contents when a page is requested, without the need to preestablish a blacklist. Second, the filtering decision is based on individual webpages or even elements within a page.

However, a content filter has two difficult and somewhat competing requirements: intelligence to recognize an undesirable page or content within a page, and fast operation. Content filters typically use machine learning or AI techniques to classify webpages into a set of predefined categories, as Figure 1a shows; users or network administrators can choose to block or admit access to each category. Speed is important because people become impatient and leave a website if it returns requested contents too slowly.

Figure 1. Green Dam Web filter. (a) Users or network administrators can choose to block or admit access to five categories of Web content. Source: University of Michigan. (b) Green Dam's image scanner relies on skin color and face recognition to block access to pornographic websites.
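As a toy illustration of keyword-driven categorization (real filters use trained machine-learning classifiers over many page features), the sketch below scores a page's text against invented category keyword lists. Because it ignores context, a medical page discussing gambling addiction would be misclassified, which is the kind of error content filters must trade against speed.

```python
# Invented categories and keyword lists, for illustration only.
CATEGORY_KEYWORDS = {
    "gambling": {"casino", "poker", "betting"},
    "news": {"reuters", "headline", "correspondent"},
}

def classify(text: str, default: str = "uncategorized") -> str:
    """Pick the category whose keywords overlap the page text most."""
    words = set(text.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```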

MARCH 2010 95

Intelligent content filters examine various elements of a webpage for classification, including the metadata, links, text, images, and scripts.

Metadata in a Web document's header can reveal information about authorship and keywords.

Linkage analysis is based on "guilt by association": pages are likely to be linked to other pages of the same type. For example, a news website is likely linked to other news sites. A page's links can be quite revealing about its subject.

Textual analysis is critical to content filters because many webpages are mostly text. Content filters typically parse the text, identify keywords, and apply machine-learning techniques to decide on the most appropriate category. This approach is not perfect without semantic analysis because it can be difficult to understand different contexts—for example, whether a page with sexual content has educational or pornographic purposes.

Image analysis is vital for classifying pornographic websites, but classification accuracy can be low. Image analysis usually looks for regions with skin color, but this task is complicated by the wide range of human skin colors and different lighting conditions.

Analysis of active content such as JavaScript, VBScript, and ActiveX controls is essential to detect and filter malicious webpages. In particular, JavaScript is frequently used for phishing and drive-by downloads. Unfortunately, heavy obfuscation of the code can make it unreadable, in which case it may be necessary to run the code in a JavaScript interpreter and observe its behavior.

Example: Green Dam

Green Dam is an example of a desktop Web filter that uses both approaches. It first checks a requested URL against blacklists and then scans text for obscenities and politically sensitive phrases.

To filter pornography, Green Dam includes an image scanner that appears to identify regions of skin color and inputs the region characteristics—for example, shape—into a trained support vector machine (SVM) classifier. An additional step attempts to recognize a human face, reportedly using the open source OpenCV image recognition software. If a human face is not the primary component, the filter classifies the image as pornographic and blocks the site, as Figure 1b shows.

Considering the algorithm's heavy reliance on skin color and face recognition, it is not surprising that tests of Green Dam have found the porn filter to be fallible, mistakenly filtering Garfield the cat, Johnny Depp, Paris Hilton, and flesh-colored pork.

STATE-SPONSORED WEB CENSORING

Given the long history of governments trying to control traditional mass media, state-sponsored Web censoring could be expected. However, its prevalence was unknown prior to the OpenNet Initiative.

Filtered content

There is little disagreement that certain Web content is harmful and should be filtered, such as fraud, Web spam, malware, child pornography, and terrorism. State-sponsored Web filtering generally targets four types of content:

• political dissent;
• social deviance—for example, hate speech, pornography, and gambling;
• national security threats—namely, terrorism; and
• certain Internet services such as anonymizers, blogs, peer-to-peer sharing, and social networks.

Government Web filters in China and Iran are considered to be among the strictest, but most countries use some form of Web filtering. Even in the US, which is widely believed to have the fewest restrictions, Web filtering is common at public Internet access spots such as schools and libraries, as well as on private corporate premises.

Example: China

In 2003, the Chinese Ministry of Public Security initiated the Golden Shield Project, also called the "Great Firewall of China," a comprehensive network to monitor and control access to both domestic and foreign websites. All foreign traffic enters China at three choke points—international gateways or Internet exchange points (IXPs): Beijing-Qingdao-Tianjin in the north, from Japan; Shanghai on the central coast, also from Japan; and Guangzhou in the south, from Hong Kong. A small amount of traffic also enters by satellite, but satellite connections are slow and expensive.

The two main backbone providers, China Netcom and China Telecom, use IP blocking at the international gateways. The backbone providers also filter on the basis of URLs containing certain keywords in the URL path. Through DNS tampering, DNS queries for these URLs may return a "site not found" error.

While most countries depend solely on IP or URL filtering, China also exercises content filtering via its state-licensed ISPs. Content filters purchased largely from foreign security companies examine packet payloads for forbidden content—for example, sensitive keywords such as "democracy" or "64" (for June 4, the day of the Tiananmen Square incident in 1989). The filters may cut connections to websites with prohibited contents by means of TCP resets,

and then temporarily block further connection attempts to the same IP address. Prohibited websites within China require official registration and may be shut down.

Web filtering is one tool in the broader monitoring and censorship program. It is believed that the Chinese government employs a large cyber police force of mostly university students to continually inspect websites such as blogs and discussion forums. They search for prohibited content, especially political dissidence; participate in online discussions to influence public opinion; identify and track down the authors of subversive content; and pressure the public to report on individuals engaging in prohibited online activities. A system of electronic access cards deployed in Internet cafes identifies users before they can go online. Perpetrators are subject to fines, job dismissal, or imprisonment.

Web censoring in China involves an extra twist due to its lack of transparency. In contrast, countries such as Saudi Arabia publish their Web censoring policy and rationale. A user who requests a prohibited website is presented with a blockpage explaining that the site is disallowed, as Figure 2 shows. Citizens can submit requests for URLs to be blocked or allowed.

Figure 2. State-sponsored Web censoring. In many countries—in this case, Saudi Arabia—a user who requests a government-prohibited website is presented with a blockpage explaining that the site is disallowed.

No one is quite certain of the extent of Web censoring in China. The level of censoring appears to vary with political events. Users may see connection errors but no blockpages with clear explanations. In addition, content filters based on sensitive keywords may block a webpage one day but allow access the next day if the page contents have changed. Consequently, Chinese users are uncertain if a website's inaccessibility is due to deliberate filtering or random congestion in the Internet. This unpredictability actually makes Web censoring more effective because people cannot learn how the filters are working and thus are unsure how to circumvent them.

FILTER CIRCUMVENTION

Given the Chinese government's effective covert Web censoring program, its mandate for the desktop Green Dam filter was puzzling. Although the filter would have been installed on every PC, it could easily be disabled.

Sufficiently motivated individuals can circumvent a Web filter—for example, they can bypass a URL blacklist by going directly to the server's IP address. In the case of China, Golden Shield does not attempt to read or censor any encrypted traffic such as that used in virtual private networks (VPNs). Virtually all foreign businesses in China depend on VPNs, and blocking them would unacceptably impact commerce.

Chinese citizens also use various proxy tools, including Garden Networks' GTunnel (www.gardennetworks.org), Psiphon (www.psiphon.ca), and UltraSurf (www.ultrareach.com). These products use a combination of proxies, encryption, and onion routing to offer anonymized and uncensored Web access.

If Web filtering is easy to circumvent, then what purpose does it serve? In the case of China, the apparent goal is to impose significant obstacles on the country's 350 million Internet users to discourage the majority from accessing certain foreign websites and thereby keep public attention on tightly regulated domestic sites. Only a small minority have the technical know-how to find ways around government censorship.

For those concerned with the prevalence of Web censoring, the Herdict Web project (www.herdict.org) collects reports of websites around the world that are inaccessible to many people and thus a possible sign of censorship. As Thomas Jefferson once wrote, "The price for freedom is eternal vigilance."

Thomas M. Chen is a professor of networking in the School of Engineering at Swansea University, Wales. Contact him at t.m.chen@swansea.ac.uk.

Victoria Wang is a postdoctoral researcher at the Centre for Criminal Justice and Criminology in the School of Human Sciences at Swansea University. Contact her at 257387@swansea.ac.uk.

Editor: Simon S.Y. Shim, Dept. of Computer Engineering, San Jose State Univ., San Jose, CA; simon.shim@sjsu.edu

