Scaffolding the Semantic Web 1

Scaffolding the Semantic Web

Aaron B. Helton

St. Edward’s University

MCIS 6309.01

February 10, 2008



Tim Berners-Lee described his vision of a Web in which information could be transferred not

just to humans but also to machines, ushering in an era where machines could combine data in

ways that only humans could do before. The end goal is the potential generation of new

knowledge, machine-interpretable intelligence, and the ability for machines to determine the

answer to specific questions. While his vision has yet to materialize significantly, many of the

enabling factors have begun to form and mature, and it is from these that the next steps, albeit

intermediary in nature, can be taken. Based on existing case studies and use cases available at the

World Wide Web Consortium (W3C) Web site (Herman & Stephens, 2007), such early

Semantic Web enabling factors to date have been in very specific domains, solving specific

problems. This is likely to continue, but it is only by building these foundations that a full

realization of the Semantic Web can be achieved. This paper demonstrates how such building

blocks can be created here and now, with an eye on a particular domain: the United States

tourism industry, especially as it applies to taxpayer-funded programs.


Introduction to the Semantic Web

According to Tim Berners-Lee, the creator of the World Wide Web (“Tim Berners-Lee,”

2008), the Semantic Web is a Web in which computers “become capable of analyzing all the data

on the Web…machines talking to machines” (“Semantic Web,” 2008). The repercussions of this

include the recombination of Web-enabled data and information in ways unimagined by the

original creators and extensible to the application of any new domain. In short, the Semantic

Web will allow both end users and machines to ask specific questions and get meaningful

answers, as opposed to being presented a simple list of documents with keyword matches.

The concept of the Semantic Web is not new. At the time of this writing, it is nearly ten

years old, and yet it has not fully materialized beyond an extensive set of framework and

specification documents; the implementations that exist are domain-specific and are used to

solve domain-specific problems. But lest it remain forever locked in architectural documents, its

elements must be applied to as many domains as possible, solving as many domain-specific

problems as possible. This will ultimately enable the full realization of the Semantic Web.

While the architecture and framework pieces have been fairly well documented,

application has only recently begun, and there is still much to be done. The next step is to create

specific use cases so that others may follow suit. An iterative approach seems most likely, as

adoption in some industries is evident, while the Semantic Web is largely absent in many others.

As new implementations arise, adoption will reach a critical mass, and early efforts should

provide ample payoff for the pioneers.

Getting There from Here: US Tax-Funded Tourism

The Semantic Web is not going to build itself. New implementations will foster other

new implementations, but those intermediary applications have to be created. What is needed,

then, is an approach to designing such applications that can be readily demonstrated, including

what components are necessary and what concerns may arise. The rest of this document takes a

look at just how this can be accomplished, with a particular use case in mind, that of US tourism.

Of all the domains that can benefit from the Semantic Web, the US tourism industry, especially

as represented by the taxpayer-funded state tourism campaigns, could begin reaping those

benefits today.

When approaching the Semantic Web, one must begin with a general question that

comprises the set of all more specific questions in the domain. Thus a general question regarding

tax-funded tourism might be along the lines of “What does [given state] have for me to do?” Or,

“What interesting things are located nearby that I can visit this weekend?” Answering either of

these questions, of course, depends heavily on what one likes to do or finds interesting; they are

too broad for this purpose, except to highlight that the broad categories of travel and tourism that

have already been developed (and which help to focus these questions) should be a useful

starting point. In the spirit of simplicity first, complexity later, these questions can be pared

down to something a little more specific. For instance, one might want to know, “What state

parks are within a 50-mile radius of my house, have hiking trails, and allow camping and

fishing?” While this might very well comprise all such state parks in that vicinity, it is not a

given, and so the question becomes the gateway into developing a richer set of semantics to

describe US state parks.
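
The pared-down question can be read as a filter over park records: a distance test plus a set of required activities. The following is a minimal Python sketch under stated assumptions, not a description of any existing system. The park names, coordinates, activities, and home location are invented placeholders, and the names Park, haversine_miles, and parks_near are hypothetical. Distance is approximated with the haversine great-circle formula.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Park:
    name: str
    lat: float  # degrees
    lon: float  # degrees
    activities: frozenset


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def parks_near(parks, home_lat, home_lon, radius_miles, required):
    """Return parks within radius_miles of home that offer every required activity."""
    required = frozenset(required)
    return [
        p for p in parks
        if required <= p.activities
        and haversine_miles(home_lat, home_lon, p.lat, p.lon) <= radius_miles
    ]


# Hypothetical records: names, coordinates, and activities are invented.
PARKS = [
    Park("Example Park A", 30.10, -97.60, frozenset({"hiking", "camping", "fishing"})),
    Park("Example Park B", 30.30, -97.90, frozenset({"hiking", "camping"})),
    Park("Example Park C", 32.80, -96.80, frozenset({"hiking", "camping", "fishing"})),
]

HOME = (30.25, -97.75)  # hypothetical home location

matches = parks_near(PARKS, *HOME, radius_miles=50,
                     required={"hiking", "camping", "fishing"})
print([p.name for p in matches])
```

In this sketch only the first placeholder park satisfies both the distance test and all three activities. The point is that each clause of the question maps to one machine-checkable condition, which is what a richer set of park semantics would need to support.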

Before any attempt is made to answer the meta-question, that is, the question whose

answer provides the answer to all such questions, some comparison must be made with the

existing information that can be gleaned from the Web. For this exercise, the state park system

of Texas can serve as an example, as the state is large and contains a good number of parks. A

number of sites exist to describe one or more dimensions of this query. Among them is the

Texas Parks and Wildlife site (“TPWD: Find a Park,” n.d.), which contains a list of all the Texas

state parks, including addresses and attractions (camping, hiking, etc.). With no way to see all of

the parks on a map, and no way to easily compare one’s own location to that of any set of parks,

the TPWD site is of limited use, but a good starting point. Other sites that do include such

features (“Texas Outside Guide,” n.d.) are marginally better, although the data is still not

accessible for meaningful use or integration by other machines. What is needed is a bridge,

something to cross the chasm between what the Web now provides and what it can provide. The

goal is to enable future developers to use available data, arranged semantically, to answer

questions the original developers did not think to ask.

Bridging the Chasm

A number of components need to be in place to organize state park data semantically. In

one form or another, the data exists already, so it does not need to be created again. It does need

to be collected and arranged so that further automatic processing is possible. In current terms,

that means that information about the various state parks needs to be put into the context of the

Semantic Web. The framework that exists for parsing and assigning metadata to information for

the Semantic Web is the Resource Description Framework (RDF). RDF “is a language for

representing information about resources in the World Wide Web” (Manola & Miller,

2004). Once this data is available in an RDF format, it can be retrieved with semantic query

languages like SPARQL, which is the query language specifically designed for RDF (for more

information on SPARQL, see Prud'Hommeaux & Seaborne, 2008). In fact, this will

serve as a good test of the project. Once the data has been collected, organized, and marked up,

an application that makes a standard set of queries can be developed against it and packaged.

This will represent one of the first fully Semantic applications dealing with a tourism topic, and

it can be extended to include more kinds of destinations and other kinds of metadata.
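
To make the idea concrete, RDF describes resources as subject-predicate-object statements, and a query combines patterns over those statements. The sketch below imitates that model in plain Python. It is neither RDF syntax nor SPARQL, and it uses no RDF library. The "ex:" identifiers, the property names, and the helper functions match and subjects_with are hypothetical, chosen only for illustration.

```python
# Each statement is a (subject, predicate, object) tuple.
# The "ex:" names are hypothetical placeholders, not a published vocabulary.
TRIPLES = {
    ("ex:ParkA", "rdf:type", "ex:StatePark"),
    ("ex:ParkA", "ex:locatedIn", "ex:Texas"),
    ("ex:ParkA", "ex:offers", "ex:Hiking"),
    ("ex:ParkA", "ex:offers", "ex:Camping"),
    ("ex:ParkB", "rdf:type", "ex:StatePark"),
    ("ex:ParkB", "ex:locatedIn", "ex:Texas"),
    ("ex:ParkB", "ex:offers", "ex:Hiking"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def subjects_with(triples, *conditions):
    """Return subjects that satisfy every (predicate, object) condition."""
    result = None
    for pred, obj in conditions:
        found = {s for s, _, _ in match(triples, p=pred, o=obj)}
        result = found if result is None else result & found
    return result or set()


# "Which state parks offer both hiking and camping?"
answer = subjects_with(
    TRIPLES,
    ("rdf:type", "ex:StatePark"),
    ("ex:offers", "ex:Hiking"),
    ("ex:offers", "ex:Camping"),
)
print(sorted(answer))
```

A real implementation would store the statements in an RDF serialization and express the conjunction of patterns as a SPARQL query. The sketch only shows why data arranged this way can answer questions its original publishers did not anticipate: a new question is a new combination of patterns over the same statements.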

Summary of the Benefits

Lest this project be regarded as a mere toy, consider the power that Semantic Web-enabled

information can have. Advertisers are continually looking to target their advertisements,

and a Semantic database of state parks and other destinations could make advertisement

integration trivial. When someone searches for Texas state parks within fifty miles of postal

code 78704 (Austin), with the additional criteria that they include camping and hiking, any

number of geo-coded support resources (retailers, outfitters, hotels, restaurants) could ensure a

relevant audience for their advertisements. Relevance in this case comes from being in the same

geographic location and appealing to the same set of interests entered by the person doing the

search. This is but one of many possible uses for such data, and it is highly likely that other

uses will emerge beyond the original scope and intent of this project.


Herman, I., & Stephens, S. (2007, December 4). Semantic Web Education and Outreach Interest

Group Case Studies and Use Cases. Retrieved February 13, 2008, from

Manola, F., & Miller, E. (2004, February 10). RDF Primer. Retrieved February 11, 2008, from

Prud'Hommeaux, E., & Seaborne, A. (2008, January 15). SPARQL Query Language for RDF.

Retrieved February 13, 2008, from

Semantic Web. (2008, February 11). Retrieved February 11, 2008, from

TPWD: Find a Park. (n.d.). Retrieved February 11, 2008, from

Texas Outside Guide. (n.d.). Retrieved February 11, 2008, from

Tim Berners-Lee. (2008, February 11). Retrieved February 11, 2008, from