Scaffolding the Semantic Web

Aaron B. Helton
St. Edward’s University
MCIS 6309.01
February 10, 2008

Abstract

Tim Berners-Lee described his vision of a Web in which information could be transferred not just to humans but also to machines, ushering in an era where machines could combine data in ways that only humans could before. The end goal is the potential generation of new knowledge, machine-interpretable intelligence, and the ability for machines to answer specific questions. While his vision has yet to materialize significantly, many of the enabling factors have begun to form and mature, and it is from these that the next steps, albeit intermediary in nature, can be taken. Based on the case studies and use cases available at the World Wide Web Consortium (W3C) Web site (Herman & Stephens, 2007), early Semantic Web efforts to date have been in very specific domains, solving specific problems. This is likely to continue, but it is only by building these foundations that a full realization of the Semantic Web can be achieved. This paper demonstrates how such building blocks can be created here and now, with an eye on a particular domain: the United States tourism industry, especially as it applies to taxpayer-funded programs.

Introduction to the Semantic Web

According to Tim Berners-Lee, the creator of the World Wide Web (“Tim Berners-Lee,” 2008), the Semantic Web is a Web in which computers “become capable of analyzing all the data on the Web…machines talking to machines” (“Semantic Web,” 2008). The repercussions of this include the recombination of Web-enabled data and information in ways unimagined by the original creators and extensible to any new domain. In short, the Semantic Web will allow both end users and machines to ask specific questions and get meaningful answers, as opposed to being presented a simple list of documents with keyword matches.

The concept of the Semantic Web is not new. At the time of this writing, it is nearly ten years old, and yet it has not fully materialized beyond an extensive set of framework and specification documents; implementations exist only in domain-specific forms, where they are used to solve domain-specific problems. But lest it remain forever locked in architectural documents, its elements must be applied to as many domains as possible, solving as many domain-specific problems as possible. This will ultimately enable the full realization of the Semantic Web.

While the architecture and framework pieces have been fairly well documented, application has only recently begun, and there is still much to be done. The next step is to create specific use cases so that others may follow suit. An iterative approach seems most likely, as adoption in some industries is evident, while the Semantic Web is largely absent in many others. As new implementations arise, adoption will reach a critical mass, and early efforts should provide ample payoff for the pioneers.

Getting There from Here: US Tax Funded Tourism

The Semantic Web is not going to build itself. New implementations will foster other new implementations, but those intermediary applications have to be created. What is needed, then, is an approach to designing such applications that can be readily demonstrated, including what components are necessary and what concerns may arise. The rest of this document takes a look at just how this can be accomplished, with a particular use case in mind, that of US tourism. Of all the domains that can benefit from the Semantic Web, the US tourism industry, especially as represented by the taxpayer-funded state tourism campaigns, could begin reaping those benefits today.

When approaching the Semantic Web, one must begin with a general question that comprises the set of all more specific questions in the domain. Thus a general question regarding tax-funded tourism might be along the lines of “What does [given state] have for me to do?” or “What interesting things are located nearby that I can visit this weekend?” Answering either of these questions, of course, depends heavily on what one likes to do or finds interesting; they are too broad for this purpose, except to highlight that the broad categories of travel and tourism that have already been developed (and which help to focus these questions) should be a useful starting point.

In the spirit of simplicity first, complexity later, these questions can be pared down to something a little more specific. For instance, one might want to know, “What state parks are within a 50-mile radius of my house, have hiking trails, and allow camping and fishing?” While this might very well comprise all such state parks in that vicinity, it is not a given, and so the question becomes the gateway into developing a richer set of semantics to describe US state parks.

Before any attempt is made to answer the meta-question, that is, the question whose answer provides the answer to all such questions, some comparison must be made with the existing information that can be gleaned from the Web. For this exercise, the state park system of Texas can serve as an example, as the state is large and contains a good number of parks. A number of sites exist to describe one or more dimensions of this query. Among them is the Texas Parks and Wildlife site (“TPWD: Find a Park,” n.d.), which contains a list of all the Texas state parks, including addresses and attractions (camping, hiking, etc.). With no way to see all of the parks on a map, and no way to easily compare one’s own location to that of any set of parks, the TPWD site is of limited use, but a good starting point. Other sites that do include such features (“Texas Outside Guide,” n.d.) are marginally better, although the data is still not accessible for meaningful use or integration by other machines.

What is needed is a bridge, something to cross the chasm between what the Web now provides and what it can provide. The goal is to enable future developers to use available data, arranged semantically, to answer questions the original developers did not think to ask.

Bridging the Chasm

A number of components need to be in place to organize state park data semantically. In one form or another, the data exists already, so it does not need to be created again. It does need to be collected and arranged so that further automatic processing is possible. In current terms, that means that information about the various state parks needs to be put into the context of the Semantic Web. The framework that exists for assigning metadata to information on the Semantic Web is the Resource Description Framework (RDF). RDF “is a language for representing information about resources in the World Wide Web” (Manola & Miller, 2004). Once this data is available in an RDF format, it can be retrieved with semantic query languages like SPARQL, which is the query language specifically designed for RDF (for more information on SPARQL, see Prud'Hommeaux & Seaborne, 2008). In fact, this will serve as a good test of the project. Once the data has been collected, organized, and marked up, an application that makes a standard set of queries can be developed against it and packaged.
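
As a minimal sketch of this idea, park descriptions can be modeled as RDF-style triples (subject, predicate, object) and queried by pattern. The example uses plain Python tuples rather than a real RDF store. The ex: vocabulary (ex:StatePark, ex:offers), the park identifiers, and the helper names match and parks_offering are all hypothetical and chosen only for illustration; the data is placeholder data, not a description of any real park. The comment inside the block shows a SPARQL query of roughly the same shape under the same assumed vocabulary.

```python
# Hypothetical vocabulary and placeholder data; not taken from any real dataset.
RDF_TYPE = "rdf:type"

TRIPLES = {
    ("ex:ParkA", RDF_TYPE, "ex:StatePark"),
    ("ex:ParkA", "ex:offers", "ex:Camping"),
    ("ex:ParkA", "ex:offers", "ex:Hiking"),
    ("ex:ParkA", "ex:offers", "ex:Fishing"),
    ("ex:ParkB", RDF_TYPE, "ex:StatePark"),
    ("ex:ParkB", "ex:offers", "ex:Hiking"),
    ("ex:MuseumC", RDF_TYPE, "ex:Museum"),
    ("ex:MuseumC", "ex:offers", "ex:Hiking"),
}

# Roughly equivalent SPARQL, assuming the same hypothetical vocabulary:
#   PREFIX ex: <http://example.org/parks#>
#   SELECT ?park WHERE {
#     ?park a ex:StatePark ;
#           ex:offers ex:Camping , ex:Hiking .
#   }


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }


def parks_offering(triples, *activities):
    """Find subjects typed ex:StatePark that offer every requested activity."""
    parks = {s for (s, _, _) in match(triples, predicate=RDF_TYPE, obj="ex:StatePark")}
    for activity in activities:
        offering = {s for (s, _, _) in match(triples, predicate="ex:offers", obj=activity)}
        parks &= offering  # keep only parks that also offer this activity
    return sorted(parks)


if __name__ == "__main__":
    print(parks_offering(TRIPLES, "ex:Camping", "ex:Hiking"))
```

With the placeholder data, only ex:ParkA is both typed as a state park and linked to camping and hiking, so it is the only match. The point of the triple form is that a later developer can ask a different question of the same data, such as which resources of any type offer hiking, without the data being restructured.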

Such an application would represent one of the first fully Semantic applications dealing with a tourism topic, and it can be extended to include more kinds of destinations and other kinds of metadata.

Summary of the Benefits

Lest this project be regarded as a mere toy, consider the power that Semantic Web-enabled information can have. Advertisers are continually looking to target their advertisements, and a Semantic database of state parks and other destinations could make advertisement integration trivial. When someone searches for Texas state parks within fifty miles of postal code 78704 (Austin), with the additional criteria that they include camping and hiking, any number of geo-coded support resources (retailers, outfitters, hotels, restaurants) could ensure a relevant audience for their advertisements. Relevance in this case comes from being in the same geographic location and appealing to the same set of interests entered by the person doing the search. This is but one of many possible uses for such data, and it is very likely that other uses will emerge beyond the original scope and intent of this project.
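
The search described above combines a distance filter with an interest filter, and the same two filters can select relevant advertisers. The following is a rough sketch, assuming every destination and business has already been geocoded. The Place record, the function names, and all names and coordinates in the sample data are invented placeholders; the origin point only approximates the Austin area rather than being a geocoded 78704; and great-circle (haversine) distance stands in for whatever distance measure a real application would use.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    tags: frozenset  # activities offered (parks) or interests served (advertisers)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within(places, lat, lon, radius_miles, required=frozenset()):
    """Places inside the radius whose tags include every required tag."""
    return [
        p for p in places
        if required <= p.tags
        and haversine_miles(lat, lon, p.lat, p.lon) <= radius_miles
    ]


def relevant_advertisers(advertisers, lat, lon, radius_miles, interests):
    """Advertisers inside the radius that share at least one searched interest."""
    return [a for a in within(advertisers, lat, lon, radius_miles) if a.tags & interests]


# Placeholder data: an approximate origin near Austin and invented places.
ORIGIN = (30.25, -97.75)
INTERESTS = frozenset({"camping", "hiking"})
PARKS = [
    Place("Park A", 30.60, -97.75, frozenset({"camping", "hiking", "fishing"})),
    Place("Park B", 31.50, -97.75, frozenset({"camping", "hiking"})),
    Place("Park C", 30.30, -98.00, frozenset({"camping"})),
]
ADVERTISERS = [
    Place("Outfitter X", 30.30, -97.70, frozenset({"hiking", "camping"})),
    Place("Bait Shop Y", 30.20, -97.80, frozenset({"fishing"})),
    Place("Outfitter Z", 32.80, -96.80, frozenset({"camping"})),
]

if __name__ == "__main__":
    print([p.name for p in within(PARKS, *ORIGIN, 50, INTERESTS)])
    print([a.name for a in relevant_advertisers(ADVERTISERS, *ORIGIN, 50, INTERESTS)])
```

In this sketch a park must offer every requested activity, while an advertiser needs only one interest in common with the search. That asymmetry is a design assumption made for illustration, not something the project prescribes.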

References

Herman, I., & Stephens, S. (2007, December 4). Semantic Web Education and Outreach Interest Group Case Studies and Use Cases. Retrieved February 13, 2008, from http://www.w3.org/2001/sw/sweo/public/UseCases/.

Manola, F., & Miller, E. (2004, February 10). RDF Primer. Retrieved February 11, 2008, from http://www.w3.org/TR/rdf-primer/.

Prud'Hommeaux, E., & Seaborne, A. (2008, January 15). SPARQL Query Language for RDF. Retrieved February 13, 2008, from http://www.w3.org/TR/rdf-sparql-query/.

Semantic Web. (2008, February 11). Retrieved February 11, 2008, from http://en.wikipedia.org/wiki/Semantic_Web.

TPWD: Find a Park. (n.d.). Retrieved February 11, 2008, from http://www.tpwd.state.tx.us/spdest/findadest/.

Texas Outside Guide. (n.d.). Retrieved February 11, 2008, from http://www.texasoutside.com.

Tim Berners-Lee. (2008, February 11). Retrieved February 11, 2008, from http://en.wikipedia.org/wiki/Tim_berners-lee.
