
Jaypee Institute of Information Technology University

Semantic Web
(Artificial Intelligence Project)
Submitted by:
Ayush Gupta (7503854)
Sajal Gupta (7503878)
Yatin Wadhawan (7503879)

Table of Contents
I. Introduction
II. Database
III. Database: Index
IV. Algorithm Design
V. Screenshots
VI. Bibliography

I. Introduction
The Semantic Web
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
-- Tim Berners-Lee, James Hendler and Ora Lassila, "The Semantic Web", Scientific American, May 2001

The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as an efficient way of representing data on the World Wide Web, or as a globally linked database. The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triple-based structures: i.e. many triples of URI data that can be held in databases, or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called "Resource Description Framework" syntaxes.

URI - Uniform Resource Identifier


A URI is simply a Web identifier: like the strings starting with "http:" or "ftp:" that you often find on the World Wide Web. Anyone can create a URI, and ownership of them is clearly delegated, so they form an ideal base technology on which to build a global Web. In fact, the World Wide Web is such a thing: anything that has a URI is considered to be "on the Web".

RDF - Resource Description Framework


A triple can simply be described as three URIs. A language which utilises three URIs in such a way is called RDF, and the W3C has developed an XML serialization of RDF. The benefit of drafting a language in RDF is that the information maps directly and unambiguously to a decentralized model for which there are many generic parsers already available. This means that when you have an RDF application, you know which bits of data are the semantics of the application and which bits are just syntactic fluff; and not only do you know that, everyone knows it, often implicitly without even reading a specification, because RDF is so well known. A further benefit is that RDF data is expected to become part of the Semantic Web, so drafting your data in RDF now draws parallels with drafting your information in HTML in the early days of the Web.
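To make this concrete, here is a minimal sketch of a single triple in the W3C's RDF/XML serialization. The example is illustrative only and not taken from this project: the resource URI (http://example.org/cars/swift) and the Dublin Core title property are assumptions chosen for demonstration.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Subject: the resource; predicate: dc:title; object: a literal value -->
  <rdf:Description rdf:about="http://example.org/cars/swift">
    <dc:title>Maruti Suzuki Swift</dc:title>
  </rdf:Description>
</rdf:RDF>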

II. Database
Table name: Category

$table2 = "create table category (
    cat_id varchar(20) PRIMARY KEY,
    category varchar(20)
)";

Table name: Sub_category

$table3 = "create table sub_category (
    cat_id varchar(20) PRIMARY KEY,
    subcategory varchar(20)
)";

Table name: Car_details

$table4 = "create table car_details (
    model varchar(20),
    make varchar(20),
    air_conditioner varchar(20),
    power_windows varchar(2),
    power_steering varchar(6),
    antilock_braking_system varchar(6),
    airbags varchar(6),
    leather_seats varchar(6),
    cd_player varchar(20),
    overall_length varchar(20),
    overall_width varchar(20),
    overall_height varchar(20),
    kerb_height varchar(20),
    mileage varchar(20),
    seating_capacity varchar(20),
    no_of_doors varchar(20),
    transmission_type varchar(20),
    gears varchar(20),
    minimum_turning_radius varchar(20),
    tyres varchar(20),
    PRIMARY KEY (model, make)
)";

III. Database: Index


Table name: Page

$table5 = "create table page (
    page_id varchar(20) PRIMARY KEY,
    page_url varchar(20)
)";

Table name: Word

$table6 = "create table word (
    word_id varchar(20) PRIMARY KEY,
    word_word varchar(20)
)";

Table name: Occurrence

$table7 = "create table occurrence (
    occurrence_id varchar(20) PRIMARY KEY,
    word_id varchar(20),
    page_id varchar(20)
)";
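Assuming one occurrence row is inserted per appearance of a word on a page, a query of roughly the following shape (an illustrative sketch, not taken from the report) would rank pages by the frequency of a searched word, as used in Step 2 of the crawling algorithm below. The keyword and limit are example values.

select p.page_url, count(*) as frequency
from occurrence o
join word w on w.word_id = o.word_id
join page p on p.page_id = o.page_id
where w.word_word = 'swift'    -- example search keyword
group by p.page_url
order by frequency desc
limit 10;                      -- number of links requested by the user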

IV. Algorithm Design


Crawling
Step 1: Retrieve the word and the number of links to be displayed.

Step 2: Fire a SQL query to fetch the links having the highest frequency of the word searched by the user (as sketched at the end of section III). If the query returns fewer than 5 links, go to Step 4.

Step 3: Based on the frequency, display the results fetched by the query.

Step 4: Divert control to the crawler page.

Crawler

Step 4.1: First we create the crawler class.

<?php
class Crawler
{
}
?>

Step 4.2: Then we create methods (getMarkup() and get()) to fetch a web page's markup and to parse it for the data we are looking to collect.

<?php
class Crawler
{
    protected $markup = '';

    public function __construct($uri) { }
    public function getMarkup($uri) { }
    public function get($type) { }
    protected function _get_images() { }
    protected function _get_links() { }
}
?>

Step 4.3: Fetching the site markup: the constructor takes the URL as its argument and fetches the markup, i.e. the source code of the page at that URL.

<?php
public function __construct($uri)
{
    $this->markup = $this->getMarkup($uri);
}

public function getMarkup($uri)
{
    return file_get_contents($uri);
}
?>

Step 4.4: Crawling the markup for data: the get() method dispatches to the protected data-collection methods, which search the fetched markup using string operations and the PCRE (Perl Compatible Regular Expressions) function preg_match_all() to return all acceptable tags within the markup, e.g. /<img([^>]+)\/>/i and /<a([^>]+)\>(.*?)\<\/a\>/i.

Step 4.5: All the links, after filtering and removing the undesired parts from the fetched URLs, are collected into an array of links (a sketch of this method follows below).

Step 4.6: Repeat the process for all the websites linked from these pages, i.e. go to Step 4.2.
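As an illustration of Steps 4.4 and 4.5, the protected _get_links() method could look roughly like the following. The exact filtering rules (dropping fragments, keeping absolute http links only) are assumptions, since the report does not spell them out.

<?php
protected function _get_links()
{
    $links = array();
    // Match all anchor tags in the fetched markup (the pattern from Step 4.4).
    if (preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $matches)) {
        foreach ($matches[1] as $attributes) {
            // Pull the href value out of the tag's attribute string.
            if (preg_match('/href=["\']([^"\']+)["\']/i', $attributes, $href)) {
                $url = $href[1];
                // Assumed filtering: drop fragments, keep absolute http links.
                $url = preg_replace('/#.*$/', '', $url);
                if (strpos($url, 'http') === 0) {
                    $links[] = $url;
                }
            }
        }
    }
    return $links;
}
?>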

Step 5: Populating the database

Step 5.1: The links are sent to the index function as an argument, one by one, until the end of the array.

Step 5.2: A temp.txt file is maintained which contains the markup of the sites; whitespace and the HTML tags are then removed from it.

Step 5.3: Each word is entered in the Word table and given a unique id (word_id). Similarly, each page is added to the database as a page_url with a unique id (page_id).

Step 5.4: Fetch the word_id from the Word table corresponding to the search keyword, then count the total occurrences of the word corresponding to its id in all the pages, and update them in the Occurrence table (see the sketch below).

Step 5.5: Sort the links and display them in decreasing order of their frequency from the Occurrence table.

Step 6: After displaying the results, go to Step 1 to let the user input the next keyword.
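The following is a minimal sketch of the indexing described in Steps 5.2 to 5.4, assuming the mysqli connection from section II and one occurrence row per word instance on a page (so the ranking query in section III can count frequencies). The function name index_page() and the truncated-md5 id scheme are hypothetical.

<?php
// Hypothetical index function covering Steps 5.2 to 5.4.
function index_page($conn, $page_id, $page_url)
{
    // Step 5.2: temp.txt holds the fetched markup; strip tags and whitespace.
    $text  = strtolower(strip_tags(file_get_contents("temp.txt")));
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Step 5.3: record the page itself.
    $stmt = $conn->prepare("insert into page (page_id, page_url) values (?, ?)");
    $stmt->bind_param("ss", $page_id, $page_url);
    $stmt->execute();

    // Steps 5.3 and 5.4: one occurrence row per word instance.
    $w = $conn->prepare("insert ignore into word (word_id, word_word) values (?, ?)");
    $o = $conn->prepare("insert into occurrence (occurrence_id, word_id, page_id) values (?, ?, ?)");
    foreach ($words as $i => $word) {
        $word_id = substr(md5($word), 0, 20);         // assumed id scheme
        $occ_id  = substr(md5($page_id . $i), 0, 20); // assumed id scheme
        $w->bind_param("ss", $word_id, $word);
        $w->execute();
        $o->bind_param("sss", $occ_id, $word_id, $page_id);
        $o->execute();
    }
}
?>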

Data Extraction Algorithm:

Step 1: Save the source code of the base site on the basis of the make and model given by the user.

Step 2: The source code obtained in Step 1 is then parsed and searched for the specifications of the model asked for by the user.

Step 3: The program fetches the desired information from that page into an array.

Step 4: The array is dumped into the database and displayed to the user.

Step 5: If the user searches again for the same model and make, the details are displayed from the database without updating it, thus saving processing time.
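A minimal sketch of this lookup-then-extract flow, assuming the car_details schema from section II and a mysqli connection $conn. The source URL pattern and the extraction expression are illustrative assumptions, since they depend on the markup of the source site.

<?php
// Hypothetical lookup-then-extract flow for Steps 1 to 5.
function get_car_details($conn, $make, $model)
{
    // Step 5: serve from the database if this make/model was fetched before.
    $stmt = $conn->prepare("select * from car_details where make = ? and model = ?");
    $stmt->bind_param("ss", $make, $model);
    $stmt->execute();
    $result = $stmt->get_result();
    if ($row = $result->fetch_assoc()) {
        return $row;  // cached specifications, no database update needed
    }

    // Steps 1 to 3: fetch the source page and extract specs into an array.
    // The URL pattern and regular expression below are assumptions.
    $markup  = file_get_contents("http://example.com/cars/$make/$model");
    $details = array("make" => $make, "model" => $model, "mileage" => "");
    if (preg_match('/Mileage[^0-9]*([0-9.]+)/i', $markup, $m)) {
        $details["mileage"] = $m[1];
    }
    // ... further preg_match() calls for the remaining columns ...

    // Step 4: dump the array into the database, then return it for display.
    $ins = $conn->prepare("insert into car_details (make, model, mileage) values (?, ?, ?)");
    $ins->bind_param("sss", $details["make"], $details["model"], $details["mileage"]);
    $ins->execute();
    return $details;
}
?>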

V. Screenshots
HOMEPAGE

EXTENDED SEARCH PAGE

SEARCH RESULT PAGE

VI. Bibliography
1) http://semanticweb.org/wiki/Main_Page
2) www.php-manual.net
3) www.cars.com
4) http://www.altova.com/semantic_web.html
