Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Save to My Library
Look up keyword
Like this
1Activity
0 of .
Results for:
No results containing your search query
P. 1
Performance Oriented Query Processing In GEO Based Location Search Engines

Performance Oriented Query Processing In GEO Based Location Search Engines

Ratings: (0)|Views: 71 |Likes:
Published by ijcsis
Geographic location search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called location search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.
Geographic location search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called location search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.

More info:

Published by: ijcsis on Jun 30, 2010
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

06/30/2010

pdf

text

original

 
(IJCSIS) International Journal of Computer Science and Information Security,Vol.
8
, No.1, 2010
PERFORMANCE ORIENTED QUERY PROCESSING IN GEOBASED LOCATION SEARCH ENGINES
 A
 BSTRACT 
 Geographic location search enginesallow users to constrain and order searchresults in an intuitive manner by focusing aquery on a particular geographic region.Geographic search technology, also called location search, has recently received significant interest from major search enginecompanies. Academic research in this area has focused primarily on techniques for extracting  geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic searchengines. Query processing is a major bottleneck in standard web search engines, and the mainreason for the thousands of machines used bythe major engines. Geographic search enginequery processing is different in that it requiresa combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them intoan existing web search query processor, and evaluate them on large sets of real data and query traces. Key word: location, search engine, query processing 
I.INTRODUCTION
 
The World-Wide Webhas reached a size where it is becomingincreasingly challenging to satisfy certaininformation needs. While search engines are stillable to index a reasonable subset of the (surface)web, the pages a user is really looking for areoften buried under hundreds of thousands of lessinteresting results. Thus, search engine users arein danger of drowning in information. Addingadditional terms to standard keyword searchesoften fails to narrow down results in the desireddirection. A natural approach is to add advancedfeatures that allow users to express other constraints or preferences in an intuitivemanner, resulting in the desired documents to bereturned among the first results. In fact, searchengines have added a variety of such features,often under a special “advanced search”
 
interface, but mostly limited to fairly simpleconditions on domain, link structure, or modification date. In this paper we focus ongeographic web search engines, which allowusers to constrain web queries to certaingeographic areas. In many cases, users areinterested in information with geographicconstraints, such as local businesses, locallyrelevant news items, or Permission to makedigital or hard copies of all or part of this work for personal or classroom use is granted withoutfee provided that copies are not made or distributed for profit or commercial advantage
1
Dr.M.UmamaheswariDept. of Computer Science, Bharath UniversityChennai-73, Tamil Nadu, Indiadruma_cs@yahoo.com
2
 
S.SivasubramanianDept. of Computer Science, Bharath UniversityChennai-73, Tamil Nadu, Indiasivamdu2001@Yahoo.Com
 
86http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
(IJCSIS) International Journal of Computer Science and Information Security,Vol.
8
, No.1, 2010
and that copies bear this notice and the fullcitation on the first page. To copy otherwise, tore publish, to post on servers or to redistribute tolists, requires prior specific tourism informationabout a particular region. For example, whensearching for yoga classes, local yoga schoolsare of much higher interest than the web sites of the world’s largest yoga schools. We expect that
 geographic search engine’s
, that is, searchengines that support geographic preferences, willhave a major impact on search technology andtheir business models. First, geographic searchengines provide a very useful tool. They allowusers to express in a single query what mighttake multiple queries with a standardsearch engine.
A
.
 
L
OCATION BASED
 
A user of a standard search engine looking for ayoga school in or close to Tambaram, Chennai,might have to try queries such as
yoga ‘‘Delhi’’
yoga “Chennai”
yoga‘‘Tambaram’’ (a part of Chennai)
 but this might yield inferior results as there aremany ways to refer to a particular area, and sincea purely text-based engine has no notion of geographical closeness (example, a result acrossthe bridge to Tambaram or nearby in Gundymight also be acceptable). Second, geographicsearch is a fundamental enabling technology for 
location-based services”
, including electroniccommerce via cellular phones and other mobiledevices. Third, geographic search supportslocally targeted web advertising, thus attractingadvertisement budgets of small businesses with alocal focus. Other opportunities arise frommining geographic properties of the web,example, for market research and competitiveintelligence. Given these opportunities, it comesas no surprise that over the last two years leadingsearch engine companies such as
Google
and
Yahoo
have made significant efforts to deploytheir own versions of geographic web search.There has also been some work by the academicresearch community, to mainly on the problemof extracting geographic knowledge from web pages and queries. Our approach here is based ona setup for geographic query processing that werecently introduced in [1] in the context of ageographic search engine prototype. While thereare many different ways to formalize the query processing problem in geographic searchengines, we believe that our approach results in avery general framework that can capture manyscenarios.
B
.
QUERY FOOTPRINT
We focus on the efficiency of query processingin geographic search engines, example, how to
87http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
(IJCSIS) International Journal of Computer Science and Information Security,Vol.
8
, No.1, 2010
maximize the query throughput for a given problem size and amount of hardware. Query processing is the major performance bottleneck in current standard web search engines, and themain reason behind the thousands of machinesused by larger commercial players. Addinggeographic constraints to search queries resultsin additional challenges during query executionwhich we now briefly outline. In a nutshell,given a user query consisting of severalkeywords, a standard search engine ranks the pages in its collection in terms of their relevanceto the keywords. This is done by using a textindex structure called an inverted index toretrieve the IDs of pages containing thekeywords, and then evaluating a term-basedranking function on these pages to determine the
highest-scoring pages. (Other factors such ashyperlink structure and user behavior are alsooften used, as discussed later). Query processingis highly optimized to exploit the properties of inverted index structures, stored in an optimizedcompressed format, fetched from disk usingefficient scan operations, and cached in mainmemory. In contrast, a query to a geographicsearch engine consists of keywords and thegeographic area that interests the user, called
query footprint”
.Each page in the search engine also has ageographic area of relevance associated with it,called the ‘
 geographic footprin’t 
of the page.This area of relevance can be obtained byanalyzing the collection in a preprocessing stepthat extracts geographic information, such as citynames, addresses, or references to landmarks,from the pages and then maps these to positionsusing external geographic databases. In other approaches it is assumed that this information is provided via meta tags or by third parties. Theresulting page footprint is an arbitrary, possiblynoncontiguous area, with an amplitude valuespecifying the degree of relevance of eachlocation. Footprints can be represented as polygons or bitmap-based structures; details of the representation are not important here. A geosearch engine computes and orders results basedon two factors.
C
.
KEYWORDS AND GEOGRAPHY
.
 
Given a query, it identifies pages that contain thekeywords and
 
whose page footprint intersectswith the query footprint, and ranks these resultsaccording to a combination of a term-basedranking function and a geographic rankingfunction that might, example, depend on thevolume of the intersection between page andquery footprint. Page footprints could of course be indexed via standard spatial indexes such as
∗ 
-trees, but how can such index structures beintegrated into a search engine query processor,which is optimized towards inverted indexstructures? How should the various structures belaid out on disk for maximal throughput, andhow should the data flow during query executionin such a mixed engine? Should we first executethe textual part of the query, or first the spatial part, or choose a different ordering for eachquery? These are the basic types of problems thatwe address in this paper. We first provide some background on web search engines andgeographic web search technology. We assumethat readers are somewhat familiar with basicspatial data structures and processing, but mayhave less background about search engines andtheir inner workings. Our own perspective is
88http://sites.google.com/site/ijcsis/ISSN 1947-5500

You're Reading a Free Preview

Download
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->