Vinod Gupta School of Management, IIT Kharagpur

MIS Term Paper: The Future of Web Search

Submitted To: Dr. Prithwis Mukerjee

Submitted By: Amod Kumar Gupta 10BM60007

Abstract

The internet was made available for public use in the mid-1990s. Since then it has changed our lives in a way few other things have. The internet holds nearly 487 billion gigabytes (GB) of data. A search engine helps us find what we want in this endless sea of data, and it is up to the search engine to prevent us from getting lost. Search engines are therefore becoming increasingly important in the internet world. This paper covers current search engine technologies, the problems with them, and the improvements needed to build better web search engines.

Introduction

The number of internet users was around 1.97 billion as of 30 June 2010. The internet is incorporated in virtually every aspect of modern human life. It consists of a vast range of information resources and services, and buried within them lies the information of interest. The trick is to find it. This is where search engines play a critical role. A web search engine is designed to search for information and generate a list of results. The results may consist of web pages, images, video and other types of files.

Archie was one of the first search engines. It was created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch at McGill University in Montreal. Archie worked very simply compared to current search engines: it downloaded the directory listings of all the files located on public sites and created a searchable database of file names. WebCrawler, which came out in 1994, allowed users to search for any word in any web page, which has now become the standard. Lycos was launched in 1994 and became a major success. Yahoo! allowed search on its web directory only, rather than on all web pages like other search engines. Users could also browse the directory instead of doing a keyword-based search.

Search engines attracted a lot of investment in the internet investing frenzy of the late 1990s. Several companies received record gains during their initial public offerings. Some search engines, such as Northern Light, have enterprise-only editions. Many search engine companies were also caught up in the dot-com bubble, which ended in their demise. Around 2000, the Google search engine rose to prominence. The main difference between Google and other search engines was that Google focused on search: it did not sacrifice the quality of web search just to make quick money through advertising. Its PageRank algorithm ranks web pages based on the number of pages that link to them, on the premise that good or desirable pages are linked to more than others.

Working off comScore figures from December 2009 for worldwide search queries, we have:
• Google: 88 billion per month
• Twitter: 19 billion per month
• Yahoo: 9.4 billion per month
• Bing: 4.1 billion per month

[Figure: Search queries (billion per month) for Google, Twitter, Yahoo and Bing]

How web search engines work

A search engine operates in the following order:
• Web crawling
• Indexing
• Searching

First, the web search engine retrieves web pages with the help of software called a web crawler, which follows every link on a site. The contents of each page are then processed and indexed. The index consists of data about web pages and is stored in an index database; it allows information to be retrieved as quickly as possible. A query can be a single word or a group of words. When a user enters a query into a search engine, the engine first looks in its index and provides a listing of the best-matching web pages according to its criteria. Most search engines support the boolean operators AND, OR and NOT to further specify the search query. The engine looks for the words or phrases exactly as entered.

The usefulness of a search engine depends on the relevance of the results it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant than others. Most search engines employ methods to rank the results so as to provide the "best" results first. The decision of which pages are the best matches, and in what order the results should be shown, varies widely from one engine to another.
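The three steps above (crawling, indexing, searching) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory PAGES dictionary stands in for the web, the `.example` URLs are hypothetical, and the boolean query interface (`all_of`, `any_of`, `none_of`) is only one simple way to express AND, OR and NOT. Real engines are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("web search engines index pages", ["b.example", "c.example"]),
    "b.example": ("a crawler follows every link on a site", ["c.example"]),
    "c.example": ("an index makes search fast", ["a.example"]),
    "d.example": ("this page is not linked from anywhere", []),
}

def crawl(seed):
    """Visit every page reachable from the seed by following links."""
    seen, frontier = [], [seed]
    while frontier:
        url = frontier.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.append(url)
        frontier.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Inverted index: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in re.findall(r"[a-z]+", PAGES[url][0].lower()):
            index[word].add(url)
    return index

def search(index, all_of=(), any_of=(), none_of=()):
    """Boolean query: AND over all_of, OR over any_of, NOT over none_of."""
    result = set().union(*index.values())
    for word in all_of:
        result &= index.get(word, set())
    if any_of:
        result &= set().union(*(index.get(w, set()) for w in any_of))
    for word in none_of:
        result -= index.get(word, set())
    return sorted(result)

crawled = crawl("a.example")
index = build_index(crawled)
print(search(index, all_of=["search", "index"]))
print(search(index, all_of=["search"], none_of=["fast"]))
```

Note that the page `d.example` is never indexed because no crawled page links to it, which is why a crawler that only follows links can miss parts of the web.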

Need of the hour

The main problem with current search engines is the quality of the search results. When we enter a search query, what we get is a million guesses rather than one correct answer. As search engine technology evolves, it becomes possible to improve the search results, making them more relevant and useful for the user. Search will need to evolve in many ways in order to meet user needs easily, including the challenges of mobility, modes, media, personalization, location, socialization, and language. It is very exciting to consider what search can achieve in the future. Let us look at the various dimensions along which current search engines can improve:

Personalization
Search engines of the future will be able to understand more about the user. How much personal information is disclosed will be at the sole discretion of the user, and disclosure poses no great threat as long as the privacy of that information is maintained. The user information will allow the search engines to give better search results. Knowledge of the user's location, and of what the user already knows or has learned earlier, can help the engine fully understand the user's preferences. Access to the user’s emails and chat data can also be used to understand the user and his context.

Location
User location is a very useful piece of information. Location is relevant to a lot of searches; user location will help the engine understand the context of the search query better, increasing the relevance and ease of search.

Social
The social circle of the user, consisting of his online friends and contacts, will help the search engine discover relevant content. Content from the user's friends and social contacts is likely to be more relevant to him than content from strangers. Analysis of the user’s social graph can be used to further refine a query or disambiguate it.

Language
The information on the internet is in many languages. There are cases where an answer exists, but not in a language we know. Translation can be used to solve this problem: the web search engine will search for the information, translate it and bring it back in the language that we want.

Media
Instead of offering only text search, search must include images, videos, news, books, and maps/local information in the search results. Pictures, video and audio can be searched based on their actual content as analysed by the search engine, rather than on their labels, as is the case now. The best media could be chosen according to the query and the corresponding results displayed to the user.

The Technologies being Developed
Which technologies have the potential to revolutionize the way our search engines work? At the current rate of progress, the search results we receive today will look archaic within a few years.

Artificial Intelligence
Artificial intelligence (AI) is one of the most active areas in the computer world at present, particularly because of the increased processing speeds and large storage capacity available today. Artificial intelligence can be used to attain semantic search by extracting specific facts, drawing inferences and organizing those facts based on a few key words. Another application of AI could be natural language processing, allowing the search engine to read text and understand its meaning. Other AI techniques such as natural language synthesis, object recognition and statistical machine learning will also change the way we search.

Watson:

Watson is a supercomputer currently being developed by IBM. It is expected to be the world’s most advanced “question answering” machine, able to understand a question posed in natural language and respond with a precise, factual answer. It would allow machines to converse more naturally with people, letting them ask questions instead of typing keywords. IBM plans to sell versions of Watson to companies in the next year or two. It is intended to help make decisions quickly, based on analysis of all the available data and with a much lower likelihood of error. It is expected to answer questions faster and more accurately than most human beings.

Image search
Image search includes object recognition in images, such as face detection and product detection. Image search engines today rely on keywords, or on text linked to an image, to perform a web search. This can be unreliable if the images lack sufficient descriptions. A new search engine, Riya, looks inside the image to extract information about it using artificial intelligence. Each image is represented by 6,000 numbers, and the search engine uses AI to match one visual signature to another.
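Matching one visual signature to another can be pictured as comparing vectors of numbers. The sketch below is only an illustration under stated assumptions: it uses cosine similarity as the comparison measure (the source does not say which measure Riya uses), four-number signatures instead of 6,000, and hypothetical file names and values.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two equal-length signatures (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, signatures):
    """Return the name of the stored image whose signature best matches the query."""
    return max(signatures, key=lambda name: cosine_similarity(query, signatures[name]))

# Hypothetical signatures; a real system would use thousands of numbers per image.
SIGNATURES = {
    "beach.jpg": [0.9, 0.1, 0.0, 0.2],
    "forest.jpg": [0.1, 0.9, 0.3, 0.0],
    "city.jpg": [0.0, 0.2, 0.9, 0.8],
}

query = [0.8, 0.2, 0.1, 0.1]
best = most_similar(query, SIGNATURES)
print(best)
```

The point of the sketch is that no keyword or label is involved: the match depends only on the numbers derived from the image content.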

Voice search
Voice search will allow us to talk directly to a search engine and ask our queries aloud. The search engine will then process the query and give back the result. One example of a voice search engine is TalkTalk. TalkTalk gives more accurate search results by interacting with the user to understand the context and remove any ambiguity. It also evaluates and stores all the user's replies and discussions in order to give even more precise answers.

Metasearch engine
The concept of a metasearch engine is completely different from that of a conventional search engine. A metasearch engine is a search tool that sends the user's query to several other search engines and then aggregates the results into a single list. The main thinking behind this concept is that the internet is too large for any one search engine to index it all, and that better search results can be obtained by combining the results from several search engines.
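The aggregation step can be sketched as follows. The source does not say how a metasearch engine combines the lists, so this example assumes one common rank-combination scheme, reciprocal rank fusion, in which each engine contributes 1/(k + rank) to a page's score. The engine result lists, the `.example` URLs and the constant k = 60 are all illustrative assumptions.

```python
from collections import defaultdict

def merge_results(result_lists, k=60):
    """Combine ranked lists: each list adds 1 / (k + rank) to a URL's score."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results returned by three different engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "x.example", "w.example"]
engine_c = ["y.example", "z.example", "x.example"]

merged = merge_results([engine_a, engine_b, engine_c])
print(merged)
```

A page found by only one engine (here `w.example`) still appears in the merged list, which reflects the idea that no single engine indexes everything.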

Some of the current innovative search engines

Grokker is a search engine that offers a better interface, grouping search results graphically and improving the way they are displayed. Eurekster is a search engine that uses social networking elements to provide results that can be filtered based on what members of your social network are searching for. Some of the other prominent ones are:

Viewzi
Viewzi provides users with various visual options for viewing their search results. This allows the search results to be seen by category, which can help the user find the information faster.

SearchMe
SearchMe offers an advanced and intuitive interface. The results are displayed as a gallery of images that allows the user to see the result pages without having to click through. It also gives users the option to create stacks, or bundles of web pages saved for later.

Custom Search Engines
An example of a custom search engine is Rollyo. Rollyo allows the user to create his own custom search engine. Users can specify the sites on which they want the search engine to run their query. One particular use could be to search within one's bookmark list. Custom search engines can also be shared with others, i.e. they can be private or public. They are unique and valuable search engines of the future, and can be used to filter websites according to our needs and interests.

All of these are very interesting and innovative. But the future of search lies not in the hands of these small companies but with large companies such as Google and Microsoft, simply because they have better resources and can easily surpass other search engines. It will be very difficult for the small companies to make a big impact.

Economic aspect

The future of search looks very exciting. Search engine technology is still nowhere close to where it can be. Many major changes are still possible that will completely change the face of web search. Its growth depends upon how much information and privacy the average user is willing to give up. There are more than 8 million distinct websites and billions of individual Web pages, so finding the required information is becoming increasingly challenging. Providers of information and services know that their website is a key component of their business and that, in a crowded information marketplace, it is important that searchers are able to find it using search engines. Search engine advertising has become a very strong business, and prospects for continued growth are strong.

“Web search now represents a significant portion of Web activity. Google searches average 250 million searches per day, and the total daily number of Web searches is estimated at well over 600 million. At least a portion of searching is for products or services that the searcher will eventually purchase. Research has shown that higher-income users spend more time on the Internet and buy more online. This marketplace of high-income earners is intensely attractive to marketers and much harder to isolate in traditional media such as TV or magazines.” (Rita Vine, http://findarticles.com/p/articles/mi_m0FWE/is_2_8/ai_114010257/)

Brand advertising works on the Web. Initially the Web was used as an alternative advertising medium by only a few early adopters who placed ads on search engine pages. Now many advertisers, including small businesses, use search engine pages for advertising. Commercial search engines are thus in the advertising business. They earn their revenue mainly by delivering relevant advertising using a variety of means, but principally by selling search keywords to purchasers. However, Google remains the only search engine which keeps paid results out of its main listings.

Commercial search engines require traffic and relevance to ensure ad placement success. Traffic represents the number of Web users who visit a search site. Traffic must be high in order to maximize the probability that some of it will convert into revenue-generating activity. Relevance represents the capacity of the search engine to deliver meaningful results that satisfy the user's keyword query. Relevance algorithms are used to determine it. It is important that the ad is relevant to the search query; otherwise even heavy traffic may not lead to a single revenue-generating activity. Relevance algorithms vary across search engines and are regularly tweaked in order to improve the user experience. When search engines deliver ads to search results pages, advertisers pay fees to the search engine for every ad impression that is delivered.

The search engine advertising process starts with keyword buying. The advertiser purchases or leases keywords that he believes searchers will use when searching for specific products or services. This enables the ad buyer to display a URL link when the searcher enters one or more of the leased keywords into the search engine. Contracts may be based on a time period, or they may stipulate the number of impressions that will be delivered. After the keyword has been purchased there are two options. In paid inclusion programs, search engines and their ad-feed partners guarantee that the search engine will list pages from the advertiser's website in its index, but they do not guarantee a high rank. In paid placement programs, a link to the advertiser's URL will be delivered in the search results on a matched keyword or keywords, and the rank of the link can also be bought. The better the rank, the higher the price. The location of the delivered link generally governs the fees, so advertisers will pay more to be placed higher up the page in the search results.
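The arithmetic behind paid placement and impression-based contracts can be sketched as follows. This is a simplified illustration, not a description of any real engine's pricing: the advertiser names and bid amounts are hypothetical, ordering purely by bid is an assumption, and so is quoting fees per thousand impressions.

```python
def rank_paid_links(bids):
    """Order advertisers for one keyword: the higher the bid, the higher the position."""
    return sorted(bids, key=lambda advertiser: (-bids[advertiser], advertiser))

def impression_fee(rate_per_thousand, impressions):
    """Fee for a contract priced per thousand delivered ad impressions."""
    return rate_per_thousand * impressions / 1000

# Hypothetical bids (rate per thousand impressions) for one leased keyword.
bids = {
    "shop-a.example": 4.0,
    "shop-b.example": 6.5,
    "shop-c.example": 2.0,
}

placement = rank_paid_links(bids)
fee = impression_fee(bids["shop-b.example"], 20000)
print(placement, fee)
```

Under these assumptions the advertiser who pays the most is listed first, and its total fee grows in proportion to the number of impressions delivered.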

How Paid Listings Affect Search Results
All the major search engines have, to a greater or lesser extent, embedded paid listings in their main search results page, with the exception of Google, which separates the ad links completely from its main search results. Embedding paid listings generally degrades the search results returned by the search engine. Ad link results are generally separated from algorithmically generated results and are accompanied by headers such as "Partner Sites" or "Sponsored Links." The more commercial the search keywords, the more likely the search is to produce paid listings.

Just as in traditional advertising, persistent viewing of paid listings inevitably creates greater awareness of those paid listings and their brands. With greater awareness comes the likelihood that those who create Web pages will link to those paid listings simply because they have seen them many times and can remember them. All search engines have the cumulative effect of preferring what is popular. Thus the reach of paid placement extends even to pure search tools like Google that rely on a link analysis algorithm for ranking. Moreover, as a larger number of popular sites climb higher in search results, many excellent informational resources fall even further down the list of search results and entirely off the searcher's radar.

On the ethical front of paid listings in search results, Google has brilliantly established itself as a trusted search tool. It plays both the relevance and the monetization sides of Web search in an inspired way. It draws users to its search tool through finely tuned relevance and the promise of pure search results, yet it is one of the largest ad agencies on the Web.

Future Trends:
The web search engines pay a large percentage of their revenues to other sites that use [...]. However, this makes sense only if multiple search engines provide equivalent search quality, so that productivity remains the same no matter which search engine is used. This seems to be a realistic assumption, since search engines can no longer afford to ignore search quality.

The Current Players

Google Search
Anyone who knows about the internet knows about Google. Today Google is the most popular search engine in the world. Google search was originally developed by Larry Page and Sergey Brin in 1997. For a search engine, the Web consists of a body of words on billions of pages and the hyperlinks that connect those pages. Google was able to link those words efficiently, measuring relevance by the appearance of words on a page and by the number of hyperlinks pointing to that page. Google Web Search is a web search engine owned by Google Inc. Google receives several hundred million queries each day through its various services.

Google's success was largely due to the PageRank algorithm, which helps rank web pages that match a given search string. Previous keyword-based methods, used by other search engines, would rank pages by how often the search terms occurred in the page, or by how strongly associated the search terms were within each resulting page. Google search provides at least 22 special features beyond the original word-search capability. These include synonyms, weather forecasts, time zones, stock quotes, maps, earthquake data, movie showtimes, airports, home listings, and sports scores. There are special features for numbers, including ranges, prices, temperatures, money/unit conversions, calculations, package tracking, patents, area codes, and language translation of displayed pages.
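The link-based idea behind PageRank can be sketched with a small power iteration. This is a simplified illustration of the published PageRank idea, not Google's production ranking: the four-page link graph is hypothetical, and the damping factor of 0.85 and the fixed iteration count are conventional choices assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively pass each page's rank along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}

ranks = pagerank(LINKS)
top = max(ranks, key=ranks.get)
print(top)
```

In this graph `c.example` ranks highest because three pages link to it, and `a.example` outranks the unlinked pages because it is linked from the highly ranked `c.example`. A keyword-frequency method would ignore these links entirely.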

Bing search engine
Bing is a web search engine from Microsoft. Bing was unveiled by Microsoft CEO Steve Ballmer on May 28, 2009 in San Diego. As of October 2010, Bing is the 4th largest search engine by query volume, at 3.25%, after its competitors Google at 83.34%, Yahoo at 6.32% and Baidu at 4.96%, according to Net Applications. Bing has innovative features such as image search with infinite scrolling, a myriad of filtering options, video search preview, ClearFlow (a mapping feature that offers alternative routes when there is heavy traffic), very comprehensive local search, and instant answers.

Ask.com
Ask is a search engine which was founded in 1996 by Garrett Gruener and David Warthen in Berkeley, California. Three venture capital firms, Highland Capital Partners, Institutional Venture Partners, and The RODA Group, were early investors. Ask.com is currently owned by InterActiveCorp, which trades on NASDAQ under the symbol IACI. Ask.com offers many innovative tools to help the user get the information he needs quickly and easily. The features include: Advanced Web Search, Basic Site Preferences, Local Search, Conversions, Dictionary Search, Famous People Search, Maps & Directions, News Search, Image Search, Popular Searches, Shopping Search, Smart Answer, Stock Search, Weather Search, White Pages Search, Zoom Related Search (narrow or broaden your search with possible alternative search terms, which appear on the right-hand side of the Ask results page) and Related Names (a list of names that are conceptually tied to topic options within the "Narrow Your Search" and "Expand Your Search" lists).

Challenges

A very big challenge in building AI into a search engine is that it can be impractical on a large scale. The computational power needed to calculate the required results efficiently can be enormously expensive. People are also not ready to give their personal information to search engines. If search engine users gave up a little of their privacy and allowed their search habits to be monitored, search engines could provide better, customized results. Finally, given the amount of information already on the internet and the rate at which it is increasing, it is a challenge for search engines to scale up to that level and provide relevant and meaningful search results to users.

Conclusion
Enter the desired search words into any of today's search engines, and the user often ends up hoping that it displays the type of results he is looking for. It is more like an "enter your query and hope for the best" experience. An ideal search engine should be like a friend with instant access to all the world’s facts and a photographic memory of everything the user has seen and knows. That search engine could then give answers based on the user's preferences, the user's existing knowledge and the best available information. The search engine could ask for clarification and present the answers in the media that work best. If there is a search engine where the user can just ask questions and get answers in a much richer way, it will quickly become the dominant search engine.

Within a few years, there will be next-generation search engines that can extract specific facts, draw inferences and organize those facts based on a few key words. The big change that will happen in society is that instead of changing human expressions and interactions into what is easy for the computer, we will improve computers' abilities to handle the expressions that are natural for human beings.

References

• “The Google Story” by David A. Vise
• “The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture” by John Battelle
• http://www.internetworldstats.com/stats.htm
• http://en.wikipedia.org/wiki/Web_search_engine
• http://searchengineland.com/comscore-us-most-searches-china-slowest-34217
• http://googleblog.blogspot.com/2008/09/future-of-search.html
• http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?_r=1
• http://en.wikipedia.org/wiki/Google
• http://findarticles.com/p/articles/mi_m0FWE/is_2_8/ai_114010257/
