
Web Search & Advertising

V.Deepthi 1210A048

Search Engine Early History

By the late 1980s, many files were available by anonymous FTP. In 1990, Alan Emtage of McGill Univ. developed Archie (short for "archives").

Assembled lists of files available on many FTP servers. Allowed regex search of these file names.

In 1993, Veronica and Jughead were developed to search names of text files available through Gopher servers.

Web Search History

In 1993, early web robots (spiders) were built to collect URLs:

Wanderer
ALIWEB (Archie-Like Index of the WEB)
WWW Worm (indexed URLs and titles for regex search)

In 1994, Stanford grad students David Filo and Jerry Yang started manually collecting popular web sites into a topical hierarchy called Yahoo.

In early 1994, Brian Pinkerton developed WebCrawler as a class project at U Wash.

In late 1995, DEC developed Altavista. Used a large farm of Alpha machines to quickly process large numbers of queries. Supported boolean operators, phrases, and reverse pointer queries.

Web Search Recent History

In 1998, Larry Page and Sergey Brin, Ph.D. students at Stanford, started Google. Main advance is use of link analysis to rank results partially based on authority.

Web Challenges

Distributed Data: Documents spread over millions of different web servers.
Volatile Data: Many documents change or disappear rapidly (e.g. dead links).
Large Volume: Billions of separate documents.
Unstructured and Redundant Data: No uniform structure, HTML errors, up to 30% (near) duplicate documents.
Quality of Data: No editorial control, false information, poor quality writing, typos, etc.
Heterogeneous Data: Multiple media types (images, video, VRML), languages, character sets, etc.

Growth of Web Pages Indexed

[Chart: billions of pages indexed by Google, Inktomi, AllTheWeb, Teoma, and Altavista]

Link to Note from Jan 2004

Assuming 20KB per page, 1 billion pages is about 20 terabytes of data.
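
The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); the variable names are illustrative.

```python
# Back-of-the-envelope corpus size, using the figures from the slide.
PAGE_SIZE_BYTES = 20 * 1000      # 20 KB per page (decimal kilobytes, an assumption)
NUM_PAGES = 1_000_000_000        # 1 billion pages

total_bytes = PAGE_SIZE_BYTES * NUM_PAGES
total_terabytes = total_bytes / 10**12   # 1 TB = 10^12 bytes

print(f"{total_terabytes:.0f} TB")
```

With binary units (1 KB = 1,024 bytes, 1 TB = 2^40 bytes) the figure comes out somewhat lower, but still on the order of 20 terabytes.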

Manual Hierarchical Web Taxonomies

Yahoo approach of using human editors to assemble a large hierarchically structured directory of web pages.

Open Directory Project is a similar approach based on the distributed labor of volunteer editors (net-citizens provide the collective brain). Used by most other search engines. Started by Netscape.

Business Models for Web Search

Advertisers pay for banner ads on the site that do not depend on a user's query.
CPM: Cost Per Mille (thousand impressions). Pay for each ad display.
CPC: Cost Per Click. Pay only when user clicks on ad.
CTR: Click-Through Rate. Fraction of ad impressions that result in click-throughs.
CPC = CPM / (CTR * 1000)
CPA: Cost Per Action (Acquisition). Pay only when user actually makes a purchase on target site.
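
The relation CPC = CPM / (CTR * 1000) can be sketched as a small function. The function name and the sample prices below are hypothetical and are not taken from any real rate card.

```python
def cpc_from_cpm(cpm: float, ctr: float) -> float:
    """Effective cost per click implied by a CPM price and a click-through rate."""
    if ctr <= 0:
        raise ValueError("CTR must be positive to derive a cost per click")
    return cpm / (ctr * 1000)

# Hypothetical numbers: a $5.00 CPM and a 2% click-through rate.
example_cpc = cpc_from_cpm(cpm=5.00, ctr=0.02)
print(f"${example_cpc:.2f} per click")
```

The reasoning behind the formula: 1,000 impressions cost the CPM and produce CTR * 1000 clicks, so dividing one by the other gives the price of a single click. In the example, 1,000 impressions at a 2% CTR yield 20 clicks for $5.00, or $0.25 each.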

Advertisers bid for keywords. Ads for highest bidders displayed when user query contains a purchased keyword.
PPC: Pay Per Click. CPC for bid word ads (e.g. Google AdWords).
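
The keyword-bidding idea above can be illustrated with a toy selection routine. This is a sketch only: it ranks purely by bid amount, as the slide describes, and the `Bid` type, the `select_ads` function, the advertiser names, and the prices are all hypothetical. It does not represent how Google AdWords or any real ad system ranks ads.

```python
from typing import NamedTuple

class Bid(NamedTuple):
    advertiser: str
    keyword: str
    amount: float  # bid per click, in dollars

def select_ads(query: str, bids: list[Bid], slots: int = 3) -> list[Bid]:
    """Return the highest bids whose keyword appears as a word in the query."""
    query_words = set(query.lower().split())
    matching = [b for b in bids if b.keyword.lower() in query_words]
    # Highest bidders first; keep only as many as there are ad slots.
    return sorted(matching, key=lambda b: b.amount, reverse=True)[:slots]

# Hypothetical bids for illustration.
bids = [
    Bid("ShopA", "vacuum", 0.40),
    Bid("ShopB", "vacuum", 0.65),
    Bid("ShopC", "dishwasher", 0.90),
    Bid("ShopD", "vacuum", 0.10),
]
shown = select_ads("miele vacuum cleaners", bids, slots=2)
print([b.advertiser for b in shown])  # ['ShopB', 'ShopA']
```

ShopC bids the most but is not shown, because its keyword does not occur in the query.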

Search engine components

Spider (a.k.a. crawler/robot) builds corpus

Collects web pages recursively
For each known URL, fetch the page, parse it, and extract new URLs
Repeat

Additional pages from direct submissions & other sources
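
The fetch, parse, extract, repeat loop described above can be sketched as follows. To keep the example runnable offline, pages come from a small in-memory dictionary (`FAKE_WEB`, a hypothetical stand-in for real HTTP fetching); the `crawl` and `LinkExtractor` names are likewise illustrative. A production spider would also need politeness delays, robots.txt handling, and error recovery, none of which are shown.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: fetch each known URL, parse it, queue new URLs."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    corpus = {}
    while frontier and len(corpus) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:          # dead link or failed fetch: skip it
            continue
        corpus[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return corpus

# Hypothetical three-page "web" held in memory.
FAKE_WEB = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.test/a.html": '<a href="/b.html">B</a> <a href="/missing.html">x</a>',
    "http://example.test/b.html": '<a href="/">home</a>',
}
corpus = crawl(["http://example.test/"], fetch=FAKE_WEB.get)
print(sorted(corpus))
```

The `seen` set keeps the spider from fetching the same URL twice, and the dead link to `/missing.html` is skipped rather than stored.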

The indexer creates inverted indexes

Various policies wrt which words are indexed, capitalization, support for Unicode, stemming, support for phrases, etc.
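
A minimal inverted index might look like the sketch below. It applies only one of the policies listed above (case folding) and deliberately omits stemming and phrase support; the function names and the sample documents are illustrative assumptions.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

# Hypothetical documents for illustration.
docs = {
    "page1": "Miele vacuum cleaners",
    "page2": "Discount appliances and vacuum repair",
    "page3": "Miele dishwashers",
}
index = build_inverted_index(docs)
print(sorted(index["miele"]))   # ['page1', 'page3']
print(sorted(index["vacuum"]))  # ['page1', 'page2']
```

The index is "inverted" because it maps from terms to documents rather than from documents to terms, which lets a query look up matching pages without scanning every page.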

Query processor serves query results

Front end: query reformulation, word stemming, capitalization, optimization of Booleans, etc.
Back end: finds matching documents and ranks them
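
The split between front end and back end can be sketched in a few lines. In this toy version the front end only lowercases and tokenizes the query, and the back end requires every query term to be present (a Boolean AND) and ranks by raw term frequency. That ranking is a simple stand-in chosen for illustration, not the method of any particular engine, and for brevity the sketch scans the documents directly instead of consulting an inverted index. All names and documents are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def search(query, docs, top_k=10):
    """AND-match all query terms, then rank by total term frequency."""
    terms = tokenize(query)                 # front end: case folding only
    if not terms:
        return []
    scored = []
    for doc_id, text in docs.items():       # back end: match, then score
        counts = Counter(tokenize(text))
        if all(counts[t] > 0 for t in terms):
            scored.append((sum(counts[t] for t in terms), doc_id))
    # Highest score first; ties broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

# Hypothetical documents for illustration.
docs = {
    "page1": "Miele vacuum cleaners and Miele vacuum bags",
    "page2": "Discount vacuum repair",
    "page3": "Miele dishwashers and one Miele vacuum",
}
results = search("Miele Vacuum", docs)
print(results)  # ['page1', 'page3']
```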

Web Search Using IR

[Diagram: a web spider builds the document corpus; the IR system takes the corpus and a query string and returns ranked documents (1. Page1, 2. Page2, 3. Page3, ...)]

Ads vs. search results


Search advertising is the revenue model

Multi-billion-dollar industry
Advertisers pay for clicks on their ads

Interesting problems
How to pick the top 10 results for a search from 2,230,000 matching pages?
What ads to show for a search?
If I'm an advertiser, which search terms should I bid on and how much should I bid?

Web search basics

[Screenshot: search results page for the query "miele", showing "Results 1 - 10 of about 7,310,000 for miele. (0.12 seconds)", organic results from Miele sites in English and German, and a "Sponsored Links" column of appliance ads. Diagram labels: Web crawler, The Web, Indexes, Ad indexes.]

What is Web Advertising?

Web advertising is the action of promoting your website using online advertising tools, techniques and methods proven to get the results you are looking for. The term is used interchangeably with online advertising.

Online advertising is basically the action of actively promoting your new business.

"The signposting should give a concise and accurate idea of what they can expect to find when they get there with that precious click. What happens after that is another matter." - Zsolt Kerekes, editor of STORAGEsearch

Web Advertising Principles

Keep ads for outside companies on the periphery of the page
Keep ads as small and discreet as possible relative to your core homepage content
If you place ads outside the standard banner area at the top of the page, label them as advertising so that users don't confuse them with your site's content
Avoid using ad conventions to showcase regular features of the site

What Type of Web Advertisements Are There?

Banners (static, animated and interactive)
Interstitial (pop-ups and similar pages that interrupt the user)
Rich Media (advanced technology, incorporating video, audio, animation and photographs)
Sponsorships, events and corporate sites
Opt-ins (forms, newsletters, push technologies)
Viral marketing and email campaigns
Spam, malware and cookies

A Tidbit on Pop-Ups

Pop-ups are the single biggest annoyance on the Internet
Yet pop-up advertising is growing faster than any other form of online advertising.
"Any survey we've seen shows that users dislike pop-ups more than almost any other ad format," said David Hallerman, senior analyst at marketing-research firm eMarketer. "[But] we see online advertising growing 25% this year, and [ad ware] surpassing it by 10%."
Top sites for pop-up/pop-under ads for May 2004:
CNN
MSN
Yahoo!
The Weather Channel
Excite
The New York Times
Classmates
MSNBC
CBS SportsLine

How Do Advertisers Pay For It?

Rates are quoted in cost per thousand (CPM), meaning the cost for every thousand times the ad is served. Each time someone sees the ad on a site, an impression is said to have occurred.

Types of Ad Buys

Run-of-Site
Specific Pages
Keyword Searches
Targeted Users

Ad Buys: Understanding Them

Pay-For-Placement (PFP)
As long as you bid for one of the top two or three positions, you are guaranteed to be displayed at the top of the results for the search engine and its partners.

Pay-For-Inclusion (PFI)
A search engine includes your website pages in its index in exchange for payment, generally for a period of six months to one year. This does not mean your page will appear in the top position.

Google AdWords
Keywords you pick for your site are matched against those products or services people have expressed an active desire to get information on.

The Battle For Space

Paid search results are the hottest business on the Web, so it's little surprise the two titans of search are colliding
Google's revenues were $390 million in the first quarter, up 118% from a year ago
Yahoo moved into the business forcefully when it acquired a paid search company called Overture last year

The hottest spots include the home pages of the Big Three: Yahoo, MSN, America Online
Marketers generally buy the home-page ad for 24-hour periods
Space on these sites may have to be booked up to a year in advance

A New Form Of Web Advertising: Adware

Adware is advertising-supported software that is available for free and in exchange displays advertising banners within the software interface.
Instead of you having to pay for the software, the company creates revenue by selling advertising space in the software product.
Adware will usually install additional third-party components on your system and may exchange statistical data with a remote location over the Internet.
Usually, taking advantage of these free products involves providing some information about yourself that is used to target content and measure effectiveness on behalf of paying advertisers.

The Effects of Phishing and Spoofing on Web Advertising

Phishing and spoofing occur when scammers dupe Web users into divulging account and other personal information by pretending to represent known brands.
How can a marketer deal with phishy e-mail and spoofing scamsters?

Adopt technology that certifies legitimate mail.
Incorporate toolbars that warn users that they may be entering shady parts of the Internet.
Auction site eBay (EBAY) has one that stays green when users are on eBay, goes gray when they leave the site, and sends out a pop-up message when they stumble onto a known spoof site.
Use software that can help companies react when targeted by tainted mail, blunting the damage to customers.
Check with your Internet service providers.
Some are developing so-called "black lists" that block e-mail from known spammers. In the future, these could be turned into "white lists," so that only e-mail that has been verified from legitimate sources makes it through.
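
The difference between the two list policies can be shown with a toy filter. This is a sketch under strong assumptions: it filters on the sender's domain only, the domains are hypothetical, and it takes the sender address at face value, which a real filter could not do, since sender information can be forged.

```python
def domain_of(address: str) -> str:
    """Return the lowercased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()

def accept_by_blacklist(sender: str, blacklist: set[str]) -> bool:
    """Black list policy: deliver unless the sender's domain is a known spammer."""
    return domain_of(sender) not in blacklist

def accept_by_whitelist(sender: str, whitelist: set[str]) -> bool:
    """White list policy: deliver only if the sender's domain has been verified."""
    return domain_of(sender) in whitelist

# Hypothetical lists for illustration.
KNOWN_SPAMMERS = {"spam.example"}
VERIFIED_SENDERS = {"bank.example"}

for sender in ["offers@spam.example", "alerts@bank.example", "hello@unknown.example"]:
    print(sender,
          accept_by_blacklist(sender, KNOWN_SPAMMERS),
          accept_by_whitelist(sender, VERIFIED_SENDERS))
```

The unknown sender is the interesting case: a black list lets it through because nobody has reported it yet, while a white list rejects it because nobody has verified it.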

The Problems With Phishing and Spoofing for Web Advertisers

The problem with implementing many of today's available security solutions:

slower online communication
more expensive for the advertiser
more cumbersome for users

Marketers should never ask for personal information nor link to a page that asks for personal data.
For now, the best defense for marketers is strong and consistent branding, so customers can tell the difference between a real e-mail and a phishing attack.

What Can I Do To Protect Myself From Phishing?

Don't trust e-mail headers, which can be forged easily.
Avoid filling out forms in e-mail messages. You can't know with certainty where the data will be sent, and the information can make several stops on the way to the recipient.
Try not to click on links in an e-mail message from a company. Too many scam artists are making forgeries of companies' sites that look like the real thing.
If you go to a link offered in an unsolicited e-mail, check to see if there is an 's' after the http in the address and a lock at the bottom of the screen. Both are indicators that the site is secure.
If you want to do business online, don't click on an e-mail link. Go to the company's Web site yourself and fill out information there.
Review credit card and bank account statements as soon as you receive them to determine whether there are any unauthorized charges. If your statement is late by more than a couple of days, call your credit card company or bank to confirm your billing address and account balances.

Use anti-virus software and keep it up to date
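
The "'s' after the http" check from the list above can be automated in one line. This sketch, with a hypothetical function name and example URLs, tests only the URL scheme; it does not inspect the lock icon or certificate and says nothing about who actually operates the site.

```python
from urllib.parse import urlparse

def uses_https(url: str) -> bool:
    """Return True if the URL's scheme is https (the 's' after 'http')."""
    return urlparse(url).scheme == "https"

# Hypothetical URLs for illustration.
print(uses_https("https://www.example.com/login"))  # True
print(uses_https("http://www.example.com/login"))   # False
```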

Why Are Phishers Rarely Caught?

The fraud can be perpetrated very quickly, and afterward, the perpetrator can "vanish" into cyberspace.
The phony websites typically migrate from one server to another very rapidly -- in an effort to stay a step ahead of ISPs and law enforcement.
The average phishing web site is online for only about 54 hours, according to June data from the APWG. Some sites, however, have been able to remain online for more than two weeks before being shut down or abandoned.
Existing federal laws do criminalize phishing -- but mainly after the damage is done, when a consumer has already been defrauded as a result of the phishing. Those measures include the laws against wire fraud, identity theft, credit card fraud, computer fraud, and a number of trade laws -- and may even encompass the new federal CAN-SPAM Act.
Many phishers appear to send their e-mails from overseas, and it may be difficult to prosecute persons who reside offshore.

After All This, Do You Still Want To Get In The Business?

It is a rapidly growing industry:
U.S. advertisers this year will spend a record $9.1 billion on online advertising, according to a new report from eMarketer
Online's share of U.S. media spending this year will reach a record 3.4%
By 2007, U.S. online advertising spending is projected to reach $16.0 billion

Why is there such growth in Web Advertisement?

While web advertising is important, other investments by marketers, like a company's own Web site, are often more critical to making strong connections with consumers

75% of the U.S. population now has Internet access at home, according to NetRatings
29% of U.S. homes have a broadband connection, says eMarketer