
Internet

Rajesh Goel

Agenda
Internet
Internet History
Internet Applications
Introduction to Google
Introduction to Yahoo
Introduction to W.W.W
Basic & Advanced search tools of the Internet

What Is the Internet?


A network of networks, joining many government, university and private computers together and providing an infrastructure for the use of e-mail, bulletin boards, file archives, hypertext documents, databases and other computational resources.
The vast collection of computer networks which form and act as a single huge network for transport of data and messages across distances which can be anywhere from the same office to anywhere in the world.

What is the Internet?


The largest network of networks in the world.
Uses TCP/IP protocols and packet switching.
Runs on any communications substrate.

Visual Representation of internet

History of Internet

*** Internet History ***

Brief History of the Internet


1968 - DARPA (Defense Advanced Research Projects Agency) contracts with BBN (Bolt, Beranek & Newman) to create ARPAnet
1970 - First five nodes: UCLA, Stanford, UC Santa Barbara, U of Utah, and BBN
1974 - TCP specification by Vint Cerf
1983 - On January 1, the Internet, then several hundred hosts, converts en masse to using TCP/IP for its messaging

What Was the Victorian Internet?

The Telegraph
Invented in the 1840s.
Signals sent over wires that were established over vast distances
Used extensively by the U.S. Government during the American Civil War, 1861 - 1865
Morse Code was dots and dashes, or short signals and long signals
The electronic signal standard of +/- 15 v. is still used in network interface cards today.

A Brief Summary of the Evolution of the Internet


1945 - Memex Conceived
1948 - A Mathematical Theory of Communication
1958 - Silicon Chip
1962 - First Vast Computer Network Envisioned
1964 - Packet Switching Invented
1965 - Hypertext Invented
1969 - ARPANET
1972 - TCP/IP Created
1983 - Internet Named and Goes TCP/IP
1989 - WWW Created
1993 - Mosaic Created
1995 - Age of eCommerce Begins

Application of internet

Internet-based Applications
Publishing Student Lab Reports online
Simulations
Finding Lesson Plans
Publishing Student Stories to the Web
Historical Diaries
Online Textbook
Virtual Labs (Interactive Frog Dissection)
e-Pal Exchanges, Tele-collaborative projects
WebQuests
Weather Satellite images
Real-time data
Using Online Quizzes

Internet-based Applications
Figure: Scale of Intuitiveness re: Applications of the Internet
Applications shown on the chart: Real-time Data, Virtual Labs, E-Pal / Telecollaborative, Student Web Page, Historical Diary, Web Quests, Simulations, Online Quizzes, Online Textbook, Lesson Plans
Labels on the chart: Basic, Advanced, Traditional, Innovative, 21st Century Workforce Skills, Unique and Compelling, Highly, Somewhat

Internet Growth Trends


1977: 111 hosts on Internet
1981: 213 hosts
1983: 562 hosts
1984: 1,000 hosts
1986: 5,000 hosts
1987: 10,000 hosts
1989: 100,000 hosts
1992: 1,000,000 hosts
2001: 150-175 million hosts
2002: over 200 million hosts
By 2010, about 80% of the planet will be on the Internet
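As a rough illustration of what these figures imply, the sketch below computes the compound annual growth factor between two entries of the list. The function name `annual_growth` is made up for this example, and only the single-valued host counts from the list are used.

```python
# Host counts copied from the list above (year -> number of hosts).
HOSTS = {1977: 111, 1981: 213, 1983: 562, 1984: 1_000, 1986: 5_000,
         1987: 10_000, 1989: 100_000, 1992: 1_000_000}

def annual_growth(start_year: int, end_year: int) -> float:
    """Compound annual growth factor between two years present in HOSTS."""
    years = end_year - start_year
    return (HOSTS[end_year] / HOSTS[start_year]) ** (1 / years)

if __name__ == "__main__":
    for start, end in [(1977, 1984), (1984, 1992)]:
        print(f"{start}-{end}: x{annual_growth(start, end):.2f} per year")
```

A factor of 2.0 means the number of hosts doubled each year over that interval.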

Why Google?
Biggest web search engine database
25 or more billion pages

Results often include what you want
Features, shortcuts, special databases & services

Overview
1. How Google works
2. Exploiting Google's FUZZY search options
3. Making your searches more precise
4. Handy tools and shortcuts
5. The best of Google's family of databases
6. Google Books & Google Scholar

How Google works


Spider programs find pages on the public web and build a huge database of web pages
Search program gives you ways to search this database
PageRank arranges your results:
Word proximity and placement
Popularity - a link to a page is a vote for it
Importance - traffic, popularity of pages linking to a page
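The "link as a vote" idea can be sketched with the publicly described PageRank iteration. This is a minimal illustration, not Google's production ranking: the damping factor of 0.85, the iteration count, and the four-page toy web are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy web: pages a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Page `c` collects the most votes, and `a` outranks `b` because its single incoming link comes from the highly ranked `c`. This corresponds to the "popularity of pages linking to a page" point above.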

Outline
Yahoo!, as seen by an engineer
Choosing PHP in 2002
PHP architecture at Yahoo!

The Internet's most trafficked site

Yahoo! by the Numbers


411M unique visitors per month
191M active registered users
11.4M fee-paying customers
3.4B average daily pageviews

October 2005

W.W.W

The World Wide Web (abbreviated as WWW or W3 and commonly known as the Web) is a system of interlinked hypertext documents accessed via the Internet.

Search Engine

Overview
Introduction
Types of Searching
Parts of a Local Search Engine
Working of a Local Search Engine
Choosing a search engine
List of some Intranet Search Engines
Conclusion
References

Types of Searching
A search can be of various types:
Internet search: Search engines such as Yahoo and Infoseek crawl the web gathering web pages, or information about web pages, index them, and retrieve them when a specific term is searched for.
Database search: Databases store their information neatly organized into fields. A search interface is provided for this.

Types of Searching
With databases one can set up complex queries to find the search words in all applicable fields. But this makes them slower to respond, requires more memory, and requires programming. Database search is not oriented towards text search and relevance ranking; it is best suited to listings such as an inventory or the directory of an institute.
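A fielded database query of the kind described above might look like the following sketch, which uses Python's built-in `sqlite3` module. The `staff` table, its columns, and the sample rows are invented purely for illustration.

```python
import sqlite3

# In-memory database holding a made-up institute directory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, department TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("A. Rao", "Library", "101"),
     ("B. Singh", "Physics", "204"),
     ("C. Mehta", "Library", "105")],
)

def find_by_department(department):
    """Fielded query: match on one column and return rows in name order."""
    cur = conn.execute(
        "SELECT name, room FROM staff WHERE department = ? ORDER BY name",
        (department,),
    )
    return cur.fetchall()

print(find_by_department("Library"))
```

The query returns exact matches on a field in a fixed order. There is no notion of relevance ranking, which is the limitation the slide points out.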

Types of Searching
Intranet search: Search is restricted to a site or a group of sites. Text search engines store this information in one index and can find words in any field for a record. Many high-end search engines can also store field information, so searches can be limited to a specific field as well.

Parts of a Local Site Search Tool


Search Indexer
The program that reads all the documents on the site and creates an index of them. The index is stored in a file called the index file, where the search engine will find it.

Search Index File
Created by the Search Indexer program, this file stores the data from the site in a special index or database, designed for very quick access.
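The indexer and index file described above can be sketched as a small inverted index saved to disk. This is a minimal illustration under stated assumptions: the JSON file format, the function names, and the two sample pages are choices made for the example, not features of any particular product.

```python
import json
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the sorted list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def save_index(index, path):
    """Write the index file that the search engine will read later."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(index, fh)

def load_index(path):
    """Read an index file previously written by save_index."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# Hypothetical site content: file name -> page text.
site = {
    "index.html": "Welcome to the library intranet",
    "hours.html": "Library opening hours and holidays",
}
index = build_index(site)
```

Looking up `index["library"]` returns both pages without rescanning the documents, which is what makes the index file quick to search.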

Parts of a Local Site Search Tool


Search Form
HTML interface to the site search tool, provided for visitors to enter their search terms and specify their preferences for the search.

Search Engine
The program (CGI, server module or separate server) that accepts the request from the form or URL, searches the index, and returns the results page to the server.
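As a sketch of the search engine's role, the function below takes the URL produced by a search form, extracts the search terms, and looks them up in an index. The parameter name `q`, the `/search` path, and the in-memory `INDEX` are assumptions for the example. A real tool would load its index from the index file.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory index: word -> pages containing it.
INDEX = {
    "library": ["hours.html", "index.html"],
    "hours": ["hours.html"],
    "intranet": ["index.html"],
}

def handle_request(url):
    """Parse ?q=... from the URL and return pages matching every term."""
    params = parse_qs(urlparse(url).query)
    terms = params.get("q", [""])[0].lower().split()
    if not terms:
        return []
    matches = set(INDEX.get(terms[0], []))
    for term in terms[1:]:
        # Keep only pages that also contain this term.
        matches &= set(INDEX.get(term, []))
    return sorted(matches)

print(handle_request("/search?q=library+hours"))
```

Requiring every term to match is one possible default. A tool could equally treat multiple terms as alternatives.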

Parts of a Local Site Search Tool


Results Listing
HTML page listing the pages which contain text matching the search term(s). These are sorted in some kind of relevance order, with the closest match at the top. The format of this is often defined by the site search tool, but may be modified in some ways.
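A results listing of this kind could be produced as in the sketch below. It assumes the search engine has already assigned each matching page a numeric relevance score. The function name and the HTML layout are illustrative only.

```python
from html import escape

def render_results(query, scored_pages):
    """scored_pages: list of (url, score); the highest score is listed first."""
    ranked = sorted(scored_pages, key=lambda item: item[1], reverse=True)
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a> '
        f"(score {score})</li>"
        for url, score in ranked
    )
    # The query is escaped so that user input cannot inject markup.
    return f"<h1>Results for {escape(query)}</h1>\n<ol>\n{items}\n</ol>"

page = render_results("library hours", [("index.html", 1), ("hours.html", 3)])
```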

Working of a Local Search Engine


Indexer: gets words from the Web Site Documents and stores them in the Index
Search Form: user sends query
Search Engine: looks in the Index, gets matches, sends formatted results
Results Page: user selects the required page
Retrieved Page: user views the retrieved page

Types of Search Engines


CGI Programs
The Common Gateway Interface (CGI) standard allows a web server to communicate with external programs. Many site search engines run as CGI programs.

Server Plug-Ins
For better data interchange, less overhead and more flexibility, web server companies have defined APIs (Application Programming Interfaces) to their servers. This allows third-party developers to create modules for the servers which run inside the server process.
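The CGI approach mentioned above can be sketched as follows. Under CGI the web server passes the request to an external program through environment variables such as `QUERY_STRING`, and the program writes headers and a body to standard output. The parameter name `q` and the plain-text reply are assumptions for this example.

```python
import os
from urllib.parse import parse_qs

def cgi_response(environ):
    """Build a CGI reply from the server-supplied environment mapping."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    term = query.get("q", [""])[0]
    body = f"You searched for: {term}\n"
    # A CGI program writes headers, a blank line, then the body to stdout.
    return "Content-Type: text/plain\r\n\r\n" + body

if __name__ == "__main__":
    print(cgi_response(os.environ), end="")
```

Because the server starts a new process for each request, this model carries the overhead that server plug-ins were designed to avoid.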

Types of Search Engines


Search Servers
Some search engines run as separate servers. The form data is passed as part of the URL, just as with a CGI program, but the search engine application runs as a separate HTTP server on a different machine. This reduces the load on the main web server.

Remote Searching
It is also possible to outsource search to a remote site search service. The indexer and search engine run on the remote server. Using a web indexing robot, or spider, the service follows links on the site and reads the pages, then stores every word in the index file on that server. When it comes time to search, the form on the site's web page sends a message to the remote search engine, which sends results back to the site.
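How a spider follows links can be sketched without any network access by walking a small in-memory site. The `SITE` dictionary and its paths are made up for the example. A real robot would fetch each page over HTTP and respect the site's crawling rules.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(pages, start):
    """Breadth-first walk over pages (path -> HTML); returns the visit order."""
    seen, queue, order = {start}, [start], []
    while queue:
        path = queue.pop(0)
        order.append(path)
        parser = LinkExtractor()
        parser.feed(pages.get(path, ""))
        for link in parser.links:
            # Stay on the site and never queue the same page twice.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

SITE = {
    "/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a> <a href="http://other.example/">x</a>',
    "/b.html": '<a href="/">home</a>',
    "/orphan.html": "no links here",
}
print(crawl(SITE, "/"))
```

Pages that nothing links to, such as `/orphan.html` here, are never reached. This is a practical limitation of link-following indexers.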

Choosing a Site Search Tool


Technical Considerations
Indexing Features
Searching Capabilities
Results display
Costs, licensing and registration requirements
Unique features (if any)

Features of search engines:


Technical Considerations
Server platforms supported: Unix, NT, Win'95/98/NT
Web servers supported: NCSA HTTPD, CERN HTTPD, OMNI HTTPD, XITAMI, APACHE, PWS, IIS
Scalability: Indexing support for multiple web servers within an intranet
Technical support: E-mail, mailing list, documentation on web site
Main program modules
Source code availability
Ease of installation and maintenance: Often related to the technical expertise available

Features of search engines:


Indexing features
File/document formats supported: HTML, ASCII, PDF, SQL, spreadsheets, WYSIWYG (MS-Word, WP, etc.)
Indexing level support: File/directory level, multi-record files
Standard formats recognised: MARC, Medline, etc.
Customisation of document formats
Stemming: If yes, is this an optional or mandatory feature?
Stop words support: If yes, is this an optional or mandatory feature?
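To make the stemming and stop-word questions concrete, the sketch below shows both as optional steps in preparing index terms. The stop-word list and the suffix-stripping rule are deliberately crude assumptions for illustration. Real indexers use fuller lists and proper stemming algorithms.

```python
# A tiny, illustrative stop-word list; real tools ship much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def naive_stem(word):
    """Strip a few common English suffixes; far cruder than a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text, use_stemming=True, use_stop_words=True):
    """Turn text into index terms, with each feature switchable."""
    words = text.lower().split()
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [naive_stem(w) for w in words]
    return words

print(index_terms("Searching the indexes of books"))
```

The two flags correspond to the "optional or mandatory" question in the checklist: a tool that always stems cannot match the exact word form a user typed.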

Features of search engines:


Indexing features
Meta tags indexing: If meta tag indexing is allowed, what kind of meta tags can be used?
Support for compression: Does the indexer support file compression?
Field level searching: Requires more space and time
Indexing ALT text/comment text: Shows if the search engine indexes ALT text associated with images or text in comment tags
Database updating: Does the indexer support incremental updates?

Features of search engines


Search Capabilities
Boolean searching: Use of Boolean operators AND, OR and NOT as search term connectors
Natural language: Allows users to enter the query in natural language
Phrase: Users can search for an exact phrase
Truncation / wild card: Variations of search terms and plural forms can be searched
Exact match: Allows users to search for terms exactly as they are entered
Duplicate detection: Removes duplicate records from the retrieved records
Proximity: With connectors such as WITH, NEAR and ADJacent one can specify the position of a search term with respect to others
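Boolean and proximity searching can be sketched with set operations and word positions. The three sample documents and the function names below are invented for the example. A production engine would work from its index rather than rescanning the text.

```python
# Made-up document collection: id -> text.
DOCS = {
    1: "internet search engines index web pages",
    2: "database search uses structured fields",
    3: "web pages link to other web pages",
}

def docs_with(term):
    """Set of document ids whose text contains the exact word."""
    return {d for d, text in DOCS.items() if term in text.split()}

def near(term_a, term_b, distance):
    """Documents where the two words occur within `distance` positions."""
    hits = set()
    for doc_id, text in DOCS.items():
        words = text.split()
        pos_a = [i for i, w in enumerate(words) if w == term_a]
        pos_b = [i for i, w in enumerate(words) if w == term_b]
        if any(abs(i - j) <= distance for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits

both = docs_with("search") & docs_with("web")      # search AND web
either = docs_with("search") | docs_with("web")    # search OR web
excluded = docs_with("web") - docs_with("search")  # web NOT search
```

Here `near("web", "pages", 1)` behaves like an ADJ connector, while a larger distance approximates NEAR.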

Features of search engines


Searching features
Field searching: Query for a specific field value in the database
Thesaurus searching: Search for broader, narrower or related terms, or related concepts
Query by example: Enables users to search for similar documents
Soundex searching: Search for records with similar spelling as the search term
Relevance ranking: Ranking the retrieved records in some order
Customization (CGI programs)
Search set manipulation: Saving the search results as sets and allowing users to view search history
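Soundex searching, listed above, matches names that sound alike by reducing each to a letter plus three digits. The sketch below follows the commonly documented American Soundex rules. It is an illustration, not the implementation of any specific search tool.

```python
# Letter groups that share a Soundex digit (1 to 6).
CODES = {}
for digit, letters in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so repeats across it count
    # Pad with zeros or truncate so the code is always four characters.
    return (result + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))
```

Two names match under Soundex searching when their codes are equal, as with "Robert" and "Rupert" here.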

Features of search engine Results Display


Formats supported: Can it display in native format or just HTML; display in different formats; display number of records retrieved
Relevancy ranking: If the retrieved records are ranked, how the relevance score is indicated
Keyword-in-context: KWIC or highlighting of matching search terms
Customization of results display: Allow users to select different display formats
Saving options: Saving in different formats; number of records that can be saved at a time
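A keyword-in-context (KWIC) display can be sketched as below: each occurrence of the search term is shown with a few words on either side. The bracket highlighting and the default window of three words are arbitrary choices for the example.

```python
def kwic(text, term, width=3):
    """Return one context line per occurrence, with the match in [brackets]."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Ignore case and trailing punctuation when comparing.
        if word.lower().strip(".,;:!?") == term.lower():
            left = " ".join(words[max(0, i - width): i])
            right = " ".join(words[i + 1: i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

sample = "The search engine builds an index. The index is stored in a file."
for line in kwic(sample, "index", width=2):
    print(line)
```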

Choosing the right search engine


Checklist of factors to be considered while selecting the search engine:
Size of the website
Technical expertise available (local and/or from the supplier / developer)
System platforms available
Information sources and services to be supported
Document collection: type, volume (now and in future)
Indexing, search and display requirements

Choosing the right search engine


Checklist of factors to be considered while selecting the search engine:
User community to be served
Differentiate between the need for indexing the web site pages and the need for indexing databases / document collections (text, bibliographic, DBMS, etc.)
Support for the concept of a "record" by the search engine
Support for structured fields and metadata
Cost

Choosing the right search engine


Steps in the selection and procurement of search engines:
- Conduct a needs analysis.
- Talk to other libraries.
- Attend trade shows and talk to vendors.
- Read the literature that reviews search engines.
- Compile a list of possible products.

Choosing the right search engine


Steps in the selection and procurement of search engines:
- Compare the functionality of each product to the criteria you developed through needs analysis.
- Narrow your list down to three possible products.
- Spend additional time learning about each product.
- Invite the vendors in for demonstrations.
- Ask for references and follow up with each reference.
- Select product and implement.
- Follow up with end users.
- Continue an ongoing review with end users.

Choosing the right search engine


Some Suggestions
The search system development or selection should be based primarily on local needs.
Consider using freeware search engines, if your requirements are met by these.
For large, highly developed intranet sites, you may like to consider commercial search engines.
Consider whether the web server you are using supports indexing and search, and whether this is adequate for you.

Choosing the right search engine


IT professionals should make an effort to keep themselves abreast of current web technologies.
The features available within a tool should be used properly to get maximum benefit.
Carefully consider interrelations between the three major components: document resources, users and the search engines.

Conclusion
Since search is such a common activity, the search box should appear on every page of your web site. The initial target of the basic search should be the contents of the entire web site. The basic search should allow for Boolean commands ("and," "or"), although this does not need to be explained.

Conclusion
A quality search process begins with quality metadata. It's that old principle: garbage in, garbage out. Metadata is about giving a structure to the content. For example, if every document is assigned keywords or classified by geography, the reader will get a much more accurate return from his or her search. Search engines are the mortar of the intranet. Because they are so important, their implementation must be given high priority, with the necessary time allotted for research and development.

List of some (Free) Intranet Indexing Tools (for Windows)


Microsoft Index Server
http://www.microsoft.com/ntserver/web/exec/feature/IndexServerSum

DeepSearch
http://www.namo.com/products/ds3/info/index.html

Harvest
http://www.tardis.ed.ac.uk/harvest/

HomepageSearchEngine
http://www.HomepageSearchEngine.com/

Swish-E
http://www.webaugur.com/wares/swish

List of some (Free) Intranet Indexing Tools (for Windows)


PLWeb Turbo (PLS / AOL)
http://www.pls.com/plweb.htm

Namazu
http://www.namazu.org/

Oracle interMedia
http://www.oracle.com/intermedia/

Sharewire SiteSearch
http://www.sharewire.com/nav/Products/sitesearch.shtml

Free and commercial search engines


For HTML and text files (web site indexing and file/directory level indexing)
SWISH-E (sunsite.berkeley.edu/SWISH-E/)
ht://Dig (htdig.sdsu.edu/)
Excite For Web Servers (www.excite.com/navigate/)
WebGlimpse (glimpse.cs.arizona.edu/webglimpse/)

For structured/formatted data - MYSQL (www.mysql.com)

Free and commercial search engines


Commercial search engines
AltaVista (www.altavista.digital.com/)
Fulcrum (www.fulcrum.com/)
Infoseek (software.infoseek.com)
Open Text (www.opentext.com/)
Oracle (www.oracle.com/)
PLS (www.pls.com/)
Verity (www.verity.com/)