
Price's Priceless Tips

The Web search world changes on what sometimes seems like an hourly basis. What
follows are a few selected tips and resources for some of the most well-known of
engines. This is just the tip of the iceberg. Resources like Search Engine Showdown and
Search Engine Watch are essential for learning and keeping up with how these tools work
and change over time.

Ten Things to Know About Google

1. The database that Google licenses to Yahoo! [http://google.yahoo.com] is not the
same size: it's smaller than the Google.com database. It does not contain links to cached
versions of pages. This database is also used to supply "fall-through" content (material
not in Yahoo's own database). It is often found listed as "Web page" content.

2. Google utilizes the Open Directory Project database as its Web Directory
[http://directory.google.com].

3. You can search stop words by placing a + in front of the word (ex. "+To +Be +Or Not
+To +Be").

4. At the present time the Google database is refreshed about once every month.

5. You can limit your search to only .pdf files by using the syntax filetype:pdf.

6. Google is the only major search engine to crawl Adobe Acrobat .pdf files.

7. If you are a frequent Google searcher, save time by using the Google Toolbar
[http://toolbar.google.com] and Google Buttons
[http://www.google.com/options/buttons.html].

8. A Boolean "OR" is available with Google. For it to function, capitalize the OR.

9. Google only crawls and makes searchable the first 110 k of a page. Long documents
may have substantial content invisible to Google.

10. Entering a U.S. street address into the query box will return a link to a map of that
address location. Typing in a person or business name, city, and state will also run the
query to the Google phone directory. Several other combinations are available that will
also query the phone directory service, including typing in the area code and number to
run a reverse search [http://www.google.com/help/features.html#wp].
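
Tips 3, 5, and 8 above combine naturally into a single query string. The following is a minimal Python sketch of that idea, not anything Google provides: the function name build_query and the small STOP_WORDS set are hypothetical and chosen only for illustration, since Google's actual stop-word list is not given here.

```python
# Hypothetical, illustrative stop-word set (an assumption, not Google's real list).
STOP_WORDS = {"to", "be", "or", "the", "a", "of"}


def build_query(words, any_of=(), pdf_only=False):
    """Assemble a query string using the syntax from tips 3, 5, and 8."""
    parts = []
    for word in words:
        # Tip 3: prefix a stop word with + so that it is searched.
        if word.lower() in STOP_WORDS:
            parts.append("+" + word)
        else:
            parts.append(word)
    if any_of:
        # Tip 8: OR must be capitalized to work as a Boolean operator.
        parts.append(" OR ".join(any_of))
    if pdf_only:
        # Tip 5: limit results to .pdf files.
        parts.append("filetype:pdf")
    return " ".join(parts)


print(build_query("To Be Or Not To Be".split()))
```

The call shown reproduces the example from tip 3, "+To +Be +Or Not +To +Be". Passing any_of=["tutorial", "guide"] and pdf_only=True would append a capitalized OR group and the filetype:pdf limit.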

Ten Things to Know About AllTheWeb

1. AllTheWeb licenses its database to Lycos. The identical database is searched and
makes up some of the content on a Lycos results page.

2. Unlike Google and AltaVista, this search engine does not have a limit on the amount of
content crawled on a Web page.

3. AllTheWeb indexes every word. Words traditionally considered as "stop words" are
searchable.

4. AllTheWeb does not permit the use of Boolean operators.

5. If plus and/or minus signs are not used, AllTheWeb implies a plus sign in front of each
term or phrase. This results in an implied "anding" of terms.

6. AllTheWeb is now promising a complete refresh of its database every 9-12 days.

7. AllTheWeb permits syntax to be used direct from the "basic" search page to limit a
query. See http://www.alltheweb.com/help/basic.html#special.

8. A query to the AllTheWeb text database simultaneously runs the search in the
AllTheWeb Image, Video, MP3, and FTP databases. If it finds anything, these results are
linked on the right side of the results page.

9. AllTheWeb offers a search engine dedicated to Mobile Web content
[http://mobile.alltheweb.com].

10. Fast Search and Transfer (FAST), the company behind AllTheWeb, has deployed its
software to power the Scirus science search engine from Elsevier.
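
The implied plus sign described in tip 5 can be illustrated with a toy matcher. This is only a sketch under simplifying assumptions: parse_query and matches are hypothetical names, phrases in quotation marks are not handled, and nothing here reflects how AllTheWeb is actually implemented.

```python
def parse_query(query):
    """Split a query into required and excluded terms.

    A term with no sign is treated as if it carried a plus sign,
    which is the implied "anding" described in tip 5.
    """
    required, excluded = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            term = token.lstrip("+").lower()
            if term:
                required.append(term)
    return required, excluded


def matches(query, text):
    """Return True if text has every required term and no excluded term."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    has_all = all(term in words for term in required)
    has_none = not any(term in words for term in excluded)
    return has_all and has_none
```

With this sketch, the queries "search engine" and "+search +engine" behave identically, because an unsigned term is simply treated as required.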

Ten Things to Know About AltaVista

1. AltaVista is the only major search engine that allows a searcher to use the proximity
operator, NEAR (in simple search) or near (in advanced search). Using this operator finds
terms within 10 words of each other in either direction.

2. AltaVista indexes only the first 100 k of text on a page.

3. An asterisk (*) can be used in a phrase to represent an entire word. (Ex. "One small
step for man, one giant * for mankind")

4. AltaVista News [http://news.altavista.com] is "powered" by Moreover. This
continuous feed of material can be searched using AltaVista syntax.

5. The use of the "sort by" box on the AltaVista Advanced interface allows you to give
certain words or phrases a higher relevancy weighting.

6. Caveat: If you use Advanced Search, make sure to place some term or terms in the
Sort-By box; otherwise, results return in completely random order.

7. AltaVista's directory comes from Looksmart.

8. AltaVista's advanced search does not allow for the use of + and - signs.

9. If you search AltaVista in the "simple" mode entering multiple terms without syntax, it
will result in an "implied" OR. In the advanced mode, multiple terms are considered a
phrase.

10. AltaVista software powers the Health Resources and Services Administration (U.S.
government) search engine. This means that all AltaVista syntax can be utilized there. This
site also illustrates AltaVista's capability of indexing full-text .pdf documents on the
site-specific and intranet level [http://search.hrsa.gov].
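
To make the proximity rule in tip 1 concrete, here is a small Python sketch of a NEAR-style test. It is an illustration only: the function name near is hypothetical, "within 10 words" is assumed to mean that the word positions differ by at most 10, and the simple whitespace tokenizer is not AltaVista's.

```python
def near(text, term_a, term_b, window=10):
    """Return True if term_a and term_b occur within `window` words of each
    other, in either direction (an assumed reading of the NEAR operator)."""
    # Crude tokenizer: split on whitespace and strip common punctuation.
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every pair of positions; abs() covers both directions.
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)
```

Because the comparison uses an absolute difference, near(text, "web", "search") and near(text, "search", "web") give the same answer, matching the "either direction" behavior described above.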

Ten Things to Know About MSN Search

1. MSN (Microsoft Network) Search is "powered" by an Inktomi database.
Remember that Inktomi licenses its database to many search sites. Each site gets a
different "flavor" of the total database.

2. The MSN Advanced Search interface offers numerous limiting options via fill-in boxes
and pull-down menus [http://search.msn.com/advanced.asp].

3. The Advanced Search interface permits limiting to pages at a certain depth in the site.
For example, choosing Depth 3 limits the search to pages no more than three directories
deep within a site [e.g., http://www.testsearch.com/Directory1/Directory2/Directory3/].

4. MSN Search allows use of the asterisk (*) as a truncation symbol.

5. According to the most current Search Engine Showdown rankings, MSN Search has the
largest database of any Inktomi partner.

6. The directory portion of MSN search is powered by the Looksmart database.

7. On the Advanced Search interface, checking the "Acrobat" box will retrieve pages with
links to pages that contain .pdf files. It does not search content "inside" these files.

8. Greg Notess points out that the same syntax available to limit Hotbot will also work
with MSN Search [http://hotbot.lycos.com/help/tips/search_features.asp].

9. Danny Sullivan notes that MSN also employs human editors to "hand-pick" key sites in
the Web Directory and Featured Link sections of the site. Although most of the time the
"Featured Links" represent major MSN advertisers, editors can add other content.

10. Selecting and searching under the MSN "News Search" tab returns results predominantly
from MSNBC.
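
The depth limit in tip 3 amounts to counting directory levels in a URL. The Python sketch below shows one way to compute that count. How MSN Search itself measures depth is not documented here, so treat the rule used (a final path segment without a trailing slash is assumed to be a file and is not counted) and the function name directory_depth as assumptions.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the directory levels in a URL's path.

    Assumption: a final segment without a trailing slash is a file name,
    so it does not add to the depth.
    """
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    if segments and not path.endswith("/"):
        segments = segments[:-1]  # drop the assumed file name
    return len(segments)


def within_depth(url, max_depth):
    """Return True if the URL is no more than max_depth directories deep."""
    return directory_depth(url) <= max_depth
```

Under these assumptions, the example URL from tip 3 has a depth of 3, so it would pass a Depth 3 limit but not a Depth 2 limit.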

Ten Things to Know About Northern Light

1. Make sure to study the Northern Light "Power" search page. It provides many limiting
options without requiring knowledge of any syntax
[http://nlresearch.northernlight.com/power_research.html].

2. Instead of entering http://www.northernlight.com, use
http://www.nlresearch.com to go straight to the Northern Light Research site. This
site, aimed at the enterprise market (but available to any searcher), provides access to
several databases not available from the main URL. Most of these resources are
fee-based. They include EIU Search and market research content from FIND/SVP and
MarkIntel.

3. Northern Light provides FREE full-text access to a database of continuously updating
news content from 56 newswires. Material stays in this database, available for free
access, for 2 weeks. Then the content moves to the Northern Light Special Collection
database.

4. Northern Light's Special Editions are subject-specific portals that combine material
from the "open Web" and NL's proprietary databases. Topics of Special Editions include
XML, managed care, and electronic commerce.

5. The Northern Light Special Collection currently contains content (fee-based,
pay-per-document) from over 7,100 sources. A catalog of these publications is available at
http://nlresearch.northernlight.com/docs/specoll_help_catlook.html.

6. Northern Light allows the use of Boolean operators and + and - signs.

7. Multiple truncation symbols can be used in a query. Northern Light has two truncation
symbols: the asterisk (*) for multiple letters and the percent symbol (%) for a single or
absent letter, e.g., medi%eval retrieves both medieval and mediaeval.

8. In addition to the limiting capabilities of the "Power" search page, NL has several
terms available for field searching. These include text: and pub:. (This last prefix
allows searching in a specific Special Collection publication title.) You can find a complete
list at http://nlresearch.northernlight.com/docs/search_help_quickref.html.

9. Northern Light's free "Alerts" feature is one resource you must know about. This
feature allows you to set up search strategies in ANY/ALL of the NL databases and have
those strategies searched up to three times daily. If any new material hits on the
strategy, results will be delivered to you via e-mail. I use this tool to bring me a
customized feed of news via the NL News Search database. Remember, the full-text
content is free to access for 2 weeks.

10. Northern Light's "Geo Search" provides an opportunity to search the Web with
keywords and U.S. and Canadian address information. Results also get the benefit of NL's
organization with its "custom folders."
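
The two truncation symbols in tip 7 map neatly onto regular expressions, which makes their behavior easy to test. The Python sketch below is an approximation rather than Northern Light's implementation: the function name wildcard_to_regex is hypothetical, "multiple letters" is assumed to include zero letters, and "letter" is assumed to mean any word character.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern using * and % into a compiled regular expression.

    Assumed mapping: * matches any run of letters (including none),
    and % matches a single letter or no letter at all.
    """
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(r"\w*")
        elif char == "%":
            parts.append(r"\w?")
        else:
            parts.append(re.escape(char))  # treat everything else literally
    return re.compile("".join(parts), re.IGNORECASE)


def word_matches(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

For instance, word_matches("medi%eval", "mediaeval") and word_matches("medi%eval", "medieval") both hold, because % covers the optional "a".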

Ontologies, Controlled Vocabularies, XML, and Web Search Engines

I am very excited to see that controlled vocabularies and the building of ontologies have
come into vogue.

Some of this "hipness" has been caused by the promise and excitement surrounding XML
(eXtensible Markup Language). However, I am not sure if the coming of XML will help the
general-purpose search engine, though it should clearly help specialized, focused, and
Invisible Web engines become much more useful resources.

Why the hesitation?

The general-purpose engines, as we know and love them today, in principle index every
page they can reach: massive amounts of data coming from just about anyone who wants
to produce Web content and put it on a publicly accessible server.

The problem for implementation of a controlled vocabulary with this material is really one
of creation. Who would create it? Who would maintain it? Who would do the cataloging?
Would entire sites be cataloged at the page level or only a specific page (the top page)?
Who would manage such a project? Where would the money come from?

Controlled vocabularies and XML show a great deal of promise for certain types of search
engines because these types of engines can much more easily create and enforce a set of
agreed-upon standards. Many issues would need resolution before we could apply
controlled vocabularies to make searching the massive amount of material on the open
Web more effective.

The Future: New Tools on the Way


When you learn about new search tools and share that knowledge with others, you not
only improve your own searching, but you help to make a better future for all searchers.

Here are some new search products that show a lot of promise, a few more potential
"quick hits." With the vulnerability of the Internet industry of late, let's hope these
products survive. Even if the actual companies do not survive, the technology is still
worth knowing about. Have fun!!!

Three New General Purpose Search Engines

Competition for Google?

• WISEnut
[http://www.wisenut.com]

• Teoma
[http://www.teoma.com]

• GuideBeam
[http://www.guidebeam.com]

A New Image Search Tool

• picsearch
[http://www.picsearch.com]

Real-Time Search
Patented technology to search resources updated in real-time.

• http://www.netcurrents.com

Natural Language Search Technology


This product is getting a lot of attention.

• http://www.iphrase.com

Now let's see if you've learned your lessons. How long will it take before you've tried all
these new promising sites out? The test clock starts...now!

This Article Contains Inaccuracies: Essential Reading

In the time it takes this article to move from the author to the editor to the publisher to
the printer to you, undoubtedly something mentioned in this article will have changed.
Some feature will have appeared, another vanished. The working searcher must simply
make a policy of staying on top of those changes.

Those of you who need to keep current on the Web search world should monitor the
following sites as often as possible. All these sites are free and most offer free e-mail
newsletters and updates.

SearchDay
http://www.searchenginewatch.com/searchday/
Written by Chris Sherman. Daily updates.

Search Engine Watch
http://www.searchenginewatch.com
A resource-rich site that offers a free monthly newsletter.

Search Engine Showdown
http://www.searchengineshowdown.com
Librarian Greg Notess's site. Updated on a regular basis. Greg also manages the Search-L
list.

ResearchBuzz
http://www.researchbuzz.com
Written and compiled by Tara Calishain. Daily updates.

TVC (The Virtual Chase) Alert
http://www.thevirtualchase.com
Written and compiled by Genie Tyburski. Daily updates.

The Virtual Acquisition Shelf and News Desk
http://resourceshelf.blogspot.com
Compiled by Gary Price. Daily updates.

Free Pint
http://www.freepint.com
Fortnightly newsletter edited by Will Hann. Also offers Web discussion boards.

News Breaks from Info Today
http://www.infotoday.com/newsbreaks/
General information industry coverage of breaking news that often features news of the
Web search world.
