Tipsdebuscadores
The Web search world changes on what sometimes seems like an hourly basis. What
follows are a few selected tips and resources for some of the best-known
engines. This is just the tip of the iceberg. Resources like Search Engine Showdown and
Search Engine Watch are essential for learning and keeping up with how these tools work
and change over time.
Ten Things to Know About Google
2. Google utilizes the Open Directory Project database as its Web Directory
[http://directory.google.com].
3. You can search stop words by placing a + in front of the word (ex. "+To +Be +Or Not
+To +Be").
4. At the present time the Google database is refreshed about once every month.
5. You can limit your search to only .pdf files by using the syntax filetype:pdf.
6. Google is the only major search engine to crawl Adobe Acrobat .pdf files.
7. If you are a frequent Google searcher, save time by using the Google Toolbar
[http://toolbar.google.com] and Google Buttons [http://www.google.com/options/buttons.html].
8. A Boolean "OR" is available with Google. For it to function, capitalize the OR.
9. Google crawls and makes searchable only the first 110 KB of a page. Long documents
may have substantial content invisible to Google.
10. Entering a U.S. street address into the query box will return a link to a map of that
address location. Typing in a person or business name, city, and state will also run the
query to the Google phone directory. Several other combinations are available that will
also query the phone directory service, including typing in the area code and number to
run a reverse search [http://www.google.com/help/features.html#wp].
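The query syntax from tips 3, 5, and 8 above can be combined in a single search. Here is a minimal sketch of how such a query string might be assembled. The function name build_google_query and its parameters are my own invention for illustration, not anything Google provides, and the operators are used only as the tips above describe them.

```python
def build_google_query(terms, stop_words=(), any_of=(), filetype=None):
    """Assemble a query string from the operators described in the tips above.

    Hypothetical helper: it only builds text to paste into the query box.
    """
    parts = list(terms)
    # A leading + asks the engine to search a stop word (tip 3).
    parts.extend("+" + word for word in stop_words)
    # OR must be capitalized to work as a Boolean operator (tip 8).
    if any_of:
        parts.append(" OR ".join(any_of))
    # filetype:pdf limits the search to .pdf files (tip 5).
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)


print(build_google_query(["hamlet"], stop_words=["to", "be"],
                         any_of=["quarto", "folio"], filetype="pdf"))
# prints: hamlet +to +be quarto OR folio filetype:pdf
```

Keeping the pieces separate makes it easy to drop one operator and rerun the search when an engine changes how its syntax works.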
Ten Things to Know About AllTheWeb
1. AllTheWeb licenses its database to Lycos. The identical database is searched and
makes up some of the content on a Lycos results page.
2. Unlike Google and AltaVista, this search engine does not have a limit on the amount of
content crawled on a Web page.
3. AllTheWeb indexes every word. Words traditionally considered as "stop words" are
searchable.
5. If plus and/or minus signs are not used, AllTheWeb implies a plus sign in front of each
term or phrase. This results in an implied "anding" of terms.
6. AllTheWeb is now promising a complete refresh of its database every 9-12 days.
7. AllTheWeb permits syntax to be used direct from the "basic" search page to limit a
query. See http://www.alltheweb.com/help/basic.html#special.
8. A query to the AllTheWeb text database simultaneously runs the search in the
AllTheWeb Image, Video, MP3, and FTP databases. If it finds anything, these results are
linked on the right side of the results page.
10. Fast Search and Transfer (FAST), the company behind AllTheWeb, has deployed its
software to power the Scirus science search engine from Elsevier.
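The implied plus sign described in tip 5 is easy to picture as a rewrite of the query. The sketch below is a toy illustration only, not AllTheWeb's actual code. The name imply_plus and the tokenizing rule (quoted phrases stay together, everything else splits on spaces) are my assumptions.

```python
import re

# A token is either an optionally signed quoted phrase or a run of non-space text.
TOKEN = re.compile(r'[+-]?"[^"]*"|\S+')


def imply_plus(query):
    """Add a + to every term or phrase that has no explicit + or - sign."""
    tokens = TOKEN.findall(query)
    return " ".join(t if t[0] in "+-" else "+" + t for t in tokens)


print(imply_plus('scirus -lycos "stop words" +fast'))
# prints: +scirus -lycos +"stop words" +fast
```

Because every unsigned term gets a plus, the rewritten query requires all of them, which is the implied "anding" of terms.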
Ten Things to Know About AltaVista
1. AltaVista is the only major search engine that allows a searcher to use the proximity
operator, NEAR (in simple search) or near (in advanced search). Using this operator finds
terms within 10 words of each other in either direction.
3. An asterisk (*) can be used in a phrase to represent an entire word. (Ex. "One small
step for man, one giant * for mankind")
5. The use of the "sort by" box on the AltaVista Advanced interface allows you to give
certain words or phrases a higher relevancy weighting.
6. Caveat: If you use Advanced Search, make sure to place some term or terms in the
Sort-By box; otherwise, results return in completely random order.
8. AltaVista's advanced search does not allow for the use of + and - signs.
9. If you search AltaVista in the "simple" mode entering multiple terms without syntax, it
will result in an "implied" OR. In the advanced mode, multiple terms are considered a
phrase.
10. AltaVista software powers the Health Resources and Services (U.S. government)
search engine. This means that all AltaVista syntax can be utilized there. This site also
illustrates AltaVista's capability of indexing full-text .pdf documents on the site-specific and
intranet level [http://search.hrsa.gov].
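To make the NEAR operator from tip 1 concrete, here is a toy proximity check. It is a sketch under my own assumptions, not AltaVista's implementation: the function name near is hypothetical, words are split on non-word characters, and "within 10 words" is taken to mean that the two word positions differ by at most 10.

```python
import re


def near(text, first, second, window=10):
    """Return True if the two terms occur within `window` words of each other."""
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    # abs() makes the test symmetric, so either term may come first.
    return any(abs(i - j) <= window for i in first_pos for j in second_pos)


close = "Search engines change often"
far = "alpha " + "filler " * 12 + "beta"

print(near(close, "often", "search"))  # prints: True
print(near(far, "alpha", "beta"))      # prints: False
```

The second call fails because twelve filler words put the two terms 13 positions apart, outside the 10-word window.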
Ten Things to Know About MSN Search
3. The Advanced Search interface permits limiting to pages at a certain depth in the site.
For example, limiting to Depth 3 restricts the search to pages no more than
three directories deep within a site [e.g., http://www.testsearch.com/Directory1/Directory2/Directory3/].
5. According to the most current Search Engine Showdown rankings, MSN Search has the
largest database of any Inktomi partner.
7. On the Advanced Search interface, checking the "Acrobat" box will retrieve pages with
links to pages that contain .pdf files. It does not search content "inside" these files.
8. Greg Notess points out that the same syntax available to limit Hotbot will also work
with MSN Search [http://hotbot.lycos.com/help/tips/search_features.asp].
9. Danny Sullivan notes that MSN also employs human editors to "hand-pick" key sites in
the Web Directory and Featured Link sections of the site. Although most of the time the
"Featured Links" represent major MSN advertisers, editors can add other content.
10. Selecting and searching under the MSN "News Search" tab returns results predominantly
from MSNBC.
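The depth limit in tip 3 comes down to counting directories in a URL. Here is a rough sketch of that idea. The function names are hypothetical, and I am assuming that a final path segment without a trailing slash is a file rather than a directory. How the engine itself counts depth is not documented here.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count how many directories deep a URL sits below the site root."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Assumption: with no trailing slash, the last segment is a file name.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def within_depth(url, max_depth):
    """Return True if the page is no more than max_depth directories deep."""
    return directory_depth(url) <= max_depth


example = "http://www.testsearch.com/Directory1/Directory2/Directory3/"
print(directory_depth(example))  # prints: 3
print(within_depth("http://www.testsearch.com/a/b/c/d/page.html", 3))  # prints: False
```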
Ten Things to Know About Northern Light
1. Make sure to study the Northern Light "Power" search page. It provides many limiting
options without requiring knowledge of any syntax [http://nlresearch.northernlight.com/power_research.html].
4. Northern Light's Special Editions are subject-specific portals that combine material
from the "open Web" and NL's proprietary databases. Topics of Special Editions include
XML, managed care, and electronic commerce.
5. The Northern Light Special Collection currently contains content (fee-based, pay-per-document)
from over 7,100 sources. A catalog of these publications is available at
http://nlresearch.northernlight.com/docs/specoll_help_catlook.html.
6. Northern Light allows the use of Boolean operators and + and - signs.
7. Multiple truncation symbols can be used in a query. Northern Light has two truncation
symbols: the asterisk (*) for multiple letters and the percent symbol (%) for a single or
absent letter, e.g., medieval/mediaeval.
8. In addition to the limiting capabilities of the "Power" search page, NL has several
terms available for field searching. These include text: and pub:. (This last prefix
allows searching in a specific Special Collection publication title.) You can find a complete
list at http://nlresearch.northernlight.com/docs/search_help_quickref.html.
9. Northern Light's free "Alerts" feature is one resource you must know about. This
feature allows you to set up search strategies in ANY/ALL of the NL databases and have
those strategies searched up to three times daily. If any new material hits on the
strategy, results will be delivered to you via e-mail. I use this tool to bring me a
customized feed of news via the NL News Search database. Remember, the full-text
content is free to access for 2 weeks.
10. Northern Light's "Geo Search" provides an opportunity to search the Web with
keywords and U.S. and Canadian address information. Results also get the benefit of NL's
organization with its "custom folders."
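The two truncation symbols in tip 7 map neatly onto regular expressions, which makes their behavior easy to test. This is a sketch of the idea only, not Northern Light's matching code. The function names are hypothetical, the query medi%eval is my own guess at how the medieval/mediaeval example would be typed, and I assume the asterisk may also match zero letters.

```python
import re


def truncation_to_regex(term):
    """Translate * (multiple letters) and % (single or absent letter) to a regex."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")   # any run of letters (assumed: possibly none)
        elif ch == "%":
            parts.append(r"\w?")   # exactly one letter, or no letter at all
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term, word):
    """Return True if the whole word fits the truncated term."""
    return truncation_to_regex(term).fullmatch(word) is not None


print(matches("medi%eval", "medieval"))   # prints: True
print(matches("medi%eval", "mediaeval"))  # prints: True
print(matches("librar*", "libraries"))    # prints: True
```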
I am very excited to see that controlled vocabularies and the building of ontologies have
come into vogue.
Some of this "hipness" has been caused by the promise and excitement surrounding XML
(eXtensible Markup Language). However, I am not sure if the coming of XML will help the
general-purpose search engine, though it should clearly help specialized, focused, and
Invisible Web engines become much more useful resources.
The general-purpose engines, as we know and love them today, hypothetically index every
page: massive amounts of data coming from just about anyone who wants to produce
Web content and put it on a publicly accessible server.
The problem for implementation of a controlled vocabulary with this material is really one
of creation. Who would create it? Who would maintain it? Who would do the cataloging?
Would entire sites be cataloged at the page level or only a specific page (the top page)?
Who would manage such a project? Where would the money come from?
Controlled vocabularies and XML show a great deal of promise for certain types of search
engines because these types of engines can much more easily create and enforce a set of
agreed-upon standards. Many issues would need resolution before we could apply
controlled vocabularies to make searching the massive amount of material on the open
Web more effective.
Here are some new search products that show a lot of promise, a few more potential
"quick hits." With the vulnerability of the Internet industry of late, let's hope these
products survive. Even if the actual companies do not survive, the technology is still
worth knowing about. Have fun!!!
• WISEnut
[http://www.wisenut.com]
• Teoma
[http://www.teoma.com]
• GuideBeam
[http://www.guidebeam.com]
• picsearch
[http://www.picsearch.com]
Real-Time Search
Patented technology to search resources updated in real-time.
• http://www.netcurrents.com
• http://www.iphrase.com
Now let's see if you've learned your lessons. How long will it take before you've tried all
these new promising sites out? The test clock starts...now!
Those of you who need to keep current on the Web search world should monitor the
following sites as often as possible. All these sites are free and most offer free e-mail
newsletters and updates.
SearchDay
http://www.searchenginewatch.com/searchday/
Written by Chris Sherman. Daily updates.
ResearchBuzz
http://www.researchbuzz.com
Written and compiled by Tara Calishain. Daily updates.
Free Pint
http://www.freepint.com
Fortnightly newsletter edited by Will Hann. Also offers Web discussion boards.