4

SEARCH AND RESOURCE DISCOVERY PARADIGMS
The three different paradigms of information search and resource discovery are:
1. Information search and retrieval

2. Electronic directories and catalogs
3. Information filtering
1. Information Search and Retrieval
Information search and retrieval is the process of finding and extracting information
according to the specifications provided by a user.
The main purpose of developing this process is to support naive users in areas like
electronic shopping and home banking. The goals include the following:
1) To satisfy customers to the greatest extent possible.
2) To reduce cost.
3) To execute the requested query quickly. The computer methods used to
execute the query are (a) exact matching based on keywords and
(b) nearest-neighbour matching.
Information search and retrieval is used in areas like libraries, where customers are
engaged in information-seeking behaviour.
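The two query methods named above can be contrasted in a minimal sketch. The catalogue entries and function names are hypothetical, and spelling similarity from Python's standard difflib module is only one possible stand-in for a nearest-neighbour measure.

```python
from difflib import get_close_matches

# Hypothetical catalogue of item names; not taken from the source text.
CATALOGUE = ["home banking", "home shopping", "electronic shopping", "stock broker"]

def exact_match(query, items):
    """Return the items that contain the query keyword as a whole word."""
    keyword = query.lower()
    return [item for item in items if keyword in item.lower().split()]

def nearest_neighbours(query, items, k=2):
    """Return up to k items whose spelling is closest to the query."""
    return get_close_matches(query.lower(), items, n=k, cutoff=0.0)

print(exact_match("shopping", CATALOGUE))
print(nearest_neighbours("home bankng", CATALOGUE, k=1))
```

The exact match finds nothing for a misspelled keyword, while the nearest-neighbour method still returns the closest entry.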
2. Electronic Directories and Catalogs
Directories and catalogs are used for:
1) Information organizing,
2) Information browsing, and
3) Information filtering
1. Information Organizing: Organizing refers to the human-guided process of
deciding how to interrelate information, usually by placing it into some sort of
hierarchy. The main weakness of information organizing is that it is typically done by
"someone else" and is not easy to change. Ironically, this is also its main strength,
because people prefer a fixed system they can get used to, even if it is not the
most efficient.
2. Information Browsing: Browsing refers to the corresponding human-guided

activity of exploring the organization and contents of a resource space. Information
browsing depends heavily on the quality and relevance of the organization.
Browsing can lead to navigation problems and disoriented users.
3. Information Filtering: The objective of information filtering is to provide access
to relevant and valuable information when a user requests it. This information
represents a small portion of the total information base that can be accessed whenever
required. Information filtering is the process of selecting only the information that matches
the user's request. Its purpose is to eliminate unnecessary data present
in the incoming stream. The process does not perform any kind of search;
its only objective is to filter out data that is inconsistent with the request. The incoming
stream consists of data that is transmitted implicitly by remote sources or by alternative
sources such as e-mail.
Software filters provide access control, ensuring that only appropriate information
is passed to the decision maker. The transmitted information helps
decision makers respond more flexibly to changing organizational
surroundings. There are two types of software filters:
i) Local Filters - Local filters process the incoming stream of data.
ii) Remote Filters - Remote filters are software agents that perform tasks
on behalf of users. They help users perform daily tasks, search for and retrieve
information, and support decision-making. They work as a proxy for the user and
move around the databases present on different networks.
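A local filter can be pictured as a function that passes on only the items of an incoming stream that match a user's interests. The sketch below is illustrative only: the messages, the keyword profile, and the name local_filter are assumptions, not part of any particular product.

```python
def local_filter(stream, profile):
    """Yield only the messages that mention at least one profile keyword."""
    wanted = {word.lower() for word in profile}
    for message in stream:
        words = set(message.lower().split())
        if words & wanted:
            yield message

# Hypothetical incoming stream (for example, e-mail subject lines).
incoming = [
    "quarterly sales figures attached",
    "lunch menu for friday",
    "new supplier prices announced",
]
profile = {"sales", "prices"}

kept = list(local_filter(incoming, profile))
print(kept)
```

Nothing is searched for here: the filter only removes items from a stream that arrives on its own.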
Information Search and Retrieval
Searching is the process of finding required information in a massive amount
of stored semi-structured information. This is in contrast to database applications, which
deal with structured data that follows certain standards and syntaxes and makes use
of data types that have specific meanings. Examples of structured and semi-structured
data are a student database and e-mail messages, respectively.
The process of searching for text strings in a large collection of documents can be
divided into two phases:
1) End-user retrieval and

2) Publisher indexing phase
End-user Retrieval Phase
This phase consists of three steps that the user performs during the text search.
1) The user formulates a query, specifying in some way the material for which
the text database is to be searched.
2) The server interprets the user's query, performs the search, and returns to
the user a list of documents meeting the search criteria. Text systems usually
perform the search by comparing search terms with an index file containing
a sorted list of words found in the document database.
3) The user selects documents from the hit list and browses them, reading and
perhaps printing selected portions of the retrieved documents.
Publisher Indexing Phase
This phase consists of
1) entering documents into the system and

2) creating indexes and pointers to facilitate subsequent searches.
This process often takes place during off-hours so that system performance is not
degraded during working hours. Some systems, such as those used by news agencies, add
documents to the database constantly with a live data feed. The process of loading
documents into the system and updating indexes is normally not a concern to the user.
These two phases are highly interdependent. The user interface should provide a
way of entering search queries and browsing matched documents. The index should
be structured to expedite the type of searching permitted by the queries, and the data-
entry procedures must work within the structure of the documents and the search indexes.
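The interplay of the two phases can be sketched as follows. This is a minimal illustration under assumed names (build_index, search) and sample documents: the publisher phase produces a sorted word list with hit lists, and the end-user phase compares a search term against that sorted list.

```python
from bisect import bisect_left

def build_index(documents):
    """Publisher phase: build a sorted word list and, per word, the ids of documents containing it."""
    postings = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            postings.setdefault(word, []).append(doc_id)
    words = sorted(postings)
    return words, [sorted(postings[word]) for word in words]

def search(index, term):
    """End-user phase: binary-search the sorted word list and return the hit list."""
    words, hit_lists = index
    term = term.lower()
    pos = bisect_left(words, term)
    if pos < len(words) and words[pos] == term:
        return hit_lists[pos]
    return []

# Hypothetical document database.
docs = {1: "home banking services", 2: "electronic banking", 3: "home shopping"}
index = build_index(docs)
print(search(index, "banking"))
```

The index layout is chosen to suit the query type: single-word lookups only need a sorted word list, whereas phrase queries would need more information in the index.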
The purpose of a search engine in any indexing system is simple: to find every item that
matches a query, no matter where it is located in the file system. The trick for software
designers is to create a search engine that carries out this job quickly and accurately while
taking up as little disk space as possible. Search engines are now being designed to go
beyond the simple, broad-based searches for which WAIS is so popular.
Topic is a search engine used in Lotus Notes, Adobe Acrobat, and a variety of other
products. It uses both key words and information searching to rank the relevance of each
document. Topic might return a list of a hundred documents that match the user's criteria,
but they would be listed in order of the relevance that Topic assigns.
A different approach is offered by context-based searching. As exemplified by
Architect, these tools let the user enter a query and then come up with the relevant data
based on the context of the documents themselves.
The system tries to figure out the content of the documents based on the context of the
words, not the words per se. The result is that the system might find stories that don't
have any of the words in your search but that do have the same general meaning. The
engine is probably not for everyone, but its approach is certainly promising.
Other approaches to data searching on the web or on other wide-area networks
are available. The most compelling is Oracle's ConText, which can go through a variety of
documents and create its own summary, pulling about three key statements from each
document it selects.
Wide Area Information Service (WAIS) Engine
Wide Area Information Service or WAIS enables users to search the contents of
files for any string of text that they supply. An extremely versatile service, WAIS uses an
English-language query front end to a large assortment of databases that contain text-
based documents.
WAIS lets users search the full text of all the documents on a server. Users on
different platforms can access personal, company, and published information from one
interface - text, pictures, voice or formatted documents. Since the system uses a single
computer-to-computer protocol, information can be stored anywhere on different types of
machines.
WAIS has three elements:
1) a client,
2) a server, and
3) an indexer.
The indexer takes a list of files the publisher wants to index and generates from it
several index files. These indexes include a directory of all words appearing in the
database, a list of documents and files that constitute the database, and the "headline" of
the documents contained in the database.
With the index created, the publisher must tell the rest of the world about it. This
is done automatically by running WAIS with a register option, which places the
index next to the hundreds of WAIS indexes already available on the Internet - items such
as a legal index from West Publishing, indexes of government documents, and countless
academic databases.
WAIS solves a number of problems from the user's perspective:
1) It allows users to identify and select information from large databases.
2) It provides heterogeneous database access, as published databases may be on a
variety of different systems and the user need not know how to use each system.
3) It provides ways to download and organize the retrieved data so that users are
not overwhelmed.
Uses of WAIS:
WAIS is a sophisticated search engine. Some publishers create the WAIS indexes of files
that they serve through the World Wide Web or Gopher. These indexes enable users to
search the contents of those files. Several companies use WAIS to sell information over
the Internet. One can associate a cost directly with WAIS sources, unlike many other
Internet services.
Indexing Methods
The two types of indexing methods used by search engines are :
1) File-level Indexing and

2) Word-level Indexing.
1) File-level Indexing associates each indexed word with a list of all the files in which
that word appears at least once. A file-level index does not carry any additional
information about the location of words within files. Such an index uses disk space
economically, usually a small percentage of the size of the main text that it indexes.
2) Word-level Indexing is more sophisticated and stores the location of every instance
of a word. These indexes enable users to search for complete phrases or words that are
in close proximity. For instance, say you entered a query on electronic commerce into
a file-level index. It would return every file that contains both words, even files in
which they appear far apart in unrelated contexts. A word-level index, on the other hand,
contains the location of each word in your file system, so it avoids such mistakes by
ensuring that electronic and commerce are adjacent.
The disadvantage of word-level indexing schemes is that all the extra information
they contain gobbles up a lot of disk space - anywhere between 35 percent and 100 percent
of the size of the original text. They also can be slower than file-level indexes because
they have more information to search through.
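The difference between the two indexing methods shows up clearly on the electronic commerce query. The following sketch uses two hypothetical files and assumed function names. The file-level index can only say which files contain both words, while the word-level index stores positions and can check adjacency.

```python
from collections import defaultdict

def file_level_index(files):
    """Map each word to the set of files in which it appears at least once."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def word_level_index(files):
    """Map each word to every (file, position) at which it occurs."""
    index = defaultdict(list)
    for name, text in files.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((name, pos))
    return index

def phrase_search(index, first, second):
    """Return the files in which `second` immediately follows `first`."""
    following = set(index.get(second, []))
    return sorted({name for name, pos in index.get(first, [])
                   if (name, pos + 1) in following})

# Hypothetical file contents.
files = {
    "a.txt": "electronic commerce is growing",
    "b.txt": "commerce predates every electronic device",
}
by_file = file_level_index(files)
by_word = word_level_index(files)

both_words = sorted(by_file["electronic"] & by_file["commerce"])
print(both_words)                                          # ['a.txt', 'b.txt']
print(phrase_search(by_word, "electronic", "commerce"))    # ['a.txt']
```

The word-level index holds one entry per word occurrence rather than one per word per file, which is where the extra disk space goes.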
Indexing Packages
A large number of indexing packages have become available for UNIX-based workstations.
These packages fall into three categories:
1. The client-server method is based on the distributed approach in which the
document database and the text search and retrieval software reside on a central
server, while sophisticated data representation and user interface software reside
on the user's workstation. The power of the server is used for the data intensive
job of comparing search terms with text files or indexes, while the workstations are
best suited for graphical interfaces.
In this approach, the index file can be split into pieces corresponding to
work groups and maintained on separate servers. This approach provides fast
response time for documents "owned" locally. Searches of portions of the index
stored in other servers can be performed in the background while the user is
retrieving and studying locally owned documents. One disadvantage of this
approach is that each sub-index has to be updated individually each time the
master file is updated.
2. The mainframe-based approach is generally more expensive and less flexible
than the previous architecture, but it provides for large amounts of storage, fast
response time, and standard data management and configuration control. The
mainframe may also handle query and display formatting, enabling searches to be
conducted from non-intelligent character-based terminals.
3. The parallel-processing approach allows many processing units to conduct
searches simultaneously. Typically, the file to be searched is broken up into many
pieces, and each processor searches its segment of the index file. The processors
may or may not share memory and storage. If the processors and the segments are
balanced, each processor can operate independently of the others, and all
processors complete processing at approximately the same time. The results are
merged before being presented to the user.
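The parallel-processing approach can be sketched as split, search, and merge. In this minimal illustration, threads from Python's standard concurrent.futures module stand in for the processing units, and the document collection and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def search_segment(segment, term):
    """Search one segment of the collection for documents containing the term."""
    return [doc_id for doc_id, text in segment if term in text.lower().split()]

def parallel_search(documents, term, workers=3):
    """Split the collection into balanced segments, search them concurrently, merge the hits."""
    items = sorted(documents.items())
    # Round-robin split keeps the segments within one document of each other in size.
    segments = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_hits = list(pool.map(search_segment, segments, [term] * workers))
    return sorted(hit for hits in partial_hits for hit in hits)

# Hypothetical document collection.
docs = {
    1: "home banking",
    2: "electronic commerce",
    3: "commerce law",
    4: "stock broker",
    5: "mobile commerce",
}
print(parallel_search(docs, "commerce"))   # [2, 3, 5]
```

Because each worker sees only its own segment, no worker depends on another, and the merge step is the only point where results come together.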
WWW Robots, Wanderers, and Spiders
Robots, Wanderers, and Spiders are all programs that traverse the WWW
automatically gathering information. The terms robot and spider are often used in
reference to automated tools for access to publicly accessible databases on the Internet
for the purpose of building indices of documents. These web robots are generally used by
search engines like Google to perform the following tasks:
i) Indexing web content
ii) Scanning for e-mail addresses
iii) Creating a copy of visited pages for later processing
iv) Validating HTML code
v) Providing up-to-date information to users.
As the demands of merchants increase rapidly, agent-based resource
discovery is becoming crucial. The main purpose of such a discovery program is to
help companies find business partners of interest when no centralized directory is
present. It does this by traversing the WWW recursively and recording the presence
or absence of resources on it.
Working of these Programs
A software agent views the World Wide Web as a graph. It starts at a set of nodes (.HTML)
and traverses the hypertext links in these nodes to a certain depth beginning at a URL
passed as an argument. Only URLs having "." suffixes or tagged as "HTTP:" and ending in
a slash are probed. Unsuccessful attempts and document leaves are logged into a separate
table to prevent revisiting. This method results in a limited-depth breadth-first traversal
of only HTML portions of the Web.
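The traversal described above can be sketched as a breadth-first walk with a depth limit. To keep the example self-contained, a small hypothetical link graph stands in for fetched pages. The ".html" suffix test in should_probe is an assumption about which URLs count as HTML, and the seen set plays the role of the table that prevents revisiting.

```python
from collections import deque

# Hypothetical link graph: page URL -> URLs it links to (stands in for fetched HTML).
LINKS = {
    "http://a.example/": ["http://a.example/x.html", "http://b.example/"],
    "http://a.example/x.html": ["http://a.example/y.html", "http://a.example/logo.gif"],
    "http://b.example/": ["http://a.example/"],
    "http://a.example/y.html": ["http://c.example/"],
}

def should_probe(url):
    """Assumed rule: probe URLs with an '.html' suffix or http: URLs ending in a slash."""
    return url.endswith(".html") or (url.startswith("http:") and url.endswith("/"))

def crawl(start, max_depth):
    """Breadth-first traversal from `start`, following links at most `max_depth` hops."""
    visited = [start]
    seen = {start}                 # prevents revisiting a node
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue               # depth limit reached: do not expand this node
        for link in LINKS.get(url, []):
            if link not in seen and should_probe(link):
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited

print(crawl("http://a.example/", 2))
```

With a depth of 2 the image link is skipped and the page three hops away is never reached, which is the sense in which the traversal covers only a limited, HTML-only portion of the graph.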
It is very difficult to perform comprehensive, full-scale searching over the
Internet because of time limits and the heterogeneous nature of the network. Because
of these problems, it becomes essential to handle multiple searches simultaneously.
This is possible when many software agents cooperate with each other in a distributed
environment consisting of different networks and different protocol standards.
Research is being conducted on the following issues:
1) controlling multiple agents,
2) maintaining consistency,
3) negotiation among agents,
4) dealing with payment,
5) security and reliability,
6) efficiently searching for and transmitting information,
7) minimizing redundancy, and
8) ensuring adequate coverage of the information sources.
Models of Information Retrieval
Researchers have long considered ways to retrieve information efficiently from

databases.
There are three models that are used for retrieving information from the database
in an efficient manner.
1) Boolean information retrieval model

2) Vector space information retrieval model
3) Probabilistic retrieval model
1. Boolean Information Retrieval Model
The Boolean model is based on the "exact match" principle and is the standard for most
popular information retrieval systems. The term Boolean is used because the query
specifications are expressed as words or phrases, combined using the standard operators
OR, AND, and NOT.
This model retrieves all text files containing the combination of words or phrases
specified in the query, but it makes no distinction between any of the retrieved documents.
Thus the result of the comparison operation is a partition of the database into a set of
retrieved documents and a set of documents that are not retrieved.
One disadvantage of this model is that it does not allow for any form of ranking of
the retrieved document set. Presenting documents to the user in presumed order of
relevance would result in more effective and usable systems. Similarly, excluding
documents that do not precisely match a query specification results in lower effectiveness.
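Boolean retrieval maps naturally onto set operations. The sketch below uses three hypothetical documents and assumed helper names. AND, OR, and NOT become set intersection, union, and difference, and the result is an unranked partition of the collection.

```python
def build_postings(documents):
    """Map each word to the set of ids of the documents that contain it."""
    postings = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            postings.setdefault(word, set()).add(doc_id)
    return postings

# Hypothetical document collection.
docs = {
    1: "electronic commerce on the internet",
    2: "electronic banking",
    3: "commerce and banking law",
}
postings = build_postings(docs)

def term(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return postings.get(word, set())

and_hits = term("electronic") & term("commerce")      # electronic AND commerce
or_hits = term("electronic") | term("banking")        # electronic OR banking
retrieved = term("commerce") - term("banking")        # commerce AND NOT banking
not_retrieved = set(docs) - retrieved                 # the other half of the partition

print(sorted(and_hits), sorted(or_hits), sorted(retrieved), sorted(not_retrieved))
```

Every retrieved document is treated alike: the sets carry no score, so there is nothing on which to order the results.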
2. Vector Space Information Retrieval Model
Vector space and probabilistic models, based on best-match retrieval models, have
been formulated in response to the problems of Boolean models. The most widely known,
the vector space model, treats texts and queries as vectors in a multidimensional space,
the dimensions of which are the words used to represent the texts. The vector model
processes queries and texts by comparing the vectors, using, for example, a method called
the cosine correlation similarity measure.
The assumption is that the more similar a vector representing a text is to a query
vector, the more likely it is that the text is relevant to the query. An important refinement
is to weight the terms (or dimensions) of a query, or text
representation, to take account of their importance. These weights are computed on the
basis of the statistical distributions of the terms in the database and in the texts.
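The cosine measure can be illustrated with a short sketch. As a simplifying assumption, raw term counts serve as the weights here, where a real system would substitute statistically derived weights. The texts and function names are hypothetical.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent a text as a term -> weight mapping (raw term counts in this sketch)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * v[term] for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, texts):
    """Return (score, text) pairs ordered from most to least similar to the query."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), text) for text in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical texts.
texts = ["electronic commerce", "electronic banking services", "home shopping"]
for score, text in rank("electronic commerce", texts):
    print(f"{score:.2f}  {text}")
```

Unlike the Boolean model, a text that shares only one query term still receives a non-zero score and appears in the ranking, below the texts that match more closely.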
3. Probabilistic Information Retrieval Model
Probabilistic information retrieval models are based on the probability ranking

principle. This states that the function of a retrieval system is to rank the texts in the
database in the order of their probability of relevance to the query. This principle takes
into account that representation of both information need and text is uncertain, and the
relevance relationship between them is also uncertain. The probabilistic retrieval model
suggests that a variety of sources could be used to estimate the probability of relevance
of a text to a query.
Electronic Commerce Catalogs or Directories
Directories perform an essential support function that guides customers in a maze

of options by enabling the organization of the information space. Finding things (users,
resources, data, or applications) in a distributed network is the task of the directory
service. Directories inform a potential customer or software agent about available services,
providers, prices, quality, and other important characteristics necessary for making
purchasing decisions.
Need of Directories
Directories are essential for conducting electronic commerce. Although directory

services are one of the most fundamental components of electronic commerce, technically
they are the least understood and have been an invisible component in network
architectures.
Directories are crucial to the successful implementation of electronic commerce,
for only by providing distributed, replicated information in the form of directory services is it
possible to grant users transparent access to all network resources.
The two types of directories are :
1) The white pages - these are used to locate people or institutions.
2) The yellow pages - these are oriented toward consumers who have decided to buy
a product or service.
Yellow pages thus focus on customers who have already decided to purchase a product,
and they act as a low-profile advertising medium. The difference between print-based
yellow pages and electronic yellow pages is that the latter provide more enhanced
services than the former. In electronic yellow pages, a directory is created that acts as an
interface to various resources. These directories can even be accessed from an
electronic commerce application, which creates a huge demand for them.
An effective directory service must be readily accessible by all network

components, provide quick response times, and accurately reflect changes in network
configurations and resources as they occur.
The challenge is to create a directory representation that can be accessed by
• different types of networks (wired and wireless),
• different types of interfaces (TV plus set-top, mobile units, or PCs),
• different types of access applications (e-mail, procedure or function calls, or dial-up), and
• various applications (home shopping, home banking, electronic stock brokers).
Implementation Problems of Directories
- Directory services must map large numbers of objects (users, organizations,

groups, computers, printers, files, processes, and services) to user-oriented
names. The problem is difficult enough in a homogeneous LAN environment,
given document and equipment moves and changes to names, locations, and so
forth.
- In a heterogeneous global WAN environment, the task becomes considerably
more complex, given the need to synchronize information in different directory
databases.
- As distributed applications appear on the network, the directories have to begin
tracking all those objects and their components as well. Hence directories and
naming tend to go hand in hand. A good name service makes the use of a distributed
computing environment transparent to the user.
A directory or catalog is an information base about a set of real-world objects. Users
often search directories for telephone numbers or addresses, facts, organizations, or
persons. Directories must therefore be organized in a manner that facilitates easy access
to information, and the directory user must be able to locate "entries" in the directory
where the actual information is stored or presented. Directories are also being slowly
integrated with messaging services such as e-mail and EDI applications.
Electronic White Pages
The electronic white pages provide services from a static listing of e-mail addresses
to directory assistance. The Internet directory assistance service can be more extensive
than the one provided by the phone companies, as the technology provides the ability to
publish important information that an individual may make publicly available, such as
photographs, home mailing addresses and fax numbers, office information, and job
descriptions.
The original intention behind organizational directories was to reduce the amount of
duplication as corporations spend money maintaining identical lists in several sites - for
phones, security, payroll, faxes, computers, e-mail, and other reasons.
The functions that a white pages directory performs are:
1) Searching and
2) Retrieving
Searching is the process of finding people according to the information provided, and it
can be done by means of indices. The search returns a list of entries that
match the query. The main purpose of using an index is to make searching quicker
and easier.
Retrieving is the process of obtaining additional information related to a person, such as an
address, telephone number, e-mail mailbox, or security certificate.
The approaches used for creating a white pages directory are interpretability and
conventional forms of communication. These approaches are strong enough to provide all
the functionality required for establishing directory services associated with
different technologies.
White Pages through X.500
One of the first goals of the X.500 project has been to create a directory for keeping track
of individual electronic mail addresses on the Internet.
X.500 offers the following features:
1) Decentralized maintenance: Each site running X.500 is responsible only for its
local part of the directory, so updates and maintenance can be done instantly.
2) Searching Capabilities: X.500 provides powerful searching facilities that allow
users to construct arbitrarily complex queries. For example, in the white pages,
you can search solely for users in one country. From there, you can view a list
of organizations, then departments, then individual names. This represents a
tree structure with successive descent to the terminal nodes or instances.
3) Single global name space: X.500 provides a single name space to users.
4) Structured information framework: X.500 defines the information framework
used in the directory, allowing local extensions.
5) Standards-based directory services: X.500 can be used to build directory
applications that require distributed information (e-mail, automated resource
locators, special-purpose directory tools). These applications can access a
wealth of information in a uniform manner, no matter where they are based
or currently running.
The models required for constructing X.500 directory services are:
1) the directory architecture model,

2) the information architecture model, and
3) the security model.
The X.500 directory is composed of a collection of servers termed directory system
agents (DSAs). A DSA is essentially a server that stores information according to the X.500
standard and can, when necessary, exchange data with other DSAs. The DSAs cooperate
to provide the overall directory services to directory user agents (DUAs).
A directory is a collection of servers (DSAs) cooperating among themselves to hold
information about a variety of objects, thus forming the directory information tree
(DIT). The DIT is a hierarchical data structure consisting of a root, below which countries
are defined. Below the countries, organizations are (usually) defined, and below an
organization, persons, or first additional organizational units, are defined.
Figure below is a simplified illustration showing only three countries and no

organizational units. The DIT is a representation of the global directory.
A user of the directory can be a person or a computer program. The organization
and distribution of information among the DSAs is totally transparent to the users. A user
accesses the directory through a so-called directory user agent. The DUA automatically
contacts a nearby DSA by means of which the user may search or browse through the DIT
and retrieve corresponding information.
A DUA can be implemented in all sorts of user interfaces, so users can access the
directory through dedicated DUA interfaces or e-mail applications. Currently, most DUA
interfaces are dedicated, but it is expected that in the near future a lot of DUA interfaces
will be integrated with other applications.
The information requested by the user agent is located in the local server to which
the user agent is attached. However, it is often the case that the required information is
not contained within the local server (DSA). In this case, various server agents might
become involved and might need to cooperate to provide the information. For this reason,
several methods have been defined for the operation of the directory when information is
not located in the local server agent:
1) Chaining - Chaining involves passing a request to several DSAs before a response
is generated.
2) Referral - A referral identifies a "more suitable DSA" that can satisfy the needs of the
user. A DSA might return a referral to a user or another DSA if the request cannot
be performed.
3) Multicasting - Multicasting involves passing the same request by a DSA to two or
more DSAs.
4) Hybrids - Chaining, referrals, and multicasting can be combined as necessary to
perform the intended request.
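The difference between chaining and referral can be sketched with a toy model. The class below is not an X.500 implementation: the DSA class, its methods, the entry names, and the single neighbour link are all simplifying assumptions, and multicasting is left out. With chaining the DSA forwards the request itself, whereas with a referral it hands back the name of a better DSA and the user agent does the follow-up.

```python
class DSA:
    """A toy directory system agent holding part of the directory tree."""

    def __init__(self, name, entries, neighbour=None):
        self.name = name
        self.entries = entries        # entry name -> attributes (hypothetical data)
        self.neighbour = neighbour    # another DSA that may hold the entry

    def lookup_chained(self, dn):
        """Chaining: pass the request on to the next DSA and relay its answer."""
        if dn in self.entries:
            return self.entries[dn]
        if self.neighbour is not None:
            return self.neighbour.lookup_chained(dn)
        return None

    def lookup_referral(self, dn):
        """Referral: answer locally, or point the caller to a more suitable DSA."""
        if dn in self.entries:
            return ("entry", self.entries[dn])
        if self.neighbour is not None:
            return ("referral", self.neighbour)
        return ("not found", None)

def dua_lookup(dsa, dn):
    """A user agent that follows referrals itself until it gets an entry or gives up."""
    while dsa is not None:
        kind, value = dsa.lookup_referral(dn)
        if kind == "entry":
            return value
        dsa = value if kind == "referral" else None
    return None

# Hypothetical entries held by two DSAs.
BOB = "c=GB/o=Example Ltd/cn=Bob Ray"
ANN = "c=US/o=Example Corp/cn=Ann Lee"
gb_dsa = DSA("gb", {BOB: {"mail": "bob@example.org"}})
us_dsa = DSA("us", {ANN: {"mail": "ann@example.com"}}, neighbour=gb_dsa)

print(us_dsa.lookup_chained(BOB))
print(us_dsa.lookup_referral(BOB)[0])
print(dua_lookup(us_dsa, BOB))
```

Both routes return the same entry. What differs is who does the extra work: the contacted DSA under chaining, or the user agent under referral.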
Problems associated with X.500
1) It must be much easier to be part of the Internet white pages than to bring up an
X.500 DSA or make good use of the already deployed X.500 DSAs. X.500 is too
complicated and simpler white pages services must be defined to promote
widespread implementations.
2) To promote reliable operation and consistency of data, there must be some central
management of the X.500 system.
3) A common naming scheme must be identified and documented.
Electronic Yellow Pages
Users are increasingly turning to directory databases rather than printed Yellow Pages or
collections of printed financial directories. You may get additional information, such as
employee size, sales, and ownership information, that is omitted from Yellow Pages
listings.
Independent publishers from various organizations are developing a number of
directories that are achievable and feasible. Examples include directories of
college information, corporate information, product catalogs, and industrial buying guides.
Third-party directories can be categorized variously:
1) Basic yellow pages: These directories could be organized by human-oriented
product and service listings.
2) Business directories: These directories might take the form of extended
information about companies, financial health, news clippings, and so on.
People are often likely to pay to use this kind of directory rather than pay to be
in the directory. There could be many of these directories, suited to different
research tasks (investment, foreign trade, manufacturing).
3) State business directories: Every business in a state is arranged by type and
by city, and there is a directory for each of the fifty states. The alternative to
using one of these directories is to order the phone books in the state and compile a
specialized list of names. This type of directory is useful in businesses that
operate on a state or geographic basis.
4) Directories by SIC: SIC (standard industrial classification) directories are
compiled by the government. More than two thousand different directories are
available.
5) Manufacturer’s directory: If your goal is to sell your product or service to
manufacturers, then this type of directory would be most useful.
6) Big-business directory: This directory lists companies of 100 or more
employees. If your goal is to reach this group, this is an attractive directory to
have.
7) Metropolitan area business directory: These guides, developed as sales and
marketing tools for specific cities, are designed as comprehensive directories
listing companies and influential contacts at each business, along with phone
number, address, number of employees, and so on.
8) Credit reference directory: This directory provides credit rating codes for
millions of U.S. companies. Credit data are used for a variety of purposes such
as qualifying new customers, suppliers, and so on.
9) World Wide Web Directory: This directory lists the various hyperlinks of the
various servers scattered around the Internet.
Publishers of yellow pages directories can be divided into two categories:
1) Utility-related publishers - These publishers publish yellow pages
directories for the telephone companies.
2) Independent publishers - These publishers publish yellow pages directories for
specific market segments.
Interactive Product Catalog
The goal of interactive catalogs is simple: to enable customers everywhere to
buy goods from anywhere in a virtual mall open twenty-four hours a day, seven days a
week. Customers simply look through the on-line merchandise and interact with the
company using several methods, such as e-mail, form-based secure messaging systems,
and interactive desktop video.
Directories, in contrast to catalogs, are usually compiled by third parties and play
an influential role in guiding customers in the information space to reach catalogs.
Electronic yellow pages are organized by products and services and become
necessary as businesses move toward electronic commerce. The goal of yellow pages is
to organize the vast amount of information so that customers can quickly locate desired
and alternative products and services through eye-catching advertisements. Yellow pages
catalogs are a reactive medium, in that they satisfy a need but do not create a need:
yellow pages work only when someone is looking for some
particular information.
Creating a need is the job of interactive catalogs, which are based on the idea that
effective marketing relies on a two-way information flow between the marketer and the customer.
Interactive catalogs are ideally suited for small businesses, because they enable
them to effectively utilize their limited marketing, support, and sales staffs with potential
customers around the world. Catalogs must support product / service bundling,
coordinated purchasing, and associated financing.
The design of catalogs as we know it is undergoing a major revolution as
transactions (buying and selling), directories, information programming, and advertising
increasingly overlap. Effective catalogs reach the consumer with information that
includes positioning information to establish image and brand, as well as detailed product
specifications.
Information filtering
Information filtering describes a variety of processes involving the delivery of
information to people who need it. This technology is needed as the rapid accumulation of
information in electronic databases makes it imperative that consumers and organizations
rely on computing methods to filter and disseminate information.
The following are the typical features of this process:
1) Filtering systems involve large amounts of data. Typical applications would deal with
gigabytes of text, or much larger amounts of other media.
2) Filtering typically involves streams of incoming data, either being broadcast by remote
sources or sent directly by other sources (e-mail). Filtering is often meant to imply the
removal of data from an incoming stream, rather than finding data in that stream.
3) Filtering has also been used to describe the process of accessing and retrieving
information from remote databases, in which case the incoming data are the result of a
search query. This scenario is also used by the developers of systems that generate
"smart agents" for searching remote, heterogeneous databases.
4) Filtering is based on descriptions of individual or group information preferences, often
called profiles. Such profiles typically represent user interests. The use of user profiles is
common in the library community where the process is known as the selective
dissemination of information (SDI). SDI is defined as the service that attacks the
information overload problem by keeping individuals informed of new documents
published in their area of specialization so that they can keep abreast of new
developments.
5) Filtering systems deal primarily with textual information. The problem is more general
than that and should include other types of data such as images, voice, and video that
are part of multimedia information systems. None of these data types are handled well
by conventional filtering systems, and all have representations and meanings that are
difficult to filter.
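The profile-based filtering described in point 4 can be sketched roughly as follows. This is only an illustrative sketch, not a description of any particular SDI system: the `Profile` class, the `filter_stream` function, and the keyword-overlap matching rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical user profile: a user name plus a set of interest keywords."""
    user: str
    interests: set = field(default_factory=set)


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    return words - {""}


def filter_stream(documents, profiles, min_matches=1):
    """Yield (user, document) pairs for incoming documents that match a profile.

    A document matches when it shares at least `min_matches` keywords with
    the profile (an assumed rule; real systems use richer matching).
    """
    for doc in documents:
        words = tokenize(doc)
        for profile in profiles:
            if len(words & profile.interests) >= min_matches:
                yield profile.user, doc


if __name__ == "__main__":
    profiles = [
        Profile("alice", {"merger", "acquisition"}),
        Profile("bob", {"database"}),
    ]
    incoming = [
        "Bank announces merger with rival.",
        "New database engine released.",
        "Weather is sunny today.",
    ]
    for user, doc in filter_stream(incoming, profiles):
        print(f"{user}: {doc}")
```

Documents that match no profile are simply dropped from the stream, which mirrors the idea in point 2 that filtering removes data from an incoming stream.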
The two types of filters used in information filtering systems are:
1) Intelligent filters: Intelligent filters have the potential to solve the problem of information
search and retrieval in large information spaces.
2) Software filters: Software filters are responsible for processing a document,
understanding the information, and giving users the following
capabilities:
- speed/scan reading (highlights the most important segments of text to allow the reader
to skim it).
- text summarization (paraphrases the document and reduces the content of the original by
one-half to one-quarter).
- generation of abstracts (creates a new document approximately one-tenth as long as the
original and covering all its themes).
- information extraction (allows the creation of information retrieval agents to extract
specific information from textual databases, such as expected trends in the stock market
based on quoted analyst predictions, or information about mergers and acquisitions).
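As a rough illustration of the scan-reading and summarization capabilities listed above, the sketch below selects the highest-scoring sentences of a document by word frequency. Note the assumptions: this is a simple extractive technique chosen for the example (it picks existing sentences and does not paraphrase), and the stopword list, the scoring rule, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "that", "it", "as", "by", "with"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Return the lowercase words of the text, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def score_sentences(text):
    """Score each sentence by the average document frequency of its words."""
    freq = Counter(content_words(text))
    scored = []
    for sentence in split_sentences(text):
        words = content_words(sentence)
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((sentence, score))
    return scored


def summarize(text, ratio=0.5):
    """Keep roughly `ratio` of the sentences, preserving their original order."""
    scored = score_sentences(text)
    keep = max(1, round(len(scored) * ratio))
    top = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)[:keep]
    return [scored[i][0] for i in sorted(top)]


if __name__ == "__main__":
    sample = ("Filtering agents scan mail. Agents rank mail by relevance. "
              "The weather was pleasant. Agents deliver relevant mail to users.")
    for sentence in summarize(sample, ratio=0.5):
        print(sentence)
```

A `ratio` of 0.5 to 0.25 corresponds to the one-half to one-quarter reduction mentioned for text summarization, and a ratio near 0.1 approximates the length of an abstract; the highlighted sentences could equally be used for scan reading.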
Mail-Filtering Agents
Users of mail-filtering agents can instruct them to watch for items of interest in
e-mail in-boxes, on-line news services, electronic discussion forums, and the like. The mail
agent will pull the relevant information and put it in the user's personalized newspaper at
predetermined intervals.
An example is Apple's AppleSearch software, which enables creation of personal
search agents called reporters to search incoming mail messages and documents obtained
from on-line feeds or residing on servers. AppleSearch uses reporters to scan the available
content, employs a relevance-ranking algorithm to select the information of most value to
the user, then allows the user to view the text of the selected documents.
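The general idea of relevance ranking can be sketched as below. The text does not describe the algorithm AppleSearch actually uses, so this example substitutes a simple TF-IDF-style score as an assumption; the `rank` function and its parameters are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents, top_k=3):
    """Return up to `top_k` (document, score) pairs, most relevant first.

    Score = sum over query terms of (term count in document) * (smoothed idf).
    Documents with a score of zero are left out.
    """
    doc_counts = [Counter(tokens(d)) for d in documents]
    n = len(documents)
    scores = []
    for i, counts in enumerate(doc_counts):
        score = 0.0
        for term in set(tokens(query)):
            df = sum(1 for c in doc_counts if term in c)  # document frequency
            if df == 0:
                continue
            idf = math.log((n + 1) / (df + 1)) + 1.0  # rarer terms weigh more
            score += counts[term] * idf
        scores.append((score, i))
    # Sort by descending score; ties keep the original document order.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [(documents[i], score) for score, i in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    docs = [
        "Stock market trends and analyst predictions",
        "Recipe for apple pie",
        "Market analysts discuss merger and stock prices",
    ]
    for doc, score in rank("stock merger", docs):
        print(f"{score:.2f}  {doc}")
```

A reporter-style agent would run a query like this against each batch of incoming mail or feed documents and present only the top-ranked items to the user.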
News-Filtering Agents
These deliver real-time on-line news. Users can indicate topics of interest, and the
agent will alert them to news stories on those topics as they appear on the newswire.
Users can also create personalized news clipping reports by selecting from news services.
Customers can receive their news stories through the delivery channel of their choice:
fax, e-mail, WWW page, or Lotus Notes platform.
For instance, one can create a user agent that, based on the categories selected,
will daily download news clips on the computer, business, financial, or medical industries.
Currently, news filtering services are primarily targeted to executives who need to stay
current in their areas of interest.