1. INTRODUCTION
A search engine can be defined as: ‘a program designed to help find information stored on a computer system
such as the World Wide Web, or a personal computer. The search engine allows one to ask for content
meeting specific criteria (typically content containing a given word or phrase) and retrieves a list of
references that match those criteria. Search engines use regularly updated indexes to operate quickly
and efficiently.’
In other words, a search engine is a sophisticated piece of software, accessed through a page on a
website, that allows you to search the web by entering search queries into a search box. The search
engine then attempts to match your search query with the content of web pages that it has stored, or
cached, and indexed on its powerful servers in advance of your search.
Many search engines allow you to search for things other than text: for example, images.
SEO methods are largely (but not exclusively) centred upon text as they involve matching key parts of
the text in your web pages with the keywords or keyphrases that people actually type into search
engines when looking for something on the internet.
There are two main types of search indexes we access when searching the web:
• directories
• crawler-based search engines
Directories
Unlike search engines, which use special software to locate and index sites, directories are compiled
and maintained by humans. Directories often consist of a categorised list of links to other sites to
which you can add your own site. Editors sometimes review your site to see if it is fit for inclusion in
the directory.
Crawler-based search engines
Crawler-based search engines differ from directories in that they are not compiled and maintained by
humans. Instead, crawler-based search engines use sophisticated pieces of software called spiders or
robots to search and index web pages.
These spiders are constantly at work, crawling around the web, locating pages, and taking snapshots of
those pages to be cached or stored on the search engine’s servers. They are so sophisticated that they can
follow links from one page to another and from one site to another. Google is a prominent example of a
crawler-based search engine.
SEO, short for Search Engine Optimization, is the art, craft, and science of driving web traffic to web sites.
Learning how to construct web sites and pages to improve and not harm the search engine placement of
those web sites and web pages has become a key component in the evolution of SEO.
In this article we will look at one technique that is commonly used to improve SEO: creating friendly
URLs.
Today, most websites are database-driven, or dynamic, sites, and most of them pass data between pages
using query strings. Search engine crawlers often do not fully index a page whose URL contains a question
mark or other special characters. If search engines do not identify a page or its content in a website, it
essentially means missing web presence for that page. How can this be handled? This write-up discusses
the topic with a sample web site project as an implementation reference.
Friendly URLs
Friendly URLs pass information to pages without using a question mark or other special characters, so
these pages will be indexed by search engines, which helps maximise search engine rankings for your
website. Search engines prefer static URLs to dynamic URLs.
A dynamic URL is a page address that is created from the search of a database-driven web site or the URL
of a web site that runs a script. In contrast to static URLs, in which the contents of the web page stay the
same unless the changes are hard-coded into the HTML, dynamic URLs are generated from specific
queries to a site's database. The dynamic page is only a template in which to display the results of the
database query.
Search engines may not index dynamic URLs because they contain non-standard
characters such as ?, &, %, and =. Many times, anything after the first non-standard character is omitted. For
example, consider a URL like:
http://www.myweb.com/default.aspx?id=120
In this case, if the URL after the first non-standard character is omitted, the URL will look like:
http://www.myweb.com/default.aspx
To a search engine, URLs of this type all collapse into a group of duplicate URLs; search engines omit
duplicate URLs, so not all of your dynamic pages get indexed. A search engine will, however, index a
URL like:
http://www.myweb.com/page/120.aspx
Even though search engines nowadays are better at indexing dynamic URLs, they still prefer static
URLs.
Creating SEO Friendly URLs
What if we were to implement this in our projects? Here's a method that could be used to create friendly
URLs and boost your PageRank. Consider these two dynamic URLs:
http://www.myweb.com/Order.aspx?Itemid=10&Item=Apple
http://www.myweb.com/Order.aspx?Itemid=11&Item=Orange
With friendly URLs, they can be rewritten as:
http://www.myweb.com/shop/10/Apple.aspx
http://www.myweb.com/shop/11/Orange.aspx
What we are going to do is first convert the actual URL with its query string into a contextual URL: we
get the mapped URL name for the page from the web.config file, append the query-string values separated
by /, and finish with the item name as the page name. When this contextual URL is clicked and the page is
navigated to, the Application_BeginRequest event in the Global.asax file rewrites the contextual URL
back into the actual URL with the query strings.
Let's look in detail at how to build an application that uses friendly URLs. This application simply
demonstrates how to create SEO-friendly URLs and is just one idea of how URLs can be rewritten; feel
free to take the idea and rewrite it in your own way.
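The two-way mapping described above can be sketched in outline. The article implements it in C# (inside Global.asax's Application_BeginRequest); the following is a hypothetical Python sketch of the same idea, with the `URL_MAP` dictionary standing in for the mapping that the article stores in web.config:

```python
import re

# Hypothetical mapping, mirroring the web.config idea: page name -> friendly prefix.
URL_MAP = {"Order.aspx": "shop"}

def to_friendly(url):
    """Convert e.g. Order.aspx?Itemid=10&Item=Apple -> shop/10/Apple.aspx."""
    m = re.match(r"(\w+\.aspx)\?Itemid=(\d+)&Item=(\w+)", url)
    if not m:
        return url
    page, item_id, item = m.groups()
    return f"{URL_MAP[page]}/{item_id}/{item}.aspx"

def to_actual(url):
    """Rewrite e.g. shop/10/Apple.aspx back to Order.aspx?Itemid=10&Item=Apple.

    This is the step Application_BeginRequest performs in the article.
    """
    m = re.match(r"(\w+)/(\d+)/(\w+)\.aspx", url)
    if not m:
        return url
    prefix, item_id, item = m.groups()
    page = next(p for p, pre in URL_MAP.items() if pre == prefix)
    return f"{page}?Itemid={item_id}&Item={item}"
```

The key property is that the two functions are inverses: search engines see only the friendly form, while the server internally recovers the query string.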
1.1 OBJECTIVE
Background
Searcharoo Version 1 described building a simple search engine that crawls the file system from a specified
folder and indexes all HTML (or other known types of) documents. A basic design and object model was
developed to support simple, single-word searches, whose results were displayed in a rudimentary
query/results page.
Searcharoo Version 2 focused on adding a 'spider' to find data to index by following web links (rather than
just looking at directory listings in the file system). This meant downloading files via HTTP, parsing the
HTML to find more links, and ensuring we don't get into a recursive loop, because many web pages refer to
each other. That article also discussed how results for multiple search words are combined into a single set
of 'matches'.
Searcharoo Version 3 implemented a 'save to disk' function for the catalog, so it could be reloaded across
IIS application restarts without having to be generated each time. It also spidered FRAMESETs and added
Stop words, Go words and Stemming to the indexer. A number of bugs reported via CodeProject were also
fixed.
Introduction to version 4
Version 4 of Searcharoo has changed in the following ways (often prompted by CodeProject members):
1. It can now index/search Word, Powerpoint, PDF and many other file types, thanks to the
excellent Using IFilter in C# article by Eyal Post. This is probably the coolest bit of the whole
project - but all credit goes to Eyal for his excellent article.
2. It parses and obeys your robots.txt file, in addition to the robots META tag, which it already
understood (cool263).
3. You can 'mark' regions of your html to be ignored during indexing (xbit45).
4. There is a rudimentary effort to follow links hiding in javascript (ckohler).
5. You can run the Spider locally via a CommandLine application then upload the Catalog file to
your server (useful if your server doesn't have all the IFilter's installed to parse the documents you
want indexed).
6. The code has been significantly refactored (thanks to encouragement from mrhassell and j105
Rob). I hope this makes it easier for people to read/understand and edit to add the stuff they need.
• You need Visual Studio 2005 to work with this code. In previous versions I tried to keep the code
in a small number of files, and structure it so it'd be easy to open/run in Visual WebDev Express
(heck, the first version was written in WebMatrix), but it's just getting too big. As far as I know,
it's still possible to shoehorn the code into VWD (with App_Code directory and assemblies from
the ZIP file) if you want to give it a try...
• I've included two projects from other authors: Eyal's IFilter code (from CodeProject and his blog
on bypassing COM) and the Mono.GetOptions code (nice way to handle Command Line
arguments). I do NOT take credit for these projects - but thank the authors for the hard work that
went into them, and for making the source available.
• The UI (Search.aspx) hasn't really changed at all (except for class name changes as a result of
refactoring) - I have a whole list of ideas & suggestions to improve it, but they will have to wait
for another day.
Previously, the Html-specific code lived in the Spider class itself. This made it difficult to add the new
functionality required to support IFilter (or any other document types we might like to add) for documents
that don't have the same attributes as an Html page.
To 'fix' this design flaw, I pulled out all the Html-specific code from Spider and put it into
HtmlDocument. Then I took all the 'generic' document attributes (Title, Length, Uri, ...) and pushed them
into a superclass Document, from which HtmlDocument inherits. To allow Spider to deal
(polymorphically) with any type of Document, I moved the object creation code into the static
DocumentFactory so there is a single place where Document subclasses get created (so it's easy to
extend later). DocumentFactory uses the MimeType from the HttpResponse header to decide which class
to instantiate.
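The factory idea can be illustrated with a short sketch. The real DocumentFactory is C#; this is a hypothetical Python analogue in which the class names follow the article but the dispatch logic is simplified to string checks on the MIME type:

```python
# Minimal sketch of the factory: choose a Document subclass from the
# response's MIME type, so subclass creation happens in exactly one place.

class Document:
    pass

class HtmlDocument(Document):
    pass

class TextDocument(Document):
    pass

class FilterDocument(Document):
    # In the real project this handles Word/PDF/etc. via IFilter.
    pass

def document_factory(mime_type):
    """Single place where Document subclasses get created."""
    if mime_type.startswith("text/html"):
        return HtmlDocument()
    if mime_type.startswith("text/"):
        return TextDocument()
    return FilterDocument()   # fall back to the IFilter-based handler
```

Because all creation flows through one function, supporting a new document type later only means adding a branch here and writing the subclass.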
You can see how much neater the Spider and HtmlDocument classes are (well OK, that's because I hid
the Fields compartment). To give you an idea of how the code 'moved around': Spider went from 680 lines
to 420, HtmlDocument from 165 to 450, and the Document base became 135 lines - the total line count
has increased (as has the functionality) but what's important is the way relevant functions are encapsulated
inside each class.
The new Document class can then form the basis of any downloadable file type: it is an abstract class so
any subclass must at least implement the GetResponse() and Parse() methods:
• GetResponse() controls how the class gets the data out of the stream from the remote server (eg.
Text and Html is read into memory, Word/PDF/etc are written to a temporary disk location) and
text is extracted.
• Parse() performs any additional work required on the files contents (eg. remove Html tags, parse
links, etc).
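The shape of that abstract base class can be sketched as follows (a hypothetical Python rendering of the C# design; attribute and method names mirror the article's Title/Length/Uri, GetResponse() and Parse(), and the tiny TextDocument subclass shows the minimum a new type must implement):

```python
from abc import ABC, abstractmethod

class Document(ABC):
    """Sketch of the abstract base: shared attributes plus two abstract methods."""

    def __init__(self, uri):
        self.uri = uri
        self.title = ""
        self.length = 0
        self.all_text = ""

    @abstractmethod
    def get_response(self, stream):
        """Read the raw data: into memory for text, or to a temp file for binaries."""

    @abstractmethod
    def parse(self):
        """Do any extra work on the contents (strip tags, extract links, ...)."""

class TextDocument(Document):
    """Simplest subclass: plain text needs no tag stripping or link parsing."""

    def get_response(self, stream):
        self.all_text = stream
        self.length = len(stream)

    def parse(self):
        return self.all_text
```

The Spider then only ever deals with the Document interface, which is what lets it index HTML, plain text, and IFilter-backed formats polymorphically.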
The first 'new' class is TextDocument, which is a much simpler version of HtmlDocument: it doesn't
handle any encodings (assumes ASCII) and doesn't parse out links or Html, so the two abstract methods are
very simple! From there it was relatively easy to build the FilterDocument class to wrap the IFilter calls
that allow many different file types to be read.
To demonstrate just how easy it was to extend this design to support IFilter, the FilterDocument class
inherits pretty much everything from Document and only needs to add a touch of code (below; most of
which is to download binary data, plus three lines courtesy of Eyal's IFilter sample). Points to note:
• BinaryReader is used to read the webresponse for these files (in HtmlDocument we use
StreamReader, which is intended for use with Text/Encodings)
• The stream is actually saved to disk (NOTE: you need to specify the temp folder in *.config, and
ensure your process has write permission there).
• The saved file location is what's passed to IFilter
• The saved file is deleted at the end of the method
And there you have it - indexing and searching of Word, Excel, Powerpoint, PDF and more in one easy
class... all the indexing and search results display work as before, unmodified!
The refactoring extended way beyond the HtmlDocument class: the 31 or so files are now organised into
five (5!) projects in the solution.
I, robots.txt
Previous versions of Searcharoo only looked in Html Meta tags for robot directives - the robots.txt file was
ignored. Now that we can index non-Html files, however, we need the added flexibility of disallowing
search in certain places. robotstxt.org has further reading on how the scheme works.
1. Check for, and if present, download and parse the robots.txt file on the site
2. Provide an interface for the Spider to check each Url against the robots.txt rules
Function 1 is accomplished in the RobotsTxt class constructor: it reads through every line in the file (if
found), discards comments (indicated by a hash '#'), and builds an array of 'url fragments' that are to be
disallowed.
There is no explicit parsing of Allowed: directives in the robots.txt file - so there's a little more work to do.
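The constructor's behaviour can be sketched like this (a simplified Python sketch, not Searcharoo's actual C#; it also ignores User-agent sections and, like the original, does not handle Allow directives):

```python
# Sketch of the RobotsTxt idea: collect Disallow fragments, ignoring
# comments, then test each URL path against them.

def parse_robots(text):
    disallowed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()        # discard comments
        if line.lower().startswith("disallow:"):
            fragment = line.split(":", 1)[1].strip()
            if fragment:                             # empty Disallow: means "allow all"
                disallowed.append(fragment.lower())
    return disallowed

def is_allowed(path, disallowed):
    """A path is blocked if it starts with any disallowed fragment."""
    return not any(path.lower().startswith(frag) for frag in disallowed)
```

The Spider would call something like `is_allowed()` for every candidate URL before downloading it.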
Ignoring a NOSEARCHREGION
In HtmlDocument.StripHtml(), this new clause (along with the relevant settings in .config) will cause the
indexer to skip over parts of an Html file surrounded by Html comments of the (default) form <!--
SEARCHAROONOINDEX-->text not indexed<!--/SEARCHAROONOINDEX-->
Links inside the region are still followed - to stop the Spider searching specific links, use robots.txt.
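The stripping step amounts to a single regular-expression replace. This is an illustrative Python sketch (the real code lives in HtmlDocument.StripHtml() in C#, and the marker names are configurable via .config):

```python
import re

# Remove regions wrapped in the (default) Searcharoo comment markers
# before indexing; DOTALL lets the region span multiple lines.
NOINDEX = re.compile(
    r"<!--\s*SEARCHAROONOINDEX\s*-->.*?<!--\s*/SEARCHAROONOINDEX\s*-->",
    re.DOTALL | re.IGNORECASE,
)

def strip_nosearch_regions(html):
    # Replace with a space so adjacent words don't run together.
    return NOINDEX.sub(" ", html)
```

Note the lazy `.*?`: a greedy match would swallow everything between the first opening marker and the last closing marker when a page contains several regions.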
In HtmlDocument.Parse(), the following code has been added inside the loop that matches anchor tags.
It's a very rough piece of code which looks for the first apostrophe-quoted string inside an onclick=""
attribute (eg. onclick="window.location='top.htm'") and treats it as a link.
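In outline, that rough heuristic looks like this (a hypothetical Python sketch of the idea, not the article's C#; as noted, it only catches the simple `window.location='...'` pattern and will miss or mis-read anything fancier):

```python
import re

# Grab the first apostrophe-quoted string inside an onclick="..." attribute
# and treat it as a link target.
ONCLICK_LINK = re.compile(r"""onclick\s*=\s*"[^"]*?'([^']+)'""")

def links_from_onclick(html):
    return ONCLICK_LINK.findall(html)
```

Anything this misses simply isn't followed by the Spider, which is acceptable for a best-effort crawler.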
Multilingual 'option'
Culture note: in the last version I was really focused on reducing the index size (and therefore the size of
the Catalog on disk and in memory). To that end, I hardcoded the following Regex.Replace(word,
@"[^a-z0-9,.]", "") statement, which aggressively removes 'unindexable' characters from words.
Unfortunately, if you are using Searcharoo in any language other than English, this Regex is so aggressive
that it will delete a lot (if not ALL) of your content, leaving only numbers and spaces!
I've tried to improve the usability of that a bit by making it an option in the .config.
In future I'd like to make Searcharoo more language aware, but for now hopefully this will at least make it
possible to use the code in a non-English-language environment.
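The difference between the two cleaning patterns is easy to demonstrate. This Python sketch mirrors the C# Regex.Replace above; the Unicode-aware alternative pattern is an illustrative assumption about what the .config option might select, not the project's exact pattern:

```python
import re

# The original hardcoded pattern: strips everything outside a-z, 0-9, comma, dot.
ASCII_ONLY = re.compile(r"[^a-z0-9,.]")

# A Unicode-aware alternative: \w keeps letters and digits from any language.
UNICODE_SAFE = re.compile(r"[^\w,.]", re.UNICODE)

def clean_word(word, pattern=ASCII_ONLY):
    return pattern.sub("", word.lower())
```

With the aggressive default, accented and non-Latin characters vanish from the index; with the Unicode-aware pattern they survive, at the cost of a somewhat larger catalog.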
Searcharoo.Indexer.EXE
The console application is a wrapper that performs exactly the same function as SearchSpider.aspx (now
that all the code has been refactored out of the ASPX and into the Searcharoo 'common' project). The actual
console program code is extremely simple.
The other code you'll find in the Searcharoo.Indexer project relates to parsing the command-line arguments
using Mono.GetOptions, which turns an attribute-adorned class into a well-behaved
console application with hardly an additional line of code.
NOTE: the exe has its own Searcharoo.Indexer.exe.config file, which would normally contain exactly
the same settings as your web.config. You may want to consider using the Indexer if your website
contains lots of IFilter documents (Word, Powerpoint, PDF) and you get errors when running
SearchSpider.aspx on the server because it does not have the IFilters available. The catalog output file
(searcharoo.dat, or whatever your .config says) can be FTPed to your server, where it will be loaded and
searched!
References
There's a lot to read about IFilter and how it works (or doesn't work, as the case may be). Start with Using
IFilter in C# and its references: Using IFilter in C# by bypassing COM for references to LoadIFilter,
IFilter.org and IFilter Explorer. dotlucene also has file-parsing references.
Searcharoo now has its own site - searcharoo.net - where you can actually try a working demo, and
possibly find small fixes and enhancements that aren't groundbreaking enough to justify a new CodeProject
article...
Wrap-up
Hopefully you find the new features useful and the article relevant. Thanks again to the authors of the other
open-source projects used in Searcharoo.
License
This article, along with any associated source code and files, is licensed under The Code Project Open
License (CPOL)
1.2 SCOPE
There are some important things to note about these search engines.
Earlier we began to show how search engines work. For the sake of simplicity, the search process can be
summarised as follows: you enter a search term, the engine looks it up in its index, and it returns a ranked
list of matching pages. Although the process is actually more complex than this, the simplified view is
useful in helping us visualise how searches work, and more so in reminding us that when we enter a search
term, the search engine does not actually rush off and check every page on the web. That would take far
too long. Instead it checks your search term against an index that is stored on its servers. Spiders working
their way around the web constantly update this index.
If I carry out a search for cheap web-hosting, the search engine checks its index to see which pages
carry the terms ‘cheap’, ‘web’ and ‘hosting’. It then returns a results page containing what it believes
are the most relevant pages for these particular keywords.
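The lookup step can be pictured as an inverted index: a map from each term to the set of pages containing it. This toy Python sketch (the page names and postings are invented for illustration) shows how the 'cheap web-hosting' example resolves without touching the web at all:

```python
# Toy inverted index: term -> set of pages containing that term.
index = {
    "cheap":   {"pageA", "pageC"},
    "web":     {"pageA", "pageB", "pageC"},
    "hosting": {"pageA", "pageB"},
}

def search(query):
    """Return pages containing ALL query terms (intersection of postings)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

A real engine then ranks the resulting candidate set by relevancy and importance, as described below, rather than returning it unordered.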
Although the majority of Internet users rely on search engines to find what they are looking for, they
do not all use the same search engines. There are, in fact, numerous search engines out there, all vying
for a share in the lucrative search engine market; among those commonly used when looking for
something on the Internet are Google, Yahoo, and MSN.
There are two main factors that search engines use to determine the position that pages will gain in
search results:
• Keyword relevancy
• Page importance or link popularity
As we noted above, when you carry out a search query, the search engine tries to return relevant pages
for that query by returning pages that contain the keywords in your search query.
However, search engines also take the importance of the page into account when ranking pages. This
importance is based on the number of external links pointing to a page. The more links pointing to
your pages, the more important they are deemed to be by the search engine.
• Search engines allow us to search the web by entering search queries that the search engine
compares against its index of web pages.
• The leading search engines are currently Google, Yahoo, and MSN.
• Crawler-based search engines use software called spiders to crawl the web and index web
pages.
• Search engines use complex mathematical algorithms to rank web pages.
• Search engine ranking is based on a combination of page relevance and page importance.
• Page importance (or PageRank) is based on the link popularity of a web page and the quantity
and quality of external links pointing to that page.
• PageRank is calculated on a per-page basis and does not apply to websites as a whole.
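The link-popularity idea behind PageRank can be shown with a toy power-iteration sketch. This is a generic, simplified version of the published PageRank recurrence, not any engine's production code; the damping factor 0.85 is the conventional textbook value:

```python
# Toy PageRank: rank flows along links, with a damping factor so that
# some rank is always distributed uniformly.

def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}; returns a rank per page."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = ranks[p] / len(outs)   # rank splits evenly over out-links
                for q in outs:
                    new[q] += damping * share
            else:                              # dangling page: spread rank evenly
                for q in pages:
                    new[q] += damping * ranks[p] / n
        ranks = new
    return ranks
```

Running it on a tiny graph where pages a and b both link to c shows the point made above: the page with more incoming links ends up with the highest rank, and ranks are computed per page, not per site.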
A results page for the above search in Google is set out as follows:
1. Search box with our search query
2. The number of results Google returned for our search query (plus the time the search took)
3. Sponsored links. This is paid-for advertising. For this results page, Google has selected
adverts that are relevant to our search query.
4. Search results. This section shows the pages that Google thinks are most relevant to our
particular search terms. These listings are free.
5. Link/Page title. The text is the exact text that appears between the title tags (<title></title>)
on the page that the search result links to. Notice how keywords from our search query have been
highlighted.
6. Page description. This text is commonly the actual text that appears in the meta description of
the page that the search result links to. This is the text between the quotation marks in the HTML tag
<META NAME="description" content="YOUR TEXT HERE">. Again, Google has matched this text
with our search query.
7. Domain. This is the address of the page linked to.
8. Cached page link. Unlike the above link, which links to the domain that the page is on, this link takes
us to the cached version of the page that Google has stored on its server.
9. More results. Links to further pages of results
2. Literature Survey
.NET is a "Software Platform". It is a language-neutral environment for developing rich .NET experiences
and building applications that can easily and securely operate within it. When developed applications are
deployed, those applications will target .NET and will execute wherever .NET is implemented instead of
targeting a particular Hardware/OS combination. The components that make up the .NET platform are
collectively called the .NET Framework.
The .NET Framework is a managed, type-safe environment for developing and executing applications.
The .NET Framework manages all aspects of program execution, such as allocating memory for the storage
of data and instructions, granting and denying permissions to the application, managing execution of the
application, and reclaiming memory from resources that are no longer needed.
The .NET Framework is designed for cross-language compatibility. Cross-language compatibility means,
an application written in Visual Basic .NET may reference a DLL file written in C# (C-Sharp). A Visual
Basic .NET class might be derived from a C# class or vice versa.
The CLR is described as the "execution engine" of .NET. It's this CLR that manages the execution of
programs. It provides the environment within which the programs run. The software version of .NET is
actually the CLR version.
When a .NET program is compiled, the output of the compiler is not an executable file but a file that
contains a special type of code called Microsoft Intermediate Language (MSIL). MSIL defines a
set of portable instructions that are independent of any specific CPU. It's the job of the CLR to translate this
intermediate code into executable code when the program is run, enabling the program to run in any
environment for which the CLR is implemented. That is how the .NET Framework achieves portability.
MSIL is turned into executable code by a JIT (Just-In-Time) compiler. The process goes like this:
when a .NET program is executed, the CLR activates the JIT compiler, which converts MSIL
into native code on demand as each part of the program is needed. Thus the program executes as
native code even though it was compiled to MSIL, running about as fast as if it had been compiled
directly to native code while retaining the portability benefits of MSIL.
Class Libraries
The class library is the second major entity of the .NET Framework. It gives programs access to the
runtime environment and consists of a large body of prewritten code that all applications created in
VB .NET and Visual Studio .NET use. The code for elements such as forms and controls in
VB .NET applications actually comes from the class library.
If we want code written in one language to be usable by programs in other languages, it should
adhere to the Common Language Specification (CLS). The CLS describes a set of features that different
languages have in common. It includes a subset of the Common Type System (CTS), which defines the
rules concerning data types and ensures that code is executed in a safe environment.
Some reasons why developers are building applications using the .NET Framework:
o Improved Reliability
o Increased Performance
o Developer Productivity
o Powerful Security
o Integration with existing Systems
o Ease of Deployment
o Mobility Support
o XML Web service Support
o Support for over 20 Programming Languages
o Flexible Data Access
2.1.2 MICROSOFT SQL SERVER
Microsoft SQL Server is an application used to create computer databases for the
Microsoft Windows family of server operating systems. It provides an environment
used to generate databases that can be accessed from workstations, the web, or other
media such as a personal digital assistant (PDA).
Before using a database, you must first have one. A database is primarily a
group of computer files, each of which has a name and a location. Just as there are different
ways to connect to a server, there are also different ways to create a
database.
SQL is short for Structured Query Language and is a widely used database language, providing means of
data manipulation (store, retrieve, update, delete) and database creation.
Almost all modern Relational Database Management Systems, such as MS SQL Server, Microsoft Access,
MSDE, Oracle, DB2, Sybase, MySQL, Postgres and Informix, use SQL as the standard database language.
A word of warning here: although all of those RDBMSs use SQL, they use different SQL dialects. For
example, MS SQL Server's version of SQL is called T-SQL, Oracle's is called
PL/SQL, and MS Access's is called JET SQL.
Microsoft SQL Server is a Relational Database Management System (RDBMS) designed to run on
platforms ranging from laptops to large multiprocessor servers. SQL Server is commonly used as the
backend system for websites and corporate CRMs and can support thousands of concurrent users.
SQL Server comes with a number of tools to help you with your database administration and programming
tasks.
SQL Server is much more robust and scalable than a desktop database management system such as
Microsoft Access. Anyone who's ever tried using Access as the backend to a website will probably be
familiar with the errors generated when too many users tried to access the database!
Although SQL Server can also be run as a desktop database system, it is most commonly used as a server
database system.
Server based database systems are designed to run on a central server, so that multiple users can access the
same data simultaneously. The users normally access the database through an application.
For example, a website could store all its content in a database. Whenever a visitor views an article, they
are retrieving data from the database. As you know, websites aren't normally limited to just one user. So, at
any given moment, a website could be serving up hundreds, or even thousands of articles to its website
visitors. At the same time, other users could be updating their personal profile in the members' area, or
subscribing to a newsletter, or anything else that website users do.
Generally, it's the application that provides the functionality to these visitors. It is the database that stores
the data and makes it available. Having said this, SQL Server does include some useful features that can
assist the application in providing its functionality.
Database -
Levels of data abstraction: -
Physical level - The lowest level of abstraction describes how the data
are actually stored. The record or data is described in terms of blocks of
consecutive storage locations, such as words or bytes.
Logical level - The next level of abstraction describes what data are
stored in the database, and what relationships exist among those data.
SQL Server:
Microsoft SQL Server is one of the most widely used server-based multi-user
RDBMSs. The SQL Server engine is a program installed on the server's hard disk
drive. This program must be loaded into RAM so that it can process user requests.
The server product is sold in several editions, including SQL Server Workgroup Edition and
SQL Server Enterprise Edition.
The core functionality of these products is identical. However, the Workgroup
edition restricts the scale at which the server can run, while the Enterprise edition has no such
restrictions. Either product must be loaded on a multi-user operating system.
Once the SQL Server engine is loaded into the server's memory, users have to
log into the engine to get work done. Several client-based tools, such as
SQL Server Management Studio, facilitate this.
Using SQL, one can create and maintain data manipulation objects such as tables and views.
These data manipulation objects are created and stored on the server's
hard disk drive, in a table space to which the user has been assigned.
Once these data manipulation objects are created, they are used extensively in
commercial applications.
In addition to the creation of data manipulation objects, the actual manipulation of data
within these objects is done using SQL.
The SQL statements used to create these objects are called DDL, or Data
Definition Language. The SQL statements used to manipulate data within these objects are
called DML, or Data Manipulation Language. The SQL statements used to
control the behaviour of, and access to, these objects are called DCL, or Data Control Language.
Hence, once access to a SQL query tool is available and SQL syntax is known, the
creation of data storage and the manipulation of data within that storage, as required
by commercial applications, is possible.
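The DDL/DML distinction can be made concrete with a tiny example. For portability this sketch uses Python's built-in sqlite3 module rather than SQL Server (T-SQL syntax for these statements is broadly similar); DCL statements such as GRANT and REVOKE are omitted because SQLite has no user accounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the storage object.
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# DML: manipulate the data inside it.
cur.execute("INSERT INTO items (id, name) VALUES (?, ?)", (10, "Apple"))
cur.execute("INSERT INTO items (id, name) VALUES (?, ?)", (11, "Orange"))
cur.execute("UPDATE items SET name = ? WHERE id = ?", ("Pear", 11))

rows = cur.execute("SELECT id, name FROM items ORDER BY id").fetchall()
conn.close()
```

CREATE TABLE shapes the storage; INSERT, UPDATE and SELECT work on the data within it, which is exactly the DDL/DML split described above.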
Microsoft's reputation as a database company is firmly established in its full-featured,
high-performance RDBMS server. With the database as a cornerstone of its product line, Microsoft
has evolved into more than just a database company, complementing its RDBMS server with a rich offering
of well-integrated products that are designed specifically for distributed processing and client/server
applications. As the SQL Server database engine has evolved to support large-scale enterprise systems for
transaction processing and decision support, so too have its companion products, to the extent that Microsoft
can provide a complete solution for client/server application development and deployment. This chapter
presents an overview of client/server database systems and the SQL Server product architectures that
support their implementation.
The premise of client/server computing is to distribute the execution of a task among multiple processors in
a network. Each processor is dedicated to a specific, focused set of subtasks that it performs best, and the
end result is increased overall efficiency and effectiveness of the system as a whole. Splitting the execution
of tasks between processors is done through a protocol of service requests; one processor, the client,
requests a service from another processor, the server. The most prevalent implementation of client/server
processing involves separating the user interface portion of an application from the data access portion.
On the client, or front end, of the typical client/server configuration is a user workstation operating with a
Graphical User Interface (GUI) platform, usually Microsoft Windows, Macintosh, or Motif. At the back
end of the configuration is a database server, often managed by a UNIX, Netware, Windows NT, or VMS
operating system.
Client/server architecture also takes the form of a server-to-server configuration. In this arrangement, one
server plays the role of a client, requesting database services from another server. Multiple database servers
can look like a single logical database, providing transparent access to data that is spread around the
network.
Designing an efficient client/server application is somewhat of a balancing act, the goal of which is to
evenly distribute execution of tasks among processors while making optimal use of available resources.
Given the increased complexity and processing power required to manage a graphical user interface (GUI)
and the increased demands for throughput on database servers and networks, achieving the proper
distribution of tasks is challenging. Client/server systems are inherently more difficult to develop and
manage than traditional host-based application systems because of the following challenges:
• The components of a client/server system are distributed across more varied types
of processors. There are many more software components that manage client,
network, and server functions, as well as an array of infrastructure layers, all of
which must be in place and configured to be compatible with each other.
Client/server technologies have changed the look and architecture of application systems in two ways. Not
only has the supporting hardware architecture undergone substantial changes, but there have also been
significant changes in the approach to designing the application logic of the system.
Prior to the advent of client/server technology, most SQL SERVER applications ran on a single node.
Typically, a character-based SQL*Forms application would access a database instance on the same
machine with the application and the RDBMS competing for the same CPU and memory resources. Not
only was the system responsible for supporting the entire database processing, but it was also responsible
for executing the application logic. In addition, the system was burdened with all the I/O processing for
each terminal on the system; the same processor that processed database requests and application logic
controlled each keystroke and display attribute.
Client/server systems change this architecture considerably by splitting the entire interface management
and much of the application processing from the host system processor and distributing it to the client
processor.
Combined with the advances in hardware infrastructure, the increased capabilities of RDBMS servers have
also contributed to changes in the application architecture. Prior to the release of SQL SERVER7, SQL
SERVER's RDBMS was less sophisticated in its capability to support the processing logic necessary to
maintain the integrity of data in the database. For example, primary and foreign key checking and
enforcement was performed by the application. As a result, the database was highly reliant on application
code for enforcement of business rules and integrity, making application code bulkier and more complex.
Figure 2.1 illustrates the differences between traditional host-based applications and client/server
applications. Client/server database applications can take advantage of the SQL SERVER7 server features
for implementation of some of the application logic.
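The shift of integrity enforcement from application code to the database server can be illustrated with a small, hypothetical example. SQLite stands in here for SQL SERVER 7, since both enforce declaratively defined primary and foreign keys; the table names and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable declarative FK enforcement in SQLite

# Primary and foreign keys are declared once, in the schema, rather than
# being re-checked by every application program that touches the data.
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES dept(dept_id))""")

conn.execute("INSERT INTO dept VALUES (10, 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 10)")        # valid: parent row exists

try:
    conn.execute("INSERT INTO emp VALUES (2, 99)")    # no dept 99: the server rejects it
except sqlite3.IntegrityError:
    pass  # the database, not the application code, enforced the rule
```

Before server-side enforcement of this kind, the equivalent check would have been a query-then-insert sequence repeated in every application, which is exactly the bulk and complexity the text describes.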
3. Analysis
3.1 Process:-
Data-flow: User_Id
The User's (Administrator's) Id and Password are sent to the process for
validation.
Data-flow: Validation
If the User_Id and Password passed are valid, i.e. they are checked
against the 'Current User Table' and the user is found to be an
Administrator, a new Session is created for the Administrator by
creating an entry in the 'User Table'.
Attributes: User_Id
User_Type
Used by the system to create a new session and session history when the
user logs in to the system.
Data-flow: User_Id
The User's Id and Password are sent to the process for validation.
Attributes: User_Id
Password
Data-flow: Validation
If the User_Id and Password passed are valid, i.e. they are checked
against the 'Current User Table' and the user is found to be a User, a
new Session is created for the User by creating an entry in the
'User Table'.
Attributes: User_Id
User_Type
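The two validation flows above can be sketched as a single hypothetical routine. The in-memory dictionaries stand in for the 'Current User Table' and the 'User Table', and the session-id scheme is invented for illustration.

```python
# Stand-in for the 'Current User Table': registered users and their credentials.
current_user_table = {
    "admin1": {"password": "s3cret", "user_type": "Administrator"},
    "user1":  {"password": "pa55",   "user_type": "User"},
}

# Stand-in for the 'User Table': one entry per active session.
user_table = {}

def validate_login(user_id, password):
    """Check User_Id/Password against the 'Current User Table'; on success,
    create a new Session by creating an entry in the 'User Table'."""
    record = current_user_table.get(user_id)
    if record is None or record["password"] != password:
        return None                              # validation failed
    session_id = f"sess-{len(user_table) + 1}"   # illustrative session-id scheme
    user_table[user_id] = {"session_id": session_id,
                           "user_type": record["user_type"]}
    return session_id
```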
Data-flow: Session_Id
Attributes: Session_Id
Used to log a user out from the system and subsequently clear up the
system after they have 'logged out'.
Data-flow: User_Id
The User_Id is used to clear up the system and log the user out cleanly.
The ‘User Table’ and the ‘Session History’ will be cleared of the relevant
details.
Attributes: User_Id
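A matching sketch of the log-out flow, again with invented in-memory stand-ins for the 'User Table' and the 'Session History':

```python
# Stand-ins for the 'User Table' (active sessions) and the 'Session History'.
user_table = {"user1": {"session_id": "sess-1", "user_type": "User"}}
session_history = {"user1": ["login 09:00"]}

def logout(user_id):
    """Log the user out cleanly: remove their entry from the 'User Table'
    and clear their relevant details from the 'Session History'."""
    user_table.pop(user_id, None)
    session_history.pop(user_id, None)
```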
Used to record the fact that a user is currently active on the system.
Data-flow: User_Details
The user details are sent from the ‘User Table’
Attributes: User_Name
User_Access_Type
Data-flow: User_Id
The ‘User Table’ is updated using the User_Id to mark the fact that the
user is enabled in the "User Enabled" field of the ‘User Table’.
Attributes: User_Id
User_Enabled
Data-flow: User_Details
The Administrator enters the details of the user via the ‘Add User Screen’
so that the Process can add them to the ‘User Table’.
Attributes: User_Id
Full_Name
Password
User_Access_Type
User_Enabled
Used to delete both record and the details of an existing user from the
system.
Data-flow: User_Id
The user id or a 'wildcard' is entered and passed to this process, which
in turn passes it to the Search Process to find a user or list of users
matching it, which the Administrator may wish to delete. The Administrator
clicks on an entry in the returned list to delete that user from the system.
Attributes: User_Id
Data-flow: User_Details
The process returns the matching entries from the 'User Table' in a list,
from which the Administrator clicks on the one(s) he or she wishes to
delete from the system, as described above.
Data-flow: User_Details
The details are sourced from the ‘User Table’ and returned to the screen
whereupon the Administrator can modify them.
Attributes: User_Id
Password
Full_Name
User_Access_Type
User_Enabled
Data-flow: Updated_User
The modified details are returned to the 'User Table', and the changes
are then reflected in the system, i.e. the details for that user are updated.
Attributes: User_Id
Password
Full_Name
User_Access_Type
User_Enabled
Used to find an existing user or users on the system and return the details
to the screen.
Data-flow: User_Id
The User_Id is accepted from the Find User screen and may contain
'wildcards'. It is then used in the search process to find the user(s)
who match it.
Attributes: User_Id
Data-flow: User_Details
The details of the resulting match(es) are passed back to the screen in
list format.
Attributes: User_Id
Full_Name
User_Access_Type
User_Enabled
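The wildcard search described above might be sketched as follows. The records and field values are invented, and Python's fnmatch stands in for whatever pattern matching the real system uses.

```python
from fnmatch import fnmatch

# Stand-in for the 'User Table'; field names follow the attribute lists above.
user_table = [
    {"User_Id": "admin1", "Full_Name": "A. Smith",
     "User_Access_Type": "Administrator", "User_Enabled": True},
    {"User_Id": "user1",  "Full_Name": "B. Jones",
     "User_Access_Type": "User", "User_Enabled": True},
]

def find_users(pattern):
    """Accept a User_Id that may contain wildcards (* and ?), and return
    the details of the matching user(s) as a list for the screen."""
    return [u for u in user_table if fnmatch(u["User_Id"], pattern)]
```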
Data-flow: SessionNo
Attributes: Session_No
Data-flow: SessionId
Attributes: Session_Id
Receives the current Session’s Number from ‘Get Session’ and returns the
current Session’s Id.
Data-flow: SessionNo
Attributes: Session_No
Data-flow: SessionId
Attributes: Session_Id
Data-flow: Administrator_Details_Request
A request from an Administrator for an Administrator’s Details.
Attributes: User ID
Data-flow: User_Details
Data-flow: User_Details_Request
A request from an Administrator for User Details.
Attributes: User ID
Data-flow: User_Details
The details pertaining to the user (or users, if wildcards are used) that
were requested.
Attributes: User ID
Full Name
Password
User Access Type
User Enabled
Data-flow: Document_Request
A request from a User for a Document.
Data-flow: Document_Details
The Document Details matching the required parameters are returned to
the respective screen in list form where the user may select one from the
list to read.
Attributes: Document ID
Document Type
Document Summary
USER LOGIN:-
5. Future of Search Engines
While many smaller search engines are surfacing, the onus of taking search engine
technology to the next level lies with the major search engines such as Google,
Yahoo and Microsoft's MSN.
Information technologists believe that the search results we will receive in the
not-too-distant future will make present search engine technology appear primitive
and cumbersome. However, to achieve this new search technology, consumers must be
forthcoming and shed their apprehensions about the protection of their privacy.
Picture a scenario where Google is able to track and monitor the websites a
consumer views and maintain a log of all of their search queries. This type of
personalized information could greatly improve the relevancy of the results the
search engine displays to that consumer. It may be worth giving up part of one's
privacy if it results in search engines returning more relevant results and
saving time.
6. HARDWARE REQUIREMENTS
Software Requirement:
We are pursuing extensions of this work, including a formal derivation of the optimal
bias, generalization of demand assumptions, and elimination of free placement by the
gatekeeper. Our models can be extended to examine conditions under which the
information gatekeeper will begin to charge users, and specifically the case where the
gatekeeper differentiates between users by offering two versions: a fee-based premium
service with no bias in the query results, and a free basic version with paid placement
bias. The fee-based premium version will bring additional user revenues to the
search engine; however, it may reduce placement revenues because paid placement
becomes less attractive to content providers. In addition, the search engine's
market coverage and
placement fee may change as well, and the models can be used to determine if it is
optimal for the gatekeeper to offer differentiated service. Similar models can be
developed to examine the impact of differentiation based on advertising. Some search
engines have already begun to offer fee-based premium search services that contain
no advertising. If this trend continues, it may eventually change people's view of
Internet search engines as a free resource for fair information.
Search engines are sophisticated tools that allow users to quickly locate products
and services on the Internet. Since SEO is aimed at improving your visibility in
search engine results, it is essential that you understand the criteria they use
to rank web pages. In the next units of this course we will show how to use search
engines to help locate the right keywords for your products and help analyse the
competition you will face in search engine listings.
9. Bibliography