
1. What is so special about the Internet?

To address the question of what makes the Internet special, it is useful to consider a closely related question: what makes the Internet different from other telecommunication services, such as those which run over the public switched telephone network (PSTN)? There are arguably a number of differences.

Underlying technology: Both the Internet and the voice telephone network run over essentially the same wires, but the equipment attached to those wires, and the use made of them, is different. On the Internet, messages are broken down into digital "packets" of data, which means that the wires can be used much more efficiently, to carry a much higher volume of information, at a lower cost.

Pricing: The PSTN has traditionally been priced on the basis of usage. By contrast, the dominant pricing principle for the Internet is flat-rate pricing. The model for wholesale pricing differs too. A service provider terminating a particular telephone call receives a fee for doing so. By contrast, on the Internet, there is almost no flow of cash on an end-to-end basis. On the telephone network, developing countries are net recipients of financial flows, but on the Internet they make net outpayments for carriage of their traffic.

Traffic flows and value flows: In most telephone calls, the traffic flow is approximately even between the caller and the called party. But with web browsing, the traffic flow is highly asymmetric, with the main flow being towards the party which originated the call, who also gains most value from the call.

US-centric: Whether measured by the location of Internet users, websites or the direction of traffic flows, the United States takes the lion's share of the Internet. This is reflected too in the policymaking process, in which all major decisions have, until now, been effectively taken in the United States.

Pace of diffusion: While it took the telephone close to 75 years to reach 50 million users, it has taken the World Wide Web (WWW) only four years to reach the same number (see Figure 1). On the supply side of the equation, the number of international carriers grew to more than 1,500 in 1999, but this is still a long way behind the estimated 17,000 Internet Service Providers (ISPs) that have mushroomed around the world.


Overview of CGI
This topic provides information about CGI. Common Gateway Interface (CGI) is a standard, supported by almost all web servers, that defines how information is exchanged between a web server and an external program (CGI program).

The CGI specification dictates how CGI programs get their input and how they produce their output. CGI programs process data that is received from browser clients. For example, the client fills out a form and sends the information back to the server. Then the server runs the CGI program. Programs that are called by the server must conform to the server CGI interface in order to run properly. We will describe this in further detail later in this chapter.

The administrator controls which CGI programs the system can run by using the server directives. The server recognizes a URL that contains a request for a CGI program, commonly called a CGI script. Depending on the server directives, the server calls that program on behalf of the client browser.

The server supports CGI programs that are written in C++, REXX, Java, ILE C, ILE RPG, and ILE COBOL. It also supports multithreaded CGI programs in those of these languages that are capable of multiple threads. Programs that are written in compiled programming languages must be compiled before they can run. Compiled programs typically run faster than programs that are written in scripting languages. On the other hand, programs that are written in scripting languages tend to be easier to write, maintain, and debug.

The functions and tasks that CGI programs can perform range from the simple to the very advanced. In general, we call those that perform the simple tasks CGI scripts because you do not compile them. We often call those that perform complex tasks gateway programs. In this manual, we refer to both types as CGI programs.

Given the wide choice of languages and the variety of functions, the possible uses for CGI programs seem almost endless. How you use them is up to you. Once you understand the CGI specification, you will know how servers pass input to CGI programs and how servers expect output.

There are many uses for CGI programs. Basically, you should design them to handle dynamic information. Dynamic in this context refers to temporary information that is created for a one-time use and not stored as a static Web page. This information may be a document, an e-mail message, or the results of a conversion program. For detailed information about CGI APIs, see Chapter 8, HTTP Server Application Programming Interfaces on page 51.

CGI and Dynamic Documents


There are many types of files that exist on the web. Primarily they fall into one of the following categories:

- Images
- Multimedia
- Programs
- HTML documents

Servers break HTML documents into two distinct types:

- Static
- Dynamic

Static documents exist in non-changing source form on the web server. Dynamic documents are created as temporary documents to satisfy a specific, individual request. Consider the process of serving these two types of documents.

Responding to requests for static documents is fairly simple. For example, Jill User accesses the Acme web server to get information on the Pro-Expert gas grill. She clicks on Products, then on Grills, and finally on Pro-Expert. Each time Jill clicks on a link, the web browser uses the URL that is attached to the link to request a specific document from the web server. The server responds by sending a copy of the document to Jill's browser.

What if Jill decides that she wants to search through the information on the Acme web server for all documents that contain information on Acme grills? Such information could consist of news articles, press releases, price listings, and service agreements. This is a more difficult request to process. It is not a request for an existing document. Instead, it is a request for a dynamically generated list of documents that meet certain criteria. This is where CGI comes in. You can use a CGI program to parse the request and search through the documents on your web server. You can also use it to create a list with hypertext links to each of the documents that contain the specified word or string.
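The search scenario above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the DOCUMENTS dictionary is a hypothetical in-memory stand-in for files on the Acme server, and the function names search_documents and render_results are invented for this example. A CGI response consists of header lines, a blank line, and then the body, which is what render_results assembles.

```python
import html
from urllib.parse import quote

# Hypothetical stand-in for documents stored on the web server (path -> text).
DOCUMENTS = {
    "/news/grill-launch.html": "Acme announces the Pro-Expert gas grill.",
    "/prices/list.html": "Price listing for grills and accessories.",
    "/about.html": "About Acme.",
}


def search_documents(documents, term):
    """Return the sorted paths of documents whose text contains term (case-insensitive)."""
    needle = term.lower()
    return [path for path, text in sorted(documents.items()) if needle in text.lower()]


def render_results(paths, term):
    """Build a CGI response: a Content-Type header, a blank line, then an HTML list of links."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(quote(path), html.escape(path))
        for path in paths
    )
    body = "<h1>Documents mentioning {0}</h1>\n<ul>\n{1}\n</ul>".format(
        html.escape(term), items
    )
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A real CGI program would take the search term from the request instead.
    print(render_results(search_documents(DOCUMENTS, "grill"), "grill"))
```

The search term is escaped before it is echoed back into the page, because it comes from the reader and should not be trusted as markup.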

Uses for CGI


HTML allows you to access resources on the Internet by using other protocols that are specified in the URL. Examples of such protocols are mailto, ftp, and news. If you code a link with mailto that is followed by an e-mail address, the link will result in a generic mail form. What if you wanted your customers to provide specific information, such as how often they use the web? Or how they heard about your company? Rather than using the generic mailto form, you can create a form that asks these questions and more. You can then use a CGI program to interpret the information, include it in an e-mail message, and send it to the appropriate person.

You do not need to limit CGI programs to processing search requests and e-mail. You can use them for a wide variety of purposes. Basically, anytime you want to take input from the reader and generate a response, you can use a CGI program. The input may not even be apparent to the reader. For example, many people want to know how many other people have visited their home page. You can create a CGI program that keeps track of the number of requests for your home page. This program can display the new total each time someone links to your home page.

Environment Variable

Environment variables are a series of hidden values that the web server sends to every CGI program you run. Your CGI program can parse them and use the data they contain. In Perl, environment variables are stored in a hash called %ENV.
Variable Name | Value
--- | ---
DOCUMENT_ROOT | The root directory of your server
HTTP_COOKIE | The visitor's cookie, if one is set
HTTP_HOST | The hostname of your server
HTTP_REFERER | The URL of the page that called your script
HTTP_USER_AGENT | The browser type of the visitor
HTTPS | "on" if the script is being called through a secure server
PATH | The system path your server is running under
QUERY_STRING | The query string (see GET, below)
REMOTE_ADDR | The IP address of the visitor
REMOTE_HOST | The hostname of the visitor (if your server has reverse-name-lookups on; otherwise this is the IP address again)
REMOTE_PORT | The port the visitor is connected to on the web server
REMOTE_USER | The visitor's username (for .htaccess-protected pages)
REQUEST_METHOD | GET or POST
REQUEST_URI | The interpreted pathname of the requested document or CGI (relative to the document root)
SCRIPT_FILENAME | The full pathname of the current CGI
SCRIPT_NAME | The interpreted pathname of the current CGI (relative to the document root)
SERVER_ADMIN | The email address for your server's webmaster
SERVER_NAME | Your server's fully qualified domain name (e.g. www.cgi101.com)
SERVER_PORT | The port number your server is listening on
SERVER_SOFTWARE | The server software you're using (such as Apache 1.3)
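As a rough illustration of how a CGI program might read these values, the Python sketch below looks up a handful of them in the process environment. The function names collect_cgi_variables and render_plain_text, and the particular selection of variables, are assumptions made for this example; which variables are actually set depends on the server.

```python
import os

# A small, arbitrary selection of the variables a server may set.
CGI_VARIABLES = (
    "REQUEST_METHOD",
    "QUERY_STRING",
    "REMOTE_ADDR",
    "HTTP_USER_AGENT",
    "SERVER_NAME",
)


def collect_cgi_variables(environ=os.environ, names=CGI_VARIABLES):
    """Return {name: value}, using an empty string when the server did not set a variable."""
    return {name: environ.get(name, "") for name in names}


def render_plain_text(variables):
    """Build a plain-text CGI response listing each variable as NAME=value."""
    lines = ["{0}={1}".format(name, value) for name, value in variables.items()]
    return "Content-Type: text/plain\r\n\r\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_plain_text(collect_cgi_variables()), end="")
```

Passing the environment in as a parameter, rather than reading os.environ directly inside each function, makes the code easy to try out with a plain dictionary outside a web server.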

Environment variable
An environment variable is a dynamic "object" on a computer that stores a value, which in turn can be referenced by one or more software programs in Windows. Environment variables help programs know what directory to install files in, where to store temporary files, where to find user profile settings, and many other things. It can be said that environment variables help to create and shape the environment in which a program runs.

Environment variables are dynamic because they can change. The values they store can be changed to match the current computer system's setup and design (environment). They can also differ between computer systems because each computer can have a different setup and design (environment).

There are a number of environment variables that get referenced by programs and can come in handy for a computer user to find needed information about their computer environment. The more common and important ones to be aware of are shown below.

- %appdata%
- %commonprogramfiles%
- %local%
- %localappdata%
- %programfiles%
- %temp%
- %userprofile%
- %windir%

Tip: You can access any of the above folders by entering the environment variable in the Windows Run box or Windows Search Box. For example, to get into the Application Data folder, type %appdata% in the Run box and then press Enter.

%appdata%
The %appdata% environment variable contains the directory path to the Application Data folder for your user profile. This folder stores settings and logs, among other things, for various software programs. The settings and logs stored there are specific to your user profile.

%commonprogramfiles%
The %commonprogramfiles% environment variable contains the directory path to the Common Files folder, within the main Program Files folder. This folder contains various files for common programs and utilities on a computer, mostly system and services related. The default directory path this variable points to is c:\Program Files\Common Files.

%local%
The %local% environment variable points to where the security policies & rules are located for the user's account, Windows in general, Windows Firewall, Network, and various software programs on the computer. This environment variable is native to Windows 7.

%localappdata%

The %localappdata% environment variable contains the directory path to where programs store their temporary files. Common temporary files to be stored here are Desktop Themes, Windows Error Reporting, program caching and Internet browser profiles. This environment variable is native to Windows Vista & Windows 7.

%programfiles%
The %programfiles% environment variable contains the directory path to where programs are installed. This directory contains sub-directories for each program, which contain the primary files needed by each program in order to run on a computer. The default directory path this variable points to is c:\Program Files.

%temp%
The %temp% environment variable contains the directory path to where temporary files will be stored. These temp files are often Internet temporary files and other user application temporary files (Microsoft Word, Excel, Outlook, etc.). The files located in this directory can be deleted periodically to help improve computer performance.

%userprofile%
The %userprofile% environment variable points to the current logged in user's profile and the directory where user profile data is stored. It is in this directory that a user can find the following folders: My Documents, My Music, My Pictures, Desktop, and Favorites (Internet Explorer bookmarks).

%windir%
The %windir% environment variable points to the Windows directory, where Windows system files are located. This directory is where Windows is installed. The default directory path for most versions of Windows is c:\Windows (for Windows NT 4 and 2000, it is c:\WinNT).
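A program can look these values up at run time. The Python sketch below is a minimal illustration: lookup_variable is a hypothetical helper that accepts a name written in the %name% style used above and compares names without regard to case, since Windows treats environment variable names case-insensitively. The sample values in the usage lines are invented.

```python
import os


def lookup_variable(name, environ=os.environ):
    """Resolve a name written Windows-style (e.g. "%TEMP%"); return None if it is not set."""
    key = name.strip("%").upper()
    for env_name, value in environ.items():
        if env_name.upper() == key:
            return value
    return None


if __name__ == "__main__":
    # Invented sample environment, so the example also runs on non-Windows systems.
    sample = {"TEMP": r"C:\Users\jill\AppData\Local\Temp", "windir": r"C:\Windows"}
    for wanted in ("%temp%", "%windir%", "%appdata%"):
        print(wanted, "->", lookup_variable(wanted, sample))
```

Returning None for an unset variable lets the caller decide on a fallback instead of failing with an exception.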

CGI Environmental Variables


One of the methods that the web server uses to pass information to a CGI script is through environmental variables. These are created and assigned appropriate values within the environment that the server spawns for the CGI script. They can be accessed like any other environmental variable, for example with getenv() (in C) or $ENV{'VARIABLE_NAME'} (in Perl). Many of them contain important information that most CGI programs need to take into account. This list highlights some of the most commonly used ones, along with a brief description and notes on possible uses for them. It is by no means a complete reference; many servers pass their own extra variables, or use different names for some, so check your server's documentation. The purpose of this list is only to suggest some common good uses for some of the server-passed information.

CONTENT_LENGTH

The length, in bytes, of the input stream that is being passed through standard input. This is needed when a script is processing input with the POST method, in order to read the correct number of bytes from the standard input. Some servers end the input string with EOF, but this is not guaranteed behaviour, so, in order to be sure that you read the correct input length, you can do something like read(STDIN, $input, $ENV{CONTENT_LENGTH})
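The same idea can be sketched in Python. The helper name read_post_body is an assumption for this example; it reads exactly CONTENT_LENGTH bytes from a binary stream and treats a missing or malformed value as zero. In a real CGI script the stream would typically be sys.stdin.buffer; the usage lines below substitute an in-memory stream so the sketch can run anywhere.

```python
import io


def read_post_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes from stream; return b"" if the length is missing or invalid."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b""
    return stream.read(length)


if __name__ == "__main__":
    # The stream holds more data than CONTENT_LENGTH announces; only 9 bytes are read.
    fake_stdin = io.BytesIO(b"name=Jill...trailing bytes")
    print(read_post_body({"CONTENT_LENGTH": "9"}, fake_stdin))
```

Reading a fixed number of bytes, instead of reading until end of file, avoids blocking on servers that leave the input stream open.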

DOCUMENT_ROOT

The directory against which all www document paths are resolved by the server. Sometimes it is useful to know the server's document root in order to compose absolute file paths when all the script is being given as a parameter is the relative path of the file within the www directory. It is also good practice to have your script resolve paths in this way, both for security reasons and for portability. Another common use is to figure out what the URL of a file will be if you only know the absolute path and the hostname (there is another variable for finding out the hostname).
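Both uses can be sketched in Python as follows. The function names resolve_under_root and url_for_file are hypothetical, the paths and hostname in the usage lines are invented, and POSIX-style paths are assumed. The sketch only normalizes path text; it does not follow symbolic links, so it is not a complete security check.

```python
import posixpath


def resolve_under_root(document_root, relative_path):
    """Join relative_path onto document_root; raise ValueError if the result escapes the root."""
    root = posixpath.normpath(document_root)
    candidate = posixpath.normpath(posixpath.join(root, relative_path.lstrip("/")))
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError("path escapes the document root")
    return candidate


def url_for_file(document_root, absolute_path, hostname):
    """Derive the URL of a file from its absolute path, the document root and the hostname."""
    root = posixpath.normpath(document_root)
    path = posixpath.normpath(absolute_path)
    if not path.startswith(root + "/"):
        raise ValueError("file is not under the document root")
    return "http://" + hostname + path[len(root):]


if __name__ == "__main__":
    full = resolve_under_root("/var/www/html", "grills/pro-expert.html")
    print(full)
    print(url_for_file("/var/www/html", full, "www.example.com"))
```

Rejecting any path that normalizes to a location outside the root is what guards against parameters such as ../../etc/passwd.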

HTTP_REFERER

The URL that referred the web client to the script (via a link or redirection). Typed URLs and bookmarks usually result in this variable being left blank. In many cases a script may need to behave differently depending on the referer. For example, you may want to restrict your counter script to operate only if it is called from one of your own pages, to prevent someone from using it from another web page without your permission. The referer may even be the actual data that the script needs to process. Extending the example above, you might also like to install your counter on many pages, and have the script figure out from the referer which page generated the call and increment the appropriate count, keeping a separate count for each individual URL. A snippet for the referer blocking example could be:

    die unless ($ENV{HTTP_REFERER} =~ m{^http://(www\.)?\Q$mydomain\E/});
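A comparable check can be sketched in Python by parsing the referer and comparing its host name, instead of pattern-matching the whole URL. The function name referer_is_allowed and the domain example.com are assumptions for this example. Because the referer is supplied by the client, it can be forged or omitted, so a check like this discourages casual misuse but is not real access control.

```python
from urllib.parse import urlsplit


def referer_is_allowed(environ, my_domain):
    """Return True if HTTP_REFERER points at my_domain or www.my_domain."""
    referer = environ.get("HTTP_REFERER", "")
    host = (urlsplit(referer).hostname or "").lower()
    domain = my_domain.lower()
    return host in (domain, "www." + domain)


if __name__ == "__main__":
    own_page = {"HTTP_REFERER": "http://www.example.com/products/grills.html"}
    other_page = {"HTTP_REFERER": "http://example.com.evil.test/page.html"}
    print(referer_is_allowed(own_page, "example.com"))
    print(referer_is_allowed(other_page, "example.com"))
```

Comparing the parsed host name avoids accepting look-alike hosts that merely begin with your domain name.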

HTTP_USER_AGENT

The name/version of the client issuing the request to the script. Like with referers, one might need to implement behaviours that vary with the client software used to call the script. A redirection script could make use of this information to point the client to a page optimized for a specific browser, or you may want to have it block requests from specific clients, like robots or clients that are known not to support appropriate features used by what the script would normally output.

PATH_INFO

The extra path information following the script's path in the URL. A URL that refers to a script may contain additional information, commonly called 'extra path information'. This is appended to the URL and marked by a leading slash. The server puts this information in the PATH_INFO variable, which can be used as a method to pass arguments to the script.

PATH_TRANSLATED

The PATH_INFO mapped onto DOCUMENT_ROOT. Usually PATH_INFO is used to pass a path argument to the script. For example, a counter might be passed the path to the file where counts should be stored. The server also maps the PATH_INFO variable onto the document root path and stores the result in PATH_TRANSLATED, which can be used directly as an absolute file path.

QUERY_STRING

Contains query information passed via the calling URL, following a question mark after the script location. QUERY_STRING is the equivalent of content passed through STDIN in POST, but for scripts called with the GET method. Query arguments are written in this variable in their URL-encoded form, just as they appear in the calling URL. You can process this string to extract useful parameters for the script.
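In Python, the standard library function urllib.parse.parse_qs does the decoding. The wrapper name query_parameters and the sample query string are assumptions for this sketch. Each parameter maps to a list of values, because a name may appear more than once in a query string.

```python
from urllib.parse import parse_qs


def query_parameters(environ):
    """Decode QUERY_STRING into {name: [values]}, keeping parameters that have empty values."""
    return parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)


if __name__ == "__main__":
    sample = {"QUERY_STRING": "q=gas+grill&model=Pro%2DExpert&q=acme"}
    for name, values in query_parameters(sample).items():
        print(name, values)
```

The decoding turns a plus sign into a space and a %XX sequence into the corresponding character.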

REMOTE_ADDR

The IP address from which the client is issuing the request. This can be useful either for logging accesses to the script (for example, a voting script might want to log voters in a file by their IP address in order to prevent them from voting more than once) or to block or behave differently for particular IP addresses. This might be a requirement in a script that has to be restricted to your local network and perhaps perform different tasks for each known host.

REMOTE_HOST

The name of the host from which the client issues the request. This is just like REMOTE_ADDR above, except that it is the hostname of the remote machine (if it is known via reverse lookup).

REQUEST_METHOD

The method used for the request (usually GET, POST or HEAD). It is wise to have your script check this variable before doing anything. You can determine where the input will be (STDIN for POST, QUERY_STRING for GET) or choose to permit operation only under one of the two methods. It is also a good idea to exit with an explanatory error message if the script is accidentally called from the command line, in which case the variable is not defined.
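The checks described above might look like this in Python. The function name read_raw_input is hypothetical, and the sketch assumes the POST body is URL-encoded form data, which is plain ASCII. In a real script the stream argument would typically be sys.stdin.buffer.

```python
import io


def read_raw_input(environ, stdin):
    """Return the still-encoded request input, choosing the source from REQUEST_METHOD."""
    method = environ.get("REQUEST_METHOD")
    if method is None:
        # Not set when the script is started from the command line.
        raise RuntimeError("REQUEST_METHOD is not set; run this script through a web server")
    if method == "GET":
        return environ.get("QUERY_STRING", "")
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii")
    raise ValueError("unsupported request method: " + method)


if __name__ == "__main__":
    get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "page=grills"}
    post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "7"}
    print(read_raw_input(get_env, io.BytesIO(b"")))
    print(read_raw_input(post_env, io.BytesIO(b"a=1&b=2")))
```

Raising distinct errors for a missing method and an unsupported method lets the caller print the explanatory message that fits each case.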

SCRIPT_NAME

The virtual path from which the script is executed. This is very useful if your script will output HTML code that contains calls to itself. Having the script determine its virtual path (and hence, along with the server's hostname, its full URL) is much more portable than hard-coding it in a configuration variable. Also, if you like to keep a log of all script accesses in some file, and want to have each script report its name along with the calling parameters or time, it is very portable to use SCRIPT_NAME to print the path of the script.

SERVER_NAME

The web server's hostname or IP address. Much like SCRIPT_NAME, this value can be used to create more portable scripts in case they need to assemble URLs on the local machine. In scripts that are made publicly accessible on a system with many virtual hosts, this can provide the ability to have different behaviours depending on the virtual server that is calling the script.

SERVER_PORT

The web server's listening port. Complements SERVER_NAME above in forming URLs to the local system. This is a commonly overlooked aspect, but it will make your script portable if you keep in mind that not all servers run on the default port and thus need an explicit port reference in the server address part of the URL.
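Putting SERVER_NAME, SERVER_PORT and SCRIPT_NAME together, a script could assemble its own URL along the lines of the Python sketch below. The function name self_url and the sample values are assumptions, and the scheme is passed in by the caller because none of these three variables states it.

```python
def self_url(environ, scheme="http"):
    """Build the script's own URL, adding the port only when it is not the scheme's default."""
    host = environ["SERVER_NAME"]
    port = environ.get("SERVER_PORT", "")
    default_port = {"http": "80", "https": "443"}.get(scheme)
    netloc = host if port in ("", default_port) else host + ":" + port
    return scheme + "://" + netloc + environ.get("SCRIPT_NAME", "")


if __name__ == "__main__":
    sample = {
        "SERVER_NAME": "www.example.com",
        "SERVER_PORT": "8080",
        "SCRIPT_NAME": "/cgi-bin/counter.cgi",
    }
    print(self_url(sample))
```

Leaving out the default port keeps the generated URLs identical to the ones visitors normally see.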

HTML (HyperText Markup Language) is the markup language in which web pages are written. It creates static content on a web page. DHTML is not a language but a technology. It stands for Dynamic HTML, and it uses HTML, CSS and JavaScript together to create web pages that are not just static displays but whose content changes in response to what a visitor does on the page. DHTML is all client side. That means the JavaScript that controls the dynamic actions on the web page is first loaded into the client browser, so that is where everything happens. If you want new content, requests have to be made to the server, which then reloads the web page you are viewing.

HTML
1. It is referred to as static HTML and is static in nature.
2. A plain page without any styles or scripts is called HTML.
3. HTML sites are slow with respect to client-side technologies.

DHTML
1. It is referred to as dynamic HTML and is dynamic in nature.
2. A page with HTML, CSS, DOM and scripts is called DHTML.
3. DHTML sites are fast enough with respect to client-side technologies.

DHTML: Dynamic HTML. An extension of HTML that enables, among other things, the inclusion of small animations and dynamic menus in Web pages. DHTML code makes use of style sheets and JavaScript. When an object or word on a webpage becomes highlighted, larger, or a different color, or a streak runs through it, as you move your mouse cursor over it, that is the result of a DHTML effect. The effect is written into the page's markup, style sheets and scripts; the page itself is still saved as an ordinary .htm or .html file.

DHTML sites are dynamic in nature. DHTML uses client-side scripting to change variables in the presentation, which affects the look and function of an otherwise static page. What characterizes DHTML is that it functions while a page is being viewed, rather than generating a unique page with each page load (a dynamic website).

On the other hand, HTML is static. HTML sites rely solely upon client-side technologies. This means the pages of the site do not require any special processing from the server side before they go to the browser. In other words, the pages are always the same for all visitors: static. HTML pages have no dynamic content, as in the examples above.