Joe Lima
Director of Product Development
Port80 Software, Inc.
jlima@port80software.com
Web Server Technologies | Part I: HTTP & Getting Started
Tutorial Content
Introduction to HTTP
• TCP/IP and application layer protocols
• URLs, resources and MIME Types
• HTTP request/response cycle and proxies
Online resources are plentiful and will be cited along the way.
An Introduction to HTTP
Network Layer: IP
• IP provides packets that are routed based on source and destination IP addresses
• The ports let TCP carry multiple protocols that connect services running on default
ports (HTTP, for example, defaults to port 80)
• TCP also provides mechanisms to make the connection a reliable bit pipe
• A data stream is chopped up into chunks that are reassembled, complete and
in correct order on the other endpoint of the connection
• When HTTP is the Application Layer protocol on top of the stack, these
chunks of data are the contents of the HTTP Message
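The chunking and reassembly described above can be sketched in a few lines. This is only an illustration: a local socket pair stands in for a real TCP connection, and the request text, the 16-byte chunk size and the 8-byte read size are arbitrary choices, not values from the tutorial.

```python
import socket

# Illustrative HTTP message; the host and path are assumptions.
HTTP_REQUEST = b"GET /bar HTTP/1.1\r\nHost: www.foo.com\r\n\r\n"


def send_in_chunks(sock, data, chunk_size=16):
    """Write data to the socket in small pieces, as a sender might."""
    for start in range(0, len(data), chunk_size):
        sock.sendall(data[start:start + chunk_size])


def read_until_closed(sock):
    """Collect everything the peer sent until it closes the connection."""
    parts = []
    while True:
        part = sock.recv(8)  # deliberately small reads
        if not part:
            break
        parts.append(part)
    return b"".join(parts)


def roundtrip(message):
    """Send message through a connected socket pair and return what arrives."""
    sender, receiver = socket.socketpair()
    try:
        send_in_chunks(sender, message)
        sender.close()  # signals end of stream to the reader
        return read_until_closed(receiver)
    finally:
        sender.close()
        receiver.close()
```

Calling roundtrip(HTTP_REQUEST) returns the same bytes that were sent, complete and in order, however the stream was split on the way.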
• HTTP is now the central mechanism for requesting and serving URL-based
resources
• primary-type/sub-type
The most common MIME Types used on the Web come from the
text, image and application top-level groups
• text/html, text/css
• image/gif, image/jpeg, image/png
• application/pdf, application/octet-stream
• application/x-javascript, application/x-shockwave-flash
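As a rough sketch of the primary-type/sub-type structure, the helpers below split a MIME type at the slash and group a list of types by top-level group. The function names are hypothetical, and the sample list is simply taken from the types listed above.

```python
def split_mime_type(mime_type):
    """Split 'primary/sub' into its (primary, sub) parts, lowercased."""
    primary, separator, sub = mime_type.strip().lower().partition("/")
    if not separator or not primary or not sub:
        raise ValueError(f"not a primary-type/sub-type value: {mime_type!r}")
    return primary, sub


def group_by_primary(mime_types):
    """Group MIME types by top-level type, preserving input order."""
    groups = {}
    for mime_type in mime_types:
        primary, sub = split_mime_type(mime_type)
        groups.setdefault(primary, []).append(sub)
    return groups


COMMON_TYPES = [
    "text/html", "text/css",
    "image/gif", "image/jpeg", "image/png",
    "application/pdf", "application/octet-stream",
]
```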
• User agent (client) issues an HTTP request to a host (server) for a given
resource using its URL
[Figure: an HTTP client sends an HTTP request for the resource /bar to the network at the hosting provider and receives an HTTP response]
• GET
– By far most common method
– Retrieves a resource from the server
– Supports passing of query string arguments
• HEAD
– Retrieves only the Headers associated with a resource but not the entity itself
– Highly useful for protocol analysis, diagnostics
• POST
– Allows passing of data in entity rather than URL
– Can transmit far larger arguments than GET
– Arguments not displayed on the URL
• OPTIONS
– Shows methods available for use on the resource (if given a path) or the host
(if given a “*”)
• TRACE
– Diagnostic method for assessing the impact of proxies along the request-
response chain
• PUT, DELETE
– Used in HTTP publishing (e.g., WebDAV)
• CONNECT
– A common extension method for Tunneling other protocols through HTTP
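The difference between GET and POST in where the arguments travel can be sketched as follows. The builder functions, host and path are hypothetical, and the messages are minimal: a real client would send more headers.

```python
from urllib.parse import urlencode


def build_get(host, path, params):
    """Build a GET request: arguments travel in the URL's query string."""
    query = urlencode(params)
    target = f"{path}?{query}" if query else path
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def build_post(host, path, params):
    """Build a POST request: arguments travel in the entity (body)."""
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body.encode('ascii'))}\r\n"
        "\r\n"
        f"{body}"
    )
```

With the same arguments, the GET request line carries them after the "?", while the POST request line shows only the path and the arguments follow the blank line that ends the headers.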
– General Headers
• Provide info about messages of both kinds
– Request Headers
• Provide request-specific info
– Response Headers
• Provide response-specific info
– Entity Headers
• Provide info about request and response
entities
– Extension headers are also possible
• Host – The hostname (and optionally port) of server to which request is being sent
– Required for name-based virtual hosting
– Host: www.port80software.com
• Referer – The URL of the resource from which the current request URI came
– Misspelled in the specification, so [Sic]
– Referer: http://www.host.com/login.asp
• User-Agent – Name of the requesting application, used in browser sensing
– User-Agent: Mozilla/4.0 (Compatible; MSIE 6.0)
• Accept and its variants – Inform servers of client’s capabilities and preferences
– Enables content negotiation
– Accept: image/gif, image/jpeg;q=0.5
– Accept- variants for Language, Encoding, Charset
• If-Modified-Since and other conditionals
– Frequently used by browsers to manage caches
– If-Modified-Since: Sat, 31 May 2003 15:00:00 GMT
• Cookie – How clients pass cookies back to the servers that set them
– Cookie: id=23432; level=3
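Content negotiation with the Accept header can be sketched as below. This is a simplified reading of the header, not a full implementation: wildcards such as */* are not handled, the function names are hypothetical, and treating a malformed q value as 0 is an assumption.

```python
def parse_accept(header_value):
    """Parse an Accept header value into (media_type, q) pairs, best first."""
    preferences = []
    for item in header_value.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: malformed q means "not acceptable"
        preferences.append((media_type.lower(), quality))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose(available, header_value):
    """Return the most preferred type the server can supply, or None."""
    for media_type, quality in parse_accept(header_value):
        if quality > 0 and media_type in available:
            return media_type
    return None
```

For the example header above, a server that can supply both GIF and JPEG would pick image/gif, since image/jpeg carries the lower preference q=0.5.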
• Allow – Lists the request methods that can be used on the entity
– Allow: GET, HEAD, POST
• Location – Gives the alternate or new location of the entity
– Used with 3xx response codes (redirects)
– Location: http://www.ibm.com/us/
• Content-Encoding – specifies encoding performed on the body of the response
– Used with HTTP compression
– Corresponds to Accept-Encoding request header
– Content-Encoding: gzip
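The pairing of Accept-Encoding and Content-Encoding can be sketched with Python's gzip module. The function names are hypothetical and the negotiation is simplified: q-values in Accept-Encoding are ignored, and only gzip is considered.

```python
import gzip


def encode_response_body(body, accept_encoding):
    """Compress body with gzip if the client's Accept-Encoding lists it.

    Returns (headers, payload). Simplified: q-values are ignored.
    """
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


def decode_response_body(headers, payload):
    """Undo the encoding named in Content-Encoding, as a client would."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload
```

The server only compresses when the client has advertised gzip, and the client relies on the Content-Encoding header, not on guessing, to know whether to decompress.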
Network bottlenecks
– Available bandwidth should accommodate max HTTP operations (“hits”) under
peak load
– Assuming an average file size of 14,000 bytes
• 56K modem (56 Kbps) could handle about 0.5 hits/sec
• T1 line (1.5 Mbps) could handle about 13 hits/sec
• T3 (45 Mbps) could handle about 400 hits/sec
• OC3 (155 Mbps) could handle about 1380 hits/sec
– Bandwidth sizing should be adjusted based on your actual request frequency
and size
• Assume peaks at triple the average loads
– Also watch out for collisions and overloading of routers, switches, hubs and
NICs on the network
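The sizing arithmetic above is link speed divided by the size of an average response in bits. A minimal sketch follows, assuming the 14,000-byte average and the triple-peak rule from the text; it ignores protocol overhead, and the function names are hypothetical.

```python
def hits_per_second(link_bits_per_second, avg_file_bytes=14_000):
    """Upper bound on responses per second a link can carry."""
    return link_bits_per_second / (avg_file_bytes * 8)


def required_bandwidth(avg_hits_per_second, avg_file_bytes=14_000, peak_factor=3):
    """Bits per second needed to carry peak load at peak_factor times average."""
    return avg_hits_per_second * peak_factor * avg_file_bytes * 8
```

For example, hits_per_second(56_000) gives 0.5, matching the modem figure, and a site averaging 10 hits/sec would need required_bandwidth(10) = 3,360,000 bits per second, a little over two T1 lines, to absorb a triple peak.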
Making a site available by domain name requires its registration and use of DNS
– A domain name can be registered with many different registrars
– During registration, a DNS server is designated to maintain the domain’s DNS
records
– These records propagate to other DNS servers
– DNS servers use them to resolve a domain such as www.port80software.com
to a four-octet IP address such as 66.45.42.237
– ISPs offer DNS services; you can also maintain your own or use a third-party
service that lets you manage the records without running a DNS box
• You should learn to use nslookup to verify your DNS lookups are
working and troubleshoot DNS problems
• It is a command-line utility, also built into network analyzers such as the free
ieHTTPHeaders
– C:\>nslookup google.com
• You can also point nslookup at specific DNS servers to test their ability
to resolve
– C:\>nslookup
– >Server 206.13.30.12
– >google.com
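The same kind of lookup can be made from a script. The sketch below uses the system resolver through Python's socket module; unlike nslookup it cannot be pointed at a specific DNS server with the standard library alone. The function names are hypothetical.

```python
import socket


def resolve(hostname):
    """Resolve a hostname to an IPv4 address string via the system resolver."""
    return socket.gethostbyname(hostname)


def octets(ipv4_address):
    """Split a dotted-quad address into its four integer octets."""
    parts = [int(part) for part in ipv4_address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a four-octet IPv4 address: {ipv4_address!r}")
    return parts
```

Calling resolve("google.com") needs a working network and DNS configuration, and the address it returns will vary. A failed lookup raises socket.gaierror, which is a quick signal that resolution is not working.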
Think of a site as having not one structure but two – virtual and physical
– Virtual structure is described by the URLs used to request resources
from the site
• This is the public view of the site – the site as visitors will see it
when they browse to it
– Physical structure is the organization of the files and directories in the
file system on the host machine’s hard disk
• This is the private view of the site seen only by you and those
users you choose to give access
– It will become obvious why this distinction is necessary to keep
things straight
Notice how the hostname portion of the URL maps to the same place pointed to
by the physical path that lies to the left of the “/” representing the
document root
– The URL is virtual to the left of the document root, but it seems to be
physical to the right of the document root
– In fact, a URL is purely virtual – there is no guarantee that the path to
the right of the document root looks this way on disk
– In this simple case, virtual and physical paths happen to coincide from
the document root down, but such is not always the case
• A virtual directory or alias in the URL path preempts the lookup in the document
root
• This extends the virtual structure to the right of (or “below”) the root “/” in the URL
path
– http://www.foo.com/virtual/index2.html
– /htdocs/physical/index2.html
• Here the virtual directory “virtual” points to a physical directory that is outside of the
document root altogether
• Nested virtual directories are also possible
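A rough sketch of how a server might translate a URL path into a physical path, checking aliases before falling back to the document root. The document root /htdocs/www is a hypothetical value (the slides do not name one); the alias comes from the example above. Real servers do considerably more checking than this.

```python
import posixpath

DOCUMENT_ROOT = "/htdocs/www"               # hypothetical document root
ALIASES = {"/virtual": "/htdocs/physical"}  # alias from the example above


def to_physical(url_path, document_root=DOCUMENT_ROOT, aliases=ALIASES):
    """Map a URL path to a physical path, letting aliases preempt the root."""
    # Normalize first so ".." segments cannot climb above the mapped directory.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    # Try the longest alias first so nested virtual directories win.
    for prefix in sorted(aliases, key=len, reverse=True):
        if clean == prefix or clean.startswith(prefix + "/"):
            return aliases[prefix] + clean[len(prefix):]
    return document_root + ("" if clean == "/" else clean)
```

Here to_physical("/virtual/index2.html") yields /htdocs/physical/index2.html, while a path with no matching alias, such as /index.html, resolves under the document root.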
• You can (and should) take advantage of this virtual/physical distinction to:
– Preserve the site’s URL scheme even if the physical structure has to
change
• Avoids broken links due to site expansion/revision
– Manage directory and file locations in ways that minimize security risks
and facilitate backup procedures
– Reduce redundant physical directories for supporting files
– Allow developers to keep relative URLs in source code simple
Virtual Hosting
• We know the hostname part of the URL is a virtual locator for files that live
(physically) in a site’s document root
• The idea of virtual hosting takes this a step further by allowing a single
server to host many domains, each with its own document root
• Two methods of virtual hosting
– Old way: multiple IP addresses per server
– New way: name-based using host headers
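Name-based virtual hosting comes down to choosing a document root from the Host request header. A minimal sketch follows; the host names, paths and default root are all hypothetical, and IPv6 literals in the Host header are not handled.

```python
VIRTUAL_HOSTS = {  # hypothetical host-to-document-root table
    "www.foo.com": "/htdocs/foo",
    "www.bar.com": "/htdocs/bar",
}
DEFAULT_ROOT = "/htdocs/default"  # used when the Host header matches nothing


def document_root_for(host_header, hosts=VIRTUAL_HOSTS, default=DEFAULT_ROOT):
    """Pick the document root for a request from its Host header value."""
    # Host may carry an optional port; hostnames compare case-insensitively.
    hostname = host_header.strip().lower().split(":")[0]
    return hosts.get(hostname, default)
```

Two requests arriving at the same IP address and port can therefore be served from different document roots purely on the basis of the Host header.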
• Users (developers) will need remote access allowing them to transfer files to and
from the site’s physical structure
• FTP (and other file transfer mechanisms) allow the administrator to restrict this
access
– to sub-sections of the site
– by user account or client IP
• These restrictions should be backed up by access control lists on the directories
that enforce the “principle of least access”
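Restricting access by client IP can be sketched with Python's ipaddress module. The allowed network below is a documentation-only range chosen for illustration, and the function name is hypothetical; a real FTP or Web server applies such rules through its own configuration.

```python
import ipaddress

# Hypothetical allow list; 192.0.2.0/24 is reserved for documentation.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def client_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)
```

Anything not explicitly listed is refused, which is the default-deny posture that least access implies.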
• Similar rules apply to managing access to the Web site itself by visitors
– ACLs in the Web site’s physical file structure should be set to the minimum
required by the Web server to serve the resources on the site
• This gets tricky with server side programming
– If the Web site (or part of it) does not need to be available for anonymous
access from everywhere then users, groups, hosts and IPs should be
restricted
– HTTP Authentication can also be employed to make all or part of a site
private and require login
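As one illustration of HTTP Authentication, the Basic scheme sends the user name and password base64-encoded in an Authorization header. The sketch below shows only that encoding and decoding step; the function names are hypothetical, and the slides do not say which scheme is intended. Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can see the request.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic header value, or None."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and undecodable bytes
        return None
    username, separator, password = decoded.partition(":")
    return (username, password) if separator else None
```

A server would compare the parsed credentials against its user store and answer 401 when they are missing or wrong.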