Developing Internet-based apps means discarding many traditional concepts of client/server design. You have to pay attention to scalability and concurrency issues that never came into play before.
A friend confessed to us recently that he felt unsure about joining the ranks of the Web application industry. His strong C++/MFC skills gave him a sense of job security and self-worth that, in his mind, would be lost by becoming yet another HTML programmer. He was right to assume that simply adding HTML to his resume wouldn't make him invaluable overnight. Anyone can create an HTML page these days, right? Just ask your marketing department. But not everyone can create interactive Web applications that improve the profitability of your company. Even more difficult is creating Web applications that can scale over time without requiring a rewrite of a single line of code. Web application developers who understand how to build these complex systems are indeed invaluable in today's Web-driven market.
If you're a Web application developer or trying to become one, it's imperative to understand the design considerations you'll be facing. In this article, we'll begin by reviewing the typical client/server design strategies used today and explain why they can't be adapted to the Web model. As you read this article, you'll learn how you can start thinking in terms of Web applications. In the process, you may have to set aside many of the design principles (and habits) you've learned over time because they simply don't work on the Web.
In addition to covering the design limitations of the Web model, we'll define the concepts of a user session and session state in a Web application and explain how they can influence the overall performance and scalability of your system. We'll tackle one of the toughest design questions every Web application developer faces: where should I store session state?
Next, we'll turn to a new Web application design pattern that we call the one-page Web application. While this innovative design pattern can be implemented on most browser flavors (if the developers are creative enough, that is), it becomes trivial with the new technologies built into Microsoft Internet Explorer 5.0. You'll see how XML along with new features in Internet Explorer 5.0 can help you achieve your Web application design goals.
Typical Client/Server Application Design
Whether you come from a C++, MFC, or Visual Basic® background, you have probably dealt with client/server applications. If you've ever written a simple database front end or even a more complex multitier system, you've dealt with the client/server arena. Regardless of the project complexity, most client/server applications share certain design characteristics. First and foremost, typical client/server applications have a limited number of users. At design time, the maximum number of concurrent users that your system must support is given as a requirement. From that point on, most of the remaining design decisions are made around that assumption. Knowing how many people will be using your application concurrently can help tremendously as you make other design decisions along the way. But, as you design your system around the given requirement, you usually end up with a system that
provides exactly that level of functionality and nothing more. If in six months the bigwigs in your company decide that the system should support twice as many users, more often than not you're back to the drawing board, or at least stuck rewriting substantial amounts of code. Most client/server applications also operate in a very controlled environment. Typically, developers are writing the code for both the client and the server. These same developers can usually dictate to a great extent such things as the required operating system, database, or any other system dependencies for both the client and server machines. Having this kind of control over your system's runtime environment reduces the complexity of the system and makes developers happy. However, arming developers with this kind of information at design time can lead to systems that are difficult to port to new environments. Another interesting client/server design concept is that of user connections. In most cases, when the user launches the client application, a connection is established with the server and maintained until the client application terminates. Because these connections are expensive and time-consuming, it's usually better to make the connection once and to share it among processes for the lifetime of the client application. Since the server maintains one connection per client, this model provides a simple mechanism for tracking user sessions. When the server picks up a new connection, it knows that a new user has begun using the system. As the user interacts with the system, the server can associate session-related data (state) with the user's connection or session ID. Once the client connection goes away, the user's session ends and the corresponding session state is released from server-allocated resources. Not all client/server applications are designed this way. These are just some of the typical client/server design characteristics that are common today. 
Chances are you're used to thinking about applications as we've just described.
Unlike client/server applications, which have a limited number of users, Web applications have a potentially unlimited number of users. When you initially design the Web application, you may only expect 5,000 hits per month. But that 5,000 can turn into 500,000 virtually overnight. Most companies underestimate the potential of their applications during the design phase. In client/server scenarios, the company usually has control over the number of clients accessing the system along with the growth of the client base. This allows the company to plan for scaling the system. On the Web, you need to be prepared to scale from the very beginning. Having to completely redesign your Web application after three months on the market would be devastating to your company's strategic plans (and probably your career). In short, Web application developers must think in terms of unlimited users and, therefore, scalability. This doesn't mean that your Web application has to handle 500,000 hits per day right out of the gate. It simply means that your Web application needs to be designed with scalability in mind so that when you cross certain thresholds, you can increase throughput by adding additional hardware to your Web farm without changing any code. This is probably the most important thing to get right in your Web application design. We'll be discussing this concept in more detail throughout the rest of the article.
Most Web applications, with the exception of intranet applications, also have very little control over their clients' runtime environment. Users may try to use your Web application with a wide variety of browsers.
Since you don't have any control over the client's runtime environment, you must decide early in the design phase which browsers your application will support. If you decide to support all of them, you'll be incredibly limited in client functionality. Most companies today decide to support certain versions of Internet Explorer and Netscape Navigator. This strategy makes development less complex and less restrictive. In this situation, if a user tries to use your Web application with an unsupported browser, the Web application should advise the user that things might not behave properly and advise them to update their browser. This cross-browser compatibility issue is a major frustration for most Web application developers. They see new technologies surfacing in newer browser versions—like Dynamic HTML (DHTML), Cascading Style Sheets (CSS), XML, and HTML behaviors—that would allow them to create very sophisticated Web applications. But because of corporate policy, they must continue to support every known browser. Many developers in this situation will develop parallel versions of their Web application for the different browser versions and take care of browser identification and redirection on the server (see Figure 1). This approach allows developers to take advantage of the newest technologies without affecting down-level clients. Unfortunately, this also requires the simultaneous maintenance of multiple application versions.
Figure 1: Different Versions for Different Browsers
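The server-side browser identification described above might be sketched as follows. This is only an illustration: the function name pickSiteVersion, the three site paths, and the version cutoffs are assumptions made for the example, not part of any product. It relies on the fact that Internet Explorer reports "MSIE" plus a version number in its User-Agent header, while Netscape Navigator 4.x reports "Mozilla/4.x" without the word "compatible".

```javascript
// Hypothetical entry pages for three parallel versions of the application.
var SITE_VERSIONS = {
  rich: "/ie5/default.asp",      // assumes XML and DHTML support
  standard: "/dhtml/default.asp", // assumes a 4.x-generation browser
  basic: "/basic/default.asp"     // plain HTML for everything else
};

// Returns the entry page to redirect to for a given User-Agent header value.
function pickSiteVersion(userAgent) {
  var ua = userAgent || "";

  // Internet Explorer identifies itself as "... MSIE <major>.<minor> ...".
  var msie = /MSIE (\d+)\.(\d+)/.exec(ua);
  if (msie) {
    var major = parseInt(msie[1], 10);
    if (major >= 5) return SITE_VERSIONS.rich;
    if (major === 4) return SITE_VERSIONS.standard;
    return SITE_VERSIONS.basic;
  }

  // Netscape Navigator 4.x sends "Mozilla/4.x" without the "compatible" token.
  var nav = /^Mozilla\/(\d+)\./.exec(ua);
  if (nav && ua.indexOf("compatible") === -1 && parseInt(nav[1], 10) >= 4) {
    return SITE_VERSIONS.standard;
  }

  // Unknown or down-level browser: fall back to the most conservative version.
  return SITE_VERSIONS.basic;
}
```

An ASP page could call a function like this with the value of the HTTP_USER_AGENT server variable and redirect to the result. Every browser family added to the table is another version of the application to maintain, which is exactly the cost described above.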
A typical client/server application supports the notion of a physical user connection. The server uses this one-to-one mapping between users and connections to track user sessions and session-related state. Web applications must behave differently. In fact, this basic concept of user connections is what makes Web applications fundamentally different from typical client/server applications. The reason Web applications can't use the client/server model for tracking user sessions is related to the Web's underlying protocols. The ubiquitous protocol on the Web today is HTTP.
To help understand how HTTP works, let's look at a typical HTTP request. When the user points the browser to a given Web site, the browser first establishes a TCP connection with the corresponding Web server. Once the connection is established, the browser sends an HTTP request using the existing channel. The Web server then processes the request, sends the response back to the client, and closes the TCP connection. This same process is used for every HTTP request made by the browser. For example, consider a Web page with 20 embedded images. To retrieve this page, the browser has to make an HTTP request for the HTML page itself, and additional HTTP requests for each of the 20 embedded images. As you can see, this protocol is a far cry from the client/server model where the connection is made once and used for the lifetime of the session. With HTTP, the user connects and disconnects on every request.
HTTP/1.1, which isn't supported by all browsers or Web servers, offers a new mechanism called persistent connections to optimize the connection process. With persistent connections, the server maintains the connection with the client for a period of time so the client can reuse the connection for subsequent requests.
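The arithmetic behind the 20-image example can be made concrete with a short sketch. The function name connectionsNeeded is hypothetical, and the model is deliberately simplified: it assumes one persistent connection that is reused serially and never times out, whereas real browsers typically open several connections in parallel.

```javascript
// Counts the HTTP requests and TCP connections needed to load one page.
// embeddedResources: number of images (or other files) the page refers to.
// persistent: true if the connection is kept open and reused (HTTP/1.1 style).
function connectionsNeeded(embeddedResources, persistent) {
  // One request for the HTML page itself, plus one per embedded resource.
  var requests = 1 + embeddedResources;

  // Without persistent connections, every request opens and closes its own
  // TCP connection. With them, this simplified model reuses a single one.
  var connections = persistent ? 1 : requests;

  return { requests: requests, connections: connections };
}
```

For the page with 20 embedded images, connectionsNeeded(20, false) reports 21 requests over 21 connections, while connectionsNeeded(20, true) reports the same 21 requests over a single connection.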
Session, State, and Security
Since HTTP/1.0 doesn't support the concept of persistent connections, there isn't a straightforward way to keep track of user sessions. HTTP is truly a stateless protocol. In other words, the HTTP server doesn't remember anything about previous HTTP requests. If a user sends an HTTP request and then sends another one a few minutes later, the HTTP server behaves as if it's the user's first request. Owing to the connectionless and sessionless nature of HTTP, Web application developers are required to implement higher-level session management. The browser and the server must agree on a mechanism for identifying users on a connection-to-connection basis. Once you have a mechanism for identifying users across connections, you can start associating session state with a given user. In the client/server model the session state is typically stored on the server. While this is by far the easiest solution to implement, you'll see that storing session state on your Web server can severely impair scalability. So where should you store session state in your Web application? This is one of the hardest questions that every Web application designer must face.
Another complex consideration is security. Today, everyone is concerned with security. If you visit an online shopping site and don't see the little yellow lock show up in the browser's status bar, you go elsewhere. If you've ever tinkered with network sniffers like NetMon, you know how easy it is to intercept data submitted as clear text. To make things even more complicated, developers must also deal with the possibility that their Web application or its clients will live behind a firewall. Firewalls block potentially malicious traffic from reaching your corporate network. They do so by rejecting attempts to establish TCP/IP connections on unsecured ports. Remote procedure call protocols such as DCOM attempt to establish connections on arbitrary TCP/IP port ranges.
Unfortunately, since firewalls have no way to know whether these ports are secured, and because they generally have no knowledge of the port negotiation protocol used by DCOM, they simply prevent this traffic from passing through. But there is a protocol that generally passes unimpeded across firewalls: HTTP, which communicates over the well-established port 80. We will discuss ways in which you can invoke code on Web servers by passing appropriate messages using HTTP. As you can see, designing a Web application is very different from designing a typical client/server application. Considerations you must address early include tracking user sessions and deciding where to store session state.
Figure 2: Passing Session IDs
The server can pass session identifiers by using a hidden form field or by embedding them within every relative link (see Figure 3). Like the pass-by-value approach, this also works on all browsers, but is much less cumbersome than the complete pass-by-value technique. The only thing you would pass as part of every HTTP request is the session ID. The server uses the session ID to look up the user's session data. This approach is commonly used today by big Web applications that have to support a wide range of Web browsers. For an example, point your browser to Amazon's Web
So when does a Web session end? In a client/server application, the session ends when the user closes the client application. In the Web environment, it isn't clear when a session should end. Should it end when the user browses off the page? Or when the user closes the browser? What if the user leaves the page open for days without doing any work? Should the server keep resources allocated for that lazy client? The standard answer for Web sessions is a timeout period. For example, if 20 minutes go by without any activity from a given client, the server terminates the client's session and reclaims its resources. In fact, ASP's built-in session management uses a default timeout period of 20 minutes, but this value is configurable. If you're using one of the manual techniques described previously, you're responsible for determining when a session should timeout (possibly by setting the cookie's expiration date). Transmitting session information through a pass-by-reference technique is becoming the standard in most new Web application development projects because of its obvious benefits. When you pass by reference, you must first decide where to store the session data. Where should you store session state in your Web application? There is no single correct answer to this question. When it comes to Web applications, there is no magic bullet that will solve all of your problems. There are too many variables to consider from one project to another, and we definitely believe in using the right design for a given project. Nevertheless, there are certain Web application principles that will help guide you toward making the right decision for your project. But you must first understand state durability and the state's scope.
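To make the pass-by-reference idea and the timeout rule concrete, here is a minimal sketch of a session table keyed by session ID. The names SessionStore, appendSessionId, and the sid query parameter are assumptions invented for this example; the 20-minute default simply mirrors the ASP default mentioned above. The clock is passed in as a number of milliseconds so the timeout behavior is easy to follow.

```javascript
var SESSION_TIMEOUT_MS = 20 * 60 * 1000; // 20 minutes of inactivity

// Holds session data on the server, keyed by session ID (hypothetical helper).
function SessionStore(timeoutMs) {
  this.timeoutMs = timeoutMs || SESSION_TIMEOUT_MS;
  this.sessions = {}; // session ID -> { data: {...}, lastSeen: ms }
  this.nextId = 1;
}

// Starts a session and returns its ID. The sequential ID keeps the example
// readable; a real system would need an ID that cannot be guessed.
SessionStore.prototype.create = function (now) {
  var id = "s" + this.nextId++;
  this.sessions[id] = { data: {}, lastSeen: now };
  return id;
};

// Returns the session data for an ID, or null if the session is unknown
// or has been idle for longer than the timeout period.
SessionStore.prototype.lookup = function (id, now) {
  var entry = this.sessions.hasOwnProperty(id) ? this.sessions[id] : null;
  if (!entry) return null;
  if (now - entry.lastSeen > this.timeoutMs) {
    delete this.sessions[id]; // reclaim the resources held for the idle client
    return null;
  }
  entry.lastSeen = now; // any activity restarts the idle clock
  return entry.data;
};

// Embeds the session ID in a relative link so that it comes back to the
// server as part of the next request.
function appendSessionId(url, id) {
  var separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + "sid=" + encodeURIComponent(id);
}
```

Only the ID travels with each request; everything else stays wherever the store keeps it. Note that this sketch holds the table in the server's memory, which is only one of the storage options weighed in the sections that follow.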
Durability and Scope
Most Web applications contain both durable and nondurable state. Durable state is persisted to disk in order to survive system failures. Most durable state is stored in some type of relational database (such as SQL Server™) for optimal storage. Durable state is obviously more robust, but that robustness comes with a performance cost. Accessing and updating durable state requires more CPU cycles along with a data-locking scheme (usually provided by the DBMS). Nondurable state does not survive system failures. Session state stored in memory on the Web server is an example. If the Web server goes down, the current state goes with it. Nondurable state offers much better performance at the cost of robustness. Most Web applications today use a combination of both durable and nondurable state. Deciding when to use one over the other can obviously have an impact on system performance and reliability. As a rule of thumb, if the data is not critical or can be regenerated at minimal cost, use nondurable state. For all system-critical data, like user passwords, use durable state.
Web application state can also exist at four different scopes: page scope, session scope, application scope, and external scope.
Page-scoped state lives in the currently loaded Web page and exists only for the lifetime of the page. As soon as the user browses to a new page or closes the browser, the page-scoped state can no longer be accessed. The technique for passing session data by value (using hidden form fields and URLs) described earlier is a good example of page-scoped data. The data in the hidden form fields and URLs exists only for the lifetime of the page. Once a new page is requested from the server, the browser discards the state. This is why you need to pass that state back to the server as part of the request for the next page in the Web application.
Session-scoped data lives for the lifetime of the user's session. As long as the user's session is still active, the session-scoped data can be accessed. Once the user's session expires, the session-scoped data is removed from the system. Data that needs to be tied to a given user for the lifetime of the user's session should be stored at this scope.
Application-scoped data lives for the lifetime of the application. Data stored at this scope is global in nature and needs to be shared between all active sessions. As long as the Web application is running, any user can access data stored at application scope. Once the application terminates, the application-scoped data is removed from the system.
External-scoped data lives beyond the lifetime of the Web application. This type of data is usually managed by another application that is not dependent on the Web application (such as a relational DBMS). Data stored at this scope can be accessed across multiple Web applications.
Now that we've covered the nature of Web application state, let's focus on the different techniques for managing application and session-scoped state.
In-memory State on a Web Server
The easiest method for managing session and application-scoped state is to use the support provided by ASP. The ASP Application and Session intrinsic objects exist for this purpose. To store data at application scope within an ASP page, you simply use the following syntax:
Application("HitCount") = Application("HitCount") + 1
This would increment the application-wide hit count. Storing information at session scope is just as easy using ASP:
Session("StartTime") = Now
to think twice about using this strategy. In the Web farm scenario, storing data at application scope in memory on the Web server doesn't work; ASP pages running on other Web servers in the farm will not see each other's updates. You could still get away with storing session state in memory on the Web server, but now you've pinned the client to a single Web server for the lifetime of its session (see Figure 4). This completely destroys any type of dynamic load-balancing strategy you might try to implement within the Web farm. Plus, you'll probably need some expensive hardware (like the Cisco Local Director router) or software to accomplish the session pinning. In other words, you've created a very complicated mess.
Figure 4: Effect on In-memory State Storage
Another major downside to storing state in memory on the Web server has to do with fail-over. If the Web server goes down, the state is completely lost and all users in the middle of sessions lose their data. Plus, if a user is pinned to a server and that server goes down, he is stuck until it comes back online.
In-memory State on Another Server
Another approach is to store application and session-scoped data in-memory on a distinct server designed to manage data. When your ASP page needs to store session-related data for a given user, it sends the data to the server through a DCOM method invocation. This allows you to take advantage of dynamic load-balancing on your Web farm without pinning a user to a Web server. If one Web server goes down, the next time the client connects, he will get sent to the next available Web server (see Figure 5). This solution is going to be slower than local in-memory storage on the Web server, since you have a network round-trip each time you need to access or update data.
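The difference between pinning a user to one Web server and letting any server handle any request can be shown with a small simulation. Everything here is a stand-in made up for the example: StateServer plays the role of the dedicated state machine (reached over DCOM in the design described above, but just an in-process object here), and route is a trivial "next available server" picker.

```javascript
// Stand-in for the dedicated state server. In the design described above
// it would live on another machine; here it is only an in-memory object.
function StateServer() {
  this.state = {}; // session ID -> { key: value }
}
StateServer.prototype.get = function (sid, key) {
  var session = this.state[sid];
  return session ? session[key] : undefined;
};
StateServer.prototype.set = function (sid, key, value) {
  if (!this.state[sid]) this.state[sid] = {};
  this.state[sid][key] = value;
};

// A Web server that keeps no session state of its own.
function WebServer(name, stateServer) {
  this.name = name;
  this.up = true;
  this.stateServer = stateServer;
}
// Handles one request: read the cart, add an item, write the cart back.
WebServer.prototype.addToCart = function (sid, item) {
  var cart = this.stateServer.get(sid, "cart") || [];
  cart = cart.concat([item]);
  this.stateServer.set(sid, "cart", cart);
  return cart;
};

// Picks the next available Web server in the farm.
function route(servers) {
  for (var i = 0; i < servers.length; i++) {
    if (servers[i].up) return servers[i];
  }
  throw new Error("no Web server available");
}
```

If the first server handles a request and then goes down, the next request for the same session is routed to the second server, which still sees the earlier cart contents because neither server held them. The price is visible in addToCart: every request makes a read and a write against the state server, which in a real deployment means network round-trips.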
Figure 5: The "Next-available" Web Server Approach
Windows® 2000 will introduce a new technology called the In-Memory Database (IMDB). (Editor's note: IMDB technology has been discontinued. See the IMDB update page for more information.) This technology will, in theory, allow you to cache database tables in memory on all Web servers within a Web farm for lightning-fast access. The first release of IMDB, however, only works in single-node scenarios. IMDB acts as a write-through cache manager for a specific relational database. All updates to the IMDB tables must take place through the IMDB cache. To make this work in a Web farm scenario, an IMDB cache must exist on each Web server. Problems arise when you need to do updates on the database tables. If one machine updates a table that lives in an IMDB cache on another machine, the IMDB machine won't see the change. You need to keep the IMDB caches synchronized on all the Web servers in the farm. This cache synchronization mechanism is planned for the next release of IMDB. Today, you can use IMDB on a dedicated state server as long as all changes go through the IMDB cache (see Figure 5). As you can see, IMDB gives you the best of both durable and nondurable state.
Probably the most common approach today for storing application and session-scoped state in large-scale Web applications is to use a durable storage mechanism like a relational database. The database can reside on the Web server at first. Then, when the site needs to scale to a Web farm, it can be moved to a dedicated database server. Shopping cart applications that keep your purchase data around for days or even weeks are probably using this strategy. Although this strategy offers lower performance than the nondurable solutions, it allows your application to take advantage of dynamic load-balancing and fail-over strategies. Furthermore, using durable storage makes system failures much less of a problem.
Figure 6: Client-side Session State
The biggest problem with this approach is cross-browser compatibility. None of these techniques are universally supported across browsers, and most of them, with the exception of in-memory cookies, are supported by Internet Explorer 4.0 and later. You can take things a step further and store session state on the client's disk. You can accomplish this today using persistent cookies (cookies with an expiration date). Internet Explorer 5.0, however, offers a much more powerful and flexible solution with its new userData behavior. The userData behavior allows you to store state as XML on the client's disk. This allows you to add rich structural meaning to your data store, and it gets you around the standard 4KB cookie limit. Using this approach, you get all the same advantages described above, plus the ability to persist state across user sessions. If the user closes their browser and comes back to your application two days later, the state will still be on their disk and available for use. The downside, once again, is cross-browser compatibility.
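The distinction between an in-memory cookie and a persistent one comes down to whether an expiration date is supplied. A minimal sketch, assuming a hypothetical helper named buildCookie that produces the text of a Set-Cookie header value (the same text could be assigned to document.cookie from client script):

```javascript
// Builds the text of a cookie. With daysToLive of 0 the cookie has no
// expiration date, so the browser keeps it in memory only and discards it
// when the browser closes. With a positive daysToLive the browser writes
// it to disk and keeps it until the expiration date.
// "now" is a Date, passed in so the result is predictable.
function buildCookie(name, value, daysToLive, now) {
  var cookie = encodeURIComponent(name) + "=" + encodeURIComponent(value) + "; path=/";
  if (daysToLive > 0) {
    var millisecondsToLive = daysToLive * 24 * 60 * 60 * 1000;
    var expires = new Date(now.getTime() + millisecondsToLive);
    cookie += "; expires=" + expires.toUTCString();
  }
  return cookie;
}
```

Calling buildCookie("cart", "34.95", 2, new Date()) yields a cookie that survives for two days. Whatever is stored this way still has to fit within the cookie size limit mentioned above, which is the constraint the userData behavior is meant to lift.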
Do We Really Need Session State?
All of the state management techniques described here are being used today in Web applications. Some work better than others depending on the given project's requirements. Now let's stop and think about a fundamental question: why do you need these strategies for managing session state? The answer lies in page transitions. Each time a user interacts with the Web server, the browser loads and renders a new HTML document. Once the new document is loaded (as the result of a page transition), the data contained in the previous HTML page is no longer accessible. In other words, page-scoped state != session-scoped state. Since page-scoped state is not equivalent to session-scoped state, you need an additional mechanism (like the ones described earlier) for preserving state across page transitions. But what if it were possible to devise a design that changed the formula to page-scoped state == session-scoped state? If this worked, you wouldn't have to worry about managing session state outside the scope of a given page. You can make this formula hold true by avoiding page transitions within your Web application. Then any piece of data stored at page scope would also exist at session scope.
A One-page Web Application
Avoiding page transitions in your Web application gives you what we call the one-page Web application pattern. How on earth will a one-page Web application accomplish anything useful? We never said that the application couldn't interact with the Web server; it simply can't produce a standard browser page transition. While it's possible to produce a one-page Web application using some down-level browsers, it's never been easier than with Internet Explorer 5.0; it's as if Internet Explorer 5.0 was designed specifically for this type of Web application. The Internet Explorer 5.0 enhanced XML support combined with DHTML is what makes it possible. Internet Explorer 5.0 introduces the concept of an XML data island. Using an XML data island, you can contain an XML segment within your HTML document. Internet Explorer 5.0 will automatically parse the XML data and allow you to programmatically access the XML document object model (DOM) from script. The following is an example of an XML data island:
<HTML>
<BODY>
This is an HTML document
<XML ID=sessionData>
  <ORDERS>
    <ITEM ID="1">
      <NAME>Essential WinInet</NAME>
      <PRICE>34.95</PRICE>
    </ITEM>
  </ORDERS>
</XML>
</BODY>
</HTML>
XML data islands also have an src attribute that you can use to point to XML data on the Web server:
<HTML>
<BODY>
This is an HTML document
<XML ID=xmlData1 SRC="orders.xml"></XML>
<XML ID=xmlData2 SRC="getorders.asp?userid=2"></XML>
</BODY>
</HTML>
Notice that XML data islands can point to static XML files or even ASP files that generate XML output. Now for the interesting stuff: you can also dynamically change the src attributes of XML data islands from script.
<INPUT TYPE=button VALUE=User1 ONCLICK="return getXML(1)">
<INPUT TYPE=button VALUE=User2 ONCLICK="return getXML(2)">
</BODY>
</HTML>
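The buttons above call a script function named getXML. As a rough sketch of what such a function might look like (this is an illustration, not the authors' listing), the version below takes the data island as a second parameter and reuses the getorders.asp URL from the earlier sample. The extra parameter is an assumption made so the function does not depend on any particular element ID.

```javascript
// Points an XML data island at the orders for the given user.
// "island" is the <XML> element whose SRC should change. Assigning a new
// src makes the browser request the new data without leaving the page.
function getXML(userId, island) {
  island.src = "getorders.asp?userid=" + encodeURIComponent(userId);
  return false; // the button handlers return this value to the browser
}
```

With this signature, a button handler would read ONCLICK="return getXML(1, xmlData2)", where xmlData2 is the ID of a data island such as the one in the earlier sample.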
As you can see, XML data islands are a very convenient mechanism for requesting additional data from the server without forcing a page transition. Another new object in Internet Explorer 5.0, XMLHttpRequest, gives you direct access to the underlying HTTP protocol along with XML parsing support. For example, consider the following script (disclaimer: this only works with the final release of Internet Explorer 5.0):
This block of script, which can be executed in response to some user interaction, sends an HTTP request to the server and waits to receive the HTTP response. Once the script receives the response, it can use DHTML to update any scriptable element within the HTML page (see Figure 7).
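The request and response cycle just described might be sketched as follows. This is a minimal illustration rather than the authors' listing. The function name fetchXml and the createRequest parameter are assumptions; the factory is passed in so that the sketch does not depend on how the request object is obtained. In Internet Explorer 5.0 the factory could be function () { return new ActiveXObject("Microsoft.XMLHTTP"); }.

```javascript
// Sends an HTTP GET request and returns the parsed XML response.
// createRequest: a function that returns a new XMLHttpRequest-style object.
function fetchXml(url, createRequest) {
  var request = createRequest();

  // The third argument is the "async" flag. Passing false makes send()
  // block until the response has arrived, matching the wait described above.
  request.open("GET", url, false);
  request.send(null);

  if (request.status !== 200) {
    throw new Error("request for " + url + " failed with status " + request.status);
  }

  // responseXML holds the response already parsed into an XML DOM document.
  return request.responseXML;
}
```

Script running in response to a button click could call fetchXml, walk the returned document, and then use DHTML to update the page in place. A synchronous request keeps the example short, but it also freezes the page while waiting, so an asynchronous request may be the better choice when the server is slow.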
Figure 7: Updating Session State
Internet Explorer 5.0 also has excellent support for the Extensible Stylesheet Language (XSL). This allows you to store most of your page-scoped data as XML. You can then use XSL to transform the XML data into HTML: use XML to represent page-scoped data and XSL to describe how that data should look to the user. When you need to persist your XML data across user sessions, once again you can save it to disk using the Internet Explorer 5.0 intrinsic userData behavior.
As you can see, Internet Explorer 5.0 and its extensive support for XML and out-of-band requests make the one-page Web application very feasible. Imagine being able to save all session-scoped state within a single Web page, having programmatic access to the data via a standard DOM, and being able to easily persist the session data to disk using XML. This gives you better performance, scalability, and simplicity than any of the other strategies we've discussed here.
Techniques for Avoiding Page Transitions
To make the one-page Web application pattern work, you must avoid page transitions in your Web application. To do so, you must eschew using the standard behavior for form actions and anchor elements. In other words, you'll have to implement custom actions in script and override the element's default behavior. Your script can use DHTML along with XML data islands or the XMLHttpRequest object to make HTTP requests that don't initiate page transitions:
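As a minimal sketch of this technique (not the authors' listing), the two helpers below replace the default behavior of a form and of an anchor with script. The names interceptSubmit and interceptClick are hypothetical, and the sendRequest and action callbacks are placeholders for whatever makes the actual HTTP request. The key detail is that an event handler that returns false cancels the element's default action.

```javascript
// Replaces a form's default submit, which would load a new page, with a
// custom action implemented in script.
function interceptSubmit(form, sendRequest) {
  form.onsubmit = function () {
    // Collect the named fields, as the browser would for a normal submit.
    var fields = {};
    for (var i = 0; i < form.elements.length; i++) {
      var element = form.elements[i];
      if (element.name) fields[element.name] = element.value;
    }
    sendRequest(fields); // for example, through a data island or XMLHttpRequest
    return false;        // cancel the default action: no page transition
  };
}

// Does the same for an anchor: a click runs script instead of navigating.
function interceptClick(anchor, action) {
  anchor.onclick = function () {
    action(anchor.href);
    return false; // cancel the navigation
  };
}
```

Because the default actions never run, the page and everything stored in it stay loaded while the callbacks talk to the server.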
To make the user feel like there was a page transition (and that something actually happened in response to the user's interaction), you can use DHTML to hide, show, or update elements on the page. We should also point out that you could accomplish this type of functionality on down-level browsers using more traditional technologies like Java applets or ActiveX® controls. The downside is that they're more difficult to implement and require downloading binary images. Plus, if you want to take advantage of XML on these down-level systems, you'll have to implement most of the XML functionality yourself. (See Ken Spencer's Beyond the Browser column in this issue for a discussion of how to do this in Visual Basic.)
A New Role for ASP
When you start thinking about the one-page Web application pattern, ASP pages should seem more like remote functions for generating XML data. ASP pages are definitely not limited to returning HTML; they can return any type of data that makes sense to your Web application client, such as XML. Web application developers using this pattern will benefit from creating reusable ASP-compatible COM objects to handle XML generation. More and more database vendors are adding native XML support to their systems. Even today, you can use ADO 2.1 to persist a recordset object to XML. As more tool vendors add XML support to their portfolios, Web applications designed around XML will continue to benefit.
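To illustrate the idea of a page acting as a function that returns XML, here is a small sketch of the generation step. The helper names escapeXml and ordersToXml are assumptions made for the example, and the element names follow the data island sample shown earlier. A real page would read the items from a database rather than receive them as an array.

```javascript
// Escapes the characters that have special meaning in XML text and
// in double-quoted attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turns an array of { id, name, price } records into an <ORDERS> document.
function ordersToXml(items) {
  var xml = "<ORDERS>";
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    xml += '<ITEM ID="' + escapeXml(item.id) + '">' +
           "<NAME>" + escapeXml(item.name) + "</NAME>" +
           "<PRICE>" + escapeXml(item.price) + "</PRICE>" +
           "</ITEM>";
  }
  return xml + "</ORDERS>";
}
```

An ASP page written in JScript could set Response.ContentType to "text/xml" and write the result of a function like this to the response, so that a data island or XMLHttpRequest call on the client receives XML instead of HTML.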
Developing powerful and scalable Web applications is not easy; there is much more to Web application design than deciding how to lay out your HTML elements. Hopefully, you now understand the fundamental differences between designing a typical client/server application and a highly scalable Web application. You should also appreciate how session and state management influence the overall performance and scalability of your system. One thing we'd like to emphasize is that there is not a single design that works for all projects. But given a particular project's requirements, there is probably a design strategy that will give you optimal performance and scalability. The concepts presented here should help you feel more prepared to discover the best design for your project. We introduced a new design pattern called the one-page Web application, which offers a simplified state management strategy that doesn't compromise scalability or performance. This design allows your Web servers to support more concurrent users because it's capable of leveraging the client's system resources. Internet Explorer 5.0 makes it possible to implement such a radical new design. Due to space constraints, we've barely touched upon the many implementation details that you're probably craving right about now. Not to worry; in a future article we'll present a comprehensive sample application that brings the one-page Web application to life.