Live Monitoring: Using Adaptive Instrumentation and Analysis to Debug and Maintain Web Applications

The recent crop of Web-based applications, known loosely and collectively as Web 2.0, poses a whole new set of challenges for software developers. One person’s challenge, though, can prove another person’s opportunity, and Emre Kıcıman believes that, ultimately, Web 2.0 technology will lead to a fundamental change in how developers analyze and understand their applications’ performance.

Kıcıman, a researcher in the Cybersecurity and Systems Management group at Microsoft Research Redmond, focuses his work on the reliability of Web services, and he is taking advantage of the capabilities of Ajax—Asynchronous JavaScript and XML—to provide insight into a realm that has long been elusive for software developers: What happens when a Web application meets a user’s specific actions, within the unique combination of hardware, software, and network that constitutes an individual user’s computing environment? How do you know how well things are working at the last hop?

These days, with a dizzying array of software available, individual computers can seem almost as unique as their owners. How can a developer possibly plan for all contingencies amid such diversity? Kıcıman has an idea—and a research project, called Ajax View, to examine it. The project is based on the very model that is fueling the Web 2.0 generation itself: dynamic HTML and JavaScript.

“The goal of the Ajax View project,” he says, “is to improve the visibility that Web-application developers have into how their applications are running inside end users’ browsers out in the real world. Having detailed, code-level monitoring can help developers discover, understand, and fix the bugs that are affecting real users.”

In a paper co-written with Microsoft Research colleague Helen Wang, entitled Live Monitoring: Using Adaptive Instrumentation and Analysis to Debug and Maintain Web Applications, Kıcıman explains the concept underlying Ajax View. Basically, it involves inserting a proxy between a Web application and a user’s browser. The proxy dynamically rewrites the application’s code, injecting instrumentation that reports observations about the application’s behavior in the wild back to the Web service, enabling the developer to improve the code as necessary.

“With the Web 2.0 model,” he explains, “you have much more dynamic code and content being sent out to the browsers. Is it fast? Is it slow? Is it failing? You don’t know until your users complain. A lot of the challenges of code complexity when you start to write large programs—trying to run your programs across heterogeneous environments, different browsers, different types of computers, as well as dependencies on third-party services and software that’s not under your control—those issues are all cropping up in Web applications, just as they have with conventional software.”
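
The article describes the proxy only in outline, so a toy sketch may help convey the mechanics. The following is a minimal illustration, not the actual Ajax View implementation: a small Node.js proxy that forwards each request to the real server and prepends a hypothetical reportEvent helper to any JavaScript it serves. The backend address and the /log endpoint are invented for the example.

    // Minimal sketch of an instrumentation-injecting proxy (illustrative
    // only; not the actual Ajax View implementation).
    const http = require('http');

    // Where the real Web service lives (placeholder host for illustration).
    const BACKEND = { host: 'app.example.com', port: 80 };

    // Hypothetical reporting helper prepended to every JavaScript response.
    const INSTRUMENTATION =
      "function reportEvent(name, data) {\n" +
      "  var xhr = new XMLHttpRequest();         // fire-and-forget beacon\n" +
      "  xhr.open('POST', '/log', true);         // hypothetical logging endpoint\n" +
      "  xhr.send(JSON.stringify({ name: name, data: data, ts: Date.now() }));\n" +
      "}\n";

    http.createServer(function (clientReq, clientRes) {
      // Forward the browser's request to the real server. (A production
      // proxy would also rewrite the Host header and handle errors.)
      const proxyReq = http.request({
        host: BACKEND.host,
        port: BACKEND.port,
        path: clientReq.url,
        method: clientReq.method,
        headers: clientReq.headers
      }, function (backendRes) {
        const type = backendRes.headers['content-type'] || '';
        let body = '';
        backendRes.setEncoding('utf8');
        backendRes.on('data', function (chunk) { body += chunk; });
        backendRes.on('end', function () {
          // A real rewriter would parse and transform the code; this
          // sketch just prepends the helper to JavaScript responses.
          if (type.indexOf('javascript') !== -1) body = INSTRUMENTATION + body;
          const headers = Object.assign({}, backendRes.headers);
          delete headers['content-length'];   // length changed by rewriting
          clientRes.writeHead(backendRes.statusCode, headers);
          clientRes.end(body);
        });
      });
      clientReq.pipe(proxyReq);
    }).listen(8080);

A production rewriter would parse and transform the code selectively rather than blindly prepending a helper, but the overall shape—intercept the response, rewrite it, serve it—is what the quote above describes.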


One person’s challenge …

“There’s something new about the Web environment that gives us an opportunity,” he says. “Now, we can get the visibility, because in the Web-application world, whenever users want to run your software, they check with you and ask, ‘Do you have a new version for me to download?’ And you can say, ‘Yes, here’s a new version.’ That means if you want to instrument your code, you can have very dynamic control over what you’re monitoring at any given moment in order to optimize.”

The instrumentation code injected by the proxy delivers a host of advantages. “This rewriting,” Kıcıman states, “gives you visibility into the last hop of the user experience. The instrumentation code runs with the rest of your application inside the browser and can see almost any part of the application’s behavior. You can start checking for assertions and all the things that programmers care about. You can even start checking for memory leaks.”

Beyond that, the technique, which Kıcıman has been refining in collaboration with Ben Livshits, a researcher in the Runtime Analysis and Design group, provides a couple of valuable options.

“One,” Kıcıman says, “we don’t have to give the same instrumentation out to every user. If we have a very heavyweight instrumentation policy that’s going to gather lots and lots of data, not everyone has to pay the cost of running the whole instrumentation. We could split this into pieces and have everyone run just a small piece of the instrumentation policy. This lets us collect all the data we care about in aggregate, but no one user has to pay a large cost.

“The second thing we get to do is adapt our instrumentation over time, so if we ask some questions about the application’s behavior and learn about a problem that’s going on in the code, we can then turn around and immediately start asking new questions about that particular problem. We can drill down into issues.”
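
In deliberately simplified form, injected instrumentation of this kind might look like the browser-side sketch below. The wrapInstrumented helper and the function names are invented, and reportEvent is the hypothetical reporter from the proxy sketch above; this illustrates the idea, not Ajax View’s actual instrumentation.

    // Toy browser-side instrumentation (hypothetical helper names).
    // wrapInstrumented replaces a function with a version that times
    // each call, observes thrown errors, and reports both via the
    // reportEvent helper sketched earlier.
    function wrapInstrumented(obj, fnName) {
      var original = obj[fnName];
      obj[fnName] = function () {
        var start = Date.now();
        try {
          return original.apply(this, arguments);
        } catch (err) {
          reportEvent('error', { fn: fnName, message: String(err) });
          throw err;                 // observe the failure, don't swallow it
        } finally {
          reportEvent('timing', { fn: fnName, ms: Date.now() - start });
        }
      };
    }

    // Splitting the policy: each visitor instruments one randomly chosen
    // function, so the service sees every function in aggregate while no
    // single user pays the full monitoring cost. (Invented function names.)
    var policy = ['renderInbox', 'fetchMessages', 'applyFilters'];
    wrapInstrumented(window, policy[Math.floor(Math.random() * policy.length)]);

Because each visitor wraps only one randomly chosen function, the heavyweight policy’s cost is spread across the whole user population, exactly as Kıcıman describes.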


The process boils down to three steps as precursors to improving Web applications: instrumentation, observation, and analysis. As always, stating the question accurately is critical to the quality of the answers you receive.

“The first piece is determining how you’re going to gather the data you care about,” Kıcıman says. “You decide what data you want to collect and then how you can grab that from inside the JavaScript environment. There are limitations inside the browser, so you can’t get full knowledge of everything. You’re limited by the security model of the JavaScript sandbox.

“The second thing is you determine what’s going to get reported back about this data, and then you figure out how to distribute it. Does everyone have to run the whole policy at once, or can it be split up?

“Then there’s the question of adaptation. When do I want to turn on this instrumentation? Do I want it always running, or do I want it reacting to a particular issue? Will you turn on part of the policy, get information about part of the program, and use that to follow a trail and turn on instrumentation in the second or third part?”
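
Those distribution and adaptation questions suggest service-side logic along the following lines. This is a hedged sketch with invented names and thresholds (onReport, policyForNextVisitor, the 500 ms cutoff), not the mechanism from the paper: monitoring starts broad and lightweight, and when enough reports flag one function as slow, new page loads receive a narrower, heavier policy that drills down.

    // Rough sketch of adaptive instrumentation on the service side
    // (invented names and thresholds, for illustration only).
    var broadPolicy  = { functions: ['renderInbox', 'fetchMessages'],
                         detail: 'timing' };
    var activePolicy = broadPolicy;
    var slowCounts   = {};             // function name -> slow reports seen

    // Called for every observation posted back by the injected code.
    function onReport(report) {
      if (report.name === 'timing' && report.data.ms > 500) {
        var fn = report.data.fn;
        slowCounts[fn] = (slowCounts[fn] || 0) + 1;
        if (slowCounts[fn] > 100) {
          // Drill down: trade breadth for depth on the suspect function.
          activePolicy = { functions: [fn], detail: 'full-trace' };
        }
      }
    }

    // Decides which slice of the active policy the next visitor runs, so
    // the policy can be split up and adapted between page loads.
    function policyForNextVisitor() {
      var fns = activePolicy.functions;
      return { fn: fns[Math.floor(Math.random() * fns.length)],
               detail: activePolicy.detail };
    }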


Such questions come naturally to Kıcıman, who has been investigating related issues since his graduate-school days. “Before I came to Microsoft Research, for my thesis work, I had started working on the reliability of Internet services,” he says. “How do we reason about fault detection and fault localization within the data center? How do we know when things are running or not, especially given the challenges of large and complex systems? My earlier research was built around that problem: Observe the system behavior, model it, and look for anomalies, signs of potential problems.

“Since coming here, I’m still working in the general area of Internet-service reliability, but I’ve shifted my focus, turned outside the data center. What are the issues that cause problems in the end-to-end reliability of Internet services, specifically things that are outside the data center? This is a different take on a different major component of the same problem.”

One thing unique about Web applications is that users have evaluated the usefulness of the technology and have willingly chosen to download it. A trust relationship is thus created. “Users are already sharing so much personal data,” Kıcıman notes, “and there’s a security model that prevents the Web site from doing anything really bad. [That means] you can gather data about your own Web application’s behavior, and the user doesn’t have to worry that you’re snooping on anything else that’s going on in their machine.”

There are privacy implications, obviously, but Kıcıman doesn’t see them as overly daunting. “It’s not something we’ve looked at in the context of our research,” he says, “but it wouldn’t be hard to have a check box or a notification somewhere on the page. Whether the Web site wants to do that is up to the individual Web site, what its privacy policy says, and what its relationship is with its users.

“An important thing to note is that Ajax View isn’t actually changing anything about the security model. The browser is still enforcing what it thinks is an appropriate boundary around the application. We’re just taking advantage of the visibility that the Web page can already get into its own behavior. The security model and the boundary of what the Web site is allowed to do have already been set by the browser, and we’re operating within those limits.”

This self-checking, self-correcting alternative to the traditional, shrink-wrapped software model will, Kıcıman believes, become increasingly commonplace. “It’s a trend across the whole software industry,” he says. “We’re going into a world where there is central, automated control over the software versions that are running on people’s desktops, so that people don’t have to worry about it themselves. Eventually, this could lead to such techniques being more broadly applicable outside the Web-app domain.”

The potential for techniques such as Ajax View is immense. “We’re looking at what this enables in terms of things other than just monitoring,” Kıcıman says. “Can we take this kind of run-time analysis of behavior and immediately feed it back directly into debugging and optimization of the code? There are a couple of different usage scenarios for it, including putting it in front of Microsoft’s own Web services.

“There’s also some benefit to this as a tool for developers to use on their own desktops. If you have a profiler that’s really baked into your browser and really has hooks into the JavaScript engine, you’re going to get a lot more detailed information about what your application is doing as you’re developing it. But if you don’t want to take that kind of dependency, or if you want a single tool that’s cross-browser, then this JavaScript-rewriting approach might be appropriate. You get slightly less granular information, but you get comparable information from all the browsers at your disposal.”

That combination of the visionary and the practical appeals to Kıcıman. “This is something that has a practical use right now,” he says of Ajax View. “There’s some concrete benefit you can get immediately, even if you don’t buy all the further-afield stuff about dynamically instrumenting everything.”

But if you do …

“Going forward,” he says, “this opens up the possibility of analyzing all this run-time information that you’re observing live about your application’s behavior across all your users, then immediately taking advantage of it in all sorts of debugging, development, and operations processes that originally were kind of static.

“In terms of the research angle, I very much like that.”

