Automatic detection of internal queues and stages in message processing systems
Suman Karumuri, Steve Reiss, Brown University.
{suman,spr}@cs.brown.edu
Abstract
Complex applications today involve multiple processes, multiple threads of control, distributed processing, thread pools, event handling, and messages. The behaviors and misbehaviors of these nondeterministic, message-based systems are difficult to capture and understand. The typical approach is to trace the behavior of the systems and track how the different incoming messages are processed throughout the system. While messages between processes can be captured automatically at the network or library level, tracing the message processing within a system, which is often more complex and error-prone, requires the programmer to manually instrument the code by identifying the different message handlers, thread states, processing stages, and shared queues accurately and completely. In this paper we show how dynamic analysis can be used to automatically identify the transactions, stages and shared queues in Java programs as a prelude to trace-based comprehension.
1 Introduction
Today’s complex systems typically handle a series of external requests or messages, either as a whole or by splitting the requests into separate computational steps and processing each of these stages using mechanisms such as thread pools. The overall behavior of such systems can be nondeterministic, unpredictable, and difficult to understand. We call such systems message processing systems (MPS), call the requests transactions, call the various processing steps stages, and call the mechanism used to save and allocate stages to the different threads queues. In this paper, we present new program analysis techniques to detect queues, stages and transactions in a message processing system with minimal programmer input: the list of messages in the system.
As an example of an MPS, consider a simple web crawler we wrote for a class project. The crawler uses a pool of threads to retrieve a queue of pages. Each thread takes a page off the queue and processes it in the following order: check whether the URL is valid, download the HTML page if the URL is valid, and finally add the new URLs found in the downloaded page to the shared queue for further processing. Here there is a single stage, processing a URL, and a single queue through which the worker threads exchange messages containing URLs. Other examples of MPSs we have looked at include a peer-to-peer system, the Haboob HTTP server, the Hadoop Map-Reduce framework, a back end for code search, a linguistic search web service, the enterprise messaging system ActiveMQ, the Rupy and Jetty HTTP servers, and the Jabber tools JBother and Openfire.
The best way to understand the behavior of an MPS is by analyzing a trace of its behavior, a trace that is both precise and can be generated with relatively low overhead. However, such traces are difficult to obtain and thus are used only as a last resort. The main problem is that procuring such a trace requires significant work on the part of the programmer, since the code to generate the trace needs to be inserted manually in many portions of the system. Here it is easy to overlook a queue or processing stage and to introduce bugs into the system along with the instrumentation code. The manual instrumentation of a message processing system is carried out in two steps: identifying the transactions, stages and queues in the MPS, and then manually adding trace code in the form of instrumentation based on these identifications.
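To make the terms stage and queue concrete, the web crawler described above can be sketched roughly as follows. This is a minimal sketch, not the authors' crawler: the class name CrawlerSketch, the in-memory WEB map (which stands in for real HTTP downloads so the code runs offline), and the "starts with http://" validity check are all hypothetical. The LinkedBlockingQueue plays the role of the single shared queue, and the body of the worker loop is the single stage.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlerSketch {
    // Hypothetical in-memory "web" (URL -> links on that page); it stands in for
    // real HTTP downloads so the sketch runs offline.
    static final Map<String, List<String>> WEB = Map.of(
            "http://a", List.of("http://b", "http://c"),
            "http://b", List.of("http://c", "bad-url"),
            "http://c", List.of("http://a"));

    /** Crawls from seed with nThreads workers and returns the URLs that were processed. */
    public static Set<String> crawl(String seed, int nThreads) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(); // the single shared queue
        Set<String> seen = ConcurrentHashMap.newKeySet();      // URLs ever enqueued
        Set<String> processed = ConcurrentHashMap.newKeySet(); // URLs that passed validation
        AtomicInteger pending = new AtomicInteger(1);          // URLs queued or in progress
        seen.add(seed);
        queue.add(seed);

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            pool.execute(() -> {
                try {
                    while (pending.get() > 0) {
                        // Take the next message (a URL) off the shared queue.
                        String url = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (url == null) {
                            continue;
                        }
                        try {
                            // The single stage: validate the URL (stand-in check), "download" it,
                            // then put newly found URLs back on the shared queue.
                            if (url.startsWith("http://")) {
                                processed.add(url);
                                for (String link : WEB.getOrDefault(url, List.of())) {
                                    if (seen.add(link)) {
                                        pending.incrementAndGet(); // count before enqueueing
                                        queue.add(link);
                                    }
                                }
                            }
                        } finally {
                            pending.decrementAndGet(); // this URL is finished
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(new TreeSet<>(crawl("http://a", 4)));
    }
}
```

Running main prints the three reachable pages; "bad-url" is enqueued but dropped by the validity check. Instrumenting even this small program by hand would require trace calls at every queue.add and queue.poll site and at the boundaries of the stage, which are exactly the places where a queue or stage is easy to overlook.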
While frameworks like X-Trace [6] ease the second of these steps, adding the trace code, no tools currently exist to automate the first, identifying the transactions, stages and queues.
Prior work in the area of automatic trace generation for message processing systems has concentrated on the external messages rather than the internals of the systems. These systems capture and interpret data at well-defined interfaces such as network sockets (Causeway [2]), Unix system calls (BorderPatrol [4]), libevent entries (WhoDunIt [3]) or J2EE calls (Pinpoint [5]). Since these interfaces are standard and well documented, the queues and stages in those systems can be readily identified. These techniques do not work on the various internal queues and threading mechanisms that most complex MPSs use.
We are developing a system that attempts to completely automate the process of tracing message processing systems in order to facilitate understanding their behavior. In this paper we describe our approach to the problem of automatically determining what instrumentation is needed to generate appropriate traces. Our approach uses minimal programmer input and a combination of static and dynamic analysis to obtain the appropriate information.
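To illustrate what it means for dynamic analysis to find a shared queue, the sketch below applies a deliberately simple heuristic. It is not the technique this paper develops: it assumes a recorded trace of insert (PUT) and remove (TAKE) events on containers, and flags a container as a candidate shared queue if some message is inserted by one thread and removed by a different thread. The Event record, the Op values, and the trace format are hypothetical; a real tool would have to obtain such events through instrumentation of the running program.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class QueueHeuristicSketch {
    // Hypothetical event kinds: a message is inserted into or removed from a container.
    public enum Op { PUT, TAKE }

    // Hypothetical trace event: `thread` performed `op` on `container` with message `msg`.
    public record Event(String thread, Op op, String container, String msg) {}

    /** Returns containers through which at least one message passes between two different threads. */
    public static Set<String> candidateQueues(List<Event> trace) {
        Map<String, String> producer = new HashMap<>(); // (container, message) -> inserting thread
        Set<String> queues = new TreeSet<>();
        for (Event e : trace) {
            String key = e.container() + "#" + e.msg();
            if (e.op() == Op.PUT) {
                producer.put(key, e.thread());
            } else {
                String from = producer.remove(key);
                // A cross-thread hand-off marks the container as a candidate shared queue.
                if (from != null && !from.equals(e.thread())) {
                    queues.add(e.container());
                }
            }
        }
        return queues;
    }

    public static void main(String[] args) {
        List<Event> trace = List.of(
                new Event("main", Op.PUT, "urlQueue", "u1"),
                new Event("worker-1", Op.TAKE, "urlQueue", "u1"),
                new Event("worker-1", Op.PUT, "urlQueue", "u2"),
                new Event("worker-2", Op.TAKE, "urlQueue", "u2"),
                new Event("worker-1", Op.PUT, "localCache", "p1"),
                new Event("worker-1", Op.TAKE, "localCache", "p1"));
        System.out.println(candidateQueues(trace)); // prints [urlQueue]
    }
}
```

The container used only within one thread ("localCache") is not reported, while the crawler-style "urlQueue" is. A heuristic this naive would miss hand-offs that do not go through a single container and would need far more information to recover stages and transactions, which is why the problem calls for the more careful analysis described in this paper.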