2011 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing

Performance analysis and performance modeling of web-applications
Heinz Kredel, Hans Günther Kruse, Ingo Ott
IT-Center, University of Mannheim, Germany
{kredel,kruse,ott}@rz.uni-mannheim.de

Abstract—We study the performance of coupled web servers and database servers of certain web-applications. We focus on the performance bottleneck of one of the most pressing applications and its average response time. We therefore present the software architecture of this web-application, the "Study Portal" of the University of Mannheim, and identify typical work-load scenarios. We discuss our performance measurements and document the software and hardware infrastructure used during the measurements. Our goal is the identification of the most influential parameters that determine the timing behavior of the combined web-application. For the modeling we will use standard stochastic methods, even if this means some over-simplification. Our results then provide a simple estimate for the considered web-application: a doubling of the application servers leads to half of the response time in the case of sufficiently many clients. The developed model fits well with the observations in the operating environment of an IT-center.

I. INTRODUCTION

At a typical university the core process Study and Teaching is mainly driven by lectures and segmented into three phases: Planning, Operating and Testing. The University of Mannheim has two different but integrated IT systems supporting the core process: the Study Portal, based upon the QISServer [4] of the Hochschul-Informations-System (HIS) GmbH, and the Learning Management System ILIAS [5]. As the utilization of both systems rises continuously, the systems are facing an increasing load that cannot be handled by the hardware architecture calculated at the beginning of the project. For example, at the launch of the term in Fall 2009 the peak load in the Study Portal was 1,600 concurrent students and almost resulted in a system crash. Additionally, the university has to deal with an increasing number of users in future terms for the following reasons: On the one side there will be more and more registration processes for lectures in the Study Portal, because it is more efficient for chairs than using paper lists. On the other side, continuously more online functions are implemented, leading to an enhanced information basis and to a much better support of the core process. Because of the strong integration of the Study Portal and the Learning Management System ILIAS, the hardware architecture of both systems in the critical core process Study and Teaching must be upgraded. The resources to be employed will, in our case, be mainly a function of the following quality of service (QoS) requirements:
• Number of concurrent users with acceptable response times for certain work-load scenarios,
• Overall availability of the application and redundancy in the case of component failures,
• Correctness of concurrent access to central database resources.
For a discussion of general QoS requirements for web applications, such as performance, reliability, scalability, capacity, robustness, exception handling, accuracy, integrity, accessibility, availability, interoperability, security, and network-related QoS requirements, see for example [15] and [11]. In this article we only analyze the first point and develop a theoretical model with which we can estimate the necessary hardware resources for a given work-load scenario. Work-load scenarios are, for example,
• Many concurrent users (students) with relatively simple queries and HTML pages, with time constraints, e.g. course selection and registration,
• Moderate numbers of complex queries with complex database updates (teachers, secretaries and class planning), without time constraints, e.g. credits accounting.
The first work-load is the most demanding and also the most critical for a positive user experience of our services. Thus, in the following we focus on this work-load, so that future investments in infrastructure can be calculated more transparently and objective criteria for purchasing can be found. This will then be the basis on which the boards of the university can decide between technically feasible, financially possible, or nice-to-have services. We already did a similar stochastic analysis for a wide-area InfiniBand data link of Grid clusters in [9] and [12]. This article is a revised and translated version of [1].

II. PERFORMANCE MEASUREMENTS OF INTEGRATED WEB-APPLICATIONS

In this section we introduce the software architecture of the web-application Study Portal and describe the hardware and software infrastructure. The next sub-sections will then explain the test and load scenario and present the results.

A. Software architecture

Both the Study Portal and the Learning Management System ILIAS have an identical architecture and are constructed of a web server, one or more application servers and a database server. For simplicity, and because both systems have identical hardware architectures, we will only explain the architecture of the Study Portal, which is briefly summarized in figure 1.
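The tiers of such a setup are typically coupled through the Apache Tomcat connector mod_jk, which is introduced in more detail below. As a purely hypothetical illustration — the worker names, host names and mount path here are invented, not taken from our installation — a load-balancing `workers.properties` for two application servers could look like this:

```properties
# Hypothetical mod_jk worker definitions: two Tomcat instances behind a balancer
worker.list=balancer

worker.tomcat1.type=ajp13
worker.tomcat1.host=app1.example.org
worker.tomcat1.port=8009

worker.tomcat2.type=ajp13
worker.tomcat2.host=app2.example.org
worker.tomcat2.port=8009

worker.balancer.type=lb
worker.balancer.balance_workers=tomcat1,tomcat2
```

Together with a `JkMount` directive in the Apache configuration (e.g. `JkMount /qisserver/* balancer`, the path being hypothetical), dynamic requests are spread over both Tomcat instances, while static files are still served by Apache itself.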

978-0-7695-4531-8/11 $26.00 © 2011 IEEE
DOI 10.1109/3PGCIC.2011.27
Fig. 1. Study Portal software architecture overview

The web server is the Apache HTTP Server [2]; it handles every request and delegates it to the application server only if non-static content has to be served to the client. Static content (for example a picture or a CSS file) is served by the web server itself. All delegated requests are sent to the application server using the Apache module mod_jk, the Apache Tomcat connector. As mod_jk indicates, the application server is Apache Tomcat, which serves all dynamic requests (for example the lecture index including all lectures) and is an open-source reference implementation of the Java Servlet and Java Server Pages technology [13]. Each request is executed by a central servlet of the QISServer, which can call other servlets or load data from a database using the JDBC connector. The dynamic pages are then rendered using the Velocity Template Engine [14], and the database is based on PostgreSQL [10], a free relational database management system.

B. Hard- and software infrastructure

In this section we document the hard- and software infrastructure used, so that the measured performance can be interpreted correctly. The performance measurements in section II-D have been conducted at the IT-center of the University of Mannheim. The hardware mainly consists of IBM Intel XEON blades, with OpenSUSE Linux as operating system. The networking infrastructure is built from 1 Gbit Ethernet and a FiberChannel SAN.

The first configuration, which led to the development of the performance model, consists of the following components: IBM BladeCenter with IBM HS21 blades, 2 x 4-core Intel XEON E5430 (2x32K L1, 2x6M L2 cache, 2.66 GHz, 1333 MHz FSB, no HT), 16 GB main memory. All blades are equipped with 2 Gbit FiberChannel and 1 Gbit Ethernet host adapters; the BladeCenters have a FiberChannel switch and a 1 Gbit switch with 10 Gbit uplink. FiberChannel hard disks have not been used for these tests; only the local SAS hard disks with a capacity of 140 GB have been used. The operating system is OpenSUSE 11.1 (x86_64, kernel 2.6.27.25) with HTTP Server Apache 2.2.10, Tomcat 6.0.18 and PHP 5.2.11. The database server has 2 x 8-core XEON E5530 (1M L2 cache, 8M L3 cache, 2.40 GHz, 1066 MHz FSB, with HT), 24 GB main memory and runs with SUSE Linux Enterprise Server SLES 11.

Based on the experience and the performance model in this article, new server hardware was installed. The new hardware consists of the following components: IBM BladeCenter with IBM HS22 blades, 2 x 8-core Intel XEON E5620 (2x32K L1, 6M L2, 12M L3 cache, 2.40 GHz, 1333 MHz FSB, with HT), 16 GB main memory. It runs with a newer operating system, OpenSUSE 11.4 (x86_64, kernel 2.6.37.1), HTTP Server Apache 2.2.17 and Tomcat 6.0.32. The performance measurements with this new configuration will be conducted in the future and the results will be published later.

C. Testing framework and work-load structure

In approaching an optimized system we wanted an approach in which simply replicating the affected server leads to an improvement. Taking a close look at the database server, we can see that as long as data is only read (SELECT), load distribution is trivial. Only changing data (for example UPDATE or DELETE) makes a load-optimized system of database servers more complex, because writing distributed data across the servers forces the system to maintain data integrity. In order to realize such a system, the database server and the application itself would have to be adjusted, so this alternative did not come into consideration in the first phase. Reflecting the fact that the Apache web server is already balancing the load, load-balancing the load balancers themselves makes almost no sense, because the new bottleneck will always be the new load balancer. If we assume that most of the system load is generated by rendering the dynamic pages in the application server, the following analysis focuses on finding the optimal number of those servers. A load test should provide the necessary information, but two things were needed for that: first, a software suite for load testing, where an evaluation of appropriate software suites led to the tool Funkload [7]; and second, a load test scenario that is comparable to reality. Before we explain that load test scenario, a brief introduction to Funkload will help in understanding how the load test works.

Funkload is developed in Python and is a test framework with which one can run functional and load tests. A functional test consists of one or more simple tests, for example the call of a website, and is run only once. A load test is based upon a functional test and runs the test several times according to the configuration. In a load test scenario Funkload can also simulate many concurrent users by running the functional tests in parallel. For this purpose a separate thread is created for every user, each simulating a separate browser environment. After all the threads have been generated, the systems are under load and Funkload starts to monitor the request response time and the system environment, for example the CPU, network or memory utilization. While monitoring, the functional tests are executed iteratively in order to continuously generate load on the systems. If a real-life scenario is to be simulated, configuring the time a user stays on a webpage is very important, because normally a user is not able to navigate within milliseconds: he or she has to read the content on a particular webpage and then decide where to click next.

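The thread-per-user scheme just described — one thread per simulated user, each looping over the scenario with a randomized think time while response times are recorded — can be sketched in a few lines of plain Python. This is a simplified stand-in for illustration, not Funkload's actual API; the request callable and all timing parameters are placeholders:

```python
import random
import threading
import time

def run_load_test(request, n_users, duration, think=(2.0, 5.0)):
    """Simulate n_users concurrent users calling `request()` in a loop.

    Each user thread sleeps a uniformly distributed think time between
    requests; the wall-clock duration of every call is recorded.
    """
    samples = []                 # response times of all requests
    lock = threading.Lock()
    deadline = time.monotonic() + duration

    def user():
        while time.monotonic() < deadline:
            t0 = time.monotonic()
            request()                           # one step of the scenario
            dt = time.monotonic() - t0
            with lock:
                samples.append(dt)
            time.sleep(random.uniform(*think))  # user "reads" the page

    threads = [threading.Thread(target=user) for _ in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    samples.sort()
    return {
        "requests": len(samples),
        "median": samples[len(samples) // 2],
        "q90": samples[int(len(samples) * 0.9)],
    }
```

A concrete scenario would chain the portal's page requests inside `request`; here it is deliberately left as a parameter.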
In Funkload this think time can be generated using a uniform distribution with a minimum and a maximum bound.

Finally, the load test scenario of the Study Portal is configured as follows: The residence time on a webpage is uniformly distributed between 2 and 5 seconds; this is a normal value, given that some pages contain less content and others more. Additionally, there is a break of 1 second when navigating between two webpages, and the threads start with a delay of 0.15 seconds. The duration of the monitoring phase is set to 2 minutes. The real-life test scenario is derived from the incident in Fall 2009, when 1,600 students used the Study Portal concurrently:
1) Call the start page of the Study Portal
2) Logon with a random user
3) Navigate through the lecture index and load a defined webpage of a lecture
4) Logoff the Study Portal
For the duration of the load test the authentication method of the Study Portal was configured to accept a predefined IP and an account of an arbitrary user in order to accept the logon. The accounts are delivered randomly using a credential server provided by Funkload. The tolerable maximum request response time was set to 1 second.

D. Performance measurements in the testing environment

Fig. 2. Measurements for 1 server

At first we had to determine the base load in the system in order to have a reference value for the following load tests. Therefore the system was scaled down to a simple 1-1-1 architecture, i.e. 1 web, 1 application and 1 database server. Additionally, the number of concurrent users in each load test was raised from 1 to 400 in steps of 50 users. A look at the results shows that in a 1-1-1 architecture the tolerable maximum of the request response time is reached at 125 concurrent users in the system. Beyond this critical point the average request response time increases almost exponentially, up to a very bad value of 8.5 seconds for 400 concurrent users. Figure 2 shows a cut-out of the results. As one can see, half of the requests are served in less than 2 seconds (compare the median in figure 2). On the other side, all of the other requests take a very long time to be served – implied by the 90 percent quantile in figure 2 – and this has a negative effect on the average response time.

Fig. 3. Measurements for 2 servers

Because the behavior of the system in a 1-1-1 architecture is known, a second load test was executed in order to give information about the system behavior in a 1-2-1 architecture, where 2 application servers share the load. The results can be seen in figure 3. The figure shows that the critical point, at which the tolerable maximum request response time crosses the 1-second mark, rises to 250 concurrent users, which is a doubling compared to the 1-1-1 architecture. Again half of the requests are served under 2 seconds, but the other 40 percent approach the average response time, whereby the average response time increases more slowly than before. However, at the maximum of 800 concurrent users the system still has a very high response time that is unacceptable for users. The obvious assumption is that another doubling of the application servers could move the critical point again and lead to a doubling of the number of concurrent users at which the response times are tolerable.

Fig. 4. Measurements for 4 servers

Another load test in a 1-4-1 architecture confirmed this assumption; figure 4 shows the results. The critical point is at 500 concurrent users in the system, which is a doubling of the result of the 1-2-1 architecture. With the exception of some outliers (which are reproducible, but for which we have no conclusive explanation by now) in the interval between 1,200 and 1,400 concurrent users, half of the requests are served in less than 1 second. The other 40 percent approach the average response time very closely, which also could be improved. So by now we can state that
1) a doubling of the application servers leads to a doubling of the maximum number of concurrent users in the system according to the critical point, and
2) the average response time is cut in half and increases more slowly when the application servers are doubled.

Fig. 5. Measurement summary, 1, 2 and 4 servers

Taking a deeper look at the results, in particular at the average response time, one can see by comparing the times that the increase is driven by the system. The comparison is shown in figure 5. The gradient of the average response time seems to be cut in half, too, if the application servers are doubled. This could explain the slower increase of the average response time.

On the one hand these results are used to calculate the number of application servers that are needed to serve 1,600 concurrent students in times of less than 1 second. On the other hand the load tests and the following analysis can help to identify bottlenecks in the system and to improve the system by iterative load testing.

III. PERFORMANCE MODELING

Our main goal will be to determine the variables which fix the response time of the system described in section II. Possible tools may be benchmarks or simulations [6] – but the transfer of the results to other systems is often very difficult and not transparent. Therefore we choose modeling by stochastic standard methods – even though we know about the disadvantages caused by the necessary simplifications. Our assumptions are based on the classical file-server model [8], which we describe in the sequel as an application-server model.

Fig. 6. Closed application-server-model with n clients

In figure 6 the variable n represents the number of clients, μ the service rate of the server S and α the request rate of the clients. We suppose an exponential distribution of the corresponding service and interarrival times. The model consists of n + 1 possible states and transitions between them. In the stationary case we get the transition diagram of figure 7, with transition rate (n − k)α from state k to state k + 1 and rate μ from state k + 1 to state k.

Fig. 7. Transition diagram

The diagram describes a system of equations for the state probabilities π_k:

(n − k) α π_k = μ π_{k+1},   k = 0, …, n − 1.

Introducing the dimensionless parameter ρ = α/μ we calculate

π_k = (n!/(n − k)!) ρ^k π_0,   π_0 = 1/S_n(ρ),   S_n(ρ) = e^{1/ρ} Γ(n + 1, 1/ρ) ρ^n.

Relevant for the further discussion are only the probability p_w for waiting and the mean value ⟨k⟩ of active clients. We get p_w = 1 − π_0 = 1 − 1/S_n(ρ), and for the mean value

⟨k⟩ = Σ_{k=1}^{n} k π_k = π_0 ρ (d/dρ)(S_n(ρ) − 1) = n − p_w/ρ.

Figures 8 and 9 demonstrate the behavior of p_w and ⟨k⟩ for two different load parameters ρ = 1/4, 3/4.

The calculation of the averaged response time ⟨t_v^0⟩ is done via Little's law, ⟨α⟩ · ⟨t_v^0⟩ = ⟨k⟩, in which ⟨α⟩ represents the averaged request rate:

⟨α⟩ = Σ_{k=0}^{n} α_k π_k = α (n − ⟨k⟩)   with α_k = α (n − k),

μ ⟨t_v^0⟩ = n/p_w − 1/ρ.

Since μ⟨t_v^0⟩ is a dimensionless variable it is easy to plot, and we observe a good qualitative agreement with the experimental data (figure 5). Analyzing this data we see the importance of the asymptotic case n ≫ 1, with p_w(n, ρ) ≈ 1 and an averaged response time that is approximately a linear function of n.

Fig. 8. Probability for waiting

Fig. 9. Mean value ⟨k⟩ of clients

In the asymptotic case the response time is approximately the linear function μ⟨t_v^0⟩ ≈ n − 1/ρ. If we define n_1^* = 1 + 1/ρ we can rewrite this as μ⟨t_v^0⟩ ≈ (n − n_1^*) θ(n − n_1^*) + 1, where θ(x) = 0 if x < 0 and 1 otherwise is the well-known Heaviside function, and get an excellent description of the situation. In the case μ⟨t_v^0⟩ ≈ 1 all active clients get the same response time – there is no waiting time at all. Otherwise, if μ⟨t_v^0⟩ ≈ n − n_1^* + 1, the asymptotic behavior is strictly linear and increasing. Choosing a different load ρ does not change the slope of the response time; a decrease is obtained only for small load values ρ – these raise the critical value n_1^*. Figures 10 and 11 show the speed-up μ⟨t_v^0(n, ρ = 3/4)⟩ / μ⟨t_v^0(n, ρ = 1/4)⟩: the gradual reduction of the load by a factor of 3 implies only a doubling of the speed-up.

Fig. 10. Averaged response time μ⟨t_v⟩

Fig. 11. Speed-up for load values ρ = 3/4 and 1/4

IV. EXTENDED PERFORMANCE MODELING

The discussion in the previous section III shows that a waiting probability less than 1 is only valid for a small number of clients – implied by the small load. Moreover, from the analysis of the empirical data it is obvious that an essential reduction of the load is only achieved by parallel server systems. This sets the goal of the modeling process: to diminish the response time and to reproduce the empirical data.

In figure 6 we replace the server S(μ) by m · S(μ) (m the number of parallel servers). The next logical step might be to change the transition diagram in figure 7. Since we want only a reduction of the load, which corresponds to a smaller waiting probability p_w (figure 8), it is sufficient to replace ρ by ρ/m in the expressions for p_w, ⟨k⟩ and ⟨t_v^0⟩.

With this substitution we get

p_w(n, ρ) ↦ p_w(n, m, ρ) = 1 − S_n^{−1}(ρ/m),

μ⟨t_v^0⟩ ↦ μ⟨t_v^{(m)}⟩ = (n/m)/p_w(n, m, ρ) − 1/ρ,

and analogously ⟨k⟩ ↦ ⟨k^{(m)}⟩. In figures 12 and 13 we can observe how the additional server systems reproduce the desired result.

Fig. 12. Waiting probability for m = 1, 2, 4 servers and ρ = 3/4

Fig. 13. Averaged response time μ⟨t_v⟩ for m = 1, 2, 4 servers and ρ = 3/4

Fig. 14. Asymptotic behavior for m = 1, 2, 4 and ρ = 3/4

From the shape of μ⟨t_v^{(m)}⟩ in figure 14 we see how the system performance changes by the use of m servers. If the number of clients n is less than n_1^* m, all clients have the same response time 1/μ, like a single client. For n > n_1^* m we get a linearly increasing function, whose slope is decreased by the factor 1/m. Both effects (1 ↦ 1/m and n_1^* ↦ m n_1^*) imply a surprising growth of the speed-up; indeed, increasing n reduces it.

We analyze μ⟨t_v^{(m)}⟩ in the asymptotic case (n ≫ 1, p_w(n, m, ρ) ≈ 1) – it follows that μ⟨t_v^{(m)}⟩ ≈ n/m − 1/ρ. Similar to the discussion in section III we introduce the variable n_m^* = m(1 + 1/ρ) and finally get

μ⟨t_v^{(m)}⟩ ≈ (n − n_m^*)/m + n_m^*/m − 1/ρ = ((n − n_m^*)/m) θ(n − n_m^*) + 1.

For the speed-up we obtain

Sp = μ⟨t_v^{(1)}⟩ / μ⟨t_v^{(m)}⟩ = (n − 1/ρ) / (n/m − 1/ρ) = m (1 − 1/(ρn)) / (1 − m/(ρn)) → m   for n → ∞.

Fig. 15. Speed-up for m = 2, 4 servers and ρ = 3/4

Focusing on qualitative aspects, we observe a good agreement with the empirical data in figure 5.
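The substitution ρ ↦ ρ/m and the speed-up limit Sp → m can likewise be checked numerically (again a verification sketch of the formulas above, reusing the single-server state probabilities, not part of the original measurements):

```python
def waiting_probability(n, rho):
    """p_w = 1 - pi_0 of the closed model, via the balance-equation recurrence
    pi_{k+1} = (n - k) * rho * pi_k."""
    unnorm = [1.0]
    for k in range(n):
        unnorm.append(unnorm[-1] * (n - k) * rho)
    return 1.0 - unnorm[0] / sum(unnorm)

def response_time(n, m, rho):
    """Dimensionless response time mu*<t_v^(m)> for m parallel servers:
    rho is replaced by rho/m, and mu*<t> = (n/m)/p_w - 1/rho."""
    p_w = waiting_probability(n, rho / m)
    return (n / m) / p_w - 1.0 / rho

n, rho = 50, 0.75
t1 = response_time(n, 1, rho)
for m in (2, 4):
    sp = t1 / response_time(n, m, rho)
    # asymptotically Sp = m * (1 - 1/(rho*n)) / (1 - m/(rho*n)), -> m
    expected = m * (1 - 1 / (rho * n)) / (1 - m / (rho * n))
    assert abs(sp / expected - 1.0) < 1e-3
    assert sp > m   # speed-up lies above its limit m and decreases toward it
```

For n deep in the saturated region p_w is numerically indistinguishable from 1, so the exact speed-up and the asymptotic expression agree to many digits.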

Fig. 16. Probability for waiting for m = 1, 2, 4 servers and ρ = 3/100

Fig. 17. Averaged response time without DB for m = 1, 2, 4 and ρ = 3/100

V. IMPROVED PERFORMANCE MODELING

Reproducing the measured response times and the speed-ups (sections II-A, II-D) in a quantitative manner requires a change in the describing model. First we have to check the distributions of the request rate α and the service rate μ; second we have to fix their explicit values. Since the shapes of the theoretical and empirical response times agree (in a qualitative sense at least), we skip this verification. The parameter ρ = α/μ is fixed at m = 1 by the relation n_1^* = 1 + 1/ρ ≈ 35, giving ρ ≈ 0.03, and the value of μ by the measured slope of the asymptotic linear function, giving μ ≈ 25. With these values the waiting probability p_w(n, m, ρ) and the response time ⟨t_v^{(m)}⟩ have the shapes shown in figures 16 and 17.

A detailed analysis shows that the resulting values of ⟨t_v^{(m)}⟩ are too low. The reason is the database requests of the web-application. In the context of the discussion in section III, modeling techniques with Markovian models like M/M/1/∞ or M/G/1/∞ are appropriate. In both cases the response time of the database is fixed by the parameter ρ_DB = μ/(m μ_DB). With the linear approximation

⟨t_v^{DB}⟩ ≈ (1/μ_DB) (1 + μ/(m μ_DB))

and μ/μ_DB = 0.9 we achieve a very good reproduction of the measured data; look at the calculated speed-up in figure 19 and compare with figure 5. A slight variation of the variables α, μ and μ_DB hardly changes the shapes.

Fig. 18. Averaged response time with DB for m = 1, 2, 4 servers and ρ = 3/100

Fig. 19. Speed-up with DB for m = 2, 4 servers and ρ = 3/100

VI. SUMMARY AND CONCLUSION

We hope that the discussion and analysis in the previous sections show the advantages of modeling techniques which are a little more complex. Thereby it was possible, for a given number of clients, to estimate the minimal number of servers needed to get acceptable response times.
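Such an estimate can be reproduced with a short numeric sketch of the improved model of section V, using the fitted values ρ ≈ 0.03, μ ≈ 25 and μ/μ_DB = 0.9. How the application-server term and the linearized database term are combined into one response time is our reading of the model, so the sketch is illustrative rather than authoritative:

```python
def waiting_probability(n, rho):
    """p_w = 1 - pi_0 of the closed model (balance-equation recurrence)."""
    unnorm = [1.0]
    for k in range(n):
        unnorm.append(unnorm[-1] * (n - k) * rho)
    return 1.0 - unnorm[0] / sum(unnorm)

def response_time_db(n, m, rho=0.03, mu=25.0, ratio=0.9):
    """Response time in seconds for n clients and m application servers:
    application part ((n/m)/p_w - 1/rho)/mu plus the linearized M/M/1
    database term (1/mu_DB)*(1 + rho_DB) with rho_DB = mu/(m*mu_DB)."""
    p_w = waiting_probability(n, rho / m)
    t_app = ((n / m) / p_w - 1.0 / rho) / mu
    mu_db = mu / ratio                   # from the fitted ratio mu/mu_DB = 0.9
    t_db = (1.0 / mu_db) * (1.0 + ratio / m)
    return t_app + t_db
```

Doubling m then roughly halves the slope of the response time over n in the asymptotic region, which is exactly the rule-of-thumb behavior derived above.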

This is very important for avoiding deadlock situations of many web-applications, but also for saving considerable financial and power resources if too many server systems have been deployed. The presented results allow an effective, simple and transparent estimation of the configuration, like a rule of thumb: a doubling of the servers implies a doubling of the number of clients without waiting time. Further, in the case of large client numbers (the asymptotic case) we get a halving of the slope of the increasing response time. Both effects can overlap below the asymptotic region and reveal a relevant non-linear improvement of performance.

Acknowledgments

One of the authors, H. G. Kruse, thanks Dr. E. Strohmaier for his hospitality during a research visit at LBNL/Berkeley; the inspiring and exciting atmosphere fostered this work. All authors thank the colleagues of the Rechenzentrum of the University of Mannheim for their support in the configuration of the test-bed.

REFERENCES
[1] H. Kredel, H.-G. Kruse, I. Ott, Lastverhalten und Systemkonfiguration von Web-Applikationsservern, PIK, Vol. 3, 2011, to appear.
[2] The Apache Software Foundation. Apache HTTP Server. 2011 [last
visited 13.03.2011]; http://httpd.apache.org/.
[3] J. L. Hennessy, D. A. Patterson, Computer Architecture, Morgan
Kaufmann, 2003.
[4] HIS GmbH. HIS-GX und QIS. 2011 [last visited 13.03.2011];
http://www.his.de/abt1/gx.
[5] ILIAS open source e-Learning e.V. ILIAS. 2011 [last visited
13.03.2011]; http://www.ilias.de/.
[6] H.G. Kruse, Leistungsbewertung bei Computer-Systemen, Springer,
2009.
[7] Nuxeo SAS. FunkLoad Documentation. 2011 [last visited 13.03.2011];
http://funkload.nuxeo.org/.
[8] L. L. Peterson, B. S. Davie, Computer Networks – A Systems Approach,
Morgan Kaufmann, 2011.
[9] H. Kredel, H.-G. Kruse, S. Richling, Zur Leistung von verteilten,
homogenen Clustern, PIK, Vol. 2, 2010, pp. 166–171.
[10] PostgreSQL Global Development Group. PostgreSQL. 2011 [last visited
13.03.2011]; http://www.postgresql.org/about/.
[11] Rajesh Sumra, Arulazi D., Quality of Service for Web Services – Demystification, Limitations, and Best Practices, 2003. [last visited 1.7.2011]; http://www.developer.com/services/article.php/2027911.
[12] Sabine Richling, Steffen Hau, Heinz Kredel, Hans-Günther Kruse,
Operating Two InfiniBand Grid Clusters over 28 km Distance, Proc.
3PGCIC-2010, IEEE, 2010.
[13] The Apache Software Foundation. Apache Tomcat. 2011 [last visited
13.03.2011]; http://tomcat.apache.org/.
[14] The Apache Software Foundation. Apache Velocity Engine. 2011 [last
visited 13.03.2011]; http://velocity.apache.org/.
[15] W3C Working Group Note, QoS for Web Services: Requirements and
Possible Approaches. 2003 [last visited 1.7.2011];
http://www.w3c.or.kr/kr-office/TR/2003/ws-qos/
