Abstract: We study the performance of coupled web servers and database servers of certain web-applications. We focus on the performance bottleneck of one of the most pressing applications and on its average response time. To this end we present the software architecture of this web-application, called “Study-Portal” of the University of Mannheim, and we identify typical work-load scenarios. We discuss our performance measurements and document the software and hardware infrastructure used during the measurements. Our goal is to identify the most influential parameters that determine the timing behavior of the combined web-application. For the modeling we use standard stochastic methods, even where this means oversimplification. Our results then provide a simple estimate for the considered web-application: doubling the number of application servers halves the response time, provided there are sufficiently many clients. The developed model fits well with the observations in the operating environment of an IT-center.

I. INTRODUCTION

At a typical university the core process Study and Teaching is mainly driven by lectures and segmented into three phases: Planning, Operating and Testing. The University of Mannheim has two different but integrated IT-Systems supporting the core process: the Study Portal, based upon the QISServer [4] of the Hochschul-Informations-System (HIS) GmbH, and the Learning Management System ILIAS [5]. As the utilization of both systems rises continuously, the systems are facing an increasing load that cannot be handled by the hardware architecture dimensioned at the beginning of the project. For example, at the launch of the term in Fall 2009 the peak load in the Study Portal was 1,600 concurrent students and almost resulted in a system crash. Additionally, the university has to deal with an increasing number of users in future terms for the following reasons: On the one hand there will be more and more registration processes for lectures in the Study Portal, because it is more efficient for chairs than using paper lists. On the other hand more and more online functions are implemented, leading to an enhanced information basis and to a much better support of the core process. Because of the strong integration of the Study Portal and the Learning Management System ILIAS, the hardware architecture of both systems in the critical core process Study and Teaching must be upgraded. The resources to be employed will, in our case, be mainly a function of the following quality of service (QoS) requirements:
• Number of concurrent users with acceptable response times for certain work-load scenarios,
• Overall availability of the application and redundancy in the case of component failures,
• Correctness of concurrent access to central data base resources.
For a discussion of general QoS requirements for web applications, such as performance, reliability, scalability, capacity, robustness, exception handling, accuracy, integrity, accessibility, availability, interoperability, security, and network-related QoS requirements, see for example [15] and [11]. In this article we only analyze the first point and develop a theoretical model with which we can estimate the necessary hardware resources for a given work-load scenario. Work-load scenarios are, for example,
• Many concurrent users (students) with relatively simple queries and HTML pages with time constraints, e.g. course selection and registration,
• Moderate numbers of complex queries with complex data base updates (teachers, secretaries and class planning) without time constraints, e.g. credits accounting.
The first work-load is the most demanding and also the most critical for a positive user experience of our services. Thus in the following we focus on this work-load, so that future investments in infrastructure can be planned more transparently and objective criteria for purchasing can be found. This will then be the basis on which the boards of the university can decide between technically feasible, financially possible, or nice-to-have services. We already carried out a similar stochastic analysis for a wide-area InfiniBand data link of Grid-Clusters in [9] and [12]. This article is a revised and translated version of [1].

II. PERFORMANCE MEASUREMENTS OF INTEGRATED WEB-APPLICATIONS

In this section we introduce the software architecture of the web-application Study Portal and describe the hardware and software infrastructure. The next sub-sections then explain the test and load scenario and present the results.

A. Software architecture

Both the Study Portal and the Learning Management System ILIAS have an identical architecture and consist of a webserver, one or more application servers and a database server. For simplicity, and because both systems have identical hardware architectures, we only explain the architecture of the Study Portal, which is briefly summarized in figure 1.
Authorized licensed use limited to: VIT University. Downloaded on March 01,2022 at 14:26:35 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Study Portal software architecture overview

The webserver is the Apache HTTP-Server [2]. It handles every request and delegates it to the application server only if non-static content has to be served to the client; static content (for example a picture or a css-file) is served by the webserver itself. All delegated requests are sent to the application server using the Apache module mod_jk, which is the Apache Tomcat connector. As mod_jk indicates, the application server is Apache Tomcat, an open source reference implementation of the Java Servlet and Java Server Pages technology [13], which serves all dynamic requests (for example the lecture index including all lectures). Each request is executed by a central servlet of the QISServer, which can call other servlets or load data from a database by using the JDBC-Connector. The dynamic pages are then rendered using the Velocity Template Engine [14], and the database is based on PostgreSQL [10], a free relational database management system.

B. Hard- and software infrastructure

In this section we document the hard- and software infrastructure used, so that the measured performance can be interpreted correctly. The performance measurements in section II-D have been conducted at the IT-center of the University of Mannheim. The hardware mainly consists of IBM Intel XEON Blades, with OpenSUSE Linux as operating system. The networking infrastructure is built from 1 Gbit Ethernet and a FiberChannel SAN.

The first configuration, which led to the development of the performance model, consists of the following components: IBM BladeCenter with IBM HS21 Blades, 2 x 4-core Intel XEON E5430 (2x32K L1, 2x6M L2 Cache, 2.66 GHz, 1333 MHz FSB, no HT), 16 GB main memory. All blades are equipped with 2 Gbit FiberChannel and 1 Gbit Ethernet host adapters; the BladeCenter has a FiberChannel Switch and a 1 Gbit Switch with 10 Gbit uplink. FiberChannel hard disks have not been used for these tests; only the local SAS hard disks with a capacity of 140 GB have been used. The operating system is OpenSUSE 11.1 (x86_64, Kernel 2.6.27.25) with HTTP Server Apache 2.2.10, Tomcat 6.0.18 and PHP 5.2.11. The data base server has 2 x 8-Core XEON E5530 (1M L2 Cache, 8M L3 Cache, 2.40 GHz, 1066 MHz FSB, with HT) and 24 GB main memory and runs SUSE Enterprise Linux SLES 11.

Based on the experience and the performance model in this article, new server hardware was installed. The new hardware consists of the following components: IBM BladeCenter with IBM HS22 Blades, 2 x 8-core Intel XEON E5620 (2x32K L1, 6M L2, 12M L3 Cache, 2.40 GHz, 1333 MHz FSB, with HT), 16 GB main memory. It runs the newer operating system OpenSUSE 11.4 (x86_64, Kernel 2.6.37.1) with HTTP Server Apache 2.2.17 and Tomcat 6.0.32. The performance measurements with this new configuration will be conducted in the future and the results will be published later.

C. Testing framework and work-load structure

In moving towards an optimized system, we wanted an efficient approach in which simply replicating the affected server would lead to an improvement. Taking a close look at the database server, we can see that load distribution is trivial as long as data is only read (SELECT). Only changing data (for example UPDATE or DELETE) makes a load-optimized system of database servers more complex, because writing data that is distributed across the servers forces the system to maintain data integrity. In order to realize such a system, the database server and the application itself would have had to be adjusted, so this alternative was not considered in the first phase. Since the Apache webserver is already balancing the load, balancing several load balancers makes almost no sense, because the new bottleneck will always be the new load balancer. As we assume that most of the system load is generated by rendering the dynamic pages in the application server, the following analysis focuses on finding the optimal number of those servers. A load test should provide the necessary information, but two things were needed for it: first, a software suite for load testing, where an evaluation of appropriate software suites led to the tool Funkload [7], and second, a load test scenario that is comparable to reality. Before we explain that load test scenario, a brief introduction to Funkload will help in understanding how the load test works.

Funkload is developed in Python and is a test framework with which functional and load tests can be run. A functional test consists of one or more simple tests, for example the call of a website, and is run only once. A load test is based upon a functional test and runs the test several times according to its configuration. In a load test scenario Funkload can also simulate many concurrent users by running the functional tests in parallel. For this purpose a separate thread is generated for every user, each simulating a separate browser environment. After all the threads have been generated, the systems are under load and Funkload starts to monitor the request response time and the system environment, for example the CPU, network or memory utilization. While monitoring, the functional tests are executed iteratively in order to continuously generate load on the systems. If a real-life scenario is to be simulated, configuring the time a user stays on a webpage is very important, because normally a user is not able to navigate within milliseconds: he or she has to read the content on a particular webpage and then decide where to click next. In Funkload this time can be generated using a uniform distribution between a minimum and a maximum bound.

[...] logon. The accounts are delivered randomly using a credential server provided by Funkload. The tolerable maximum request response time was set to 1 second.
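The uniformly distributed think time between two page requests, as described for the load test above, can be illustrated with a few lines of Python. This is a minimal sketch using only the standard library; it does not use Funkload itself, and the function name `think_times` and its parameters are hypothetical names chosen for illustration.

```python
import random


def think_times(n_pages, t_min, t_max, seed=None):
    """Draw one think time (in seconds) per page view.

    Each value is sampled from a uniform distribution on [t_min, t_max],
    mimicking a user who reads a page before clicking on the next link.
    The function and parameter names are illustrative assumptions.
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    rng = random.Random(seed)  # private generator, reproducible with a seed
    return [rng.uniform(t_min, t_max) for _ in range(n_pages)]


if __name__ == "__main__":
    # Example: a simulated user visits 5 pages and pauses 2 to 8 seconds on each.
    for pause in think_times(5, 2.0, 8.0, seed=42):
        print(f"user reads the page for {pause:.2f} s")
```

A longer think time lowers the request rate of each simulated user, so the chosen bounds directly influence how much load a given number of concurrent users generates.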
[...] tolerable. Another load test in a 1-4-1 architecture confirmed the assumptions. Figure 4 shows the results. The critical point is at 500 concurrent users in the system, which is double the result in the 1-2-1 architecture. With the exception of some outliers (which are reproducible but for which we have no conclusive explanation so far) in the interval between 1,200 and 1,400 concurrent users, half of the requests are served in less than 1 second. The other 40 percent come very close to the average response time, which could also be improved. So far we can state that
1) duplicating the application servers doubles the maximum number of concurrent users in the system with respect to the critical point, and
2) duplicating the application servers cuts the average response time in half and lets it increase more slowly.

[Figure 6: n clients, numbered 1 to n, each issue requests at rate α to a single server S(µ).]
Fig. 6. Closed application-server-model with n clients

In figure 6 the variable n represents the number of clients, µ the service rate of server S and α the request rate of the clients. We assume an exponential distribution of the corresponding service and interarrival times. The model consists of n + 1 possible states and the transitions between them. In the stationary case we get the following transition diagram, see figure 7.

[Figure 7: chain of states 0, 1, 2, ..., k, k+1, ...; the transition from state k to state k+1 has rate (n − k)α, starting with nα from state 0 and (n − 1)α from state 1; every transition from state k+1 back to state k has rate µ.]
Fig. 7. Transition diagram
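The stationary state probabilities of the chain in figure 7 follow from balancing the flow between neighbouring states. The following Python sketch evaluates them exactly and derives a normalized mean response time via Little's law. It is an illustration under stated assumptions, not the authors' derivation: the ratio `rho` is assumed to be α/µ, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def stationary_distribution(n, rho):
    """Stationary probabilities p_0..p_n of the closed model.

    State k means k requests are at the server. The rate k -> k+1 is
    (n - k) * alpha and the rate k+1 -> k is mu, so balancing the flow gives
    p_k proportional to n! / (n - k)! * rho**k, with rho = alpha / mu (assumed).
    """
    rho = Fraction(rho)
    weights = [Fraction(factorial(n), factorial(n - k)) * rho**k
               for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def normalized_response_time(n, rho):
    """Mean response time in units of 1/mu, obtained from Little's law.

    The throughput is mu * (1 - p_0); dividing the mean number of requests
    at the server by this throughput yields the mean response time.
    """
    p = stationary_distribution(n, rho)
    busy = 1 - p[0]                                   # server is not idle
    mean_at_server = sum(k * pk for k, pk in enumerate(p))
    return mean_at_server / busy


if __name__ == "__main__":
    rho = Fraction(3, 4)
    for n in (1, 2, 5, 10):
        print(n, float(normalized_response_time(n, rho)))
```

With a single client the normalized response time is exactly 1, because a lone client never has to wait; for growing n the value increases as more requests queue at the server.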
Fig. 8. Probability for waiting
Fig. 10. Averaged response time $\mu \cdot \langle t_v \rangle$
[...] replace $\rho$ by $\rho/m$ in the expressions for $p_w$, $\langle k \rangle$ and $\langle t_v^0 \rangle$:

$$p_w(n, \rho) \mapsto p_w(n, m, \rho) = 1 - S_{n-1}\left(\frac{\rho}{m}\right)$$

$$\mu \langle t_v^0 \rangle \mapsto \mu \langle t_v^{(m)} \rangle = \frac{n/m}{p_w(n, m, \rho)} - \frac{1}{\rho} = \langle k^{(m)} \rangle$$

In figures 12 and 13 we can observe how the additional server systems reproduce the desired result.

From the shape of $\mu \langle t_v^{(m)} \rangle$ in figure 14 we see how the system performance changes with the use of $m$ servers. If the number of clients $n$ is less than $m\,n_1^*$, all clients have the same response time $1/\mu$ as a single client. For $n > m\,n_1^*$ we get a linearly increasing function whose slope is reduced by a factor of $1/m$. Both effects ($1 \mapsto 1/m$ and $n_1^* \mapsto m\,n_1^*$) imply a surprising growth of the speed-up; increasing $n$, however, reduces it.

We analyze $\mu \langle t_v^{(m)} \rangle$ in the asymptotic case ($n \gg 1$, $p_w(n, m, \rho) \approx 1$); it follows that $\mu \langle t_v^{(m)} \rangle \approx (n/m) - (1/\rho)$. Similar to the discussion in section III we introduce the variable $n_m^* = m\,(1 + \frac{1}{\rho})$ and finally get

$$\mu \langle t_v^{(m)} \rangle \approx \frac{n - n_m^*}{m} + \frac{n_m^*}{m} - \frac{1}{\rho} = \frac{n - n_m^*}{m}\,\theta(n - n_m^*) + 1$$

Fig. 13. Averaged response time $\mu \cdot \langle t_v \rangle$ for m=1,2,4 servers and $\rho = 3/4$

$$S_p = \frac{\mu \langle t_v^{(1)} \rangle}{\mu \langle t_v^{(m)} \rangle} = \frac{n - \frac{1}{\rho}}{\frac{n}{m} - \frac{1}{\rho}} = m\,\frac{1 - \frac{1}{\rho n}}{1 - \frac{m}{\rho n}} \to m \quad \text{for } n \to \infty$$

Focusing on qualitative aspects we observe a good agreement with the empirical data in figure 5.
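The asymptotic expressions above are easy to evaluate numerically. The following Python sketch computes the saturation point $n_m^*$, the approximate normalized response time and the asymptotic speed-up with exact fractions. It is only an illustration of the formulas of this section under the approximation $p_w \approx 1$; the function names are hypothetical.

```python
from fractions import Fraction


def saturation_point(m, rho):
    """n*_m = m * (1 + 1/rho): number of clients up to which nobody waits."""
    return m * (1 + 1 / Fraction(rho))


def approx_response_time(n, m, rho):
    """Approximate mu*<t_v^(m)> = (n - n*_m)/m * theta(n - n*_m) + 1."""
    excess = Fraction(n) - saturation_point(m, rho)
    # theta is the step function: the linear term only contributes above n*_m
    return (excess / m if excess > 0 else Fraction(0)) + 1


def asymptotic_speedup(n, m, rho):
    """S_p = (n - 1/rho) / (n/m - 1/rho), valid in the asymptotic region."""
    rho = Fraction(rho)
    return (n - 1 / rho) / (Fraction(n, m) - 1 / rho)


if __name__ == "__main__":
    rho = Fraction(3, 4)  # value used in figure 13
    for m in (1, 2, 4):
        print(m, saturation_point(m, rho), approx_response_time(10, m, rho))
    print(float(asymptotic_speedup(10, 2, rho)))
    print(float(asymptotic_speedup(10**6, 2, rho)))
```

For finite n the speed-up lies above m and approaches m from above as n grows, which matches the remark that increasing n reduces the speed-up.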
Fig. 16. Probability for waiting for m=1,2,4 servers and ρ = 3/100
Fig. 17. Averaged response time without DB for m=1,2,4 servers and ρ = 3/100
Fig. 18. Averaged response time with DB for m=1,2,4 servers and ρ = 3/100
Fig. 19. Speed-up with DB for m=2,4 servers and ρ = 3/100
[...] of servers needed to get acceptable response times. This is very important for avoiding deadlock situations in many web-applications, but also for saving considerable financial and power resources that are wasted if too many server systems are deployed. The presented results allow an effective, simple and transparent estimation of the configuration in the manner of a rule of thumb. Doubling the number of servers doubles the number of clients that can be served without waiting time. Further, in the case of large client numbers (asymptotic case) the slope of the increasing response time is cut in half. Both effects can overlap below the asymptotic region and yield a relevant non-linear improvement of performance.
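The rule of thumb can be turned into a small sizing helper. The sketch below is an illustration only: it inverts the asymptotic approximation $\mu \langle t_v^{(m)} \rangle \approx n/m - 1/\rho$ for the number of servers $m$, it is not taken from the paper, and the function name `servers_needed` as well as the parameter values in the example are hypothetical.

```python
from fractions import Fraction
from math import ceil


def servers_needed(n_clients, rho, max_normalized_time=1):
    """Smallest m whose approximate normalized response time stays in bounds.

    Uses the asymptotic approximation n/m - 1/rho <= tau, which gives
    m >= n / (tau + 1/rho). With tau = 1 this is the condition n <= n*_m,
    i.e. no client has to wait. tau is measured in units of 1/mu.
    """
    tau = Fraction(max_normalized_time)
    if tau < 1:
        raise ValueError("the normalized response time cannot be below 1")
    rho = Fraction(rho)
    return max(1, ceil(Fraction(n_clients) / (tau + 1 / rho)))


if __name__ == "__main__":
    rho = Fraction(3, 100)  # illustrative value, as in figures 16 to 19
    for n in (10, 100, 200):
        print(n, "clients ->", servers_needed(n, rho), "servers")
```

Doubling the number of clients doubles the estimated number of servers, which is the linear relationship stated above. The estimate inherits the simplifications of the model and should be checked against measurements.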
Acknowledgments
One of the authors, H.G. Kruse, thanks Dr. E. Strohmaier for the hospitality during a research visit at LBNL/Berkeley. The inspiring and exciting atmosphere fostered the development of this work. All authors thank the colleagues of the Rechenzentrum of the University of Mannheim for their support in the configuration of the test-bed.
REFERENCES
[1] H. Kredel, H.-G. Kruse, I. Ott, Lastverhalten und Systemkonfiguration
von Web-Applikationsservern, PIK, Vol. 3, 2011, to appear.
[2] The Apache Software Foundation. Apache HTTP Server. 2011 [last
visited 13.03.2011]; http://httpd.apache.org/.
[3] J. L. Hennessy, D. A. Patterson, Computer Architecture, Morgan
Kaufmann, 2003.
[4] HIS GmbH. HIS-GX und QIS. 2011 [last visited 13.03.2011];
http://www.his.de/abt1/gx.
[5] ILIAS open source e-Learning e.V. ILIAS. 2011 [last visited
13.03.2011]; http://www.ilias.de/.
[6] H.G. Kruse, Leistungsbewertung bei Computer-Systemen, Springer,
2009.
[7] Nuxeo SAS. FunkLoad Documentation. 2011 [last visited 13.03.2011];
http://funkload.nuxeo.org/.
[8] L. L. Peterson, B. S. Davie, Computer Networks – A Systems Approach,
Morgan Kaufmann, 2011.
[9] H. Kredel, H.-G. Kruse, S. Richling, Zur Leistung von verteilten,
homogenen Clustern, PIK, Vol. 2, 2010, pp. 166–171.
[10] PostgreSQL Global Development Group. PostgreSQL. 2011 [last visited
13.03.2011]; http://www.postgresql.org/about/.
[11] Rajesh Sumra, Arulazi D., Quality of Service for Web Services – Demystification, Limitations, and Best Practices, 2003. [last visited 1.7.2011]; http://www.developer.com/services/article.php/2027911.
[12] Sabine Richling, Steffen Hau, Heinz Kredel, Hans-Günther Kruse,
Operating Two InfiniBand Grid Clusters over 28 km Distance, Proc.
3PGCIC-2010, IEEE, 2010.
[13] The Apache Software Foundation. Apache Tomcat. 2011 [last visited
13.03.2011]; http://tomcat.apache.org/.
[14] The Apache Software Foundation. Apache Velocity Engine. 2011 [last
visited 13.03.2011]; http://velocity.apache.org/.
[15] W3C Working Group Note, QoS for Web Services: Requirements and
Possible Approaches. 2003 [last visited 1.7.2011];
http://www.w3c.or.kr/kr-office/TR/2003/ws-qos/