GNU MyServer Threads Model

The MyServer threads model explained
Giuseppe Scrivano
gscrivano@gnu.org
Abstract
In the era of the Internet and Web 2.0, web servers play the important role of serving information as quickly as possible. In the Web 1.0 era, web content was mainly static and rarely modified; AJAX web sites now issue many additional requests so that content can be updated dynamically.
In this scenario there is a need for web servers able to serve many clients at the same time. Threads are the ideal choice for handling many requests: they need few resources and share the same memory.
1 Introduction
MyServer is a web server based on a multithreaded architecture; many connections can be served at the same time by the same process.
Other common architectures are the single-threaded server, where a process can handle only one connection at a time, and the multi-process server, where a new process is created to handle each request. A process needs many resources to be allocated. On UNIX-like operating systems a new process is created by duplicating the calling process, through a mechanism called “fork”. Some modern implementations allow the child process to share memory with the parent process until one of them modifies something (copy-on-write). Although this improves on the original mechanism, creating a process is still a slow operation.
Aside from processes there are threads, or lightweight processes, which do almost the same things that processes do but are faster to create. Threads also use fewer resources than processes: each thread needs to save only its own stack and CPU register state.
Threads of the same process share the same memory and descriptor table, as well as access to any open resource (files, sockets, etc.). The cost of better overall performance is the introduction of many security problems. From this point of view, what was an advantage performance-wise turns into a disadvantage. Sharing the same memory requires a synchronization mechanism between threads: two or more threads must not modify the same data at the same time, nor read it while another thread is writing it. In addition, a single thread that crashes will cause the entire server process to crash.
Despite these difficulties, MyServer uses a single-process, multiple-threads architecture that serves multiple clients using different threads of a single process.
Synchronization is performed through different mechanisms: semaphores, mutexes and events. However, excessive use of these can reduce a multithreaded application’s performance to that of a single-threaded one, with only one request served at a time.
In addition, MyServer has an internal scheduler to handle connections, and a priority can be defined for each of them. Requests from a connection with a higher priority are processed sooner than requests from a connection with a lower priority.
2 Threads pool
A pool of allocated threads serves every request made to the server. Since it is the job of the scheduler to listen for new connections and new requests, and since it internally handles the order in which requests are served, the threads need to contact the scheduler to be notified of a new request.
When a thread contacts the scheduler for a new request to handle, it is put into a “sleep” status and wakes up when a request is ready (or when some other special case occurs). Synchronization here is done using a semaphore that is incremented when a connection is moved to the ready queue and decremented when a thread acquires a connection to process.
Figure 1 shows a situation with N allocated threads in which a new request becomes ready: a thread that was previously waiting is woken up and the ready connection is handed to it.
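The semaphore handshake described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MyServer code: the names ReadyQueue, worker, serve and STOP are hypothetical, and the STOP sentinel stands in for the unspecified “special case” that wakes a thread without a request.

```python
import threading
from collections import deque

STOP = object()  # hypothetical sentinel: wakes a worker so that it can exit


class ReadyQueue:
    """Ready queue guarded by a lock and counted by a semaphore."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()
        self._available = threading.Semaphore(0)

    def put(self, conn):
        with self._lock:
            self._items.append(conn)
        self._available.release()  # incremented: one more ready connection

    def get(self):
        self._available.acquire()  # the thread sleeps here until work exists
        with self._lock:
            return self._items.popleft()


def worker(queue, handle):
    while True:
        conn = queue.get()
        if conn is STOP:
            return
        handle(conn)


def serve(connections, n_threads=4):
    """Serve every connection with a pool of n_threads workers."""
    queue = ReadyQueue()
    served = []
    served_lock = threading.Lock()

    def handle(conn):
        with served_lock:
            served.append(conn)

    threads = [threading.Thread(target=worker, args=(queue, handle))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for conn in connections:
        queue.put(conn)
    for _ in threads:
        queue.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return served
```

The semaphore count always equals the number of queued items, so a worker that returns from acquire() is guaranteed to find something to pop.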
2.1 Number of threads

The number of threads is not fixed: more are allocated when there are more connections to be processed, and they are removed when server activity is low. The number is bounded by two values in the main configuration file, NTHREADS_STATIC and NTHREADS_MAX. The first specifies the number of threads that are always active: they will not be destroyed while the server is alive. The second specifies the maximum number of threads that can be created, counting both static threads and dynamically allocated ones. The number of threads varies between these two values, and the exact value at any given time depends on the server load.
Figure 1: Threads pool
When a thread is not used for several seconds it is marked for removal. It is easy to see that if there are no requests for some seconds, all the threads are marked and removed until the number of threads is exactly NTHREADS_STATIC. To prevent such an abrupt drop, a mechanism called “slow-stop” is used.
The name comes from the TCP protocol’s “slow-start” mechanism. The aim of “slow-start” is to prevent network congestion by gradually increasing the number of packets sent until a threshold value is reached. If a packet is not delivered correctly in this phase, the number of packets sent is reduced to avoid additional congestion.
Figure 2: New thread creation
The MyServer “slow-stop” mechanism works in a complementary way. The number of threads to be removed is increased at specific times, and this continues until the number of active threads is NTHREADS_STATIC. If a new thread needs to be created during this phase, the “slow-stop” system stops marking and removing threads. Unlike the TCP “slow-start” mechanism, “slow-stop” uses a more aggressive technique, as the “slow-stop” value is reset to its initial value.
This method allows the number of threads to be decreased exponentially when there is no need for all of them. Threads, however, are created linearly.
When a request passes into the ready queue, MyServer checks whether there are sleeping threads ready to process it. If there are none and the current number of threads is less than NTHREADS_MAX, a new thread is created. Figure 2 shows a scenario where all N threads are already busy and thread N + 1 is created to serve the incoming request.
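The rules of this section can be condensed into a small bookkeeping sketch. The text does not say by how much the number of removed threads grows at each step; the doubling used here is an assumption chosen to match the “decreased exponentially” remark. The class and method names are hypothetical, and no real threads are started.

```python
class ThreadCountPolicy:
    """Tracks how many threads would exist under the slow-stop rules."""

    def __init__(self, n_static, n_max):
        self.n_static = n_static  # plays the role of NTHREADS_STATIC
        self.n_max = n_max        # plays the role of NTHREADS_MAX
        self.threads = n_static
        self.slow_stop = 1        # threads to remove at the next idle tick

    def on_request_ready(self, sleeping_threads):
        """A request entered the ready queue; return True if a thread is added."""
        if sleeping_threads > 0 or self.threads >= self.n_max:
            return False
        self.threads += 1         # linear growth: one thread per request
        self.slow_stop = 1        # creation resets slow-stop to its initial value
        return True

    def on_idle_tick(self):
        """Periodic check while threads sit idle; return how many are removed."""
        removable = self.threads - self.n_static
        if removable == 0:
            self.slow_stop = 1
            return 0
        removed = min(self.slow_stop, removable)
        self.threads -= removed
        self.slow_stop *= 2       # assumption: the removal step doubles each tick
        return removed
```

Starting from 17 threads with 2 static ones, four idle ticks remove 1, 2, 4 and 8 threads, while getting there from 2 threads took 15 separate requests.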
3 Connections scheduler
The connections scheduler uses N priority classes, where N is a constant defined at compile time. The scheduler is based on a simple rule: a connection with priority X is twice as likely to be served at a given time as a connection with priority X − 1.
The scheduler is implemented with a counter that records how many connections were served from the current priority queue. When the number of connections served equals the number of connections in the queue, the pointer moves on to the next queue. When the last queue is completed, the pointer returns to the first queue and the count starts over. If there are no ready connections in the current queue, the pointer moves on to the next one immediately.
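A counter-and-pointer scheduler of this kind can be sketched as follows. The text does not spell out how the 2:1 ratio between adjacent priorities is obtained; this sketch assumes a per-visit quota of 2^p connections for priority p, which is one way to get that ratio and is not taken from the MyServer source. All names are hypothetical.

```python
from collections import deque


class PriorityScheduler:
    """Round-robin over priority queues with a per-visit quota."""

    def __init__(self, classes=3):
        self.ready = [deque() for _ in range(classes)]
        self.current = 0  # pointer to the queue being served
        self.served = 0   # connections served during the current visit

    def add_ready(self, conn, priority):
        self.ready[priority].append(conn)

    def _quota(self, priority):
        # Assumption: each priority level doubles the share of the one below.
        return 2 ** priority

    def get_connection(self):
        """Return the next connection to serve, or None if nothing is ready."""
        n = len(self.ready)
        # n + 1 steps are enough to leave an exhausted queue, skip the
        # empty ones and come back to the same queue with a fresh quota.
        for _ in range(n + 1):
            queue = self.ready[self.current]
            if queue and self.served < self._quota(self.current):
                self.served += 1
                return queue.popleft()
            self.current = (self.current + 1) % n
            self.served = 0
        return None
```

With three classes and every queue kept full, each full cycle serves 1, 2 and 4 connections from priorities 0, 1 and 2, and the pointer always returns to the lowest queue, so no class is skipped indefinitely.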
When a connection in the waiting queue has new data available to be processed, it is moved to the ready queue associated with its priority. The connection is not actually removed from the waiting queue; instead a flag marks it as used. This way, the waiting queue always contains all the connections active on the server.
With such a design there are no starvation problems: every priority queue will be processed at some point and every request will be served. Asymptotically, the queue with priority X takes half the time to process its requests that the queue with priority X − 1 takes, but no queue is deprived of time.
Connections in the waiting queue are subject to timeouts: when a connection does not request anything from the server for a long time, it is automatically removed from the connections pool. Connections marked as used are not subject to timeouts.
Figure 3: Scheduler
Figure 4: New data available for a connection

A connection removed from the ready queue to be processed is still present in the scheduler (in the waiting queue), but it is not registered to be checked for new events; it is the serving thread’s responsibility to register it again with the scheduler when it has finished its work. Under some circumstances (HTTP pipelining, for example), multiple requests can be sent at once and appear as a single data block; in that case the serving thread itself must register the connection as ready for the next phase of processing.
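The life cycle of a connection in the waiting queue (used flag, timeout, re-registration by the serving thread) can be summarized with a short sketch. The class name, method names and the explicit now arguments are hypothetical and chosen only to keep the example deterministic; the real scheduler is event driven.

```python
class WaitingQueue:
    """Holds every active connection, with a used flag and a timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # connection -> {"last": time, "used": flag}

    def register(self, conn, now):
        self.entries[conn] = {"last": now, "used": False}

    def mark_ready(self, conn):
        # New data arrived: the connection stays here but is flagged as used.
        self.entries[conn]["used"] = True

    def release(self, conn, now):
        # Called by the serving thread once it has finished with the connection.
        entry = self.entries[conn]
        entry["used"] = False
        entry["last"] = now

    def expire(self, now):
        """Drop idle connections that timed out; used ones are never dropped."""
        expired = [conn for conn, entry in self.entries.items()
                   if not entry["used"] and now - entry["last"] > self.timeout]
        for conn in expired:
            del self.entries[conn]
        return expired
```

Keeping used connections in the same table means that a single structure always lists every connection the server holds, as the text requires.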
4 Experiments
The tests were done on an AMD64 3200+ with 512 MB of RAM running version “2.6.21-1-amd64” of the Linux kernel.
The first set of requests was made against a KByte-sized file, the second against a MByte-sized file. Requests for a small file keep the serving thread busy for only a short time, so single-threaded server performance is not very different from multithreaded performance; most of the work goes into initializing and managing the connection. With a bigger file, the benefit of a multithreaded approach is evident: it gives a performance boost of almost 50% in served requests.
Empirically, there is a limit to the number of active connections to the server such that any new connection over the limit brings no increase in throughput (and in fact causes a decrease, due to the additional work necessary to manage it).
This point moves to the right as the number of threads increases. In other words, the limit is proportional to the number of threads, and beyond it there is no further benefit from increasing the number of connections.
It should be noted, however, that performance beyond this point is still better than with the single-threaded model.
Figure 5: Average requests served per second (higher values are better) on a KByte file. Panels: (a) 1 single thread; (b) 10 static threads; (c) 10 static threads and 30 as maximum number of threads; (d) 50 static threads and 100 as maximum number of threads.
Some notes on the tests: both the server and the client making the requests ran on the same machine. This influences the results, especially when there are many active connections, so the measured values can be thought of as the real values scaled by a factor proportional to the number of connections. In a networked environment, the differences between the single-threaded model and the others would be more evident, as the serving threads are kept busy for longer due to network latency and possible transmission errors at the TCP level.
Figure 6: Average requests served per second (higher values are better) on a MByte file. Panels: (a) 1 single thread; (b) 10 static threads; (c) 10 static threads and 30 as maximum number of threads; (d) 50 static threads and 100 as maximum number of threads.
The first figure of each set shows how many requests per second MyServer can serve using a single thread. In the second figure, 10 static threads are used. In the third, 10 static threads are used, which may grow to 30 if more threads are needed; the last figure shows the request throughput using 50 static threads with an upper limit of 100. The X axis shows the number of connections concurrently requesting the file from the server; a higher throughput means better server performance.
Thanks to Edmund Gonzalez for his suggestions and corrections on the first revision of this article.
References
[1] The MyServer web server http://www.myserverproject.net.
[2] The C10K problem http://www.kegel.com/c10k.html.
[3] The linux O(1) scheduler http://www.hpl.hp.com/research/linux/kernel/o1.php.
[4] The libevent library http://monkey.org/~provos/libevent/.
[5] The slow-start algorithm http://en.wikipedia.org/wiki/Slow-start.
