Universal Web Server for Web Service Components

Proposal Submitted to the

Department of Computer Science

University of Cape Town

By Reinhardt van Rooyen and Andrew Maunder

August 2004

1. Introduction and Background

1.1 Introduction

Over the last ten years, the Web has grown rapidly, to the extent that all businesses and organisations, no matter how large or small, desire some sort of Web presence. A Web page might contain completely static content that only changes when updated by the Web page administrator. The more recent approach is to allow a Web page to show different content in response to the type of request sent to the Web server. This is described as a page containing dynamic content.

Figure 1 shows a typical situation where a Web browser formulates a request message according to the relevant action performed by a user and sends that request to a Web server. The Web server then routes the request message to the applicable server application. The request is analysed and a response message is generated. This message is sent back to the Web browser for display.

The Web server application could be a PHP, Perl or ASP script, a Java servlet or any other server application.

Figure 1: A typical Web server interaction

1.2 Context Switching

Current Web servers primarily utilise a single directory to store application code; a common example is the ‘cgi-bin’ directory of the Apache Web server. All code, such as Perl or CGI scripts, is stored in that single directory, irrespective of who the owner of the script is.

A serious problem arises from this single ‘code-repository’ scheme. In order for a Web server process to access the required piece of code, the process must have the appropriate access rights to be able to read or write the source code. Unfortunately, most Unix-based Web servers run as ‘nobody’, which means that the Web server processes run with very few privileges and, in particular, cannot access users’ data. The reasoning behind this technique is that if the security of the Web server is compromised, the resultant rogue processes will not be able to access any other files on the server. The security of the rest of the system is ensured.

Running the Web server as ‘nobody’ does not provide a useful solution to the context-switching problem. Processes created by a Web server running as ‘nobody’ may only access globally accessible files.

In a multi-user environment, such as a university Web server, it is not desirable to store users’ source code files in a single ‘code-repository’. It is even less desirable to have all files in a single repository and to set them to be globally accessible. Any process running on the server has the opportunity to modify and even destroy such files.

1.3 Multiple Language Support

Various Web application languages and toolkits are currently available for server-side development. The most popular of these are PHP, ASP and Java Servlets. A modern Web server instance is unable to support all the language modules available. Many modules require modifications to be made to the server configuration, resulting in incompatibilities with other language modules.

1.4 Market Survey

A small survey of popular, local Web hosts revealed a number of interesting facts about the services that they provide. These services are summarised in Table 1.

Host Name          Services Provided
WebOnline          PHP on high-end packages; no Java Servlet or JSP support.
Afrihost           Java Servlet support with a dedicated personal Tomcat server; PHP support with access to a MySQL database.
Circular Systems   No Java Servlet or JSP support; PHP support with access to a MySQL database.
DigitalHost        No Java Servlet or JSP support; PHP support available.

Table 1

The results confirm our statement that many Web hosts still provide support for a limited number of development languages. Further investigation revealed that a quarter of the hosts provided PHP without Java Servlet support. Those that did support both languages did so by providing a private instance of a

Tomcat server that would service all Servlet requests.

Our project will investigate the possible reasons for the lack of concurrent support for Java Servlets and PHP. We will propose a secure architecture that will solve the support problem and provide test results that show that the performance of the proposed system is comparable with previous systems.

2. Previous or Related Work

2.2 The Common Gateway Interface (CGI)

The Common Gateway Interface (CGI)[3] provides a standard interface for Web applications to receive requests and pass application-generated responses back to the calling Web server. CGI provides many benefits, which include:

• Simplicity of the CGI commands.

• CGI applications can be written in any programming language.

• CGI applications run in a separate process from that of the Web server, thus isolating the Web server processes from dangerous CGI processes.

• CGI applications can run on a variety of Web server architectures.

There are some inherent problems with the CGI mechanism. Most importantly, the performance of CGI applications running on the Web server is poor. A new process is started to handle each request for a CGI application. Once an appropriate response has been generated, the CGI application process is destroyed. The excessive process initialisation overhead makes the CGI mechanism unacceptable for high-performance Web servers.

2.3 FastCGI

FastCGI[3] provides a fast, open and secure Web server interface that solves the performance problems that are inherent in CGI, without introducing the overhead and complexity of proprietary APIs. The FastCGI interface provides many of the benefits that are provided by CGI, such as process isolation and language and architecture independence. The persistence of FastCGI processes is one of the major advantages it possesses over the traditional CGI system.

2.4 CGIWrap

CGIWrap[5] is a wrapper script that allows a server to execute non-trusted scripts (i.e. scripts not written by the Web-master or an employee) in a controlled, limited environment. The wrapper provides the following functions.

• The process executing the script has the same privileges as the owner of the script.

• Resource ceilings are set to limit the resources used by a script, to protect the Web server.

• An auditing trail that leads directly to faulty scripts.

The wrapper is ‘suid’ to root and can therefore change its user ID to match that of the owner of the script it must run. This is desirable because a non-trusted script will now be run with no privileges, whereas a highly trusted script (i.e. one written by the host system Web programmer) will be allowed to run with much greater privileges.

2.5 Sandboxes

The sandbox[9] security model allows a system to run non-trusted scripts and programs in an isolated environment. The sandbox idea is analogous to a quarantined area during the outbreak of a disease. A subject leaving the infected area is detained in the quarantined area so that the medical staff can monitor them to see if they show any symptoms of the disease. In a similar way, the server can create a restricted area where the script or program may run, thus restricting the parts of the system that the script can access. The server can then monitor the actions of the script during execution, providing it with limited CPU time and memory. If the script is found to be malicious, the resultant damage to the system will be minimal. Sandboxes are currently part of the Java Security Model.[9]

2.7 Secure Box (SBox)

The SBox[5] system takes the CGI wrapper idea a bit further. It creates a restricted area for the target script’s process to exist in. The script, which is possibly infected with a ‘bug’ or dangerous error, is kept in one place and only has access to the files in the quarantined area.

The SBox system can perform the following operations:

• Checks that the script is run in the appropriate environment or area.

• Verifies that the target script has no vulnerabilities.

• Performs a ‘chroot’ operation to the directory that contains the target script and the author’s other files, e.g. HTML files.

• Sets limits on the target script’s resource usage, i.e. CPU, disk and memory usage.

• Lowers the priority of the target script’s process.

• Runs the target script.

2.8 SpeedyCGI

SpeedyCGI[8] is a persistent Perl interpreter that was introduced to reduce the cost of starting a new process every time a request is received. The Perl process does not exit or die once a request has been serviced; instead, the SpeedyCGI system attempts to prolong the lifetime of the process and thereby allow it to service further requests.

2.9 suEXEC

The Apache research group developed the suEXEC[7] feature for their Apache Web server and introduced it with Apache version 1.2. suEXEC allows Apache users to run CGI scripts/programs under user IDs that are different from the user ID of the calling Web server. Without the use of suEXEC, the CGI script/program will run under the same user ID as the Web server; typically this ID is ‘nobody’. Apache spawns a child process that executes the suEXEC binary and then passes the script’s details to it. The suEXEC process then edits its environment variables so as to create an appropriate environment for the process to execute in. Finally, the script executes in this environment.

Their solution hinges on the use of the UNIX ‘setuid’ and ‘setgid’ system calls; suEXEC is regarded as a ‘setuid’ wrapper program.

2.10 Summary Table

Product Name    Context Switching Supported   Persistence Supported   Languages Supported
FastCGI         No                            Yes                     Various
CGIWrap         Yes                           No                      Various
SuExec          Yes                           No                      Various
Mod_PHP         No                            Yes                     PHP
SpeedyCGI       Yes                           Yes                     Perl
SBox            Yes                           No                      Various
Java Servlets   No                            Yes                     Java

Table 2

A viable solution to these problems would have to provide three essential features: support for context switching, persistence of processes and support for any development language. Table 2 shows that there is currently no all-encompassing solution that provides all the features described above.

Our research team will attempt to design and implement a system that will provide support for all three essential features.

3. Project Overview

3.1.1 Investigation

We propose the development of a ‘universal’ Web server that provides a secure, language-neutral platform for the execution of Web server components. Such a project would have to address some very relevant issues. These have been alluded to in previous sections and can be summarised by the following:

1. Is it possible for a single Web-server instance to service all Web components hosted on the system, irrespective of their implementation language?

2. Can a Web-server or subset of the Web-server system perform a context-switch that will allow server scripts to be run with the same privileges as the script owner?

3. Is it feasible to create an instance of a language interpreter that handles requests for a specific user-context?

4. Will it be possible to provide script isolation whereby malicious scripts are prevented from accessing and possibly damaging other scripts?

5. Can the resources allocated to script processes be strictly controlled?
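
Question 5 can be explored with the per-process resource limits that Unix already provides. The following is a minimal sketch, assuming a POSIX host and Python's standard resource and subprocess modules. The helper name run_with_ceiling and the chosen limits are illustrative assumptions, not part of the proposed design. It applies a CPU-time ceiling, and optionally an address-space ceiling, in the child process just before the script interpreter is executed.

```python
import resource
import subprocess
import sys


def _cap(limit, value):
    """Set the soft and hard limit to value, never above the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_with_ceiling(argv, cpu_seconds, memory_bytes=None):
    """Run argv as a child process under CPU and (optionally) memory ceilings.

    run_with_ceiling is a hypothetical helper name used for illustration only.
    """
    def apply_limits():
        # Runs in the child between fork() and exec(); the parent keeps its own limits.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        if memory_bytes is not None:
            _cap(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True, timeout=30)


if __name__ == "__main__":
    # The child reports the CPU ceiling it actually received.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
    result = run_with_ceiling([sys.executable, "-c", probe], cpu_seconds=2)
    print(result.stdout.strip())
```

Because the limits are lowered in the child only, the server process itself is unaffected. The sketch does not cover disk usage, which would need a separate mechanism such as file-system quotas.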

3.1.2 Importance

A single, ‘universal’ Web server is beneficial for the following reasons.

• Increased security: Only a single Web server needs to be running on a server machine, thereby limiting the access points to the system. The advanced security framework provides script isolation and access control, thus ensuring heightened protection of system and user files.

• Simplicity of deployment: The ability to place a Web component, written in any language, into a user’s personal directory and have the server find and execute it.

• No changes to coding style: The user will not have to add additional lines of code to his script or program to ensure system compliance.

3.1.3 Methods

The following provides an overview of the approach to be taken:

• Identify the attributes of existing solutions that enable each one to solve a subset of the problem domain. This could be done by a code ‘step-through’ or the creation of complex test cases to identify the shortcomings of the existing products.

• Construction of a framework and protocol that will allow a request message to be routed from the Apache Web server to the main modules.

• A context switch must be performed before any interpreter processes are created. The context-switching module should be able to read the owner ID of the requested program or script and then perform the switch. The subsequent process will have exactly the same privileges as the owner of the script.

• Once the process has been created, it should remain persistent to service any subsequent requests for it.

• A load-balancing module must manage the life-cycle of the persistent objects to ensure that every program or script has sufficient processes to handle its requests. Any process that remains idle for a prolonged period should also be destroyed.

• Existing systems will be tested for the establishment of performance benchmarks. The system can then be accurately evaluated according to its performance during each benchmark. An example could be the total time taken to produce a response for a given test request.

3.2 System Overview

3.2.1 Key Features of System

The key features of our X-Switch system are listed below and summarised in Figure 2:

Figure 2: Proposed system design

• Addition of a unique Apache module to route all PHP and Java Servlet requests to the context-switch module.

• Creation of a context-switching module that is decoupled from the Apache Web server.

• A persistent process manager that caters for a multi-user environment, thus ensuring efficient load balancing.

• Modular script engines to be created that will provide a well-defined interface for reliable control as well as a robust framework to aid future integration of additional language interpreters.

• A secure framework for process isolation and resource monitoring.

3.2.2 Design Challenges – Context Switch

When designing the context switch module (X-Switch), several aspects have to be taken into consideration.

Security of module: The X-Switch module must be run as ROOT to enable it to perform the context switch. If the module’s security is compromised, the attacker will have access to the whole server system.

Initialisation of Processes: Once the context switch has been performed, the appropriate engine process must be initialised to execute the desired script.
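
The two steps described here, switching identity and then starting the engine process, correspond on Unix to a fork, setgid, setuid, exec sequence. The following is a minimal sketch, assuming Python on a POSIX host. The names owner_of and run_as_owner are hypothetical, and this proposal does not fix an implementation language. When the caller is not root, the switch can only succeed if the script already belongs to the calling user, which illustrates why the X-Switch module needs root privileges.

```python
import os
import subprocess
import sys


def owner_of(script_path):
    """Return the (uid, gid) that owns the script file."""
    st = os.stat(script_path)
    return st.st_uid, st.st_gid


def run_as_owner(interpreter_argv, script_path):
    """Run script_path with its owner's identity and return the completed process.

    Hypothetical helper for illustration; a production module would also reset
    supplementary groups (os.setgroups / os.initgroups) and sanitise the
    environment, both omitted here for brevity.
    """
    uid, gid = owner_of(script_path)

    def switch_context():
        # Executed in the child between fork() and exec().
        # The group is changed first: once the user ID has been dropped,
        # the process may no longer be permitted to change its group ID.
        if os.getegid() != gid:
            os.setgid(gid)
        if os.geteuid() != uid:
            os.setuid(uid)

    return subprocess.run(
        [*interpreter_argv, script_path],
        preexec_fn=switch_context,
        capture_output=True,
        text=True,
        timeout=30,
    )


if __name__ == "__main__":
    # Usage: python this_file.py /path/to/script.py
    completed = run_as_owner([sys.executable], sys.argv[1])
    print(completed.stdout, end="")
```

The switch happens in the child only, so the parent keeps the privileges it needs to serve the next request for a different owner.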

Process Resource Ceiling: A user script must have a limited amount of resources allocated to it. These include CPU time, memory and allocated disk space.

Process Isolation: The system should create a ‘sandbox’ for the script process to run in. The script process will then only have access to the files in its ‘sandbox’ directory.

3.2.3 Design Challenges – Persistence of Processes

Process Re-use: Once the context switch has occurred, the system must ensure that all PHP or Servlet engine processes remain persistent. This allows them to handle subsequent requests for the same script.

3.2.4 Design Challenges – Load Balancing

Number of processes: The X-Switch system should be able to dynamically increase the number of processes available in a particular context. This is desirable if the number of requests for a particular script increases.

Termination of Processes: Once the number of requests for a particular script decreases, the X-Switch module should terminate the unused processes to free up the resources they utilised.

3.3 Proposed System Design

3.3.1 Design Considerations

The Apache module: The Apache Web server is the most widely used Web server at present. In addition, the source code is freely available, thus ensuring the availability of an extensive knowledge base of related uses, errors and technical material. Apache provides a modular design strategy that allows seamless integration of functional units.

Inter process communication (IPC): Effective communication is essential for modular interoperability. We will investigate the two most popular mechanisms of IPC. The first is the Unix named pipe. This is a FIFO (first in, first out) file that can be opened and have messages placed inside it by the creator process. The consumer process then reads or consumes the data from the pipe. Communication in the opposite direction (duplex) is possible as well, although on many Unix systems this requires a pair of pipes, one per direction.

The alternative to named pipes is the use of TCP/IP sockets. A socket can be seen as the end point for IPC. The TCP/IP protocol provides a reliable, connection-oriented communication channel that is bound to the socket. A client can then read from or write to the socket.

IPC is necessary in two distinct areas of the X-Switch system. Firstly, it is required between the Apache Web server module and the X-Switch module. Secondly, it is required between the X-Switch module and the script engines.

An investigation will highlight any performance differences that exist between the two approaches and attempt to apply the technique that is best suited to our requirements. Key factors include communication overhead and scalability.

X-Switch Module: The X-Switch module can be seen as the middle tier of the system. The module can therefore act as a request-routing intermediary. Secondly, but more importantly, the X-Switch module will need to provide a context switching facility that permits the PHP or Servlet engine processes to be executed under a different ID from that of the Web server.

Script Engine: The script engine design will follow a suitable modular pattern and implement a well-defined interface. The interface will provide the required interoperability between the X-Switch module and the engines. Each engine will be required to initiate its interpreter process and then provide the appropriate process I/O and resource management. The initial system will provide engine modules for handling requests for the popular PHP and Java Servlet Web programming languages.

3.3.2 Resources Required

The following equipment will be required:

• Unix/Linux based server PC with ‘ROOT’ access, capable of running PHP and Java interpreters as well as the Apache Web Server.
• Linux/Windows 95/98/2000/XP client PC with a Web compatible browser.

The following software resources will be required:

• Apache Web Server.
• Java interpreter.
• PHP interpreter.
• A standard Java interpreter and compiler.
• A standard C compiler.
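
Section 3.3.1 names TCP/IP sockets as one candidate for the IPC between the Apache module, the X-Switch module and the script engines. The following is a minimal sketch of one request and response over a loopback TCP socket, assuming Python's standard socket module. The 4-byte length prefix used to frame each message, and the names send_msg, recv_msg and round_trip, are assumptions made for illustration; the actual message format is still to be defined by the project.

```python
import socket
import struct
import threading


def send_msg(sock, payload: bytes):
    """Send one message framed by a 4-byte big-endian length prefix (assumed format)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; TCP is a byte stream, so one recv() may return less."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def serve_once(listener, handler):
    """Accept a single connection, pass the request to handler, send the reply."""
    conn, _ = listener.accept()
    with conn:
        send_msg(conn, handler(recv_msg(conn)))


def round_trip(request: bytes, handler):
    """Send request to a stand-in engine running in a thread and return its reply."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener, handler))
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        send_msg(client, request)
        reply = recv_msg(client)
    worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(round_trip(b"GET /~user/hello.php", lambda req: b"handled: " + req))
```

Explicit framing is needed with sockets because TCP does not preserve message boundaries. A named-pipe variant would face the same question, which makes the framing cost a fair part of any overhead comparison between the two mechanisms.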

3.4 Impact

3.4.1 Impact of a ‘universal’ Web server

The ‘universal’ Web server concept provides a system that allows a user to deploy a Web component quickly and easily. A secure server process, with the same access rights as the user, interprets the component source code. This assures the user that no other process can read, write or destroy his scripts.

A ‘universal’ Web server will require only a single instance to service requests for Web applications implemented in any language.

3.5 Evaluation of system

3.5.1 Success factors

The success of the project will be evaluated according to the following criteria:

Simplicity of script deployment: The user should be able to place his scripts into his own personal directory on the server disk. The X-Switch system should automatically find and execute the script when needed. There should be no need for ‘additional’ deployment details or script code. Thus, if a script executes correctly on the user’s development system, it should do the same on the X-Switch server system.

Co-Existence of sandboxes: A script should be executed by a process with the same access rights as the script owner. If a script is placed in a personal sandbox, the script processes will not be able to access files in other sandboxes. Co-existence of sandboxes ensures file security on a multi-user Web server system.

Response Time of System: The viability of the X-Switch project hinges on the response time of the system. Should the X-Switch system perform better than previous systems (see Section 2), the project will be considered highly successful. Each task will have a minimum required response time allocated to it.

Security of Web Server and Script Processes: An important component of the project is Web server security. Successful context switching and process isolation will limit the accessible domain of any script, thus rendering all system files inaccessible.

3.5.2 Evaluation techniques

The following evaluation techniques will be used:

Benchmarking: Performance benchmarks may be established by testing the existing products, discussed in Section 2, with a variety of input data. The product, input data and the resultant data must be well documented to allow for experiment replication and comparison. The input data should test all the functionality of the system, from normal operations to extreme conditions. The documented test data will be used as input to our system and the results captured. Accurate replication of the experiments will ensure that our results are comparable and unbiased.

The proposed benchmarks will evaluate the following:

1. System performance when multiple clients attempt to access the same script. Will the system balance the uneven load effectively?

2. System performance when there are many clients attempting to access different scripts simultaneously. Can the system service each request within a reasonable time frame?

4. Implementation Plan

4.1 Time Line

See Time Line 1, Appendix A

4.2 Deliverable Items

See Table 3, Appendix A

4.3 Milestone Descriptions and Completion Criteria

1. Definition of Interfaces and Communication Protocols: The modular interfaces are defined according to the final system-architecture designs. The runtime data flow of the system plays a major role in determining the operations that will need invocation. Interface structure is derived from the operational information collected. Finally, a communication protocol must define the messages required for modular communication.

2. Definition of Context Switching and Apache Module: The functional roles of the Apache and context switching modules are finalised. Pseudo-code depicting the methods and operations

required to perform the required functions must be included.

3. Definition of PHP and Servlet Engines: The run-time environment for a language-interpreter process is defined, including initialisation variables, resources and process I/O handling.

4. Creation of modular testing framework and synthetic data generators: Modular test cases are constructed, including documentation of the correct result sets. Sample data generators are to be used to synthesise modular data flow for testing purposes. This also allows for protection against the delayed completion of a critical module. Testing is permitted to continue by using the synthesised data sets.

5. Implementation of Interfaces and Protocols: The final coding of the interfaces and protocols has taken place. White box testing must be performed by the programmer and the results documented.

6. Implementation of Context Switching and Apache Module: The final coding of the context switching and Apache modules has taken place. White box testing must be performed by the programmer and the results documented.

7. Implementation of PHP and Servlet Engines: The final coding of the PHP and Servlet engines has taken place. White box testing must be performed by the programmer and the results documented.

8. Testing of individual modules: Black box testing of each module is performed. The results are compared to the documented results collected during the completion of milestone 4.

9. Testing of complete system: The complete system is assembled and documented test cases are applied. The results are compared to the result sets obtained in milestone 4.

4.4 Task Allocation

The task allocation plan for the project is derived from the system structure. Below is a listing of each group member followed by the system component name together with the set of tasks associated with it.

Interface Definitions: The first task will be for the project members to define the modular interfaces and the format of the messages to be passed between them. This will allow test data generators to be created before the implementation of the modules occurs.

Member name: Andrew Maunder

The Apache module: Apache module research, design of module, definition of module interfaces, implementation of module, module testing and implementation of simulation data sets to prevent project.

Protocol implementation: Identification of messages required, protocol design, protocol implementation, data-marshalling.

Servlet engine integration: Researching Servlet engine operation, process environment management, process I/O handling and resource usage.

Apache module and Servlet engine testing and validation: Creation of test cases and result sets, development of synthetic test data generators, white and black box module testing, black box system testing. Finally, benchmark tests will be applied to the system to establish whether its performance falls within an acceptable range.

Member name: Reinhardt van Rooyen

Context switch (X-Switch) module: Research of Unix process management, context switching, access control, module design and implementation, load balancing schemas including process life-cycle control.

PHP Engine Integration: Researching PHP interpreter operation, process environment management, process I/O handling and resource usage.

Context switch and PHP engine testing and validation: Creation of test cases and result sets, development of synthetic test data generators, white and black box module testing, black box system testing. Finally, benchmark tests will be applied to the system to establish whether its performance falls within an acceptable range.

Module failure action plan:

Test data synthesis: The well-defined modular interfaces allow separate test data generators to simulate the correct data output of a failed module.

5. References

1. Apache Software Organisation. 2004. Apache suEXEC Support Technical Document.
Available at: http://httpd.apache.org/docs-2.1/suexec.html
(Accessed 2 July 2004)

2. Sun Microsystems. 2003. The Java Servlet API: White Paper.
Available at: http://java.sun.com/products/servlet/whitepaper.html
(Accessed 20 July 2004)

3. Open Market Inc. 1996. FastCGI: A high performance Web server interface.
Available at: http://www.fastcgi.com/devkit/doc/fastcgiwhitepaper/fastcgi.htm
(Accessed 21 July 2004)

4. Knambatti, M. 2001. Named pipes, sockets and other IPC. Technical Report, Arizona State University.

5. Stein, L. 2003. sBox: Put CGI scripts in a box. Technical Report, Cold Spring Harbor Laboratory.

6. Laurie, B. and Laurie, P. 1999. Apache: The definitive guide. O’Reilly, Sebastopol, CA.

7. Coar, K. 2000. suExec and Apache: A tutorial. Apache Software Foundation.
Available at: http://www.serverwatch.com/tutorials/article.php/10825_1126991_1
(Accessed 29 July 2004)

8. Horrocks, S. 2003. Speedy CGI.
Available at: http://daemoninc.com/SpeedyCGI/
(Accessed 31 July 2004)

9. Sun Microsystems. 1999. The Java’s security architecture.
Available at: http://java.sun.com/j2se/1.3/docs/guide/security/spec/security-spec.doc1.html
(Accessed 28 July 2004)

7. Appendix A:
Table 3

Deliverable Item Description Date (2004)


1. Project Proposal Completed 5 August
2. Presentation & Web Page Completed 12 August
3. Project Commences 16 August
4. Report: Definition and theory (Background) 27 August
5. Report: Chapter on Design 16 September
6. First Implementation Running 25 September
7. Evaluation of first implementation 25 September
8. Report: Evaluation and Testing 5 October
9. Paper: First Draft 6 October
10. Report and Poster: First Draft 8 October
11. Final paper hand in 13 October
12. Final Report hand in 15 October
13. Poster hand in as well as Web site updated with project outcome 15 October
14. Project demonstration to project supervisor and second reader 18-22 October
15. Final project demonstration 8-12 November

Chart 1:
