
PRACTICAL DESIGN AND IMPLEMENTATION OF WEB-BASED DOCUMENT MANAGEMENT SYSTEMS


Jason Yao* and Jessica H. Li**
*National Taiwan University

ABSTRACT
This paper presents the design and implementation of two
web-based document management systems customized to
serve the needs of an international organization. The systems
were designed to replace a paper-based process that
gradually became unmanageable due to the increasingly
large volume of texts and the need to distribute the
documents to geographically remote locations. Features of
the new systems include online recording of forms and texts,
automatic email notifications and reminders, electronic
authorization, database query functions, and a secure
mechanism for web access to documents for approval and
future reference. We describe the design principles and
introduce the functional modules in each system. Besides
technical issues, a section is devoted to discussing the role
of user behavior in the process. Both systems were
implemented on Windows and UNIX platforms using
open-source software such as PHP4, MySQL and Apache.
They have been functioning online for more than three years
and have processed over 1,000 documents without any problem.

** State Fund, California, U.S.A.

1. INTRODUCTION

Company F is a research and development arm of a
global high-tech conglomerate and has grown rapidly
during the last 10 years from four to more than a
hundred employees located in three different US states.
Most employees are engaged in research on advanced
technologies, with the goal of transferring their results to
business units or obtaining intellectual property rights for
the company. In the process they produce technical
memos, publish their studies in conferences or journals,
and file patent applications. The latter three activities are
considered external disclosure and need proper
authorization. Two forms are required: TM (Technical
Memo) and EDRF (External Disclosure Request Form).

Originally all the documents were paper-based. The
authoring researchers printed their manuscripts and filled
out the relevant forms for administrative staff to handle the
requests. Depending on the nature of the documents,
they required a sequence of signatures from managers at
several levels before being distributed to dozens of
colleagues or associates at various locations, often on
different continents. Besides the obviously hefty
paperwork involved, the load increased tremendously
toward the end of each fiscal year, when researchers
hurried to file their work for performance reviews and to
trace the documents they had submitted earlier.
By the end of 2002, the administrative support for
document handling had become so overwhelming that the
process was facing an imminent breakdown. An
alternative system was urgently needed, and the authors
were called upon to design and provide a company-wide
web-based solution.

2. DESIGN OBJECTIVES

The purpose of the project is to deliver two web-based
online management systems that handle company-wide
submission and distribution of TM and EDRF entries, with
their corresponding contents, electronically and
automatically. The new systems are designed to replace the
existing paper-based processes, which are not only too
costly but also becoming unmanageable due to the
increasing volume of documents.
3. DESIGN PROCESS

The major desired features of the systems include
submitting EDRF and TM forms through the company
intranet, uploading EDRF papers and TM manuscripts to
file systems, sending email notifications to the authors
and their managers upon the completion of each
transaction, allowing authors to modify submitted
EDRFs and TMs before they are approved, searching
the database by different criteria, printing the forms in a
consistent cross-platform format, managing user and
administrator accounts, and other associated functions.

10th IEEE International Enterprise Distributed Object Computing Conference Workshops (EDOCW'06)
0-7695-2743-4/06 $20.00 2006

The additional functional requirements specific to EDRF
handling include:
1. allowing the researchers to track and/or update
the status of EDRF papers
2. approving EDRFs sequentially by relevant
upper management
3. generating various statistics reports upon
request
The further functional requirements specific to TMs
include:
1. distributing TM abstracts and manuscripts to
company internal staff and other affiliate
divisions via email
2. allowing users to create their own distribution
list templates beforehand so that they can re-use
the lists each time they submit new TMs
3. restricting the access to confidential TMs (i.e.
access control)
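The two TM-specific requirements on reusable distribution lists and access control can be illustrated with a short Python sketch. It is not the PHP code of the deployed system: the class, the function names, and the rule that a confidential TM is visible only to its author and to explicitly cleared readers are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMemo:
    """Hypothetical in-memory stand-in for one TM record."""
    tm_id: int
    author: str
    confidential: bool = False
    # Users explicitly cleared to read a confidential TM (assumed rule).
    allowed_readers: set = field(default_factory=set)


# Saved distribution list templates, keyed by (owner, template name).
templates: dict = {}


def save_template(owner: str, name: str, recipients: list) -> None:
    """Store a reusable recipient list that belongs to one user."""
    # dict.fromkeys drops duplicate addresses while keeping their order.
    templates[(owner, name)] = list(dict.fromkeys(recipients))


def recipients_for(owner: str, name: str) -> list:
    """Return a copy of the owner's saved list for a new TM submission."""
    return list(templates[(owner, name)])


def can_view(memo: TechnicalMemo, user: str) -> bool:
    """Non-confidential TMs are open; confidential ones need clearance."""
    if not memo.confidential:
        return True
    return user == memo.author or user in memo.allowed_readers
```

Keying the templates by owner mirrors the requirement that each user creates and re-uses his or her own lists, so one user's template cannot be picked up by another user's submission.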
To achieve the above design goals, the open-source
technologies Apache, PHP4 and MySQL were selected to
build the EDRF and TM dynamic Web sites.
Apache is the most popular Web server, PHP is a
server-side scripting language with increasing popularity
among web developers, and MySQL is a fast and reliable
database management system. Our investigation found
that many experts recommend the combination of PHP
and MySQL as the best solution for creating data-driven
sites. We also took into account that these technologies
are stable and readily available on both UNIX and
Windows platforms. To conform to the company's
existing system environment, we decided to develop the
PHP/MySQL Web applications on Windows
development machines, test run new applications on
Windows, upload them to Linux staging servers to
further test the workflow, email notification and file
distribution functionalities, and finally roll out the
applications to production on UNIX servers.
4. DEVELOPMENT

Prior to the development of the new systems, the company
briefly hired another programmer to automate the online
recording of EDRF entries using Perl. After evaluating the
source code, we concluded that the software was inadequate
because it lacked expandability and links to database
functions. Thus we decided to develop the new systems from
scratch. However, a legacy list of EDRFs had to be converted
and recorded in the new database. Consequently, in
addition to establishing the backend database system for
the new applications, we also integrated the legacy data
into the new database. Based on these requirements, eight
major front-end modules were developed for each system to
fulfill the overall functionality. They are:
1. Create New EDRF/TM Request
2. Modify EDRF/TM Record (before it is
approved)
3. Paper Status Update (EDRF only)
4. Create/Edit Distribution List (TM only)
5. Create EDRF/TM User Account, Index to All
Records
6. Search EDRF/TM Records
7. View Paper Viewables (EDRF)/View TM
Manuscripts (TM)
8. Administration module.
All front-end modules can be accessed through the
website home page. On the backend, the database
schema is designed to store all the information posted via
the web pages, with separate database tables created for
the EDRF and TM systems.
Upon completion of the software development, more
than one hundred PHP scripts had been coded for the EDRF
and TM management systems, about fifty for each.
Each PHP script performs one or more related
tasks and corresponds to one web page. For example,
EDRF_submitConfirm.php inserts a new request record
into the database and sends a confirmation with a unique
record ID back to the user; EDRF_updateRecord.php
modifies or deletes the selected record, whose record ID
is passed to the script as a parameter; and
TM_editDistriList.php adds, edits or deletes a distribution
list for TM electronic distribution, where each list is
maintained only by the user who owns it. In
addition, JavaScript is used in some PHP programs to
perform client-side data verification, and all PHP scripts
have error-checking mechanisms in place and handle
errors gracefully.
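As a minimal sketch of the insert-and-confirm and modify-before-approval behavior described above, the following Python fragment uses the standard sqlite3 module in place of PHP4 and MySQL. The table layout, column names, and function names are assumptions made for illustration and are not taken from the actual scripts.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical EDRF table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edrf ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " author TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " approved INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def submit_request(conn: sqlite3.Connection, author: str, title: str) -> int:
    """Insert a new request and return its unique record ID."""
    # Server-side check, complementing any client-side verification.
    if not author.strip() or not title.strip():
        raise ValueError("author and title are required")
    cur = conn.execute(
        "INSERT INTO edrf (author, title) VALUES (?, ?)", (author, title)
    )
    conn.commit()
    return cur.lastrowid


def update_title(conn: sqlite3.Connection, record_id: int, new_title: str) -> bool:
    """Modify a record only while it is still unapproved."""
    cur = conn.execute(
        "UPDATE edrf SET title = ? WHERE id = ? AND approved = 0",
        (new_title, record_id),
    )
    conn.commit()
    # rowcount is 0 when the record is missing or already approved.
    return cur.rowcount == 1
```

Putting the approval check into the UPDATE statement itself, rather than reading the status first and updating afterwards, keeps the "modify only before approval" rule in one place. Parameterized queries are used so that form input is never concatenated into SQL.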
Besides PHP4 and MySQL, a few small software tools were
installed to support special functionality in the EDRF and
TM systems: Ghostscript-8.00 supports the PDF printing
function, the Perl script html2ps (html2ps-1.0b3) converts
HTML pages to PostScript files, and the Perl module
MIME-Lite-3.01 supports file attachments in the emails for
TM electronic distribution.
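To illustrate the kind of message that TM electronic distribution produces, the sketch below assembles an email with one attached manuscript using Python's standard email package. It stands in for the Perl MIME-Lite module named above and is not the deployed code; the function name, subject format, and addresses are hypothetical.

```python
from email.message import EmailMessage


def build_tm_email(sender, recipients, tm_title, abstract, pdf_bytes, filename):
    """Assemble (but do not send) a TM distribution email with one attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"New TM: {tm_title}"
    # The plain-text body carries the TM abstract.
    msg.set_content(abstract)
    # Adding an attachment converts the message to multipart/mixed.
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg
```

The function only builds the message. Delivery would be a separate step, for example through `smtplib.SMTP.send_message`, which keeps the message construction testable without a mail server.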
Meanwhile, a couple of UNIX cron jobs were developed
in Perl for this project to automatically distribute
approved TM manuscripts daily, check EDRF paper
status and send follow-up notifications daily or weekly
as specified by the administrator. Furthermore, the site
configuration files EDRF_config.inc.php and
TM_config.inc.php, which configure site identification
information, database settings, etc., are deployed
together with the other PHP scripts for the EDRF and TM
web sites. The benefit of using configuration files is that
the whole software package of each site can be deployed to
any web server by changing only the settings in the
configuration files; nothing outside those files needs to be
modified when setting up a new site.
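The idea of confining all site-specific settings to one file can be sketched as follows. The actual files are PHP includes whose contents are not given in this paper, so the INI-style layout, the section and key names, and the sample values below are assumptions made for illustration only.

```python
import configparser

# Hypothetical settings for one site; a second site would ship the same
# code with a different copy of this text.
SAMPLE_CONFIG = """
[site]
name = EDRF
base_url = http://intranet.example/edrf

[database]
host = localhost
name = edrf_db
"""


def load_site_config(text: str) -> dict:
    """Parse the site settings into a plain nested dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}
```

Application code would then read values such as `load_site_config(SAMPLE_CONFIG)["database"]["host"]` instead of hard-coding them, which is what allows a package to move between servers with changes to the configuration file alone.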
A flowchart of the EDRF workflow management
system is included in Appendix 1.
5. USER BEHAVIOR

During the test run of the new systems, we received
feedback from users and encountered difficulties as
they adapted to the new processes. As most users are
highly educated professionals, they sometimes do not
follow the instructions and try to accomplish tasks in
their own ways. For example, they would log in multiple
times when they forgot to log out or complete previous
sessions, or they might input non-conforming data in the
forms just to test the reliability. As a result, we had to
modify our design to take such user behaviors into
account, and in the end these efforts greatly enhanced
the system robustness. On the other hand, the new
processes shift the responsibility for recording and
tracking researchers' work output to the individuals
themselves, and they need to learn and appreciate such
empowerment. Managerial communication is essential to
convince users of the benefit and necessity of the online
systems and to accelerate the acceptance of the innovation.

6. CONCLUSION

We report the design and implementation of two
document management systems for an international
company. Publicly available software was utilized to
implement the systems in order to quickly accomplish
the desired goals. Although the technology is not fancy,
careful practice that took users' feedback into account
made the systems practical and robust. After an intensive
design and implementation phase, both systems have
been running for over two years without a problem. As
of May 2006, more than 550 EDRF and 530 TM
requests had been handled by the systems successfully. At
the time of submitting this manuscript, the average size
of a TM is about 35 pages, with an average of 20
recipients on the distribution list. A simple calculation
(530 TMs x 35 pages x 20 recipients = 371,000 pages)
shows a reduction of more than 370,000 hardcopy pages,
along with savings in the associated postage expenses.
Besides the obvious cost-saving benefits, the systems also
deliver information much faster to all corporate levels,
and they have demonstrated their ease of use, flexibility,
scalability and reliability through repeated use over time.

7. REFERENCES

[1] L. Welling and L. Thomson, PHP and MySQL Web
Development (2nd Edition), Sams Publishing, 2003.
[2] The Apache Software Foundation, Apache HTTP Server
Version 2.0 Documentation, http://httpd.apache.org/docs/2.0/,
2003.
[3] Home Page for Ghostscript, http://www.cs.wisc.edu/~ghost/,
2003.


APPENDIX 1: FLOWCHART OF EDRF WORKFLOW MANAGEMENT SYSTEM

Web-based EDRF Workflow Management System

I. Web-based Electronic Approval Process

Submit EDRF Request
(Indicate:
1. Hardcopy-based or Web-based?
2. If Web-based, then indicate Primary Author, Notify
Secondary Author? Confirm Dept. Routing Template
3. Is an Exception?)

Send Email Notifications
(Electronic Approval Notification System; use 5 different
email messages)
1. Primary Manager (Direct Manager)
2. Secondary Manager (optional)
3. Primary Author
4. Secondary Author (optional)
5. RPC Admin
6. RPC Manager?

Approval sequence
1. Send Approval Request to Direct Manager (Web-based)
2. If Approved: Send Approval Request to Sponsor Manager,
Notify Authors & RPC admin, Update Database
3. If Approved: Send Approval Request to RPC Manager,
Notify Authors & RPC admin, Update Database
4. If Approved: Send Approval Request to GM Manager,
Notify Authors & RPC admin, Update Database
5. If Approved: Notify Authors & RPC admin, Update
Database, Paper Status Follow-up
At each step, if not Approved: Notify Authors & RPC admin,
Update Database

Paper Status Update and Follow-up
(Accepted?
Uploaded Full Paper and/or Presentation Slides?
Presented?
Published?)

Note: RPC in the flowchart represents Research Planning and
Coordination, the department in charge of maintaining the systems.
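The sequential approval shown in the flowchart can be sketched as a small state machine. This is an illustrative reading of the flowchart, not the deployed implementation: the class and method names are hypothetical, and the assumption that a rejection at any stage closes the request is an interpretation of the "If not Approved" branches.

```python
# Approver order as read from the flowchart (an interpretation).
APPROVAL_CHAIN = ["Direct Manager", "Sponsor Manager", "RPC Manager", "GM Manager"]


class EdrfRequest:
    """Hypothetical model of one EDRF moving through the approval chain."""

    def __init__(self, record_id):
        self.record_id = record_id
        self.stage = 0            # index of the approver whose decision is pending
        self.status = "pending"   # "pending", "approved", or "rejected"
        self.notifications = []   # log of (recipient, message) tuples

    def _notify(self, recipient, message):
        # Stand-in for sending an email notification.
        self.notifications.append((recipient, message))

    def decide(self, approved):
        """Record the current approver's decision and advance the workflow."""
        if self.status != "pending":
            raise RuntimeError("request is already closed")
        approver = APPROVAL_CHAIN[self.stage]
        outcome = "approved" if approved else "rejected"
        # Authors and the RPC admin are informed after every decision.
        for who in ("Authors", "RPC Admin"):
            self._notify(who, f"{approver} {outcome} EDRF {self.record_id}")
        if not approved:
            self.status = "rejected"
        elif self.stage == len(APPROVAL_CHAIN) - 1:
            self.status = "approved"
        else:
            # Hand the request to the next approver in the chain.
            self.stage += 1
            self._notify(
                APPROVAL_CHAIN[self.stage],
                f"Approval requested for EDRF {self.record_id}",
            )
        return self.status
```

In this sketch a request needs four consecutive approvals to reach the "approved" status, and each decision leaves a notification trail that a database update would persist in a real system.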
