
Documentum

System
Sizing Guide
for all platforms
Release 1.1
November 2001
DOC3-SYSIZEGD-1101
Copyright 2000, 2001
Documentum, Inc.
6801 Koll Center Parkway
Pleasanton, CA 94566
All Rights Reserved.
Documentum, Documentum 4i, Docbase, Documentum eContent Server, Documentum Server, Documentum Desktop Client, Documentum Intranet Client, Documentum WebPublisher, Documentum ftpIntegrator, Documentum RightSite, Documentum Administrator, Documentum Developer Studio, Documentum Web Development Kit, Documentum WebCache, Documentum ContentCaster, AutoRender Pro, Documentum iTeam, Documentum Reporting Gateway, Documentum Content Personalization Services, Documentum Site Delivery Services, Documentum Content Authentication Services, Documentum DocControl Manager, Documentum Corrective Action Manager, DocInput, Documentum DocViewer, Virtual Document Manager, Docbasic, Documentum DocPage Server, Documentum WorkSpace, Documentum SmartSpace, and Documentum ViewSpace are trademarks or registered trademarks of Documentum, Inc. in the United States and throughout the world. All other company and product names are used for identification purposes only and may be trademarks of their respective owners.
CONTENTS
Preface
1 Overview of System Sizing
Overview of the Sizing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Common Sizing Mistakes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Terminology Used in This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
2 Deriving Workload Requirements
What Is a Workload? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Determining the Workload for a Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
User Connection States and Resource Consumption . . . . . . . . . . . . . . . . . . . 2-2
Inactive Connections and Resource Consumption. . . . . . . . . . . . . . . . . . 2-5
RightSite Server Connection States . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
The Busy Hour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
Response Time Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Using the Derived Workload. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
The Documentum Workloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
The iTeam 2.2 Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Workload Scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
Workload Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
Workload Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Workload Response Time Requirements . . . . . . . . . . . . . . . . . . . 2-14
The WebPublisher 4.1 Workload . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
Workload Scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
Workload Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Workload Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
Workload Response Time Requirements . . . . . . . . . . . . . . . . . . . 2-20
The Load and Delete Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Comparing and Contrasting the Workloads . . . . . . . . . . . . . . . . . . . . . . . 2-23
Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23
Resource Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
Usage Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25
Operations Not Included in Workloads. . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
3 Hardware Architecture and Scaling
Overview of Software Trends Affecting Scaling . . . . . . . . . . . . . . . . . . . . . . 3-1
More Powerful Processors and Software Reuse . . . . . . . . . . . . . . . . . . . 3-2
Wide Variance in User Deployments . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
The Trends and Documentum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Scaling the Web Tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Scaling the eContent Server Tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
Scaling DocBrokers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
Scaling the RDBMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
Host-based vs. Multi-tiered Configurations . . . . . . . . . . . . . . . . . . . . . . . . 3-11
High Availability Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
Scaling Across the Enterprise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Scaling the Web Content Management Edition . . . . . . . . . . . . . . . . . . . . . . 3-17
Web Content Authoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19
Site Delivery Services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20
Access Software for Dynamic Page and Metadata Retrieval . . . . . . . . . . . . 3-20
Scaling the Portal Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21
4 Server Configuration and Sizing
Overview of Server Sizing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Hardware Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Host-based Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
N-Tier Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Server Sizing Results from Benchmark Tests . . . . . . . . . . . . . . . . . . . . . . . 4-4
Special Focus for Some Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
Interpreting the CPU Sizing Tables . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Compaq Sizing Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Sun/Solaris Sizing Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Sun Enterprise 450 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
Sun Enterprise 6500 and 4500 . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
IBM, Windows NT, and AIX Sizing Information . . . . . . . . . . . . . . . . . . 4-15
IBM Netfinity 7000 M10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-15
IBM AIX Systems: S7A and F50. . . . . . . . . . . . . . . . . . . . . . . . . 4-16
HP Windows NT and HP-UX Servers . . . . . . . . . . . . . . . . . . . . . . . . 4-18
HP NT/Intel Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
HP-UX Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-20
Other CPU-Related Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-22
Sizing Server Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
Overview of the Sizing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
Key Concepts Relating to Memory Use . . . . . . . . . . . . . . . . . . . . . . . 4-25
Virtual and Physical Memory . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
Cache Memory Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
DBMS Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
eContent Server Caches. . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
RightSite Server Caches and Work Areas. . . . . . . . . . . . . . . . . 4-27
Estimating Physical Memory Usage . . . . . . . . . . . . . . . . . . . . . . . . . 4-28
User Connection Memory Requirements. . . . . . . . . . . . . . . . . . . . 4-28
DBMS Memory Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 4-29
Operating System Memory Requirements . . . . . . . . . . . . . . . . . . 4-30
Estimating Paging File Space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30
Additional Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31
Examples of Memory Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32
Example One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32
Example Two. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-33
Example Three . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-34
Sizing Server Disk Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-35
Key Concepts for Disk Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
Disk Space and Disk Access Capacity . . . . . . . . . . . . . . . . . . . . . 4-37
Effect of Table Scans, Indexes, and Cost-based Optimizers on I/O . . . . 4-38
Tuning with the Optimizer . . . . . . . . . . . . . . . . . . . . . . . . 4-38
DBMS Buffer Cache Memory Effect on Disk I/Os . . . . . . . . . . . . . . 4-39
Disk Striping and RAID Configurations . . . . . . . . . . . . . . . . . . . . . . 4-40
Disk Storage Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
Disk Space Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-44
Physical Disk Requirements of the Documentum
Software Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-44
Typical Disk Space Calculation Model for Content and
Attribute Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
Additional Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-46
Additional References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-46
Database License Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-46
Certified Database and HTTP Server Versions. . . . . . . . . . . . . . . . . . . . . . 4-47
5 Server Network Configuration Guidelines
Overview of Network Sizing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
Key Concepts for Network Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
Bandwidth and Latency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
Bandwidth Needs and Response Time . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Making the Decision: Localizing Traffic or Buying More Bandwidth . . . . . . . . . . 5-6
More Bandwidth or Remote Web Servers . . . . . . . . . . . . . . . . . . . . . . 5-8
Content Transfer Response Time: More Bandwidth or Content Servers . . . . . 5-9
Operation Response Time: More Bandwidth or Replication . . . . . . . . . . . 5-11
Additional Specific Network Recommendations . . . . . . . . . . . . . . . . . . . . 5-12
6 Sizing for Client Applications
Sizing for Desktop Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
CPU Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
Component Initialization and Steady State Processing . . . . . . . . . . . . . . . 6-3
Memory Resource Needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
Sizing for AutoRender Pro. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
System Requirements for Client Products . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
A Additional Workloads
The EDMI Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
Workload Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Workload Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Workload Response Time Requirements. . . . . . . . . . . . . . . . . . . . . . . A-4
Workload Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
The Web Site Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
Workload Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7
Workload Response Times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8
Workload Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
The Document Find and View Workload . . . . . . . . . . . . . . . . . . . . . . . . . A-9
The Online Customer Care Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
Workload Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
Workload Response Time Requirements. . . . . . . . . . . . . . . . . . . . . . A-13
Comparing and Contrasting the Workloads. . . . . . . . . . . . . . . . . . . . . . . A-14
Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-14
Usage Models and Resource Consumption . . . . . . . . . . . . . . . . . . . . A-15
Document Find and View Workload . . . . . . . . . . . . . . . . . . . . . . . . A-17
Operations Not Included in Workloads . . . . . . . . . . . . . . . . . . . . . . . . . A-17
Index
PREFACE
Purpose of the Manual
This document describes the process for estimating an organization's initial
requirements for server capacity and configuration options. The specific
intention is to assist our customers in determining a range of optimal
configuration options relative to their business objectives. This document
presumes the customers understand their business requirements and network
operating environment.
Intended Audience
This document is intended for system administrators, technical managers,
operation coordinators, or other technical personnel responsible for sizing a
Documentum environment.
Organization of the Manual
This manual contains six chapters and one appendix. The table below lists the
information that you can expect to find in each.
Chapter Contents

Chapter 1, Overview of System Sizing: An overview of the sizing process and definitions of terms used in the document.

Chapter 2, Deriving Workload Requirements: A discussion of key concepts in the capacity planning process and descriptions of workloads.

Chapter 3, Hardware Architecture and Scaling: The architecture of the Documentum Server software, how it scales with increased load, and the implications for hardware configuration.

Chapter 4, Server Configuration and Sizing: Information and guidelines for sizing the server configuration.

Chapter 5, Server Network Configuration Guidelines: Information and guidelines for sizing the server network configuration.

Chapter 6, Sizing for Client Applications: Information about Documentum client sizing requirements.

Appendix A, Additional Workloads: Information about workload models other than those described in Chapter 2.

Using Links in PDF Files

If you are reading this document as a Portable Document Format (PDF) file, cross-references and page numbers in the index appear as clickable blue hypertext links. Table of contents page numbers are also clickable links, but they appear in black.

To follow a link:
1. Move the pointer over a linked area. The pointer changes to a pointing finger when positioned over a link. The finger pointer displays a W when moved over a Weblink.
2. Click to follow the link.

Note: A Web browser must be chosen in your Weblink preferences to follow a Weblink. See "Setting Weblink preferences" in your Adobe Acrobat Help for more information.
Bug Lists and Documentation On-Line
Customers with a Software Support Agreement can read our product
documentation and, after commercial release of a product, view lists of fixed
bugs on Documentum's Technical Support Web pages, Support On-Line. To
enter Support On-Line, you must request access and obtain a user name and
password.
Applying for Access
To apply for access to Support On-Line:
1. In your Web browser, open
http://www.documentum.com/
2. Click the Technical Support link.
3. Click the Request Access link.
4. Complete the form and send it.
Documentum will respond to your request within two business days.
Fixed Bugs List
A list of customer-reported bugs that have been fixed will be available two
weeks after this release at Support On-Line, the Technical Support area of the
Documentum Web site. For information about obtaining access to Support
On-Line, refer to "Applying for Access." You must have Adobe Acrobat
Reader or Acrobat Exchange installed to view the lists of fixed bugs.
To view the lists of fixed bugs:
1. In your web browser, open
http://www.documentum.com/
2. Click the Technical Support link.
3. Log on to the Technical Support site.
4. In the Troubleshooting section, click View Bugs.
5. Click Fixed Bugs and Feature Requests Lists.
6. Click the name of the bug list.
Product Documentation
Customers with a Software Support Agreement can read our product
documentation at the Documentum Web site. You must have a user name and
password, and Adobe Acrobat Exchange or Acrobat Reader installed, in order
to view the documentation. To obtain a user name and password, refer to
"Applying for Access."
To view a document:
1. In your Web browser, open
http://www.documentum.com/
2. Click the Technical Support link.
3. Log on to the Technical Support site.
4. In the Resources section, click Documentation.
5. Click the name of the document.
Purchasing Bound Paper Manuals
Our product documentation is available for purchase as bound paper manuals. To place an order, call the Documentation Order Line at (925) 600-6666. You can pay with a purchase order, check, or credit card.
1 Overview of System Sizing
This chapter provides a brief overview of the system sizing process and
introduces the terminology used in this guide. The following topics are
discussed in this chapter:
• "Overview of the Sizing Process" on page 1-1
• "Common Sizing Mistakes" on page 1-3
• "Terminology Used in This Guide" on page 1-4
Overview of the Sizing Process
System sizing is the process of determining what hardware, software, and
network configurations will provide the best performance for users at the
lowest cost to the enterprise. Another term for system sizing is capacity
planning.
Figure 1-1 illustrates the system sizing process.
Figure 1-1 The System Sizing Process.
The first and most important step is to determine the performance and
configuration requirements for the application or service that will be using
Documentum. These requirements include expectations for:
• Number of users serviced during the hour of peak usage (the "busy hour")
• Acceptable response times
• Document sizes
• Document availability (for distributed sites)
• Documentum products used
• Geographic access (some local and some remote)
After the requirements are known, usage of server and network resources can
be estimated and then budgeted. Typically, the budget for resources is
allocated far in advance of their final implementation and deployment.
[Figure 1-1 shows the process as a flow of steps: derive workload and customer performance requirements; decide the high-level hardware deployment architecture; estimate the CPU configuration; estimate memory needs; estimate disk capacity and access needs; refine the network analysis; budget for servers and telecommunications services; and, before actual deployment, check that requirements have not grown to exceed the purchased hardware.]
Consequently, it is wise to review current knowledge of the application and
environment between budgeting and actual deployment, to ensure that the
budgeted resources satisfy the requirements. For instance, the budgeted
hardware may have been intended for 1,000 users per hour, but the initial
rollout now must cover 2,000 users per hour during the busy hour. The
difference may require a reassessment of the hardware resources.
Documentum provides a spreadsheet for the system sizing process. After the
customer enters user, hardware, and document profile information for the
system, the spreadsheet suggests configuration information.
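The kind of estimate such a spreadsheet produces can be illustrated with a minimal sketch. The formula and every constant below (operations per user, CPU-seconds per operation, utilization target) are illustrative assumptions, not the actual spreadsheet's figures:

```python
import math

# Simplified busy-hour CPU estimate -- an illustrative sketch only, not
# the Documentum Sizing Spreadsheet. All constants are assumed values.

def estimate_cpus(busy_hour_users, ops_per_user, cpu_sec_per_op,
                  target_utilization=0.7):
    """Estimate CPUs needed to absorb the busy-hour load.

    Each CPU supplies 3600 CPU-seconds per hour; target_utilization
    keeps headroom so response times do not degrade near saturation.
    """
    demand = busy_hour_users * ops_per_user * cpu_sec_per_op
    supply_per_cpu = 3600 * target_utilization
    return math.ceil(demand / supply_per_cpu)

# 1,000 busy-hour users, 20 operations each, 0.5 CPU-seconds per operation
print(estimate_cpus(1000, 20, 0.5))  # 4
```

Note how doubling the busy-hour user count from 1,000 to 2,000, as in the rollout example above, doubles the CPU demand and may force a hardware reassessment.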
Common Sizing Mistakes
Several mistakes are commonly made in system sizing:
• Failure to obtain sufficient information about the customer requirements and the deployed application.
Sometimes systems are sized based on only a partial picture of the workload. If a significant portion of the workload is left out, the estimated hardware resources might be insufficient to serve the entire workload. This often happens, unnoticed, when an application that is being developed experiences feature creep (the addition of features).
• Paying insufficient attention to server machine differences.
For example, both an Intel server and an Intel-based laptop might have a single processor and the same memory, but there are performance differences between them beyond mere expandability. Server-class machines have more processor cache, more bus bandwidth, and faster disk I/O subsystems than a laptop. These result in large performance differences with server applications such as Documentum.
• Assuming that an Intel server will somehow need fewer disks and less memory than a comparable UNIX server machine.
Intel-based servers are subject to the same limitations with respect to these resources as UNIX-based servers.
• Assuming that application developers will always tune their applications prior to deployment.
In many cases, the hardware for an application is chosen and budgeted many months in advance of the application's full deployment. The budgeting used in this guide assumes some level of application performance tuning. However, even a couple of poorly optimized queries can result in long response times. Capacity planning is not a substitute for performance testing and tuning.
Terminology Used in This Guide
Table 1-1 lists and defines terms used throughout this document.
Table 1-1 Definition of Common Terms

Active User: A connected user who has not reached activity time-out. Active users consume Documentum Server resources.

Active User In Transaction: A connected user who is currently waiting for a response to a request from Documentum Server. An example is a user who is logged into Desktop Client and waiting for the "Docbase View" window to open. An active user in transaction consumes the most Documentum Server resources of any user state, including CPU, RAM, network throughput, and disk throughput.

Active User Out of Transaction: A connected user who is not currently waiting for a Documentum Server response to a request. An example is a user who is logged into WorkSpace and viewing a document in a word processor. While the word processor displays the document, the Documentum Server does not receive any more requests from the user until the user is finished viewing. An active user out of transaction consumes fewer Documentum Server resources than an active user in transaction, but more than an inactive user. RAM is the primary resource that an active user out of transaction consumes.
Activity Time-out: A Documentum Server feature that conserves server-side resources. When a connected user has not made a request of the Documentum Server within a specified time limit, the server transparently frees the connection to release unused OS and DBMS resources. The next time the user makes a request of the Documentum Server, the request is handled automatically without requiring the user to log in again. The activity time-out counter is reset after each completed user request of the Documentum Server.

Bandwidth: Refer to Network Throughput.

Bottleneck: A resource that limits performance. Examples are CPU, RAM, network throughput, and disk throughput.

Connected User: A user who is currently logged into a Docbase.

Connecting User: A user who has requested a Docbase connection but is not yet logged in.

Database Server (RDBMS instance): The SQL relational database management system required as part of the Documentum Docbase. Used to store Documentum object attribute information.

Docbase: The dynamic document and Web page repository accessed by the Documentum Server. The Docbase stores a document or Web page as an object that encapsulates the document's native content together with its attributes, including information about the document's relationships, associated versions, renditions, formats, workflow, and security.

DocPage Server: Documentum Server version 3.x.

Documentum Server: Software used to service incoming and outgoing document management requests for data in the Docbase. Different versions carry different product names:
• eContent Server refers to Documentum Server version 4.2.
• e-Content Server refers to version 4.1.
• EDM Server refers to version 4.0.
• DocPage Server refers to version 3.x.
Disk Throughput: Number of bytes per unit of time transferred to or from the disk subsystem during read or write operations.

e-Content Server: Documentum Server version 4.1.

eContent Server: Documentum Server version 4.2.

EDM Server: Documentum Server version 4.0.

HTTP Server (Web Server): Software required to service HTTP requests by a Web browser from a file system or from Documentum RightSite.

Inactive User: A connected user who has reached activity time-out. Inactive users do not consume Documentum Server resources.

Named User: A user for whom a user profile is defined in the Docbase. Each user profile is stored as a dm_user object in the Docbase.

Network Latency: The delay in response to a network request due to the time it takes for a byte of data to traverse the network and travel from the client to the server and back again. Latency depends on the distance between the client and server, how many pieces of equipment are in between, and the types of communication lines.

Network Throughput: The number of bytes per unit of time that can flow between a client and server. Throughput is also referred to as bandwidth.

Physical Memory: Total RAM dedicated to a physical computer system.

RightSite: Server technology required to coordinate document management requests between an HTTP server and the Documentum eContent Server. This component is required for all Documentum Web products, including Intranet Client and other web-content-based applications. RightSite must be physically installed on the same host as the HTTP server.
Transactions: Requests from the client that have a response from the server. The client must wait for the response before it can continue. For example, the client might send "Update the Check Out field with my name" and the server sends back "Done." Multiple application transactions can occur for specific user-level functions.

Transformation Engine: The facility within Documentum Server that automatically transforms content from one format to another. The transformation engine uses supported converters to perform the transformation. Through the transformation engine, you can:
• Transform one word processing file format to another word processing file format
• Transform one graphic image format to another graphic image format
• Transform one kind of format to another kind of format; for example, changing a raster image format to a page description language format
Some of the converters are supplied with the Documentum system; others must be purchased separately.

User States: Named users may be in one of the following activity states: connected user, active user, or inactive user. Active users may be divided into two categories: "active user in transaction" and "active user out of transaction." Understanding the different user states can have a beneficial impact on system sizing because the states vary in resource consumption due to the activity time-out feature. (For more information, refer to "User Connection States and Resource Consumption" on page 2-2.)

Virtual Memory: A service provided by the operating system (and hardware) that allows each process to operate as if it has exclusive access to all memory (0 to 2 GB, typically). However, a process only needs a small amount of this memory to perform its activities. This small amount, called the process working set, is actually kept in memory. The operating system manages sharing of physical memory among the various working sets.
2 Deriving Workload Requirements
This chapter introduces workloads, discusses two concepts that are key to
determining workload requirements, and describes the workloads used in the
benchmark tests provided by Documentum. The following topics are
included:
• "What Is a Workload?" on page 2-1
• "Determining the Workload for a Site" on page 2-2
• "User Connection States and Resource Consumption" on page 2-2
• "The Busy Hour" on page 2-6
• "Response Time Expectations" on page 2-9
• "Using the Derived Workload" on page 2-9
• "The Documentum Workloads" on page 2-9
• "Comparing and Contrasting the Workloads" on page 2-23
• "Operations Not Included in Workloads" on page 2-26
Note: Benchmark results are described in Chapter 4, Server Configuration and
Sizing. Detailed benchmark reports are available in Kpool.
What Is a Workload?
A workload is a usage pattern for a group of users. For example, checking documents out of the Docbase and checking them in at a later time represents a simple usage pattern for the users called "contributors" (because they contribute content to the Docbase). However, typical workloads involve more than simple check-ins and check-outs. Typically, a workload also includes activities such as navigating through folders, viewing documents, participating in workflows, constructing or publishing virtual documents, and so forth.
Determining the Workload for a Site
Before you attempt to define the workload for a site, you should understand two principles: the relationship between user connection states and the resources consumed by a workload, and the concept of the busy hour. "User Connection States and Resource Consumption" on page 2-2 defines the user connection states and describes how they differ in resource consumption. "The Busy Hour" on page 2-6 defines the concept of the busy hour.
To estimate a workload for a site, you must obtain the following information:
• What Documentum products are in use
• The estimated number of users who are connected during the busy hour
• The estimated number of active users during the busy hour
• What Docbase operations are performed and how often each is performed
• The number, size, and content profile of documents in the Docbase
Because it can be especially difficult to identify the Docbase operations and how often each will be performed, it is strongly recommended that you use the Documentum Sizing Spreadsheet. The spreadsheet makes some standard assumptions about Docbase operations based on the user category ("contributor" or "consumer") and the products in use (the workload column in which you enter the information).
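The way per-category assumptions combine with user counts can be sketched as follows. The operation names and per-user hourly rates below are invented for illustration; they are not the spreadsheet's actual values:

```python
# Hypothetical busy-hour operation rates per user, by user category.
# These rates are placeholder assumptions, not Documentum's figures.
OPS_PER_USER_HOUR = {
    "contributor": {"checkout": 4, "checkin": 4, "browse": 20, "search": 6},
    "consumer": {"browse": 12, "view": 8, "search": 4},
}

def busy_hour_operations(user_counts):
    """Combine user counts with per-category rates into total operations."""
    totals = {}
    for category, users in user_counts.items():
        for op, rate in OPS_PER_USER_HOUR[category].items():
            totals[op] = totals.get(op, 0) + users * rate
    return totals

# 200 contributors and 800 consumers active during the busy hour
print(busy_hour_operations({"contributor": 200, "consumer": 800}))
```

The resulting per-operation totals, together with document sizes and content profile, are what the sizing process translates into CPU, memory, disk, and network estimates.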
User Connection States and Resource
Consumption
User connection states affect the amounts of resources consumed by a
workload. There are four user connection states, and the resources consumed
by a user in each state differ. The four states are:
■ Connecting
■ Active - in transaction
■ Active - out of transaction
■ Inactive
A connecting user is a user who is requesting a connection with the Docbase. A
connecting user consumes CPU, network, memory, and disk and swap space.
An active user - in transaction is a user who is connected to the Docbase and is
currently waiting for a response to a request from eContent Server. An active -
in transaction user consumes CPU, network, memory, and disk and swap
space.
An active user - out of transaction is a user who is connected to the Docbase but
is not currently waiting for a response from eContent Server. An active - out of
transaction user consumes server memory and swap space.
An inactive user is an active user - out of transaction whose Docbase session
has timed out. An inactive user consumes only memory on the client machine.
Figure 2-1 illustrates the user connection states and their relationship with
resource consumption.
Figure 2-1 User Connection States and Resource Consumption
Users consume server CPU mainly during session establishment and when
they initiate a request to eContent Server. When an active user is not initiating
a request, only server memory and some operating system networking
resources are consumed. When a user is inactive, the server resources are
reclaimed for other purposes. Only the client machine will consume some
memory resources in this state (essentially remembering where the session
should resume when the session returns to the Active state).
[Figure 2-1 depicts the four states as boxes connected by labeled transitions.
A Connecting User and an Active User - In Transaction each consume CPU,
network, memory, disk, and swap space. An Active User - Out of Transaction
consumes server memory and swap space. An Inactive User consumes memory
only on the client machine. The labeled transitions are "Session Up" (from
Connecting to Active - In Transaction), "User Initiates an Action" (returning
to Active - In Transaction), and "Inactivity Time-out Reached" (from Active -
Out of Transaction to Inactive).]
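The transitions in Figure 2-1 form a small state machine. The sketch below encodes the states, the resources each consumes, and the transitions; the "request completes" event is an assumption (the figure does not label that arrow), and all identifiers are invented for illustration:

```python
# Resource consumption per state, as described for Figure 2-1.
RESOURCES = {
    "connecting":                {"cpu", "network", "memory", "disk", "swap"},
    "active_in_transaction":     {"cpu", "network", "memory", "disk", "swap"},
    "active_out_of_transaction": {"server memory", "swap"},
    "inactive":                  {"client memory"},
}

# (state, event) -> next state; "request completes" is an assumed event name.
TRANSITIONS = {
    ("connecting", "session up"):                               "active_in_transaction",
    ("active_in_transaction", "request completes"):             "active_out_of_transaction",
    ("active_out_of_transaction", "user initiates an action"):  "active_in_transaction",
    ("active_out_of_transaction", "inactivity time-out reached"): "inactive",
    ("inactive", "user initiates an action"):                   "active_in_transaction",
}

def step(state: str, event: str) -> str:
    """Follow one labeled transition from Figure 2-1."""
    return TRANSITIONS[(state, event)]

s = step("connecting", "session up")
s = step(s, "request completes")
s = step(s, "inactivity time-out reached")
print(s, RESOURCES[s])  # inactive {'client memory'}
```

Note that an inactive user re-enters the active state when a new action arrives, which matches the transparent session re-establishment described later in this chapter.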
Inactive Connections and Resource Consumption
Both eContent Server and the RightSite Server free inactive connections.
Freeing inactive connections reduces the memory demands on the system and
minimizes the number of concurrent DBMS sessions. By default, eContent
Server frees inactive connections after 5 minutes of inactivity and the
RightSite Server does so after 30 minutes of inactivity.
When eContent Server frees an inactive connection, the server disconnects
from the DBMS and kills the process (Unix) or thread (Windows NT) that
corresponds to the inactive connection. When the RightSite Server frees an
inactive connection, the server kills the process or thread associated with the
connection.
The freed sessions can be re-established. With eContent Server, the session is
re-established transparently when the user initiates another command. With
RightSite, the user must log in again (for named sessions). However, when a
session is restarted, there is a startup cost that includes operations such as
reconnecting to the DBMS, resetting caches, and so forth.
Inactive time-out trades off CPU time for reduced memory and concurrent
session requirements. That is, stopping and restarting a session repeatedly
uses more CPU than leaving the session connected continuously. However,
disconnecting the session frees memory for other uses and reduces the
maximum number of active database sessions needed.
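The trade-off can be made concrete with a back-of-the-envelope model. Every constant below (CPU cost per reconnect, memory held per idle session) is invented for illustration; actual costs depend on the server, the DBMS, and the platform:

```python
# Hypothetical unit costs, for illustration only.
CPU_SECS_PER_RECONNECT = 0.5   # DBMS reconnect, cache reset, process/thread start
MEM_MB_PER_IDLE_SESSION = 5.0  # server memory held by one idle connection

def timeout_tradeoff(idle_sessions: int, reconnects_per_hour: int):
    """Return (extra CPU seconds per hour, memory MB freed) when idle
    sessions are timed out instead of being left connected."""
    extra_cpu = reconnects_per_hour * CPU_SECS_PER_RECONNECT
    freed_mem = idle_sessions * MEM_MB_PER_IDLE_SESSION
    return extra_cpu, freed_mem

cpu, mem = timeout_tradeoff(idle_sessions=100, reconnects_per_hour=120)
print(cpu, mem)  # 60.0 extra CPU seconds per hour buys back 500.0 MB
```

Under these assumed numbers, a modest amount of extra CPU frees a substantial amount of memory and reduces the peak count of concurrent database sessions, which is the rationale for the default time-outs.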
RightSite Server Connection States
From eContent Server's viewpoint, the RightSite Server is a user. RightSite
Server connections to eContent Server go through state transitions with
associated resource use similar to any other user connecting or connected to
eContent Server.
An active RightSite Server that is not processing a request consumes only
memory and swap space. When the RightSite Server requests a Docbase
connection or processes requests, it consumes all of the major types of
resources (CPU, network, memory, disk, and swap space).
The Busy Hour
The busy hour is the hour during the day in which the largest number of
operations and active sessions occur. Even in the busy hour, however, the total
amount of activity is only a percentage of the total possible activity.
To illustrate this, consider the telephone world. Suppose that ABC Telephone
Company has 1 million telephones installed in a given calling area, and in that
area the busy hour is from 11:00 a.m. to 12:00 noon. Assuming that the average
phone call lasts 2 minutes, the busy hour could theoretically involve 30
million calls: 1 million phones used to make 30 calls each within that hour. In
reality, only a percentage of the phones are used during the busy hour and the
calls vary in duration and occurrence. Users do not typically make repeated
2-minute phone calls. They make a call of some duration, hang up, and
engage in some other activity, and they may or may not make another call.
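The arithmetic behind the example is simple enough to check directly; the 3 percent busy-hour utilization figure at the end is an assumed value for illustration, not one given by this guide:

```python
phones = 1_000_000
call_minutes = 2
calls_per_phone_per_hour = 60 // call_minutes  # 30 back-to-back calls

# Theoretical upper limit: every phone making 2-minute calls all hour.
theoretical_max = phones * calls_per_phone_per_hour
print(theoretical_max)  # 30000000

# Real-world busy hour: only a fraction of phones place a call.
assumed_utilization = 0.03  # assumption for illustration only
realistic_calls = round(phones * assumed_utilization)
print(realistic_calls)  # 30000
```

The three-orders-of-magnitude gap between the theoretical maximum and realistic busy-hour traffic is exactly why systems are sized for observed busy-hour load rather than for the theoretical ceiling.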
Figure 2-2 illustrates the busy hour and the assumption that activity during
the busy hour is only a percentage of total possible activity.
Figure 2-2 Telephone Busy Hour
ABC Telephone Company sizes the back end of its phone system to
accommodate the real-world busy-hour use of the phones, not the upper limit
of theoretical use.
Applying the analogy to Documentum systems, the number of phones is the
number of users (or seats) in a Documentum installation. The number of
phone owners actually making phone calls is the number of users during the
busy hour. The phone conversations are the active sessions established
between a user and the Docbase.
Just as only a percentage of installed telephones are used during the busy
hour and used only intermittently, only a percentage of Docbase users are
logged-in during the busy hour and only a percentage are making requests.
Because eContent Server frees inactive sessions to save on resource
consumption, only a percentage of the logged-in users served in the busy hour
have active sessions at any one time. Figure 2-3 shows the relative proportion
of busy hour users and active users to licensed users. Proportions will vary
from site to site.
[Figure 2-2 is a chart of calls per hour (in thousands) across the hours of the
day, 8:00 a.m. through 7:00 p.m., with a logarithmic scale running from 1 to
100,000,000. It contrasts the busy-hour calls with the much larger number of
telephones and the far larger number of potential 2-minute calls that those
phones could make.]
Figure 2-3 Licensed Users Versus Busy-Hour Users Versus Currently Active Users
In general, it is best to try to size the server systems for the busy hour. But
when is it? And how can one estimate it for an application that has not been
deployed yet? A bit of guess-timating is typically required. Typical sites
estimate that 20 to 30 percent of the licensed users (in a full deployment)
request service during the busy hour. Testing has shown about 20 percent of
the users served in one hour are active at any point in the hour (assuming
users make random requests to the Docbase throughout the hour). In the
absence of any data, these are reliable proportions to use on the sizing
spreadsheet.
Note: Because RightSite waits longer to time out its session, there are typically
more RightSite active sessions than eContent Server active sessions at any one
time.
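These rules of thumb translate directly into spreadsheet inputs. The sketch below uses the midpoint (25 percent) of the 20-to-30-percent range as a default; both fractions are the estimates quoted above, to be replaced with site data where available:

```python
def busy_hour_sizing(licensed_users: int,
                     busy_hour_fraction: float = 0.25,  # 20-30% of licensed users
                     active_fraction: float = 0.20):    # ~20% active at any instant
    """Estimate busy-hour users and concurrently active sessions
    from the licensed-seat count, using the guide's rule-of-thumb fractions."""
    busy_hour_users = int(licensed_users * busy_hour_fraction)
    concurrent_active = int(busy_hour_users * active_fraction)
    return busy_hour_users, concurrent_active

# 1,000 licensed seats -> about 250 busy-hour users, 50 concurrently active.
print(busy_hour_sizing(1_000))  # (250, 50)
```

The concurrently active count, not the licensed-seat count, is what drives CPU sizing; the busy-hour count drives memory and session-capacity sizing.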
Response Time Expectations
Response times are an important criterion for judging the effectiveness of the
system or service that is deployed. Response times should match or better
user expectations.
When users are asked about desired response times, they typically respond
"two to three seconds" if they have no information about the document size or
format. However, users also expect that it will take longer to check out a
10-megabyte document than one that is 10K bytes. They also expect that it
will take longer to publish a virtual document with 1000 parts than to
publish one with 10 parts.
We recommend determining the major components of the workload and
corresponding expectations for response time.
Using the Derived Workload
After you have determined the workload at your site, compare it to the
workloads described in the following section, "The Documentum
Workloads." After you determine which workload most closely matches your
site's workload, you can fill in the appropriate columns in the Documentum
Sizing Spreadsheet with your information.
You can also examine the benchmark test results reported in Chapter 4,
"Server Configuration and Sizing," for those tests conducted using the
workload that matches your workload. Using these may also help you
determine your configuration requirements.
The Documentum Workloads
This section describes the workloads used in the benchmark tests conducted
by Documentum. The following three workloads are included in this section:
■ "The iTeam 2.2 Workload" on page 2-10
■ "The WebPublisher 4.1 Workload" on page 2-16
■ "The Load and Delete Workload" on page 2-22
Appendix A describes four additional workloads:
■ "The EDMI Workload" on page A-1
■ "The Web Site Workload" on page A-6
■ "The Document Find and View Workload" on page A-9
■ "The Online Customer Care Workload" on page A-9
The iTeam 2.2 Workload
iTeam is a Documentum application that provides a collaborative framework
for developing projects. It groups together the documents, resources, news,
and discussions for a project and allows users to easily reuse these for future
projects. The iTeam workload simulates the activities of iTeam users.
iTeam is Web-based and uses the following Documentum products:
■ Documentum eContent Server
■ Documentum RightSite Server
■ Documentum Web Development Kit (WDK)
■ Docbasic
■ Documentum Intranet Client
Figure 2-4 illustrates the software architecture.
Figure 2-4 iTeam Software Architecture
Each iTeam connection through the Web server establishes two Documentum
sessions to eContent Server: one through the Documentum RightSite Server
and the other through the JRun server. The RightSite-based session is used for
various Documentum Intranet Client component customizations, and the
JRun-based session supports the operations made through the WDK and DFC
using JavaServer Pages.
Workload Scenario
Each iTeam user logs into iTeam throughout the hour and performs a series of
different tasks. The majority of the users execute frequently occurring tasks
that include checking their inboxes, handling the tasks, displaying the iTeam
personal view, viewing documents in text, MS Word, and PowerPoint formats,
reading news, and participating in group discussions. Other users perform
tasks that occur less frequently, such as document check out and check in,
workflow and business policy operations, and attribute searches. In addition
to these standard operations, the workload also exercises the following
Documentum capabilities:
■ JSP code written with the Documentum Web Development Kit
■ Customized Documentum 4i Intranet Client components that use the
Documentum RightSite Server
This workload stresses the multi-user capabilities of the system. The
operations performed on the objects are feature-rich. For example, documents
proceed through a lifecycle of three states: InProgress, Review, and Approved.
When a document is promoted to the Review state, a workflow is started to
notify a group of 10 users that the document is ready for review. The
notifications are placed in the users' inboxes, and the first user to review the
change completes the task for the team.
The attribute search is resource intensive. Each document has a set of at least
50 attributes, set to pre-determined values, that are used to achieve fixed-size
result sets for attribute-based searches. The number of hits for each search is
typically about 30 documents but can be as high as 110. The searches are
case-insensitive and wildcard-based, and use the iTeam simple single-box
search.
Note: An acceptable variation of the workload customizes the single-box
search to include a full-text search. In such cases, the values in the attributes
are also placed in the document content. (The full benchmark reports, found
in Kpool, disclose when full-text searching was used.)
Workload Operations
Table 2-1 describes the operations performed by the workload.
Table 2-1 Operations in the iTeam 2.2 Workload

CONN_+_PERSONAL_VIEW
    Establishes a user's connection and Docbase session and automatically
    displays the Personal View for the user. A Personal View shows items of
    interest for every project with which the user is associated. The items
    include news, activities, and issues for the user. In this workload, each
    user has an interest in 5 deliverables in each of the 5 projects in which
    the user is participating.
PERSONAL_VIEW
    Displays the user's Personal View.
VIEW_INBOX
    Displays the workflow tasks in which a user is participating.
PROCESS_WORKFLOW_ITEM
    Forwards a completed workflow task.
VIEW_DOCUMENT
    Selects a text, Word, or PowerPoint document for display and then
    returns to the Document portion of the Center view.
DISCUSS_POST_REPLY
    Displays the Discussion tab of the Center/Project view so that the user
    can review some random discussion and then reply to the discussion.
CHECKOUT_DOC
    Checks out a random text, Word, or PowerPoint document. The document is
    selected from the document list of a Project view's deliverables.
CHECKIN_DOC
    Checks in a checked out document.
PROMOTE_DOCUMENT
    Displays the page that lists all possible operations on a document and
    then promotes the document to the Review state. The promotion starts a
    distribution workflow to route the document to other members of the
    team for review.
READ_NEWS
    Displays a random news article (text format) from a project.
SEARCH_ATTRIBUTES
    Searches the documents in one project. The search is conducted on 4 or 5
    attributes in a case-insensitive mode using wild card matching. The
    search returns from 0 to 100 hits.

Workload Scaling
To scale the workload, the Docbase size is increased as more users are tested.
Each user is associated with 5 projects and each iTeam project has 20 users.
Consequently, for every 20 users in the workload, there are 5 projects, and for
a 200 users/busy hour run, there will be at least 50 active projects.
Additionally, a production Docbase in operation at least one year will have at
least as many inactive or completed projects in the Docbase as there are active
projects. Consequently, in a configuration supporting 200 users per busy hour,
there should be at least 100 projects.
Each project in this workload includes:
■ 10 news items
■ 10 project deliverables (iTeam activities, with at least 5 visible to each user)
■ 5 deliverable documents per project deliverable (2 text, 2 Word, and 1 PowerPoint)
■ 5 reference documents per project deliverable (2 text, 2 Word, and 1 PowerPoint)
■ 10 discussion groups
Each deliverable or reference document (not including discussions and news)
has three versions. In total, each project represents about 160 objects with
content, so a 200-user per hour configuration should have at least 35,000
objects. This represents the minimum number of documents. More projects
and documents can be loaded to support a larger number of users. (The total
number of documents loaded for each benchmark test is disclosed in the full
benchmark reports.) The document formats, sizes, and distributions are
shown in Table 2-2.
Table 2-2 Document Formats, Sizes and Distributions

Documents          Portion of Total       Estimated
                   Number of Documents    Average Size
PowerPoint         17%                    50,000
Word               34%                    40,000
Text               34%                    25,000
Messages (text)    15%                    2,000

Workload Response Time Requirements
When a benchmark test is run, the primary benchmark obtained is the number
of users performing their tasks who can be supported with acceptable
response times.
Each iTeam reference user performs 10 operations at random times, and the
response time for these operations is measured. Each operation typically
displays several HTML screens dynamically generated by RightSite or
WDK/JRun and the content from the Docbase (PowerPoint, Word, and text
files).
Users start and end their work randomly throughout the busy hour. The
interval between a user's requests affects performance and response time
because Documentum frees a user connection that does not have any activity
after some amount of time (typically two to five minutes). Re-establishing the
session (which happens transparently when work is initiated on an idle
session) consumes more CPU resources. Simulating this behavior in the test
models the real world more accurately. ("User Connection States and
Resource Consumption" on page 2-2 provides more information about the
resource consumption of various user connection states.)
Two to four seconds per screen is generally the acceptable response time, with
some exceptions when the operation is complex. After factoring in the relative
weights of all operations and the number of screens, the average response
time per screen is typically two seconds. Table 2-3 lists the response time
requirements for the operations.
Table 2-3 Response Time Requirements for iTeam 2.2 Workload

Task                     Number of   Acceptable Response     Total Acceptable Average
                         Screens     Time per Screen (sec)   Response Time (sec)
CONN_+_PERSONAL_VIEW     2           5                       10
PERSONAL_VIEW            1           5                       5
VIEW_INBOX               1           6                       6
PROCESS_WORKFLOW_ITEM    4           3                       12
VIEW_DOCUMENT            3           3                       9
DISCUSS_POST_REPLY       4           2                       8
CHECKOUT_DOC             3           3                       9
CHECKIN_DOC              4           2                       8
PROMOTE_DOCUMENT         4           4                       16
READ_NEWS                1           2                       2
SEARCH_ATTRIBUTES        3           7                       20
Table 2-4 shows a set of sample results. (Note that the sample results were
generated by running the benchmark test on an optimal hardware
configuration for the number of users tested.)
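One way to sanity-check such sample results: the overall per-screen average is the screen-weighted mean of the operation response times. The sketch below recomputes the "For all" row of Table 2-4 from its published figures, pairing each operation's measured response time and hourly count with its screen count from Table 2-3:

```python
# Per operation: (screens, avg response time in secs, operations in the hour),
# transcribed from Tables 2-3 and 2-4.
results = {
    "CONN_+_PERSONAL_VIEW":  (2,  9.54, 200),
    "PERSONAL_VIEW":         (1,  4.89, 147),
    "VIEW_INBOX":            (1,  3.62, 200),
    "PROCESS_WORKFLOW_ITEM": (4,  6.78,  20),
    "VIEW_DOCUMENT":         (3,  6.75, 649),
    "DISCUSS_POST_REPLY":    (4,  6.60, 588),
    "CHECKOUT_DOC":          (3,  8.30,  56),
    "CHECKIN_DOC":           (4,  5.35,  56),
    "PROMOTE_DOCUMENT":      (4, 13.67,  63),
    "READ_NEWS":             (1,  0.42, 200),
    "SEARCH_ON_TITLE":       (3, 19.87,  24),
}

total_ops = sum(n for _, _, n in results.values())
total_secs = sum(t * n for _, t, n in results.values())
total_screens = sum(s * n for s, _, n in results.values())
print(total_ops)                             # 2203 operations in the hour
print(round(total_secs / total_screens, 2))  # 2.31 seconds per screen
```

The recomputed totals match the "For all" row (2,203 operations, 2.31 seconds per screen), confirming that the summary is a screen-weighted average rather than a simple mean of the operation times.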
Table 2-4 Example Response Times for iTeam 2.2 Workload

Operation                Average         Total Acceptable   Total         Average
                         Operation       Average Response   Operations    Response Time
                         Response        Time (sec)         in One Hour   per Screen
                         Time (sec)                                       (sec)
CONN_+_PERSONAL_VIEW     9.54            10                 200           4.77
PERSONAL_VIEW            4.89            5                  147           4.89
VIEW_INBOX               3.62            4                  200           3.62
PROCESS_WORKFLOW_ITEM    6.78            12                 20            1.70
VIEW_DOCUMENT            6.75            9                  649           2.25
DISCUSS_POST_REPLY       6.60            8                  588           1.65
CHECKOUT_DOC             8.30            9                  56            2.77
CHECKIN_DOC              5.35            8                  56            1.34
PROMOTE_DOCUMENT         13.67           16                 63            3.42
READ_NEWS                0.42            2                  200           0.42
SEARCH_ON_TITLE          19.87           20                 24            6.62
For all                                                     2,203         2.31

The WebPublisher 4.1 Workload
WebPublisher is a Documentum application that provides a framework for
managing content on a Web site. It integrates with Documentum AutoRender
Pro for file renditions and with Documentum WebCache Server for content
delivery to a Web site. The WebPublisher 4.1 workload simulates the
operations of WebPublisher users.
WebPublisher is Web-based and uses the following Documentum products:
■ Documentum eContent Server
■ Documentum RightSite Server
■ Docbasic
■ AutoRender Pro
■ WebCache Server
Figure 2-5 illustrates the software architecture.
Figure 2-5 WebPublisher Architecture
Workload Scenario
The operations in the WebPublisher workload simulate the various actions of
WebPublisher users. In addition to many basic Documentum operations, this
workload exercises the following features and technology of Documentum 4i:
■ eContent Server business lifecycle processing
■ eContent Server 4i workflow processing
■ Customized Documentum 4i Intranet Client components using the
Documentum RightSite Server
■ AutoRender Pro rendering of documents from Word to HTML
■ WebCache Web-page publishing
There are two kinds of users: content managers and content authors. Each
content manager works with ten content authors on their own Web site
(wcm_channel).
Each content manager:
■ Randomly selects a text or Word template from Content Configuration and
creates a page for a designated author (content managers create one page
for each author with whom they work)
■ Reviews any tasks labeled Reviewer
■ Approves any tasks labeled Approver
■ Views a page on a Web site
Each content author:
■ Checks out and edits the page provided by the content manager
■ Checks in an edited page
■ Unlocks the checked out page
■ Publishes a page using the WebCache server and views it on a Web site
■ Submits and routes a page to the content manager group for review
The workload documents are attached to the WebPublisher Engineering
Lifecycle, and all operations in the workload are activities in the Manager
Process Workflow.
Workload Operations
Table 2-5 lists the operations in the WebPublisher workload.
Table 2-5 Operations in the WebPublisher Workload

CONN
    Establishes a user's connection and session and displays the EDIT WEB
    PAGE screen for the user.
WORK_ON_TASKS
    Displays any available Reviewer or Approver tasks for a content manager
    or any available Author tasks for a content author.
CREATE_WEB_PAGE
    Displays a screen that allows the user to create a new Web page. The user
    enters a file name, the type of workflow, the name of the next user who
    will work on the page, and the page's expiration date.
SELECT_A_TASK
    Selects the Author, Reviewer, or Approver task and displays actions that
    a user can execute on the Web page associated with the task.
VIEW_DETAILS
    Displays the tasks in which the user is participating and the attributes
    of the Web page on which the user is to work.
REVIEW_WEB_PAGE
    Writes and saves some review notes for a Web page and submits the page
    to the next person for approval.
APPROVE_WEB_PAGE
    Writes and saves some review notes for a Web page and submits the page
    to the next person for approval.
WEB_VIEW
    Moves the Web page from the source Docbase to the target Docbase.
CHECKOUT_TEXT_PAGE
    Checks out a text page and brings up a text editor.
CHECKOUT_WORD_PAGE
    Checks out a Word page and brings up a Word editor.
CHECKIN_TEXT_PAGE
    Checks in a checked out text page. The operation occurs when the Save
    This Page command is executed.
CHECKIN_WORD_PAGE
    Checks in a checked out Word page. The operation occurs when the Save
    This Page command is executed.
CANCELCHECKOUT_PAGE
    Unlocks a locked page.
FINISH_SUBMIT
    Submits a Web page to the next person, who may be a content manager, for
    review, approval, or both.

Workload Scaling
The Docbase size is increased as the workload's user population grows.
Workload Response Time Requirements
When a benchmark test is run, the primary benchmark obtained is the number
of users performing their tasks who can be supported with acceptable
response times.
The user operations in this workload represent a possible worst-case scenario,
because each user moves a document through the entire lifecycle (WIP,
Staging, and Approved) within the test period.
Each content manager task consists of 7 operations. Each content author task
consists of 11 operations. All tasks are performed at random times and the
response time for them is measured. Each task typically displays several
HTML screens dynamically generated by RightSite and the content from the
Docbase (Word and text files).
The interval between a user's requests affects performance and response time
because Documentum frees a user connection that does not have any activity
after some amount of time (typically two to five minutes). Re-establishing the
session (which happens transparently when work is initiated on an idle
session) consumes more CPU resources. Simulating this behavior in the test
models the real world more accurately.
Two to four seconds per screen is generally the acceptable response time, with
some exceptions when the operation is complex. After factoring in the relative
weights of all operations and the number of screens, the average response
time per screen is typically three seconds. Table 2-6 lists the response time
requirements for the WebPublisher workload.
Table 2-6 Response Time Requirements for WebPublisher Workload Operations

Operation               Number of   Acceptable Response     Total Acceptable Average
                        Screens     Time per Screen (sec)   Response Time (sec)
CONN                    1           10                      10
WORK_ON_TASKS           1           2                       2
CREATE_WEB_PAGE         7           3                       21
SELECT_A_TASK           1           2                       2
VIEW_DETAILS            1           2                       2
REVIEW_WEB_PAGE         3           1.3                     6
APPROVE_WEB_PAGE        3           1.3                     6
WEB_VIEW                1           12                      12
CHECKOUT_TEXT_PAGE      2           4                       8
CHECKOUT_WORD_PAGE      2           4                       8
CHECKIN_TEXT_PAGE       2           4                       8
CHECKIN_WORD_PAGE       2           4                       8
CANCELCHECKOUT_PAGE     3           3                       9
FINISH_SUBMIT           3           3                       9

Table 2-7 shows a sample set of results.
Table 2-7 Example Set of Results for WebPublisher Workload

Operation               Average         Total Acceptable   Total           Average Screen
                        Operation       Average Response   Operations in   Response Time
                        Response        Time (sec)         1.5 Hours       (sec)
                        Time (sec)
CONN                    9.77            10                 88              9.77
WORK_ON_TASKS           1.80            2                  686             1.80
CREATE_WEB_PAGE         20.39           21                 80              2.91
SELECT_A_TASK           0.83            2                  686             0.83
VIEW_DETAILS            2.06            2                  159             2.06
REVIEW_WEB_PAGE         3.09            5                  138             1.03
APPROVE_WEB_PAGE        2.35            5                  78              0.78
WEB_VIEW                11.39           12                 157             11.39
CHECKOUT_TEXT_PAGE      5.73            8                  82              2.87
CHECKOUT_WORD_PAGE      5.84            8                  76              2.92
CHECKIN_TEXT_PAGE       7.00            8                  82              3.50
CHECKIN_WORD_PAGE       6.93            8                  76              3.47
CANCELCHECKOUT_PAGE     6.47            9                  158             2.31
FINISH_SUBMIT           4.95            9                  295             1.65

The Load and Delete Workload
The load and delete workload simulates a common scenario for content
repositories: loading and deleting documents. Many Documentum sites load a
large number of documents in batches and then provide online user access to
those documents or process the documents in some way (for example, the
documents may be assembled and published). In all cases, Docbases do not
grow infinitely. Documents are brought into the Docbase and eventually aged
out.
The load and delete workload uses eContent Server and Docbasic.
The primary benchmarks for this workload are:
■ The time to load 10,000 PDF documents (each 600,000+ bytes in size)
■ The time to delete 10,000 PDF documents
■ The time to load and delete 10,000 PDF documents (at the same time)
The Docbase is loaded using as many parallel loading sessions as are needed
to increase throughput. Each loading session loads 10,000 documents. A
unique number is assigned to the load session and recorded in the chap_num
attribute of all documents loaded during that session. (This number is indexed
in the DBMS and used by the delete program.) The reported time-to-load is
the longest time reported for any of the sessions.
The documents are deleted in a single session. Each time the delete program
runs, it deletes all the documents that have the same value in chap_num. This
ensures that 10,000 documents are deleted each time the program runs.
A secondary benchmark for the tests is the number of documents that have
been pre-loaded into the Docbase. The size of the Docbase directly affects the
response times for the primary metrics. The more documents a Docbase
contains, the longer it takes to insert new objects. The minimum number that
are pre-loaded for benchmark tests using this workload is 100,000. To
minimize disk space requirements, the pre-loaded documents can be any size
(for example, 5K bytes).
When loaded, the documents are created as a subtype of dm_document and
have various custom attributes.
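The batch-tagging scheme can be illustrated with an in-memory stand-in for the Docbase. The `chap_num` attribute name comes from the workload description above; the repository dict and helper functions are invented for this sketch and are not Documentum APIs:

```python
import itertools

repo = {}                 # object id -> attributes (stand-in for the Docbase)
_ids = itertools.count(1)

def load_session(chap_num: int, n_docs: int) -> None:
    """One loading session: every document it creates carries the session's
    unique chap_num, so the whole batch can be found (and deleted) later."""
    for _ in range(n_docs):
        repo[next(_ids)] = {"type": "custom_document", "chap_num": chap_num}

def delete_batch(chap_num: int) -> int:
    """Delete every document tagged with chap_num; returns the count removed."""
    doomed = [oid for oid, attrs in repo.items() if attrs["chap_num"] == chap_num]
    for oid in doomed:
        del repo[oid]
    return len(doomed)

# Three parallel sessions of 10,000 documents each, then one batch deleted.
for session in (1, 2, 3):
    load_session(session, 10_000)
print(len(repo))        # 30000
print(delete_batch(2))  # 10000
print(len(repo))        # 20000
```

In the real workload the same idea is carried by an indexed DBMS column, which is what guarantees that each delete run removes exactly one 10,000-document batch without scanning the whole Docbase.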
Comparing and Contrasting the Workloads
This section compares and contrasts the workloads in terms of the software
architecture, resource consumption, and usage patterns modeled in each.
Software Architecture
All of the workloads except for the load and delete workload use an HTTP
thin-client, or N-tier, paradigm. In an HTTP thin-client architecture,
Documentum DMCL (client library) processing occurs on the machine that
hosts RightSite and the Internet Server. This is in contrast to the 3-tier
architecture, in which client library processing occurs on the users' PCs. With
the HTTP thin-client architecture, very little work actually happens on the
client machine (all users are assumed to be using browsers that support HTTP).
RightSite performs the operations that, in a 3-tier environment, are performed
by the hundreds or thousands of client PCs. Figure 2-6 illustrates the
difference between 3-tier architecture and HTTP thin-client architecture.
Figure 2-6 Documentum Client-library/3-tier Versus N-tier
Resource Consumption
The workloads differ in per-user operations and resource consumption.
The most resource-intensive workload is the WebPublisher workload,
primarily due to its heavy workflow and lifecycle component. The next most
expensive workload is the iTeam workload, for which the workflow and
lifecycle components are less intensive than the WebPublisher workload but
more intensive than the EDMI workload. (The EDMI workload is described in
Appendix A, "Additional Workloads.") The least expensive workload is the
RightSite static Web site workload (which has a very small dynamic HTML
component, while all of the other workloads have heavy dynamic HTML
content).
Figure 2-7 illustrates the CPU consumption, normalized per user, for each of
the workloads (assuming a 400 MHz Pentium II processor). This data is useful
when comparing workloads on hardware for which there is no benchmark
data.
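The normalized figures can be rescaled to other processors. A simple linear clock-speed ratio is assumed here; real scaling also depends on architecture, cache, and memory bandwidth, so treat this as a first approximation only:

```python
BASELINE_MHZ = 400  # Pentium II baseline used for the normalized figures

def rescale_cpu_secs(cpu_secs_at_baseline: float, target_mhz: float) -> float:
    """First-order estimate: CPU seconds shrink linearly with clock speed.
    (Assumed model; real hardware rarely scales perfectly linearly.)"""
    return cpu_secs_at_baseline * BASELINE_MHZ / target_mhz

# A workload costing 60 CPU seconds per user at 400 MHz, estimated at 800 MHz:
print(rescale_cpu_secs(60.0, 800.0))  # 30.0
```

Where benchmark data for the target hardware exists, prefer it over this ratio; the linear model is only a fallback for platforms that were never tested.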
[Figure 2-6 contrasts the two architectures. In 3-tier mode, hundreds to
thousands of individual user PCs run the Documentum DMCL operations and
connect to the Documentum DocPage Server. In thin-client HTTP mode, the
user PCs connect to an Internet Server plus Documentum RightSite on a
centralized middle-tier server, which runs the Documentum DMCL operations
and connects to the Documentum DocPage Server.]
Figure 2-7 Per User CPU Relationships Among the Workloads
Usage Patterns
All the workloads except the RightSite static Web site workload use
100-percent named access. There were no anonymous RightSite users for
those tests. A typical Documentum deployment that includes RightSite has
both named and anonymous users. Named users provide a user name and
password and then are authenticated and provided with some exclusive
resources. In particular, there is a separate RightSite process, WDK session, or
both created for each named user. RightSite anonymous users do not provide
a name or password; they share the anonymous login configured with
RightSite. On the resource side, they share from a pool of RightSite processes,
rather than having their own resources. Consequently, a workload that uses
100 percent named users consumes more CPU and memory resources than a
workload that has anonymous users.
The Document-Find-and-View workload has a client-server architecture and
uses named users. RightSite is not part of that workload. Although the EDMI
workload (described in Appendix A, Additional Workloads) is also Web-
based, in most cases the RightSite portion can be factored out of the data to
allow you to size this workload based on a client-server model.
[Figure 2-7 bar chart: CPU seconds per reference user, normalized to a 400 MHz Pentium II processor (scale 0 to 80), for the Web Publisher, iTeam, Online Customer Care, EDMI, and RightSite static Web site workloads.]
Operations Not Included in Workloads
The workloads described in this chapter do not include the following
operations:
- Creating full-text indexes and full-text searching
- Dumping and loading a Docbase
- Distributed content operations and object replication
- Operations involving turbo storage areas
If these operations are part of your workload, you may want to increase the
expected resource consumption for your workload.
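One hedged way to budget for such excluded operations is a simple uplift factor on the estimated resource consumption. The factor itself is a site-specific guess, not a figure from this guide:

```python
def with_uplift(base_cpu_secs, uplift_fraction):
    """Inflate a workload estimate to cover operations the benchmark
    workloads do not include (full-text indexing, dump/load, and so on).
    The uplift fraction is an assumption chosen per site."""
    if uplift_fraction < 0:
        raise ValueError("uplift must be non-negative")
    return base_cpu_secs * (1.0 + uplift_fraction)

# Illustrative: a 50% allowance on a 50 CPU-second estimate.
padded = with_uplift(50.0, 0.5)  # 75.0
```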
3 Hardware Architecture and Scaling
This chapter discusses the architecture of Documentum Server-side software
and how to scale the software with increased load. The server software is
multi-tiered and can be partitioned horizontally to allow for scaling up within
a single host or scaling out across multiple hosts. This chapter outlines how
this can be done for each server tier. The chapter includes the following topics:
- Overview of Software Trends Affecting Scaling on page 3-1
- Scaling the Web Tier on page 3-5
- Scaling the eContent Server Tier on page 3-7
- Scaling DocBrokers on page 3-10
- Scaling the RDBMS on page 3-10
- Host-based vs. Multi-tiered Configurations on page 3-11
- High Availability Considerations on page 3-12
- Scaling Across the Enterprise on page 3-14
- Scaling the Web Content Management Edition on page 3-17
- Scaling the Portal Edition on page 3-21
Overview of Software Trends Affecting Scaling
Several trends in the computer industry today impose scaling requirements
on server software:
- More processors per server and more powerful processors
- Software reuse
- Wide variance in potential user deployments
The requirements of the first two trends point to partitioning the software
within a single server as a solution. The requirements of the third trend
point to partitioning the software over unit machines (for higher capacity
and better reliability) as a solution.
More Powerful Processors and Software Reuse
One trend affecting scaling decisions is the rapid increase in processor power
(133 MHz to 400 MHz to 800 MHz within a few years) and the availability of
server systems that can support more of these powerful processors within a
single system. Theoretically, if server software were well tuned, performance
would be improving dramatically. However, this trend is countered by the
popularity of software reuse.
Software reuse ensures that no single vendor provides all the software used in
a system and thus, scaling the system is limited by the weakest component.
One chief lesson of the RDBMS scaling efforts in the past ten years has been
that excellent scaling within a large SMP server is most likely when the server
software vendor has almost complete control over most aspects of the
software, down almost to the operating system level. With software reuse, this
level of control is significantly more difficult to obtain and multi-user
response time problems caused by internal server resource bottlenecks
become harder to locate and fix.
The best option left to software vendors is to ensure that the server software
can be partitioned into independent units that can be accessed transparently
as a single server. If partitioning is possible, sites can add servers as
the existing servers approach their user limits, and can keep adding servers
until the full capacity of the available processors is exhausted.
Partitioning allows Documentum to scale on large SMP-based systems.
Customers might prefer to use such systems to consolidate software on one
easily managed system rather than distributing it over a large number of
small server machines. Figure 3-1 illustrates scaling up (partitioning within
one large system).
Figure 3-1 Scaling Up and Internally Partitioning Server Software
The ability to partition the server software into separate units also supports
the ability to scale the system out across multiple server machines. That is,
enterprises can spread the software server units over many machines or put
them within a single large system. Figure 3-2 illustrates scaling out
(partitioning over multiple servers).
Figure 3-2 Scaling Out to Increase Capacity
Wide Variance in User Deployments
Scaling out across many small machines is critical for IT groups faced with
uncertain rollout schedules and user populations. IT groups are faced with
Web-based software deployments that could serve not only their company's
employees but also the company's suppliers and customers. The user populations
could easily number in the tens of thousands, and there is typically a large
possible variance in the types of usage.
One way to deal with this uncertainty is to buy a configuration that can
handle the worst-case scenario. This is potentially a waste of hardware budget
money if actual usage falls far below expectations. The other way to handle
the situation is to purchase a smaller system that might need more capacity
later. But, if the small system is not chosen correctly, complicated upgrade and
cutover scenarios might result when the new hardware is added (especially if
servers need to be replaced). It is typically advantageous for an IT group to be
able to add capacity using unit server machines. That is, if one system reaches
its capacity, the ability to add to the overall system capacity by simply
plugging in another system without bringing down the original system is
helpful.
[Figure 3-2 callout: the server software is spread over multiple machines but appears as a single server to the clients.]
The Trends and Documentum
The Documentum software supports scaling up (partitioning over one server
machine) and scaling out (partitioning over multiple server machines). The
Web-tier software, the eContent Server software, and the DocBroker (the name
server) can all be partitioned internally within a single system and externally
across multiple machines. Additionally, they can be partitioned so that they
appear to be a single server to the connecting clients.
How this transparent load balancing is accomplished depends on the server
component. With the Web-tier software, it is done with standard load
balancing hardware. eContent Server load balancing is accomplished by the
DocBrokers (name servers), and the DocBroker load balancing is handled by
each calling client (app server).
Partitioning not only allows the system to scale seamlessly and easily with
increased user load, it also helps make the entire system less sensitive to
single-system failures. If one server machine crashes, the other machines
assume the work of the crashed machine. Applications must sometimes detect
and retry certain failures, but in general this failover feature works for
the Web-tier software, the eContent Server tier, and the DocBrokers.
Finally, it's important for an enterprise application to be able to scale over
large geographic areas that might be separated by low-bandwidth, high-latency,
or heavily used high-bandwidth connections. Documentum provides
many distributed feature options to allow applications to scale over these
deployment scenarios. (Table 3-1 on page 3-14 includes a list of these options.)
Scaling the Web Tier
This section describes how to scale the Web-tier software up within a single
system or out across multiple systems.
Documentum offers two basic Web solutions:
- A RightSite-based solution
- An application-server-based solution
Typically, there are different reasons for partitioning each solution. Prior to
Release 4.2, RightSite servers were partitioned within a single system to work
around an overloaded DocBroker. The dmcl.ini file identified a single primary
DocBroker, and the RightSite server accessed that DocBroker for every user.
Release 4.2 introduced DocBroker load balancing features to address the
problem of overloaded DocBrokers. (DocBroker scaling is discussed in
Scaling DocBrokers on page 3-10.)
The application-server-based solutions (for example, the WDK with BEA or
any application server with DFC) need to be partitioned due to multi-user
internal bottlenecks that cause response times to increase.
Internally, the application servers can be partitioned in the following manner:
1. The Ethernet boards are set up to listen on multiple IP addresses.
2. An HTTP server is set up to work with each of these separate addresses.
3. Each application server is associated with an HTTP server.
After partitioning the software, a network load balancer can be employed to
spread the traffic over the various software servers. One type of network load
balancer that is frequently used is a hardware device that provides a single
virtual IP address to Web farm IP address mapping. Many different vendors
provide these products, but the products must have some form of session
stickiness to ensure that all session traffic from one client stays with the same
server for the duration of the session.
The load balancer can ensure this stickiness in many ways, including source
IP-based mapping, insert-cookie, and passive-cookie-based scenarios. If all
users are on separate PCs, the load balancer must at least support source IP
mapping. This method maps the source IP address of the packet to a
particular server IP address for the duration of the session.
The stickiness between two hosts is typically undone after some preset period
of inactivity, when it is assumed that the session has timed out due to
inactivity.
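A minimal sketch of this source-IP stickiness with an inactivity timeout follows. Real load balancers implement this inside the network device; the class and its data structures here are purely illustrative:

```python
import random
import time

class SourceIPStickiness:
    """Maps each client source IP to one backend for the life of the
    session; the mapping is dropped after a period of inactivity."""

    def __init__(self, backends, timeout_secs=1800):
        self.backends = list(backends)
        self.timeout_secs = timeout_secs
        self._table = {}  # source IP -> (backend, last-seen time)

    def route(self, source_ip, now=None):
        """Return the backend for this source IP, reusing the prior
        mapping unless it has expired through inactivity."""
        now = time.time() if now is None else now
        entry = self._table.get(source_ip)
        if entry is not None and now - entry[1] < self.timeout_secs:
            backend = entry[0]                      # sticky reuse
        else:
            backend = random.choice(self.backends)  # new or expired session
        self._table[source_ip] = (backend, now)
        return backend
```

With this scheme, all traffic from one source IP stays on one server until the session has been idle longer than the timeout, matching the behavior described above.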
Figure 3-3 illustrates the use of a network load balancer. Some examples of
network load balancers include the Cisco LocalDirector, the Coyote Point
Equalizer, and the F5 BIG-IP.
Figure 3-3 Transparent Load Balancing of Web Software
Scaling the eContent Server Tier
This section describes how to scale eContent Server up within a single system
and out across multiple systems.
[Figure 3-3: a network load balancer presents virtual address IP-0 to users (IP-A, IP-B) and maps their traffic to Web farm addresses IP-1 through IP-4. One server in the farm has a partitioned HTTP/app server setup and therefore requires two addresses (IP-4 and IP-5). The balancer should support at least source-IP-based mapping (for example, IP-B to IP-4).]
eContent Server can be spread over multiple instances within the same host or
over multiple hosts for the same Docbase. When this is done at a single data
center (that is, not split in some fashion to handle geographic separation of
users), all the eContent Servers typically access the same file store as a
network file system. Content is stored on a file server, and file synchronization
over the shared file system is provided by the interactions that eContent
Server has with the database. The procedure for setting up such a
configuration is detailed in the eContent Server Administrator's Guide.
The view provided to the client software (such as an application server,
RightSite, or Desktop Client) is that of a single system for the Docbase. The
DocBroker maps client requests to the Docbase's eContent Servers. The
DocBroker has a list of all active eContent Servers for a particular Docbase.
When a client initiates a connection to a Docbase, the client first queries the
DocBroker to obtain connection information for the eContent Server. The
DocBroker provides the information, and the client proceeds to set up its
session with that eContent Server. If no special proximity values have been set
for the servers, the default behavior is to randomly pick an eContent Server
from a list of available servers for each client request. Benchmark studies
show that this random-pick load balancing method works quite well.
A group of eContent Servers for a single Docbase is referred to as an eContent
Server cluster or server set. Using a server set not only provides higher capacity
but also provides high availability in an active/active cluster arrangement. If
one server fails, the DocBroker will assign new connections or re-connections
to the other eContent Servers in the cluster.
Figure 3-4 illustrates load balancing for eContent Server.
Figure 3-4 Load Balancing Over a Cluster of eContent Servers
[Figure 3-4: an app server or RightSite server acts as a client to a group of eContent Server machines for the same Docbase, backed by an RDBMS for attribute information and a file server for shared content. (1) Each eContent Server informs the DocBroker that it is active. (2) While trying to connect to the Docbase, the client software queries the DocBroker to find out how to connect. (3) The DocBroker randomly picks an eContent Server from the list. (4) The client software connects to that eContent Server.]
Scaling DocBrokers
DocBrokers (Docbase name servers) can become a bottleneck in some large
multi-user environments. As the number of simultaneous requests to the
DocBroker increases (more users connect), the DocBroker reaches a point
where it cannot service requests fast enough. When this happens, connection
response time increases.
One solution to this problem is to add more DocBrokers and spread user
requests over these multiple DocBrokers. The DocBrokers can be started on
separate machines or on the same machine. If they are on the same machine,
they have either different port numbers or different IP addresses (after the
Ethernet board has been multi-homed to support multiple IP addresses).
To spread the user requests over multiple DocBrokers, eContent Server release
4.2 lets you configure the client software to randomly pick from the list of
DocBrokers in the dmcl.ini file when the client needs connection information.
That is, the client software will read in the list of DocBrokers and then
randomly pick one from the list. This algorithm load-balances across a group
of symmetric DocBrokers, ensuring that DocBroker service scales with the
increase in the number of users.
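The client-side selection described above can be sketched as follows. The host names are hypothetical, and the dmcl.ini parsing itself is omitted; only the random-pick behavior is shown:

```python
import random

def pick_docbroker(docbrokers):
    """Sketch of the release 4.2 client behavior described in the text:
    pick a DocBroker at random from the configured list, spreading
    connection-information requests across symmetric DocBrokers."""
    if not docbrokers:
        raise ValueError("no DocBrokers configured")
    return random.choice(docbrokers)

# Hypothetical hosts read from the client's dmcl.ini list.
brokers = ["docbroker1.example.com", "docbroker2.example.com"]
chosen = pick_docbroker(brokers)
```

Because each client picks independently and uniformly, the request load spreads evenly across the DocBrokers as the user population grows.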
(Prior to eContent Server release 4.2, the users could only be spread across
multiple DocBrokers by changing the DocBroker referenced in the
[PRIMARY_DOCBROKER] clause in the dmcl.ini file. Although secondary
DocBrokers could be listed in the file, they were used only if the primary
DocBroker did not respond. With application servers and RightSite, this
meant having each of the separate servers reference a different dmcl.ini file.)
Scaling the RDBMS
Given that the Web software, the eContent Servers, and the DocBrokers can
increase their capacity just by adding servers and enough hardware, the
bottleneck will shift to the RDBMS (for attribute information) or the shared
file server (for the content). RDBMS technologies can't be partitioned easily
and rely more on scaling up than scaling out to add capacity. Each file
server and database vendor has different scalability limits. We recommend
viewing the Documentum detailed benchmark reports for more information.
Host-based vs. Multi-tiered Configurations
This section outlines various considerations affecting the decision to
implement a host-based configuration (all software running on the same host)
or a multi-tiered configuration (one or more server software components
running on separate hosts).
First, typically no solution is perfect in every way. The decision must balance
requirements and needs against what each solution offers. Secondly, many of
the considerations that make the largest difference are not technical. This
section focuses on the strengths of each solution and mentions items that are
perhaps neutral for both.
The advantages of a host-based solution include:
- Simplifies some administrative tasks due to the single-host nature of the installation
- Allows for simplified software upgrade procedures (all software on the same host)
- Provides better load balancing of applications and servers across the available CPUs
- Enhances the ability to share CPU capacity and system resources among separate applications
The advantages of a multi-tiered, multi-server configuration include:
- Lets you add capacity more simply, through additional server machines (no fork-lift removal and replacement of one server machine with a larger one)
- Lets you take advantage of the lowest-cost, high-capacity server hardware
- Allows problems with software servers to be isolated more easily
- Supports high-availability configurations at less cost
Performance is an important consideration, but in just about every case you
can set up the Documentum software in a host-based configuration and achieve
the same performance as in a multi-tier, multi-server configuration.
Performance is, therefore, a neutral consideration.
Another consideration is ease of administration. Aside from the
administrative advantages mentioned for each configuration (single host
means fewer boxes, but multi-host eases problem sectionalization), this very
important consideration can only be addressed within the context of the skills
of a companys administrative staff. This is especially true when trying to
decide which operating system to use (Windows NT or Unix). For example, if
the administrative staff's skills are in Unix and not Windows NT, the staff
can more likely ensure good performance and high availability with Unix-based
applications than with Windows NT applications. Their skill set can have a
profound impact on quality of service.
High Availability Considerations
This section outlines some of the considerations and options available for
configurations that support high availability. The section assumes that at the
data integrity level, the need for mirrored drives (RAID 1) or other redundant
configurations (such as RAID 5) is obvious and well covered in the literature,
and does not need to be discussed here. This section is more concerned with
computer system and service availability and refers to the configurations
discussed earlier.
One advantage of a multi-tier, multi-server architecture is the ability to take
advantage of high availability at lower cost. This advantage is due to the
lower costs of the backup hardware. A system with fewer machines must
have backup machines with more capacity than systems with many machines.
This is illustrated in Figure 3-5.
Figure 3-5 Host-Based versus Multi-Tiered System
If the load is concentrated on a single host, then the backup machine must be
able to handle 100 percent of the load to avoid major degradations in service
response. However, if the load is distributed over six hosts, for example,
service response can be maintained if each host only supports 20 percent of
the load.
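The capacity arithmetic above generalizes simply, assuming the load redistributes evenly across the surviving hosts:

```python
def per_host_capacity_pct(total_load_pct, num_hosts, failed_hosts=1):
    """Share of the total load each surviving host must be able to carry
    so that service is maintained after the given number of host failures.
    Assumes the load redistributes evenly over the survivors."""
    survivors = num_hosts - failed_hosts
    if survivors < 1:
        raise ValueError("not enough surviving hosts to carry the load")
    return total_load_pct / survivors

# The two cases from the text: a two-node setup needs each node sized for
# 100% of the load; six hosts need only 20% each to survive one failure.
assert per_host_capacity_pct(100, 2) == 100.0
assert per_host_capacity_pct(100, 6) == 20.0
```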
The other important consideration is the high-availability technology. It is
possible to use different solutions for the Web tier and the eContent Server
tier. For example, the network load balancers described earlier for the Web tier
typically re-route traffic to non-failed servers. In the eContent Server tier, the
DocBrokers also route users to non-failed eContent Servers. Finally, the client
dmcl software finds other DocBrokers on the list if the one picked initially
does not respond. The native Documentum high-availability features allow
for maximal use of the hardware at lower cost than typical operating system
vendor solutions.
With the Documentum eContent Server, the solutions provided by the
operating system are typically desirable only when a pair of systems supports
numerous Docbases and no single Docbase runs on both systems.

[Figure 3-5 callouts: in a host-based configuration or two-node system, each system should be able to run 100 percent of the load; in a multi-server environment with six machines, each server need run only 20 percent of the load to maintain service through one host failure.]

Failover
will move the Docbases from one system to the other. If a Docbase spans
multiple hardware servers, you should use the native high-availability
features of eContent Server.
However, in any high-availability scenario, the RDBMS and the file server (if
used) must be set up to be highly available. These servers usually rely on the
OS-specific high-availability technology (for example, the Microsoft cluster
server on Windows NT).
Scaling Across the Enterprise
This section outlines some of the distributed features and cost trade-offs that
can achieve good user response time. Most enterprise deployments include
users at remote data centers or branch offices. Deciding which configuration
to use depends heavily on examining the cost of network bandwidth between
the sites against the cost of extra server administration at the remote site. If the
cost of the network bandwidth is high between sites, then it might be more
economical to replicate Docbases at the remote site.
Table 3-1 summarizes the various deployment options offered by
Documentum to handle remote users.
Table 3-1 Summary of Deployment Options

- Central System: All server software is located at a central data center.
- Remote Web Servers: Remote users access local Web and application servers that communicate through the DMCL to Docbases stored in a centralized data center.
- Distributed Storage Area: The RDBMS is at a centralized site. The Docbase has eContent Servers installed at the central site and at each remote site. Each eContent Server has its own storage area so that local users can access their content at higher speeds.
- Content Servers: A feature available for distributed storage areas. The eContent Servers at remote sites are designated as content servers and used to retrieve only content. All metadata requests are handled by the server at the primary site, which reduces the amount of database communication.
One criterion for selecting among the options listed in Table 3-1 is whether
a single Docbase or multiple Docbases will be employed. Sometimes multiple
Docbases occur naturally in an enterprise for reasons that have nothing to do
with response time. In those cases, object replication, reference links, or
federations might be the appropriate deployment option, given the existence
of multiple Docbases. In many cases, however, an enterprise deployment is
being considered and the question of whether to create multiple Docbases or
a single set of centrally managed Docbases has not been decided. Often it is
easiest to create a small set of centrally managed Docbases; multiple
Docbases are considered only to ensure good response time across an
enterprise whose remote sites have poor connecting network bandwidth. The
most important goal of this section is to convey the administrative and
network bandwidth needs of the various deployment options to guide the
decision process.
Table 3-1 Summary of Deployment Options (continued)

- Content Replication: Similar to the Content Servers option, except that the content is replicated to the remote site. All users are able to access the content quickly, unlike in the Content Servers situation.
- Reference Links: In a multi-Docbase model, reference links allow users to reference objects in other Docbases. Only metadata is copied to the remote site; content is not replicated. Users have fast access to local objects, but when remote objects are displayed, users must get them from the remote site.
- Object Replication: In a multi-Docbase model, objects and their content are replicated from one Docbase to another. Replication occurs in off-peak hours, so during the busier day users are able to access local objects. This option is most useful when bandwidth is extremely expensive compared to the additional cost of setting up Docbases at the remote site.
- Federations: In this model, multiple Docbases already exist at the local and remote sites. Federations allow those Docbases to be integrated in a more seamless fashion. Users have fast access to local objects; however, accessing remote objects still requires high bandwidth for good response.
Figure 3-6 shows the relationship between administrative overhead at a
remote site and the various Documentum deployment options. Using
modems and routers at the remote site generates the least amount of
overhead. However, doing so requires remote users to access the central
system to get to the Docbase. One possible solution is to move the Web servers
to the remote site (causing a slight increase in administrative overhead to deal
with the Web server machine). This might be done in order to take advantage
of the DMCL being less verbose over the network than HTML (in Web
deployments). The distributed storage areas, content servers, and content
replication options all involve setting up an eContent Server at the remote site.
Finally, in environments that use federations, reference links, or object
replication, the remote site typically has an entire Docbase setup (RDBMS,
eContent Server, Web server, and network equipment). This represents the
largest administrative overhead for the remote site.
Figure 3-6 Administrative Overhead for the Deployment Options
[Figure 3-6: administrative overhead increases across the deployment options, from modems and routers only (remote user to central system), to Web servers and routers (remote user to local Web site to central system), to eContent Servers, Web servers, and routers (distributed storage areas, content servers, and content replication), to RDBMS, eContent Servers, Web servers, and routers (reference links, object replication, and federations).]

Network bandwidth is another consideration. It might be administratively
simple to locate all Docbases in a central site but financially prohibitive
due to the cost of network bandwidth. The cost of networking associated with the
deployment options is in many respects the inverse of the amount of
administrative overhead. Locating a Docbase at a remote site (which localizes
access to their own site for the remote users) typically reduces the need for
network bandwidth between the two sites. This relationship is outlined in
Figure 3-7.
Figure 3-7 Documentum Deployment Feature vs. Network Bandwidth Need
Scaling the Web Content Management Edition
The Web Content Management (WCM) Edition allows a company to manage
their Web content and deliver it to a Web site as content and metadata. The
WCM Edition has several major components: content authoring, Site Delivery
Services, and access software for dynamic content and attribute retrieval from
Internet users. It can be characterized by the hardware and software needed
on either side of the Internet firewall and the software that connects them.
Figure 3-8 shows those components.
[Figure 3-7: network bandwidth need versus deployment option, roughly the inverse of the administrative overhead shown in Figure 3-6. The central system option needs the most network bandwidth; distributed storage areas, content servers, and content replication need less; reference links and federations need less still; object replication needs the least.]
Figure 3-8 Outline of WCM Architecture
[Figure 3-8: WCM architecture. On the content-author side of the firewall (typically the intranet): RightSite/HTTP server, eContent Server, RDBMS, Content Personalization Services, AutoRender, and WebCache. On the content-consumer side (typically the Internet): WebCache, an RDBMS, and application server/Web server machines, with JDBC from the app server used for attribute information.]
Web Content Authoring
A Web content management offering must be able to handle increasing
numbers of:
- Content authors (or contributors) and organizational complexity
- Static and dynamic content
- Content consumers (or Internet users)
Managing the content going into the Web site is accomplished with
Documentum WebPublisher. WebPublisher is a Web-based application that
uses the services of RightSite, eContent Server, AutoRender Pro, and Content
Personalization Services. For the most part, sizing and scaling of users and
content in a WebPublisher implementation is like sizing traditional
Documentum applications.
One way in which WebPublisher differs is that AutoRender Pro, an optional
piece of the 4i family, is highly integrated into the WebPublisher application.
WebPublisher uses AutoRender Pro to convert WebPublisher-managed
documents into PDF or HTML automatically. AutoRender Pro polls a work
queue for each Docbase it services. One AutoRender Pro server can service
multiple Docbases, or multiple AutoRender Pro servers can be set up to serve
a single Docbase (when the capacity of one AutoRender Pro machine has been
exceeded). Content Personalization Services, integrated into WebPublisher,
provides the optional ability to set content attributes and link documents
automatically.
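The work-queue polling arrangement can be sketched as follows. The function names and queue shape are illustrative, not part of any Documentum API:

```python
def poll_once(docbase_queues, render):
    """One polling pass by a render server over the work queues of the
    Docbases it services, in the spirit of the AutoRender Pro polling
    described above. `render` converts one queued document."""
    results = {}
    for docbase, queue in docbase_queues.items():
        while queue:
            doc = queue.pop(0)  # take the next queued document
            results.setdefault(docbase, []).append(render(doc))
    return results

# Illustrative: one render server servicing two Docbases' queues.
queues = {"engineering": ["spec.doc", "plan.doc"], "marketing": ["brief.doc"]}
rendered = poll_once(queues, lambda doc: doc.replace(".doc", ".pdf"))
```

Adding a second render server for the same Docbase amounts to running another such polling loop against the same queue, which is how capacity is scaled when one machine is exceeded.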
WebPublisher's support of eContent Server document lifecycle and workflow
services allows for scaling of Web site content as well as organizational
complexity. For example, it permits the use of a business process to ensure
that all content on the Web site has been approved at the right levels. As
the content in a Web site grows, it becomes more difficult to ensure that all
the content and information is correct, up-to-date, and approved. In
addition, the organization required to manage that process becomes more
complex as the process matures. Both of these items are addressed by the
eContent Server document lifecycle and workflow features used by
WebPublisher. Consequently, for regular content contributors, WebPublisher
can be more workflow-intensive than the average eContent Server
implementation.
Site Delivery Services
The software that distributes content and attributes outside a firewall is
Documentum Site Delivery Services. This software has two main components:
WebCache and ContentCaster.
WebCache moves content and attributes to the other side of a firewall and
ContentCaster moves content to Web server machines worldwide. WebCache
is integrated into WebPublisher in that single object pushes of content and
attributes are done by WebPublisher users. The WebCache software consists of
two parts: a source transmitter and a target receiver. The source transmitter
consumes the larger part of the system resources. Because the transmitter is
coupled tightly with eContent Server, it can be scaled to accommodate
additional users in the same fashion as eContent Server. That is, adding more
eContent Servers for the Docbase provides more WebCache transmitters.
Access Software for Dynamic Page and Metadata Retrieval
An Internet Web site grows not only in terms of the amount of content on the
site, but also the number of users who access the site and the number of
dynamic pages created.
The growth in users has several effects on scaling and sizing. First, the
resource consumption of these additional users will require additional Web
server, application server, and database server machines. ContentCaster will
ensure that the content delivered by WebCache is synchronized throughout all
the Web server machines in a data center and across a worldwide deployment.
After the content is delivered, no additional overhead is incurred by
Documentum in accessing that content from a Web server. Consequently,
sizing and scaling the machines needed for static access is no different from
sizing for standard static content on Web sites.
The metadata is delivered to the RDBMS by WebCache. After it is delivered,
metadata values can be accessed in several ways. Documentum provides a
JDBC interface for application servers that use metadata to construct dynamic
HTML pages; however, any native database interface can be used. In this way,
Documentum software also causes no additional overhead in the creation of
dynamic pages after the metadata values have been stored in the database
serving the Internet users. Consequently, a site's scalability is limited not by
Documentum, but by the Web server environment, the application servers,
and the RDBMS serving up the dynamic content.
Scaling the Portal Edition
The application basis for the Portal edition is the iTeam application. iTeam is
based on Documentum WDK and the RightSite and eContent Servers. Sizing
and scaling an iTeam implementation is the same as sizing an application
based on those technologies.
4 Server Configuration and Sizing
This chapter discusses server configuration and sizing. The following topics
are included:
- Overview of Server Sizing Process on page 4-1
- Hardware Configurations on page 4-2
- Server Sizing Results from Benchmark Tests on page 4-4
- Other CPU-Related Notes on page 4-22
- Sizing Server Memory on page 4-23
- Examples of Memory Calculation on page 4-32
- Sizing Server Disk Capacity on page 4-35
- Database License Sizing on page 4-46
- Certified Database and HTTP Server Versions on page 4-47
Overview of Server Sizing Process
The server sizing process has two components:
- Sizing server CPU
- Sizing memory needs
Sizing server CPU for a Documentum deployment requires you to identify the
workload and the hardware configuration for the site. With that information,
you can use the benchmark reports to help determine how many CPUs are
needed. You must also size server memory. Figure 4-1 illustrates the server
sizing process.
Figure 4-1 CPU Selection Process:
1. Estimate your users per busy hour.
2. Match your workload to a Documentum standard workload.
3. Locate the desired hardware type (for example, Sun or Intel) and find the corresponding users/hour in the chart to select the number of CPUs.
4. Adjust the number of CPUs based on differences between your workload and the standard workload.
5. Move on to selecting the memory.

The benchmark tests reported in this chapter were conducted using the standard workloads described in Chapter 2, Deriving Workload Requirements, and Appendix A, Additional Workloads. The hardware configurations are briefly described in Hardware Configurations below and compared in detail in Host-based vs. Multi-tiered Configurations on page 3-11. Not all possible combinations of workloads and hardware configurations were tested.

Hardware Configurations

This section describes the hardware configurations used in the benchmark tests. These configurations include only eContent Server, the RDBMS, and Documentum Web-tier software. Products like WebCache are add-ons with respect to CPU consumption.

Benchmark results for these configurations with a variety of workloads and hardware vendors are reported in Server Sizing Results from Benchmark Tests on page 4-4. The reports include sizing information for CPU and memory for each configuration tested.
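As a hedged illustration only, the selection steps in Figure 4-1 can be sketched in code. The `SIZING_TABLE` values below are placeholders invented for this example, not benchmark results; substitute the users/hour figures from the vendor tables later in this chapter.

```python
# Sketch of the CPU-selection process in Figure 4-1.
# (hardware, CPU count) -> rated users per busy hour.
# These numbers are illustrative placeholders, not benchmark results.
SIZING_TABLE = {
    ("sun-e450", 2): 200,
    ("sun-e450", 4): 450,
    ("sun-e450", 8): 1000,
}

def select_cpus(hardware, users_per_busy_hour, workload_factor=1.0):
    """Return the smallest tested CPU count whose rated users/hour
    covers the workload-adjusted demand, or None if none does."""
    # Step 4 of Figure 4-1: adjust for differences from the standard workload.
    demand = users_per_busy_hour * workload_factor
    candidates = sorted(
        (cpus, rated)
        for (hw, cpus), rated in SIZING_TABLE.items()
        if hw == hardware
    )
    for cpus, rated in candidates:
        if rated >= demand:
            return cpus
    return None

# Example: 400 users per busy hour, with a workload roughly 10 percent
# heavier than the matched standard workload.
print(select_cpus("sun-e450", 400, workload_factor=1.1))  # -> 4
```

The last step of Figure 4-1, sizing memory for the chosen configuration, is not shown here; it is covered in Sizing Server Memory later in this chapter.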
Host-based Configuration
In a host-based configuration, all server software runs on the same physical
machine. The machine can be either a relatively small machine or a larger
mainframe-class system (for example, a Sun E6500 or HP V2250). (The
smallest machine used in benchmark tests on a host-based configuration had 2
processors.) Figure 4-2 illustrates a host-based configuration.
Figure 4-2 Host-Based Configuration: All Server Software Running on Same Host
N-Tier Configurations
In an N-tier configuration, the server software resides on different host
machines.
Figure 4-3 shows an N-tier configuration in which the Web tier, comprising the
HTTP server (for example, iPlanet or IIS) and the Documentum application server
software (either RightSite or WDK running under some application server),
resides on a different server machine than the one that hosts eContent Server
and the RDBMS. The CPU data for benchmark tests using an N-tier
configuration will focus on how many CPUs are needed for the entire system,
not just for a particular server.
Figure 4-3 N-tier Web Separate
Figure 4-4 illustrates an N-tier configuration in which the Web server, the
eContent Server, and the RDBMS are on separate machines. If a benchmark
uses this configuration with non-homogeneous processors, the benchmark
notes will clearly identify which systems were used for what purpose.
Figure 4-4 N-tier - All Separate
Server Sizing Results from Benchmark Tests
This section contains sizing results from benchmarks conducted on a variety
of hardware configurations, using the following workloads:
- iTeam
- WebPublisher
- Online Customer Care
- EDMI
- Website (Anonymous access to RightSite virtual links)
- Find and View
Not all possible combinations of hardware configurations and workloads
were tested.
The results are reported by hardware vendor in tables that list the number of
users per hour supported on various hardware configurations. Some tables
also list a projected number of supported users for some configurations.
(Interpreting the CPU Sizing Tables on page 4-6 provides some assistance in
interpreting the tables.) Because it is unlikely that a customer's prospective
workload will match one of the standard Documentum workloads perfectly, it
is necessary to adjust the selection of CPUs to ensure that the system is sized
appropriately.
Note: Treat RightSite and WDK/App server CPU as the same. For example, if
a set of operations takes 3 CPU seconds for RightSite, then you can assume
that the operations consume 3 CPU seconds for the WDK/App server
configuration. This is not true for memory, but can be assumed for CPU.
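As a back-of-the-envelope illustration of that note (and not the method used to produce the benchmark tables), per-user CPU seconds can be converted into a rough users-per-hour ceiling. The 3-CPU-second cost below is the hypothetical figure from the note, and the 80 percent utilization target is an assumption:

```python
# Rough capacity bound: if one user's hourly operation mix costs
# cpu_seconds_per_user on a tier, the supportable users/hour is
# limited by the CPU seconds available at the target utilization.
def users_per_hour(num_cpus, cpu_seconds_per_user, target_utilization=0.8):
    available_cpu_seconds = num_cpus * 3600 * target_utilization
    return int(available_cpu_seconds / cpu_seconds_per_user)

# Two CPUs, 3 CPU seconds per user (RightSite or WDK/App server alike).
print(users_per_hour(2, 3.0))  # -> 1920
```

Real deployments also hit memory, I/O, and response-time limits before this ceiling, so treat it only as an upper bound for sanity-checking table values.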
Special Focus for Some Tests
Some of the benchmarks conducted on N-tier configurations focus on the
capacity of a single tier within the N-tier environment. For those benchmarks,
the sizing result tables reference the figures below, to ensure that you
recognize the focus of the test.
In Figure 4-5, the focus is on the Web tier, running either RightSite or an
application server running the WDK.
Figure 4-5 Special Focus: Web Server Software Only in an N-tier Environment
Figure 4-6 shows a configuration in which the focus is on eContent Server on
its own host. Figure 4-7 shows a configuration in which the focus is on the
host machine on which both eContent Server and the RDBMS server reside.
Sizing results from benchmark tests that use the configuration in either Figure
4-6 or Figure 4-7 can be applied to deployments that don't include Web
software because the focus ignores the Web-server hardware.
Figure 4-6 Special Focus: eContent Server Machine in an N-tier Environment
Figure 4-7 Special Focus: eContent Server and RDBMS on the Same Host
Interpreting the CPU Sizing Tables
Table 4-1, Table 4-2, and the accompanying notes provide guidance for
interpreting the sizing tables in this chapter.
Table 4-1 is the first example table.

Table 4-1 <Workload_name> on <Hardware_Vendor>

Configuration | Users/Busy Hour | Notes
model-8 (RDBMS); 2 x model-4 (eContent); 2 x model-2 (Web) | 1600 | The database and file server could have easily been run on a 4-processor machine. The RDBMS host is the bottleneck in this test.
model-8 (RDBMS); 1 x model-2 (eContent); 2 x model-2 (Web) | 400 | This is a derived configuration based on the test results with a 4-processor 6400R. The eContent Server host is the bottleneck in this test (Figure 4-5).

Explanation of the first example table:
- The table title identifies the workload used in the test and the hardware vendor of the server machines used in the test.
- The first column, Configuration, lists all of the hardware that was used in a test. The component in boldface was the focus and the limiting factor for a particular test. For example, in the second row, the component on which eContent Server was installed is the limiting factor. The notation model-2 means that the component had two CPUs; 2 x means that two of those systems were used for that tier.
- The second column identifies the number of users per hour supported on the configuration with acceptable response times. The users-per-hour number has meaning only within the context of the workload identified in the title.
The second example, Table 4-2, shows another type of sizing table found in
this chapter.

Table 4-2 <Workload_name> on <Hardware_vendor_name> (all values are users/hour)

System and Number of CPUs | Host-based System (Figure 4-2) | N-tier System (Figure 4-3) | N-tier, Focus on eContent Server and RDBMS (Figure 4-7) | N-tier, Focus on Web Server (Figure 4-5)
Vendor-model-2 | 200 | - | 700 | 500
Vendor-model-4 | 450* | - | 1500 | 900
Vendor-model-8 (2 servers) | - | 1000 | - | 1500*
Vendor-model-12 (3 servers) | - | 1500* | - | -

Explanation of the second example table:
- The table title identifies the workload used in the test and the hardware vendor of the server machines used in the test.
- The first column identifies the system. The notation is: vendor-model-number of processors.
- The remaining columns identify the configuration on which the workload was run. For example, in the above table:
  - The second column indicates the number of users per hour running the workload on a host-based system with two processors.
  - The third column indicates the number of users per hour running the workload on an N-tier system with 8 and 12 processors. In this case, the model could have at most 4 processors, so the System column (first column) indicates how many of the servers were used for these tests.
  - The fourth column indicates the number of users per hour running the workload on the eContent Server/RDBMS system. This ignores the Web tier.
  - The fifth column indicates the number of users per hour running the EDMI workload on the Web server tier. It is sizing for the Web server only.
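The system notation in the first column can be parsed mechanically. This sketch assumes labels follow the vendor-model-CPUs format described above, with the optional "(N servers)" suffix used in the tables:

```python
import re

def parse_system(label):
    """Parse a sizing-table system label such as 'Sun-E450-12 (3 servers)'."""
    m = re.match(
        r"(?P<vendor>[^-]+)-(?P<model>.+)-(?P<cpus>\d+)"
        r"(?:\s*\((?P<servers>\d+)\s+servers?\))?$",
        label,
    )
    if not m:
        raise ValueError(f"unrecognized system label: {label!r}")
    return {
        "vendor": m.group("vendor"),
        "model": m.group("model"),
        "cpus": int(m.group("cpus")),              # total CPUs across all servers
        "servers": int(m.group("servers") or 1),   # defaults to a single server
    }

print(parse_system("Sun-E450-12 (3 servers)"))
# -> {'vendor': 'Sun', 'model': 'E450', 'cpus': 12, 'servers': 3}
```

Note that, per the table conventions, the CPU count is the total across all servers in the row, not the count per server.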
Compaq Sizing Information
The information in this section covers the following Compaq servers:
- DL360
- Proliant 6400R
- Proliant 8500
The sizing information is based on N-tier tests. In the tests, the eContent
Server tier machines were Proliant 6400Rs. The features of a Proliant 6400R
include:
- Up to four Intel Pentium III Xeon processors at 550 MHz
- 6 PCI slots: two 64-bit/66 MHz, three 64-bit/33 MHz, and one 32-bit/33 MHz
The Web-tier machines were Proliant DL360s. Each machine has, at most, two
800 MHz CPUs, 4 GB of memory, and two internal drives set up in a mirrored
pair. The DL360s are 1U machines that fit 42 to a standard rack.
The Proliant 8500 was the RDBMS server and file server for this test. Its
features include:
- Up to eight 700 MHz Pentium III Xeon processors (1 MB or 2 MB L2 caches)
- Up to 16 GB of 100 MHz SDRAM memory (only 4 GB addressable by NT)
- Multi-peer 64-bit PCI buses, including 66 MHz PCI slots
The tests employed 550 MHz processors and the machine had two Compaq
disk storage arrays attached to it. Table 4-3 shows the results of the tests.
Table 4-3 Compaq Sizing Data for iTeam Workload

Configuration | Users/Busy Hour | Notes
8500-8 (RDBMS); 2 x 6400R-4 (eContent); 4 x DL360-2 (Web) | 1600 | The database and file server could have easily been run on a 4-processor 8500. The database was the bottleneck in this test (single process address space limitation).
8500-8 (RDBMS); 1 x 6400R-2 (eContent); 2 x DL360-2 (Web) | 400 | This is a derived configuration based on the test results with a 4-processor 6400R. The eContent Server host was the bottleneck in this test (Figure 4-6).
8500-8 (RDBMS); 1 x 6400R-4 (eContent); 2 x DL360-2 (Web) | 800 | The eContent Server host was the bottleneck in this test (Figure 4-6).
8500-8 (RDBMS); 1 x 6400R-4 (eContent); 1 x DL360-2 (Web) | 500 | The Web server host was the bottleneck in this test (Figure 4-5).

Sun/Solaris Sizing Information

This section describes the following Sun machines and the benchmarks run on them:
- Sun Enterprise 450 on page 4-11
- Sun Enterprise 6500 and 4500 on page 4-12
Sun Enterprise 450
The Sun Enterprise 450 is the top-of-the-line workgroup server from Sun. It
can have up to four 400MHz Ultra Sparc II processors (each with 4MB E-cache
memory). The E450 can have up to 182 GB of internal storage capacity and up
to 6 TB of external storage. It has 6 PCI buses providing up to 1GB/sec I/O
throughput.
Table 4-4 shows the results when the EDMI workload is run on configurations
with the Sun Enterprise 450.

Table 4-4 EDMI Workload on Sun Enterprise 450 with 400 MHz Ultra Sparc II Processors (all values are users/hour)

System and Number of CPUs | Host-based System (Figure 4-2) | N-tier System (Figure 4-3) | N-tier, Focus on eContent Server (Figure 4-7) | N-tier, Focus on Web Server (Figure 4-5)
Sun-E450-2 | 200 | - | 700 | 500
Sun-E450-4 | 450* | - | 1500 | 900
Sun-E450-8 (2 servers) | - | 1000 | - | 1500*
Sun-E450-12 (3 servers) | - | 1500* | - | -

Table 4-5 shows the results of the Anonymous RightSite Website Workload
run on configurations with the Sun Enterprise 450.

Table 4-5 Anonymous RightSite Website Workload on Sun Enterprise 450 with 400 MHz Ultra Sparc II Processors

System and Number of CPUs | Users/Hour on Host-Based System (Figure 4-2)
Sun-E450-2 | 1000
Sun-E450-4 | 2000*

Notes:
- The Sun E450 can have up to four 400 MHz Ultra Sparc II processors. The system Sun-E450-8 is two E450s with 4 processors each; Sun-E450-12 represents three E450s with 4 processors each.
- Values marked with an asterisk (*) are from actual runs. All other values are estimated based on the actual runs.
- The benchmark tests were run with Solaris 2.6 and EDMS 98 (v3.1.6).
- In the 1500 EDMI users/hour test, each of the two E450s was about 60 percent busy. It is likely that the RightSite application server results can be increased by 30 percent.
- The metric for the Website benchmark can also be stated in terms of HTTP gets serviced per hour. Each Website user performs five STATIC_HTML operations, and each STATIC_HTML operation, in turn, performs five HTTP gets. Consequently, a 2000 Website users/hour run generates 50,000 HTTP gets per hour (each with an average HTTP get response time of 550 msecs).
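The HTTP-gets arithmetic in that note can be checked directly:

```python
# Each Website user performs five STATIC_HTML operations, and each
# STATIC_HTML operation performs five HTTP gets.
users_per_hour = 2000
static_html_per_user = 5
http_gets_per_static_html = 5

http_gets_per_hour = users_per_hour * static_html_per_user * http_gets_per_static_html
print(http_gets_per_hour)  # -> 50000
```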
Sun Enterprise 6500 and 4500
The Sun Enterprise E6500 is the top-of-the-line mid-range server from Sun. It
can have up to 30 x 336 MHz Ultra Sparc II processors (each with 4MB E-cache
memory). The E6500 can have up to 375 GB of storage capacity in the internal
cabinet and up to 10 TB of external storage. It can have 16 system boards,
which are either I/O boards or CPU/memory boards (2 CPUs per board).
Each I/O board has four PCI channels. The system on which the tests were
conducted had a total of 26 physical slots (the slots used to house the CPU/
memory system boards are also used to house the I/O boards).
The Sun Enterprise 4500 is the mid-range server from Sun. The Enterprise
4500 can have up to 14 processors at 336 MHz each (on up to 8 system
boards).
See http://www.sun.com/servers/ for more information.
It is assumed that, with respect to the Documentum application, given equal
numbers of CPUs, memory, and disk, these systems will perform in an
identical fashion. That is, a 14-CPU E4500 will achieve the same performance
as a 14-CPU E6500. Therefore, in the sizing tables (Table 4-6 and Table 4-7)
they are treated as the same.
Table 4-6 shows the sizing figures when the EDMI workload was run on
configurations with the Sun Enterprise 4500 and 6500.

Table 4-6 EDMI Workload on Sun Enterprise 4500 and 6500 with 336 MHz Ultra Sparc II Processors (all values are users/hour)

System and Number of CPUs | Host-based System (Figure 4-2) | N-tier System (Figure 4-3; refer to the fourth bullet in the Notes) | N-tier, Focus on eContent Server and RDBMS (Figure 4-7) | N-tier, Focus on Web Server (Figure 4-5)
Sun-E6500/E4500-2 | 225 | - | 475 | 475
Sun-E6500/E4500-4 | 425 | - | 950 | 950
Sun-E6500/E4500-6 | 650 | - | 1425 | 1425
Sun-E6500/E4500-8 | 850 | 950 | 1900 | 1900
Sun-E6500/E4500-10 | 1075 | 1200 | 2350 | 2350
Sun-E6500/E4500-12 | 1300 | 1425 | 2825 | 2825
Sun-E6500/E4500-14 | 1500* | 1650 | 3300 | 3300
Sun-E6500/E4500-16 | 1650 | 1900 | 3775 | 3775
Sun-E6500/E4500-18 | 1850 | 2125 | 4250* | 4250*
Sun-E6500/E4500-20 | 2050 | 2350 | - | -
Sun-E6500/E4500-22 | 2250 | 2600 | - | -
Sun-E6500/E4500-24 | - | 2825 | - | -
Sun-E6500/E4500-26 | - | 3050 | - | -
Sun-E6500/E4500-28 | - | 3300 | - | -
Sun-E6500/E4500-30 | - | 3550 | - | -
Sun-E6500/E4500-32 | - | 3775 | - | -
Sun-E6500/E4500-34 | - | 4025 | - | -
Sun-E6500/E4500-36 | - | 4250 | - | -
Table 4-7 shows the sizing figures when a host-based Web site workload was
run on configurations with the Sun Enterprise 4500 and 6500.

Table 4-7 Web Site Workload CPU Sizing for Sun Enterprise 4500 and 6500 with 336 MHz Ultra Sparc II Processors

System and Number of CPUs | Users/Busy Hour
Sun-E6500/E4500-2 | 1300
Sun-E6500/E4500-4 | 2800
Sun-E6500/E4500-6 | 4300
Sun-E6500/E4500-8 | 5800
Sun-E6500/E4500-10 | 7300
Sun-E6500/E4500-12 | 8800*

Notes:
- Values marked with an asterisk (*) are from actual runs. All other values are estimated based on the actual runs.
- The host-based EDMI test that achieved 2,250 EDMI users/hour was actually run on a 26-processor machine. However, even at the peak, the average CPU utilization was around 65 percent. We believe that if the system is reduced to 22 CPUs, response times will be maintained and the utilization will be 80 percent.
- The 3-tier 4,250 result was actually run on a 3-tier configuration of two E4500s and a single E6500 with 54 total CPUs. However, from an analysis of the CPU utilization, we feel that the response times could have been maintained with 18 CPUs on the eContent Server/DBMS machine and 18 CPUs on the RightSite application server machine (for a total of 36 CPUs).
- The scores on the 4-processor Website test are 40 percent higher than those shown in Table 4-6. This difference is due to the fact that the DBMS servers were from different vendors.
- In all cases, we take the largest number of users tested and extrapolate downward for lower CPU counts. The largest number is not necessarily the limit of that server technology; it is the largest number tested in a lab situation.
- The metric for the Web site benchmark can also be stated in terms of HTTP gets serviced per hour. Each Website user performs five STATIC_HTML operations. Each STATIC_HTML operation, in turn, performs five HTTP gets. Consequently, an 8,800 Web site users/hour run generates 220,000 HTTP gets per hour (each with an average HTTP get response time of 300 msecs).
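The downward extrapolation described in the notes can be sketched as follows. This sketch assumes simple proportional scaling from the largest actual run, which is cruder than the published estimates, so the results differ somewhat from the table values:

```python
# Downward linear extrapolation from the largest tested configuration
# (values follow Table 4-7's largest actual run).
TESTED_CPUS = 12
TESTED_USERS_PER_HOUR = 8800  # actual run on Sun-E6500/E4500-12

def extrapolate_down(cpus):
    """Estimate users/hour for a smaller CPU count, assuming throughput
    scales roughly linearly with CPUs below the largest tested point."""
    if cpus > TESTED_CPUS:
        raise ValueError("only downward extrapolation is supported")
    return round(TESTED_USERS_PER_HOUR * cpus / TESTED_CPUS)

print(extrapolate_down(4))  # -> 2933 (the published table lists 2800)
```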
IBM, Windows NT, and AIX Sizing Information
This section contains sizing information for the following machines:
- IBM Netfinity 7000 M10 on page 4-15
- IBM AIX Systems: S7A and F50 on page 4-16
IBM Netfinity 7000 M10
The IBM NF7000M10 is the top-of-the-line Intel server from IBM. It can have
up to 4 x 400 MHz Pentium II Xeon processors. The NF7000 M10 has 6 PCI
slots and can support up to 54 GB of internal storage and 5 TB of external
storage. The system on which the tests were conducted had a total of 4
physical CPUs, 4GB of memory, and 2 EXP10 disk arrays.
The EXP10 array can hold up to ten 9GB drives. Two I/O channels are used to
interface with the two controllers on the array. The array supports various
levels of RAID.
Table 4-8 lists the sizing figures for the IBM Netfinity 7000 M10.
Table 4-8 EDMI Workload on IBM Netfinity 7000 M10 (all values are users/hour)

System and Number of CPUs | Host-based System (Figure 4-2) | N-tier System (Figure 4-3) | N-tier, Focus on eContent Server and RDBMS (Figure 4-7) | N-tier, Focus on Web Server (Figure 4-5)
IBM-NF7000M10-2 | 250 | - | 500 | 500
IBM-NF7000M10-4 | 500* | 500 | 900* | 1000*
IBM-NF7000M10-8 | - | 900* | - | -
Notes:
- Values marked with an asterisk (*) are from actual runs. All other values are estimated based on the actual runs.
- There are some performance differences between certain DBMS vendors on Windows NT.
- In all cases, we take the largest number of users tested and extrapolate downward for lower CPU counts. The largest number is not necessarily the limit of that server technology; it is the largest number tested in a lab situation.
- These tests were conducted with Windows NT 4.0 SP4.
IBM AIX Systems: S7A and F50
The IBM S7A is a high-end RS6000 AIX server from IBM. Its highlights are:
- Standard configuration:
  - Microprocessor: 4-way 125 MHz RS64-I or 262 MHz RS64-II (upgrade only)
  - Level 2 (L2) cache: 4 MB for 125 MHz processors; 8 MB for 262 MHz processors
  - RAM (memory): 512 MB
  - Media bays: 3 (2 available)
  - Expansion slots: 14 PCI (11 available)
  - PCI bus width: 32- and 64-bit
  - Memory slots: 20
- AIX operating system:
  - Version 4.3 for 125 MHz processors; Version 4.3.1 for 262 MHz processors
- System expansion:
  - SMP configurations: up to 2 additional 4-way processors
  - RAM: up to 32 GB
  - Internal PCI slots: up to 56 per system
  - Internal media bays: up to 12 per system
  - Internal disk bays: up to 48 (hot-swappable)
  - Internal disk storage: up to 436.8 GB
  - External disk storage: up to 1.3 TB SCSI; up to 14.0 TB SSA
The IBM F50 is a lower-end Enterprise server. Its highlights are:
- Standard configuration:
  - Microprocessors: 166 MHz or 332 MHz PowerPC 604e with X5 cache
  - Level 2 (L2) cache: 256 KB ECC
  - RAM (memory): 128 MB ECC synchronous DRAM
  - Disk/media bays: 18 (1 used) / 4 (2 used)
  - I/O expansion slots: 9 (7 PCI, 2 PCI/ISA)
  - PCI bus widths: 2 32-bit and 1 64-bit
  - Memory slots: 2
- AIX operating system:
  - Version 4.2.1 or Version 4.3
- System expansion:
  - SMP configurations: 2, 3, or 4 166 MHz or 332 MHz processors (cannot be mixed)
  - RAM: up to 3 GB
  - Internal disk storage: up to 172.8 GB (163.8 GB hot-swappable)
  - External disk storage: up to 4.8 TB SCSI-2; up to 3.5 TB SSA
Table 4-9 shows the CPU sizing for the tested IBM/AIX servers.

Table 4-9 EDMI Workload on IBM/AIX Servers

Configuration (No. of Servers x System-No. of Processors) | Users/Busy Hour
1 x S7A-4 (RDBMS); 1 x F50-4 (eContent); 3 x Netserver-2 (Web) | 3000
1 x S7A-2 (RDBMS); 1 x F50-2 (eContent); 3 x Netserver-2 (Web) | 1500

Notes:
- The 4-tier tests used three IBM NF7000M10s as RightSite application servers. Each 7000M10 had four 400 MHz Pentium II Xeon processors (for a total of 20 CPUs used between Unix and NT).
- The tested F50/S7A combination equipped both systems with four processors. The DBMS server ran on the S7A (AIX 4.3 operating system) and the eContent Server ran on the F50 (AIX 4.2 operating system).
- The notation F50/S7A-2 means that both systems have two processors.
- The S7A was set up with only 4 processors, to match the processing power of the new, less expensive IBM H70 (a 4-processor machine).

HP Windows NT and HP-UX Servers

This section discusses the following machines:
- HP NT/Intel Servers on page 4-18
- HP-UX Servers on page 4-20

HP NT/Intel Servers

The HP NETSERVER LXR 8000 is the top-of-the-line Intel server from HP. It can have up to 4 x 400 MHz Pentium II Xeon processors (with up to 1 MB of cache per processor) and upgrades to 8-way multiprocessing and future processors. The NETSERVER LXR 8000 has 10 full-length PCI slots: four 64-bit hot-swap slots, five 32-bit slots, and one shared 32-bit PCI/ISA slot. It can handle up to 8 GB of physical memory. The system on which tests were conducted had a total of 4 physical CPUs and 4 GB of memory, and connected to a disk array model 30/FC through two fiber-optic interfaces.
HP has another comparable server called the LH4. This server is also a
4-processor Intel-based system, but it lacks some of the expandability of the
LXR 8000: the LH4 can hold at most 4 GB of memory.
The HP Lpr is a two-processor rack-mounted server that can have up to 1 GB
of memory. The processors used in the tests were 600 MHz. Each server is 2U in
size, and a standard rack can hold 20 servers.
Table 4-10 lists the sizing results for the HP LXR8000 and LH4 when the iTeam
workload is run.
Table 4-11 lists the sizing results for the Lpr/LH4 N-tier test.
Table 4-10 iTeam Workload on HP LXR8000 and LH4

System and Number of CPUs | Users per Busy Hour on Host-Based System (Figure 4-2)
HP-LH4-2 | 100
HP-LH4-4 | 200*
Table 4-11 iTeam Workload on Lpr/LH4, N-tier Test

Configuration (No. of Servers x System-No. of Processors) | Users per Busy Hour | Notes
1 x LH4-2 (RDBMS); 1 x Lpr-2 (eContent); 2 x Lpr-2 (Web) | 400 | -
1 x LH4-2 (RDBMS); 1 x Lpr-2 (eContent); 1 x Lpr-2 (Web) | 200 | The Lpr can only have 1 GB of memory, and this limits the number of RightSite server connections. The 2-processor Lpr was memory bound, not CPU bound.
Table 4-12 lists the sizing results for the EDMI workload on HP LXR8000 and
LH4 machines.
Table 4-12 EDMI Workload on HP LXR8000 and LH4 (all values are users/hour)

System and Number of CPUs | Host-based System (Figure 4-2) | N-tier System (Figure 4-3) | N-tier, Focus on eContent and RDBMS (Figure 4-7) | N-tier, Focus on Web Server (Figure 4-5)
HP-LH4-2 | 250 | - | 500 | 500
HP-LH4-4 | 500* | 500 | 900* | 1000*
HP-LH4-8 (2 servers) | - | 900* | - | -

Note:
- Values marked with an asterisk (*) are from actual runs. All others are estimates based on the actual runs.

HP-UX Servers

Two types of HP-UX servers have been tested with Documentum: the V2600 and the K580.

The V2600 machine is a high-end HP-UX server with the following features:
- Up to 32 CPUs (a maximum of 16 were used in the tests)
- Up to 32 GB of memory
- Up to 28 2X PCI slots
- System-wide throughput of up to 15.36 GB/sec with HP's HyperPlane crossbar technology
- Up to 19 GB/sec of I/O throughput

The K580 machine has the following features:
- Up to six-way symmetric multiprocessing
- Single-level, large, full-processor-speed 2-MB/2-MB and 1-MB/1-MB instruction/data caches
- Up to 37 I/O slots with optional I/O expansion cabinets
- Four internal Fast/Wide Differential SCSI-2 disk storage bays
- Up to 30 TB of total disk capacity using optional expansion cabinets
Table 4-13 shows the sizing results for the HP-V2600 using the online
customer care workload. (The online customer care workload is described in
The Online Customer Care Workload on page A-9.)
Table 4-14 shows the sizing results for the HP-K580 running the
Document-Find-and-View workload. (The Document-Find-and-View workload is
described in The Document Find and View Workload on page A-9.)
Table 4-13 Online Customer Care Workload on V2600 Machines

System and Number of CPUs | Users/Hour on Host-based System (Figure 4-2)
HP-V2600-4 | 500
HP-V2600-8 | 1000
HP-V2600-16 | 2000

Table 4-14 Document Find and View Workload on K580 Machines

System and Number of CPUs | Users/Hour on Host-based System (Figure 4-2)
HP-K580-2 | 1300
HP-K580-4 | 2600
HP-K580-6 | 4025*

Notes:
- The Document-Find-and-View workload differs from the EDMI workload in that it is client/server based (not Web-based). It also uses a subset of the EDMI operations (folder searching and attribute searching), and it is read-only.
- Values marked with an asterisk (*) are from actual runs. All others are estimates based on the actual runs.
Other CPU-Related Notes
Here are some other guidelines and recommendations:
- When feasible, dedicate a separate server to the Documentum installation.
- Do not run Documentum on the same physical server as an ERP (Enterprise Resource Planning) system. Do not install Documentum on the PDC (Primary Domain Controller), the NIS (Network Information Services) master, a file or print server, or another application server.
- Documentum supports eContent Server and RightSite on all Microsoft-certified Windows NT Server hardware vendors. Omission of any Windows NT Server hardware vendor from this document is due to lack of space and is not intended to imply any lack of support for that vendor's Windows NT Server hardware.
- Contact your chosen hardware vendor to select the best server for your immediate and future needs. Note that the term Enterprise Servers refers to business, not scientific, application servers. Workgroup Servers may also be acceptable for development, testing, workgroup, and other less demanding configurations. Consult the release notes for the specific product for the exact hardware on which a product is certified.
The following web sites may be useful references for researching hardware
vendor and associated product information:
HP UNIX Enterprise Servers http://www.enterprisecomputing.hp.com/
Sun Enterprise Servers http://www.sun.com/servers
IBM RS/6000 Servers http://www.rs6000.ibm.com/hardware
IBM NT servers http://www.pc.ibm.com/us/netfinity/
NT Server OS http://www.microsoft.com/ntserver
HP NT Server http://www.hp.com/netserver/
COMPAQ NT Server http://www.compaq.com/products/servers
Sizing Server Memory
This section discusses how to size server memory. It includes the following topics:
• "Overview of the Sizing Process" on page 4-23
• "Key Concepts Relating to Memory Use" on page 4-25
• "Estimating Physical Memory Usage" on page 4-28
• "Estimating Paging File Space" on page 4-30
• "Additional Considerations" on page 4-31
Overview of the Sizing Process
Memory sizing considers two system components: physical memory and
paging (or swap) file space.
To size physical memory, you must determine how much memory the process working sets will consume. Aside from the DBMS, you should be concerned primarily with the non-shared pages of process working sets. ("Key Concepts Relating to Memory Use" on page 4-25 defines process working sets and explains how they are managed within physical and virtual memory.)
To size the paging file, you consider the virtual memory allocated.
Running out of either causes problems. If physical memory is exhausted, heavy I/O to the paging (swap) file occurs, which leads to poor performance. If the paging file runs out of space, commands may fail. Although some operating systems clearly distinguish between the two problems, most emit messages that are confusing and vague.
Figure 4-8 illustrates the steps to take to determine memory and swap space
needs.
Figure 4-8 High-Level Steps to Determining Memory and Swap Space Needs
The information gathered in step one is used to obtain the estimates for
physical memory and paging file space.
Oversizing memory is strongly recommended because most memory use in a
Documentum deployment is attributed to the server caches in the system. The
caches enhance the performance of various operations, and more efficient
operations mean better response times. Consequently, it is better to oversize
memory than risk undersizing it.
"Estimating Physical Memory Usage" on page 4-28 and "Estimating Paging File Space" on page 4-30 contain guidelines for estimating physical memory and paging or swap file space. Some general guidelines are listed in "Additional Considerations" on page 4-31.
For some examples of memory calculations, refer to "Examples of Memory Calculation" on page 4-32.
(Figure 4-8) Step 1: determine the operating system, estimate the number of active users, and estimate the number of documents. Then estimate the amount of physical memory required, and estimate the amount of swap or paging space required.
Key Concepts Relating to Memory Use
This section discusses two key concepts:
• Virtual and physical memory
• Cache memory use
Understanding these concepts is crucial to accurate memory estimation.
Virtual and Physical Memory
Virtual memory is a service provided by the operating system (and hardware)
that allows each process to operate as if it had exclusive access to all physical
memory. However, a process only needs a small amount of the virtual
memory to perform its activities. This small amount, called the process working
set, is the actual amount of physical memory used by the process. The
operating system manages the sharing of physical memory among the various
working sets.
Physical memory is a limited resource. When the operating system needs to conserve physical memory, or when all working sets won't fit into physical memory, it moves the excess pages into a paging file. Additionally, although a process may behave as if it has exclusive access to all of virtual memory, the operating system transparently shares many of the read-only portions (such as program instructions).
Figure 4-9 illustrates the relationships between physical memory, virtual
memory, and process working sets.
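To see the distinction on a live system, a process's peak working set can be inspected directly. This is a minimal sketch using Python's standard `resource` module; the tool choice is ours (the guide does not prescribe it), and the module is Unix-only:

```python
import resource

# Peak resident set size: the largest the process working set has grown.
# Units are platform-dependent (kilobytes on Linux, bytes on macOS).
usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"peak working set: {usage.ru_maxrss} (platform-dependent units)")
```

The virtual size of the same process is normally much larger than this resident figure; the difference is what the operating system can page out or has never materialized.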
Figure 4-9 Real Memory vs. Virtual Memory
Cache Memory Usage
On a typical Documentum server software installation (RightSite, eContent
Server, and a DBMS server), memory is used most heavily by various caches.
A cache is a memory area used to hold some data or object so the software can
avoid performing an expensive operation (read data from disk, from network,
and so forth).
As a particular process grows, it typically fills its caches with information,
making its operations less expensive. (An administrator can control the
maximum size of some caches, but others are sized automatically.) This
trade-off between performance and cache size means that excessive memory
use by a cache is not always bad. The most important sizing task is to ensure
that cache needs do not outstrip the available physical memory.
(Figure 4-9) Physical memory holds each process working set: shared pages (executables, read-only data, or real shared memory) plus private pages for each process. Virtual memory is an abstraction provided by the OS and hardware. The paging (swap) file holds pages that have been pulled out of real memory; depending on the OS, space may have to be reserved in it for all virtual memory allocated.
DBMS Caches
The DBMS data cache is generally the most dominant cache. (Its size is under
administrative control.) The DBMS uses the data cache to minimize the
number of disk I/Os that it must perform for data and index rows. It is
significantly less expensive for a DBMS to find 100 rows in a data cache than
to find them on disk. A production server system with many documents will
likely need hundreds of Mbytes (perhaps even one or more Gbytes) of
memory for this cache to ensure acceptable performance. Sizing the DBMS
cache generously reduces disk I/Os significantly.
Several DBMS servers also cache SQL statements that are executed repeatedly. These caches reduce the number of CPU cycles needed for operations such as security validation, parsing, and execution-plan preparation. It is typically worthwhile to give these caches plenty of memory. Check the RDBMS documentation for more details.
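For concreteness, on an Oracle installation of this era the two caches described above correspond to init.ora parameters. The parameter names are standard Oracle; the values below are illustrative placeholders only, not sizing recommendations:

```
# init.ora fragment (illustrative values only)
db_block_buffers = 65536      # data cache: 65536 buffers x 8 KB block size = 512 MB
shared_pool_size = 104857600  # ~100 MB for parsed SQL, execution plans, and dictionary data
```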
eContent Server Caches
eContent Server uses several caches to enhance performance for operations
such as DBMS interactions, CPU cycles, and network operations. Most of the
caches are small (less than 1M byte) and bounded by the number of objects
they can contain.
The global type cache is the most dominant of eContent Server's caches. It holds structures that give eContent Server fast access to the DBMS tables that make up a type's instances. The size of this cache is limited by the number of types in the Docbase. The amount of real memory consumed is determined by how many instances and types are accessed. Although this cache is called the global type cache, it primarily supports per-session access: each eContent Server process, or thread, has its own copy.
If the process working set of your eContent Server is larger than the memory
estimates listed in Table 4-15, your installation is probably using more custom
types than those used in the capacity testing.
RightSite Server Caches and Work Areas
RightSite's memory use is dominated by:
• DMCL object cache
• Docbasic compiled code
• Temporary intermediate dynamic HTML construction memory
The DMCL object cache requires memory to store recently referenced Docbase
objects. Its size is bounded by the maximum number of objects that can be
stored in it (the number is set by an environment variable).
The Docbasic compiled-code memory area contains the pre-compiled
Docbasic code for the dynamic HTML executed by RightSite. The memory
used is, at most, equal to the space used by the on-disk cache of pre-compiled
Docbasic routines.
The temporary, intermediate dynamic HTML construction memory is the
memory used by RightSite to construct a dynamic HTML page. RightSite
makes heavy use of memory when constructing dynamic pages, and the
larger the number of dynamic pages accessed, the more memory is needed.
All of these areas can grow to many Mbytes in size, depending on the workload.
Estimating Physical Memory Usage
This section contains guidelines for estimating physical memory needs for a
server machine.
User Connection Memory Requirements
Table 4-15 lists the physical memory estimates for a single-user connection for
eContent Server and RightSite on Unix and Windows NT. These estimates are
based on observations of WebPublisher 4.2 and iTeam 4.2. Actual memory
needs will vary depending on the complexity of the workload.
Table 4-15 Estimated Physical Memory Needed per Connection
Server           | Solaris and AIX | HP-UX | Windows NT
eContent Server  | 10 MB           | 10 MB | 10 MB
RightSite Server | 20 MB           | 20 MB | 20 MB
DBMS Memory Requirements
When estimating memory usage for the DBMS, consider giving the DBMS
hundreds of Mbytes of memory (perhaps even several Gbytes if the Docbase
has several million documents). Because eContent Server supports a general
purpose query language, some queries can result in DBMS table scans. By
having more memory, you minimize the number of disk I/Os needed to
execute those queries.
Table 4-16 describes the DBMS memory requirements for Docbases of various
sizes. The values in this table are presented for planning use only. The actual
requirements may vary, depending on the object types and number of objects
in the Docbase.
Operating systems generally require special tuning to support multiple gigabytes of physical memory for the DBMS. Table 4-17 lists some examples of the required tuning.
Table 4-16 DBMS Memory Recommendations by Docbase Size
Number of Documents | Planning Range of Document Metadata Size (DBMS disk space) | Minimum Recommended Memory Size for DBMS
1,000,000 | 4 GB to 5 GB   | 500 MB to 1 GB
2,000,000 | 8 GB to 10 GB  | > 1 GB
3,000,000 | 12 GB to 15 GB | > 1 GB
4,000,000 | 16 GB to 20 GB | > 2 GB
5,000,000 | 20 GB to 25 GB | > 2 GB
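The planning figures in Table 4-16 (roughly 4 to 5 GB of DBMS disk space per million documents, with memory tiers by Docbase size) can be wrapped in a small helper. The sketch below is ours; the tier boundaries simply restate the table:

```python
def dbms_planning_estimates(num_documents):
    """Planning figures following Table 4-16: roughly 4 to 5 GB of DBMS
    disk space per million documents, with memory tiers by Docbase size."""
    millions = num_documents / 1_000_000
    disk_gb = (4 * millions, 5 * millions)
    # Minimum recommended DBMS memory, by Docbase size (Table 4-16 tiers).
    if millions <= 1:
        memory = "500 MB to 1 GB"
    elif millions <= 3:
        memory = "> 1 GB"
    else:
        memory = "> 2 GB"
    return {"disk_gb": disk_gb, "min_dbms_memory": memory}

print(dbms_planning_estimates(2_000_000))
# {'disk_gb': (8.0, 10.0), 'min_dbms_memory': '> 1 GB'}
```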
Table 4-17 Special Considerations for Supporting Large Memory
DBMS and Platform | To Achieve | Required Tuning
Oracle on Unix    | A DBMS buffer cache > 2 GB | The Oracle executable must be relinked/rebased. See the Oracle documentation for more details.
Oracle on Solaris | A DBMS buffer cache > 1 GB | Shared memory parameters must be set properly in the system file.
Operating System Memory Requirements
The operating system buffer cache also uses physical memory. This cache is
used to store content file blocks as they are read (or written) to disk. On some
versions of Unix, this cache can be explicitly sized. It should be small for
installations with content files that are small. For deployments with large
content files, it should be large.
On Windows NT, set up the operating system to favor process working sets
over the buffer cache. The buffer cache and the memory set aside for process
working sets are dynamically sized, and an administrator can configure how
conflict between the two areas is resolved. The buffer cache must have less
priority than the process working sets or an anomalous situation could arise
that forces the DBMS or eContent Server out of memory to make way for the
file cache. This will lead to very poor performance.
Additionally, if the RDBMS is Microsoft SQL Server, you can place tempdb in physical memory (the "tempdb in RAM" option). Doing so may yield some performance gain, especially when sufficient memory is available. See the SQL Server product information for more details.
Estimating Paging File Space
Determining the space required for the paging file is almost as important as
determining the required physical memory. Table 4-18 lists the paging file
space recommendations for eContent Server and the RightSite Server.
Table 4-17 (continued)
Oracle on Windows NT | A DBMS buffer cache > 2 GB | Windows NT Enterprise must be booted with the /3GB flag in boot.ini.
Table 4-18 Recommended Paging File Space per Active Connection
Server           | Unix           | Windows NT
eContent Server  | 11 MB to 20 MB | 4 GB
RightSite Server | 6 MB to 12 MB  | 4 GB
The actual amount of paging file space required for each active connection
depends on the number of Documentum object types that are created.
Implementations that create many customized types might require more
paging file space per connection than shown in Table 4-18.
The maximum paging file size for each drive letter on Windows NT is 4 GB. Always size the paging file to this maximum for each drive in the server unless doing so interferes with the operating system's ability to create a crash dump.
If the paging file area is improperly sized, errors can occur that appear to be memory errors. On Unix, you can use vmstat to detect an out-of-memory or out-of-paging-file condition. On Windows NT, you can use Performance Monitor, Task Manager, or the error popups themselves to detect these conditions.
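As a worked example of the Unix figures in Table 4-18, the per-connection upper bounds can be multiplied out. The 1.5x safety factor in this sketch is our own assumption, not a figure from the guide:

```python
def unix_paging_file_mb(active_connections, econtent_mb=20.0, rightsite_mb=12.0,
                        safety_factor=1.5):
    """Paging-file estimate from the Table 4-18 Unix upper bounds
    (11-20 MB per eContent connection, 6-12 MB per RightSite connection).
    The 1.5x safety factor is an assumption, not a figure from the guide."""
    return active_connections * (econtent_mb + rightsite_mb) * safety_factor

print(unix_paging_file_mb(50))  # 50 x (20 + 12) x 1.5 = 2400.0 MB
```

Sites with many custom object types should push the per-connection figures toward (or past) the upper bounds, as the text above notes.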
Additional Considerations
Keep in mind these additional guidelines when you size memory:
• On Solaris 2.6, the disk space for /tmp is treated like a process and its working set. This means that /tmp will consume some physical memory and swap space. You may need to increase some of the swap-space estimates if there is heavy use of the /tmp file system.
• Consider the impact of the following business-related factors on memory utilization:
  - End-of-month, end-of-quarter, and end-of-year peaks
  - Company growth over three years
  - Batch processing (system administration jobs)
  - Use of AutoRender Pro and the Transformation Engine
  - Users connecting from other sites
  - Integrations with other systems (such as SAP R/3, PeopleSoft, and so forth)
• Manage the risk of the unknown by adding a safety margin to your calculations.
• Research the maximum RAM capacity per server. If you may need more physical RAM in the future, do not buy a server box with the maximum RAM already installed. If adding physical RAM is easy, then plan for the worst case and buy for the best. Your choice of server model may be governed by the memory requirements calculated for your active Documentum users rather than by the CPU requirements.
Examples of Memory Calculation
The three examples in this section use the following equation as a basis for memory estimates:

Memory = Base Memory + DBMS Memory + (Per-User Memory for Documentum x Number of Active Users)
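The equation can be checked mechanically. The sketch below is our restatement of it in Python, plugged with the Example One numbers from Table 4-19:

```python
def server_memory_mb(base_os_mb, dbms_mb, component_mb, per_user_mb, active_users):
    """Memory = Base Memory + DBMS Memory +
    (Per-User Memory for Documentum x Number of Active Users)."""
    return base_os_mb + dbms_mb + sum(component_mb) + per_user_mb * active_users

# Example One: host-based configuration, everything on one server.
total = server_memory_mb(
    base_os_mb=128,            # operating system
    dbms_mb=500,               # RDBMS
    component_mb=[32, 32],     # eContent Server; RightSite and HTTP
    per_user_mb=10 + 20,       # Table 4-15: 10 MB eContent + 20 MB RightSite
    active_users=50,
)
print(f"{total} MB")  # 2192 MB, which the guide rounds to 2.2 GB
```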
Example One
This example is based on the following assumptions:
• All components (eContent Server, HTTP Server, RightSite, RDBMS) reside on one server (host-based).
• The Docbase will have up to 500,000 objects within the next two years.
• The maximum number of active users will be 50 (20 percent of 250 users per hour).
Table 4-19 shows the memory calculations for the first example.
Table 4-19 Memory Calculation Example 1
Software Component | Minimum Memory Capacity Required
Operating system | 128 MB
RDBMS | 500 MB
eContent Server | 32 MB
RightSite and HTTP (required only for Web clients or RightSite applications) | 32 MB
50 Active Users x (10 MB + 20 MB) | 1.5 GB
TOTAL Estimated Server Memory Requirements | 2.2 GB
Example Two
This example is based on the following assumptions:
• eContent Server and the RDBMS are on one server; RightSite and the HTTP Server are on a second server.
• The Docbase will have up to 500,000 objects within the next two years.
• The maximum number of active users will be 50 (20 percent of 250 users per hour).
Table 4-20 shows the calculations for the first server, and Table 4-21 shows the
calculations for the second server.

Table 4-20 Memory Calculation for Example 2, First Server (eContent Server and RDBMS)
Software Component | Required Minimum Memory Capacity
Operating system | 128 MB
RDBMS | 500 MB
eContent Server | 32 MB
50 Active Users x 10 MB | 500 MB
TOTAL Estimated Server 1 Memory Requirements | 1.2 GB
Table 4-21 Memory Calculation for Example 2, Second Server (RightSite and HTTP Server)
Software Component | Required Minimum Memory Capacity
Operating system | 128 MB
HTTP Server | 32 MB
RightSite Server | 32 MB
50 Active Users x 20 MB | 1 GB
TOTAL Estimated Server 2 Memory Requirements | 1.2 GB
The aggregate minimum required memory for both servers is 2.4 GB.
Example Three
This example is based on the following assumptions:
• Three servers: one for Documentum eContent Server, a second for the RDBMS, and a third for the HTTP server software and RightSite.
• The Docbase will have up to 500,000 objects within the next two years.
• The maximum number of active users will be 50 (20 percent of 250 users per hour).
Table 4-22 shows the calculations for the first server; Table 4-23 shows the
calculations for the second server; and Table 4-24 shows the calculations for
the third server.

Table 4-22 Memory Calculation for Example 3, First Server (eContent Server)
Software Component | Required Minimum Memory Capacity
Operating system | 128 MB
eContent Server | 32 MB
50 Active Users x 10 MB | 500 MB
TOTAL Estimated Memory Requirement for the eContent Server Machine | 660 MB (round up to 700 MB)
Table 4-23 Memory Calculation for Example 3, Second Server (RDBMS)
Software Component | Required Minimum Memory Capacity
Operating system | 128 MB
RDBMS | 500 MB
TOTAL Estimated Memory Requirement for the RDBMS Machine | 628 MB (round up to 768 MB)
The aggregate estimated minimum memory for all three servers is approximately 2.5 GB (the three totals sum to 2,480 MB).
Sizing Server Disk Capacity
This section describes how to size server disk capacity and provides
guidelines and information to help you with that task.
Figure 4-10 summarizes the disk capacity sizing process.
Table 4-24 Memory Calculation for Example 3, Third Server (HTTP and RightSite Servers)
Software Component | Required Minimum Memory Capacity
Operating system | 128 MB
HTTP Server | 32 MB
RightSite Server | 32 MB
50 Active Users x 20 MB | 1 GB
TOTAL Estimated Memory Requirement for the HTTP and RightSite Server Machine | 1.2 GB
Figure 4-10 Disk Storage Sizing Process
There are numerous inputs to this process, and it is important to obtain correct estimates and information for them. If the inputs are wrong and disk space is sized incorrectly, the server will perform very badly, and fixing disk-space problems after the fact is difficult. The inputs to the process include:
• The amount of physical space needed for all file stores
• The recovery characteristics for all file store areas
• The disk access needs for each file store area
After you obtain the needed inputs, you can determine which disk
configuration will best suit your needs.
(Figure 4-10) Inputs: (1) the operating system; (2) active users and users per hour; (3) the number of documents; (4) the number of renditions and versions; (5) the frequency of full-text searches, case-insensitive searches, and dump/load operations. From these, estimate the amount of physical space required on disk, estimate the number of drives needed to provide sufficient disk access capacity, and establish recovery requirements for each data storage area.
The next section, "Key Concepts for Disk Sizing," reviews some of the factors that affect decisions about disk sizing and configuration. "Disk Striping and RAID Configurations" on page 4-40 describes a variety of common disk configurations. "Disk Storage Areas" on page 4-42 describes the characteristics of the disk storage areas used in a Documentum deployment and makes disk sizing and configuration recommendations for each.
Key Concepts for Disk Sizing
To correctly size disk space, it is important that you understand several key
concepts:
• Disk Space and Disk Access Capacity
• Effect of Table Scans, Indexes, and Cost-based Optimizers on I/O
• DBMS Buffer Cache Memory Effect on Disk I/Os
Disk Space and Disk Access Capacity
To size disks for a Documentum installation, you must size both the required
disk space and the disk access capacity. Sizing disk space entails ensuring that
there is sufficient room to store the permanent and temporary data. Sizing
access capacity entails ensuring that there are sufficient disk spindles (or
arms) to gather all of the data in a reasonable time frame.
To illustrate the difference, suppose there is a 4 GB DBMS table, which can fit
on a single 9 GB drive. Suppose also that the DBMS needs to scan the entire
table. If the DBMS uses 16K blocks, then 262,000 disk I/Os are needed to scan
the table. If the scan takes place over a single disk drive (at 10 msec an I/O), it
takes 44 minutes to scan the table (assuming that no pre-fetching occurs).
However, if the table is striped over 10 drives in a RAID 0 stripe, the scan might take only about 4 minutes, because small stripe units let a single read engage all 10 drives in parallel. Notice that ten 9 GB drives offer 10 times the space needed for the table; meeting the space requirement did not necessarily meet the response-time requirement.
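The arithmetic behind this example can be reproduced directly. This is a sketch of the text's own numbers, assuming no pre-fetching and perfect parallelism across the stripe:

```python
def table_scan_minutes(table_gb, block_kb=16, ms_per_io=10, parallel_drives=1):
    """Scan time for the example in the text: a 4 GB table read in 16 KB
    blocks at 10 ms per I/O, assuming no pre-fetching and that striping
    spreads the reads evenly across the drives."""
    ios = table_gb * 1024 * 1024 / block_kb     # 4 GB / 16 KB = 262,144 I/Os
    seconds = ios * ms_per_io / 1000 / parallel_drives
    return seconds / 60

print(f"{table_scan_minutes(4):.0f} minutes on a single drive")       # 44 minutes
print(f"{table_scan_minutes(4, parallel_drives=10):.1f} minutes "
      f"striped over 10 drives")                                      # 4.4 minutes
```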
Effect of Table Scans, Indexes, and Cost-based Optimizers on I/O
The biggest demands on disks typically come from large table scans. A table
scan occurs when a DBMS reads all of the data in a table looking for a few
rows. In a table scan, the DBMS does not use an index to shorten the lookup
effort. Table scans are not always bad (in some cases using an index can
actually hurt performance). However, if the table is large enough or the
amount of physical RAM is small enough, table scans generate enormous
amounts of disk I/O.
Note: A database index is a data structure that allows the DBMS to locate records efficiently without having to read every row in the table. Documentum maintains many indexes on its tables and even allows the administrator to define additional indexes on the underlying tables.
In general, large table scans should not appear in your workload, but there are
some operations that will result in a table scan. For example, the DBMS cannot
use an index for a case-insensitive attribute search, which leads to a table scan.
If your applications contain operations that result in large table scans that can't be tuned away, it is important to size the server's disks properly to get the best performance.
Tuning with the Optimizer
Some table scans are unavoidable (case-insensitive search); others are the
result of query optimization problems by the DBMS vendors. Some vendors,
such as Oracle, actually support multiple modes of query optimization. One
popular mode used frequently by the Documentum Server relies on the
Oracle rule-based optimizer. This optimizer picks a data access plan based on a
sequence of rules, not on the statistics of the tables (the number of rows). As
Docbases get larger, using a rule-based optimizer can lead to some costly
mistakes that cause table scans. On large Docbases, a cost-based optimizer
might deliver better access plans because it can determine table size and
column value distributions. However, cost-based optimizers are not
guaranteed to pick the best data access plan. Even with the table statistics,
they can mistakenly pick an access plan that leads to more disk I/O.
It's not too difficult to switch between the Oracle rule-based optimizer and a cost-based optimizer (such as ALL_ROWS); the global parameter is set in the init.ora file. If you use a cost-based optimizer but the tables have no statistics,
Oracle uses the rule-based optimizer instead. Therefore, in Oracle, individual tables can be moved selectively from rule-based to cost-based optimization by generating statistics only for those tables.
Databases like Sybase and the Microsoft SQL Server always use cost-based
optimization and need to have statistics for the various tables.
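The Oracle switch described above looks roughly like this. The `optimizer_mode` parameter and `ANALYZE` statement are standard Oracle syntax of that era; `dm_sysobject_s` is shown only as an example of an underlying Docbase table:

```sql
-- init.ora (not SQL): optimizer_mode = CHOOSE   (alternatives: RULE, ALL_ROWS, FIRST_ROWS)

-- Generate statistics so the cost-based optimizer can cost access plans;
-- without statistics, Oracle falls back to the rule-based optimizer.
ANALYZE TABLE dm_sysobject_s COMPUTE STATISTICS;
```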
DBMS Buffer Cache Memory Effect on Disk I/Os
The DBMS data cache (illustrated in Figure 4-11) holds images of rows that
were read from or written to disk. If the DBMS can find the data rows in the
cache, it does not need to read from the disk. Consequently, there is an inverse
relationship between the amount of data in the cache and the demands on the
disk.
Figure 4-11 Illustration of a DBMS Cache
Theoretically, if memory is large enough and the Docbase small enough,
nearly all of the Docbase attribute information may fit in memory, almost
eliminating the requirement for disk I/O. However, in reality, all attribute
information for large Docbases cannot fit in physical memory. So, the DBMS
will keep some information and remove other information from the cache as
needed. A Least Recently Used (LRU) algorithm keeps the data most often
referenced in the cache and lets the least referenced go. This also helps
minimize disk I/O because the most referenced data stays in the cache.
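The LRU policy just described can be sketched in a few lines. This is a toy model of the eviction rule, not the DBMS's actual implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal sketch of a least-recently-used cache: hits move an entry
    to the 'recent' end; on overflow, the least recently referenced
    entry is evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None              # miss: the DBMS would read from disk here
        self.data.move_to_end(key)   # hit: mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("row1", "a"); cache.put("row2", "b")
cache.get("row1")          # row1 is now most recently used
cache.put("row3", "c")     # evicts row2, the least recently used
print(list(cache.data))    # ['row1', 'row3']
```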
(Figure 4-11) If the requested data is in the DBMS data cache, no disk I/O is needed; if not, the DBMS goes out to disk to get it.
Table 4-25 shows disk I/O ranges for some Docbase sizes based on the EDMI workload. The workload does not include any table scans; the table represents disk I/O statistics for various database vendors under different memory configurations.
Disk Striping and RAID Configurations
Disk striping distributes a piece of data over separate disks. When the data is
requested, all of the disks participate in returning the data.
It might seem that retrieving data from multiple drives would take longer
than retrieving data from one drive. However, given that the data is typically
large (many Kbytes), it is unlikely that retrieving the data from a single disk
could be accomplished in a single operation anyway. If you stripe the data, the
drives work in parallel, thus allowing the operation to happen much more
quickly.
The striping logic of an operating system (or of disk array firmware) makes a
group of disks appear as a single disk or volume. The operating system will
break up a disk request into multiple disk requests across multiple drives.
Figure 4-12 illustrates how striping works.
Table 4-25 Ranges of Disk I/Os vs. Memory Used vs. Docbase Sizes vs. Users/Hour
Docbase Size (millions of documents) | Disk I/Os per Second | DBMS Memory (Gigabytes) | EDMI Users/Hour
1       | 60 - 200   | 1 - 2   | 900 - 1500
2 - 2.5 | 60 - 300   | 1 - 2   | 800 - 3000
5       | 410 - 2000 | 1.5 - 3 | 2000+ - 4000+
Figure 4-12 Disk Striping Concept
A key component of disk striping performance is the size of the data stripe (data block). The smaller the stripe, the more drives can participate in a single I/O, and more parallel drives mean better performance. However, if the stripe is too small, the overhead of dealing with the stripe exceeds any performance gain from striping. If the stripe is too large, I/Os are likely to queue up on a single drive.
To illustrate, using an extreme example of poor stripe size from the DBMS
world, suppose an administrator has many individual disks and stripes the
data by creating multiple table space files across the independent disks. The
administrator puts a single, large, sequential portion of the table space on each
drive. If a request is made for a portion of the table, it is likely that the I/Os
are going to be concentrated on a single drive in an uneven fashion.
In general, RAID0 (striping without parity), as described above, outperforms
tablespace or DBMS device-level disk striping. However, RAID0 has some
disadvantages. If a single drive fails, the entire stripe set fails. That is, four
disks in a RAID 0 stripe set (or logical drive) have a shorter mean time before
failure (MTBF) than a single drive, because any one of the four physical drives
can bring the logical drive down.
(Figure 4-12) Without striping, the data is retrieved in 4 sequential I/Os from one drive; with striping logic, the same data is retrieved in 4 parallel I/Os across the drives.
There are two major ways to protect performance yet maintain reliability:
mirroring and striping with parity. With mirroring (or RAID 1), the data is
written to two drives. If one drive fails, the data can be read from the other.
When data is striped over a set of mirrored drive pairs, the configuration is
called RAID1+0 or RAID10.
In striping with parity (RAID 5), parity information is written out in addition to the data. The parity information can be used to recreate data if a drive fails. The parity for a write operation can be written to any drive in the configuration that does not contain the data that generated the parity code. For example, suppose there are four drives on which data is striped. One write might put data on drives 1, 2, and 3 with the parity information on drive 4. Another write might put the data on drives 1 and 2, with the parity information on drive 3.
The disadvantage of a RAID1+0 configuration is the cost of the additional
drives. The disadvantage of a RAID5 configuration is the extra I/Os needed to
write out the parity information. In general, the access penalty for RAID5 is
fairly severe for DBMS files. It can provide decent performance for Docbase
content.
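One way to turn these trade-offs into numbers is the classic write-penalty rule of thumb: a logical write costs two physical I/Os on RAID 1/1+0 and four on RAID 5 (read old data, read old parity, write both). The helper below is our sketch using that rule, with 100 IOPS per drive assumed from the 10 ms access time used in the earlier scan example:

```python
# Physical I/Os generated per logical write (common rule of thumb).
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4}

def spindles_needed(read_iops, write_iops, raid_level, iops_per_drive=100):
    """Estimate how many drives are needed to absorb a logical I/O load,
    accounting for the RAID write penalty. 100 IOPS/drive corresponds
    to a 10 ms average access time."""
    physical_iops = read_iops + write_iops * WRITE_PENALTY[raid_level]
    return -(-physical_iops // iops_per_drive)  # ceiling division

# The same 500-read/200-write load needs more spindles on RAID 5:
print(spindles_needed(500, 200, "raid10"))  # (500 + 400) / 100 -> 9 drives
print(spindles_needed(500, 200, "raid5"))   # (500 + 800) / 100 -> 13 drives
```

This quantifies the text's point: RAID 5's access penalty matters most for write-heavy DBMS files, while read-dominated Docbase content suffers far less.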
Disk Storage Areas
A Documentum installation has many different disk storage areas that need
sizing. Table 4-26 outlines these storage areas and some of their characteristics.
Table 4-26 I/O Characteristics for Documentum
(columns: Data Area | Used For | Size | Recovery Characteristics | I/O Activity | Advice)

DBMS and Documentum Server backup | Backup copies of metadata and content files | Same size as content plus DBMS data, typically Gbytes | Hard to recover | Sequential writes, but infrequent usage | RAID 5

Documentum content and full-text index | Actual online content plus full-text index | Gbytes | Hard to recover | Random read/write for small files, sequential for large files | RAID 5, or mirrored pairs bound into a RAID 0 stripe (RAID 1+0)
Table 4-26 (continued)

Documentum eContent Server temp storage area | Intermediate file transfer area | Typically less than 100 Mbytes | Easy to recover | Random read/write for small files, sequential for large files | RAID 5 is acceptable

DBMS transaction logs | Ensuring DBMS operations remain stable after a failure | Mbytes to Gbytes | Hard to recover | Sequential writes | Mirrored pairs bound into a RAID 0 stripe (RAID 1+0)

DBMS data & index | Holding the document metadata | Gbytes | Hard to recover | Random read/write | RAID 1+0 preferred

DBMS temp & Oracle rollback segments | Temp is used for DBMS sorting and worktables; rollback segments are used for transaction aborts | Mbytes to Gbytes | Easier to recover and rebuild than DBMS data and indexes | Sequential and random writes (some reads) | Mirrored pairs bound into a RAID 0 stripe (RAID 1+0)

RightSite temp & log storage area | Temp is used as an intermediate file transfer area; the log storage area stores log files | Hundreds of Mbytes | Easy to recover | Random read/write for small files, sequential for large files | RAID 5 is acceptable

RightSite DMCL cache | Per-session file cache (what would have been on client machines in a client-server environment) | Hundreds of Mbytes | Easy to recover | Random read/write for small files, sequential for large files | RAID 5 is acceptable
Server Configuration and Sizing
Sizing Server Disk Capacity
444 Documentum System Sizing Guide
Table 4-26 (continued)

RightSite Docbasic compiled disk cache
  Used for: Pre-compiled Docbasic scripts
  Size: Mbytes
  Recovery: Easy to recover
  I/O activity: Random read/write for small files and sequential for large files
  Advice: RAID 5 is acceptable

Internet Server log files
  Used for: Per-operation logging by Internet Server
  Size: Tens of Mbytes
  Recovery: Easy to recover
  I/O activity: Random read/write for small files and sequential for large files
  Advice: RAID 5 is acceptable

OS paging/swap files
  Used for: Holds the pages of a process working set no longer needed in memory
  Size: Gbytes
  Recovery: Painful to recover (the OS will have a hard time booting)
  I/O activity: Mostly sequential writes
  Advice: RAID 1+0 suggested

Disk Space Sizing

This section provides a formula for calculating disk space needs.

Physical Disk Requirements of the Documentum Software Components

Table 4-27 lists the disk space requirements for the servers in the system. The
figures in this table are based on test Docbase scenarios. The actual
requirements for a particular Docbase will vary based on such factors as
subtype depth, number of attributes, size of each attribute value, number of
repeating values, and overhead for non-document objects.
Table 4-27 Physical Disk Requirements for Server Software in Documentum System

Server                                                   Requirement
eContent Server                                          175 MB
RDBMS (not including table space)                        100 MB
Base Documentum Objects (including RDBMS table space)    50 MB
RightSite                                                60 MB

Typical Disk Space Calculation Model for Content and Attribute Data

You can use the following formula to estimate disk usage for document
content and metadata:

  5K per object x number of saved versions stored in the RDBMS table (document object data)
  + document size x number of saved versions stored in the Docbase (document content)
  + rendition size x number of saved versions stored in the Docbase (renditions)
  + 30% of the original document size x number of saved versions stored in the Docbase (full-text indexes)
  + 2.5K x number of saved versions stored in the Docbase (annotations)

Expressed mathematically, the formula is:

  ((5K * number of versions) +
   (Document Size * number of versions) +
   (Rendition Size * number of versions) +
   ((30% * Document Size) * number of versions) +
   (2.5K * number of versions)) * Total Number of Documents
  = Total Disk Capacity

(Note that not all documents require versions, renditions, annotations, or
full-text indexes. You can configure a system to prune version and rendition
trees and annotations.)

Use the Excel spreadsheet to automatically calculate the estimate.
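As a hedged illustration, the formula can be turned into a few lines of Python (the 5K per-object and 2.5K per-annotation figures come from the formula above; the Docbase figures in the example are hypothetical):

```python
def docbase_disk_bytes(num_docs, versions, doc_size, rendition_size,
                       fulltext=True, k=1024):
    """Estimate content plus metadata disk usage per the sizing formula."""
    per_doc = (5 * k * versions            # object metadata in the RDBMS
               + doc_size * versions       # document content
               + rendition_size * versions # renditions
               + (0.30 * doc_size) * versions * (1 if fulltext else 0)
               + 2.5 * k * versions)       # annotations
    return per_doc * num_docs

# Hypothetical Docbase: 100,000 documents, 3 saved versions each,
# 200 KB originals with 150 KB renditions, full-text indexed.
total = docbase_disk_bytes(100_000, 3, 200 * 1024, 150 * 1024)
print(f"{total / 1024**3:.1f} GB")
```

As the text notes, not every document carries renditions, annotations, or full-text indexes, so zero out the terms that do not apply to your population.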
Additional Considerations
Documentum does not impose additional overhead on content storage in the
file system beyond the actual size of the content file unless the contents are
full-text indexed.
If a Docbase is participating in object replication, there must be available disk
space to execute the requirements of the replication job.
Similarly, disk space must be available when a dump and load is performed.
Additional References
- System Administrator's Guide, Chapter 11, "Tools"
- Docbase Design Principles customer training course

If your organization has an existing Docbase, you can generate pertinent
information regarding disk utilization (and more) by executing eContent
Server's State of Docbase system administration tool. This tool is described in
the System Administrator's Guide, Chapter 11, "Tools".
Database License Sizing
RDBMS user licenses are an important component of the cost of a deployed
system. Database vendors typically have three different licensing schemes:
- Per-seat licensing (fee for each possible named user of the system)
- Concurrent-user licensing (fee for the maximum number of concurrent users on a system)
- Per-CPU licensing (fee per CPU used in production)

Per-CPU licensing is common for Internet applications.
In most cases, Documentum works well with concurrent-user licensing.
Because user interaction with the Documentum environment is likely to be
erratic (get document, read document) and eContent Server closes inactive
sessions, the number of concurrent DBMS users is determined by the number
of active Documentum sessions. The number of active Documentum sessions
Server Configuration and Sizing
Certified Database and HTTP Server Versions
Documentum System Sizing Guide 447
is a percentage of the number of users supported per hour. For example, in the
EDMI workload, the number of active Documentum sessions was about 17 to
20 percent of the total EDMI users supported per hour.
Consequently, you can estimate the number of active sessions. Increase this
value to account for peak periods (for example, estimated number + 20
percent extra).
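These rules of thumb combine into a quick license estimate (a sketch only; the 20 percent active-session ratio and 20 percent peak margin follow the text, and the 500 users/hour input is hypothetical):

```python
import math

def dbms_licenses(users_per_hour, active_ratio=0.20, peak_margin=0.20,
                  sessions_per_user=1.5):
    """Estimate concurrent-user DBMS licenses from hourly Documentum users."""
    active = users_per_hour * active_ratio    # active Documentum sessions
    peak = active * (1 + peak_margin)         # headroom for peak periods
    # Some DBMSs (for example, Oracle) open about 1.5 sessions per active
    # user, but vendors typically do not charge extra for those.
    return math.ceil(peak), math.ceil(peak * sessions_per_user)

licenses, dbms_sessions = dbms_licenses(500)
# 500 users/hour -> 100 active sessions -> 120 licenses, ~180 DBMS sessions
```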
Some notes on concurrent-user licensing and Documentum:

- For some implementations (for example, Oracle), multiple DBMS sessions
  will be created for some user activities. Consequently, the actual number of
  concurrent DBMS users will be greater than the number of active
  Documentum users; typically, the actual number is approximately one and
  a half times the number of active users. In these cases, however, DBMS
  vendors typically do not charge extra for multiple sessions per
  Documentum user.
- Although shortening the eContent Server client session timeout reduces
  the number of active sessions at any one time, it also causes more frequent
  reconnections, which drives response time up. For remote users, this
  penalty might be severe. You might actually need to increase the client
  session timeout to keep remote users happy.
For anonymous RightSite users, per-CPU licensing might be more
appropriate. In this case, the number of active Documentum sessions created
by the pool of anonymous RightSite servers is typically fairly small. However,
each of those active servers might perform numerous operations. This fits well
with a per-CPU licensing scheme.
Certified Database and HTTP Server Versions
Please refer to the eContent Server release notes and the RightSite release
notes for information about the certified RDBMS versions and HTTP server
versions.
5 Server Network Configuration Guidelines
This chapter contains guidelines for sizing and configuring the network
between multiple Documentum sites. The following topics are covered:
- Overview of Network Sizing on page 5-1
- Key Concepts for Network Sizing on page 5-2
- Making the Decision: Localizing Traffic or Buying More Bandwidth on page 5-6
- Additional Specific Network Recommendations on page 5-12
Overview of Network Sizing
Sizing the network between Documentum sites is principally a matter of:
- Sizing bandwidth needs for servers and users
- Determining locations for servers
First, you must understand network bandwidth needs. Sometimes the need
for bandwidth between remote users and their server is so great that it makes
more sense to relocate the server (or just content) closer to users to improve
response times and drive down telecommunications costs.
The main issue that affects remote user response time is content size relative to
available bandwidth. The second issue is the number of operations that have
to take place between a client and the server and how much data is generated
during those operations.
Figure 5-1 illustrates the steps for configuring your network resources.
Figure 5-1 Steps To Configure Network Resources
Key Concepts for Network Sizing
Before you can make decisions about sizing and configuring the network, you
must understand several key concepts about networks and how they affect
sizing. These key concepts are:
- Bandwidth and Latency on page 5-3
- Bandwidth Needs and Response Time on page 5-4
[Figure 5-1 shows the flow: (1) estimate the number of users per hour, (2)
estimate operations per user, (3) estimate the bytes per operation, (4) estimate
document size, and (5) determine the locations of users; then determine the
network demand per user community (as if they had their own...), determine
possible geographic locations for servers, and rework the network load based
on those geographic locations, adjusting if network demand outstrips the
budget.]
Bandwidth and Latency
Bandwidth describes the available throughput of the various components in a
network. It is usually measured in Bytes/sec or bits/sec. The factors affecting
bandwidth are:
- Transmission speed
- The amount of external traffic using the media
To illustrate how external traffic affects bandwidth, let's look at a single-lane
freeway with a speed limit of 60 miles per hour. Optimal bandwidth allows
the cars to go at the speed limit (that is, a car could cover 60 miles in an hour).
Actual bandwidth is likely to be far less due to rush hour traffic or accidents.
For example, the traffic on the freeway may have an average speed of 20 miles
per hour during rush hour. Available bandwidth is diminished by external
traffic forces, in this case, additional cars.
Data transfer latency is the time it takes to send data between two hosts. The
factors affecting latency are:
- Processing time on the respective host machines
- Propagation delay across the transmission media
- Available bandwidth for each network between the two hosts
- Processing time for all routers and gateways between the two hosts
To continue with the above example, latency is the time it takes to get from
one place to another. The distance from Pleasanton, California to the Golden
Gate bridge in San Francisco might only be about 50 miles, but the delays
caused by the toll bridge between Oakland and San Francisco and the various
traffic lights in San Francisco are likely to make that trip take longer than 1
hour (Figure 5-2).
Figure 5-2 Example Of Bandwidth vs. Latency
When you apply the concepts of latency and bandwidth to sizing a real-world
network, you must be sensitive to the various components in the network and
how the components affect the response time of a Documentum
implementation.
Bandwidth Needs and Response Time
There are two ways to provision network resources based on network load.
You can:
- Optimize for average network demand
- Optimize for online response time
To optimize for average demand, you need only to determine the number of
bytes transferred in a busy hour and ensure that there is enough bandwidth to
meet that demand.
Unfortunately, optimizing for network demand often leaves online users with
poor response time. For example, suppose only 5 Mbytes of data must be
transferred between two sites in one hour. A 56K bps link should provide
sufficient bandwidth for users to get good response time (5M bytes per hour is
only 20 percent of the total amount of bytes that could be transferred on a 56K
bps link in 1 hour). However, that isn't the case. Network demands of online
users are characterized by small bursts separated by long pauses, and
response will be judged as good or bad by how long it takes to service one of
those bursts. Figure 5-3 illustrates the nature of the network demand.
[Figure 5-2: the trip from Pleasanton, CA to the Golden Gate Bridge in San
Francisco covers about 50 miles at 65 miles/hour on Highway 580
(bandwidth), but only 25 miles/hour on Highway 101 in San Francisco, so the
latency for the trip is an hour or more.]
Figure 5-3 Example of Bursty Network Load Caused by Online Users

[Chart: percent of bandwidth used over time; short bursts of full utilization
are separated by long stretches of unused bandwidth, and the longer the
burst, the poorer the response time.]
As an example, suppose that a particular command (such as the display of a
dynamically rendered Web page) transfers 80,000 bytes of data. With a 56K
bps line, the command will take 11 seconds to complete at best and will be
judged by the users as a command with poor response time. At 256K bps, the
response can go down to approximately 2.5 seconds and be considered a
command with acceptable performance. Now suppose that 10 users each issue
the above command 6 times in 1 hour. Their average network demand for the
hour is 4.8M bytes (80,000 bytes x 6 commands x 10 users), which is only 20
percent of the 56K bps bandwidth and 4 percent of a 256K bps bandwidth.
Although this seems to indicate that an enormous amount of bandwidth is
needed for users to achieve any level of decent performance, it turns out that
because there are large pauses in the use of the line, more online users can
share bandwidth due to the random nature of their requests. That is, the 256K
bps line could also likely serve an additional 70 users at the same number of
requests with good response time (which would drive the average usage of
the line to about 30%).
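The burst-service arithmetic in this example is easy to reproduce (hypothetical helper functions, using the 80,000-byte command from the text):

```python
def transfer_seconds(payload_bytes, line_bps):
    """Best-case time to service one burst over a dedicated line."""
    return payload_bytes * 8 / line_bps

def line_utilization(payload_bytes, bursts_per_hour, users, line_bps):
    """Fraction of an hour's line capacity consumed by all bursts."""
    demand_bits = payload_bytes * 8 * bursts_per_hour * users
    return demand_bits / (line_bps * 3600)

print(transfer_seconds(80_000, 56_000))          # ~11.4 s: poor response
print(transfer_seconds(80_000, 256_000))         # 2.5 s: acceptable
print(line_utilization(80_000, 6, 10, 256_000))  # ~0.04: line mostly idle
```

The low utilization alongside the long per-burst service time is exactly the mismatch the text describes: sizing for average demand alone would leave the bursts slow.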
Typically, it is not cost effective to provide so much bandwidth that all
commands can complete in 3 seconds or less. Rather, it is important to try to
ensure that the most frequently performed operations have good response
times. For example, if users log into their application only once an hour, then a
login that takes 10 seconds over some amount of bandwidth is much less
annoying than some other command that takes 10 seconds and is run 10 times
in the hour for each user. Determining how much bandwidth to allocate for a
particular Documentum application focuses on trying to make the most
frequent commands run the most quickly. This can be achieved by adding
more bandwidth or by choosing a Documentum distributed option to localize
network requests. "Making the Decision: Localizing Traffic or Buying More
Bandwidth" on page 5-6 discusses those decisions.
Making the Decision: Localizing Traffic or Buying
More Bandwidth
When you are sizing and configuring a network between Documentum sites,
the goal is to achieve good user response time at minimum cost. This section
describes the trade-offs between bandwidth and a variety of Documentum
options, including:

- Remote Web servers
- Content servers
- Object replication

For all options, there is a trade-off between purchasing more bandwidth and
localizing access by putting a server at the remote site. No single formula can
be applied to make the evaluation. Bandwidth costs differ by region and by
proximity to service and telephone facilities. There are also different choices
for the types of server machines to put at the remote site, and maintenance
and staffing costs for that software and hardware will also differ by region.

Let's illustrate the cost trade-offs with a hypothetical example. Suppose a
remote office supports 25 users. Analysis determines that without any special
servers at that site, users need a bandwidth of 700K bps to achieve good
response time. However, locating a special server at the remote office would
reduce the traffic demands so that a bandwidth of 128K bps would provide
good response time.
The assumed costs associated with the remote server machine include:
- Server host (2 CPUs, 1 GB memory, 2 internal disks, monitor, keyboard,
  tape drive, and rack): $15,000
- Software costs (OS, and so on): $1,500
- Tape backup costs: $200 per year
- Other administrative costs: $1,000 one-time fee for training the remote
  administrator (who already supports the users at the remote site), plus
  $2,000 one-time fee for initial setup
In this example, the remote server machine requires fairly little day-to-day
administration, and the administration can be done remotely from the central
site. Additionally, the example assumes that the service will be in production
for three years and that power costs are negligible. These assumptions bring
the total charge for the remote machine to about $20,000.
Let's also assume that the remote office has access to a frame-relay service
provided by the local telephone company and that the frame-relay service
charges include both port prices and access facility charges. These port prices
are shown in Table 5-1.
Table 5-1 Example Frame Relay Port Prices

Speed         Installation Fee    Monthly Fee
56K bps       $354.67             $70.93
128K bps      $354.67             $141.87
384K bps      $354.67             $378.32
1.544M bps    $354.67             $472.90
37M bps       $1,418.69           $4,539.79

Table 5-2 shows the access facility charges.

Table 5-2 Example Frame Access Facility Charges

Speed         Installation Fee    Monthly Fee
56K bps       $597.37             $47.41
128K bps      $600.69             $165.94
384K bps      $600.69             $165.94
1.544M bps    $600.69             $165.94
Table 5-3 shows the cost of setting up and using frame relay from the central
site to the remote office for bandwidths of 1.544M bps and 128K bps.
In this example, a remote server saves the company about $3,800 over three
years, a savings of less than 10 percent. If the remote office is in a different
country, or a cost-effective frame-relay service can't be used, then using the
remote server could yield significantly greater savings.
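The three-year comparison can be reproduced with a short script (figures taken from Tables 5-1 through 5-3; the $20,000 remote-server figure is the estimate developed above):

```python
def three_year_line_cost(port_install, port_monthly, access_install,
                         access_monthly, sites=2, months=36):
    """Total frame-relay cost for a line over three years, both endpoints."""
    per_site = (port_install + port_monthly * months
                + access_install + access_monthly * months)
    return per_site * sites

# Rounded figures as used in Table 5-3.
slow = three_year_line_cost(354, 141, 600, 165)   # 128K bps line
fast = three_year_line_cost(354, 472, 600, 165)   # 1.544M bps line
remote_server = 20_000                            # hardware plus admin estimate
print(fast - (slow + remote_server))              # -> 3832 (savings)
```

Swapping in local tariffs and server costs makes the same comparison for any region.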
The following sections discuss how to determine the bandwidth needs for a
variety of deployment options. The Documentum Sizing Spreadsheet also
contains information about this topic.
More Bandwidth or Remote Web Servers
Documentum is an N-tier server architecture, and each tier employs a
different protocol with different networking characteristics. In Web-based
deployments, the browser-to-Web server protocol (HTTP/HTML) is likely to
be the most verbose. Depending on the application, the HTML could be 20
times more verbose than the corresponding DMCL. Consequently, if remote
users are centralized in a single office, it might be best to locate a remote Web
server in that office to achieve better response times. Doing so takes
advantage of the fact that HTML requires more bandwidth than the
corresponding Documentum DMCL protocol between the Web tier and
eContent Server. Because there is little state to maintain on a Web or
application server machine, the cost to use a remote Web or application server
machine is low.
Table 5-3 Total Cost for Three-Year Period for Two Line Sizes

Charges                      128K bps       1.544M bps
Port installation fee        $354.00        $354.00
Port charge for 3 years      $5,076.00      $16,992.00
Access installation fee      $600.00        $600.00
Access charge for 3 years    $5,940.00      $5,940.00
Total for a single site      $11,970.00     $23,886.00
Total for both sites         $23,940.00     $47,772.00
Figure 5-4 Bandwidth and Remote Web Servers
Content Transfer Response Time: More Bandwidth or Content
Servers
Suppose that the Web server is located at the remote site, but performance is
still poor (or expected to be poor) given the bandwidth provided. The next
item to improve is the time required to transfer content (or files). You can add
more network bandwidth (for example, upgrade a 56K bps link to a 128K bps
link) or add a Documentum content server at the remote site to localize
content access. Deciding which strategy to employ will depend on the relative
costs of each, and the costs are likely to differ based on geography.
[Figure 5-4: in both configurations, verbose HTML flows between the
Web/application server and the browser, while less verbose DMCL flows
between the Web/application server and eContent Server; locating the Web
server near remote users keeps the verbose HTML traffic local.]
Using Documentum content servers is quite attractive when bandwidth is
very expensive and users typically access large files. The administration and
hardware needs of a Documentum content server are fairly small. A content
server only requires additional disk space for content (beyond the needs of a
regular Web or application server) and it can be administered remotely.
Figure 5-5 illustrates the use of content servers.
Figure 5-5 Bandwidth and Remote Content Servers
[Figure 5-5: with a remote Docbase, DMCL plus content transfer crosses the
WAN, and response time can be improved only by increasing bandwidth;
with a local content server, content is served locally and only DMCL
operation traffic is sent to the remote eContent Server.]
Operation Response Time: More Bandwidth or Replication
The trade-off between bandwidth and local server machines also applies to
DMCL operations issued from remote application servers or from remote
Desktop Client users. That is, even after content and Web access are moved to
the remote site, users might still get poor response time. In such cases,
Documentum object replication might be the best solution for the enterprise.
With object replication, all or part of a Docbase is replicated to the remote site.
The replication process happens during off-peak hours, and consequently,
remote users almost always interact with their local servers and get great
response time. The costs of object replication include setting up a remote
RDBMS, the overhead of replication administration, and the overhead of
replica update latencies (nightly updates). Figure 5-6 illustrates how
bandwidth and object replication interact.
Figure 5-6 Bandwidth and Object Replication
[Figure 5-6: replication updates cross the WAN at night; during the day, users
get fast online access to the replicated data locally.]
Additional Specific Network Recommendations
- Keep roundtrip ping time between servers in a distributed server
  configuration below 250 milliseconds. Roundtrip ping time measures
  network latency.
- Land-based communication is better than satellite communication because
  of the physical distance data travels over a satellite link.
- Consider adding network interface cards to each server to prevent
  network saturation.
- To handle large images across the network, consider a second network
  card and higher-speed media (for example, 100 Mbit Ethernet rather than
  10 Mbit).
- If possible, place the WAN between the client and eContent Server, or
  between the Web browser and RightSite.
- If possible, do not place a WAN between eContent Server and the RDBMS.
- Determine whether you can use Documentum's network compression
  feature to optimize network performance between the client and server.
6 Sizing for Client Applications
This chapter describes sizing considerations and requirements for
Documentum clients. The following topics are covered:
- Sizing for Desktop Client on page 6-1
- Sizing for AutoRender Pro on page 6-5
- System Requirements for Client Products on page 6-6
Sizing for Desktop Client
CPU speed and memory are the main resources that must be sized properly
for Documentum Desktop Client. It is also important to size the disk space
and network resources correctly, as these can have a profound impact on
response time.
CPU Speed
Desktop Client offers a wide array of operations and functionality. Generally,
the richer the feature set, the more CPU processing is required. A typical large
enterprise will deploy a range of functionality over a PC population that
varies from very slow (for example, 166 MHz) to extremely fast (for example,
800 MHz). There is a direct relationship between the speed of a machine and
the number of features that can run on the machine with acceptable response
times. Figure 6-1 illustrates this relationship.
Figure 6-1 CPU Needs vs. Features used for Desktop Client
Basic usage represents a fixed set of basic operations carried out on
documents that are not extraordinarily large. For example, navigating folders
and checking documents out and in are fairly basic operations and should be
achieved with acceptable performance on a 166 MHz machine. However, if the
folders in the navigation path have a large number (hundreds) of subfolders
and documents, or the documents checked out and in are many megabytes in
size, a 166 MHz machine probably won't be fast enough to get acceptable
response time.
Table 6-1 shows some example response times for a set of basic office
integration operations on two different machines. (Response times were
measured for the steady-state invocation, not the initial operation.)
[Figure 6-1 maps CPU speed (166 MHz, 300 MHz, 600 MHz) against feature
tiers: basic usage (navigation, checkout/checkin, workflow, no OLE-linked
documents); additional usage scenarios (business policies, and so on); and
advanced usage (OLE link processing, XML, large-document processing,
custom validations).]
Table 6-1 Response Time on Two CPUs

Operation                       Response Time (seconds)
                                400 MHz CPU    166 MHz CPU
Launch App                      0.42           1.93
Open Dialog Box                 1.58           3.73
Open Small Doc (70K bytes)      2.73           4.71
Simple usage of custom validations (user-defined attribute integrity checks
made when importing a document or changing its attributes) is unlikely to
provide good response time with slower machines. And the more advanced
the validation, the more processing is required.
Advanced usage features are those that not only provide more sophisticated
functionality (such as the conversion of OLE linked documents or XML
documents into Documentum virtual documents) but also process large
documents (large in size or in the number of virtual documents created). For
example, the response time for an XML document chunked into thousands of
nodes will be much higher than the response time for one chunked into a few
nodes. Similarly, checking in a document with hundreds of OLE links will
take longer than checking in a document with a single OLE link. Finally,
checking in a 40 Mbyte Powerpoint document takes longer than checking in a
2 Mbyte Powerpoint document.
As the documents get larger and more disk I/O is performed at the PC, disk
performance becomes a larger issue. SCSI drives are typically a better choice
than EIDE drives for PCs that will be using advanced features.
Table 6-1 (continued)

Operation                       400 MHz CPU    166 MHz CPU
Save Small Doc (70K bytes)      2.74           4.05
Open Medium Doc (300K bytes)    3.99           5.81
Save Medium Doc (300K bytes)    3.01           5.56
Open Large Doc (4M bytes)       4.98           8.58
Save Large Doc (4M bytes)       5.40           12.71

Component Initialization and Steady State Processing

Some operations take longer the first time they occur in a Desktop Client
session than on second or subsequent executions, due to component
initialization. Those operations display dialog boxes that use data that must
be initialized the first time the dialog box is displayed. Once the start-up
penalty is paid, the data is cached for the duration of the session.
Response times when an operation must initialize components are longer than
when the operation is executing in a steady state (using cached data). Table
6-2 lists some example initialization and steady-state response times for a set
of operations (on a 400 MHz CPU).
If initialization response times on a system are unacceptable, you can reduce
them by increasing the CPU speed of the Desktop Client machine. (Note that
this will also reduce the steady-state response time. There will always be some
difference between initialization and steady-state response times.)
Memory Resource Needs
This section describes memory resources needed for Documentum Desktop
Client and integrations.
The base Explorer integration (using basic features) can run using 64M bytes.
However, actual memory resource needs will vary, depending on:
- The level of features used
- The type of applications used
- The size of the documents being processed and the folders being navigated
Using advanced features, such as converting OLE links to documents, could
require 96 Mbytes and more of memory.
When using the MS Office integrations, the memory required is about 10
Mbytes for each application invoked. (The Office integrations integrate
Desktop Client with MS Word, PowerPoint, and Excel.) If all three are invoked
and are running at the same time, the system requires an extra 30 Mbytes of
memory.
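Those figures suggest a simple back-of-the-envelope estimator (a sketch; the 64 MB and 96 MB baselines and the 10 MB per Office application come from the text above):

```python
def desktop_client_memory_mb(office_apps_open=0, advanced_features=False):
    """Rough Desktop Client working-set estimate from the figures above."""
    base = 96 if advanced_features else 64  # MB for the Explorer integration
    return base + 10 * office_apps_open     # ~10 MB per Office integration

# Word, PowerPoint, and Excel integrations all running at once.
print(desktop_client_memory_mb(office_apps_open=3))  # -> 94
```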
Table 6-2 Example Response Times Comparing Initialization and Steady-State Operations

Operation          Response Time (seconds)
                   First      2nd and Subsequent
Open Dialog Box    10.40      1.58
Open Sm Doc        6.63       2.73
Save Sm Doc        3.65       2.74
Sizing for AutoRender Pro
AutoRender Pro services rendering requests issued by Documentum clients to
convert documents from one format into another. For example,
WebPublisher uses AutoRender Pro to automatically convert MS Word
documents to HTML when they are checked in. AutoRender Pro is a
single-threaded server application that runs on a server machine separate
from eContent Server. Figure 6-2 illustrates how AutoRender Pro works with
eContent Server and documents.
Figure 6-2 AutoRender Pro in Conjunction with a Docbase
The factors affecting response time for a single rendering job are:
s The CPU speed and disk access speed of the AutoRender Pro server
s The complexity of the work
[Figure 6-2: rendering requests from the eContent Server machine accumulate
in a queue of pending requests; each document is brought to the AutoRender
server's disk, converted, and the converted document is checked back into the
Docbase.]
To convert a document, AutoRender Pro must parse and rewrite the
document to a different format. The conversion is CPU intensive, and
document size, complexity (graphics and so forth), and format affect the
amount of resources needed for the job. If the document is large, physical disk
I/Os occur on the AutoRender drives as the document's temporary version is
created. In fact, performance studies show that during the conversion process
both the CPU and the disk are quite busy. If the CPU is slow or if the drives or
their controllers have poor response time, rendering jobs will take longer to
process. Consequently, using good disk controllers, fast drives, and fast CPUs
(> 600MHz) for the AutoRender Pro server machines is strongly
recommended.
Memory requirements are minimal (refer to the release notes for actual
figures).
Multiple AutoRender Pro servers can be set up for a single Docbase. Each
server pulls work requests off the same work queue. Adding more servers
increases the rendering capacity, allowing more users to enter requests
simultaneously while maintaining the desired response time.
Note: A single AutoRender Pro server can support multiple Docbases.
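Because each AutoRender Pro server is single-threaded, a rough server count can be estimated from the expected request rate and the average conversion time. The figures below are hypothetical placeholders, not measurements from this guide; substitute rates observed at your own site:

```python
import math

# Rough capacity estimate for single-threaded AutoRender Pro servers.
# All numbers below are assumed examples -- substitute site measurements.
jobs_per_hour = 300          # rendering requests entering the shared queue
avg_render_seconds = 45      # average conversion time per document
target_utilization = 0.7     # keep servers under ~70% busy to limit queueing delay

busy_seconds_per_hour = jobs_per_hour * avg_render_seconds
servers_needed = math.ceil(busy_seconds_per_hour / (3600 * target_utilization))

print(servers_needed)  # 6
```

Each additional server pulls from the same work queue, so rendering capacity scales roughly linearly until the Docbase or the network becomes the bottleneck.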
System Requirements for Client Products
Refer to the current product release notes or installation guides for
information about system requirements and certification information. That
documentation is available on the Documentum ftp site, the Documentum
Support Web site, and in the Documentation Library in dm_notes.
A  Additional Workloads
This appendix contains descriptions of workloads not included in Chapter 2.
It includes the following topics:
- The EDMI Workload on page A-1
- The Web Site Workload on page A-6
- The Document Find and View Workload on page A-9
- The Online Customer Care Workload on page A-9
- Comparing and Contrasting the Workloads on page A-14
- Operations Not Included in Workloads on page A-17
The EDMI Workload
The EDMI workload represents a common software configuration for
Web-based Documentum deployments. The workload uses the following
products:
- eContent Server
- RightSite Server
- Docbasic
- SmartSpace Intranet Client
The users in this workload are named users accessing the Docbase through
Internet browsers and SmartSpace Intranet Client. The software architecture
for the workload is illustrated in Figure A-1.
Figure A-1  EDMI Software Architecture

[Figure A-1: Browsers running Documentum SmartSpace access dynamic and static Web pages through a Web server (for example, Microsoft or Netscape). The Web server communicates with the Documentum RightSite Server, which in turn communicates with the Documentum DocPage Server, backed by the RDBMS and the OS file system.]

This architecture is capable of supporting parallel multi-tier servers
(multiple RightSite and eContent Servers). The architecture allows maximum
scalability and adaptability by ensuring that storage, network, and server
capacity can be added to increase performance and throughput as more
clients are added to the network. A company can easily scale any tier to
accommodate growth and change. The architecture also provides the
flexibility for optimizing server and client hardware.

Workload Scenario

In this workload, a large number of named users work with documents
representing standard operating procedures, work instructions, human
resource notes, corporate Web pages, or other information that must be read
prior to performing some task.

There are three different groups of users: contributors, coordinators, and
consumers. Contributors access and modify documents and submit modified
documents for review using a workflow. Coordinators are a type of
contributor with coordinating workflow tasks. Consumers only read the
documents. Consumers sometimes access documents by logging onto the
system explicitly and at other times only access public Web pages.
Typically, in the type of Web-based deployment modeled by this workload,
the largest number of users are consumers. Consequently, in the workload, 80
percent of the user population are consumers and 20 percent are contributors.
Workload Operations
The operations in the workload include:
- Locating a document through a folder search and viewing it
- Checking out and checking in documents
- Workflow processing (Inbox, routing, and so forth)
- Virtual document processing (publishing)
- Accessing static Web pages stored in the Docbase
- Accessing dynamic Web pages that query the Docbase
Table A-1 lists the operations in the workload.
Table A-1  Operations in the EDMI Workload

    Operation         Description
    CONN              Starts a Docbase session through SmartSpace Intranet.
    FOLDER_SRCH       Searches folder by folder, eventually displaying a
                      selected document.
    STATIC_HTML       Accesses a sequence of Web pages. 20 percent of the
                      pages are dynamic; 80 percent are static (stored in
                      the Docbase).
    VDM_PUBLISH       Constructs a virtual document from its components and
                      then displays the document to the user. Each document
                      has 10 components of 2K each.
    VIEW_INBOX        Displays the user's Inbox.
    CHECKOUT_DOC      Checks out a previously selected and viewed document.
    CHECKIN_DOC       Checks in an edited document.
    SUBMIT_REVIEW     Submits a modified document to a review workflow for a
                      technical and compliance review. This is a
                      contributor's task.
    ASSIGN_REVIEW     Assigns a document to be reviewed for technical or
                      regulatory compliance. This is a coordinator's task.
    FORWARD_REVIEW    Forwards a document to the next activity in the review
                      workflow.

Workload Response Time Requirements

When a benchmark test is run, the primary metric obtained is the number of
users who can be supported with acceptable response times.

In the EDMI workload, each user type (contributor, coordinator, consumer)
performs a specific number of random tasks (operations) at random times
during the hour, and the response times for these tasks are measured. Each
task typically consists of dynamically generating several HTML screens from
RightSite.

The interval between a user's requests affects performance and response time
because Documentum frees a user connection that does not have any activity
after some amount of time (typically two to five minutes, settable by the
administrator). Re-establishing the session (which happens transparently
when work is initiated on an idle session) consumes more CPU resources.
Simulating this behavior in the test more accurately models the real world.

The acceptable response time is, generally, no more than one to two seconds
per screen. Table A-2 lists the response time requirements for the EDMI
workload. Table A-3 shows a sample set of results.

Table A-2  Response Time Requirements for the EDMI Workload

    Operation         Number of   Acceptable Response Time   Total Acceptable Average
                      Screens     per Screen (in seconds)    Response Time (in seconds)
    CONN              3           2                          6
    FOLDER_SRCH       5           2                          10
    STATIC_HTML       5           1                          5
    VDM_PUBLISH       1           6                          6
    VIEW_INBOX        1           4                          4
    CHECKOUT_DOC      2           2                          4
    CHECKIN_DOC       3           2                          6
    SUBMIT_REVIEW     3           3                          9
    ASSIGN_REVIEW     3           3.3                        10
    FORWARD_REVIEW    2           2.5                        5
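The totals in Table A-2 follow directly from the first two columns: screens multiplied by the acceptable time per screen, rounded to the nearest second. A quick consistency check, written here as a sketch rather than as part of any benchmark tooling:

```python
# Each Table A-2 row: operation -> (screens, seconds per screen, total seconds).
edmi_requirements = {
    "CONN":           (3, 2,   6),
    "FOLDER_SRCH":    (5, 2,   10),
    "STATIC_HTML":    (5, 1,   5),
    "VDM_PUBLISH":    (1, 6,   6),
    "VIEW_INBOX":     (1, 4,   4),
    "CHECKOUT_DOC":   (2, 2,   4),
    "CHECKIN_DOC":    (3, 2,   6),
    "SUBMIT_REVIEW":  (3, 3,   9),
    "ASSIGN_REVIEW":  (3, 3.3, 10),
    "FORWARD_REVIEW": (2, 2.5, 5),
}

# Verify that every total is screens x per-screen time, rounded to the
# nearest whole second.
for op, (screens, per_screen, total) in edmi_requirements.items():
    assert round(screens * per_screen) == total, op

print("all rows consistent")
```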
Table A-3  Example Response Times for the EDMI Workload

    Operation         Average Response    Total Acceptable Average     Total Operations
                      Time (in seconds)   Response Time (in seconds)   in One Hour
    CONN              2.83                6                            2268
    FOLDER_SRCH       7.26                10                           3860
    STATIC_HTML       1.87                5                            2981
    VDM_PUBLISH       2.3                 6                            3025
    VIEW_INBOX        0.65                4                            1742
    CHECKOUT_DOC      1.63                4                            1397
    CHECKIN_DOC       5.7                 6                            805
    SUBMIT_REVIEW     7.17                9                            428
    ASSIGN_REVIEW     9.17                10                           155
    FORWARD_REVIEW    2.64                5                            241

Workload Scaling

The Docbase size is increased as the number of users in the workload
increases. To support larger numbers of users per hour, many more
documents are preloaded. There are at least 650 documents in the Docbase for
each supported user. However, in most benchmarks with this workload, there
are well over 2,000 documents per user. Consequently, as more users are
added to the test, the queries become more expensive to perform. The
documents range in size from 2 Kbytes to 1 Mbyte.

The Web Site Workload

The Web Site workload simulates a common deployment scenario for some of
our Web-based implementations. It uses eContent Server and RightSite Server
and incorporates RightSite's ability to retrieve HTML content from the
Docbase transparently for the user. Companies often use RightSite in this
manner to obtain better security and version control for their Web content.

The user population is entirely anonymous. Anonymous users don't need to
provide any security credentials in order to access a page. (Tighter security
can be provided by RightSite named users. The EDMI workload uses named
users.)
User access starts at a dynamically generated home page, EDMI_home.htm.
This page includes an HREF reference to each of the root-level static Web
pages. (Static Web pages are pages that are stored in the Docbase.) The home
page dynamically constructs the next level of references and serves them as a
page to the user.
The static Web pages are all stored in the cabinet /Website in a folder structure
that is several layers deep. The pages are linked to each other through
<HREF> tags in a hierarchical structure. The structure models the real-world
way of storing Web pages in separate directories (grouped by applications, for
example). The HREF references are five levels deep, with four references per
level. Consequently, in a large Docbase, each root page ultimately points,
directly and indirectly, to thousands of pages.
Each static Web page consists of 3000 bytes of random text, followed by up to
four references to other Web pages. The pages at the bottom level don't
reference any pages. (Figure A-2, in the next section, illustrates the Web page
structure.)
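With references nested five levels deep and up to four references per level, the page counts grow geometrically. Assuming a full tree with exactly four references per page (an upper bound, since pages carry *up to* four), the count per root page works out as follows:

```python
# Pages reachable from one root-level page of the static Web hierarchy,
# assuming a full tree: exactly four references per page, five levels deep.
refs_per_page = 4
levels = 5

# Geometric series: 4 + 4^2 + ... + 4^5 pages below each root page.
pages_below_root = sum(refs_per_page ** i for i in range(1, levels + 1))

print(pages_below_root)      # 1364
print(1 + pages_below_root)  # 1365, including the root page itself
```

This is consistent with the statement that, in a large Docbase, each root page ultimately points to on the order of a thousand pages or more.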
Workload Operations
There is only one operation in this workload: STATIC_HTML. The
STATIC_HTML operation consists of a sequence of Web page accesses. In 20
percent of the accesses, the page is dynamically generated. In the remaining 80
percent of the accesses, the page is fetched from the Docbase.
The sequence of access operations starts from the home page. A reference is
picked at random from each page, and the operation moves to the next page
until the bottom is reached. Figure A-2 illustrates the access strategy. The
boxes in bold represent the total number of Web pages accessed by one cycle
(six Web page references).
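The access cycle described above can be sketched as a random walk down the hierarchy: one page per level, from the home page to a bottom-level page, for six page accesses per cycle. The page-naming scheme below is invented for illustration:

```python
import random

LEVELS = 5           # depth of the static page hierarchy below the home page
REFS_PER_PAGE = 4    # references to choose from on each page

def one_cycle(rng):
    """Return the pages accessed in one STATIC_HTML cycle."""
    pages = ["EDMI_home.htm"]  # dynamically generated home page
    for level in range(1, LEVELS + 1):
        # Pick one of the (up to) four references at random and descend.
        pages.append(f"level{level}_group{rng.randint(1, REFS_PER_PAGE)}.htm")
    return pages

cycle = one_cycle(random.Random(0))
print(len(cycle))  # 6 Web page references per cycle
```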
Figure A-2  Static Web Page Example

[Figure A-2 shows the static Web page hierarchy: the EDMI home page references the root page of each group, and each page references up to four pages at the next level, down to the fifth level. The pages accessed in one cycle (six Web page references, from the home page down to a fifth-level page) are shown in bold.]

Workload Response Times

When a benchmark test is run, the primary metric obtained is the number of
users who can be supported with acceptable response times.

Each user performs five tasks. Each task consists of an HTML screen that is
dynamically generated from RightSite plus the additional static pages. The
tasks are performed at random times, and the response time is measured.
Acceptable response time is, in general, considered to be no more than one or
two seconds per screen. Table A-4 shows the response time requirements for
the workload.

Table A-4  Response Time Requirements for the Web Site Workload

    Operation      Number of   Acceptable Response Time   Total Acceptable Average
                   Screens     per Screen (in seconds)    Response Time (in seconds)
    STATIC_HTML    5           1                          5

The Documentum client session time-out has little effect on this workload
because there is a steady stream of activity to only a few anonymous servers
in the RightSite pool. However, the users' random activities do affect the CPU
resources used, because RightSite spawns more servers to handle the excess
load when all anonymous servers are busy. Random user activity helps ensure
that more servers are spawned during a run, which ensures that the test
models real-world activity peaks more accurately.
Workload Scaling
The Docbase size is increased as the number of users in the workload
increases. There are at least 40 Web pages in the Docbase for each anonymous
user supported. However, this workload is usually part of a configuration
used in the EDMI benchmark and, consequently, the static Web pages are only
20 percent of the total number of documents stored in the Docbase. The Web
pages are all 3K bytes.
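Those scaling rules translate into a simple back-of-the-envelope content estimate. The user count below is an assumed example, not a figure from the benchmarks:

```python
# Docbase scaling estimate for the Web Site workload, using the figures
# stated above: at least 40 Web pages per anonymous user, each 3 Kbytes,
# with static pages making up only 20 percent of all documents when the
# workload runs as part of an EDMI benchmark configuration.
anonymous_users = 1_000              # assumed target population

web_pages = 40 * anonymous_users     # minimum static pages to preload
page_content_kbytes = web_pages * 3  # 3 Kbytes per page

# Static pages are only 20 percent of the documents in the combined EDMI
# configuration, so the full Docbase holds five times as many documents.
total_documents = web_pages * 5

print(web_pages)            # 40000
print(page_content_kbytes)  # 120000 Kbytes (about 117 Mbytes of pages)
print(total_documents)      # 200000
```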
The Document Find and View Workload
The Document Find and View workload is essentially a small subset of the
EDMI workload, exercised in a client-server environment using WorkSpace
rather than in a Web environment using SmartSpace Intranet. Users locate
documents using attribute searches or by navigating through folders. After a
document is found, it is displayed to the user. Each document is 400K bytes.
The Online Customer Care Workload
The Online Customer Care workload demonstrates Documentum's interactive
performance for a large number of users on a very large Docbase (100,000,000
content objects). This workload uses out-of-the-box SmartSpace with a few
customizations. The customizations center on querying and setting custom
attributes.
The workload includes the following Documentum products commonly used
in Web-based deployments:
- eContent Server
- RightSite
- Docbasic
- SmartSpace Intranet Client
Workload Operations
In this workload, a large number of users work with documents representing
customer and supplier correspondence or standard operating procedures that
must be read prior to performing some task. The users access the Docbase
through Internet browsers.
The operations mimic the content management needs of businesses such as
insurance or financial services for online customer care. These needs are
defined by large volumes of images (customer correspondence or policy
agreements) and large groups of users who read the documents or modify
their attributes. The operations include:
- Locating a document (through a folder search)
- Inserting documents and notes into the Docbase and viewing them
- Workflow processing (Inbox, routing, and so forth)
- Accessing dynamic Web pages that query the Docbase
Table A-5 lists the basic operations in the workload.

Table A-5  Online Customer Care Workload Operations

    Operation            Description
    CONN                 Starts a Docbase session using SmartSpace Intranet.
    CREATE_DOCUMENT      Creates a new document and opens the document's
                         editor for the user.
    CREATE_DOCUMENT_F    Checks in the new document.
    FOLDER_SRCH          Searches through a folder hierarchy for a document.
                         This is performed by office users.
    FORWARD_REVIEW       Forwards a document for review in a workflow. This
                         operation occurs when a workflow or data entry user
                         completes a task.
    QUERY_SRCH           Represents a Docbase search based on a customer ID.
    SET_PROPERTIES       Sets the classification attributes of a TIFF
                         document. Data entry users perform this operation.
    VIEW_DOCUMENT        Displays a TIFF, Word, text, or Excel document for
                         a user to view.
    VIEW_INBOX           Selects a user's Inbox icon and displays the next
                         set of tasks in the Inbox.

In the workload, the activities performed by a user are made up of the actions
described in Table A-5, grouped in a way that is meaningful for the user's
function. Table A-6 describes the user functions and their associated activities.

Table A-6  User Activities in the Online Customer Care Workload
Data entry users
    Data entry users examine correspondence that was scanned into the
    Docbase to ensure that attributes are correctly set. The data entry users
    work on documents that have been routed to them through their Inboxes.
    A data entry user's activity includes the following operations:
    - Check the Inbox for items to re-attribute (VIEW_INBOX)
    - View the TIFF image noted by the next Inbox item (VIEW_DOCUMENT)
    - Change the attributes to fit the desired descriptions and categories
      (SET_PROPERTIES)
    - Complete the workflow task (FORWARD_REVIEW)

Workflow users
    Workflow users review and sometimes annotate documents routed to
    them. (The documents are routed prior to the busy hour.) A workflow
    user's activity includes the following operations:
    - Check the Inbox for work (VIEW_INBOX)
    - View the document designated by the next item in the Inbox
      (VIEW_DOCUMENT)
    - Complete the task (FORWARD_REVIEW)
    - For 30 percent of the users, create a text note associated with the
      document (CREATE_DOCUMENT & CREATE_DOCUMENT_F)
    Workflow users continue these activities until all items in their
    Inboxes are processed.

Office users
    Office users create MS Office Word and Excel documents. An office
    user's activity includes the following operations:
    - Create a Word document (CREATE_DOCUMENT & CREATE_DOCUMENT_F)
    - Create an Excel document (CREATE_DOCUMENT & CREATE_DOCUMENT_F)
    - Navigate some folders (FOLDER_SRCH)
    - View a text document (VIEW_DOCUMENT)

Branch users
    Branch users search for documents using a policy number or customer ID
    and then view the documents. A branch user's activity includes the
    following operations:
    - Query for a document based on customer ID (QUERY_SRCH)
    - View the TIFF image (VIEW_DOCUMENT)

Call center users
    Call center users staff phone banks, taking customer calls. The user
    receives a call and queries the customer's records based on the
    customer's ID. Sometimes, the call center user must put additional
    information in the Docbase in response to the call. A call center
    user's activity includes the following operations:
    - Query for a document based on customer ID (QUERY_SRCH)
    - View the TIFF image (VIEW_DOCUMENT)
    - For 25 percent of the activities, create a text document associated
      with the image (CREATE_DOCUMENT & CREATE_DOCUMENT_F); otherwise,
      query again once complete

Workload Response Time Requirements

When a benchmark test is run, the primary metric obtained is the number of
users who can be supported with acceptable response times.

Each user type performs a specific number of random tasks at random times
during the hour, and the response times for these tasks are measured. Each
task typically consists of several HTML screens that are dynamically
generated from RightSite. Acceptable response time is, in general, considered
to be no more than one to two seconds per screen.

The interval between a user's requests affects performance and response time
because Documentum frees a user connection that does not have any activity
after some amount of time (typically two to five minutes, settable by the
administrator). Re-establishing the session (which happens transparently
when work is initiated on an idle session) consumes more CPU resources.
Simulating this behavior in the test more accurately models the real world.

Another important metric is obtained when response times are measured for
operations on the documents in the Docbase. For tests using this workload,
the Docbase is loaded with 100,000,000 TIFF images, to model a multi-user
environment after years of use. To preserve space but still have a sufficient
number of database rows, 90 percent of the content objects have zero length.

When the benchmark test starts, an array of policy numbers corresponding to
documents with non-zero-length content is loaded. All queries for documents
choose from this array. In this way, only documents with content are queried.
Content sizes range from 50K to 500K bytes.
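Even with 90 percent of the objects empty, the remaining content dominates storage. A rough estimate, assuming (as an illustration, not a figure from the tests) that content sizes are spread uniformly over the stated 50K-500K range:

```python
# Content storage estimate for the Online Customer Care test Docbase.
total_objects = 100_000_000                      # TIFF content objects loaded
with_content = total_objects // 10               # the non-zero 10 percent

avg_kbytes = (50 + 500) / 2                      # assumed uniform distribution
total_kbytes = with_content * avg_kbytes
total_tbytes = total_kbytes / 1024 ** 3          # Kbytes -> Tbytes

print(with_content)            # 10000000
print(round(total_tbytes, 2))  # roughly 2.56 Tbytes of content
```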
Table A-7 lists the response time requirements for the workload.
Table A-7  Response Time Requirements for the Online Customer Care Workload

    Operation            Number of   Acceptable Response Time   Total Acceptable Average
                         Screens     per Screen (in seconds)    Response Time (in seconds)
    CONN                 3           2                          6
    CREATE_DOCUMENT      6           2                          12
    CREATE_DOCUMENT_F    5           2                          10
    FOLDER_SRCH          4           2                          8
    FORWARD_REVIEW       2           2                          4
    QUERY_SRCH           3           2                          6
    SET_PROPERTIES       2           2                          4
    VIEW_DOCUMENT        1           2                          2
    VIEW_INBOX           2           2                          4

Comparing and Contrasting the Workloads

This section compares and contrasts the workloads in terms of the software
architecture, the usage patterns modeled in each, and the resulting resource
consumption.

Software Architecture

The EDMI and Web Site workloads use an HTTP thin-client, or 4-tier,
architecture. In an HTTP thin-client architecture, Documentum DMCL (client
library) processing occurs on the machine that hosts RightSite and the
Internet server. This is in contrast to the 3-tier architecture, in which client
library processing occurs on the user's PC. With the HTTP thin-client
architecture, very little work actually happens on the client machine (all
users are assumed to be using browsers sending HTTP). That is, RightSite
performs those operations that, in a 3-tier architecture, are performed on the
hundreds (or even thousands) of client PCs. Figure A-3 illustrates the
difference between the 3-tier architecture and the HTTP thin-client
architecture.
Figure A-3  Client-Library 3-Tier Architecture vs 4-Tier (HTTP thin-client) Architecture

[Figure A-3: In the 3-tier mode, hundreds to thousands of individual user PCs perform the Documentum DMCL operations and communicate directly with the Documentum DocPage Server. In the 4-tier (HTTP thin-client) mode, the user PCs send HTTP to an Internet server plus Documentum RightSite on a centralized middle-tier server, which performs the DMCL operations against the Documentum DocPage Server.]

Usage Models and Resource Consumption

The EDMI and Anonymous RightSite Web Site workload usage models are
opposites.

In the EDMI workload, all the users are named users. Named users provide a
user name and password and then are authenticated and provided with some
exclusive resources. In particular, a separate RightSite process is created for
each named user.

The tasks in the EDMI workload operate primarily on dynamic Web pages.
The named users use RightSite with SmartSpace to generate dynamic pages in
80 percent of the accesses and fetch static pages from the Docbase in 20
percent of the accesses.
In the Web Site workload, all users are anonymous users. Anonymous users
don't provide a name or password; they share the anonymous login
configured with RightSite. On the resource side, anonymous users share a
pool of RightSite processes rather than having their own resources.

The tasks in the Web Site workload operate primarily on static Web pages.
The anonymous users use RightSite only (SmartSpace is not used) to generate
dynamic pages in 20 percent of the accesses and to fetch static pages in 80
percent of the accesses. This dynamic-to-static Web page profile is the
opposite of the one used in the EDMI workload.
The difference in the dynamic-to-static Web page profiles means that the
EDMI workload makes much heavier demands on the RightSite Server than
the Web Site workload does. In benchmarks using the EDMI workload, the
RightSite Server consumes more CPU than any other piece of server software
in the benchmark, because of the high dynamic-to-static page ratio.
Figure A-4 illustrates the relationship between CPU consumption and the
dynamic-to-static ratio.
Figure A-4 Relationship Between Workloads and RightSite CPU Consumption
The difference in the user profiles means that the EDMI workload will
consume more CPU and memory resources per user than the Web site
workload, because named users consume more resources than anonymous
users.
Document Find and View Workload
The Document Find and View workload uses a client-server architecture and
named users. RightSite is not part of that workload. In addition, although the
EDMI workload is Web-based, in most cases the RightSite portion can be
factored out of the data, to allow you to size this workload based on a
client-server model.
Operations Not Included in Workloads
The following operations are not included in the workloads described in this
appendix:
- Creating PDF renditions
- Creating or searching full-text indexes
- Default SSI attribute searching (case-insensitive attribute searches)
- Deleting documents from the Docbase
- Dumping and loading a Docbase
- Distributed content operations and object replication
- Operations on objects in turbo storage
If these are part of your workload, you may want to increase the expected
resource consumption for your workload.
INDEX
A
active user 1-4
active user - in transaction 1-4, 2-3
active user - out of transaction 1-4, 2-3
activity timeout 1-5
AIX F50 (IBM server) 4-17
AIX S7A (IBM server) 4-16
Anonymous RightSite Web Site workload.
See Web Site workload
AutoRender Pro
sizing 6-5
in WCM Edition 3-19
availability considerations, for system 3-12
B
backup capacity and sizing 3-12
bandwidth
defined 1-5, 5-3
response times, effect on 5-4
vs localizing 5-6
vs remote Web servers 5-8
benchmark tests
Document Find and View workload on
HP K580 machines 4-21
EDMI workload
on AIX servers 4-17
on IBM Netfinity 7000 M10 4-15
on LXR8000 & LH4 4-20
on Sun Enterprise 450 4-11
on Sun Enterprise 6500/4500 4-13
focus for N-tier tests 4-5
hardware configurations 4-2
iTeam workload
on Compaq servers 4-10
on HP LXR8000 & LH4 4-19
on Lpr/LH4 4-19
Online Customer Care workload on
V2600 4-21
result tables, interpreting 4-6
Web site workload
on Sun Enterprise 6500/4500 4-14
Web Site workload on Sun Enterprise
450 4-11
bottleneck, defined 1-5
busy hour
active sessions, estimating 2-8
defined 2-6
C
caches
effect on performance 4-26
database 4-27
eContent Server 4-27
memory use 4-26
RightSite 4-27
capacity planning. See system sizing
cluster, eContent Server 3-8
Compaq servers
described 4-9
Web site for 4-22
configurations
host-based vs multi-tiered 3-11
connected user 1-5
connecting user 1-5, 2-3
connection states. See user connection states
content
replication, described 3-15
servers 3-14
transfers, response times 5-9
ContentCaster 3-20
cost-based optimizers 4-38
CPU usage
Documentum server 2-4
Documentum workloads 2-24
RightSite Server 2-5
D
data caches, database 4-27
database
caches
described 4-27
disk I/O and 4-39
license sizing 4-46
memory requirements 4-29
scaling 3-10
server 1-5
DBMS. See databases
Desktop Client sizing
CPU speed 6-1
dmcl operations 6-3
memory 6-4
disk access capacity 4-37
disk capacity sizing
overview 4-35
process inputs 4-36
query optimizers and 4-38
space vs access capacity 4-37
table scans, effect on 4-38
disk I/O
database cache and 4-39
of disk storage areas 4-42
disk space sizing
formula for 4-45
general notes 4-46
server software requirements 4-44
space vs access speed 4-37
disk storage areas
sizing 4-42
disk striping
defined 4-40
with parity 4-42
without parity 4-41
disk throughput 1-6
distributed storage areas 3-14
DL360 (Compaq server) 4-9
DMCL
object cache 4-28
response times 5-11
Docbase size and memory requirements
4-29
Docbasic compiled-code memory area 4-28
DocBrokers
load balancing 3-10
scaling 3-10
DocPage Server 1-5
Document Find and View workload
described A-9
on HP K580 machines 4-21
Documentum Server
defined 1-5
transformation engine 1-7
Documentum Sizing Spreadsheet 1-3, 2-2
Documentum workloads
Docbase usage patterns 2-25
Document Find and View A-9
EDMI A-1
iTeam 2-10
Load and Delete workload 2-22
Online Customer Care A-9
operations not included 2-26, A-17
resource consumption 2-24, A-15
software architecture 2-23, A-14
Web Site A-6
WebPublisher 2-16
dynamic HTML, memory use 4-28
dynamic Web pages, effect on scaling 3-20
E
e-Content Server 1-6
eContent Server
caches 4-27
cluster 3-8
defined 1-6
load balancing 3-8
scaling 3-7
server set 3-8
editions
Portal 3-21
Web Content Management 3-17
EDM Server 1-6
EDMI workload
on AIX servers 4-17
described A-1
execution scenario A-2
on IBM Netfinity 7000 M10 4-15
on LXR8000 & LH4 4-20
operations A-3
response time requirements A-4
scaling A-6
on Sun Enterprise 450 4-11
on Sun Enterprise 6500/4500 4-13
Enterprise 450 (Sun server) 4-11
Enterprise 4500 (Sun server) 4-12
Enterprise 6500 (Sun server) 4-12
Enterprise Resource Planning system,
Documentum and 4-22
errors in system sizing 1-3
F
F50 (AIX server) 4-17
failover
operating system solutions 3-13
partitioning and 3-5
federations 3-15
G
global type cache 4-27
H
hardware configurations
in benchmarks 4-2
choosing configuration 3-11
host-based configurations 3-11
HP servers
K580 4-20
LH4 4-19
Lpr 4-19
NETSERVER LXR 8000 4-18
V2600 4-20
Web sites for 4-22
HTTP Server 1-6
I
IBM servers
AIX F50 4-17
AIX S7A 4-16
Netfinity 7000 M10 4-15
Web site for 4-22
inactive users
resource consumption 2-5
inactive users, defined 1-6, 2-3
indexes and disk capacity 4-38
iTeam in Portal edition 3-21
iTeam workload
on Compaq servers 4-10
execution scenario 2-11
on HP LXR8000 & LH4 4-19
on Lpr/LH4 4-19
operations 2-12
purpose 2-10
resource consumption 2-24
response times
examples 2-16
requirements 2-15
task performance and 2-14
scaling 2-13
K
K580 (HP-UX server) 4-20
L
latency
defined 5-3
network 1-6
LH4 (HP server) 4-19
Load and Delete 2-22
load balancing
DocBrokers 3-10
eContent Server 3-8
network 3-6
transparent 3-5
Lpr (HP server) 4-19
M
memory
physical 1-6
virtual 1-7
memory sizing
cache usage 4-26
database requirements 4-29
for Desktop Client 6-4
Docbase size, effect on 4-29
dynamic HTML 4-28
equation for 4-32
examples 4-32 to 4-35
guidelines, general 4-31
for MS Office integrations 6-4
operating system 4-30
operating system requirements 4-29
oversizing 4-24
overview 4-23
paging file 4-30
RightSite 4-27
user connection requirements 4-28
virtual memory 4-25
metadata retrieval, effect on scaling 3-20
Microsoft Windows NT
Web site 4-22
mirroring, in disk striping 4-42
MS Office integrations memory use 6-4
multi-site deployments
administrative overhead 3-16
deployment options, list of 3-14
network bandwidth 3-16
multi-tiered configurations
advantages 3-11
availability and 3-12
N
name servers. See DocBrokers
named user 1-6
Netfinity 7000 M10 (IBM Windows NT
machine) 4-15
NETSERVER LXR 8000 (HP server) 4-18
network
latency 1-6
load balancer 3-6
throughput 1-6
network sizing
bandwidth
cost of 3-16
vs localizing 5-6
vs response time 5-4
for content transfer speed 5-9
for DMCL operations 5-11
guidelines, general 5-12
overview 5-1
N-tier configurations
benchmark test focus 4-5
described 4-3
O
object replication, described 3-15
Online Customer Care workload
described A-9
on HP V2600 machines 4-21
operations A-10
response time requirements A-13
user activities A-11
operating systems
memory requirements 4-30
physical memory, tuning 4-29
P
paging file
memory use, estimating 4-30
out of memory detection 4-31
sizing 4-23
usage 4-25
on Windows NT 4-31
parity, in disk striping 4-41
partitioning
and failover 3-5
RDBMS 3-10
scaling out 3-3
scaling up 3-2
Web tier software 3-6
performance and cache use 4-26
physical memory
cache usage 4-26
calculation examples 4-32 to 4-35
defined 1-6
global type cache 4-27
large memory support 4-29
operating system, configuring 4-30
process working set 4-25
sizing 4-23
user connection requirements 4-28
Portal Edition, scaling 3-21
primary domain controller, Documentum
and 4-22
process working sets
defined 4-25
on Windows NT 4-30
Proliant servers (Compaq) 4-9
Q
query optimizers, effect on table scans 4-38
R
RAID configurations 4-40
reference links, described 3-15
relational databases. See databases
remote Web servers
defined 3-14
vs bandwidth 5-8
resource consumption
busy hour 2-6
Documentum workloads 2-24, A-15
dynamic and static web pages A-16
inactive users 2-5
process working set 4-25
RightSite 2-5
user connection states and 2-2
response times
Desktop client 6-3
examples
iTeam workload 2-16
WebPublisher workload 2-21
requirements
EDMI workload A-4
iTeam workload 2-15
Online Customer Care workload A-13
Web Site workload A-8
WebPublisher workload 2-20
user expectations 2-9
vs bandwidth needs 5-4
RightSite
active sessions, estimating 2-8
described 1-6
DMCL object cache 4-28
user connection states 2-5
rule-based optimizers 4-38
S
S7A (AIX machine) 4-16
scaling
across sites 3-14
databases 3-10
DocBrokers 3-10
dynamic Web access and 3-20
eContent Server 3-7
iTeam workload 2-13
out 3-3, 3-4, 3-5
Portal Edition 3-21
trends affecting 3-1 to 3-5
up 3-2, 3-5
Web Content Management Edition 3-17, 3-19
Web tier software 3-5
server machines
Compaq 4-9
Sun Solaris 4-10
server sizing
CPU use
RightSite and WDK/App server 4-5
guidelines, general 4-22
overview 4-1
server CPU use 2-4
server software
configurations, compared A-14
host-based configurations 4-3
N-tier configurations 4-3
partitioning
across multiple hosts 3-3, 3-4
on one host 3-2
reuse 3-2
Site Delivery Services 3-20
Solaris memory sizing 4-31
spreadsheet, sizing 1-3, 2-2
striping. See disk striping
Sun servers
Enterprise 450 4-11
Enterprise 6500 & 4500 4-12
Web site for 4-22
swap file. See paging file
system sizing
AutoRender Pro 6-5
benchmark result tables, interpreting 4-6
benchmark test results 4-4 to 4-21
configuration constraints 4-22
database license sizing 4-46
Desktop Client 6-1
disk capacity 4-35
disk space sizing formula 4-45
disk space vs access capacity 4-37
disk storage areas 4-42
disk striping 4-40
Documentum Spreadsheet 1-3, 2-2
glossary of terms 1-4
hardware configurations 3-11
for high availability 3-12
memory calculation examples 4-32 to 4-35
mistakes, common 1-3
network
bandwidth needs 5-4
overview 5-1
overview 1-1
paging file 4-30
requirements of 1-2
scaling
across sites 3-14
databases 3-10
DocBrokers 3-10
Web tier software 3-5
server
guidelines 4-22
memory 4-23
process overview 4-1
software requirements 4-44
software reuse and 3-2
table scans, effect on 4-29
user deployment considerations 3-4
virtual memory 4-25
workloads
defined 2-1
estimating 2-2
T
table scans
disk capacity, effect on 4-38
query optimizers and 4-38
throughput, defined 1-6
transactions 1-7
transformation engine 1-7
U
user connection states
active user 1-4
active user - in transaction 1-4, 2-3
active user - out of transaction 1-4, 2-3
connected user 1-5
connecting user 1-5, 2-3
described 1-7
inactive user 1-6
inactive users
defined 2-3
memory requirements 4-28
paging file use 4-30
resource consumption 2-2
RightSite Server 2-5
V
V2600 (HP-UX server) 4-20
virtual memory 1-7, 4-25
vmstat UNIX utility 4-31
W
WCM Edition. See Web Content Management Edition
Web Content Management Edition
AutoRender Pro and 3-19
described 3-17
scaling requirements 3-19
Site Delivery Services and 3-20
WebPublisher and 3-19
Web Site workload
described A-6
operations A-7
response time requirements A-8
scaling A-9
on Sun Enterprise 450 4-11
on Sun Enterprise 6500/4500 4-14
Web tier software, scaling 3-5
WebCache 3-20
WebPublisher 3-19
WebPublisher workload
operations
list of 2-18
overview 2-17
purpose 2-16
resource consumption 2-24
response times
examples 2-21
requirements 2-20
task performance and 2-20
scaling 2-19
Windows NT paging file 4-31
workloads
See also Documentum workloads
active sessions, estimating 2-8
busy hour 2-6
defined 2-1
estimating 2-2
response time expectations 2-9
use in sizing 2-9