
Front cover

IBM FileNet Content Manager
Implementation Best Practices and Recommendations

Use system architecture, capacity planning, and business continuity

Design the repository, security, application, and solution

Learn to deploy, administer, and maintain

Wei-Dong Zhu
Dan Adams
Dominik Baer
Bill Carpenter
Chuck Fay
Dan McCoy
Thomas Schrenk
Bruce Weaver

ibm.com/redbooks
International Technical Support Organization

IBM FileNet Content Manager Implementation Best Practices and Recommendations

April 2008

SG24-7547-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page xi.

First Edition (April 2008)

This edition applies to Version 4, Release 0, of IBM FileNet Content Manager (product number
5724-R81).

© Copyright International Business Machines Corporation 2008. All rights reserved.


Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.
Contents

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
The team that wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi

Chapter 1. Introduction to IBM FileNet Content Manager . . . . . . . . . . . . . . 1


1.1 IBM Enterprise Content Management (ECM) . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 IBM FileNet P8 Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 P8 Content Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 High ingestion and large capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.2 Active content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 IBM FileNet P8 family of products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Content products. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.2 Process products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.3 Compliance products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Chapter 2. Solution examples and design methodology. . . . . . . . . . . . . . 13


2.1 P8 Content Manager sample solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Document revision and approval process . . . . . . . . . . . . . . . . . . . . . 14
2.1.2 Insurance claim processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.3 Call center support operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.4 E-mail capture for compliance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Design methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Requirements analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Functional design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.3 System architecture design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.4 Repository design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.5 Security model design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.6 Application design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.7 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.8 Maintenance planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Chapter 3. System architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


3.1 Basic architecture of an IBM FileNet P8 system . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Major components of an IBM FileNet P8 Platform . . . . . . . . . . . . . . 28
3.1.2 A basic IBM FileNet P8 system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31



3.2 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Horizontal scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.2 Vertical scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.1 A virtualized IBM FileNet P8 system . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 Shared infrastructure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4.1 Introduction to data segregation for a shared system . . . . . . . . . . . . 44
3.4.2 Low data segregation in a shared system. . . . . . . . . . . . . . . . . . . . . 45
3.4.3 Medium data segregation in a shared system . . . . . . . . . . . . . . . . . 46
3.4.4 High data segregation in separate systems . . . . . . . . . . . . . . . . . . . 47
3.4.5 Degree of sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.6 Best practice: Offer different qualities of service . . . . . . . . . . . . . . . . 49
3.5 Geographically distributed systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.1 Domain, site, virtual server, and server configuration . . . . . . . . . . . . 53
3.5.2 Distributed content caching model . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.3 Request forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.5.4 Use cases of distributed systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Chapter 4. Capacity planning with Scout . . . . . . . . . . . . . . . . . . . . . . . . . . 65


4.1 Scout overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2 Example use cases for Scout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3 Capacity planning for new systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4 Scout output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Predictions from a baseline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.6 Best practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.7 Disk sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.8 Performance-related reference documentation. . . . . . . . . . . . . . . . . . . . . 79
4.8.1 Standard product documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.8.2 Benchmark papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Chapter 5. Basic repository design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85


5.1 Repository design goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Object-oriented design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2.1 Design approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2.2 Design processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.3 Repository naming standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3.1 Display name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.2 Symbolic name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.3 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.4 Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3.5 Consistency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3.6 Specific points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95



5.4 Populating a repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.4.1 Generic object system properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4.2 Creating design elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.5 Repository organizational objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.5.1 Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.5.2 Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.5.3 Virtual servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.5.4 Server instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.5.5 Global Configuration Database (GCD) . . . . . . . . . . . . . . . . . . . . . . 105
5.6 Repository design objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.6.1 Object stores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.6.2 Storage areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.6.3 Document classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.6.4 Folder classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.6.5 Custom object classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.6.6 Compound documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.6.7 Property templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.6.8 Choice lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.6.9 Annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.6.10 Document life cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.6.11 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.6.12 Marking sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.7 Repository content objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.7.1 Folder objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.7.2 Other objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.7.3 Instantiation hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Chapter 6. Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


6.1 Security concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.1.1 Facets of security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.1.2 Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.1.3 Authentication and authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2 P8 Content Manager security features . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.2.1 Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.2.2 LDAP users and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.2.3 Authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.2.4 Object store security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.2.5 Security policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

6.3 P8 Content Manager administration support. . . . . . . . . . . . . . . . . . . . . . 149
6.4 JAAS overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5 Product documentation for security. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

Chapter 7. Application design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153


7.1 IBM FileNet P8 applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7.1.1 IBM FileNet Enterprise Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7.1.2 Workplace and Workplace XT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.1.3 Designer applets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.1.4 Application Integration for Microsoft Office . . . . . . . . . . . . . . . . . . . 156
7.1.5 IBM FileNet Business Process Framework . . . . . . . . . . . . . . . . . . . 156
7.2 Application technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2.1 Traditional Java thick clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2.2 Java applets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.2.3 J2EE Web applications and other components . . . . . . . . . . . . . . . 159
7.2.4 Service-oriented architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.2.5 .NET components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3 Principles for application design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3.1 Available P8 Content Manager APIs . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3.2 Transports available with the APIs . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.3.3 Authentication models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
7.3.4 Minimizing round-trips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.3.5 Client-side transactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.3.6 Creating a custom AddOn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.3.7 Using the JDBC interface for reporting . . . . . . . . . . . . . . . . . . . . . . 173
7.3.8 Exploiting the active content event model. . . . . . . . . . . . . . . . . . . . 174
7.3.9 Creating your own API or framework . . . . . . . . . . . . . . . . . . . . . . . 175
7.3.10 Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.3.11 Creating a custom protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.3.12 Creating a data model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.3.13 Additional reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

Chapter 8. Advanced repository design . . . . . . . . . . . . . . . . . . . . . . . . . . 185


8.1 P8 Content Manager folders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.1.1 Filed as opposed to unfiled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.1.2 Organizing unfiled content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
8.1.3 Repository folder structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.2 Storage media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.2.1 Catalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.2.2 Database stores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
8.2.3 File stores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.2.4 About storage policies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.2.5 Using fixed storage devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200



8.3 P8 Content Manager searches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.3.1 User-invoked searches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.3.2 Content-based retrieval (CBR) . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.3.3 Searches for repository maintenance . . . . . . . . . . . . . . . . . . . . . . . 206
8.4 Considerations for multiple object stores . . . . . . . . . . . . . . . . . . . . . . . . 209
8.4.1 Segregate for performance reasons . . . . . . . . . . . . . . . . . . . . . . . . 210
8.4.2 User groups are separated by large geographical distance . . . . . . 211

Chapter 9. Business continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213


9.1 Defining business continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.2 Defining high availability (HA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.3 Implementing a high availability solution . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.3.1 Load-balanced server farms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
9.3.2 Active-passive server clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.3.3 Geographically dispersed server clusters . . . . . . . . . . . . . . . . . . . . 223
9.3.4 Server cluster products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
9.3.5 Server cluster configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
9.3.6 Comparing and contrasting farms to clusters . . . . . . . . . . . . . . . . . 227
9.3.7 Inconsistent industry terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.3.8 Server virtualization and high availability . . . . . . . . . . . . . . . . . . . . 229
9.4 Defining disaster recovery (DR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.4.1 Disaster recovery concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
9.5 Implementing a disaster recovery solution . . . . . . . . . . . . . . . . . . . . . . . 231
9.5.1 Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.5.2 Global cluster manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
9.5.3 Disaster recovery approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.6 Best practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
9.7 Product documentation for HA and DR . . . . . . . . . . . . . . . . . . . . . . . . . . 243

Chapter 10. Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245


10.1 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
10.1.1 Multi-stage deployment environments . . . . . . . . . . . . . . . . . . . . . 246
10.2 Process management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
10.2.1 Release management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
10.2.2 Change management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
10.2.3 Configuration management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10.2.4 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10.3 Deployment approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
10.3.1 Cloning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
10.3.2 Export, transform, and import . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
10.3.3 Scripted generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10.4 Deployment by cloning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10.4.1 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

10.4.2 Access to the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.4.3 Post-cloning activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.4.4 Backup changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.5 Deployment by export, transform, and import . . . . . . . . . . . . . . . . . . . . 261
10.5.1 Incremental deployment compared to full deployment . . . . . . . . . 261
10.5.2 Reduce complexity of inter-object relationships . . . . . . . . . . . . . . 262
10.5.3 Deployment automation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
10.6 P8 Content Manager deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
10.6.1 CE-Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
10.6.2 CE-Objects transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
10.6.3 CE-Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
10.6.4 Exporting and importing other components . . . . . . . . . . . . . . . . . 269

Chapter 11. System administration and maintenance. . . . . . . . . . . . . . . 273


11.1 Online help and existing documentation . . . . . . . . . . . . . . . . . . . . . . . . 274
11.2 System performance monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
11.2.1 Listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
11.2.2 Dashboard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
11.2.3 System Manager performance archiver . . . . . . . . . . . . . . . . . . . . 280
11.2.4 System Manager client API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
11.3 IBM FileNet System Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
11.4 System logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
11.4.1 Message logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
11.4.2 Trace logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
11.4.3 Log4J trace logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11.4.4 Message and trace log maintenance . . . . . . . . . . . . . . . . . . . . . . 286
11.4.5 Audit and statistics logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
11.5 Reporting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
11.5.1 Queries using the object model. . . . . . . . . . . . . . . . . . . . . . . . . . . 286
11.5.2 Queries using the schema of the database. . . . . . . . . . . . . . . . . . 287
11.6 Capacity monitoring and growth prediction . . . . . . . . . . . . . . . . . . . . . . 288
11.7 IBM FileNet Enterprise Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
11.7.1 Using IBM FileNet Enterprise Manager. . . . . . . . . . . . . . . . . . . . . 290
11.7.2 IBM FileNet Enterprise Manager: Setting system default values . 291
11.7.3 IBM FileNet Enterprise Manager: Enable trace logging . . . . . . . . 293
11.8 Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
11.9 Search and bulk operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
11.9.1 Search using IBM FileNet Enterprise Manager . . . . . . . . . . . . . . . 299
11.9.2 Bulk operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
11.10 Adding security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
11.11 System backup and restore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.11.1 System components requiring backup . . . . . . . . . . . . . . . . . . . . 309
11.11.2 Offline backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

11.11.3 Online backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
11.11.4 System restore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
11.11.5 Consistency Check utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
11.11.6 Application consistency check . . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.12 Task schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
11.13 Best practice summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

Chapter 12. Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319


12.1 Troubleshooting overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
12.2 A typical P8 Content Manager system . . . . . . . . . . . . . . . . . . . . . . . . . 320
12.3 Problem isolation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
12.3.1 Quick checks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
12.3.2 One or a few users report an issue . . . . . . . . . . . . . . . . . . . . . . . . 325
12.3.3 Many users report an issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
12.3.4 Performance troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
12.4 Calling IBM for support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
12.4.1 The IBM Software Support Handbook . . . . . . . . . . . . . . . . . . . . . 330
12.4.2 Open a PMR by calling IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
12.4.3 Open a PMR via the Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
12.4.4 Items to have available when contacting IBM Software Support . 331
12.5 Sample Java error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

Chapter 13. Solution building blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337


13.1 Solution building blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
13.1.1 Content ingestion tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
13.1.2 Storage design visual aid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
13.1.3 Content and business process options . . . . . . . . . . . . . . . . . . . . . 345
13.1.4 Presentation features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
13.2 Detailed function references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
13.2.1 Content ingestion design patterns. . . . . . . . . . . . . . . . . . . . . . . . . 352
13.2.2 Content and workflow management-related design patterns . . . . 369
13.2.3 Presentation and delivery management-related design patterns . 378
13.3 Four sample use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
13.3.1 Document revision and approval process . . . . . . . . . . . . . . . . . . . 383
13.3.2 Insurance claim processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
13.3.3 Information capture supporting call center operation . . . . . . . . . . 391
13.3.4 Email management for compliance . . . . . . . . . . . . . . . . . . . . . . . . 393

Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399


Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
How to get IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

© Copyright IBM Corp. 2008. All rights reserved. xi


Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:

AIX®, DB2®, Domino®, FileNet®, HACMP™, IBM®, Lotus Notes®, Lotus®, Notes®,
Rational®, Redbooks®, Redbooks (logo)®, System p™, System p5™,
Tivoli Enterprise Console®, Tivoli®, WebSphere®, Workplace™

The following terms are trademarks of other companies:

SAP, and SAP logos are trademarks or registered trademarks of SAP AG in Germany and in several other
countries.

Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation
and/or its affiliates.

ReplicatorX, Network Appliance, SnapMirror, SnapLock, NetApp, and the Network Appliance logo are
trademarks or registered trademarks of Network Appliance, Inc. in the U.S. and other countries.

FileNet, and the FileNet logo are registered trademarks of FileNet Corporation in the United States, other
countries or both.

Enterprise JavaBeans, EJB, Java, JavaBeans, JDBC, JRE, JSP, JVM, J2EE, Solaris, Sun, and all
Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or
both.

Active Directory, Excel, Microsoft, Outlook, PowerPoint, SharePoint, Visio, Windows Server, Windows, and
the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

xii IBM FileNet Content Manager Implementation Best Practices and Recommendations
Preface

IBM® FileNet® Content Manager provides full content life cycle and extensive
document management capabilities for digital content. IBM FileNet Content
Manager is tightly integrated with the family of IBM FileNet P8 products and
serves as the core content management, security management, and storage
management engine for the products.

This IBM Redbooks® publication covers the implementation best practices and
recommendations for IBM FileNet Content Manager solutions. It introduces the
functions and features of IBM FileNet Content Manager, common use cases of
the product, and a design methodology that provides implementation guidance
from requirements analysis through deployment and administration planning.

The book addresses various implementation topics including system architecture
design with various options for scaling an IBM FileNet Content Manager system,
capacity planning with the IBM sizing tool Scout, basic and advanced repository
design, security, and application design.

One important implementation topic is business continuity. We define business
continuity, high availability, and disaster recovery concepts and discuss options
for implementing the IBM FileNet Content Manager solutions.

The book also addresses administrative topics of an IBM FileNet Content
Manager solution, including deployment, system administration and
maintenance, and troubleshooting.

Most solutions are essentially a construction of information input (ingestion),
storage, information processing, and presentation and delivery. At the end of the
book, we discuss the solution building blocks that designers can use to specify
and combine to build an IBM FileNet Content Manager solution.

This IBM Redbooks publication is intended to be used in conjunction with product
manuals and online help to provide guidance to architects and designers about
implementing IBM FileNet Content Manager solutions.

It is important to note that this book describes features offered in IBM FileNet
Content Manager Version 4.0. Many of the features described in the book also
apply to previous versions of IBM FileNet Content Manager. For specific details,
refer to the correct version of IBM FileNet Content Manager documentation.



The team that wrote this book
This book was produced by a team of specialists from around the world working
at the International Technical Support Organization, San Jose Center.

Wei-Dong Zhu (Jackie) is an Enterprise Content Management (ECM) Project
Leader with ITSO in San Jose, California. She has more than 10 years of
software development experience in accounting, image workflow processing, and
digital media distribution. Jackie holds a Master of Science degree in Computer
Science from the University of Southern California. Jackie joined IBM in
1996. She is a Certified Solution Designer for IBM Content Manager and has
managed and led the production of many Enterprise Content Management IBM
Redbooks publications.

Dan Adams is a Senior ECM Architect, IBM Information Management, in
Denver, Colorado. He has over 15 years of experience in designing, developing,
and delivering complex distributed enterprise solutions in various organizations,
including IBM, Sun™ Microsystems, and Hewlett-Packard. He holds a degree in
Computer Science-Machine Learning and Statistics from Colorado State
University. His areas of expertise include Distributed Management, Storage,
Security, P8, Java™, and J2EE™. He has presented at dozens of industry
technical conferences, served in a number of industry standards organizations,
has published a number of technical articles in various journals, and authored
best practices guides for Storage Service Providers and P8 Content Engine
Deployment.

Dominik Baer is an ECM Solution Architect with IBM in Zurich, Switzerland.


Dominik has 14 years of experience in Information Management, at FileNet and
IBM, as a Global Sales Consultant and as a System and Solution Architect for
the biggest P8 account worldwide. He has experience in development,
consulting, teaching, and project management in the banking and insurance
market. Dominik has broad experience in databases, enterprise application
integration, and Internet-related, content, and process technologies. He has a
degree in Industrial Management and Manufacturing from Federal Institute of
Technology, Zurich.

Bill Carpenter is an ECM Architect with IBM in the Seattle, Washington, area.
Bill has nine years of experience in Enterprise Content Management, at FileNet
and IBM, as a developer, development manager, and architect. He has previous
experience in building large software systems at Fortune 50 companies and has
also worked at small companies. He has been a frequent mailing list and patch
contributor to several open source projects. Bill holds degrees in Mathematics
and Computer Science from Rensselaer Polytechnic Institute.

Chuck Fay is a Software Architect at IBM in Costa Mesa, California, reporting to
the CTO for Enterprise Content Management, with responsibilities in system
architecture, patents, and industry standards. He has thirty years of experience
in the software industry, as a developer, manager, and CTO staff member,
including eight years with Xerox Corporation, nineteen with FileNet Corporation,
and one with IBM. At FileNet, he was responsible for the design, development,
and deployment of complex document image management systems and
electronic document management application products. For the past six years,
he has advised FileNet (and now IBM) engineering, support, and technical sales
representatives, as well as clients, in the area of system architecture for high
availability and disaster recovery for IBM FileNet P8 products. He holds an A.B.
in Philosophy and an M.S. in Computer Science, both from Stanford University.

Dan McCoy is a Principal Consultant with IBM. He lives in San Diego, California.
He has 10 years of experience working with clients in Content and Records
Management. He holds a degree in Computer Science from San Diego State
University. His areas of expertise include FileNet/IBM Content Services, P8,
Records Manager, and Email Manager. He has written extensively about best
practices for implementing Records and Email Management Systems.

Thomas Schrenk is an ECM Senior Systems Consultant with IBM in
Frankfurt/Main, Germany. Thomas joined IBM in 2007 with 13 years of FileNet
experience in Imaging, Content Management, and Business Process
Management. He holds a Masters degree in Computer Science from University
of Applied Sciences Würzburg. His areas of expertise include the IBM FileNet P8
Architecture, Email Management, Compliance, and Storage. Thomas has a
strong background in designing and implementing High Availability and Disaster
Recovery ECM solutions. He is advising clients mainly in the Financial Services
sector in Germany and Luxembourg.

Bruce Weaver is an ECM Consulting IT Specialist in the metropolitan New York
City area. He has 25 years of experience in the Imaging and Content
Management field. His areas of expertise include installation, configuration,
upgrade, and support of IBM FileNet ECM systems. He has worked closely with
clients to design and install highly available ECM solutions. Bruce joined IBM in
2006 with 16 years at FileNet.

Very special thanks to Michael Seaman, who contributed to the review and
part of the writing remotely from England.

We also thank the following people for their contributions to this project:

Deanna Polm
International Technical Support Organization, San Jose Center

Kevin Bates
Debbie Lelek
Qiuping Lu
Xingdong Ji
Gregory Miller
Tim Morgan
Joseph Raby
Shari Perryman
Yvonne Santiago
Diane Searer
Michael Tucker
Shawn Waters
Mike Winter
IBM Software Group, Costa Mesa, California

Become a published author


Join us for a two- to six-week residency program! Help write a book dealing with
specific products or solutions, while getting hands-on experience with
leading-edge technologies. You will have the opportunity to team with IBM
technical professionals, Business Partners, and Clients.

Your efforts will help increase product acceptance and client satisfaction. As a
bonus, you will develop a network of contacts in IBM development labs, and
increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html

Comments welcome
Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about
this book or other IBM Redbooks publications in one of the following ways:
򐂰 Use the online Contact us review IBM Redbooks publications form found at:
ibm.com/redbooks

򐂰 Send your comments in an e-mail to:
redbooks@us.ibm.com
򐂰 Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Chapter 1. Introduction to IBM FileNet Content Manager
IBM FileNet Content Manager (P8 Content Manager) provides full content life
cycle and extensive document management capabilities for digital content. This
chapter introduces P8 Content Manager and describes its features and
functionality to demonstrate how P8 Content Manager makes an excellent
foundation for enterprise content management solutions.

We discuss the following topics:


򐂰 IBM Enterprise Content Management (ECM)
򐂰 IBM FileNet P8 Platform
򐂰 P8 Content Manager
򐂰 IBM FileNet P8 family of products



1.1 IBM Enterprise Content Management (ECM)
In the quest for increased efficiency and profitability, organizations strive to
incorporate more and more relevant information into their business processes in
order to make the right decision at the right time. Corporations want to enable
their employees to search, retrieve, and review information in context, to limit
exception handling and manual processing, to reduce costs, and improve
service. It is important to have all of the needed information in order to make
the right decisions that satisfy clients, partners, suppliers, and shareholders.

The ability to make decisions better and faster is a real competitive advantage
that IBM Enterprise Content Management (ECM) solutions can help provide.
IBM ECM improves workforce effectiveness by enabling organizations to
transform their business processes, access and manage all forms of content,
secure and control information related to compliance needs, and optimize the
infrastructure required to deliver content anywhere at anytime.

IBM ECM helps organizations make quick, smart, and cost-effective decisions,
right at the moment that it matters the most.

IBM ECM benefits include:


򐂰 Active content
Delivery of information that is unified, accurate, and in context. Extend
content objects to include broader metadata needs.
򐂰 Business agility
Core Business Process Management (BPM) services within service-oriented
architecture (SOA) to support componentized application development.
Critical business artifacts are reusable across the enterprise.
򐂰 Enterprise compliance
Compliance and records management services embedded in the
infrastructure. Security, access, and authorization are implemented to
manage risk.
򐂰 Content anywhere
Content and catalog integration allow for managing of content anywhere
without requiring content migration.

1.2 IBM FileNet P8 Platform


The IBM FileNet P8 family of products is part of the IBM ECM product suite. The
IBM FileNet P8 family of products includes back-end services, development
tools, and applications that address enterprise content and process management
requirements.

The IBM FileNet P8 products are based on the IBM FileNet P8 Platform, which
is a unified content, process, and compliance platform that offers maximum
flexibility, accelerates application deployment, and lowers the total cost of
ownership. It is an integrated platform that provides interoperability to a wide
selection of database, operating system, storage, security, and Web server
environments. It serves as the core content management, security management,
and storage management engine for the IBM FileNet P8 family of products.

The IBM FileNet P8 Platform includes the baseline components for enterprise
content management solutions, including Content Engine, Process Engine,
Application Engine, and Rendition Engine. These components address
enterprise content management and Business Process Management
requirements. We discuss these components (excluding the Rendition Engine) in
3.1.1, “Major components of an IBM FileNet P8 Platform” on page 28.

All IBM FileNet P8 Platform capabilities are inherited and therefore are available
in all IBM FileNet P8 products. Additional components can be added to a system
to enable additional capabilities.

The IBM FileNet P8 Platform capabilities can be leveraged for a wide range of
enterprise scalable solutions, including Business Process Manager, Content
Manager, Email Manager, Forms Manager, Image Manager, Records Manager,
and more.

For a list of IBM FileNet P8 family of products and a brief introduction to several
of these products, see 1.4, “IBM FileNet P8 family of products” on page 8.

In the next section, we focus on the main product that this book addresses, P8
Content Manager.

1.3 P8 Content Manager


P8 Content Manager provides full content life cycle and extensive document
management capabilities for digital content.

P8 Content Manager combines document management with readily available
workflow and process capabilities to automate and drive content-related tasks
and activities. It provides unique active content capability to proactively move
content and content-related business tasks through a business process without
requiring human initiation. In addition, P8 Content Manager streamlines
document management tasks by providing mature versioning and parent-child
capabilities, approval workflows, and integrated publishing support.

P8 Content Manager provides the ability to actively manage content across the
enterprise regardless of the repository in which it resides, using Content
Federation Services. It is integrated with the IBM FileNet P8 Platform, which
provides interoperability with the widest selection of database, operating system,
storage, security, and Web server environments in the industry.

In simple terms, P8 Content Manager is a library for electronic content. It can
store simple objects, such as documents and images, as well as more complex
objects, such as workflows, e-mail streams, and corporate records. It is designed
as a central repository for any type of electronic information. Some other types
of electronic content include audio files, Web content, XML files, rich media, and
fax.

In addition to providing the repository, P8 Content Manager provides a rich set of
content management services, tools, built-in features, and programmable
functionality that system designers can use to meet enterprise content
management goals.

1.3.1 High ingestion and large capacity


P8 Content Manager is designed to handle high volumes of information.
Performance studies show that P8 Content Manager can handle both high
ingestion rates (millions of documents per hour) and large amounts of stored
information (over a billion objects in a single repository).

According to the white paper published in October 2007, FileNet P8 4.0: Content
Engine Performance and Scalability using WebSphere Application Server v6 and
DB2 9 Data Server on IBM System p5 595:

“Using an IBM On Demand environment with 16 instances of the Content Engine
(CE), the IBM solution stack achieved an ingestion rate of more than 5.6 million
documents per hour or 1,578 docs/sec. A document retrieval rate of more than
7.8 million documents per hour or 2,172 docs/sec was measured. These rates
demonstrate the ability of the IBM solution stack to handle large, enterprise-level
workloads: more than 45 million documents ingested and more than 62 million
documents retrieved in an 8-hour day, and a 20-hour day allowing more than 113
million ingestions or 156 million retrievals.”

You can download the white paper from the following Web site:
http://w3.ibm.com/software/xl/portal/viewcontent?type=doc&srcID=DM&docID=M491211F04837D11



The content is available to the public. If you do not have access to the IBM w3
Web site, ask your marketing representative to send you the PDF version.

Scalable architecture
P8 Content Manager achieves these performance rates with a scalable
architecture. Multiple servers can be added in load-balanced configurations to
handle increasing transaction loads (see Figure 1-1). This architecture makes
the P8 Content Manager repository an ideal candidate for large corporations,
government agencies, or any client with large information management
requirements.

Figure 1-1 P8 Content Manager scalability (diagram: a load balancer distributes
requests across scaled Content Manager servers that share a repository)

1.3.2 Active content


P8 Content Manager offers flexible tools that coordinate the revision, routing,
and processing of objects stored in the repository. A powerful feature of P8
Content Manager is the support for active content. Active content is electronic
information that changes according to document life cycle or business
processes. Document life cycle is a feature of P8 Content Manager. It can
consist of a simple sequence of steps. Business processes, however, are executed
by the P8 Business Process Manager. Business processes typically contain
branching logic. In essence, both are a series of steps, with each step
representing an event or process that acts on the object content.

P8 Content Manager features event action scripts that can be triggered when
objects are created, modified, or deleted in the repository. Event actions can
launch workflows or execute Java applications. Events, and the actions that they
trigger, are the mechanism that enables active content.
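As a concrete illustration of this event-and-action pattern, the following Python sketch models actions that subscribe to repository events and fire when an object is created. This is a conceptual model only, not FileNet P8 API code; the subscribe/fire functions and the workflow action are invented for the example.

```python
from collections import defaultdict

# Illustrative model of event-driven active content (not FileNet API code):
# actions subscribe to repository events, and object changes trigger them.
subscriptions = defaultdict(list)  # event name -> list of action callables

def subscribe(event_name, action):
    """Register an action to run when the named event fires."""
    subscriptions[event_name].append(action)

def fire(event_name, obj):
    """Simulate the repository raising an event for an object."""
    return [action(obj) for action in subscriptions[event_name]]

# Hypothetical action: launch a workflow when a new document is created.
def launch_workflow(doc):
    return f"workflow launched for {doc['name']}"

subscribe("created", launch_workflow)
print(fire("created", {"name": "claim-1234.tif"}))
# -> ['workflow launched for claim-1234.tif']
```

In a real deployment, the "subscribe" side corresponds to configuring event subscriptions in the repository, and the actions would be workflow launches or Java event handlers rather than local functions.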



We illustrate active content using two examples: a simple document revision
cycle and a more complex insurance claim process.

Active content example: Simple document revision cycle


In this example, an official document is created and moves through a typical
document revision cycle:
1. Creation - An author creates a document and stores it in the repository.
2. Revision (In Review) - Other authors review and revise information as
necessary.
3. Approval - A manager performs a final review and approves the document for
release (or sends it back for more revision).
4. Publish - Released documents are posted to a company Web site.

Figure 1-2 illustrates the simple document revision cycle.


Figure 1-2 Active content example: Simple document revision life cycle
(Create → In Review → Approval → Publish)

In this example, the business process is the revision cycle, and the steps of the
process are Create, Revision (In Review), Approval, and Publish. The
document’s state corresponds to the step in the life cycle.

To implement a solution for this example using P8 Content Manager, a designer
defines the number of states, sets up control for the person who can edit or
approve the document at each state, and finally calls a script that renders the
approved document into HTML. In this example, the document becomes active
content as it moves from state to state according to the document life cycle. No
programming is required.
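The life cycle in this example can be pictured as a tiny state machine. The following Python sketch is only a conceptual model, not FileNet code; the state names come from the example, while the Document class and the rendering placeholder are invented for illustration.

```python
# Conceptual model of the document life cycle from the example
# (not FileNet API code). States advance in a fixed order, and
# reaching Publish triggers a rendering action.
LIFECYCLE = ["Create", "In Review", "Approval", "Publish"]

class Document:
    def __init__(self, title):
        self.title = title
        self.state = LIFECYCLE[0]   # documents start in the Create state
        self.rendered = False

    def promote(self):
        """Advance the document to the next life cycle state."""
        index = LIFECYCLE.index(self.state)
        if index + 1 >= len(LIFECYCLE):
            raise ValueError("document is already published")
        self.state = LIFECYCLE[index + 1]
        if self.state == "Publish":
            self.render_to_html()   # stands in for the publishing script

    def render_to_html(self):
        # Placeholder for rendering the approved document to HTML.
        self.rendered = True

doc = Document("Expense Policy")
for _ in range(3):
    doc.promote()
print(doc.state, doc.rendered)  # -> Publish True
```

The per-state access control mentioned above (who can edit or approve at each state) would be layered on top of such a model as checks inside promote().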

Active content example: Insurance claim processing


A more complex example of active content is an insurance claim. In this example,
an insurance company has strict policies about how claims must be processed.
Claim approval depends on the receipt of additional content items – in this case,
police reports, property valuations, or policy reviews. At the appropriate step, the
process waits until the arrival of required documentation and then proceeds to
the next step in the process.

Using P8 Content Manager and IBM FileNet Business Process Manager (BPM),
designers can build a workflow (Claim Process) that launches when insurance
claims arrive by fax.

The workflow has the following steps:


1. Gather supporting documentation.
2. Adjust claim (claim adjustment).
3. Approve the claim if requirements are met (claim approval).
4. Take action on the claim (claim action).
5. Create a claim record (claim record).

At step one, when supporting documents come in (through fax, scanned, or in
electronic format), the workflow recognizes the new content items and links the
items to the existing claim. If all required supporting documents are gathered, the
workflow moves the claim to the claim adjustment step.
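The gating behavior of step one (wait until every required document has arrived, then advance) can be sketched as follows. This Python fragment is a conceptual model only: the document types come from the example, but the Claim class and step names are invented and are not part of any FileNet API.

```python
# Conceptual model of the "gather supporting documentation" step
# (not FileNet API code). A claim advances to claim adjustment only
# after all required supporting document types have been linked.
REQUIRED_DOCS = {"police report", "property valuation", "policy review"}

class Claim:
    def __init__(self, claim_id):
        self.claim_id = claim_id
        self.received = set()
        self.step = "gather supporting documentation"

    def attach(self, doc_type):
        """Link a newly arrived content item to this claim."""
        self.received.add(doc_type)
        if REQUIRED_DOCS <= self.received:  # all requirements met?
            self.step = "claim adjustment"

claim = Claim("C-1001")
claim.attach("police report")
claim.attach("property valuation")
print(claim.step)  # -> gather supporting documentation
claim.attach("policy review")
print(claim.step)  # -> claim adjustment
```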

Figure 1-3 on page 8 illustrates the insurance claim process.



Figure 1-3 Active content example: Insurance claim processing (diagram: a faxed
insurance claim is captured into the repository and launches a new claim process
in BPM, which gathers supporting documentation and then moves through the claim
adjustment, claim approval, claim actions, and claim record steps)

In this example, the claims processing policy is enforced by a business process.


The process branches depending on whether the claim is approved or denied.
The claim is updated to reflect changes made at each step. In this context,
content is integrated with a business process to become active content.

The elements of active content — route control, document states, and event
action scripts — offer designers a powerful toolset for creating strong enterprise
content management systems.

1.4 IBM FileNet P8 family of products


The IBM FileNet P8 family of products is based upon the IBM FileNet P8 Platform.
The products can be grouped into the following categories:
򐂰 Content products
Enabling companies to activate content with processes to add value and
transform their business.



򐂰 Process products
Automates and optimizes complex processes across the enterprise, using
effective content and compliance.
򐂰 Compliance products
An integrated compliance approach keeps unnecessary costs down while
improving visibility and control of content.

1.4.1 Content products


IBM FileNet P8 suite offers the following content products:
򐂰 IBM FileNet Content Manager
򐂰 IBM FileNet Forms Manager
򐂰 IBM FileNet Image Manager Active Edition
򐂰 IBM FileNet Content Federation Services
򐂰 IBM FileNet Team Collaboration Manager
򐂰 IBM FileNet Web Site Manager
򐂰 IBM FileNet Capture Desktop
򐂰 IBM FileNet Capture Professional
򐂰 IBM FileNet Content Services
򐂰 IBM FileNet Document Publisher
򐂰 IBM FileNet Fax
򐂰 IBM FileNet IDM Desktop
򐂰 IBM FileNet Image Services
򐂰 IBM FileNet Remote Capture Services
򐂰 IBM FileNet Report Manager
򐂰 IBM FileNet Site Publisher
򐂰 IBM FileNet Web Services/Open Client
򐂰 IBM FileNet Connectors for SharePoint®

There are many content products under the IBM FileNet P8 suite. It is beyond the
scope of this book to introduce all of them. However, to help you better
understand what these products can do for your corporation, we briefly introduce
several of them here.

IBM FileNet Forms Manager


IBM FileNet Forms Manager allows users to design, deploy, and process
electronic forms across the enterprise. Designers use IBM FileNet Forms
Manager to create forms that integrate directly with repository, business process,
and records management activities.



IBM FileNet Image Manager Active Edition
IBM FileNet Image Manager provides secure storage and management of
enterprise-level volumes of fixed information. It is a high-speed, high-capacity
image ingestion, storage, and retrieval engine.

IBM FileNet Content Federation Services


IBM FileNet Content Federation Services (CFS) manages content that is stored
in IBM FileNet Image Manager, on file systems, or in remote, non-IBM
repositories. Together with IBM WebSphere® Information Integrator Content
Edition, CFS provides central control and normalization of enterprise content
from multiple industry leading content management repositories through a single,
unified enterprise metadata catalog.

IBM FileNet Capture Desktop


Designed for workgroup and departmental applications to capture and import
non-electronic information, this document capture solution automates manual
data capture tasks, streamlines document entry processing, and reduces
long-term operational costs.

IBM FileNet Capture Professional


This product captures, manages, and distributes content for centralized and
decentralized enterprise-wide applications. This highly scalable document
capture solution captures content based on individual content repositories,
records management policies, and business processes.

1.4.2 Process products


IBM FileNet P8 suite offers the following process products:
򐂰 IBM FileNet Business Process Manager
򐂰 IBM FileNet Business Activity Monitor
򐂰 IBM FileNet Business Process Framework
򐂰 IBM FileNet eForms
򐂰 IBM FileNet eProcess
򐂰 IBM FileNet Process Analyzer
򐂰 IBM FileNet Process Simulator
򐂰 IBM FileNet Connector for Microsoft® Visio®

To help you better understand what these products can do for your corporation,
we briefly introduce several of them here.



IBM FileNet Business Process Manager (BPM)
IBM FileNet Business Process Manager automates and optimizes business
processes by managing workflow and content among people and systems. It is
the market leading workflow engine. When content is added to the IBM FileNet
Content Manager repository, IBM FileNet Business Process Manager routes,
processes, or takes specific actions based on content properties or user input.

Note: BPM comes with the three core engines (Content Engine, Process
Engine, and Application Engine) that IBM FileNet Content Manager provides.
The difference is that BPM extends the basic process capabilities of IBM
FileNet Content Manager and provides much more advanced features and
functions for implementing complex business processes.

IBM FileNet Business Activity Monitor (BAM)


IBM FileNet Business Activity Monitor correlates multiple events’ data streams,
monitors the effectiveness of business processes, measures against key
performance indicators, and automates escalations and preventive actions. It
improves operational visibility and agility.

IBM FileNet Business Process Framework (BPF)


IBM FileNet Business Process Framework provides a highly configurable
development framework leveraging IBM FileNet P8 processes and content,
enabling organizations to deploy applications faster while minimizing coding.
BPF reduces process application development cost and time to market and
improves usability by providing a configurable framework for BPM applications.

1.4.3 Compliance products


IBM FileNet P8 suite offers the following compliance products:
򐂰 IBM FileNet Email Manager
򐂰 IBM FileNet Records Manager
򐂰 IBM FileNet Records Crawler
򐂰 IBM FileNet Compliance Framework

To help you better understand what these products can do for your corporation,
we briefly introduce several of them here.

IBM FileNet Email Manager


IBM FileNet Email Manager is an ECM-based e-mail and electronic messaging
active archiving solution for Lotus® Domino®, Microsoft Exchange, and Novell
GroupWise. It captures e-mail directly from Microsoft Exchange, Lotus Notes®,
or Novell GroupWise mail servers. IBM FileNet Email Manager allows users to
manually save messages in the IBM FileNet Content Manager repository, or it
can automatically capture e-mails based on business rules. The e-mails that are
captured are classified and stored in the IBM FileNet Content Manager
repository.

IBM FileNet Records Manager


IBM FileNet Records Manager is a complete application that securely manages
the declaration, classification, security and access, auditing and monitoring,
authenticity, preservation, and destruction of electronic and physical records.
IBM FileNet Records Manager utilizes unique “Zero Click” technology to reduce
the burden and costs associated with proper management of an organization’s
records. IBM FileNet Records Manager integrates directly with IBM FileNet
Content Manager to allow repository objects to be declared and managed as
official records.

IBM FileNet Records Crawler


IBM FileNet Records Crawler monitors, analyzes, and takes action on
documents to enforce policies on objects stored in Microsoft network file shares
to manage compliance risk, achieve cost-effective storage use, and incorporate
content into business activities. IBM FileNet Records Crawler provides high
performance ingestion for system migration or import. It can leverage file plan
elements from IBM FileNet Records Manager to provide instant records
declaration and increased automation.

12 IBM FileNet Content Manager Implementation Best Practices and Recommendations


Chapter 2. Solution examples and design methodology
In this chapter, we describe four common IBM FileNet Content Manager
(P8 Content Manager) solutions, each of which illustrates the P8 Content Manager
features and capabilities that fulfill enterprise content management challenges.
In addition, we introduce the design methodology that guides you from
requirements analysis through deployment and administration planning. The
methodology is used as the structure for the remaining chapters of the book.

We describe the following topics:

- P8 Content Manager sample solutions:
  – Document revision and approval process
  – Insurance claim processing
  – Call center support operation
  – E-mail capture for compliance
- Design methodology

© Copyright IBM Corp. 2008. All rights reserved. 13


2.1 P8 Content Manager sample solutions
In this section, we present four common P8 Content Manager sample solutions:
- Document revision and approval process
- Insurance claim processing
- Information capture supporting call center operation
- E-mail capture for compliance

Each solution demonstrates how you can implement P8 Content Manager to
solve enterprise content management challenges.

Note: The solutions we present here are simplified versions. In actual
installations, the solutions are often more sophisticated than what we describe
here. We simplify the scenarios and their solutions for ease of reading and
understanding. The important point from this section is to get an idea of what
P8 Content Manager can do to help solve your business problems.

2.1.1 Document revision and approval process


In the first P8 Content Manager solution, a team of authors, reviewers, and
managers is responsible for updating safety documents. Because of safety and
health regulations, these documents must be rigidly reviewed, and workers on
the factory floor must always have access to the most current safety information.

This P8 Content Manager solution is implemented with the following features:


- Content versioning, including major and minor versions.
- Document access controlled by role-based permissions that depend on
  document life cycle status.
- Checkin and checkout capability.

Figure 2-1 on page 15 illustrates the implemented document revision and
approval process using P8 Content Manager.



The figure depicts the following steps:
1. Author creates document for revision.
2. Authors and Reviewers collaborate by checking document versions in as
minor versions (0.1, 0.2, and so on).
3. After final approval, the document is checked back in as a new major
version (1.0).
4. The new version supersedes the older version; all versions are retained
in the repository.

Figure 2-1 Document revision and approval process

Solution description
This solution uses P8 Content Manager features without additional
programming. In the design, safety documents are stored in the repository where
they are available to all users (including factory workers) for reference.

Each safety document goes through a document life cycle with multiple states. In
this implementation, the states are minor and major versions. A minor version is
a draft document; a major version is a completed document that has been
approved and released. A security policy is implemented to define the security
that applies to documents in the major version state and those in the minor
version state. Minor versions can only be viewed and modified by authors or
managers. They are invisible to general users. All users can view major versions,
but only authors and managers can modify them.
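The visibility rules described above can be sketched as a simple access check. This is a hypothetical illustration of the logic only; in the actual solution, these rules are expressed through a P8 Content Manager security policy rather than application code:

```python
# Hypothetical sketch of the minor/major version visibility rules described
# above. It illustrates the logic only; it is not the P8 Content Manager
# security API, which expresses these rules through security policies.

def can_view(role: str, version: str) -> bool:
    """'0.1' and '0.2' are minor (draft) versions; '1.0' and '2.0' are major."""
    _, minor = (int(part) for part in version.split("."))
    if minor > 0:  # a nonzero minor number marks a draft
        return role in ("author", "manager")  # drafts are hidden from others
    return True  # everyone, including factory workers, sees released versions

def can_modify(role: str, version: str) -> bool:
    # Only authors and managers can modify documents in either state.
    return role in ("author", "manager")

# A factory worker sees the released 1.0 version but not the 0.2 draft.
worker_sees_draft = can_view("factory_worker", "0.2")      # False
worker_sees_release = can_view("factory_worker", "1.0")    # True
```

The same pair of checks applies at every state transition: checking a draft in as a major version simply moves the document from the restricted rule to the open one.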

For simplification and to reflect the majority of actual solutions, this sample
solution does not include document retention. When implementing a document
revision solution for your environment, you must address your document
retention requirements and include them in your solution implementation as
necessary.



2.1.2 Insurance claim processing
In this sample P8 Content Manager solution from the insurance industry, the
company policy governs how claims are processed. With the implemented
solution, the insurance claims are handled this way:
1. Claims arrive from field offices by fax.
2. Fax Capture converts these documents to images and adds them to the
repository.
3. The add event (On Add event) launches the claim processing workflow.
4. The workflow routes the claim image to a Review step. A person (Verifier)
verifies the information and makes changes, such as highlights, annotations, or
verification marks; the changes are added as annotations to the claim image.
5. After review, the claim is routed to the Claim Adjustment step. A person
(Adjuster) examines the claim and either approves or denies the claim. The
decision is recorded as a digital signature that is part of the annotation layer.
6. If the claim is denied, the workflow generates an e-mail notification to the
agent of record.
7. The workflow renders the document to PDF, stores it in the repository, and
updates the company records accordingly. If the claim is approved, a check
will be printed from the accounting department and mailed to the customer.
8. At the end of the workflow step, the claim documents are sent to the
customer.
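The branching in steps 3 through 8 can be sketched as follows. This is an illustrative sketch only; in the actual solution, these steps are defined in a Process Engine workflow, and the callables here merely stand in for the workflow steps:

```python
# Hypothetical sketch of the claim routing in steps 3 through 8. A real
# solution defines these steps in a Process Engine workflow definition
# rather than in application code; the callables stand in for the steps.

def process_claim(claim, verify, adjust, notify_agent, render_pdf, mail_documents):
    verify(claim)                 # Review step: the Verifier annotates the image
    approved = adjust(claim)      # Claim Adjustment step: approve or deny
    if approved:
        render_pdf(claim)         # render to PDF, store, update company records
    else:
        notify_agent(claim)       # denial triggers e-mail to the agent of record
    mail_documents(claim)         # claim documents are sent to the customer
    return approved

trace = []
process_claim(
    {"id": 42},
    verify=lambda c: trace.append("verified"),
    adjust=lambda c: True,
    notify_agent=lambda c: trace.append("agent notified"),
    render_pdf=lambda c: trace.append("pdf stored"),
    mail_documents=lambda c: trace.append("mailed"),
)
# trace is now ["verified", "pdf stored", "mailed"]
```

Modeling the branch explicitly makes it clear that the denial notice and the PDF rendition are mutually exclusive outcomes, while mailing the claim documents happens on both paths.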

This P8 Content Manager solution is implemented with the following features and
components:
- Fax Capture
- Business Process Manager (for workflow process management)
- Active content event actions
- Annotations
- Branching workflow steps
- Auto processing step (step processor)
- Notification through e-mail
- PDF rendition



Figure 2-2 on page 17 illustrates the implemented insurance claim processing
solution using P8 Content Manager.

The figure depicts the following steps:
1. Insurance claim arrives from a field office by fax.
2. Fax Capture adds the claim to the repository.
3. A workflow launches automatically.
4. Workflow routes the claim to a Verifier who adds additional data.
5. Workflow routes the claim to an Adjuster who approves or rejects the claim.
6. If the claim is rejected, the workflow sends an e-mail notice to the agent.
7. If the claim is approved, the workflow renders the claim document to PDF,
stores it in the repository, and updates the company record accordingly.
A check is also issued and mailed to the client.
8. Claim documents are sent to the client.

Figure 2-2 Insurance claim processing

Solution description
In this solution, the active content is the insurance claims. They move through a
business process in a series of steps.

IBM FileNet Fax Capture receives faxes, converts them to TIFF images, and
stores them in the repository along with metadata supplied by the insurance field
agent on the fax cover form. A claim workflow is launched automatically when
the fax image is added to the system (the add event). Annotation security, the
TIFF viewer, and other security measures control who can review, approve, and
deny the claim at each step. A custom interface (known as a custom step
processor) provides the interaction with the existing accounting systems and
e-mail servers. Finally, P8 Content Manager’s PDF rendition feature records the
process in a log for an audit trail and adds the PDF format to the repository.



2.1.3 Call center support operation
In this sample P8 Content Manager solution from the medical industry, a medical
insurance company must collect and collate claim forms and medical records
submitted by thousands of doctors and patients from across the country. The
information must be available to call center operators as quickly as possible.

The Health Insurance Portability and Accountability Act (HIPAA) requires that
insurance providers protect medical records from unauthorized release. This
solution uses P8 Content Manager security features to ensure that only the
patient, the patient’s doctors, and authorized case workers can access the
patient’s record.

A major challenge is the large volume of information that must be sorted,
indexed, and stored. Another challenge is the requirement for fast response and
high availability necessary for call center operations. To address these
challenges, the solution includes load-balanced server farms to achieve high
ingestion and response rates.

This P8 Content Manager solution is implemented with the following features and
components:
- IBM FileNet Fax Capture
- IBM FileNet Records Crawler
- P8 Content Manager security
- P8 Content Manager server farm
- Custom application (using P8 Content Manager APIs)
- High performance search operation (load balancer)
- Scalability

Figure 2-3 on page 19 illustrates the implemented call center support operation
solution using P8 Content Manager.



The figure shows input files flowing through Records Crawler servers into
server-farmed repositories, with load-balanced application (Web) servers
supporting the customer call center:
1. Patient information, medical charts, and plan coverage are collected at
several distribution points across the country. The ingestion rate is
generally more than 50,000 files per hour.
2. A program collates the files by case number. The high volume of input
files is spread across several Content Manager servers. Security policies
enforce HIPAA regulations.
3. Load-balanced Web servers provide fast response times required by a large
call center.

Figure 2-3 Call center support operation

Solution description
This solution utilizes the following P8 Content Manager components and
capabilities: Records Crawler, server farms, and a custom application using P8
Content Manager APIs.

IBM FileNet Records Crawler


Records Crawler is a file import application. It monitors file directories (FTP
servers in this application) and adds files found in specified directories to a P8
Content Manager repository. Records Crawler can classify the files by document
class and add metadata by reading XML files stored along with the source files.
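The monitor-and-classify behavior can be sketched roughly as follows. This stdlib-only sketch is an assumption-laden illustration: the sidecar naming convention (`<file>.xml`) and the `add_to_repository` callable are hypothetical, not the Records Crawler implementation:

```python
# Rough sketch of a monitored-directory import with XML sidecar metadata,
# using only the standard library. The sidecar naming convention
# (<file>.xml) and the add_to_repository callable are assumptions for
# illustration; this is not the Records Crawler implementation.
import xml.etree.ElementTree as ET
from pathlib import Path

def read_sidecar(path: Path) -> dict:
    """Read metadata from an XML file stored alongside the source file."""
    sidecar = path.with_name(path.name + ".xml")
    if not sidecar.exists():
        return {}
    root = ET.parse(sidecar).getroot()
    return {child.tag: (child.text or "") for child in root}

def sweep(directory: Path, add_to_repository) -> int:
    """One polling pass: import each non-sidecar file with its metadata."""
    imported = 0
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix == ".xml":
            continue  # sidecars carry metadata; they are not content
        add_to_repository(path, read_sidecar(path))
        imported += 1
    return imported
```

A production importer would additionally move or delete processed files so that the next polling pass does not import them again.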

Server Farms
For applications with high volume loads, P8 Content Manager can be configured
as a server farm. A server farm employs multiple servers to multiply processing
power. In this solution, the document processing load is spread across three
separate P8 Content Manager servers. A load balancer distributes the incoming
load evenly so that even a very high ingestion rate does not overload a single
server.



In a similar fashion, searches and document retrieval requests are managed by a
load balancer on the call center side.
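The round-robin policy that such a load balancer might apply can be sketched as follows. In practice, the load balancer is a dedicated network appliance or software product, not application code, and the server names here are illustrative:

```python
# Minimal round-robin dispatch sketch. In a real deployment the load
# balancer is a network appliance or software product in front of the
# server farm, not application code; server names here are illustrative.
import itertools

class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        """Hand each incoming request to the next server in the rotation."""
        return next(self._cycle), request

balancer = RoundRobinBalancer(["cm-server-1", "cm-server-2", "cm-server-3"])
targets = [balancer.route(f"doc-{i}")[0] for i in range(6)]
# Each of the three servers receives exactly two of the six requests.
```

Real load balancers typically refine this policy with health checks and connection counts, but the even-spreading effect described above is the same.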

Server farms can also be configured in highly available configurations. Refer to
Chapter 9, “Business continuity” on page 213 for more details.

Custom application using P8 Content Manager APIs


P8 Content Manager offers a full-featured API set that you can use to build
content management applications that meet your unique business requirements.
In this case, a custom application collates incoming patient records and adds
them to individual patient folders in the repository. Call center operators can
enter the customer name or ID and retrieve the entire folder rather than individual
documents. Refer to Chapter 7, “Application design” on page 153 for information
about custom application design.

2.1.4 E-mail capture for compliance


This solution addresses recent industry concerns about legal discovery of e-mail
messages. It uses IBM FileNet Email Manager and P8 Content Manager to
capture e-mails directly from e-mail server journals.

This P8 Content Manager solution is implemented with the following features and
components:
- IBM FileNet Email Manager with rule-based automation
- IBM FileNet Records Manager

Figure 2-4 on page 21 illustrates the implemented e-mail capture for compliance
solution using P8 Content Manager.



The figure shows an e-mail server with collection rules, an Email Manager
server, and Content Manager with Records Manager holding the records file
plan:
1. Effective e-mail management involves declaring e-mail content as business
records.
2. Email Manager monitors e-mail journals. E-mails that match a set of
collection rules are captured. Messages (and any duplicates) are removed
from the e-mail server and replaced by a link in users’ inboxes. Clicking
the link retrieves the message from the repository.
3. E-mails are declared as records and placed in the records file plan, where
they are managed by record retention rules.

Figure 2-4 E-mail capture for compliance

Solution description
This solution uses IBM FileNet Email Manager to monitor an Exchange Server
Journal (Lotus Notes and Novell GroupWise are also supported). The journal
contains a copy of all incoming and outgoing messages. IBM FileNet Email
Manager monitors the journal and searches for messages that meet a set of
conditions or rules. Common conditions include:
- Messages that contain particular keywords
- Messages to or from a particular set of addresses
- Messages that pertain to compliance issues raised by the legal department

Messages that meet the set of conditions or rules are treated this way:
- The message is captured and added to P8 Content Manager.
- Duplicates of the message (if the message was sent to multiple recipients)
  are identified. Only one copy is added to the repository.
- The message is classified and declared as an official record subject to legal
  retention rules.
- In the user’s mailbox, the message is replaced by a stub. When the user
  clicks the stub, the message is retrieved from the repository and displayed in
  Outlook® as expected.
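The rule matching and single-copy deduplication described above can be sketched with the standard library email module. The rule data and the dictionary standing in for the repository are assumptions for illustration; Email Manager's actual rule engine is configured, not coded:

```python
# Sketch of journal rule matching and duplicate suppression using the
# standard library email module. The rule data (addresses, keywords) and
# the dict standing in for the repository are assumptions for illustration.
from email.message import EmailMessage

WATCHED_ADDRESSES = {"legal@example.com"}      # assumed collection rule
KEYWORDS = ("contract", "settlement")          # assumed collection rule

def matches_rules(msg: EmailMessage) -> bool:
    addresses = (msg.get("From", "") + "," + msg.get("To", "")).lower()
    body = msg.get_content().lower()
    return (any(addr in addresses for addr in WATCHED_ADDRESSES)
            or any(word in body for word in KEYWORDS))

def capture(msg: EmailMessage, repository: dict) -> bool:
    """Store a matching message once, keyed by Message-ID for deduplication."""
    if not matches_rules(msg):
        return False               # message stays on the mail server untouched
    key = msg["Message-ID"]
    if key not in repository:      # a copy sent to many recipients is
        repository[key] = msg      # added to the repository only once
    return True
```

Keying on the Message-ID header is one common way to recognize the same message delivered to several mailboxes; content hashing is another.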



2.2 Design methodology
In the previous section, we present four common P8 Content Manager solutions
implemented at client sites. In this section, we introduce the methodology for
creating the solution designs to help you with your implementation process.

From our experience, successful design and implementation follow a
well-thought-out, repeatable process. Enterprise content management projects are
complex, involving the installation, configuration, and customization of a mixture
of hardware, software, network resources, content analysis, and process policy.
The projects often cross organizational boundaries and involve teams from the IT
department, the legal department, the Quality Assurance department, and
others. An organized approach is necessary for a successful implementation.

We outline the recommended design methodology that has been used by many
IBM Content Management Lab Services architects in the field and has been
applied successfully in many client situations.

The process starts with a top-down approach:
1. Define requirements.
2. Create a functional design document from the requirements.
3. Design the systems architecture.
4. Design the P8 Content Manager repository.
5. Design the document security model.
6. Build applications and user interfaces.
7. Create a deployment plan.
8. Plan for administration and maintenance.

The remaining chapters of this book address the concepts and recommendations
for each step in the methodology. When you design a P8 Content Manager
solution, we recommend following the chapters in the book and using our
suggestions and recommendations to meet your design challenges.

2.2.1 Requirements analysis


Requirements gathering and analysis is a mandatory first step. If you miss a
critical requirement, the entire system might be flawed. It is important to solicit
input from all groups that will be involved in using, building, testing, training
for, or operating the system.

Create a requirements document. As a milestone goal, create and review this
document and obtain appropriate sign-offs on the document.



A requirements document needs to include the following information:
- Functional requirements - What the system must accomplish.
- Non-functional requirements - What the system must adhere to.
- Hardware - What standards or limitations apply to hardware specifications.
- Software - What standards apply to software specifications.
- Usability - What ease of use standards apply.
- Performance - What levels of response times are required.
- Continuity - What service levels are expected.
- Documentation - What training and documentation are required.

Typically, requirements gathering is an iterative process. You will start with the
functional design, revisit the requirements, complete the requirements analysis,
and then revisit the functional design. This process continues until you feel
confident that all known requirements have been identified and addressed.

The enterprise content management illustration in Figure 2-5 helps in structuring
the discussion among the various people involved in understanding what
functionality the solution requires to solve specific business problems.

The figure groups ECM functionality into three areas:
- Content Ingestion: paper scanning, fax, e-mail, applications, FTP,
  monitored file system, workflows/EAI
- Content and Workflow Management: index and validate, add document, bind
  documents together, entry templates, checkin and checkout, subscriptions,
  workflow definitions
- Presentation and Delivery Management: search templates, searching,
  publishing, browsing, printing, display, SMTP send

Figure 2-5 A simple input and output diagram to assess functional requirements



For example, based on the figure, you can discuss the following points that help
you gather functional requirements:
- How will content be ingested? Options might be paper scanning, fax, e-mail,
  other applications, FTP, monitored file system, or workflows.
- What are the content and workflow management requirements?
  Considerations can be indexing, validation, addition of documents, binding
  of documents, usage of entry templates, and checkin and checkout features.
- What are your presentation and delivery management requirements?
  Consider search, publishing, and browsing requirements; printing needs;
  display needs; Simple Mail Transfer Protocol (SMTP) sends; and the
  requirement for usage of search templates.

Working from the requirements analysis to a functional design is probably one of
the biggest challenges in your project. This activity requires extensive experience
and solid knowledge about the P8 Content Manager product. We address this
topic in more detail in Chapter 13, “Solution building blocks” on page 337.

As you read through each chapter of this book, remember that each chapter
provides many of the best practices for a number of scenarios but in a
generalized way. Use these best practices and recommendations within the
context of the actual functional requirements for your solution; do not apply them
as is.

2.2.2 Functional design


A functional design shows the components of the systems and describes how
each component handles information objects. The functional design needs to
match the user requirements and be presented to the project team for review.

As mentioned in 2.2.1, “Requirements analysis” on page 22, the functional
design step is often part of an iterative solution design process. Chapter 13,
“Solution building blocks” on page 337 covers this topic in more detail.

2.2.3 System architecture design


The system architecture design (sometimes called a logical design) lays out the
setup of hardware and software components. The system architecture is a
blueprint for system infrastructure construction. A logical design shows servers
(both hardware and software), network connections, storage units, and database
instances. When creating your system architecture design, consider the following
elements:



- Server topology
- Network (LAN and WAN) topology
- Scalability and continuity
- Virtualization
- Shared infrastructure
- Capacity planning
- Performance

Although system architecture can be derived from nonfunctional requirements, it
can be influenced by the functional requirements. Often, system architecture is
dependent on certain decisions made in the functional design. For more
information regarding system architecture, refer to Chapter 3, “System
architecture” on page 27 and Chapter 9, “Business continuity” on page 213.

2.2.4 Repository design


The repository design is the key design step in a P8 Content Manager project. It
specifies the number, type, and structure of the solution repositories. It defines
the object classes that will be stored in the repositories, including the metadata,
folder storage, and security descriptors for each type of content object.

The repository design typically is tightly linked to the functional design. It affects
and is affected by the security design. The repository design must be carefully
synchronized with the application and security design.

For more details, refer to Chapter 5, “Basic repository design” on page 85 and
Chapter 8, “Advanced repository design” on page 185.

2.2.5 Security model design


You can enforce security through repository design and application design. A
dependency exists between application security constraints and the security
mechanisms applied on the repository. P8 Content Manager offers a rich set of
options for developing a security model.

For more details, refer to Chapter 6, “Security” on page 131.

2.2.6 Application design


Application design is mainly derived from the functional design and must be
synchronized with the repository and security design. The application design
includes user interfaces and custom software components. The design presents
the details of application features and functionality and specifies the application
programming interface (API) that developers will use to construct the application.

For more details, refer to Chapter 7, “Application design” on page 153.

2.2.7 Deployment
Deployment is defined as the methodology to move a designed solution from
development to production. When planning for deployment, issues related to
release management, change management, testing, and the steps for the actual
move need to be considered. It is important to plan for deployment as early as
possible, especially at development time, to address many of the challenges that
might arise in this area.

For more details, refer to Chapter 10, “Deployment” on page 245.

2.2.8 Maintenance planning


Maintenance is related to operational aspects, such as system monitoring,
backup and restore, and other tasks. Capacity planning might also be considered
as part of your maintenance planning activity.

For more details, refer to Chapter 11, “System administration and maintenance”
on page 273.



Chapter 3. System architecture


In this chapter, we introduce the components of a basic IBM FileNet Content
Manager (P8 Content Manager) system and discuss vertical and horizontal
scaling options. In addition, we describe how you can use a P8 Content Manager
system as a shared infrastructure.

We discuss the following topics:


- Basic architecture of an IBM FileNet P8 system
- Scalability
- Virtualization
- Shared infrastructure
- Geographically distributed systems



3.1 Basic architecture of an IBM FileNet P8 system
The IBM FileNet P8 Platform provides the foundation for the IBM FileNet P8
family of products, allowing the products to interoperate seamlessly so that
their powerful capabilities can be fully utilized. In this section, we describe
the basic architecture of a simple IBM FileNet P8 system.

Note: P8 Content Manager can be used as a stand-alone product or as a
central repository working in conjunction with other P8 family products. The
architecture that we present here therefore can be for a simple P8 Content
Manager product-only solution or a P8 Content Manager-based solution that
works in conjunction with other products.

For the remainder of this chapter, we use the general term IBM FileNet P8
system for a system that is based on P8 Content Manager.

To better understand an IBM FileNet P8 system, we recommend reading the IBM
FileNet P8 System Overview that is shipped with the standard product
documentation in conjunction with reading this chapter.

3.1.1 Major components of an IBM FileNet P8 Platform


There are three primary engines that make up the IBM FileNet P8 Platform. We
define an engine as a collection of services and components that perform a set of
related functions. Although an engine comprises many parts, we view it as a
single functional unit. Understanding the complexity of how each engine works is
not necessary, but it is important to know what each engine does.

The three primary engines for IBM FileNet P8 Platform are:


- Content Engine (CE): Content Engine provides software services for
managing different types of business-related content, which we refer to as
objects. It provides the active content capability so that events involving
content objects can trigger corresponding actions. Content Engine handles
database transactions required for managing one or more object stores.
An object store is a repository for storing objects in an IBM FileNet P8
environment. Each object store manages a database (Content Engine
database) for metadata, and one or many file stores that represent the
physical storage area location. The storage area can be in a database, a file
system, a fixed content device (such as an Image Services repository,
Network Appliance™ SnapLock®, or EMC Centera), or a combination of these
options. For IBM FileNet P8 3.5.x systems, integration with IBM DR550 is
available. (IBM FileNet P8 4.x integration with IBM DR550 is unavailable at
the time of writing this book. For the latest integration information, contact
your IBM sales representative.)
Content Engine uses the latest J2EE technology standards and is deployed
inside an application server that runs in a Java virtual machine.
- Process Engine (PE): Process Engine allows you to create, modify, and
manage automated business processes. It provides software services, such
as business process execution and routing, integration of external rules
engines, process analysis, and process simulation. The processes can be
performed by applications, enterprise users, or external users, such as
partners and clients.
Process Engine uses the Process Engine database in which all
process-related data is stored. Processes run inside of an isolated region that
acts as an individual processing space.
- Application Engine (AE): Application Engine hosts the Workplace™ Web
application, Workplace Java applets, and Application Programming Interfaces
(APIs). It is the presentation tier for both content and process. It also handles
user authentication against the directory service.
An Application Engine consists of an application server with one or more
deployed applications. It includes a deployment of the JSP™-based Web
client Workplace, as well as the standard Web client Workplace XT, which is
implemented with AJAX using Java Server Faces (JSF) technology. For
simplification, we refer to these applications generally as Workplace. A third
deployed application is the Global Help system. These applications run in the
Java virtual machine context of the application server.

Both Content Engine and Process Engine have their own databases. In addition,
the Global Configuration database (GCD) stores global system configuration
information for all servers in the IBM FileNet P8 domain.

Technically, there is also a Rendition Engine in the IBM FileNet P8 Platform. We
do not address this engine in order to simplify our discussion.

Figure 3-1 on page 30 shows the major components of the IBM FileNet P8
Platform.



The figure shows the Application Engine connected to both the Process Engine
and the Content Engine, which in turn connect to the database.

Figure 3-1 Major components of the IBM FileNet P8 Platform

Table 3-1 lists the typical subcomponents that are installed in the main
components of the IBM FileNet P8 Platform.

Table 3-1 Subcomponents installed

  Components           Subcomponents (applications)
  -------------------  ---------------------------------------------------
  Application Engine   Workplace
  Process Engine       Process Engine database
  Content Engine       Content Engine database (Global Configuration
                       Database (GCD) [a] and one-to-many object stores);
                       file store

  a. The GCD is global across all object stores in a domain and resides in
  its own database. Each object store resides in its own database.

Application Engine runs on a Java Virtual Machine (JVM™) of a J2EE application
server (for example, WebSphere Application Server). Workplace is deployed as
a Web application inside of the application server. Process Engine has its own
Process Engine database and manages the process data. Similar to Application
Engine, Content Engine is deployed inside of a J2EE application server and runs
on a JVM. Each object store manages a database (the Content Engine
database) for metadata and a file store for the physical data.



You can install all the components and subcomponents in one physical server.
Although it might be common practice to do so in a testing environment, we do
not recommend this approach for high volume production systems. We discuss
scalability in later sections of this chapter.

3.1.2 A basic IBM FileNet P8 system


A basic IBM FileNet P8 system consists of the components described earlier and
a directory service. The directory service represents an existing directory server
in an enterprise. Many clients require that an enterprise content management
system integrate with their existing security strategy. The IBM FileNet P8 system
leverages industry standards and integrates with the current Lightweight
Directory Access Protocol (LDAP) and Active Directory Service (ADS)
implementations for authentication.

Security enforcement consists of authentication and authorization.


Authentication is the process that checks whether a user is who he or she claims
to be. One way to verify this user’s identity is through password authentication.
The IBM FileNet P8 system uses the Java Authentication and Authorization
Service (JAAS) for authentication, which is a set of APIs that enable services to
authenticate. Based on the industry standard, a variety of authorization options
are available, including single sign-on. Authorization is the process of
determining whether the authenticated user has permission to perform certain
actions. The IBM FileNet P8 system implements authorization by using Access
Control Lists (ACLs) to determine for each object which users and groups have
permission to perform actions on that object. For a detailed discussion of
security, refer to Chapter 6, “Security” on page 131.
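The ACL evaluation can be illustrated with a minimal check. This is a conceptual sketch only; the actual Content Engine ACL model is richer, with deny entries, inheritance, and many more permission types:

```python
# Conceptual sketch of ACL-based authorization. The real Content Engine
# model adds deny entries, inheritance, and finer-grained permissions; the
# ACL structure used here is an assumption for illustration only.

def is_authorized(acl, principal, groups, action):
    """Grant the action if any ACL entry names the user, or one of the
    user's groups, with that permission."""
    for entry in acl:
        if entry["grantee"] == principal or entry["grantee"] in groups:
            if action in entry["permissions"]:
                return True
    return False

acl = [
    {"grantee": "authors",  "permissions": {"view", "modify"}},
    {"grantee": "everyone", "permissions": {"view"}},
]
# An author may modify the document; a general user may only view it.
author_can_modify = is_authorized(acl, "alice", {"authors", "everyone"}, "modify")
reader_can_modify = is_authorized(acl, "bob", {"everyone"}, "modify")
```

Note that authentication (establishing who "alice" is, via JAAS in the IBM FileNet P8 system) has already happened before this authorization check runs; the ACL only decides what an authenticated identity may do.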

Figure 3-2 on page 32 shows a complete, basic IBM FileNet P8 system,
together with the directory service.



The figure shows an existing directory server alongside the P8 domain. The
domain contains the Application Engine (AE) running Workplace, the Process
Engine (PE) with its Process Engine database, and the Content Engine (CE)
with an object store (a database plus a file store), together with the Global
Configuration Database.

Figure 3-2 A basic IBM FileNet P8 4.0 system

The oval around the server in Figure 3-2 marks the IBM FileNet P8 domain to
which all servers belong. In IBM FileNet P8 4.0, all configuration data is stored in
a central location in the Global Configuration Database (GCD). The GCD
contains server and system configuration data, such as information about the
engines that have been installed in the IBM FileNet P8 domain, the topology of
the system, and the locations for caches and storage areas. The GCD belongs to
the entire system.

3.2 Scalability
When planning for an enterprise-wide system, it is hard to predict future
workload. If establishing the IBM FileNet P8 system is the first project, and
more content-, process-, and compliance-related projects follow, there is a good
chance that the system capacity has to be increased and capacity planning has
to be adjusted. It is therefore important to create a system that can easily scale
up to accommodate increased workload.

32 IBM FileNet Content Manager Implementation Best Practices and Recommendations


Usually, system scaling is necessary because a client experiences or anticipates
additional workload beyond what was originally planned. For example, due to
new business expansion, a client might decide to add an additional 50,000 power
users to an IBM FileNet P8 system that was originally designed to handle no
more than 10,000 users. Another example is a requirement to extend the existing
system to different geographical locations as a result of new business expansion
or acquisition of companies in other areas. We address this issue in 3.5,
“Geographically distributed systems” on page 52. A third example is a new
requirement for a highly available system. In this case, the existing system might
need to be farmed with redundant components. We address this topic in
Chapter 9, “Business continuity” on page 213.

The question to ask when discussing scalability is, “Should an enterprise use a
few large machines or multiple small ones?” The answer depends on the client’s
existing system infrastructure, preference, available resources, and business
requirements. To help answer this question, we introduce horizontal and vertical
scaling.

3.2.1 Horizontal scalability


Horizontal scaling, also known as scale-out, means to add additional computer
systems to the existing environment. This approach is common in Windows®
environments. Clients who prefer horizontal scaling distribute applications
across a large number of inexpensive machines. Blade servers generally group
multiple compact servers in a rack and allow a large number of servers in a small
physical space.

Figure 3-3 on page 34 shows an example of horizontal scaling, splitting up
engines from one physical server to multiple servers, each hosting an engine.

Figure 3-3 A horizontally scaled IBM FileNet P8 system

Farming
This system can be scaled out further by farming engines. In IBM FileNet P8 4.0,
Application Engine, Content Engine, and Process Engine can be farmed using
load balancing. This farming approach can generally be used for components
that do not store data. For databases, the approach is usually vertical scaling. An
exception is Oracle® Real Application Clusters (RAC), which also supports
farming.

Note: Because the terms cluster and farm are not used consistently in the
industry, in this book, we define the terms as:
- Cluster: Multiple servers are connected by a heartbeat, access shared
  storage in an active/passive way, and communicate to the outside world
  with one IP address regardless of which server is active.
- Farm: Multiple servers access a shared resource with each node active,
  where the individual servers are addressed via a hardware or software load
  balancer.

Refer to Chapter 9, “Business continuity” on page 213 for a more detailed
discussion about clusters and farms.

Figure 3-4 on page 35 illustrates an additional level of scaling using farming and
load balancing technology.

Figure 3-4 A farmed IBM FileNet P8 system with load balancers

In this scenario, each server in the Application Engine farm can talk to the
Content Engine and Process Engine farms. This scenario therefore addresses
both scalability and high availability requirements.

Scaling of the standard IT components, such as directory server, file system, and
database, is not done for this example.
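The operational difference between the farm and cluster models defined earlier can be sketched as follows. This is illustrative Python only; the server names are made up, and real load balancers and cluster managers are, of course, dedicated infrastructure, not application code.

```python
import itertools

# Farm: every node is active; a load balancer distributes requests
# across the nodes, here with simple round-robin.
farm_nodes = ["CE1", "CE2", "CE3"]
round_robin = itertools.cycle(farm_nodes)

def farm_dispatch():
    """Return the next active farm node to receive a request."""
    return next(round_robin)

# Cluster: one node is active at a time; the passive node takes over
# only when the heartbeat detects a failure.
class Cluster:
    def __init__(self, active, passive):
        self.active, self.passive = active, passive

    def dispatch(self):
        return self.active

    def failover(self):
        # Triggered by the heartbeat mechanism on node failure.
        self.active, self.passive = self.passive, self.active

print([farm_dispatch() for _ in range(4)])  # ['CE1', 'CE2', 'CE3', 'CE1']

cluster = Cluster("NodeA", "NodeB")
print(cluster.dispatch())  # NodeA
cluster.failover()
print(cluster.dispatch())  # NodeB
```

The key contrast: in the farm, all nodes serve requests concurrently (scalability and availability); in the cluster, capacity stays constant and only availability improves.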

3.2.2 Vertical scalability


Vertical scaling, also known as scale-up, means using more powerful servers or
extending existing servers with additional resources, for example, adding more
CPUs and RAM to the existing system. This approach is traditional in the
mainframe world and is also available in UNIX® and Windows environments.
Vertical scalability is often used in conjunction with virtualization (see 3.3,
“Virtualization” on page 36): multiple previously stand-alone servers are
consolidated and run virtualized on one large machine.

The core IBM FileNet P8 components can be scaled vertically as well as
horizontally. As an extension to vertical scalability, clients can scale up a server
that is running an application server instance with multiple deployed applications.

The benefit is better application segregation and more effective use of memory
resources. In the IBM FileNet P8 context, this option applies to Application
Engine and Content Engine.

Instead of just scaling up the server with additional hardware, you can use
multiple J2EE instances on a single physical server, and each application runs
independently in its own J2EE instance. By separating the applications, you
achieve more efficient use of system resources.

Figure 3-5 illustrates the extended vertical scalability option for Content Engine
and Application Engine.

Figure 3-5 Multiple Java Virtual Machines per server (for example, JVM 1 on port 9080, JVM 2 on port 9081, and JVM 3 on port 9083)

3.3 Virtualization
Virtualization has become a major trend in the IT industry. The drivers for
virtualization are cost reduction and providing better management of hardware
resources. Virtualization can be applied over servers, storage, and applications.
In this section, we focus on server virtualization.

In general, multiple servers are consolidated into fewer servers and operate
inside of their own environment. An abstraction layer between the physical
resources and the running application is created. Physical resources are
encapsulated as logical resources, and the environment for the application is
moved into a virtual machine (VM). The shared resources usually are CPUs,
memory, network bandwidth, and hard disk storage.

The benefit of virtualization is better use of the existing hardware, because the
number of physical boxes decreases as each physical box becomes a virtual
machine. Instead of managing multiple systems separately, resource
optimization can be concentrated at one point. Virtualization also opens new
pathways for high availability and disaster recovery, because you can copy entire
systems to another location.

Depending on the virtualization technology that you use, the system
administrator can assign each virtual machine an individual amount of resources,
such as memory or a fraction of CPU resources, at run time. This increases
system agility and ensures scaling on demand. An administrator can react
dynamically to changes in system utilization.

For example, if at the end of the month, usage of a certain virtualized application
increases sharply, it can be scaled on demand and assigned more system
resources. In that way, the system hardware is used more efficiently.

Another example is systems that are usually idle and have predictable peak
times. Given the fact that the peak times occur at different points in time, you can
benefit by moving applications from these systems onto one virtualized server.

A third example is systems that are used for training and support. Because
virtualization technology provides the option to clone an existing system, you can
clone a training system with preloaded data from another system. In the area of
client support, environments with different operating systems, application
versions, and patch levels can be stored and started on demand. That increases
flexibility and speeds up problem determination, because no time-consuming
installation tasks are necessary.

Within the IBM FileNet context, virtualization was first used for training systems.
Now, certain development and user acceptance systems are also using virtual
machines. In several instances, clients use virtualization in production systems.

Virtualization approaches differ in the degree of abstraction. In this book, we only
provide an overview. For more information about virtualization, refer to white
papers from the virtualization solution providers.

Virtual machines using virtual machine monitors
A virtual machine (VM) runs its own guest operating system on top of the host.
The virtual machine is not aware that it is not running on real hardware. Physical
resources, such as a network card, are emulated.

When the VM wants to access resources that are managed in a system context,
the access is performed by a virtual machine monitor (VMM). The VMM analyzes
the code and provides a replacement function that safely accesses the
resources. Figure 3-6 illustrates virtual machines using VMM.

Figure 3-6 Virtual machines using VMM

In certain implementations, the host operating system and VMM are combined
into a single layer. Examples of this approach are VMware products or Microsoft
Virtual Server.

Virtualization on the operating system level


In this architecture, virtualization is done on the host operating system level. The
solution uses a single kernel. Figure 3-7 on page 39 shows the architecture.

Figure 3-7 Virtualization on the operating system level

In this scenario, the coupling between the host operating system and the VM is
much tighter. Because only one kernel is used, the overhead incurred with this
approach is very small. However, the disadvantage of virtualization at the
operating system level is that it does not allow you to run different operating
systems.

The isolation of the individual partitions is key, because the system operates with
one kernel. This isolation is done in the partition management part of the
operating system, as is resource management, where physical resources, such
as CPUs and memory, are assigned.

This level of virtualization is very popular for service providers who offer Internet
services or host special services. For this scenario, the low overhead and the
automation for replicating and horizontal scaling of virtual servers is key.

3.3.1 A virtualized IBM FileNet P8 system


IBM FileNet P8 installations can be implemented using IBM dynamic logical
partitions (DLPARs), Sun Solaris™ Zones, and VMware ESX.

Note: At the time that we wrote this book, there were known performance
issues related to content retrieval and authoring when IBM FileNet P8 was
deployed on VMware. The severity of the performance issue can vary
depending on other specifics of the deployment. You need to perform your
own testing to ensure that the system configuration is performing to your
satisfaction. The issue is currently under investigation.

Your test needs to include at least processor utilization and network
throughput statistics. For more information, refer to Hardware and Software
Requirements, Version 4.0, GC31-5480, which is available with the standard
product documentation CD.

Multiple single virtual machines


Figure 3-8 shows one possible way to deploy an IBM FileNet P8 system in a
virtualized environment. Multiple virtual machines are involved. Each component
is deployed in its own virtual machine, in a separate partition.

Figure 3-8 Deploying an IBM FileNet P8 system with each engine in its own virtual machine

This architecture offers the highest flexibility and scalability because of the
number of virtual machines that you can have in the configuration. However, it
presents the highest complexity in regard to network configuration.

In general, you can use this architecture for a production system where
scalability is key.

Colocating engines in a single virtual machine


The opposite approach to multiple single virtual machines is to colocate
everything in one virtual machine. See Figure 3-9 on page 41.

Figure 3-9 Colocating engines in one virtual machine

This approach reduces the complexity; however, scalability is limited. Colocating
multiple components in a single virtual machine might be suitable for a small
development system or a test system.

Best practice for system duplication


Figure 3-10 on page 42 presents a solution that allows for easy duplication of a
system.

Figure 3-10 Solution with engines in one VM and data in another VM

In this scenario, the applications are separated from the data. All servers in VM1
and VM2 talk to the gateway server in VM3, which holds the connection to the
outside world.

When duplicating the three VMs, the IP address to the outside world has to be
adjusted on the new gateway server VM, and then the clients can access the
new system through a new IP address.

The advantage of a gateway server VM is to decouple the IBM FileNet P8
configuration from the outside access. You can run the VM1/VM2 pairs multiple
times in different partitions without having to reconfigure parameters.

If you need another IBM FileNet P8 system, duplicate the three VMs, reconfigure
the host name and network settings on the gateway server VM, and pass the new
URL to the users.

Figure 3-11 on page 43 shows the result after duplication of the system.

Figure 3-11 Duplicated system

Note that the gateway VM is not an IBM FileNet P8 component; it is used as an
abstraction layer for the clients. It translates the external client-facing IP address
to the internal VM IP addresses. All IBM FileNet P8 VM clones operate with the
same internal IP addresses but talk only to the translating gateway.
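The gateway's translation role can be sketched as follows. This is an illustrative Python sketch with made-up addresses; a real deployment would implement the mapping with network address translation on the gateway, not application code. The point is that only the external address differs per clone, while the internal VM addresses stay identical.

```python
# Sketch of the gateway abstraction layer for duplicated systems.
# All addresses below are illustrative only.

# Every clone uses the same fixed internal VM addresses.
INTERNAL = {"AE": "10.0.0.1", "CE": "10.0.0.2", "PE": "10.0.0.3"}

class Gateway:
    def __init__(self, external_ip):
        self.external_ip = external_ip   # the only per-clone setting

    def forward(self, component):
        # Translate a client request arriving at the external address
        # to the fixed internal address of the target component.
        return (self.external_ip, INTERNAL[component])

original = Gateway("192.0.2.10")
clone = Gateway("192.0.2.20")   # duplicated system: new external IP only

print(original.forward("AE"))   # ('192.0.2.10', '10.0.0.1')
print(clone.forward("AE"))      # ('192.0.2.20', '10.0.0.1')
```

Both gateways forward to the same internal addresses, which is why the VM1/VM2 pairs can be cloned without reconfiguration.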

3.4 Shared infrastructure


At times, you might want to set up a shared infrastructure. For example, clients
roll out IBM FileNet P8 as an enterprise-wide solution. They manage multiple
projects on the same system. A central IBM FileNet P8 system therefore shares
its resources and infrastructure among several projects. Another use case is an
internal or external Application Service Provider, which can share the
infrastructure among several independent customers or tenants.

In this section, we discuss options for clients using a shared infrastructure model
and provide best practices with regard to the requirements.

The architectures presented are based upon the basic IBM FileNet P8 system
introduced in 3.1.2, “A basic IBM FileNet P8 system” on page 31, that is
colocated on one server.

As we mentioned before, on Application Engine, the deployed Workplace
application runs in a Java Virtual Machine. In addition, Content Engine runs
inside of a J2EE application server and contains an object store that manages a
file store and a database for the metadata. Process Engine stores its data in the
Process Engine database. At the domain level, the Global Configuration
Database stores system information.

Communication between the engines


In order to simplify the description, we use Workplace, the predefined application
that comes with IBM FileNet P8, to explain how an application works with
Content Engine and Process Engine. Workplace points to a Content Engine and
to an isolated region.

How does Workplace locate the Content Engine


Workplace is implemented as a deployment inside of an application server. It
uses the setting in the locally deployed WcmApiConfig.properties file for
connection information to Content Engine.

In the WcmApiConfig.properties file, the parameter RemoteServerUrl identifies
the URL of the connected Content Engine. For example:
RemoteServerUrl = cemp:iiop://CEserver:2809/FileNet/Engine

The RemoteServerUrl is used for communication between Application Engine
and Content Engine.
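For illustration only, the following sketch parses a RemoteServerUrl value of the format shown above to recover the Content Engine host and port. The real Workplace application reads this setting through its own configuration machinery, not with code like this; the parsing function is our own illustration.

```python
import re

def parse_remote_server_url(value):
    """Split a RemoteServerUrl value of the form
       cemp:iiop://<host>:<port>/<path>
    into its host, port, and path components (illustrative only)."""
    match = re.match(r"cemp:iiop://([^:/]+):(\d+)(/.*)", value)
    if not match:
        raise ValueError("unexpected RemoteServerUrl format: " + value)
    host, port, path = match.groups()
    return host, int(port), path

host, port, path = parse_remote_server_url(
    "cemp:iiop://CEserver:2809/FileNet/Engine")
print(host, port, path)  # CEserver 2809 /FileNet/Engine
```

The host and port identify where the Application Engine directs its IIOP traffic, which matters later when virtual server names are substituted for physical ones.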

The Content Engine manages object stores. Each object store uses zero to many
file stores.

How does Workplace locate the Process Engine


The Process Engine consists of one to many isolated regions. An isolated region
is a logical subdivision of the workflow database that contains queue, process,
and status information.

In Workplace XT → Tools → Administration → Site Preferences → General
Site Preferences, an administrator selects a connection point for the Process
Engine. A connection point identifies a specific isolated region of the workflow
database. It is created in IBM FileNet Enterprise Manager and stored in the
Global Configuration Database (GCD).

3.4.1 Introduction to data segregation for a shared system


In the following section, we discuss several approaches to using a shared
environment. We start with a system with low data segregation.

Note: Keep in mind that no approach to setting up a system is inherently
better or worse than another. Different setup options fulfill different needs.

A company that implements multiple content-related and process-related
projects can establish a common platform wherein data among the projects can
be shared and searched over multiple repositories. An Application Service
Provider who hosts the systems of two competing customers needs a strict
segregation of data and does not want to share any data or search across
multiple tenants.

3.4.2 Low data segregation in a shared system


In contrast to a simple IBM FileNet P8 system where all objects share the same
object store and isolated region, you can set up multiple projects, each in its own
isolated region. In this setup, each project has a separate object store with its
own database for metadata and a file store. Each object store is secured in a
way that ensures that only the project users are allowed to access it. Data is
stored in separate isolated regions.

Each project looks only at its own content and project data. The existing
hardware stays the same and is used with each new project. All engines share
the same operating system. Patches on the operating system and application
level only need to be installed one time.

The content data is secured by the object store security. Each project only sees
the process data of the connected isolated region. All projects share the same
database. Data separation is done by using different isolated regions. It is
possible to use another object store as a shared repository between the projects.

All instances share the same Lightweight Directory Access Protocol (LDAP) or
Active Directory Service (ADS) system. Security across all object stores and
isolated regions is the same.

Depending on the number of processes and the duration of projects, you must
carefully examine the size of the Process Engine database.

There is no limit to the number of isolated regions that you can define in a
Process Engine database. However, memory considerations impose a limit on
the number of isolated regions running concurrently, because data specific to an
isolated region is loaded into physical memory on the server when that isolated
region runs (that is, when a logged-on user initiates workflow activity in the
region).

We recommend setting no more than five isolated regions per Process Engine.
The amount of memory needed per isolated region is dependent on many
factors, including work item size and workflow complexity.

Note: Each Application Engine supports only a single isolated region.

Recommendations
Use this approach when you want to segregate data because of independent
projects. Content data is physically separated; process data is stored in one
database but is separated by isolated region.

This system architecture works for the independent projects that run in the same
environment. The projects can share the same infrastructure with a common
upgrade path and the same maintenance hours.

Although the projects share a common LDAP/ADS system, you can separate
security on the application level (for example, by implementing additional filters).

3.4.3 Medium data segregation in a shared system


For medium data segregation in a shared system, you can split the Process
Engine data into several databases so that each Process Engine instance has its
own database. The level of segregation is lifted from an isolated region level to a
database level. For Content Engine and Application Engine, the deployments
can be rolled out to separate instances, and they can be distributed over multiple
servers.

This approach might require additional physical or virtualized environments.
Additional segregation might need to be done on the additional hardware.

The system contains multiple instances of Application Engine, Content Engine,
and Process Engine. Each Application Engine, Content Engine, and Process
Engine triple shares the same operating system. For each segregated project,
the administration effort increases.

This medium data segregation approach can also be combined with the previous
shared system scenario where isolated regions are used.

In this scenario, the content data is secured by the object store security. Each
project can only see the process data of the connected isolated region. All
projects use different databases. Data separation is therefore higher compared
to the previous scenario in which they were separated by isolated regions only.

Sharing data and processes can be accomplished by assigning the relevant
security. All instances share the same LDAP/ADS system, so security across all
object stores and isolated regions is the same.

The scalability in this scenario is virtually unlimited. You can establish a new
project with
medium segregation by setting up a new Application Engine, Content Engine,
and Process Engine and using another object store and isolated region in an
existing system.

Recommendations
Use this approach for high-volume content and process activity. You can use it
when data segregation needs to be at a database level.

This architecture works for projects that run in the same environment. They can
share the same infrastructure with a common upgrade path and the same
maintenance hours.

Although all projects share a common LDAP/ADS system, a separation of
security can be done on the application level (for example, by implementing
additional filters).

3.4.4 High data segregation in separate systems


If you have multiple systems and you want to achieve high data segregation, you
can set up the systems so that they are completely separate, yet still share a
common infrastructure by virtualization over a server. Each system defines its
own IBM FileNet P8 domain and can connect to a different organizational unit in
the directory service. Security is completely separate.

This approach might require additional physical or virtualized environments.
Additional segregation might need to be done on additional hardware.

Each system contains an instance of Application Engine, Content Engine, and
Process Engine. The engines can share the same operating system. For each
segregated project, the administration effort increases. High data segregation
can be combined with medium and low data segregation.

The content data is secured by the object store security. Each project can only
see the process data of the connected isolated region. All projects use different
databases. Data separation is on the database level just as in the medium data
segregation scenario.

With separate systems for high data segregation, data collaboration is limited
due to the different security across the systems. A cross-repository search has to

be implemented in a custom application. All IBM FileNet P8 modules, such as
Records Manager, operate within an individual system.

Each system can use a different LDAP/ADS security and reside in a separate
IBM FileNet P8 domain.

The scalability is virtually unlimited. You can establish a new project with high
segregation by setting up a new system in a new domain.

Each system can have its own upgrade path, because there are no shared
components.

Recommendations
This scenario is for clients who need to separate security and block collaboration
among all of their systems (applications).

This architecture works for projects that do not share the same environment. The
projects can use different infrastructure, reside in multiple time zones, and have
different maintenance windows.

3.4.5 Degree of sharing


When deciding on a suitable architecture, it is important to review the detailed
requirements. A good starting point is to examine the security requirements.
Does the system need to manage multiple projects using a common security
base? If so, we suggest one domain. If the requirement mandates completely
separated security, we suggest multiple domains.

Can the different projects share content with each other? If so, we suggest one
domain. If not, we suggest looking into multiple domains.

If two projects that use different security structures must share data, use two
separate systems and implement some kind of data replication. Another way is to
put all users in a common directory service and secure the content via an access
control list.

One other consideration is the variety of the projects and their operating hours. If all
clients are located in the same time zone or the system has a time window for
maintenance, a solution might be to use the medium segregation solution. If the
system has to manage projects with clients in different time zones without a
common maintenance window, you might need to either establish online backups
or set up separate systems.

Last but not least, the diversity of the projects can also be an indicator for the
architecture. Are these groups cooperative and will they accept the same time

slots for system upgrades? If so, we suggest one system. If users from one
system tend to use the latest features and groups in another system tend to
resist system changes unless absolutely necessary, we suggest separate
systems.

Table 3-2 summarizes the advantages and disadvantages of the different
options.

Table 3-2 Degrees of sharing and their advantages and disadvantages

Criteria                         Low data            Medium data         High data
                                 segregation         segregation         segregation
---------------------------------------------------------------------------------
Complexity (hardware)            Simple              Complex             Complex
Administration effort            Simple              More effort         More effort
Level of data segregation        Low                 High                High
Collaboration                    High                High                Low
Security                         Low                 Low                 High
Scalability                      Low                 High                High
Backup window                    Must be same time   Must be same time   Any time
Independent upgrade of projects  No                  No                  Yes

3.4.6 Best practice: Offer different qualities of service


In this section, we discuss best practices through a case study, which addresses
both scalability and a shared infrastructure.

In the case study scenario, an application service provider hosts a system with
multiple applications and wants to offer different qualities of service at different
prices:
- A mission-critical application scaled through multiple Application Engine,
  Content Engine, and Process Engine servers to guarantee performance and
  availability
- A less mission-critical application scaled through one Application Engine,
  Content Engine, and Process Engine server

So in this model, the price influences the scalability level of a project, which then
influences the cost of setting it up.

In “Communication between the engines” on page 44, we discuss how the
Application Engine, Content Engine, and Process Engine communicate. In
Workplace, a connection point is used, which is configured in IBM FileNet
Enterprise Manager and contains the Process Engine server name and isolated
region. The URL to the Content Engine is stored in the WcmApiConfig.properties
file of the deployment.

In this case study scenario (see Figure 3-12 on page 52), the farmed IBM FileNet
P8 system consists of four Application Engines, three Content Engines, and
three Process Engines. A load balancer is used for each engine group to
represent the group as one virtual server. When configuring URLs in Workplace,
we configure the URL of the virtual server and let the load balancer distribute the
workload.

The idea for different qualities of service is to use different virtual servers on each
load balancer layer depending on the calling module.

In this case study, there are three projects. Project01 is a non-critical application
wherein the hosting service was sold to the tenant for a special rate. This is a
candidate for the low quality of service category. Project02 is medium-critical.
Project03 is a mission-critical application that must be able to scale and must not
have a single point of failure.

The projects are separately deployed and available under a URL that contains
the project name. The URL for Project03 is as follows:
http://proj03vae:9080/WorkplaceXT

We appended the host name with vae because the URL points to a virtual
Application Engine.

The Application Engine load balancer is available via Domain Name System
(DNS) under three IP addresses, each of which represents a virtual server
(proj01vae, proj02vae, and proj03vae).

If a user uses this URL, the virtual server on the Application Engine level for project03 is used (which is proj03vae). The load balancer passes this request to the physical servers, AE3 and AE4, and does round-robin load balancing.
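The round-robin behavior of a load balancer virtual server can be sketched as follows (a conceptual illustration using the case study names; this class is not part of IBM FileNet P8):

```python
from itertools import cycle

class VirtualServer:
    """Conceptual sketch of a load balancer virtual server that
    distributes requests round-robin across a pool of physical servers."""

    def __init__(self, name, pool):
        self.name = name          # e.g. "proj03vae"
        self._pool = cycle(pool)  # physical servers behind the virtual name

    def route(self):
        # Each incoming request goes to the next server in the pool.
        return next(self._pool)

# proj03vae pools the physical Application Engines AE3 and AE4
proj03vae = VirtualServer("proj03vae", ["AE3", "AE4"])
print([proj03vae.route() for _ in range(4)])  # ['AE3', 'AE4', 'AE3', 'AE4']
```

Applications only ever see the virtual name; which physical server answers a given request is invisible to them.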

The Content Engine load balancer is available via DNS under three IP
addresses, each representing a virtual server (vlowce, vmediumce, and
vhighce).

50 IBM FileNet Content Manager Implementation Best Practices and Recommendations


Similar to Content Engine, the Process Engine load balancer is available via
DNS under three IP addresses, each representing a virtual server (vlowpe,
vmediumpe, and vhighpe).

To summarize, each project’s URL points to its own deployment. At each deployment, we point to a virtual Content Engine from a certain category and an
isolated region that is managed by a virtual Process Engine of a certain category.

In the Workplace deployment on AE1, the file WcmApiConfig.properties contains the entry:
RemoteServerUrl = cemp:iiop://vlowce:2809/FileNet/Engine

Deployment on AE2 uses:
RemoteServerUrl = cemp:iiop://vmediumce:2809/FileNet/Engine

And, the deployments on AE3 and AE4 use:
RemoteServerUrl = cemp:iiop://vhighce:2809/FileNet/Engine

For PE1, we configure an isolated region managed by the Process Engine farm vlowpe. For PE2 and PE3, we configure an isolated region managed by the
Process Engine farm vhighpe.

In the case study scenario, the user who uses the following Web address
connects to the virtual Application Engine server proj03vae, which connects to
the virtual Content Engine server vhighce and the virtual Process Engine server
vhighpe:
http://proj03vae:9080/WorkplaceXT

Figure 3-12 on page 52 summarizes the idea and explains it for the Application
Engine and Content Engine level. As described earlier, this works the same for
Process Engine.

The triangle below the load balancers marks the servers that are pooled in a
virtual server.

Chapter 3. System architecture 51


[Figure: Users reach the system through three URLs (http://proj01vae:9080/WorkplaceXT, http://proj02vae:9080/WorkplaceXT, and http://proj03vae:9080/WorkplaceXT). The Application Engine load balancer exposes the virtual servers proj01vae, proj02vae, and proj03vae; proj03vae pools AE3 and AE4. On each Application Engine, the connection string in WcmApiConfig.properties points to a Content Engine load balancer alias (vlowce, vmediumce, or vhighce); vhighce pools CE1, CE2, and CE3. The same concept applies for Process Engine. An existing directory server completes the picture.]

Figure 3-12 Providing different qualities of service

Summary:
򐂰 For project01, we use a low quality of service, using AE1, CE1, and PE1.
򐂰 For project02, a medium quality of service is provided using AE2, CE1 + CE2,
and PE1 + PE2.
򐂰 For project03, a high quality of service is provided using AE3 + AE4, CE1 +
CE2 + CE3, and PE1 + PE2 + PE3.
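The tier assignments summarized above can be expressed as a small mapping (a conceptual sketch using the case study names, not IBM FileNet P8 code). A pool with fewer than two members is a single point of failure, which is acceptable only for the low tier:

```python
# Quality-of-service tiers from the case study: each project maps to
# pools of Application, Content, and Process Engines.
TIERS = {
    "project01": {"AE": ["AE1"], "CE": ["CE1"], "PE": ["PE1"]},                # low
    "project02": {"AE": ["AE2"], "CE": ["CE1", "CE2"],
                  "PE": ["PE1", "PE2"]},                                       # medium
    "project03": {"AE": ["AE3", "AE4"], "CE": ["CE1", "CE2", "CE3"],
                  "PE": ["PE1", "PE2", "PE3"]},                                # high
}

def has_single_point_of_failure(project):
    """A tier has a single point of failure if any engine pool has one member."""
    return any(len(pool) < 2 for pool in TIERS[project].values())

print(has_single_point_of_failure("project01"))  # True  -> acceptable for low QoS
print(has_single_point_of_failure("project03"))  # False -> mission-critical tier
```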

3.5 Geographically distributed systems


In previous sections, we discussed scalability, virtualization, and data segregation. In addition, IBM FileNet P8 contains a number of capabilities to extend the system geographically and use it as a distributed system with different locations.

From our experience, the trend in Web application design is toward centralized,
highly available applications. However, under certain circumstances, it makes
sense to think about a distributed system. For example, a client who has multiple
geographical locations, each of which exclusively uses local resources, might
have different options in setting up the system: The client can use different
independent systems, one for each location. However, if all users are managed
in a central directory service, a better solution is one distributed system. In this
case, based on security, a multi-repository search is possible. Collaboration
between the locations is better, and enterprise-wide records and retention
management can be focused on one system.

3.5.1 Domain, site, virtual server, and server configuration


To address requests for a distributed system, you can structure IBM FileNet P8
system into domain, site, virtual server, and server configuration. This
hierarchical configuration scheme is also designed to provide the capability to
administer any given object at a very high and broad level (for example, at the
domain level) or at a very low and granular level (for example, at the server
instance level).

A domain is the environment in which all IBM FileNet P8 servers operate.


Technically, domain information is stored in the Global Configuration Database
(GCD) that holds all topological information of the domain, such as servers and
assigned resources. It contains the descriptive and location information of the
subcomponents.

Figure 3-13 on page 54 shows one IBM FileNet P8 system distributed over two
locations at a domain level. The system is distributed over two locations: the
main location and a satellite location.



[Figure: The main location contains two load-balanced Application Engines, two Content Engines and two Process Engines behind load balancers, a database server whose instance hosts the Global Configuration Database plus the Content Engine and Process Engine databases, and an existing directory server. Connected over the WAN, the satellite location runs its own existing directory server, an Application Engine, a Content Engine, a Process Engine, and a database server with Content Engine and Process Engine databases.]

Figure 3-13 Domain level view of a geographically distributed system

A site represents a geographical location. All site resources are well-connected via a fast, reliable LAN. There is no functional limit to the number of sites that a
single IBM FileNet P8 domain can contain.

A virtual server is the logical service point with which Content Engine clients
interact. A virtual server can map to a single independent server instance or to a
set of server instances. When a virtual server contains multiple server instances, client requests are load-balanced across the set of server instances through the
J2EE application server’s clustering capabilities or through the use of a hardware
load balancer that provides scalability and high availability. In either case,
applications accessing the virtual server are unaware of the number or type of
server instances that reside behind it. There is no functional limit to the number
of virtual servers that a single IBM FileNet P8 domain can contain. Beside
farming, a virtual server can also be an active/passive cluster. Usually, farming is
preferred, because all nodes are active.

A server instance is an individual J2EE application server instance. Multiple server instances (each running in its own JVM) can be hosted on a single
physical server. Content Engine clients do not interact directly with a server
instance. Logically, clients always go through a virtual server. There is no
functional limit to the number of server instances that a single IBM FileNet P8
domain can contain.

Figure 3-14 illustrates a hierarchical view of the domain, sites, virtual server, and
server as displayed in the IBM FileNet Enterprise Manager.

Figure 3-14 Hierarchical view of domain, sites, virtual servers, and server

This hierarchy simplifies administration, because attributes are inherited from a parent component to its children. It minimizes duplicate configuration. One
example is trace logging. If you want to analyze the entire system, you can
activate trace logging at the domain level. If you want to activate trace logging
only at a special site, virtual server, or server, you can configure it at the
appropriate level.
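The look-up-the-parent-chain behavior can be sketched as follows (a conceptual illustration only; the actual settings live in the GCD and are managed through IBM FileNet Enterprise Manager):

```python
class ConfigNode:
    """Conceptual sketch of domain -> site -> virtual server -> server
    configuration inheritance: a setting is looked up locally first,
    then up the parent chain."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.settings = {}

    def get(self, key):
        node = self
        while node is not None:
            if key in node.settings:
                return node.settings[key]
            node = node.parent
        return None  # not configured anywhere in the hierarchy

domain = ConfigNode("P8Domain")
site = ConfigNode("MainSite", parent=domain)
server = ConfigNode("CE1", parent=site)

domain.settings["trace_logging"] = False  # default for the whole domain
site.settings["trace_logging"] = True     # override for one site only

print(server.get("trace_logging"))  # True: inherited from the site level
```

Setting a value once at the domain level affects every child; overriding at a lower level affects only that branch, which is exactly the trace logging example above.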



To summarize, dividing a system into hierarchical components is useful for
creating a distributed system and simplifies the administration due to the
inheritance feature. For more information about domain, sites, virtual server, and
server instances, refer to 5.5, “Repository organizational objects” on page 101.

3.5.2 Distributed content caching model


In this section, we discuss the caching mechanism and show the architecture of
a geographically distributed system.

Caching is a building block for distributed systems. IBM FileNet P8 includes caching at the Content Engine level. It is deeply integrated into the system. The
benefits of caching are that it speeds up retrieval and it can be used by any client
regardless of whether the application is Workplace, a custom application, or IBM
FileNet Enterprise Manager. Caching addresses content objects and can be
used for all types of storage. A document can reside in multiple caches. You can
place each cache on the Content Engine server or a network share.

The cache acts in a write-through manner. This means that the cache is updated with
any content being added or updated (written) into the system. At retrieval time,
Content Engine checks to see if the document is already in the cache before
retrieving it from a file store. Documents remain in the cache until cleanup time.
Although you can assign a cache at a site, virtual server, or server level, we
recommend assigning a single cache at the site level. A cache can also be used
by more than one Content Engine server. Using custom programming, you can
preload (also known as prefetch) a cache during the night if the retrieved objects
can be predicted. A preloaded cache achieves optimal performance, because
content can be quickly retrieved.
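The write-through and prefetch behavior described above can be sketched conceptually (the dictionary stands in for a file storage area; this is not the Content Engine implementation):

```python
class ContentCache:
    """Conceptual sketch of a write-through content cache: writes update
    both the file store and the cache; reads check the cache first."""

    def __init__(self, file_store):
        self.file_store = file_store   # dict standing in for a file storage area
        self.cache = {}

    def add(self, doc_id, content):
        self.file_store[doc_id] = content  # write through to the file store...
        self.cache[doc_id] = content       # ...and keep a cached copy

    def retrieve(self, doc_id):
        if doc_id in self.cache:           # cache hit: no file store access
            return self.cache[doc_id], "cache"
        content = self.file_store[doc_id]  # cache miss: fetch and cache
        self.cache[doc_id] = content
        return content, "file store"

    def prefetch(self, doc_ids):
        """Nightly preload of documents whose retrieval can be predicted."""
        for doc_id in doc_ids:
            self.retrieve(doc_id)

store = {"doc1": b"invoice scan"}
cc = ContentCache(store)
cc.prefetch(["doc1"])
print(cc.retrieve("doc1")[1])  # 'cache' -- preloaded content is served locally
```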

Recommendations
The performance tests that have been done in the IBM FileNet lab provide good
assistance in helping you to decide which configuration is best suited for your
requirements.

Note: At the time of this writing, we used the white paper IBM FileNet P8 3.0.0
WAN Performance as reference. Currently, further tests of IBM FileNet P8 are
in progress. Check for the latest documentation.

In the IBM FileNet P8 3.0.0 WAN Performance white paper, different distributed
architectures are tested and response times for each architecture are
documented. The general findings are:
򐂰 Having all the systems at the same site, or having only the directory service
remote from the other components, results in the best performance. Note that there are certain scenarios where it makes sense to have Content Engines
acting as cache servers at remote sites.
򐂰 Having WAN between Application Engine and Content Engine produces
better average IBM FileNet Content Manager response times than having
WAN between the clients and Application Engine.
򐂰 Having Content Engine local to the database provides optimal performance.

Client deployments emphasizing the IBM FileNet Business Process Management capabilities need to observe the following guidelines:
򐂰 Similar to the behavior observed with the IBM FileNet Content
Manager-based operations, having all the systems on a LAN, or having only
the Directory Service remote, achieves the best possible performance.
򐂰 Avoid deployment scenarios having Process Engine remote from the
database. Placing WAN between these components causes a significant
performance loss.
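A toy latency model illustrates why placing a WAN between chatty components, such as Content Engine or Process Engine and their database, hurts: every database round-trip pays the link latency once (the numbers below are illustrative, not measured values from the white paper):

```python
def response_time_ms(round_trips, latency_ms, server_time_ms):
    """Toy model: total response time is the server processing time plus
    one network latency charge per database round-trip."""
    return server_time_ms + round_trips * latency_ms

# A metadata search needing 20 database round-trips:
print(response_time_ms(20, latency_ms=0.5, server_time_ms=50))  # 60.0  (LAN)
print(response_time_ms(20, latency_ms=50, server_time_ms=50))   # 1050  (WAN)
```

The same request that completes in tens of milliseconds over a LAN takes more than a second over a high-latency WAN link, which is why keeping the engines local to the database matters.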

3.5.3 Request forwarding


When talking about distributed systems, the efficient use of the network
bandwidth between the locations is essential. In this area, IBM FileNet P8 4.0
includes major benefits. In this section, we show the mechanism with and without
enabled request forwarding.

Figure 3-15 on page 58 shows a system distributed over two locations, main and
satellite locations. Request forwarding is disabled.



[Figure: The main location and the satellite location each run an Application Engine, Content Engine, Process Engine, a database server with Content Engine and Process Engine databases, a file store (NAS/SAN/fixed), and an existing directory server. With request forwarding disabled, the Content Engine at the main location queries the satellite database directly, so multiple round-trips cross the WAN.]

Figure 3-15 Retrieval without request forwarding

If a client on the main location (main) initiates a search request on content residing at the satellite location (sat), the communication goes from Application
Engine (main) to Content Engine (main). Content Engine (main) then contacts
the database (sat), and the database data is transferred over the network.
Finally, Content Engine (main) communicates the result list back to Application
Engine (main) that presents it to the client.

When Content Engine (main) talks to the database (sat) and searches for
metadata, this can require a number of queries, and therefore network
round-trips occur to complete the request. If the WAN link between the sites has
high latency, delayed response times are the consequence.



Figure 3-16 shows the mechanism for IBM FileNet P8 4.0 with request
forwarding enabled.

[Figure: The same two-location layout as in Figure 3-15, but with request forwarding enabled the Content Engine at the main location forwards the request to the Content Engine at the satellite, whose multiple round-trips to the satellite database now occur over the local LAN.]

Figure 3-16 Retrieval with request forwarding

When enabling request forwarding, you declare that each defined object store
has affinity with a specific site.

Again, the client (main) addresses Application Engine (main), which contacts
Content Engine (main). Instead of directly contacting the database (sat), Content
Engine (main) forwards the request to Content Engine (sat), which contacts the
database (sat). Content Engine (sat) gathers all data and returns it to Content
Engine (main). Again, Content Engine (main) passes the result back to
Application Engine (main) where it is presented to the client.



In general, when request forwarding is configured, client requests that target another site are forwarded to one or more virtual servers at the site associated with the object store being accessed. The advantage is that high network latency has less influence on searching, because the cost-intensive database access is performed locally over the LAN instead of through the WAN.

At the time that a Content Engine server receives a request, it evaluates the
request to decide whether to forward it or not. For metadata requests, if all
actions in the client request are based on an object store at a different site,
Content Engine will attempt to forward it. At the destination site, the administrator
enables one or more virtual servers to be able to receive the incoming requests.
In our example, Content Engine (main) can only attempt forwarding, because acceptance of incoming forwarded requests can be temporarily disabled at Content Engine (sat), for example, for maintenance reasons.

The criteria for forwarding a request are whether the majority of the actions address an object store at a different site and whether any of the actions target the current site. A forwarded request is never forwarded again. Request forwarding works only across the Enterprise JavaBeans™ (EJB™) transport layer and is supported only between homogeneous application servers.
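The forwarding decision can be sketched as follows (a simplified conceptual model of the rules above, not the actual Content Engine logic):

```python
def should_forward(request, local_site, accepting_sites):
    """Sketch of the request-forwarding decision: forward only if the
    majority of actions target an object store at one remote site, no
    action targets the local site, the request was not already forwarded,
    and the destination currently accepts forwarded requests."""
    if request["already_forwarded"]:   # a forwarded request is never re-forwarded
        return None
    sites = [action["site"] for action in request["actions"]]
    if local_site in sites:            # some of the work is local: handle it here
        return None
    # Find the remote site addressed by the majority of actions.
    target = max(set(sites), key=sites.count)
    if sites.count(target) * 2 <= len(sites):
        return None                    # no clear majority
    return target if target in accepting_sites else None

req = {"already_forwarded": False,
       "actions": [{"site": "sat"}, {"site": "sat"}, {"site": "other"}]}
print(should_forward(req, local_site="main", accepting_sites={"sat"}))  # 'sat'
```

The `accepting_sites` parameter models the administrator's ability to temporarily disable acceptance of incoming forwarded requests at a destination.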

3.5.4 Use cases of distributed systems


Let us discuss several use cases and the corresponding architecture. Assume
that we have two locations, which are called main location and satellite location.

The main location contains a full IBM FileNet P8 system (Application Engine,
Content Engine, Process Engine, an object store, file store, database, and
Directory Service). You can set up the following options at the satellite location:
򐂰 No IBM FileNet P8 components are deployed at the satellite location. Only
third party solutions are deployed.
The easiest way to enable the users at the satellite location to use the system
is to provide them with the URL of the Workplace application (or a custom
application) at the main location. You can choose this approach if the satellite
location has a similar infrastructure as the main location with high bandwidth
and low latency.
An alternate approach is the use of third-party software, such as Microsoft
Terminal Server or Citrix, in which the application runs at the main location
and only the content on the window is transferred. This is a solution for clients
who have already deployed this technology.



򐂰 Install Application Engine only.
As stated in the FileNet P8 3.0.0 WAN Performance white paper, establishing
an additional Application Engine at the satellite is a solution for WAN
networks, where it is better to have the WAN cloud between Application
Engine and Content Engine instead of between Application Engine and the
clients. Although the footprint of this solution is small in relation to
performance, much better results can be achieved when using caching.
򐂰 Install Application Engine and Content Engine (not recommended).
We do not recommend this setup, because the Content Engine needs to be
local to the database for optimal performance.
򐂰 Install Application Engine and Content Engine with content cache area and
request forwarding enabled.
This is the classical scenario for a centralized system, where the satellite uses
caching for content retrieval. Nightly preloading completes the solution if the
retrieval pattern is predictable.
򐂰 Install Application Engine, Content Engine, and file store (not recommended).
We do not recommend this scenario, because the Content Engine must be
local to the database for optimal performance.
򐂰 Install Application Engine, Content Engine, file store, and database.
In this scenario, data is stored at the satellite location and is not cached.
This architecture is useful if:
– No data retrieval from the main site is required, and only local retrieval
occurs.
– An independent satellite must store its own data.
– A satellite system is used for ingestion, stores the data locally, and moves
it to the main system.
If retrieval from the satellite location to the main location is required, add a
content cache area to the main location.
򐂰 Install Application Engine, Content Engine, file store, database, and content
cache area.
Choose this architecture if:
– Data retrieval from the main site is required and is locally cached in the
content cache area. If retrieval from the satellite to the main location is
required, add a cache server to the main location.
– An independent satellite must store its own data.
– A satellite system is used for ingestion, stores the data locally, and moves
it to the main system.



򐂰 Install Application Engine, Content Engine, file store, database, content
cache area, and Process Engine.
This architecture has the biggest footprint but is also most flexible. Use this
architecture for the highest data segregation level inside of one IBM FileNet
P8 domain that enables collaboration among the locations and has the best
Business Process Management (BPM) performance.

In the following paragraphs, we describe typical client scenarios and the possible
architectural solution.

Pure ingestion scenario


A client needs to ingest content at a satellite location. However, the expected
retrieval rate from the main location will be high. Only in rare cases is retrieval
done from the satellite location.

To avoid a possible performance penalty from the WAN between the locations, this solution requires that the documents be stored at the main location, because high rates of retrieval occur there.

If the purpose of this scenario is basic capture, one solution is to use IBM FileNet
Capture Content Engine clients. You do not need additional server components.
Instead, you configure a shared Content Engine repository using a local SQL
database and then commit the documents at an appropriate time to the Content
Engine at the main location.

If another ingestion method is used and the data is temporarily stored at the
satellite location, an approach is to install a Content Engine with a local file store
and a database. You can create custom code to move the content to the main
location at an appropriate time, for example, during off-peak hours. Or, we
generally recommend using batched ingestion across the WAN to the main site.
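A batched mover along these lines might look like the following sketch (conceptual only; a real implementation would retrieve and store documents through the Content Engine API, and the function names here are hypothetical):

```python
def batch_for_transfer(doc_ids, batch_size):
    """Split locally ingested documents into batches for off-peak
    transfer across the WAN to the main site."""
    return [doc_ids[i:i + batch_size] for i in range(0, len(doc_ids), batch_size)]

def transfer_off_peak(doc_ids, batch_size, send):
    # 'send' stands in for whatever moves one batch to the main location;
    # batching amortizes WAN overhead across many documents.
    for batch in batch_for_transfer(doc_ids, batch_size):
        send(batch)

sent = []
transfer_off_peak([f"doc{i}" for i in range(5)], batch_size=2, send=sent.append)
print(len(sent))  # 3 batches: [doc0, doc1], [doc2, doc3], [doc4]
```

Running such a job during off-peak hours keeps the WAN free for interactive traffic during the business day.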

Central data storage ingestion and retrieval scenario


This scenario is similar to the previous one with the addition of retrieving
documents at the satellite location. All data is transferred and stored at the main
location, because a high and frequent retrieval load is expected on the main
location.

Because local retrieval is a requirement, we can put an Application Engine, Content Engine with local file store, a database, and a cache on the satellite
location. As in the previous scenario, the content is moved to the main location at
an appropriate time, but in this case, a local copy resides in the cache for fast
retrieval.



Decentralized system with content independent satellite
scenario
In this scenario, the satellite location is more independent. Files scanned at the
satellite location are stored and mainly retrieved there. The directory service
resides at the main location, and retrieval also occurs from the main location.

A one-system approach is taken, because there is collaboration between the locations. On the satellite location, an Application Engine, Content Engine with
local file store, a cache, and a database are set up. We put a database there to
store the metadata for the local data, because Content Engine and database
need to be grouped together. For retrieval from the main location also, a cache is
configured that is filled during retrieval. Additionally, a custom application is
created that does nightly prefetching and can be used to fill the cache at the main
location and the satellite location.

Note: In this scenario, we permanently store data at the satellite location. These data sources have to be included in the system backup. Backing up the caches is recommended but not mandatory.

Two satellite locations, own BPM, different scaling scenario


This scenario is an extension of the previous decentralized system. Instead of
one, we have two satellite locations. The main location has the highest workload,
satellite 1 has a high workload, and satellite 2 has a low workload. Due to the
Business Process Management (BPM) activities, Process Engine has been
deployed in the satellite. Figure 3-17 on page 64 shows the architecture.



[Figure: The main location runs three load-balanced Application Engines, three Content Engines, and three Process Engines, plus a database server, file store, content cache area (file system), and directory server. Satellite 1 runs two load-balanced Application Engines, two Content Engines, and two Process Engines with its own directory server, database server, file store, and content cache area. Satellite 2 runs a single Application Engine, Content Engine, and Process Engine with a directory server, database server, file store, and content cache area. The locations are connected over the WAN.]

Figure 3-17 One decentralized system with different scaling per location



Chapter 4. Capacity planning with Scout
In this chapter, we briefly discuss the capacity planning and use cases for the
IBM FileNet P8 system capacity planning tool, Scout.

We cover the following topics:


򐂰 Scout overview
򐂰 Example use cases for Scout
򐂰 Capacity planning for new systems
򐂰 Scout output
򐂰 Predictions from a baseline
򐂰 Best practices
򐂰 Disk sizing
򐂰 Performance-related reference documentation

© Copyright IBM Corp. 2008. All rights reserved. 65


4.1 Scout overview
When you introduce a new system or extend an existing one, choosing the right
hardware is an important consideration in your planning. The IBM sales team
supports you in planning the capacity of the system during this phase.

The marketing team uses a System Capacity Planning Tool called Scout to
model transactions and to obtain answers to questions, such as:
򐂰 Based on the projected use of the IBM FileNet P8 system, what servers are
needed?
򐂰 Given a certain hardware configuration, how busy will the servers be?

Scout is generally used by IBM FileNet P8 System Engineers, IBM FileNet P8 Lab Services, and IBM FileNet P8 Partners.

After modeling a workload, Scout produces utilization reports that show the
demand placed upon a given set of hardware by that workload.

Figure 4-1 illustrates the basic modeling process for capacity planning.

[Figure: Select hardware and define the workload; the workload is transformed (automatically) into utilization figures, which are examined. If utilization is acceptable, document and present the result; otherwise adjust the selected hardware or refine the workload.]

Figure 4-1 Basic modeling process for capacity planning

Scout uses at least two input sources. One is the hardware configuration, and
the other is the defined workload that consists of one or multiple transactions.
The output from Scout consists of performance charts. If the system utilization of
all components is below a threshold, the system is deemed adequate to meet the workload requirements. The results are documented. If system utilization is at or above the threshold, you need to change the hardware configuration.
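The adequate-or-adjust decision can be sketched with a toy utilization model (illustrative only; Scout's actual model is far more detailed, and the 80% threshold here is an assumed example value):

```python
def utilization(workload_tps, cost_per_tx_ms, num_cpus):
    """Toy model: CPU-milliseconds demanded per second of workload,
    divided by CPU-milliseconds available per second (num_cpus * 1000)."""
    return workload_tps * cost_per_tx_ms / (num_cpus * 1000.0)

def adequate(workload_tps, cost_per_tx_ms, num_cpus, threshold=0.8):
    # Below the threshold, the configuration is deemed adequate;
    # at or above it, the hardware has to be adjusted.
    return utilization(workload_tps, cost_per_tx_ms, num_cpus) < threshold

print(adequate(workload_tps=100, cost_per_tx_ms=20, num_cpus=4))  # True  (50%)
print(adequate(workload_tps=100, cost_per_tx_ms=40, num_cpus=4))  # False (100%)
```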

When defining a workload in a presales situation, the details of a model might not
be obvious. Therefore, it might be easiest to develop your general model first and
refine it as you learn more details.

You might want to start with a moderate hardware configuration. When defining
your workload, after each transaction, you can immediately see the result in the
chart and scale the hardware with the transactions. This gives you a better feel for the cost per modeled transaction. However, there is a chart option to
view utilization by transaction function to get the explicit cost per modeled
transaction function.

When modeling the workload, Scout provides a walk-through wizard for a quick
start that helps you to configure the basic parameters of the components that you
want to size. We found it useful to use the wizard and save the result to another
file. The wizard helps you learn which transaction functions to add to your
workload but it creates a simplified model, whereas some of the lesser used
functions can only be obtained by manually adding them to your workload from
the Transaction Templates in the tree view.

4.2 Example use cases for Scout


You will want to use Scout to help you prepare for the following tasks:
򐂰 A new system is planned, and you need to select the hardware.
򐂰 During a system implementation, the Scout sizing is refined reflecting the
latest requirements.
򐂰 An existing system is extended. Additional users and additional applications
are rolled out.
򐂰 An existing system needs to be migrated to new hardware. This can occur in
conjunction with reorganization and moving into new buildings, system
consolidation, new outsourcing contracts, or simply replacing outdated
hardware.
򐂰 The current system needs to be analyzed. For example, a client wants to
know what additional workload the system can handle or requests a detailed
performance analysis. In this case, current production data is available and
can be used by Scout.



4.3 Capacity planning for new systems
In this section, we list typical questions for sizing a system.

In Chapter 2, “Solution examples and design methodology” on page 13, the following P8 Content Manager solutions were introduced: document life cycle
management, insurance claim processing, information capture supporting call
center operations, and e-mail capture for compliance. Each solution focuses on
different functionality:
򐂰 Versioning and document management
򐂰 Scanning and processing via a business process
򐂰 High volume ingestion and storage
򐂰 Ingestion, storage, and compliance

Each system sizing is individual. We concentrate on general sizing questions.


The typical questions to ask the client when preparing to size a system usually fit
into the following categories:
򐂰 Client environment
򐂰 Content ingestion
򐂰 User activities
򐂰 Configuring Records Management
򐂰 Business Process Management specific

Client environment
The following list provides questions to ask during sizing that are related to the
client environment:
򐂰 Does the client prefer specific hardware? If yes, which vendor?
򐂰 Are there standard machine types that the client wants to use? If yes, what is
the standard server, which processor, and how many CPUs?
򐂰 What application server will be used?
򐂰 What database server will be used?
򐂰 What are the default working hours? You can overwrite this default value in
each transaction if needed.

Content ingestion
The following list provides questions to ask during sizing that are related to
content ingestion:
򐂰 If content is ingested through scanning:
– What are the scanning hours?

– What is the average number of scanned documents during the scanned
hours?
– What is the total number of documents usually scanned?
– What is the average size (in KB) of a scanned document?
– In how many batches are these scanned documents processed?
– How many documents are in a batch?
򐂰 If content is ingested through file import:
– What are the importing hours?
– What is the average number of documents imported during that time?
– What is the total number of documents usually imported?
– What is the average size (in KB) of an imported document?
򐂰 If ingested content is e-mail:
– What e-mail servers are used?
– Is journaling enabled?
– What is the average number of e-mails received per day to be ingested?
– What is the average e-mail size?
– What is the average number of attachments per e-mail?
– What is the average attachment size (in KB)?
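Answers to these questions translate directly into the throughput figures a sizing model consumes. For example (illustrative numbers, not client data):

```python
def ingestion_rates(docs_per_day, avg_size_kb, ingest_hours):
    """Turn questionnaire answers into throughput figures for sizing:
    documents per hour and average KB per second during the window."""
    docs_per_hour = docs_per_day / ingest_hours
    kb_per_second = docs_per_hour * avg_size_kb / 3600.0
    return docs_per_hour, kb_per_second

# Example: 36,000 scanned documents of 50 KB over an 8-hour scanning window
docs_per_hour, kb_per_s = ingestion_rates(36000, 50, 8)
print(docs_per_hour)  # 4500.0
print(kb_per_s)       # 62.5
```

Peak-hour figures matter more than daily averages, so if scanning is not evenly distributed, compute the rates for the busiest window instead.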

User activities
After the content is ingested, corresponding actions are started. The content can
be processed by Business Process Management or simply stored and used for
retrieval later. A user can work on the content using a custom application or
Workplace. How the user uses the content might determine the sizing of the
system. General questions related to user activities are:
򐂰 For logon and logoff activities:
– How many times does a user generally log on and log off per day or per
week?
– Are there peak hours of logon and logoff activities during the day or during
the week?
– Are there different logon and logoff behaviors for different users (for
example, are there different behaviors for power users compared to
occasional users)?

򐂰 For search, browsing, and retrieval activities (the same questions can be
asked for different user groups):
– At what times do browsing and retrieval take place?
– Are there peak hours during the day?
– Are there deadlines (such as all orders have to be reviewed by noon)?
– What is the average document size of the documents to be retrieved?
– How many searches are usually performed per day?
– How many documents are returned on average per search action?
– How many custom properties (metadata fields) are retrieved on average
per document?
– How many folders are browsed on average per day by a user?
– How many folders are accessed via a bookmark?
– How many documents are retrieved per day by a user?
򐂰 For new document creation:
– Will new documents be created evenly during the work hours?
– How many documents on average will be created during the work hours?
– What is the average document size (in KB)?
򐂰 For check out and check in activities:
– Will check out and check in be distributed evenly during the work hours?
– What is the number of documents checked out and in during the work
hours?
– What is the average document size (in KB)?

Note: After documents are checked out, they usually are viewed. This
viewing is modeled as an additional retrieval.

򐂰 For metadata modification activities:


– Are there major updates of metadata? If yes, in what time frame?
– How many documents are usually updated during the working hours?
– Before they are updated, how many properties are retrieved?
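One simple way to turn these per-user answers into a system-level load profile is to spread each daily activity count evenly over the working hours and scale by the size of the user group. This is a rough sketch with hypothetical numbers; the peak hours and deadlines identified by the questions above must still be modeled separately:

```python
# Hypothetical per-user daily activity counts for one user group.
profile = {"logon": 2, "search": 20, "browse": 15, "retrieve": 30,
           "create": 5, "checkout_checkin": 8}

def system_load(profile, users, work_hours=8):
    """Spread each daily activity count evenly over the working hours
    and scale by the number of users in the group."""
    return {activity: count * users / work_hours
            for activity, count in profile.items()}

for activity, per_hour in system_load(profile, users=500).items():
    print(f"{activity}: {per_hour:.0f} per hour")
```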

70 IBM FileNet Content Manager Implementation Best Practices and Recommendations


Configuring records management
We distinguish records management actions by the Records Manager role and
by the users who declare records. Records can be declared through a system
step in a business process or manually by users. Questions to ask when sizing
an IBM FileNet P8 records management solution are:
򐂰 For Records Managers:
– What is the logon and logoff pattern of the Records Managers?
– How many searches for records are performed in a certain time period?
– How many browse actions in the file plan are performed?
– How many times are details retrieved? Examples of details are access
security, detail, history, holds, and so on.
򐂰 For general users who declare records:
– How many existing documents are declared as records in a certain time
period?
– How many new documents are declared as records in a certain time
period?

Business Process Management


If the solution involves Business Process Management (BPM), ask the following
questions for each workflow:
򐂰 What is the time pattern for launching workflows?
򐂰 How many metadata fields does the workflow contain?
򐂰 What is the average field length (in bytes) of the metadata?
򐂰 How many workflows are launched in the time pattern?
򐂰 How many user steps does a workflow contain?
򐂰 How many user steps use eForm?
򐂰 How many system steps does a workflow contain?
򐂰 How often are workflow fields updated?
򐂰 How often are users updating their views?
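These answers combine into a step-completion rate and a data payload per workflow. The sketch below uses hypothetical numbers and assumes that launches are spread evenly over the working hours; it is not Scout's internal model:

```python
def workflow_load(launches_per_day, user_steps, system_steps,
                  fields, avg_field_bytes, work_hours=8):
    """Estimate step completions per hour and the workflow data payload
    for one workflow definition."""
    steps_per_hour = (launches_per_day * (user_steps + system_steps)
                      / work_hours)
    payload_bytes = fields * avg_field_bytes
    return steps_per_hour, payload_bytes

# Hypothetical workflow: 2,000 launches/day, 4 user steps and 2 system
# steps each, carrying 12 data fields of 40 bytes on average.
steps, payload = workflow_load(2_000, 4, 2, 12, 40)
print(f"{steps:.0f} steps/hour, {payload} bytes per workflow")
```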

4.4 Scout output


For every server and certain infrastructure components, Scout produces a
utilization chart for one day. The system is considered to be handling the
workload adequately if CPU utilization stays below 40%. This threshold is used
because, according to queuing theory, response time does not grow linearly with
utilization; it rises sharply as a server approaches saturation.



By sizing the system for 40%, the system can handle temporary peaks with
acceptable wait times. Figure 4-2 shows a sample output of Scout with the
threshold at 0.4 (40%).
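The reasoning behind the 40% rule can be illustrated with the simplest queuing model, M/M/1, in which the mean response time is the service time divided by (1 - utilization). This is only an illustration of the principle; it is not the model that Scout uses internally:

```python
def mm1_response_time(service_time, utilization):
    """M/M/1 queue: mean response time = S / (1 - rho).
    Response time climbs sharply as utilization approaches 100%."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

# With a 1-second service time, compare average response times:
for rho in (0.2, 0.4, 0.8, 0.95):
    print(f"{rho:.0%} busy -> {mm1_response_time(1.0, rho):.1f} s")
```

At 40% utilization, the response time is still well under twice the raw service time, whereas at 80% it is already five times the service time, which is why sizing to a 40% ceiling leaves headroom for temporary peaks.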

Figure 4-2 Sample Scout output

You can see the Content Engine load throughout the day. In the morning hours
between 8:30 a.m. and 11:30 a.m., the system load is higher due to scanning
activities. From 11:30 a.m. to 4:30 p.m., the activity level is lower, because only
retrieval and processing activities occur. Between 3 a.m. and 4 a.m., prefetching
takes place. Documents that are needed for the next day are retrieved and
loaded into the cache for better performance.

4.5 Predictions from a baseline


When sizing a new system, Scout converts a given workload to utilization data.
If you are sizing a system upgrade, you already have current data (an existing
baseline) available on which you can perform additional modeling. Examples are
migration to new hardware, added applications, or added users.

The first step is to collect baseline data for the involved systems. For the Content
Engine baseline, you use the System Manager Dashboard. A dashboard is a tool
for gathering performance data and provides current Content Engine utilization
data. (For more information about dashboards, refer to 11.2.2, “Dashboard” on



page 276.) If an Image Services system is also involved, data can be exported by
the integrated performance data collecting function (perf_mon). The baseline
data can be imported to Scout, and the utilization data can be used as the basic
workload.

Figure 4-3 is an extension of the capacity planning process. It includes the
collection and import of baseline data from running systems.

[Figure 4-3 flowchart: Collect and import -> Define workload -> Transform
(automatic) -> Examine utilization -> OK? -> Document and present, with Adjust
(through Select hardware) and Refine paths feeding back into the cycle.]

Figure 4-3 Import from a baseline

As an example, we have an existing IBM FileNet P8 system that includes a
FileNet Image Services system. The client is planning to roll out another
application on Content Engine that is expected to double its workload. In
addition, a third-party application is installed that adds about 20% additional
load.

For modeling purposes, we import the current Content Engine utilization with a
factor of two, import the Image Server utilization, and add an application that
accounts for an increased workload of 20%.
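The arithmetic behind such a projection is simple scaling and addition of utilization profiles. The sketch below applies it to one server with a hypothetical hourly baseline; it is a simplified stand-in for what the tool does with the imported baselines:

```python
# Hypothetical hourly CPU utilization baseline for one server (fractions).
baseline = [0.10, 0.15, 0.20, 0.18]

def project(baseline, growth_factor, extra_factor=0.0):
    """Scale a measured baseline and add an extra workload expressed
    as a fraction of the original baseline."""
    return [u * growth_factor + u * extra_factor for u in baseline]

# The new application doubles the load; a third-party tool adds
# another 20% of the original load on top.
projected = project(baseline, growth_factor=2.0, extra_factor=0.2)
over_threshold = [u for u in projected if u > 0.4]
print([round(u, 2) for u in projected],
      "- hours over the 40% threshold:", len(over_threshold))
```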

Figure 4-4 on page 74 shows the utilization for the Image Services system.



Figure 4-4 Utilization of an Image Services system

The chart shows the workload summary after importing the three workload
profiles: one for Content Engine, one for the Image Server, and one for the
additional third-party application. The various colors represent single services
that run simultaneously. The chart illustrates the imported workload together with
the new application workload. The result is that with the additional application,
the Image Services server exceeds its threshold at 7:30 a.m. It needs to be
scaled up with two additional CPUs.

4.6 Best practices


The following bullets summarize several recommendations when working with
Scout:
򐂰 Take the IBM course 201778 P8, Sizing, and Capacity Planning (Scout) to
quickly gain the knowledge required to use the tool.



򐂰 When initially performing a Scout sizing, the client will not have the exact
answer to all of the questions; therefore, make assumptions and document
them. Get the clients to sign off on the assumptions used for sizing. Be
conservative when making assumptions. Configure the system for peak
loads.
򐂰 Add a document to the Scout calculations describing which data was
provided by the client, which assumptions were made, and what the Scout
output was. Also, document how the Scout input fields were calculated from
the data given by the client. This helps you to review a Scout calculation after
a certain amount of time and helps you to understand why transactions were
modeled in a particular way at a later refinement.
򐂰 Use project variables to ensure consistency throughout your transactions.
򐂰 When you start, you might want to choose medium performance hardware to
better see the effects of the configured transactions.
򐂰 If you are unsure about the parameters of a transaction, use the online help.
Use the Help Topic Icon that lists the details quickly.
򐂰 Split a complex scenario into several steps to reduce complexity.
򐂰 After changing parameters, immediately check the output to learn what effect
the change has created, which gives you an idea of the costs of the
transactions.
򐂰 Common mistakes include defining the workload hourly instead of daily (and
therefore creating eight times the load for an 8-hour day) or mistyping the
number of transactions (for example, entering 1,000,000 instead of 100,000).
򐂰 If the system looks misconfigured, change the chart to the Average Utilization
view instead of the Transaction Functions view. The Average Utilization view
allows you to compare the utilization by function and helps you to localize the
function that most influences the system load.



Figure 4-5 shows an example in which the Content Engine is under a heavy
load.

Figure 4-5 Content Engine under heavy load (utilization is more than 90%)

We want to discover what transaction led to the workload. So, we switch to the
Transaction Functions view. Figure 4-6 on page 77 shows the result and the
transaction responsible for the workload.



Figure 4-6 Transactions Function view (showing the transactions causing the workload)

As shown in Figure 4-6, we see that the IBM FileNet P8 4.x Java Create
Documents transaction creates the most intense workload. When verifying
with the system, in this example, we discover a typographical error in the
number of input documents and correct it.
With the correction made, we see in Figure 4-7 on page 78 that the system
operates well under the threshold.



Figure 4-7 Normal system workload

򐂰 For international users, two troubleshooting tips might be helpful:


– If you encounter Scout runtime errors, change the regional settings of the
operating system to English.
– Do not use region-specific special characters. If you do, Scout might not
be able to open the files and informs you at which line the problem occurs.
In that case, you can edit the Scout files (.sct), which are in XML format, to
remove the special characters.

4.7 Disk sizing


In addition to sizing hardware by calculating the utilization, which was derived
from a modeled workload, another important point is the sizing of disk space for
the managed content. The IBM FileNet P8 Disksizing Tool spreadsheet enables
you to enter key system values, and then, it produces the estimated required disk
space.
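The kind of calculation such a spreadsheet performs can be sketched as follows. All of the factors here (the database row size per document and the growth margin) are hypothetical placeholders, not the actual formula used by the Disksizing Tool:

```python
def content_disk_gb(docs_per_year, avg_doc_kb, retention_years,
                    db_row_kb=2, growth_margin=0.2):
    """Rough disk estimate: file store content plus database metadata,
    padded with a safety margin."""
    docs = docs_per_year * retention_years
    content_kb = docs * avg_doc_kb
    metadata_kb = docs * db_row_kb
    total_kb = (content_kb + metadata_kb) * (1 + growth_margin)
    return total_kb / 1024 / 1024  # KB -> GB

# Hypothetical: 5 million documents per year at 60 KB, kept for 7 years.
print(f"estimated disk: {content_disk_gb(5_000_000, 60, 7):.0f} GB")
```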

Figure 4-8 on page 79 shows the spreadsheet that contains the input system
values and the output, which is the estimated disk space required for Content
Engine and Process Engine.



Figure 4-8 Disksizing Tool spreadsheet

There is also a spreadsheet for Image Services systems.

4.8 Performance-related reference documentation


In 3.5.2, “Distributed content caching model” on page 56, we provided
performance information from the IBM FileNet P8 3.0.0 WAN Performance white
paper.

In this section, we provide additional references about where to find
performance-related material.

4.8.1 Standard product documentation


These guides ship with the IBM FileNet P8 installation media:
򐂰 IBM FileNet P8 Performance Tuning Guide
Provides information about tuning parameters that can help improve the
performance of your IBM FileNet P8 system. This document covers operating
system, database, and application server parameters and IBM FileNet P8



component parameters to help you tune an existing system. You can retrieve
this document directly from the following Web site:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/40x/p8_400_performance_tuning.pdf
򐂰 IBM FileNet P8 Content Engine Query Performance Optimization Guidelines
(for P8 3.x)
This is a guide for optimizing the performance of your Content Java API or
Content Engine COM API client SQL queries made against a FileNet Content
Engine server. Although it is written for Version 3.x, many of the guidelines
are still applicable for Version 4.0 software. To retrieve it, go to the following
Web site:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/35x/V10_P8_Query_Perf_Guidelines_TechNote.pdf

For the latest performance-related documentation and technical papers, go to
the product documentation Web site for the IBM FileNet P8 Platform:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

4.8.2 Benchmark papers


These papers are system performance tests of specific configurations performed
either by independent companies or in the IBM FileNet test environment. These
documents are available through your local IBM sales team:
򐂰 P8 Content Manager:
– IBM FileNet P8 4.0: Content Engine Performance and Scalability using
WebSphere Application Server V6 and DB2 9 Data Server on IBM System
p5 595
Contains the test results that illustrate the Content Engine’s ability to make
use of the IBM solution stack and available system resources in an
enterprise-class environment
– IBM FileNet P8 2 Billion Objects with File Store Content on Hitachi
Storage
Presents the results of performance tests of IBM FileNet P8 using a large
object store containing 2 billion objects
– One billion objects in Content Engine object store
Documents the management of one billion objects with 2,000 simulated
users
– Database and File Store Performance
Compares ingestion and document retrieval between a system using a file
store and one using a database, with documents ranging in size from 1 KB
to 50 MB
– CM Ingestion with Network Appliance
Demonstrates high-volume ingestion capabilities of Content Engine and
Web Services Interface
– Content Management Throughput with Hitachi Tagmastore
Demonstrates ingestion capabilities of the Content Engine when a Hitachi
device is used as the primary storage device
– P8 3.0 Content Engine with Oracle Ingestion
Characterizes ingestion performance during document creation using the
Java API, Web Services API, file store, and object store, with different
document sizes
– P8 3.0 Content Engine with SQL Ingestion
Characterizes ingestion performance during document creation using the
Java API, Web Services API, file store, and object store, with different
document sizes
– P8 3.0 Content Engine WSI Throughput Rates
Characterizes throughput of the Content Engine when accessed through
the Web Services API
– P8 3.0.0 Content Engine - Hyper-Threading Technote
Characterizes impact of Hyper-Threading on a 4-CPU server hosting the
Content Engine
– P8 3.5 Content Engine - Fulltext Indexing Tuning Parameters
Provides a tuning guide for fulltext indexing
– P8 3.0 Cross-Repository Search
Characterizes CPU utilization and responsiveness when conducting a
cross-repository search
– Rendition Engine
Characterizes the performance of the rendition engine using Microsoft
Word and Excel® formats
򐂰 Content Federation Services (CFS):
– CFS for IS Scalability
Demonstrates the large-scale scalability characteristics of CFS
– CFS for IS Performance
Demonstrates high-volume throughput capabilities
򐂰 Network:
– IBM FileNet P8 3.0.0 WAN Performance
Provides deployment suggestions when configuring an IBM FileNet P8
system in a WAN environment



򐂰 IBM FileNet P8 4.0 benchmarks and sizing guides:
– IBM FileNet P8 4.0 Content Engine Sizing Guide: AIX - WebSphere -
Oracle
Presents recommendations for sizing the back-end server components of
IBM FileNet P8 4.0 Content Engine systems using example workloads.
The sizing guidelines are in the form of a procedure that can be used to
predict the amount of hardware required to handle an anticipated
workload.
– IBM FileNet P8 4.0 Content Engine Sizing Guide: Windows - WebLogic -
SQL
Presents recommendations for sizing the back-end server components of
IBM FileNet P8 4.0 Content Engine systems using example workloads.
The sizing guidelines are in the form of a procedure that can be used to
predict the amount of hardware required to handle an anticipated
workload.
– IBM FileNet P8 4.0 Content Engine Sizing Guide: Windows - WebSphere -
Oracle
Presents recommendations for sizing the back-end server components of
IBM FileNet P8 4.0 Content Engine systems using example workloads.
The sizing guidelines are in the form of a procedure that can be used to
predict the amount of hardware required to handle an anticipated
workload.
– IBM FileNet P8 4.0 Content Engine with Hewlett-Packard (HP) Integrity
Scalability Study V2
Shows scalability as well as document retrieval and ingestion performance
– IBM FileNet P8 4.0 Process Engine with HP Horizontal Scalability Study
Characterizes the farming capabilities of the IBM FileNet P8 4.0 Process
Engine
򐂰 Capture:
– FileNet Capture and Content Engine Ingestion Study
Characterizes CPU utilization and response time for high-volume
ingestion into Content Engine using Capture
– Remote Capture
Characterizes performance of Remote Capture Services and effects of the
number of users and configuration types
򐂰 Compliance:
– IBM FileNet Email Manager with Domino Mail Server
Characterizes the performance of IBM FileNet Email Manager Services in
terms of the effects of a number of factors on various output measures.
This information is useful both for performance estimation efforts and for
system sizing.



– Email Manager with Records Manager
Characterizes the performance of IBM FileNet Email Manager (EM), when
deployed with IBM Lotus Domino Server, to capture, archive, declare as
records, and full-text index e-mails into the Content Engine
– Records Crawler with Decru Datafort
Presents the results of a benchmark study of the performance of IBM
FileNet Records Crawler. Particular emphasis was placed on
characterizing the effects of a number of factors on throughput and CPU
utilization. Those factors include file size, folder filing status, and using the
Decru Datafort storage security appliance.
򐂰 Business Process Manager:
– Doculabs Business Process Manager Performance Validation
Shows the excellent transaction rates of the Process Engine
– Process Analyzer
Characterizes absolute performance and effects of the process analyzer
tables
– P8 Process Engine 3.5.1 HP-UX Oracle Work Item Creation Study
Describes the performance of P8 3.5.1 Process Engine work item creation
when the Process Engine is configured under HP-UX using a local Oracle
database
– P8 Process Engine 3.5.1 WinSQL Work Item Creation Study
Describes the performance of P8 3.5.1 Process Engine work item creation
when the Process Engine is configured under Windows 2003 using a
remote SQL DB
– FileNet BPF Performance
Presents the results of several tests of IBM FileNet P8 Business Process
Framework




Chapter 5. Basic repository design


In this chapter, we introduce the basic concepts and elements that comprise a
repository and repository design. Repositories encapsulate not only the content
being managed but also the various metadata elements and infrastructure that
support the IBM FileNet Content Manager (P8 Content Manager) functionality. In
this chapter, we describe the basic repository design elements and guidelines
that are further developed in Chapter 8, “Advanced repository design” on
page 185.

We discuss the following topics:


򐂰 Repository design goals
򐂰 Object-oriented design
򐂰 Repository naming standards
򐂰 Populating a repository
򐂰 Repository organizational objects
򐂰 Repository design objects
򐂰 Repository content objects

© Copyright IBM Corp. 2008. All rights reserved. 85


5.1 Repository design goals
Repositories are the central component of an IBM FileNet P8 implementation.
They store content, such as documents, images, records, and other types of
electronic content along with their respective metadata. Repositories are capable
of storing billions of documents and records, providing a centrally accessible,
enterprise-wide library of information that can be cross-referenced and
cross-correlated.

The decomposition of the solution into the various repository elements
facilitates not only the separation of logical and functional purposes but also
meets a number of additional goals. The architectural framework offers
features that support the specific design goals of scalability, maintainability,
securability, well-behaved enterprise citizenship, and flexibility for future
function and growth.

Each section in this chapter explains the design-specific features of its
element. In addition, the architectural framework as a whole provides a number
of features that support the overall design goals. The elements of solution
space decomposition directly support scalability and encapsulate these
concepts in a manner that is easy to reconcile with the physical topology of the
infrastructure. The levels at which any given element is controlled, and the
ability to roll this administration up through a single tool, both enhance the
power and simplify the task of continued administration and control of these
elements throughout the life span of the solution. Security features are present
at almost every level and every manifestation of the repository elements and, in
many instances, in multiple ways. These features provide a variety of security
granularities, from very broad to very specific and individualized. See
Chapter 6, “Security” on page 131 for the details.

5.2 Object-oriented design


P8 Content Manager follows an object-oriented design (OOD) paradigm. Every
element represented in the system, whether it is a content object that contains
the metadata and reference for a specific electronic document or image, or the
definition of the document class that defines what these metadata objects look
like, exists as an object. To understand the system as a whole, you need to
understand the processes and assumptions that are inherent in OOD, as well as
how each of these elements relates to the overall design.

An object-oriented approach to the problem space involves the decomposition of
any given problem into a set of smaller problems, which in turn can be
decomposed further still. The composition of the solution collects these objects
into logical and functional units that build back up to the level of the original
problem being addressed. Each level of decomposition, whether approached
from the problem or the solution space, is intended to encapsulate specific
details of the design. To organize the entities and define composition
boundaries, you consider four basic characteristics: identity, relationship,
access patterns, and behavior.

Identity (or metadata)


Encapsulation of identity is the separation into sets of the metadata that defines
any given object, or more precisely, the characteristics of the metadata that
define any specific class of a given set of objects. Decomposing based on
metadata allows logical entity grouping based on the characteristics of the thing
that is being modeled.

Content decomposition consists of examining the types of content across the
organization and grouping the content based on shared properties and
metadata. Policy documents can be grouped together, separate from checks,
statements, claims, and others. This perspective focuses on the content and
types of content contained in specific documents.

Relationship or composition
Encapsulation of the relationships of objects is based on an understanding of
how entities relate to each other. This relational association can be grouped into
a hierarchical relationship, where one entity is said to inherit the characteristics of
a parent, in an associative manner where diversely characterized entities can be
related in an ordinal manner, or in an aggregate association, where collections
and groups of entities can be handled as a unit. Decomposing based on
composition allows the associations and relationships of entities, regardless of
type, to be carried across the design.

Grouping content by relationship examines specific relationships between
documents, such as checks related to a specific claim, which in turn is related
to a specific policy. This perspective focuses on formalized relations between
documents.

Access pattern or coupling


Encapsulation of how individual entities are accessed is based on the
interactions that any given entity has with any other entity either inside or outside
the system. This allows considerations for things, such as access authorization
and broad interdependencies, to be captured. The coupling, or
interdependencies, of entities directly affects how additional composition of
solutions occur and can even be an indicator that the decomposition of the
problem space was faulty in some manner. Decomposing based on interactions

Chapter 5. Basic repository design 87


allows granular and secure access to the underlying data and functional
encapsulations.

Examining how content is utilized and accessed can reveal patterns that show
relationships that are not formally captured in metadata but exist nonetheless. A
spreadsheet listing appraisers in a specific geography is usually accessed along
with claims. Customer service representatives typically look at all documents
relating to a specific user or geography. This perspective focuses on how people
access and utilize content to complete their tasks.

Behavior or function
Encapsulation of behavior is grouping entities based on the behavior that they
exhibit, the life cycles through which they go, or functionality that they provide.
Decomposing based on function allows simple changes to be made that can alter
the behavior of a wide set of entities.

The business processes that utilize content, the document life cycles, and
workflows all give a perspective that is based on the functionality of the documents.
The active content perspective allows the grouping of content based on what it
does. Content can be combined with various other content to create new content,
such as report generation. The grouping of content in this manner is best
understood in relationship to active content or as it is considered with Business
Process Management (BPM).
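These four perspectives can be illustrated with a toy model. In P8 Content Manager, document classes and their properties are defined in the repository itself, not in application code, and the class and property names below are purely hypothetical; the sketch only shows how identity (metadata), relationship, and aggregation interact:

```python
from dataclasses import dataclass

@dataclass
class Document:
    """Identity: metadata shared by every document class."""
    title: str
    creator: str

@dataclass
class Claim(Document):
    """A subclass inherits the parent's properties and adds its own."""
    claim_number: str = ""
    policy_number: str = ""   # relationship: ties the claim to a policy

@dataclass
class Check(Document):
    check_number: str = ""
    claim_number: str = ""    # relationship: checks roll up to a claim

claim = Claim("Storm damage", "jdoe", claim_number="C-100",
              policy_number="P-7")
check = Check("Payout", "jdoe", check_number="K-1", claim_number="C-100")

# Aggregation: every document related to one claim handled as a unit.
related = [d for d in (claim, check) if d.claim_number == "C-100"]
print(len(related), "documents related to claim C-100")
```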

As the P8 Content Manager repository elements are presented, you can readily
see how each fits into the various encapsulation patterns, and how the overall
architectural framework provides a well-thought-out starting point for specific
solution modeling. Because the problem space is decomposed, and the
solution is composed, to arrive at the design, two methodologies are typically
followed. Which methodology is most applicable in any given situation depends
on many factors. There is also a set of common processes that assists in the
synthesis of the solution design. The following sections present a quick
overview of these methodologies and processes, as well as a number of useful
constructs for how they can be applied in repository design.

5.2.1 Design approaches


Two basic directions from which to approach repository design are bottom-up
and top-down. Both approaches offer specific benefits and advantages, and
each approach carries with it certain limitations that can make it unusable in any
specific situation. Because design is an iterative process, and because it can
include a reasonably large scope, it is not uncommon for both approaches to be
integrated and applied to different areas of the design as appropriate. It is



considered a best practice to employ both design approaches to the solution,
focusing on the strengths of each and reconciling their divergent perspectives
as they meet in the middle.

Regardless of which design approach is used, in what combinations, or with
other design approaches, the ultimate design goals remain the same. There
must always exist a specific set of clear business requirements that drives the
solution. Regardless of the approach to the design or the methodology used, it
must still be reconciled into one complete whole that is self-consistent and meets
all of the design goals.

Recommendation: We recommend that one individual or team always has
responsibility for the overall design in order to ensure consistency and
coherency. We also recommend that, regardless of the approach used, there
is a clear set of business requirements for the design.

Bottom-up design approach


Approaching the design from the bottom up has the advantage of being able to
use the existing content, organization, business knowledge, and expertise that
are either explicitly or implicitly captured within the organization. Involving
business users and subject matter experts (SMEs) greatly enhances the utility
and usability of the resultant design.

Designing the repository from the bottom up means analyzing the existing content and
processes in use by the organization and synthesizing the abstract entities from
this information. Repeated applications of grouping the resultant entities based
on a specific set of characteristics from the four basic types and then
synthesizing the next layer up by abstracting these groupings yields the resultant
design. Each level of organization of entities allows a different facet of design
detail characteristics to be focused on and separated out from the others.

The bottom-up approach has the advantage of letting you work with existing,
well-understood content and with workers who have expert knowledge of that
content. The abstract characteristics from the four basic types are usually easy
to determine. However, as the design grows from the bottom up, it often
becomes more difficult, especially for the knowledge workers, to abstract
further away from the concrete details with which they are used to working.

The bottom-up approach has the disadvantage of carrying forward all of the
implicit knowledge about how the existing problem space is approached,
including any artificial constructs that were utilized for historic or other
reasons and that are contrary to a good design. It is frequently very difficult to overcome these


inherent design decisions that have been made in the current organization in
order to understand the true underlying requirements.

Top-down design approach


Approaching the design from the top down has the advantage of allowing the
design to be formalized from a clean start. Any existing faults in the system,
historic processes, and procedures that no longer add business value and any
preconceived notions of what is expected can be avoided. This allows the
enterprise viewpoint to be fully exercised and elevates the considerations for
things, such as future flexibility, growth, and overall integration structure to be
fully considered. This process typically starts not with subject matter experts
but with solution domain experts, who understand the technology and
architecture of enterprise content management systems, and works its way
down to the level where subject matter experts need to be consulted for the
final details.

Designing from the top down involves understanding the global picture and
decomposing the various levels of the design through either clear design goals or
specific design choices. It is also an iterative process, which in this case drives
from the most abstract down toward the concrete levels. By designing from the
top down, the specific order of design characteristics can be approached in the
manner that makes the most strategic sense for the organization.

The top-down approach has the advantage of developing a design that does not
include any artificial barriers based on constructs, such as organization as
opposed to function, and producing a design that emphasizes the strategic
requirements of the solution. This often results in the most flexible and adaptable
design moving forward.

The disadvantage of the top-down approach is the difficulty in mapping existing
content and processes into the new design that is developed. As the design
iterations approach the more concrete aspects and need to be mapped directly
to concrete business entities, the process can become conceptually and
politically difficult for knowledge workers, depending on historic organizations.

5.2.2 Design processes


Producing the best possible design requires coordination and cooperation from
all of the major areas that the solution touches, and all of the areas that will
be directly affected by the solution need to be involved in and committed to the
goals. Because that is not always possible to achieve, designing as close to a
perfect solution as possible is the next-best goal. There are a number of design
processes and concepts that have been shown to be extremely useful in
producing an effective repository design.



The two key elements necessary are the team that undertakes the design and
the specialized pieces of information that are needed to make the correct design
decisions.

Design team
The design team itself can consist of one or more architects with the specific
responsibility of producing the design. Regardless of the number of individuals in
the design team, there is a clear set of roles and responsibilities that must be
represented. These roles cover both the technical facets of the design as well as
the business facets. The team is usually led by a technical architect who has the
direct responsibility for the content solution. The team is populated either by
architects and representatives from the following areas or by contacts in those
areas who can provide feedback and direction to the team as needed without
being full-time team members:
򐂰 P8 Content Manager architect technical role
This is the architect who has the ultimate responsibility for the overall
repository and solution design itself. This role must always be assumed by a
full-time member of the design staff who has expert level knowledge of the P8
Content Manager product itself.
򐂰 Enterprise architect technical role
This is the architect who is responsible for overseeing the technical fit of the
solution into the existing solution portfolio. This role must always be assumed
by someone who has an expert level of understanding of the current
technology across the enterprise.
򐂰 Application architect technical role
This is the architect who has direct responsibility for the specific application
or applications being addressed in this phase of the design and who is
responsible for tracking the business requirements into the solution space.
򐂰 Enterprise security technical role
This is someone who has expert level understanding of the security
environments and models that are utilized in the enterprise infrastructure. The
purpose of this role is to assure that all existing security policies are adhered
to and to provide support as needed for security requirements outside of the
P8 Content Manager solution itself.
򐂰 Technical support roles
There must be experts in server administration, database administration,
storage administration, network administration, and directory service
administration either represented on the team or available to the team. The
purpose of these roles is to assure that the various infrastructure elements



can support the design and that the design does not violate any existing
policies in these respective areas.
򐂰 Legal business role
This role must be assumed by someone who has expert level knowledge of
the legal requirements of the business sphere in which the solution exists.
They provide guidance to requirements and restrictions on the system that
are imposed for legal, as opposed to business value, reasons.
򐂰 Knowledge worker business roles
These roles represent the business workers who are directly affected by the
solution, whose content and process are being integrated into P8 Content
Manager and who have the inherent and implicit knowledge of the business
that is not usually captured in any other manner.
򐂰 Corporate librarian business role
This is someone who has expert level knowledge of existing content
management solutions as well as records management and retention. This
person provides direct guidance into all existing policies and procedures that
relate to how content is currently managed and all retention processes in the
business.
򐂰 Business problem business role
This is someone who has expert level knowledge of the problems that are
being addressed by the solution from the business perspective. This person
provides the final arbitration and definition of the business requirements and
business value of any element of the design.
򐂰 Project management role
This is someone who is responsible for tracking schedules, budgets,
requirements, and assignments. This person facilitates interactions and
accountability for the various other roles in producing the final design.

Interviewing process
In most cases, it is impossible to staff the design team with all of the requisite
expertise that is required. Even when experts from areas are direct team
members or are directly available to full-time team members, often specific
individuals and groups have pieces of knowledge that are required. The iterative
process of gathering these pieces of knowledge is done through a series of
interviews. Each interview needs to start with a clear set of questions to be
answered and involves the complete design team as well as all individuals who
can contribute to obtaining the answers. Conducting interviews in a group format
allows for the potential of addressing additional questions and issues that are
uncovered during the process and greatly reduces the time and effort required by
all to obtain the body of information that is required from any particular area.



Joint architecture and design sessions (JADS)
Like interviews, JADS need to be conducted with the entire design team as a
group, including any non-full-time members who are appropriate for the
specific portion of the design that is being developed. Although many design
activities and knowledge-collecting activities can, and must, occur on an
individual basis, it is most effective to always work as a team when coordinating
design integrations and making final design determinations.

Joint architecture requirement capture sessions (JARS)


JARS are another formalized method for extracting information from the design
team and subject matter experts directly. JARS are conducted in a similar
manner to JADS but are concerned with capturing the set of requirements that
drives the design. The same governing rules on which JADS operate also apply
to JARS.

5.3 Repository naming standards


Prior to designing the repository, initial thought needs to be given to the
conventions that will be used to standardize naming across all of the various
objects that will be in the repository. At this stage, concern yourself with the
standardization across the organization and design elements of the repository as
opposed to the content objects themselves that will be placed into the repository
by users. Through a well thought-out naming scheme, you can avoid many
potential problem areas at the beginning of the project as opposed to discovering
them during the lifetime of the repository. All objects that are created as
organization and design objects need to be named as descriptively as possible.
Sites can be named by geographic names or common company designations for
specific facilities. Virtual servers can be named for the purpose or designation
that they will service. When there are hierarchal relationships between objects, it
makes sense to capture this relationship in the naming standard as well. For
example, company XYZ has a site that is called Upper Bay. One of the virtual
servers in that site is named Upper Bay-Accounting.

There are a number of standard references to different labels and points that
must be considered in every case. These are presented followed by the call-out
of several naming constructs for specific objects that have been shown to be
useful.



Recommendation: We recommend that naming standards are put in place
prior to the creation of any design or organizational objects in a repository and
that they are adhered to throughout the lifetime of the repository. We
recommend that names are as descriptive as possible with consideration for
the consumer of the label.

5.3.1 Display name


Display name is used to indicate that this label will be displayed on the user
interface components for the consumer of those user interface components. In
the case of many organizational and design objects, this might be the design
team, as well as the administrators of the system throughout its lifetime. Most of
these display names, however, will be utilized by the users of the system. These
names are intended for human consumption and must have the proper white
space and punctuation to make them the most meaningful to their intended
audience.

Where there are custom interfaces of any kind between the system and the
user, the use of this label is optional. This field is used as the object name in
all cases where the object is accessed through the standard interfaces and
tools, such as FileNet Enterprise Manager and Workplace.

An example of a display name is prompting users with User Name on a panel
where they enter their names.

5.3.2 Symbolic name


Symbolic name is an enforced unique name across the scope of the object.
Design objects have a scope that includes the object store in which they occur.
Organizational objects have a scope that includes the entire domain, because
they are present in the Global Configuration Database (GCD) and therefore
include all object stores as part of their scope as well.

Examples of symbolic names covering user identities include the following
entries: UserName, FirstName, LastName, and others.
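Because symbolic names are consumed programmatically, it can help to validate candidates before any objects are created. The following sketch is our own illustration; the rule it enforces (a leading letter followed by letters, digits, or underscores) is an assumed convention rather than a product-defined constraint, so adapt it to your own naming standard.

```java
// Illustrative symbolic name validator. The rule enforced here is an
// assumption (leading letter, then letters, digits, or underscores),
// not an official product constraint.
class SymbolicName {
    static boolean isValid(String name) {
        return name != null && name.matches("[A-Za-z][A-Za-z0-9_]*");
    }

    public static void main(String[] args) {
        System.out.println(isValid("UserName"));   // valid
        System.out.println(isValid("First Name")); // invalid: contains a space
    }
}
```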

5.3.3 Uniqueness
Object names across the entire design generally have a requirement for
uniqueness. Unique naming tracks with appropriate naming, that is, when proper
consideration is given to naming objects, the uniqueness typically follows.



Problems can arise when overly abstract names are given to an object where the
same name more appropriately maps at a higher level in the hierarchy.

For example, naming an object E-mail implies that it is used high in the naming
hierarchy, whereas a name such as agentCustomerEmail is a good choice at a
low level.
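One way to picture these scoping rules is as a registry that rejects duplicate names within a scope: an object store for design objects, or the whole domain for organizational objects. The sketch below is purely our own illustration of the idea; the engine itself enforces uniqueness when objects are created.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative uniqueness registry: a symbolic name must be unique
// within its scope, but the same name can recur in different scopes.
class NameRegistry {
    private final Map<String, Set<String>> namesByScope = new HashMap<>();

    // Returns true if the name was free in this scope and is now taken.
    boolean register(String scope, String symbolicName) {
        return namesByScope
                .computeIfAbsent(scope, s -> new HashSet<>())
                .add(symbolicName);
    }
}
```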

5.3.4 Taxonomy
Taxonomy is the establishment of categorization based on naming. Having a
specific pattern that is applied to names with definitions for each name part that
are well understood facilitates an organized taxonomy. Giving initial thought to
taxonomy and developing a taxonomy prior to the actual naming simplifies the
naming task and accents the self-descriptiveness of the name given.

The best known example of a taxonomy is the scientific classification of living
organisms, designating a name pattern that contains elements, such as
species, family, phylum, and others.
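A taxonomy-driven name pattern can also be generated rather than typed by hand, which keeps names consistent by construction. This hypothetical helper, our own sketch, builds class names of the form used later in this chapter (for example, xyzOpsDevCommunicate) from a lowercase company prefix and capitalized taxonomy levels.

```java
// Hypothetical taxonomy-based name builder: joins a lowercase company
// prefix with capitalized taxonomy parts, for example
// build("xyz", "ops", "dev", "communicate") -> "xyzOpsDevCommunicate".
class TaxonomyName {
    static String build(String prefix, String... parts) {
        StringBuilder name = new StringBuilder(prefix.toLowerCase());
        for (String part : parts) {
            name.append(Character.toUpperCase(part.charAt(0)))
                .append(part.substring(1).toLowerCase());
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("xyz", "ops", "dev", "communicate"));
    }
}
```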

5.3.5 Consistency
Consistency is important because, as the base of people utilizing the names
broadens, consistent naming leads to better understanding and less confusion
as the system grows in scope and in age. Establishing good consistency
standards is beneficial in the long run. Consistency is facilitated by the
complete application of the ideas already presented.

5.3.6 Specific points


There are specific applications of naming standards that apply across particular
objects in the design, as well as particular concerns to consider at each object
level. We address object stores, storage areas, document, custom object, and
folder classes, property templates, and choice lists.

Object stores
Object stores are the highest point of naming for a given repository as well as the
first level of decomposition for the solution space. Make sure that you indicate
the part of the solution that an object store represents when you name it.

For example, company XYZ with a single object store can name its object store
XYZ Enterprise. Another company ZYX has two object stores and it can name
the two object stores, ZYX Operations and ZYX Support. The object stores
represent repositories for all content pertaining directly to the business of ZYX:



one repository for all of their internal administrative content and one repository
for support organization content.

Storage areas
Storage areas are where the content is saved; there are various types of storage
areas, including file system, cached content, and fixed content. Each type can
represent a number of varieties, each with specific characteristics. Naming the
storage areas in a manner that encapsulates the type and characteristics of the
storage area is useful, because the storage areas are accessed and applied
throughout the lifetime of the system.

For example, Company XYZ has three storage areas in use for the Company
XYZ repository. The first storage area is a file store hosted on the network
accessible protected storage segment of a storage area network (SAN) by a
Network File System (NFS) mount. The second storage area is a fixed storage
area that links to the company’s image management system. The third storage
area is a file store on a local (Just a Bunch of Disks) JBOD device also through
an NFS mount. These three storage areas are named NFS-RAID, IMAGES, and
NFS-CHEAP.

Document, custom object, and folder classes


When naming these objects, consider the inheritance hierarchy to both clarify the
lineage of a specific object as well as to distinguish two leaf objects that might be
the same type at first glance but have totally different lineages.

For example, memos from Engineering are classified under the
xyzOpsDevCommunicate document class, while memos from Human Resources
are classified under the xyzSupHrCommunicate document class.

Property templates
Special considerations for property templates need to be taken as the use of a
given property template can be widely used across many different objects. The
names chosen for the property templates need to be self-descriptive of both the
characteristics of the property template as well as the intended use of the
template.

Examples of property templates are AgencyName, FirstName, and LastName.
These can be used across many different objects in a somewhat generic way,
yet with a clear meaning for Company XYZ.

Choice lists
Choice lists are similar to property templates, but they are used to limit the
entries that the user will fill in for a property template. A choice list is associated



with a property template only at the class level, so choice lists again must have
descriptive, informative names.

For example, the four choice lists in company XYZ are States, Geos, ClaimType,
and Month.

5.4 Populating a repository


In the solution domain, there are two major containers for data: the global
configuration database (GCD) and the repositories, which are illustrated in
Figure 5-1. There is a single GCD that encapsulates all of the configuration of
the domain, and at least one, but possibly many, repositories in the system.

Figure 5-1 Storage objects in a domain (a single Global Configuration Database with one or more repositories)



A repository contains a single object store and potentially one or more storage
areas as shown in Figure 5-2. An object store contains definitions, configuration
information, and metadata for the content that is stored in the repository. The
storage areas store the actual content.

Figure 5-2 Repository contents (an object store plus database content and file content storage areas)

There are four major stages involved in the population of a repository: three
design stages and one production stage. The three design stages include
organizational design as described in 5.5, “Repository organizational objects” on
page 101, repository design as described in 5.6, “Repository design objects” on
page 106, and repository content design as described in 5.7, “Repository content
objects” on page 128. The final stage in repository population is the actual
production, or test, usage of the repository. The following sections describe the
design stages and their relationships.

During all of these design phases, there are certain commonalities that are
universally, or nearly universally, utilized in the objects of the design.

5.4.1 Generic object system properties


Generic object system properties are properties that are found in the lowest level
of the object-oriented hierarchy from which all other objects are extended. All of
the system properties are available to all the objects and do not need to be



replicated in any custom properties. Therefore, you must understand what is
available in order to leverage these properties where applicable.

Here, we list several of the system properties that have potential application in
other places of the design:
򐂰 Class description
The class description contains the immutable description of the class from
which this object is instantiated.
򐂰 Display name
This immutable label is intended for display to the user for prompting for the
entry of the value of this object.
򐂰 Descriptive text
This immutable text describes the purpose and meaning intended for this
object.
򐂰 Is hidden
This is a Boolean value that indicates if the object is hidden in its current
context. This property affects the user interface and is exposed to external
systems.
򐂰 Symbolic name
This immutable label is used for internal, programmatical references to the
object.
򐂰 ID
This immutable globally unique identifier (GUID)1 can be used to reference this
specific object throughout its lifetime.
򐂰 Is content-based retrieval (CBR)-enabled
This is a Boolean value that indicates if content-based retrieval is enabled in
the current context of the object.

1
In this context, global is only across the IBM FileNet P8 domain, because there might be other
objects in other domains with the same GUID.



Common object properties
In addition to the set of properties just covered that applies to all objects in the
system, there is a set of properties that appears in many of the objects that is
important to mention at this level. The following properties are present in most
objects:
򐂰 Auditing enabled
This property indicates if the object has its auditing enabled. This is a switch
that enables and disables all audit logging for this specific object and its
scope. Many events can be audited and controlled at a more granular level.
򐂰 Security
This property contains the access control list (ACL) for the object. An ACL
consists of a number of access control entries (ACEs). A single ACE contains
either an individual or a group from the Lightweight Directory Access Protocol
(LDAP) and the authorizations that entity has in relation to the object (See
Chapter 6, “Security” on page 131 for more details).
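To make the inheritance concrete, the sketch below models, in plain Java and purely for illustration, a base object carrying the generic system properties, with a subclass adding the common auditing and security properties. The class and field names are our own; the actual engine classes are defined by the product API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative base class: the generic system properties inherited by
// every object (names mirror the list above; this is not the product API).
class EngineObject {
    final String id = UUID.randomUUID().toString(); // immutable GUID
    final String symbolicName;                      // programmatic reference
    final String displayName;                       // user-facing label
    String descriptiveText;                         // purpose and meaning
    boolean hidden;                                 // "is hidden"
    boolean cbrEnabled;                             // "is CBR-enabled"

    EngineObject(String symbolicName, String displayName) {
        this.symbolicName = symbolicName;
        this.displayName = displayName;
    }
}

// Most concrete objects additionally carry auditing and an ACL.
class CommonObject extends EngineObject {
    boolean auditingEnabled;
    final List<String> acl = new ArrayList<>(); // ACE entries (users or groups)

    CommonObject(String symbolicName, String displayName) {
        super(symbolicName, displayName);
    }
}
```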

5.4.2 Creating design elements


There are a large number of design element types that must be utilized in
cooperation to achieve the optimum design. Each of these elements exists for a
specific purpose and encapsulates a specific set of information. Because
solutions are composed together from these elements, complex relationships
can be created between them that must be maintained for system integrity and
consistency. The majority of the complexities of the relationships are handled by
the underlying engine and removed from the concerns of users.

Modifying and removing design elements can be a tricky procedure given the
complex relationships that are possible. This is especially noticeable when
attempting to remove a design element that might be utilized or referenced from
a number of other design elements at differing levels of the design. It is always
best to be as thorough as possible in the system design prior to actually creating
the elements in the P8 Content Manager, because this avoids most of these
difficult situations.

P8 Content Manager has a number of wizards that assist in the creation and
modification of the various design elements.

Recommendation: Complete the design as much as possible prior to actually
creating the design elements in the system. Use the wizards that are provided
to create the elements.

5.5 Repository organizational objects
The solution space is divided into a number of logical divisions. Each division
serves a specific purpose. The composition of all of these divisions provides a
powerful solution that allows the requirements of any implementation to be
clearly and succinctly decomposed.

Figure 5-3 shows the logical relationships among the decomposition elements,
domain, sites, virtual servers, and server instances.

Figure 5-3 Repository organizational objects (sites contain virtual servers, which contain server instances)

All of the logical elements composing a domain are administered and managed
through IBM FileNet Enterprise Manager.
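The containment relationships in Figure 5-3 can be expressed as a simple object model. The following sketch is our own illustration of the hierarchy only (a domain holds sites, a site holds virtual servers, and a virtual server holds server instances); it is not the product API, and names such as ce-server-1 are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative containment model of the organizational objects:
// Domain -> Site -> VirtualServer -> ServerInstance.
class ServerInstance {
    final String name;
    ServerInstance(String name) { this.name = name; }
}

class VirtualServer {
    final String name;
    final List<ServerInstance> instances = new ArrayList<>();
    VirtualServer(String name) { this.name = name; }
}

class Site {
    final String name;
    final List<VirtualServer> virtualServers = new ArrayList<>();
    Site(String name) { this.name = name; }
}

class Domain {
    final String name;
    final List<Site> sites = new ArrayList<>();
    Domain(String name) { this.name = name; }

    public static void main(String[] args) {
        Domain domain = new Domain("XYZ");
        Site upperBay = new Site("Upper Bay");
        VirtualServer accounting = new VirtualServer("Upper Bay-Accounting");
        accounting.instances.add(new ServerInstance("ce-server-1"));
        upperBay.virtualServers.add(accounting);
        domain.sites.add(upperBay);
        System.out.println(domain.sites.get(0).virtualServers.get(0).name);
    }
}
```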



We will use an example (configured in Figure 5-4) for the discussion of each of
the elements of the solution decomposition.

Figure 5-4 Geographic dispersion

In this example, we have two central offices where 80% of all of the computing
resources and data reside, along with a number of satellite offices, each with its
own resources and a need to interact with the centralized services as well. Each
office has a highly reliable local area network (LAN) connecting all of the
resources in each office. There is a dual-redundant connection between the
central offices that provides both high reliability and high performance across
their communications. There is a lower speed wide area network (WAN)
connecting all of the satellite offices with the two central offices.

5.5.1 Domains
The highest level of purview for a given P8 Content Manager implementation is
the domain. There is a single domain in this specific P8 Content Manager
implementation. You can choose to have more than a single domain where you
want to totally isolate environments. Do this to support development in its own
logically separate location or for other reasons (see Chapter 8, “Advanced
repository design” on page 185 for complete guidance about multiple domains).
The domain encapsulates all of the logical resources for the implementation, as
well as all of the logical services that provide access to those resources. The
domain defines the absolute boundaries that none of the logical resources or
logical services can cross. The heart of a domain is embodied in the Global
Configuration Database (GCD), which is a database that encapsulates all of the
hierarchy of logical elements that provide access to the resources in the domain.
Each Content Engine that is installed is bound to a specific domain, and it is
within this domain that all of the repository elements are defined.

The domain is, in most cases, analogous to the enterprise. If there are valid
business and technical reasons that the domain is a division of the enterprise, it
is perfectly reasonable to make it so.

Domains contain all logical elements of a P8 Content Manager implementation.
A domain is not limited by P8 Content Manager in the number of sites that it
can contain. Domains are created by starting up IBM FileNet Enterprise Manager
and choosing the add button to add a new domain. Then, a dialog prompts for all
of the required information to establish a new domain.

In our example, all of the resources shown in Figure 5-4 on page 102 are
included in a single domain, because there are requirements for sharing these
resources across the entire enterprise.

5.5.2 Sites
Sites are encapsulations of geographically colocated physical elements. The
interconnection of these elements is across fast local area network (LAN)
connections. Interconnections between sites are assumed to be across the wide
area network (WAN), with slower connections and lower bandwidth.

Site decomposition must always be done with direct consideration for the
interconnections. Any site must only contain elements that are interconnected
through high performance, high bandwidth, and highly reliable network
connections. There is typically no functional reason to decompose any
geographic location that is connected through a single LAN into multiple sites;
however, this might be warranted in specific cases.

Object stores, storage areas, and virtual servers are all associated with a specific
site. A site is not limited by P8 Content Manager to the number of object stores,
storage areas, or virtual servers that it might contain.

Sites are created through FileNet Enterprise Manager’s site wizard, which can be
accessed by right-clicking on the top-level folder labeled sites. The only
requirement is to give the site a distinct name and a meaningful description.



In our example, each office in Figure 5-4 on page 102 is its own site. Although
there is still a high-speed, reliable connection between the central offices, they
are still divided into two logical sites.

Recommendation: We recommend that sites are modeled to represent the
geographical layout of the resources and that no site attempts to cross a
network boundary that does not provide a high performance and high
reliability connection.

5.5.3 Virtual servers


Virtual servers are the connection points for Content Engine clients. A virtual
server is the entity toward which all interactions by P8 Content Manager clients
are directed.
The simplest virtual server contains a single server instance, hosting a single
Content Engine. The number and topology of the server instances included in a
virtual server are totally hidden from the clients that access the virtual server and
are transparent to them.

When declaring virtual servers into sets that utilize load balancing through
software or hardware techniques, consider the grouping of the individual server
instances that they contain. These virtual server groupings provide both
performance and availability scaling for the clients that will utilize them as their
access points.

P8 Content Manager does not limit the number of server instances that a virtual
server can have.

Virtual server objects are created dynamically during system initialization and
startup based on the configured topology of the application server or via specific
system properties.

In our example from Figure 5-4 on page 102, each satellite office only contains
virtual servers that contain a single server instance, because they have no
requirement for providing high-performance access to their resources for internal
use. The two central offices each contain multiple virtual servers that utilize load
balancing across multiple server instances. This allows the central offices to
support not only internal resource access but also allows the frequent accesses
that will come from the satellite offices.

Recommendation: We recommend that virtual servers with multiple server
instances are provided wherever there will be a large number of accesses and
where the performance of the Content Engine will be significantly augmented
by balancing those accesses across a number of server instances.
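Because clients see only the virtual server, the balancing policy itself is an infrastructure choice (hardware or software). As a rough sketch of the idea only, and not of how the product implements it, a round-robin dispatcher over server instances might look like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin selection of a server instance behind a
// virtual server; real deployments rely on hardware or J2EE load
// balancing rather than application code.
class RoundRobinVirtualServer {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinVirtualServer(List<String> instances) {
        this.instances = instances;
    }

    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        return instances.get(Math.floorMod(next.getAndIncrement(),
                                           instances.size()));
    }

    public static void main(String[] args) {
        RoundRobinVirtualServer vs =
            new RoundRobinVirtualServer(Arrays.asList("ce1", "ce2", "ce3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(vs.pick()); // ce1, ce2, ce3, ce1
        }
    }
}
```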

5.5.4 Server instances
Each server represents a single Java 2 Platform, Enterprise Edition (J2EE)
application server instance. A server does not necessarily equate to a single
physical server, because any physical server can host any number of J2EE
application servers, each running in its own Java virtual machine (JVM) space,
or even in its own logical server partition on a specific physical server. The
server instances are where the individual compute platforms
specific virtual server, which is the client entry point for that set of servers.

A server instance contains exactly one J2EE application server instance, and
P8 Content Manager imposes no limit on the number of server instances in a
domain. In our example in Figure 5-4 on page 102, every server where the
Content Engine is installed is a server instance.

Recommendation: There needs to be a single server dedicated to each IBM
FileNet P8 engine component deployment.

5.5.5 Global Configuration Database (GCD)


All of the repository organizational objects are contained in the GCD. The GCD is
the single container that encapsulates all of the configuration information for a
domain. The GCD is the logical representation of the domain, and it contains the
subsystem configuration, which consists of the other organizational elements for
sites, virtual servers, and server instances. In addition, it contains the specific
configuration information for each object store’s database space, trace log
configuration information, and other information.

Figure 5-5 on page 106 gives a visual representation of the GCD layout. The
GCD is set up at install time and is then managed through IBM FileNet
Enterprise Manager.



Figure 5-5 Global configuration database contents (object store data source definitions, DS/DSXA; the subsystem configuration hierarchy of domain, sites, virtual servers, and server instances; and the trace logging configuration)

5.6 Repository design objects


There are a number of elements that constitute a repository design. Each of
these elements encapsulates a specific view, purpose, and role in the complete
design. The division of responsibility between some of these elements is very
clear while others are highly dependent on the specific environment and
application. There are a large number of design decisions that must be made to
achieve a final design that is both efficient and scalable.

5.6.1 Object stores


Just as a domain encapsulates an entire repository solution, an object store is
the basic component of a repository that contains not only all of the content that
has been committed to P8 Content Manager, but also all of the additional
information and functional objects associated with that content. The number,
type, and location of object stores that are needed for an organization is an
important design consideration (see Chapter 8, “Advanced repository design” on
page 185 for additional details). Any object store is associated with a specific site
and the storage areas associated with that site. The object store contains
definitions for various classes that structure metadata, as well as actual
metadata objects along with their connections to the content where applicable.
An object store can contain all of the content for the entire enterprise or can be
segmented from the overall enterprise design and assigned to a specific set of
the overall problem. Regardless of the purpose, the object store contains the
entirety of all of the definitions required for use by users and any applications that
will access it. Figure 5-6 shows a graphical representation of the scope of an
object store.

Figure 5-6 Object store contents (class definitions, the root folder, and other definitions, such as workflows, choice lists, properties, events, and security policies)

Like all of the entities that make up a repository, an object store is conceptually
an object with specific characteristics. Object stores are created through the use of

Chapter 5. Basic repository design 107


IBM FileNet Enterprise Manager. The recommended practice is to utilize the
wizard for object store creation, which simplifies the interface and ensures that
all settings necessary at creation time are both set and synchronized where
applicable.

The first page of the wizard prompts for the display name, symbolic name, and
description of the object store. The wizard displays a list of currently used names
that are not allowed to be reused.

The second page of the wizard prompts for the Java Naming and Directory
Interface (JNDI) data sources for the object store database, where all of the
definitions and metadata will be stored. It requires both the regular JNDI name as
well as the transaction (JNDI XA) name.

The third page of the wizard prompts for the default storage location that will be
used for content storage. This can be either the database, file storage, or a fixed
store.

The fourth page of the wizard prompts for the entry of the initial object store
administrators, as entries from the LDAP directory.

The fifth page of the wizard prompts for the initial access control list (ACL) for
the object store, which lists the initial user groups from the LDAP directory that
are associated with the store.

The sixth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the object
store.

Recommendation: If your design calls for more than a single object store,
create a metastore that can contain all of the design objects that are common
across all of the stores and replicate this as changes are made.

If a metastore is utilized, do not roll this store out into production, because it is
strictly a development object store.

When creating an object store, always set the object store administrator to a
valid administrator logon and grant the administrator all permissions.

5.6.2 Storage areas


Storage areas for repositories can be hosted on a wide range of storage devices
and media, from SCSI drives, to Fibre Channel-attached SAN devices, to secure
immutable storage units, as well as others.

In addition to storage media type, there are a number of logical storage types,
such as database stores, file stores, fixed content stores, cached content stores,
and others. Each of these logical types has implications for performance and
functionality that must be considered when determining specifically where to
store content.

Database store
There is a single database store per object store; the database store is
analogous to the object store itself. The database store can be used to store
content, but the content is stored as a database binary large object (BLOB).
Depending on the size of the content, this is not a very effective use of the
database and can seriously impact database performance.

Recommendation: The database store must only be used for content that is
no larger than 10 KB in size. Larger content sizes must be stored in a file store
to avoid detrimental impact on the database performance.
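As a rough illustration of this recommendation, content can be routed to a storage area based on its size. The Python sketch below is conceptual only, not FileNet API code, and the 10 KB threshold simply restates the recommendation above:

```python
# Illustrative sketch (not FileNet API code): route content to a logical
# storage type based on size, keeping all but very small content out of
# the database to protect database performance.

DB_BLOB_LIMIT = 10 * 1024  # 10 KB threshold from the recommendation above

def choose_storage_area(content_size_bytes: int) -> str:
    """Return the logical storage type suited to the content size."""
    if content_size_bytes <= DB_BLOB_LIMIT:
        return "database store"   # small content: acceptable as a BLOB
    return "file store"           # larger content: keep out of the database

print(choose_storage_area(4 * 1024))    # small annotation -> database store
print(choose_storage_area(2_000_000))   # scanned image    -> file store
```

In practice, this routing is expressed declaratively through storage policies and storage areas rather than in application code.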

File store
There can be multiple file stores per object store with each one a separate
directory structure on the server. The file store can be on local storage media or
can be a mount point for remote, or networked, storage media. This is the typical
location that is used for content with different file stores of different media types
used for different content where appropriate (See Chapter 8, “Advanced
repository design” on page 185 for more details on file stores).

Recommendation: There must always be at least one file store defined for
the repository, where content that is larger than 10 KB in size can be stored.
Name each file store so that its media type and cost are clear from the name,
to eliminate content storage area errors.

Fixed content store


There can be multiple fixed content stores per object store. This store type is
designed to provide access to other content storage systems, such as an image
repository, while leveraging the power of the P8 Content Manager metadata
management system (see Chapter 8, “Advanced repository design” on page 185
for more details about fixed content stores). Fixed content stores can be a link
into Image Manager or provide access to other systems.

Content cache store


A content cache store is a special store that allows local caching of content that
is actually stored in a remote repository. It gives local users access to content
that is frequently accessed, or in an active state of processing, without
degrading the network connection to the remote content, which increases the
performance of these local operations. It is important to note that the content
cache store provides a performance enhancement for remote content access
but does not provide any type of high availability solution for the content (see
Chapter 9, “Business continuity” on page 213 for more details about high
availability solutions).

5.6.3 Document classes


Document classes are the design objects that, when instantiated, contain the
actual content of the system. Most of the detailed design process is concerned
with developing the correct set, and hierarchy, of document classes. Document
classes inherit from a common top-level document class object that contains all
of the basic properties that the system needs. It is considered a best practice to
have a single document class as a child of the root document class. This is the
top-level class for the company, and it contains all of the metadata items that are
the same across all document objects in the enterprise, whether by requirement
or edict.

The first level of document class design is concerned with the common
enterprise objects, as opposed to specific application objects. The result of this
first round of design is a hierarchical document class tree that contains all of the
common enterprise document classes that can be leveraged by specific
applications as they are included in the P8 Content Manager solution. There
should be a reasonable number of properties defined in each class. It is easier
to administer and expand a design where each document class is concerned
with a specific aspect of the design. The resultant tree is typically neither
extremely narrow nor extremely wide. A narrow tree usually indicates that the
class design has focused too specifically on one aspect and has been too
exclusive. A wide tree usually indicates that there are too many aspects of the
design encapsulated at one level.

Another test that can be applied to the resultant design is to see how various
changes to the design can be made. If there are properties that have historically
changed somewhat frequently, or any properties that are projected to change,
see what changes need to be made to the design to accommodate the changes.
The ideal is to address a change with a change in a single class. This is a good
indication that you have the proper level of design encapsulation. The types of
changes to consider are property redefinitions, property additions, property
deletions, class additions, class modifications, class deletions, security updates,
functional changes, and organizational changes.
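The change test above can be illustrated with a small model of class inheritance. The following Python sketch is conceptual only (not FileNet API code; all class and property names are invented): because subclasses inherit properties from their ancestors, a property added once to a high-level class is automatically acquired by every class below it.

```python
# Conceptual model (not FileNet API code) of a document class hierarchy:
# each class adds its own properties, and children inherit everything
# defined above them, so a one-class change ripples down the tree.

class DocClass:
    def __init__(self, name, parent=None, own_properties=()):
        self.name = name
        self.parent = parent
        self.own_properties = list(own_properties)

    def properties(self):
        """Own properties plus everything inherited from ancestor classes."""
        inherited = self.parent.properties() if self.parent else []
        return inherited + self.own_properties

# Hypothetical enterprise hierarchy; the names are invented for illustration.
enterprise = DocClass("EnterpriseDocument", own_properties=["DocumentOwner"])
claims = DocClass("ClaimDocument", enterprise, ["PolicyNumber"])
auto_claim = DocClass("AutoClaimDocument", claims, ["VehicleVIN"])

print(auto_claim.properties())
# -> ['DocumentOwner', 'PolicyNumber', 'VehicleVIN']

# A change made once at the enterprise root is inherited by every subclass:
enterprise.own_properties.append("RetentionDate")
print(auto_claim.properties())
# -> ['DocumentOwner', 'RetentionDate', 'PolicyNumber', 'VehicleVIN']
```

A design that requires touching many leaf classes to accommodate such a change is a sign that the aspect was encapsulated at too low a level.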

Adopting an enterprise perspective allows the document class designs to
facilitate greater information sharing and collaboration across the enterprise. In
addition to assisting in breaking down information silos, this makes the overall

design much more usable as well. You must always take usability into
consideration during all the design phases. The use of subject matter experts at
this phase can greatly assist you in meeting the unspoken requirements and
usability goals of users.

Because document classes are key design objects in the system, there are
many additional components on which they depend. Most of these
dependencies are covered in the specific sections for the dependent elements.
Probably the most important dependency is the usage of the property templates
in the class designs. This dependency underscores the need to be clear and
concise in the property template definitions and consistent with naming and
topology across the entire design.

Finally, try to avoid designing for the current organization without being modular
enough to accommodate change. Avoid carrying over limitations of the current
system, whether they are design flaws or limitations of the tools used to support
it. Take into account any current or future processes in which the content is
utilized; that is, always consider business process automation in the design.
Remember that there will always be additional applications and functional areas
that the system needs to support that are not currently identified, or even
identifiable.

There are three focus areas that the document class design typically follows.
These are design based on organization, design based on content, and design
based on function. Although these are the major design approaches that are
used, variations on these themes as well as modifications and combinations of
these approaches are also successfully used. The right approach to utilize is
highly dependent on the specific details of your corporation and the application
that is supported by P8 Content Manager:
򐂰 Design based on organization
Design based on organization starts with the first level of decomposition after
the enterprise root document class, which is groupings around how the
corporation is organized. This can be reflected in line of business (LOB)
objects, support and business value objects, or any other high-level structure
that represents your organization. The subsequent layers of the hierarchy
then follow the organization down into smaller and smaller groupings. Each
level can also have classes that capture content-specific aspects, where the
document content that they represent is consistent across the entire
organization from that point down the hierarchy. Eventually, the lowest level
represents document content classes that correspond to specific functional
areas or specific content.
This facilitates future changes that occur at the organizational level by
capturing these aspects as high in the tree as feasible and letting these
properties and attributes be inherited down the hierarchy.



򐂰 Design based on content
Design based on content starts with the first level of decomposition after the
enterprise root document class. The first level includes high-level abstractions
of the content types that will be stored in P8 Content Manager. This often
follows record plans where they have been established. Lower levels of the
hierarchy allow the capture of more and more concrete aspects of the content
types until the resultant leaf nodes are declared.
This approach facilitates communication across the enterprise, because all of
the properties of the document classes will be the same regardless of where
in the organization they are used. However, you do not capture the
organizational aspects of the corporation, and this design approach can have
significant political ramifications, depending on the culture of your corporation.
򐂰 Design based on function
Basing design on function starts with the first level of decomposition after the
enterprise root document class consisting of abstractions of the functions that
are carried out in the corporation without regard to the organizational
structure. As the document class hierarchy extends down, more and more
concrete functional aspects are captured, as well as content specific aspects
for the content types that will be used.
This approach captures many of the functional aspects of the corporation,
which typically mirror the organizational structure, but in a more abstract
perspective of focusing solely on the function, business value, and processes
for which the content is used. This approach is sometimes viewed as a
blending of the purely organizational approach and the purely content
approach.

Document classes are created through a wizard interface in IBM FileNet
Enterprise Manager by right-clicking an existing document class, such as the
base document class, and choosing a new class.

The first page of the wizard prompts for the display name, symbolic name, and
description of the document class. The wizard displays a list of currently used
names that are not allowed to be reused.

The second page of the wizard prompts for adding properties to the class, based
on the existing property templates. Although you have the ability to launch the
property template wizard from this page to create a new property, the best
practice is to establish all the necessary property templates prior to creating the
document classes.

The third page of the wizard allows you to set the attributes of the properties of
this class. You can mark a property as required or hidden, designate it as the
name property, give it a default value and a maximum size, associate a choice
list with it, and adjust other settings. You can set the attributes not only of the
custom properties that you defined on the second page of the wizard, but of any
system or inherited property as well.

The fourth page of the wizard allows you to set the storage parameters of the
class. You can assign a storage policy here, direct the class to use a specific
storage area, or choose to inherit the settings of its parent class.

The fifth page of the wizard prompts for any event auditing that you want on
objects of this class. By defining trigger events, subsequent entries are made in
the audit log when those events occur.

The sixth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the
document class.

Recommendation: There needs to be a single, top-level document class that
extends the base document class and from which all other document classes
will be derived.

All property templates, choice lists, storage policies, and storage areas need
to be created prior to creating any document classes that utilize them.

Each document class encapsulates a single design aspect.

Never skip the step of designing high-level abstract objects that are for aspect
encapsulation and will most likely never be instantiated.

Document class characteristics


Document classes have the following characteristics:
򐂰 Have metadata
򐂰 Are containable
򐂰 Are versionable by both content and metadata
򐂰 Hold content

5.6.4 Folder classes


Folder class objects are the design objects that, when instantiated, provide
aggregation or containment for other objects. The characteristics and usage of
folder objects must not be mistaken for, or confused with, the foldering features
and concepts provided by a file system. P8 Content Manager folder classes
provide containment by reference, which allows any given object to be contained
in multiple folders at the same time. Most of the same considerations that apply
to creating document object classes (see 5.6.3, “Document classes” on
page 110) also apply to designing folder classes: a single top-level class from
which all others are derived, a single design aspect captured per class, design
with changes in mind, design in modularity, and do not repeat any mistakes of
the current system or processes.
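Containment by reference can be modeled in a few lines. The Python sketch below is conceptual only (not FileNet API code; names are invented): folders store references to a document rather than copies, so one object can be filed in several folders simultaneously.

```python
# Conceptual sketch (not FileNet API code) of containment by reference:
# folders hold references to objects, so one document can be filed in
# several folders at once without being copied.

class Folder:
    def __init__(self, name):
        self.name = name
        self.containees = []      # references, not copies

    def file(self, obj):
        self.containees.append(obj)

document = {"name": "2007 Budget.xls"}   # hypothetical document object

finance = Folder("/Finance/Budgets")
projects = Folder("/Projects/2007")
finance.file(document)
projects.file(document)                  # same object, second filing

# Both folders reference the identical object, not separate copies:
print(finance.containees[0] is projects.containees[0])  # -> True
```

This is the key difference from a file system, where a file normally lives in exactly one directory.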

A key design decision that needs to be made is whether the main access
mechanism for content follows the search paradigm (represented in Figure 5-7)
or follows the browse paradigm (represented in Figure 5-8 on page 115). Both of
these paradigms offer their own strengths and weaknesses, and this decision
directly affects how folder classes will be used and instantiated.

Search paradigm compared to the browse paradigm


The model for the search paradigm is represented in Figure 5-7 as a dialog box
requesting some information and returning a set of content that meets the criteria
specified in the dialog. The best analogy is accessing a database. Information is
retrieved from a database by formulating a query, which returns a set of data
elements that matches the criteria in the query.

Figure 5-7 Searching for content (a search dialog prompting for Name and Size criteria)

The search paradigm is very powerful, because it does not require the user to
know where the content is in the system or the name of the object that contains
the content. Searching also returns a set of objects as an atomic operation, and
the maximum size of this set can be controlled as well. The set can include
objects that are located in diverse places in the repository. Effective use of the
search paradigm requires the selection of distinguishing properties that are
meaningful to users. It also requires meaningful document classes that users
understand.

The search paradigm can be fronted with various methods of compiling the
search criteria and is usually best served by predefined searches or custom
interfaces. It is usually a faster and more reliable method of finding content than
the browse paradigm.

The model for the browse paradigm is represented in Figure 5-8 as a typical file
system structure. There is some meaningful relationship between sets of folders
that leads the user to sets of content in an understandable way. The best
analogy is a file system tree structure. Although the analogy presented to help
understand the browse paradigm is a file system structure, a file system folder is
not the same as a P8 Content Manager folder, which supports multiple filed
locations.

Figure 5-8 Browsing for content

The browse paradigm relies on the users who add the content to be thoughtful
and knowledgeable in the manner in which the content is filed. This potentially
includes filing the same content object in multiple folders. There is also a
requirement that the name of the content object has meaning in its context that is
understood by users.

The browse paradigm can increase the time that it takes to find content, but it is
well suited to users who understand the basic concepts of foldering and are
accustomed to using folders for file system access. Browsing typically takes
users longer than searching to find content, and it requires users to have
inherent knowledge of the folder structure to reliably find content.
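The contrast between the two paradigms can be sketched with a small model. The following Python fragment is purely illustrative (not FileNet API code; the repository contents and property names are invented): search matches metadata wherever the content lives and returns a bounded result set, while browse only finds what is filed beneath a known path.

```python
# Illustrative comparison (not FileNet API code) of the two access models:
# search returns a bounded result set from metadata in one operation,
# while browse walks a folder path and finds only what is filed there.

documents = [  # hypothetical repository content: (filed path, metadata)
    ("/Claims/2007/A-100", {"PolicyNumber": "P-17", "State": "CA"}),
    ("/Claims/2007/A-101", {"PolicyNumber": "P-17", "State": "NV"}),
    ("/HR/Reviews/R-33",   {"PolicyNumber": None,   "State": "CA"}),
]

def search(criteria, max_results=25):
    """Search paradigm: match metadata anywhere in the repository."""
    hits = [path for path, meta in documents
            if all(meta.get(k) == v for k, v in criteria.items())]
    return hits[:max_results]            # result-set size is controllable

def browse(prefix):
    """Browse paradigm: the user must know where the content is filed."""
    return [path for path, _ in documents if path.startswith(prefix)]

print(search({"State": "CA"}))   # finds content across unrelated folders
print(browse("/Claims/2007"))    # finds only what is filed under this path
```

The search call finds the HR document and the claim in one operation because both carry State = "CA", something browsing a single folder subtree could never do.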



Folder classes are created through a wizard interface in IBM FileNet Enterprise
Manager by right-clicking an existing folder class, such as the base folder class,
and choosing a new class (the folder classes are grouped under the heading of
other classes, along with a number of other class types).

The first page of the wizard prompts for the display name, symbolic name, and
description of the folder class. The wizard displays a list of currently used names
that are not allowed to be reused.

The second page of the wizard prompts for adding properties to the class, based
on the existing property templates. Although you have the ability to launch the
property template wizard from this page and create a new property, the best
practice is to have established all the necessary property templates prior to
creating the folder classes.

The third page of the wizard allows you to set the attributes of the properties of
this class. You can mark a property as required or hidden, designate it as the
name property, give it a default value and a maximum size, associate a choice
list with it, and adjust other settings. You can set the attributes not only of the
custom properties that you defined on the second page of the wizard, but of any
system or inherited property as well.

The fourth page of the wizard prompts for any event auditing that you want on
objects of this class. By defining trigger events, subsequent entries are made in
the audit log when those events occur.

The fifth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the folder
class.

Recommendation: In most cases, the search paradigm offers a much better
model for performance and maintenance.

Avoid too many layers of too many folders (keep the total number to tens of
folders, not hundreds); this can impact retrieval performance.

There needs to be a single, top-level folder class that extends the base folder
class and from which all other folder classes will be derived.

All property templates and choice lists must be created prior to creating any
folder classes that utilize them.

Each folder class encapsulates a single design aspect.

Folder class characteristics
Folder classes have the following characteristics:
򐂰 Have metadata
򐂰 Are containable
򐂰 Are not versionable
򐂰 Hold no content
򐂰 Are containers

5.6.5 Custom object classes


Custom object classes are design objects that contain metadata without content
and provide no containment. They are designed to be versatile general purpose
objects that can be subclassed to perform a variety of functions.

Most of the same considerations that are given to creating document object
classes (See 5.6.3, “Document classes” on page 110) also apply to designing
custom object classes: Single top-level class from which all others are derived,
single design aspect captured per class, design with changes in mind, design in
modularity, and do not repeat any mistakes that the current system or processes
might have.

Custom object classes are created through a wizard interface in IBM FileNet
Enterprise Manager by right-clicking on an existing custom object class, such as
the base custom object class, and choosing new class (the custom object
classes are grouped under the heading of other classes along with a number of
other class types).

The first page of the wizard prompts for the display name, symbolic name, and
description of the custom object class. The wizard displays a list of currently
used names that are not allowed to be reused.

The second page of the wizard prompts for adding properties to the class, based
on the existing property templates. Although you have the ability to launch the
property template wizard from this page and create a new property, the best
practice is to have established all the necessary property templates prior to
creating the custom object classes.

The third page of the wizard allows you to set the attributes of the properties of
this class. You can mark a property as required or hidden, designate it as the
name property, give it a default value and a maximum size, associate a choice
list with it, and adjust other settings. You can set the attributes not only of the
custom properties that you defined on the second page of the wizard, but of any
system or inherited property as well.



The fourth page of the wizard prompts for any event auditing that you want on
objects of this class. By defining trigger events, subsequent entries are made in
the audit log when those events occur.

The fifth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the custom
object class.

Recommendation: There must be a single, top-level custom object class that
extends the base custom object class and from which all other custom object
classes will be derived.

All property templates and choice lists need to be created prior to creating any
custom object classes that utilize them.

Each custom object class encapsulates a single design aspect.

Custom object class characteristics


Custom object classes have the following characteristics:
򐂰 Have metadata
򐂰 Are containable
򐂰 Are not versionable
򐂰 Hold no content

5.6.6 Compound documents


A compound document is a collection of documents that are used together to
form a single complete document. A compound document consists of a parent
document component, some number of child document components, and a set
of component relationship objects that link each child to the parent. The parent
document is not required to have any content associated with it. Any child
document component can itself be the parent document component of another
compound document structure.
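The parent/child/relationship structure can be sketched as follows. This Python fragment is a conceptual model only, not FileNet API code, and the component names are invented for illustration:

```python
# Conceptual sketch (not FileNet API code) of a compound document: child
# components are tied to a parent by component relationship objects, and
# a child may itself act as the parent of another compound document.

class ComponentRelationship:
    def __init__(self, parent, child):
        self.parent, self.child = parent, child

relationships = []   # all component relationship objects in this model

def link(parent, child):
    relationships.append(ComponentRelationship(parent, child))

def children_of(parent):
    return [r.child for r in relationships if r.parent == parent]

# Hypothetical component names, invented for illustration:
link("contract.tif", "contract.txt")     # ASCII rendition of the image
link("contract.tif", "contract.tags")    # topic tag file
link("contract.tags", "tag-index.xml")   # a child acting as another parent

print(children_of("contract.tif"))   # -> ['contract.txt', 'contract.tags']
print(children_of("contract.tags"))  # -> ['tag-index.xml']
```

Because the relationship is a separate object, components can be relinked or reused in other compound documents without copying any content.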

In addition to the methods for searching and browsing normal documents, you
can also search for component relationship objects using the IBM FileNet
Enterprise Manager’s search functionality.

Compound documents provide benefits to organizations by enabling:
򐂰 Independent modifications to various components
򐂰 Reuse of components in other documents
򐂰 Time savings
򐂰 Enhanced document quality and integrity

Most of the same considerations that are given to creating document object
classes (See 5.6.3, “Document classes” on page 110) also apply to designing
compound document classes: A single top-level class from which all others are
derived, single design aspect captured per class, design with changes in mind,
design in modularity, and do not repeat any mistakes that the current system or
processes have.

Compound documents have no special class definition processes, but can be
made from any document class by selecting the options in the document object
properties dialog in IBM FileNet Enterprise Manager.

Because the concept of compound documents can be confusing, we present an
example use case for compound documents here.

Use case example: Legal discovery documents


A realistic example involves legal discovery documents. When matters are
litigated, the opposing sides typically request copies of all of the documents that
the other side has regarding the matter. There are standards for how this
information is transferred and stored. Any given document exists in three or four
forms: a TIFF image of the original document, an ASCII file capturing the content
of the document, a tag file of applicable topics as chosen by the attorney, and
sometimes the original document in its electronic format. This collection of
documents is perfectly suited to be treated as a compound document, with
either the electronic document or the TIFF image as the parent, and the ASCII
rendition, the tag file, and possibly the TIFF image as the child documents.

5.6.7 Property templates


Property templates are used throughout the design as established containers for
properties. A property template contains a name, a property data type, and a set
of attributes. This enables the definition of common properties to occur once and
be utilized throughout the design in a uniform manner. Properties, such as
FirstName, LastName, and PolicyNumber, are typical generic property templates
in a design.

There are two types of properties: the system properties that come preinstalled in
P8 Content Manager and custom properties that you create for your specific
installation. All of these properties can be utilized in any definitions as you see



appropriate. Typically, there is a rich set of system properties associated with the
base classes. The system properties that are by default associated with a class
must be examined to both prevent duplication of information and to understand
what is available to be leveraged by your class definitions.

The largest distinction between these property types is how they are displayed in
the properties tab of a class properties dialog. You can selectively display just
your custom properties, your custom properties and the system properties, or all
properties associated with a class.

Property templates must always have a data type associated with them. The
data type can have a cardinality of either single value or multi-value for all data
types. The basic data types used by P8 Content Manager are:
String Can contain any printable character up to the string size
limit set
Integer Can contain signed integer numbers with up to a 32-bit
representation
Object Can contain a reference to any object within the object
store
Float Can contain floating point numbers up to a 64-bit
representation. Floating point is inherently inexact in its
representation of decimal values and must only be used in
scientific and mathematical contexts where that is
understood.
Date/Time Can contain a representation of the date and time,
containing year, month, day, hour, minute, second, and
millisecond
Boolean Can contain a Boolean value of either true or false
Binary Can contain arbitrary BLOBs of binary data that can
contain non-printable characters. You cannot search on
multi-value binary data.
Primary ID Can contain a Microsoft globally unique identifier (GUID)
for reference to external entities

There are a number of attributes and controls that you can set on a property
template:
Name property A flag that indicates that the value of this property must be
used as the display name for the object as opposed to
using the GUID or filename defaults
Value required A flag that indicates that this property must not be empty

Hidden A flag that indicates that this property must not be visible
to the users of this class
Settability Controls when this property can be modified. The choices
include read/write, settable only on create, and settable
only before checkin.
Category An optional string value that can be used to group similar
property templates together and can be used for sorting
purposes
Choice list An association with a defined choice list that restricts the
users to selecting a value from the choice list as opposed
to a free-form entry
Minimum value Depending on the property data type, this property sets
the minimum allowed value to which the property can be
set.
Maximum value Depending on the property data type, this property sets
the maximum allowed value to which the property can be
set.
Default value Depending on the property data type, this property sets
the default value for the property.
String size Depending on the property data type, this property sets
the maximum length that the string property can be.
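Conceptually, a property template bundles a name, a data type, and these attributes into a single reusable definition. The Python sketch below is an illustrative model only (not FileNet API code; the PolicyNumber template and its settings are invented):

```python
# Conceptual model (not FileNet API code) of a property template: a named,
# typed definition with attributes such as a required flag, a maximum
# string size, and an optional choice list, reused across class designs.

class PropertyTemplate:
    def __init__(self, name, data_type, required=False,
                 max_string_size=None, choice_list=None):
        self.name = name
        self.data_type = data_type
        self.required = required
        self.max_string_size = max_string_size
        self.choice_list = choice_list

    def validate(self, value):
        """Check a candidate value against the template's attributes."""
        if value is None:
            return not self.required          # empty only if not required
        if not isinstance(value, self.data_type):
            return False                      # wrong data type
        if self.max_string_size is not None and len(value) > self.max_string_size:
            return False                      # exceeds the string size limit
        if self.choice_list is not None and value not in self.choice_list:
            return False                      # not in the choice list
        return True

# Hypothetical template, defined once and usable by many classes:
policy_number = PropertyTemplate("PolicyNumber", str, required=True,
                                 max_string_size=12)

print(policy_number.validate("P-2007-00317"))  # -> True
print(policy_number.validate(None))            # -> False (value required)
```

Defining the template once and reusing it in every class that needs a policy number is exactly what keeps naming and behavior uniform across the design.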

Property templates are created utilizing IBM FileNet Enterprise Manager and the
property template wizard. The wizard can be invoked by selecting the property
template folder in a specific object store, right-clicking, and selecting new
property template.

The first page of the wizard prompts for the display name, symbolic name, and
description of the property template. The wizard displays a list of currently used
names that are not allowed to be reused.

The second page of the wizard prompts for the selection of the property
template’s data type, one of the previously mentioned types.

The third page of the wizard allows the selection of a choice list to be associated
with this property template. This page of the wizard is only shown for the string
and integer data types. When the data type is string, this page allows the
association of a marking set in place of a choice list (For details about marking
sets, see Chapter 6, “Security” on page 131).

The fourth page of the wizard allows you to set the cardinality of the property template and, in the multi-value case, whether the values are non-unique but ordered, or unique and non-ordered. In addition, this page opens an additional dialog to set the remaining property template attributes.

The fifth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the
property template.

Recommendation: Property templates need to follow a standardized naming scheme and topology established at the enterprise level.

Property templates need to be generic enough that they can be used in a number of design classes, but not so generic that they cannot be given a meaningful name.

Avoid the creation of property templates that are named in such a manner that it might be confusing to know which template to use.

Avoid the creation of property templates that encapsulate the same informational data but have distinct names.

5.6.8 Choice lists

Choice lists prevent users from entering free-form text or integer data into a property, protecting against typing mistakes and other human errors. A choice list is not always appropriate, because it requires a well-understood, mostly static set of values that the property can take.

Choice lists can consist of levels of groupings of values to make it easier for the
correct value to be selected. In the case of multi-value properties, the user can
select multiple entries from the choice list. Choice lists are created by selecting
the choice list wizard in IBM FileNet Enterprise Manager by right-clicking on the
choice list folder and selecting new choice list.

The first page of the wizard prompts for the display name, symbolic name, and
description of the choice list. The wizard displays a list of currently used names
that are not allowed to be reused.

The second page of the wizard prompts for the selection of the choice list data
type, which can be either integer or string.

The third page of the wizard allows interactive building of the choice hierarchy by
adding, moving, and deleting groups and entries until the desired choice list is
created.

The fourth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the choice
list.

Recommendation: Group choice list elements logically with the user experience in mind.

Limit the number of elements in each group to a small enough set that it can be easily displayed and scanned.

Avoid assigning the same value to more than one item in a choice list.

Do not use choice lists for properties where the values are expected to change frequently.
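The grouping and validation behavior described above can be sketched in plain Java. The class below is illustrative only, not the Content Engine choice list API: group names organize the entries, and only entries, never group names, are selectable values.

```java
import java.util.*;

// Illustrative only: a hierarchical choice list as groups of selectable
// entries, mirroring the grouping described above. This is NOT the Content
// Engine choice list API; all names are invented.
public class ChoiceListDemo {
    private final Map<String, List<String>> groups = new LinkedHashMap<>();

    public void addGroup(String groupName, String... entries) {
        groups.put(groupName, Arrays.asList(entries));
    }

    // Only leaf entries are valid values; group names merely organize them.
    public boolean isValid(String value) {
        for (List<String> entries : groups.values())
            if (entries.contains(value))
                return true;
        return false;
    }
}
```

For example, with a group "Europe" containing "DE" and "FR", the value "DE" validates while "Europe" itself does not, which is exactly the distinction between groupings and entries in a hierarchical choice list.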

5.6.9 Annotations
Annotations allow users to link additional information or comments to other
objects, such as documents. These annotations can be in any format, such as
text, audio, video, image, highlight, and sticky note. An annotation’s content does
not necessarily have to be the same format as its parent document and can be
published separately. Document annotations are uniquely associated with a single document version; they are not versioned or carried forward when their document version is updated and a new version is created.

You can modify and delete annotations independently of their annotated object.
However, you cannot create versions of an annotation separately from the object
with which it is associated. By design, the annotation will be deleted whenever its
associated parent object is deleted. Annotations receive their default security
from both the annotation’s class and the parent object. You can apply security to
annotations that is different from the security applied to the parent.

Applications that use annotations will likely add properties for the particular kind
of annotation being implemented. For example, a property can be added
indicating the presence and location of the annotation. A voice annotation needs
a BLOB property in order to contain the sound file.

Annotation classes are created with the annotation wizard in IBM FileNet Enterprise Manager: right-click an existing annotation class under other classes and choose new class.

The first page of the wizard prompts for the display name, symbolic name, and
description of the annotation class. The wizard displays a list of currently used
names that are not allowed to be reused.

Chapter 5. Basic repository design 123


The second page of the wizard prompts for adding properties to the class, which
are based on the existing property templates. Although you have the ability to
launch the property template wizard from this page to create a new property, the
best practice is to have established all the necessary property templates prior to
creating the annotation classes.

The third page of the wizard allows you to set the attributes of the properties of this class. You can mark properties as required or hidden, designate the name property, set a default value and a maximum size, associate a choice list with a property, and adjust other settings. You can set these attributes not only for the custom properties that you defined on the second page of the wizard, but for any system or inherited property as well.

The fourth page of the wizard allows you to set the storage parameters of the
class. You can assign a storage policy here, direct the class to use a specific
storage area, or choose to inherit the settings of its parent class.

The fifth page of the wizard prompts for any event auditing that you want on
objects of this class. By defining trigger events, subsequent entries are made in
the audit log when those events occur.

The sixth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the
annotation class.

Annotations are created on a document or other object by utilizing the new annotation wizard. This wizard is accessed from the property sheets and context menus of objects that can be annotated.

The first page of the wizard prompts for a description of this specific annotation
and gives you the option to associate one or more content objects with the
annotation.

The second page of the wizard is displayed when you are associating content
with the annotation. It prompts for the location of the files to add and allows you
to add any number of them to this annotation.

The third page of the wizard allows you to select the specific annotation class for
this annotation, as well as the storage policy to use for the annotation.

The fourth and final page of the wizard allows all of the information entered to be
reviewed one final time prior to acceptance and the actual creation of the
annotation.

5.6.10 Document life cycles
Document life cycles allow for the fact that a given document exists in a number
of states throughout its lifetime. Figure 5-9 shows a sample state diagram for the
typical document life cycle in the XYZ corporation.

[Figure 5-9 shows three document states: personal documents, workgroup documents, and corporate documents. Transitions include revise, share/collaborate, process (workflow), retain final version, prepare for revision, and destroy. Personal and workgroup documents are non-records; corporate documents are records.]

Figure 5-9 Sample document life cycle model

In this example, documents are in one of three states: personal documents not
being shared or collaborated on, workgroup documents that have a limited scope
of sharing and are intended for collaboration, and corporate records that have
meaningful business value to the company.

In the first two states, a document can be revised and remain in its current state,
reach its end of life and be destroyed, or be promoted to a higher state. In the
workgroup collaboration state, documents can also be processed in some
automated way, such as through Business Process Manager. In the final
corporate document state, a document can also be demoted back to the
workgroup for revisions and updates.

While the figure captures the states and transitions between the states that a
document can take, it also illustrates how IBM FileNet Content Manager
document life cycles can be extremely useful. A document life cycle allows for the
definition of the states in which a document can be and then can associate that



document with a set of security templates that depend on the state that the
document is in. This controls the access of the document as it progresses from
being a personal document, to a workgroup document, to a corporate document.

IBM FileNet Enterprise Manager enables you to set up life cycles for documents. Document life cycles are contained in two design classes: the life cycle policy class and the life cycle action class:
• Life cycle policy class
Defines the document's states, and also identifies the life cycle action that executes in response to the state changes.
• Life cycle action class
Defines the action that the system performs when a document moves from one state to another.

Document types in the Content Engine have default life cycle policies. You can
also assign a default life cycle policy to any new document class. When you
create a document using a class with an associated life cycle policy, the
document uses it as a default life cycle policy. This can be overridden at creation
time by assigning a different life cycle policy to the document.

A document that has a life cycle policy assigned also receives an additional tab in the document's property sheet. This tab enables promotion, demotion,
resetting, or placing the document in an exception state. Use this method to
change a document state manually when you design and test your life cycle
policies.

Recommendation: Assign life cycle policies to a document class whenever possible, instead of assigning them to individual documents. This practice helps the operator select the correct policy by choosing the document class associated with the desired life cycle policy. This practice also prevents problems that can occur if you need to delete a life cycle policy.
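The state model in Figure 5-9 can be expressed as a small state machine. The sketch below follows the promotion, demotion, and destruction transitions from the figure; the enum and method names are ours, not the Content Engine life cycle API.

```java
// Illustrative only: the three-state life cycle from Figure 5-9 as a small
// state machine. Transition rules follow the figure; the names are invented,
// not the Content Engine life cycle API.
public class LifeCycleDemo {
    public enum State { PERSONAL, WORKGROUP, CORPORATE, DESTROYED }

    public static State promote(State s) {
        switch (s) {
            case PERSONAL:  return State.WORKGROUP;   // share/collaborate
            case WORKGROUP: return State.CORPORATE;   // retain final version
            default: throw new IllegalStateException("cannot promote from " + s);
        }
    }

    public static State demote(State s) {
        if (s == State.CORPORATE) return State.WORKGROUP;  // prepare for revision
        throw new IllegalStateException("cannot demote from " + s);
    }

    public static State destroy(State s) {
        if (s == State.DESTROYED) throw new IllegalStateException("already destroyed");
        return State.DESTROYED;                            // end of life in any live state
    }
}
```

A life cycle policy plays the role of this transition table, and its associated security templates would be swapped in at each state change.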

5.6.11 Events

IBM FileNet Enterprise Manager enables you to define events that extend the functionality of an object store. You can configure objects to perform actions in response to specific activities that occur on objects defined on a Content Engine server.

An event consists of an event action and a subscription. An event action describes the action to take place on an object. A subscription defines the object or class of objects to which the action applies, as well as which events trigger the action to occur.

Recommendation: Add properties to subclass event actions and subscriptions.

Keep event actions short to ensure quick completion. This is especially true for synchronous subscriptions, where the subscription processor waits for an event action to complete before moving on to subsequent processing.

Do not rely on priority to guarantee order of execution for subscriptions.

Make sure that you thoroughly test your events and subscriptions before implementing them.

Set up each event action with code stubs that specify each event trigger (Create, Update, Delete, CheckIn, CheckOut, File Event, Unfile Event), even if you do not define functions for every trigger. The subscription controls which of the triggers call an action. You need to prepare the action to handle all triggers gracefully.
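The stub pattern in the last recommendation can be sketched as follows. The trigger names come from the text above; the dispatch class itself is an illustration, not the Content Engine event framework. Unhandled triggers return a status instead of throwing, so a synchronous subscription is never blocked by a missing handler.

```java
// Illustrative only: an event action skeleton with a stub for every trigger,
// per the recommendation above. Trigger names follow the text; the dispatch
// class is invented, not the Content Engine event framework.
public class EventActionStub {
    public enum Trigger { CREATE, UPDATE, DELETE, CHECKIN, CHECKOUT, FILE, UNFILE }

    // Returns a short status so unhandled triggers fail gracefully instead of
    // throwing, which would stall a synchronous subscription.
    public static String onEvent(Trigger t) {
        switch (t) {
            case CREATE:  return handleCreate();
            case CHECKIN: return handleCheckin();
            default:      return "ignored:" + t;   // graceful no-op stub
        }
    }

    private static String handleCreate()  { return "created"; }
    private static String handleCheckin() { return "checked-in"; }
}
```

Only the triggers the subscription actually routes to this action need real logic; every other branch is a deliberate, harmless no-op.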

5.6.12 Marking sets

Marking sets are intended for records management applications. They allow access to objects to be controlled based on the values of specific properties. The ACL for an object with a marking set is a combination of the settings of its original ACL and the settings of the markings' constraint masks for each marking that is applied to it. The result of this combination is the effective security mask. It is important to note that marking sets are only subtractive in nature; that is, access can only be denied or removed through marking sets.

The general mechanisms of marking sets include:
• A marking set is defined that contains several possible values, called markings.
• Each marking value contains an ACL that defines who can assign that specific value to an object property, who can modify that value, who can remove that specific value, and who will have access to the object to which it is assigned.
• The marking set is assigned to a property definition that is assigned to a class. All instances of that class must have this marking property set to one of the markings defined in the marking set.



• The value for the marking properties can only be assigned by users authorized by the associated marking.
• Markings do not replace conventional access permissions on an object, but rather are coequal with them in determining access rights. In other words, if an object has one or more markings applied to it in addition to one or more permissions in its ACL, access to that object is only granted if it is granted by the permissions and by the markings.

The number and size of markings in a single marking set is limited by available system memory. To perform an access check on a marked object, the entire marking set and all of its markings must be loaded into memory, which is not feasible if there are millions of markings. For this reason, we recommend that you limit the number of markings in a marking set to no more than 100.

Recommendation: Marking sets need to contain no more than 100 markings.
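Because markings are purely subtractive, combining them with an ACL can be modeled with bitmasks. The sketch below is illustrative only; the bit values and method names are invented, not the Content Engine API. For every marking the user does not satisfy, that marking's constraint mask is removed from the ACL-granted mask, so the effective mask can never exceed what the ACL grants.

```java
// Illustrative only: how markings subtract rights from an ACL-granted mask
// to produce the effective security mask. Bit values and names are invented.
public class MarkingDemo {
    public static final int READ = 1, WRITE = 2, DELETE = 4;

    public static int effectiveMask(int aclMask, int[] constraintMasks,
                                    boolean[] userSatisfiesMarking) {
        int mask = aclMask;
        for (int i = 0; i < constraintMasks.length; i++)
            if (!userSatisfiesMarking[i])     // unsatisfied marking: subtract its constrained rights
                mask &= ~constraintMasks[i];
        return mask;                          // never exceeds aclMask: markings only deny
    }
}
```

For example, an ACL granting read, write, and delete combined with one unsatisfied marking whose constraint mask covers delete yields read and write only; if the user satisfies the marking, the full ACL mask survives.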

5.7 Repository content objects

Part of the repository design process also involves how the content will be organized and laid out in the repository; in other words, how to structure the objects that are instantiated from the design classes.

This section touches on points for you to consider when laying out the content in the object store.

5.7.1 Folder objects

Folder objects can participate in both sides of aggregation by reference. Because foldering in IBM FileNet Content Manager is done by reference, and a containable object can be referenced in multiple locations at the same time, folders can be an extremely powerful tool for meeting sophisticated requirements. Take care not to abuse this tool by creating an excessive number of folder objects, either in a very flat structure or a very deep structure. In general, if the search paradigm is followed, folder objects serve as a means of reference aggregation and as an additional layer of security for the aggregated objects.

Recommendation: Try to limit the number of folder objects in the system and
try to avoid using the browse paradigm whenever possible.

5.7.2 Other objects

The design of other objects in the repository needs to take into account how the folder objects are intended to be used and leverage their unique capabilities. Another aspect of repository layout is the storage media that the content will use. Try to provide a range of storage media and use it appropriately.

Recommendation: Try to give meaningful name properties to objects, because this can assist users in navigating through collections of documents returned in a search or in a browsing session.

Try to match content with storage media in a meaningful way. Internal memos and other short-lived pieces of content without much business value can be stored on a simple network-attached storage (NAS) device, while content that is critical to the business operation can utilize a high-speed, highly available storage subsystem that also has a higher cost associated with it.

5.7.3 Instantiation hierarchy

There always needs to be at least one folder under the root where you place objects. Additional foldering structures need to be created to meet specific requirements and need to be described in the repository design documentation. Avoid spontaneous additions of folders, and do not hesitate to instantiate classes that are parents of other classes when they are an appropriate fit for the specific object's needs.




Chapter 6. Security
In this chapter, we discuss security concepts, how they apply to IBM FileNet Content Manager (P8 Content Manager), and how P8 Content Manager can fit into an enterprise security architecture. We introduce the basic concepts and aspects of security. We describe the specific ways in which P8 Content Manager utilizes various mechanisms for security and explain how security can be controlled to provide highly granular and layered security.

We discuss the following topics:
• Security concepts:
  – Facets of security
  – Models
  – Authentication and authorization
• P8 Content Manager security features:
  – Authentication and authorization
  – Object store security
  – Other features
• P8 Content Manager administration
• Java Authentication and Authorization Service (JAAS)
• Product documentation for security

© Copyright IBM Corp. 2008. All rights reserved. 131


6.1 Security concepts
Security is a very broad subject. In this section, we introduce four facets of
security (access, integrity, privacy, and verification), security models used in
securing the IT enterprise (such as Access Control Lists), and authentication and
authorization.

6.1.1 Facets of security

There are four facets of security: access, integrity, privacy, and verification. Responsibility for some of these facets lies almost totally outside the realm of P8 Content Manager, while others are controlled directly by P8 Content Manager. None of these security facets is totally separate from the others, but each facet focuses on solving a specific security concern or issue. As with any enterprise integration, the degree of control over each of these security facets is directly dependent on the underlying enterprise security architecture. We therefore make recommendations that capture a minimum level of security best practices, enabling the security features of P8 Content Manager to be fully utilized.

Access
The access control facet of security concerns controlling the access to the data.
This access control includes controlling viewing the existence of the information,
creating information, viewing the information itself, modifying the information,
and removing the information. The access control facet is focused on being the
gatekeeper, allowing access to the information only for the authorized operators and the authorized processes.

P8 Content Manager, as a Java 2 Platform, Enterprise Edition (J2EE) application, utilizes the standard JAAS security model to provide authentication of operator IDs. The JAAS module in the J2EE container is in turn tied into the corporate security platform, typically through a connection to a directory service. The configuration of the JAAS module is done at installation time, while the security framework supporting JAAS is outside the realm of P8 Content Manager.

Recommendation: There must be an enterprise-wide standard for access security, with an approved JAAS login module for all J2EE applications.
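To make the JAAS flow concrete, here is a self-contained sketch of a LoginContext driven by a programmatic Configuration and a trivial LoginModule. The module name, user, and password are invented for illustration; a real deployment would use the application server's approved login module against the corporate directory service, never hard-coded credentials.

```java
import java.util.*;
import javax.security.auth.Subject;
import javax.security.auth.callback.*;
import javax.security.auth.login.*;
import javax.security.auth.spi.LoginModule;

// Illustrative only: a complete JAAS round trip with an invented LoginModule
// and hard-coded credentials, standing in for a real directory-backed module.
public class JaasDemo {

    public static class DemoLoginModule implements LoginModule {
        private Subject subject;
        private CallbackHandler handler;
        private boolean ok;

        public void initialize(Subject subject, CallbackHandler handler,
                               Map<String, ?> sharedState, Map<String, ?> options) {
            this.subject = subject;
            this.handler = handler;
        }

        public boolean login() throws LoginException {
            NameCallback nc = new NameCallback("user: ");
            PasswordCallback pc = new PasswordCallback("password: ", false);
            try {
                handler.handle(new Callback[] { nc, pc });   // collect credentials
            } catch (Exception e) {
                throw new LoginException(e.toString());
            }
            ok = "admin".equals(nc.getName())
                    && "secret".equals(new String(pc.getPassword()));
            if (!ok) throw new FailedLoginException("bad credentials");
            return true;
        }

        public boolean commit() {
            if (ok) subject.getPrincipals().add(() -> "admin"); // java.security.Principal
            return ok;
        }

        public boolean abort()  { return true; }
        public boolean logout() { subject.getPrincipals().clear(); return true; }
    }

    // Authenticate with a programmatic Configuration naming our module.
    public static Subject login(String user, String pass) throws LoginException {
        Configuration conf = new Configuration() {
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return new AppConfigurationEntry[] { new AppConfigurationEntry(
                        DemoLoginModule.class.getName(),
                        AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                        new HashMap<String, Object>()) };
            }
        };
        CallbackHandler h = callbacks -> {
            for (Callback c : callbacks) {
                if (c instanceof NameCallback) ((NameCallback) c).setName(user);
                if (c instanceof PasswordCallback) ((PasswordCallback) c).setPassword(pass.toCharArray());
            }
        };
        LoginContext ctx = new LoginContext("demo", null, h, conf);
        ctx.login();
        return ctx.getSubject();
    }
}
```

The point of the sketch is the division of labor: the application only knows the LoginContext; the pluggable module behind it, the one an enterprise standard should mandate, can be swapped for LDAP or SSO without touching application code.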

Integrity
The integrity facet concerns controlling the ability to alter the data. This includes
not only modification, but also creation and deletion as well. Integrity of the information goes beyond the basic access control facet and provides a granular
ability to control and audit all changes that are made. This builds on the concept
of access by focusing on the control of changes that are made. This can include
specific controls over access at a highly granular level as well as specific access
permissions for specific operations, such as viewing, creating, modifying, and
deleting information.

P8 Content Manager provides for granular controls that can specify who is
authorized to alter any piece of data that it contains. These controls are provided
through ACLs that are associated with every object in the system. These ACLs
allow individual settings for actions, such as creation, modification, viewing, and deletion, based on an individual security principal.

Privacy
The data privacy facet of security concerns controlling visibility of the content of
the data. Although this appears to be the same as the access facet, this facet is
focused on techniques, such as data encryption and cryptography of the actual
data, as opposed to the access facet, which focuses on accessing the data
without regard to understanding the content or messages contained therein.

The major concern for privacy is the physical security of the hardware and
hardware access points of the information. The hardware must be trusted and
controlled as to who can physically access it. In addition, the network access
points, as well as the network topology, must be considered. Physical security is
outside the control of the P8 Content Manager.

In addition to physical security, there can be another layer of privacy focusing on the data. There are two major divisions of the data privacy facet: static privacy, or privacy of the data while at rest, and dynamic privacy, or privacy of the data while it is moving in the system:
• Static privacy
Static privacy concerns privacy controls of the data while at rest, for example, sitting on a disk, on a specific piece of physical media, or in a specific location. Encryption of the information, or the utilization of a specialized hardware component that encapsulates encryption, is typically used to satisfy static privacy requirements.
Static privacy of the information is outside the purview of P8 Content Manager.
• Dynamic privacy
Dynamic privacy is the privacy of data that is active in the system, either being transferred from one piece of media to another for consumption by a system or individual, or being actively processed in the system. Encryption of information while it travels on the network is the major focus of dynamic privacy.
P8 Content Manager includes Secure Sockets Layer (SSL) encryption of network traffic, which can be set up at installation time to conform to all site security policies as well as to use corporate-specific certificates.

Recommendation: All hardware that hosts P8 Content Manager must be physically accessible only by authorized system administrators. All network traffic across any untrusted network must be encrypted.

Verification
The verification facet of security is concerned not directly with the data, but with
the user, or entity, that is attempting to interact with the data. The aspect of
authenticating, or verifying, the identity under which the process that is trying to
access a piece of data is running, is an example of the concerns of the
verification facet.

There are two major classes into which verification falls: a stand-alone, self-contained verification source, and a trusted federation of identity sources:
• Enterprise security service (single source)
An enterprise security service is an enterprise service that is responsible for authenticating identities. It is usually a single source of authentication information, such as Lightweight Directory Access Protocol (LDAP), administered under the same administrative authority as the rest of the security domain in the enterprise. A corporate-wide directory service is an example of a single source for verification. It is important to note that a corporate-wide security service is only considered secure and valid if it is considered the repository of record for the authentication information.
It is also possible to have multiple sources or copies of the information, established in some synchronized manner that allows a master service with distributed subordinate services as required, possibly in differing formats. In cases where a single source is not technologically compatible with P8 Content Manager, it is acceptable for a bridging technology to provide the information in a format that is consumable by JAAS, as long as it does not cache information but always references back to the single source.
• Security service federation (SSO)
Federation, or single sign-on (SSO), is the conceptual single point for individuals to authenticate themselves and then have the resultant security token recognized across all other locations where the same security is in place and the token is recognized. Federation relies on established circles of trust that allow externally verified identifications to be accepted into the system or enterprise as trusted identities. Security federation technologies, such as Security Assertion Markup Language (SAML), can also provide trusted authentication verification for an individual or process. These technologies enable trusted and secure transport of identity from outside the enterprise or organization that utilizes a repository of record, allowing trusted and vetted external entities to authenticate to the system.

Recommendation: All organizations need to have a single source repository of record for authentication information, and it must be used uniformly across the entire enterprise.

6.1.2 Models
Various facets of security are implemented through different security models.
Where 6.1.1, “Facets of security” on page 132 focuses on security of information
from a system or enterprise view, the models focus on security from the user
perspective. These models represent different ways of approaching security
issues. They are not necessarily mutually exclusive, nor are they intended to be
exhaustively implemented. These models need to encapsulate the security
viewpoints and goals of the security policies of any entity. We also discuss how
each of these models can be applied to, or supported by, the P8 Content
Manager environment.

Access control lists (ACL)

An ACL is a list of specific identities along with their specific authorizations for access and control of the entity to which they are attached. An example of an ACL is the typical permission set on a file in a computer file system. It includes ownership as well as access controls, such as read, write, and execute.

An access control entry (ACE) is a single entry from the ACL that contains a single identity. The ACL is a list that is made up of ACEs.

P8 Content Manager supports the ACL methodology in most objects in its system. Most P8 Content Manager objects have their own independent ACL. Through the associations that objects have with one another, a certain amount of dependent security is also allowed through the ACL mechanism. For example, a content element object can only exist in relationship to a document object. The document object is independently securable, while its various content elements are dependent on the document object to which they are assigned.

Chapter 6. Security 135


It is crucial to understand the ACL and ACE model, how they interact, and how
security identities and principals are obtained in the P8 Content Manager to
successfully design a secure P8 Content Manager solution.

Recommendation: Fully understand the ACL and ACE model and how it is
utilized in P8 Content Manager prior to doing any design work.
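A minimal model of ACL and ACE evaluation can make the interaction concrete. In this sketch, each ACE grants or denies a bitmask of rights to one identity, and denies are applied after allows, a common convention; the actual P8 Content Manager precedence rules (for example, between directly applied and inherited entries) are more nuanced. All names and bit values here are invented.

```java
import java.util.*;

// Illustrative only: minimal ACL/ACE evaluation. Each ACE grants or denies a
// bitmask of rights to one identity; denies win over allows here, a common
// convention (P8's real precedence ordering is more nuanced).
public class AclDemo {
    public static final int READ = 1, WRITE = 2, DELETE = 4;

    public static class Ace {
        final String identity;
        final boolean allow;
        final int mask;
        public Ace(String identity, boolean allow, int mask) {
            this.identity = identity;
            this.allow = allow;
            this.mask = mask;
        }
    }

    // Collect all allows and denies that name the user, then subtract denies.
    public static int accessFor(String user, List<Ace> acl) {
        int granted = 0, denied = 0;
        for (Ace ace : acl) {
            if (!ace.identity.equals(user)) continue;
            if (ace.allow) granted |= ace.mask;
            else           denied  |= ace.mask;
        }
        return granted & ~denied;
    }
}
```

A user granted read and write but explicitly denied write ends up with read only; a user named in no ACE gets nothing, which is the default-deny posture an ACL design should assume.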

Silos
Silos or islands are individual instances of security that are based on
organizational or functional composition of business. Silos are not centralized,
and they are not enterprise-wide. Silos typically represent ad hoc or
departmental security frameworks that are not tied into external resources. This
model allows isolation of security into a smaller than usual domain. It is typically
used during the development and testing of products in order to keep any
security vulnerabilities out of the production environment.

There is no direct correlation between silos and P8 Content Manager; however, we recommend isolating the security domain during development and testing of the P8 Content Manager applications.

Recommendation: Isolate the security domain into its own silo during development and testing. Utilize development and test principals and authentication mechanisms that are separate from the production systems. This safeguards the enterprise in the event of unforeseen interactions or breaches while tuning the security setup of P8 Content Manager.

Chinese Wall
The Chinese Wall model (also known as the Brewer and Nash model1) is a
security model where read/write access to files is governed by membership of
data in conflict-of-interest classes and data sets. This is the basic model used to
provide both privacy and integrity for data. This methodology allows the security
to be driven directly from the classes and data sets of the information,
segregating the accesses based on their content.

P8 Content Manager can support the Chinese Wall methodology through ACL
settings on the data objects contained in the system themselves. The control can
be assigned on a role basis, identifying the conflict of interest data sets at design
time.

1
Dr. David F. C. Brewer and Dr. Michael J. Nash (1989), “The Chinese Wall Security Policy”:
http://www.cs.purdue.edu/homes/ninghui/readings/AccessControl/brewer_nash_89.pdf
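The conflict-of-interest rule from the Brewer and Nash paper can be sketched as a small check: once a user has read one company's data set, other data sets in the same conflict-of-interest class become inaccessible to that user. This is a generic illustration, not FileNet-specific code; in P8 Content Manager the equivalent segregation would be expressed through ACL settings on the objects.

```java
import java.util.*;

// Illustrative only: a minimal Brewer-Nash ("Chinese Wall") check. Data sets
// belong to conflict-of-interest classes; reading one data set walls off the
// rest of its conflict class for that user.
public class ChineseWallDemo {
    private final Map<String, String> conflictClassOf = new HashMap<>(); // data set -> conflict class
    private final Map<String, Set<String>> history = new HashMap<>();    // user -> data sets read

    public void defineDataSet(String dataSet, String conflictClass) {
        conflictClassOf.put(dataSet, conflictClass);
    }

    public boolean canRead(String user, String dataSet) {
        String cls = conflictClassOf.get(dataSet);
        for (String seen : history.getOrDefault(user, Collections.emptySet()))
            if (conflictClassOf.get(seen).equals(cls) && !seen.equals(dataSet))
                return false;   // same conflict class, different data set: the wall applies
        return true;
    }

    public void read(String user, String dataSet) {
        if (!canRead(user, dataSet)) throw new SecurityException("conflict of interest");
        history.computeIfAbsent(user, k -> new HashSet<>()).add(dataSet);
    }
}
```

Note that the decision depends on the user's access history, not on a static permission list, which is what distinguishes this model from a plain ACL.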

Note: Although roles are not supported in IBM FileNet P8, IBM FileNet P8 can
leverage a role-based methodology. See “Role-based access control” on
page 138. Role-based access control is also a very important concept from
the enterprise security level.

Perimeter
The Perimeter security model concerns entrance and exit points around the
data. This model assumes that a perimeter can be visualized around the security
concerns, all entities inside the perimeter are considered trusted and secure,
while all entities outside the perimeter are suspect. This focuses the security
control on all the interaction areas between the trusted and untrusted zones.

An example is a Demilitarized Zone (DMZ), a location in the network where the internal, trusted network overlaps the external, uncontrolled Internet. The DMZ sits between the internal intranet and the external Internet. Services deployed in this zone have visibility to the external network as well as access to the internal network services. The DMZ has tight security that controls the interactions between the zones. It is accessible from outside the perimeter in the untrusted zone and has access to services inside the trusted zone. This provides for secure, auditable, and controlled access across the perimeter.

P8 Content Manager supports perimeter security through the JAAS modules that
control the authentication and verification of entities attempting to access the
system. The controls can be put in the JAAS layer to act as a gatekeeper across
this line. Perimeter controls that have the P8 Content Manager inside the trusted
zone are outside of the control and purview of the P8 Content Manager.

Multi-level
Multi-level security occurs at separate and distinct places in the enterprise where
each location is focused on securing a specific security aspect. This method
allows distribution of the security administration to specific organizational entities
that have the best understanding of their specific security requirements.

P8 Content Manager supports multi-level security by allowing the modification of effective security properties of objects at all the levels of the object hierarchy.
This method allows different levels of the hierarchy to encapsulate the specific
security concerns for a specific entity.

Layered security
Layered security applies protection at a number of layers that surround the central core element. Similar to multi-level security, a layered model accumulates the security from all the surrounding layers into a single security setting for the data at access time. The major distinction is that in a layered model, every layer serves to augment security, whereas in a multi-level model, each level can modify security, potentially reducing it by design.

P8 Content Manager supports layered security in a similar manner to multi-level security. The distinction between these types of security is a design
consideration that must be enforced by the implementers of the design objects in
the system.

Role-based access control


Role-based access security is access to data based not on individual identity, but
based on the role that is assigned to a principal. Multiple roles can be assigned
to an individual as appropriate, as well as multiple individuals can be assigned
the same role. This creates a set of relationships between identities and roles in
some set of n:n, 1:n, n:1 or 1:1 relationships. All security is then based on the
assigned roles to verified identities as opposed to the identities themselves.

This method allows an abstraction of security roles away from the concrete
identity of a specific individual. This model allows a great flexibility in the
assignment of principal identities in a dynamic manner while retaining a relatively
static assignment of roles from the application perspective. This is a powerful
model to decouple the direct dependence of a certain application on the
underlying authentication mechanism.

From the perspective of the application, P8 Content Manager in our case, this is
indistinguishable from identity-based control as the translation from individual
principal identity and role is accomplished in the security level directly.

Although P8 Content Manager does not contain a role concept, this can be
modeled for P8 Content Manager by utilizing the corporate directory service and
assigning groups based on a role perspective. Having groups in the corporate
directory that represent roles provides that abstract layer benefit of role-based
access control.
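The group-as-role convention above can be sketched in a short, language-neutral model. This is an illustrative sketch only, not the P8 API; the user names, group names, and rights are hypothetical:

```python
# Hypothetical directory data: users mapped to role-style groups.
ROLE_GROUPS = {
    "alice": {"role-loan-officers", "role-auditors"},
    "bob": {"role-loan-officers"},
}

# An ACL that grants rights to role groups instead of individual identities.
DOC_ACL = {
    "role-loan-officers": {"VIEW", "MODIFY"},
    "role-auditors": {"VIEW"},
}

def effective_rights(user: str) -> set:
    """Union of the rights granted to every role group the user belongs to."""
    rights = set()
    for group in ROLE_GROUPS.get(user, set()):
        rights |= DOC_ACL.get(group, set())
    return rights

print(sorted(effective_rights("alice")))  # ['MODIFY', 'VIEW']
```

Because the ACL references only role groups, reassigning an individual to a different role is a directory change; no ACLs need to be edited.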

State transition control


This model allows security to be controlled by the current state that the
information is in, transitioning security levels as it transitions state. This can be
security of an object as it goes through various states or a process external to the
data that changes the security to that data based on its external state. An
example might be a document that changes security as it is produced, reviewed,
edited, and published.

P8 Content Manager provides direct support of state transition control through the life cycle management feature. This allows life cycles to be defined, along with specific states, where each life cycle has its own specific set of security controls. Objects assigned to the life cycle can then progress through the state
transitions and have their security aspects changed as appropriate.

Object aspect control


This model is similar in certain respects to the Chinese Wall model. The
difference is that there is no attempt to group objects into distinct
conflict-of-interest groupings, but to allow each object’s specific accesses to be
based on an aspect of the individual data, typically in combination with external
security information.

P8 Content Manager supports the object aspect model directly through the
facility of marking sets, therefore, allowing specific aspects of an object to be
directly responsible for their access control. We explain object aspect control
more completely in the P8 Content Manager security features section.

Access control matrix (ACM)


An ACM is similar to an ACL, but it adds an additional degree of freedom, or axis, that considers both identity and role to determine which ACE is relevant for a given invocation. A traditional ACL model contains entries that assign rights based on the identity of the principal. The ACM model has identity entries, each of which consists of a secondary table of ACEs based on the individual's role. This allows a differentiation in security based not only on the identity of an individual, but also on the role that the individual is acting under at any given time.

The ACM model provides a complex relationship that allows every possible
situation to be individually modeled. It is seldom used in practice due to the
overhead and complexity involved. P8 Content Manager can simulate certain
aspects of ACM through the Workplace interface and its implementation of roles.
The role-based security provided is outside of the scope of this document, and
therefore, we do not discuss it. The ACL support of P8 Content Manager, as
previously discussed, is the support for ACM provided at this level.

6.1.3 Authentication and authorization


Authentication is one expression of the verification facet of security. It concerns
taking a security assertion, which contains a principal identity and a set of
tokens, and using a security service to inspect all aspects of the assertion and
either accept or reject it. Authorization is then taking that authenticated identity
and determining if that specific identity has authorization to take a specific action
at a certain time. To be precise, authorization determines a set of access rights
(also referred to as permissions) that are granted to the authenticated identity in
respect of a particular object. Each access right granted confers authorization to
perform a particular action or type of action on the object in question.

Authentication always occurs first and is always followed by authorization for
specific operations. Individuals are authenticated in P8 Content Manager based
on its installation and configuration. The assigned security principal corresponds
to a particular individual context in the associated authentication system through
the JAAS mechanism, which verifies that users are indeed who they say they are
in a trusted manner. After this, the security principal assigned is utilized for each
individual operation to determine the rights of that identity as to whether they are
authorized to perform the requested operation, as defined in the ACLs
associated with the objects in the operation. The ACL on each securable object
will determine what actions the individual can perform on the specific object
based on their identity.

Users and groups in the security system can be authenticated by that system,
and gain access to a P8 Content Manager system in their domain, but there are
no inherent permissions with this authentication. All of the authorization
permissions exist in the P8 Content Manager-specific security contexts that have
been assigned. Individual accounts are typically mapped into role-based groups,
and then these defined roles are utilized in the system to set all the ACL
permissions as appropriate. Because the authorization mechanism is inside the
control of P8 Content Manager, it is possible for a superuser of the system to
have no rights or privileges in the P8 Content Manager.

6.2 P8 Content Manager security features


Specific features and facilities in the P8 Content Manager support various
security facets, models, and levels of security as previously described. The basic
mechanisms for authentication and authorization that are implemented in P8
Content Manager provide the building blocks for all the implementation of the
various security models or methodologies. Several additional features of P8
Content Manager work with these basic building blocks to extend the power and
facility provided into a rich, highly granular security environment.

6.2.1 Authentication
Authentication of individuals, or ideally of the roles that an individual has, through
the external authentication mechanism is key to the security features in P8
Content Manager. The two standards at the core of the authentication process in
P8 Content Manager are the Java Authentication and Authorization Service
(JAAS) standard and the Web Services Security standard (WS-Security). The
JAAS standard forms the framework for security interoperability in the J2EE
world, while the WS-Security standard forms the framework for security

interoperability in the heterogeneous world of clients and servers that
communicate through Web services interfaces.

Companies rely on a wide variety of authentication technologies to secure their corporate intranets. By implementing and adhering to these standards, P8
Content Manager enables a wide range of authentication integrations.

6.2.2 LDAP users and groups


P8 Content Manager depends on an established directory service for
establishing identities. The directory entries need to model various individuals,
groups, and roles for the organization. The associations with groups of identities
and the hierarchical interrelationships among various identities
must be well understood, and conventions for these associations and
interrelationships must be established. The discussion and recommendations for
setting up and maintaining this directory structure is outside the scope of this
book, as well as outside the purview of P8 Content Manager.

In addition to the LDAP authentication mechanism, there are two special accounts that are maintained internally in the P8 Content Manager engine: the
#AUTHENTICATED-USERS logical group and the CREATOR-OWNER logical
user. These logical identities provide simplification of the management of
security when utilized in the various ACLs.

The CREATOR-OWNER identity takes on the principal identity of the user who
creates an object at creation time. This identity is typically used in the default
instance permissions and is replaced by the actual creator’s identity when the
default permissions are transferred into the object instance at creation time. This
identity is typically granted full rights to an object, because it has total control
over its modification and destruction, which allows designers to model security
that will be given to the person who actually creates an object at run time.

The other logical identity internally maintained by P8 Content Manager is the logical group #AUTHENTICATED-USERS. Every authenticated user is treated
as a member of this group. This allows an equivalent to the
“everyone” concept, making it easy to have broad permissions for both allow and
deny to be made for all potential users. Because this is a logical group, as the
identity pool changes, the definition of this group changes as well, to encompass
the entirety of the identity pool as it exists at any point in time.

6.2.3 Authorization
When an individual, who has already been authenticated, attempts to access
IBM FileNet P8 objects, Content Engine will attempt to retrieve that individual’s user and group memberships from the directory service provider. The user or
group will then be authorized to perform actions described by the access rights
placed on the objects via the ACL.

The ACL on a specific object has a number of entries or ACEs. Each ACE either
allows or denies a specific right or a set of rights to a specific identity. For
example, a particular class of documents can allow one identity to modify a
document and at the same time deny a second identity the same right. It is
important to note that deny always takes precedence over allow within the same hierarchical level (directly applied ACEs take precedence over inherited ACEs), which means
that you must set up ACLs carefully. If an individual is allowed access to a
document under one identity but belongs to a group identity that is denied
access, the individual will not have access to the object.

Every ACE has a source, which you can view in the IBM FileNet Enterprise
Manager’s security editor:
- Default Permissions are placed on an object by the Default Instance Security ACL of its class, as well as permissions placed on a subclass by its parent class. Default permissions are copied from the class definition to the object’s ACL at creation time and are treated identically to Direct ACEs. Default ACEs are directly editable; if you edit a Default ACE, its source type becomes Direct.
- Direct Permissions are added directly to an object. Direct ACEs are directly editable.
- Inherited Permissions are placed on the object by a security parent or by setting up a relationship with an object-valued property whose Security Proxy Type has been set to Inherited. Inherited ACEs are not directly editable.
- Template Permissions are placed on the object by a security policy or document life cycle policy. Template ACEs are not directly editable and do not appear on classes. Rather, a document, folder, or custom object class might have a default security policy that will pass template ACEs to the instances of the class, if all the conditions for the template apply.

Each ACE has one access type: either Allow or Deny. When evaluating the
access granted by a particular ACL, the current system applies ACEs in the
following order of precedence (higher in the list takes precedence over lower):
- Direct/Default Deny
- Direct/Default Allow
- Template Deny
- Template Allow
- Inherited Deny
- Inherited Allow

You cannot remove or change an inherited access right, but you can override one
by directly allowing or denying an access right. To edit an inherited access right,
the administrator must modify the parent that is the source of the inherited
access right.

Because Deny has precedence over Allow within each category (for example, a
Template Deny takes precedence over a Template Allow), if you explicitly deny an
access right to a group and explicitly allow it to a member of that group, the
access right will be denied to the member.

Thus, if an ACL contains two ACEs that are identical in every respect except that one is an Inherited Deny and the other a Direct Allow, the Direct Allow takes precedence, with the result that the user is granted the access right.
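That precedence order can be modeled in a few lines. This is an illustrative sketch, not the P8 API; it simply walks the six precedence levels described above and treats the first matching ACE as decisive, with an implicit deny when nothing matches:

```python
# Precedence order from highest to lowest, as listed in the text.
PRECEDENCE = ["direct_deny", "direct_allow", "template_deny",
              "template_allow", "inherited_deny", "inherited_allow"]

def is_allowed(aces, right):
    """aces: list of (source_and_type, right) tuples. The first ACE found
    in precedence order decides; no matching ACE means implicit deny."""
    for level in PRECEDENCE:
        for source_type, r in aces:
            if source_type == level and r == right:
                return level.endswith("allow")
    return False  # all rights are implicitly denied

# Inherited Deny vs. Direct Allow for the same right: Direct Allow wins.
acl = [("inherited_deny", "DELETE"), ("direct_allow", "DELETE")]
print(is_allowed(acl, "DELETE"))  # True
```

Within a single level, a deny wins over an allow because the deny level is checked first, matching the rule that a Template Deny overrides a Template Allow.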

Because objects in P8 Content Manager are designed in a hierarchical, object-oriented manner, the same hierarchical relationship exists with the ACLs on those objects. The ACL permissions can be inheritable as well. P8 Content
Manager lets you configure whether an ACL is inherited only by objects that are
immediate children of the object in which they are defined, inherited by all child
descendents of the object, or that they are not inherited.

Access rights are assigned within P8 Content Manager to identities that are
defined in the directory service. Thus, a directory service user who might, in
other contexts, be considered a superuser, might not have any rights within P8
Content Manager, depending on how the IBM FileNet P8 administrator chooses
to assign access rights to the user.

Each ACE that is present on an object’s ACL either allows or denies the right to
do certain things. For example, a particular class of documents can allow one
user to delete a document but deny another user the same right. Following
standard practice, deny always takes precedence over allow, which means you
must set up ACLs carefully. An ACE can allow or deny rights to either a user or a
group, which is referred to as the grantee. If the grantee is a group, the ACE
applies to anyone who is a member of that group. Thus, if one user is allowed
access to a document as a user but belongs to a group that is denied access, the
user will not have access to the object.

6.2.4 Object store security


Access checking for an object, such as a document or folder, contained within an
object store takes place in two steps or, to put it another way, there are two gates
through which the object must pass before the requested operation is allowed.

Object store gate
In this step, the ACL of the object store is evaluated to determine a set of access
rights that apply to all objects in that store, conveying permission to retrieve
(view), create, modify, or delete objects. The rights will typically be granted to
#AUTHENTICATED-USERS, but the administrator can exercise more precise control by using specific groups or user identities in place of #AUTHENTICATED-USERS.

Note that permission granted by the object store ACL is necessary, but not
sufficient. The object must pass through both gates for the operation to be
allowed.

Object gate
The object gate evaluates the rights granted for the specific object that is the
target of the requested operation and either permits or disallows the operation
depending on whether those rights include those particular to that operation. For
example, a delete operation will be permitted or disallowed by the DELETE
access right.

Evaluation of the rights granted to the object is also in two steps. First, the
object’s own ACL is evaluated, yielding a provisional set of granted rights. Then,
a check is made of the markings applied to the object, if any, which might result
in some rights being removed from the provisional list. (Markings only remove
rights, they do not add them. Later, we discuss more details about markings).

Evaluation of the object’s ACL follows the order of precedence described earlier.
All rights are implicitly denied unless there is an explicit allow ACE that is not
overridden by a higher precedence deny.
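The two-gate evaluation can be summarized in a small sketch. This is an illustrative model, not the P8 API; rights and masks are represented as plain sets:

```python
def access_check(store_rights, object_rights, constraint_masks, right):
    """Gate 1: the object store ACL must grant the right.
    Gate 2: the object's own ACL yields a provisional set of rights,
    from which applicable marking constraint masks remove rights."""
    if right not in store_rights:          # gate 1: object store ACL
        return False
    provisional = set(object_rights)       # gate 2: object's own ACL
    for mask in constraint_masks:          # markings only remove rights
        provisional -= mask
    return right in provisional

store = {"VIEW", "MODIFY", "DELETE"}
obj = {"VIEW", "DELETE"}
# The user lacks "use" permission on a marking whose mask removes DELETE:
print(access_check(store, obj, [{"DELETE"}], "DELETE"))  # False
print(access_check(store, obj, [{"DELETE"}], "VIEW"))    # True
```

Note that passing gate 1 is necessary but not sufficient: the requested right must survive both the object's ACL and every applicable constraint mask.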

Because the object’s ACL is formed from multiple contributing sources, as described previously, a variety of models, used singly or in combination, are
possible for establishing the security for an object.

Inheritance model
This model allows the security of an object to be determined based on placement
in a containment hierarchy or, more generally, in any subordinate relationship to
another object, inheriting its security from the object. The mere placement of an
object in a container does not automatically activate inheritance; it must be set by
identifying the container as a security parent. In this case, the ACL is then
checked as part of this authorization process, which allows folders to have
specific ACL settings that can supersede or augment security on objects which
they contain.

Life cycle model
This model allows the security of an object to be determined based on transitions
through states in a life cycle, either the implicit life cycle of the versioning model,
an explicit life cycle defined through a document life cycle policy, or
application-defined state transitions. In all these cases, the security effect of the
state transition involves the addition and removal of ACEs with a source
Template from the object’s ACL. Versioning and application-defined state
transitions enact these changes through a Security Policy. Document life cycle
state transitions enact these changes through security templates attached to the
state definitions within the document life cycle policy.

Document life cycle actions and life cycle policies have the following security characteristics:
- Both are instances of their own classes: the Document Lifecycle Action class and the Document Lifecycle Policy class. Therefore, they obtain initial security from the Default Instance Security ACL of their class, just as all objects do when first created. Both classes are subclassable. You can view and modify these classes under IBM FileNet Enterprise Manager’s Other classes node.
- Life cycle actions and policies are independently securable. They are not required to have the same security as the security placed on the document class (or individual document) to which they are attached. The security model is obviously much simpler if they do, but it can be configured differently if required by the needs of the application’s security.
- Document life cycle actions and policies do not have a security parent relationship with any other object. Specifically, a life cycle policy does not have a security inheritance relationship with the document class to which it is associated.
- In IBM FileNet Enterprise Manager, individual life cycle actions and life cycle policies are displayed in subfolders of the Object Stores → object store name → Document Lifecycles node. If these folders are empty, none have been created for that object store. Both objects have property sheets containing a Security tab, which allows you to view and modify the security on that object.
- Like other objects, life cycle actions and life cycle policies have an owner property. The owner does not need to be the same as the owner of the document with which the life cycle policy is associated.

Default instance security


Each object is given an initial set of ACEs at the time that it is created,
determined from administratively controlled default instance permissions
assigned to the class definition for the class to which the object belongs. This

Chapter 6. Security 145


provides the ability to establish a uniform security model for all objects of the
same class.

Note that the default ACEs placed into the object’s ACL are copies from the class
definition. Subsequent changes to the class definition do not take effect on
existing instances of that class.
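The copy-at-creation semantics, together with the CREATOR-OWNER substitution described earlier, can be sketched as follows. This is an illustrative model, not the P8 API; the grantee names and rights are hypothetical:

```python
# Hypothetical class definition: default instance permissions.
class_defaults = {
    "CREATOR-OWNER": {"VIEW", "MODIFY", "DELETE"},
    "#AUTHENTICATED-USERS": {"VIEW"},
}

def create_object(defaults, creator):
    """Copy (not share) the class defaults into a new object's ACL,
    substituting the actual creator for the CREATOR-OWNER placeholder."""
    acl = {}
    for grantee, rights in defaults.items():
        key = creator if grantee == "CREATOR-OWNER" else grantee
        acl[key] = set(rights)  # a copy, so later class edits have no effect
    return acl

doc = create_object(class_defaults, "alice")
print("alice" in doc)  # True: CREATOR-OWNER replaced at creation time

# Changing the class definition later does not touch existing instances:
class_defaults["#AUTHENTICATED-USERS"].add("DELETE")
print("DELETE" in doc["#AUTHENTICATED-USERS"])  # False
```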

Explicit object security


ACEs can be added explicitly to the ACL of an individual object, combining with
and potentially overriding the effect of the previously described models.

Marking sets
Marking sets are intended to provide a mechanism for supporting records
management from the content inside of P8 Content Manager. Marking sets allow
specific metadata attributes to be identified that control the effective access of an
object in conjunction with the object’s ACL. Due to the specific design intentions
of marking sets, use this security mechanism only if no other feature can provide the model that you are seeking. When using marking sets, exercise caution and make sure that you thoroughly understand the specifics of how marking sets operate. Marking sets are used in the IBM FileNet Records Manager extension to P8 Content Manager.

A marking modifies the access granted to an object (by the object’s ACL) based
on a specific property value. The marking set definition assigns a constraint
mask to each possible value of the marking-controlled property and assigns
rights to “use” each value to individual users or groups. If a user attempts to
access an object marked with a value to which that user does not have “use”
permission, the access rights represented by the constraint mask for that value
are removed from the provisional set of rights determined by evaluating the
object’s ACL. (If “use” permission is granted, nothing happens; no rights are
added).

You can have multiple properties assigned to a single class with associated
marking sets, and they will all be used to determine the final access to the object.
The collection of all markings that are actually applied to a particular object is
displayed by the IBM FileNet Enterprise Manager as the object’s “active
markings”.
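The marking evaluation described above reduces to a simple rule. The following is an illustrative sketch, not the P8 API; the marking value and user names are hypothetical:

```python
def apply_markings(provisional, markings, user):
    """For each marking value applied to the object: if the user lacks
    "use" permission for that value, subtract the value's constraint
    mask from the rights granted by the object's ACL."""
    rights = set(provisional)
    for value in markings:
        if user not in value["use"]:       # no "use" permission ...
            rights -= value["constraint"]  # ... so the mask removes rights
        # if "use" is granted, nothing happens; no rights are added
    return rights

# Hypothetical marking value: only alice may "use" it; its constraint
# mask withholds VIEW and MODIFY from everyone else.
secret = {"use": {"alice"}, "constraint": {"VIEW", "MODIFY"}}
print(sorted(apply_markings({"VIEW", "MODIFY", "DELETE"}, [secret], "bob")))
# ['DELETE']
```

With multiple marking-controlled properties on a class, every applicable mask is subtracted in turn, which is why markings can only narrow, never widen, access.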

6.2.5 Security policies


A security policy serves as a collection of security templates, each of which
contains a predefined list of permissions, or Access Control Entries (ACEs), that
can be configured to apply to a document, custom object, or folder.

Except where specifically mentioned, this topic describes the association of
security policies with documents and document classes. To fully understand this
topic, you must be familiar with document versioning and the versioning states
Released, In Process, Reservation, and Superseded.

Note: Security policies are just one way to apply ACEs to an object’s ACL.
The other sources are the object’s class, a security parent, direct edits to the
object’s security, and by programmatically setting the object’s access rights.

Security policies allow system administrators to apply access control to large numbers of documents without directly editing the ACL on each document.

Security policies, in conjunction with versioning states, allow a system administrator to configure the system to automatically modify ACLs on
documents when their versioning state changes. For example, the administrator
can configure the system to automatically grant access to a document to a wide
audience when it is released.

Sequence tables detail the versioning states through which documents proceed
following check in, check out, and other versioning actions.

To create a security policy, you run one of the security policy wizards provided by
IBM FileNet Enterprise Manager and by Workplace and Workplace XT. The
wizard creates a security policy that the system administrator can then customize
by adding security templates. When created, the security policy can be
associated with documents, folders, or custom objects. Alternatively, the
administrator can make the security policy the default value for the security policy
property for one or more classes. Making a specific security policy the default for
a class ensures that all instances of the class are associated with that security
policy unless the value is explicitly overridden.

The security policy class can have subclasses, just like the other classes in IBM
FileNet Enterprise Manager’s Other Classes node. The security policy wizard
lets you create a security policy using a subclass, whereas IBM FileNet
Enterprise Manager’s wizard supports only the base class. You can also use one
of the supported IBM FileNet P8 APIs to create a security policy using a
subclass.

There are two kinds of security templates:
- Versioning security templates automatically update the permissions on documents as their versioning state changes to one of the four possible document versioning states: Reservation, In Process, Released, and Superseded, for which there are four corresponding versioning security templates.
- Application security templates can be configured to apply a list of permissions to a document, custom object, or folder according to logic programmed into an application using IBM FileNet P8 APIs.

A security template applies to a document version if (1) the document version has an associated security policy, and (2) the associated security policy has a
template for the document’s version state. For example, if the document goes
into the Released versioning state, and the security policy has a Released
template, the permissions listed in the Released template apply.

Templates cannot be shared between security policies and cannot be independently loaded or saved apart from their security policy. Permissions on
an object that originate from a security policy will appear on the object’s ACL with
a Source type of Template. And, they cannot be directly edited using IBM FileNet
Enterprise Manager’s Access Control Editor or by using the IBM FileNet P8 API.

Newly created security templates contain no default permissions that have been
placed on them by the Content Engine. Administrators can add permissions at
creation time while running the security policy wizard, or at any later time.
Applying a template that contains no permissions to an object will have the effect
of removing any existing permissions on that object that were previously applied
by a security policy.

Security policies can become associated with documents in two ways:
- By assigning a security policy as the default for a document class
- By assigning a security policy to a specific document version

In the first case, the default security policy is automatically associated with the
object instance at the time of creation unless the default is explicitly overridden.
The default security policy will continue to be associated with all versions in the
document’s version series, unless you do something to change the association.
By having the same security policy for all documents in a class, you have a
simple, easily understandable and manageable security scheme. If, however,
you change a single document version’s class, the default security policy of the
new class (provided there is one) is immediately applied to that document
version, and the old security policy (if there was one) is removed. However,
changing a version’s class does not override a security policy that was directly
assigned to that version by a user, nor does it change any earlier versions of the
same document.

In the second case, you assign a security policy to a specific document version.
Each document version in a version series can, theoretically, have a different
security policy assigned to it. The default security policy of the document class
will be placed on each instance of the class, but you can override the default with
a different security policy. You do this manually by using IBM FileNet Enterprise Manager to open the document version’s property sheet and changing the
security policy. This is cumbersome and difficult to manage from a system
administrator’s point of view and must be done only as an exception to the
normal application by the document class.

In addition to the list of security templates associated with it, each security policy
has an important property called Preserve Direct ACEs (also called Preserve
Direct Permissions). This property, which can be set to either True or False,
governs whether or not direct permissions are preserved in the target object’s
ACL when a security template is applied to it. The value of this property applies
to all the templates contained by the security policy.

By default, this property is set to True, because this is likely to be the most
common use case. In fact, the security policy wizard does not ask you to set a
value and just sets it to True. After you have created the security policy, you can
open its property sheet’s General tab to view or change the Preserve Direct
ACEs setting.
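The effect of Preserve Direct ACEs when a template is applied can be modeled briefly. This is an illustrative sketch, not the P8 API; ACEs are represented as simple dictionaries with a source field:

```python
def apply_template(acl, template, preserve_direct=True):
    """Applying a template replaces any existing Template-source ACEs
    with the template's ACEs. Direct ACEs survive only when
    preserve_direct is True (the default, as in the product)."""
    kept = [ace for ace in acl
            if ace["source"] != "Template" and
               (preserve_direct or ace["source"] != "Direct")]
    return kept + [dict(ace, source="Template") for ace in template]

acl = [{"grantee": "alice", "source": "Direct"},
       {"grantee": "old-team", "source": "Template"}]
new_template = [{"grantee": "everyone"}]

result = apply_template(acl, new_template, preserve_direct=False)
print([a["grantee"] for a in result])  # ['everyone']
```

The model also captures the earlier point that applying an empty template strips any permissions previously applied by a security policy, because old Template-source ACEs are always replaced.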

6.3 P8 Content Manager administration support


Control of the various aspects of security and the components in P8 Content
Manager are provided through many of the available interfaces. Specific support
for the administration of the security aspects, not intended for the normal user,
include facilities for addressing, auditing, and monitoring security in the P8
Content Manager system. The main administrative tool is IBM FileNet Enterprise Manager. It has an ACL wizard for all the objects that support
ACLs. The ACL settings can also include the ability to modify the security
settings of an object, and this authority can be passed to any role including the
user.

IBM FileNet Enterprise Manager is the tool that system administrators use in
their daily work. IBM FileNet Enterprise Manager gives system administrators
easy access to most of the administrative and security features needed for
Content Engine security configuration tasks.

Security auditing can be turned on for any class and for any operation that can
be authorized for any object. Turning on security auditing adds an audit log to the
object for which it has been activated. It is important to note that security
auditing, especially if activated for frequent actions, such as view or search, can
be very resource intensive from both a storage and a processor perspective. You
must exercise care when activating any auditing
feature. When these types of changes are made, we highly recommend formally
logging what has specifically been activated, so that the auditing features can be
correctly and completely deactivated when the time comes.

Although most security changes beyond the design phase are normally atomic in
nature, explicit security settings are sometimes changed in bulk. IBM FileNet
Enterprise Manager provides a batch update operation, driven by a search, that
allows bulk changes to items such as specific roles or identities.

6.4 JAAS overview


P8 Content Manager relies on Java Authentication and Authorization Service
(JAAS) for authentication. JAAS is a standard J2EE service component.
Figure 6-1 shows the JAAS authentication process.

Figure 6-1 JAAS authentication (login)

The user presents a set of credentials to the security service. These credentials
are processed through the JAAS login module, which references the directory
service as well as a secure credential store. The login module, upon
authentication of the user credentials, creates a JAAS subject containing the
identities of the user as given in the directory server. This JAAS subject is then
returned to the user’s session and is then available to be utilized for authorization
of operations as they are requested. The entire process of subject creation is
dependent on the J2EE architecture and on the J2EE container service, such as
WebSphere, on which P8 Content Manager is installed.

After a user has a JAAS subject, the JAAS subject is then utilized for the
authorization of specific operations as shown in Figure 6-2, which illustrates the
JAAS authorization process. The application server infrastructure automatically
makes the JAAS subject available to the P8 Content Manager server. Note that
the subject is not always passed directly over the wire; the mechanism depends
on the application server implementation. WebLogic might serialize the subject,
but WebSphere uses a security token to propagate and revive the JAAS subject
on the other end.

Figure 6-2 JAAS authorization process

The identities that are encapsulated in the JAAS subject are from the directory
service and are the same set of identities that is utilized in P8 Content Manager
ACL settings. The authorization process takes the identity from the JAAS subject
(along with group membership information for that identity that is obtained from
the directory) and uses that combined information for comparison against
applicable objects’ ACLs.
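
In rough outline, that comparison can be sketched using only standard JDK classes (the ACL here is just a set of granted identity names, not the real P8 ACL structures, and all names are invented):

```java
import java.security.Principal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.security.auth.Subject;
import javax.security.auth.x500.X500Principal;

// Toy authorization check: does any identity carried by the JAAS subject
// appear in the set of identities that an object's ACL grants access to?
class SubjectAclCheck {

    static boolean isAuthorized(Subject subject, Set<String> grantedNames) {
        for (Principal p : subject.getPrincipals()) {
            if (grantedNames.contains(p.getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In a real deployment, the JAAS login module builds this subject
        // from the user's entry (and group memberships) in the directory.
        Subject subject = new Subject();
        subject.getPrincipals().add(new X500Principal("CN=alice"));
        subject.getPrincipals().add(new X500Principal("CN=engineering"));

        Set<String> acl = new HashSet<String>(
                Arrays.asList("CN=engineering", "CN=admins"));
        System.out.println("authorized: " + isAuthorized(subject, acl));
    }
}
```

The real comparison is richer (allow and deny entries, inheritance, access masks), but the principle is the same: the subject's identities are matched against the identities named in the ACL.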

Although the basic concepts of authentication and authorization, as well as the
ACL and ACE models, are fairly straightforward, it is the specific usage of these
ACLs, and the relationships inherent in the design of the solution, that allows
complex and highly granular security to be enforced across the entire P8
Content Manager system. The ACLs become a powerful tool that must be
considered at every level of design and administration to assure that you adhere
to the intended model.

6.5 Product documentation for security


You can obtain product and technical documentation for the IBM FileNet P8
Platform at the following Web site:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

For exhaustive and detailed coverage of security for the IBM FileNet P8 Platform,
including the P8 Content Manager and its specifics, see IBM FileNet P8 Security
Help, GC31-5524, which is a compilation of the detailed installation and
configuration information for IBM FileNet P8 security from ecm_help. You can
download it from the previous Web site or directly from the following URL:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/40x/FileNet_P8_security.pdf


Chapter 7. Application design


In this chapter, we discuss useful principles for designing IBM FileNet Content
Manager (P8 Content Manager) applications.

We discuss the following topics:


- IBM FileNet P8 applications
- Application technologies
- Traditional Java thick clients
- Principles for application design



7.1 IBM FileNet P8 applications
P8 Content Manager includes a number of standard applications, and many
more applications are available as add-ons to the basic product. The applications
are aimed at different audiences and use cases. In this section, we introduce
several of the applications to serve as examples for application development.

7.1.1 IBM FileNet Enterprise Manager


IBM FileNet Enterprise Manager is a powerful tool for administrators to use in
performing routine setup, maintenance, and specialized tasks. In fact, the IBM
FileNet P8 Platform Version 4.0 Installation and Upgrade Guide, GC31-5488,
directs you to use IBM FileNet Enterprise Manager for several key steps in
completing the initial installation of P8 Content Manager. This guide can be
downloaded from:
http://www-1.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

IBM FileNet Enterprise Manager is implemented as a thick client application
using the Content Engine .NET API.

Because it is an administrator’s tool that can be used for doing extraordinary and
powerful low-level changes, IBM FileNet Enterprise Manager strikes a balance. It
exposes low-level details of the P8 Content Manager, yet it remains usable
through extensive task wizards and other user interface help.

There are many, many things for which IBM FileNet Enterprise Manager can be
used, but here are just a few of the things for which you are most likely to use it:
- Creating an object store, storage area, or other infrastructure object
- Creating or changing a marking set
- Adding classes and properties to an object store
- Creating and running queries for routine maintenance and troubleshooting
- Adjusting Content Engine server logging levels
- Managing subscriptions and events
- Browsing the contents of an object store
- Exporting and importing objects from and to object stores

Although IBM FileNet Enterprise Manager is an administrator’s tool, it uses the
normal Content Engine APIs, and you are subject to normal security access
checks. The administrator running IBM FileNet Enterprise Manager typically has
a high level of security access, but IBM FileNet Enterprise Manager does not and
cannot provide any additional privileges.

7.1.2 Workplace and Workplace XT
In contrast to IBM FileNet Enterprise Manager (see 7.1.1, “IBM FileNet
Enterprise Manager” on page 154), Workplace and Workplace XT are intended
for the wider audience of non-administrator users. Even though they are generic
in nature, they still provide a comfortable and productive user interface for
accomplishing a variety of everyday tasks. Workplace and Workplace XT are
Web applications. The user interface for these applications uses emerging Web
2.0 and Ajax technologies to closely model a desktop application experience. It
provides easy-to-use windows and wizards for navigating and searching for
documents and folders. It also provides integration with the Process Engine
through an inbox for workflow tasks and the ability to manage and launch
workflows.

Note: Technically, Workplace refers to the older version of the Web
application. Workplace XT is the name for the new version of the Web
application. The new version has a better user interface and is easier to use
than the older one.

In this book, for simplification, when we use the term Workplace, it is
applicable to both the new version Workplace XT and the older version
Workplace.

7.1.3 Designer applets


Workplace provides tools with advanced capabilities that are not of interest to all
users. In this context, the advanced tools are used for the creation of special
infrastructure objects, which are likely to then be used by a wider set of users.
These are implemented as Java applets and are launched from the Workplace
tools menu. (Not all of the tools are visible to all users, because Workplace has
an access roles feature and intentionally hides certain features if you do not have
access to them.) There are several tools, including:
- Search Designer, which is used to create or edit Stored Searches and Search
  Templates
- Process Designer, which is used to create or edit workflow definitions
- Process Simulation Designer, which is used to create or edit definitions for
  use with IBM FileNet Process Simulator



7.1.4 Application Integration for Microsoft Office
Workplace also provides a downloadable installer for Application Integration for
Microsoft Office. When installed and configured, it adds menu items to office
applications so that you can directly open or add documents to a P8 Content
Manager repository without having to run an external application.

Application Integration is implemented using COM objects meeting interfaces
specified by Microsoft. Communication with the Content Engine is via a custom
HTTP and XML request/response protocol. The requests are serviced by a
Workplace servlet and requests are translated into Content Engine Java API
calls.

7.1.5 IBM FileNet Business Process Framework


As its name implies, IBM FileNet Business Process Framework (BPF) provides a
framework for the rapid development of custom applications for your
environment. With no programming and only limited technical knowledge of P8
Content Manager internals, you can create results-oriented applications suitable
for use by non-technical users. The user interface can use a combination of
traditional Web-based windows and input forms, as well as integrating the use of
IBM FileNet P8 eForms, for high-fidelity and familiar rendering.

BPF is optimized for creating applications that involve workflow steps for
processing content. Many business applications fall into this category, which is
often called case management. Examples of case management in everyday
situations include loan processing, customer service inquiries, document
authoring and approval cycles, and many others. Typically, there are one or
more documents, which together must go through several logical business steps
performed by different organizations, employees, or automated systems.

BPF makes it simple to create and deploy such applications. Internally, it
implements a generic and customizable case management object for keeping
track of the documents and other information. To create an application, you use
the IBM FileNet Business Process Manager (BPM) Process Designer tool (a
Java applet) to create workflows. You import the workflow into the BPF Explorer
(a thick client). After making several customization choices in BPF Explorer, you
use it to generate your custom BPF-based Web application. The deployed
application provides an inbox of tasks for your users. The display of the task
shows the associated documents and other information and allows the user to
make decisions and take actions to move the case forward.

7.2 Application technologies
Content Manager comes with a set of applications that you can use as is. The
applications include IBM FileNet Enterprise Manager, Workplace XT, and others
(see 7.1, “IBM FileNet P8 applications” on page 154). The operations and
interfaces provided by these applications might not always satisfy your
company’s business requirements. In many circumstances, you have to create
custom applications to fulfill your business needs. Your applications will be
designed with specific business goals in mind, and those come in many varieties.
We do not attempt to cover business goals here. Instead, we discuss more
general technical application technologies.

7.2.1 Traditional Java thick clients


P8 Content Manager’s Content Engine (CE) consists of Java 2 Platform,
Enterprise Edition (J2EE) components, but your client application can be a Java
thick client. By thick client, we mean an application running in its own Java virtual
machine launched from the desktop. It can be a simple command line program or
have a full-featured graphical user interface. Because it is launched from the
local client machine, there are virtually no restrictions on what a thick client
application can do.

A thick client application normally consists of directories or Java archive (JAR)
files of Java classes, both for the application and for supporting utility libraries.
One of the biggest problems in using thick clients is the logistical hurdle of
keeping all of the copies of the application up-to-date. This trait is not unique to
Java applications; it is the same for any thick client technology. Because of this
problem, however, thick clients are best suited for use by a small number of
users or for very mature and stable software.

One of the JAR files distributed with P8 Content Manager is Jace.jar. The
Jace.jar file contains the classes and supporting files for the Content Engine
Java API. The API acts remotely from the server but ultimately must be able to
communicate with the server to do any productive work. For a discussion of the
available transports for communicating with the server, see 7.3.2, “Transports
available with the APIs” on page 165. The use of the Web services transport with
a thick client is easy to understand: the API translates requests into Web
services calls, and the server’s Web services listener receives and responds to
them.

If Enterprise JavaBean (EJB) transport is used, the interaction within the API is
more complicated. The details of the interaction are not exposed, and you
generally do not have to worry about them. However, it is useful to have a basic
understanding. The J2EE specification refers to a stand-alone Java virtual
machine, which is remotely connected to the application server as an application
container. Some J2EE documentation refers to these types of applications as
thin applications. It might seem confusing that the applications are called both
thick and thin applications, but the terminology of thin applications makes sense
from the point of view of the J2EE application server. In contrast to the containers
within the application server, the J2EE-oriented services provided in the
application container are quite a bit more limited. Ordinarily, you have to worry
about the propagation of the authentication context and transaction context. The
J2EE specification gives application server implementers some leeway in
providing these services. To be absolutely sure of the latest information, check
the application server documentation for the specific application server release.
When an application server provides these services (most do, though sometimes
with limitations), they typically provide one or more JAR files, configuration files,
and command line options, which together comprise the application container
environment.

7.2.2 Java applets


A specialized form of thick client is a Java application that runs inside a
security-constrained Java environment in a Web browser. This kind of application
is called an applet. An applet can have most of the rich interactions of a
traditional thick client, but it has advantages and disadvantages.

The most obvious disadvantage is that the users must run a Java-capable Web
browser, and the use of Java applets must be enabled. All major Web browsers
are Java-capable, but for security reasons, organizational policies sometimes
forbid enabling the running of Java applets.

An applet is launched from a link on a Web page. The applet infrastructure has
built-in mechanisms for caching the applet and its supporting JAR files on the
client machine. The infrastructure automatically notices version updates and
performs fresh downloads when needed.

Applets have restricted access to client system resources. Your applet can be
granted access to whatever resources it needs, because Java has a rich
permissions infrastructure. However, the conservative default permissions make
it difficult to deploy even simple applets without security pop-up dialogs.
The dialogs either create unnecessary worry in users, or they are casually
approved without full consideration. This unpleasant aspect of the user
experience has given applets a reputation for being difficult for the everyday
user, and their use is often limited to experts and administrators.

7.2.3 J2EE Web applications and other components
The technology underlying much of the enterprise software development these
days is Java 2 Enterprise Edition (J2EE). The J2EE platform helps you make
efficient use of resources by providing common services, such as security, high
availability, transaction management, and scalability. Because the platform
provides these services with mechanisms for configuring them when the
applications are deployed, you are free to concentrate on business logic in your
applications. The Content Engine, which is implemented as J2EE components,
uses many common features of the J2EE platform. You can write Content
Engine applications with traditional thick client Java applications or even
non-Java client technologies, but the tightest integration will naturally be
available when your application is integrated with a J2EE application server.

There are many standardized technologies available in the J2EE platform, but a
couple are particularly worth mentioning, because they often show up in typical
J2EE application development: servlets and Enterprise Java Beans (EJBs):
- The J2EE servlet container is often thought of as the container for Web
  applications, because it represents the tier where J2EE presentation logic is
  generally placed. Web applications are perhaps the most popular use for
  servlets, but it is not necessary to have an actual Web interface to use
  servlets. For example, the P8 Content Manager WebDav provider is
  implemented using a servlet, and the user interface is provided by the
  WebDav client applications. The servlet container is appropriate for
  application components, which receive and respond to outside requests and
  which optionally preserve some state on the server side between requests.
- The J2EE EJB [1] container provides what are often thought of as
  enterprise-level services. For example, EJBs can have declarative security
  and transactional properties, provide transparent load balancing across
  servers, and provide nearly transparent access to relational databases. EJBs
  are frequently used to encapsulate reusable business logic and seldom, if
  ever, contain any presentation logic.

7.2.4 Service-oriented architecture


Service-oriented architecture (SOA) is not a specific platform or technology
product. It is a set of architectural principles and guidelines for creating
enterprise applications and components. Of course, many vendor products and
common technologies have arisen for the implementation of SOA, and those
specific choices have become intertwined with the abstract concepts of SOA.
[1] Although the Content Engine Java API uses EJBs internally to implement the EJB transport, those
EJBs are not exposed or available to application developers. They are accessible only indirectly
through the use of the Java API.



Perhaps the defining characteristic of SOA is a loose coupling between the client
and the server. Services provided are independent of the client application
requesting them (with the exception of normal security constraints). The server
will seldom, if ever, maintain any state after servicing a client request. This
characteristic maps well to the common implementation technology of Web
services; although, other technologies are available and in use.

Consider implementing SOA if you can decompose your application logic into
small, well-defined request and response pieces that map well to the loose
coupling of SOA.

7.2.5 .NET components


Just as the Java community has standardized on J2EE as a software component
architecture, Microsoft has popularized the .NET environment. .NET shares
many concepts with Java and J2EE, but, from the point of view of the Content
Engine, only clients can be written using .NET technology. .NET is fundamentally
incompatible with Java and J2EE except when interacting via a common
protocol. In the case of P8 Content Manager, the common protocol is the Web
services transport of the IBM FileNet P8 Content Engine APIs or the direct use of
IBM FileNet P8 Content Engine Web Services.

7.3 Principles for application design


In this section, we present principles to consider when designing your own
applications. Obviously, situations vary, and not all of these principles apply to
every situation. Our intention is to give you a brief survey, which will have a
bearing on your designs and which might even suggest new application designs
to you.

7.3.1 Available P8 Content Manager APIs


One of the goals of the P8 Content Manager is to make all features available
through robust APIs. P8 Content Manager applications add their own utility
layers, often with significant amounts of application logic, but interaction with the
server always comes down to a set of calls to published and documented APIs.
Those APIs are also available to you for custom application development. If you
see a feature in a P8 Content Manager application, you can be confident that
your custom application can do the same or similar things via the APIs.

This section describes the APIs available in the IBM FileNet P8 4.0 release. We
do not discuss the compatibility APIs (for Java and Content Manager) that exist
to help in the transition of applications written for earlier P8 Content Manager
releases. The IBM FileNet P8 Platform Version 4.0 Installation and Upgrade
Guide describes features and limitations of the compatibility APIs. We
recommend the IBM FileNet P8 4.0 APIs for any new development and, where
possible, for additions to existing applications. It is not possible to use the IBM
FileNet P8 4.0 APIs to communicate with a pre-4.0 Content Engine server.

Sample applications are made available from time to time in the support area for
P8 Content Manager at:
http://www.ibm.com/software/ecm

Java API
P8 Content Manager provides a full-featured Java API. Any feature that is
available in the server is completely available to Java programmers. This access
includes routine operations, such as retrieving and updating Document objects,
and specialized operations, such as adding a custom class or property to an
object store’s metadata definitions.

Because this is not a programmer’s manual, we do not cover programming
details of the Java API, but we discuss a few of the major principles behind it.
Refer to the online help files, IBM FileNet P8 Documentation, for complete
reference material for the Java API as well as how-to information.

The Java API is structured with Java classes having names that match system
metadata classes in the server. For example, there are Java classes for
Document, Folder, CustomObject, and so on. When you instantiate a Java object
(usually through a subclass of the Factory class), it refers to an object that
actually resides in the server. The API maintains stateless interactions with the
server and is intentionally loosely coupled, which means that the API objects are
not actually holding a server-side connection or other resources for the life of the
Java object.

In simplified terms, an API object can be thought of as containing the following
information:
- Something that identifies the object residing on the server. Typically, this is an
  object store and an object ID or path.
- Some number of locally cached properties. These might have been fetched
  from the server, or they might have been set locally. A property value that has
  been set or changed in the API object and not yet sent to the server is said to
  be dirty, because its value does not match what is on the server.
- Some number of pending actions. When you call a method that implies a
  change to the object (other than simple property value changes), the change
  is not made immediately. Instead, a representation of that change is added to
  the API object’s list of pending actions. For example, if you call the method
  Document.checkin(), a Checkin pending action is added to the API object.

Dirty property values and pending actions are not sent to the server until an
explicit call is made to do so. If an API object is discarded without that call, the
changes are never made on the server. The most common method of sending
changes to the server is to call the save() method on an API object. There is also
a batching mechanism for sending updates to multiple objects in a single
round-trip over the network. Batching provides improved performance and
provides transactional atomicity for all of the changes in the batch.
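
The accumulate-then-save model can be illustrated with a toy sketch (classes invented for illustration; the real Document and Properties API classes are far richer):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the API object model: dirty properties and pending
// actions accumulate locally and reach the "server" only on save().
class ApiObjectSketch {

    final Map<String, Object> dirtyProperties = new LinkedHashMap<String, Object>();
    final List<String> pendingActions = new ArrayList<String>();
    final List<String> serverLog = new ArrayList<String>(); // stand-in for the server

    void putValue(String name, Object value) {
        dirtyProperties.put(name, value); // cached locally; no round-trip yet
    }

    void checkin() {
        pendingActions.add("Checkin"); // recorded as a pending action, not executed
    }

    // One round-trip: all dirty values and pending actions travel together.
    void save() {
        for (Map.Entry<String, Object> e : dirtyProperties.entrySet()) {
            serverLog.add("set " + e.getKey() + "=" + e.getValue());
        }
        for (String action : pendingActions) {
            serverLog.add("action " + action);
        }
        dirtyProperties.clear();
        pendingActions.clear();
    }

    public static void main(String[] args) {
        ApiObjectSketch doc = new ApiObjectSketch();
        doc.putValue("DocumentTitle", "Quarterly report");
        doc.checkin();
        System.out.println("before save: " + doc.serverLog);
        doc.save();
        System.out.println("after save:  " + doc.serverLog);
    }
}
```

As in the real API, discarding the object before save() means none of the accumulated changes ever reach the server.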

.NET
P8 Content Manager provides a full-featured .NET API, which you can use to
write programs in any .NET-compatible language. With a couple of exceptions,
any feature that is available in the server is completely available to .NET
programmers. The exceptions are mainly custom code that must be executed
within the server, for example, EventActions. Because the Content Engine
server is a J2EE application, internally executed custom code is limited to
Java-compatible technologies.

The principles behind the .NET API are the same as those behind the Java API
(see “Java API” on page 161), so we do not repeat that discussion here. One
significant feature available only with the .NET API is the use of Kerberos to
perform authentication via Microsoft Windows Integrated Login. This is only
possible when both the client application and the Content Engine server are
running on Microsoft Windows.

Web services
Modern, loosely coupled frameworks, such as service-oriented architecture,
favor Web services protocols for connecting components. P8 Content Manager
provides Content Engine Web Services (CEWS) for accessing nearly all features
available in the server.

Typically, if you as a programmer want to use a Web services interface, you
obtain the interface description in the form of a Web Services Description
Language (WSDL) file. You run the WSDL file through a toolkit to generate
programming language objects for interacting with the Web services interface.
You then usually build up a library of utilities to provide abstraction layers,
caching, security controls, and other conveniences. The Java and .NET APIs
provided by P8 Content Manager are already exactly equivalent to that, and both
APIs can use Web services as a transport (see 7.3.2, “Transports available with
the APIs” on page 165). Consequently, there is not as much motivation to use
CEWS directly.

There are still a few occasions where the direct use of CEWS might be useful:

- You have an application already using CEWS, and no plans exist for
  immediately porting it to the Java or .NET API.
- You are building an application component as part of a framework in which
  the use of Web services is the model for communicating with external
  systems.
- Although a rare occurrence, you might be using a language or technology that
  can make use of Web services but is not compatible with the use of a Java or
  .NET API.

For these occasions, the direct use of CEWS is a good choice and is fully
supported.

When setting things up, use care in choosing the Web services endpoints. There
are two distinct sets of endpoints. One set, which you sometimes hear referred to
as “the 3.5 endpoints”, has WSDLs compatible with CEWS from P8 Content
Manager 3.5.x releases. Those endpoints, which are easily identified by noting
the 35 in the endpoint name, are supported for direct use. The other set of
endpoints, sometimes referred to as “the 4.0 endpoints”, is used internally by the
APIs for transport and is not supported for direct use outside of the APIs.

In theory, you can take the WSDL file for CEWS and use any current Web
services toolkit to generate the interfaces that you will use on your end. In
practice, however, toolkits are still individualistic in their handling of various
WSDL features, and it is difficult to write a WSDL for a complex service that is
usable by a wide cross-section of Web services toolkits. Check the latest
hardware and software support documentation, IBM FileNet P8 Platform 4.0.x
Hardware and Software Requirements, and only use a supported toolkit.

Your toolkit will generate programming language stubs and other artifacts so that
you can include CEWS calls in your program logic. The details of those artifacts
vary from toolkit to toolkit, but you will surely see representations of Content
Engine objects, properties, and update operations. The P8 Content Manager
product documentation describes how these pieces interrelate, but we
recommend you also read the developer documentation for the Java or .NET API
to get additional understanding of how the things mentioned in the WSDL were
intended to be used.

API classes overview


Most of the classes [2] in the APIs correspond directly to classes also found in the
Content Engine metadata hierarchy or to collections of objects of metadata
classes. For example, there are API Folder and FolderSet classes
corresponding to the metadata Folder class. There are also many classes which
appear only in the APIs to assist in programming language aspects. For example,
Factory subclasses exist to provide a way to instantiate various other objects,
and the DeletionAction class exists only to provide type-safe enumeration
constants for use in API method calls.

[2] For simplicity, we use the term classes here to refer to both classes and interfaces in the APIs. It will
be obvious to the programmer which are classes and which are interfaces when actually working
with one of the APIs.

In all, there are over 600 classes in the APIs. To assist developers in keeping the
usage of all these classes organized, the classes themselves are arranged into
packages in the Java API and namespaces in the .NET API. The arrangement is
similar between the two APIs, differing only in the stylistic naming conventions of
the different programming environments. All exposed classes (classes supported
for external use) are in subpackages of com.filenet.api (in Java) or
subnamespaces of FileNet.Api (in .NET). Unless specifically documented
otherwise, classes in any other package or namespace are strictly internal and
not supported for external use.

Table 7-1 summarizes the more prominent types of classes in the APIs. The table
is intentionally incomplete and only gives a flavor of the API organization.

Table 7-1 Types of classes in the APIs


Package     Description                                  Examples

core        Provides classes related to the core         EntireNetwork, Document,
            business objects and other classes that      Folder, CustomObject, Factory,
            will be used in most applications            Batch, Connection

meta        Provides classes for holding immutable       ClassDescription,
            metadata for Content Engine classes and      PropertyDescription,
            properties                                   PropertyDescriptionDateTime

admin       Provides classes used in the                 ClassDefinition,
            administration of a Content Engine,          PropertyDefinition,
            including classes for updating metadata      PropertyDefinitionDateTime,
            objects                                      DirectoryConfiguration,
                                                         PEConnectionPoint,
                                                         ServerInstance,
                                                         TableDefinition

security    Provides classes related to                  User, Group, AccessPermission,
            authentication, authorization, and           MarkingSet
            user-specific and group-specific data

query       Provides classes related to constructing     SearchScope, RepositoryRow
            and performing Content Engine searches

collection  Provides type-safe classes that are          FolderSet, ContentElementList
            related to collections of objects

events      Provides classes representing events         FileEvent, UnfileEvent,
            triggered on Content Engine objects, as      EventAction,
            well as classes related to handling those    InstanceSubscription
            events raised within the Content Engine

property    Provides classes related to Content          Properties, PropertyDateTime,
            Engine properties                            PropertyDateTimeList,
                                                         PropertyFilter

constants   Provides classes defining collections of     AccessRight, Cardinality,
            related, type-safe constant values           DatabaseType, PropertyNames,
                                                         ReservationType

7.3.2 Transports available with the APIs


When designing any multi-tiered application, you must carefully consider how
information will be conveyed back and forth between the client side and the
server side of the network connection. Different frameworks for remote calls
typically come with different advantages and constraints.

In the P8 Content Manager APIs, the framework mechanisms are called
transports. The APIs were designed so that all API operations are completely
independent of the transport used. (The few exceptions deal with the
propagation of security and transaction contexts.) A benefit of this independence
is that applications can be written without considering the transport. The
selection of a transport is a configuration decision when the application is
deployed (the API finds out about it through the URI used for the Connection
object).

There are two available transports: Web services (WS) and Enterprise Java
Bean (EJB). EJB transport is available only for the Java API, whereas WS
transport is available for both APIs. For most situations, the EJB transport is
preferred, but the WS transport can be used in more environments. In all cases,
the transport is considered stateless, which means that the APIs operate on the
basis of a single request and response for each interaction. No client state is
maintained by the server after a request has been serviced.

EJB transport
The EJB transport internally uses EJB method calls. The method calls are made
on the client side and transported by the application server to the server side of
the network connection. Although many people think of EJBs using Java Remote
Method Invocation (RMI) as the remote communications mechanism, that is not
necessarily the case. Application server vendors are free to provide whatever
implementation they like as long as they meet the EJB requirements, and many
vendors use something other than RMI. In any case, the details of the application
server’s implementation are transparent to the API, and the API does not need to
have facilities for controlling things, such as clustering or server affinity of the
EJB, because those things are configured within the application server.

WS transport
As its name implies, the WS transport uses Web services protocols. In fact, the
WS transport uses an enhanced version of the Content Engine Web Services
(CEWS) protocol. In practice, this means XML over HTTP or
HTTPS. Because HTTP and HTTPS use only a single port for the entire
conversation and use a strict client-server interaction model, it is generally easier
to configure a firewall or reverse proxy through which to allow WS transport
requests to pass.

Web services attachments are used for carrying pieces of content between the
client and server sides. Attachment handling has undergone a lot of changes
over the years, and different environments and tools support different standards:
򐂰 When using the Java API, you must select the CEWS endpoint that supports
Direct Internet Message Encapsulation (DIME) attachments (recognizable
because it has DIME in the endpoint name).
򐂰 When using the .NET API, you must select the CEWS endpoint that supports
Message Transmission Optimization Mechanism (MTOM) attachments
(recognizable because it has MTOM in the endpoint name).

In both cases, you must select the endpoint with 40 in the endpoint name. Do not
select the endpoint with 35 in the name.
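Because the transport is inferred from the Connection URI, endpoint selection can be kept out of application code entirely. The following sketch illustrates both styles of URI; the host names, ports, and the exact EJB URI scheme are placeholders (the EJB form varies by application server):

```java
import com.filenet.api.core.Connection;
import com.filenet.api.core.Factory;

public class TransportConfig {
    // WS transport for the Java API: the 4.0 endpoint with DIME
    // attachments. Host and port are placeholders.
    static final String WS_URI = "http://cehost:9080/wsi/FNCEWS40DIME/";

    // EJB transport: scheme and port are application-server specific;
    // this form is only illustrative.
    static final String EJB_URI = "iiop://cehost:2809/FileNet/Engine";

    public static Connection fromConfig(String uri) {
        // The API determines the transport from the URI, so application
        // code is identical for both transports.
        return Factory.Connection.getConnection(uri);
    }
}
```

In practice, the URI is read from a deployment descriptor or properties file so that the transport can be switched without recompiling.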

Comparing the transports


Here are things for you to consider when deciding which transport to use:
򐂰 Because it usually employs a binary protocol likely to have been engineered
for high performance, the EJB transport typically has better performance than
the WS transport in the same environment. The actual performance
difference is extremely dependent upon the specific mix of API calls your
application makes.
򐂰 The EJB used by the EJB transport automatically propagates any active
transactional context to the server. In contrast, transaction propagation is not
possible when using WS transport. Whether transaction propagation is
desirable depends on the application. To make the two transports as
compatible as possible, the API disables transaction propagation for the EJB
transport unless instructed otherwise through a parameter setting on the
Connection object.
򐂰 The EJB used by the EJB transport automatically propagates any ambient
JAAS authentication context to the server. If you are already using a
JAAS-based authentication scheme, either in isolation or as part of a single
sign-on (SSO) framework, P8 Content Manager is very likely to participate in
that scheme with few or no configuration changes.
򐂰 In contrast, there is no general framework for propagating an authentication
context when using WS transport. Although a standard called WS-Security
provides a high-level framework for adding authentication schemes, WS
transport can only support schemes backed by specific implementation
programming. P8 Content Manager directly supports WS-Security Username
token and Kerberos token authentication schemes. The latter can be used to
facilitate integration with Microsoft Windows applications. Custom
authentication schemes can also be implemented by using the IBM FileNet
Web Services Extensible Authentication Framework (WS-EAF). Specific
details of using Kerberos and WS-EAF are provided in the Web Service
Extensible Authentication Framework Developer’s Guide section of the online
help files, IBM FileNet P8 Documentation.
򐂰 WS transport, which is based on HTTP or HTTPS, uses just one or two
TCP/IP ports for all interactions. There are also commercially available
products for examining and validating Web services traffic. Therefore, many
administrators find it easier and more secure to open their firewalls to WS
transport requests. In contrast, EJB transport might use a vendor-specific
binary protocol. Such protocols often employ a range of TCP/IP ports. These
factors typically lead to a greater willingness to allow WS transport to pass
through firewalls and a reluctance to do the same for EJB transport.
򐂰 In cases where WS transport is using Username token authentication, the
credentials will appear on the wire unprotected unless you use Secure
Sockets Layer (SSL), which we strongly recommend.

7.3.3 Authentication models


We are all familiar with the traditional authentication model of a user providing a
user ID and password. This tried-and-true mechanism has been in widespread
use for decades. It is easy to implement and conceptually simple, but it does
have some drawbacks:
򐂰 Software systems built to rigidly expect user ID and password credentials are
difficult to adapt in the face of other forms of credentials. Examples of other
forms of credentials, which can be used with or without a password, are
fingerprint scans and hardware security tokens. It is not possible for a
software system built today to anticipate all of the forms of credentials that
might be used in the future.
򐂰 In an environment where users must interact with several applications, either
the user must repeatedly enter credentials when crossing application
boundaries, or the credentials must be passed from one application to
another. The first choice represents a usability annoyance, and the second
choice represents an information security hazard, because it gives more
opportunities for the credentials to be discovered or exploited by an attacker.

For the first of these problems, the software industry has evolved to a model of
pluggable authentication. That means that components for verifying different
credential types can be developed independently of the framework into which
they fit. The output of a pluggable authentication framework is often a token
affirming that valid credentials were presented and verified. That is typically
enough information for most authentication consumers; although, some systems
also provide information about the types of credentials that were presented.

Pluggable authentication also works toward solving the second problem,
because the token produced can be more securely passed between applications
than can be done for raw credentials. There are more factors involved in single
sign-on (SSO) solutions than pluggable authentication. There must be additional
conventions or APIs for the applications to communicate with each other or at
least with the SSO framework. A full discussion of SSO frameworks is beyond
the scope of this book. Most application server vendors provide at least some
SSO capabilities, and there are many vendor solutions available.

In the Java environment, the pluggable authentication framework is Java
Authentication and Authorization Service (JAAS). P8 Content Manager fully
delegates authentication to JAAS (but does not use JAAS for authorization
purposes). Even if you use the Java API’s transitional convenience method
UserContext.createSubject(), behind the scenes the API is performing a JAAS
login. Virtually all modern SSO solutions also work in concert with JAAS, so the
Content Engine server will almost always automatically participate in any SSO
solution that you use.

The Java environment and JAAS framework work together to affiliate a security
Subject with a thread of execution after authentication has taken place. When
you use EJB transport to connect to the Content Engine server, the transport
automatically propagates the Subject along with the rest of the request. The
Content Engine server does not care what credentials were used to authenticate
the user; it cares only that authentication was successfully performed via JAAS
within a trusted environment. P8 Content Manager includes sample JAAS
configuration files suitable for use with various application servers, but the
Content Engine does not depend on their use. For any particular environment,
any JAAS configuration can be used as long as it is compatible with the
application server.

If you use WS transport to connect to the Content Engine server, the JAAS
Subject cannot be directly propagated due to technology differences. Instead,
individual authentication and credentials schemes must be specifically
anticipated in code. For a Java environment, you must use the
com.filenet.api.util.WSILoginModule. It intercepts the raw user ID and
password credentials and arranges for them to be transmitted to the Content
Engine server. For .NET environments, the Content Engine .NET API can
transmit Kerberos tokens to take advantage of Windows Integrated Logon. In
either case, the WS Listener on the Content Engine server immediately takes the
security context information that it receives and performs a JAAS login, so that
the bulk of the Content Engine server is only aware of the JAAS framework.
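For a stand-alone Java client, the transitional convenience method mentioned above is commonly used as in the following sketch; the server URI and credentials are placeholders:

```java
import javax.security.auth.Subject;
import com.filenet.api.core.Connection;
import com.filenet.api.core.Factory;
import com.filenet.api.util.UserContext;

public class LoginSketch {
    public static void run(String uri, String user, String password) {
        Connection conn = Factory.Connection.getConnection(uri);
        // Convenience method: performs a JAAS login behind the scenes.
        // The last argument (null = default) names the JAAS
        // configuration stanza to use.
        Subject subject =
            UserContext.createSubject(conn, user, password, null);
        UserContext.get().pushSubject(subject);
        try {
            // ... Content Engine calls here run under this Subject ...
        } finally {
            // Always pop the Subject so the thread's security context
            // is restored, even if a call throws.
            UserContext.get().popSubject();
        }
    }
}
```

In a container or SSO environment, the push/pop is unnecessary because the ambient JAAS Subject is propagated automatically over EJB transport.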

7.3.4 Minimizing round-trips


The number and nature of network round-trips, that is, requests from the client to
get a response from the server, usually dominate the performance picture of the
application. There are simple and powerful tools available in the APIs to reduce
your round-trips, and API logging can be used to assess how well you are doing.

Get or fetch
When many people think about interacting with an object from the server, they
first think about doing a round-trip to fetch the object. That is a necessity for
many things, but there are several cases where you do not need that initial fetch.
For example, if you are only going to use an object so you can set the value of an
object-valued property on another object, you really only need a reference. If you
somehow know that the object already exists, you can skip the round-trip to fetch
it. (If it turns out that you were wrong and it did not already exist, the referential
integrity mechanisms in Content Engine will throw an exception when you try to
save the referencing object.) The APIs have a mechanism called fetchless
instantiation. There are three flavors of Factory methods for creating
programming language objects that reference Content Engine objects, and you
can tell them apart by the word used as the beginning of the method name:
򐂰 create indicates that a new Content Engine object is to be created. No
round-trip is done as the result of this Factory method call; although, a save
call must eventually be done.
򐂰 fetch indicates that a round-trip will be immediately made to the Content
Engine to verify that the object exists and to return an initial set of properties.
Fine-tuning of the properties returned can be controlled via an optional
PropertyFilter (see “Property filters” on page 170).
򐂰 get indicates that no round-trip will be made. This is a fetchless instantiation.
The API is taking your word for it that the object actually exists. There is no
initial set of property values available, so you will need to request any
property values that you need. If you know that you will always need some
property values immediately, there is no advantage to fetchless instantiation.
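The following sketch contrasts the fetch and get flavors. The class and property handling is standard; the folder path and document identity are placeholders:

```java
import com.filenet.api.constants.AutoUniqueName;
import com.filenet.api.constants.DefineSecurityParentage;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.Folder;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.core.ReferentialContainmentRelationship;
import com.filenet.api.util.Id;

public class FetchlessSketch {
    public static void fileExistingDocument(ObjectStore os, Id docId,
            String folderPath) {
        // get: fetchless instantiation, no round-trip. We are asserting
        // that the document exists.
        Document doc = Factory.Document.getInstance(os, "Document", docId);

        // fetch: one round-trip now, returning an initial property set.
        Folder folder = Factory.Folder.fetchInstance(os, folderPath, null);

        // Filing needs only a reference to the document, so the fetchless
        // instance is enough. If the document does not actually exist,
        // referential integrity fails when the relationship is saved.
        ReferentialContainmentRelationship rcr = folder.file(doc,
                AutoUniqueName.AUTO_UNIQUE, null,
                DefineSecurityParentage.DO_NOT_DEFINE_SECURITY_PARENTAGE);
        rcr.save(RefreshMode.NO_REFRESH);
    }
}
```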

Property filters
Property filters are optional parameters to a number of methods that fetch
objects or properties from the Content Engine. They allow highly granular control
of the objects or properties being returned.

It is easy to understand how returning fewer properties can improve
performance, but, less obviously, you can also improve performance by returning
more properties and objects. The savings comes if you can return multiple
objects in a single round-trip instead of making multiple round-trips to do the
same work. A property filter can do just that. Because property filters operate
with a concept of recursion levels, you can use them to navigate object-valued
properties and actually return an entire tree of objects with selected properties.
Over time, most applications know what properties and objects they need, so this
can be an efficient way to do most or all of your retrievals in just a few round-trips.

Options for using property filters are described in detail in the online reference
help for the PropertyFilter class. Most of the Content Engine API calls that can
take a property filter will also accept a null value. In these cases, the API still
works correctly, but it might make additional round-trips behind the scenes. It is
designed that way so that you can get your application working quickly and
optimize the performance later.
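A sketch of a filtered fetch follows. The property names are standard system properties; consult the PropertyFilter reference documentation for the exact meaning of each FilterElement argument, as this is an illustration rather than a definitive signature guide:

```java
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.property.FilterElement;
import com.filenet.api.property.PropertyFilter;
import com.filenet.api.util.Id;

public class FilterSketch {
    public static Document fetchSlim(ObjectStore os, Id docId) {
        PropertyFilter pf = new PropertyFilter();
        // Request only the named properties. A maxRecursion of 1 also
        // pulls in the objects behind object-valued properties, so a
        // small tree can come back in the same round-trip.
        pf.addIncludeProperty(new FilterElement(Integer.valueOf(1), null,
                null, "DocumentTitle DateLastModified FoldersFiledIn",
                null));
        return Factory.Document.fetchInstance(os, docId, pf);
    }
}
```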

Pending actions
Methods, which at first glance seem to be making updates to Content Engine
objects (for example, checkin(), checkout(), and delete()), are only marking
the programming language object with the change. The APIs call these pending
actions. The concept is easy enough to understand. It is the reason that a call to
save() must be done to send pending changes and property value updates to
the Content Engine. Not as obvious is that you can queue up multiple pending
changes on a single object. Not all combinations of pending actions make sense,
but when it does make sense to combine them, you can save network
round-trips. This feature is especially useful when combined with batching.
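For example, a property update and a checkin can be queued on the same reservation object and sent to the server in a single round-trip; in this sketch, the title value is arbitrary:

```java
import com.filenet.api.constants.AutoClassify;
import com.filenet.api.constants.CheckinType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;

public class PendingActionSketch {
    // "reservation" is assumed to be a checked-out document reservation.
    public static void checkinWithProperties(Document reservation) {
        // Both the property update and the checkin are queued locally
        // as pending actions; nothing goes to the server yet.
        reservation.getProperties().putValue("DocumentTitle",
                "Q1 invoice");
        reservation.checkin(AutoClassify.DO_NOT_AUTO_CLASSIFY,
                CheckinType.MAJOR_VERSION);
        // One save() sends both changes in a single round-trip.
        reservation.save(RefreshMode.NO_REFRESH);
    }
}
```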

Batching
The Content Engine APIs contain two separate but similar batching mechanisms:
򐂰 A RetrievingBatch is used to fetch multiple, possibly unrelated, objects from
the Content Engine in a single round-trip. Object references and property
filters are added to the batch, and retrieveBatch() is called to trigger the
round-trip.
򐂰 An UpdatingBatch is used to group multiple updates in a single round-trip to
the Content Engine. Instead of calling save() on individual objects, the
objects are added to the batch, and updateBatch() is called to trigger the
round-trip. Updates are performed as an atomic transaction.
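A minimal UpdatingBatch sketch, assuming the documents already carry pending changes:

```java
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Domain;
import com.filenet.api.core.UpdatingBatch;

public class BatchSketch {
    public static void saveAll(Domain domain, Iterable<Document> docs) {
        UpdatingBatch batch = UpdatingBatch
                .createUpdatingBatchInstance(domain, RefreshMode.NO_REFRESH);
        for (Document doc : docs) {
            // Instead of calling save() per object, add each one to the
            // batch. The null means no property filter on the result.
            batch.add(doc, null);
        }
        // One round-trip; all updates commit or roll back atomically.
        batch.updateBatch();
    }
}
```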

7.3.5 Client-side transactions


All work performed by Content Engine in a database or other storage is done
transactionally, which means you never get partially successful calls to Content
Engine. The call either completely succeeds or completely fails. This is important
for maintaining consistency of the data in the repositories. You do not need to do
anything to get that sort of transactional behavior inside Content Engine.
Actually, there is no way to avoid it, because it is hard-coded into Content Engine
logic.

There is another type of transaction that you can control in your application. If
you use the Java API with EJB transport, you can include Content Engine activity
within a client-side transaction. This feature is unavailable when using WS
transport (see “WS transport” on page 166). The client-side transaction can be
started implicitly by the J2EE container or started explicitly through your use of a
javax.transaction.UserTransaction object.

P8 Content Manager follows the J2EE model for transactions, and J2EE in turn
follows industry standards for distributed transactions. In this context, the
relevant facts are that a transaction is started, operations performed by a
transactional resource (in this case, Content Engine) are tagged with the
transaction identifier, and the transaction is either committed or rolled back. All
changes tagged with a given transaction identifier are committed or rolled back
as an atomic unit.

You control whether or not your Content Engine calls participate in a client-side
transaction by configuring the Connection object. By default, Connection objects
are configured to not participate in a client-side transaction (even for EJB
transport). This presents the least surprising behavior, because both transports
give the same behavior by default. You make the following call on a Connection
object conn to change from the default behavior:

conn.setParameter(ConfigurationParameter.CONNECTION_PARTICIPATES_IN_TRANSACTION, Boolean.TRUE);

An exception is thrown if the connection does not support participation in
transactions.

Now that we have described the use of client-side transactions, here are a few
reasons to avoid them:
򐂰 Client-side transactions tend to create or magnify performance problems. The
reason is that the overall transaction times are longer simply due to network
latency and other factors inherent in the interaction between client and server.
Longer transaction times mean that resources all the way into the database
are being held for longer periods of time. This greatly increases the chances
for resource contention and slows overall system throughput.
򐂰 Most of the things that applications want to do in a client-side transaction can
be done more efficiently with the API batching mechanism using an
UpdatingBatch object. A batch is performed as an atomic transaction, but the
transactional control is on the Content Engine side.
򐂰 API batches can be used with all APIs and transports, so it is a more flexible
mechanism than client-side transactions.

In practice, analysis almost always shows that applications using client-side
transactions can be rewritten to use API batching. For the few
cases where client-side transactions are genuinely needed, they are supported
as described. The case where you might be forced into a client-side transaction
is when your application must include transactional resources outside of P8
Content Manager. For example, if you must include P8 Content Manager
updates atomically with updates to a stand-alone database, that is a motive for
using a client-side transaction. If you do find yourself using a client-side
transaction that you cannot avoid, do your best to minimize the amount of time
that the transaction is active.
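When a client-side transaction genuinely is required, its shape over EJB transport looks roughly like the following sketch; the JNDI name is the standard J2EE location, and error handling is simplified:

```java
import javax.naming.InitialContext;
import javax.transaction.UserTransaction;
import com.filenet.api.constants.ConfigurationParameter;
import com.filenet.api.core.Connection;

public class ClientTxSketch {
    public static void run(Connection conn, Runnable contentEngineWork)
            throws Exception {
        // Opt the Connection in to client-side transactions (EJB
        // transport only; WS transport throws here).
        conn.setParameter(
            ConfigurationParameter.CONNECTION_PARTICIPATES_IN_TRANSACTION,
            Boolean.TRUE);
        UserTransaction tx = (UserTransaction)
            new InitialContext().lookup("java:comp/UserTransaction");
        tx.begin();
        try {
            // Content Engine calls made here are tagged with this
            // transaction, along with any other transactional resources.
            contentEngineWork.run();
            tx.commit();   // keep the transaction window as short as possible
        } catch (Exception e) {
            tx.rollback();
            throw e;
        }
    }
}
```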

7.3.6 Creating a custom AddOn


If you plan to use your application in multiple environments, either in your own
organization or by distributing it to others, you need to be able to recreate the
classes, properties, and perhaps some instance data from your repository. We
discuss in much detail the process of moving from development to production
environments in Chapter 10, “Deployment” on page 245. For situations where
you want to deliver your application as a package, you can consider developing
an AddOn. An AddOn is a bundle of exported data with an optional post-install
script. The post-install script is run automatically after the AddOn is installed and
can be used for any kind of programmatic activity that you need to make the data
completely the way that you want it. An AddOn also has information about other
AddOns that must be installed as prerequisites.

An AddOn is created by creating an instance of the Content Engine AddOn class.
When saved, the AddOn is stored within the Global Configuration Database
(GCD). Available AddOns are accessible via the Domain object’s AddOns property.

An available AddOn can then be installed into an object store, which means that
the data is imported and the post-install script is run. IBM FileNet Enterprise
Manager has menu actions and wizards for manipulating AddOns, including
selecting which AddOns to install when an object store is created.

7.3.7 Using the JDBC interface for reporting


In addition to programming language APIs, P8 Content Manager also presents a
read-only Java Database Connectivity (JDBC™) interface. This interface is not
an interface directly to the relational database tables used in the repository.
Rather, it is a view into the object model represented by the Content Engine
metadata. In the JDBC interface, queries follow a model analogous to that of the
native APIs, where each metadata class looks like a database table and each
property looks like a database column.

The SQL query syntax is exactly the same for the JDBC interface and the native
API queries, including extensions for handling class hierarchies and folder
containment. Here is an example of a typical query:

SELECT CustomerNumber, CustomerName, GrandTotal FROM Invoice
WHERE GrandTotal > 1000.0 ORDER BY GrandTotal DESC

In this example, Invoice is a custom class and the properties mentioned are
custom properties on that class. For the purposes of this query, it does not really
matter whether it is a subclass of Document, Folder, or some other class.
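For comparison, the same statement issued through the native Java API query classes looks like this sketch:

```java
import java.util.Iterator;
import com.filenet.api.collection.RepositoryRowSet;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.query.RepositoryRow;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

public class QuerySketch {
    public static void run(ObjectStore os) {
        SearchScope scope = new SearchScope(os);
        SearchSQL sql = new SearchSQL(
            "SELECT CustomerNumber, CustomerName, GrandTotal FROM Invoice "
            + "WHERE GrandTotal > 1000.0 ORDER BY GrandTotal DESC");
        // fetchRows returns property rows rather than full objects;
        // the nulls accept the default page size, filter, and
        // continuation behavior.
        RepositoryRowSet rows = scope.fetchRows(sql, null, null, null);
        for (Iterator it = rows.iterator(); it.hasNext();) {
            RepositoryRow row = (RepositoryRow) it.next();
            System.out.println(
                row.getProperties().getStringValue("CustomerName"));
        }
    }
}
```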

The JDBC interface is implemented in the driver class
com.filenet.api.jdbc.Driver, which is contained in the standard Java API
client JAR file, Jace.jar. Configuring the client application to use the JDBC
interface is not much different from configuring any other stand-alone Java client.
There is an additional step of configuring the JDBC connection string. The details
of configuring the JDBC connection string are provided in the documentation for
the Driver class. Many third-party reporting tools support connections to JDBC
interfaces even if the tools are not native Java applications. The details of
configuring a reporting tool are, of course, specific to each tool, and they are not
discussed here.
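A reporting client that uses the interface is ordinary JDBC code, as in this sketch. The connection string format is deliberately left as an input because its syntax is defined in the Driver class documentation:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcReportSketch {
    public static void run(String connectionString) throws Exception {
        // Register the read-only Content Engine JDBC driver (Jace.jar
        // must be on the classpath).
        Class.forName("com.filenet.api.jdbc.Driver");
        // The connection string carries the server URI, object store,
        // and credentials; see the Driver class documentation.
        try (Connection jdbc =
                 DriverManager.getConnection(connectionString);
             Statement stmt = jdbc.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT CustomerNumber, GrandTotal FROM Invoice")) {
            while (rs.next()) {
                System.out.println(
                    rs.getString(1) + " " + rs.getDouble(2));
            }
        }
    }
}
```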

The JDBC interface follows the JDBC specifications and programming models,
but the motivation for its development was primarily for use by reporting tools.
The JDBC interface is also purely read-only. Therefore, the JDBC interface is not
an especially good choice for use in application development. For general
application programming, the native APIs provide a richer interface.

7.3.8 Exploiting the active content event model
P8 Content Manager provides a unique active content capability that proactively
moves content and content-related business tasks through a business process
without requiring human initiation. Your application probably controls most of
its objects directly, but you also want to know when another application tries
to change those objects. When that
happens, you might want to prevent the change or perform follow-up actions
to ensure data consistency in an application-specific way. One very well-known
follow-up action is to launch a workflow activity so that an affiliated Business
Process Management (BPM) system can coordinate a complex chain of events.

As a programmer or an administrator, your exposure to active content is via the
P8 Content Manager’s event subscriptions model. You create and register a
subscription for various events. The subscriptions can be created for individual
object instances or for an entire class of objects. The subscribed events
represent updates (or at least attempted updates) to an object. The complete list
of available events can be found by looking at the reference documentation for
the subclasses of com.filenet.api.events.Event. (Events are closely related to
audit logging in Content Engine. There are a few types of events, subclasses of
RetrievalEvent, that can be selected for auditing but not for subscriptions.)

When an event occurs in Content Engine, any active subscriptions link the event
to an EventAction and ultimately to your code, which implements the interface
com.filenet.api.engine.EventActionHandler. (Because the Content Engine
runs in a J2EE application server, all event action handlers are written in Java so
that interface does not appear in the .NET API.) Through the onEvent() method,
your code receives parameters that describe the event that occurred as well as
the state of the object when the event occurred. For some events, you get both
before and after snapshots of the object.

Event subscriptions come in two types: synchronous and asynchronous. It is up
to you as the creator of the subscription to decide which type to use:
򐂰 For a synchronous event subscription, your event action handler is called
after the change has been made to the object, but before it is committed (in
the transactional sense). You are not allowed to make changes to the object,
but you do have the opportunity to veto the change by throwing an exception.
Because the event action handler for a synchronous event subscription runs
within the context of an active EJB transaction, do the minimum amount
possible so that transaction timeouts do not occur. Therefore, limit your logic
in the event handler to making decisions about vetoing the change. If you
must make changes to other objects for data consistency or other reasons,
only do it synchronously if it absolutely must be done by the time that the
transaction commits.

򐂰 For an asynchronous event subscription, your event action handler is called
after the change to the object has been committed to the database. Your
handler does not run within the context of an active EJB transaction. Instead,
it has its own transaction started by Content Engine. You can make changes
to the triggering object, but those changes are just normal, additional changes
like you might make from a client program. Your handler cannot veto the
original change, because it has already happened and been committed.
Because it is an asynchronous event subscription, your handler is called at
some time after the commit. Although you can usually expect your handler to
be called within a few hundred milliseconds, overall system load and
competing event action handlers contribute to the overall timing, and there is
no guaranteed time by which your handler will run.

By using the event subscription model in Content Engine, you can create
handlers that monitor changes to objects not just from your application or
components, but from all sources.
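A skeleton of a synchronous handler might look like the following sketch. The class name, validation rule, and exception code are illustrative only, not a prescribed pattern:

```java
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.engine.EventActionHandler;
import com.filenet.api.events.ObjectChangeEvent;
import com.filenet.api.exception.EngineRuntimeException;
import com.filenet.api.exception.ExceptionCode;
import com.filenet.api.util.Id;

// Hypothetical handler for a synchronous subscription on Document
// updates: veto any change that leaves DocumentTitle empty.
public class RequireTitleHandler implements EventActionHandler {
    public void onEvent(ObjectChangeEvent event, Id subscriptionId)
            throws EngineRuntimeException {
        // Keep synchronous logic minimal: inspect the changed object
        // within the active transaction and decide allow or veto.
        ObjectStore os = event.getObjectStore();
        Document doc = Factory.Document.fetchInstance(os,
                event.get_SourceObjectId(), null);
        String title =
                doc.getProperties().getStringValue("DocumentTitle");
        if (title == null || title.trim().length() == 0) {
            // Throwing from a synchronous handler vetoes the change;
            // the exception code here is illustrative.
            throw new EngineRuntimeException(
                    ExceptionCode.E_ACCESS_DENIED);
        }
    }
}
```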

7.3.9 Creating your own API or framework


You might create an application in isolation, and no one will ever need that same
program logic again. That does happen, but in today’s world of interrelated
applications, it is not that common. Instead, you might find yourself implementing
similar logic over and over. To make the most of your development investment,
consider implementing a layer that presents an API. If your application presents a
user interface, consider implementing a reusable framework.

Even with the inconvenience of managing multiple generations of your
interfaces, it is still better in practice than the two common alternatives when
someone wants to reuse your work:
򐂰 You write a nice application-specific module, but it is not exactly the interface
that someone else wants. They look around and find internal methods and
objects which do what they want. Without a mechanism, such as a published
API or framework, to precisely designate what is exposed for use, you now
have other people using things in ways that you do not know about. When
you rearrange internal coding details, their applications break.
򐂰 You write a nice application-specific module, but it is not exactly the interface
that someone else wants. They take a copy of your source code and make the
modifications that they need. Even if they tell you about the changes that they
are making, it is extremely difficult to keep multiple copies of the source code
synchronized. Complicated dependencies will arise and will probably not be
tracked very well.

An up-front decision to create an API or framework with a planned mechanism
for evolution can help you avoid these common problems.

Chapter 7. Application design 175


You will often start out writing an application or software module with a particular
use case in mind. A lot of the particulars of that use case will show up in your
initial implementation. Later, you will realize that one use case is just one aspect
of a more general situation. Try to think about your implementation logic as a
collection of infrastructure pieces that can be combined in different ways for use
in accommodating multiple use cases. The collection of infrastructure pieces is
where you want to create an API or framework.

The details of how to create and publish an API or framework vary by technology
and by organizational environments. Here are general considerations when
planning these interfaces:
򐂰 Can you separate out the control of your interface into configuration (which
controls overall behavior, locations, names, and so on) and parameters
(which can vary from call to call within the same invocation)?
򐂰 Have you built any assumptions into the interface which can just as easily be
made configuration items or input parameters?
򐂰 Alternatively, do not make things into configuration items or parameters if
they will actually never change. It is easy to add configuration items or
parameters later with appropriate default values, but it is more difficult to
remove or change them after they are published.
򐂰 What is the appropriate layer in your module that is likely to be useful to
someone else? The separation between infrastructure and use cases is a
good starting point, but you might find further layering points within the
infrastructure and use cases. You might end up publishing more than one API
or framework.
򐂰 Your organization, industry segment, or technology community might already
have formal or informal standards for the look and feel of APIs or frameworks.
Use those as a guide when creating your own, although you will ultimately
have to reach your own conclusions when there is not an exact fit. The goal
is to make it easy for others to understand and use your interfaces
productively.

There is a downside to reusable APIs and frameworks. The more others come to
depend on them, the harder it is to change them in upwardly compatible ways.
There is a really good chance that the first revision will not be correct, and you
will want to change things in later releases in ways that are incompatible with
earlier releases. If you are the only developer or if the development of all the
using applications is in the same small organization, it is usually not a problem to
have a short period of time when everything switches from the old interface to the
new interface. In even moderately complex development environments, that
short period of time can be infeasible. The usual solution to this challenge is to
organize things so that different generations of your API or framework can be in

use by different applications at the same time. This solution avoids the problem
of forcing the conversion of all applications at exactly the same time.

7.3.10 Logging
P8 Content Manager APIs have built-in logging, which focuses on providing
details of round-trips between the client and server. The reason for that focus
is that those details are typically useful for resolving both performance
and functional problems. The main purpose of the logging is to
have artifacts for diagnosing problems when hands-on debugging is not possible.

When designing logging for your own applications, you are likely to have similar
goals. You might want to consider the following points:
򐂰 Determine the interesting interactions in your application. Focus your logging
efforts on those interactions first. You can always add more logging as your
application evolves or as you get a feel for the types of problems that occur in
production. Think of logging those interesting interactions as a unit, whether
they are all contained within a given software module or not.
򐂰 Do not log uninteresting details. Log files can become quite large, and many
details that are logged will turn out to be distracting clutter when you are
looking at log files later. If something is likely to help solve a problem, log it.
If there is just a remote possibility that it will help, skip it.
򐂰 Be careful about tying things to source code. It is fine to assume that the
people looking at the logs will have access to the source code to see what
entries mean, but only if that is actually true. Otherwise, log entries must be
reasonably self-explanatory so that you can teach someone what they mean.
򐂰 Log the impossible. In any application, there are conditions that are supposed
to be impossible. It is tempting to ignore those conditions in program logic. If
one of those conditions actually happens, it must be logged, because it is an
indication of a design flaw or something seriously strange in the runtime
environment.
򐂰 Pick a few severity and verbosity levels. It is probably better to have fewer
rather than more levels of granularity in your controls for logging. Modern
logging toolkits often give you the freedom to control things with many levels.
Do you really need them all? You probably do not. You probably do not need
much more than on, off, and perhaps one level in between. For each
combination, ask yourself who will really use it and why it is better than
another combination that you already need. One reason to have an
intermediate level is because voluminous logging usually has an impact on
performance. You can sometimes get ideas for narrowing your focus by using
only intermediate logging.
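
As a minimal sketch of this guidance, the whole policy can be reduced to one
decision function. The mode and event names here are our own illustrations,
not part of any product API:

```java
// Sketch only: a three-mode logging policy (off, summary, verbose) instead of
// the many levels a logging toolkit offers. All names are illustrative.
public class LogPolicy {
    public enum Mode { OFF, SUMMARY, VERBOSE }
    public enum Kind { INTERACTION, DETAIL, IMPOSSIBLE }

    // Decide whether an event is written, given the configured mode.
    public static boolean shouldLog(Mode mode, Kind kind) {
        switch (kind) {
            case IMPOSSIBLE:
                // Always log "impossible" conditions; they indicate a design flaw.
                return true;
            case INTERACTION:
                // Interesting interactions are logged unless logging is off.
                return mode != Mode.OFF;
            case DETAIL:
                // Fine-grained detail is clutter except when debugging.
                return mode == Mode.VERBOSE;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldLog(Mode.SUMMARY, Kind.DETAIL));  // false
        System.out.println(shouldLog(Mode.OFF, Kind.IMPOSSIBLE));  // true
    }
}
```

Keeping the policy this small makes it easy to document for operations staff
and to answer, for each mode, who uses it and why.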



7.3.11 Creating a custom protocol
If your application can connect directly to Content Engine, you are in a good
position to just use Content Engine APIs and let the APIs handle the
communications protocol details. If your application runs in a J2EE container,
that will almost certainly be the case.

However, if your application runs outside of an application server, there might be
firewalls or other network reasons that prevent you from reaching the Content
Engine directly. When you design an application for your own enterprise, you will
probably have a pretty good understanding of the network topology and whether
direct connectivity is an issue. If you are designing an application for others to
use, the situation can be less clear. In fact, the assumptions about your own
enterprise can change in the future.

If you think your application might run in an environment without direct Content
Engine connectivity, there are alternative approaches.

Custom protocols
If your application makes only a few types of requests to the Content Engine
without much variation in the request parameters, you might consider creating an
abstraction layer for those requests and creating an application-specific proxy
solution. For this solution, you build a proxy, probably as a J2EE servlet, to
receive requests from your application and translate them into appropriate
Content Engine API calls. The proxy is also responsible for taking the results of
the Content Engine API calls and relaying them back to your application.

You obviously have quite a few technology options available to you for creating
the proxy and the protocol used between the proxy and the application. Instead
of creating a one-of-a-kind, technically isolated solution, think about generalizing
the types of requests and responses. You can probably let your proxy design
evolve into one or more services that fit into the SOA model. A lot of information
and tools are available for SOA solutions, so your overall effort will likely be less
than a one-of-a-kind solution. Here are points to consider when using your own
proxy and protocol:
򐂰 Do not assume that any requests coming in are coming from your application.
When a protocol listener is deployed on a network, you really cannot control
who or what connects to it.
򐂰 Use a robust authentication scheme. You might have an application in which
anonymous access is allowed, but those situations are pretty rare. Even if
access is not tightly constrained, there is often still a requirement for logging
who had access.

򐂰 Do not expose user credentials in non-encrypted network packets. Even if
your system handles only low security information, users often have the same
credentials for many systems (whether there is a policy forbidding that or not).
Compromising user credentials is often a bigger problem than compromising
your own system. You will often be able to use the easy solution of securing
the entire connection with SSL or Transport Layer Security (TLS); as a side
effect, the credentials are also protected.
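
To make the abstraction-layer idea concrete, here is a deliberately small
sketch. RepositoryOps and the request vocabulary are hypothetical stand-ins
for the Content Engine API calls a real proxy would make; they are not
product interfaces:

```java
import java.util.Map;

// Sketch of an application-specific proxy: a fixed, small request vocabulary
// is translated into repository calls. RepositoryOps is a hypothetical
// stand-in for the Content Engine API layer behind the proxy.
public class ProxySketch {
    public interface RepositoryOps {
        String fetchTitle(String docId);
    }

    // Translate one incoming request. Unknown request types are rejected,
    // because we cannot assume that requests come from our own application.
    public static String handle(Map<String, String> request, RepositoryOps ops) {
        if ("getTitle".equals(request.get("type"))) {
            return ops.fetchTitle(request.get("docId"));
        }
        return "ERROR: unsupported request type";
    }

    public static void main(String[] args) {
        RepositoryOps stub = docId -> "Title of " + docId;
        System.out.println(handle(Map.of("type", "getTitle", "docId", "42"), stub));
        System.out.println(handle(Map.of("type", "deleteAll"), stub));
    }
}
```

The narrow vocabulary is the point: the proxy exposes only the few operations
your application needs, rather than the full power of the server APIs.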

Reverse proxies
After reading the section about custom protocols, a natural question is why the
product does not offer this type of solution for common use cases. Actually, it
does. It offers this not just for common use cases but for all use cases that the
Content Engine APIs serve.

You can restrict yourself to the WS transport and use an HTTP/HTTPS reverse
proxy for getting access to Content Engine inside the firewall. There are many
reverse proxy solutions available, and we are not discussing any specific
package here. The functional principles of a reverse proxy are the same at a high
level. The reverse proxy resides in the boundary area of the firewall and
selectively allows requests and responses to pass through.

Because WS transport can be used with both the Content Engine .NET API and
the Content Engine Java API, there are many application-building technologies
available to you.

In principle, you can use a reverse proxy with the EJB transport, but the options
are much more limited. Depending on the application server and underlying
protocol, there might not be any workable reverse proxy for EJB transport. Many
EJB protocols are derivatives of the standard Java Remote Method Invocation
(RMI) protocol. RMI can be tunneled over HTTP, but there is usually a pretty
significant performance cost to do that. If you can find a reverse proxy that
handles the underlying protocol of your application server’s EJB layer with
reasonable efficiency, the reverse proxy is a good model for EJB transport.

7.3.12 Creating a data model


Application design goes hand in hand with designing how you plan to store your
permanent data. In the case of P8 Content Manager, the available mechanisms
in the repository fully support your use of object-oriented programming models.
We describe here just a few of the items that might be overlooked by developers
unfamiliar with an object-oriented persistence layer.
In certain cases, there are features that are not commonly available in
object-oriented programming languages.



Inheritance
Repository classes support a convenient inheritance model. You can define new
subclasses that add properties or change various characteristics of existing
properties for the subclass. You can also add new properties to most system
classes, although it usually makes more sense to define a subclass just for that
purpose and extend it (by adding properties or further subclassing) for your
application’s needs.

Property value constraints


The repository metadata model also allows you to define default values and
constraints on properties that will be enforced by the server. For example, you
can define an integer property constrained to a specific range or set of allowed
values. Although you might traditionally put that kind of validation logic into your
application, having it in the metadata ensures that no other application can put
invalid data in those properties. After the constraints are in the metadata, your
application can read the metadata and use that to guide application layer
validation.
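
As a sketch of that application-layer use, the check below applies the same
range constraint the server enforces. The constraint map is an illustrative
stand-in for whatever structure your code reads property constraints into; it
is not a product API:

```java
import java.util.Map;

// Sketch: apply in the application layer the same min/max constraint that the
// server enforces from the metadata. The map stands in for constraint
// information read from the repository's property descriptions.
public class RangeCheck {
    public static boolean isValid(int value, Map<String, Integer> constraint) {
        Integer min = constraint.get("min");
        Integer max = constraint.get("max");
        if (min != null && value < min) return false; // below allowed range
        if (max != null && value > max) return false; // above allowed range
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> priority = Map.of("min", 1, "max", 5);
        System.out.println(isValid(3, priority)); // true
        System.out.println(isValid(9, priority)); // false
    }
}
```

Driving the check from metadata means the rule lives in one place; the server
still rejects invalid values even if a client skips the check.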

Object-valued properties
One of the more powerful features of the data model is object-valued properties
(OVPs). When one object needs to reference another object, use OVPs instead
of storing the ID or path to the object. By using OVPs, you can directly navigate
from object to object. For an OVP, the metadata provides type safety by only
allowing you to point to objects of a given class (or subclass), just like an object
reference in a programming language. The server provides features for
referential integrity (prevents the occurrence of pointers to nonexistent objects)
and configurable cascading deletion (automatically controlling the deletion of
pointed-to objects or preventing the deletion of pointing-to objects).

Reflective properties
A particularly useful form of OVP is the reflective property, also known as an
association property. You can configure these OVPs via a wizard in IBM
FileNet Enterprise Manager. More than one object can point to a particular other
object. When that happens, the reflective property mechanism is used to simplify
the bookkeeping and let Content Engine perform most of the work. The usual
examples have a parent and many children. Suppose you have an Invoice
object with many LineItem child objects. With the reflective property mechanism,
define an Invoice property on the LineItem class and a LineItems property on
the Invoice class. (The naming is just a convention that works well in practice.
Any property names can be used.) To affiliate a new LineItem with the Invoice,
you need to only populate the Invoice property on the LineItem object. Because
it was created as a reflective property, the LineItems property on the Invoice
class is automatically updated to reflect the new line item. When you access the

multi-valued property (the LineItems property in our example), Content Engine
automatically performs a query for applicable objects with the appropriate value
in the single-valued property (the Invoice property in our example).
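
The behavior can be illustrated with a plain-Java simulation. This models the
semantics only; in the product, the query behind the multi-valued side is
performed by Content Engine, not by your code:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of reflective-property semantics for the
// Invoice/LineItem example. Only the single-valued side (invoiceId) is
// stored; the multi-valued side is answered by a query over it.
public class ReflectiveDemo {
    public static class LineItem {
        public final String sku;
        public final String invoiceId; // the single-valued Invoice property
        public LineItem(String sku, String invoiceId) {
            this.sku = sku;
            this.invoiceId = invoiceId;
        }
    }

    // Stand-in for the query behind the LineItems reflective property.
    public static List<LineItem> lineItemsOf(String invoiceId, List<LineItem> all) {
        List<LineItem> result = new ArrayList<>();
        for (LineItem li : all) {
            if (invoiceId.equals(li.invoiceId)) {
                result.add(li);
            }
        }
        return result;
    }

    public static List<LineItem> sampleData() {
        List<LineItem> all = new ArrayList<>();
        all.add(new LineItem("WIDGET", "INV-100"));
        all.add(new LineItem("GADGET", "INV-100"));
        all.add(new LineItem("SPROCKET", "INV-200"));
        return all;
    }
}
```

Note that only the LineItem objects were ever written; the invoice's line item
list is derived, which is exactly the bookkeeping the server saves you.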

Many-to-many relationships
Especially because of reflective properties, it is easy to use OVPs to model
one-to-many and many-to-one relationships. You might find the need to model a
many-to-many relationship. The usual solution for that is to use an intermediate
object to express a single pair of relationships. The system class,
ReferentialContainmentRelationship (RCR), is an example of this solution for
the special case of containing objects in folders. A single object can be contained
in many folders, and a folder can contain many objects. The Document class has
a reflective property, Containers, which identifies all the RCRs (and, therefore all
the containment relationships) that reference a specific Document instance. The
Folder class likewise has a Containees property.

You can see that this intermediate relationship object, combined with reflective
properties, is a powerful tool for simplifying your modeling of many-to-many
relationships. Not only does it express the relationship, but it can also have
properties specific to that particular relationship. For example, an RCR has a
property, ContainmentName, that gives a unique name to a contained object for
the purposes of path-based navigation. When you use an intermediate object for
a relationship, you can add whatever properties are appropriate to your business
needs. Both ReferentialContainmentRelationship and
DynamicReferentialContainmentRelationship classes are subclassable, and
you can use them for your own relationships if they happen to fit the folder
containment model. Other good choices for the intermediate object are
subclasses of CustomObject and Link system classes.
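
A sketch of the intermediate-object pattern, modeled loosely on RCRs (plain
Java, not the product API). Note how the relationship object carries its own
property, the containment name:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of many-to-many containment via an intermediate relationship
// object, modeled loosely on ReferentialContainmentRelationship.
public class ContainmentDemo {
    public static class Relationship {
        public final String folderId;
        public final String documentId;
        public final String containmentName; // relationship-specific property
        public Relationship(String folderId, String documentId, String name) {
            this.folderId = folderId;
            this.documentId = documentId;
            this.containmentName = name;
        }
    }

    // The folder's Containees: relationships anchored at this folder.
    public static List<Relationship> containees(String folderId, List<Relationship> rels) {
        List<Relationship> out = new ArrayList<>();
        for (Relationship r : rels) {
            if (r.folderId.equals(folderId)) out.add(r);
        }
        return out;
    }

    // The document's Containers: relationships pointing at this document.
    public static List<Relationship> containers(String documentId, List<Relationship> rels) {
        List<Relationship> out = new ArrayList<>();
        for (Relationship r : rels) {
            if (r.documentId.equals(documentId)) out.add(r);
        }
        return out;
    }

    public static List<Relationship> sampleData() {
        List<Relationship> rels = new ArrayList<>();
        rels.add(new Relationship("f-hr", "d-policy", "policy.doc"));
        rels.add(new Relationship("f-legal", "d-policy", "hr-policy.doc"));
        rels.add(new Relationship("f-hr", "d-handbook", "handbook.doc"));
        return rels;
    }
}
```

The same document participates in two relationships under two different
containment names, which a direct folder-to-document pointer could not express.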

Custom objects
You will often find yourself with a need to hold a collection of related properties
for one reason or another. In a database programming environment, you might
create a new table with rows representing the collection of information. The P8
Content Manager solution for this is to create a subclass of the CustomObject
class. The CustomObject system class has only a few properties of its own, and it
exists specifically to be subclassed for this kind of use. The invoice and line item
example used for reflective properties can also be modeled this way.

Database row limits


The Content Engine presents an abstract, object-oriented data model. In
practical terms, however, the data model is always realized in terms of a set of
real database tables and columns. Different database vendors enforce limits on
row sizes in different ways. All properties occupy space in database table rows,
so you will want to add only properties that you really intend to use. You can



always add more properties later. String-valued properties are of particular
interest for database row limits, because string-valued properties can take up
relatively wide areas of table rows. For any string-valued property that you
define, Content Engine gives you the choice to implement it as a short string or a
long string. Whenever possible, use the long string implementation, because this
implementation has the smallest impact on row size limits. There are some
database-specific trade-offs in using a long string. Those trade-offs are
described in detail in the documentation for the property UsesLongColumn, which
appears on the PropertyTemplateString, PropertyDefinitionString, and
PropertyDescriptionString objects.

7.3.13 Additional reference


There is a technical notice, IBM FileNet P8 Content Engine Query Performance
Optimization Guidelines, that provides a guide for optimizing the performance of
your IBM FileNet Content Java API or Content Engine COM API client SQL
queries made against a Content Engine server.

The technical notice provides the following guidelines (several of which we
discussed earlier in this chapter):
򐂰 Limit rows returned.
򐂰 Avoid non-indexed ordering and searching.
򐂰 Avoid non-function-indexed case-insensitive comparisons.
򐂰 Avoid unnecessary object type searches.
򐂰 Avoid unnecessary column returns.
򐂰 Use the free-threading model.
򐂰 Tune query batch size parameters.
򐂰 Avoid complex table linkages.
򐂰 Avoid unnecessary result row ordering.
򐂰 Avoid subqueries (Oracle).
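
For illustration, here is a query string that follows several of these
guidelines at once: it returns only the needed columns, limits the row count,
and filters on a single property. AccountNumber is a hypothetical custom
property, and the exact row-limiting syntax in your environment may differ, so
treat the string as a sketch rather than as definitive syntax:

```java
// Sketch: build a query that returns only the needed columns, limits the
// row count, and filters on one (ideally indexed) property. AccountNumber
// is a hypothetical custom property; the TOP clause is illustrative.
public class QueryExample {
    public static String buildQuery(String accountNumber, int maxRows) {
        String escaped = accountNumber.replace("'", "''"); // basic quote escaping
        return "SELECT TOP " + maxRows + " d.Id, d.DocumentTitle"
             + " FROM Document d"
             + " WHERE d.AccountNumber = '" + escaped + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("12345", 100));
    }
}
```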

Although the paper is written for Version 3.x, many of the guidelines are still
applicable for Version 4.0 software. To download the complete technical notice,
visit:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/35x/V10_
P8_Query_Perf_Guidelines_TechNote.pdf

Another technical notice, IBM FileNet P8 Recommendations for Handling Large
Numbers of Folders and Objects Technical Notice, provides recommendations
about handling large numbers of contained folders, documents, and custom
objects in an IBM FileNet P8 environment. The paper is also written for Version
3.x. However, the recommendations are still applicable for Version 4.0 software.
To download the complete technical notice, visit:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/35x/fold
erlimitrecommendations.pdf

To view all available technical notices, go to the product documentation page for
IBM FileNet P8 Platform and look for the technical notices:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422




Chapter 8. Advanced repository design


In this chapter, we describe repository design issues that go beyond basic
repository configuration.

We discuss the following topics:


򐂰 P8 Content Manager folders
򐂰 Storage media
򐂰 P8 Content Manager searches
򐂰 Considerations for multiple object stores

© Copyright IBM Corp. 2008. All rights reserved. 185


8.1 P8 Content Manager folders
In this section, we offer advice and best practices for developing a repository
filing system. For many clients, a primary reason for installing a central repository
is to bring scattered information into an organized structure. For other clients,
however, repository folders are not a primary concern. For either case, we offer
suggestions for organizing your repository.

8.1.1 Filed as opposed to unfiled


In an IBM FileNet P8 Content Manager (P8 Content Manager) repository,
content objects can be added to a repository in two ways: without reference to a
folder structure or into a particular folder (or set of folders). We refer to these
options as unfiled and filed (see Figure 8-1). Unlike a file system, repository
folders do not represent physical locations in the repository. Content is added,
indexed, and accessible whether or not it is filed in a folder.


Figure 8-1 Repositories can be filed or unfiled

One of the primary benefits of filing into a folder is browsing. Browsing allows
users to traverse a folder structure and locate content inside a folder. Ideally,
all the content in any given folder relates to a particular activity or function.
Another advantage is that in a P8 Content Manager repository, content can be

filed in more than one folder at a time. There is one master copy of the content,
and references filed in multiple folders point back to the single master.

Remember that with P8 Content Manager, users can always search for and view
any content that meets search criteria whether or not the content is filed in a
folder. Folders are simply a convenience for users who wish to browse for
repository content.

There are use cases where unfiled content makes sense. Table 8-1 is a decision
table comparing the filed and unfiled options.

Table 8-1 Folder options and their impact


Unfiled content (does not use folders):
򐂰 Content is only accessible by search.
򐂰 There is no need to organize repository content using folders.
򐂰 Transactions that add content are slightly faster.
򐂰 Appropriate for high-volume image applications where access will be by
search only.

Filed content (uses folders):
򐂰 Users can browse or search for content.
򐂰 There is a need to organize the repository content.
򐂰 A single version of a content object can be filed in more than one folder.
򐂰 Appropriate for lower volume applications, or applications where users
will be manually adding content.

8.1.2 Organizing unfiled content


The P8 Content Manager repository can act as a receptacle for high volume
archive systems for image (scanned paper) or e-mail messages. For these
applications, folders and an organization scheme are not a priority. The “add
content” transaction in P8 Content Manager is slightly faster when foldering is not
required, and in this type of solution, transaction rates and efficient searching are
the most important criteria.

In solutions of this kind, searching becomes the primary mechanism for content
retrieval. For this reason, the metadata that identifies the content when it is
added to the repository is vital.



The metadata set that is collected for each content item must include all
properties necessary to identify and retrieve the content. This set must include
the usual properties, such as content title, content subject, and date collected, in
addition to application-specific properties, such as customer name, customer ID,
and account number.

Organizational metadata elements


You must also consider another set of metadata. You can add metadata
properties that provide organization for the content. Organizational metadata
identifies the type of content, the division or department to which it belongs, and
potentially, the record series that controls its retention. Examples of
organizational metadata properties are:
򐂰 Division
򐂰 Department
򐂰 Function
򐂰 Activity
򐂰 Document type
򐂰 Record type

Adding organizational metadata tags to repository content is a valid method of
providing a central structure to repository content without using folders. You can
add the same elements that create an efficient folder structure to unfiled
repositories as organizational metadata properties.

8.1.3 Repository folder structures


The design of a central repository is an opportunity to place scattered content
into an organization-wide filing system. One of the primary functions of a
repository is to offer ease of access; users must be able to quickly locate
information with a minimum of effort.

Several parameters contribute to a well-designed repository folder structure:


򐂰 Is the structure self-explanatory? Is it easy to locate information?
򐂰 Does the structure work for all groups in your organization?
򐂰 What about groups that want to create their own folder structure?
򐂰 Does the structure avoid placing too many folders in a single subdirectory?

We will consider these questions as we move forward in this section.

An organization-wide folder structure


A central repository folder structure must make sense for all groups in your
organization. During implementation, it is not necessary to build out the entire

folder structure; the first three levels are sufficient. The goal for the first three
folder hierarchical levels is a structure that is accessible at first glance to any
member of your organization.

Best practice: When designing a central repository folder structure, start with
the first three levels of the structure. Build this out for your entire organization.

The first three levels of the folder hierarchy form the central organization scheme
for your repository. Three levels are not an absolute rule; four or five levels might
be necessary for large organizations. The idea is to create a structure that
provides an organizational foundation.

Depending on your organization, there are several approaches for organizational
schemes. The best way to illustrate this concept is through examples.

Example: By organizational chart


The first example is a folder structure that follows a company organizational
chart. In this organizational scheme, as shown in Figure 8-2, the folder levels
represent:
(1) Department → (2) Activity → (3) Document type

Figure 8-2 An organizational folder structure



Example: By geographical location
Another example is a repository that stores construction project records. For this
organization, construction projects are organized by location (see Figure 8-3). In
this scheme, the folder levels are:
(1) Region → (2) Construction project → (3) Document type

Figure 8-3 A geographical folder structure

Example: By function
The next folder structure is based on function. This structure is appropriate for
records systems that are typically organized by the function of the document, the
activity to which it belongs, and the record category under which it needs to be
filed. In this scheme, as shown in Figure 8-4, the folder levels are:
(1) Function → (2) Activity → (3) Document type

Figure 8-4 A functional folder structure

Best practice: Create a folder structure that makes sense for your entire
organization. Develop your folder structure to at least the third folder
hierarchical level. This structure forms the framework for your repository
organizational scheme.

Beyond the third level


The goal of the three level folder hierarchy is to impose an organization-wide
structure for repository content. But many times, individual groups have their own
requirements for folder structures and want to organize their content without
system-enforced rules. For these groups, simply release the organizational rules
for any folders created under the third level.

Folder-inherited security allows repository administrators to restrict the creation


of folders in the first, second, and third folder hierarchical levels and grant folder
creation privileges to group owners below this level. This enforces the integrity of
the organizational scheme, while still allowing individual departments to organize
content to their own satisfaction.

In the example in Figure 8-5 on page 192, folder creation rights to levels 1-3
are reserved for system administrators only. Folder creation rights under
the accounting folder are granted to the accounting group manager, and
folder creation rights under projects are granted to the IT group manager.




Figure 8-5 Folder creation rights in an organization-wide folder structure

Avoiding an excessive number of subfolders


It is possible to create too many subfolders under a parent folder. For all
implementations, avoid creating more than several hundred subfolders under
any specific repository folder.

In any foldering application, large numbers of subfolders create performance
problems. The system slows down when users open the parent folder and an
excessive number of subfolder entries must be queried and returned from the
database. The design goal is to create a deeper hierarchy rather than an overly
shallow structure.

Best practice: Do not create a folder that contains more than several hundred
subfolders. Otherwise, performance suffers as a result.

There is a technical notice, IBM FileNet P8 Recommendations for Handling
Large Numbers of Folders and Objects Technical Notice, which provides
recommendations about handling large numbers of contained folders,
documents, and custom objects in an IBM FileNet P8 environment. You can
download the technical notice.

Although the paper is written for Version 3.x, the recommendations are still
applicable for Version 4.0 software. To download the complete technical notice,
visit:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/35x/fold
erlimitrecommendations.pdf

Or, go to the product documentation page for IBM FileNet P8 Platform and look
for the technical notice:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

8.2 Storage media


A P8 Content Manager repository stores data in two areas: the catalog and the
object store. The catalog is a relational database that stores repository
configuration: object references, properties, choice lists, and object relationships.
The object store holds actual content: electronic media files. Object stores can
be configured to use three distinct types of storage (see Figure 8-6 on page 194):
򐂰 Database Store
򐂰 File Store
򐂰 Fixed Store




Figure 8-6 P8 Content Manager object store storage options

When choosing a storage method for your content, keep in mind that each of
these storage methods can be configured on a per document class basis.

8.2.1 Catalog
The catalog is a relational database (RDBMS) that is specified at installation time.
The catalog can be created on any supported RDBMS; refer to the product
documentation for information about supported brands and versions.

The catalog database stores all of the P8 Content Manager configuration
information. If you expand the object store view using IBM FileNet Enterprise
Manager, the object tree that displays is pulled from the catalog database. The
catalog stores:
򐂰 Configuration information
򐂰 Object references
򐂰 Object properties
򐂰 Object security lists
򐂰 Choice lists

򐂰 Property values
򐂰 Document (content) links
򐂰 Search definitions

Indexing custom properties


In general, indexing and configuration of the catalog system components are
handled by the installation wizard. P8 Content Manager does not support writing
to the catalog database through direct SQL commands; interaction with the
catalog database must be handled using IBM FileNet Enterprise Manager or
through the application programming interface (API).

Important: P8 Content Manager does not support direct writes (updates) to
the P8 Content Manager catalog database.

The exception to this rule is custom property indexes. You can create a database
index for any class property except system-owned properties. These database
indexes, also known as single indexes, are stored within the object store
database. For properties that users search frequently, single indexes reduce
processing time for queries on this property. After creating indexes through IBM
FileNet Enterprise Manager, you can use RDBMS tools to analyze the
performance of the indexes and apply refinements as necessary.

To create database indexes for class properties:


1. From the IBM FileNet Enterprise Manager tree view, right-click the class
containing the properties that you want to index and select Properties from
the pop-up menu.
2. Select the Property Definitions tab.
3. Select the property that you want to index and click Edit.
4. On the Properties dialog, click Set/Remove.
5. On the Set/Remove Indexing dialog, select Set and check Single Indexed.
6. Click OK to close all dialog boxes and apply the changes.

Figure 8-7 on page 196 shows the Date Hired property where you can set or
remove the associated database index.

Chapter 8. Advanced repository design 195


Figure 8-7 Indexing custom properties

Note: When selecting a property to index, the object store search must be
case-sensitive, or the index is not created correctly. You must create
additional indexes in Oracle and DB2® to avoid full table scans.

8.2.2 Database stores


P8 Content Manager can be configured to store content inside a relational
database. With this configuration, P8 Content Manager converts document
content into binary large objects (BLOBs) for storage in the database.

When choosing either file or database storage for a document class, consider the
size of the content that will be stored. We recommend storing large content in file
stores. For small content (maximum file size smaller than 10 MB), a database
store has a measurable storage and retrieval performance advantage over file
storage.
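This sizing rule can be sketched in a few lines, assuming the 10 MB threshold given above (the function and constant names are our own, not part of any FileNet API):

```python
# Illustrative sizing rule only: route small content to a database store
# (held as a BLOB) and large content to a file store. The 10 MB cutoff
# reflects the guidance above.

DB_STORE_MAX_BYTES = 10 * 1024 * 1024  # 10 MB

def choose_store(content_size_bytes):
    """Return the store type suited to content of the given size."""
    if content_size_bytes <= DB_STORE_MAX_BYTES:
        return "database"  # small content: storage/retrieval advantage as a BLOB
    return "file"          # large content: keep it in a file storage area
```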

196 IBM FileNet Content Manager Implementation Best Practices and Recommendations
8.2.3 File stores
With a file store, P8 Content Manager stores content files on a local or shared
network disk drive. A file store is the most common object store configuration. To
organize the files on disk, P8 Content Manager sets up a managed hierarchy of
directories on the specified drive.

Note: The content files written to this directory are named for the content object’s Global Unique Identifier (GUID).

Figure 8-8 File store directory structure

A file storage area consists of a hierarchy of folders on a local or shared network location:
Share The shared folder serves as the parent directory to one or
more file storage areas.
Root The root directory of the file storage area is the top-level
directory for content storage. A single parent shared
folder can contain one or many file storage area root
folders.
Content The directory where all committed content element files
are stored in a large hierarchy of subfolders.

File store directory hierarchies can be quite large and are limited only by
available disk space. The software handles directory size limitations by
automatically creating new directories when needed.
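Because content files are named for their GUID, spreading them over a managed hierarchy of subdirectories keeps any single directory from growing unbounded. The exact layout FileNet uses is not described here, so the following is only a hypothetical sketch of the general technique (hashing an identifier into a fixed fan-out of subfolders):

```python
# Hypothetical sketch (not FileNet's actual layout): derive a nested
# subdirectory path for a content element from its GUID, so files spread
# evenly and no single directory grows unbounded.
import hashlib
import os

def content_path(root, guid, levels=2, fanout=23):
    """Map a GUID to root/content/<sub>/<sub>/<guid> using a stable hash."""
    digest = hashlib.md5(guid.encode("utf-8")).hexdigest()
    parts = []
    for i in range(levels):
        byte = int(digest[i * 2:i * 2 + 2], 16)  # one hash byte per level
        parts.append("FN%02d" % (byte % fanout))
    return os.path.join(root, "content", *parts, guid)

p = content_path("/filestores/fs1/root", "1A2B3C4D-0000-0000-0000-000000000001")
```

The same GUID always maps to the same subfolder, so no lookup table is needed to locate a file, and new directories are only ever created when a hash bucket is first used.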



8.2.4 About storage policies
A storage policy provides mapping to a specific object storage area and is used
to specify where content is stored for a given content class. P8 Content Manager
supports the mapping of storage policies to one or more storage objects;
therefore, each storage policy can have one or multiple storage areas as its
assigned content storage target (see Figure 8-9). This concept is known as
farming.

Figure 8-9 Storage policies

Farming
A storage area farm is a group of storage areas that acts as a single logical
target for content storage. With farming, Content Engine provides load-balancing
capabilities for content storage by transparently spreading the content elements
across multiple storage areas. Therefore, the storage policy functions as both the
mechanism for defining the membership of a storage area farm and also the
means for assigning documents to that farm.
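As a toy model of that dual role (our own names and a simple rotation, not Content Engine's actual load-balancing algorithm): the policy lists the farm's member areas and hands each new content element to one of them.

```python
# Illustrative model only: a storage policy defines the membership of a
# storage area farm and spreads new content elements across its members.
import itertools

class StoragePolicy:
    def __init__(self, name, storage_areas):
        self.name = name
        self.storage_areas = list(storage_areas)   # farm membership
        self._cycle = itertools.cycle(self.storage_areas)

    def assign(self, document_id):
        """Pick the next storage area in the farm for this document."""
        return (document_id, next(self._cycle))

policy = StoragePolicy("Policy1", ["FileStoreA", "FileStoreB"])
placements = [policy.assign(doc_id) for doc_id in ("doc1", "doc2", "doc3")]
```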

Create separate file storage areas to ensure efficient document management.


For example, you can create a file storage area to group documents with the
same deletion or backup requirements. Map storage areas with documents by
modifying the storage policy property on document classes.

Best practice: Use IBM FileNet Enterprise Manager to configure storage policies and storage farms.

To create a new storage area:


1. Start FileNet Enterprise Manager.
2. Create a new storage policy by right-clicking on the storage policy node in the
object tree.

3. Create a new file store under a new directory name, and complete both
wizards. See Figure 8-10.
4. Refresh the object store.
5. On the document class that will be stored in the new location, right-click and
select properties. Select the new storage policy.

All documents created with this object class will now be created in the new
storage area.

Figure 8-10 Manage content storage by creating new storage policies

If it becomes necessary to move a file store to a new disk, the Move File Store wizard enables you to relocate a file storage area from one physical location to another. Rather than moving the file storage area for you, the wizard prompts you to perform certain steps and performs other steps itself. For the actual transfer, you can use the file transfer tools of your choice.



Supported file store devices
File store devices can be any device that meets the following minimum
requirements:
򐂰 Presents to the operating system as a Windows drive, share or Universal
Naming Convention (UNC) designator, or UNIX mount device
򐂰 Supports synchronous writes

8.2.5 Using fixed storage devices


Fixed storage devices are large capacity third-party storage devices that feature
hardware level content protection. Examples of fixed storage devices are EMC
Centera or NetApp® Snaplock. Fixed content systems potentially provide
extremely large storage capacity, as well as write-once hard drive technology.

Fixed content stores compared to file stores


Before deciding on a fixed content store, review the following considerations:
򐂰 Content stored in the fixed storage area is accessed via the Content Engine
using a third-party API rather than the file system API.
򐂰 Read/write access to the repository can be slower in a fixed content store
than access to Content Engine’s file storage area.
򐂰 For a fixed content store, the repository might be write-once, which does not
allow any changes to the content. This is exactly the same as normal file
storage areas in that document content can never be changed after it has
been added to the repository; it can only be revised, and new versions can be
added to the repository.
򐂰 The repository might not allow deletion of content except through third-party
device tools. The repository can support a retention period for content, which
means that deletion of the content is not allowed until the retention period has
expired.
򐂰 The fixed content system can limit the number of concurrent connections to the server, which means that fewer connections are allowed than the concurrent read/write requests normally supported by the Content Engine. This might result in decreased performance, but not error conditions.

8.3 P8 Content Manager searches
There are several methods of searching for content in the P8 Content Manager
repository. The methods can be divided by the purpose of the search:
򐂰 User-invoked searches
򐂰 Content-based searches
򐂰 Repository maintenance searches

P8 Content Manager offers a set of tools for each purpose.

8.3.1 User-invoked searches


Users can create and invoke P8 Content Manager searches through Workplace. Workplace is a pure Web application, so there is no Windows application to download and install on users’ desktops.

Workplace offers two types of searches: Search and Stored Searches.

Workplace search
Workplace search is a Web site that can be customized by individual users. It is
a feature of the Workplace Web interface. Search appears when users log in to
P8 Content Manager using Workplace (see Figure 8-11 on page 202).



Figure 8-11 Workplace search

Workplace search is an ideal tool for user-invoked ad hoc searches for repository
content. When users click Modify in the lower right area of the page, they can
modify the search criteria. Any system or custom property can be added to the
criteria display.

Note: When users modify their search criteria, the system remembers the
settings and will display them again on the next visit to the site.

Workplace-stored searches
Workplace also offers a tool for designing search templates for more
sophisticated content searches. Search Designer offers the following enhanced
features:
򐂰 Cross-object store searches
򐂰 Search criteria expressions (AND/OR options)
򐂰 Preset criteria for filtering search results
򐂰 Searches that appear as links on a browser favorites menu

Use Search Designer to create stored searches. This is an applet that can be
found on the Advanced Tools page within Workplace.

Cross-repository search
To create a cross-repository search, click the Object Store tab to add the repositories to be searched to the selected object store list. See Figure 8-12.

Figure 8-12 Cross-repository search

Search criteria expressions


To create a search expression using AND/OR conditions, enter the search criteria, Shift-click to select groupings of criteria, and click AND or OR. See Figure 8-13 on page 204.



Figure 8-13 Search criteria expressions

Adding a stored search to a browser’s favorites menu


Stored searches are accessed as simple Web links. Stored searches can be
added to a browser’s favorites menu by simply opening the stored search and
clicking Add to Favorites.

Preset search criteria


To preset search criteria to create an automated search, enter a value for a
search criteria and select the Hide value option to the right. See Figure 8-14 on
page 205.

Figure 8-14 Preset search criteria

8.3.2 Content-based search (CBR)


P8 Content Manager supports content-based retrieval (CBR) for documents,
annotations, folders, and custom objects or their properties. With CBR, you can
search an object store for objects that contain specific words or phrases
embedded in document or annotation content; or embedded in string properties
of objects that have been configured for fulltext indexing.

Content-based searches can be performed from all P8 Content Manager search tools. Notice the AND Property & Content conditions field in Figure 8-14; this is the CBR feature in Workplace search.

Configuring P8 Content Manager CBR


Before you configure an object store for CBR, complete a few preliminary steps.
Several of these steps are optional and depend on the types of searches that
users will perform on an object store and the amount of activity that you
anticipate on an object store. Several of the steps are performed with
Verity-provided tools. Refer to the Verity documentation for more information.



For each object store you plan to configure for CBR, determine which
components will support CBR. Classes and properties of those classes can be
independently configured for CBR. To configure:
1. Use the Verity K2 Dashboard to import new Verity style files.
2. Use the Verity K2 Dashboard to create Verity Servers. At least one index and
one search server must exist for a Verity domain.
3. Use the Verity K2 Dashboard to create a Verity Broker for each Content
Engine server or virtual server. Each broker must be attached to every server
that it references.
4. Use the Verity K2 Dashboard to create Verity Ticket Servers if the fulltext
indexing information is to be secure (recommended for production systems).
See the P8 Content Manager Administration section in FileNet Enterprise
Manager for additional help with Verity configuration.
5. Use IBM FileNet Enterprise Manager to create the Verity Domain
Configuration object.
6. Use IBM FileNet Enterprise Manager to create at least one index area in the
site where the object store that you plan to configure for CBR is located, even
for those object stores that store content in the database only. CBR indexes
must be located within an index area.
7. (Optional) Update the excluded word list. Excluded word lists contain those
words that you want to exclude from an index, such as “an” and “the”. The
default excluded word list is located in the Filenet\Verity\defaultstyles\<default
language> subfolder and is named style.stp. If there are certain words that
you want to exclude from all indexes all of the time, add them to the default
excluded word list.
8. (Optional) Update the Verity style files to support any indexing features, such
as sentence and paragraph searches, that you want to enable.

8.3.3 Searches for repository maintenance


IBM FileNet Enterprise Manager features a query tool that can be used for
detailed report generation or for maintaining an object store repository. With the
Query Builder tool, you can create a search query and apply bulk actions on the
objects returned in the result set. With Query Builder, you can:
򐂰 Find objects using property values as search criteria.
򐂰 Create, save, and run simple searches.
򐂰 Create and save search templates that will prompt for criteria when launched.
򐂰 Launch search templates that are provided with each Content Engine and
IBM FileNet Enterprise Manager installation. These templates are provided to

assist with managing the size of your audit log and for managing entries in the
QueueItem table.
򐂰 Create, save, and run SQL queries.
򐂰 Combine searches with bulk operations that include the following actions (available on the Query Builder’s Actions tab):
– Delete objects.
– Add objects to export manifest.
– Undo checkout (for documents).
– Containment actions (for documents, custom objects, and folders): file in
folder and unfile from folder.
– Run VBScripts or JScripts (Query Builder Script tab).
– Edit security by adding or removing users and groups.
– Lifecycle actions: set exception, clear exception, promote, demote, and reset.

Query Builder
To access Query Builder, open IBM FileNet Enterprise Manager and expand the
object store that you want to search. Right-click Search Results and select New
Search. See Figure 8-15.

Figure 8-15 Create a new Query Builder search

There are two ways to construct searches: Simple View and SQL View. Select a view style from the toolbar:
򐂰 Simple View offers a point-and-click interface where you can select tables,
classes, and criteria from drop-down lists.
򐂰 SQL View translates anything that you create in Simple View (the translation is one way only: you cannot translate an SQL View back into a Simple View) and presents the query in an SQL text window. You can then edit the query directly or load any *.qry files that you have saved on the network.



Both views construct a query that can be bundled with the other Query Builder
features: bulk operations, scripts, and security changes. Both views support
Search Mode and Template Designer Mode.

Tip: To aid administrators using SQL View, the P8 Content Manager help files
contain P8 Content Manager database view schema.

Search templates and template designer mode


Search templates are like simple queries except that when search templates are
loaded from IBM FileNet Enterprise Manager’s Saved Searches node, they
prompt you for search criteria and whether you want to include any defined bulk
operations.

IBM FileNet-provided search templates are installed with every Content Engine
or IBM FileNet Enterprise Manager-only installation into a folder on the local
server named SearchTemplates, which is located in the FileNet installation
directory. Any queries placed in this folder appear in IBM FileNet Enterprise
Manager’s Saved Searches node as long as they have .sch as a filename
extension.

Querying object-valued properties


One of Content Engine’s powerful search features is the ability to retrieve an
object given another object that is a member of one of its object-valued
properties. For example, you can find a document that has a particular security
policy by using the identifying ID of the security policy in the search criteria.

View search results


After Query Builder executes a search, the Search Results details pane displays
the result set. The results pane displays the columns of properties from the
selected object table. You can resize the dialog, move table columns, or sort the
search results.

Multiselect operations
Multiselect (or bulk) operations perform an operation on all objects returned in
the search result dialog. This feature is especially useful for object store
maintenance activities. With multiselect operations, you can perform the
following actions on multiple files at the same time:
򐂰 Delete
򐂰 File to folder
򐂰 Unfile from folder
򐂰 Undo checkouts
򐂰 Change life cycle states

򐂰 Add to security access control lists (you cannot delete existing entries)
򐂰 Run an event action script

To access the multiselect menu, run a Query Builder search and select the items
that you want to modify from the result set. Right-click and select Multiselect
Operations.

Figure 8-16 Multiselect operations

For example, assume that several documents had been checked out by
someone who left your company. Using multiselect operations, you can search
for all documents that were left checked out by that person and undo these
checkouts in one operation. To do this, you use the Query Builder to construct a
search to find all documents currently checked out under the former employee’s
system login name.
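The scenario above amounts to a query followed by a bulk action. A small stand-in sketch (illustrative data and functions, not the FileNet API or Query Builder itself):

```python
# Stand-in data and functions (not the FileNet API) illustrating the
# scenario above: find every document still checked out by a departed
# user, then undo all of those checkouts in one bulk operation.

repository = [
    {"title": "Q1 report", "checked_out_by": "jdoe"},
    {"title": "Budget",    "checked_out_by": None},
    {"title": "Q2 plan",   "checked_out_by": "jdoe"},
]

def find_checked_out_by(repo, user):
    """The Query Builder step: select documents checked out by the user."""
    return [doc for doc in repo if doc["checked_out_by"] == user]

def undo_checkouts(docs):
    """The multiselect step: cancel the checkout on every selected document."""
    for doc in docs:
        doc["checked_out_by"] = None
    return len(docs)

released = undo_checkouts(find_checked_out_by(repository, "jdoe"))
```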

8.4 Considerations for multiple object stores


There are several valid use cases for deploying multiple object stores. Keep in
mind, however, that a single object store can handle a catalog containing over a
billion objects and, using multiple storage policies, a virtually unlimited amount of
storage. Except under extreme conditions, size is not a factor in the decision to add additional object stores.

Multiple object stores are warranted in the following situations:


򐂰 An object store is subject to high ingestion rates or frequent update
procedures and needs to be segregated for performance reasons.
򐂰 Content must be separated for security reasons.
򐂰 User groups are separated by large geographic distance.



8.4.1 Segregate for performance reasons
If an object store will be the target of high volume ingestion rates, such as those
produced by Capture or Email Manager, it makes sense to separate that object
store from others that are dedicated to document life cycle use. Users who
search for and check out documents for editing will experience better
performance if the object store they use is not busy handling high volume
automated processes. There are two common examples of this situation where
multiple object stores are used: E-mail archiving and IBM FileNet Records
Manager solutions (see Figure 8-17).

The IBM FileNet Records Manager object store that hosts record information is
subject to processing intensive database activity during retention and disposition
processing. In addition, record objects are small and best-suited for database
stores. For these reasons, records need to be stored in a separate object store.

Best practice: Set up a separate database object store for IBM FileNet Records Manager. This object store is commonly called the file plan object store (FPOS).

Figure 8-17 Two solutions with multiple object stores

Segregate content for security reasons

Another reason to implement multiple object stores is a requirement to strictly
separate content for security reasons. Although it is possible to keep classified
content secure using marking sets and security policies, certain content must be
kept absolutely separate. In these situations, install a second object store for
classified content.

Here are a few situations where secure object stores are a solution:
򐂰 Board of director level content
򐂰 Secret or top secret government content
򐂰 Public-facing Internet accessible libraries
򐂰 Service companies that offer enterprise content management services to
multiple customers

8.4.2 User groups are separated by large geographical distance


Many organizations have large offices in several countries. Wide Area Network
(WAN) links are expensive over large distances and typically have low bandwidth
and high latency. It is not always practical for offices in this situation to share the
same P8 Content Manager system. One solution to this situation is two separate
repositories managed by two separate P8 Content Manager systems as shown
in Figure 8-18.

Figure 8-18 Two separate repositories

In this solution, an organization has installed two separate P8 Content Manager systems in two distant offices. Users in each office have high speed access to
the local repository for document retrieval and editing. Users in remote offices
can still search and retrieve content in the remote office repository, but because



this activity is less frequent than local access, traffic over the WAN link is
reduced.


Chapter 9. Business continuity


In this chapter, we describe how to provide for business continuity with IBM
FileNet Content Manager (P8 Content Manager).

We discuss the following topics:


򐂰 Defining business continuity
򐂰 Defining high availability (HA)
򐂰 Implementing a high availability solution
򐂰 Defining disaster recovery (DR)
򐂰 Implementing a disaster recovery solution
򐂰 Best practices
򐂰 Product documentation for HA and DR

© Copyright IBM Corp. 2008. All rights reserved. 213


9.1 Defining business continuity
Business continuity is defined as maintaining business services to an
organization’s customers despite disruptive events that have the potential to
interrupt service. Disruptions range from human errors or component failures to
full-scale human-caused or natural disasters. Providing for continued business
operations in the event of a local component failure is called high availability,
while business continuity in the event of a full-scale disaster is called disaster
recovery.

Business continuity is concerned with resuming all critical business functions after disruptive events, whereas high availability and disaster recovery are
concerned primarily with the subset of business continuity devoted to keeping
information technology (IT) services available during and after disruptions.
Besides IT services, business continuity covers all aspects of continuing business
operations, including crisis management and communications, alternate work
sites for employees, employee disaster assistance, temporary staffing,
emergency transportation, physical security, and chain of command.

Business continuity planning (BCP) involves all aspects of anticipating possible disruptions to mission-critical business functions and putting in place plans to
avoid or recover from those disruptions. BCP focuses on planning for the
successful resumption of all mission-critical business operations after a
disruption, not just restoring IT functions. It involves much more than IT
professionals. It touches every department in an enterprise from upper
management to human resources, to external communications professionals,
telecommunications staff, facility management, health care services, finance,
sales, marketing, and engineering.

Business continuity planning in the limited scope of IT functions will involve the IT
department, facility management, telecommunications, and line of business
management who can assist in evaluating which IT functions are mission-critical
after a disruption or disaster. High availability and disaster recovery plans need
to be formally developed and reviewed by all these stakeholders, implemented,
and then regularly tested by all staff to be certain that they will function as
expected during and after a real disruption.

This chapter covers the part of business continuity that concerns restoring IT
functions, in particular P8 Content Manager, after a disruptive event.

9.2 Defining high availability (HA)
What is high availability (HA) and how is it measured? We start by defining
availability. A business system is said to be available whenever it is fully
accessible by its users. Availability is measured as a percentage of the planned
uptime for a system during which the system is available to its users, that is,
during which it is fully accessible for all its normal uses.

Planned uptime is the time that the system administrators have agreed to keep
the system up and running for its users, frequently in the form of a Service Level
Agreement (SLA) with the user organizations. The SLA might allow the system
administrators to take the system down nightly or weekly for backups and
maintenance, or, in an increasing number of applications, rarely if at all. Certain
mission critical systems for around-the-clock operations now need to be
available 24 hours a day, 365 days a year.

The concept of high availability roughly equates to the system and data being available almost all of the time: 24 hours a day, 7 days a week, 365 days a year.
Achieving high availability means having the system up and running for a period
of time that meets or exceeds the SLA for system availability, as measured as a
percentage of the planned uptime for a system.

Table 9-1 on page 216 helps quantify and classify a range of availability targets
for IT systems. At the low end of the availability range, 95% availability is a fairly
modest target and hence is termed basic availability. It can typically be achieved
with standard tape backup and restore facilities. The next level up, enhanced
availability, requires more robust features, such as a Redundant Array of
Independent Disks (RAID) storage system, which prevents data loss in the first
place, rather than the more basic mechanisms for recovering from data loss after
it occurs. Highly available systems will range from 99.9% to 99.999% availability
and require protection from both application loss and data loss. At the high end of
this continuum of availability is a fault tolerant system that is designed to avoid
any downtime ever, because the system is used in life and death situations.

Chapter 9. Business continuity 215


Table 9-1 Range of availability

Availability percent   Annual downtime            Availability type
100%                   0 minutes                  Fault tolerance for life and death applications
99.999%                5.3 minutes                Five nines -- near continuous availability
99.99%                 53 minutes                 High availability
99.9%                  526 minutes (8.8 hours)    High availability
99%                    88 hours (3.7 days)        Enhanced availability
95%                    18 days (2.6 weeks)        Basic availability

To make this more concrete, consider the maximum downtime that can be
absorbed in a year while still achieving 99.999% availability, also called five nines
availability. As Table 9-1 indicates, five nines permits no more than 5.3 minutes of
unscheduled downtime per year, or even less if the system is not scheduled for
round-the-clock operation. This is near continuous availability, but not strictly fault
tolerant. For a three nines target of 99.9%, we can allow 100 times more
downtime, or 8.8 hours per year. An availability target of 99%, which still sounds
like a high target, can be achieved even if the system is down 88 hours per year,
or over three and a half days. So the range of availability is actually quite large.
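The downtime figures in Table 9-1 follow directly from the availability percentage: maximum unscheduled downtime equals planned uptime times (1 − availability). A quick check, assuming the round-the-clock 24x365 schedule that the table uses:

```python
# Verify the annual downtime budgets implied by Table 9-1 for a system
# with 24x365 planned uptime.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes of planned uptime

def annual_downtime_minutes(availability_percent):
    """Maximum unscheduled downtime per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

# "Five nines" allows only a few minutes of downtime per year...
assert round(annual_downtime_minutes(99.999), 1) == 5.3
# ...three nines allows roughly 8.8 hours...
assert round(annual_downtime_minutes(99.9) / 60, 1) == 8.8
# ...and 95% lets the system be down for more than two and a half weeks.
assert round(annual_downtime_minutes(95) / (24 * 60)) == 18
```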

You might be asking yourself, “Why not provide for the highest levels of
availability on all IT systems?”. The answer, as always, is cost. The cost of
providing high availability goes up exponentially as availability approaches
99.9% and higher.

Choosing an appropriate availability target involves analyzing the sources and costs of downtime in order to justify the cost of the availability solution. Industry
experts estimate that less than half of system downtime can be attributed to
hardware, operating system, or environmental failures. The majority of downtime
is the result of people and process problems, which comes down to a mix of
operator errors and application errors.

This chapter focuses primarily on how to mitigate downtime due to hardware outages, system and IBM FileNet software problems outside the control of an
IBM FileNet client, or environmental failures, such as loss of power, network
connectivity, or air conditioning. This covers less than half of the sources of
downtime. The majority of the sources requires people or process changes.

Our advice is to determine what has caused the most downtime in the past for a
particular system and focus first on that. Frequently, we have found that stricter

change control and better load testing for new applications will pay off the most.
Focus on the root causes of outages first and then address the secondary and
tertiary causes only after protecting against the root causes.

Here are several examples of best practices for avoiding downtime from people
and process problems:
򐂰 System administrators need to be well-trained and dedicated full-time to their
systems, so that they are least likely to commit pilot errors.
򐂰 The applications running on the system must be designed with great care to
avoid possible application crashes or other failures.
򐂰 Exception handling, both by administrators and application programs, must
be well thought-out, so that problems are anticipated and handled efficiently
and effectively when they occur.
򐂰 Comprehensive testing and staging of the system is paramount to avoiding
production downtime. Testing of the system under a simulated production
workload is critical to avoiding downtime when the system is stressed with a
peak load during production. Downtime on a test system does not affect
availability of the production system, so make sure to wring out all the
problems before taking a new system, software release, service pack, or
even software patch into production.
򐂰 Deploying a new application into production must likewise be planned and
tested carefully to minimize the possibilities of adversely affecting production
due to an overlooked deployment complication.
򐂰 Thorough user training will help keep the system performing well within the
bounds for which it was designed. Users who abuse a system due to
ignorance can affect overall system performance or even cause a system
failure.

Make sure that all sources of downtime are addressed, if high availability is truly
to be achieved. After the fundamental people-related and process-related
problems have been addressed, you need to consider hardware and software
availability next.

9.3 Implementing a high availability solution


There are a variety of building blocks for high availability, ranging from the most
basic backup and restore facilities, to hardened servers and backup servers, to
the best practices: server farms and server clusters. It is important to note that
server farms and server clusters, as those terms are used in this chapter, are
different solutions. We will explore server farms first, and then explain how
clusters differ.



9.3.1 Load-balanced server farms
Server farms are the best practice for Web servers. In fact, they are the best
practice, in terms of high availability, for all the server tiers in a P8 Content
Manager solution where they are supported. The architecture and function of
some servers do not lend themselves to a server farm configuration, but the 4.0
versions of the Content Engine and Process Engine support server farming, as
do all the Web and presentation tier products. In addition, Oracle’s Real
Application Clusters (RACs) support server farming.

As we have already discussed in 3.2, “Scalability” on page 32, the key concept
for a server farm is to distribute the incoming user workload across two or more
active, cloned servers. This distribution is commonly called “load balancing,”
which can be implemented either in hardware or software.

This is a scalable architecture, because servers can be added to the farm to
scale it out for greater workloads. It also provides improved availability, because
the failure of one server in a farm still leaves one or more other servers to handle
incoming client requests, which keeps the service available at all times.

In a load-balanced server farm, clients of that server see one virtual server, even
though there are actually two or more servers behind the load-balancing
hardware or software. The applications or services that are accessed by the
server’s clients are replicated, or cloned, across all the servers in the farm. And
all those servers are actively providing the application or service all the time.

The load-balancing software or hardware receives each request and uses any
one of a variety of approaches for distributing the request workload over the
servers in the farm. This can be a simple round-robin approach, which sends
requests to the servers in a predefined order. A more sophisticated load balancer
might use dynamic feedback from the servers in the farm to choose the server
with the lightest current load or the fastest average response time, for example.

In any case, the load balancer keeps track of the state of each server in the farm,
so that if a server becomes unavailable, the load balancer can direct all future
requests to the remaining servers in the farm and avoid the down server, thereby
masking the failure.
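The round-robin distribution and failure masking just described can be sketched in a few lines of Python. This is a conceptual illustration only; real load balancers are vendor hardware devices or software products, and all names here are hypothetical:

```python
import itertools

class RoundRobinBalancer:
    """Minimal sketch of a load balancer: rotates through the servers
    in a fixed order and skips any server marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)          # health state per server
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def route(self):
        """Return the next healthy server, masking failed ones."""
        if not self.healthy:
            raise RuntimeError("no servers available in the farm")
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server

farm = RoundRobinBalancer(["web1", "web2", "web3"])
print([farm.route() for _ in range(3)])   # each server in turn
farm.mark_down("web2")
print([farm.route() for _ in range(4)])   # web2 is skipped from now on
```

A dynamic-feedback balancer would differ only in `route()`, choosing, for example, the healthy server with the lightest reported load instead of the next one in the rotation.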

The key enabler for a server farm is the load balancer. IBM FileNet leverages
IBM and third-party load-balancing hardware and software products. Microsoft,
for instance, includes its Microsoft Network Load Balancer with every copy of
Windows 2000 Server and Windows Server® 2003. All the Java application
server vendors provide software to balance the Java application workload
running in their Java 2 Platform, Enterprise Edition (J2EE) environments. For
example, IBM WebSphere Network Deployment and Extended Deployment
application server products include built-in software load balancing. J2EE
application server vendors, including IBM, use the term cluster for their
load-balancing software feature.

Network hardware vendors, such as Cisco and f5 Networks, have implemented
load balancing for server farms in several of their network devices. f5 BIG-IP is a
popular hardware load-balancing device. There are also many other vendors that
have load balancer products.

Figure 9-1 shows a logical diagram of a load-balanced server farm. This figure
shows a pair of hardware load balancers and multiple servers in the server farm.
Redundancy is essential to prevent the failure of one load balancer from taking
down the server farm.


Figure 9-1 A load-balanced server farm

This concept of no single point of failure is key to high availability. Every link in
the chain, that is, every element in the hardware and software, must have an
alternate element available to take over in case it fails. Software load balancers,
for example, are designed to avoid any single point of failure; therefore, each
server in the farm has a copy of the load-balancing software running on it in
configurations using software instead of hardware for load balancing.

Note that the software running on each server in a farm is functionally identical.
As changes are made to any server in the farm, you must replicate those
changes to all the servers in the farm.

Load balancing offers a good solution: Any client calling into a load-balanced
server farm can be directed to any server in the farm. The load can be evenly
distributed across all the servers for the best possible response time and server
usage. However, load balancing can be a problem if the servers in the farm
retain any state between calls. For instance, if a user initiates a session by
providing logon credentials, it is beneficial for those credentials to be cached for
reuse on all subsequent calls to the server for that user session. We cannot ask
the user to log in over and over every time that the application needs to
communicate with the server; therefore, in one solution, the server keeps a
temporary copy of the user’s validated credentials in its memory. This works fine
if there is only one server, but in a load-balanced server farm, the load balancer
can easily direct subsequent calls from the same user session to different
servers in the farm. Those other servers will not have the session state in their
memory.

Load balancers can be configured for session-based load balancing to solve this
session state problem. This is also known as sticky sessions, session affinity, or
stateful load balancing. The load balancer keeps track of which server it selected
at the beginning of a user session and directs all the traffic for that session to the
same physical server. Session-based load balancing is required for the
Application Engine, but not for the Content Engine or Process Engine.
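Session affinity can be illustrated with a short sketch: the first request of a session is balanced normally, and every later request with the same session ID is pinned to the server chosen at session start. All names here are invented for illustration:

```python
class StickySessionBalancer:
    """Sketch of session affinity (sticky sessions): the first request
    of a session is load-balanced, and later requests with the same
    session ID are routed to the same server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0
        self.affinity = {}            # session_id -> pinned server

    def route(self, session_id):
        server = self.affinity.get(session_id)
        if server is None or server not in self.servers:
            # New session (or its pinned server failed): pick a server
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
            self.affinity[session_id] = server
        return server

    def fail(self, server):
        """Remove a failed server; its sessions rebind on next request."""
        self.servers.remove(server)

lb = StickySessionBalancer(["app1", "app2"])
print(lb.route("alice"), lb.route("bob"))   # sessions spread across the farm
print(lb.route("alice"))                    # same server as before (sticky)
lb.fail("app1")
print(lb.route("alice"))                    # session rebinds to a survivor
```

Note that when a pinned server fails, the session is simply rebalanced; any in-memory state that server held, such as the cached credentials described above, is lost, which is why the user may need to authenticate again.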

Now, we turn to server clusters and explore how they differ from farms.

9.3.2 Active-passive server clusters


Historically, server clusters have been required for the business logic and data
tiers beneath the Web and presentation layer tier of servers. Examples include
business process servers, library or repository servers, and database or file
system servers. However, the IBM FileNet P8 4.0 Content Engine and Process
Engine servers are exceptions that can be deployed in load-balanced server
farms.

Business logic and data tier servers all differ from Web and presentation servers
in that they directly manage substantial dynamic data, such as content or
process data. A stream of dynamic data, by definition, is a stream of new or
rapidly changing data. It is common for a single server to manage a dynamic
data set, rather than a set of servers that need to cooperate to manage the data
jointly.

Because of that single server architecture, a server farm with two or more active
servers does not fit well with servers that have not been designed for cooperative
data management. Yet a second server is still needed for continued availability,
in case the first server fails. The solution in this case is an active-passive server
cluster, where the second server stands by until the first server fails, before
stepping in to take over the data management.

The second server needs to have access to the data that was being managed by
the first server, either the same exact copy, or a copy of its own. The common
solution allows both servers to have access to the same copy of data either via a
network file share or, more commonly, a Storage Area Network (SAN) device
that both servers can access, but only one at a time. The active server owns the
SAN storage, and the passive server has no access. Sharing the SAN storage in
this way is a simpler solution than replicating the data to a second storage device
accessed by the second server.

So, shared data storage is a key concept for server clusters. Figure 9-2 shows
two servers in a server cluster with access to the same shared storage. Recall
that server farms typically do not have this requirement for shared storage, so
this is an essential difference between server farms and server clusters. Oracle
RAC and the Content Engine are exceptions, in that they exhibit both server farm
and server cluster characteristics. They take advantage of load balancing,
combined with cooperative data management using storage that is shared by all
the RAC or Content Engine servers. In the case of a load-balanced server farm
with shared storage, all the servers are active and thus need to access the
storage in parallel, so a network file share is required. An active-passive server
cluster, however, is designed to allow only the active server to access the
storage, so the single-owner model of SAN storage works well. The typical
server cluster does not support load balancing, but it does support shared
storage via SAN. Note that the storage is shared in the sense that both servers
are connected to the same storage, so they share access to the same storage,
but never concurrently in the case of SAN storage.


Figure 9-2 Active-passive server cluster

As with server farms, clients of a server cluster see one virtual server, even
though the physical server they interact with will change if the primary server
fails. If the primary server fails, a failover occurs, and the second server takes
over the data copy and starts up the software to manage the stored data. It also
takes over the virtual network address, which is shared by the two servers,
making the failover transparent to the client of the server cluster.

Both triggering a failover and actually accomplishing the failover are the
responsibility of clustering software running on both servers. This software is
configured on the secondary server to monitor the health of the primary server
and initiate a failover if the primary server fails.

After the failed server is repaired and running again, a failback is initiated to shift
the responsibility back to the primary server and put the secondary server in
waiting mode again. This failback is necessary to get back to a redundant state
that can accommodate another server failure.
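The monitoring, failover, and failback cycle can be sketched as follows. This is an illustrative model of the clustering logic, not any vendor's product, and the missed-heartbeat threshold is an assumed parameter:

```python
class ClusterMonitor:
    """Sketch of active-passive clustering logic: the secondary node
    watches heartbeats from the primary and takes over (failover) when
    they stop; a failback later restores the original roles."""

    def __init__(self, primary, secondary, max_missed=3):
        self.primary, self.secondary = primary, secondary
        self.active = primary          # node currently owning the service
        self.missed = 0
        self.max_missed = max_missed   # heartbeats missed before failover

    def heartbeat(self, received):
        """Called once per heartbeat interval on the secondary."""
        if received:
            self.missed = 0
        elif self.active == self.primary:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.failover()

    def failover(self):
        # The secondary takes over the shared storage, starts the
        # application software, and assumes the virtual network address.
        self.active = self.secondary

    def failback(self):
        # After the primary is repaired, shift work back so the cluster
        # can survive another failure.
        self.active = self.primary
        self.missed = 0

cluster = ClusterMonitor("nodeA", "nodeB")
for _ in range(3):
    cluster.heartbeat(received=False)   # primary stops responding
print(cluster.active)                   # secondary after failover
cluster.failback()
print(cluster.active)                   # primary active again
```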

In certain cases, intentional failovers can be used to mask planned downtime for
software or hardware upgrades or other maintenance. You can upgrade and test
the secondary server offline, and then, you can trigger a failover and apply the
upgrade to the primary server while the secondary server is standing in for the
primary server.

This type of configuration, in which the second server is inactive or passive until it
is called to step in for the active server, is called an active-passive server cluster.
Several clustering software products, if not all, also support an active-active
cluster configuration, which is similar to a server farm where all servers are
active. An active-active cluster configuration is useful for data managing servers
that are designed to share the management across more than one server.

However, IBM FileNet P8 products that use clustering software for high
availability all require an active-passive configuration. IBM FileNet P8 products
that work with an active-active configuration always use a server farm and load
balancing rather than clustering software. (Server farms are always
active-active.)

Server cluster software requires agents or scripts that are configured to manage
key server processes on a particular server. These agents or scripts are
configured so that they can monitor the health of the application software, as well
as start and stop the application software on that server.

Cluster software typically comes with predefined agents or scripts for common
server types, such as database servers. In addition, you can develop custom
agents when there is no predefined agent, or when you want more granular control of
the processes during a failover.

9.3.3 Geographically dispersed server clusters
Most server clusters consist of two side-by-side servers. However, certain
software vendors also support geographically dispersed clusters. Symantec’s
Veritas Cluster Server, for instance, supports both stretch clusters and replicated
data clusters. A stretch cluster is defined as two servers in a cluster separated by
as much as 100 km (62 miles). The distance limitation is due to the requirement
to connect both servers via fiber to the same Storage Area Network device for
shared storage and also due to the maximum amount of time allowed for the
heartbeat protocol exchange between the two servers. The two servers in a
stretch cluster always share the same SAN storage device, just as though they
were side by side and operate identically with a local server cluster.

You can use a stretch cluster as a disaster recovery solution as long as there is
an offline copy of the data at the second site. It requires only two servers total,
rather than the more typical three servers that are needed for HA plus DR: two in
a local cluster in one site for HA and a third server in the other site in the event of
the loss of the first site.

A replicated data cluster is similar to a stretch cluster, but the remote server
always has its own replicated copy of the data. In the event of a failover, the
second server comes up on its local copy of the data. In certain cases (but not all
cases), this capability removes the need for an expensive fiber connection
between the two sites, because neither server needs the speed of fiber to access
storage at the other site. Data replication can be done over an IP network. There
is still a 100 km (62 miles) distance limitation to ensure that the heartbeat
between servers will not time out due to transmission delays and to allow for
synchronous replication. See 9.5.1, “Replication” on page 231 for an explanation
of synchronous and asynchronous replication.

Like a stretch cluster, a replicated data cluster can act as a DR solution, as well
as an HA solution. However, a replicated data cluster cannot provide the same
level of availability as a local cluster, because of the additional downtime
required for a data resync to the primary site on a site failback. In addition, the
communication requirements between the two sites are typically much more
expensive and substantially more prone to failure than the local communication
requirements between two servers in a local cluster. In order to support a
replicated data cluster, the two sites need to be connected by a dedicated and
redundant high-speed network, and their physical separation must be no more
than 100 km (62 miles).

Because of the availability trade-offs and communication costs, geographically
dispersed clusters are generally not the best practice for high availability.



9.3.4 Server cluster products
All the server vendors, as well as several software vendors, offer their own
server cluster software products (see Table 9-2).

Table 9-2 Server cluster products

Server and software platform                  Server cluster software product
IBM System p™ AIX®                            HACMP™
Windows 2000 Server and Windows Server 2003   Microsoft Cluster Server
Hewlett-Packard (HP) 9000 HP-UX               HP ServiceGuard
Sun Solaris                                   Sun Cluster
AIX, HP-UX, Solaris, Windows, Linux®          Symantec Veritas Cluster Server

9.3.5 Server cluster configurations


Server cluster configurations may include the following variants:
򐂰 Asymmetric 1-to-1
򐂰 Symmetric
򐂰 Asymmetric N+1

Asymmetric 1-to-1
The simplest of these configurations adds a passive server to be paired with
each active server. See Figure 9-3 on page 225. This asymmetric 1-to-1
configuration doubles the number of servers, assuming active-passive
clustering, and half of those servers are idle until an active server fails. Luckily,
there are more efficient server cluster configurations.

Note: These variants are described here using Symantec Veritas terminology.


Figure 9-3 Asymmetric 1-to-1 clusters

Symmetric
A symmetric server cluster uses two active servers as backups for each other,
thereby avoiding any idle servers. In the example in Figure 9-4, part A on the left
shows one server running a database and the other server running a file server.
If the file server fails, the clustering software on the database server detects the
failure and starts up a copy of the file server on the database server as shown in
part B of Figure 9-4. Note that both servers must have both the database and file
server software installed for this to work.


A. Initial state and state after failback B. State after failure and failover

Figure 9-4 Symmetric cluster

Symmetric clusters require the same operating system on all servers in the
cluster. Also, the two servers need to support coresidency for this configuration
to work. In this example, the database and file server support coresidency; that
is, the database and file server can be installed on the same server and coexist
there. When the failed server is fixed or replaced, a failback is required to get
back to an HA configuration (shown in part A of Figure 9-4) in order to handle a
subsequent failure.



This kind of cluster has no idle servers, but do not be fooled: there are idle
processing cycles in this configuration, in both servers, so that either one has the
capacity to take over the service of the other server while still maintaining its
primary service as well. The idle servers of an asymmetric cluster are replaced in
a symmetric cluster with idle processing capacity distributed across both servers
in the cluster. Still, many IT departments prefer symmetric clustering so that no
one server is completely idle, unlike asymmetric clustering.

Asymmetric N+1
Another asymmetric configuration, this one called N+1, limits the number of idle
servers to one passive server that is shared by N active servers. In the example
in Figure 9-5 on page 227, part A shows three servers in a 2+1 configuration: a
database server (1), a file server (2), and one extra server (3) that acts as the
passive server for both of the active servers. If the file server (2) fails, for
instance, the file server function fails over to the extra server (3) as shown in part
B.

One of the advantages of this approach is that the failed server (2) simply
becomes the new passive server when it is repaired, so no failback (and the
associated downtime) is required for N+1 clusters. Another advantage is that,
like asymmetric 1-to-1 clusters, there is no change in server performance after a
failover, assuming all three servers are identical in capacity. In contrast,
symmetric clusters cannot guarantee unchanged performance after a failover,
because two independent servers with independent workloads have to coexist
on the same physical server after a failover.


A. Initial state before server 2 fails B. State after File Server fails over from
server 2 to 3, and subsequent repair to 2

Figure 9-5 Asymmetric N+1 cluster
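The N+1 behavior described above can be sketched as follows. The machine and service names are hypothetical, and this is a conceptual illustration of the role changes, not clustering software:

```python
def n_plus_one_failover(active, spare, failed):
    """Sketch of N+1 failover: the failed server's service moves to the
    shared spare, and the repaired machine simply becomes the new
    spare -- no failback (and associated downtime) is needed."""
    new_active = dict(active)                       # service -> machine
    service = next(s for s, m in active.items() if m == failed)
    new_active[service] = spare                     # failover to the spare
    new_spare = failed                              # repaired machine waits in reserve
    return new_active, new_spare

active = {"database": "server1", "file": "server2"}
new_active, new_spare = n_plus_one_failover(active, spare="server3",
                                            failed="server2")
print(new_active)   # file service now runs on server3
print(new_spare)    # server2, once repaired, is the new passive server
```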

9.3.6 Comparing and contrasting farms to clusters


Table 9-3 summarizes the differences and similarities between load-balanced
server farms and active-passive server clusters.

Table 9-3 Comparison of farms to clusters

Feature                                      Farms                            Clusters
Clients see one virtual IP address           Yes                              Yes
and one virtual server
All servers active                           Yes                              No
One server active and one server passive     No                               Yes, typically
Capacity and performance scalable            Yes                              No
by adding servers
Instantaneous failover                       Yes, all servers active          No, must wait for software
                                             all the time                     to be started after failover
Shared storage between the servers           Not necessarily, but can         Yes, typically SAN storage,
                                             include a network file share     which allows just the active
                                             for parallel accesses from       server to access the storage
                                             all the servers in the farm
Used for Web servers, presentation tier,     Yes                              Not usually
and certain services tier servers
Used for data tier servers                   No                               Yes (except Oracle RAC)
Requires hardware or software                Yes, like BIG-IP or              No
load balancer                                WebSphere ND clustering
Requires failover cluster software           No                               Yes, like HACMP,
                                                                              Microsoft Cluster Server,
                                                                              or Symantec Veritas
                                                                              Cluster Server

Now that we have covered the differences between server farms and server
clusters, we explore the advantages of farms over clusters and the advantages
of clusters over farms. Server farms have no idle servers, by definition, because
all servers in a farm are active, whereas asymmetric server clusters always have
one or more idle servers in a steady state. Even more importantly, you can
expand server farms by simply adding a server clone, thereby scaling out the
farm to handle larger workloads. This horizontal scalability is not possible with
active-passive server clusters. The last advantage of a farm over a cluster is
faster recovery time. Server cluster failovers are delayed by the time that it takes
to start up the FileNet software on the passive server on a failover, whereas all
the servers in a server farm are active and immediately available to accept work
that has been redirected away from failed servers.

There are also advantages that clusters have over farms.

Many clients prefer clustering IBM FileNet P8 servers over farming in order to
standardize on clustering for all servers in their data center. They anticipate
lower total cost of ownership through this standardization, because there are
fewer technologies to learn, support, and maintain.

Also, software load balancing consumes a non-trivial amount of network
bandwidth for the traffic between servers for balancing the load. This traffic is not
present for clustered servers, although clustered servers do typically require a
private network for the heartbeat function between the servers.

9.3.7 Inconsistent industry terminology


The terminology used in this book to distinguish load-balanced server farms from
active-passive server clusters is not unique to the book, but also not standard
across the industry. As you can see in Table 9-4 on page 229, many vendors use
the term “cluster” for both farms and clusters. Microsoft, for example, uses both
terms for server farms. Symantec/Veritas uses “failover group” for a cluster and
“parallel group” for a farm. Both BEA and IBM call their J2EE application server
farming configurations clusters. As we have seen, farms and clusters, under our
definition of those terms, are quite different, hence the emphasis here on distinct
terms for these HA approaches.

Table 9-4 Inconsistent industry terminology for HA

Vendor             HA terminology
Microsoft          “NLB cluster” and “cluster farm” = farm
                   “Server cluster” and “cluster server” = cluster
Symantec Veritas   “Failover group” = cluster
                   “Parallel group” = farm
BEA                WebLogic “cluster” = farm
IBM                WebSphere “server group of clones” = farm
                   WebSphere “cluster” = farm
                   HACMP “cluster” = cluster

9.3.8 Server virtualization and high availability


Chapter 3, “System architecture” on page 27 introduced the concept of server
virtualization and its promise of consolidating data center hardware and thus
reducing total cost of ownership for the data center. This has considerable
appeal, but it can also have a negative impact on availability. If a server farm or
server cluster with two physical servers is consolidated into two virtual servers
hosted on the same physical server, you must be careful to ensure that the
physical server has no single points of failure. Does it have redundant power
supplies, network interface cards, processors, memory, and so on? If any single
component failure on a server can take down all the virtual servers hosted on it,
that server cannot act as host for all the servers in a cluster or farm. Two of the
virtual servers must be hosted by different physical servers in this case to avoid
downtime caused by a single component failure.

9.4 Defining disaster recovery (DR)


Now, we turn from high availability to disaster recovery. How do they differ? Both
high availability and disaster recovery are part of business continuity, that is,
making sure that critical business systems and processes can continue to
operate despite system failures and disruptions. However, disaster recovery and
high availability solutions perform under different circumstances that require
different solutions.



Disaster recovery concerns restoring service after the loss of an entire business
system or data center due to natural or human-made disasters, such as fire,
flood, hurricane, earthquake, war, criminal action, or sabotage. In contrast to
that, high availability concerns keeping a business system available despite a
local component failure – such as a server power supply failure, a network switch
failure, or a disk crash – that leaves most of the system untouched.

For recovery from the loss of an entire production system in a disaster, a full
remote system with its own up-to-date copy of the data is needed. All users and
operations must be switched over to the remote system. Alternatively, the
optimal high availability solution is an automated, localized, and limited
substitution of a single replacement component for the failed component. Server
farms and clusters substitute a single replacement component with minimal
disruption to the rest of the system and its users. Disaster recovery solutions are
much more drastic, disruptive, time-consuming, and heavyweight, because they
have to replace an entire system or data center, not just a single failed
component. Therefore, disaster recovery solutions are an inappropriate choice
for high availability.

Disasters, such as the World Trade Center destruction on 9/11/2001 or
Hurricane Katrina in New Orleans and the Mississippi Gulf Coast, can have a
devastating effect on businesses in their path. Organizations with business
continuity, HA, and DR plans were much more likely to rebound and recover from
9/11 and Katrina than those without such planning. Analysts estimate that a
significant number of businesses that suffer an extended IT systems outage due
to disaster go out of business within a year or two; other businesses never
resume operations at all. The obvious inference is that planning and preparing
for disaster recovery is a best practice for businesses of all sizes.

9.4.1 Disaster recovery concepts


There are two key metrics that play important roles in determining an appropriate
disaster recovery (DR) solution for a particular business and application. They
are Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

In certain cases, the most recent data changes at the production site, which
stretch back to a point in time prior to the disaster, do not make it to the recovery
site because of a time lag that is inherent in how the data is replicated. The
magnitude of this time lag is dependent on the particular type of data replication
technology that you choose. Assuming a disaster occurs, the recovery point is
the point in time before the disaster that represents the most recently replicated
data. How far back in time is the business willing to go after the disaster
happens? That is, the Recovery Point Objective translates to how much recent
data the business is willing to lose in a disaster.

The duration of time that passes before the systems can be made operational at
the recovery site is called the recovery time. The Recovery Time Objective is the
business’s time requirement for getting the system back online. That is, how
much downtime can the business endure?

Recovery Point Objectives and Recovery Time Objectives for different
businesses and industries range from seconds to minutes or days, even to
weeks, depending on business requirements.

9.5 Implementing a disaster recovery solution


Disaster recovery can be greatly facilitated by two key technologies. One key
technology is data replication to a remote recovery site, and the other key
technology is global cluster management software that can automate most of a
site failover to a recovery site after a disaster takes away the primary site. The
RTO and RPO for a particular business determine when these two technologies
are required for that business’s disaster recovery solution. With an RPO and
RTO measured in days to weeks, that is, if the business is willing to lose days to
weeks of data and can wait days to weeks for the system to come back online,
tape backup and restore are sufficient. But if an RPO of seconds to hours is
desired, a form of data replication is required. If an RTO of hours to weeks is
acceptable, replication alone might suffice. But if an RTO of seconds to hours is
desired, both replication and a global cluster manager will be required.
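These selection rules can be summarized in a small sketch. The hour thresholds below are illustrative assumptions drawn from the ranges above, not product guidance:

```python
def dr_technologies(rpo_hours, rto_hours):
    """Sketch of the selection rules above: which DR building blocks a
    given Recovery Point / Recovery Time Objective calls for.
    Thresholds are illustrative, not a sizing recommendation."""
    needed = ["tape backup"]           # always the baseline provision
    if rpo_hours < 24:                 # cannot afford to lose a day of data
        needed.append("data replication")
    if rto_hours < 24:                 # must be back online within hours
        needed.append("global cluster manager")
    return needed

print(dr_technologies(rpo_hours=168, rto_hours=168))  # days to weeks: tape suffices
print(dr_technologies(rpo_hours=1,   rto_hours=48))   # replication, manual site failover
print(dr_technologies(rpo_hours=1,   rto_hours=2))    # replication plus cluster manager
```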

Next, we explore replication and global cluster managers in more detail.

9.5.1 Replication
Backing up to tape or other removable media is the minimum for copying data for
use after a disaster. You must ship the media off-site to a location outside of the
projected disaster impact zone. The greater the distance of the location from the
production site, the lower the risk that both production and recovery sites will be
impacted by the same disaster. One general rule is that a backup tape vault and
recovery site must be at least 30 miles away from the production system, which,
in most cases, is sufficient to avoid a flood or fire disabling both sites. However,
sites that are that close together can still be in the same impact zone for earthquakes,
hurricanes, or power grid failures, so more cautious organizations separate their
production and recovery sites by hundreds, if not thousands, of miles.

Companies usually perform backups once a day, which meets only a 24 hour
recovery point objective, which means that as much as 24 hours of data can be
lost. The recovery time required for data restoration from tape can be days due to
the need to restore a series of tapes that represents a full backup and
subsequent incremental or differential backups. So, you measure both RPO and
RTO in days if the only DR provision is tape backup.

For a better RPO, that is, to reduce the potential data loss in a disaster, you need
to periodically replicate the data to a remote disk, because periodic replication
can be done more often than tape backup, which effectively reduces the window
of data loss. Continuous replication that is done in real time can avoid any data
loss at all.

Note: When you use continuous data replication products, point-in-time
backups, such as tape backup or periodic replication, are still required in order
to recover from data corruption or accidental deletion. Continuous replication
copies the corruption or deletion to the replica; therefore, you need to be able
to fall back on a point-in-time copy prior to when the corruption occurred.
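The relationship between copy frequency and potential data loss can be sketched as follows. This is a simplification that ignores replication lag and in-flight transactions:

```python
def worst_case_data_loss_hours(strategy, interval_hours=None):
    """Sketch of the trade-off above: the backup or replication
    interval bounds the data that can be lost in a disaster (the RPO).
    'continuous' assumes synchronous, real-time replication."""
    if strategy == "continuous":
        return 0.0                     # every committed write reaches the replica
    if strategy in ("periodic", "tape"):
        return float(interval_hours)   # everything since the last copy is at risk
    raise ValueError("unknown strategy")

print(worst_case_data_loss_hours("tape", interval_hours=24))      # daily backup: 24h exposure
print(worst_case_data_loss_hours("periodic", interval_hours=1))   # hourly replication: 1h
print(worst_case_data_loss_hours("continuous"))                   # no data-loss window
```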

Note that there are several levels at which you can perform replication: the
application level, the host level, and the storage level. Database replication is the
best example of application-based replication. Host-based replication is beneath
the application level, but it still resides on the server host and typically runs at the
file system or operating system level. Storage-level replication is implemented by
the storage subsystem itself, frequently a Storage Area Network (SAN) device or
a Network-Attached Storage (NAS) device.

Application-based replication
Application-level software that understands the structure of data and
relationships between data elements can copy the data intelligently, so that the
structure and relationships are preserved in the replica. Database and
object-based replication are examples. Database replication ensures that the
replica database is always in a consistent state with respect to database
transactions. Object-based replication ensures that content objects that include
both content and properties are replicated as an atomic unit, so that the content
and properties are always consistent with each other in the replica.

Each database vendor has replication products that replicate just the database,
but not other data. Examples include IBM DB2 high availability disaster recovery
(HADR) and Oracle Data Guard. Database replication products are typically
based on shipping database logs to the recovery site to be applied to a database
copy there. The advantage of these products is that they keep the database
replica in a fully consistent state at all times, with no incomplete transactions,
which reduces the recovery time required when bringing up the database after a
disaster. The disadvantage of these products is that they have no means to
replicate anything other than databases. File systems that need to be kept
consistent with the database, for instance, have to be replicated by a different

232 IBM FileNet Content Manager Implementation Best Practices and Recommendations
replication mechanism, which introduces the possibility of inconsistency between
the database and file system replicas.

Host-based replication
In contrast to application-based replication, host-based replication has no
understanding of the data content, structure, or interrelationships. It detects
when a file or disk block has been modified and copies that file or block to the
replica. NetApp ReplicatorX™, Symantec Veritas Volume Replicator, and
Double-Take Software’s Double-Take are examples of host-based replication
products. Unlike application-based replication, they can be used to replicate all
forms of data, whether it is in a database, a file system, or even a raw disk
partition. Several of these products use the concept of consistency groups, which
tie together data in different volumes and allow all the data to be replicated
together, thereby maintaining consistency across related data sets, such as
databases and file systems. In contrast to application-based replication,
however, the replica is not guaranteed to be in a clean transactional state,
because the replication mechanism has no visibility into database or file system
transactions. Recovery can take longer, because incomplete transactions must
be cleaned up prior to making the data available again.

Storage-based replication
All of the storage vendors offer storage-based replication for their SAN and NAS
products. The storage products themselves provide storage-based replication
and do not use server host resources. Examples include IBM Metro Mirror
(PPRC) and Global Mirror (XRC), EMC SRDF and MirrorView, Hitachi Data
Systems TrueCopy, and Network Appliance SnapMirror®.

NAS products replicate changes at the file level, whereas SAN products replicate
block by block. In both NAS and SAN replication, as with host-based replication,
there is no knowledge of the structure or semantics of the stored data. So,
databases replicated in that way can be in any transient state with regard to
database transactions and hence might require significantly more database
recovery time when the replica is brought online. That increases the overall
recovery time.

NAS replication covers any data in the file system, whereas SAN replication,
which is at the lower level of disk blocks, covers all data stored on the disk.

An emerging specialization of storage-based replication uses a SAN network
device to intercept disk writes to SAN storage devices and manage replication
independently of both the server host and the storage devices. IBM SAN Volume
Controller (SVC) is an example of this type of product. It has the advantage of
being able to span heterogeneous SAN storage devices and replicate data for all
those devices in a consistent manner. You can think of the SVC as a new form of
storage-based replication, because it resides in the Fibre Channel infrastructure
used to access SAN storage. Analysts have a new term for this kind of
replication: network-based replication.

Synchronous as opposed to asynchronous replication


Host-based and storage-based replication commonly support two modes of
operation: synchronous and asynchronous. Synchronous replication writes new
data to both the production storage and the remote recovery site storage before
returning success to the operating system at the production site for the disk write.
So, when the operating system signals that a disk write is complete, it has
actually been completed on both storage devices. You can think of synchronous
replication as logically writing the data at both sites at the same time. That means
that after a disaster strikes the production system, we know that the recovery site
has all the data right up to the last block that was successfully written at the
production site. Synchronous replication ensures that there is no data lost in a
disaster, as long as the recovery site survives the disaster. But to make the
latency for disk writes short enough, synchronous replication is typically feasible
only for sites that are separated by 60 miles (96.5 km) or less. Above that
separation, the wait for the write to the recovery site slows the overall speed of
the system significantly. The wait is a function of the distance between sites,
because signals can travel no faster than the speed of light between sites. At
more than 60 miles (96.5 km), the latency becomes too great in many cases,
although certain storage vendors are now extending this distance to 180 miles
(290 km).

For sites that are separated by more than 60 miles (96.5 km), asynchronous
replication is the choice. Asynchronous replication is not done in lock step, the
way that synchronous replication is. Instead, the local disk write is allowed to
complete before the write is completed to the second site. The update to the
second site is said to be done “asynchronously” from the local update, that is, not
in the same logical operation. This method frees the production system from the
performance drag of waiting for each disk write to occur at the remote site.
However, it opens up a time window during which the production site data differs
from the recovery site copy. That difference represents the data that is lost in a
disaster when asynchronous replication is used. In exchange for that data loss,
the two sites can be any distance apart, although the further apart they are, the
greater the typical data loss.
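The distance limit on synchronous replication can be illustrated with a back-of-the-envelope calculation. Light in optical fiber travels at roughly 200,000 km per second, and every synchronous write pays at least one round trip between the sites. This is a simplified sketch that ignores switching, protocol, and storage latency, all of which add to the real figure:

```python
def min_round_trip_ms(distance_km, fiber_speed_km_per_s=200_000):
    """Lower bound on the round-trip signal time, in milliseconds, between
    two sites connected by optical fiber. Real links are slower because of
    switching, protocol, and storage latency on top of propagation delay."""
    return 2 * distance_km / fiber_speed_km_per_s * 1000

# At ~60 miles (96.5 km), each synchronous write waits roughly 1 ms for
# the remote acknowledgement; at 1,000 km the floor is already 10 ms,
# which is why long distances push deployments to asynchronous replication.
sixty_miles = min_round_trip_ms(96.5)
long_haul = min_round_trip_ms(1000)
```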

Storage vendors have devised a way to ensure no data loss over any distance,
however, by a configuration involving a third copy as shown in Figure 9-6 on
page 235. This solution requires a nearby synchronous replica and a remote
asynchronous replica. The data from the production site is replicated
synchronously to a backup site within 60 miles (96.5 km), which is Site 2 in
Figure 9-6 on page 235, and replicated asynchronously to a remote site, Site 3,
any distance away. As long as only one of the three sites is lost in a disaster, it is
always possible to recover all the data from the remaining two sites. In the
diagram in Figure 9-6, if Site 1 is lost in a disaster, the synchronous copy at Site
2 holds all the data up to the moment of the disaster. From there, the data can be
replicated asynchronously to Site 3, the actual recovery site, thereby extending
zero data loss all the way to Site 3. It works, but the added replica and site can
be expensive.

Figure 9-6 Zero data loss replication over any distance (diagram: Site 1
replicates synchronously to Site 2, less than 60 miles away, and asynchronously
to Site 3, any distance away; Site 2 also replicates asynchronously to Site 3)

Several vendors support an optimized version of the second site called a “bunker
site” where only the blocks not yet replicated are stored and no others. The list of
the blocks that have not yet been replicated is typically a small list, so a bunker
site can be configured with minimal storage space, which reduces the overall
cost of this solution. IBM Asynchronous Cascading Peer-to-Peer Remote Copy
(PPRC) is an example of this three-site zero data loss solution.
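The bunker-site optimization can be sketched as follows: the bunker synchronously receives every write, but retains a block only until the remote (asynchronous) replica acknowledges it, so only the in-flight blocks occupy space. This is a toy model with invented names, not vendor code:

```python
class BunkerSite:
    """Toy model of a bunker site: every block is written here
    synchronously, but a block is discarded as soon as the remote
    asynchronous replica acknowledges it, so only the not-yet-replicated
    blocks occupy storage."""

    def __init__(self):
        self.pending = {}  # block id -> data not yet at the remote site

    def synchronous_write(self, block_id, data):
        self.pending[block_id] = data

    def remote_ack(self, block_id):
        self.pending.pop(block_id, None)

bunker = BunkerSite()
for i in range(1000):      # production writes 1,000 blocks
    bunker.synchronous_write(i, b"data")
for i in range(990):       # the remote site has absorbed most of them
    bunker.remote_ack(i)
# Only the handful of not-yet-replicated blocks remain in the bunker,
# which is why a bunker site needs minimal storage.
```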

Comparing the replication options


What makes host-based or storage-based replication better than
database-based replication? First, storage-based replication has the advantage
that it allows a single replication product to be used for all data. With
database-based replication, the database is replicated separately from the rest of
the data, which can lead to inconsistency between the databases and the other
data stored in a file system, such as content data. Second, using a common
replication product for all data also simplifies the DR solution, which leads to less
required training of system administrators and less total cost of ownership
overall. Third, synchronous storage-based replication prevents any data loss,
whereas database-based replication typically is asynchronous and thus is
vulnerable to some data loss in a disaster. Host-based replication shares these
three advantages over database-based replication. Lastly, storage-based
replication is implemented entirely by the storage device, whereas
database-based or host-based replication runs on the server and takes up server
resources. (Vendors of host-based replication products counter that the added
load on the server is minimal, typically only a few percent.)

Why choose database-based replication over storage-based replication after you
see these disadvantages? The key reason is the lower recovery time that can
result from the database replica being in a cleaner state and hence requiring less
recovery processing. A database replicated via its native replication facility is
always in a clean database transaction state, so no incomplete database
transactions have to be rolled back when the backup database is activated. This
allows the system to recover more quickly, which can be viewed as more critical
than a small amount of data inconsistency, when minimal recovery time is of
paramount importance. Moreover, if all the data is stored in the database, which
is an option with the P8 Content Manager, database-based replication has no
consistency disadvantage or cost of ownership disadvantage.

9.5.2 Global cluster manager


The second key technology that is used in many disaster recovery solutions is a
global cluster manager. This is also called a geographic cluster manager by
certain vendors, but we use the term global cluster manager to distinguish it from
geographically dispersed clustering, which we described previously. Recall that
geographically dispersed clusters are still clusters in the sense of a heartbeat
between the nodes and failover if the active server fails; they just have the
servers dispersed over a distance as great as 60 miles (96.5 km). A global
cluster manager, however, extends an ordinary server cluster with the capability
to oversee multiple sites that are any distance apart. It manages local server
clusters at each site, controls replication between sites, and updates Domain
Name System (DNS) servers to redirect users to the recovery site system. Its
major function is to automate most or all of the process of failing over from a
production site to a recovery site after a disaster.

Most organizations prefer to have at least one manual decision step before
declaring a disaster, because of the gravity and cost of switching all operations
and users to a recovery site. But after that decision has been made, a global
cluster manager can automate the rest of the process. This is advantageous,
because automating the process reduces the chances of human error, makes
the process repeatable and testable, and thus increases the chances of a
successful site failover in the highly stressful period following a disaster. IBM
High Availability Cluster Multiprocessing/Extended Distance (HACMP/XD) is one
example of a global cluster manager with these capabilities for the AIX platform.
Symantec Veritas Global Cluster Option is another example that runs on a
variety of platforms.

9.5.3 Disaster recovery approaches
IBM FileNet Lab Services has defined three common approaches for disaster
recovery:
• Build it when you need it.
• Third-party hot site recovery service.
• Redundant standby system.

Build it when you need it


The lowest cost approach, but the slowest and the hardest to test, is to build a
replacement system after a disaster has occurred. There is nothing in place prior
to a disaster, which makes it extremely low cost, but it allows no testing either.
This approach has an RTO of days to weeks.

Third-party hot site recovery service


The second approach is to contract with a third party for a hot site recovery
service. Third parties, such as SunGard, IBM, and HP, have shared recovery
sites around the world that you can reserve by contract for use in the event of a
disaster. This approach costs more than the first approach, of course, but it also
offers a shorter recovery time, because the site is equipped and hot at the point
of disaster. Data has to be restored at the hot site, but no hardware has to be
acquired or configured. The third party providers include regular testing of
failover to their site as a part of their service, and IBM FileNet Lab Services has
an offering to assist you in setting up and testing the hot site and activating it in
the event of a disaster. This approach has an RTO of hours to days.

Redundant standby system


The third and most frequently chosen approach is a standby redundant system in
place at a client-owned and operated remote recovery site or at a third-party site.
This approach is the highest cost approach, because the cost of the redundant
system is not shared with anyone else. But it offers the shortest recovery time,
particularly if the data replica is constantly updated and available for use. It also
can be tested on a regular basis, which is in keeping with best practices for
ensuring that a disaster recovery plan will actually work as expected when
needed. This approach has an RTO of minutes to hours.

Comparing the costs and technologies


No matter which of these DR options you choose, it is essential to have a copy of
the data off-site. Table 9-5 on page 239 summarizes the data backup and
replication choices and costs, as well as the recovery site choices, and shows
the relationship between recovery time, recovery point, and the type and cost of
data replication required to achieve them. Like high availability choices, the
choices for disaster recovery
become exponentially more expensive as RTO and RPO approach the
minimums of hours to minutes. The cost increase is due to the changes in
disaster recovery technologies required to meet increasingly more ambitious
recovery times and points.

For an RTO of three days or more, the minimum level of data replication, namely
backup to tape, is sufficient. As we noted earlier, a form of point-in-time backup,
such as tape backup, is always required, regardless of RTO, as a means of
recovering from data corruption or accidental deletion. The solution is to retrieve
the latest backup tape or other point-in-time backup from the off-site storage
location and restore the data to a point in time prior to the corruption or deletion
of the data. Full data restoration from tape is a slow and laborious process that
typically involves a full backup tape and a number of incremental backup tapes
after that, and can take days to complete. Backups are done periodically,
usually once a day, possibly multiple times a day, so the RPO for this minimum
solution is hours to days of lost data.

Periodic replication to off-site storage characterizes the next two solutions up the
cost curve with an increase in cost for communications links, but providing an
RPO and RTO of hours, not days. Periodic point-in-time backup to remote
storage, usually disk storage, is the first step up from standard local tape backup.
The next step up consists of shipping database or file system update logs to the
remote recovery site, where they are applied to a copy of the data to bring it up to
date with that log. These are both done on a periodic basis, but as the period is
shortened, it approaches the limit of continuous replication, which is the next step
up the cost curve.

Table 9-5 Range of disaster recovery solutions

Recovery time        Recovery point          Cost            Technologies
-------------------  ----------------------  --------------  ------------------------------------------
Minutes to an hour   Zero data loss          $$$$$$$$$$$$$   Hot standby site, synchronous
                                                             replication, global clustering
1-6 hours            Minutes of data lost    $$$$$$$$$       Hot or warm standby site, asynchronous
                                                             replication, global clustering
6-12 hours           Hours of data lost      $$$$$           Warm standby site, continuous or
                                                             periodic replication or log shipping
12-24 hours          Hours to days of        $$$             Warm or cold standby site, periodic
                     data lost                               backup to remote storage
Days to weeks        One or more days        $               Cold or no standby site, nightly tape
                     of data lost                            backups shipped off-site
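The tiers in Table 9-5 can be read as a simple lookup from a target recovery time to the minimum class of technology required. The sketch below paraphrases the table rows; the thresholds and labels are taken from the table, while the function name is our own:

```python
def dr_technology_for_rto(rto_hours):
    """Map a target recovery time objective, in hours, to the disaster
    recovery tier suggested by Table 9-5."""
    if rto_hours <= 1:
        return "hot standby site, synchronous replication, global clustering"
    if rto_hours <= 6:
        return ("hot or warm standby site, asynchronous replication, "
                "global clustering")
    if rto_hours <= 12:
        return ("warm standby site, continuous or periodic replication "
                "or log shipping")
    if rto_hours <= 24:
        return "warm or cold standby site, periodic backup to remote storage"
    return "cold or no standby site, nightly tape backups shipped off-site"
```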

The cost now starts to accelerate upward. As the name implies, continuous
replication is the process of replicating data to the recovery site as it changes,
that is, on a continuous basis. Near continuous and continuous replication
greatly decrease the potential for data loss when compared to periodic
replication, which brings the RPO down to seconds' worth of data loss, or even
zero data loss in the case of synchronous replication.

Disaster recovery time is similarly decreased with synchronous and
asynchronous replication, because the data is kept continuously in sync, or close
to it, at both sites. In the event of a disaster, no time is required to bring the data
up-to-date, as is the case with restoring from backup, periodic replication, or log
shipping, but time might be required for configuring and bringing up a duplicate of
the application environment on the replicated data. The RTO is in the range of
hours in that case, or, if a complete application environment is maintained at all
times at the recovery site, and global clustering is used to automate and speed
site failover, RTO can be in the range of just minutes.



9.6 Best practices
Having defined the concepts of high availability and disaster recovery and having
detailed the key technologies and approaches used for HA and DR solutions,
what are the best practices for configuring P8 Content Manager for high
availability and disaster recovery from the available options and approaches?

Best practices for high availability


We start with high availability, which is summarized on the left side of Figure 9-7.
At the Web and presentation tier, where the IBM FileNet P8 Application Engine
provides the foundation for the Workplace and Workplace XT predefined Web
applications, as well as custom applications, the best practice is load-balanced
server farms. All the servers in this tier are active with incoming client requests
directed to the load balancer virtual IP address via DNS server entries and
distributed across the servers via software or hardware load balancers. IBM
FileNet P8 eForms, IBM FileNet Records Manager, IBM FileNet Team
Collaboration Manager, and IBM FileNet Site Publisher are all hosted on this tier
and thus must be deployed in load-balanced server farms for high availability.

Figure 9-7 Best practices for IBM FileNet P8 4.x releases (diagram: a production
site and a standby DR site; each site has a Web/presentation tier of FileNet
server farms (AE, eForms, RM, TCM, FSP) fronted by a DNS server, a business
logic tier of FileNet server farms and clusters (CE, PE, CS, IS), and a data tier of
server farms and clusters (Oracle RAC; DB2, SQL, files); the data tiers of the two
sites are connected by data replication)

At the business logic tier, sometimes also called the services tier, the HA best
practices shown in Figure 9-7 are a mix of load-balanced server farms and
active-passive server clusters. The P8 Content Engine and Process Engine
servers must both be deployed in load-balanced server farms.2 The Content
Engine has been qualified with both hardware and software load balancers; the
Process Engine requires a hardware load balancer as of P8 4.0.

IBM FileNet Content Services and IBM FileNet Image Services repositories can
be federated with the P8 Content Manager via Content Federation Services.
Both of these older products must be deployed in active-passive server clusters
for high availability; they do not support being deployed in load-balanced server
farms.

At the data tier, all the database servers can be deployed in active-passive
server clusters for HA. Oracle can also be deployed in its load-balanced RAC
configuration for HA. The Content Engine makes use of network file shares for
file storage areas for content storage and index areas for content-based search
indexes, so the network file servers or NAS devices underlying the Content
Engine file storage areas and index areas need to be highly available as well. For
a network file server, the typical HA configuration is an active-passive server
cluster; NAS devices typically have either active-active or active-passive
configurations for HA.

Best practices for disaster recovery


For disaster recovery, the best practice is dependent on RTO and RPO. In all
cases, point-in-time backup to tape or disk is the best practice for protection
against data corruption or accidental or malicious deletion. For any RTO/RPO
values less than days to weeks, the best practice is a form of data replication to
at least a warm recovery site. At the high end, with RTO
and RPO in the range of minutes to hours, a hot recovery site and global
clustering are best practices to automate and speed up the process of site
failover, and near continuous to continuous replication is also the best practice.
Zero data loss requires synchronous replication to a bunker site or intermediate
site if the distance to the remote recovery site is too great. For the absolute
minimum RTO, on the order of minutes, database-based replication, in addition
to storage-based or host-based replication for the other data, is the best practice.
For the best data consistency after a disaster, at the risk of adding minutes to an
hour of database recovery time to RTO, the use of a single replication
mechanism for all data, combined with consistency groups, is the best practice.

The best practice for redirecting the user community to the replacement systems
at the recovery site is via DNS updates. DNS aliases (CNAMEs) must be used
by the users' client computers to locate the P8 Content Manager services, so
that the aliases can be redirected to the recovery site after a disaster. This
redirection allows reconnection to the recovery site without making any client
computer changes. The DNS servers themselves must be redundant, of course,
to avoid being a single point of failure.

2 Prior to P8 4.0, the Content Engine supported both farming and clustering for its Object Store
Services component, but only active-passive clustering for its File Store Services component. For
4.0, these components were unified and now support farming across the board. Prior to P8 4.0, the
Process Engine required active-passive server clustering for high availability.
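The DNS-alias technique can be sketched with a toy resolver: clients always connect to a stable alias, and site failover becomes a single record update rather than a change on every client computer. The host names below are invented for illustration; a real deployment would update the CNAME record in the DNS zone:

```python
# Toy DNS table: clients resolve the alias, never the real host names.
dns = {
    "content.example.com": "p8-prod.example.com",  # alias -> production
}

def resolve(name):
    """Follow alias entries until a real host name is reached."""
    while name in dns:
        name = dns[name]
    return name

# Normal operation: the alias points at the production site.
before_failover = resolve("content.example.com")

# Disaster: one DNS update redirects every client to the recovery site,
# with no change on any client computer.
dns["content.example.com"] = "p8-dr.example.com"
after_failover = resolve("content.example.com")
```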

Combining HA and DR into a single solution


There is a common temptation to try to simplify business continuity by combining
high availability and disaster recovery into a single solution. The idea is to locate
a second site within the same metropolitan area as the production site and make
both sites active with each site having a full copy of the data. This is a workable
approach when the data being managed is essentially static, as in a corporate
Web site. Changes to the Web site are carefully reviewed and managed and
then pushed out to multiple hosting sites in parallel, and incoming user requests
can be load-balanced across the sites. If one of the sites goes down or even is
lost in a disaster, user requests can be directed to the other site for continuous
access to the largely static content (assuming the second site is far enough away
to be out of the disaster’s impact zone).

Why does this approach not work with Content Manager? The key is the nature
of the data and how it must be managed. P8 Content Manager, as the name
suggests, is designed to manage rapidly changing and growing collections of
data that are being accessed and modified in parallel by users across an
enterprise. Unlike the largely static data of a corporate Web site, which is
published or released to the site in a carefully controlled authoring and
information publication process, content in a typical P8 Content Manager object
store is being collaboratively authored, enhanced, deleted, created, and
processed in a dynamic manner under transaction control to avoid conflicting
changes. As a result, only a single active copy of the data can be online and
changeable at any point in time so that transaction locking can be enforced and
changes are saved in a safe, consistent manner. This means that the basic idea
of two sites, in which each site has an active copy of all the content, is not the
best practice for a transactional system. It is not supported by the P8 Content
Manager.

A related temptation is to deploy a disaster recovery solution with a standby
(inactive) copy of the data at the recovery site and depend on this single solution
for both high availability and disaster recovery. This can be done with P8 Content
Manager, but there is a clear trade-off that you need to carefully consider.
Relying on a disaster recovery configuration for high availability compromises the
availability target for the system, because any failure leads to a full site failover
as though the entire production site had been lost in a disaster. A site failover is a
time-consuming, complicated process that necessarily takes much longer than a
single server failing over to a local passive server in a cluster, and even longer
than the nearly instantaneous switch to another, already-active server in a server
farm when a server fails in that farm. The net result is that high availability (in the
range of 99.9% and higher) is not reachable when every local failure triggers a
full site failover (and later a full site failback to return to a protected state).

How about using geographically dispersed farms and clusters, that is, with the
farms and clusters split between the two sites? If one server fails, the server at
the other site takes over, either coming up at the time of failure in the case of an
active-passive server cluster or simply taking on redirected client requests in the
case of server farms. Again, there is an availability trade-off because of the
added risk of communication problems between the two sites. As we noted
earlier, we do not recommend geographically dispersed farms and clusters as
best practice because of the added risk and higher networking costs.

So the best practice is to deploy local server farms and clusters for high
availability in order to provide for continuing service in the event of local
component failures and to deploy a second site with data replication and,
optionally, global clustering, to provide for rapid recovery from disasters. The
best practice is to locate the recovery site outside the disaster impact zone of the
production site.

9.7 Product documentation for HA and DR


You can obtain product and technical documentation for the IBM FileNet P8
Platform at the following Web site:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

Two documents (downloadable from this Web site) are devoted to high
availability and disaster recovery:
򐂰 FileNet P8 High Availability Technical Notice
򐂰 FileNet P8 Disaster Recovery Technical Notice

Chapter 10. Deployment


In this chapter, we discuss solution deployment. We provide advice about how to
automate deployment for IBM FileNet Content Manager (P8 Content Manager)
solutions.

We discuss the following topics:


• Environment
• Process management
• Deployment approach
• Deployment by cloning
• Deployment by export, transform, and import
• P8 Content Manager deployment

If you are not interested in the concepts and background information but only in
the details of P8 Content Manager deployment, you can skip to the last section,
10.6, “P8 Content Manager deployment” on page 264, for specific information.

© Copyright IBM Corp. 2008. All rights reserved. 245


10.1 Environment
Before we discuss solution deployment, let us first discuss software development
environments.

Many people involved in software development have different responsibilities and
requirements for their working environment. It is hard to set up a single
environment that works for everyone. Creating multiple environments for different
purposes is a common practice in software development.

With multiple environments, more people can work on different tasks
simultaneously without interfering with each other. For example, you can have an
environment for developers to create code, an environment for developers to
perform functional testing, and another testing environment for system
integrators to test everything. Every company needs to have a production
environment in which only tested and deployed software runs. Development and
testing must not be done in the production environment.

Synchronizing the various environments becomes a new challenge. You want to
make sure that every environment behaves identically after having the same
changes applied. This verification ensures that no surprises occur after deploying
the software to the production environment. Of course, because development
and even the test environments usually do not have the same hardware as the
production environment, performance test results typically differ.

10.1.1 Multi-stage deployment environments


The term environment in this chapter describes a collection of servers that
typically belong to one P8 domain for one particular purpose. The purpose can
be development, functional testing, system testing, or production.

Typical projects split their infrastructure into at least three environments:
• Development
• User acceptance/Testing/Quality assurance
• Production

Best practice: Development must not be done in the same environment as
the production site or test site. Segregating these activities in different
environments avoids the introduction of unwanted configuration changes or
code changes by developers before those changes are ready to be tested or
put into production.

While you isolate phases of the software development cycle into different
environments, maintaining the different states becomes challenging.
Deployment must be managed in an organized manner.

Larger companies tend to add these additional environments to the basic three
environments identified earlier:
򐂰 Performance testing
򐂰 Training
򐂰 Staging

Reasons for adding more environments can include:


򐂰 A need to mitigate risks associated with multiple projects running at the same
time interfering with each other, while retaining the ability to reproduce errors
from the production system in a test environment
򐂰 A need for multiple training environments so that many people can be
educated in an extremely short period of time

The more environments that you have, the more important it is to maintain and
synchronize them properly.

The segregation of environments by IBM FileNet P8 domain is optional, but in
most situations, the isolation achieved by this approach is optimal to allow people
to work simultaneously and independently on the same project but in different
phases without adversely affecting each other. In particular, giving each
environment its own IBM FileNet P8 domain makes it easy to grant domain-wide
permissions in each environment to different groups. For example, developers
can be given full permission to configuration objects in the development
environment but little or no permission to configuration objects in the production
environment.

10.2 Process management


While the industry is shifting to standardized development approaches, such as
.NET or Java 2 Platform, Enterprise Edition (J2EE), there is a shift from
architectures, such as client/server, toward more distributed architectures that
span multiple tiers. Therefore, naturally the process management that guides
complex projects has evolved as well.

In this section, we focus on the following areas of process management:


򐂰 Release management
򐂰 Change management
򐂰 Configuration management

Chapter 10. Deployment 247


Figure 10-1 provides an overview of release, change, and configuration
management during an IBM FileNet Content Manager solution deployment from
development to production.

Figure 10-1 Overview of the support processes for deployment

Figure 10-1 shows three phases of a software development life cycle:
development, testing, and production. Each phase can correspond to one or
more individual environments. The activities for release management must be
executed in a sequential manner.

The illustration in Figure 10-1 also shows the change and configuration
management processes. Although the configuration management database (CMDB) maintains
the state of involved software and hardware assets, in typical client situations, a
more detailed CMDB just for IBM FileNet Content Manager deployment is
needed. You can maintain a few spreadsheets with the detailed information,
which allows you to track every change. It is absolutely crucial to have a good
change management process and to track the same level of detail for the
non-production environments.

In the next few sections, we present more details about release, change, and
configuration management, as well as testing, before we dive into a discussion of
moving the applications from development to production.

10.2.1 Release management
Over the past several decades, we have seen an evolution in enterprise
architecture. The evolution has gone from monolithic architectures
(COBOL-based programs running on mainframes) to component-based
architectures (J2EE and .NET applications) and toward service-oriented
architectures (SOA). The changes transform the enterprise into a highly
interoperable and reusable collection of services that are positioned to better
adapt to ever-changing business needs.

As architectural approaches lead to more reuse and separation, the development
of enterprise applications continues to require well-defined processes and more
tiers of technology. As a result, certain areas of enterprise application
development increase in complexity. In enterprise development (for example,
J2EE and .NET), software vendors have made many efforts to reduce this
complexity by providing advanced code generation and process automated
tooling and simplifying complex aspects of enterprise development through the
usage of proven design patterns and best practices.

It is important to understand the requirements and to implement them in a
systematic way. One way to meet this challenge is by introducing the role of a
release manager. In most large companies, this role becomes crucial for large
scale deployments.

A software release manager is responsible for handling the following tasks and
requests:
򐂰 Risk assessment
򐂰 Deployment and packaging
򐂰 Patch management (commercial or customized bug fixes), including
commercial patches for the runtime environments (for example, for operating
systems and application servers)
򐂰 From the software development area:
– Software change requests (modifications)
– New function requests (additional features and functions)
򐂰 From the quality assurance (quality of code) area:
– Software defects of custom code/commercial code
– Testing (code testing)
򐂰 Software configuration management (the rollout of new releases)

Release management is concerned with the features and functions of the
software; how the software is designed, developed, packaged, documented,
tested, and deployed.

A solid release management process can produce the following documentation:


򐂰 Project plan
򐂰 Release notes
򐂰 Test matrix, test plan, and test results
򐂰 Installation scripts/documentation
򐂰 Support documentation
򐂰 User documentation
򐂰 Training material
򐂰 Operations documentation

The release manager for an IBM FileNet Content Manager solution might find the
following documentation and information helpful in performing release
management tasks:
򐂰 Hardware and software compatibility matrix from the customer support site
򐂰 Available export and import options to deploy the solution between
development and production environments. Search and replace scripts used
to prepare exported assets for use in the target environment where object
stores, users, or groups differ from the source environment
򐂰 Deployment guidelines from the customer support site
򐂰 Online help

In addition, the Rational® product line from IBM can be helpful in supporting
release management, change management, and testing. For reference, go to:
http://www.ibm.com/software/rational

Release management delegates several of the underlying support processes to
the change and configuration management that is discussed in 10.2.2, “Change
management” on page 251 and 10.2.3, “Configuration management” on
page 253.

A release can consist of multiple components in specific configurations of the
involved components. Release management handles the validation of
combinations of application releases, commercial components, customized
components, and others. While a specific component is developed on the basis
of a concrete version of its underlying commercial application programming
interface (API), at the moment of deployment to production, this combination
might have changed in the bigger context of the solution. The management of
combinations of versions of involved components is a time-consuming activity
and needs to be scheduled and planned carefully and early.

Another aspect of release management deals with objects that have been
created in production that affect the configuration of the solution and might
impact deployment. In IBM FileNet Content Manager solutions, these types of
objects include folder structures, add entry templates, search templates, and
others. Release management must have a strategy in place to handle or restrict
bidirectional deployment between multiple environments.

Best practice: Have a strategy in place to handle or restrict changes to the
production environment that might affect the overall solution configuration and
future deployment. Have a policy that all application changes be made first in
development/test environments, then deployed to production.

10.2.2 Change management


In most organizations, change management is the process of overseeing,
coordinating, and managing all changes to:
򐂰 Hardware
򐂰 Communication equipment and software
򐂰 System software
򐂰 All documentation and procedures associated with running, supporting, and
maintaining production systems

With IBM FileNet Content Manager solutions, there are two issues associated
with change management that you must consider:
򐂰 The number and details of configuration items needed for a proper
deployment might overwhelm a configuration management database (which
supports the change management process for all IT-related changes, not only
changes for IBM FileNet Content Manager).
򐂰 When the development system is not part of a change management process,
situations can occur in which changes are applied to the development
environment without being documented or in an uncontrolled manner.

Best practice: Every change applied to the production system must be
carefully tracked, documented, and, where possible, automated. Automation
of changes reduces the risks of error-prone manual deployment processes.

Consider the following areas related to P8 Content Manager when managing the
change process:
򐂰 Commercial code and assets (versions of P8 Content Manager, as well as
individual patches and levels of its components, such as Content Engine,
Application Engine, and Process Engine)

򐂰 Custom code and assets (for example, the code versions of the application
leveraging a commercial API, such as the P8 Content Manager API or
versioned assets, such as document classes)

Best practice: Deployment starts in the development phase. Incorporate a
defined build process that acknowledges changes to commercial components
and custom components in a controlled manner.

At the beginning of deployment, you must not handle commercial code and
custom code separately. For the targeted solution release, everything must be
assembled via an automated process if possible. The combining of custom and
commercial code must be handled by release management and documented by
configuration management.
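One possible shape for such an automated assembly step is sketched below in Python. The layout is illustrative only: a commercial baseline directory is overlaid with a custom code directory and repackaged into one deployable archive. The directory names and the use of a plain zip archive are assumptions and do not reflect the actual Workplace archive structure.

```python
import shutil
import zipfile
from pathlib import Path

def assemble_release(commercial_dir, custom_dir, out_war):
    """Overlay custom code onto a copy of the commercial application and
    repackage the result as a single deployable archive (illustrative)."""
    staging = Path(out_war).with_suffix(".staging")
    if staging.exists():
        shutil.rmtree(staging)
    shutil.copytree(commercial_dir, staging)  # commercial baseline first
    # custom files win on conflict, so custom overrides are applied last
    shutil.copytree(custom_dir, staging, dirs_exist_ok=True)
    with zipfile.ZipFile(out_war, "w", zipfile.ZIP_DEFLATED) as war:
        for path in sorted(staging.rglob("*")):
            if path.is_file():
                war.write(path, path.relative_to(staging))
    shutil.rmtree(staging)
    return Path(out_war)
```

Because the commercial baseline is never modified in place, the same step can be rerun for every release, which keeps the build reproducible and documentable.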

Figure 10-2 shows the areas for which you distinguish between custom (yellow
area) and commercial code (green area):
򐂰 Workplace
򐂰 Object store

Figure 10-2 Custom Code at the level of object store and Workplace

For more information regarding the separation of custom and commercial code
and the build process for Workplace, refer to 10.6.4, “Exporting and importing
other components” on page 269. For more information regarding the separation
of custom and commercial assets during repository design, refer to Chapter 5,
“Basic repository design” on page 85.

10.2.3 Configuration management


Typical P8 Content Manager projects use three environments, but many projects
use five or more environments to satisfy the diverse needs of development,
training, testing, staging, performance measuring, and production. Every
environment has its own set of configuration items, such as server names, IP
addresses, and versions of the various components (commercial and
customized).

While an enterprise configuration management database might not be suited to
keep track of all of the parameters needed for the deployment process, it remains
the responsibility of configuration management to keep track of applied changes.

From our experience, when performing automated deployments for P8 Content
Manager-based applications, it is generally a good practice to employ a
centralized datastore. The centralized datastore tracks the specific values of
parameters, such as Object Store name, Object Store GUID, directory objects
prefix per environment, virtual server name of Content Engine farms, virtual
server names of Application Engine farms, database names, database server
names, and ports. These parameter values can be used by the build process for
specific environments.
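A flat structure keyed by environment is often sufficient for such a datastore. The Python sketch below is a minimal illustration; all environment names, parameter keys, and values are placeholders rather than values from a real installation.

```python
# Minimal sketch of a central deployment-parameter datastore, keyed by
# environment. Every name and value below is an illustrative placeholder.
DEPLOYMENT_PARAMS = {
    "dev": {
        "object_store_name": "OS_DEV",
        "object_store_guid": "{11111111-1111-1111-1111-111111111111}",
        "ce_virtual_server": "ce-dev.example.com",
        "ae_virtual_server": "ae-dev.example.com",
        "db_server": "db-dev.example.com",
        "db_port": 1433,
    },
    "prod": {
        "object_store_name": "OS_PROD",
        "object_store_guid": "{22222222-2222-2222-2222-222222222222}",
        "ce_virtual_server": "ce-prod.example.com",
        "ae_virtual_server": "ae-prod.example.com",
        "db_server": "db-prod.example.com",
        "db_port": 1433,
    },
}

def get_param(environment, key):
    """Look up one parameter for a target environment; fail loudly if missing."""
    try:
        return DEPLOYMENT_PARAMS[environment][key]
    except KeyError:
        raise KeyError(f"no value for {key!r} in environment {environment!r}")
```

A build script for a specific target environment then requests every value through `get_param`, so a missing parameter is detected before deployment starts rather than during it.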

Retain a zip/tar file of all release-specific data, including code, exported assets,
and documentation, in a central datastore. Typically, you maintain
release-specific data by using a code version control system. IBM clients can use
IBM Rational ClearCase, for example.
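A minimal sketch of packaging the release-specific data, assuming the code, exported assets, and documentation have already been collected into one directory tree; the release name and layout are illustrative.

```python
import zipfile
from pathlib import Path

def build_release_archive(release_name, asset_dir, out_dir="."):
    """Package all release-specific files (code, exported assets, and
    documentation) into one zip archive named after the release."""
    archive = Path(out_dir) / f"{release_name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # sorted() keeps the archive content order reproducible across builds
        for path in sorted(Path(asset_dir).rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(asset_dir))
    return archive
```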

Best practice: Implement a central datastore that tracks the parameters,
such as GUIDs, Object Store Names, and Project Names, that you need for
the deployment. The datastore needs to be implemented for all target
environments in one location that is accessible to every environment.

10.2.4 Testing
There are multiple ways to address environments associated with testing. One
way is to split testing into two major phases, which typically happen in different
environments:
򐂰 Development environment

In this environment, the following tests are commonly conducted:
– Unit testing verifies that the detailed design for a unit (component or
module) has been correctly implemented.
– Integration testing verifies that the interfaces and interaction between the
integrated components (modules) work correctly and as expected.
– System testing verifies that an integrated system meets all requirements.
򐂰 Testing environment
In this environment, the following tests are commonly conducted:
– System integration testing verifies that a system integrates with the
external or third-party systems as defined in the system requirements.
– User acceptance testing is conducted by the users, customer, or client to
validate whether they accept the system. This is typically a manual process
that compares documented expected behavior with the observed behavior.
– Load and performance testing.

Best practice: Whenever a software system undergoes changes, verify that
the system functions as desired in a test environment, before deploying to
production. Include time and resources to test and make corrections based on
testing whenever planning and scheduling a software release. Applying this
best practice without fail helps avoid costly and time-consuming problems in
production.

Regression tests
For all environments, regression testing must be implemented to enable a quick
functional test that shows whether all components are up and running. A baseline
suite might include one test object for each relevant aspect, such as a test
document class, a test search template, a test folder, and a test workflow.

The regression test must be used after having modified software (either
commercial or custom code) for any change in functionality or any fix for defects.
A regression test reruns previously passed tests on the modified software to
ensure that the modifications do not unintentionally cause a regression of
previous functionality. Regression testing can be performed at any or all of the
previously mentioned test levels. The regression tests are often automated.
Automating the regression test can be an extremely powerful and efficient way to
ensure basic readiness.
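An automated regression harness in this spirit can be as simple as a list of named checks that are run and reported. In the sketch below, the check functions are placeholders; a real suite would exercise the deployed system, for example, by fetching the test document class or resolving the test folder through the product APIs.

```python
# Sketch of a minimal regression (smoke) test harness. The baseline checks
# mirror the test objects described above; their bodies are placeholders.

def run_regression_suite(checks):
    """Run every (name, check) pair; return a dict of name -> PASS/FAIL."""
    results = {}
    for name, check in checks:
        try:
            check()
            results[name] = "PASS"
        except Exception as exc:
            results[name] = f"FAIL: {exc}"
    return results

def check_test_document_class():
    assert True  # placeholder: fetch the test document class from the repository

def check_test_folder():
    assert True  # placeholder: resolve the test folder by path

BASELINE_CHECKS = [
    ("document class", check_test_document_class),
    ("folder", check_test_folder),
]
```

Because the harness only needs the check list, the same script can be deployed unchanged to every environment, which supports the automation goal stated above.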

Best practice: Establish a small suite of regression tests in each
environment. The best synergies are achieved by having the deployment of
the test assets and the test script as automated as possible. One side effect is
that this automation of regression tests affects the repository design.

Test automation
Two areas of consideration for automating tests are:
򐂰 The load and performance test
򐂰 The regression test

While the load and performance test might be executed only on major version
changes (commercial or custom releases), the effort to maintain the code for the
automation might be substantial.

The regression test must be generic enough that the scripts are written once,
perhaps updated for minor changes, and otherwise remain stable over time. We
recommend that you store the scripts together with the corresponding version of
the test application in one location. Typically, this location is your
source code version control system.

Test automation tools are available from IBM and other vendors. For example,
refer to the IBM Rational products Web site at:
http://www.ibm.com/software/rational

Best practice: Distinguish load and performance tests from regression tests.
Each area has its own characteristics.

You can typically use the existing testing infrastructure for load and
performance tests. For regression testing, it typically makes no sense to use a
centralized large and complex infrastructure. It is more important that the tests
can be executed and will quickly show simple results.

Test documentation
Before we move to a discussion of the actual deployment, we must discuss the
testing documentation and its importance.

In a P8 Content Manager project, multiple departments with different skill sets
are typically involved, and it is very difficult to perform user acceptance testing or
integration testing without having a clear concept of what needs to be tested,
how it must be tested, what the expected behavior is, and how the tests must be
conducted.

Documenting the test cases with descriptions of the inputs and expected
behavior is useful. Test descriptions must have enough information to achieve
repeatability, which means that multiple testers can perform the same test (in an
identical environment) while working from the test documentation and get the
same results. After the execution of the tests, collect and document all of the
observed system behaviors. Using this information, the release manager can
decide to proceed with the new release or to delay the release if there are more
bugs to fix.

Some tests might fail. It is crucial to document not only the observed behavior but also the
Combining test documentation with a searchable interface to find known
problems is very advantageous.

Best practice: Carefully document your tests in sufficient detail before the
tests are executed. Make the test documentation searchable so that previously
observed problems can be found quickly, building up a knowledge database.

10.3 Deployment approach


The term deployment is typically used in two contexts when mentioned in
combination with P8 Content Manager solutions:
򐂰 In a broad sense, the term includes all of the activities that are needed to
move from one environment to another environment.
򐂰 In a stricter sense, the term describes the actual execution. Less frequently,
this type of deployment is called migration or transport. It describes the
typical automated process of export, transform, and reimport of
application-related items, such as application code, configuration settings,
and repository assets.

Most of the implementation preparation work occurs in the development
environment. There, the IBM FileNet P8 components, the J2EE/.NET
applications, and the configurations provide an exportable blueprint for the same
configuration that is used for the testing and quality assurance environments.

There are three ways to deploy (transport) changes from one environment to
another:
򐂰 Cloning
򐂰 Exporting, transforming, and importing
򐂰 Scripted generation of all the necessary documents and structures

10.3.1 Cloning
You can deploy changes from one environment to another by cloning the source
environment and bringing it alive as a new but identical instance of the source
environment.

Cloning is practical when you temporarily need a dedicated environment and it
must be an exact copy of what you have already in place. For example:
򐂰 In a training class, the teacher needs to be able to quickly revert or go
forward to a well-known working environment (for example, over lunch time)
for the next class or the next part of the lectures, because a few students
might not be able to follow the exercises and would not be able to continue
with the rest of the class if their environment is not set up correctly.
򐂰 Many parallel identical training environments are needed to educate more
people in a short period of time.
򐂰 Development environments are needed to work in parallel.
򐂰 Test environments are needed for specific tests.

You can use local VMware-based images to clone a system. For a large system,
however, this might not be a workable solution. Large systems are often not as
flexible as small systems, or there is a lack of powerful machines that can be
made available in a timely manner for cloning. Sometimes, the security and
networking policies do not allow these virtual environments to connect to
back-end machines.

The next logical step is to use virtual farms that host applications at larger client
sites. This approach might not be practical for the following reasons:
򐂰 From the corporate network, they cannot be accessed except by using
remote desktop applications. Direct interaction is not possible, because the
same host names and IP addresses are used multiple times in the same network.
򐂰 Single virtual images are typically not powerful enough for the full stack of
components that are needed for a solution (which includes directory server,
database, application server, and other P8 Content Manager components).

A good way to still rely on virtualization techniques is described in 3.3.1, “A
virtualized IBM FileNet P8 system” on page 39. The solution is mainly built on
individual images hosting the various IBM FileNet P8 components, including
database and directory server. The images are accessed by a gateway, which
shields the network topology of the IBM FileNet Content Manager solution from
the corporate network by using network address translation (NAT) and virtual
private network (VPN) access.

You clone an environment by copying all files representing the storage of virtual
images. With this approach, you can clone an environment within hours with very
little know-how. Use this approach predominantly for development and training.

10.3.2 Export, transform, and import


A common way of deploying an environment is the process of exporting,
transforming, optionally installing, and importing. A key advantage of this
approach is that it allows you to carry forward incremental changes from the
source to the target environment without requiring the recreation of the entire
target environment. The major difficulty with this approach lies in the number of
dependencies between different components and the number of manual steps
that are needed to achieve a target configuration that matches the source
configuration.

In the past, we have seen projects struggle for months when using a manual
process to move J2EE applications that include IBM FileNet Content Manager
components. Today, we can deploy similar projects within one to two days. The
following factors contribute to the improvements:
򐂰 Introduction of a solid release management process
򐂰 Separation of commercial code from custom code and automation of the build
process mainly for J2EE-based or .NET-based applications
򐂰 Adherence to the proposed guideline of stable GUIDs to reduce
dependencies (as described later in this chapter)
򐂰 Implementation of a central datastore (database-based or file-based) in which
environment-specific information is stored
򐂰 Automation wherever possible

Deployments typically apply assets from the development environment to the
production environment. There are cases for which you might consider the other
way around, too:
򐂰 Documents with configuration characteristics, such as search templates, add
entry templates, and folders, that have been created in a production
environment (This raises the question of what a release needs to contain and
restrict.)
򐂰 Hot fixing a serious production problem in the staging area
򐂰 Populating a training environment with production data

The activity for transformation can take place, as described in the IBM FileNet P8
Platform Planning and Deployment Guideline, before or just after import. Custom
scripts can be called to make the necessary transformation. The transformation
can also be conducted on the exported files before importing.
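Such a transformation script often reduces to a search and replace of environment-specific values in the exported files. The sketch below illustrates the idea; the mapping entries are placeholders, and in practice the values would come from the central configuration datastore for the target environment.

```python
# Sketch of a transformation step applied to exported asset files before
# import. All mapping entries are illustrative placeholders.
SOURCE_TO_TARGET = {
    "OS_DEV": "OS_PROD",                          # object store name
    "ce-dev.example.com": "ce-prod.example.com",  # Content Engine virtual server
    "CN=DevAdmins": "CN=ProdAdmins",              # directory group
}

def transform_export(text, mapping=SOURCE_TO_TARGET):
    """Replace every source-environment value with its target-environment
    value in the text of an exported file."""
    for source_value, target_value in mapping.items():
        text = text.replace(source_value, target_value)
    return text
```

Keeping the mapping in one place, rather than scattered across ad hoc scripts, makes the transformation repeatable for every deployment from the same source environment.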

10.3.3 Scripted generation


This approach assumes that after a basic object store has been generated,
every property template, document class, folder, and other structural asset is
generated by a script.

This approach has been proven to work, but the effort to maintain this type of
script is huge and every change must be put into the script code. All of the
benefits of using a tool, such as IBM FileNet Enterprise Manager, are lost with
this approach. There is very little benefit in using this approach unless it is to
overcome limitations where there is no alternative. This approach can be used to
create marking sets or to maintain application roles. We will not discuss this
approach further due to its limitations.

10.4 Deployment by cloning


One of the biggest challenges to making an environment clonable is the system
dependencies that cannot be easily removed. An example is the dependencies
related to a Microsoft Active Directory®. Changes might take time to implement
and can impede the cloning process.

10.4.1 Topology
Figure 10-3 on page 260 illustrates a clonable topology with three identical
environments using VMware images. Every domain is formed by a collection of
servers, which are part of multiple VMware images. All images of one domain are
connected over a private network to a special image called the router. The router
implements network address translation (NAT) and virtual private network (VPN)
gateway functionality. This can be done using Microsoft Remote Access Server
or other products. The other network link of the router is mapped to a network
card of the brick, which is accessible by the corporate network.

To clone the environment, only the router image has to be modified and the
public interface needs to be set up correctly. The Application Engine resolves the
public Domain Name System (DNS) name of the router image.

Figure 10-3 Three identical environments using VMware images

10.4.2 Access to the environment


There are two clients:
򐂰 The user’s workstation
򐂰 A development system hosted on VMware running on the user’s workstation

Even when developers have all of the tools necessary to perform their tasks,
they might prefer to have a pre-configured image to run on their individual
workstations. If Microsoft Active Directory is used, use a VMware image that
was initially part of the same Active Directory.

10.4.3 Post-cloning activities


After cloning an environment, consider resetting passwords and generating new
users and groups for the project. These tasks need to be automated as much as
possible.
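As an illustration of automating one such post-cloning task, the sketch below generates fresh passwords for a list of service accounts. The account names are placeholders, and applying the generated passwords to the directory server is left to environment-specific tooling (for example, LDAP modify operations).

```python
import secrets

# Hypothetical service accounts whose passwords must be reset after cloning.
SERVICE_ACCOUNTS = ["ce_service", "pe_service", "ae_admin"]

def reset_cloned_credentials(accounts=SERVICE_ACCOUNTS):
    """Return a fresh, cryptographically random password for each account
    in the cloned environment."""
    return {account: secrets.token_urlsafe(16) for account in accounts}
```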

10.4.4 Backup changes


There is no real benefit in backing up a cloned environment. However, it makes
sense to back up the folder that contains all of the changes to the environment.

10.5 Deployment by export, transform, and import
In this section, we discuss deployment by export, transform, and import either for
a full or incremental deployment.

10.5.1 Incremental deployment compared to full deployment


The two major types of deployments are:
򐂰 Full deployment
򐂰 Incremental deployment

A full deployment for a P8 Content Manager solution means that both the
structure information and the documents are deployed in one iteration. The
target environment gets everything with the assumption that the target object
store is empty.

An incremental deployment for a P8 Content Manager solution means deploying
only the changes made since the last deployment. Documents, custom objects,
and other objects might have already been instantiated. New changes to the
structure must respect the associated constraints.

For example, suppose that you need to import an object that contains a reference
to another object. You cannot import this object if the referenced object does not exist in the
object store.

Full deployment is a very powerful vehicle to move a project the first time through
the various stages of deployment. You only perform a full deployment one time
for a project.

After having populated an object store with documents, it is impractical to do full
deployments any longer, for the following reasons:
򐂰 The number of documents that you need to move from production into
development and then propagate back is typically too high.
򐂰 Security restrictions often prevent us from moving production data to other
environments.
򐂰 The production system cannot be stopped for the duration of creating the next
release.
򐂰 The time required for moving documents is much greater than for just
applying structure changes.
򐂰 There are documents created in production that have configuration
characteristics, such as search templates and entry templates, which you
might consider as part of a release or just as another set of documents.

An incremental deployment means the propagation of changes that will transition
an environment from a given status (existing release) to a new status (new
release).

There are multiple ways to figure out the differences between the two releases:
򐂰 Manually
򐂰 By strictly rolling forward changes from the source environment to the target
environment and preventing any changes to the target environment between
releases
򐂰 Automated discovery of the differences

Manually detecting the differences between the source environment and the
target environment is time-consuming and error-prone. This option is only valid
for small deployments.

Clients typically choose the second option with the consideration that someone
has manually verified both environments. In a multi-stage environment, there is a
good chance that mistakes in this approach will be detected in the first
deployment step from the development environment to the test environment.
When errors are detected at this point, there is an opportunity to fix the
underlying problems and retry the same procedure. As soon as the deployment
to the test environment passes testing (and is documented), the future
deployment to production most likely works smoothly.

The third option is extremely difficult to achieve and potentially too expensive.
There are many exceptions when simply comparing timestamps between the
various environments. A development or source environment might include more
objects than will be used for the target deployment, so selective tagging of the
objects that are part of a release is effectively mandatory.
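The selective-tagging idea can be sketched as a comparison of release manifests. The object IDs, versions, and tag names below are illustrative; a real implementation would derive each manifest from the corresponding object store.

```python
# Sketch of selective tagging for incremental deployment. Each environment
# keeps a manifest mapping an object ID to a (version, tags) pair.
def objects_to_deploy(source_manifest, target_manifest, release_tag):
    """Return the IDs of objects tagged for this release that are new or
    changed in the source environment relative to the target environment."""
    selected = []
    for object_id, (version, tags) in source_manifest.items():
        if release_tag not in tags:
            continue  # object is not part of this release
        target_entry = target_manifest.get(object_id)
        if target_entry is None or target_entry[0] != version:
            selected.append(object_id)  # new in the target, or changed
    return sorted(selected)
```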

10.5.2 Reduce complexity of inter-object relationships


You can reduce the complexity of inter-object relationships through the usage of
GUIDs.

A Globally Unique Identifier (GUID) is a 128-bit data identifier, which is used to
distinguish objects from each other. The algorithm used to create GUIDs works
in a way that is extremely unlikely to produce duplicate IDs. The underlying
database schema of an object store also ensures the uniqueness of such IDs
(within the same table). GUIDs prevent two objects from having the same ID.
This helps if there is ever a need to merge objects from multiple object stores into
a common object store, because the IDs of the objects do not overlap.

262 IBM FileNet Content Manager Implementation Best Practices and Recommendations
When moving objects between multiple environments, you must consider
dependencies. Objects are often dependent on other objects in the object store
or on external resources. Examples:
򐂰 An Add Entry Template references a folder.
򐂰 An application’s Stored Search definition is an XML document in an object
store. The XML content references multiple object stores by name and ID.
򐂰 A document references an external Web site that contains its content.

While there is no workaround for external dependencies, you can handle
inter-object store dependencies by keeping the identification of these objects
(their GUIDs) consistent across the various environments. This does not
contradict the previously mentioned uniqueness of GUIDs, for two reasons:
򐂰 The objects, which are considered to be kept consistent with the same GUIDs
across object stores, have configuration characteristics, such as document
classes, folders, property templates, add entry templates, search templates,
and others.
򐂰 The predefined population of an object store after you run the object store
creation wizard follows the same pattern.

Figure 10-4 on page 264 shows two options for deploying a search template that
depends on a folder structure. Although you might argue that referring to a
folder by its full path is a better way to reference it, similar situations
arise where there are good reasons to depend on a GUID. The first option
follows the practice of using stable GUIDs; the second option does not:
򐂰 Deploying the folder with the same GUID leads to no additional corrections
deploying the search template above.
򐂰 Deploying the folder and letting the system generate a new GUID leads to a
situation where the search template must be changed to refer to the deployed
folder.

You can avoid the extra effort of maintaining the dependencies in the target
environment by following the pattern of having stable GUIDs.

Figure 10-4 on page 264 illustrates the deployment from development with stable
GUIDs in the top box on the right and not applying stable GUIDs in the lower box
on the right. Not following the stable GUID pattern results in maintaining the
dependencies with additional deployment logic.

Chapter 10. Deployment 263


[Figure 10-4 shows Object Store A in Development containing objects with
GUIDs 123 and 456. Deployed to Test with stable GUIDs, the objects keep
GUIDs 123 and 456; deployed without stable GUIDs, the system assigns new
GUIDs (789 and 885).]
Figure 10-4 Example of using stable and non-stable GUIDs
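The effect of stable GUIDs can be sketched in a few lines of Python. This is an illustration of the deployment logic only, not FileNet API code; the object model, GUID values, and function names are hypothetical:

```python
import uuid

def deploy_folder(source_folder, keep_guid=True):
    """Simulate importing a folder into a target object store."""
    target = dict(source_folder)
    if not keep_guid:
        target["guid"] = str(uuid.uuid4())  # the target system assigns a new GUID
    return target

def fix_references(template, guid_map):
    """Rewrite a search template's folder reference after a non-stable import."""
    fixed = dict(template)
    fixed["folder_ref"] = guid_map.get(fixed["folder_ref"], fixed["folder_ref"])
    return fixed

folder = {"guid": "123", "path": "/Contracts"}
template = {"guid": "456", "folder_ref": "123"}

# Stable GUIDs: the deployed template still resolves; no correction is needed.
stable = deploy_folder(folder, keep_guid=True)
assert template["folder_ref"] == stable["guid"]

# Non-stable GUIDs: an extra transformation step must patch the reference.
unstable = deploy_folder(folder, keep_guid=False)
patched = fix_references(template, {folder["guid"]: unstable["guid"]})
assert patched["folder_ref"] == unstable["guid"]
```

The second path is exactly the "additional deployment logic" that stable GUIDs let you avoid.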

10.5.3 Deployment automation


The more environments there are to maintain, the more practical it is to
automate the deployment to achieve the following goals:
򐂰 Save time.
򐂰 Reduce errors.
򐂰 Reduce risks.
򐂰 Ensure similarity among environments.
򐂰 Reproduce problems.

10.6 P8 Content Manager deployment


The process of deployment using IBM FileNet Enterprise Manager is described
in the IBM FileNet P8 Platform Planning and Deployment Guide available on the
IBM FileNet support site.

There are three major types of objects to be exported:
򐂰 Structure (such as document classes and folders)
򐂰 Configuration documents (such as templates and workflow definitions)
򐂰 Business documents (such as faxes, e-mails, and images)

Configuration documents do not contain business content but contain
configuration information that is used by an application. Configuration documents
might need a transformation step before being deployed to the target
environment, because they might hold information about dependencies.

Business documents contain business information and are viewed by users.
Business documents typically do not need transformation when being deployed,
because they have no internal or external dependencies.

10.6.1 CE-Export
When preparing an export, you need to consider the granularity of the export. It is
usually unnecessary to include everything in one export run. If you need to fix a
problem, we recommend having multiple exports addressing smaller chunks of
data rather than one huge XML file that describes everything.

Known successful deployments use the following practices:
򐂰 Break the deployment apart into a hierarchy of exports.
򐂰 Strictly separate configuration documents from business documents.
򐂰 Avoid dealing with user and group information in your exports and address
this topic at a later time.

Best practice: Reduce complexity of exporting by splitting a large export into
smaller logical chunks. Separate structure and configuration documents from
business documents.

Hierarchy of exports
Build a logical hierarchy of exports, which can help you to test the imports
sequentially and fix dependencies more easily.

Certain objects cannot be exported, including all P8 Content Manager
domain-level objects, such as marking sets. Certain Application Engine-related
objects cannot be exported either; see 10.6.4, "Exporting and importing other
components" on page 269.

If marking sets have been used, you have to create them manually in the target
system. The export and import sequence is:



1. Marking sets
2. Choice lists
3. Property templates
4. Document classes
5. Custom objects
6. Folders
7. Documents (configuration documents, real documents)

This list is incomplete. We outline the sequence in order to explain the
dependencies. There are other assets, such as event actions and various
policies, that are not mentioned here. For more detail, refer to the official FileNet
P8 Content Manager Deployment Guide.

Best practice: Consider the hierarchy of objects and their dependencies by
importing them in an order in which dependencies can be resolved, in other
words, bottom-up.
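The bottom-up ordering can be computed automatically once you record which asset types depend on which. The following Python sketch produces an import plan; the dependency table is illustrative only and, like the sequence above, incomplete, and the sketch does not handle circular dependencies (see the Deployment Guide for those):

```python
def import_order(assets, depends_on):
    """Return the assets in an order where dependencies come first (bottom-up)."""
    order, seen = [], set()

    def visit(asset):
        if asset in seen:
            return
        seen.add(asset)
        for dep in depends_on.get(asset, []):
            visit(dep)          # import what this asset depends on first
        order.append(asset)

    for asset in assets:
        visit(asset)
    return order

# Dependency table distilled from the sequence above (illustrative only).
deps = {
    "choice lists": ["marking sets"],
    "property templates": ["choice lists"],
    "document classes": ["property templates"],
    "custom objects": ["document classes"],
    "folders": ["document classes"],
    "documents": ["document classes", "folders"],
}
assets = ["documents", "custom objects", "folders", "document classes",
          "property templates", "choice lists", "marking sets"]
plan = import_order(assets, deps)
assert plan.index("marking sets") < plan.index("choice lists") < plan.index("documents")
```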

Exporting content
When exporting configuration documents, you have an option to specify an
external directory where the actual content must go. Choosing this option gives
you a better starting point for your future transformations. If you do not
choose this option, the actual content is encoded and embedded in a CDATA
section of the exported XML file.

Best practice: Choose an external subfolder for content when exporting your
configuration documents.
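The difference between the two export styles can be sketched as follows. The element names here are purely illustrative and do not reflect the actual export schema; the point is that externalized content leaves a small, easily transformed file reference instead of an encoded CDATA blob:

```python
import xml.etree.ElementTree as ET

def read_content(doc_xml, read_file=lambda p: "<bytes of %s>" % p):
    """Return a document's content from an export record (illustrative schema)."""
    root = ET.fromstring(doc_xml)
    ext = root.find("ContentLocation")
    if ext is not None:                    # exported with an external folder
        return read_file(ext.text)
    return root.find("Content").text       # embedded CDATA payload

external = ("<Document><ContentLocation>export/content/template1.xml"
            "</ContentLocation></Document>")
embedded = "<Document><Content><![CDATA[...encoded bytes...]]></Content></Document>"

assert read_content(external) == "<bytes of export/content/template1.xml>"
assert read_content(embedded) == "...encoded bytes..."
```

A transformation tool only has to edit the small file reference in the first case, rather than decode and re-encode the payload in the second.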

Exporting user and group information


When moving between different directory servers or different contexts of one
directory server, user and group information must be adapted to the target
environment. You locate the user and group information in the access control
lists that are present in and control the access to almost every exported object.
You can also locate the user and group information in object-specific fields of
certain types of objects (for example, Add Entry Templates and Workplace User
Preferences). One way to adapt this information is to include the user and group
information at export time and then handle it during transformation. This
approach requires that the user and group information is set correctly before
the deployment is performed. In addition, it is time-consuming to adapt to
changes in users and groups, because the exported files become outdated and
must be re-exported.
overcome this limitation, it is useful to deal with users and groups separately as
part of a script that sets the security for all of the objects of one release from the
top down. Using this approach isolates the problem of transforming users and

groups correctly by coding the specific users and groups into only one script
instead of into hundreds of objects. In this approach, you handle user and group
information in access control lists.
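Such a security script might take the following shape. The object model, paths, role placeholders, and distinguished names are entirely hypothetical (a real script would set permissions through the Content Engine API); the point is that the environment-specific users and groups live in exactly one table:

```python
# Hypothetical object model: each object is a dict with a path and an ACL.
ROLE_MAP = {  # the only place that changes per target environment
    "#CLAIMS_USERS": "cn=claims,ou=groups,o=prod",
    "#ADMINS": "cn=p8admins,ou=groups,o=prod",
}

def apply_security(objects, roles_by_prefix):
    """Set ACLs top-down from one table instead of per exported object."""
    for obj in objects:
        # The longest matching path prefix wins.
        for prefix, roles in sorted(roles_by_prefix.items(),
                                    key=lambda kv: -len(kv[0])):
            if obj["path"].startswith(prefix):
                obj["acl"] = [ROLE_MAP.get(role, role) for role in roles]
                break
    return objects

objs = [{"path": "/Claims/2008/fax1", "acl": []},
        {"path": "/Admin/config", "acl": []}]
apply_security(objs, {"/Claims": ["#CLAIMS_USERS"], "/": ["#ADMINS"]})
assert objs[0]["acl"] == ["cn=claims,ou=groups,o=prod"]
assert objs[1]["acl"] == ["cn=p8admins,ou=groups,o=prod"]
```

To retarget a release, you edit only `ROLE_MAP`, not hundreds of exported objects.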

There is another aspect of setting user and group information within the Add
Entry Templates. These objects contain user and group information that is used
to grant permissions to new objects (similar to the default instance security of
document classes). We discuss this topic in the section about AE-based content
deployment in 10.6.4, “Exporting and importing other components” on page 269.

Best practice: If you move between different directory contexts, whether on the
same directory service provider or a different one, define the target users and
groups in one script that you can reuse for all subsequent versions of your
deployment. However, if the directory service context is the same in the source
and target environments, use the "Export security" feature of IBM FileNet
Enterprise Manager to include user and group information in your exported XML.

Automation of export
The following detailed description of P8 Content Manager deployment is based on
experience gained with larger deployments under IBM FileNet P8 V3.5.x.

Note: The ability to export and import assets from the object store has been
changed between IBM FileNet P8 CE Version 3.5x and 4.x.

In Version 3.5x, IBM FileNet Enterprise Manager is the preferred method for
interactive export and import. The CE COM API also offers export/import
methods, which can be used when automation is a goal. These methods
operate on individual objects, leaving it up to the developer to import and
export objects in the correct sequence and accounting for dependencies
between objects, which is a non-trivial task.

In Version 4.x, IBM FileNet Enterprise Manager is still available for interactive
export and import. You can use a command line export and import utility that
is new in Version 4.x when automation is a goal. Unlike the 3.x API calls, the
new command line utility handles object sequencing for you. The utility takes
an XML manifest file as its instruction set. This manifest file can be generated
using IBM FileNet Enterprise Manager (the file can be generated once,
interactively, in IBM FileNet Enterprise Manager, and then used multiple times
in automated deployments). The Version 3.x export/import API calls are not
available in Version 4.x.



10.6.2 CE-Objects transformation
Various exported assets need different treatment for a successful import into the
target object store. Table 10-1 shows a partial list to give you an idea of how to
distinguish the options.

Table 10-1 P8 Content Manager assets and transformation


Type of asset         Transformation required   Remark
Property Templates    Not required              Import with the same GUID
Choice Lists          Not required              Import with the same GUID
Document Classes      Not required              Import with the same GUID
Workflow Definitions  Required                  Contain references to environment-specific
                                                constants, such as the object store name,
                                                and external references for Web services
Folders               Not required              Import with the same GUID
Business documents    Not required              Do not use the same GUID unless they have
                                                the configuration characteristic

Note: If user and group information must be transformed to suit the target
environment, the transformation applies to all objects, including those where
this table says transformation is “Not required”.

Configuration documents must be exported using the option to put their content
into a separate folder. If you do this, the XML file that describes each document
object holds an absolute pointer to the content in the configured folder.

The transformation takes care of the following issues:
򐂰 Configuration documents might be encoded. The tool performing the
replacement of certain values must be able to understand the encoding.
򐂰 The pointer between the XML file that describes the object and its content
is absolute. If the location from which the transformed files are reimported
differs from the location where they were created, the pointers in the XML
file must be corrected either at transformation time or shortly before
importing.
򐂰 Check the values in the XML file for nulls, zero-length strings, and the
encoding of special characters.
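A transformation step that addresses these issues might look like the following sketch. The file layout, placeholder names, and attribute names are illustrative assumptions, not the actual export format:

```python
import xml.sax.saxutils as saxutils

def transform(xml_text, old_root, new_root, substitutions):
    """Rewrite absolute content pointers and environment-specific values.

    Substitution values are XML-escaped so that special characters cannot
    corrupt the exported file; empty values are rejected.
    """
    out = xml_text.replace(old_root, new_root)  # fix the absolute pointers
    for placeholder, value in substitutions.items():
        if not value:
            raise ValueError("empty substitution for %s" % placeholder)
        out = out.replace(placeholder, saxutils.escape(value))
    return out

src = '<doc content="C:\\exports\\rel1\\t1.xml">@OSNAME@</doc>'
out = transform(src, "C:\\exports\\rel1", "/imports/rel1",
                {"@OSNAME@": "OS_Prod & Test"})
assert out == '<doc content="/imports/rel1\\t1.xml">OS_Prod &amp; Test</doc>'
```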

Note: IBM is not responsible for testing and supporting your exported files and
objects. Always test them in a non-production environment before deploying
them in a production environment.

10.6.3 CE-Import
Importing the exported and transformed objects using IBM FileNet Enterprise
Manager is straightforward by using the order that is presented in “Hierarchy of
exports” on page 265.

Common errors occurring at this stage are:
򐂰 User and group information has not been updated to match the target
environment, so users cannot access the objects as expected.
򐂰 Objects that are depended on do not yet exist in the target environment.
Create objects in the correct sequence.
򐂰 The display is stale. Refresh IBM FileNet Enterprise Manager after the
import.
򐂰 There are circular dependencies. Refer to the Deployment Guide for more
information.
򐂰 Options have not been set correctly for the content folder.
򐂰 The absolute pointer to the content folder was not transformed to match the
runtime environment where the import takes place.
򐂰 The reuse of GUIDs has not been checked.

10.6.4 Exporting and importing other components


In this section, we address exporting and importing the database, fulltext,
Directory Service Provider, Process Engine, and Application Engine.

Database
All changes to rows in the object store database are covered by exporting and
importing objects as previously explained.

In addition, you can consider propagating changes that have been applied at the
database level, such as adding additional indexes, changing server options, and
others. You can typically accomplish this by rerunning the SQL-based scripts
that were written to configure the database in the source environment. Check
whether the scripts depend on infrastructural information, such as user ID,
password, server name, IP addresses, and database name.
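One way to keep such scripts free of hardcoded infrastructure values is to template them per environment, as in this sketch. The index, table, and schema names are purely illustrative:

```python
from string import Template

# The index, table, and schema names are purely illustrative.
INDEX_SQL = Template("CREATE INDEX ix_doc_created ON ${schema}.DocTable (created);")

ENVIRONMENTS = {  # infrastructure values that differ per environment
    "test": {"schema": "p8test"},
    "prod": {"schema": "p8prod"},
}

def render(env):
    """Render the same tuning script for any environment."""
    return INDEX_SQL.substitute(ENVIRONMENTS[env])

assert render("prod") == "CREATE INDEX ix_doc_created ON p8prod.DocTable (created);"
```

The same template then produces identical tuning scripts for test and production, differing only in the values you list per environment.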



Fulltext
If you have customized the style.stp for the embedded Verity engine, migrate this
asset according to the Deployment Guide.

Directory Service Provider


You can move parts of a directory either by exporting, transforming, and
reimporting the parts by using LDIF-based tools, or by writing scripts that
create users and groups and add the users as members of the groups.

In any case, you must map the users, groups, and memberships to the target
environment, which depends on the security settings in your company. If
possible, use the same scripts and transform them based on a naming
convention. This step needs to happen prior to the CE/PE Import.
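A naming-convention-driven script might generate LDIF records as in the sketch below. The object class (`groupOfNames`), attribute names, and distinguished names vary by directory product, so treat them as assumptions to adapt:

```python
def group_ldif(group_cn, base_dn, member_uids):
    """Emit an LDIF record that creates one group and its memberships."""
    lines = [
        "dn: cn=%s,%s" % (group_cn, base_dn),
        "objectClass: groupOfNames",
        "cn: %s" % group_cn,
    ]
    lines += ["member: uid=%s,%s" % (uid, base_dn) for uid in member_uids]
    return "\n".join(lines) + "\n"

record = group_ldif("claims", "ou=groups,o=target", ["alice", "bob"])
assert record.startswith("dn: cn=claims,ou=groups,o=target")
assert "member: uid=alice,ou=groups,o=target" in record
```

The generated records can then be loaded with your directory's standard LDIF import tooling before the CE/PE import runs.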

Process Engine (PE)


Process Engine export and import is straightforward by using the Process
Configuration Console, which supports both full and incremental deployments.

The underlying Process Engine APIs contain all the required methods to move
Queues, Rosters, EventLogs, and to validate Workflow Definitions.

You can export the Process Engine configuration by a call to the Process Engine
Java API. The method VWXMLConfiguration.exportConfigurationToFile
(apiObjects [], outputFile) takes a list of objects to be exported to an XML file.

The import into the Process Engine works in a similar way. The method
VWXMLConfiguration.importConfigurationFromFile (session, inputFile, option)
imports the XML file’s content into an existing session by either overwriting
existing items or merging them.

If you currently use other BPM features or services, refer to the FileNet P8
Planning and Deployment Guide, which you can download from:
http://www-1.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

Or, apply a technique similar to those described earlier in this chapter.

Application Engine
Whenever Workplace applications have to be moved between environments,
there are business assets and application configuration assets to be deployed.

Workplace is available in different versions: Workplace and Workplace XT. For
this discussion, all comments apply to both versions.

Workplace stores various objects in an object store, such as:
򐂰 Site Preferences
򐂰 User Preferences
򐂰 Add Entry Templates
򐂰 Stored Searches
򐂰 Search Templates
򐂰 Application Roles

We have already discussed the business assets as part of the export, transform,
and import process; no further explanation is needed from a methodology point
of view.

One application configuration object, the Add Entry Template, does need further
explanation. The Add Entry Template is used when you add a document to the
object store. It describes the document class, the containing folder, initial
property values, and initial object permissions.

When exporting the Add Entry Template, it is important to understand that the
user and group information, which is embedded in the exported XML file, has
nothing to do with the IBM FileNet Enterprise Manager option "Export security."
This information is used by the Workplace application, and you need to either
manipulate this section as part of the transformation or remove it from the
XML before import and then modify all Add Entry Templates manually by editing
them after import. If you have a large number of Add Entry Templates, consider
editing the section in the XML file automatically, which has been done
successfully.
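Automating that edit can be as simple as the following sketch. The element name `accessRoles` is an assumption for illustration; inspect your own exported Add Entry Template XML to identify the real section that embeds users and groups:

```python
import re

def strip_security(template_xml, section="accessRoles"):
    """Remove the embedded user/group section from an exported template."""
    pattern = r"<%s>.*?</%s>" % (section, section)
    return re.sub(pattern, "", template_xml, flags=re.S)

src = ("<template><name>Fax</name>"
       "<accessRoles><user>j.doe</user></accessRoles></template>")
assert strip_security(src) == "<template><name>Fax</name></template>"
```

A variant of the same approach can replace the section with target-environment values instead of removing it.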

Table 10-2 provides a short summary.

Table 10-2 Summary


Asset type           Transformation required    Remark
Site Preferences     Required                   Can be put in place by checking out the
                                                Site Preferences manually and checking in
                                                the new transformed version
User Preferences     Required                   Try to avoid deployment and instead
                                                customize the general MyWorkplace style
                                                for all users; this will save you a lot
                                                of effort
Add Entry Templates  Required                   Deal with security settings, folders,
                                                document classes, and default values
Search Templates     Required                   Dependent on object name and GUID;
                                                remember that there are two content
                                                elements per object
Application Roles    No export/import possible  Circular references cannot be maintained

Workplace is a Web application spanning one war file, which contains the
relevant Java APIs to connect to Content Engine and Process Engine.
Workplace is built on top of the Workplace toolkit that can be leveraged to
customize Workplace. When Workplace is deployed as an application, there are
two areas for consideration:
򐂰 Custom code incorporated into the same Workplace war file
򐂰 Custom code incorporated into a different ear/war file that accesses the
commercial Workplace application

In both situations, there are dependencies between the custom code and the
commercial code. The installed Workplace application contains a shell/batch
script to build the war file. This script is the starting point for defining
the build of the Workplace war file and other application assets.

It is beyond the scope of this book to explore how to achieve a good build
process in detail, but tools such as Ant from the Apache Software Foundation
can help facilitate the build process. Many clients have successfully used such
tools to adapt Workplace and to automate its deployment.

Every Workplace environment depends on a few values, which can be maintained
in its own CMDB. See Table 10-3.

Table 10-3 Dependent values of a Workplace installation


Parameter                                    Location
prefname, installDir                         /WEB-INF/bootstrap.properties
RemoteServerUrl, RemoteServerDownloadUrl,    /WEB-INF/WcmApiConfig.properties
RemoteServerUploadUrl, and CryptoKeyFile
uploadDir and downloadDir                    /WEB-INF/web.xml
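Because these are plain key=value properties files, retargeting them per environment is easy to script, as in this sketch. The keys come from the table above; the URL and path values are placeholders, not real endpoints:

```python
def retarget_properties(text, new_values):
    """Rewrite key=value lines of a properties file for a new environment."""
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key in new_values:
            out.append(key + "=" + new_values[key])
        else:
            out.append(line)  # comments and untouched keys pass through
    return "\n".join(out)

src = "RemoteServerUrl=http://dev-ae/ce\nCryptoKeyFile=/opt/ae/key"
new = retarget_properties(src, {"RemoteServerUrl": "http://prod-ae/ce"})
assert new == "RemoteServerUrl=http://prod-ae/ce\nCryptoKeyFile=/opt/ae/key"
```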

11

Chapter 11. System administration and


maintenance
In this chapter, we describe various tools and methods that are used to monitor
and maintain your IBM FileNet Content Manager (P8 Content Manager) system
to ensure optimal performance.

We discuss the following topics:


򐂰 Online help and existing documentation
򐂰 System performance monitoring
򐂰 IBM FileNet System Monitor
򐂰 System logs
򐂰 Reporting
򐂰 Capacity monitoring and growth prediction
򐂰 IBM FileNet Enterprise Manager
򐂰 Auditing
򐂰 Search and bulk operations
򐂰 Adding security
򐂰 System backup and restore
򐂰 Task schedule
򐂰 Best practice summary

© Copyright IBM Corp. 2008. All rights reserved. 273


11.1 Online help and existing documentation
Your P8 Content Manager installation includes an online searchable ecm_help
application. The ecm_help application covers many of the topics discussed here
in greater detail. We recommend that you use it in conjunction with this chapter.
The ecm_help application is installed on your Java 2 Platform, Enterprise Edition
(J2EE) application server. You connect to it by pointing your browser to your
ecm_help system.

The help system uses framed Web pages. The left frame contains links to details.
In this document, we point to additional details by selecting ecm_help → Help
Directory → How to use Help. On your ecm_help system, expand Help
Directory → How to use Help to find additional details. See Figure 11-1.

Figure 11-1 ecm_help: How to use Help expanded

You can obtain product documentation for the IBM FileNet P8 Platform from the
following Web site:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

You can obtain technical notices from this Web site in the Technical Notices
section, including:
򐂰 IBM FileNet P8 Performance Tuning Guide
򐂰 IBM FileNet P8 High Availability Technical Notice
򐂰 IBM FileNet Content Engine Query Performance Optimization Guidelines
Technical Notice
򐂰 IBM FileNet Application Engine Files and Registry Keys Technical Notice
򐂰 IBM FileNet P8 Asynchronous Rules Technical Notice
򐂰 IBM FileNet Content Engine Component Security Technical Notice
򐂰 IBM FileNet P8 Directory Service Migration Guide
򐂰 IBM FileNet P8 Disaster Recovery Technical Notice
򐂰 IBM FileNet P8 Extensible Authentication Guide
򐂰 IBM FileNet P8 Process Task Manager Advanced Usage Technical Notice
򐂰 IBM FileNet P8 Recommendations for Handling Large Numbers of Folders
and Objects Technical Notice
򐂰 IBM FileNet P8 DB2 Large Object (LOB) Data Type Conversion Procedure
Technical Notice

Although several technical notices were written for IBM FileNet P8 3.5, much of
the content provided is useful for the 4.0 version as well.

11.2 System performance monitoring


In this section, we discuss system performance monitoring. P8 Content Manager
ships with a centralized performance monitoring mechanism called System
Manager. System Manager is composed of two parts: a Listener that runs on
each server collecting information and a Manager that displays the information.
The Dashboard is the supplied Manager application to display and save the
collected information. System administrators use this tool to routinely monitor
system performance.

There is a technical notice about performance tuning, the FileNet P8
Performance Tuning Guide, that we recommend that you read. The document
provides tuning tips that can help improve the performance of your IBM FileNet
P8 system. To view the document, go to:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/40x/p8_400_performance_tuning.pdf



11.2.1 Listener
When P8 Content Manager's core engines are installed, a default System
Manager Listener is automatically installed. The Listener component collects
details about the engine software version and performance data. You must
configure and activate the Listener before it can begin to collect data.

By default, the Listener buffers approximately 24 hours' worth of collected
data details. When a client connects, it sees historical data, as well as data
going forward at the selected interval. This can be important if a performance
problem was reported but no longer exists; you can still view the details in
the Dashboard.
When a client connects to a Listener, it pulls collected data details periodically
based on the interval setting of the client. Multiple clients can connect to the
Listener without impacting each other. The Listener is designed to support up to
60 client connections.

You can obtain more details by selecting ecm_help → FileNet P8
Administration → Enterprise-wide Administration → FileNet System Manager.

11.2.2 Dashboard
Administrative personnel can use this tool to routinely monitor system
performance. The Dashboard provides a means to generate detailed reports
regarding performance. It displays the details and also has the ability to save the
information in various formats.

The Dashboard is a Java utility that can be installed and run on Windows or
UNIX/Linux clients. It is installed separately from the server installation. It can
also be installed and run on the P8 Content Manager servers. On Windows
machines, run the Dashboard utility. On UNIX, you must have an XWindows
display exported and run the P8Manager shell script. The Dashboard installs a
local copy of its online help; it can be accessed from the help menu option.

When the Dashboard is first run, you need to create clusters of P8 Content
Manager components to monitor. These clusters are not used for high availability
but are simply a user-defined logical collection or cluster of servers to monitor.
The cluster contains servers and monitoring frequency. Select the Cluster tab
and click New. Enter a name for the cluster, which is typically the application
system name or location, and click OK. See Figure 11-2 on page 277.

Figure 11-2 Dashboard: New Cluster

Click Edit to add servers and timing details. See Figure 11-3 on page 278. The
Interval sets, in seconds, how frequently the Dashboard polls the server
Listeners for details. For a 15-minute interval, enter 900 seconds. The number
of data points sets the maximum number of interval details that the Dashboard
keeps in the display.



Figure 11-3 Cluster add server

Click OK. At this point, the Dashboard tool begins querying for System Manager
Listeners on the servers and populates details in the Dashboard tool’s various
windows. It finds all listeners running on each server; individual servers only
need to be defined once. You can save the cluster details for future use or open
existing ones from the file menu. The cluster file is an XML-formatted file that is
saved on the local computer. You can copy the cluster.xml file to other
computers where the Dashboard is installed for use on other workstations.

The Dashboard summary tab simply shows a graph of the cluster’s performance.

The Details tab contains counter details for all listeners. You can expand the P8
Content Manager applications on each server and view the following items:
򐂰 CPU, Network, and Disk utilization
򐂰 Environmental details, such as OS level, P8 Content Manager version, and
Java virtual machine (JVM) settings
򐂰 Remote Procedure Call (RPC) activity shows how the P8 Content Manager
subsystems are performing and it details the count and average time
consumed (duration) by the calls during the interval

Figure 11-4 on page 279 shows RPC count details and the number of items
processed per interval.

Figure 11-4 RPC Count details window

The Dashboard is a Java application that holds the details in memory. If it
crashes or seems unresponsive, try these actions:
򐂰 Ensure that you have the latest Dashboard and check its documentation for
any additional memory configuration details. The Dashboard is designed to
be Listener version independent; the latest Dashboard functions with older
Listeners.
򐂰 Increase the Java memory for the Dashboard. If you run the Dashboard on a
machine with less than 2 GB of memory and monitor many Listeners, increase
the machine's memory to 2 GB or take the following actions:
– Reduce the number of listeners being monitored.
– Increase the collection interval.
– Reduce the number of data points specified.
– Add memory.



The Dashboard has a report mechanism that allows you to save reports in
Comma Separated Value (CSV) format, which can be extremely useful for
generating reports for a spreadsheet. Refer to the Dashboard’s online help for
reports. You can save the report template for future use.

Figure 11-5 shows a sample report output.

Figure 11-5 Sample Dashboard Report CSV output
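Because the report is plain CSV, it is easy to post-process outside a spreadsheet as well. The column names in this sketch (`time`, `counter`, `value`) are assumptions for illustration; match them to the headers of your own saved report:

```python
import csv
import io

def average_value(report_csv, counter):
    """Average one counter's values across all intervals in a saved report."""
    rows = csv.DictReader(io.StringIO(report_csv))
    values = [float(r["value"]) for r in rows if r["counter"] == counter]
    return sum(values) / len(values) if values else 0.0

sample = ("time,counter,value\n"
          "09:00,RPC duration,12.0\n"
          "09:15,RPC duration,18.0\n")
assert average_value(sample, "RPC duration") == 15.0
```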

11.2.3 System Manager performance archiver


System Manager provides a Java archiver.jar application that can be used to
collect data automatically. The JAR file can be run on any server or workstation
with Java and connectivity to the P8 Content Manager listeners.

Running the archiver.jar can be automated through host scripts. The archiver.jar
writes to files with one file per listener in a log directory. The archived files are
binary files that can be opened via the Dashboard’s File → Open Archive menu.
The same view and report options apply as in a live system monitoring session.

Table 11-1 lists the archiver.jar parameter options.

Table 11-1 archiver.jar parameter options


Option Description

-t hh:mm Total amount of time in hours and minutes that the archiver process
must run

-n hh:mm The interval at which the current archived files must be closed and
new ones opened

-i integer The interval, which is specified in seconds, at which to poll for data
from the specified machines

-d file path The path to the location at which to place the archive log files

FileName.xml The complete path to the saved cluster file that specifies which
machines to poll

This is an example of archiver.jar use:

java -jar archiver.jar -d Logs -t 12:00 -n 04:00 -i 00:15 cluster.xml

In this example, the archiver collects performance details in 15-minute
intervals, creates new archive logs every four hours, and automatically stops
after 12
hours. If you stop the archiver process early, part of the buffered performance
data might not appear in the last archive file. If the archiver loses connectivity
with a Listener, by default, it attempts to reconnect five times at intervals that are
five seconds apart before it stops attempting to connect to the failed Listener.

Best practice: Start the archiver.jar immediately before your system activity
picks up during the peak times (for example, in the morning) and run it until
activity slows down (for example, in the evening).

If you restart your system while the archiver.jar is running, you must restart the
archiver.
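You can wrap the documented archiver.jar options in a small script that a scheduler launches each morning. This sketch only builds the command line from the options in Table 11-1; the file names and schedule are examples:

```python
def archiver_command(cluster_file, log_dir="Logs",
                     run_for="12:00", rotate="04:00", poll="00:15"):
    """Build an archiver.jar command line from the options in Table 11-1."""
    return ["java", "-jar", "archiver.jar",
            "-d", log_dir, "-t", run_for, "-n", rotate, "-i", poll,
            cluster_file]

cmd = archiver_command("cluster.xml")
assert " ".join(cmd) == \
    "java -jar archiver.jar -d Logs -t 12:00 -n 04:00 -i 00:15 cluster.xml"
# A scheduler (cron or Task Scheduler) can then launch it each morning:
# subprocess.run(archiver_command("cluster.xml"), check=True)
```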

11.2.4 System Manager client API


P8 Content Manager's System Manager includes an API set for clients who want
to add Dashboard monitoring to their applications. The API set is included
with the Dashboard software. Using the API, your custom application can feed
application performance data into the Dashboard.

For more information, select ecm_help → Developer Help → FileNet System
Manager Development.



11.3 IBM FileNet System Monitor
IBM FileNet System Monitor (FSM) is an optional component. FSM provides
automated, proactive system monitoring that can notify your support personnel
directly or through system management consoles, such as IBM Tivoli®
Enterprise Console®. Use FSM to monitor all aspects of your P8 Content
Manager servers. It provides early fault detection and prevention to aid support
personnel in reducing system downtime. It monitors performance, disk utilization,
event logs, and literally hundreds of P8 Content Manager application and system
parameters. FSM contains a default set of monitors for P8 Content Manager
components and allows you to create your own monitors for application-specific
monitoring.

FSM features a Web interface that authorized personnel use to monitor and
manage your system. It features a knowledge base of faults and possible
corrective actions. You can customize this knowledge base to offer
application-specific corrective actions. When a fault is encountered, support
personnel can quickly identify and correct the failing component. Figure 11-6
shows the main window of FSM.

Figure 11-6 FileNet System Monitor main window

The rapid fault isolation and corrective action database make FSM a must-have
for mission-critical systems. FSM reduces manual efforts in the daily
administration of P8 Content Manager and helps to increase system availability.
FSM can help reduce your operational costs and help you meet your Service
Level Agreements more efficiently.

282 IBM FileNet Content Manager Implementation Best Practices and Recommendations
For more information about IBM FileNet System Monitor, go to:
http://www.ibm.com

Select Products → Software → Information Management → Products by
category → Content Management → FileNet System Monitor.

Or, use the following hot link:


http://www.ibm.com/software/data/content-management/filenet-system-monitor

11.4 System logs


In this section, we discuss system message logs, tracing, and log maintenance.

11.4.1 Message logs


P8 Content Manager is written in Java. Java applications report error conditions
as exceptions, and both normal messages and exceptions are written to
message log files. P8 Content Manager has three major engine components:
Application Engine, Content Engine, and Process Engine. We introduce logging
for each engine component and other servers next.

Application Engine does not have a message log. Messages and exceptions are
written to the J2EE application server’s log.

Content Engine has two message logs:


򐂰 For IBM WebSphere, under the default install location:
p8_server_error.log and p8_server_trace.log are located in
AppServer\profiles\default\FileNet\<serverInstanceName>
򐂰 For BEA WebLogic:
p8_server_error.log and p8_server_trace.log are located in
\bea\user_projects\domains\mydomain\FileNet\<serverInstanceName>

Process Engine message logs:


򐂰 Fatal errors for the Process Engine, such as database connectivity errors, are
logged to different locations depending on the operating system. On
Windows, they are logged in the Windows Application Event logs. On UNIX,
they are logged in a file residing in /fnsw/local/logs/elogs/. The file names are
in the format elog<YYYYMMDD> where <YYYYMMDD> is the year, month,
and day that the file was created in the directory.



J2EE Application Server message logs:
򐂰 For IBM WebSphere, under the default log locations:
SystemErr.log and SystemOut.log are located in
AppServer\profiles\default\logs\<serverInstanceName>
򐂰 For BEA WebLogic:
myserver.log is located in
\bea\user_projects\domains\mydomain\<myserver>
where <myserver> is your Web server name.
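As a simple illustration of working with these message logs, the following sketch
counts the lines in a log that mention a Java exception. The path in main() is a
hypothetical WebSphere default based on the locations above; substitute the
p8_server_error.log location for your own server instance:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Minimal sketch: count the lines in a message log that mention a Java
// exception. Not a product utility; the log path below is an assumption.
public class LogScanner {

    public static int countExceptionLines(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.indexOf("Exception") >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; adjust for your server instance.
        String path = "C:/WebSphere/AppServer/profiles/default/FileNet/server1/p8_server_error.log";
        FileReader reader = new FileReader(path);
        try {
            System.out.println(countExceptionLines(reader) + " exception lines");
        } finally {
            reader.close();
        }
    }
}
```

A small scan of this kind can give administrators a quick first indication of
whether a log warrants closer inspection.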

11.4.2 Trace logs


Trace logs are primarily used for debugging purposes. Trace logging can be
enabled for many components. By default, trace logging is turned off for all
components except for fatal errors. Fatal errors, such as database connectivity
errors, are logged in the server log files. Refer to 11.4.1, “Message logs” on
page 283. You can obtain additional trace log information in Chapter 7,
“Application design” on page 153.

Note: Trace logging all components can create enormous log files with very
little system activity. Performance might also be impacted. Turn on the
minimum trace logging necessary to collect the required information in relation
to the problem that you are investigating.

You can monitor the Application Engine activity through Content Engine’s API
trace logging. You can obtain additional information in 7.3.10, “Logging” on
page 177.

For more information, click ecm_help → Developer Help → Content Engine
Development → Java and .NET Developer's Guide → Trace Logging →
Concepts. Relevant details are in the API subsystem.

You can turn Content Engine trace logging on or off without recycling the server.
It can be enabled at several different levels, for example, to include all Content
Engine servers or only one server. It can be enabled for specific components or
all components. 11.7.3, “IBM FileNet Enterprise Manager: Enable trace logging”
on page 293 contains additional details for enabling trace logging.

For more information, select ecm_help → FileNet P8 Administration →
Content Engine Administration → Trace Logging.

Process Engine trace logging is controlled by a Process Engine command-line
utility, vwtool. For more information about vwtool, select ecm_help → FileNet
P8 Administration → Process Engine Administration → Administrative
tools.

11.4.3 Log4J trace logs


P8 Content Manager provides Apache log4j logging capabilities, which are
typically used for debugging application issues. System administrators might be
called on to assist with enabling log4j tracing.

For more information, select ecm_help → Developer Help → Content Engine
Help → Java and .NET Developers Guide → Trace Logging.

The Content Engine provides a log4j.xml.server file that must be edited to enable
logging, and it must be copied into a directory specified in the Content Engine’s
CLASSPATH.

Copy and edit FileNet\ContentEngine\config\samples\log4j.xml.server

To:
WebSphere\AppServer\profiles\default\installedApps\hqdemo1Node01Cell
\FileNetEngine.ear\APP-INF\lib\log4j.xml

Or:
bea\user_projects\domains\mydomain\myserver\.wlnotdelete\Engine-wl\A
PP-INF\lib\log4j.xml

The Application Engine provides a log4j.properties.client file that must be edited
to enable logging, and it must be copied into a directory specified in the
Application Engine’s CLASSPATH.

Copy and edit FileNet\AE\CE_API\config\samples\log4j.properties.client

To:
WebSphere\AppServer\profiles\default\installedApps\hqdemo1Node01Cell
\Workplace.ear\app_engine.war\WEB-INF\lib\log4j.properties

Or:
bea\user_projects\domains\mydomain\myserver\.wlnotdelete\extract\mys
erver_Workplace_Workplace\jarfiles\WEB-INF\lib\log4j.properties

Note: Improper log4j settings can cause system problems. Always test on a
development system before you implement logging in a production
environment.



11.4.4 Message and trace log maintenance
The system logs must be periodically cleared. If left unchecked, system logs can
grow quite large and become difficult to manage when you investigate a problem.

Best practice: Rename system logs to a date format name to keep them for a
brief period, and then delete them. The log maintenance timing depends on
how busy your system is and how large the logs grow over time.
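A minimal sketch of this practice follows. The directory and file names are
assumptions; point the code at your own log location:

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of the rename-to-date practice. The directory and file names
// are assumptions; point this at your own message log location.
public class LogRotation {

    // Derive a dated file name, for example p8_server_error.log.2008-04-01
    public static String datedName(String logName, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        return logName + "." + fmt.format(when);
    }

    public static void main(String[] args) {
        File log = new File("logs/p8_server_error.log"); // hypothetical path
        File archived = new File(log.getParent(), datedName(log.getName(), new Date()));
        if (log.renameTo(archived)) {
            System.out.println("Archived as " + archived.getName());
        }
    }
}
```

Running such a utility daily from a scheduler gives you dated copies that can be
deleted after a brief retention period.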

11.4.5 Audit and statistics logs


Audit and statistics logs are stored in the server’s database.

Content Engine audit log:


򐂰 The Content Engine provides audit logging capabilities. When auditing is
used, the audit log entries are stored in the Content Engine database.
򐂰 For more information, select ecm_help → Workplace → FileNet P8
Administration → Content Engine Administration → Auditing.

Process Engine statistics log:


򐂰 The Process Engine logs information in vwlog database tables.
򐂰 For more information, select ecm_help → Workplace → Workplace
overview → Process Engine Reference → Events and statistics.

Best practice: If your system has a Process Engine or you use Content
Engine audit logs, a best practice is to maintain the logs weekly, which ensures
that the database tables are kept as small as possible. If left unchecked, audit
logs can become very large and impact system performance.

11.5 Reporting
To get reports about your system, you can use queries. Two distinct models are
available for performing queries against object store objects. In this section, we
describe both models and give examples of their use.

11.5.1 Queries using the object model


Whenever possible, perform queries using the object model. It allows you to
express your queries using the familiar classes represented as SQL tables and

properties represented as SQL columns, but you have the full benefit of Content
Engine access controls, data conversions, and other internal optimizations.
There are a number of ways to perform queries using the object model:
򐂰 The IBM FileNet Enterprise Manager provides a guided user interface called
Query Builder to assist in creating and running queries. Refer to 11.9.1,
“Search using IBM FileNet Enterprise Manager” on page 299, which
describes searching with IBM FileNet Enterprise Manager, for more
information.
򐂰 The Java API, the .NET API, and Content Engine Web Services all provide
programmatic query interfaces.
򐂰 The Java API provides a Java Database Connectivity (JDBC) driver for use
with commercial reporting packages. Refer to 7.3.7, “Using the JDBC
interface for reporting” on page 173 for details.

Regardless of the method used, the SQL syntax of the queries is identical. The
syntax is generally a subset of SQL-92 (only the SELECT statement) with
several IBM FileNet P8-specific extensions.

The complete syntax is described in ecm_help → Developer Help → Content
Engine Development → Java and .NET Developer’s Guide → Reference →
SQL Syntax Descriptions.

Suppose you need to find the largest content in your repository. You plan to run
this query periodically and therefore are interested in finding only the content that
was created or updated since the query was last run. This is a sample query to
accomplish this task:
SELECT TOP 500 Creator, ContentSize, Id FROM Document d
WHERE ContentSize > 50000000.0 AND DateLastModified > 20071031T040000Z
ORDER BY ContentSize DESC

In this example, we searched for content larger than 50 MB that has been
modified after 4 A.M. Coordinated Universal Time (UTC) on the last day of
October 2007. We selected the Document properties Creator, ContentSize, and
Id, although we can use any other Document properties of interest. The results
are ordered by descending size. To prevent retrieving too many results, we
constrain the SELECT statement with a TOP modifier that limits the result set to
a maximum of 500 rows.
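Because the query runs periodically, your application can rebuild the WHERE
clause from the time of the previous run. The helper below is an illustrative
sketch (it is not part of the Content Engine API); it formats a timestamp in the
compact UTC form that the SQL syntax expects and assembles the statement:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch: build the periodic large-content query from the last run time.
// The size threshold and TOP limit mirror the example above. This helper
// is illustrative only and is not part of the Content Engine API.
public class LargeContentQuery {

    // Format a date in the compact UTC form used by the SQL syntax,
    // for example 20071031T040000Z.
    public static String toUtcLiteral(Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd'T'HHmmss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    public static String build(long minBytes, Date lastRun) {
        return "SELECT TOP 500 Creator, ContentSize, Id FROM Document d"
                + " WHERE ContentSize > " + minBytes + ".0"
                + " AND DateLastModified > " + toUtcLiteral(lastRun)
                + " ORDER BY ContentSize DESC";
    }
}
```

The resulting string can then be submitted through any of the query interfaces
listed above.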

11.5.2 Queries using the schema of the database


There are times when you might want to use a native query against the
underlying database. You write native queries using the SQL syntax and tools
applicable to your underlying database brand and version. For queries returning
extremely large result sets, there can be a noticeable performance difference



from the object model queries. You might also need to use native queries if there
is not a facility available via the SQL syntax of the object model queries, such as
the use of COUNT(*). When using native queries, you lose security access control
and other Content Engine optimizations.

Best practice: Use the object model whenever possible for queries.

Important information: Although native SELECT queries are supported for
reading data from the underlying database, we do not support using native
database facilities for updating the database while bypassing the Content
Engine. Doing that bypasses important security, referential integrity, and data
consistency safeguards.

The P8 Content Manager classes and properties do not map one-to-one with
tables and columns in the underlying database. As an example, the Document
class objects are stored in the DocVersions database table. The Document
properties Creator, ContentSize, Id, and DateLastModified are mapped to the
DocVersions columns creator, content_size, object_id, and modify_date,
respectively.

Details about the database schema are in ecm_help → Developer Help →
Content Engine Development → Java and .NET Developer’s Guide →
Reference → Database Reference.

11.6 Capacity monitoring and growth prediction


When planning for a P8 Content Manager system, you need to estimate the
average amount of content added per day, average size of the content, how
many users have access to it, and other basic information about your planned
application. Answers to these questions are run through a modeling tool, Scout,
by your IBM representative. The modeling tool provides details about servers
needed, database space needed, and overall disk space needed for storage.
The modeling tool also estimates the CPU utilization of the necessary servers.
For more information, refer to Chapter 4, “Capacity planning with Scout” on
page 65.

As you deploy and begin using your application, monitor and record these server
statistics:
򐂰 Disk usage
򐂰 CPU and memory utilization
򐂰 Database usage

Your database administrators can provide database details. The most important
information is actual database size, but it is also good to know if specific tables or
data fields are growing rapidly.

P8 Content Manager systems tend to grow over time. Content is added daily,
additional applications are developed, and users are added. By monitoring and
recording these statistics, you can measure how your system is performing
against the initial model. More importantly, you can track how quickly you are
using resources and determine the impact to the system when an increase in
system usage is planned.

We recommend monitoring for capacity weekly. Proper capacity monitoring
provides you with advance notice that additional server resources need to be
purchased. The initial model is an estimate of what is needed using the numbers
that you provided. If your estimated content count or size was too small, you
need to plan additional space to accommodate your real workload earlier than
the model predicted. The System Manager Dashboard utility can assist with
performance monitoring; see 11.2.2, “Dashboard” on page 276.
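As a simple illustration of how the recorded statistics can be used, the following
sketch computes the average growth between samples and projects how many
periods remain before a capacity limit is reached. All figures are hypothetical:

```java
// Illustrative capacity projection from weekly disk-usage samples.
// All figures are hypothetical; substitute your own recorded statistics.
public class CapacityTrend {

    // Average growth per sample period (for example, per week).
    public static double averageGrowth(double[] samples) {
        if (samples.length < 2) {
            return 0.0;
        }
        return (samples[samples.length - 1] - samples[0]) / (samples.length - 1);
    }

    // Periods remaining until the capacity limit is reached at the
    // current average growth rate.
    public static double periodsUntilFull(double[] samples, double capacity) {
        double growth = averageGrowth(samples);
        if (growth <= 0.0) {
            return Double.POSITIVE_INFINITY;
        }
        return (capacity - samples[samples.length - 1]) / growth;
    }

    public static void main(String[] args) {
        double[] usedGb = {400.0, 420.0, 445.0, 460.0}; // weekly samples, in GB
        System.out.println("Avg weekly growth (GB): " + averageGrowth(usedGb));
        System.out.println("Weeks until 1 TB: " + periodsUntilFull(usedGb, 1000.0));
    }
}
```

A linear projection of this kind is only a rough check against the Scout model;
revisit it whenever usage patterns change.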

11.7 IBM FileNet Enterprise Manager


The IBM FileNet Enterprise Manager (FEM) is a utility that is used to manage
your P8 Content Manager system. The IBM FileNet Enterprise Manager is a
Windows Microsoft Management Console snap-in tool. Many maintenance
functions use the IBM FileNet Enterprise Manager utility. Your system
administrators need access to the IBM FileNet Enterprise Manager utility. It is
used to initially configure object stores and content properties, assign content
security, and administer your P8 Content Manager system.

When P8 Content Manager is installed on a Windows server, IBM FileNet
Enterprise Manager is installed as part of the process. Alternatively, or for UNIX
installations, you can install IBM FileNet Enterprise Manager on system
administrator Windows workstations.



Best practice: Several functions performed within IBM FileNet Enterprise
Manager (or through APIs) can consume a lot of resources. Limit performing
these functions to non-peak hours to minimize user impacts.

These functions include:


򐂰 Metadata authoring, for example, creating or updating classes and
properties
򐂰 Administrative object updates, for example, creating new object stores,
IBM FileNet P8 Domain-level objects, or other Global Configuration
Database (GCD) objects.

11.7.1 Using IBM FileNet Enterprise Manager


When first running IBM FileNet Enterprise Manager, you need to configure at
least one P8 Content Manager system. Several P8 Content Manager systems
can be configured if you have multiple systems. After a P8 Content Manager
system is configured, select the desired system and click Connect. See
Figure 11-7.

Figure 11-7 FileNet Enterprise Manager initial configuration window

Figure 11-8 on page 291 shows the IBM FileNet Enterprise Manager main
window. IBM FileNet Enterprise Manager allows you to create object stores,
assign security to all P8 Content Manager items, define and run searches, and
turn trace logging on or off. All P8 Content Manager administrative functions are
performed via IBM FileNet Enterprise Manager.


Figure 11-8 Main IBM FileNet Enterprise Manager window

11.7.2 IBM FileNet Enterprise Manager: Setting system default values


You can set system default values via the IBM FileNet Enterprise Manager.
These defaults range from process time-out values to the number of specific
processes permitted to run on a server. You can also update most of these
values through application programming interfaces in your applications. You can
set most values at the various system levels that are listed in order of
precedence:
򐂰 Server
򐂰 Virtual server
򐂰 Initial site
򐂰 IBM FileNet P8 Enterprise domain



This architecture allows you to quickly and easily set system-wide default values
and fine-tune them down to the individual server level.

To set the properties, in IBM FileNet Enterprise Manager, right-click the
appropriate server level and select Properties. The Properties page presents a
tab view that displays current values that can be changed.

Figure 11-9 shows the Server Cache setup at the IBM FileNet P8 domain level
properties page. The Server Cache setup is also shown on the Properties page
for server instance, virtual server, and site objects.

Figure 11-9 IBM FileNet Enterprise Manager domain level Properties page

Figure 11-10 on page 293 shows a server level Properties page. Note that the
server level has fewer tabs; it includes only the values that pertain to the specific
server.

Figure 11-10 IBM FileNet Enterprise Manager server level Properties page

11.7.3 IBM FileNet Enterprise Manager: Enable trace logging


In 11.4.2, “Trace logs” on page 284, we discussed the Content Engine trace
logging capability. We show you how to enable and disable trace logging in this
section.

Figure 11-11 on page 294 shows the main IBM FileNet Enterprise Manager
window.



Figure 11-11 IBM FileNet Enterprise Manager trace logging

Trace logging can be enabled at the following levels:


򐂰 Individual server, selected in Figure 11-11.
򐂰 Virtual server, immediately above selected server in Figure 11-11.
򐂰 Site level, above the Virtual Server in Figure 11-11.
򐂰 Domain level, IBM FileNet Enterprise Manager [p8demodom] in Figure 11-11.

In our example, shown in Figure 11-11, it does not matter where we enable trace
logging, because we only have one server. P8 Content Manager systems can
have literally hundreds of servers. Turn on the minimum logging on the fewest
possible servers as necessary to investigate a problem.

Important information: When using trace logging, enable it for the fewest
possible number of servers. Depending on what logging is enabled and how
busy your system is, all of your servers can produce large logs if trace logging
is enabled at the site or domain level. Performance can also be impacted on
busy systems if all trace logging is enabled.

Double-click trace logging or right-click the virtual server or domain level and
select Properties to open the Properties page. The Properties page is shown in
Figure 11-12 on page 295.

Figure 11-12 IBM FileNet Enterprise Manager Properties page

In the Properties page, which is shown in Figure 11-12, you select the
subsystems to monitor and the level of detail to log. If the property page is from
the server or virtual server level, you must select the Override inherited settings
check box.

11.8 Auditing
P8 Content Manager provides audit logging to monitor event activity of objects.
Audit data is stored in a table in the object store’s database. Auditing is controlled
in the IBM FileNet Enterprise Manager.



Note: You can enable auditing for any type of update or access. Audit details
are stored in the Content Engine database, which can grow quite large. The
process of audit logging also adds to the server load and affects overall
performance. Because audit logging requirements usually come from a
business need, you might be required to enable audit logging.

Best practice: When you enable audit logging, try to enable it for the fewest
objects and for the shortest amount of time that you reasonably can.

You can obtain details by selecting ecm_help → FileNet P8 Administration →
Content Engine Administration → Auditing.

To enable auditing, open FileNet Enterprise Manager, right-click an object store,
select Properties, and then select the General tab. See Figure 11-13.

Figure 11-13 Enterprise Manager Object Store

On the General tab, select the Auditing Enabled? check box. See Figure 11-14
on page 297.

Figure 11-14 Object Store General Tab

Select a Content Access Recording Level, and then click OK. Content Access
Recording levels are:
򐂰 None: Specifies that updates to the DateContentLastAccessed property are
disabled (which is the default behavior). The value for this constant is -1.
򐂰 Immediate: Specifies that the DateContentLastAccessed property is updated
as soon as content is accessed. The value for this constant is 0.
򐂰 Hourly: Specifies that the DateContentLastAccessed property is updated
only when an hour (3600 seconds) has elapsed since the last update of the
DateContentLastAccessed property. Any access of content within an hour of
the last update is not recorded.
򐂰 Daily: Specifies that the DateContentLastAccessed property is updated only
when a day (86400 seconds) has elapsed since the last update of the
DateContentLastAccessed property. Any access of content within a day of
the last update is not recorded.
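The Hourly and Daily levels behave like a minimum interval between updates of
DateContentLastAccessed. The following sketch illustrates the semantics
described above (it is not product code); treating None as -1 and Immediate as 0
lets the same comparison cover all four levels:

```java
// Illustration of the Content Access Recording Level semantics described
// above. Times are in seconds. This is not part of the product API.
public class AccessRecording {

    public static final int NONE = -1;
    public static final int IMMEDIATE = 0;
    public static final int HOURLY = 3600;
    public static final int DAILY = 86400;

    // true if an access at 'now' should update DateContentLastAccessed,
    // given the time of the last recorded update and the level interval.
    public static boolean shouldRecord(long lastUpdate, long now, int level) {
        if (level == NONE) {
            return false;               // updates are disabled
        }
        return (now - lastUpdate) >= level; // IMMEDIATE records every access
    }
}
```

For example, with the Hourly level, an access 30 minutes after the last recorded
update is not recorded, while an access a full hour later is.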



11.9 Search and bulk operations
You might find it necessary to modify properties or delete multiple pieces of
content. In this section, we discuss using search functions and performing
modifications on a bulk set of content. This search and bulk operation capability
uses IBM FileNet Enterprise Manager.

The searches discussed in this section are IBM FileNet Enterprise Manager
searches for maintenance purposes only. The P8 Content Manager application
layer has similar stored search capability. The application searches are stored in
a different location. While similar, IBM FileNet Enterprise Manager searches
cannot be directly promoted for application use. Refer to 8.3, “P8 Content
Manager searches” on page 201 for an additional search discussion.

IBM FileNet Enterprise Manager provides a Query Builder to create, save, and
run search functions. Predefined search templates are provided with each P8
Content Manager installation. These templates are provided to assist you with
managing the size of your audit log and for managing entries in the QueueItem
table.

Searches can:
򐂰 Find objects using property values as search criteria
򐂰 Create, save, and run simple searches
򐂰 Create and save search templates that prompt for criteria when launched
򐂰 Create, save, and run SQL queries

Searches can perform bulk operations, such as the operations in the following
list, on content that meets the search criteria:
򐂰 Delete objects
򐂰 Add objects to an export manifest
򐂰 Undo document checkout
򐂰 Perform life cycle actions, such as set exception, clear exception, promote,
demote, and reset
򐂰 Perform containment actions, such as file into a folder and unfile from a folder
򐂰 Run VBScripts or JScripts
򐂰 Edit security permissions

11.9.1 Search using IBM FileNet Enterprise Manager
In this section, we demonstrate a simple search using IBM FileNet Enterprise
Manager. We build a search for all content created by Joe User on 11 September
2007.

Note: The example that we used is a typical maintenance function where
search is used. It is also a good example of a poor search. Ideal searches use
indexed values. Using indexed values provides quick results to your search.
Use indexed values whenever possible. When not possible, your search can
run for several hours or more, because each database record must be tested
to see if it meets the conditions of the search. This is known as doing a “full
table scan” in database terms. It can decrease your overall system
performance, because your database is quite busy while performing the full
table scan. When performing a non-indexed search, limit the operation to one
person at a time and preferably only when there is low system usage.

Launch IBM FileNet Enterprise Manager (refer to 11.7, “IBM FileNet Enterprise
Manager” on page 289). Expand an object store tree view and click Saved
Searches.

Figure 11-15 on page 300 shows the supplied search templates. Templates are
represented by a binocular icon with a shaded bottom and right edge.



Figure 11-15 Saved Searches window

To create a new search, right-click Search Results and click New. This starts
the Query Builder application.

Figure 11-16 on page 301 shows the Query Builder application.

Figure 11-16 Query Builder

Query Builder provides a point and click interface to create searches. Query
Builder allows you to perform bulk mode actions against the result set. You can
also use VBScripts or JScripts to build queries.

Details about Search options are available by selecting ecm_help → FileNet P8
Administration → Content Engine Administration → Search and Bulk
Operations.

In our example shown in Figure 11-16, we created a search for all items that are
created by Joe User on or after 11 September 2007.

When you have completed all search criteria, click OK to run the query. If the
search that you created might be needed again, you must select File → Save
before running the query. After clicking OK, the query runs and a results window
appears indicating the progress and when the query completes.

When the query completes, click OK in the Query Status window. Content that
matches your search appears in the IBM FileNet Enterprise Manager Search
Results window.

Figure 11-17 on page 302 shows the Search Results window.



Figure 11-17 Search Results

You can right-click the items to view their properties in order to validate that they
meet your criteria.

Figure 11-18 on page 303 shows the Content Properties window.

Figure 11-18 Content Properties

11.9.2 Bulk operations


In 11.9.1, “Search using IBM FileNet Enterprise Manager” on page 299, we
created a search that found all of the content that Joe User created on
11 September 2007. Next, we run the search to modify the content in bulk to add
AUserGroup with specific permissions granted to members of AUserGroup.

To perform bulk operations, your IBM FileNet Enterprise Manager logon ID must
have sufficient security privileges to perform the desired actions.

Important information: Searching and performing bulk operations are
extremely powerful operations. Before performing a Bulk Operation,
double-check your search criteria. If the Bulk Operation is to delete content
and you have sufficient privileges, it deletes everything that meets the search
criteria. Deleted content cannot be restored.



Follow these steps to run the search to modify the content in bulk to add
AUserGroup with specific permissions granted to members of AUserGroup:
1. Because we saved the search before running it, we can select Saved
Search, right-click, and open our search in Query Builder from there. If your
newly saved search does not appear, right-click in the results pane and click
Refresh.
When we open a Saved Search, it opens in Query Builder. We can modify the
search and save the modifications or simply run the query with the new
information.
2. To run the Query Builder, we opened the FindJoeUser search that we created
in 11.9.1, “Search using IBM FileNet Enterprise Manager” on page 299.
3. With our query open, we select the Security tab.
4. We then click Add to find AUserGroup.
5. We enter AUserGroup and click Find. We then select the name and click OK.
We select the Query Builder Security tab and add the desired permissions.
Figure 11-19 shows the Query Builder Security tab page with the desired
options selected.

Figure 11-19 Query Builder Security tab page with the selected modify options

6. When we click OK on the Query Builder Security tab page, our query status
returns, but this time, it indicates that it has successfully updated two items.
The same content items appear in the query results window. If we check the
properties, we now see that AUserGroup with our selected permissions has
been added to the content.
Figure 11-20 shows that the modified properties now include AUserGroup.

Figure 11-20 Modified content property

Figure 11-21 on page 306 shows additional bulk operations that can be
performed on a query.



Figure 11-21 Query Builder window Actions tab

11.10 Adding security


As your system grows, you might find it necessary to add users and groups to
create or access content. While content security can be added for specific users,
a best practice is to use security groups. Users can easily be added to a group
to gain the security roles that they need. Securing content to a specific user
requires ongoing maintenance, because user roles tend to change over time and
the affected content must be found and its security updated.

To update an object store with new users or groups, use IBM FileNet Enterprise
Manager’s Security Script Wizard to run the OSecurityUpdate.xml script.

Note: While you might find other ways to apply security, using the Security
Script Wizard is the only way to ensure that it is set correctly in the object store.
Failure to use the wizard can cause problems when users attempt to access or
create content.

To update an object store with new users or groups, follow these steps:
1. In the IBM FileNet Enterprise Manager, right-click the object store node,
choose All Tasks and run the Security Script Wizard.
2. When prompted to select an XML security script information file, browse to
and select OSecurityUpdate.xml. It is installed in the installation base
directory:
FileNet\ContentEngine\Scripts\Component Library\
3. When prompted to define security roles, you see two roles under Security
Role: Object Store Administrators and Object Store Users.
Use Add to add security participants for the selected role. The Select Users
and Groups dialog box opens. Click OK when you have added the
participants for that particular role. See Figure 11-22, which shows the
Security Script Wizard.

Figure 11-22 Security Script Wizard interface

4. Click Finish when you are done. The wizard generates a prompt informing
you where its log file will be located. The wizard proceeds to apply the
security permissions to the objects in the object store. This process can take
time, depending on the number of objects that need to be updated. The
wizard informs you when the process of applying security is complete.
5. If you added groups to only one Security Role, a notice appears. Simply click
OK, because no current Security Roles will be deleted; only the new roles will
be added by the wizard. See Figure 11-23 on page 308.

Chapter 11. System administration and maintenance 307


Figure 11-23 Security Wizard Notice

Examine the new permissions on IBM FileNet Enterprise Manager’s root folder.
Depending on how you have configured the inheritance from the root folder and
all generations of child folders, these new permissions might not yet have been
inherited. You need to configure the folder security parentage as appropriate.

You can read more detailed information by selecting ecm_help → FileNet P8
Administration → Content Engine Administration → Managing Security →
Security Script Wizard.

11.11 System backup and restore


In this section, we discuss P8 Content Manager backup and restore.

Chapter 9, “Business continuity” on page 213, discusses types of events that can
require a system restoration. It focuses on building a highly available
environment with protection against catastrophic system or site loss to ensure
that your system is always available. If you are responsible for system recovery,
familiarize yourself with business continuity methods whether your budget
permits a hot site or not. You might be able to use some business continuity
methods to reduce backup and restore times in your data center. If your budget
permits a hot site, you still need a backup and restore mechanism to recover
from human errors, such as deleted or modified files. A mirrored hot site mirrors
all activity; it has no means to distinguish an intentional change from an accidental one.

Best practice: Store your backup media off-site away from your primary
servers. You must make sure that the media is moved to the off-site location
as soon as possible after the backup completes.

The longer your backup media is stored near your primary servers, the greater
the chance that a catastrophic event can destroy both your servers and your
ability to restore your systems to operational condition.

P8 Content Manager does not provide backup software. You must use backup
utilities that are supplied with your operating system or database or by third
parties.

11.11.1 System components requiring backup


This is a list of system components requiring backup:
򐂰 Databases (all tables or table spaces and schema for your system)
򐂰 File storage areas if used or configured, including fixed content storage areas
򐂰 Content-Based-Retrieval files and indexes
򐂰 Server operating system, J2EE environment, and all P8 Content Manager
installed software
You can choose to omit the operating system and software backup. In the
event of a failure that requires a restore operation, this choice requires a
reinstallation of all components on a server, which increases the time
required to return the server to normal operations.
򐂰 Lightweight Directory Access Protocol (LDAP) security system
You must ensure that your user IDs' unique identifiers are maintained during a
restore. P8 Content Manager uses the unique identifier for security. Simply
recreating deleted users or groups will not work, because doing so typically
creates a new unique identifier.
򐂰 Any external systems with which your P8 Content Manager application
operates
Typical P8 Content Manager installations operate in concert with existing
applications. Examples are Customer Relationship Management systems,
database applications, and mainframe applications. Their data needs to be
backed up at the same time that your P8 Content Manager system is backed
up to ensure full data consistency.

Note: If your system uses Fixed File Storage areas for compliance or
Image Manager applications, you need a normal file storage area for
temporary staging of content. If your application performs content
reservations or uses annotations, that metadata is stored in the
“temporary” file store. This file storage area must be included in your
backup and recovery strategy.

11.11.2 Offline backup
An offline backup is the preferred method for P8 Content Manager. An offline
backup ensures that all application data is in a consistent state. When a restore
becomes necessary, all data must be recovered to the same point in time.

A backup window is the amount of time that your system can be down for
backup. If your system has users running from 6:00 A.M. to 11:00 P.M., you have
a seven-hour backup window. A best practice is to allot time before and after
users require the system to accommodate late workers or a backup that runs
longer than usual. We recommend allotting 30 minutes to one hour before and
after users expect the system to be operational. In this example, allotting one
hour before and after leaves a five-hour backup window in which to stop the
servers, perform the backup, and start the servers.
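The backup-window arithmetic in this example can be written down directly. This is a trivial sketch; hour values are on a 24-hour clock:

```python
def backup_window(users_start_hr, users_end_hr, buffer_hrs=1.0):
    """Hours available for backup between end-of-day and start-of-day use,
    minus a safety buffer on each side of the idle period."""
    idle = (24 - users_end_hr) + users_start_hr   # overnight idle time
    return idle - 2 * buffer_hrs

# Users on from 6:00 A.M. to 11:00 P.M. leaves a 7-hour idle period;
# a one-hour buffer on each side leaves 5 hours for the backup itself.
print(backup_window(6, 23))   # -> 5.0
```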

Typical installations store content in a file system, metadata in a database, work
items in a process database, and pointers to the content in external systems.
The amount of time that is necessary to back up the individual system
components can vary by minutes or hours.

The amount of time required for the longest component’s backup must fit within
your backup window. Your content storage area usually consumes the greatest
amount of backup time.

There are a few steps that you can take to decrease backup time to fit your
window:
򐂰 Use a combination of full and incremental backups. Incremental backups
simply capture information that has changed since the last backup. This can
greatly reduce time spent backing up data. During a restore, you must restore
from your last full backup and apply the incremental backups before starting
your system, which increases the amount of time necessary to restore your
system. A best practice is to perform full backups weekly when a larger
backup window is available and perform incremental backups during the
week when your backup window is smaller.
򐂰 If you use tape as your backup media, a faster alternative is to back up your
data to disk files. When the backup to disk completes, transfer the backup
files to tape, which allows your P8 Content Manager system to run while the
transfer to tape occurs.
򐂰 Section 11.11.3, “Online backup” on page 311 discusses potential methods to
run online backups. Those techniques can safely be used for offline backups.
Simply stop your P8 Content Manager servers, run the copy and restart your
system. This approach provides the fastest possible offline backup.
򐂰 If your backups cannot be completed within your backup window, you need to
look at the online backup methods discussed next.
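The full-plus-incremental idea in the first bullet can be sketched with a simple modification-time filter. This is illustrative only; real backup utilities track far more state than a single timestamp:

```python
import os, shutil, tempfile, time

def incremental_backup(src_root, dest_root, last_backup_time):
    """Copy files under src_root modified after last_backup_time (epoch seconds)."""
    copied = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) > last_backup_time:
                rel = os.path.relpath(src, src_root)
                dest = os.path.join(dest_root, rel)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.copy2(src, dest)      # copy2 preserves timestamps
                copied.append(rel)
    return copied

# Throwaway demonstration: one stale file, one fresh file.
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for name in ("old.doc", "new.doc"):
    with open(os.path.join(src, name), "w") as f:
        f.write("content")
cutoff = time.time() - 60
os.utime(os.path.join(src, "old.doc"), (cutoff - 60, cutoff - 60))
print(incremental_backup(src, dst, cutoff))   # -> ['new.doc']
```

Only the file changed after the cutoff is copied; a restore would replay the last full backup and then each incremental in order.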

11.11.3 Online backup
You need to investigate online backup alternatives if your system must run 24x7,
your backup time exceeds your backup window, or your Service Level
Agreements (SLAs) require a higher frequency than a nightly backup.

The problem with online backups is ensuring consistency in your backups. As we
mentioned in 11.11.2, “Offline backup” on page 310, backup times can vary
between different components. If your P8 Content Manager database backup
completes in 30 minutes but your file store backup runs three hours, it is quite
possible that, when a restore is performed, your database will not have
metadata pointers to all the files in your file store. The result is an inconsistent
system. P8 Content Manager provides a Consistency Check utility (refer to
11.11.5, “Consistency Check utility” on page 312) that you can use to find
inconsistent objects.

There are options on the market that can help resolve this situation. Disk,
volume, or storage area network (SAN) mirroring techniques are available that
permit time slice backups or snapshots of your data. These options typically work
similarly to the disk mirroring that has been used for many years. Where they
differ is that they mirror several disks or volumes in groups and permit adding
time slice details. Restoring involves copying the mirror back to the last good
time slice. Several techniques offer offline tape backup of the mirror and time
slice copies. Ideally, the utilities provide a means of capturing consistent time
slices across all disk drives and servers used by your application.

Section 9.4.1, “Disaster recovery concepts” on page 230 discusses methods that
use these techniques to copy your data to a remote facility. The same techniques
can provide copies in your primary data center. Most storage vendors offer local
and remote mirroring capabilities for this copying. It might be called a time slice,
snapshot, or flash backup capability. Most storage vendors also provide tape
backup solutions to move the data off-site.

Check to see if your database vendor has any special requirements for using
these techniques for system backups; most vendors do have special
requirements. Consider using an online database backup for additional safety.

Note: At the time of writing this book, the P8 Content Manager engineering
group has not tested or supported these techniques. Many clients are
currently using these techniques for online backups. The suggestions
provided here are for your reference.

If you need to perform online backups, you must perform due diligence and
validate that the techniques used can create a restorable backup data set.

11.11.4 System restore
There is no strictly required component order when a system restore becomes
necessary; however, your LDAP security and databases must be operational
before you start P8 Content Manager after a restore. Typically, you restore
information in this sequence:
򐂰 LDAP system
򐂰 Database server
򐂰 P8 Content Manager server operating system, J2EE environment, and all P8
Content Manager installed software
򐂰 File stores if used or configured
򐂰 Any external systems with which your P8 Content Manager application
operates

Your P8 Content Manager system needs to be down during the restore process.
If you used incremental backups, restore all incremental backups before starting
your P8 Content Manager system. After all restores have completed, start your
P8 Content Manager system normally and check for individual component errors.
Refer to 12.3.1, “Quick checks” on page 323 for system test procedures.

After a system restore, if your P8 Content Manager system uses file stores,
perform a consistency check using the Consistency Check utility.
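The restore sequence above can be driven by a small ordered loop that stops at the first failure, so a dependent component is never started against an unrestored one. This is a sketch with stubbed restore procedures; the callables stand in for your site's real restore steps:

```python
# Illustrative restore driver. The restore callables are placeholders.
RESTORE_SEQUENCE = [
    "LDAP system",
    "Database server",
    "Content Manager servers (OS, J2EE, installed software)",
    "File stores",
    "External systems",
]

def run_restore(restorers):
    """Run each component's restore in order; stop on the first failure so
    later components never start against an unrestored dependency."""
    completed = []
    for component in RESTORE_SEQUENCE:
        if not restorers[component]():
            return completed, component   # (done so far, failed step)
        completed.append(component)
    return completed, None

# Example where the database restore fails:
stubs = {c: (lambda: True) for c in RESTORE_SEQUENCE}
stubs["Database server"] = lambda: False
done, failed = run_restore(stubs)
print(failed)   # -> Database server
```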

11.11.5 Consistency Check utility


If your P8 Content Manager uses fixed or file storage areas, the IBM FileNet
Enterprise Manager provides a Consistency Check utility. You can perform a
consistency check after a major event, such as a system restore, server crash, or
power loss. The consistency check verifies that the files in the file store
correspond to metadata in the database. Your P8 Content Manager must be
running to perform a consistency check.

To run the Consistency Check utility, start IBM FileNet Enterprise Manager and
select an object store. A set of object store tasks displays in the right pane for the
selected object store. (See Figure 11-24 on page 313 in which an object store is
selected).

Figure 11-24 Enterprise Manager with object store selected

Double-click Content Consistency Checker. The Content Consistency Checker
Tool window appears (see Figure 11-25).

Figure 11-25 Content Consistency Checker Tool

Select the storage area that you want to check. Click Set Options to set the
appropriate options.

Click Start Consistency Check. The consistency check progress window shows
the status, start time, and approximate completion time, and indicates when the
check is complete.

Figure 11-26 shows a completion window.

Figure 11-26 Consistency check complete window

Best practice: Consistency checks can run for a very long time, depending on
the amount of content in your system. Limit the amount of time that the
consistency check covers: configure the check to start from a point a few hours
before the major event that requires its use.
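The comparison that a consistency check performs can be sketched generically: every file in the file store should match a metadata row, and vice versa. Use the product's Content Consistency Checker in practice; this only shows the underlying idea, with invented file names:

```python
def consistency_check(filestore_files, metadata_files):
    """Return (orphaned files, missing files) as sorted lists."""
    fs, md = set(filestore_files), set(metadata_files)
    orphans = sorted(fs - md)    # content with no metadata pointer
    missing = sorted(md - fs)    # metadata pointing at absent content
    return orphans, missing

orphans, missing = consistency_check(
    ["a.doc", "b.doc", "c.doc"],     # files present in the file store
    ["a.doc", "b.doc", "d.doc"],     # files the metadata expects
)
print(orphans, missing)   # -> ['c.doc'] ['d.doc']
```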

11.11.6 Application consistency check


If your application uses external systems, your application developers must
consider creating tools to validate the consistency between your P8 Content
Manager system and the external systems. There might be an event where you
need to restore your P8 Content Manager system and the external systems
cannot be restored to the same point in time. In those cases, you need a means
to validate that the content references in the external systems exist on the P8
Content Manager system.
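A minimal example of such a validation tool: given the document references an external system holds and the document identifiers the repository knows, report the references that no longer resolve. The system names and identifiers here are invented:

```python
def dangling_references(external_refs, repository_ids):
    """External references with no matching document in the repository."""
    repo = set(repository_ids)
    return [ref for ref in external_refs if ref not in repo]

# A CRM system holds three document references; after a restore, the
# repository only knows two of them, so one reference is dangling.
stale = dangling_references(
    ["DOC-001", "DOC-002", "DOC-003"],
    ["DOC-001", "DOC-003"],
)
print(stale)   # -> ['DOC-002']
```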

11.12 Task schedule


Table 11-2 lists the recommendations for the frequency of performing P8 Content
Manager system administration tasks.

Table 11-2 Task schedule recommendations

Task                          Frequency        Comments
Monitor system                Daily            Processes, performance, and logs
Back up system                Daily            Databases, file stores, and LDAP service
Log maintenance               Weekly           See Note 1
Check free space              Weekly           All file systems and databases
Check performance             Weekly           See Note 2
Check for latest Fix Packs    Monthly          See Note 3
Database maintenance          Periodically     Consult your database vendor for periodic
                                               maintenance functions to keep the database
                                               optimized. Ensure that you meet their
                                               recommendations.
Backup software               Monthly          Operating systems, J2EE server, and
                                               installed software
Apply patches                 Semi-annually    See Note 3
Test restore                  Annually         A full system restore must be performed at
                                               least once per year on DR hardware.
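Table 11-2 lends itself to a simple due-task computation. This is an illustrative sketch; the day counts chosen for the monthly and semi-annual intervals are approximations, and "Periodically" items are excluded because they are vendor-driven:

```python
from datetime import date, timedelta

FREQUENCY_DAYS = {
    "Monitor system": 1, "Back up system": 1,
    "Log maintenance": 7, "Check free space": 7, "Check performance": 7,
    "Check for latest Fix Packs": 30, "Backup software": 30,
    "Apply patches": 182, "Test restore": 365,
}

def due_tasks(last_run, today):
    """Tasks whose interval has elapsed since they last ran."""
    return sorted(t for t, days in FREQUENCY_DAYS.items()
                  if today - last_run[t] >= timedelta(days=days))

today = date(2008, 4, 1)
last = {t: today - timedelta(days=1) for t in FREQUENCY_DAYS}
last["Log maintenance"] = today - timedelta(days=10)   # overdue weekly task
print(due_tasks(last, today))
# -> ['Back up system', 'Log maintenance', 'Monitor system']
```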

Note 1: Log maintenance must include all operating system, application server,
and P8 Content Manager product error and trace log files. Log maintenance
must also include the Content Engine audit log and the Process Engine log
database tables, if used. All log files can grow quite large over time; on busy
systems, you might need to increase the maintenance frequency. Low use
systems might be able to reduce the frequency.
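The rename-then-delete cycle that Note 1 describes can be sketched as follows. The log file name is invented for illustration; adjust the retention window to your own policy:

```python
import os, re, tempfile
from datetime import date, timedelta

def rotate_and_prune(log_dir, log_name, today, keep_days=30):
    """Rename the active log with a date stamp, then delete stamped
    logs older than the retention window. Returns the deleted names."""
    active = os.path.join(log_dir, log_name)
    if os.path.exists(active):
        os.rename(active, active + "." + today.isoformat())
    removed = []
    pattern = re.compile(re.escape(log_name) + r"\.(\d{4}-\d{2}-\d{2})$")
    for entry in os.listdir(log_dir):
        m = pattern.match(entry)
        if m and date.fromisoformat(m.group(1)) < today - timedelta(days=keep_days):
            os.remove(os.path.join(log_dir, entry))
            removed.append(entry)
    return removed

# Demonstration in a throwaway directory (the log name is hypothetical):
d = tempfile.mkdtemp()
open(os.path.join(d, "p8_server_error.log"), "w").close()
open(os.path.join(d, "p8_server_error.log.2008-01-01"), "w").close()
removed = rotate_and_prune(d, "p8_server_error.log", date(2008, 4, 1))
print(removed)   # -> ['p8_server_error.log.2008-01-01']
```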

Note 2: 11.2.3, “System Manager performance archiver” on page 280 describes
how to archive performance logs. You can generate reports from the archived log
files. If you use IBM FileNet System Monitor, you can configure it to keep
archived performance data and generate reports as well.

Note 3: IBM FileNet Fix Packs are produced at regular intervals. Fix Packs, as
well as the latest documentation, are available at:
http://www.ibm.com

Select Support and downloads → Information Management → FileNet product.

Or use the following hot link:
http://www.ibm.com/software/data/content-management/filenet-content-manager/support.html

11.13 Best practice summary


Below is a summary of the best practices that we have discussed in this chapter:
򐂰 Run the archiver.jar to capture performance data during peak hours of
activity. For reference, go to 11.2.3, “System Manager performance archiver”
on page 280.
򐂰 Maintain message logs by renaming them and then deleting them after a
period of time. For reference, go to 11.4.4, “Message and trace log
maintenance” on page 286.
򐂰 Manage (clean up) audit and statistics logs weekly when used. For reference,
go to 11.4.5, “Audit and statistics logs” on page 286.
򐂰 Use native report queries only when necessary. For reference, go to 11.5.2,
“Queries using the schema of the database” on page 287.
򐂰 Limit metadata authoring (such as creation and update of class and
properties) and administrative object updates (such as GCD objects) to
non-peak hours. This applies to tasks performed within FileNet Enterprise
Manager and through APIs. For reference, go to 11.7, “IBM FileNet Enterprise
Manager” on page 289.
򐂰 Use the minimum auditing possible. For reference, go to 11.8, “Auditing” on
page 295.
򐂰 Use security groups to secure content. For reference, go to 11.10, “Adding
security” on page 306.

򐂰 Store your backup media off-site. For reference, go to 11.11, “System backup
and restore” on page 308.
򐂰 Allot free time before and after the backup. For reference, go to 11.11.2,
“Offline backup” on page 310.
򐂰 If using incremental backups, perform full backups weekly. For reference, go
to 11.11.2, “Offline backup” on page 310.
򐂰 When running the Consistency Checker utility, configure it to start checking a
few hours before the major event. For reference, go to 11.11.5, “Consistency
Check utility” on page 312.

Chapter 12. Troubleshooting


In this chapter, we discuss the methods that are used to troubleshoot IBM
FileNet Content Manager (P8 Content Manager) issues. P8 Content Manager
implementations range from a small departmental system running one
application on a single server to very large enterprise systems running many
applications on many servers.

We discuss the following topics:


򐂰 A typical P8 Content Manager system
򐂰 Problem isolation:
– Quick checks
– One or a few users report an issue
– Many users report an issue
– Performance troubleshooting
򐂰 Calling IBM for support
򐂰 Sample Java error log

© Copyright IBM Corp. 2008. All rights reserved. 319


12.1 Troubleshooting overview
The P8 Troubleshooting Guide contains specific errors and their corrective
actions. It is updated periodically. Always keep a current copy available when
you perform troubleshooting. You can obtain the document at:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/40x/P840_Troubleshooting.pdf

Ideally, you will have automated system monitoring in place. Automated
monitoring helps you quickly identify a major component problem:
򐂰 IBM FileNet System Monitor (see 11.3, “IBM FileNet System Monitor” on
page 282) allows you to quickly identify a major component failure, possibly
before your first user calls. It provides a knowledge base with potential
solutions when it finds a problem so corrective action can occur quickly.
򐂰 Dashboard (see 11.2.2, “Dashboard” on page 276) can help identify problem
areas. You will need to manually check logs and functionality, because the
Dashboard is meant primarily as a tool for gathering performance data.

Automated tools can greatly reduce troubleshooting time, because they can alert
you to a major component failure or problem, such as a disk or file system full.

IBM FileNet Enterprise Manager and Workplace are also tools (applications) that
help you identify problems. Having access to the user applications will be very
helpful in determining what parts of the application are working.

Note: When enabling trace logging for troubleshooting, only enable the
subsystems that are necessary to diagnose the issue. Unconditionally
enabling all levels of all subsystems will have a negative impact on
performance.

12.2 A typical P8 Content Manager system


Before we focus on problem isolation, let us review what a typical P8 Content
Manager system consists of and how it works. Figure 12-1 on page 321 shows a
basic P8 Content Manager system.

Figure 12-1 Pictorial view of a basic P8 Content Manager system (human
interaction through the Application Engine, the Content Engine, a Directory
Server, and a Database Server hosting the Object Store)

The basic P8 Content Manager application uses:


򐂰 Client Web browser
򐂰 J2EE application server
Our Content Engine and Application Engine are installed on application
servers.
򐂰 Lightweight Directory Access Protocol (LDAP) directory service for security
򐂰 Database for storing content and/or metadata properties (Object Store)

In the basic P8 Content Manager application, the user points their browser to the
Application Engine running on the Java 2 Platform, Enterprise Edition (J2EE)
application server:
1. The user receives a logon window and enters the user ID and password.
2. The Application Engine passes the user details to the LDAP server. If correct,
the user’s credentials are obtained by the Application Engine.

3. The Application Engine passes the credentials to the Content Engine and
logs the user on. In certain applications, the authentication can pass through
the Content Engine for LDAP credentials.
4. The Content Engine passes information about the system back to the
Application Engine.
5. The Application Engine displays P8 Content Manager information in the
user’s browser.

At this point, the user can view or create new content. The benefit of logging on
this way is that as your system grows you can quickly and easily increase power
by scaling vertically (adding more power to your server) or horizontally (adding
more servers). This approach allows your P8 Content Manager system to grow
and support hundreds of applications with thousands of users working on an
enormous amount of content. The N-Tier J2EE architecture (server-client) is
what allows us to scale from small systems to very large enterprise systems with
minimal effort.

You can simplify the logon sequence by looking at it from a client/server
perspective. The components used are essentially several client and server
components working together:
򐂰 The users’ Web browser is a client to the Application Engine.
򐂰 The Application Engine is a client to both the LDAP and Content Engine.
򐂰 The Content Engine is a client to the database.
򐂰 The J2EE application server is basically the server on which the Application
Engine and Content Engine run.

This might be oversimplified, but as you approach a problem and think about it in
client/server terms, finding the failing client/server section allows you to quickly
rule out what is working and focus on the component that is not working.

We will look at common problems and break them down with a client/server
style approach in the following sections.

12.3 Problem isolation


IBM FileNet System Monitor provides a means to quickly identify system
problems. We discussed it in 11.3, “IBM FileNet System Monitor” on page 282.
It provides a Web interface to quickly identify failing components and servers.

IBM FileNet Enterprise Manager is also a great tool for problem isolation, and it is
similar to Workplace or Workplace XT. In this section, we offer tips to assist with
the problem isolation process.

12.3.1 Quick checks


These quick checks start at the P8 Content Manager’s foundation and work
toward the application level:
1. Start the Dashboard utility. If your system is running, the Dashboard will
connect to the Listeners. If one or more fail to connect, start looking at the
servers that are failing to connect. (We discussed the Dashboard in 11.2.2,
“Dashboard” on page 276.)
2. Use server Ping pages to see if the Content Engine (CE) or Process Engine
(PE) servers are running. The Ping page indicates whether a server is
running and shows details about the software versions that might be needed
when opening a problem management record (PMR) with IBM support:
a. Ping the server to see if it is running by pointing your browser to:
http://<CE host machine>:<port number>/FileNet/Engine
For example:
http://hqdemo1:9080/FileNet/Engine
If your Content Engine is running, you will see a window similar to
Figure 12-2 on page 324.

Figure 12-2 Content Engine ping page

b. To check if the Process Engine server is running, use its ping page by
pointing your browser to:
http://<PE host machine>:32776/IOR/ping
Example:
http://hqdemo1:32776/IOR/ping
Your browser will display a window similar to Figure 12-3.

Figure 12-3 Process Engine ping page

3. Log on to FileNet Enterprise Manager. If you are able to log on and view
configuration details, the Content Engine is running.
4. Log on to Workplace. Test your system with it. Depending on your
applications, browse folders and view configuration details. If Workplace
works fine, your Application Engine is running.
If the Process Engine is installed, Workplace can run the Process
Configuration Console to determine if your Process Engine is running. This
assumes your logon ID is a member of the Process Engine Administrators
group.
5. If you have access to your users’ application, run it from your workstation.
Remember, you probably have administrative privileges. If you can, test with
user privileges to quickly validate if there are invalid security settings.

If these tests work, your primary P8 Content Manager server components are
functioning. If one of these tests fail, you have a good place to start
troubleshooting.
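The ping-page checks in step 2 are easy to script for automated monitoring. A minimal sketch follows; the host name and port are the example values from above, and `OSError` covers refused, unreachable, and timed-out connections:

```python
from urllib.request import urlopen

def ping_url(host, port, path="/FileNet/Engine"):
    return "http://%s:%d%s" % (host, port, path)

def engine_is_up(host, port, path="/FileNet/Engine", timeout=5):
    """True when the ping page answers with HTTP 200, False otherwise."""
    try:
        with urlopen(ping_url(host, port, path), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:          # refused, unreachable, or timed out
        return False

print(ping_url("hqdemo1", 9080))   # -> http://hqdemo1:9080/FileNet/Engine
```

The same function works for the Process Engine check by passing `path="/IOR/ping"` and the Process Engine port.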

Recommendation: Familiarize yourself with these tests while the system is
functioning correctly.

Note: On farmed systems, it takes time for changes to propagate to all
Content Engine servers, which can cause a reported problem that seems to
go away. In particular:
򐂰 Allow up to two minutes for metadata authoring, which is the creation of
classes or properties, to propagate to all caches.
򐂰 Allow for domain-level administrative object updates to propagate across
the farm based on the GCDCacheTTL value (the default is 30 seconds).
Refer to 11.7.2, “IBM FileNet Enterprise Manager: Setting system default
values” on page 291 for details.

In the next sections, we look at common problems that users report, and we
diagnose the problems in a client/server fashion.

12.3.2 One or a few users report an issue


Perform the tests in 12.3.1, “Quick checks” on page 323. The tests help you
quickly identify whether this is the first call of many calls to follow or whether most
users are able to access the system.

If a user does not see the logon window of the application, look for problems on
the user’s workstation:
򐂰 Is the user using the correct syntax in their browser?
http://<Server>:<Port Number>/<Application>
򐂰 Can the user access other Web servers?
򐂰 Do the other network applications work:
– A problem with other network or Web servers points to network or browser
configuration problems.
– If the network is okay, but Web servers are not working, there can be
problems with the user's local Java Runtime Environment (JRE™)
configuration or how it is configured in their browser.

If a user gets the logon window but cannot log on to the application:
򐂰 Verify that the user’s credentials on the LDAP system are correct:
– Is the user’s ID locked?
– Does the user have the correct group memberships for the system that the
user is attempting to access?
– Was the Content Engine security recently added or changed? Did you use
the Security Script Wizard (see 11.10, “Adding security” on page 306)?

If the logon appears to work, but the application does not appear or does not
work correctly:
򐂰 Did the user receive any new applications or system patches?
A new non-related application might have updated files on the operating
system, loaded a different Java runtime version, or altered system settings.
򐂰 Is the user accessing a different part of the application than the parts that
work?
򐂰 Is the user using a piece of the application that requires a server or external
system that other parts do not?
򐂰 Are any special permissions required to access this portion of the
application?
򐂰 Were there any recent changes to the application that might impact only this
portion?
򐂰 Was the Content Engine security recently added or changed? Did you use the
Security Script Wizard (see 11.10, “Adding security” on page 306)?

12.3.3 Many users report an issue
We assume that the checks in 12.3.1, “Quick checks” on page 323 have failed,
and many users are reporting issues.

Are all of the Content Engines running:


򐂰 Is the J2EE server running or reporting errors? See 11.4.1, “Message logs”
on page 283.
򐂰 Are the Content Engine applications running or reporting errors on the J2EE
server?
򐂰 Are there any full file systems or Windows disk drives?
򐂰 Was the Content Engine security recently added or changed? Did you use the
Security Script Wizard (see 11.10, “Adding security” on page 306)?
򐂰 Is the database running:
– Can the Content Engine connect to it?
– Is the database reporting errors?
򐂰 Can the Content Engine connect to the LDAP service?

Are all of the Application Engines running:


򐂰 If using load balancers, try browsing directly to your AEs.
򐂰 Are all of the external servers, which your application needs, running? Can
the Application Engine connect to them?

12.3.4 Performance troubleshooting


The P8 Performance Tuning Guide contains specific information about tuning
the P8 Content Manager components. To view the guide, go to:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/40x/p8_400_performance_tuning.pdf

Performance troubleshooting tips:


򐂰 The Dashboard (see 11.2.2, “Dashboard” on page 276) is extremely useful
for determining whether performance problems are at the database, Content
Engine, or application level. Familiarize yourself with the Dashboard’s
counters when things are running well so that you can quickly find items that
perform poorly when investigating performance problems.
򐂰 Has user activity changed since the system was running correctly:
– Have user counts or workload increased?

– Have any users altered their work pattern?
If one user entered work throughout an eight hour period and the business
unit decided eight people doing the work in one hour is a better way to
conduct their business, performance might be impacted during that hour.
򐂰 Database considerations:
– Your database requires routine maintenance to keep it optimized. Consult
with your database vendor and ensure that their routine maintenance
practices are performed.
– Application design has a major impact on system performance. Proper
use of database indexes is a requirement; see 7.3, “Principles for
application design” on page 160 for details.
– You might need to enable tracing in the Content Engine to capture SQL
syntax; see 11.7.3, “IBM FileNet Enterprise Manager: Enable trace
logging” on page 293 for details. SQL tracing can also be enabled at the
database level, so, consult with your database administrator for that.
򐂰 Server considerations:
– J2EE servers routinely perform garbage collection to reclaim unused objects
from memory and to manage the memory available for applications. The IBM
FileNet Performance Tuning Guide discusses garbage collection and
details memory configurations in the Java Virtual Machine (JVM) tuning
chapter.
– Ensure that the server has sufficient physical memory for the applications.
Servers use paging space or virtual memory, which effectively increases
the memory using disk space. Virtual memory is much slower than
physical memory and can cause performance problems if it is used by
your application. Consider a server with 4 GB of RAM and 4 GB virtual
memory. You are running a 2 GB JVM application on it and decide a
second JVM is needed. Your operating system also requires space in memory,
so adding a second 2 GB JVM will force the operating system to use
virtual memory and swap portions of itself and applications to disk. It
seems as though there is sufficient memory to do this with 8 GB total
memory. This example might work fine in a test environment but can
devastate performance on a production system when paging occurs.
Insufficient physical memory can cause “swap death” where the server
spends more time moving data between physical and virtual memory than
it spends actually running applications.
– When the applications frequently consume 60% of a server’s CPU time,
you need to plan additional server capacity. While 60% utilization might
sound low, a small change in workload can quickly increase utilization and
cause performance problems.

328 IBM FileNet Content Manager Implementation Best Practices and Recommendations
– Administrative functions (performed within IBM FileNet Enterprise
Manager and also through APIs) must be performed during non-peak
hours, because they can consume a lot of CPU resources:
• Metadata authoring, for example, creating and updating classes and
properties.
• Administrative object updates, for example, creating new object stores,
IBM FileNet P8 Domain-level objects, or other Global Configuration
Database (GCD) objects.
򐂰 Application considerations:
– Applications that use inefficient queries or folder schemes can cause
performance problems. We describe application design considerations in
other chapters of this book. We also have two technical notices on the
Web site that discuss ways to efficiently use queries and folders:
• The FileNet Content Engine Query Performance Optimization
Guidelines Technical Notice describes query details.
• The FileNet P8 Recommendations for Handling Large Numbers of
Folders and Objects Technical Notice describes folder use.
You can read the Technical Notices at:
http://www.ibm.com
Select Support & downloads → Documentation → Choose support
type → Information Management → Choose a product → FileNet
Content Manager → Go → Learn → Product documentation →
FileNet P8 Platform.
Or use this hot link:
http://www.ibm.com/support/docview.wss?rs=3273&uid=swg27010422
The technical notices are located in the section FileNet P8 Platform
Technical Notices.
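The virtual memory example in the server considerations above can be reduced to a small headroom calculation. This is a sketch with illustrative numbers only; real sizing must also account for native JVM overhead, thread stacks, and the operating system file cache:

```python
# Rough JVM memory-headroom check (illustrative numbers only; real sizing
# must also account for native JVM overhead, thread stacks, and file cache).
def physical_headroom_gb(ram_gb, os_reserve_gb, jvm_heaps_gb):
    """Physical memory left after the OS reserve and all JVM heaps."""
    return ram_gb - os_reserve_gb - sum(jvm_heaps_gb)

# One 2 GB JVM on a 4 GB server, reserving about 1 GB for the OS:
print(physical_headroom_gb(4, 1, [2]))      # 1 -> fits in physical memory

# Adding a second 2 GB JVM overcommits physical memory, forcing paging:
print(physical_headroom_gb(4, 1, [2, 2]))   # -1 -> paging ("swap death" risk)
```

A negative result means the virtual memory shown by the operating system is covering the shortfall, which is exactly the paging scenario the example warns against.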

12.4 Calling IBM for support


If you are having difficulty correcting a problem yourself, you can open a Problem
Management Record (PMR) with IBM software support. You can open PMRs by
calling IBM support directly or via the Web. To open a PMR, you will need your
IBM Customer Number (ICN).

Chapter 12. Troubleshooting 329


12.4.1 The IBM Software Support Handbook
The IBM Software Support Handbook, G221-9002, details all facets of IBM
software support. It contains details on self-help, creating a PMR, and escalation
procedures. You can obtain the IBM Software Support Handbook at the following
link:
http://www14.software.ibm.com/webapp/set2/sas/f/handbook/home.html

12.4.2 Open a PMR by calling IBM


IBM software support has local numbers to call in most countries. Local numbers
are at:
http://www.ibm.com

Select Support & downloads → More → Find resources → Support phone
numbers/contacts

Or use this hot link:


http://www.ibm.com/planetwide

In the US, call IBM software support at 1-800-IBM-SERV or 1-800-426-7378 and
select option 2. Ask the dispatcher to open a PMR on your behalf and connect
you with a support specialist.

12.4.3 Open a PMR via the Web


You can open a PMR via the Web through the Electronic Service Request (ESR)
tool, which is available at:
http://www.ibm.com

Select Support & downloads → Open a service request → Send request to
choose your country → Select type and submit Software → Select product
and submit → FileNet → Submit

Or use this hot link:


http://www.ibm.com/software/support/probsub.html

Then, select ESR.

To submit PMRs via the Web site, your Site Technical Contact at your company
must authorize you to submit PMRs electronically to IBM.

Note: Only use electronic PMRs for minor problems. If your production system
is down, call IBM Support.

12.4.4 Items to have available when contacting IBM Software Support


This list summarizes the items that you must be aware of when opening a
PMR. The items are described in the IBM Software Support Handbook.

When calling or submitting a problem to IBM software support about a particular
service request or problem, have the following information ready:
򐂰 Your IBM customer number
򐂰 Your company name
򐂰 Your contact name:
– Preferred means of contact (voice or e-mail)
– Telephone number where you can be reached if request is voice
򐂰 Machine type and model number:
– Related product and version information
– Related operating system and database information
򐂰 Detailed description of the issue
Being able to articulate the problem and symptoms before contacting
software support will expedite the problem solving process. It is extremely
important that you are as specific as possible in explaining a problem or
question to our software specialists. Our specialists want to be sure that they
provide you with exactly the right solution; therefore, the better they
understand your specific problem scenario, the better they are able to resolve
it.

Gather background information


To effectively and efficiently solve a problem, the software specialist needs to
have all of the relevant information about the problem. Being able to answer the
following questions will help in the efforts to resolve your software problem:
򐂰 Has the problem happened before, or is this an isolated problem?
򐂰 What steps led to the failure?
򐂰 Can the problem be recreated? If so, what steps are required?
򐂰 Have any changes been made to the system, such as hardware, network, or
software?



򐂰 Were any messages or other diagnostic information produced? If yes, what
were they?
It is often helpful to have a printout of the message numbers of any messages
received when you place the call for support.
Define your technical question in specific terms and provide the version and
release level of the products in question.
Gather relevant diagnostic information, if possible. It is often necessary that
the software support specialists analyze specific diagnostic information, such
as storage dumps and traces, in order to resolve your problem. Gathering this
information is often the most critical step in resolving your problem.

On more difficult problems, you might also need to have the following items:
򐂰 Application architecture diagram that details how all application components
are designed to work
򐂰 Network topology diagram, including servers, routers, firewalls, and network
load balancers
򐂰 If your problem is performance-related, performance archive files

Determine the business impact


You need to assign a severity level to the problem when you report it, so you
need to understand the business impact of the problem that you are reporting. A
description of the severity levels is in Table 12-1 on page 333.

Table 12-1 Problem severity descriptions and examples

Severity 1 (Critical situation/system down): A business critical software
component is inoperable. As a rule, this severity applies to the production
environment.
Example: The P8 Content Manager system is down and affecting all users.

Severity 2 (Severe impact): A software component is severely restricted in its
use, causing significant business impact.
Example: The P8 Content Manager system cannot be accessed by one
department. Other users are able to access the system.

Severity 3 (Moderate impact): A non-critical software component is
malfunctioning, causing moderate business impact.
Example: A client cannot connect to a server.

Severity 4 (Minimal impact): A non-critical software component is
malfunctioning, causing minimal impact, or a non-technical request is made.
Example: Documentation is incorrect. Additional documentation requested.

When speaking with a software support specialist, also mention the following
items if they apply to your situation:
򐂰 You are under business deadline pressure.
򐂰 Your availability, or when you will be able to work with IBM Software Support.
򐂰 You can be reached at more than one phone number.
򐂰 You can designate a knowledgeable alternate contact with whom the IBM
support representative can speak.
򐂰 You have other open problems (PMRs) with IBM regarding this service
request.
򐂰 You are participating in an early support program.
򐂰 You have researched this situation prior to calling IBM and have detailed
information or documentation to provide for the problem.



12.5 Sample Java error log
This is a sample Java log file. It is from the p8_server_error.log discussed in
11.4.1, “Message logs” on page 283. In it, we performed a normal P8 Content
Manager start. We then forced errors by stopping the database and attempting to
use it. We truncated certain information, which we denote by an ellipsis (...) or an
arrow (→).

Example 12-1 shows the normal start messages.

Example 12-1 Normal start messages


2007-09-17 10:39:58,750 INFO [server.startup : 3] - Server startup
completed
2007-09-17 10:39:59,828 INFO [TaskManager$RootTask_RootTask_#1] -
TTLStreamReaper task has started.
2007-09-17 10:39:59,843 INFO [QueueItemDispatcher_EVTFS_#4] - Starting
queue dispatching for EVTFS Queue:
com.filenet.engine.queueitem.QueueItemDispatcher
2007-09-17 10:39:59,843 INFO [ContentQueueDispatcher_EVTFS_#5] -
ContentQueueDispatcher starting :
{41870319-0F37-45FC-96E1-2B51CBF89410} EVTFS
2007-09-17 10:39:59,843 INFO [PublishRequestDispatcher_EVTFS_#7] -
Starting queue dispatching for EVTFS Queue:
com.filenet.engine.publish.PublishRequestDispatcher
2007-09-17 10:39:59,906 INFO [ContentQueueDispatcher_EVTFS_#5] -
ContentQueueDispatcher [EVTFS] Session
Id={D140A90E-39CA-4C8E-82AD-AFDE82080708}
2007-09-17 10:40:00,421 INFO [QueueItemDispatcher_EVTFS_#4] - EVTFS
({41870319-0F37-45FC-96E1-2B51CBF89410}) is enabled
2007-09-17 10:40:00,500 INFO [CBRDispatcher_EVTFS_#8] - Starting CBR
queue dispatching for EVTFS
2007-09-17 10:40:00,500 INFO [CBRDispatcher_EVTFS_#8] - EVTFS
({41870319-0F37-45FC-96E1-2B51CBF89410}) is enabled
2007-09-17 10:40:02,000 INFO [PublishRequestDispatcher_EVTFS_#7] -
EVTFS ({41870319-0F37-45FC-96E1-2B51CBF89410}) is enabled

After we force errors by stopping the database and then try to use it, we get
the error messages shown in Example 12-2.

Example 12-2 Error messages


2007-09-17 10:51:40,859 ERROR [WebContainer : 1] - method name:
throwEngineException principal name: Administrator Global Transaction:
false User Transaction: false Exception Info:

com.filenet.api.exception.EngineRuntimeException: DB_ERROR: An error
occurred accessing the database. ErrorCode: 0, Message: 'Connection
reset by peer: socket write error'

...


com.filenet.engine.dbpersist.DBMSSQLContext.throwEngineException(DBMSSQ
LContext.java:186)
...
Caused by: com.ibm.websphere.ce.cm.StaleConnectionException: Connection
reset by peer: socket write error
→ at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
...
2007-09-17 10:52:01,203 INFO [ImportAgentDispatcher_EVTFS_#6] -
Retrying Connection to:FNGCDDS Caused by:The TCP/IP connection to the
host has failed. java.net.ConnectException: Connection refused:
connectDSRA0010E: SQL State = 08S01, Error Code = 0DSRA0010E: SQL State
= 08S01, Error Code = 0
2007-09-17 10:52:01,437 INFO [CacheUpdateDispatcher_EVTFS_#3] -
Retrying Connection to:FNGCDDS Caused by:The TCP/IP connection to the
host has failed. java.net.ConnectException: Connection refused:
connectDSRA0010E: SQL State = 08S01, Error Code = 0DSRA0010E: SQL State
= 08S01, Error Code = 0

The exception log for this error contains over 200 lines. The logs can get
extremely large during normal operation. The log clearly indicates that the
Content Engine cannot communicate with the database (see the text highlighted
in bold in Example 12-2 on page 334). After starting the database, everything
returned to normal operation.
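When a p8_server_error.log grows too large to scan by hand, a small filter script can extract just the ERROR entries. The following is a sketch written against the timestamp/level/thread layout shown in Example 12-1 and Example 12-2; the exact format of real log lines can vary by release and configuration:

```python
import re

# Matches lines such as:
# 2007-09-17 10:51:40,859 ERROR [WebContainer : 1] - method name: ...
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>INFO|WARN|ERROR) \[(?P<thread>[^\]]+)\] - (?P<msg>.*)$"
)

def errors(log_lines):
    """Yield (timestamp, thread, message) for each ERROR entry."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            yield m.group("ts"), m.group("thread"), m.group("msg")

sample = [
    "2007-09-17 10:39:58,750 INFO [server.startup : 3] - Server startup",
    "2007-09-17 10:51:40,859 ERROR [WebContainer : 1] - method name: "
    "throwEngineException",
]
for ts, thread, msg in errors(sample):
    print(ts, thread, msg)   # prints only the ERROR entry
```

Because exception stack traces continue on unprefixed lines, a production version of this filter would also collect the continuation lines that follow each matched ERROR line.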


Chapter 13. Solution building blocks


In this chapter, we provide product descriptions, function matrices, and visual
tools to aid designers in preparing Enterprise Content Management (ECM)
solutions. The material describes available options for the input, storage,
process, and presentation phases of ECM solutions.

We introduce these ECM design aids:


򐂰 Solution building blocks:
– Content ingestion tools
– Storage design visual aid
– Content and business process options
– Presentation features
򐂰 Detailed function references
򐂰 Four sample use cases

© Copyright IBM Corp. 2008. All rights reserved. 337


13.1 Solution building blocks
At a fundamental level, any ECM solution is composed of four major solution
components:
򐂰 Content ingestion
򐂰 Storage
򐂰 Content and workflow management
򐂰 Presentation and delivery

All ECM solutions are essentially a construction of information input, storage,
information processing, and presentation and delivery. Figure 13-1 illustrates the
major ECM solution components.

[Figure: four building blocks in sequence: Content Ingestion, Repository Design,
Content and Workflow, Presentation and Delivery]

Figure 13-1 Major ECM solution components

These major components form the building blocks of ECM solution design.
Solution building blocks are application tools that ECM solution designers can
specify and combine to build out each of the four components of an ECM
solution: ingestion, storage, process, and presentation. The IBM FileNet suite of
products contains applications and tools that offer designers a wide range of
features and functions for the design of each of the major components of an ECM
implementation.

Figure 13-2 on page 339 shows several of the IBM tools that are available to
ECM designers and the place for these blocks within the four major design
phases of an ECM solution.

[Figure: the major ECM phases with representative functions. Content Ingestion:
paper scanning, fax, email, applications, FTP, monitored filesystem, SMTP.
Content and Workflow Management: index and validate templates, add document,
bind documents together, entry templates, checkin/checkout, subscriptions,
workflow definitions, search. Presentation and Delivery: searching, publishing,
browsing, printing, display, workflows/EAI, send.]

Figure 13-2 ECM input and output diagram

13.1.1 Content ingestion tools


IBM FileNet offers several content ingestion (capture) applications, each of which
is designed for capturing a particular type of media. Refer to Table 13-1 on
page 340. The capture applications listed in this section can handle the following
media types:
򐂰 Paper documents
򐂰 Faxes
򐂰 Office (electronic) documents, presentations, or spreadsheets
򐂰 PDF, Web, .txt files, or multimedia files
򐂰 E-mail messages
򐂰 Documents stored on network drives or Desktops
򐂰 Documents stored in remote repositories, such as Domino or Documentum
򐂰 Documents stored in IBM FileNet Image Services



Ingestion tools
Table 13-1 Tools for content ingestion
Paper documents (Capture Pro): Supports high speed scanning of paper
documents or forms; supports barcoding; and offers batch operations for
efficient review and data entry functions. Capture will also import files stored on
file systems.

Electronic documents, manual capture (Workplace; Office Integration):
Workplace allows users to manually capture any type of electronic document.
Office Integration supports the Microsoft Office suite: users can manually add
documents from the file menus of these applications.

Electronic documents, automatic capture (Records Crawler; Capture Pro): Both
of these tools will monitor file share or desktop C: drive locations. Documents
that are stored in or are copied to the monitored locations are automatically
copied or moved to the P8 Content Manager repository. Folder structures,
security settings, and file properties are also captured and added to the
repository along with the documents.

Electronic documents, automatic capture (Content Federated Services): Content
Federated Services (CFS) is capable of managing documents “in place”. CFS
allows users to add documents to the P8 Content Manager (CM) repository, but
leave the original in place on file systems, Image Services, or selected remote
(non-IBM) repositories. When the original documents are updated, changes are
reflected as new versions in the P8 Content Manager repository.

E-mail messages (Email Manager): Email Manager monitors either Microsoft
Exchange, Lotus Notes, or GroupWise mail servers. Rules are configured to
capture any message that meets rule criteria. The captured messages are
added to the P8 Content Manager repository, and links are left behind that allow
users to open the captured messages in their usual mail application.

SAP® outbound archiving (SAP Integration): SAP Integration allows ingestion of
documents from SAP. SAP documents are created in R/3 application modules,
such as SD, FI, HR, or MM, through SAP scripts, and are imported to P8 Content
Manager through SAP Integration modules.



13.1.2 Storage design visual aid
After content has been captured, it is stored in the P8 Content Manager
repository. Efficient content storage is a major challenge for ECM designers.
After it has been ingested, content must be classified by type, identified by
metadata, and assigned security protection.

The following storage design visual aid will help designers build a repository
storage scheme that will store content with the correct class, with appropriate
metadata properties, on the right storage device, and with the proper security
settings.

As covered in Chapter 5, “Basic repository design” on page 85, repository
content is stored by class. In fact, document class definitions form the foundation
of P8 Content Manager storage design. Figure 13-3 on page 343 shows how to
design classes by hierarchy. Start the design process by listing each type of
document that will be stored in your repository. Beside each type of document
(document class), list any special storage or security requirements for that type
of document.

Class hierarchy
In an IBM FileNet P8 repository, content is stored according to class. As
described in Chapter 5, “Basic repository design” on page 85, document classes
are object-oriented containers that hold content, properties that describe the
content, security descriptors, and pointers to storage areas on disk where the
content will be written.

In an IBM FileNet P8 repository, document classes are constructed in a
hierarchy. The root document class contains a set of system properties that will
be inherited by any child classes that are created beneath the root. When you
design the document classes of your repository, list the classes in a spreadsheet
like the one shown in Figure 13-3 on page 343. Each level of the hierarchy has a
particular purpose:
Root Class The root document class contains a set of system
properties that will be shared by all document classes. At
this level, you can also assign a default storage policy that
will apply to all child classes unless specifically set
otherwise.
Base Class The next level of the class hierarchy, the base class, lists a
set of custom properties that will be shared by all
documents in this organizational scheme. These custom
properties are client-configurable and need to describe
unique identifiers and navigation strings that will help
users filter search criteria. For example, users must be

able to construct a search for any document in the
Finance division, in the Accounting department, and in the
Accounts Receivable group.
Document Class The document class level describes individual document
types. The properties at this level describe properties
unique to individual document types. A “loan” document,
for example, will have a loan number, while a “contract”
document will have a contract number and a contract
date.

A complex repository can have more than three levels of class hierarchy, but
regardless of the number of levels, the spreadsheet arranges the classes in a
hierarchy that makes the inheritance pattern clear. Figure 13-3 shows the class
hierarchy spreadsheet.

Document Class (root)
Properties: Document Title, Creator, Date Created, Date Last Modified, Owner
Storage policy: Default Storage Device; Security policy: Default Security Policy

BaseDocumentClass
Properties: Document ID, Division, Department, Group

Contract Class
Properties: Contract Number, Contract Date, Expiration Date, Customer ID,
Vendor Number
Security policy: Legal_Contract

Loan Class
Properties: Loan Number, Customer ID, Loan Effective Date, Loan Mature Date,
Loan Status
Storage policy: Finance_Storage; Security policy: Finance_Paper

Employee Information Class
Properties: Employee Number, Employee Name, Employee Status, Hire Date,
Termination Date
Storage policy: HR_Storage; Security policy: HR_Confidential

Figure 13-3 Class hierarchy spreadsheet
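The inheritance pattern in the spreadsheet can be sketched as a small model in which a child class falls back to its parent for any storage or security policy it does not set itself. The class and policy names are taken from Figure 13-3; the model is illustrative only and is not the P8 API:

```python
class DocClass:
    """Minimal model of policy inheritance down a document class hierarchy."""
    def __init__(self, name, parent=None, storage=None, security=None):
        self.name, self.parent = name, parent
        self._storage, self._security = storage, security

    def storage_policy(self):
        # Walk up the hierarchy until a class defines a storage policy.
        if self._storage:
            return self._storage
        return self.parent.storage_policy() if self.parent else None

    def security_policy(self):
        if self._security:
            return self._security
        return self.parent.security_policy() if self.parent else None

root = DocClass("Document Class", storage="Default Storage",
                security="Default Security Policy")
base = DocClass("BaseDocumentClass", parent=root)
contract = DocClass("Contract Class", parent=base, security="Legal_Contract")
loan = DocClass("Loan Class", parent=base,
                storage="Finance_Storage", security="Finance_Paper")

print(contract.storage_policy())   # Default Storage (inherited from the root)
print(loan.storage_policy())       # Finance_Storage (overridden)
print(contract.security_policy())  # Legal_Contract
```

Leaving a column blank in the spreadsheet corresponds to passing no policy here, which is what lets a child class inherit the default defined at the root.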

The spreadsheet also shows two additional columns, “Class Storage Policies”
and “Class Security Policy”. These columns define two other aspects of
document class design.

Class storage policy


As described in Chapter 8, “Advanced repository design” on page 185, each
document class may have a designated storage location on disk. Storage

locations are designated by storage policies. In the class storage policy column,
list the storage policy that will be associated with each document class. If a
document class does not require a special storage policy, just leave the column
blank and let the child document class inherit the default storage policy that is
defined in the parent class.

The storage policy column labels need to refer to a storage policy description as
shown in Figure 13-4.

Policy Name        FileStore      Server Name Share          SubFolder       Service Name
Default Storage    ContactOS_FS   \\hqdemo\P8_SAN_Storage    Default         hqdemo
Finance Storage    ContactOS_FS   \\hqdemo\P8_SAN_Storage    Finance_Data    hqdemo
Contract_Storage   ContactOS_FS   \\hqdemo\P8_SAN_Storage    Contract_Data   hqdemo
HR_Storage         ContactOS_FS   \\remote_device\Fixed      Restricted_HR   hqdemo

Figure 13-4 Storage policy design

Class security policy


You can also set security settings by document class by creating default instance
security entries. Default instance security is not the only security mechanism
available in IBM FileNet P8 repositories. Folder-inherited and marking sets are
also available, but default instance security is useful, because it acts as a default
security permissions template for each type of document that is stored in the
repository.

An IBM FileNet P8 repository security best practice is to always define security
by user group. In the security matrix that is shown in Figure 13-5 on page 345,
the column headers show group names, and the row headers describe the
security permissions that each group will be granted.

By defining security by group, designers anticipate situations when individual
users leave the organization, transfer between groups, or need to be granted
access to multiple groups. For each of these common events, it is a simple
process to disable or move user accounts among Lightweight Directory Access
Protocol (LDAP) groups, but a harder task to adjust security on document
classes or individual repository documents.

Best Practice: Always define repository security permissions by LDAP group,
not by individual users.

[Figure: permission matrix for the Legal Contracts class.
Groups (columns): All Employees, Contract Clerks, Contract Officers, CM
Administrator.
Permissions (rows): View Document is granted to all four groups; Add
Document, Modify Properties/Annotations, Change Document Status, and
Publish Document are each granted to three groups; Modify Security is granted
to two groups; Delete and Create new Classes & SubClasses are each granted
to a single group.]

Figure 13-5 Security policy design
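The group-based approach can be sketched in code: rights are granted only to groups, and a user's effective rights are the union of the rights of the groups the user belongs to. The group and permission names follow Figure 13-5, but the exact assignment of rights to each group here is illustrative:

```python
# Permissions granted per group for the Legal Contracts class
# (illustrative assignment, loosely following the Figure 13-5 matrix).
GRANTS = {
    "All Employees":     {"View Document"},
    "Contract Clerks":   {"View Document", "Add Document",
                          "Modify Properties/Annotations"},
    "Contract Officers": {"View Document", "Add Document",
                          "Modify Properties/Annotations", "Modify Security"},
}

# Users are never granted rights directly, only via group membership,
# so moving a user between LDAP groups changes their effective rights.
MEMBERSHIP = {"alice": {"All Employees", "Contract Clerks"}}

def can(user, permission):
    """A user's effective rights are the union over the user's groups."""
    return any(permission in GRANTS.get(g, set())
               for g in MEMBERSHIP.get(user, set()))

print(can("alice", "Add Document"))     # True  (via Contract Clerks)
print(can("alice", "Modify Security"))  # False (not a Contract Officer)
```

Granting a user more rights then means adding the user to another group, not editing the document classes or individual documents, which is exactly the best practice stated above.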

13.1.3 Content and business process options


One of the great strengths of the IBM FileNet P8 ECM toolkit is the wealth of
options available for actively managing content after it has been stored in a P8
Content Manager repository. The tools fall into several categories:
򐂰 Actions that occur when content is created or changed
򐂰 Life cycles that define content states: in creation, under review, approved, or
published, for example
򐂰 Workflows that define a business process that controls content according to a
business policy

Table 13-2 on page 346 describes IBM FileNet P8 application tools for actively
managing content.



Table 13-2 IBM FileNet P8 content processing application tools
Content life cycles:
P8 Content Manager enables you to set up document life cycles, or states, that
documents pass through during their useful life. Life cycles are always
sequential: a document moves forward or backward through a series of states.
Life cycles consist of two P8 Content Manager objects:
򐂰 Life cycle policy: Defines the document’s states. The policy also identifies
the life cycle action that executes in response to state changes.
򐂰 Life cycle action: The action that the system performs when a document
moves from one state to another.
State changes occur when the following actions occur:
򐂰 Promote: Moves a document forward to its next life cycle state.
򐂰 Demote: Returns a document to its previous life cycle state.
Actions that can be triggered when documents are promoted and demoted are
virtually unlimited. Actions can be any function that can be encoded in a Java
script, VB script, XML function, or Java class.

Event actions:
Event actions are scripts that run when content is created or changed. Event
scripts can invoke Java code or launch workflows. The code or workflows act
directly on the affected content when events occur. Events can be triggered by
any of the following actions:
򐂰 Change class
򐂰 Change state
򐂰 Checkin
򐂰 Classify complete
򐂰 Creation
򐂰 Custom event
򐂰 Deletion
򐂰 Promote version
򐂰 Demote version
򐂰 File (in a folder)
򐂰 Unfile (including deletion of a subfolder)
򐂰 Update (whenever an object’s properties are changed)
򐂰 Update security (whenever the security of an object is changed)

Business Process Management:
You can also manage content by workflow. Workflows or business processes
can be launched by any of the event actions just listed. Workflows can have
approval steps, rules-based branching routes, subprocesses, and step
processors (steps that invoke Java classes).
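The promote and demote behavior of content life cycles described above can be sketched as a simple sequential state machine with an action hook that fires on each state change. The state names and the hook are illustrative; this is not the P8 life cycle API:

```python
class LifeCycle:
    """Sequential document life cycle: promote moves forward, demote back."""
    def __init__(self, states, on_change=None):
        self.states, self.index = states, 0
        self.on_change = on_change or (lambda old, new: None)

    @property
    def state(self):
        return self.states[self.index]

    def promote(self):
        # Moves the document forward to its next life cycle state.
        if self.index + 1 < len(self.states):
            old, self.index = self.state, self.index + 1
            self.on_change(old, self.state)   # life cycle action hook
        return self.state

    def demote(self):
        # Returns the document to its previous life cycle state.
        if self.index > 0:
            old, self.index = self.state, self.index - 1
            self.on_change(old, self.state)
        return self.state

lc = LifeCycle(["In Creation", "Under Review", "Approved", "Published"],
               on_change=lambda old, new: print(f"{old} -> {new}"))
lc.promote()     # In Creation -> Under Review
lc.promote()     # Under Review -> Approved
lc.demote()      # Approved -> Under Review
print(lc.state)  # Under Review
```

The `on_change` hook plays the role of the life cycle action object: it is the single place where side effects (publishing, notification, reclassification) run whenever a state change occurs.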

13.1.4 Presentation features


Several P8 Content Manager features are available for presenting content to your
users. IBM FileNet P8 presentation applications, which are shown in Table 13-3
on page 348, include options for converting active content to the following
formats:
򐂰 PDF
򐂰 HTML
򐂰 Email attachment
򐂰 eForms or eForms attachment
򐂰 Annotated image



Table 13-3 IBM FileNet P8 presentation tools
Rendition Engine:
The Rendition Engine facilitates document publishing. Publishing a document
enables a replica of a document to be made in either PDF or HTML format. The
replica, known as the publication document, can have its own security and
property settings. Publishing can be triggered by event actions or by changes in
a document’s life cycle state. When a document reaches the “released” life cycle
state, for example, a PDF version can be automatically created with public-view
security rights.
Published documents:
򐂰 Can continue to exist after the source document has been deleted
򐂰 Can be automatically deleted when the source document is deleted
򐂰 Are not changed when their source documents are changed
򐂰 Can exist in a different folder than the source documents
򐂰 Can have a different file format than the source documents; for example, the
source document might be a Word document, while the publication
document might be an HTML document. Publishing options are defined by
individual templates.
򐂰 Can originate as Microsoft Office (for example, Word, Excel, and
PowerPoint®) documents and be rendered to PDF or HTML.
Liquent RenderPerfect is the Rendition Engine publishing agent.

Document Publisher:
The Document Publisher allows authors to publish Microsoft Word documents
directly to HTML Web pages. The publishing process is customizable and works
by converting Microsoft Word styles into HTML style sheets.
The Document Publisher automatically creates Web style backgrounds, tables
of contents, embedded images, and table and figure references. It reads
Microsoft Word style references, translates the styles to HTML, and produces
stylistically consistent Web sites that mirror the Microsoft Word document in
layout and content.
The Document Publisher can be triggered by event action, life cycle state, or
workflow step.

Send Email:
Any P8 Content Manager content can be attached to an e-mail and sent as part
of an event action. Send mail events can be triggered by event action, life cycle
state, or workflow step.

eForms:
IBM FileNet eForms offers a set of tools for designing, filing, and routing
electronic forms. Repository content can be attached to eForms and routed by
workflow, or eForms can be stored as stand-alone active content. eForms offers
the following features:
򐂰 Comprehensive forms management, including design workbench, versioning,
and form search and tracking utilities
򐂰 Fully integrated with P8 Content Manager security
򐂰 Forms offer database connectivity for choice restrictions and data validation
򐂰 Offline forms allow users to fill out forms in remote locations
򐂰 Forms are stored in XML format for ease of interconnection with databases
or existing applications
򐂰 Forms can be rendered in PDF, HTML, or ODBC formats
򐂰 Forms can be customized with embedded Java scripts and class calls
򐂰 Supports both Microsoft Crypto API and IBM FileNet I-Sign digital signature
methods

Annotations:
Annotation objects, such as arrows, sticky notes, stamps, or call outs, can be
attached to an image or document object for the purpose of annotating or
footnoting that object. You can associate annotations with custom objects,
documents, and folders.
Annotations:
򐂰 Are independently securable. Default security is provided by the class and
by the annotated object. An annotation can optionally have a security policy
assigned to it.
򐂰 Can have subclasses
򐂰 Can have zero or more associated content elements, and the content does
not need to have the same format as its annotated object
򐂰 Are uniquely associated with a single document version and thus are not
versioned when a document version is updated
򐂰 Can be modified and deleted independently of the annotated object
򐂰 Can be searched for and retrieved with an ad hoc query
򐂰 Can subscribe to server-side events that launch when an action (such as
creating an annotation) occurs
򐂰 Can be audited

13.2 Detailed function references


A complex ECM solution can be decomposed at the first level into the four
major ECM building blocks: input, storage, processing, and presentation, and
then at a second level, into a series of smaller components that we call design
patterns. Design patterns are standard solutions to common ECM design
challenges. Here are a few examples of ECM solution design patterns:
򐂰 Allow a document reviewer to promote a document to “approved” status
򐂰 Capture all e-mail messages older than 90 days
򐂰 Route a received document according to an approved business process
򐂰 Publish a document in PDF format

Chapter 13. Solution building blocks 351
The tables in this section describe detailed functional references from several of
the application tools that are listed in the preceding sections. The purpose of the
tables is to offer a reference of common solutions for typical ECM solution
challenges. We encourage ECM designers to recognize patterns in their ECM
designs and look for common solutions in the reference tables.

We cover the following areas:


򐂰 “Content ingestion design patterns” on page 352
򐂰 “Content and workflow management-related design patterns” on page 369
򐂰 “Presentation and delivery management-related design patterns” on
page 378

13.2.1 Content ingestion design patterns

We discuss design patterns for content ingestion in the following sections:
򐂰 “E-mail design patterns” on page 352, including use cases and applicable design patterns
򐂰 “Electronic document design patterns” on page 356, including use cases and applicable design patterns
򐂰 “Design patterns for content ingestion by Business Process Management” on page 358, including use cases and applicable design patterns
򐂰 “Design patterns for content ingestion by custom applications” on page 360, including use cases and applicable design patterns
򐂰 “Image (scanned paper and fax) ingestion design patterns” on page 361, including use cases and applicable design patterns
򐂰 “Content Federation Services-related design patterns” on page 365, including use cases and applicable design patterns
򐂰 “SAP outbound archiving” on page 368

E-mail design patterns

We summarize e-mail design patterns and challenges in Table 13-4 on page 353.

Table 13-4 E-mail design patterns

EmailManInb
Description: Manually archiving e-mails by using an e-mail client for inbound e-mails.
Challenges:
򐂰 Distribute plug-in for e-mail client
򐂰 Enhance functionality for indexing the e-mail in the e-mail client
򐂰 Customizing notification icons

EmailManOut
Description: Manually archiving e-mails in the e-mail client for outbound e-mails.
Challenges:
򐂰 Distribute plug-in for e-mail client
򐂰 Enhance functionality for indexing the e-mails in the e-mail client
򐂰 Customizing notification icons

EmailAutoRules
Description: Automated rule-based e-mail archival for outbound and inbound e-mails without user interaction.
Challenges:
򐂰 Volumes
򐂰 Administration of a large number of rules
򐂰 Mapping security between the messaging system and the directory service that serves P8 Content Manager
򐂰 Grouping resolution for security and compliance reasons
򐂰 Setting security to recipients or to a technical user
򐂰 Postprocessing might be too resource-intensive
򐂰 Declaration of record

EmailStub
Description: Stubbing e-mail attachments automatically after a certain period.
Challenges:
򐂰 Defining the moment of stubbing
򐂰 Protecting a stubbed attachment by using a technical user
򐂰 Protecting the stub URL from malicious usage by leveraging the hashed information
򐂰 Mobile users have no access to P8 Content Manager
򐂰 Performance aspects of performing the stubbing by mailbox, by server, and so forth
򐂰 Usability issues when single sign-on is not in place

EmailOutSMTP
Description: Sending e-mail from an event in CM/Business Process Management (BPM).
Challenges:
򐂰 Legacy systems do not contain the recipient’s e-mail addresses for outbound e-mails
򐂰 No spell checker available
򐂰 No customized text

RelateEmail
Description: Storing e-mails either as individual files (body and attachments) or as one file.
Challenges: Linkage between body and attachments for searches. See more details about relating documents in “Relate and bind document design patterns” on page 375.

EmailDocFormat
Description: Converting e-mail to a different document format, such as ASCII, PDF, TIFF, and so forth.
Challenges: Deciding where this happens (Email Manager, P8 Content Manager, or rendering later). Also see “DeliverInterceptedStream” on page 380.

EmailDocClass
Description: How are the body and the attachments stored in CM.
Challenges:
򐂰 Email Document class
򐂰 Custom Document class

RestrictEmails
Description: How are the e-mails secured.
Challenges:
򐂰 By recipients from the messaging system (users/groups)
򐂰 By technical users/groups
򐂰 Functional mailboxes
See as well “Restricting design patterns” on page 381.

EmailIndex
Description: When is the e-mail indexed.
Challenges:
򐂰 Before archiving
򐂰 Moment of archiving
򐂰 After archiving
򐂰 Database index to support the searches

EmailFullText
Description: Are the body and attachments fulltext indexed.
Challenges:
򐂰 Before archival
򐂰 At archival time
򐂰 After archival
򐂰 Never
򐂰 Just on certain properties, such as the subject

EmailDeclRecord
Description: Can the e-mail be declared as a record at the time of archival.
Challenges:
򐂰 Yes
򐂰 No
򐂰 Maybe later as part of a postprocessing step

EmailRestrictChange
Description: What is the mechanism to prevent changes to a stored e-mail.
Challenges:
򐂰 Not versionable
򐂰 Correct security in place
򐂰 Records management
򐂰 Audit logs activated
See as well “Restricting design patterns” on page 381.

EmailSingleInstance
Description: Is it your goal to store one e-mail only one time and use it in all kinds of business contexts.
Challenges:
򐂰 Multi-business context index requirements
򐂰 Security concept

EmailClassificationAutomation
Description: Where will the e-mails be classified.
Challenges:
򐂰 Automated rule-based before ingestion
򐂰 Automated after ingestion
򐂰 Manually before ingestion
򐂰 Manually after ingestion
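One EmailStub challenge, protecting the stub URL from malicious usage by leveraging hashed information, can be sketched as follows. This is a minimal illustration, not the Email Manager implementation; the URL shape and secret handling are assumptions. The idea is to embed a keyed hash (HMAC) of the document identifier in the stub link, so a recipient cannot rewrite the link to fetch an arbitrary document.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"   # assumption: known only to the retrieval component

def make_stub_url(doc_id: str) -> str:
    """Build a stub link whose hash parameter binds it to one document ID."""
    digest = hmac.new(SECRET, doc_id.encode(), hashlib.sha256).hexdigest()
    return f"https://archive.example.com/getContent?id={doc_id}&h={digest}"

def is_valid_stub_url(doc_id: str, presented_hash: str) -> bool:
    """Reject requests whose hash does not match the requested document ID."""
    expected = hmac.new(SECRET, doc_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, presented_hash)

url = make_stub_url("DOC-42")
good_hash = url.split("&h=")[1]
```

A request that keeps the hash but swaps in a different document ID fails validation, which is the property the pattern asks for.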

E-mail use cases and applicable design patterns

Table 13-5 lists the e-mail management use cases, their value propositions, and the applicable design patterns for each use case.

Table 13-5 E-mail management use cases and applicable design patterns

Use case: Mailbox and storage space management
Value propositions:
򐂰 Manage mailboxes
򐂰 Increase server performance
򐂰 Enable faster backup and restore
򐂰 Easier server upgrades
򐂰 Leverage storage best practices
򐂰 Apply simple retention
Applicable design patterns: EmailAutoRules, EmailStub, EmailSecure

Use case: Manage e-mail as a record and ensure compliance
Value propositions:
򐂰 Enable records management
򐂰 Perform legal discovery
򐂰 Supervise and monitor for non-compliance
Applicable design patterns: EmailManInb, EmailManOut, EmailAutoRules, EmailClassificationAutomation, EmailSecure, EmailRecord, EmailIndex, EmailFullText, EmailDocFormat (mainly for export)

Use case: Manage e-mail as content: Extract knowledge and data buried in e-mail
Value propositions:
򐂰 Manage e-mail as a content type
򐂰 Automate or suggest message classification
򐂰 Use content analytics to identify trends, risks, and analyze data
򐂰 Additional tagging and metadata creation
򐂰 Response suggestion or routing of e-mail
Applicable design patterns: EmailManInb, EmailManOut, EmailDocClass, EmailIndex, EmailFullText (Thesaurus), EmailClassificationAutomation, EmailSMTP, EmailSecure

Use case: Manage e-mail as part of a business process
Value propositions:
򐂰 Accelerate and automate business processes where e-mail participates in the workflow or is part of the active case
򐂰 Automate workflow steps
򐂰 Associate e-mail content to processes, cases, and LOB systems
Applicable design patterns: EmailSMTP as opposed to EmailManOut, EmailAutoRules, EmailClassificationAutomation, EmailDocClass, EmailDocLink

Electronic document design patterns

Most documents stored in P8 Content Manager are electronic documents. An electronic document has a certain life cycle associated with it, which includes authoring, reviewing, correcting, releasing, and publishing.

In this section, we assume that the document does not contain pages and the document is not a compound document. Support for compound documents is very new, and therefore, we do not address compound documents in this book.

All design patterns share the same challenge: the repository design must be completed before the ingestion method can be used.

Electronic document design patterns and challenges are summarized in Table 13-6.

Table 13-6 Electronic document design patterns

ElecDocHighVolOnce
Description: Documents are available on a file system for a single ingestion.
Challenges:
򐂰 The existing file system is not structured ideally
򐂰 Folders as opposed to searches question
򐂰 Usability of accessing application after ingestion
򐂰 Source files are stored multiple times in various versions

ElecDocHighVolMulti
Description: Documents arrive from several sources and are uploaded continuously. Indexing is part of the process.
Challenges:
򐂰 Document naming conventions to map existing versions
򐂰 Indexes must be provided either through a control file or through a meaningful folder structure to derive part of the index information
򐂰 Provide security context

ElecDocLowManually
Description: Documents are ingested and typically indexed by a user.
Challenges:
򐂰 Due to the manual step, this process can only be done for low volume ingestion
򐂰 User has to authenticate at the ingestion application (usability)
򐂰 Office Integration for non-Microsoft Office products
򐂰 Distribution of Office Integration if used
򐂰 User has to provide meaningful index information
򐂰 Lack of drag and drop support
򐂰 Education of users

ElecDocEmail
Description: Documents are transported over e-mail and leverage the Email Manager infrastructure.
Challenges:
򐂰 Provide indexing information
򐂰 See Email Manager design patterns in “E-mail design patterns” on page 352

ElecDocWebDav
Description: Documents are copied to a mapped network drive using WebDav.
Challenges:
򐂰 Add Entry Templates must be preconfigured for every major folder
򐂰 Folder bound
򐂰 Cannot use versions very well
򐂰 Only document-based, not page-based (only the first content element can be used)

ElecDocAutoRules
Description: Automated rule-based document archival without user interaction.
Challenges:
򐂰 Volumes
򐂰 Administration of a large number of rules
򐂰 Mapping security between the file system and the directory service that serves P8 Content Manager
򐂰 Groups resolution for security and compliance reasons
򐂰 Setting security to file system users or to a technical user
򐂰 Prevent further postprocessing for every single file
򐂰 Declaration of record
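The control-file challenge noted for ElecDocHighVolMulti can be illustrated with a short sketch. The file format, delimiter, and property names here are invented for illustration; real bulk-ingestion tools define their own control-file syntax. Each control-file row maps a source file to the index properties that the repository design requires, and rows with missing mandatory indexes are routed aside rather than ingested.

```python
import csv
import io

REQUIRED_PROPERTIES = ("DocumentClass", "CustomerNumber")  # assumed repository design

def parse_control_file(text):
    """Return (accepted, rejected): rows with complete indexes vs. rows to re-queue."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        props = {k: v for k, v in row.items() if k != "File"}
        if all(props.get(p) for p in REQUIRED_PROPERTIES):
            accepted.append((row["File"], props))
        else:
            rejected.append(row["File"])   # route to a manual indexing queue
    return accepted, rejected

control = """File;DocumentClass;CustomerNumber
invoice1.tif;Invoice;4711
letter7.tif;;4712
"""
accepted, rejected = parse_control_file(control)
```

Validating the control file before upload keeps incomplete index information out of the repository, which is the point of providing indexes through a control file in the first place.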

Electronic document use cases and the applicable design patterns

Table 13-7 lists the electronic document use cases, their value propositions, and the applicable design patterns for each use case. These use cases are explicitly for new documents. We discuss the checkin of newer revisions in 13.2.2, “Content and workflow management-related design patterns” on page 369.

Table 13-7 Electronic document use cases and the applicable design patterns

Use case: Add new document using Workplace
Value proposition:
򐂰 Zero deployment needs
򐂰 Predefined capability
򐂰 Leveraging add entry templates
Applicable design patterns: ElecDocLowManually

Use case: Add new document using Office Integration
Value proposition:
򐂰 Predefined application for the Microsoft Office Suite
򐂰 Ease of use
򐂰 Leveraging add entry templates
Applicable design patterns: ElecDocLowManually

Use case: New documents are stored in a file system by either users or applications
Value proposition:
򐂰 No human interaction needed, back-end integration possible
򐂰 No logon needed
Applicable design patterns: ElecDocHighVolMulti

Use case: Windows Explorer is used to place a new document in a folder structure
Value proposition:
򐂰 Windows support for WebDav can be leveraged, or
򐂰 Windows Explorer or any other file manager can be used (drag and drop)
Applicable design patterns: ElecDocWebDav or ElecDocHighVolMulti

Use case: Use bespoke application
Value proposition:
򐂰 Ease of use
򐂰 Drag and drop can be provided
򐂰 Leading application takes control over storage
Applicable design patterns: ElecDocAutoRules

While Workplace, Office Integration, and WebDav are mainly suited for low volume ingestion with human interaction for the indexing part, Records Crawler and other alternative tools can be leveraged for high volume ingestion. You can use these high volume ingestion tools for image ingestion as well, which we discuss in “Image (scanned paper and fax) ingestion design patterns” on page 361.

Design patterns for content ingestion by Business Process Management

There are situations when a workflow expects input either from the user or from a back-end system. In the industry, there are situations when an external party sends an electronic document either as an online form or through a Web service. There are also situations when an internal user attaches additional documents, which have not yet been archived to P8 Content Manager, to a workflow.

Table 13-8 on page 359 summarizes the design patterns for content ingestion by Business Process Management and the design challenges.
Table 13-8 BPM ingests content design patterns

OperationsBasedDoc
Description: All document-based activities in BPM are delegated to P8 Content Manager using a library of functions that are called operations. The default operations are called CEOperations and implement the most needed interactions between BPM and CM.
Challenges: Custom object operations are available in BPM operations or need to be crafted according to your specification.

OperationsBasedCustomobject
Description: Instances of workflows do not maintain persistency of the various states during run time after being completed. Often this pattern is used to save certain BPM states persistently by creating instances of custom objects.
Challenges: Custom objects are not versionable and therefore are instantiated for every status change if needed.

CreateDocRelationship
Description: When storing documents in the context of a process, they often need to be related to each other.
Challenges: Finding the most flexible mechanism to bind the documents together, using:
򐂰 Folders
򐂰 Properties
򐂰 Custom objects
򐂰 Links
򐂰 Containments
See “Relate and bind document design patterns” on page 375.
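The OperationsBasedCustomobject pattern can be sketched as follows. The class and property names are illustrative, not the Process Engine or Content Engine API. Because custom objects are not versionable, each workflow status change produces a brand-new instance rather than a new version of one object, which is exactly the challenge the table notes.

```python
class ObjectStoreStub:
    """Stand-in for an object store holding non-versionable custom objects."""
    def __init__(self):
        self.objects = []

    def create_custom_object(self, class_name, properties):
        # Custom objects are not versionable: every state change
        # becomes a new instance instead of a new version.
        obj = {"class": class_name, "properties": dict(properties)}
        self.objects.append(obj)
        return obj

def record_workflow_state(store, workflow_id, state):
    """Persist one workflow state snapshot as a custom object."""
    return store.create_custom_object(
        "WorkflowStateSnapshot",
        {"WorkflowId": workflow_id, "State": state},
    )

store = ObjectStoreStub()
for state in ("Started", "Approved", "Completed"):
    record_workflow_state(store, "WF-100", state)

history = [o["properties"]["State"] for o in store.objects]
```

After the workflow instance itself is gone, the chain of snapshot objects remains queryable, which is the persistency the pattern is after.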

BPM use cases and the applicable design patterns

Table 13-9 on page 360 lists the Business Process Management use cases, their value propositions, and the applicable design patterns.

Table 13-9 BPM ingests content use cases

Use case: Attaching an additional document to an already running workflow as part of the user interaction.
Value proposition:
򐂰 Ease of use
򐂰 Manually completing the collection of documents that belongs to a certain process
Applicable design patterns: OperationsBasedDoc, CreateDocRelationship

Use case: As part of a step, an electronic form is filled out and filed to an object store through Process Engine.
Value proposition:
򐂰 Nice user interface
򐂰 Form (document) is GUI and container for status
򐂰 All changes are documented as versions of the eForm
򐂰 Maintain persistency after the workflow has ended by having tabs for all major activities on the form
Applicable design patterns: OperationsBasedDoc. Designing a long workflow with many human interaction work steps will lead to a complicated eForms design.

Use case: An external Web service calls the Process Engine for further processing. The Web service has submitted a document.
Value proposition:
򐂰 Simplifies external communication
򐂰 Additional logic can be built in a process and can be configured instead of programmed
Applicable design patterns: OperationsBasedDoc. The Web service schema is automatically generated by the Process Engine and is provided. A potential client application must be able to call this Web service.

Use case: Any event in a workflow can generate a custom object and store it in an object store.
Value proposition: Maintain persistency after the workflow has ended
Applicable design patterns: OperationsBasedCustomobject

Design patterns for content ingestion by custom applications

Applications can connect to P8 Content Manager using the available APIs, which are .NET-based, Web service-based, or Java-based. See Chapter 7, “Application design” on page 153.

Before writing your own application, it makes sense to check whether the functionality is already available from a third party, or whether there are quicker ways to enhance your application by using either the Web Application Toolkit or Workplace Portlets. (See ecm_help → developer road map.)

Table 13-10 on page 361 summarizes content ingestion by application design patterns and their challenges.
Table 13-10 Content ingestion by application design pattern

DocApiUploadBatch
Description: The APIs are used in batch mode to be able to achieve high throughput.
Challenges: Transactional behavior can be achieved in a limited way.

DocAPIUploadSingle
Description: The API is used per document.
Challenges: Throughput will suffer.

DocApiUploadEnduser
Description: The user is authenticated in the API call.
Challenges:
򐂰 Store user ID and password in the application
򐂰 Single sign-on

DocApiUploadTechUser
Description: The user is not used in the API call, but a technical user is.
Challenges:
򐂰 Direct access to CM must become very restricted
򐂰 Introduction of BPM and Records Management (RM) is very difficult, because the content-based security is not in place
򐂰 If you are using this pattern in a Web application, consider securing the link by an additional hash key to prevent malicious access to arbitrary content
Content ingestion by custom application use cases and the applicable design patterns

Table 13-11 summarizes content ingestion by custom application use cases, their value propositions, and the applicable design patterns.

Table 13-11 Content ingestion by applications use cases

Use case: Enhance existing application to delegate content management to CM
Value propositions: Unlock the data and content silos. Do not change the bespoke application too much and integrate easily with CM.
Applicable design patterns: DocApiUploadTechUser, DocAPIUploadSingle

Use case: Integrating a modern J2EE application with CM
Value propositions: Use synergies between content management and the bespoke application.
Applicable design patterns: DocApiUploadEnduser, DocAPIUploadSingle

Image (scanned paper and fax) ingestion design patterns

Today, at most of our clients’ sites, images from scanned documents are either produced and delivered early in a business process (early scanning) or at the end of a process when most activities have been completed (late scanning). The topic of document scanning can fill its own chapter. In this section, we do not distinguish the mechanisms in the use cases of early scanning or late scanning; rather, we concentrate on the ingestion, indexing, and validation, as well as the annotation, which fit closely with content management. All other activities, such as recognizing the document type and enhancing the images, are summarized as scanning.

Table 13-12 summarizes the design patterns associated with fixed content and images and their challenges.

Table 13-12 Images ingestion design patterns

ScanIndexAll
Description: Scanning application provides all index information based on the document class definition.
Challenges:
򐂰 Direct link between scanning application and CM helpful
򐂰 Document class model exposed; no add entry templates
򐂰 External scanning provider needs tight integration with back-end systems for validation

ScanPartIndex
Description: Scanning application provides partially indexed information, because no validation can be done due to no direct connectivity between the scanning application and CM.
Challenges:
򐂰 Human interaction needed
򐂰 Volumes/effort
This approach needs later adjustments of the classification.

ScanNoIndex
Description: Scanning application provides no index information.
Challenges: Post-process ingested documents with optical character recognition (OCR) or a manual step.

ScanAutoValidate
Description: Scanning application has direct connectivity to an existing system or the IBM FileNet P8 system to validate the index.
Challenges: Effort to integrate the validation steps in the scanning application is typically high.

ScanDelegateVal2ActCont
Description: Validation takes place after storing documents in P8 Content Manager, leveraging the active content mechanism.
Challenges: Routing of content could be complex. See as well “AdjPropertyByVal” on page 371.

Scan2CM
Description: Scanned images are directly transferred to P8 Content Manager, including meta information:
򐂰 Inexpensive integration
򐂰 Higher validation
Challenges:
򐂰 Scanning is dependent on the availability of the IBM FileNet P8 system
򐂰 Document class must be determined at a certain point in the scanning process

Scan2FS
Description: Scanned images are saved to a file system:
򐂰 Different scanning applications can be connected to the same repository easily
򐂰 Decoupling scanning infrastructure from archive
Challenges:
򐂰 Additional media rupture, which might not be needed
򐂰 Cost for maintaining another interface
򐂰 Additional maintenance overhead for the mapping between the scanning application and repository design
򐂰 Additional license cost for the ingestion tool

Scan2IS
Description: Scanned images are saved to IBM FileNet Image Services (IS) and federated to P8 Content Manager:
򐂰 Leverage existing assets
򐂰 Proven, flexible, and high performance infrastructure
򐂰 Existing scan applications do not need to be changed
Challenges:
򐂰 Additional maintenance overhead for the mapping between the scanning application, IS document class model, and IBM FileNet P8 document class design
򐂰 Potentially additional license costs for IS
򐂰 Non-functional dependencies on the data type mappings and the support matrix for IS/CE

Annot@Scan
Description: Annotations are done at scanning:
򐂰 Annotations are used for describing bad scans that cannot be improved technically
򐂰 Allows you to annotate when the scan does not include all of the needed information
򐂰 Allows you to annotate when certain information must be blacked out
Challenges:
򐂰 Annotations must not be abused as application notepads, because they are not really searchable
򐂰 Annotation as P8 Content Manager annotations as opposed to PDF annotations
򐂰 Dependent on scanning product
See “Signing and annotation design patterns” on page 377.

ScanDelegateAnnot2CM
Description: Annotations are done after ingestion to P8 Content Manager:
򐂰 Allows you to annotate when the actual committed documents have not been indexed and visually verified
򐂰 Dependent on the viewer, can be used to apply annotations in a consistent way regardless of the document format (TIFF, PDF, Office, and so forth)
Challenges:
򐂰 This is not needed at all
򐂰 Annotation as P8 Content Manager annotations as opposed to PDF annotations
򐂰 Additional license cost for the full-featured viewer

Scan2Tiff
Description: Images are saved as TIFF files:
򐂰 Ingestion page-wise
򐂰 Ingestion document-wise
Challenges: Where to put the OCR text.

Scan2PDF
Description: Images are saved to PDF.
Challenges: Support of PDF/A.

Scan2PDFSearch
Description: Images are saved to searchable PDFs.
Challenges: Generate fulltext (indexing at ingestion time might consume a lot of resources). Sizing.

Scan2Pages
Description: Each page of a scanned document is saved as a file.
Challenges:
򐂰 Ingest the various files into the same document using different content elements
򐂰 Export the document as one file

Scan2Documents
Description: All pages of a document are saved as one document.
Challenges: Small network bandwidth.
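The Scan2Pages challenge, ingesting per-page files into one document as separate content elements, can be sketched like this. The dictionary structure is illustrative only; in P8 Content Manager, a document may carry multiple content elements, which is the feature the pattern leans on.

```python
def assemble_document(page_files):
    """Build one logical document whose content elements are the scanned pages."""
    return {
        "mime_type": "image/tiff",
        "content_elements": [
            {"sequence": i, "file": name}
            for i, name in enumerate(sorted(page_files))
        ],
    }

def export_as_one_file(document):
    """Scan2Documents view: flatten the content elements back into one unit."""
    return [e["file"] for e in document["content_elements"]]

# Pages may arrive out of order from the scanner; sorting restores the sequence.
doc = assemble_document(["page2.tif", "page1.tif", "page3.tif"])
exported = export_as_one_file(doc)
```

Keeping pages as separate content elements preserves page-level access while still presenting a single document, and the export step shows how the document can later be delivered as one file.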

Image ingestion use cases and the applicable design patterns

Table 13-13 on page 365 lists the image ingestion use cases, their value propositions, and the applicable design patterns.

Table 13-13 Image ingestion use cases and the applicable design patterns

Use case: Outsourced mail processing: The outsourcer scans all mail and puts it on a file system that is used by the company to ingest the mail into the content management system
Value propositions:
򐂰 Decouple scanning from ingestion
򐂰 Converting mail into electronic form
򐂰 Indexing takes place in accordance with achievable automation levels
򐂰 Inexpensive resources for the bulk operations and more expensive resources for the subsequent tasks
Applicable design patterns: ScanPartIndex, Scan2FS, Scan2Tiff, ElecDocHighVolMulti

Use case: Departmental decentralized scanning: Smaller departments are scanning and indexing their most important correspondence on their own, for example, Legal and Human Resources, which typically have very sensitive content
Value propositions:
򐂰 Manual indexing takes place only up-front (validation manually)
򐂰 Only well-indexed documents are ingested
Applicable design patterns: ScanIndexAll, Scan2PDFSearch, Scan2CM

Use case: Centralized scanning for the whole company or “early scanning”
Value propositions:
򐂰 High automation level
򐂰 High return on investment (ROI) due to improved information handling in business processes
Applicable design patterns: ScanPartIndex, ScanAutoValidate, Scan2Tiff, Scan2FS, ElecDocHighVolMulti, BPMValidate

Use case: Centralized scanning for the whole company or “late scanning”
Value propositions:
򐂰 All relevant paper-based information gets archived
򐂰 Highly automated
򐂰 Less ROI because the information is not available in real-time business processes
򐂰 Only for compliance purposes
Applicable design patterns: ScanPartIndex, Scan2Tiff, Scan2CM

You can distinguish the use cases by the level of automation and whether the
paper to electronic conversion occurs in an early or late stage in the business
process.

Content Federation Services-related design patterns

Typically, repositories are connected over Content Federation Services (CFS) to P8 Content Manager for one or many of the following reasons, which appear in the table as patterns:
򐂰 Cannot migrate large assets of archived documents up-front
򐂰 Cannot migrate applications to other APIs in a timely manner
򐂰 Add records management capabilities to existing assets
򐂰 Need active content behavior in existing repositories
򐂰 Leverage existing scanning infrastructure

Depending on the repository of your choice (Image Manager, Content Services, P8 Content Manager, or other repositories), the level of integration might differ.

Federation can be a long-term pattern, or you can use it as a workaround to get the architecture cleaned up over time.

Table 13-14 on page 367 summarizes the design patterns associated with repositories that are connected through Content Federation Services (CFS) and their challenges.
Table 13-14 Design patterns for repositories connected through CFS

SCANCFS
Description: Existing scan infrastructure will be leveraged:
򐂰 Scanning solution has no interface to P8 Content Manager
򐂰 Scanning solution is highly customized and needs more time to be migrated to P8 Content Manager
Challenges:
򐂰 Index information is duplicated in Content Manager
򐂰 Updates to Content Manager are not fed back to the repository
򐂰 Data field mapping might constrain the data model in P8 Content Manager

APICFS
Description: Existing applications need more functionality. Changing the access path to the repository leveraging the IBM FileNet P8 API cannot be done in a timely manner.
Challenges: All new functionality can only be made available loosely coupled and in the background, because existing applications are not going to change.

RMCFS
Description: Need for Records Management. Regulations are mandating RM functionality over your existing repositories.
Challenges: Depending on the repository of your choice, only new IBM FileNet P8 clients will get full RM capability on existing federated repositories. As soon as a special compliance connector is available, this gap is bridged.

ActiveCFS
Description: Additional need for a notification after ingestion or launching workflows for existing repositories. After the P8 Content Manager object is automatically created, the event subscription mechanism can be leveraged.
Challenges: Data field mapping might constrain the data model in P8 Content Manager.

OpticalCFS
Description: There is a business need for support of optical media. Optical media support is only available for Image Manager.
Challenges: Data field mapping might constrain the data model in P8 Content Manager.

New clients to IBM FileNet P8 products must consider a direct ingestion path into
P8 Content Manager to ensure that they can leverage the newest features of the
IBM FileNet P8 Platform.

Content Federation Services use cases and the applicable design patterns

Table 13-15 lists the use cases, their value propositions, and the applicable design patterns.

Table 13-15 Content Federation Services use cases

Use case: Large assets of documents stored on Image Manager. The client wants to move to IBM FileNet P8 but is not able to migrate documents due to volumes. No custom applications.
Value proposition:
򐂰 Protect investment
򐂰 Seamlessly integrate with IBM FileNet P8
򐂰 Smooth upgrade path
Applicable design patterns: ActiveCFS, RMCFS

Use case: Many mission critical applications highly customized based on Image Manager; you need to introduce RM.
Value proposition:
򐂰 Migration of applications toward IBM FileNet P8 can be done over a longer period of time
򐂰 Content does not need to be migrated
Applicable design patterns: APICFS

Use case: Has a heterogeneous landscape of repositories and is looking for an umbrella architecture to introduce RM and BPM.
Value proposition: Provide the same functionalities for all repositories. One approach to different information sources.
Applicable design patterns: ActiveCFS, RMCFS

Use case: Large scanning center with highly customized capture software, which is highly integrated with existing systems.
Value proposition:
򐂰 Protect investment
򐂰 Allow for a long-term solution by gaining time by federating data instead of rewriting the application
Applicable design patterns: SCANCFS
SAP outbound archiving

SAP outbound archiving means the ingestion of documents from SAP. We consider them as inbound documents for P8 Content Manager. Typical documents are created in R/3 application modules, such as SD, FI, HR, or MM, through SAP scripts. In addition, print lists and reorganization data are very popular content for P8 Content Manager.

We describe SAP inbound documents, or documents generated externally to SAP, in 13.2.3, “Presentation and delivery management-related design patterns” on page 378.

Table 13-16 on page 369 summarizes the SAP ingestion design patterns and their challenges.

Table 13-16 SAP ingestion design patterns

SAPInbReo
Description: Optimizing the volume of data held in the underlying SAP database. Unneeded documents are archived to P8 Content Manager and can be put under Records Management if needed.
Challenges: Huge volumes of data. Ask for appropriate storage media.

SAPInbAlf
Description: Print lists can be archived to P8 Content Manager and made available to non-SAP clients.
Challenges: N/A

SAPInbDoc
Description: SAP-generated documents are printed and archived to P8 Content Manager.
Challenges: N/A

13.2.2 Content and workflow management-related design patterns

After a document is stored in an object store, there are many situations where additional functionality must be applied. This functionality is triggered either manually or automatically.

Active content design patterns

P8 Content Manager provides a rich event model to which you can subscribe. Most events can be audited if needed. The capability of reacting to certain changes is called active content.

To understand the event model, this is a list of possible events:
򐂰 Checkin, Checkout, and Cancel Checkout
򐂰 Change Class and Change State
򐂰 Classify Complete
򐂰 Creation and Deletion
򐂰 Custom
򐂰 Demote Version and Promote Version
򐂰 File and Unfile
򐂰 Freeze, Lock, and Unlock
򐂰 Update and Update Security

Best practice: While event handling is an extremely powerful mechanism, remember that volumes might have an influence on your throughput, especially when you plan to audit several of the events.

We discuss design patterns associated with:
򐂰 “Design patterns related to when an event is first triggered” on page 370
򐂰 “Design patterns related to subsequent triggered events” on page 374
򐂰 “Signing and annotation design patterns” on page 377
򐂰 “Deletion-related design patterns” on page 378

Design patterns related to when an event is first triggered


Typically after the document is stored in the object store, certain actions need to
be applied to ensure consistency or further processing. These activities are
typically called postprocessing and are triggered from the creation or checkin
event.

These activities can be coded or delegated to a workflow.

Best practice: Try to avoid large code blocks in events. Instead, use a
delegate pattern that allows you to test code separately from P8 Content
Manager.
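The delegate pattern recommended above can be illustrated with a minimal, FileNet-independent sketch: the event handler stays thin and forwards to a delegate that holds the business logic, so the delegate can be unit tested without a running P8 Content Manager. All class and method names here are illustrative assumptions, not the actual Content Engine API.

```python
# Minimal sketch of the delegate pattern for event handler code.
# Nothing here is FileNet API; the names are hypothetical.

class CheckinPostprocessor:
    """Delegate: pure business logic, testable in isolation."""

    def classify(self, doc_properties):
        # Example rule: route scanned TIFFs to a subclass.
        if doc_properties.get("MimeType") == "image/tiff":
            return "ScannedDocument"
        return "GenericDocument"


class CheckinEventHandler:
    """Thin wrapper that a real event action would call into."""

    def __init__(self, delegate):
        self.delegate = delegate

    def on_event(self, doc_properties):
        # Only unwrap the event payload and forward to the delegate.
        return self.delegate.classify(doc_properties)
```

Because the delegate has no dependency on the repository, the classification rules can be exercised by ordinary unit tests before the handler is deployed as an event action.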

When an event is first triggered, the following activities might occur:


򐂰 Adjustment of classification
򐂰 Adjustment of security
򐂰 Moving of content
򐂰 Routing of content

Adjustment of classification-related design patterns


Table 13-17 on page 371 summarizes the adjustment of classification-related
design patterns and their challenges.

370 IBM FileNet Content Manager Implementation Best Practices and Recommendations
Table 13-17 Adjustment of classification-related design patterns
AdjDocClass
  Description: At the moment of ingestion, the definitive class might
  not be determined. Now, the subclass is known.
  Challenges: Additional properties might be empty. If there are
  additional mandatory properties, they have to be filled unless
  defaults are set.

AdjPropertyByVal
  Description: At the moment of ingestion, there was no validation
  possible.
  Challenges: Calling existing systems programmatically might not be
  too attractive. Leveraging BPM and calling existing Web services
  might be easier.

AdjPropertAutoClassify
  Description: For large volume ingestion, the content is also the
  control file, for example, synchronize Office properties with CM
  properties.
  Challenges: If you heavily rely on Microsoft Office document
  properties, this is one way to get P8 Content Manager synchronized
  with the native properties.

AdjFileInFolder
  Description: Folder structure was not known at ingestion time.
  Challenges: If you save certain properties, which are common to
  other documents, on a folder level and then file the document in
  just a folder, the logic can be put in this type of event. The
  event can also deal with the folder creation.

AdjDeclareAsRecord
  Description: The ingestion tool (a third-party tool) cannot declare
  documents as records. BPM is not needed.
  Challenges: While BPM offers you an easier way to declare content
  as a record, in a P8 Content Manager-only use case, you can decide
  to call the declare-as-record method (part of the RM API) from an
  event.

AdjDataMapping
  Description: There is a data mapping problem, and the workaround is
  that property content gets assigned to a target property.
  Challenges: You have a standard for your meta information, but the
  need to federate another repository might end up with data type
  mapping problems. For these cases, you can map the data to
  additional properties, which can be mapped, and then map back the
  values in the event. This removes the technical complexity and
  allows you to enforce a common data model.

Adjustment of security-related design patterns


Table 13-18 on page 372 summarizes adjusting security-related design patterns
and their challenges.



Table 13-18 Adjustment of security-related design patterns
AdjSecByValues
  Description: Marking sets.
  Challenges: Users do not worry about security; they just want to
  select another value for a property in a choice list.

AdjSecByVersions
  Description: Security policy.
  Challenges: Users do not worry about security. By just creating
  minor and major versions, the security is handled correctly in the
  background.

AdjSecByLifeCycle
  Description: Life Cycle Policy (LCP).
  Challenges: The life cycle goes through many different types of
  status, and users do not want to think about minor and major
  versions; therefore, Life Cycle Policies are great from a user
  perspective. For the programmer, they are additional work.
  Promoting and demoting is the mechanism to leverage LCPs.

AdjSecByFileInfolder
  Description: Filing in folder.
  Challenges: The security is only applied when filed in a folder.
  If the folder security changes, the document security is
  untouched. This is often not the desired behavior.
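The AdjSecByValues idea above, where a property value drives security through a marking set, can be modeled with a simplified sketch. This is only an illustration of the concept: real marking sets are configured administratively in FileNet Enterprise Manager and enforced by the Content Engine, not coded like this, and the value-to-group mapping below is an assumption.

```python
# Simplified model of security driven by a property value: a
# "marking set" maps each allowed value of a Status property to the
# groups that may read the document. Hypothetical data, for
# illustration only.

MARKING_SET = {
    "draft":     {"authors"},
    "to review": {"authors", "reviewers"},
    "approved":  {"authors", "reviewers", "users"},
}

def can_read(status, user_groups):
    """A user can read the document if any of the user's groups is
    allowed by the marking for the document's current status."""
    allowed = MARKING_SET.get(status, set())
    return bool(allowed & set(user_groups))
```

The appeal of the pattern is visible even in this toy model: changing one property value re-scopes access, with no per-document ACL editing.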

Moving of content-related design patterns


Table 13-19 on page 373 summarizes design patterns associated with moving
content and their challenges.

Table 13-19 Design patterns associated with moving content
AdjStoragelocation
  Description: Change the storage location for content.
  Challenges: You can distinguish between short-term storage and
  long-term storage. As long as the object store has two different
  file stores associated, the content can be relocated to a
  different file store. If the content must be relocated, including
  the metadata, see the copy content option.

ReplicateByProxyObject
  Description: Create another object for indexing purposes of a
  second business context. The new object refers back to the same
  physical content.
  Challenges: The content needs to be left in its original place but
  made available through a different object store, maybe even in a
  different IBM FileNet P8 domain. This is possible by writing code
  that generates a contentless object that includes a URL to the
  actual physical content. Take care with the security: the
  accessing user must have the security for the proxy, as well as
  for the physical content.

ReplicateByCopyContent
  Description: Create a duplicate in another object store.
  Challenges: There are rare situations where you need a document in
  two or more business contexts, and you do not have a “binding”
  strategy, so you are tempted to copy. From a compliance
  perspective, this is not allowed. See “Relate and bind document
  design patterns” on page 375.

ReplicateByRenderContent
  Description: Create a duplicate in a different content
  representation.
  Challenges: There are formats that need certain platform-specific
  components, such as Microsoft Office, on a server platform.
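The ReplicateByProxyObject pattern, and in particular its dual-security challenge, can be sketched with a small model: a contentless proxy holds metadata plus a URL back to the physical content, and content is only resolved when the user has rights on both the proxy and the source. The structures below are illustrative assumptions, not FileNet API objects.

```python
# Sketch of a contentless proxy object that points back to the
# physical content. Hypothetical structures, for illustration only.

class ProxyDocument:
    def __init__(self, title, content_url, readers):
        self.title = title
        self.content_url = content_url   # URL to the real content
        self.readers = set(readers)      # who may see the proxy

def resolve_content(proxy, source_readers, user):
    """Return the content URL only if the user is authorized on BOTH
    the proxy and the physical content; otherwise return None."""
    if user in proxy.readers and user in source_readers:
        return proxy.content_url
    return None
```

The double check mirrors the challenge noted in the table: granting access to the proxy alone is not enough, because the retrieval of the physical content is secured separately.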

Routing of content-related design patterns


You can achieve routing by starting a workflow, by changing a property
(status), or by sending a notification. Table 13-20 on page 374
summarizes design patterns related to routing.



Table 13-20 Routing design patterns
RouteByWorkflow
  Description: After content is stored, a workflow is launched.
  Challenges: Multiple documents belonging to the same context can
  each launch a workflow. The launched workflow must have the
  ability to wait for a certain amount of time, and other documents
  belonging to the same context must be able to attach to the
  already launched workflow. This behavior is implemented in the
  Business Process Framework (BPF).

RouteByPropertyValue
  Description: A value of a property is set to one of a list of
  discrete values.
  Challenges: People have to query P8 Content Manager actively to
  see which documents are “waiting for them”. See
  “FindByStoredSearch” on page 379.

RouteBySecurity
  Description: Depending on the security, which can be changed in
  various ways, content is made visible or invisible for the user.
  Challenges: It might look as though security is abused in this
  scenario. There are situations where this makes sense, for
  example, implementing a recycle bin.

There are more activities that can occur at the moment an event is
first triggered. We describe other popular situations, such as relating
documents, in the next section.

Design patterns related to subsequent triggered events


While the content is now accessible by different users, various
mechanisms are available that can be leveraged for additional
processing:
򐂰 An event triggers the execution of code or the launch of a workflow.
򐂰 A property value change drives a change of permissions (markings).
򐂰 Promotion and demotion affect the document status and its associated
security (life cycle policy).
򐂰 A version change affects the security (versioning security policies).

All four mechanisms trigger events, but, other than the first one, the
mechanisms do not need coding in the event to allow users to interact
with the system.

It is impossible to enumerate a complete list of functionality that can be called in


events. Certain events have been described already in “Design patterns related
to when an event is first triggered” on page 370. We explain other events in this
section, including:

򐂰 “Processing-related design patterns” on page 375
򐂰 “Relate and bind document design patterns” on page 375

Processing-related design patterns


Table 13-21 summarizes processing-related design patterns and challenges.

Table 13-21 Processing-related design patterns


ProcByEvent
  Description: An event triggers, and subscribed code or a workflow
  is launched.
  Challenges: There are almost no constraints in the functionality
  that you can achieve by calling external code or launching
  workflows. Think carefully about which event must do what. While
  BPM gives you, for many situations, an easier interface for
  defining the functionality graphically, it might be appropriate to
  call external code directly. If you have a complicated system, the
  BPM approach might give you an easier documentation path rather
  than generating graphs for all the dependencies of your code
  manually.

ProcByPropertyValueChange
  Description: An update event launches, and the associated value is
  changed. This can also trigger security changes if a marking set
  is associated.
  Challenges: This option is very powerful and, from a usability
  perspective, very beneficial. Without any code written, security
  can be changed by leveraging marking sets. Defining the marking
  sets needs a bit of education. From an operations point of view,
  it might be difficult to maintain, because marking sets are
  globally available to the IBM FileNet P8 domain. A naming
  convention for the marking sets might be useful.

ProcByVersionStatusChange
  Description: The version status changes, and the associated
  security policy changes permissions.
  Challenges: Educating users to use minor and major versions might
  be sort of a challenge. After this is done, security policies
  might be extremely beneficial. In addition, they provide a
  powerful vehicle, which prevents changing security on documents
  over the life cycle.

ProcByLifeCycleStateChange
  Description: A ChangeState event launches, and the associated life
  cycle actions are executed. For every change, code can be executed
  and security changed.
  Challenges: Very flexible but more labor intensive for the
  programmers. From a usability perspective, this is user friendly.
  A user can just demote or promote and does not have to worry about
  security or minor and major versions at all.

Relate and bind document design patterns


Related and bound documents belong to the same business context.
Table 13-22 on page 376 summarizes the design patterns for relating
content and their challenges.



Table 13-22 Relate content design patterns
RelateDocByFolder
  Description: Documents of the same business context are bound to
  the same folder. The folder does not know the version of the
  document (version independent).
  Challenges: How to distinguish sets of document versions belonging
  to each other?

RelateDocByAELink
  Description: Documents of the same business context are tightly
  linked to each other using the Application Engine (AE) Link
  object, which is browsable. The link is version stable.
  Challenges: How to know whether there is a newer version?

RelateDocByProperty
  Description: Documents of the same business context are loosely
  coupled by the same value in a specific property.
  Challenges: How to trigger the user to find related documents? A
  stored search or a search template might help. How to prevent
  deletions of some of the documents?

RelateDocByAssociation
  Description: Documents of the same business context are tightly
  coupled using the association property, ensuring referential
  integrity.
  Challenges: You can only administer the foreign entity and not the
  primary entity. For example, each claim has an associated policy.
  You can add a policy number to a claim, but you cannot add a claim
  to a policy. So, the claim (foreign entity) references the policy
  (primary entity).

RelateDocByExtContent
  Description: Documents of the same business context are loosely
  coupled using the external content URL. The link can be version
  independent or version stable.
  Challenges: The contentless document option accepts a URL. You can
  choose to paste in a Workplace URL or your own servlet URL to
  retrieve the target. This option is very loosely coupled and
  requires security to be set correctly on the contentless object,
  as well as on the target. If the target is deleted, the
  contentless object becomes an orphan.

RelateDocByCompoundDoc
  Description: Documents of the same business context can be linked
  using the compound document feature. This is a very tight coupling
  mechanism, which takes care of every change.
  Challenges: With this approach, single documents can be changed,
  and the collection of documents that are treated as an entity can
  be refreshed. So, a major version of a compound document can be
  the collection of all major versions of its children.

RelateDocByCustomObjects
  Description: Documents of the same business context are referenced
  from custom objects.
  Challenges: This approach can be used to model relationships
  between information independently from content-carrying objects,
  such as documents. The custom objects can also model links to
  documents; it might be interesting to find documents linked to
  different contexts. This approach was followed by BPF.

Signing and annotation design patterns


In addition, there are a few actions, which are often handled manually:
򐂰 Signing a document
򐂰 Annotating a document

Table 13-23 summarizes annotation-related design patterns and their


challenges.

Table 13-23 Annotation-related design patterns


AnnotDocByDeja
  Description: Use DejaViewer and annotate all file formats
  consistently.
  Challenges: The annotations can only be made available while
  having access to IBM FileNet Content Manager. As soon as the
  content is checked out, there is no way to access these
  annotations.

AnnotDocByNative
  Description: Use PDF annotations and Office version tracking.
  Challenges: Many users do not have the license to annotate PDFs.
  For Office documents, it often makes more sense to track changes
  natively in the Microsoft Office applications.

SignDecisionByEform
  Description: Use an eForm as a container for decisions and use the
  built-in signing functions.
  Challenges: Integration in a public key infrastructure is
  dependent on your infrastructure.

SignDocByNativeApp
  Description: Use the native environment to sign a document, for
  example, PDF documents.
  Challenges: Loosely coupled information about signed documents is
  a challenge.

SignDocByCustomApp
  Description: Use a custom functionality in Workplace to seal a
  document. The hash code is written in a property, which is
  protected from further changes.
  Challenges: Make or buy.



Deletion-related design patterns
From a compliance perspective, you must prevent ad hoc deletion and
substitute it with a retention management system, which initiates the
destruction process in a controlled manner.

Best practice: Align hardware-based retention mechanisms with


software-based mechanisms. They need to cooperate.

Table 13-24 summarizes document retention-related design patterns and


challenges.

Table 13-24 Document retention-related design patterns


DeleteDocManual
  Description: Allow some superusers to delete documents.
  Challenges: Audit the deletes to make sure that the users who can
  delete cannot change audit levels.

RMSweep
  Description: Delegate the deletion to RM.
  Challenges: Not all content is part of a record.

DeleteDocByBulkSearch
  Description: A search based on the creation date will find all
  candidates to be deleted. It can be executed as part of a sysadmin
  job. Refer to 11.9.2, “Bulk operations” on page 303.
  Challenges: The number of documents might be bigger than the
  number that can be retrieved by IBM FileNet Enterprise Manager
  (FEM) in one pass.

DeleteDocByHide
  Description: Do not delete, but hide the documents by removing
  permissions or by unfiling them from folders.
  Challenges: This does not cope with the storage growth.

DeleteDocAutoStorage
  Description: The retention period ends on the storage tier.
  Challenges: Make sure that the content is not just removed from
  the storage; this leaves the file store in an inconsistent state,
  and the metadata can still be found.
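The DeleteDocByBulkSearch challenge, that one pass may match more documents than can be retrieved at once, is usually addressed by paging through the candidates in fixed-size batches. The following generic sketch illustrates that loop; `search` and `delete` are placeholder callables standing in for the repository operations, not FileNet API methods.

```python
# Generic sketch of batched bulk deletion: page through delete
# candidates in fixed-size batches so no single pass has to hold the
# full result set. search() and delete() are assumed abstractions.

def bulk_delete(search, delete, batch_size=500):
    total = 0
    while True:
        # Fetch the next batch of candidates (for example, documents
        # whose creation date is older than the retention cutoff).
        batch = search(limit=batch_size)
        if not batch:
            break                 # no candidates left
        for doc_id in batch:
            delete(doc_id)        # delete (and audit) each candidate
        total += len(batch)
    return total
```

Because each iteration re-runs the search against the now-smaller candidate set, the job makes progress regardless of how large the initial population is.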

13.2.3 Presentation and delivery management-related design patterns

After content has been archived and maintained in P8 Content Manager,
multiple consumers access the metadata and the content. We discuss the
following areas:
򐂰 “Finding, searching, and browsing patterns” on page 379
򐂰 “Content delivery design patterns” on page 380

򐂰 “Restricting design patterns” on page 381
򐂰 “External linking design patterns” on page 381

Finding, searching, and browsing patterns


Table 13-25 summarizes the design patterns related to finding, searching, and
browsing.

Table 13-25 Finding content and metadata design patterns and their challenges
FindByBrowse
  Description: Use folders as a navigation mechanism.
  Challenges: Content can possibly be in more than one folder.
  Maintenance by filing in folders and by unfiling might be
  extensive. Typically, folders are abused to match a file system
  structure instead of taking advantage of newer capabilities, such
  as stored searches and using properties.

FindByAdhocSearch
  Description: Use simple searches.
  Challenges: This might not be sufficient. It can leverage fulltext
  search. It might deliver too many hits.

FindByStoredSearch
  Description: Uses a similar structure as a folder (Workplace and
  Workplace XT) but executes a search. This is extremely user
  friendly and powerful.
  Challenges: The effort to create the stored searches and to
  maintain them might be substantial.

FindBySearchTemplate
  Description: This is similar to the stored search option. The
  results do not show up in the folder structure (Workplace) as the
  stored searches do.
  Challenges: Search templates expect a user interaction.

FindAfterNotify
  Description: An e-mail is sent with a link.
  Challenges: There is additional effort to send e-mails; this is
  helpful for occasionally connected users.

FindInQueue
  Description: The content is delivered as part of a workflow and
  can be looked up in your inbox.
  Challenges: There is a need for BPM.

FindByReport
  Description: Reports can be done by using the IBM FileNet Content
  Manager JDBC driver, including security settings, or by reading
  the database natively (no security).
  Challenges: Depending on the nature of the queries, a system
  administrator or a user can run them. Think about the security
  implications.



Content delivery design patterns
Content delivery-related design patterns are summarized in Table 13-26.

Table 13-26 Content delivery design patterns and their challenges


DeliverStream
  Description: This is the default behavior when accessing a
  document through Workplace. The MIME type is relevant to launch an
  associated application at the client. Workplace can be customized
  to behave differently for certain MIME types.
  Challenges: Multiple content elements might not be supported by
  the configured consuming application. Workplace (DejaViewer)
  allows you to step through single-page TIFFs if they are saved in
  multiple content elements (page-wise ingestion).

DeliverURL
  Description: When a document is selected, the formed URL contains
  parameters that allow you to retrieve the last released document
  or a concrete version of the document.
  Challenges: URLs delivered to an application or a browser must be
  stable over time. Use virtual server farm Domain Name Server (DNS)
  entries per project for the greatest flexibility to change
  architecture over time without affecting users.

DeliverToFolder
  Description: This is a server-side unload of multiple documents
  into one folder.
  Challenges: This depends on volumes. It can be done using the
  export command line or by using the CE API to store the stream to
  disk.

DeliverByEmail
  Description: A limited number of documents can be saved as a zip
  file and sent per Simple Mail Transfer Protocol (SMTP) via
  Workplace.
  Challenges: Recipients’ e-mail addresses are often unknown or
  unavailable within ECM applications. Users prefer using their own
  e-mail client to use the address book and spell checker.

DeliverZipped
  Description: A limited number of documents can be selected in
  Workplace and marked for download. A zip file is delivered
  containing the selected documents.
  Challenges: Only the first content element is used. The number of
  files is limited.

DeliverInterceptedStream
  Description: There are mechanisms available to intercept the
  streaming of content and to add additional functionality, for
  example, rendering a dxl file to HTML using Workplace.
  Challenges: Carefully describe the customization of Workplace.

Restricting design patterns
Design patterns related to restricting content delivery are summarized in
Table 13-27.

Table 13-27 Restricting content delivery design patterns and challenges


RestrictBySecurity
  Description: Security is a powerful way to hide a document either
  completely or while the document is in a specific state.
  Challenges: Security needs to be stable over time. Changing
  security on large numbers of documents is time-consuming. Look for
  a security feature that prevents direct changes to the Access
  Control List (ACL) of a document, for example, security policies
  or role-based access control (RBAC) through the nesting of groups
  in the directory service provider.

RestrictByProperty
  Description: Changing a property value means the candidate
  document no longer appears as a result of a certain search and
  therefore restricts access to the document for the given search
  criteria. Using a marking set for the choice of values makes this
  pattern even more powerful by combining the strength of the
  RestrictBySecurity pattern with the ease of use of the
  RestrictByProperty pattern.
  Challenges: Make sure the property is indexed correctly on the
  database. There is no guarantee that the document will not show
  up, because it can be found by a different search, unless the
  marking set sets permissions more restrictively.

RestrictByFolder
  Description: This is filing a document in, or unfiling a document
  from, a folder.
  Challenges: The document is not browsable any longer, but it is
  still searchable.

RestrictByVersion
  Description: Changing from a minor to a major version will
  influence the visibility.
  Challenges: Education for users is a challenge.

RestrictByHiding
  Description: Workplace interprets the hidden property.
  Challenges: This is only important to Workplace and is not really
  restrictive for other applications.

External linking design patterns


Design patterns related to external linkage to content are summarized in
Table 13-28 on page 382.



Table 13-28 External linkage to content patterns and challenges
LinkByDocGuid
  Description: This is a version-stable document ID.
  Challenges: This is not really linking to the content at run time;
  the content was linked in the past.

LinkByDocVersionGuid
  Description: This is a version-specific document ID.
  Challenges: A newer version might be available but inaccessible.

Link2Folder
  Description: Multiple documents can be filed in a folder. The user
  can select the document and the preferred version.
  Challenges: It might not be clear which information is really
  linked.

Link2ContentElement
  Description: This references a concrete content element, which is
  typically a page in a multipage TIFF file.
  Challenges: Displaying a certain page of a document can be
  achieved by customizing the ViewOne viewer.

All external linking patterns face the authentication problem, which
needs to be addressed either by implementing a single sign-on mechanism
or by leveraging the technical user pattern “DocApiUploadTechUser” on
page 361.

When leveraging Workplace, the following link patterns are available:

򐂰 Link to a folder:
http://server/Workplace01/getContent?id=%7BD0E399AF-FCBF-49B9-9489-C6696A04D154%7D&objectStoreName=test&objectType=folder
򐂰 Link to a specific version of a document:
http://server/Workplace01/getContent?id=%7BA305F4B6-BF41-487C-87BE-B70F7C681EAD%7D&vsId=%7BA5E0FD8E-9C64-4958-86DC-781847D50797%7D&objectStoreName=test&objectType=document
򐂰 Link to the current version of the document:
http://server/Workplace01/getContent?id=current&vsId=%7BA5E0FD8E-9C64-4958-86DC-781847D50797%7D&objectStoreName=test&objectType=document
򐂰 Link to the released version of a document:
http://server/Workplace01/getContent?id=release&vsId=%7BA5E0FD8E-9C64-4958-86DC-781847D50797%7D&objectStoreName=test&objectType=document

The vsId parameter points to the document in a version-agnostic way;
each version has a different id. The id parameter can be overridden
with the literal value current or release.
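Links of this form can be assembled programmatically. The sketch below builds a document link using the parameter names shown in the patterns above (id, vsId, objectStoreName, objectType); the base URL and the GUID used in the usage note are placeholders, not real identifiers.

```python
# Sketch of building a Workplace getContent document URL from its
# parts. The parameter names follow the link patterns in the text;
# base URL and GUID values are placeholders.
from urllib.parse import urlencode

def workplace_link(base, object_store, vs_id, version="release"):
    """version: a version GUID, or the literal 'current'/'release',
    which overrides the id parameter as described above."""
    params = {
        "id": version,
        "vsId": vs_id,                    # version-agnostic identifier
        "objectStoreName": object_store,
        "objectType": "document",
    }
    return base + "/getContent?" + urlencode(params)
```

For example, `workplace_link("http://server/Workplace01", "test", "{A5E0FD8E}")` produces a link to the released version, with the braces of the GUID percent-encoded as %7B and %7D, as in the examples above.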

Customizing the ViewOne component is described in ViewOne HTML and the
Installation Manual, which is available from the customer support page at
Product Documentation for IBM FileNet P8 Platform:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422

13.3 Four sample use cases


Let us review the four sample use cases.

13.3.1 Document revision and approval process


The use case concerns policy, procedures, and safety documentation.
Legal or human resources documents need to go through a simple life
cycle process. This example, which is illustrated in Figure 13-6 on
page 384, is typically the first project introducing P8 Content
Manager in a company.

From requirements to functional design


The use case has the following requirements:
򐂰 Documents can be authored, reviewed, approved, or released.
򐂰 There are four roles (author, reviewer, approver, and user).
򐂰 Each role has certain permissions.
򐂰 Users can only see approved documents and always the latest version.
򐂰 Changes to documents can be logged.
򐂰 There is minimal effort for application distribution and education.
򐂰 Most documents are electronic documents: 50% Excel, 30% Word, and 20%
other.
򐂰 Today, the documents are stored in a file system.
򐂰 The user must save and open the documents from Microsoft Office
applications.
򐂰 The number of users is 200.

The requirements can now be mapped directly to the design patterns
found in the relevant sections for content ingestion, content and
workflow management, and delivery and presentation management.

Figure 13-6 on page 384 illustrates the document review and approval process.



1. The author creates a document for revision.
2. Authors and reviewers collaborate by checking document versions in
as minor versions (0.1, 0.2).
3. After final approval, the document is checked back in as a new
major version (1.0).
4. The new version supersedes the older one; all versions are retained
in the repository.
Figure 13-6 Document review and approval require minor and major versions

Going through the list of all of the design patterns that we presented
earlier, we marked the potentially relevant patterns and summarized
them in the following shaded box.

Applicable design patterns for this use case:


򐂰 Content ingestion:
– Ingestion pattern, “ElecDocHighVolOnce” on page 356
– Ingestion pattern, “ElecDocLowManually” on page 357
򐂰 Content and workflow management:
– Processing pattern, “AdjSecByValues” on page 372
– Processing pattern, “RouteByPropertyValue” on page 374
– Processing pattern, “ProcByPropertyValueChange” on page 375
(Notification, Audit)
– Processing pattern, “AnnotDocByNative” on page 377
򐂰 Delivery and presentation:
– Delivery pattern, “FindByStoredSearch” on page 379
– Delivery pattern, “FindAfterNotify” on page 379
– Delivery pattern, “DeliverStream” on page 380
– Delivery pattern, “RestrictByProperty” on page 381

Discussion of the chosen design patterns
When reading the list of all ingestion patterns, remember that you have
to deal with the task of importing existing documents, as well as with
the daily document interactions after importing is completed.
ElecDocHighVolOnce was selected to ensure that the choice of the import
tool and the preparation of the document import are addressed in the
solution description. Because most of the documents have an electronic
document character, there is no need for a tool addressing page-wise
ingestion. So, our tool can be Records Crawler, which handles
document-based ingestion well. For the import, the project can do an
analysis of the target folder structure or might just use properties
instead of folders altogether. The techniques that allow the users to
still file in their existing folder structure while migrating documents
to P8 Content Manager are described in 13.3.3, “Information capture
supporting call center operation” on page 391.

ElecDocLowManually was selected for the type of interactions for daily
operations. The selection of this pattern immediately addresses the
potential volume-related and nonfunctional needs in terms of system
architecture, performance, and capacity planning.

The choice of the tool for the daily interactions with P8 Content Manager is not
yet clear. There are no restrictions yet. It might be Workplace and Office
Integration so far. This might become clearer when the delivery management
patterns are understood.

We decided to use AdjSecByValues and RouteByPropertyValue to make sure
that the user can easily select a meaningful value for the state of a
document and immediately decide about security and routing without
having to worry about the details, which are solved in the background.

We have not chosen the alternative approach where security and routing are
solved by promoting and demoting. One reason was that we did not want to use
a programmer to code the actions for the purpose of a simple use case. In reality,
you might decide on the alternative approach depending on the skill set available
to the project.

The processing pattern ProcByPropertyValueChange, which covers
restricting security and routing, will support you in leading the
discussions and elaborating the advantages and disadvantages of each
pattern.

The annotation pattern AnnotDocByNative is straightforward. It does not
change how users mark their changes in Microsoft Office documents.

The delivery patterns guide you through the choice of application.
Compare the capabilities that are offered by Workplace: if they satisfy
your requirements, you might use Workplace XT. If you need more
specific features, consider the alternatives of customizing Workplace,
buying an IBM FileNet Partner’s application, or even building a bespoke
application. These questions are typically reduced to deciding whether
you are going to use a horizontal or vertical approach. Workplace is
interesting as a horizontal front end for P8 Content Manager.

The navigation pattern FindByStoredSearch is an interesting one. It
combines the strength of a browsable structure with the use of
properties for finding the documents. See Chapter 7, “Application
design” on page 153 for more information about “filed as opposed to
property”.

Solution details
The four different states that a document can have are typically a key
indicator to use ProcByLifeCycleStateChange. To keep the solution
design simple, we translated the functional requirements into a design
that allows users to toggle a status flag on the document. Depending on
its status (draft, to review, reviewed, and approved), the property
can be toggled by a user who has the needed permissions.

Each change to the status of the document will be audited, and a
notification is sent to the reviewer group as soon as the status has
changed to to review.

This is an attractive alternative for reviewers, who do not work with Workplace on
a daily basis. As soon as a document is ready for review, the reviewers get a
notification. Later, when reviewers are using P8 Content Manager more often,
this notification is not needed any longer.
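The status-toggle design described above can be sketched as a small state machine: a transition table lists which role may move a document between which statuses, and every change is recorded for auditing. The transition table below is an assumption for illustration; in the actual solution, the transitions are enforced by document security and marking sets, and auditing is done by P8 Content Manager, not by custom code.

```python
# Minimal sketch of the status-toggle flow. Hypothetical transition
# table; in the real solution security and auditing are handled by
# the repository, not coded like this.

TRANSITIONS = {
    ("draft", "to review"):    "author",
    ("to review", "reviewed"): "reviewer",
    ("reviewed", "approved"):  "approver",
    ("to review", "draft"):    "reviewer",   # send back for rework
}

def toggle_status(current, target, role, audit_log):
    """Move a document between statuses if the role is permitted;
    append every successful change to the audit log."""
    if TRANSITIONS.get((current, target)) != role:
        raise PermissionError(f"{role} cannot move {current} -> {target}")
    audit_log.append((current, target, role))   # every change audited
    return target
```

Modeling the flow this way makes the role/permission matrix explicit, which is useful when translating it into the security groups and marking sets described in the solution details.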

Each role (author, reviewer, approver, or user) will have a preconfigured portlet,
which shows a list of objects for which the user has permission and the status of
those objects.

A user typically has one portlet that shows all approved documents. A reviewer
sees all documents that are ready for review and perhaps the reviewed
documents. In addition, the user needs to see approved documents.

Authors need to see their own documents and the documents ready for review,
as well as the reviewed and approved documents.

The portlets can consume a stored search as their browsing “folder”. This is a
good way to combine property-based searches with the look and feel of folders.
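The idea of a portlet consuming a stored search can be illustrated with a saved property filter. This is a hypothetical sketch in plain Python, not Workplace portlet code; the per-role status sets are assumptions that match the visibility rules described above.

```python
# A "stored search" here is just a saved property filter; each role's
# portlet is preconfigured with the statuses that role needs to see.
PORTLET_SEARCHES = {
    "user":     {"approved"},
    "reviewer": {"to review", "reviewed", "approved"},
    "author":   {"draft", "to review", "reviewed", "approved"},
}

def run_portlet(role, documents, owner=None):
    """Return the document ids a role's portlet shows: property-based
    searching presented with the look and feel of a folder."""
    wanted = PORTLET_SEARCHES[role]
    hits = [d for d in documents if d["status"] in wanted]
    if role == "author" and owner is not None:
        # authors see their own drafts, but everyone's later-stage output
        hits = [d for d in hits
                if d["status"] != "draft" or d["author"] == owner]
    return sorted(d["id"] for d in hits)
```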

According to the illustration on page 384, only major versions are approved.
This is not technically required, but it is a common way to implement access to
documents. Typically, users only see major versions.

386 IBM FileNet Content Manager Implementation Best Practices and Recommendations
The system will be implemented with an “all-in-one” approach, where all engines
are installed on one machine.

Two hundred users only generate a small number of documents relative to what
other clients are achieving using P8 Content Manager. Therefore, we decided to
implement everything in one object store but use separate document classes for
each functional aspect.

The life cycle of the documents is implemented by a status property with
associated marking sets. This addresses two issues at the same time: it
controls the access that is needed to change the security, and it reduces the
act of changing security to toggling a value. The marking sets are implemented
per document class.
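How a marking set refines access based on the status property can be sketched as follows. This is a simplified model, not the Content Engine marking-set implementation; the group names per status value are assumptions.

```python
# Hypothetical marking set: each value of the status property carries a
# constraint, i.e. which groups keep access while that mark applies.
MARKING_SET = {
    "draft":     {"authors"},
    "to review": {"authors", "reviewers"},
    "reviewed":  {"authors", "reviewers", "approvers"},
    "approved":  {"authors", "reviewers", "approvers", "users"},
}

def has_access(user_groups, doc):
    """Both the document's own ACL grant AND the marking constraint must
    pass: marking sets refine, and never widen, the base security."""
    allowed_by_marking = MARKING_SET[doc["status"]]
    return bool(set(user_groups) & doc["acl"]) and \
           bool(set(user_groups) & allowed_by_marking)
```

Toggling the status value is then all a user needs to do to change the effective security of the document.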

From a security perspective, different groups are needed for each functional
aspect of the document type and per user role, such as author, reviewer, and
approver. There might be more sophisticated ways to achieve the combination of
functionality and roles that we did not include in this simple scenario.

From a user interface perspective, a centralized My Workplace page was
designed, which shows the four states by executing stored searches against the
repository. This is shown in Figure 13-7 on page 388. In this way, we mimic the
concept of workflow just by showing different views on the document population.

It is possible to centrally configure My Workplace pages for every role and
department, or even to let the users create their own views.

Chapter 13. Solution building blocks 387


Figure 13-7 Workplace browse portlets configured to consume a stored search

13.3.2 Insurance claim processing


This use case involves an active document with workflow processes. Documents
can come in by fax, in the mail, by phone, or online. Requests are batched up
for scanning and can be attached to eForms, if the company uses them.

Important: Active content triggers the workflow and starts the workflow
processing.

Figure 13-8 on page 389 illustrates the insurance claim processing use case.

1. An insurance claim arrives from a field office by fax.
2. Fax Capture adds the claim to the repository.
3. A workflow launches automatically.
4. The workflow routes the claim to a Verifier, who adds additional data.
5. The workflow routes the claim to an Adjuster, who approves or rejects the
   claim.
6. If the claim is rejected, the workflow sends an e-mail notice to the agent.
7. If the claim is approved, the workflow renders the claim document to PDF,
   stores it in the repository, and updates the company record accordingly. A
   check is also issued and mailed to the client.
8. The claim documents are sent to the client.

Figure 13-8 Insurance claim processing use case
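The routing logic of Figure 13-8 can be sketched as a small function in which each workflow step is a pluggable callable. This is an illustrative sketch, not IBM FileNet Business Process Manager code; the function and parameter names are hypothetical.

```python
def process_claim(claim, verify, adjust, notify_agent, render_pdf, issue_check):
    """Sketch of the routing in Figure 13-8: verify, adjudicate, then
    branch on the adjuster's decision. The callables stand in for
    workflow steps (human queues, rendering, back-end updates)."""
    trail = ["launched"]                 # step 3: workflow auto-launches
    claim.update(verify(claim))          # step 4: the Verifier adds data
    trail.append("verified")
    approved = adjust(claim)             # step 5: the Adjuster decides
    if not approved:                     # step 6: rejection notice
        notify_agent(claim)
        trail.append("rejected")
    else:                                # step 7: render, store, pay
        claim["pdf"] = render_pdf(claim)
        issue_check(claim)
        trail.append("approved")
    return trail
```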

From requirements to functional design


The use case has the following requirements:
򐂰 Documents are scanned to become paperless, an important requirement to
make them routable
򐂰 Additional validations
򐂰 An approval process
򐂰 Notification
򐂰 Rendering into PDF
򐂰 Back-end integration
򐂰 The ability to handle one thousand users working on the system
򐂰 As much automation as possible
򐂰 Building the application on a green field (no backscanning)

Going through the list of design patterns that we presented earlier, we marked
the potentially relevant patterns and summarized them in the following shaded
box.



Application design patterns for the use case:
򐂰 Content ingestion:
– Ingestion Pattern, “Scan2CM” on page 363, “Scan2Tiff” on page 364
– Indexing Pattern, “ScanPartIndex” on page 362
– Validation Pattern, “ScanDelegateVal2ActCont” on page 362
– Annotation Pattern, “Annot@Scan” on page 363
򐂰 Content and workflow management:
– Validation Pattern, “AdjPropertyByVal” on page 371
– Processing Pattern, “AdjDocClass” on page 371
– Routing Pattern, “RouteByWorkflow” on page 374
– Security Pattern, “AdjSecByValues” on page 372
– Processing Pattern, “ProcByEvent” on page 375
– Annotation Pattern, “AnnotDocByDeja” on page 377
– BPM Patterns {Approval Logic, Notification, Rendering2PDF}a
򐂰 Delivery and presentation:
– Delivering pattern, “FindInQueue” on page 379
– BPM Pattern {Call existing system}
a. There are many BPM patterns that we have not covered in the chapter, including
the ones we mention here.

Discussion of the chosen design patterns


Scan2CM was chosen to achieve a very high degree of integration among
scanning, image enhancement, indexing, and committing to the archive. It
assumes that as many of the steps as possible are automated. Scanning and
storing to a file system was not chosen, in order to keep the interface as slim
as possible.

Scan2Tiff was chosen, because it is still state-of-the-art. We decided in this
use case to use PDF just for the last part and not for the ingestion.
Annotations for PDF involve additional licenses, and compatibility must be
ensured by using PDF/A.

Annot@Scan was selected to make sure that annotations are only used for
describing bad images or corrections to images that cannot be rescanned.
There is no plan to allow users to annotate the images after ingestion; for that
purpose, we want to use a property that allows us to store long text.

We selected ScanPartIndex, because we plan to assign most index values as
soon as possible before ingesting to P8 Content Manager. Only missing
information will be completed within the content and business process
management part.

ScanDelegateVal2ActCont expresses the capability that validation of certain
assigned index values will be performed as part of a workflow, which might
include human interaction.
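The combination of ScanPartIndex and ScanDelegateVal2ActCont can be sketched as follows: assign what scanning recognized, flag what is missing, and let a later (possibly human) workflow step complete it. This is a plain-Python sketch; the required field names are hypothetical.

```python
REQUIRED_INDEX = ("claim_number", "policy_id", "claim_date")

def ingest_with_partial_index(scanned_values):
    """Accept whatever index values scanning produced; anything missing
    is flagged so a workflow step completes it later."""
    missing = [f for f in REQUIRED_INDEX if not scanned_values.get(f)]
    doc = dict(scanned_values)
    doc["needs_validation"] = bool(missing)
    doc["missing_fields"] = missing
    return doc

def validation_step(doc, supplied):
    """The delegated validation: merge values supplied in the workflow
    and clear the flag once every required field is present."""
    doc.update({k: v for k, v in supplied.items() if v})
    doc["missing_fields"] = [f for f in REQUIRED_INDEX if not doc.get(f)]
    doc["needs_validation"] = bool(doc["missing_fields"])
    return doc
```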

13.3.3 Information capture supporting call center operation


This sample use case is about ingestion from various sources, simple indexing
for retrieval, and high-volume static document ingestion for search and
retrieval. No document processing is required.

From requirements to functional design


This use case has the following requirements:
򐂰 Documents are scanned in various scanning centers and there is no
integration of the scanning application with the back-end applications
planned.
򐂰 Electronic documents are sent by e-mail.
򐂰 XML documents are delivered on FTP servers.
򐂰 Documents are archived and indexed according to medical cases.
򐂰 Searches are executed to retrieve documents per medical case.

Going through the list of design patterns that we presented earlier in the
chapter, we marked the potentially relevant design patterns and summarized
them in the following shaded box.

Application design patterns for the use case:


򐂰 Content ingestion:
– “Scan2FS” on page 363
– “Scan2PDFSearch” on page 364
– “EmailAutoRules” on page 353
– “EmailFullText” on page 354
– “ElecDocHighVolMulti” on page 356
– “ElecDocAutoRules” on page 357
򐂰 Content and workflow management:
All intelligence is up-front, and no postprocessing is needed.
򐂰 Delivery and presentation:
– “FindBySearchTemplate” on page 379
– “RestrictBySecurity” on page 381



Figure 13-9 illustrates the use case, which requires high-volume ingestion and
fast response times.

The solution uses Records Crawler servers, server-farmed repositories, and
load-balanced application (Web) servers:
1. Patient information, medical charts, and plan coverage are collected at
   several distribution points across the country. The ingestion rate is
   generally more than 50,000 files per hour.
2. A program collates the input files by case number. The high volume of input
   files is spread across several Content Manager servers. Security policies
   enforce HIPAA regulations.
3. Load-balanced Web servers provide the fast response times that are required
   by a large customer support call center.
Figure 13-9 Information capture supporting call center operation

Discussion of the chosen patterns


The Scan2FS pattern was chosen to indicate that paper-based information is
scanned at an external company and delivered as searchable PDFs. With the
Scan2PDFSearch approach, the client has the flexibility to fulltext index easily,
because the recognized ASCII stream is available in the PDF file. The Records
Crawler is configured similarly to the Email Manager. As soon as an e-mail
arrives or files are delivered in a file system folder structure, the ingestion follows
a rule-based approach (EmailAutoRules or ElecDocAutoRules).

The EmailFullText pattern describes the capability to fulltext index the e-mails
(attachments and body), which complements the ingestion of electronic
documents.

In this scenario, we consider a simple class model that holds all of the
necessary information of a case regardless of the ingestion channel. We do not
distinguish between body text and attachments. The object store needs to be
fulltext indexed to serve the information provided in the stream to the user.
ElecDocHighVolMulti in this context mostly describes the nature of versioning:
every piece of information that is ingested is stored as a new, independent
major version 1; no version chains are needed at all. Performance at high
volumes is another characteristic of this pattern. The original files are
removed from the file system, and from the e-mail journal files respectively,
as soon as the items are archived.
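The versioning-free, case-collated ingestion described for ElecDocHighVolMulti can be sketched as follows. The repository class and its methods are hypothetical stand-ins for object store ingestion, written in plain Python.

```python
from collections import defaultdict

class CaseRepository:
    """Every ingested item becomes an independent major version 1 (no
    version chains); items are simply collated under their case number."""
    def __init__(self):
        self.by_case = defaultdict(list)

    def ingest(self, case_number, source, content):
        doc = {"case": case_number, "source": source,
               "content": content, "major_version": 1}
        self.by_case[case_number].append(doc)
        return doc

    def find_case(self, case_number):
        """The whole medical case, regardless of ingestion channel."""
        return self.by_case.get(case_number, [])
```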

The FindBySearchTemplate pattern can be used to find all information that
belongs to one case.

We assume that the ingestion rules are powerful enough that the case numbers
have been assigned all along.

Access to the relevant information can be restricted through a fairly coarse
security model, which allows certain roles to access the stored information.
The RestrictBySecurity pattern or the AdjSecByValues pattern can be used to
give certain groups access to certain case numbers.

13.3.4 Email management for compliance


This use case involves high volumes, e-mail ingestion, stubbed e-mails, placing
content in IBM FileNet P8, and automatically declaring that content as records.
This scenario uses a “zero-click”, rule-based approach for compliance.

From requirements to functional design


This use case has the following requirements:
򐂰 Email must be automatically classified, secured, and archived.
򐂰 Email retention must automatically be set at the moment of classification.
򐂰 Every recipient on the e-mail distribution list needs to have access to the
stored e-mail.
򐂰 After 60 days, the e-mail attachments are replaced by a link in every
recipient’s mailbox to reduce storage needs in the messaging back end. The
users can still retrieve the attachments from the archive.
򐂰 Emails must be semi-automatically deleted after the retention period is
reached.

Figure 13-10 on page 394 illustrates the use case.



The solution combines an e-mail server, an Email Manager server with
collection rules, and Content Manager with Records Manager and a records file
plan:
1. Effective e-mail management involves declaring e-mail content as business
   records.
2. Email Manager monitors the e-mail journals. E-mails that match a set of
   collection rules are captured. Messages (and any duplicates) are removed
   from the e-mail server and replaced by a link in users’ inboxes. Clicking
   the link retrieves the message from the repository.
3. E-mails are declared as records and placed in the records file plan, where
   they are managed by record retention rules.
Figure 13-10 E-mail management for compliance

Going through the list of design patterns that we presented earlier in the
chapter, we marked the potentially relevant design patterns and summarized
them in the following shaded box.

Applicable design patterns for the use case:
򐂰 Content ingestion:
– Ingestion Pattern, “EmailAutoRules” on page 353
– Stubbing Pattern, “EmailStub” on page 353
– Indexing Pattern, “EmailIndex” on page 354
– Classification Pattern, “EmailDocClass” on page 354
– Ingestion Pattern, “EmailDeclRecord” on page 354
– Classification Pattern, “EmailClassificationAutomation” on page 355
򐂰 Content and workflow management:
– Processing Pattern, “AdjDeclareAsRecord” on page 371
– Security Pattern, “AdjSecByValues” on page 372
– Security Pattern, “RestrictEmails” on page 354
򐂰 Delivery and presentation:
– Delivery Pattern, “FindByStoredSearch” on page 379
– Delivery Pattern, “DeliverInterceptedStream” on page 380
– Delivery Pattern, “RestrictBySecurity” on page 381
– Deletion Pattern, “RMSweep” on page 378

Discussion of the chosen patterns


The main pattern for this use case, EmailAutoRules, requires the project team
to think carefully about possible rules to ensure that most e-mails can be
recognized correctly and classified meaningfully. There are organizational
measures that help keep the rules simple, such as putting certain mailboxes
together on separate e-mail servers or introducing naming conventions for the
recipients’ mailboxes. A catch-all rule that takes care of items that cannot be
fully classified automatically is also meaningful.

The level of automation in EmailClassificationAutomation can be determined as
soon as it becomes clear whether the classification can be completed at the
point of ingestion or whether it must be deferred to the CM and BPM part. In
our solution, we collect all of the e-mails of the relevant mailboxes and are
confident that the rules that we specify cover every e-mail type.
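A zero-click, rule-based classification in the spirit of EmailAutoRules can be sketched as an ordered rule list with a catch-all at the end. The rules, document classes, and retention periods below are invented for illustration; this is not Email Manager configuration syntax.

```python
# Hypothetical collection rules, evaluated in order; the final catch-all
# parks anything unrecognized for later, semi-manual classification.
RULES = [
    (lambda m: m["mailbox"].startswith("claims-"),     ("ClaimEmail", "7y")),
    (lambda m: "contract" in m["subject"].lower(),     ("ContractEmail", "10y")),
    (lambda m: m["sender"].endswith("@regulator.gov"), ("RegulatoryEmail", "10y")),
    (lambda m: True,                                   ("UnclassifiedEmail", "1y")),
]

def classify(mail):
    """Return (document class, retention) decided purely by rules, so
    classification, security, and retention are set with zero clicks."""
    for predicate, outcome in RULES:
        if predicate(mail):
            return outcome
```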

This assumption makes it extremely easy to use the EmailDeclRecord pattern at
the time of ingestion as a built-in feature.

Without this assumption, we might defer the records declaration to the CM and
BPM part, AdjDeclareAsRecord, which typically requires human interaction to be
able to complete the classification.



It is important to remember that a classification scheme must not be introduced
just for the purpose of e-mails; it needs to be applicable to all kinds of
information.

If the setup follows a simple structure, the EmailDocClass pattern is a good
choice. We decided on an approach where we do not distinguish between
ingestion channels in our target document class model and, therefore, selected
one custom class to implement all of our needs. We derive this class from the
built-in e-mail document class to ensure the linkage mechanism between the
body of the notes and the attachments. We illustrate this option in the middle
of Figure 13-11. The option on the left does not use e-mail as its parent class
and therefore does not address linking attachments to the actual body. The
option on the right implements the various channels on each functional level of
the document class model. While the option on the left is potentially the
simplest, it requires additional effort in the application. The option in the
middle seems to offer a good mix of ready-to-use functionality and simplicity;
however, there is the overhead of an extended e-mail document class that holds
all the potential channels’ properties as non-mandatory properties. The option
on the right has a smaller number of properties per class but carries the
burden of implementing all channels on every functional level.

RelateEmail describes the options for linking attachments and body. Using an
option that is not derived from the built-in e-mail class requires one of the
linking patterns, for example, storing a value from the messaging system, such
as the Lotus Notes Unique ID, in a property. There is more information about
patterns to relate documents in “Relate and bind document design patterns” on
page 375. Figure 13-11 shows the difference in modeling the document classes
for e-mails.

Figure 13-11 Difference in modeling the document classes for e-mails

The pattern EmailIndex concerns when the index information describing the
ingested documents is completed. If we are able to classify the documents
correctly, we can also assume that we can complete indexing at the time of
ingestion. In reality, this might be different. Think carefully about how you
want to index the multi-item values (the to:, cc:, and bcc: fields), either by
the underlying database indexing mechanisms or by using Verity fulltext
indexing. Indexing these values greatly reduces the overhead of running
queries against recipient lists.

In our situation, we decided to use the database, and we make sure that we have
an index applied over the ListofString table.

The pattern EmailStub requires at least two major decisions:
򐂰 Will the archived e-mails be accessed by users in other business contexts,
rather than just retrieving the attachments from an e-mail link? If the
answer is yes, the e-mail documents must be protected by group and user
access control entries (ACEs). If the answer is no, the introduction of a
technical user ID in the ACL of the stored e-mails might help you avoid
setting individual permissions on every single e-mail.
򐂰 At what point will mobile users accept that they no longer have access to
the attachments? Many clients apply a grace period of between 20 and 90
days.
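The second decision, the stubbing grace period, can be sketched as a sweep over the mailbox. This is an illustrative sketch; the 60-day period matches the requirement stated earlier in this use case, and the link format is invented.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=60)   # clients commonly use 20 to 90 days

def stub_mailbox(mailbox, today):
    """Replace the attachments of sufficiently old, archived messages
    with a link into the repository; newer mail keeps its attachments
    so mobile users are not cut off too early."""
    stubbed = 0
    for msg in mailbox:
        old_enough = today - msg["received"] >= GRACE_PERIOD
        if msg["archived"] and old_enough and msg["attachments"]:
            msg["attachments"] = []
            msg["link"] = f"repo://email/{msg['id']}"
            stubbed += 1
    return stubbed
```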

As we have mentioned before, the RestrictEmails pattern is derived from the
EmailStub pattern. There are situations where the collection rules are close
enough to the business rules that you can protect an e-mail not by the actual
recipients but instead by meaningful groups. This approach requires that the
groups are implemented in the directory that protects P8 Content Manager
access. Knowing which security must be applied to address which rule frees us
from trying to protect e-mails by individual ACLs, and it unlocks the items for
further use in other business contexts.

FindByStoredSearch is a powerful mechanism for executing a search on
properties while giving the user the feeling of browsing through a folder
structure. This pattern was selected to move ahead using properties instead of
folders while still giving the users the same navigation experience. With a bit
of customization, frequently used searches can be presented as a virtual folder
hierarchy that is generated by stored searches. This guides the user toward
the actual virtual folder, which then runs the desired query.

DeliverInterceptedStream can be used in this scenario to implement a thin
viewer that renders your e-mails server-side. While the Email Manager
configuration gives you the option to save the e-mail bodies as ASCII files,
most clients save e-mails in the native format of the messaging system. For
clients running Lotus Notes, the DXL format can, with some effort, be rendered
to HTML server-side and displayed in the browser. For Microsoft Exchange or
Novell GroupWise clients, this might be possible as well, by running the
Application Engine on servers with the correlating APIs installed.



The RestrictBySecurity pattern guides you to implement a permission system
that remains stable over time. Whenever possible, consider restricting access
by e-mail groups and not on an individual recipient basis, in order to
eliminate situations where you might add thousands of entries to one e-mail’s
ACL. You might also consider storing the recipients just as properties and not
using them in the ACLs.

The RMSweep pattern helps you to semi-automate the controlled deletion of
content. If you use P8 Content Manager in combination with Records Manager,
this use case is fully implemented and can be leveraged.

If no Records Manager is installed, you might decide to implement rules that
set the end date (retention period) at the time of ingestion. You might also
implement a Boolean property to prevent deletion of e-mail that is subject to a
legal hold. You can use Workplace or FileNet Enterprise Manager to search for
content where the date has elapsed and no legal hold is applied. After you
locate the e-mail, you can delete it.
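The semi-automated deletion with a legal-hold guard can be sketched as follows. This is plain Python, not an RMSweep or FileNet Enterprise Manager query; the property names are hypothetical.

```python
from datetime import date

def rm_sweep(documents, today):
    """Return the ids that are safe to delete (retention elapsed AND no
    legal hold). Held items are reported separately so that they can be
    moved to another object store before any bulk deletion."""
    deletable, held = [], []
    for doc in documents:
        if doc["retention_end"] > today:
            continue                      # still inside the retention period
        (held if doc["legal_hold"] else deletable).append(doc["id"])
    return deletable, held
```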

Consider switching object stores at time intervals, for example, every year.
With this organizational approach, clients with huge e-mail ingestion volumes
gain a convenient method of improving search times and deletion throughput.

Before you delete anything, identify documents with a legal hold and move them
to a different object store. This approach lets you simply drop the object
store and delete the file store, which might sound extremely pragmatic but can
speed up the deletion process tremendously.

Related publications

The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.

Online resources
These Web sites are also relevant as further information sources:
򐂰 IBM FileNet Content Manager support Web site:
http://www.ibm.com/software/data/content-management/filenet-content-manager/support.html
򐂰 Product documentation for IBM FileNet P8 Platform:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27010422
򐂰 You can obtain technical notices from the previous product documentation
Web site, in the Technical Notices section, including:
– IBM FileNet P8 Performance Tuning Guide
– IBM FileNet P8 High Availability Technical Notice
– IBM FileNet Content Engine Query Performance Optimization Guidelines
Technical Notice
– IBM FileNet Application Engine Files and Registry Keys Technical Notice
– IBM FileNet P8 Asynchronous Rules Technical Notice
– IBM FileNet Content Engine Component Security Technical Notice
– IBM FileNet P8 Directory Service Migration Guide
– IBM FileNet P8 Disaster Recovery Technical Notice
– IBM FileNet P8 Extensible Authentication Guide
– IBM FileNet P8 Process Task Manager Advanced Usage Technical Notice
– IBM FileNet P8 Recommendations for Handling Large Numbers of Folders
and Objects Technical Notice
– IBM FileNet P8 DB2 Large Object (LOB) Datatype Conversion Procedure
Technical Notice
Although several technical notices were written for the 3.5 version, much of
the content provided is useful for Version 4.0 as well.

© Copyright IBM Corp. 2008. All rights reserved. 399


򐂰 W3 Xtreme Leverage
http://w3.ibm.com/software/xl/portal
This is where you can find all kinds of information for the product of your
interest. Click the IBM Information Management link on the left side to look
for the documentation available for this area.
򐂰 White paper IBM FileNet P8 4.0: Content Engine Performance and Scalability
using WebSphere Application Server v6 and DB2 9 Data Server on IBM
System p5 595
http://w3.ibm.com/software/xl/portal/viewcontent?type=doc&srcID=DM&docID=M491211F04837D11

How to get IBM Redbooks publications


You can search for, view, or download IBM Redbooks publications, IBM
Redpaper publications, technical notices (Technotes), draft publications and
additional materials, as well as order hardcopy IBM Redbooks publications, at
this Web site:
ibm.com/redbooks

Help from IBM


IBM Support and downloads
ibm.com/support

IBM Global Services


ibm.com/services

Index
Application Program Interface (API) 20, 26, 154,
Symbols 157, 160, 163–164, 175–176, 195, 250, 252
.NET 160
Application Role 272
.NET API 154
application server 29, 68, 79, 104–105, 158, 165,
249, 257
A application-based replication 232
access control 132 architecture 5, 24, 218, 322, 385
Access Control Entry (ACE) 100 architecture and design sessions 93
access control entry (ACE) 135 architecture requirement capture session 93
Access Control List (ACL) 31, 100, 139 archiver.jar 280, 316
access control list (ACL) 135 association property 180
Access control matrix (ACM) 137 asymmetric 1-to-1 224
access data 132 asymmetric N 224, 226
access role 155 asymmetric server cluster configuration 226
ACL asynchronous
object 143 event subscription 174
Active content 5 asynchronous replication 234
active content 5, 16–17, 88, 174, 347, 350 audit log 286, 316
example 7 Content Engine 315
Active Directory 260 audit logging 295
Add Entry Template 263, 272 auditing 316
AddOn event 172 AUTHENTICATED_USERS 144
administration 282 authentication 31, 139, 167–168
AJAX 29 authorization 31, 139
alter data 132 automated system monitoring 320
annotation 16, 123–124 availability 216
annotation class 123 avoid downtime 217
annotations 123
Apache Foundation 272
Apache log4j 285
B
backup 231, 317
API batch 172
data 237
API call 361
offline 310
APIs 162, 360, 365
online 311
applet 156, 158, 202
point-in-time 232, 238
application container 158
required system components 309
application crash 217
backup and restore 308
application design 20, 25, 160
backup tape 238
impact on performance 328
backup window 310
Application Engine 44, 57, 270, 322
best practice
checking 325
folder structure 189
exporting and importing 270
best practices
multiple instances 46
disaster recovery 241
Application Engine (AE) 29–30, 50, 220, 240, 265,
high availability 240
283



Binary Large Objects (BLObs) 196 clear set 89, 92
bottom up approach Client-side transaction 171–172
repository design 89 clonable topology 259
bottom-up approach 89 cloning
BPF Explorer 156 deployment 259
Brewer and Nash model 136 cluster 34, 221, 243
browse paradigm 114 server cluster 227
bulk operations 208, 298 cluster manager 236
business continuity 213–214, 229 Comma Separated Value (CSV) 280
business continuity planning (BCP) 214 command-line option 158
business proce ss 220 common object properties 100
business process 5, 16–17, 68, 71, 220, 345, 347, communication
361, 365 between the engines 44
automation 111 compliance 20, 82
execution 29 compound document 118–119, 356, 376
management 88 child component 118
system step 71 example use case 119
Business Process Framework (BPF) 83, 156 major version 376
Business Process Management (BPM) 57, 59, 63, parent component 118
68–69, 71, 156, 174 configuration database (CMDB) 248
business Process Management (BPM) 63 configuration document 265
Business Process Manager (BPM) 250 configuration information 194
Business processes 5 configuration item 176, 251, 256
Configuring P8 Content Manager CBR 205
connected isolated region
C process data 45–46
cached content store 109
consistency check utility 312
caching 56
Consistency Checker 317
capacity monitoring 288
constraint
capacity planning 25–26, 32, 65, 68, 73, 385
property value 180
best practices 74
content
capture 82
active content 5, 16–17, 88, 174, 347, 350
cascading deletion 180
active content example 7
catalog database 195
ingestion 68
CBR 99
versioning 14
CEWS endpoint 166
Content Access Recording levels 297
Change management 251
content decomposition 87
changes
content document 265
control 133
Content Engine 44, 62, 74, 157, 159, 171, 322–323
choice list 95–96, 112, 193–194, 266, 268
audit log 286, 315
actual creation 123
checking 325
data type 122
database 28, 30
design recommendation 123
enable tracing 328
folder 122
event subscription model 175
multiple entries 122
hybrid scalability option 36
wizard 122
problem determination 327
choice lists 122
referential integrity mechanisms 169
class type
transactional behavior 171
Java APIs 164
Content Engine (CE) 28–30, 72–73, 103, 154, 156,

198, 200, 218, 220 catalog 195
message logs 283 Content Engine 28, 30
Content Federated Services (CFS) 10, 81 exporting and importing 269
content ingestion row limit 181
sizing questionairs 68 database schema 287
Content Manager database store 109, 194, 196
full deployment 261 database view schema 208
incremental deployment 261 database-based replication 235, 241
content object 25, 28, 56, 86, 93, 115, 186–187, default instance security 145
197, 232 Demilitarized Zone (DMZ) 137
single version 187 deployment 252, 256
content storage 108–109, 197–198, 241 cloning 259
load-balancing capabilities 198 full 261
single logical target 198 incremental 262
content store 96, 109 deployment approach 256
content-based retrieval (CBR) 205 design
control changes 133 document class, based on content 112
coupling 87 document class, based on function 112
CPU utilization document class, based on organization 111
Dashboard 278 impact on performance 328
crash loggging 177
application 217 repository 93
cross repository search 203 repository, bottom up 89
custom application 20, 48, 56, 69, 156–157, 240, repository, interviewing process 92
361 repository, top-down 90
Custom object design methodology 13, 22
class 117–118 design pattern 361, 365, 368
classes characteristic 118 design patterns
custom object 95–96, 117, 181, 205, 207, 261, 266, definition 351
360 development 253
class 117 direct ingestion 367
custom object class Direct Internet Message Encapsulation (DIME) 166
design recommendation 118 directory server 31, 53
custom property 70, 99, 113, 116, 173, 195, 202 directory service 29, 31, 35, 141, 267, 353, 357,
381
different organizational unit 47
D Directory Service Provider
Dashboard 276, 278–279, 281, 323
exporting and importing 270
data
disaster recovery 37, 214, 223, 229–230, 239, 242
access 132
best practices 241
data backup 237
common approaches 237
data integrity 132
disk utilization
data loss 215, 232, 234, 239
Dashboard 278
data model
Display name 94, 99
creating 179
disruption 230
data privacy 133
disruptive event 214
data replication 223
critical business functions 214
data segregation 44–47, 52
distributed system 52–53, 57
database
DNS server entry 240

document reason for 247
revision cycle, example 6 event 126
revision process 14 AddOn 172
state 8 design recommendation 127
document class 19, 86, 110–111, 181, 194, 196, On Add 16
198, 252, 259, 263, 265, 342–344, 362 event action scripts 8
actual creation 113 event subscription
database storage 196 asynchronous 174
design 112 synchronous 174
design based on content 112 event subscription model 175
design based on function 112 explicit object security 146
design based on organization 111 export 267
design recommendation 113 Export security feature 267
document content 111, 196, 200 export sequence 265
document life cycle 125 exporting 258
Document type 126 exporting and importing components 270, 272
document type 188–190, 362, 387
domain 53, 101–102, 206
Domain Name Server (DNS) 241
F
facility management 214
downtime 216
failback 222
avoid 217
failover 221–223
dynamic privacy 133
farm 34, 218, 221
server farm 227
E farming 34
eForm 71, 360, 377 Fax Capture 17–18
EJB transport 157, 159 Fax capture 16
disable transaction propagation 166 fetch 169
good model 179 fetchless instantiation 169
Java API 171 file storage area 241
reverse proxy 179 file store 28, 44, 80, 96, 109, 196–197, 199–200
workable reverse proxy 179 file store device 200
EJB™ transport layer 60 file system 24, 28, 35, 96, 113, 115, 186, 200, 232,
electronic document 86, 119, 340, 356–358 356–357, 363, 379
email 349, 353–354 filed
Email Manager 11, 20–21, 82, 210, 341, 357, 392, folder option 186
397 FileNet Enterprise Manager
e-mail message 187 annotation wizard 123
encapsulation 87 document object properties dialog 119
engine 28 wizard interface 112, 116
engines FileNet Enterprise Manager (FEM) 44, 50, 94, 101,
communication 44 149, 154–155, 157, 194–195, 289–290, 294, 325
enterprise configuration management database enable trace logging 293
253 installation 206
Enterprise Java Bean (EJB) 159 tree view 195
entry template 24, 251, 258, 271 FileNet P8
environment 135, 246 domain 32, 54
testing 254 FileNet P8 Platform
environments 246 planning 258, 264

404 IBM FileNet Content Manager Implementation Best Practices and Recommendations
FileNet System Monitor 283 Distance (HACMP/XD) 236
FileNet System Monitor (FSM) 282 horizontal scalability 33, 228
final time 108, 113 horizontal scaling 33
fixed content store 109, 200 host-based replication 232–233, 235
consideration, vs file store 200 hot site
fixed storage device 200 third party recovery services 237
folder 128, 265
design 116
hierarchy 197
I
IBM Customer Number (ICN) 329
inherited security 191
IBM FileNet
folder class 95–96, 113–114, 116, 164, 181
Content Service 241
actual creation 116
Fax Capture 17
folder hierarchy 191
support side 264
folder object
system capacity planning tool 65
recommendation 128
IBM FileNet P8
folder option
Business Process Manager 5
un-files and filed 186
Content Engine 82
folder structure 186, 188, 191, 251, 263, 356, 358,
Content Federated Service 10
379, 385
documentation 161, 167
best practice 189
eForms 156
full deployment 261
Enterprise Manager (FEM) 44, 50
fulltext
family 28
exporting and importing 270
Forms Manager 9
functional area 111
Image Manager 10
functional design 22–24, 383, 389, 391
Records Manager 12
functional requirement 23
system 65, 79
IBM FileNet P8 Platform 3
G IBM FileNet P8 platform 28
generic object system properties 98 IBM FileNet P8 system 28, 35, 39
geographic cluster manager 236 IBM Metro Mirror (PPRC) 233
geographically-dispersed farm 243 IBM support 329
global cluster manager 236 Image Server 74
Global Configuration Database (GCD) 44, 53, 105 import 267
Global Configuration database (GCD) 29, 32, 94, import sequence 265
97, 172 importing
Globally Unique Identifier (GUID) 262 objects 269
grantee 143 inbound documents 368
GUID 253, 258, 262–263 incremental deployment 262
index 354, 356, 362
index area 206, 241
H Information capture 14, 392
heartbeat 34
information capture 68
help 274
ingestion
hierarchy
content 68
folders 197
content, sizing questionnaires 68
high availability 218–219, 229, 242
direct 367
best practices 240
ingestion rate 4
high availability (HA) 214–215
inheritance
High Availability Cluster Multiprocessing/Extended

object class 180 Lightweight Directory Access Protocol (LDAP) 31,
instantiation hierarchy 129 100, 108
integration 358, 362–363, 366 LineItem 180
integration testing 254 LineItems property 180
integrity lines of business (LOB) 111
data 132 load balance 18
interviewing process load balancer 19–20, 34, 50, 218, 220
repository design 92 layer 50
isolated region 29, 44–46 product 219
virtual IP address 240
load balancing 34, 219, 221
J session-based 220
J2EE application
load testing 254
development 159
load-balanced server farm 218–220
server 30
local area network (LAN) 102–103
J2EE application server 105, 158, 174
log
instance 105
Process Engine 315
message logs 284
log4j 285
vendor 218
setting 285
J2EE container 171, 178
log4j.xml.server 285
J2EE environment 218
logging 177
J2EE servlet container 159
design 177
J2EE specification 157
logs
Java API 80, 156, 159, 182
message logs 283–284
reference material 161
long string 182
Java APIs
class type 164
Java applet 158 M
Java Authentication and Authorization Service maintenance
(JAAS) 31, 150–151, 167–168 best practices summary 316
Java Server Faces (JSF) 29 logs 316
Java Virtual Machine (JVM) 44, 105 maintenance planning 26
Java Virtual Machine (JVM™) 30 major version 381, 393
JDBC interface 173 version
major 15
many-to-many relationship 181
K marking set 127
knowledge base 256
recommendation 128
knowledge worker
maximum downtime 216
business role 92
memory 279
message log
L maintenance
LDAP 134 logs
life cycle maintenance 316
design recommendation 126 message logs 283–284
document 125 Message Transmission Optimization Mechanism
policy 374 (MTOM) 166
Life cycle policy 126 meta information 271
lifecycle 5 metadata class 161, 163

metadata elements multiple fixed content stores 109
organizational 188 multiple object stores 262
minimal disruption 230 name 268
minor version property template folder 121
version repository 206
minor 15 search 196
mission critical system 282 security 45–46, 143
monitoring services component 241
system 320 tab 203
multiple folders 113 underlying database schema 262
multiple locations 128 various objects 271
multi-repository search 53 object store gate 144
multiselect operations 208 object store security 143
object-oriented design (OOD) 86
ObjectStore 161
N object-valued properties 180
NAS replication 233
NetApp SnapLock 200
object-valued property 169–170, 208
On Add event 16
network address translation (NAT) 257
on-line help 274
network device 219
operation
network topology 133
bulk 298
Network utilization
bulk operation 208
Dashboard 278
Oracle RAC 221, 228
Network-Attached Storage (NAS) 232
organizational metadata elements 188
non-functional requirement 23
organizational metadata properties 188

O
object P
P8 Content Manager 154, 156–157
ACL 143
Administration section 206
generic, system properties 98
APIs 165, 177
object class
architect technical role 91–92
inheritance 180
catalog database 195
object gate 144
client 104
object security 146
configuration information 194
object store 28, 44, 80, 94–95, 154, 193–194, 250,
content transaction 187
252–253, 369–370, 387, 392
database view schema 208
actual creation 108
document life cycle 125
administrator 108
folder 113, 115
box population 263
foldering concept 128
configuration 197
help file 208
creation wizard 263
product documentation 163
database 108, 195
release 161
design 106
repository 186–187, 193
design recommendation 108
repository element 88
GUID 253
search 201
import assets 267
search tool 205
initial ACL 108
solution 110, 181, 218
maintenance activity 208
support area 161
multiple file stores 109

update 172 message logs 283
P8 Platform 3 Process Simulator 155
P8 platform 28 properties
P8 system 28, 35, 39 object 100
parent folder 192 property template 95–96, 119, 259
pattern 87, 382 actual creation 122
PDF rendition design recommendation 122
feature record 17 Special considerations 96
peak hours 69 property value
pending change 170 constraint 180
perf_mon 73 PropertyDefinitionString 182
performance 79 PropertyDescriptionString 182
application design impact 328 PropertyTemplateString 182
monitoring 275
trace log 284
troubleshooting tips 327
Q
queries
performance archiver 280
creating and running 287
performance data
report 316
capture 316
using database schema 287
performance issue 40
Query Builder 206–207, 287, 298, 301, 304
performance test 56, 80, 255
Query Builder Script 207
performance testing 254
questionnaires
physical security 133
content ingestion 68
PMR
sizing, user activities 69
open by calling IBM 330
QueueItem table 298
open via Web 330
point-in-time backup 232, 238
policy R
life cycle 374 Real Application Cluster (RAC) 34, 218
security 344, 351, 375 recommendation
post-install script 172 choice list design 123
pre-fetch 56 custom object class design 118
preload 56 document class design 113
preloaded cache 56 event action and subscription design 127
primary function 188 folder design 116
privacy folder object 128
data 133 life cycle design 126
problem 326 marking set 128
isolation 323 object store design 108
Problem Management Record (PMR) 329 property template design 122
Process Designer 156 site design 104
Process Engine 44 virtual server 104
checking 325 Records Crawler 18–19, 83, 340, 358, 385, 392
exporting and importing 270 records management 71
isolated regions 46 records management (RM) 68, 71, 92, 127
log database 315 Records Manager 20, 48, 71, 240, 398
statistics log 286 separate, database object store 210
transaction rates 83 recovery 230, 237
Process Engine (PE) 29, 78, 82, 220, 240, 269 disaster 214, 223

disaster, common approaches 237 process, document 14
Recovery Point Objective (RPO) 230–231 round-trip 162, 169–170
recovery service Content Engine 170
third-party hot site 237 multiple objects 162, 170
recovery site 230–231 round-trips
replacement systems 241 minimizing 169
Recovery Time Objective (RTO) 230–231 route control 8
recursion level 170
Redbooks Web site 400
Redundant Array of Independent Disks (RAID) 215
S
scalability 35, 52
redundant standby system 237
scaling
references
horizontal 33
Web services 268
vertical 35
referential integrity 180
scaling scenario 63
referential integrity mechanisms 169
schema
reflective property 180–181
database 287
Container 181
Scout 65–66, 73
mechanism 180
output 71
regression test 254–255
sample output 72
performance test 255
use cases 67
small suite 255
utilization chart 71
regression testing 254
search 114, 201, 298
relational database (RDMS) 194
cross repository 203
release 250
folder design recommendation 116
release management 250
Workplace 205
release manager 250
search criteria expression 203
Remote Method Invocation (RMI) 166, 179
search criterion 114, 187, 202
Remote Procedure Call (RPC) 278
search paradigm 114
Rendition Engine 29
search server 206
replicated data clusters 223
Search Template 272
replication 232
search template 24, 155, 202, 206, 208, 251, 254,
application-based 232
258, 263
database-based 235–236, 241
searches
host-based 232–233, 235
stored 155
storage-based 233, 235–236
security 31, 306
replication choice 237
default instance 145
report
explicit object security 146
Dashboard 280
inherited, folder 191
report queries 316
object store 143
repository 186–187, 193
security changes 150
design 89, 93
security features 132
design goal 86
security granularity 86
naming standard 93
security policy 15, 208, 344, 351, 375
repository design 25, 85, 88, 98, 253, 255, 356, 363
identifying ID 208
request forwarding 57
security verification 134
restore
server capacity 328
system 308, 312
server cluster 221
reverse proxy 179
active-active 222
revision

active-passive 220 SQL View 207–208
configuration 224 SSO framework 168
software products 224 full discussion 168
server clusters standby system
comparing with server farms 227 redundant 237
server farm 18–19, 217–219, 221 state
essential difference 221 document 8
key enabler 218 static privacy 133
load balancing 219 statistics log 286
load-balanced 218–220 Storage Area
server farms Network 232
comparing with server clusters 227 storage area 28, 95, 113, 124, 197–199
server instance 35, 54–55, 101, 104–105 design 108
Service Level Storage Area Network (SAN) 221, 223
Agreement 215 storage farm 198
Service Level Agreements 282 storage policy 113, 124, 198–199
Service Oriented Architecture (SOA) 159, 162, 249 storage-based replication 233, 235
Service-Oriented Architecture (SOA) 159 database-based replication 236
servlet container 159 emerging specialization 233
session-based load balancing 220 stored search 202
shared infrastructure 25, 27, 43, 49 Stored Search definition 263
shared storage 221 stored searches 155
short string 182 stretch cluster 223
single round-trip 170 string-valued property 182
Content Engine 170 subject matter experts 89
multiple objects 162, 170 subscription
single server event 127
architecture 220 event, design recommendation 127
instance 104 swap death 328
single sign-on (SSO) 167–168 symbolic name 94, 99
site 54, 101 symmetric cluster 225
recommendation 104 symmetric server cluster 225
Site Preference 271 synchronous
sites 103 event subscription 174
sizing 74 synchronous replication 234
disk space 78 system
hardware 78 backup and restore 308
system 72 restore 312
system, user activity questionnaires 69
sizing questionnaires
content ingestion 68 System Capacity Planning Tool 66
sizing system 68 system components
software release manager requiring backup 309
challenges 249 system integration testing 254
software service 28–29 system log
solution building blocks maintenance 286
definition 338 System Manager 275, 281
Scout 74
SQL database 62 Dashboard 276

System Manager Dashboard 72 content 187
System Manager server transaction load
Listener 276 handling 5
system monitoring 320 transaction rate 187
system properties transformation 268
generic object 98 transforming 258
system testing 254 transport 165
transports
comparing 166
T troubleshoot 319
table row 181
troubleshooting
relatively wide areas 182
performance 327
taxonomy 95
performance, tips 327
technical user
pattern 382
technology community 176 U
template un-filed
entry 251, 258, 271 folder option 186
property 119, 259 unit testing 254
search 155, 202, 206, 208, 251, 254, 258, 263, use case 43, 60, 154, 176, 187, 209, 355, 357, 371,
298 383
search template 24 software module 176
test use cases
automation 255 Scout 67
performance 255 user acceptance testing 254
performance test 255 user activities
regression 254–255 sizing questionairs 69
regression test 255 user experience 123, 158
test environment 80 unpleasant aspect 158
testing 253 user interaction 353, 357, 360, 379
regression testing 254 User Preference 271
thick client 158 user-interface component 94
thin application 158 UsesLongColumn 182
threshold 74 utilization 73
toolkit 162–163 Dashboard 278
top-down approach 90
repository design 90
top-level directory 197
V
VBScript 207
topology
verification
cloning 259
security 134
trace log 284
version
maintenance 286
major 381, 393
trace logging 55
versioning 68, 350, 374, 393
enable 293
content 14
tracing
vertical scalability 35
capture SQL syntax 328
vertical scaling 35
transaction
virtual 38
behavior 171
virtual machine 29, 38
client-side 171–172
virtual machine monitor (VMM) 38

virtual memory 328
virtual private network (VPN) 257
virtual server 39, 50, 53–54, 93, 101, 104–105,
206, 218, 221
horizontal scaling 39
recommendation 104
virtualization 25, 35–36, 52
operating system-level 39
VMWare 257

W
WAN 58
Web service 157, 160, 167, 268, 358, 360
common implementation technology 160
external references 268
Web services description language (WSDL) 162
Web Services Extensible Authentication Framework
(WS-EAF) 167
Web Site Voice 123
wide area network (WAN) 102–103, 211
wizard display 108, 112
workflow 7, 11, 16–17, 155–156, 338, 347, 350
workflow activity 45, 174
workflow definition 155, 265, 268
workflow management
requirement 24
workflow step 16
workload 74
modeling 66
workload modeling 73
Workplace 44, 271–272, 325
exporting and importing 272
Workplace search 205
CBR feature 205
Workplace XT 155, 157, 379
WS transport 165–166, 179
WSDL file 162–163

X
XML 263, 278
XML file 4, 19, 265
XML manifest file 267

Back cover

IBM FileNet Content Manager
Implementation Best Practices and Recommendations

Use system architecture, capacity planning, and business continuity
Design the repository, security, application, and solution
Learn to deploy, administer, and maintain

IBM FileNet Content Manager provides full content life cycle and extensive document management capabilities for digital content. IBM FileNet Content Manager is tightly integrated with the family of IBM FileNet P8 products and serves as the core content management, security management, and storage management engine for the IBM FileNet P8 family of products.

This IBM Redbooks publication covers the implementation best practices and recommendations for IBM FileNet Content Manager solutions. It introduces the functions and features of IBM FileNet Content Manager, common use cases of the product, and a design methodology that provides implementation guidance from requirements analysis through deployment and administration planning.

The book addresses various implementation topics, including system architecture design, capacity planning, business continuity, repository design, security, and application design. Administrative topics covered include deployment, system administration and maintenance, and troubleshooting. We also discuss solution building blocks that you can specify and combine to build a solution.

This book is intended to be used in conjunction with the product manual and online help to provide guidance to architects and designers about implementing IBM FileNet Content Manager solutions.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers, and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

SG24-7547-00        ISBN 0738485829