
TRANSFORMATION GATEWAY

Optimizing EMC Documentum:
Best Practices for Deployment

Ed Bueché
EMC Distinguished Engineer

Agenda

• Documentum D7 Performance & Availability Enhancements


• Documentum xPlore HA / DR best-practices

Documentum D7 Performance & Availability Enhancements
• Session pooling improvements
• Rolling upgrade support
• Dynamic Capacity Allocation
Session pooling background:
How “sessions” can be shared

[Timeline diagram: a session for user 1 and a session for user 2 over time; each user’s brief interactions with the server are separated by long random pauses]

Session pooling background:
How “sessions” can be shared (continued)

[Timeline diagram: the interactions of user 1 and user 2 shown sharing a session over time]

Session pooling background:
Operating Context within Documentum

[Architecture diagram: users (browsers) → HTTP server → application server (web tier, DFC) → Content Server → RDBMS; discrete user interactions arrive at the web tier, which serves them from pooled DFC sessions to the Content Server]
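To make the pooled-session pattern at the web tier concrete, the sketch below shows the common DFC idiom of borrowing a session from a session manager for one discrete interaction and releasing it immediately. This is a minimal sketch; the repository name and credentials are placeholders.

```java
import com.documentum.com.DfClientX;
import com.documentum.fc.client.IDfClient;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfLoginInfo;
import com.documentum.fc.common.IDfLoginInfo;

public class PooledSessionExample {
    public static void main(String[] args) throws Exception {
        String repository = "my_repository";              // placeholder repository name

        IDfClient client = new DfClientX().getLocalClient();
        IDfSessionManager sessionMgr = client.newSessionManager();

        IDfLoginInfo login = new DfLoginInfo();            // placeholder credentials
        login.setUser("dmadmin");
        login.setPassword("changeme");
        sessionMgr.setIdentity(repository, login);

        // One discrete interaction: borrow a session, use it, return it to the pool.
        IDfSession session = sessionMgr.getSession(repository);
        try {
            System.out.println("Connected to: " + session.getServerVersion());
        } finally {
            sessionMgr.release(session);                   // frees the session for other users
        }
    }
}
```

Because the session is released after each interaction, a user’s long pauses between requests do not tie up a Content Server session, which is what allows many users to share a small pool.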
Session pooling changes for D7

• Session pooling context switching improved to be significantly faster
  – In some simple scenarios, 100x faster
• Pool session replacement is now LRU
• Session request queuing
  – Bursty user behavior doesn’t lead to DB connection run-up, but rather to a delay in getting a session
  – Huge benefit for massively large environments

[Chart: multi-user test with 200 users, session count over time for D6.7 vs. a D7.0 prototype; lower is better]
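The behavior described above (a bounded session count, LRU reuse/replacement, and queuing of bursty requests) can be pictured with a generic bounded pool like the sketch below. This is an illustrative model only, not the actual D7 implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative model only (not the D7 code): a bounded pool in which bursty
// requests queue for a session instead of driving up DB connections, and idle
// sessions are reused or replaced in least-recently-used order.
public class BoundedSessionPool {
    private final Semaphore permits;                       // caps total sessions
    private final Deque<String> idle = new ArrayDeque<>(); // head = most recently used

    public BoundedSessionPool(int maxSessions) {
        this.permits = new Semaphore(maxSessions, true);   // fair: callers queue in arrival order
    }

    /** Blocks (i.e., queues the request) until a session is free or the timeout expires. */
    public String acquire(long timeout, TimeUnit unit) throws InterruptedException {
        if (!permits.tryAcquire(timeout, unit)) {
            throw new IllegalStateException("timed out waiting for a pooled session");
        }
        synchronized (idle) {
            String s = idle.pollFirst();                   // reuse the most recently used idle session
            return (s != null) ? s : openNewSession();
        }
    }

    public void release(String session) {
        synchronized (idle) {
            idle.addFirst(session);                        // MRU to the front; LRU drifts to the tail
        }
        permits.release();
    }

    /** When the pool must replace a session (e.g., for a different identity), evict the LRU one. */
    public String evictLeastRecentlyUsed() {
        synchronized (idle) {
            return idle.pollLast();
        }
    }

    private String openNewSession() {
        return "session-" + System.nanoTime();             // stand-in for a real Content Server session
    }
}
```

The key design point mirrored here is that the pool size, not the burst size, bounds the number of database connections; excess demand shows up as wait time rather than as connection run-up.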
Continuous Availability Background:
A Comparison of Release Types

| Item | Patch Release | Service Pack | Minor Release | Major Release |
| Can back out software changes (without restore) | | | | |
| Provides new functionality | | | | |
| Potentially data model changes | | | | |

Legend (per cell): Yes · Yes, but to a small degree · No, but historically, some small low-risk deviations have happened · No
D7 Rolling upgrade

• D7 target: rolling upgrade of patch releases
  – Still under evaluation: rolling upgrade of service packs and minor releases

• Assumptions and caveats apply
  – Architectural deployment in “server pods”
  – Systems properly sized
  – Load balancers likely required to isolate user traffic
  – Tests and limitations to be published
D7 Dormant Mode Feature
• A state that privileged users (Data Center Managers) can set so that a server component becomes read-only
  – Content Server enforced restrictions
    • No changes to the database (metadata) or to the file system (content files)
    • Agent exec won’t process jobs
    • No methods can be launched
    • No SQL passthrough (i.e., the execsql call)
    • No audit trail or event notification generation
    • Existing open transactions can make changes until committed or aborted
    • No new connections
      o Dormancy can be set on the connection only, for load balancing
      o DFC will avoid connecting to dormant servers
  – JMS enforced restrictions
    • No HTTP POSTs allowed
  – xPlore enforced restrictions
    • No content is indexed

• Data Center Managers can always connect and enable updates for their own sessions while in the dormant state, if needed
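As an illustration of what “dormant mode awareness” could look like in an application, the sketch below degrades to a read-only path when no writable session can be obtained. The deck only states that dormant servers refuse new connections and that DFC avoids them, so the detection-and-fallback logic here is a hypothetical sketch, not the documented API for dormancy.

```java
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfException;

// Hypothetical sketch of application "dormant mode awareness": if no writable
// session can be obtained (for example because the reachable Content Servers are
// dormant and refuse new connections), fall back to a read-only code path instead
// of failing the user's request outright.
public class DormantAwareOperation {
    private final IDfSessionManager sessionMgr;
    private final String repository;

    public DormantAwareOperation(IDfSessionManager sessionMgr, String repository) {
        this.sessionMgr = sessionMgr;
        this.repository = repository;
    }

    /** Returns true if the update ran; false if we degraded to the read-only fallback. */
    public boolean tryUpdate(Runnable update, Runnable readOnlyFallback) {
        IDfSession session = null;
        try {
            session = sessionMgr.getSession(repository);
            update.run();                       // the write work would use the session here
            return true;
        } catch (DfException e) {
            readOnlyFallback.run();             // treat the refusal as dormancy (hypothetical)
            return false;
        } finally {
            if (session != null) {
                sessionMgr.release(session);
            }
        }
    }
}
```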
Rolling Upgrade Procedure for Patches

• Upgrade components without shutting down the whole system
  – Assumption: the patch can be backed out
    • No state change that the previous software can’t handle

• Procedure:
  1. In the server cluster, set the dormant state on one server
  2. Shut it down once its sessions have terminated
  3. Load the new software
  4. Restart with the new binary
  5. Repeat the same process until all servers are upgraded
Potential Procedure for Patch upgrade: Example (1)

[Diagram: application server (DFC), docbroker, and three Content Servers]
#1 – The app server creates a new session and refreshes its active content server map (via DFC)
#2 – The docbroker provides the list of “active” servers
#3 – A session is established with one of the active Content Servers
Potential Procedure for Patch upgrade: Example (2)

[Diagram: application server (DFC), docbroker, and three Content Servers]
#1 – The administrator directs one Content Server instance to go into “Dormant Mode”
#2 – That Content Server informs the docbroker that it is now in “dormant mode”; the docbroker will no longer send any sessions to that Content Server
#3 – Sessions are informed of the “dormant” state
Potential Procedure for Patch upgrade: Example (3)

[Diagram: application server (DFC), docbroker, and three Content Servers]
The dormant Content Server instance can now be upgraded and later brought back into the “active” state
Potential Procedure for Service Pack Upgrade
• Technologies
  – Storage snapshot mechanism (e.g., EMC TimeFinder)
  – Documentum D7 Dormant mode
  – Application “Dormant mode awareness”

• Procedure
  1. Place the production system into Dormant mode
  2. Have it operate on a read-only, point-in-time snapshot of the filesystems
  3. Create a writable snapshot copy
  4. Perform the upgrade on the writable snapshot copy
  5. If the upgrade succeeds, move users to the upgraded copy and discard the read-only point-in-time snapshot
  6. If the upgrade fails, discard the writable snapshot copy, bring the point-in-time snapshot back into write mode, and return the dormant production environment to normal mode
Potential procedure for upgrade with service packs: illustration

[Diagram: Documentum 7 (CS, xPlore, DB) running on DB, content, and fulltext storage]

Step #1: Place the production system into Dormant mode
Step #2: Have it operate on a read-only, point-in-time snapshot of the filesystems
Step #3: Create a writable snapshot copy
Step #4: Upgrade on the snapshot copy

[Diagram: the Documentum 7 system (CS, xPlore, DB) in Dormant mode only sees the original data (DB, content, fulltext), while the upgraded D7 copy sees the new data changes]
Step #5: If the upgrade succeeds, move users to the upgraded copy and discard the read-only point-in-time snapshot

[Diagram: the original snapshot delta data is discarded; the upgraded D7 (CS, xPlore, DB) sees the new data changes in the DB, content, and fulltext storage]
Agenda

• Documentum D7 Performance & Availability Enhancements


• Documentum xPlore HA / DR best-practices
Documentum xPlore HA/DR best-practices

• Tip #1: Establish RTO / RPO for Deployment


• Tip #2: “Recovery with Re-feed” an option for small
repositories only
• Tip #3: xPlore repair tools useful, but not sufficient to
achieve RTO
• Tip #4: HA with Direct-Attached SANs
• Tip #5: ftintegrity use in Point-in-time recovery
• Tip #6: Know differences between FAST and xPlore
HA/DR techniques
Tip #1: Establish RTO / RPO for Deployment
• The RTO is the target time to restore the system into service
  – It is typically a “Service Level Agreement” made by the IT group to the business users
  – Goal of the underlying software: enable IT’s ability to meet it
  – Tools: failover & disaster recovery
  – The RTO could be defined in terms of complete system availability or partial system availability

• The more “mission critical” the system is, the shorter the RTO will be
Example of different RTOs for different “services” of Documentum

| Component Failure | Service | Example RTO |
| Content Server | Document viewing, checkin, checkout | Within minutes |
| DTS | Ability to transform documents | 12 hours |
| FAST or xPlore | Search | 2 hours |
| BAM | Business Activity Monitoring | 24 hours |
Recovery Point Objective (RPO)
• RPO defines the amount of data that can be lost in a failure
  – Defined in terms of minutes, hours, or days
• A short RPO (hours/minutes) typically implies either very frequent incremental backups or complete duplicate systems
• For some Documentum components (like xPlore, the new full-text search), RPO is always defined in terms of the RPO of the Content Server
  – Any data “lost” in a failure on xPlore can be re-fed from the Content Server
  – Backing up xPlore or duplicating the processing is always meant as a way to shorten the RTO, not to achieve an RPO

[Timeline diagram: the data potentially lost in a crash is the window between the point of the last backup and the point of the crash (disk corruption, for example)]
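For example, if the last successful backup completed at 02:00 and a disk corruption occurs at 14:00, up to 12 hours of updates are at risk; meeting a one-hour RPO would therefore require much more frequent incremental backups or a continuously updated duplicate system.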
Types of possible failures for an xPlore deployment
Establishing a procedure to meet an RTO is a healthy exercise in pessimism

| Failure Type | Synopsis | Repair strategies |
| Operating Environment | OS crash, power loss, JVM crash, CPS crash, any software failure, out-of-memory situations | Recycle the software. Re-feed any documents that didn’t get indexed properly |
| Server Hardware Failure | Physical hardware problem (memory, CPU, etc.) | Restart the VM on alternate hardware or a physical backup server |
| Logical Data Inconsistency | Rare logical inconsistency | Restore from backup (and potentially use xPlore rebuild/repair tools) |
| Physical Disk Corruption | Typical cause: NAS protocol issue | Restore from backup (and potentially use xPlore rebuild/repair tools) |
Tip #2: Recovery with Re-feed an option for small
repositories only
• Re-feeding the data is always possible
– Highly optimistic approach
– Does not have much of an ongoing operational cost
• However, unless the repository is small, it is unlikely to meet the established RTO
• Recommendation:
– At least take a full backup each week
xPlore Disaster Recovery Strategy Comparison

| Strategy | Normal-mode operational expense | Recovery time | Comment |
| Re-feed | $ | Days to weeks | Should only be used for very small docbases |
| Full restore from backup | $$ | Hours | Backup can be from a hot backup |
| Fail over to another xPlore system (dual mode) | $$$$ | Fastest (possibly within seconds) | Requires duplicate hardware. But may also require one of the above to bring the failed primary back online. |
Tip #3: xPlore repair tools useful, but not
sufficient to achieve RTO
• Tools are available with xPlore to fix various logical
and physical inconsistencies
– Between Content Server and xPlore
– Between xDB in xPlore and Lucene index
• The tools have the advantage of being surgical and
fast
– However, these tools are not guaranteed to work in all
cases
• Customers are advised to establish procedures to
meet an RTO based on tested failover and
backup/restore mechanisms
Tip #4: HA with Direct-Attached SANs

• Background on xPlore storage & multi-node
• A SAN-based multi-node deployment
• Technical notes
Physical Layout

An XML document (e.g., DFTXML) can be thought of as a collection of elements and attributes (or “XML nodes”). This node structure can be represented as a tree.

[Diagram: a tree of nodes A–E mapped onto database pages]
xDB concepts
• xDB Library ≈ xPlore Collection
– Logical and Physical container for other libraries and/or
XML Documents
– Hierarchical in structure
– Can be associated with its own physical storage segment
(file / file system)
• Query Processing over collections
– Without qualification, queries from the root library
proceed to examine all libraries / collections
– This process is made faster by creating some indexes that
are scoped over the entire library
Libraries / Collections & Indexes

[Diagram legend: xDB library ≈ ESS collection; xDB index; xDB XML file (dftxml, tracking XML, status, metrics, audit); xDB segment]

[Diagram: libraries A, B, and C with their indexes and segments; the scope of an index covers all XML files in all sub-libraries]
xPlore Collections at a Glance

[Diagram: the xDB Root Library contains SystemData, dsearchConfig, DocbaseName-1, DocbaseName-2, and DocbaseName-3]

• xPlore supports consolidated or multi-tenant environments within a single instance
• Each Docbase / Tenant’s data is a child of the Root Library
xPlore Collections at a Glance (continued)
• Each Docbase / Tenant has three major collections
  – ApplicationInfo: currently the ACL and group collections
  – SystemInfo: used by the indexing pipelines to locate or track the indexing progress of documents
  – Data: holds the DFTXML for each document in Documentum

[Diagram: xDB Root Library → SystemData and DocbaseName → dsearch → ApplicationInfo, SystemInfo, Data]
Scope of an xPlore Domain

[Diagram: under the xDB Root Library (SystemData, Docbase-1, Docbase-2), the scope of a domain is one docbase’s entire subtree: its dsearch library, ACLs/Groups, and Data collections (C1, C2 / C3, C4)]
Scope of an xPlore Collection

[Diagram: within a docbase’s Data library, the scope of a collection is a single collection (e.g., C1) rather than the whole domain]
Each xPlore Instance (Node) “owns” whole domains or collections, plus a transaction log

[Diagram: the logical structure is mapped to physical files; the data “owned” by an instance (node) consists of the xDB transaction log plus the data and indexes for its domains and collections]
Multi-Instance (multi-node) and data ownership

• Each instance (or node) “owns” a portion of the data (at the domain or collection level)
• The instance’s transaction log is used during recovery for the data on that instance
Multi-Instance (multi-node) and SANs

• Each host has SAN access to its own data and cross-mounts it via NFS (or CIFS) to the other instances in the multi-node deployment for low-intensity operations
• Best performance for high-capacity local traffic

[Diagram: Host A and Host B, each with direct SAN access to its own data and network cross-mounts to the other]
Multi-Instance (multi-node) and NAS

• A NAS implementation is similar, but all access is through the NAS protocol (high and low volume)
• Again, all else being equal, the SAN architecture will perform better

[Diagram: Host A and Host B both accessing all data through the NAS]
Sharing Needs and Usage

| Item to share | Use-case | Notes |
| indexserverconfig.xml | Shared and owned by the primary to all secondary nodes | Light on network usage |
| Collection | Created by the primary on storage that is SAN-attached to the secondary, and then later logically “bound” to the secondary | Light on network usage when just created on the primary and bound right away to the secondary |
| Collections / logs | Full native backup | Heavy on read usage, as all blocks are copied over the network |
Notes on SAN-based multi-node

• The spare-node HA mechanism doesn’t work, but alternatives include
  – Active/passive clusters (like Microsoft Cluster Server)
  – VMware vMotion
  – An active/active dual system

• Load balancing (rebinding) of collections to any node is difficult
  – But possible
Tip #5: ftintegrity use in Point-in-time recovery

• ftintegrity defined
  – A tool used to identify any documents in the Content Server that did not get properly indexed into xPlore

• Default behavior
  – Looks at all objects in the repository
  – This can be slow for large repositories

• Date-range option
  – Can be used to check consistency for objects created or modified within a range
  – Parameters: start and end dates
  – Can significantly cut down on the number of objects that have to be examined
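For example, after restoring xPlore from a backup taken at a known point in time, ftintegrity can be run with the start date set to that backup time and the end date set to the recovery time, so only objects created or modified in that window need to be compared and, where necessary, re-fed.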
Tip #6: Know differences between FAST and xPlore HA/DR techniques

| Use-case | FAST | xPlore |
| Backup | Cold backup, but could back up only the fixml | Hot backups: full & incremental; warm backups: collection-level & domain-level |
| Restore | Restore from fixml and ftintegrity point-in-time recovery | Restore from backup and ftintegrity point-in-time recovery |
| Spare Node HA | Not supported | Supported |
| Active / Passive HA | Not supported | Supported |
| Active / Active dual system | Supported | Supported |
• Questions?
THANK YOU
This presentation is also available at
www.momentumeurope.com
password: spree
