
TRANSFORMATION GATEWAY

Optimizing EMC Documentum:
Best Practices for Deployment

Ed Bueché
EMC Distinguished Engineer

Agenda

• Documentum D7 Performance & Availability Enhancements


• Documentum xPlore HA / DR best-practices

Documentum D7 Performance & Availability Enhancements
• Session pooling improvements
• Rolling upgrade support
• Dynamic Capacity Allocation
Session pooling background:
How “sessions” can be shared

[Timeline diagram: a session for user 1 and a session for user 2 over time; each user’s brief interactions with the server are separated by long random pauses]

Session pooling background:
How “sessions” can be shared (continued)

[Timeline diagram: the interactions of user 1 and user 2 shown sharing a session over time]

Session pooling background:
Operating Context within Documentum

[Architecture diagram: users (browsers) → HTTP server → application server (web tier, DFC) → Content Server → RDBMS; discrete user interactions arrive at the web tier, which serves them from pooled DFC sessions to the Content Server]
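To make the pooled-session pattern at the web tier concrete, the sketch below shows the common DFC idiom of borrowing a session from a session manager for one discrete interaction and releasing it immediately. This is a minimal sketch; the repository name and credentials are placeholders.

```java
import com.documentum.com.DfClientX;
import com.documentum.fc.client.IDfClient;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfLoginInfo;
import com.documentum.fc.common.IDfLoginInfo;

public class PooledSessionExample {
    public static void main(String[] args) throws Exception {
        String repository = "my_repository";              // placeholder repository name

        IDfClient client = new DfClientX().getLocalClient();
        IDfSessionManager sessionMgr = client.newSessionManager();

        IDfLoginInfo login = new DfLoginInfo();            // placeholder credentials
        login.setUser("dmadmin");
        login.setPassword("changeme");
        sessionMgr.setIdentity(repository, login);

        // One discrete interaction: borrow a session, use it, return it to the pool.
        IDfSession session = sessionMgr.getSession(repository);
        try {
            System.out.println("Connected to: " + session.getServerVersion());
        } finally {
            sessionMgr.release(session);                   // frees the session for other users
        }
    }
}
```

Because the session is released after each interaction, a user’s long pauses between requests do not tie up a Content Server session, which is what allows many users to share a small pool.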
Session pooling changes for D7

• Session pooling context switching improved to be significantly faster
  – In some simple scenarios, 100x faster
• Pool session replacement is now LRU
• Session request queuing
  – Bursty user behavior doesn’t lead to DB connection run-up, but rather to a delay in getting a session
  – Huge benefit for massively large environments

[Chart: multi-user test with 200 users, session count over time for D6.7 vs. a D7.0 prototype; lower is better]
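The behavior described above (a bounded session count, LRU reuse/replacement, and queuing of bursty requests) can be pictured with a generic bounded pool like the sketch below. This is an illustrative model only, not the actual D7 implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative model only (not the D7 code): a bounded pool in which bursty
// requests queue for a session instead of driving up DB connections, and idle
// sessions are reused or replaced in least-recently-used order.
public class BoundedSessionPool {
    private final Semaphore permits;                       // caps total sessions
    private final Deque<String> idle = new ArrayDeque<>(); // head = most recently used

    public BoundedSessionPool(int maxSessions) {
        this.permits = new Semaphore(maxSessions, true);   // fair: callers queue in arrival order
    }

    /** Blocks (i.e., queues the request) until a session is free or the timeout expires. */
    public String acquire(long timeout, TimeUnit unit) throws InterruptedException {
        if (!permits.tryAcquire(timeout, unit)) {
            throw new IllegalStateException("timed out waiting for a pooled session");
        }
        synchronized (idle) {
            String s = idle.pollFirst();                   // reuse the most recently used idle session
            return (s != null) ? s : openNewSession();
        }
    }

    public void release(String session) {
        synchronized (idle) {
            idle.addFirst(session);                        // MRU to the front; LRU drifts to the tail
        }
        permits.release();
    }

    /** When the pool must replace a session (e.g., for a different identity), evict the LRU one. */
    public String evictLeastRecentlyUsed() {
        synchronized (idle) {
            return idle.pollLast();
        }
    }

    private String openNewSession() {
        return "session-" + System.nanoTime();             // stand-in for a real Content Server session
    }
}
```

The key design point mirrored here is that the pool size, not the burst size, bounds the number of database connections; excess demand shows up as wait time rather than as connection run-up.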
Continuous Availability Background:
A Comparison of Release Types

| Item | Patch Release | Service Pack | Minor Release | Major Release |
| Can back out software changes (without restore) | | | | |
| Provides new functionality | | | | |
| Potentially data model changes | | | | |

Legend (per cell): Yes · Yes, but to a small degree · No, but historically, some small low-risk deviations have happened · No
D7 Rolling upgrade

• D7 target: rolling upgrade of patch releases
  – Still under evaluation: rolling upgrade of service packs and minor releases

• Assumptions and caveats apply
  – Architectural deployment in “server pods”
  – Systems properly sized
  – Load balancers likely required to isolate user traffic
  – Tests and limitations to be published
D7 Dormant Mode Feature
• A state that privileged users (Data Center Managers) can set so that a server component becomes read-only
  – Content Server enforced restrictions
    • No changes to the database (metadata) or to the file system (content files)
    • Agent exec won’t process jobs
    • No methods can be launched
    • No SQL passthrough (i.e., the execsql call)
    • No audit trail or event notification generation
    • Existing open transactions can make changes until committed or aborted
    • No new connections
      o Dormancy can be set on the connection only, for load balancing
      o DFC will avoid connecting to dormant servers
  – JMS enforced restrictions
    • No HTTP POSTs allowed
  – xPlore enforced restrictions
    • No content is indexed

• Data Center Managers can always connect and enable updates for their own sessions while in the dormant state, if needed
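As an illustration of what “dormant mode awareness” could look like in an application, the sketch below degrades to a read-only path when no writable session can be obtained. The deck only states that dormant servers refuse new connections and that DFC avoids them, so the detection-and-fallback logic here is a hypothetical sketch, not the documented API for dormancy.

```java
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfException;

// Hypothetical sketch of application "dormant mode awareness": if no writable
// session can be obtained (for example because the reachable Content Servers are
// dormant and refuse new connections), fall back to a read-only code path instead
// of failing the user's request outright.
public class DormantAwareOperation {
    private final IDfSessionManager sessionMgr;
    private final String repository;

    public DormantAwareOperation(IDfSessionManager sessionMgr, String repository) {
        this.sessionMgr = sessionMgr;
        this.repository = repository;
    }

    /** Returns true if the update ran; false if we degraded to the read-only fallback. */
    public boolean tryUpdate(Runnable update, Runnable readOnlyFallback) {
        IDfSession session = null;
        try {
            session = sessionMgr.getSession(repository);
            update.run();                       // the write work would use the session here
            return true;
        } catch (DfException e) {
            readOnlyFallback.run();             // treat the refusal as dormancy (hypothetical)
            return false;
        } finally {
            if (session != null) {
                sessionMgr.release(session);
            }
        }
    }
}
```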
Rolling Upgrade Procedure for Patches

• Upgrade components without shutting down the whole system
  – Assumption: the patch can be backed out
    • No state change that the previous software can’t handle

• Procedure:
  1. In the server cluster, set the dormant state on one server
  2. Shut it down once its sessions have terminated
  3. Load the new software
  4. Restart with the new binary
  5. Repeat the same process until all servers are upgraded
Potential Procedure for Patch upgrade: Example (1)

[Diagram: application server (DFC), docbroker, and three Content Servers]
#1 – The app server creates a new session and refreshes its active content server map (via DFC)
#2 – The docbroker provides the list of “active” servers
#3 – A session is established with one of the active Content Servers
Potential Procedure for Patch upgrade: Example (2)

[Diagram: application server (DFC), docbroker, and three Content Servers]
#1 – The administrator directs one Content Server instance to go into “Dormant Mode”
#2 – That Content Server informs the docbroker that it is now in “dormant mode”; the docbroker will no longer send any sessions to that Content Server
#3 – Sessions are informed of the “dormant” state
Potential Procedure for Patch upgrade: Example (3)

[Diagram: application server (DFC), docbroker, and three Content Servers]
The dormant Content Server instance can now be upgraded and later brought back into the “active” state
Potential Procedure for Service Pack Upgrade
• Technologies
  – Storage snapshot mechanism (e.g., EMC TimeFinder)
  – Documentum D7 Dormant mode
  – Application “Dormant mode awareness”

• Procedure
  1. Place the production system into Dormant mode
  2. Have it operate on a read-only, point-in-time snapshot of the filesystems
  3. Create a writable snapshot copy
  4. Perform the upgrade on the writable snapshot copy
  5. If the upgrade succeeds, move users to the upgraded copy and discard the read-only point-in-time snapshot
  6. If the upgrade fails, discard the writable snapshot copy, bring the point-in-time snapshot back into write mode, and return the dormant production environment to normal mode
Potential procedure for upgrade with service packs: illustration

[Diagram: Documentum 7 (CS, xPlore, DB) running on DB, content, and fulltext storage]

Step #1: Place the production system into Dormant mode
Step #2: Have it operate on a read-only, point-in-time snapshot of the filesystems
Step #3: Create a writable snapshot copy
Step #4: Upgrade on the snapshot copy

[Diagram: the Documentum 7 system (CS, xPlore, DB) in Dormant mode only sees the original data (DB, content, fulltext), while the upgraded D7 copy sees the new data changes]
Step #5: If the upgrade succeeds, move users to the upgraded copy and discard the read-only point-in-time snapshot

[Diagram: the original snapshot delta data is discarded; the upgraded D7 (CS, xPlore, DB) sees the new data changes in the DB, content, and fulltext storage]
Agenda

• Documentum D7 Performance & Availability Enhancements


• Documentum xPlore HA / DR best-practices
Documentum xPlore HA/DR best-practices

• Tip #1: Establish RTO / RPO for Deployment


• Tip #2: “Recovery with Re-feed” an option for small
repositories only
• Tip #3: xPlore repair tools useful, but not sufficient to
achieve RTO
• Tip #4: HA with Direct-Attached SANs
• Tip #5: ftintegrity use in Point-in-time recovery
• Tip #6: Know differences between FAST and xPlore
HA/DR techniques
Tip #1: Establish RTO / RPO for Deployment
• The RTO is the target time to restore the system into service
  – It is typically a “Service Level Agreement” made by the IT group to the business users
  – Goal of the underlying software: enable IT’s ability to meet it
  – Tools: failover & disaster recovery
  – The RTO could be defined in terms of complete system availability or partial system availability

• The more “mission critical” the system is, the shorter the RTO will be
Example of different RTOs for different “services” of Documentum

| Component Failure | Service | Example RTO |
| Content Server | Document viewing, checkin, checkout | Within minutes |
| DTS | Ability to transform documents | 12 hours |
| FAST or xPlore | Search | 2 hours |
| BAM | Business Activity Monitoring | 24 hours |
Recovery Point Objective (RPO)
• RPO defines the amount of data that can be lost in a failure
  – Defined in terms of minutes, hours, or days
• A short RPO (hours/minutes) typically implies either very frequent incremental backups or complete duplicate systems
• For some Documentum components (like xPlore, the new full-text search), RPO is always defined in terms of the RPO of the Content Server
  – Any data “lost” in a failure on xPlore can be re-fed from the Content Server
  – Backing up xPlore or duplicating the processing is always meant as a way to shorten the RTO, not to achieve an RPO

[Timeline diagram: the data potentially lost in a crash is the window between the point of the last backup and the point of the crash (disk corruption, for example)]
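For example, if the last successful backup completed at 02:00 and a disk corruption occurs at 14:00, up to 12 hours of updates are at risk; meeting a one-hour RPO would therefore require much more frequent incremental backups or a continuously updated duplicate system.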
Types of possible failures for an xPlore deployment
Establishing a procedure to meet an RTO is a healthy exercise in pessimism

| Failure Type | Synopsis | Repair strategies |
| Operating Environment | OS crash, power loss, JVM crash, CPS crash, any software failure, out-of-memory situations | Recycle the software. Re-feed any documents that didn’t get indexed properly |
| Server Hardware Failure | Physical hardware problem (memory, CPU, etc.) | Restart the VM on alternate hardware or a physical backup server |
| Logical Data Inconsistency | Rare logical inconsistency | Restore from backup (and potentially use xPlore rebuild/repair tools) |
| Physical Disk Corruption | Typical cause: NAS protocol issue | Restore from backup (and potentially use xPlore rebuild/repair tools) |
Tip #2: Recovery with Re-feed an option for small
repositories only
• Re-feeding the data is always possible
– Highly optimistic approach
– Does not have much of an ongoing operational cost
• However, unless the repository is small, it is unlikely to meet the established RTO
• Recommendation:
– At least take a full backup each week
xPlore Disaster Recovery Strategy Comparison

| Strategy | Normal-mode operational expense | Recovery time | Comment |
| Re-feed | $ | Days to weeks | Should only be used for very small docbases |
| Full restore from backup | $$ | Hours | Backup can be from a hot backup |
| Fail over to another xPlore system (dual mode) | $$$$ | Fastest (possibly within seconds) | Requires duplicate hardware. But may also require one of the above to bring the failed primary back online. |
Tip #3: xPlore repair tools useful, but not
sufficient to achieve RTO
• Tools are available with xPlore to fix various logical
and physical inconsistencies
– Between Content Server and xPlore
– Between xDB in xPlore and Lucene index
• The tools have the advantage of being surgical and
fast
– However, these tools are not guaranteed to work in all
cases
• Customers are advised to establish procedures to
meet an RTO based on tested failover and
backup/restore mechanisms
Tip #4: HA with Direct-Attached SANs

• Background on xPlore storage & multi-node
• A SAN-based multi-node deployment
• Technical notes
Physical Layout

An XML document (e.g., DFTXML) can be thought of as a collection of elements and attributes (or “XML nodes”). This node structure can be represented as a tree.

[Diagram: a tree of nodes A–E mapped onto database pages]
xDB concepts
• xDB Library ≈ xPlore Collection
– Logical and Physical container for other libraries and/or
XML Documents
– Hierarchical in structure
– Can be associated with its own physical storage segment
(file / file system)
• Query Processing over collections
– Without qualification, queries from the root library
proceed to examine all libraries / collections
– This process is made faster by creating some indexes that
are scoped over the entire library
Libraries / Collections & Indexes

[Diagram legend: xDB library ≈ ESS collection; xDB index; xDB XML file (dftxml, tracking XML, status, metrics, audit); xDB segment]

[Diagram: libraries A, B, and C with their indexes and segments; the scope of an index covers all XML files in all sub-libraries]
xPlore Collections at a Glance

[Diagram: the xDB Root Library contains SystemData, dsearchConfig, DocbaseName-1, DocbaseName-2, and DocbaseName-3]

• xPlore supports consolidated or multi-tenant environments within a single instance
• Each Docbase / Tenant’s data is a child of the Root Library
xPlore Collections at a Glance (continued)
• Each Docbase / Tenant has three major collections
  – ApplicationInfo: currently the ACL and group collections
  – SystemInfo: used by the indexing pipelines to locate or track the indexing progress of documents
  – Data: holds the DFTXML for each document in Documentum

[Diagram: xDB Root Library → SystemData and DocbaseName → dsearch → ApplicationInfo, SystemInfo, Data]
Scope of an xPlore Domain

[Diagram: under the xDB Root Library (SystemData, Docbase-1, Docbase-2), the scope of a domain is one docbase’s entire subtree: its dsearch library, ACLs/Groups, and Data collections (C1, C2 / C3, C4)]
Scope of an xPlore Collection

[Diagram: within a docbase’s Data library, the scope of a collection is a single collection (e.g., C1) rather than the whole domain]
Each xPlore Instance (Node) “owns” whole domains or collections, plus a transaction log

[Diagram: the logical structure is mapped to physical files; the data “owned” by an instance (node) consists of the xDB transaction log plus the data and indexes for its domains and collections]
Multi-Instance (multi-node) and data ownership

• Each instance (or node) “owns” a portion of the data (at the domain or collection level)
• The instance’s transaction log is used during recovery for the data on that instance
Multi-Instance (multi-node) and SANs

• Each host has SAN access to its own data and cross-mounts it via NFS (or CIFS) to the other instances in the multi-node deployment for low-intensity operations
• Best performance for high-capacity local traffic

[Diagram: Host A and Host B, each with direct SAN access to its own data and network cross-mounts to the other]
Multi-Instance (multi-node) and NAS

• A NAS implementation is similar, but all access is through the NAS protocol (high and low volume)
• Again, all else being equal, the SAN architecture will perform better

[Diagram: Host A and Host B both accessing all data through the NAS]
Sharing Needs and Usage

| Item to share | Use-case | Notes |
| indexserverconfig.xml | Shared and owned by the primary to all secondary nodes | Light on network usage |
| Collection | Created by the primary on storage that is SAN-attached to the secondary, and then later logically “bound” to the secondary | Light on network usage when just created on the primary and bound right away to the secondary |
| Collections / logs | Full native backup | Heavy on read usage, as all blocks are copied over the network |
Notes on SAN-based multi-node

• The spare-node HA mechanism doesn’t work, but alternatives include
  – Active/passive clusters (like Microsoft Cluster Server)
  – VMware vMotion
  – An active/active dual system

• Load balancing (rebinding) of collections to any node is difficult
  – But possible
Tip #5: ftintegrity use in Point-in-time recovery

• ftintegrity defined
  – A tool used to identify any documents in the Content Server that did not get properly indexed into xPlore

• Default behavior
  – Looks at all objects in the repository
  – This can be slow for large repositories

• Date-range option
  – Can be used to check consistency for objects created or modified within a range
  – Parameters: start and end dates
  – Can significantly cut down on the number of objects that have to be examined
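For example, after restoring xPlore from a backup taken at a known point in time, ftintegrity can be run with the start date set to that backup time and the end date set to the recovery time, so only objects created or modified in that window need to be compared and, where necessary, re-fed.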
Tip #6: Know differences between FAST and xPlore HA/DR techniques

| Use-case | FAST | xPlore |
| Backup | Cold backup, but could back up only the fixml | Hot backups: full & incremental; warm backups: collection-level & domain-level |
| Restore | Restore from fixml and ftintegrity point-in-time recovery | Restore from backup and ftintegrity point-in-time recovery |
| Spare Node HA | Not supported | Supported |
| Active / Passive HA | Not supported | Supported |
| Active / Active dual system | Supported | Supported |
• Questions?
THANK YOU
This presentation is also available at
www.momentumeurope.com
password: spree
