
EMC® Documentum® xPlore

Version 1.0

Administration Guide

EMC Corporation
Corporate Headquarters:
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
Copyright © 2010 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change
without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS
OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY
DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
Table of Contents

Preface ................................................................................................................................ 11
Chapter 1 Overview of xPlore ...................................................................................... 13
Features and limitations .................................................................................... 13
Indexing features .......................................................................................... 13
Indexing limitations ...................................................................................... 13
Search features ............................................................................................. 14
Search limitations ......................................................................................... 14
Administration ............................................................................................. 14
Indexing and search: FAST and xPlore compared ........................................... 14
Administration differences ........................................................................ 15
Indexing differences ................................................................................. 15
Search differences ..................................................................................... 15
Architectural overview ..................................................................................... 16
xPlore physical and logical architecture.............................................................. 17
Physical architecture ..................................................................................... 17
xPlore disk areas ...................................................................................... 17
xPlore instances ........................................................................................ 18
xDB libraries ............................................................................................ 19
Indexes .................................................................................................... 20
Logical architecture ...................................................................................... 21
Physical and logical component mapping ...................................................... 24
Documentum domains and categories ............................................................... 25
Documentum collections data model (dm_fulltext_collection) ............................. 26
How Content Server documents are indexed ...................................................... 27
How Content Server documents are queried ...................................................... 29

Chapter 2 Managing the System .................................................................................. 31


Using xPlore administrator ............................................................................... 32
xPlore administrator home page .................................................................... 32
Viewing services........................................................................................... 32
Global configuration ......................................................................................... 33
Tasks outside xPlore administrator .................................................................... 33
Managing disk space ........................................................................................ 35
Using the xDB admin tool ................................................................................. 36
Modifying indexserverconfig.xml ...................................................................... 36
Displaying and configuring the system .............................................................. 37
Configuring system metrics ............................................................................... 38
Starting and stopping the system ....................................................................... 38
Managing the status database ............................................................................ 38
Managing domains ........................................................................................... 39
Execute XQuery ............................................................................................ 39
Create a domain ........................................................................................... 39
Configure a domain ...................................................................................... 40
Attach or detach a domain or collection ......................................................... 40
Check database consistency ........................................................................... 40
Database performance statistics ..................................................................... 40
Managing instances .......................................................................................... 40
Add or delete an instance .............................................................................. 41
Configure an instance ................................................................................... 41
Start or stop an instance ................................................................................ 42
Getting instance status .................................................................................. 42
Managing spare and failed instances .................................................................. 42
Using the watchdog service ............................................................................... 45

Chapter 3 Managing Security ....................................................................................... 47


Documentum search results security .................................................................. 47
Configuring the security cache .......................................................................... 48
Configuring results summary security ............................................................... 49
Troubleshooting security ................................................................................... 49
Viewing security in the log ............................................................................ 49
Verifying security settings in the Content Server ............................................. 50
Determining the area of failure ...................................................................... 51
The wrong number of results are returned ..................................................... 51
Query execution is slow ................................................................................ 52
Troubleshooting a DFC client ........................................................................ 52
How xPlore replicates security .......................................................................... 52

Chapter 4 Managing the Index Agent ........................................................................... 53


Documentum attributes that control indexing .................................................... 53
Configuring the index agent .............................................................................. 53
Setting up index agents for ACLs and groups ................................................. 54
Filtering content and locations ........................................................................... 55
Making types non-indexable ......................................................................... 55
Indexing metadata only for specific formats ................................................... 55
Using the index agent filters .......................................................................... 56
Migrating a limited set of object types ............................................................ 57
Removing entries from the index ................................................................... 58
Resubmitting documents for indexing ............................................................... 58
Mapping file stores and content ......................................................................... 58
Mapping file stores in shared directories ........................................................ 58
Mapping Content Server storage areas to collections ....................................... 60
Running the state of the index job ...................................................................... 60

Chapter 5 Managing Document Processing (CPS) ...................................................... 63


Adding a remote CPS instance .......................................................................... 63
Starting and stopping CPS ................................................................................ 64
Viewing CPS statistics ....................................................................................... 64
Managing CPS and tokenization ........................................................................ 65
White space.................................................................................................. 65
Lemmatization ............................................................................................. 65
Special characters ......................................................................................... 69
Case sensitivity ............................................................................................ 71
Stop words ................................................................................................... 71
Fuzzy search (wildcards) .............................................................................. 71
Query operators ........................................................................................... 72
Language ..................................................................................................... 73
Adding dictionaries to CPS ............................................................................... 73

Chapter 6 Managing Indexing ...................................................................................... 75


Indexing scalability........................................................................................... 75
Modifying indexes ............................................................................................ 76
Configuring text extraction............................................................................ 76
Defining an index ......................................................................................... 77
Modifying subpaths...................................................................................... 80
Configuring indexing depth .......................................................................... 80
Viewing and configuring indexing metrics ......................................................... 80
Viewing indexing metrics.............................................................................. 81
Configuring indexing metrics ........................................................................ 81
Managing indexing in xPlore administrator........................................................ 81

Chapter 7 Managing Index Data ................................................................................... 83


Configuring categories ...................................................................................... 83
Managing categories ......................................................................................... 84
Planning collections for scalability ..................................................................... 84
Viewing and configuring collections .................................................................. 85
Viewing collection contents ........................................................................... 85
Adding a collection ....................................................................................... 85
Deleting a collection ..................................................................................... 86
Configuring collections ................................................................................. 86
Managing storage locations ............................................................................... 87
Troubleshooting xDB ........................................................................................ 87
Database performance ...................................................................................... 88

Chapter 8 Backup and Restore .................................................................................... 89


Backup and high availability configurations ....................................................... 89
Handling data corruption ................................................................................. 91
Rebuilding indexes ........................................................................................... 92
Native xPlore backup and restore ...................................................................... 92
Snapshot (volume-based) backup and restore ..................................................... 95
File-based backup and restore ........................................................................... 96
Scripted backup and restore utilities .................................................................. 96
Turning off indexing or changing state ........................................................... 97
Backup utilities............................................................................................. 97
Purging orphaned segments .......................................................................... 98
Restore utilities............................................................................................. 98

Chapter 9 Managing Searches ..................................................................................... 99


Configuring search ........................................................................................... 99
Viewing search statistics ................................................................................. 100
Configuring scoring and freshness ................................................................... 100
Configuring query summary and highlighting ................................................. 101
Auditing queries ............................................................................................ 103
Documentum Search ....................................................................................... 105
Search engine configuration (dm_ftengine_config) ........................................ 106
Making types and attributes searchable........................................................ 107
Folder descend ........................................................................................... 107
DQL, DFC, and DFS queries ........................................................................ 107
Routing a query to a collection using DQL ................................................... 108
Search for lightweight sysobjects (LWSOs) ................................................... 108
FTDQL ...................................................................................................... 109
Using DQL hints ........................................................................................ 109
Hints file location ................................................................................... 109
Hints file elements .................................................................................. 110
Hints file examples ................................................................................. 111
Enabling query routing in DFC .................................................................... 112
Changing VQL queries to XQuery expressions ............................................. 112
Understanding search results ...................................................................... 113
Configuring search for fragments, wildcards, and like terms ......................... 113
Tracing Documentum queries ...................................................................... 114

Chapter 10 Troubleshooting ........................................................................................ 117


Diagnostics and troubleshooting in xPlore administrator .................................. 117
Troubleshooting system problems ................................................................... 118
Insufficient disk space ................................................................................. 118
Out of memory errors ................................................................................. 119
I/O errors, No such file or directory ............................................................. 119
Troubleshooting the Documentum index agent................................................. 120
Startup problems ........................................................................................ 120
The index agent log .................................................................................... 120
Indexing status in the index agent UI ........................................................... 120
Indexing status in Documentum Administrator ............................................ 121
Restarting the index agent ........................................................................... 122
Verifying index migration with ftintegrity .................................................... 122
Documents are not indexed ......................................................................... 124
Reindexing ................................................................................................. 124
Setting the index agent error threshold......................................................... 125
Cannot stop the index agent ........................................................................ 125
Cleaning up the Documentum index queue to restart.................................... 126
Troubleshooting CPS ...................................................................................... 126
Reading CPS log files .................................................................................. 126
Separating log files for CPS instances ........................................................... 127
Verifying the CPS process ........................................................................... 127
Testing tokenization.................................................................................... 127
Testing CPS processing ............................................................................... 127
Slow ingestion ............................................................................................ 128
Insufficient CPU ..................................................................................... 128
Large documents .................................................................................... 128
Disk I/O issues ....................................................................................... 129
Slow network ......................................................................................... 129
Large number of Excel documents ........................................................... 130
Virus checking software .......................................................................... 130
Interference by another guest OS ............................................................. 130
Slow content storage area ........................................................................ 130
CPS daemon fails to start ............................................................................ 130
CPS starts but fails to process some requests ................................................ 131
CPS starts but then has to restart ................................................................. 131
CPS configuration changes do not take effect ................................................ 131
Handling concurrent large file ingestion ...................................................... 131
Troubleshooting indexing ............................................................................... 132
Testing upload and indexing ....................................................................... 132
Checking network bandwidth and latency ................................................... 132
Checking the indexing log ........................................................................... 132
Checking the status of a document............................................................... 133
Checking Documentum settings .................................................................. 133
High save-to-search latency ......................................................................... 134
Index agent is down ............................................................................... 134
CPS restarts frequently ........................................................................... 134
Large documents tie up ingestion ............................................................ 134
Large ingestion batches ........................................................................... 135
Hardware and virtual server resources..................................................... 135
Connection refused ..................................................................................... 135
Changes to index configuration do not take effect ......................................... 135
Timing problems: Login ticket expired ........................................................ 135
Troubleshooting search ................................................................................... 136
Testing the query in xPlore administrator ..................................................... 136
Testing the query in Documentum iAPI or DQL ........................................... 136
Verifying the query plugin settings .............................................................. 136
Getting the query plan ................................................................................ 137
Slow queries............................................................................................... 137
System is not warmed up or caches are too small ...................................... 138
Result sets are large ................................................................................ 138
xPlore security is disabled (security applied in Content Server) ................. 138
Group caches are not tuned ..................................................................... 138
Query result size is too large ................................................................... 139
FAST-compatible wildcard behavior is enabled ........................................ 139
Insufficient CPU, disk I/O, or memory ..................................................... 139
Query of too many collections (multiple repositories, or collections defined within a repository domain) ........................................ 139
User is very underprivileged ................................................................... 140
The query does not use the index ............................................................. 140
Troubleshooting XML searches .................................................................... 140
Debugging from Webtop ............................................................................. 140
Communication error or no collection available ............................................ 141
Foreign language not identified ................................................................... 141
Changes to configuration not seen ............................................................... 141
Document indexed but not searchable.......................................................... 142
Logging ......................................................................................................... 142
Viewing logs in xPlore administrator ........................................................... 143
CPS logging ............................................................................................... 143
Log layout formats ..................................................................................... 143
Log locations .............................................................................................. 145
xDB and Lucene logging ............................................................................. 146
Query logging ............................................................................................ 146
Tracing .......................................................................................................... 147

Chapter 11 Using Reports ........................................................................................... 149


Types of reports .............................................................................................. 149
Document processing (CPS) reports ................................................................. 150
Indexing reports ............................................................................................. 151
Search reports ................................................................................................ 151
Editing a report .............................................................................................. 152

Chapter 12 Performance and Disk Space .................................................................... 155


Planning for performance ................................................................................ 155
Improving search performance with time-based collections ........................... 157
Disk space and storage type ............................................................................ 158
Planning for disk space ............................................................................... 158
Storage types and locations ......................................................................... 159
System sizing ................................................................................................. 160
Using metrics to evaluate performance............................................................. 161
System tuning ............................................................................................... 161
Documentum index agent performance ........................................................... 163
Index agent settings .................................................................................... 163
Measuring index agent performance ............................................................ 163
Adding index agent instances ...................................................................... 163
Indexing performance ..................................................................................... 164
Factors in indexing rate ............................................................................... 164
Tunable Indexing properties ........................................................................ 164
Document size and performance ................................................................. 165
Tunable xDB properties............................................................................... 165
Search performance ........................................................................................ 166
Factors in query performance ...................................................................... 167
Changing the security cache sizes ................................................................ 168
Increasing query batch size ......................................................................... 168
Tuning xDB properties for search................................................................. 168

Appendix A Configuration settings for CPS, Indexing, and Search .............................. 171
Documentum index agent parameters.............................................................. 171
Content processing instance settings ................................................................ 173
Document processing and indexing service settings .......................................... 175
Search service settings .................................................................................... 177

Appendix B Extensible Documentum DTD ................................................................... 179


Appendix C DQL Hints File DTD ................................................................................... 183
Appendix D Tracking and Status XQueries ................................................................... 185
Appendix E Indexable Languages ................................................................................ 187
Appendix F Indexable Encodings ................................................................................. 189
Appendix G Indexable Formats ..................................................................................... 191
xPlore Glossary ................................................................................................................. 199

List of Figures

Figure 1. xDB and Lucene ................................................................................................... 20
Figure 2. Lucene directories and files ................................................................................... 21
Figure 3. Read-write (index and search) and read-only (search-only) collections ..................... 23
Figure 4. xPlore instances .................................................................................................... 24
Figure 5. Domain to database mapping ................................................................................ 25
Figure 6. xPlore indexing path ............................................................................................. 28
Figure 7. xPlore query path ................................................................................................. 30
Figure 8. xDB admin console ............................................................................................... 36
Figure 9. Tokens in DFTXML............................................................................................... 68
Figure 10. Tokens database in xDB admin tool ....................................................................... 69
Figure 11. System crash decision tree ..................................................................................... 92
Figure 12. Customized report for query count ...................................................................... 153
Figure 13. Scaling ingestion throughput............................................................................... 156
Figure 14. Scaling number of users or query complexity in search ......................................... 157

List of Tables

Table 1. Disk areas for xPlore ............................................................................................. 18
Table 2. Lucene functions .................................................................................................. 21
Table 3. Properties defined for the FT_collection type .......................................................... 26
Table 4. Actions outside xPlore administrator ..................................................................... 33
Table 5. State of the Index job arguments ............................................................................ 61
Table 6. linguistic-process element ..................................................................................... 66
Table 7. Extraction configuration options ............................................................................ 76
Table 8. Index definition options ........................................................................................ 78
Table 9. Path-value index with and without subpaths ......................................................... 79
Table 10. Category configuration options ............................................................................. 83
Table 11. Backup scenarios .................................................................................................. 90
Table 12. Differences between DQL and DFC/DFS queries................................................... 107
Table 13. DQL hints file elements ....................................................................................... 110
Table 14. Comparing index agent and xPlore administrator indexing metrics ....................... 121
Table 15. Index agent error configuration ........................................................................... 125
Table 16. Error codes......................................................................................................... 125
Table 17. Text layout arguments ........................................................................................ 144
Table 18. List of reports ..................................................................................................... 149
Table 19. How xPlore uses disk space ................................................................................ 158
Table 20. Comparison of storage types performance ........................................................... 160
Table 21. Indexing metrics mapped to performance problems ............................................. 161
Table 22. Indexagent configuration parameters in generic_indexer.parameter_list ................ 171
Table 23. Index agent runtime configuration in indexer_plugin_config.indexer ..................... 172
Table 24. Other index agent parameters ............................................................................. 173
Table 25. DMFTXML top-level elements ............................................................................. 179
Table 26. List of indexable languages ................................................................................. 187
Table 27. Indexable word processing and text formats ......................................................... 191
Table 28. Indexable database formats ................................................................................. 193
Table 29. Indexable spreadsheet formats ............................................................................ 193
Table 30. Indexable presentation formats............................................................................ 194
Table 31. Indexable graphics formats (vector and raster) ..................................................... 194
Table 32. Indexable compressed formats ............................................................................ 196
Table 33. Indexable email formats ...................................................................................... 196
Table 34. Indexable multimedia formats ............................................................................. 197
Table 35. Other indexable formats...................................................................................... 197

Preface

This guide describes the configuration and administration of Documentum xPlore. These tasks
include system monitoring, index configuration and management, query configuration and
management, auditing and security, and Documentum integration.
The documentation set also contains release notes, an installation guide, and a development guide.
These documents are available as PDF downloads on the EMC download site and as HTML within
the xPlore infocenter web application that is installed with xPlore. The infocenter is available from
the Help button in the xPlore administrator tool.

Intended Audience
This guide contains information for xPlore administrators. The overview information is also helpful
to developers who are creating indexing or query customizations.
An administrator must be familiar with the installation guide, which describes the initial configuration
of the xPlore installation. For Documentum product users, this guide assumes familiarity with EMC
Documentum Content Server administration when Documentum functionality is discussed.

Revision History
The following changes have been made to this document.

Revision Date | Description
October 2010 | Initial publication for version 1.0 release

Additional documentation
This guide provides overview and administration information. For information on installation
and development, refer to:
• Documentum xPlore Release Notes
• Documentum xPlore Deployment Guide
• Documentum xPlore Development Guide

For additional information on Content Server installation and Documentum search client
applications, refer to:
• Documentum Content Server Installation Guide
• Documentum Search Development Guide

Chapter 1
Overview of xPlore

Documentum xPlore is a multi-instance, scalable, high-performance, full-text index server that can be
configured for high availability and disaster recovery.
The following topics are described in this overview:
• Features and limitations, page 13
• Architectural overview, page 16
• xPlore physical and logical architecture, page 17
• Documentum domains and categories, page 25
• Documentum collections data model (dm_fulltext_collection), page 26
• How Content Server documents are indexed, page 27
• How Content Server documents are queried, page 29

Features and limitations


Index and search features and limitations are summarized in the following topics.

Indexing features
Collection topography — xPlore supports creating collections online, and collections can span
multiple file systems.

Transactional updates and purges — xPlore supports transactional updates and purges of indexes
as well as transactional commit notification to the caller.

Multithreaded insertion into indexes — xPlore ingestion through multiple threads supports
vertical scaling on the same host.

Indexing limitations
Batch failure — Indexing requests are processed in batches. When one request in a batch fails,
the entire batch fails.

Lemmatization — xPlore supports lemmatization, but you cannot configure the parts of speech
that are lemmatized.

Thesaurus — xPlore does not support a thesaurus.

Search features
Case sensitivity — xPlore queries are lower-cased (rendered case-insensitive).

Full-text queries — To query metadata, set up a specific index on the metadata.

Faceted search — Facets in xPlore are computed over the entire result set or over a configurable
number of results.

Security evaluation — When a user performs a search, permissions are evaluated for each result.
Security can be evaluated in the xPlore full-text engine before results are returned to Content Server,
resulting in faster query results. This feature is turned on by default and can be configured or
turned off.

Native XQuery syntax — The xPlore full-text engine supports XQuery syntax.

Search limitations
Search topic — Zone searching (search topic in Documentum DQL) searches defined regions of an
XML document, for example, all child elements and attributes enclosed within an element. xPlore
does not support zone searching of attributes, although individual elements and their attributes can
be indexed and searched. You can configure xPlore to index XML content that is within an input
document, which will allow zone searching through XQuery or DQL.

XML attributes — xPlore does not index attribute values on XML elements. This refers to the
input XML. For example, in the stored DFTXML representations of Documentum documents, you
cannot find all documents for which the value of the dmfttype attribute of the element acl_name is
"dmstring."

Administration
xPlore has an administration console.

Indexing and search: FAST and xPlore compared


If you are migrating from FAST to xPlore, the following information describes differences between
the two indexing servers.

Administration differences

Many features in xPlore are configurable through xPlore administrator. These features were not
configurable for FAST. Additionally, administrative tasks are exposed through Java APIs.

Ports required — During xPlore instance configuration, the installer prompts for the HTTP port for
the JBoss instance (base port) and validates that the next 100 consecutive ports are available. During
index agent configuration, the installer prompts for the HTTP port for the index agent JBoss instance and
validates that the next 20 consecutive ports are available. FAST used 4000 ports.

High availability — xPlore supports N+1, active/passive with clusters, and active/active shared data
configurations. FAST supports only active/active.

Disaster recovery — xPlore supports online backup, including full and incremental. FAST supports
only offline (cold) backup.

SAN and NAS — xPlore supports SAN and NAS. FAST supports SAN only.

Virtualization — xPlore runs in VMWare environments. FAST does not.

64-bit address space — 64-bit systems are supported in xPlore but not in FAST.
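
As a rough illustration of the port requirements described under "Ports required" above, the following Python sketch computes the blocks of ports that the two installers would validate and checks that two blocks on one host do not collide. The base ports 9300 and 9200 are hypothetical values chosen for the example, and the sketch assumes that each block starts at the base port itself; this guide states only that the next 100 (or 20) consecutive ports are validated.

```python
def reserved_ports(base_port, count):
    """Return the block of ports reserved for one JBoss instance.

    Assumption for this sketch: the block starts at the base port itself.
    """
    return range(base_port, base_port + count)


def ranges_overlap(first, second):
    """Return True if two port blocks share at least one port."""
    return first.start < second.stop and second.start < first.stop


# Hypothetical base ports, for illustration only.
xplore_ports = reserved_ports(9300, 100)      # xPlore instance: 100 ports
index_agent_ports = reserved_ports(9200, 20)  # index agent: 20 ports

print(ranges_overlap(xplore_ports, index_agent_ports))
```

A check of this kind can help when planning several instances on one host, because each additional instance needs its own non-overlapping block.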

Indexing differences
Backup and restore — xPlore supports warm backups and spare indexing instances. xPlore also
supports active/passive clusters for high availability.

Disaster recovery — xPlore automatically restarts content processing in case of a CPS crash. In the
case of a VM crash, the xPlore watchdog sends an email notification.

Transactional updates and purges — xPlore supports transactional updates and purges as well as
transactional commit notification to the caller. FAST does not.

Collection topography — xPlore supports creating collections online, and collections can span
multiple file systems. These features are not supported by FAST.

Thesaurus — FAST supports a thesaurus; xPlore does not.

Lemmatization — FAST supports configuration for which parts of speech are lemmatized. In
xPlore, lemmatization is enabled or disabled.

Search differences
One-box search — Searches from the Webtop client default to ANDed query terms in xPlore. In
FAST, they defaulted to OR, resulting in many more non-specific hits.

Query a specific collection — Targeted queries are supported in xPlore but not FAST.

Folder descend — Folder descend queries are optimized in xPlore but not in FAST.

Results ranking — FAST and xPlore use different ranking algorithms.


xPlore allows you to configure non-indexed metadata to save disk space and improve ingestion and
search performance, but the number of hits will differ between FAST and xPlore queries on the
non-indexed content. For example, if xPlore does not tokenize docbase_id, a full-text search on "256"
would return no hits in xPlore but, in FAST, would return all indexed documents for the repository
whose ID is 256.

Security evaluation — Security is evaluated by default in the xPlore full-text engine before results
are returned to Content Server, resulting in faster query results. FAST returns results to the Content
Server, resulting in many hits that the user is not able to view.
Underprivileged user queries are optimized in xPlore but not in FAST.

Native XQuery syntax — XQuery syntax is supported by the xPlore full-text engine.

Facets — Facets are limited to 350 hits in FAST, but xPlore supports many more hits.

Hit count — FAST returns the total number of hits before returning results. xPlore does not.

Search topic — Zone searching (search topic) searches defined regions of an XML document,
for example, all child elements and attributes enclosed within an element. Zone searching is not
supported by xPlore, although individual elements and their attributes can be indexed. Zone
searching is supported by FAST for backward compatibility. Zone searches do not span entities nor
do they return the contents of the zone.

XML attributes — Attribute values on XML elements are part of the xPlore binary index. They
are not indexed by xPlore.

Wildcards — FAST matches fragments of words in wildcard searches, for example, in the Webtop
one-box search. xPlore matches whole words only. In advanced search with xPlore, you can use
wildcards to search for attributes. For example, run* produces a hit on "run fast" or "running" but
not on "runt" or "prune". You can revert to fragment search in xPlore for both one-box and attribute
search, but performance is slower.

Special characters — Special character lists are configurable. The default in xPlore differs from
FAST when terms such as email addresses or contractions are tokenized. For example, in FAST, an
email address will be split up into separate tokens with the period and @ as boundaries. However, in
xPlore, only the @ will serve as the boundary, since the dot is considered a "context" character.

Thesaurus — Supported in FAST but not in xPlore.
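
The special-character difference described above can be illustrated with a small Python sketch. It is a simplified model, not the actual FAST or xPlore tokenizer: the tokenize helper and the sample address are hypothetical, and the only behavior taken from this guide is which characters act as token boundaries for an email address.

```python
import re


def tokenize(text, boundary_chars):
    """Split text wherever one of the boundary characters occurs."""
    pattern = "[" + re.escape(boundary_chars) + "]"
    return [token for token in re.split(pattern, text) if token]


address = "jane.doe@example.com"  # hypothetical sample value

# FAST-style default: both the period and @ are boundaries.
fast_style = tokenize(address, ".@")

# xPlore-style default: only @ is a boundary; the dot is a "context" character.
xplore_style = tokenize(address, "@")

print(fast_style)
print(xplore_style)
```

Under this model the same address yields four tokens with the FAST-style boundaries and two tokens with the xPlore-style boundaries, which is one reason hit counts can differ after migration.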

Architectural overview
xPlore provides query and indexing services that can be integrated into external content sources such
as the Documentum content management system. External content source clients like Webtop or
CenterStage, or custom Documentum DFC clients, can send indexing requests to xPlore.
Each document source is configured as a domain in xPlore. You can set up domains using xPlore
administrator. For Documentum environments, the Documentum index agent creates a domain for
each repository and a default collection within that domain.

Documents are provided in an XML representation to xPlore for indexing through the indexing APIs.
In a Documentum environment, the Documentum index agent prepares an XML representation of
each document to be indexed. The document is assigned to a category, and each category corresponds
to one or more collections as defined in xPlore. To support faceted search in Documentum
repositories, you can define a special type of an index called an implicit composite index.
xPlore instances are web application instances that reside on application servers. When an xPlore
instance receives an indexing request, it uses the document category to determine what should be
tokenized and saved to the index. The content is fetched by a local or remote instance of the content
processing service (CPS). CPS detects the primary language and format of a document. CPS then
extracts indexable content from the request stream and parses it into tokens. The tokens are used
for building a full-text index.
xPlore manages the full-text index. An external Apache Lucene full-text index is embedded into
the EMC XML database (xDB). xDB tracks indexing and update requests, recording the status of
requests and the location of indexed content. xDB provides transactional updates to the Lucene
index. Indexes are still searchable during updates.
When an instance receives a query request, the request is processed on all instances, then the query
results are returned.
xPlore provides a web-based administration console.

xPlore physical and logical architecture


The xPlore architecture is designed with the following principles:
• Use standards as much as possible, like XQuery
• Use open source tools and libraries, like Lucene
• Support enterprise readiness: High availability, backup and restore, analytics, reports, diagnostics
and troubleshooting, administration GUI, and configuration and customization points.
xPlore physical and logical architecture are described in the following topics.

Physical architecture
The xPlore index service and search service are deployed as a WAR file to a JBoss application server
that is included in the xPlore installer. xPlore administrator and online help are installed as WAR files
in the same JBoss application server. The index is stored in the storage location that was selected
during configuration of xPlore.

xPlore disk areas

xPlore creates disk areas for xDB data and redo log, the Lucene index, a temp area, xPlore
configuration and utilities, and index agent content staging. Table 1, page 18 describes how these
areas are used during indexing and search. xPlore runtime files and instances are described in xPlore
instances, page 18. xPlore configuration is described throughout this administration guide.

Table 1. Disk areas for xPlore

Area | Description | Use in indexing | Use in search
xDB data | Stores DFTXML, metrics, audit, ACLs and groups. | Next free space is consumed by disk block for batch XML files | Random access retrieval for specific elements and summary
xDB redo log | Stores transaction information | Updates to xDB data are logged | Provides snapshot information during some retrievals
Lucene index | Performs query lookup and retrieval and facet and security information | Index updated through inserts and merges | Inverted index lookup and facet and security retrieval
Temp | Updates the Lucene index (non-transactional data) | Non-committed data is stored to the log | None
Index agent content staging area | Temporarily holds content during indexing process | Holds content | None

xPlore instances

An xPlore instance is one deployment of the xPlore WAR file to an application server container. You
can have multiple instances on the same host (vertical scaling), although it is more common to have
one xPlore instance per host (horizontal scaling). You create an instance by running the xPlore
installer. You manage instances in xPlore administrator.
Note: All instances in an xPlore deployment must have their host clocks synchronized to the primary
xPlore instance host.
An instance can be configured to enable one or more of the following features:
• Content processing service (CPS)
• Indexing service
• Search service
• xPlore Administrator (includes analytics, instance, and data management services)
• Spare
A spare instance can be manually activated to take over for a disabled instance. Refer to Managing
spare and failed instances, page 42 for more information.
The first instance that is installed is designated as the primary instance. Secondary instances can be
added after the primary instance has been installed. The primary instance must be installed and
running when you install a secondary instance.

xDB libraries

xDB is a database that enables high-speed storage and manipulation of many XML documents.
An xDB library has a hierarchical structure similar to an OS directory. The library is a logical
container for other libraries or XML documents. The library corresponds to a collection in xPlore
with additional metadata such as category, usage, and properties. An xDB library stores an xPlore
collection as a Lucene index, optionally including the XML content that is indexed. xPlore manages
the indexes on the collection.
xDB manages the following libraries for xPlore:
• The root library contains a SystemData library with metrics and audit databases. These databases
record metrics and audit queries by xPlore instance.
• Each domain contains an xDB tracking library (database) that records the content that has been
indexed.
• Each domain contains a status library (database) that reports indexing status for the domain.
• Each domain contains one or more data libraries. The default library is the first that is created for
a domain.
When xPlore processes an XML representation of an input document and supplies tokens to xDB,
xDB stores them into a Lucene index. Optionally, xPlore can be configured to store the content
along with the tokens. A tracking database in xDB manages deletes and updates to the index. For
Documentum, this means that when documents are updated or deleted, changes to the index are
propagated. When xPlore supplies XQuery expressions to xDB, xDB passes them to the Lucene index.
xDB tracks the location of documents in order to query the correct index. xDB also manages parallel
dispatching of queries to more than one Lucene index. For example, if you have set up multiple
collections on different storage locations, the query is processed in parallel rather than sequentially.
xDB and the Lucene index are diagrammed in Figure 1, page 20.
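
The parallel dispatch described above can be pictured with the following Python sketch. It is only a conceptual model under stated assumptions: the INDEXES dictionary, the collection names, and the search_index and parallel_search functions are hypothetical stand-ins, and xDB's real dispatching is internal to the product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for Lucene indexes on different storage locations:
# each maps a term to the IDs of matching documents.
INDEXES = {
    "collection_a": {"invoice": ["doc1", "doc4"]},
    "collection_b": {"invoice": ["doc7"], "contract": ["doc9"]},
}


def search_index(name, term):
    """Look up a term in one index and return the matching document IDs."""
    return INDEXES[name].get(term, [])


def parallel_search(term):
    """Send the same query to every index at once and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(INDEXES)) as pool:
        futures = [pool.submit(search_index, name, term) for name in INDEXES]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return sorted(hits)


print(parallel_search("invoice"))
```

The point of the sketch is that the total query time is bounded by the slowest index rather than by the sum of all lookups, which is the benefit of placing collections on different storage locations.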

Figure 1. xDB and Lucene

An xDB library is stored on a data store. If you install more than one instance of xPlore, the storage
locations should be accessible by all instances. The xDB data stores and indexes can reside on a
separate data store, SAN or NAS. The locations are configurable in xPlore administrator. If you do
not have heavy performance requirements, xDB and the indexes can reside on the same data store.

Indexes

You can configure none, one, or multiple indexes on a collection. An explicit index can be created
based on values of XML elements, paths within the XML document, path-value combination, or
full-text content. For example, the following is a value-indexed field:
/dmftdoc[dmftmetadata//object_name="foo"]
The following is a tokenized, full-text field:
/dmftdoc[dmftmetadata//object_name ftcontains 'foo']
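
The difference between the two expressions above can be modeled in a few lines of Python. This is a simplified sketch rather than a description of how xDB or Lucene evaluates the queries: the value_match and fulltext_match helpers, the word-based tokenization, and the sample object_name are all assumptions made for illustration.

```python
import re


def value_match(field_value, wanted):
    """Value comparison: the whole field must equal the wanted value."""
    return field_value == wanted


def fulltext_match(field_value, wanted):
    """Full-text comparison: the wanted term must be one of the field's tokens."""
    # Simplified tokenization: lower-cased runs of word characters.
    tokens = re.findall(r"\w+", field_value.lower())
    return wanted.lower() in tokens


object_name = "foo quarterly report"  # hypothetical object_name value

print(value_match(object_name, "foo"))     # whole-value comparison
print(fulltext_match(object_name, "foo"))  # token comparison
```

In this model the value comparison succeeds only when object_name is exactly "foo", while the full-text comparison succeeds whenever "foo" appears as a token in the field.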

xPlore manages an implicit index. xDB performs the index management within xPlore and provides
support for more search capabilities than standard Lucene index searches. Indexes can be compressed
to enhance performance.
Indexes are defined and configured in indexserverconfig.xml. (This file is located in
dsearch_home/config on the primary instance. Stop all xPlore instances to edit this file. Validate your
changes using the validation tool described in Modifying indexserverconfig.xml, page 36.) Back up
the xPlore federation after you change this file.
Table 2, page 21 describes the function of Lucene directories and their files on the file system.

Figure 2. Lucene directories and files

Table 2. Lucene functions

Name | Function
blacklists | Contains log sequence numbers and xDB node IDs of deleted documents. When merged with indexes, the blacklist is no longer needed.
Lucene index directory | Begins with LI. Active indexes are registered in the index_info file.
index_info | Contains a list of all committed Lucene indexes. Entries are final or non-final. Non-final entries list all committed transactions. All indexes in index_info can be queried. If an LI directory is not in the list, it will eventually be removed from the file system.
returnable_field_path | Stores the XML path for returnable elements such as faceted attributes or security. These elements are stored with the index and returned during a query.
returnable_field_value | Stores a map of a value-name to a compressed number, for returnable fields that have value compression enabled.

Logical architecture
A domain contains indexes for one or more categories of documents. A category is logically
represented as a collection. Each collection contains indexes on the content and on metadata for
which indexes have been defined. When a document is indexed, it is assigned to a category or
class of documents. The category can have one or more collections, with various kinds of indexes
defined on these collections.

Domains — A domain is a separate, independent, logical grouping of collections within an xPlore
deployment. For example, a domain could contain the indexed contents of a single Documentum
content repository. Domains are defined in xPlore administrator in the data management screen. A
domain can have multiple collections in addition to the default collection.

The Documentum index agent creates a domain for the repository to which it connects. This domain
receives indexing requests from the Documentum index agent.

Categories — A category defines how a class of documents is indexed. All documents submitted for
ingestion must be in XML format. (For example, the Documentum index agent prepares an XML
version for Documentum repository indexing.) The category is defined in indexserverconfig.xml and
managed by xPlore. A category definition specifies the processing and semantics that are applied to an
ingested XML document. You can specify the XML elements that are used for language identification.
You can specify the elements that have compression, text extraction, tokenization, and storage of
tokens. You also specify the indexes that are defined on the category and the XML elements that are
not indexed. A category can map to more than one collection.

Collections — A collection is a logical group of XML documents that is physically stored in an
xDB detachable library. A collection represents the most granular data management unit within
xPlore. All documents submitted for indexing are assigned to a collection. A collection generally
contains one category of documents. In a basic deployment, all documents in a domain are assigned
to a single default collection.
A collection is bound to a specific instance in read-write mode (index and search) or to multiple
instances in read-only mode (search-only). This relationship is diagrammed in Figure 3, page 23.

Figure 3. Read-write (index and search) and read-only (search-only) collections

Using xPlore Administrator, you can define a collection and its category, back up the collection, and
change binding and state. If a collection has been configured to store XML tokens, the collection index
can be rebuilt without reingestion.
The metrics and audit systems use collections in a domain named SystemData. You can view this
domain and its collections in xPlore administrator. One metrics database and one audit database are defined. Each
database has a subcollection for each xPlore instance.

Example — A document is submitted for indexing. The client indexing application, for example,
Documentum index agent, has not specified the target collection for the document. If the document
exists, the index service updates the document. If it is a new document, the document is assigned
to an instance based on a round-robin order. On that instance, if the instance has more than one
collection, then collection routing is applied. If collection routing is not supplied by a client routing
class, the document is assigned to a collection in round-robin order.
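
The routing steps in the example above can be sketched as a toy model in Python. The RoundRobinRouter class, the instance names, and the collection names are hypothetical, and the sketch leaves out the optional client routing class; it only shows the update-in-place rule and the two round-robin steps.

```python
from itertools import cycle


class RoundRobinRouter:
    """Toy model of default routing when no target collection is specified."""

    def __init__(self, collections_by_instance):
        # collections_by_instance: {instance name: [collection names]}
        self._collections = {
            instance: cycle(names)
            for instance, names in collections_by_instance.items()
        }
        self._instances = cycle(collections_by_instance)
        self._locations = {}  # document ID -> (instance, collection)

    def route(self, doc_id):
        """Return (instance, collection, action) for a document."""
        if doc_id in self._locations:
            # Existing document: update it where it already lives.
            instance, collection = self._locations[doc_id]
            return instance, collection, "update"
        # New document: next instance, then next collection on that instance.
        instance = next(self._instances)
        collection = next(self._collections[instance])
        self._locations[doc_id] = (instance, collection)
        return instance, collection, "insert"


router = RoundRobinRouter({
    "instance1": ["default", "collection2"],
    "instance2": ["collection3"],
})
for doc_id in ["d1", "d2", "d3", "d1"]:
    print(doc_id, router.route(doc_id))
```

In this run the second request for d1 is treated as an update and returns to the collection that already holds the document, while new documents alternate between the two instances.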

Physical and logical component mapping


Figure 4, page 24 diagrams the physical components of a simple xPlore system: Two installed
instances, each with its own indexing, search, and CPS services.

Figure 4. xPlore instances

Figure 5, page 25 shows the database structure for the two example instances.
• The entire xPlore federation library is stored in xDB root-library.
• One content source (Documentum repository A) is mapped to a domain library. The library is
stored in a defined storage area on either instance.
• A second repository, Repository B, has its own domain.
• All xPlore domains share the system metrics and audit databases (SystemData library in xDB with
libraries MetricsDB and AuditDB). The metrics and audit databases have a subcollection for
each xPlore instance.
• The ApplicationInfo library contains Documentum ACL and group collections for a specific
domain (repository).
• The SystemInfo library has two subcollections: TrackingDB and StatusDB. Each collection in
TrackingDB matches a collection in Data and is bound to the same instance as that data collection.
There is a subcollection in StatusDB for each xPlore instance. The instance-specific subcollection
has a file status.xml that contains processing information for objects that are being processed
by the instance.
• The Data collection has a default subcollection.

Figure 5. Domain to database mapping

Documentum domains and categories


Repository domains — An xPlore domain generally maps to a single Documentum repository.
Within that domain, you can direct documents to one or more collections. In the following
configuration in indexserverconfig.xml in dsearch_home/config, a repository is mapped to a domain.
Three collections are defined: one for metadata and content (default), one for ACLs, and one for
groups. These latter two collections are used to filter results for permissions before returning them
to the client application. The collections in the domain can be distributed across multiple xPlore
instances.
<domain name="repository1" default-document-category="dftxml">
<collection name="default" usage="Data"/>
<collection name="acl" usage="ApplicationInfo" document-category="acl"/>
<collection name="group" usage="ApplicationInfo" document-category="group"/>
</domain>
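
To inspect such a mapping without editing it, the domain fragment above can be read with Python's standard XML parser. This is a minimal read-only sketch: it works on a copy of the fragment held in a string, and the fallback to the domain's default-document-category for a collection that has no document-category attribute is an assumption made here, not a documented rule.

```python
import xml.etree.ElementTree as ET

# The <domain> fragment shown above, copied verbatim.
DOMAIN_XML = """
<domain name="repository1" default-document-category="dftxml">
<collection name="default" usage="Data"/>
<collection name="acl" usage="ApplicationInfo" document-category="acl"/>
<collection name="group" usage="ApplicationInfo" document-category="group"/>
</domain>
"""

domain = ET.fromstring(DOMAIN_XML)
default_category = domain.get("default-document-category")

summary = []
for collection in domain.findall("collection"):
    # Assumption: a collection without document-category uses the domain default.
    category = collection.get("document-category", default_category)
    summary.append((collection.get("name"), collection.get("usage"), category))

for row in summary:
    print(row)
```

The output lists each collection with its usage and category, which makes it easy to confirm that the data collection and the two security collections are all present for the repository.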

For more information on domains, refer to Domains, page 21.

Documentum categories — A document category defines the characteristics of XML documents that
belong to that category and their processing. All documents are sent to a specific index based on
the document category. For example, xPlore pre-defines a category called dftxml that defines the
indexes. All Documentum indexable content and metadata are sent to this category. If your custom
types need special configuration and a separate index, create custom categories for them.
The following Documentum categories are defined within the <domain> element in
indexserverconfig.xml, which is located in dsearch_home/config. Shut down all xPlore instances
before changing this file. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
• dftxml
XML representation of object metadata and content for full text indexing
• acl
ACLs that are defined in the repository are indexed so that security can be evaluated in the full-text
engine. Refer to Documentum search results security, page 47 for more information.
• group
Groups defined in the repository are indexed to evaluate security in the full-text engine.
For more information on categories, refer to Categories, page 22.

Documentum collections data model (dm_fulltext_collection)

You can create multiple full-text collections for a repository for the following purposes:
• Partition data
• Scale indexes for performance
• Support storage-based routing
The dm_ftengine_config object has a repeating attribute, dm_fulltext_collection, which is reserved for
use by Content Server client applications. Each ID in the attribute points to a dm_fulltext_collection
object. Table 3, page 26 describes the properties defined for this type. All properties listed are
single-valued except physical_indexes, which is repeating.

Table 3. Properties defined for the FT_collection type

Property                 Datatype      Description
name                     string(64)    Name of the collection.
max_indexes              integer       Maximum number of indexes the collection can have.
collection_root          char(512)     Root path for the collection. Each index will be in a separate subdirectory of this root. A value applies to all the indexes within the collection.
collection_root_backup   char(512)     Alternate location for storing the indexes in the collection.
max_collection_size      integer       Maximum size on disk, in MB, of all indexes within the collection.
status                   integer       Status of the collection. Valid values are: 0, for invalid; 1, for read/write (index and search); 2, for read-only (search-only). The default is 0.
physical_indexes         ID            List of object IDs of indexes within the collection.
mode                     integer       Allowed operation mode. Valid values are: 0, for read/write (index and search); 1, for read-only (search-only). The default is 0.
index_root_location      string(255)   Name of a dm_location object.
r_object_count           double        Number of objects in the collection.
r_partition_name         string(256)   Name of partition for the collection.

How Content Server documents are indexed


Figure 6, page 28 illustrates the path of a document from a Documentum repository to an xPlore index.

Figure 6. xPlore indexing path

1. In a client application, a Save, Checkin, Destroy, Readonlysave, or MoveContent operation is
performed on a SysObject in the repository.
2. This operation event generates a queue item (dmi_queue_item) in the repository that is sent
to the full-text user work queue. (The full-text user, dm_fulltext_index_user, is a Superuser
created when a repository is created or when an existing repository is upgraded.) The index
agent retrieves the queue item (step 7). After an index request is submitted to xPlore, the client
application can move on to the next task. (Indexing is asynchronous.)
3. The index agent retrieves the object associated with the queue item from the repository. The
content is retrieved using getfile or staged to a temporary area for a getpath operation. The
agent then creates a DFTXML (XML) representation of the object that can be used for full-text and
metadata indexing.
4. The index agent sends the DFTXML representation of the content and metadata to the xPlore
server.
5. The xPlore indexing service calls CPS, which performs the following functions:
• Gets the content from the repository
• Identifies the primary language of the document
• Transforms the indexable metadata and content of the document into XML tokens
6. The content is merged with DFTXML (optionally stored in xDB) and is indexed.
7. The xPlore indexing service performs the following steps:
• Distributes indexing requests among multiple worker threads.
• Routes documents to their target collections.
• Tracks document location in the TrackingDB.
• Tracks indexing status in the StatusDB.
• Saves indexing metrics in the MetricsDB.
• Notifies the index agent of the indexing status through a callback. When the index
agent receives a notification of a successful index operation, the queue item is removed from
the repository. Otherwise, the queue item is left behind and the error status is updated,
along with the error message.
The object is now searchable.
Note: The index service does not provide any indication that an object is searchable.
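
The callback handling in the last step can be pictured with a small model. The following is an illustrative sketch only: the QueueItem and WorkQueue classes and their methods are hypothetical stand-ins, not the index agent implementation or any Documentum API. It shows the two outcomes described above: a successful callback removes the queue item, and a failed one leaves the item in place with its error recorded.

```python
from dataclasses import dataclass, field


@dataclass
class QueueItem:
    """Hypothetical stand-in for a dmi_queue_item awaiting indexing."""
    object_id: str
    status: str = "pending"
    error: str = ""


@dataclass
class WorkQueue:
    """Hypothetical model of the full-text user work queue."""
    items: dict = field(default_factory=dict)

    def add(self, object_id):
        self.items[object_id] = QueueItem(object_id)

    def on_index_callback(self, object_id, success, message=""):
        """Apply the indexing result reported for one queue item."""
        if success:
            # Successful indexing: the queue item is removed.
            del self.items[object_id]
        else:
            # Failure: the item stays queued with its error recorded.
            item = self.items[object_id]
            item.status = "failed"
            item.error = message


queue = WorkQueue()
queue.add("doc1")
queue.add("doc2")
queue.on_index_callback("doc1", success=True)
queue.on_index_callback("doc2", success=False, message="unsupported format")
print(sorted(queue.items))
```

Items that remain in this model correspond to the failed queue items that can later be resubmitted.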

Enabling indexing for an object type — Queue items for indexing are generated by events
in dmi_registry for the user dm_fulltext_index_user. The following events are registered for
dm_fulltext_index_user to generate indexing events by default:
• dm_sysobject: dm_save, dm_checkin, dm_destroy, dm_saveasnew, dm_move_content
• dm_acl: dm_save, dm_destroy, dm_saveasnew
• dm_group: dm_save, dm_destroy
Use Documentum Administrator to change the fulltext registration for an object type. Select the type,
view the properties, and for the property Enable indexing check Register for indexing. To change
specific events that are registered for fulltext, you must use the DFC API registerEvent().
Note: The type must be dm_sysobject or its subtype.

Reindexing — The index agent does not recreate all the queue items for reindexing. Instead, it
creates a watermark queue item (type dm_ftwatermark) to indicate the progress of reindexing. It
picks up all the objects for indexing in batches by running a query. The index agent updates the
watermark as it completes each batch. When the reindexing is completed, the watermark queue item
is updated to ’done’ status.
You can submit for reindexing one or all documents that failed indexing. In Documentum
Administrator, open Indexing Management > Index Queue. Choose Tools > Resubmit all failed
queue items, or select a queue item and choose Tools > Resubmit queue item.
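
The watermark approach can be sketched as follows. This is a simplified model under stated assumptions: a Python list stands in for the objects that the index agent selects by query, a dictionary stands in for the dm_ftwatermark queue item, and reindex_in_batches and index_batch are hypothetical names.

```python
def reindex_in_batches(object_ids, batch_size, index_batch):
    """Index all objects in batches, tracking progress in a watermark.

    object_ids  -- list standing in for the objects selected by query
    batch_size  -- number of objects handled per batch
    index_batch -- callable that indexes one batch (hypothetical)
    """
    watermark = {"position": 0, "status": "running"}
    while watermark["position"] < len(object_ids):
        start = watermark["position"]
        batch = object_ids[start:start + batch_size]
        index_batch(batch)
        # Advance the watermark only after the batch completes.
        watermark["position"] = start + len(batch)
    watermark["status"] = "done"
    return watermark


processed = []
result = reindex_in_batches(["a", "b", "c", "d", "e"], 2, processed.append)
print(processed, result)
```

Because progress is recorded once per batch, a single item tracks the whole run instead of one queue item per object.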

How Content Server documents are queried


Several software components control full-text search using the xPlore server:
• The Content Server queries the full-text indexes and returns query results to client applications.
• The xPlore server responds to full-text queries from Content Server.
Figure 7, page 30 illustrates the path of a query from a Documentum client to xPlore.

Figure 7. xPlore query path

1. A client application submits a DQL query to the Documentum Content Server. If the client
application uses DFC to create the query, DFC translates the query into XQuery syntax.
2. Content Server transmits the DQL query to the query plugin, which translates the query into XQuery
syntax.
3. The query plugin transmits batches of HTTP messages containing XQuery statements to the
xPlore search service.
4. CPS identifies the primary language of the query, tokenizes it, and passes it to xDB. xDB then
breaks the query into XQuery clauses for full-text (using ftcontains) and metadata (using value
constraints). The query is executed in the Lucene index. The query is executed against all
collections unless a collection is specified in the query.
5. xDB applies the security filter. If configured, the Documentum security filter applies ACL and
group permissions to results.
6. The results are returned in batches, with summary, highlighting, and facets.

Chapter 2
Managing the System

Most system administration tasks are available in xPlore administrator. When you open xPlore
administrator, you see the navigation tree and the system overview page. You can open
administration pages for system-wide services, instance-specific services, data management, and
diagnostics and troubleshooting.
When you open a service page, such as indexing service, the actions apply to all indexing services
in the xPlore installation. To change the indexing service configuration for a specific instance, open
the instance in the navigation tree and then choose the service.
For information on system troubleshooting, refer to Troubleshooting system problems, page 118.
The following topics describe system management:
• Using xPlore administrator, page 32
• Global configuration, page 33
• Tasks outside xPlore administrator, page 33
• Managing disk space, page 35
• Using the xDB admin tool, page 36
• Modifying indexserverconfig.xml, page 36
• Displaying and configuring the system, page 37
• Configuring system metrics, page 38
• Starting and stopping the system, page 38
• Managing the status database, page 38
• Managing domains, page 39
• Managing instances, page 40
• Managing spare and failed instances, page 42
• Using the watchdog service, page 45
For information on backup and restore, refer to Chapter 8, Backup and Restore.

Using xPlore administrator


To start xPlore administrator, do the following:
1. Open your web browser and enter:
http://host:port/dsearchadmin

• host: DNS name of the computer on which the xPlore primary instance is installed.
• port: xPlore primary instance port (default: 9300).
• password: xPlore administrator password that was used during installation of the primary
instance.
2. Specify values in these fields and click OK.
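
As a small illustration of the URL format in step 1, the following sketch builds the address from its parts. The function name admin_url and the host name in the usage line are hypothetical. The default port is the documented default of 9300.

```python
def admin_url(host, port=9300):
    """Build the xPlore administrator URL for a primary instance."""
    return f"http://{host}:{port}/dsearchadmin"


# Hypothetical host name, for illustration only.
print(admin_url("xplore01.example.com"))
```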

xPlore administrator home page


The xPlore administrator home page displays a navigation tree in the left pane and links to the
four management areas in the content pane. Click System Overview in the left tree to get status of
auditing and each xPlore instance. Click Global Configuration to configure system-wide settings.

Viewing services
Expand Services in xPlore administrator.
• Click Indexing Service to view all indexing service instances in the xPlore federation.
For information on configuring the indexing service for a specific instance, expand Instances >
Instance_name > Indexing Service.
• Click Search Service to view all search service instances in the xPlore federation.
For information on configuring the search service for a specific instance, expand Instances
> Instance_name > Search Service.
• Click Content Processing Service to view all CPS instances in the xPlore federation.
For information on configuring CPS for a specific instance, expand Instances > Instance_name >
Content Processing Service.
• Click Logging to configure system-wide logging.
For information on configuring logging for a specific instance, expand Instances > Instance_name >
Logging.
• Click Tracing to configure system-wide tracing.
For information on configuring tracing for a specific instance, expand Instances > Instance_name
> Tracing.

Global configuration
Click Global Configuration to configure the following system-wide settings:
• Storage management (refer to Managing storage locations, page 87)
• Index service configuration (refer to Document processing and indexing service settings, page 175)
• Search service configuration (refer to Search service settings, page 177)
• Logging configuration (refer to Logging, page 142)

Tasks outside xPlore administrator


Some xPlore administration tasks can be performed in xPlore administrator as well as in
indexserverconfig.xml or xDB. Use xPlore administrator for those tasks. Table 4, page 33 describes the
actions that must be performed outside xPlore administrator. (These actions are not common.)

Table 4. Actions outside xPlore administrator

Action  Admin API  indexserverconfig.xml  xDB
Define/change a category of documents (refer to Configuring categories, page 83)  X
Define a sub-path for facets (refer to Documentum Search Development Guide)  X
Disable system, index, search metrics (refer to Displaying and configuring the system, page 37)  X
Purge status DB (refer to Managing the status database, page 38)  X
Register custom routing class (refer to Documentum Search Development Guide)  X  X
Change primary instance or xPlore host (refer to To replace a primary instance with a spare instance, page 43)  X
Enable and configure lemmatization (refer to Lemmatization, page 65)  X
Configure indexing depth (refer to Configuring indexing depth, page 80)  X
Specify collection backup path (refer to Configuring collections, page 86)  X
Boost metadata and freshness (refer to Configuring scoring and freshness, page 100)  X
Change special characters list (refer to Special characters, page 69)  X  X
Add a custom dictionary (refer to Adding dictionaries to CPS, page 73)  X
Configure Documentum security filter properties (refer to Documentum search results security, page 47)  X
Trace specific classes (refer to Tracing, page 147)  X
Suspend xDB for backups (refer to Chapter 8, Backup and Restore)  X

Indexing tasks in the Documentum environment — The following index agent tasks are performed
outside xPlore administrator.
• Limit content size for indexing (refer to Configuring the index agent, page 53).
• Exclude ACL and group attributes from indexing (refer to Configuring the index agent, page 53).
• Map file stores in shared directories (refer to Mapping file stores and content, page 58).
• Install additional index agents (refer to Setting up index agents for ACLs and groups, page 54).
• Map partitions to specific collections (refer to Mapping Content Server storage areas to collections,
page 60).
• Verify index agent migration (refer to Verifying index migration with ftintegrity, page 122).
• Customize indexing and query routing, filter object types, and inject metadata (refer to
Documentum xPlore Development Guide).

Search tasks in the Documentum environment — The following search configuration tasks are
performed outside xPlore administrator.
• Turn off xPlore native security (refer to Documentum search results security, page 47).
• Make types and attributes searchable (refer to Making types and attributes searchable, page 107).
• Turn off XQuery generation to support certain DQL operations (refer to Disabling XQuery
generation by DFC or DFS, page 108).
• Configure search for fragments, wildcards, and like terms (refer to Configuring search for
fragments, wildcards, and like terms, page 113).
• Route a query to a specific collection (refer to Enabling query routing in DFC, page 112).
• Turn on tracing for the Documentum query plugin (refer to Tracing Documentum queries, page
114).
• Customize facets and queries (refer to "Documentum customizations" in Documentum xPlore
Development Guide).

Managing disk space


The space for xPlore is consumed by three main areas: xDB data (the DFTXML of a Documentum
document), the Lucene indexes, and the xDB log. In most use cases, the xDB data will be larger
than the other two. The Lucene index needs extra space for its maintenance. Generally, the Lucene
index requires twice the size of disk space that it currently consumes, in order to perform updates
and merges.
For backups, you must provide at least the same amount of disk space that is already consumed
by the Lucene index. You can determine the space consumed by a particular index using xPlore
administrator. Choose Data Management and select the index. The disk space is displayed.
For information on troubleshooting disk space problems, refer to Insufficient disk space, page 118.
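
The sizing guidance above can be turned into a rough calculation. The following sketch makes two assumptions that go beyond the text: it reads "twice the size" as the total working space for the index (its current size plus equal headroom), and it counts the backup space as the minimum stated above. It does not cover xDB data or the xDB log. The function name lucene_space_plan is hypothetical.

```python
def lucene_space_plan(index_mb):
    """Estimate disk space (MB) to provision for one Lucene index.

    Assumptions: working space is twice the current index size
    (index plus headroom for updates and merges), and a backup
    needs at least the current index size.
    """
    working = 2 * index_mb
    backup = index_mb
    return {
        "working_mb": working,
        "backup_mb": backup,
        "total_mb": working + backup,
    }


print(lucene_space_plan(500))
```

Use the index size reported in xPlore administrator under Data Management as the input.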

Using the xDB admin tool


Some query optimization and debugging tasks use the xDB admin tool. To start the admin tool,
navigate to dsearch_home/dsearch/xhive/admin and run the script XHAdmin.bat or XHAdmin.sh.
Choose the connection icon to log in. The password is the same as your xPlore administrator
password.

Caution: Do not use xhadmin to rebuild an index or change files that are used by xPlore. This
tool is not aware of xPlore configuration settings in indexserverconfig.xml.

After login, you see the tree in the left pane, which shows segments, users, groups, and libraries:

Figure 8. xDB admin console

You can expand the root library to find a library and a collection of interest, then highlight a particular
indexed document to see its XML rendition. To query a library or collection, use the search icon at
the top of the admin client. The query window has tabs to show the results tree, debug the query,
and optimize the query.

Caution: If you remove segments, your backups cannot be restored.

Modifying indexserverconfig.xml
Some tasks are not available in xPlore administrator. These rarely-needed tasks require manual
editing of indexserverconfig.xml. This file is located in dsearch_home/config. Stop all instances in the
xPlore system before modifying this file.
Validate your changes using the tool validateConfigurationFile.bat or validateConfigurationFile.sh.
This tool is located in dsearch_home/dsearch/xhive/admin on the primary xPlore instance. From the
command line, type the following. Substitute your path to indexserverconfig.xml.
validateConfigurationFile.bat path_to_config_file

For example:
validateConfigurationFile.bat C:\xPlore\config\indexserverconfig.xml

Back up the xPlore federation after you change this file.

Caution: Make your changes to this file using an XML editor. Changes must be encoded
in UTF-8. A simple text editor such as Notepad may insert characters using the native OS
encoding, causing validation to fail.
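
Before you run the validation tool, a quick local check can catch the encoding problem described in the caution. The following is a minimal sketch and not a substitute for validateConfigurationFile: it only confirms that the file contents decode as UTF-8 and parse as well-formed XML. The function name precheck_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def precheck_config(raw_bytes):
    """Return a list of problems found in the raw file contents."""
    problems = []
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError as err:
        # Typical result of saving with a native OS encoding.
        problems.append(f"not valid UTF-8: {err}")
        return problems
    try:
        ET.fromstring(raw_bytes)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems


print(precheck_config(b"<a><b/></a>"))
```

To check a file on disk, read it in binary mode and pass the bytes to the function, for example precheck_config(open(path, "rb").read()).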

You can perform the following customizations in indexserverconfig.xml.


• Define and configure indexes for facets.
• Add and configure categories: Specify the XML elements that have text extraction,
tokenization, and storage of tokens. Specify the indexes that are defined on the category and the
XML elements that are not indexed. Change the collection for a category.
• Configure system, indexing, and search metrics.
• Conserve disk space by purging the status database on startup.
• Specify a custom routing-class for user-defined domains.
• Change the xDB listener port and admin RMI port.
• Turn off lemmatization.
• Lemmatize specific categories or element content.
• Configure indexing depth (leaf node).
• Change the xPlore host name.
• Boost metadata and freshness in results scores.
• Add or change special characters for CPS processing.
• Specify a collection backup path.
• Trace specific classes.
• (Documentum environments) Set the security filter batch size and the user and group cache size.

Displaying and configuring the system


A single xPlore system is a set of instances with a single primary instance and optional secondary
instances. In xPlore administrator, you manage the system by selecting System Overview in the
left panel. Select a service to view the status of each instance of the service. Click Configuration to
configure all instances of the service.
You configure system-wide logging and tracing by selecting Logging or Tracing. For more
information on logging, refer to Logging, page 142. For more information on tracing, refer to
Documentum Search Development Guide.

Configuring system metrics


Configure system metrics in indexserverconfig.xml, which is located in dsearch_home/config. Stop all
xPlore instances before applying your changes. Edit the element system-metrics-service to enable
or disable system metrics. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file. For
information on the settings for indexing and CPS metrics, refer to Viewing and configuring indexing
metrics, page 80.

To configure system metrics persistence


1. Shut down all xPlore instances.
2. Open indexserverconfig.xml. (This file is located in dsearch_home/config.)
3. Add the following line to the system-metrics-service element. The wait-timeout unit is seconds.
For example, if wait-timeout is set to 10, the latest metrics are available about 10 seconds later
(average 5 seconds). The batch size determines how many metrics are accumulated before they
are saved to the system metrics database in xDB.
<persistence-service batch-size="100" wait-timeout="10"/>

Note: Up-to-date metrics are available after an interval of wait-timeout plus 60 seconds. For
example, with a wait-timeout of 10 seconds, the latest metrics are available 70 seconds later. If
wait-timeout is too small, frequent writes to the metrics service database may affect xPlore
performance.
4. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36. Back up the xPlore federation after you change this file.
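
The edit in step 3 can also be scripted. The following sketch rests on assumptions: the sample document is a hypothetical minimal skeleton, not the real layout of indexserverconfig.xml, and the code assumes that the elements carry no XML namespace. The function name set_metrics_persistence is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal skeleton; the real file contains much more.
SAMPLE = """
<index-server-configuration>
  <system-metrics-service/>
</index-server-configuration>
"""


def set_metrics_persistence(config_xml, batch_size, wait_timeout):
    """Add or update persistence-service under system-metrics-service."""
    root = ET.fromstring(config_xml)
    service = root.find(".//system-metrics-service")
    if service is None:
        raise ValueError("system-metrics-service element not found")
    persistence = service.find("persistence-service")
    if persistence is None:
        persistence = ET.SubElement(service, "persistence-service")
    persistence.set("batch-size", str(batch_size))
    persistence.set("wait-timeout", str(wait_timeout))
    return ET.tostring(root, encoding="unicode")


print(set_metrics_persistence(SAMPLE, batch_size=100, wait_timeout=10))
```

ElementTree discards comments and the XML declaration when it parses a document, so treat the output as a draft to review, and run the validation tool on the result before restarting xPlore.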

Starting and stopping the system


Start or stop the xPlore system using the script that is installed in dsearch_home/jboss4.3.0/server.
(On Windows, an automatic service is installed.) You can stop non-primary instances in xPlore
administrator. Navigate to the instance in the tree and choose Stop instance. Use the same screen to
restart a secondary instance. A primary instance must be restarted from the start script or Windows
service.
Note: Shut down all instances when you shut down the primary instance. Connections between the
primary and secondary instances fail when the secondary instance tries to connect to the primary
instance after a restart. Start the primary instance before you start any secondary instances. If you run
a stop script, run as the same administrator user who started the instance.

Managing the status database


The status database records the indexing status of ingested documents (success or failure).
You can control how much data is cached before the status DB is updated. Stop the xPlore instances
and edit indexserverconfig.xml in dsearch_home/config. Validate your changes using the validation
tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after
you change this file. The statusdb-cache-size property for each instance can be configured. In the
following example, the cache size is set to 1000 instead of the default 10000 bytes:
<node ...>
  <properties>
    <property value="1000" name="statusdb-cache-size"/>
  </properties>
</node>

To conserve disk space on the primary host, you can purge the status database when the xPlore
primary instance starts up. By default, the status DB is not purged. To change this property, edit
indexserverconfig.xml. (This file is located in dsearch_home/config. Shut down the xPlore instance
before applying your changes. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.) Back up the xPlore federation after you change this file. Set the value
of the purge-statusdb-on-startup attribute on the index-server-configuration element to true.
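
The attribute change can be expressed as a short script. This sketch assumes that index-server-configuration has no XML namespace, and it looks for the element either at the root or nested below it. The sample string is a hypothetical skeleton, and enable_statusdb_purge is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ELEMENT = "index-server-configuration"


def enable_statusdb_purge(config_xml):
    """Set purge-statusdb-on-startup to true on the configuration element."""
    root = ET.fromstring(config_xml)
    target = root if root.tag == ELEMENT else root.find(f".//{ELEMENT}")
    if target is None:
        raise ValueError(f"{ELEMENT} element not found")
    target.set("purge-statusdb-on-startup", "true")
    return ET.tostring(root, encoding="unicode")


# Hypothetical skeleton of the configuration file.
print(enable_statusdb_purge("<index-server-configuration/>"))
```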

Managing domains
A domain is a separate, independent, logical, or structural grouping of collections. Domains are
managed through the Data Management screen in xPlore administrator. The Documentum index
agent creates a domain for the repository to which it connects. This domain receives indexing
requests from the repository.
To delete a domain, you must remove the domain library using the xDB admin tool.
When you select a domain, you can create a collection, configure the domain, or run an XQuery
in the domain.

Execute XQuery
You can query a domain or collection with Execute XQuery in xPlore administrator. Enter your
XQuery expression in the input area. Check the options get query plan and get optimizer debug to
provide information to EMC technical support.

Create a domain
To create a domain, select Data Management in the left panel and then click New Domain in the right
panel. Choose a default document category. (Categories are specified in indexserverconfig.xml.)
Choose a storage location from the dropdown list. (To create a new storage location, refer to
Managing storage locations, page 87.)
Use a custom routing class to route documents to a domain that you have created. The Documentum
index agent creates a domain for each repository source and routes documents to the domain
collection using a routing class. For information on custom routing classes, refer to Documentum
xPlore Development Guide.

Configure a domain
To configure a domain, select the domain in the left panel and then click Configuration. The
document category and storage location are displayed (read-only). You can set the runtime mode
as normal (default) or maintenance (for a corrupt domain). The mode does not persist across xPlore
sessions; the mode reverts to normal when xPlore restarts.
For more information on maintenance mode, refer to Corrupt domain, page 91.

Attach or detach a domain or collection


When you select a domain or collection in xPlore administrator (under Data Management), you can
detach or attach it. These procedures are required for a restore operation, after backup. See To restore
a domain, page 93 or To restore a collection, page 94.

Check database consistency


Select Data Management in xPlore administrator and then choose Check DB Consistency. This
check determines whether there are any corrupted or missing files such as configuration files or
Lucene indexes. Lucene indexes are checked to see whether they are consistent with the xDB records:
tree segments, xDB page owners, and xDB DOM nodes.

Database performance statistics


The Database performance statistics page displays statistics from xDB operations.

Managing instances
An xPlore instance is a web application instance that resides on an application server. In xPlore
administrator, click Instances to see a list of instances in the right content pane. You manage an
instance by selecting the instance in the left panel and then selecting the desired operation.
When you select an instance in xPlore administrator, the following instance information is displayed:
• OS information: Host name, status, OS, and architecture
• JVM information: Version, active thread count, and number of classes loaded.
• xPlore information: Instance version, instance type, and state
Services for the instance are accessible on the left: Indexing, search, CPS, logging, and tracing.
Collections that are bound to the instance are listed on the right. Click on a collection to go to the
Data Management view of the collection.

The application server instance name for each xPlore instance is recorded in indexserverconfig.xml.
If you change the name of the JBoss instance, you must change the value of the attribute
appserver-instance-name on the node element for that instance. This attribute is used for registering
and unregistering instances. Back up the xPlore federation after you change this file.
Note: All instances in an xPlore deployment must have their host clocks synchronized to the primary
xPlore instance host.

Add or delete an instance


To add an instance to the xPlore system, run the xPlore configurator script. If an xPlore instance exists
on the same host, select a different port for the new instance, because the default port is already in use.
To delete an instance from the xPlore system, use the xPlore configurator script.

Configure an instance
You select the storage location for an instance when you configure xPlore for the instance.
You can configure the indexing service, search service, or content processing service for a secondary
instance. Stop the instance before changing its configuration. Select an instance in xPlore
administrator and then click Stop Instance.
To configure the indexing service, search service, or CPS for an instance, click the appropriate icon
in the left panel.
Note: You cannot configure the primary instance after you stop it. You must configure it manually.

Configure the primary instance — You can set the following attributes on the primary instance
element in indexserverconfig.xml, which is located in dsearch_home/config. Shut down the xPlore
instance before applying your changes. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
• xdb-listener-port
By default, the xDB listener port is set during xPlore installation.
<node name="primary" hostname="localhost" xdb-listener-port="9330">
...
</node>

• primaryNode
Set to true.
• admin-rmi-port
Specify the port at which other instances connect to xPlore administrator. By default, this value is
set to the port number of the JBoss connector + 31. Default: 9331
• url
Specify the URL of the primary instance, used to set connections with additional instances.
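
The default for admin-rmi-port is simple arithmetic on the JBoss connector port. The following sketch uses the default primary instance port of 9300 as the connector port. The function name default_admin_rmi_port is hypothetical.

```python
def default_admin_rmi_port(jboss_connector_port=9300):
    """Return the default admin RMI port: JBoss connector port plus 31."""
    return jboss_connector_port + 31


print(default_admin_rmi_port())
```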

Start or stop an instance


Start and stop an instance from xPlore administrator. Select an instance and then click Stop Instance
or Start Instance.
Note: If you stop the primary instance, stop and restart all secondary instances after the primary
instance has been restarted to avoid connection failures.

Getting instance status


In a browser, get the active instances using the following URL:
http://hostname:port/dsearch/?action=getactivenodes
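
The same request can be made from a script. The following is a sketch under stated assumptions: the format of the response is not described in this guide, so the code returns the raw body without interpreting it, and the function names are hypothetical.

```python
from urllib.request import urlopen


def active_nodes_url(hostname, port):
    """Build the URL that reports the active xPlore instances."""
    return f"http://{hostname}:{port}/dsearch/?action=getactivenodes"


def fetch_active_nodes(hostname, port, timeout=10):
    """Return the raw response body as text, without interpreting it."""
    with urlopen(active_nodes_url(hostname, port), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(active_nodes_url("localhost", 9300))
```

Call fetch_active_nodes only against a running instance; it raises an error from urllib if the host cannot be reached.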

Managing spare and failed instances


You can install a spare instance using the xPlore installer. When you install a spare instance, the
data, index, and log directories must all be accessible to the primary instance. Use shared storage
for the spare. When you activate the spare to take over a failed instance, xPlore recovers failed
data using the transaction log.
If you do not yet have a multi-instance installation, you must configure xPlore before you install
the spare instance. Stop the xPlore primary instance and edit indexserverconfig.xml. Set the
client-server-mode property of the engine-config element to true and then restart the primary instance.
Validate your changes using the validation tool described in Modifying indexserverconfig.xml, page
36. Back up the xPlore federation after you change this file. Make sure that storage for all instances is
shared, so that a spare instance can take over the index for a failed instance.
Activate a spare to replace a failed secondary instance in xPlore administrator. (If you are replacing a
primary instance, refer to To replace a primary instance with a spare instance, page 43.) You cannot
change an active instance into a spare instance.
Note: A spare instance can only be activated to replace a stopped instance.

To replace a secondary instance with a spare instance


1. Open xPlore administrator. Make sure the spare instance is running.
2. Select the spare instance. Click Activate Spare Instance.
3. Select the instance to replace.
Note: When xPlore administrator reports success, the spare instance is renamed in the UI with
the replaced instance name. When you activate a spare to replace another instance, the spare
takes on the identity of the old instance. For example, if you activated DSearchSpare to replace
DSearchNode3, the spare instance becomes DSearchNode3.

To change a failed instance into a spare one


Because a failed instance’s original identity is assigned to another instance, its identity must be
changed.

1. Shut down all instances.


2. Modify indexserverconfig.xml:
• Change the failed instance’s node element’s name attribute to a new unique name.
• Change its status attribute to spare.
3. Modify indexserver-bootstrap.properties. (This file is located in the WEB-INF/classes directory
of the application server instance, for example,
C:\xPlore\jboss4.3.0\server\DctmServer_PrimaryDsearch\deploy\dsearch.war\WEB-INF\classes.)
Change the node-name key value to the one you set in indexserverconfig.xml.
4. Restart the primary and then the secondary instances.
5. Back up the xPlore federation.
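The edit to indexserverconfig.xml in step 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an xPlore utility: it assumes only what the procedure states, namely that each instance is a node element with name and status attributes. The sample XML, its config root element, the function name make_spare, and the instance names are hypothetical. ElementTree does not preserve comments or original formatting, so treat the output as an illustration and validate the real file with the validation tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for indexserverconfig.xml.
SAMPLE_CONFIG = """<config>
  <node name="PrimaryDsearch" status="normal"/>
  <node name="FailedNode" status="normal"/>
</config>"""


def make_spare(config_xml, current_name, new_name):
    """Rename one node element and set its status attribute to spare."""
    root = ET.fromstring(config_xml)
    nodes = list(root.iter("node"))
    # The new name must be unique among all node elements.
    if any(node.get("name") == new_name for node in nodes):
        raise ValueError(f"node name already in use: {new_name}")
    matches = [node for node in nodes if node.get("name") == current_name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one node named {current_name}")
    matches[0].set("name", new_name)
    matches[0].set("status", "spare")
    return ET.tostring(root, encoding="unicode")


updated = make_spare(SAMPLE_CONFIG, "FailedNode", "DSearchSpare2")
```

The uniqueness check mirrors the requirement that the failed instance receive a new unique name.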

To replace a primary instance with a spare instance


1. Shut down all xPlore instances. The shutdown scripts are located in dsearch_home/jboss4.3.0/server.
(On Windows, each instance is installed as an automatic service.) If you run a stop script, run as
the same administrator user who started the instance.
2. Edit indexserverconfig.xml, which is located in dsearch_home/config.
3. Locate the node element for the old primary instance. Delete this node element if you do not plan
to use the instance for xPlore administrator or other service. If you plan to use the instance for
xPlore administrator or other service, do the following:
• Delete the old primary node element in dsearch_home/config/indexserverconfig.xml.
• Delete the web application directory for the old primary instance, for example,
dsearch_home/jboss4.3.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war.

Caution: Do not change the value of appserver-instance-name. This is the name of the
instance web application.
4. Locate the spare node element in indexserverconfig.xml. (The status attribute is set to spare.)
• Set the status to normal.
• Change the value of the primaryNode attribute to true.
• Change the value of the name attribute to the name of your previous primary instance, for
example, PrimaryDsearch.

Caution: You cannot replace a primary instance with a different name. Do not change
the value of the appserver-instance-name of the primary node in indexserverconfig.xml.

• Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36.
5. Edit indexserver-bootstrap.properties in the web application for the new primary instance, for
example, dsearch_home/jboss4.3.0/server/DctmServer_Spare/deploy/dsearch.war/WEB-INF/
classes. Change the value of the node-name property to PrimaryDsearch.
6. Change the xDB properties in xdb.properties. This file is in the directory WEB-INF/classes of the
new primary instance. Change the entries to match your new primary instance, for example:

XHIVE_BOOTSTRAP=xhive://Config8518VM0:9430
...
XHIVE_FEDERATION=C:/xPlore/config/XhiveDatabase.bootstrap
...
XHIVE_SERVER_PORT=9430

7. Edit xdb.properties in all other xPlore instances to reference the new primary instance.
8. Start the xPlore primary instance, then start the secondary instances.
9. Back up the federation.
10. Update all clients, such as xDB admin tool, index agent, and query plugin, to point to the new
primary instance name.
• xDB admin tool
Edit xh_runner.bat (Windows) or xs_runner.sh (Linux) in dsearch_home/xhive/admin. Your
new values must match those in indexserverconfig.xml for the new primary instance.
— Change the path for XHIVE_HOME to the path to the new primary instance web
application.
— Change the host name in XHIVE_BOOTSTRAP=xhive:// to match the hostname attribute
for the new instance (in indexserverconfig.xml). Change the port to match the
value of the attribute xdb-listener-port on the new instance. For example:
set XHIVE_BOOTSTRAP=xhive://NewHost:9430

— Change ESS_HOST to the new host name.


— Change ESS_PORT to match the value of the port in the url attribute of the new primary
instance (in indexserverconfig.xml).
• The Documentum index agent
Shut down the index agent instance and modify indexagent.xml in dsearch_home/
jboss4.3.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Change parameter values for parameters that are defined in the element
indexer_plugin_config.generic_indexer.parameter_list.parameter.
— Change the parameter_value of the parameter dsearch_qrserver_host to the new host
name.
— Change the parameter_value of the parameter dsearch_qrserver_port to the new port.
• The Documentum query plugin
Use iAPI to change the host name and port parameters in the dm_ftengine_config object. This
change takes effect when you restart the repository.
— To set the port, enter your new port at the SET command line:
retrieve,c,dm_ftengine_config
set,c,l,param_value[2]
SET>new_port
save,c,l

— To set the host name, enter your new host name at the SET command line:
retrieve,c,dm_ftengine_config
set,c,l,param_value[3]
SET>new_hostname
save,c,l

• Update the environment variable DSS_INSTANCE on the new primary instance to point to
the path of the new instance. This environment variable is used by the restore scripts. (For
more information about the scripts, refer to Scripted backup and restore utilities, page 96.) For
example:
dsearch_home/jboss4.3.0/server/DctmServer_Spare/deploy/dsearch.war/WEB-INF
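Several steps in this procedure change key values in properties files (node-name in indexserver-bootstrap.properties, the XHIVE_ entries in xdb.properties). The following Python sketch shows one way to rewrite selected keys while leaving every other line untouched. It is an illustration only: the function name update_properties, the sample content, and the host names are hypothetical, and the sketch handles simple key=value lines without the escape or continuation rules of the full properties format.

```python
def update_properties(text, updates):
    """Return properties text with the given keys set to new values."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if stripped and not is_comment and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                # Replace the whole line with the new key=value pair.
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Keys that were not present in the file are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical sample content; OldHost and NewHost are placeholders.
SAMPLE = (
    "# xDB connection settings\n"
    "XHIVE_BOOTSTRAP=xhive://OldHost:9330\n"
    "XHIVE_SERVER_PORT=9330\n"
)

updated = update_properties(
    SAMPLE,
    {"XHIVE_BOOTSTRAP": "xhive://NewHost:9430", "XHIVE_SERVER_PORT": "9430"},
)
```

Rewriting only the matching lines keeps comments and unrelated settings as they were, which makes the change easier to review before you restart the instances.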

Using the watchdog service


The xPlore watchdog service is a Windows service or daemon process that monitors and checks the
status of various processes in xPlore. One watchdog service is installed on each xPlore host. Thus,
if a host has multiple xPlore instances, the watchdog service can monitor all instances. If a process
such as the indexing or search service fails, the watchdog service detects the failure and sends an
email notification to the administrator.
The watchdog process starts at xPlore installation and when the host is booted up. It runs as a
standalone Java process.
Note: Turn off the watchdog service in a clustered environment.

To turn off the watchdog service — On Windows hosts, stop the watchdog service: Documentum
Search Services Watchdog. On UNIX and Linux hosts, run the script stopWatchdog.sh in
dsearch_home/watchdog. If you run a stop script, run as the same administrator user who started
the instance.

To restart the watchdog service — On Windows hosts, start the watchdog service: Documentum
Search Services Watchdog. On UNIX and Linux hosts, run the script startWatchdog.sh in
dsearch_home/watchdog.


Chapter 3
Managing Security

xPlore does not have a security subsystem. Anyone with access to the xPlore host port can connect to
it. You must secure the xPlore environment using network security components such as a firewall
and restriction of network access. Secure the xPlore administrator port and open it only to specific
client hosts.

Documentum search results security


Documentum repository security is managed through individual and group permissions (ACLs). By
default, search results are filtered for permissions in xDB before they are returned to the Content
Server (native xPlore security). Filtering in xDB minimizes the result set that is returned to the
Content Server and provides faster search results, especially when facets are turned on in the client
application.
Content Server queues changes to ACLs and groups, so there can be a delay between a security change
in the Content Server and its application to search results. For example, until the index agent has
indexed a document or processed a change to a permission set, users cannot find the document in a
search. To eliminate this latency and support complete transactional consistency, you can turn on
security filtering in the Content Server instead of in xPlore: change the value of
ftsearch_security_mode in the dm_ftengine_config object to 0. The default is 1 (on), which means that
security is evaluated in xPlore.
Note: Performance can be slower when security is performed in Content Server instead of the index
server.

To turn off security filtering in the xPlore server


1. Open the IAPI tool from the Documentum Server Manager on the Content Server host.
2. Enter the following command to turn off filtering. Note lowercase L in the set and save
commands:
retrieve,c,dm_ftengine_config
set,c,l,ftsearch_security_mode
0
save,c,l
reinit,c

To check your existing security mode, enter the following command:


retrieve,c,dm_ftengine_config
get,c,l,ftsearch_security_mode

3. Restart all xPlore instances.


Note: If you turn on security filtering in xPlore after you have turned it off, you must replicate the
ACLs and groups from the Content Server. The script aclreplication_for_repositoryname.bat or .sh is
located in dsearch_home/setup/indexagent/tools. Edit the script before you run it to set the repository
name, repository user, password, xPlore primary instance host, xPlore port, and xPlore domain
(optional). If you run a script, run as the same administrator user who started the instance.

To manually update permissions in xPlore


Permissions are populated when you install xPlore. You can manually populate or update the ACL
and group information in xPlore for the following use cases:
• You are migrating from FAST
• You start using xPlore in a repository that has no full-text system
• You deploy a Documentum application that creates ACLs and groups
To manually update Documentum permissions, perform the following steps:
1. Locate the script aclreplication_for_repositoryname.bat or .sh in
dsearch_home/setup/indexagent/tools.
2. Edit the script to add the repository installation owner password.
3. Launch the script.

Configuring the security cache


Increase the cache size for large numbers of users or large numbers of groups that users can belong to.
You can set cache sizes for number of groups a user belongs to (groups-in cache), number of groups
that a user does not belong to (not-in-groups cache), and number of users in the cache (ACL cache).
Note: Groups-in and not-in-groups cache sizes can affect performance. For more information, refer
to Changing the security cache sizes, page 168.

To change security cache sizes


1. Stop all xPlore instances.
2. Edit indexserverconfig.xml, located in dsearch_home/config.
3. Change the size of a cache in the security-filter-class element:
<security-filter-class name="documentum" default="true"
class-name="com.emc.documentum.core.fulltext.indexserver.services.security.SecurityJoin">
<properties>
<property name="groups-in-cache-size" value="1000"/>
<property name="not-in-groups-cache-size" value="1000"/>
<property name="acl-cache-size" value="400"/>
<property name="batch-size" value="800"/>
</properties>
</security-filter-class>

4. If necessary, change the Groups-in cache cleanup interval by adding a property to the
security-filter-class properties. The default is 7200 sec (2 hours).
<property name="groupcache-clean-interval" value="7200"/>

5. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36. Back up the xPlore federation after you change this file.
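The cache size changes in steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch, not an xPlore tool: it operates on the security-filter-class fragment from the example in step 3 rather than on a full indexserverconfig.xml, the function name set_security_property is hypothetical, and the new values are arbitrary. ElementTree drops comments and original formatting, so validate the real file after any edit.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the security-filter-class example in step 3.
SAMPLE = """<security-filter-class name="documentum" default="true"
 class-name="com.emc.documentum.core.fulltext.indexserver.services.security.SecurityJoin">
  <properties>
    <property name="groups-in-cache-size" value="1000"/>
    <property name="not-in-groups-cache-size" value="1000"/>
    <property name="acl-cache-size" value="400"/>
    <property name="batch-size" value="800"/>
  </properties>
</security-filter-class>"""


def set_security_property(fragment, name, value):
    """Set a property value, adding the property element if it is missing."""
    root = ET.fromstring(fragment)
    props = root.find("properties")
    if props is None:
        raise ValueError("no properties element found")
    for prop in props.findall("property"):
        if prop.get("name") == name:
            prop.set("value", str(value))
            break
    else:
        ET.SubElement(props, "property", {"name": name, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


updated = set_security_property(SAMPLE, "groups-in-cache-size", 2000)
updated = set_security_property(updated, "groupcache-clean-interval", 3600)
```

The second call adds the optional groupcache-clean-interval property described in step 4, because the sample fragment does not contain it.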

Configuring results summary security


By default, users see search results for which they have BROWSE permission if SUMMARY is
not selected. If SUMMARY is in the select list, they see only results for which they have READ
permission. To modify the permissions applied to FTDQL and non-FTDQL search summaries, change
the security_mode property of the dm_ftengine_config object. Use one of the following values:
• BROWSE
Displays all results for which the user has at least BROWSE permission. If the user has only
BROWSE permission, the summary is blank.
• READ
Displays all results for which the user has at least READ permission.
• SUMMARY_BASED (default)
Displays all results for which the user has at least BROWSE permission if SUMMARY is not in
the select list. Displays results for which the user has at least READ permission if SUMMARY
is selected.
In the following iAPI example, the summary mode is set to READ:
retrieve,c,dm_ftengine_config
append,c,l,param_name
security_mode
append,c,l,param_value
READ
save,c,l
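The effect of the three security_mode values on which results a user sees can be summarized in a short Python sketch. It only restates the rules listed above and is not how xPlore evaluates permissions. The function name result_is_returned is hypothetical, and the numeric levels assume the standard Documentum permit values (2 for BROWSE, 3 for READ).

```python
# Standard Documentum permit levels (assumed here): 2 = BROWSE, 3 = READ.
BROWSE = 2
READ = 3


def result_is_returned(security_mode, user_permit, summary_selected):
    """Return True if a result passes the summary security mode."""
    if security_mode == "BROWSE":
        required = BROWSE
    elif security_mode == "READ":
        required = READ
    elif security_mode == "SUMMARY_BASED":
        # READ is required only when SUMMARY is in the select list.
        required = READ if summary_selected else BROWSE
    else:
        raise ValueError(f"unknown security_mode: {security_mode}")
    return user_permit >= required


# A user with only BROWSE permission on a document:
without_summary = result_is_returned("SUMMARY_BASED", BROWSE, False)
with_summary = result_is_returned("SUMMARY_BASED", BROWSE, True)
```

With the default SUMMARY_BASED mode, the same document appears for this user when SUMMARY is not selected and is dropped when SUMMARY is selected.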

Troubleshooting security
The following topics describe troubleshooting of Documentum security in search results.

Viewing security in the log


Check dsearch.log to view the following information:
• The XQuery expression
• Security filtering statistics

For example:
<message Total not-in-groups cache hits="0" Number of matching group probes="0"
Total ACL cache hits="0" Number of ACL index probes="0" Total groups-in cache
hits="0" Total values from data page="6" Total values from index keys="0"
Number of group probes="3" Minimum permit level="2" Filter output="2"
Filter input="2"><![CDATA[]]></message>

• Security filter messages


For example:
<message><![CDATA[Security Filter invoked]]></message>

If auditing is turned on, the following additional information is saved in dsearch.log:


• TOTAL_INPUT_HITS_TO_FILTER
How many hits a query had before security filtering.
• HITS_FILTERED_OUT
How many hits were discarded because the user did not have permissions for the results.
• GROUP_IN_CACHE_HIT
How many times the groups-in cache was probed for a query.
• GROUP_OUT_CACHE_HIT
How many times the not-in-groups cache was probed for a query.
• GROUP_IN_CACHE_FILL (Number of matching group probes)
How many times the query added a group to the groups-in cache.
• GROUP_OUT_CACHE_FILL (difference between Number of group probes and Number of matching
group probes)
How many times the query added a group to the not-in-groups cache.
In the following example from the log, the query returned 2200 hits to filter. Of these, 2000 were
filtered out, returning 200 results to the client application. The not-in-groups cache was probed 30
times for this query, and the cache was filled with 3 entries, for groups that the user did not belong to:
<USER_NAME>tuser4</USER_NAME>
<TOTAL_INPUT_HITS_TO_FILTER>2200</TOTAL_INPUT_HITS_TO_FILTER>
<HITS_FILTERED_OUT>2000</HITS_FILTERED_OUT>
<GROUP_IN_CACHE_HIT>0</GROUP_IN_CACHE_HIT>
<GROUP_OUT_CACHE_HIT>30</GROUP_OUT_CACHE_HIT>
<GROUP_IN_CACHE_FILL>0</GROUP_IN_CACHE_FILL>
<GROUP_OUT_CACHE_FILL>3</GROUP_OUT_CACHE_FILL>
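The arithmetic in this example (2200 hits in, 2000 filtered out, 200 returned) can be reproduced from the audit fields with a short Python sketch. It assumes that the fields have already been copied out of dsearch.log as the fragment shown above; the wrapping record element and the function name summarize_security_audit are hypothetical.

```python
import xml.etree.ElementTree as ET

# Audit fields copied from the log example above.
AUDIT_FRAGMENT = """
<USER_NAME>tuser4</USER_NAME>
<TOTAL_INPUT_HITS_TO_FILTER>2200</TOTAL_INPUT_HITS_TO_FILTER>
<HITS_FILTERED_OUT>2000</HITS_FILTERED_OUT>
<GROUP_IN_CACHE_HIT>0</GROUP_IN_CACHE_HIT>
<GROUP_OUT_CACHE_HIT>30</GROUP_OUT_CACHE_HIT>
<GROUP_IN_CACHE_FILL>0</GROUP_IN_CACHE_FILL>
<GROUP_OUT_CACHE_FILL>3</GROUP_OUT_CACHE_FILL>
"""


def summarize_security_audit(fragment):
    """Compute returned hits and the filtered share from audit fields."""
    # Wrap the sibling elements in a root so the fragment parses as XML.
    root = ET.fromstring(f"<record>{fragment}</record>")
    values = {child.tag: child.text for child in root}
    total = int(values["TOTAL_INPUT_HITS_TO_FILTER"])
    filtered = int(values["HITS_FILTERED_OUT"])
    return {
        "user": values["USER_NAME"],
        "returned": total - filtered,
        "filtered_share": filtered / total if total else 0.0,
    }


summary = summarize_security_audit(AUDIT_FRAGMENT)
```

A consistently high filtered share for a user means that most hits are discarded after retrieval, which can be a useful starting point when you investigate slow queries.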

Verifying security settings in the Content Server


Use iAPI to verify that dm_fulltext_index_user is registered to receive events for dm_acl and
dm_group with the following commands. They should return at least one ACL object ID and one
group object ID:
?,c,select r_object_id from dm_type where name='dm_acl'
?,c,select r_object_id from dm_type where name='dm_group'

Verify that the ACL IDs are registered for the events dm_save, dm_destroy, dm_saveasnew and the
group IDs are registered for the events dm_save and dm_destroy, for example:
?,c,select registered_id,event from dmi_registry where user_name='dm_fulltext_index_user'
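The comparison of the registry rows against the required events can be sketched in Python as follows. The sketch assumes that you have copied the query results into plain Python values by hand; it does not connect to a repository. The function name missing_events and the object IDs are hypothetical placeholders.

```python
# Events that must be registered for each type, as listed above.
REQUIRED_EVENTS = {
    "dm_acl": {"dm_save", "dm_destroy", "dm_saveasnew"},
    "dm_group": {"dm_save", "dm_destroy"},
}


def missing_events(type_ids, registry_rows):
    """Return the required events that are not registered, per type."""
    registered = {}
    for registered_id, event in registry_rows:
        registered.setdefault(registered_id, set()).add(event)
    missing = {}
    for type_name, required in REQUIRED_EVENTS.items():
        found = registered.get(type_ids[type_name], set())
        if required - found:
            missing[type_name] = sorted(required - found)
    return missing


# Hypothetical IDs returned by the dm_type queries.
type_ids = {"dm_acl": "0300000180000101", "dm_group": "0300000180000102"}
# Hypothetical (registered_id, event) rows from the dmi_registry query.
rows = [
    ("0300000180000101", "dm_save"),
    ("0300000180000101", "dm_destroy"),
    ("0300000180000102", "dm_save"),
    ("0300000180000102", "dm_destroy"),
]

gaps = missing_events(type_ids, rows)
```

In this sample the ACL type lacks a dm_saveasnew registration, so ACLs created with Save as new would not reach xPlore.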

Determining the area of failure


1. Start at the lowest level, xDB. Use the xDB admin tool to execute the XQuery. (Get the XQuery
from the log and then click the query icon in the admin tool.)
2. If the query runs successfully in xDB, use xPlore administrator to run the XQuery (Execute
XQuery in the domain or collection view).
3. If xPlore administrator runs the query successfully, check the query plugin trace log. (Refer
to Tracing Documentum queries, page 114.)
4. If there are two counter.xml files in domain_name/Data/ApplicationInfo/group collection, delete
the file that contains the lower integer value.

The wrong number of results are returned


There is latency between document creation or modification and indexing. First, check whether the
object has been indexed yet. You can use the following DQL. Substitute the actual object ID of the
document that exists on the Content Server but is not found in search results:
select r_object_id from dm_sysobject search document contains object_id

If the object has been indexed, check the following:


• Check user permissions. Run the query as superuser or through xPlore administrator.
• ACL and group databases may be out of synch. Run the manual update script aclreplication. (Refer
to To manually update permissions in xPlore, page 48.)
• Query tokens may not match indexed tokens (because of contextual differences). Run the
tokenization test on the query terms and on the sentence containing the terms in the document.
(Refer to Testing tokenization, page 127.)
• Make sure the attribute was not excluded from tokenization. Check indexserverconfig.xml for a
subpath whose full-text-search attribute is set to false, for example:
<sub-path ...full-text-search="false" ...path="dmftmetadata//acl_name"/>

• Make sure counter.xml has not been deleted from the collection domain_name/Data/
ApplicationInfo/group. If it has, restart xPlore.
• Try the query with Content Server security turned on. (Refer to To turn off security filtering in
the xPlore server, page 47.)
• Summary may be blank if the summary security mode is set to BROWSE. (Refer to Configuring
results summary security, page 49.)

Query execution is slow


Check the audit events using xPlore administrator. Refer to Changing the security cache sizes, page
168 for cache changes that can speed up queries.

Troubleshooting a DFC client


To log XQuery and XML results, set log4j.logger.com.documentum.fc.client.search=DEBUG,
stdout in dfc.properties for the DFC application. The file dfc.properties is generally located in the
WEB-INF/classes directory of a web application like Webtop or CenterStage.
To see the query generated from Webtop, control-click on Edit in the search results page.

How xPlore replicates security


ACL (dm_acl) and group (dm_group) objects are stored in XML format in the xPlore xDB. The
XML format is ACLXML for ACLs and GroupXML for groups. They are updated when a Save,
Save as new, or Destroy event on an ACL or group takes place in the repository. The XML
for ACLs and groups is stored as a collection in xDB: domain_name/Data/ApplicationInfo/acl or
/ApplicationInfo/group.
Note: Do not delete the file counter.xml in each of these collections. This file is used to track cache
entries.
The security filter is applied in xDB to filter search results per batch. (The batch size is configurable.
Refer to To change security cache sizes, page 48.) The security filter receives the user credentials,
minimum permit level, whether MACL is enabled, privileges such as superuser, and dynamic state of
the user (dynamic group membership).
You can set cache sizes for number of groups a user belongs to (groups-in cache), number of groups
that a user does not belong to (not-in-groups cache), and number of users in the cache (ACL cache).
The ACL cache is a per-query LRU cache that contains ACLs and granted permissions for users. The
groups-in cache is a global LRU cache that is shared between search sessions, containing a user and all
the groups that a user belongs to. Entries in the caches are replaced on a first-in, first-out (FIFO) basis.
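To make the size setting of the groups-in cache concrete, the following Python sketch models a bounded cache that maps a user to that user's groups and evicts the least recently used entry when it is full. It illustrates the general LRU technique only and is not xPlore's implementation; the class name GroupsInCache, the user names, and the group names are hypothetical.

```python
from collections import OrderedDict


class GroupsInCache:
    """Bounded user-to-groups cache with least-recently-used eviction."""

    def __init__(self, max_users):
        if max_users < 1:
            raise ValueError("max_users must be at least 1")
        self.max_users = max_users
        self._entries = OrderedDict()

    def get(self, user):
        groups = self._entries.get(user)
        if groups is not None:
            # A hit makes this entry the most recently used one.
            self._entries.move_to_end(user)
        return groups

    def put(self, user, groups):
        self._entries[user] = frozenset(groups)
        self._entries.move_to_end(user)
        if len(self._entries) > self.max_users:
            # Evict the entry that was used least recently.
            self._entries.popitem(last=False)


cache = GroupsInCache(max_users=2)
cache.put("tuser1", {"sales"})
cache.put("tuser2", {"hr"})
cache.get("tuser1")             # tuser1 becomes most recently used
cache.put("tuser3", {"eng"})    # evicts tuser2
```

A cache that is too small for the number of active users evicts entries before they are reused, which is why larger user populations call for larger cache sizes. Under strict first-in, first-out replacement, the get call would not refresh the entry and tuser1 would be evicted instead.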

Chapter 4
Managing the Index Agent

The following topics describe Documentum indexing functionality and tasks in the xPlore server. For
information on troubleshooting, refer to Troubleshooting the Documentum index agent, page 120.
For information on creating custom indexes, refer to "Creating custom indexes" in Documentum
xPlore Development Guide.

Documentum attributes that control indexing


Documents are selected for indexing in the Content Server based on the following criteria:
• If a_full_text attribute is false, the content is not indexed. Metadata is indexed.
• If a_full_text attribute is true, content is indexed based on the following attributes on the
dm_format associated with the document:
— If can_index is true, the document is indexed.
— If format_class is ft_always, the document is indexed.
— If format_class is ft_preferred (a preferred rendition), the rendition is indexed.
— If the object has renditions that are not of the format_class ft_always or ft_preferred, and
can_index is true, the rendition is indexed.
Sample DQL to determine these attribute values for the format bmp:
select can_index, format_class from dm_format where name = 'bmp'

Configuring the index agent


Most of the configuration options for indexing and querying are available in xPlore administrator.
For more information on configuring indexing in xPlore, refer to Chapter 6, Managing Indexing.
The Documentum index agent is configured for the first time through the index agent
configurator, which can be run after installing xPlore. For information on running
the configurator, refer to Documentum xPlore Deployment Guide. The following topics
describe index agent settings. The index agent installer and configurator set some of
these parameters. Parameter default values have been optimized for most environments.
They can be changed later using iAPI or by editing indexagent.xml, which is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
For descriptions of the settings, refer to Documentum index agent parameters, page 171.
Note: If you change parameters in indexagent.xml, stop and restart the index agent for the
parameters to take effect.

Limit content size for indexing — You can set a maximum size for content that is indexed. This is
the actual document size, not the size of the text within the content. To set the maximum content
size, edit the contentSizeLimit parameter within the parent element exporter. The value is in bytes.
Default: 20MB.

Exclude ACL and group attributes from indexing — By default, all attributes of ACLs
and groups are indexed. You can specify that certain attributes of ACLs and groups are not
indexed. Add acl_exclusion_list and group_exclusion_list elements to the parent element
indexer_plugin_config.generic_indexer.parameter_list. These elements are described in Table 22,
page 171.

Change the local content storage location — When you configured the index agent,
you selected a local content temporary staging location. You can change this location
by editing the local_content_area element in indexagent.xml. This file is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Restart the index agent web application after editing this file.

Caution: For multi-instance xPlore, the temporary staging area for the index agent must be
accessible from all xPlore instances.

Setting up index agents for ACLs and groups


By default, you configure an index agent for each Documentum repository that will be indexed. You
can also set up multiple index agents to index various object types within a repository.
You can separate the indexing of various object types, such as dm_acl, dm_group, and sysobjects.
Create two index agents. Run the index agent configurator and give the agent instance a name and
port that are different from the first agent. (The configurator is the file configIndexagent.bat or
configIndexagent.sh in dsearch_home/setup/indexagent.)
After you have installed a second index agent, edit indexagent.xml for this agent. (This file is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent2/deploy/IndexAgent.war/WEB-INF/classes.)

Example 4-1. ACL and group index agent configuration


The following example assumes that your original index agent processes sysobjects and your new
index agent processes ACLs and groups. Add one parameter set to your new indexagent.xml file.
Set the value of parameter_name to index_type_mode, and set the value of parameter_value to aclgroup
as follows:
<indexer_plugin_config>
<generic_indexer>
<class_name>… </class_name>
<parameter_list>
...
<parameter>
<parameter_name>index_type_mode</parameter_name>
<parameter_value>aclgroup</parameter_value>
</parameter>
</parameter_list>
</generic_indexer>
</indexer_plugin_config>

In the indexagent.xml for sysobjects (the original index agent), add a similar parameter set. Set the
value of parameter_name to index_type_mode, and set the value of parameter_value to sysobject. Restart
both index agents: navigate to dsearch_home/jboss4.3.0/server, run stopIndexagent.cmd and
stopIndexagent2.cmd, and then run startIndexagent.cmd and startIndexagent2.cmd.
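Adding the index_type_mode parameter set can be sketched in Python as follows. This is a minimal sketch that works on a trimmed fragment shaped like Example 4-1, not on a complete indexagent.xml. The function name set_indexer_parameter, the placeholder class name, and the existing sample parameter value are hypothetical. ElementTree discards comments and original formatting, so review the result before you replace a real file.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment shaped like Example 4-1; values are placeholders.
SAMPLE = """<indexer_plugin_config>
  <generic_indexer>
    <class_name>placeholder.ClassName</class_name>
    <parameter_list>
      <parameter>
        <parameter_name>dsearch_qrserver_host</parameter_name>
        <parameter_value>localhost</parameter_value>
      </parameter>
    </parameter_list>
  </generic_indexer>
</indexer_plugin_config>"""


def set_indexer_parameter(xml_text, name, value):
    """Set a generic_indexer parameter, adding the set if it is missing."""
    root = ET.fromstring(xml_text)
    plist = None
    for indexer in root.iter("generic_indexer"):
        plist = indexer.find("parameter_list")
        if plist is not None:
            break
    if plist is None:
        raise ValueError("generic_indexer/parameter_list not found")
    for param in plist.findall("parameter"):
        if param.findtext("parameter_name") == name:
            break
    else:
        param = ET.SubElement(plist, "parameter")
        ET.SubElement(param, "parameter_name").text = name
    value_el = param.find("parameter_value")
    if value_el is None:
        value_el = ET.SubElement(param, "parameter_value")
    value_el.text = value
    return ET.tostring(root, encoding="unicode")


acl_agent = set_indexer_parameter(SAMPLE, "index_type_mode", "aclgroup")
sysobject_agent = set_indexer_parameter(SAMPLE, "index_type_mode", "sysobject")
```

The two results correspond to the two index agents in the example: one configured for ACLs and groups, the other for sysobjects.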

Filtering content and locations


You can set Documentum object formats to index metadata only or configure the pre-packaged
xPlore index agent filters to completely filter out content and metadata for specific object types or
repository locations:
• Making types non-indexable, page 55
• Indexing metadata only for specific formats, page 55
• Using the index agent filters, page 56
• Migrating a limited set of object types, page 57
You can create a custom index agent filter that implements IDfCustomIndexFilter. Base the filter on a
date attribute. For information on creating a BOF filter, refer to Documentum xPlore Development Guide.

Making types non-indexable


In Documentum Administrator, select the object type (Administration > Types in the left pane) and
right-click for the list of attributes. Uncheck Enable for indexing to exclude this type from indexing.
Note: The ebs script for xPlore sets the attributes acl_name and r_aspect_name to non-searchable.

Indexing metadata only for specific formats


You can set the can_index attribute of a dm_format object to F(alse) so that contents of that format are
not full text indexed and not even transferred when the dftxml is generated.
For example using iAPI:
retrieve,c,dm_format where name = 'tiff'
set,c,l,can_index
F
save,c,l

Using the index agent filters


You can install index agent filters that exclude cabinets, folders, or object types from indexing. These
filters are packaged as a .dar file: IndexAgentDefaultFilters.dar. This file is installed with the index
agent at dsearch_home/setup/indexagent/filters. (The index agent is installed by the dsearch installer,
and the home directory on the index agent host is referred to as dsearch_home.)
To remove documents that have already been indexed from the index, refer to Removing entries
from the index, page 58.

To install the index agent filters


1. Copy IndexAgentDefaultFilters.dar, DarInstall.bat or DarInstall.sh, and DarInstall.xml from
dsearch_home/setup/indexagent/filters to a temporary install directory.
2. Edit DarInstall.xml:
• Specify the full path to IndexAgentDefaultFilters.dar including the file name, as the value
of the dar attribute.
• Specify your repository name as the value of the docbase attribute.
• Specify the repository superuser name as the value of the username attribute.
• Specify the repository superuser password as the value of the password attribute.
For example:
<emc.install dar="C:\Downloads\tempIndexAgentDefaultFilters.dar"
docbase="DSS_LH1" username="Administrator" password="password" />

3. Edit DarInstall.bat (Windows) or DarInstall.sh (Linux or Unix):


• Specify the path to the composerheadless package as the value of ECLIPSE.
• Specify the path to the file DarInstall.xml in the temporary working directory (excluding
the file name) as the value of BUILDFILE.
• Specify a workspace directory for the generated Composer files.
For example:
set ECLIPSE="C:\Documentum\product\6.5\install\composer\ComposerHeadless"
set BUILDFILE="C:\DarInstall\temp"
set WORKSPACE="C:\DarInstall\work"

4. Launch DarInstall.bat (Windows) or DarInstall.sh (Unix or Linux) to install the filters.

To configure the index agent filter properties


1. Open filter.properties in dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/deploy/
IndexAgent.war/WEB-INF/classes.
2. To configure excluded cabinets, type a comma-delimited list of cabinet names for the key
CabinetsToExclude (no default value).
3. To configure excluded object types, type a comma-delimited list of type names for the key
TypesToExclude (no default value). Subtypes will not be excluded.

4. To configure excluded folders, type a comma-delimited list of folder paths for the key
FoldersToExclude. By default, temp and system folders Jobs and Reports are excluded.
5. Save the file and restart the index agent application server.
Note: Documents indexed before the filters are installed are not filtered.
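The comma-delimited values for the three filter keys can be assembled with a short Python sketch. It is an illustration under assumptions: it writes plain key=value lines, the function name filter_properties and the cabinet names are hypothetical, and it emits a key only when you pass values for it. How an omitted FoldersToExclude key interacts with the default exclusions is not covered here.

```python
def filter_properties(cabinets=(), types=(), folders=()):
    """Build key=value lines for the index agent filter keys."""
    entries = {
        "CabinetsToExclude": cabinets,
        "TypesToExclude": types,
        "FoldersToExclude": folders,
    }
    lines = []
    for key, values in entries.items():
        cleaned = [value.strip() for value in values if value.strip()]
        for value in cleaned:
            # A comma inside a name would split it into two list items.
            if "," in value:
                raise ValueError(f"value contains a comma: {value}")
        if cleaned:
            lines.append(f"{key}={','.join(cleaned)}")
    return "\n".join(lines) + "\n"


text = filter_properties(
    cabinets=["Temp", " Archive "],                 # hypothetical cabinets
    folders=["/System/Sysadmin/Reports"],
)
```

Because subtypes are not excluded, list every subtype explicitly in the types argument if you want it filtered out.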

Testing whether the filters are installed — Use the following DQL statement. If the filters are
installed, a list of object IDs and names of the filters is returned:
select r_object_id,object_name from dmc_module where any a_interfaces='com.documentum.fc.indexagent.IDfCustomIndexFilter'

You can verify that the filters are loaded by the index agent in the index agent log, which is located in
the logs subdirectory of the index agent deployment directory in the JBoss application server. The
following example from the log shows that the FoldersToExclude filter was loaded:
2010-06-09 10:49:14,693 INFO FileConfigReader [http-0.0.0.0-9820-1]Filter FoldersToExclude Value:/
/System/Sysadmin/Reports, /System/Sysadmin/Jobs,

Filter instantiation is logged similar to the following:


2010-06-09 10:49:15,896 INFO ObjectFilter [http-0.0.0.0-9820-1]
[DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter:
com.documentum.server.impl.fulltext.indexagent.filter.defaultFolderFilterAction
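To check a large index agent log for instantiated filters, you can search for the message shown above. The following Python sketch assumes that the log text has been read into a string and that the message has the form of the sample entry; the function name instantiated_filters is hypothetical.

```python
import re

# Matches the message form shown in the sample log entry above.
INSTANTIATED = re.compile(
    r"\[DM_INDEX_AGENT_CUSTOM_FILTER_INFO\]\s+instantiated filter:\s*(\S+)"
)


def instantiated_filters(log_text):
    """Return the class names of filters reported as instantiated."""
    return INSTANTIATED.findall(log_text)


# Sample entry copied from the log excerpt above.
LOG = """2010-06-09 10:49:15,896 INFO ObjectFilter [http-0.0.0.0-9820-1]
[DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter:
com.documentum.server.impl.fulltext.indexagent.filter.defaultFolderFilterAction
"""

found = instantiated_filters(LOG)
```

The pattern tolerates line breaks between the message and the class name, so it works whether the entry is wrapped, as in the excerpt, or written on one line.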

Troubleshooting the index agent filters — Open dfc.properties in the composerheadless
package. This package is installed with Content Server at $DOCUMENTUM/product/version/
install/composer/ComposerHeadless. The file dfc.properties is located in the subdirectory
plugins/com.emc.ide.external.dfc_1.0.0/documentum.config. Find the following lines and verify that
the IP address and port of the connection broker for the target repository are accurate.
dfc.docbroker.host[N]=connection_broker_ip_address
dfc.docbroker.port[N]=connection_broker_port

Invoking the filters in ftintegrity and stateofindex — To invoke the index agent filters when you
run ftintegrity, follow the instructions in Verifying index migration with ftintegrity, page 122. To
invoke the filters when you run the stateofindex job, refer to Running the state of the index job, page
60. Both scripts generate a file ObjectId-filtered-out.txt that records all IDs of filtered-out objects.

Migrating a limited set of object types


If you wish to migrate a small number of object types, you can use the index agent UI. Perform the
following steps.
1. Replicate ACLs and groups to xPlore by running the aclreplication script. The script
aclreplication_for_repositoryname.bat or .sh is located in dsearch_home/setup/indexagent/tools.
Edit the script before you run it to set the repository name, repository user, password, xPlore
primary instance host, xPlore port, and xPlore domain (optional). If you run a script, run as the
same administrator user who started the instance.
2. Start the index agent in normal mode and open the UI.
3. Check Index selected list of objects, then check DQL.
4. Select the type in the From dropdown list.
5. Repeat for each type that you want indexed.

To remove documents that have already been indexed from the index, refer to Removing entries
from the index, page 58.

Removing entries from the index


You can remove certain object types, or objects that meet other criteria such as dates, from the index.
You can execute a DQL query to get object IDs of the documents that you wish to delete from the
index. Save the list of object IDs in a text file.
Navigate to dsearch_home/dsearch/xhive/admin and open the text file deletedocs.properties. Make
sure that the host and port values correspond to those in your environment. Set the value of
dss_domain to the xPlore domain from which you wish to delete indexed documents. Change the
value of the key file_contains_id_to_delete to the path to your object IDs. Alternatively, you can list
the object IDs, separated by commas, as the value of the key ids_to_delete.
In the same directory, run the script deleteDocs.bat (Windows) or deleteDocs.sh (Linux).
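The comma-separated value for ids_to_delete can be prepared with a short script. The following Python sketch is illustrative only: it assumes that object IDs are 16-character hexadecimal strings and that the saved query result holds one ID per line. The function name build_ids_to_delete is hypothetical; only the key name ids_to_delete comes from the text above.

```python
import re

# Assumption: a Documentum object ID is a 16-character hexadecimal string.
OBJECT_ID = re.compile(r"^[0-9a-fA-F]{16}$")


def build_ids_to_delete(lines):
    """Return a comma-separated value for the ids_to_delete key.

    Blank lines and anything that does not look like an object ID are
    skipped; duplicates are dropped while the original order is kept.
    """
    kept = []
    for line in lines:
        candidate = line.strip()
        if OBJECT_ID.match(candidate) and candidate not in kept:
            kept.append(candidate)
    return ",".join(kept)


if __name__ == "__main__":
    # Hypothetical sample: a column header, a blank line, and a duplicate ID.
    sample = ["r_object_id", "0900000180000123", "",
              "0900000180000456", "0900000180000123"]
    print("ids_to_delete=" + build_ids_to_delete(sample))
```

Review the resulting line before you paste it into deletedocs.properties, because the script cannot tell whether an ID belongs to the domain you are targeting.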

Resubmitting documents for indexing


Start the index agent in normal mode. You get a page that allows you to input a selected list of objects
for indexing. Submit either a file of object IDs or DQL.
Run ftintegrity or the state of the index Content Server job to get a list of objects that failed in
indexing. (Refer to Verifying index migration with ftintegrity, page 122.) You can input the
ObjectId-common-version-mismatch.txt file into the index agent UI to see errors for those files. You
must remove all data from the file except object IDs. After you have started the index agent, check
Index selected list of objects and then check Object file. Navigate to the file and then choose Submit.
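Removing everything except the object IDs from ObjectId-common-version-mismatch.txt can be scripted. The following Python sketch rests on two assumptions that are not confirmed by this guide: each line of the report begins with or contains exactly one object ID, and an object ID is a 16-character hexadecimal string. The function name is hypothetical.

```python
import re

# Assumption: the object ID is the only 16-character hexadecimal token on a line.
OBJECT_ID = re.compile(r"\b[0-9a-fA-F]{16}\b")


def extract_object_ids(report_lines):
    """Return the object IDs found in the report, one per input line."""
    ids = []
    for line in report_lines:
        match = OBJECT_ID.search(line)
        if match:
            ids.append(match.group(0))
    return ids


if __name__ == "__main__":
    # Hypothetical report lines: object ID followed by two i_vstamp values.
    report = ["0900000180000123 5 4", "0900000180000456,7,6", "no id here"]
    # Write one ID per line, ready to submit as an object file.
    print("\n".join(extract_object_ids(report)))
```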

Mapping file stores and content


This topic describes how to map Content Server file stores and how to direct storage area content to
a specific collection.

Mapping file stores in shared directories


By default, the index agent performs a getfile to retrieve content from the content storage area to the
index agent temporary content location. This temporary content is deleted after it has been indexed.
For performance reasons, you can choose to share the content storage. With shared content storage,
CPS has direct read access to the content. The process is getpath instead of getfile, and no content is
streamed. The content storage area must be mountable as read-only by the Index Agent and xPlore
hosts. You map the path to the file store in the index agent web application.


There are two configuration options in mapping file stores. One configuration is for file system paths
to the content that are identical on the Content Server host and xPlore index server host. In the
other option, the paths are different.
Note: You cannot map the remote components of a distributed store, because content is moved to
the primary site for indexing. You also cannot map contents of a turbo storage area, an encrypted
store, or an external store.

To map file stores for indexing


1. On the index agent host, open indexagent.xml, which is located in dsearch_home/JBoss4.3.0/
server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. If you installed
multiple index agents on this host, an integer is appended to the IndexAgent WAR file name, for
example, IndexAgent1.war.
2. Edit the file for your environment:
• If paths to the content files are the same, locate the exporter element and change the value of
the child element all_filestores_local to true.
• If the paths are different, add a file store map within the exporter element, specifying
the store name and local mapping for each file store. In the following example,
Content Server is on the host Dandelion and filestore_01 is on the same host at the
directory /Dandelion/Documentum/data/repo1/content_storage_01. The index agent
and xPlore server are on a separate host with a map to the Content Server host:
/mappingtoDandelion/repo1/content_storage_01. The following map is added to the exporter
element:
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>/mappingtoDandelion/repo1/content_storage_01</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>

Example with UNC path:


<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>\\CS\e$\Documentum\data\dss\content_storage_01</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>

Note: You must update the file_system_path attribute of the dm_location object in the
repository to match this local_mount value, and then restart the Content Server.
3. Save indexagent.xml and restart the index agent. (The application server containing the index
agent must be running.)

Tip: For better performance, you can mount the content storage to the xPlore index server host and
set all_filestores_local to true. Create a local file store map as shown in the following example:
<all_filestores_local>true</all_filestores_local>
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>\\192.168.195.129\DCTM\data\ftwinora\content_storage_01</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>
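When many file stores must be mapped, the local_filestore_map fragment can be generated instead of typed by hand. The following Python sketch uses the element names shown in the examples above; the function name and the sample mount path are illustrative, and the output still has to be pasted into the exporter element of indexagent.xml manually.

```python
import xml.etree.ElementTree as ET


def build_filestore_map(mounts):
    """Build a local_filestore_map element from {store name: local mount}."""
    root = ET.Element("local_filestore_map")
    for store_name, local_mount in mounts.items():
        entry = ET.SubElement(root, "local_filestore")
        ET.SubElement(entry, "store_name").text = store_name
        ET.SubElement(entry, "local_mount").text = local_mount
    return root


if __name__ == "__main__":
    element = build_filestore_map(
        {"filestore_01": "/mappingtoDandelion/repo1/content_storage_01"}
    )
    # The fragment is printed on one line; indent it as you prefer.
    print(ET.tostring(element, encoding="unicode"))
```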

Mapping Content Server storage areas to collections


You can map a Content Server file store to an xPlore collection. You set up this map in indexagent.xml,
located in the index agent WAR file in the directory $DOCUMENTUM/jboss4.3.0/server/
DctmServer_IndexAgent/deploy/IndexAgent.war/WEB-INF/classes. Add partition_config and its
child elements to the element index-agent.indexer_plugin_config.indexer to map file stores to
collections.
Note: Be sure to add this element to the indexer element.
In the following example, filestore_01 maps to the collection coll01, and filestore_02 maps to the
collection coll02. The rest of the repository is mapped to the default collection. Each repository has
one default collection named default.
<partition_config>
<default_partition>
<collection_name>default</collection_name>
</default_partition>
<partition>
<storage_name>filestore_01</storage_name>
<collection_name>coll01</collection_name>
</partition>
<partition>
<storage_name>filestore_02</storage_name>
<collection_name>coll02</collection_name>
</partition>
</partition_config>
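The effect of this mapping can be checked before you deploy it. The following Python sketch parses a partition_config fragment like the one above and reports which collection a given file store resolves to. It only mirrors the mapping as described in this topic and is not the index agent's own routing code; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same structure as the example above.
PARTITION_CONFIG = """
<partition_config>
  <default_partition>
    <collection_name>default</collection_name>
  </default_partition>
  <partition>
    <storage_name>filestore_01</storage_name>
    <collection_name>coll01</collection_name>
  </partition>
  <partition>
    <storage_name>filestore_02</storage_name>
    <collection_name>coll02</collection_name>
  </partition>
</partition_config>
"""


def load_partition_map(xml_text):
    """Return (default collection, {storage name: collection name})."""
    root = ET.fromstring(xml_text)
    default = root.findtext("default_partition/collection_name")
    mapping = {
        partition.findtext("storage_name"): partition.findtext("collection_name")
        for partition in root.findall("partition")
    }
    return default, mapping


def collection_for(store_name, default, mapping):
    """Unmapped stores fall back to the default collection."""
    return mapping.get(store_name, default)


if __name__ == "__main__":
    default, mapping = load_partition_map(PARTITION_CONFIG)
    for store in ("filestore_01", "filestore_02", "filestore_03"):
        print(store, "->", collection_for(store, default, mapping))
```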

Running the state of the index job


Repository configuration for Content Server 6.6 and above installs a job called stateofindex. This
job is implemented as a Java method and runs in the Content Server Java method server. (The job
dm_FTStateOfIndex is called by the ftintegrity script.) The state of the index job compares the index
content with the repository content.
Note: This job is not available for Content Server 6.5 SP2 or SP3.
Execute the stateofindex job from Documentum Administrator (DA) version 6.6, connecting to
Content Server 6.6. The job generates reports that provide the following information:
• Index completeness and comparison of document version stamps
• Status of the index server, including disk space usage, instance statistics, and process status
• The total number of objects with content correctly indexed, the total number of objects with
content that had some failure during indexing, and the total number of objects with no content
The following table lists the arguments for the job. You can set the argument values in DA.
(Arguments may have a slightly different form in DA. For example, -EndDate in ftintegrity is
-end_date in DA.)


Table 5. State of the Index job arguments

Argument                    Description
-batchsize value            Number of objects to be retrieved from the index in each batch.
                            The default value is 1000.
-checkType                  Specifies an object type to check (includes subtypes). Other
                            types are not checked.
-checkUnmaterializedLWSO    Sets whether to check unmaterialized lightweight sysobjects during
                            comparison.
-collection_name            Compares the index for the specified collection to data in the
                            repository. Default: All collections. Cannot be used with the
                            ftintegrity script.
-EndDate                    Local end date of sysobject r_modify_date, for range comparison.
                            Format: MM/dd/yyyy HH:mm:ss
-filterProperty             Specifies the full path to filter.properties. Invokes all loaded
                            filters. If the index agent and Content Server are on separate
                            hosts, copy filter.properties to a directory that is accessible to
                            Content Server. Refer to Using the index agent filters, page 56.
                            Slows stateofindex performance. Generates a file
                            ObjectId-filtered-out.txt that records all IDs of filtered-out
                            objects.
-ftengine_standby           Dual mode only (FAST and xPlore). Set to True. Cannot be used with
                            the ftintegrity script.
-fulltextUser               Name of the user who owns the xPlore instance. For dual mode (FAST
                            and xPlore), the user is dm_fulltext_index_user_01.
-get_id_in_indexing         If specified, IDs that have not yet been indexed are dumped to
                            a file, ObjectId-in-indexing.txt. Default: False. Cannot be used
                            with the ftintegrity script.
-StartDate                  Local start date of sysobject r_modify_date, for range comparison.
                            Format: MM/dd/yyyy HH:mm:ss
-timeout                    Number of minutes to time out the session. Default: 1.
-usefilter value            Invokes a custom filter. xPlore filters are not invoked. (For
                            xPlore filters, refer to Using the index agent filters, page 56.)
                            The default value is F. If a custom filter is used for indexing,
                            set this argument to T to use the filter when generating the job
                            results. The job runs more slowly with -usefilter.
                            Note: You can create custom filters to stop indexing of specific
                            object types. Refer to Custom Full-Text Index Filter Software
                            Environment.
In addition, the job is installed with the -queueperson and -windowinterval arguments set. The
-queueperson and -windowinterval arguments are standard arguments for administration jobs and
are explained in the Content Server Administration Guide.
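The -StartDate and -EndDate values must follow the pattern MM/dd/yyyy HH:mm:ss. The following Python sketch shows one way to produce values in that pattern. The helper names are hypothetical, and returning the arguments as a list is an assumption made for illustration; in DA you enter the values in the job's argument fields.

```python
from datetime import datetime

# MM/dd/yyyy HH:mm:ss expressed as a strftime pattern.
JOB_DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def format_job_date(moment):
    """Format a local datetime the way -StartDate and -EndDate expect."""
    return moment.strftime(JOB_DATE_FORMAT)


def date_range_arguments(start, end):
    """Return the two range arguments; reject an inverted range early."""
    if start > end:
        raise ValueError("start date must not be later than end date")
    return ["-StartDate", format_job_date(start),
            "-EndDate", format_job_date(end)]


if __name__ == "__main__":
    print(date_range_arguments(datetime(2010, 4, 1, 0, 0, 0),
                               datetime(2010, 4, 30, 23, 59, 59)))
```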


Reports from state of the index job — The job generates a job report, FTStateofIndexDoc.txt and
four results files. The FTStateofIndexDoc.txt report contains information about the job execution, like
the job reports generated by other administration jobs. The four results files are:
• ObjectId-common-version-match.txt
This file contains the object IDs and i_vstamp values of all objects in the index and the repository
and having identical i_vstamp values in both places.
• ObjectId-common-version-mismatch.txt
This file records all objects in the index and the repository with identical object IDs but
nonmatching i_vstamp values. For each object, it records the object ID, i_vstamp value in the
repository, and i_vstamp value in the index.
• ObjectId-dctmOnly.txt
This report contains the object IDs and i_vstamp values of objects in the repository but not in
the index.
• ObjectId-indexOnly.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
The report and result files are in %DOCUMENTUM%\dba\log\sessionID\sysadmin
($DOCUMENTUM/dba/log/sessionID/sysadmin).
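The four results files correspond to a four-way comparison of object IDs and i_vstamp values. The following Python sketch reproduces that classification on in-memory data to make the meaning of each file concrete. It is a conceptual model only, not the job's implementation; the function name and the sample IDs are hypothetical.

```python
def compare_index_to_repository(repository, index):
    """Classify objects the way the four results files are described.

    Both arguments map an object ID to its i_vstamp value.
    """
    report = {
        "common-version-match": [],
        "common-version-mismatch": [],
        "dctmOnly": [],
        "indexOnly": [],
    }
    for object_id, repo_stamp in sorted(repository.items()):
        if object_id not in index:
            report["dctmOnly"].append((object_id, repo_stamp))
        elif index[object_id] == repo_stamp:
            report["common-version-match"].append((object_id, repo_stamp))
        else:
            # Mismatch entries record both the repository and the index stamp.
            report["common-version-mismatch"].append(
                (object_id, repo_stamp, index[object_id])
            )
    for object_id, index_stamp in sorted(index.items()):
        if object_id not in repository:
            report["indexOnly"].append((object_id, index_stamp))
    return report


if __name__ == "__main__":
    repository = {"0900000180000001": 3, "0900000180000002": 5,
                  "0900000180000003": 1}
    index = {"0900000180000001": 3, "0900000180000002": 4,
             "0900000180000004": 2}
    for name, rows in compare_index_to_repository(repository, index).items():
        print(name, rows)
```

Objects that land in the mismatch or dctmOnly groups are the usual candidates for resubmission to the index agent.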
Note: You can also use ftintegrity to check the consistency between the repository and the xPlore
index. (Refer to Verifying index migration with ftintegrity, page 122.) To disable the FTStateOfIndex
job, enter the following using iAPI in Documentum Administrator:
Iapi>retrieve,c,dm_job where object_name='dm_FTStateOfIndex'
Iapi>set,c,l,is_inactive
SET>T
Iapi>save,c,l



Chapter 5
Managing Document Processing (CPS)

The content processing service (CPS) performs the following functions:


• Retrieves indexable content from content sources
• Determines the document format and primary language
• Parses the content into index tokens that xPlore can process into full-text indexes
You can install CPS separately from indexing and querying services using the xPlore installer. If the
indexing service uses a local CPS installation, it is an in-process call. If CPS is called on a separate
instance, it is via URL. Each instance can be configured to use a specific CPS process or URL. You
can also configure sharing of a single CPS instance by the system. When you install multiple CPS
instances, they are called in a round-robin order.
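Round-robin order means that requests are handed to the CPS instances in turn, returning to the first instance after the last. The following Python sketch illustrates the idea only; the instance URLs are hypothetical, and xPlore's own dispatcher is not shown in this guide.

```python
from itertools import cycle


def round_robin(instances):
    """Return an endless iterator that visits the instances in turn."""
    if not instances:
        raise ValueError("at least one CPS instance is required")
    return cycle(instances)


if __name__ == "__main__":
    # Hypothetical CPS endpoints.
    dispatcher = round_robin(["http://cps1:8080/services",
                              "http://cps2:8080/services"])
    for request_number in range(1, 6):
        print(request_number, next(dispatcher))
```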
You can configure common CPS tasks in xPlore administrator and less frequently used tasks in the CPS
configuration file configuration.xml, which is located in dsearch_home/dsearch/cps/cps_daemon. The
CPS components include a text extractor, an XML validator, and a language processor.
You can view the CPS version and statistics using xPlore administrator. Select an instance in the
tree and expand to choose Content Processing Service. For information on CPS troubleshooting,
refer to Troubleshooting CPS, page 126.
The following topics describe CPS administration tasks:
• Adding a remote CPS instance, page 63
• Starting and stopping CPS, page 64
• Viewing CPS statistics, page 64
• Managing CPS and tokenization, page 65
• Adding dictionaries to CPS, page 73

Adding a remote CPS instance


By default, every xPlore instance has a local CPS. To improve indexing or search performance, you
can install CPS on a delegated server. The installer adds a JBoss instance, CPS ear file, and CPS
native daemon on the remote host.

Caution: The remote instance must be on the same operating system as other xPlore instances.


After remote CPS installation, perform the following steps:


1. Register the remote CPS instance in xPlore administrator. Open Services > Content Processing
Service in the tree and then click Add. Enter the URL to the remote instance using the following
syntax:
http://hostname:port/services

For example:
http://DR:8080/services

In this same screen, specify whether the CPS instance will be used to process indexing requests
(the index option), search requests (the search option), or both (the all option).
2. Start the CPS instance using the start script startCPS.bat or startCPS.sh in dsearch_home/jboss4.3.0/
server. (On Windows, the standalone instance is installed as an automatic service.)
3. Test the remote CPS service using the WSDL testing page, with the following syntax:
http://hostname:port/services/cps/ContentProcessingService?wsdl

Note: When you install CPS on a host remote from the xPlore indexing server, make sure the location
specified in export_path in the CPS configuration.xml file is accessible by xPlore.
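The registration URL and the WSDL test URL from the steps above can be assembled and checked from a script. The following Python sketch builds both URLs from a host name and port using the syntax given in steps 1 and 3. The function names are hypothetical, and the reachability check assumes only that a running service answers an HTTP GET on the WSDL URL with status 200.

```python
from urllib.request import urlopen


def cps_service_url(host, port):
    """URL to enter when registering the remote CPS instance."""
    return f"http://{host}:{port}/services"


def cps_wsdl_url(host, port):
    """URL of the WSDL testing page for the remote CPS service."""
    return cps_service_url(host, port) + "/cps/ContentProcessingService?wsdl"


def wsdl_is_reachable(host, port, timeout=5.0):
    """Return True if the WSDL page answers with HTTP 200, else False."""
    try:
        with urlopen(cps_wsdl_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print(cps_service_url("DR", 8080))
    print(cps_wsdl_url("DR", 8080))
```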

Starting and stopping CPS


You can configure CPS tasks in xPlore administrator. In the left pane, expand the instance and click
Content Processing Service. Click Configuration.
• Stop CPS: Select an instance in the xPlore administrator tree, expand it, and choose Content
Processing Service. Click Stop CPS and then click Suspend.
• Start CPS: Select an instance in the xPlore administrator tree, expand it, and choose Content
Processing Service. Click Start CPS and then click Resume.
If CPS crashes or malfunctions, the CPS manager will try to restart it to continue processing.
Note: For a standalone CPS instance, use the JBoss scripts on the instance to start or stop
CPS. The scripts are located in the directory dsearch_home/jboss4.3.0/server, for example,
startCPS_standalone.cmd or startCPS_standalone.sh.

Viewing CPS statistics


To view all CPS instances in the xPlore federation, expand Services > Content Processing Service. Click
Add to register a remote CPS instance that you have installed. For more information on remote
instances, refer to Adding a remote CPS instance, page 63.
To view information about a specific CPS instance, expand the instance and click Content Processing
Service. The following information about the CPS instance is displayed:
• Versions
• Statistics


You can configure some of the CPS settings in xPlore administrator. Click Configuration.
The default settings have been optimized for most environments. You may require technical support
to evaluate the effects of changes to these settings. For a description of these settings, refer to Content
processing instance settings, page 173.

Managing CPS and tokenization


The following topics describe how content is tokenized:
• White space, page 65
• Lemmatization, page 65
• Special characters, page 69
• Case sensitivity, page 71
• Stop words, page 71
• Fuzzy search (wildcards), page 71
• Query operators, page 72
• Language, page 73
For information on testing tokenization, refer to Testing tokenization, page 127.

White space
Word separation is first identified by white space such as a space separator or line feed. Subsequently,
special characters are substituted with white space. Refer to Special characters, page 69.
For Asian languages, white space is not used. Content is tokenized by entity recognition and logical
fragments.

Lemmatization
Lemmatization is a normalization process that reduces a word to its canonical form. For example, a
word like books is normalized into book by removing the plural marker. Am, are, and is are normalized
to “be.” This behavior contrasts with stemming, a different normalization process in which stemmed
words are reduced to a string that sometimes is not a valid word. For example, ponies becomes poni.
xPlore uses an indexing analyzer that performs lemmatization. Studies have found that some form of
stemming or lemmatization is almost always helpful in search.
Lemmatization is applied to indexed documents and to queries. Lemmatization analyzes a word
for its context (part of speech), and the canonical form of a word (lemma) is indexed. The extracted
lemmas are actual words.


Note: Two forms of the same word may not be lemmatized to the same canonical form. For example,
“singing” is lemmatized to the noun form “singing,” and “sing” is lemmatized to the verb form
“sing.” A search on “singing” without context will not find content containing “sing.”
Lemmatization saves both the indexed term and its canonical form in the index, effectively doubling
the size of the index.
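The difference between lemmatization and stemming can be illustrated with a toy example. The following Python sketch is not the analyzer that xPlore uses: the lookup table stands in for a real dictionary with part-of-speech analysis, and the stemmer applies only two naive suffix rules. All names are hypothetical.

```python
# Toy lookup table; a real lemmatizer uses a dictionary and context.
LEMMAS = {"books": "book", "ponies": "pony", "am": "be", "are": "be", "is": "be"}


def toy_lemmatize(word):
    """Return the canonical form if known, else the lowercased word."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def toy_stem(word):
    """Strip suffixes mechanically; the result need not be a valid word."""
    lowered = word.lower()
    if lowered.endswith("ies"):
        return lowered[:-3] + "i"
    if lowered.endswith("s") and not lowered.endswith("ss"):
        return lowered[:-1]
    return lowered


if __name__ == "__main__":
    for word in ("books", "ponies", "are"):
        print(word, "lemma:", toy_lemmatize(word), "stem:", toy_stem(word))
```

For ponies, the lookup yields the valid word pony, while the suffix rule yields poni, which matches the contrast described at the start of this topic.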

Limits of lemmatization — Because lemmatization is context-based, a word is lemmatized
differently depending on its context in a sentence, yielding variable results. For example, saw is
lemmatized to see or to saw depending on the context. A query sometimes does not have enough
context to determine which of these bases is required. In another example, the noun swimming
is not lemmatized to the related verb to swim. A search for swimming does not return documents
containing swim. Lemmatization of queries is more prone to error because less context is available
in comparison to indexing.

Disabling lemmatization — To turn off lemmatization for both indexing and search, add
an enable-lemmatization attribute to the domain element in indexserverconfig.xml. Set the
value to false. (This file is located in dsearch_home/config. Shut down the xPlore instance before
applying your changes. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.) Back up the xPlore federation after you change this file.

Lemmatization of a Documentum query — In the DFC API IDfXQuery.setXqueryString, the term is
lemmatized if the 'with stemming' option is included. In DQL, terms in a search document contains
(SDC) clause are always lemmatized, but phrases and terms with wildcards are not. For example, the
query select r_object_id from dm_document search document contains 'companies winning' produces the
following tokens: companies, company, winning, and win.

Configuring lemmatization for specific categories and elements — Lemmatization for specific
categories and elements can be configured in indexserverconfig.xml. Within a category element,
add or edit a linguistic-process element. (You must shut down the xPlore instance before
applying your changes. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.) Back up the xPlore federation after you change this file. This
element can specify elements or their attributes that are lemmatized when indexed, as shown in the
following table of child elements. If you do not configure the linguistic-process element, then all
input XML fields will be processed.

Table 6. linguistic-process element

Element                                Description
element-with-name                      The name attribute on this element specifies the name of
                                       an element that contains lemmatizable content.
save-tokens-for-summary-processing     Child of element-with-name. If this element exists, the
                                       parent element tokens are saved. They are used in
                                       determining a summary or highlighting. Specify the
                                       maximum size of documents in bytes as the value of the
                                       attribute extract-text-size-less-than. Tokens will not be
                                       saved for larger content. Set the maximum size of tokens
                                       for the element as the value of the attribute token-size.
element-with-attribute                 The name attribute on this element specifies the name of
                                       an attribute on an element. The value attribute contains
                                       a value of the attribute. When the value is matched, the
                                       element content is lemmatized.
element-for-language-identification    Specifies an input element that is used by CPS to
                                       identify the language of the document.

Caution: If you wish to apply your lemmatization changes to the existing index, you must
reindex your documents.

In the following example from indexserverconfig.xml, the content of an input element with the
attribute dmfttype with a value of dmstring is lemmatized. An input element with the name dmftcustom
is processed if the extracted text does not exceed 262144 bytes. Several elements are specified for
language identification.
<linguistic-process>
<element-with-attribute name="dmfttype" value="dmstring"/>
<element-with-name name="dmftcustom">
<save-tokens-for-summary-processing extract-text-size-less-than="262144" token-size="65536"/>
</element-with-name>
<element-for-language-identification name="object_name"/>
...
</linguistic-process>

Troubleshooting lemmatization — If a query does not return expected results, examine the
following:
• Test the query phrase or terms for lemmatization and compare to the lemmatization in the context
of the document. (You can test each sample using xPlore administrator Test Tokenization.)
• View the query tokens by setting the dsearch logger level to DEBUG using xPlore administrator.
Expand Services > Logging and click Configuration. Set the log level for dsearchsearch. Tokens are
saved in dsearch.log.
• Check whether some parts of the input were not tokenized because they were excluded from
lemmatization: Text size exceeds the configured value of the extract-text-size-less-than attribute.
• Check whether a sub-path excludes the element from search. The sub-path attribute full-text-search
is set to false.
• If you have configured a collection to save tokens, you can view them in the xDB admin tool.
(Refer to Using the xDB admin tool, page 36.) Token files are generated under the Tokens library,
located at the same level as the Data library. You can also view tokens in the stored DFTXML
using xPlore administrator if dynamic summary processing is enabled. (The number of tokens
stored in the DFTXML depends on the configured number of tokens to save.) Click a document
in a collection to see the DFTXML. Figure 9, page 68 displays tokens in xPlore administrator:

Figure 9. Tokens in DFTXML

Configuring a collection to save tokens — To save tokens of metadata and content, set the property
save-tokens to true for the collection. The default is false. (Refer to Modifying indexserverconfig.xml,
page 36 for instructions on modifying indexserverconfig.xml.) For example:
<collection document-category="dftxml" usage="Data" name="default">
<properties>
<property value="true" name="save-tokens" />
</properties>
</collection>

Figure 10. Tokens database in xDB admin tool

The tokens database stores the original and root forms of the text, the components of compound
words, the starting and ending offsets relative to the field that contains the text, and whether the text
was identified as a stop word.

Note: Saving tokens increases disk space usage.

Special characters
Special characters are used to break text into meaningful tokens. Two types of special characters
are defined in xPlore:
• Characters that are treated as white space
The default special characters are defined in indexserverconfig.xml as the value of the
special-characters attribute on the content-processing-services element:
@#$%^_~`*&;:()-+=<>/\[]{}


White space is substituted for these characters. For example, a phrase extract-text is tokenized as
extract and text, and a search for either term finds the document.
• Characters that are required for context (punctuation)
The default context characters are defined in indexserverconfig.xml as the value of the
context-characters attribute of the content-processing-services element:
!,.;?'&quot;

White space is substituted after the parts of speech have been identified. For example, the email
address john.smith@emc.com contains a special character (@) and two instances of a context
special character (.). Because the context special character . is not punctuation in this example,
it is not replaced with white space. The string is tokenized as two tokens: john.smith and emc.com.
For the phrase “John Smith is working for EMC.” the period is filtered out because it functions
as a context special character (punctuation).
<content-processing-services ...context-characters="!,.;?'&quot;" ...>

These characters are context-sensitive and cannot be used for tokenization until the part of speech
has been identified.
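The two kinds of special characters can be illustrated with a simplified tokenizer. The following Python sketch uses the default character lists shown above, but it replaces CPS part-of-speech analysis with a rough heuristic that is purely an assumption: a context character counts as punctuation only when it sits at the start or end of a word. It is a teaching aid, not a model of the CPS implementation, and the function name is hypothetical.

```python
# Default lists from indexserverconfig.xml, as quoted in this topic.
SPECIAL = set("@#$%^_~`*&;:()-+=<>/\\[]{}")
CONTEXT = set("!,.;?'\"")


def tokenize(text):
    """Split text into lowercase tokens using the two character lists."""
    # Step 1: special characters always become white space.
    chars = [" " if char in SPECIAL else char for char in text]

    # Step 2 (heuristic, an assumption): a context character becomes white
    # space only at a word boundary; inside a word it is kept.
    result = []
    last = len(chars) - 1
    for position, char in enumerate(chars):
        if char in CONTEXT:
            at_start = position == 0 or chars[position - 1].isspace()
            at_end = position == last or chars[position + 1].isspace()
            result.append(" " if at_start or at_end else char)
        else:
            result.append(char)

    # All characters are stored as lowercase in the index.
    return "".join(result).lower().split()


if __name__ == "__main__":
    for sample in ("extract-text", "john.smith@emc.com",
                   "John Smith is working for EMC."):
        print(sample, "->", tokenize(sample))
```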

Queries that contain special characters — When a string containing a special character is indexed,
the tokens are stored next to each other in the index. A search for the string is treated as a phrase
search. For example, an index of home_base stores home and base next to each other. A search for
home_base finds the containing document but does not find other documents containing home or
base but not both.

To change the special characters list


1. Stop all xPlore instances.
2. Open indexserverconfig.xml in dsearch_home/config.
3. Edit the special-characters attribute on the content-processing-services element.
4. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36.
5. Back up the xPlore federation.
6. Reindex your documents to apply your changes to the existing index.
To change the location for the list of white space special characters, open the CPS configuration screen
for a selected CPS instance. Set the path as the value for Illegal char file.

To change the context characters list


1. Stop all xPlore instances.
2. Open indexserverconfig.xml in dsearch_home/config.
3. Edit the context-characters attribute on the content-processing-services element.
4. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36.
5. Back up the xPlore federation.
6. Reindex your documents to apply your changes to the existing index.


Troubleshooting — If you edit a special characters list, you must reindex all your documents to
apply the new tokenization rules. If a query fails, check to see whether it contains a special character.

Case sensitivity
All characters are stored as lowercase in the index. For example, the phrase “I’m runNiNg iN THE
Rain” is lemmatized and tokenized as “i be run in the rain.”
Case sensitivity is not configurable.

Stop words
Stop words are words that are filtered out before indexing or query tokenization, to reduce the size of
the index and to prevent searches on common words. The stop words list for each language is located
in dsearch_home/dsearch/cps/cps_daemon/shared libraries/rlp/etc. Some languages do not require a
stop words list. Stop words are removed from phrase searches. This can cause phrase searches that
contain a stop word to fail. For example, a document that contains the phrase “be safe” would not
be found with a search for “be safe,” because the “be” is removed and a null set is intersected with
the documents that contain “safe.”
Editing the stop words list is not supported in this release.

Enabling stop words — Stop words are not enabled in this release. To enable stop words,
set the value of the property filter_stop_word (child of linguistic_processing) to true in the file
InstanceName_local_configuration.xml where InstanceName is the name of the instance in which CPS is
running. The file is located in dsearch_home/dsearch/cps/cps_daemon.

Fuzzy search (wildcards)


Lemmatization of search terms is not applied to terms that contain wildcards. Wildcards match
separate terms, not fragments of a term. For example, computer* matches “computer store” or
“computer parts” but not “computers.” However, wildcards in phrase searches can match word
fragments. Fragment search support can be turned on in xPlore, but it causes slower performance.
For details, refer to Configuring search for fragments, wildcards, and like terms, page 113 and
Turning on support for fragments, page 113.
The following DQL wildcards are supported. All other characters are treated as literals.
• search document contains wildcards * and ?.
• Where clause wildcard %.
The following XQuery wildcards are supported:
• If a period is present with no qualifier (.), exactly one character in the text is matched.
• If a question mark follows a period (.?), zero or one character in the text being searched is matched.
• If an asterisk follows a period (.*), zero or more characters are matched.
• If a plus sign follows a period (.+), one or more characters are matched.
• If two comma-separated numbers enclosed by curly braces follow a period (.{n,m}), a specified
range of characters (at least n characters and no more than m characters) is matched.
To escape a wildcard character in an XQuery statement, prefix it with a backslash. For example, a
query containing hotmail.com would be escaped to hotmail\.com.
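An application that builds XQuery search strings from user input can apply this escaping automatically. The following Python sketch escapes periods, and also escapes existing backslashes so that they are not misread as escapes. Escaping only these two characters is an assumption, based on the wildcard list above in which every wildcard begins with a period. The function name is hypothetical.

```python
def escape_xquery_wildcards(term):
    """Prefix periods (and literal backslashes) with a backslash."""
    # Escape backslashes first so the ones added for periods are not doubled.
    return term.replace("\\", "\\\\").replace(".", "\\.")


if __name__ == "__main__":
    print(escape_xquery_wildcards("hotmail.com"))
```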
Following are sample queries with wildcards.

To match a single word with a wildcard — To match glance with a wildcard, use syntax similar
to the following:
for $i in /dmftdoc[//object_name ftcontains 'g.*nce' with wildcards] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }

To find documents with two words — To match two words in a document, use syntax similar
to the following:
for $i in /dmftdoc[. ftcontains 'corporate' ftand 'profile'] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }

To find documents with a phrase — To match a phrase in a document, use syntax similar to the
following:
for $i in /dmftdoc[. ftcontains {'corporate','profile'} phrase] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }

To find an exact match — To match an exact text in a document, use syntax similar to the following:
for $i in /dmftdoc[//object_name='bugs.xls'] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }

Query operators
Operators in XQuery expressions and DQL are interpreted in the following ways:
• XQuery operators
— The value operators = != < > specify a value comparison search. Search terms do not need to be
tokenized. These operators can be used for exact match or range searching on dates and IDs.
Any subpath that can be searched with a value operator should have the value-comparison
attribute set to true for the corresponding subpath configuration in indexserverconfig.xml. For
example, an improper configuration of the r_modify_date attribute sets full-text-search to
true. A date of ‘2010-04-01T06:55:29’ is tokenized into 5 tokens: ’2010’ ’04’ ’01T06’ ’55’ ’29’. A
search for ’04’ returns any document modified in April. The user will get many non-relevant
results. Therefore, r_modify_date should only have value-comparison set to true. Then the
date attribute is indexed as one token. A search for ’04’ would not hit all documents modified
in April.
— The ftcontains operator (XQFT syntax) specifies that the search term is tokenized before
searching against the index.


Any subpath that can be searched by ftcontains should have the full-text-search attribute set to
true in the corresponding subpath configuration in indexserverconfig.xml.
• DQL operators
All string attributes are searched with the ftcontains operator in XQuery. All other attribute types
use value operators (= != < >).
• In DQL, dates are automatically normalized to UTC representation when translated to XQuery.
With IDfXQuery, it is the application’s responsibility to specify dates in UTC to match the format
in DFTXML.
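The difference between full-text and value-comparison indexing of a date, and the UTC normalization that an IDfXQuery application performs itself, can be sketched in Python. This is an illustration only: the tokenizer below splits on non-alphanumeric characters, which reproduces the five tokens described above but is not the CPS tokenizer, and the timestamp layout is assumed from the example date in this section. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

def tokenize(value: str) -> list:
    """Simplified tokenizer: split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def full_text_hit(term: str, value: str) -> bool:
    """Full-text style match: the term equals one of the tokens."""
    return term in tokenize(value)

def value_hit(term: str, value: str) -> bool:
    """Value-comparison style match: the whole value is one token."""
    return term == value

def to_utc_string(aware_dt: datetime) -> str:
    """Convert a timezone-aware datetime to a UTC timestamp string."""
    return aware_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")

if __name__ == "__main__":
    print(tokenize("2010-04-01T06:55:29"))
    # A September document is hit by '04' because of its minutes field.
    print(full_text_hit("04", "2011-09-15T12:04:30"))
    print(value_hit("04", "2011-09-15T12:04:30"))
    local = datetime(2010, 4, 1, 8, 55, 29, tzinfo=timezone(timedelta(hours=2)))
    print(to_utc_string(local))
```

The second call shows why a tokenized date produces non-relevant results: any field of the timestamp can match the search term, not only the month.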

Language
The language of the content plays a role in how the document is tokenized. During indexing, CPS
identifies the language of the document and uses this information for linguistic analysis. During a
query, the session locale is used as the language for linguistic analysis. If the language identified
during indexing does not match the language used during querying, different tokens can be
generated, resulting in no query results.

How to check the identified language of an indexed document — Use xPlore administrator to view
the DFTXML of a document. (Click the document in the collection view, under Data Management.)
The language is specified in the lang attribute on the dmftcontentref element. For example:
<dmftcontentref content-type="" lang="en" encoding="utf-16le" ...>

How to check the session locale of a query — Look at the xPlore log event that prints the query
string. The event includes the query-locale setting used for the query. For example:
<event timestamp...>
<message >
<![CDATA[QueryID=primary$f20cc611-14bb-41e8-8b37-2a4f1e135c70,query-locale=en,...>

How to change the session locale of a query — The session_locale attribute on a Documentum
object is automatically set based on the OS environment. You can change it per session in DFC or iAPI
in order to search for documents in a different language. Use iAPI to change the session_locale:
set,c,sessionconfig,session_locale

In DFC, use IDfSession.getSessionConfig() to get the session config object, and then call
IDfTypedObject.setString("session_locale", locale) on that object.

Adding dictionaries to CPS


You can create user dictionaries for words specific to an industry or application, including personal
names and foreign words. The following procedure creates a Chinese user dictionary. Use these same
steps for other supported languages.

To create a Chinese user dictionary


1. Create a UTF-8 encoded file. Each entry in the file is on a single line with the following syntax:


word TAB part_of_speech TAB decomposition_pattern


Parts of speech: NOUN, PROPER_NOUN, PLACE, PERSON, ORGANIZATION, GIVEN_NAME,
or FOREIGN_PERSON.
Decomposition: a comma-delimited list of numbers that specify the number of characters
from word to include in each part of the compound. A value of 0 indicates no decomposition.
For example, the following entry indicates that the word should be decomposed into three
two-character sequences. The sum of the digits in the pattern must match the number of
characters in the entry.

Sequences:

The following example is decomposed into two four-character sequences:

2. Compile the dictionary.


• On Linux, the scripts are in dsearch_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/
ia32-glibc23-gcc34. Call chmod a+x to set permissions and then call source cpsenv.sh.
• On Windows, the scripts are in dsearch_home/dsearch/cps/cps_daemon/shared_libraries/rlp/
bin/ia32-w32-msvc71. Call build_cla_user_dictionary.exe.
3. Put the compiled dictionary into dsearch_home/cps/cps_daemon/shared_libraries/rlp/cma/dicts.
4. Edit the CLA configuration file to include the user dictionary. You add a dictionarypath element
to cla-options.xml in dsearch_home/cps/cps_daemon/shared_libraries/rlp/etc. The following
example adds a user dictionary named user_dict.bin:
<claconfig>
...
...
<dictionarypath><env name="root"/>C://cma/dicts/user_dict.bin</dictionarypath>
</claconfig>

5. To prevent a word that is also listed in a system dictionary from being decomposed, set
com.basistech.cla.favor_user_dictionary to true.
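Before compiling, the entry format from step 1 can be checked with a short script. The following Python sketch is not part of xPlore; it assumes that all three tab-separated fields are present on every line, and the sample word used in the demonstration is illustrative only. It verifies the part of speech and confirms that the decomposition pattern adds up to the number of characters in the word.

```python
PARTS_OF_SPEECH = {
    "NOUN", "PROPER_NOUN", "PLACE", "PERSON",
    "ORGANIZATION", "GIVEN_NAME", "FOREIGN_PERSON",
}

def parse_entry(line: str):
    """Validate one dictionary line and return (word, pos, pieces).

    Assumes the three-field form: word TAB part_of_speech TAB pattern.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        raise ValueError("expected three tab-separated fields")
    word, pos, pattern = fields
    if pos not in PARTS_OF_SPEECH:
        raise ValueError("unknown part of speech: " + pos)
    counts = [int(n) for n in pattern.split(",")]
    if counts == [0]:
        return word, pos, [word]  # 0 means no decomposition
    if sum(counts) != len(word):
        raise ValueError("pattern does not add up to the word length")
    pieces, start = [], 0
    for count in counts:
        pieces.append(word[start:start + count])
        start += count
    return word, pos, pieces

if __name__ == "__main__":
    # Illustrative six-character entry split into three two-character parts.
    print(parse_entry("深圳发展银行\tORGANIZATION\t2,2,2"))
```

Running the check over every line of the UTF-8 file before step 2 catches patterns whose numbers do not match the word length.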



Chapter 6
Managing Indexing

The indexing service receives batches of requests to index from a custom indexing client. The
index requests are passed to the content processing service, which extracts tokens for indexing and
returns them to the indexing service. You can configure all indexing parameters by choosing Global
Configuration from the System Overview panel in xPlore administrator. You can configure the
same indexing parameters on a per-instance basis by choosing Indexing Service on an instance and
then choosing Configuration.
For information on indexing troubleshooting, refer to Troubleshooting indexing, page 132.
The following topics describe common tasks in the indexing process.
• Indexing scalability, page 75
• Modifying indexes, page 76
• Viewing and configuring indexing metrics, page 80
• Managing indexing in xPlore administrator, page 81
• Chapter 6, Managing Indexing
• Chapter 4, Managing the Index Agent
For information on managing index data, such as collections and categories, libraries and storage
location, refer to Chapter 7, Managing Index Data. For information on configuring indexing
performance, refer to Indexing performance, page 164.

Indexing scalability
For vertical scalability, each indexing operation is implemented using ThreadPoolExecutor from the
Java 1.5 java.util.concurrent package. The executor spawns or terminates threads based on the
request load. You can configure the core and maximum thread pool sizes in xPlore administrator.
You can achieve horizontal scalability by adding xPlore instances and binding collections to different
instances.


Modifying indexes
Modify indexes by editing indexserverconfig.xml, which is located in dsearch_home/config. By
default, Documentum content and metadata are indexed. You can tune the indexing configuration
for specific needs. Shut down the xPlore instance before applying your changes. Validate your
changes using the validation tool described in Modifying indexserverconfig.xml, page 36. Back up
the xPlore federation after you change this file. A full-text index can be created as a path-value
index with the FULL_TEXT option.
For information on creating Documentum indexes, refer to "Creating custom indexes" in Documentum
xPlore Development Guide.

Configuring text extraction


Table 7, page 76 describes the elements in indexserverconfig.xml that specify what is extracted. (Text
extraction involves identifying the primary language, stemming or lemmatization, and tokenizing
content for indexing.) The paths in this configuration file are in XPath syntax and refer to the path
within the DFTXML representation of the object. (For information on DFTXML, refer to Appendix B,
Extensible Documentum DTD.) Specify an XPath value to the element whose content requires text
extraction for indexing. You can configure compression and how XML content is handled. Excluding
content from extraction shrinks the index footprint and speeds up ingestion.

Table 7. Extraction configuration options

Option
    Description

do-text-extraction
    Contains one or more for-element-with-name elements that define content or metadata that
    should be extracted for indexing.

for-element-with-name
    Specifies the names of elements that set tokenization and handling of embedded XML.

for-element-with-name/xml-content
    When a document to be indexed contains XML content, you must specify how that content
    should be handled. It can be tokenized or not (tokenize="true | false"). It can be stored
    within the input document or separately (store="embed | separate | none"). Separate storage
    is not supported for this release.

for-element-with-name/save-tokens-for-summary-processing
    Sets tokenization of content in specific elements for summaries, for example, dmftcontentref
    (content of a Documentum document). Specify the maximum size of documents in bytes as the
    value of the attribute extract-text-size-less-than. Tokens will not be saved for larger
    content. Set the maximum size of tokens for the element as the value of the attribute
    token-size.

xml-content on-embed-error
    You can specify how to handle parsing errors when the on-embed-error attribute is set to
    true. Handles errors such as syntax or external entity access. Valid values: embed_as_cdata |
    ignore | fail. The option embed_as_cdata stores the entire XML content as a CData sub-node of
    the specified node. The ignore option does not store the XML content. For the fail option,
    content is not searchable.

xml-content index-as-sub-path
    Boolean parameter that specifies whether the path is stored with XML content when
    xml-content embed attribute is set to true.

xml-content file-limit
    Sets the maximum size of embedded XML content.

compress
    Compresses the text value of specified elements to save storage space. Compressed content
    is about 30% of submitted XML content. Compression may slow the ingestion rate by 10-20%.

compress/for-element
    Using XPath notation, specifies the XML node of the input document that contains text values
    to be compressed.

Defining an index
Indexes are configured within an indexes element. (The path is category-definitions.category.indexes.)
Four types of indexes can be configured: fulltext-index, value-index, path-value index, and multi-path
index.
By default, multi-path indexes do not have all content indexed. If an element does not match a
configuration option, it is not indexed. To index all element content in a multi-path index, add a
sub-path element on //*. For example, to index all metadata content, use the path dmftmetadata//*.
The following child elements of node.indexes.index define an index.


Table 8. Index definition options

Index option
    Description

path-value-index
    The options attribute of this element specifies a comma-delimited string of xDB options:
    GET_ALL_TEXT (indexed by its string value including descendant nodes) | SUPPORT_PHRASES
    (optimizes for phrase search and increases index size) | NO_LOGGING (turns off xDB
    transaction logging) | INCLUDE_START_END_TOKEN_FLAGS (stores position information) |
    CONCURRENT (index is not locked)
    The path attribute specifies the path to an attribute that should be indexed. The path
    attribute contains an XPath notation to a path within the input document and options for the
    IndexServerAnalyzer. The symbols < and > must be escaped.

path-value-index/sub-path
    Specifies the path to an element for which the path information should be saved with the
    indexed value. Applies only to path-value-indexes that contain the IndexServerAnalyzer
    option INDEX_PATHS. Increases index size while enhancing performance. Sub-path indexes
    must be configured to support Documentum facets. Refer to Documentum xPlore Development
    Guide for information on facets. Refer to Table 9, page 79 for a comparison of path-value
    indexes with and without sub-paths.

sub-path attributes
    boost-value: Increases the score for hits on the subpath metadata by a multiplier.
    Default: 1.

    compress: Boolean, specifies whether the content should be compressed.

    enumerate-repeating-elements: Boolean, specifies whether the position of the element in the
    path should be indexed. Used for correlated repeating attributes, for example, media objects
    with prop_name='dimension' and prop_value='800x600','blue'.

    full-text-search: Specifies whether the sub-path content should be tokenized. Set to true if
    the tokens will be queried. If false, you do not need to duplicate this information in the
    no-tokenization element. This exclusion reduces the binary index size. Also, when excluded
    elements in a document are modified, the full-text index does not need to be updated.

    include-descendants: Boolean. Default: false. If true, the token for this instance will have
    a copy of all descendant tokens, speeding up queries. Cost of this option: Lowers the
    indexing rate and increases disk space. Use for nodes with many small descendant nodes,
    such as Documentum dmftmetadata.

    leading-wildcard: Boolean, specifies whether the subpath supports leading wildcard searches.
    Default: false.

    path: Path to element relative to the index path.

    returning-contents: Boolean, specifies whether the indexed value will be returned. For
    example, the user may search for documents with specific characteristics in a certain
    repository folder path, but the folder path does not need to be returned. Used for facets
    only. Refer to Documentum xPlore Development Guide for information on facets.

    type: Type of content in sub-path. Valid values: string | integer | boolean | double | date |
    datetime. Supports XQuery typed expressions such as date range or boolean value.

    value-comparison: Boolean, specifies that the value in this path should be indexed. Use for
    comparisons such as =, >, <, starts-with. Value indexing requires additional storage, so you
    should not index fields that will not be searched for as comparisons or starts-with.

Note: If tokenization is excluded for a specific attribute, search term matches for xPlore and FAST
return a different number of results. FAST indexes all attributes.

Table 9. Path-value index with and without subpaths

Key set combinations
    Without sub-path: Limited
    With sub-path: Flexible

Single key query latency
    Without sub-path: Low
    With sub-path: High (performs better with complex predicates)

ftcontains (full-text)
    Without sub-path: Single per probe
    With sub-path: Supports multiple ftcontains in a probe

Updates
    Without sub-path: Low overhead
    With sub-path: High overhead

Returnable (covering) values
    Without sub-path: Yes
    With sub-path: No

Modifying subpaths
A subpath definition in indexserverconfig.xml specifies the path to an element for which the path
information should be saved with the indexed value. A subpath increases index size while enhancing
performance. For most Documentum applications, you do not need to modify the definitions of the
subpath indexes, except for the following use cases:
• Add facet values to be stored in the index.
• Add paths for dmftcustom area elements.
• Add paths for XQuery of XML content.
• Modify the capabilities of existing subpaths, such as supporting leading wildcard searches
for certain paths.
For these use cases, refer to Defining an index, page 77.

Configuring indexing depth


Only the leaf (last node) text values from subelements of an XML node with implicit composite
indexes are returned. You can configure indexing to return all node values instead of the leaf
node value. (This change will negatively impact performance.) To configure, set the value of
index-value-leaf-node-only in the index-plugin element of indexserverconfig.xml to false. (This file is
located in dsearch_home/config. Shut down the xPlore instance before applying your changes. Validate
your changes using the validation tool described in Modifying indexserverconfig.xml, page 36.) Back
up the xPlore federation after you change this file.
A change to indexing depth requires reindexing.

Viewing and configuring indexing metrics


The following topics describe how to view and configure indexing metrics.


Viewing indexing metrics


The following statistics are gathered for indexing. They are displayed in xPlore administrator when
you choose Indexing Service for an instance:
• Task achievement:
— Count of successful requests to index
— Failure count
— Pending count
— Canceled requests count
• Performance
— Documents indexed per second
— Bytes indexed per second
• Document statistics
— Total number of documents added
— Total number of documents updated
— Total number of documents deleted
— Total size of content
— Total size of tokens
Click Configuration to enable or disable the indexing service on this instance.

Configuring indexing metrics


You can enable or disable indexing metrics by editing indexserverconfig.xml. (This file is located in
dsearch_home/config. Shut down the xPlore instance before applying your changes. Validate your
changes using the validation tool described in Modifying indexserverconfig.xml, page 36.) Back up
the xPlore federation after you change this file.
By default, indexing metrics are enabled. To disable indexing metrics, change the value of the enable
attribute on the INDEX task to false:
<system-metrics-service enable="true">
...
<task name="INDEX" enable="false" interval="3600">...

The interval at which metrics are saved is configurable by setting the interval attribute value. (The
unit is seconds.) The interval after which metrics are purged is configured as the value of the
delete-older-than property. The default is a purge every 90 days.
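If you script this change, the edit can be sketched in Python with the standard xml.etree.ElementTree module. The sketch works on a minimal fragment that contains only the elements shown in the example; the real indexserverconfig.xml contains much more, and the function name is hypothetical. Shutting down the instance, validating, and backing up still apply as described.

```python
import xml.etree.ElementTree as ET

# Minimal fragment modeled on the example in this section. The real
# configuration file has additional elements and attributes.
SAMPLE = """<system-metrics-service enable="true">
  <task name="INDEX" enable="true" interval="3600"/>
</system-metrics-service>"""

def set_index_metrics(xml_text, enabled, interval_seconds=None):
    """Return xml_text with the INDEX task enabled or disabled.

    Optionally also sets the save interval, in seconds.
    """
    root = ET.fromstring(xml_text)
    task = root.find(".//task[@name='INDEX']")
    if task is None:
        raise ValueError("no INDEX task found")
    task.set("enable", "true" if enabled else "false")
    if interval_seconds is not None:
        task.set("interval", str(interval_seconds))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(set_index_metrics(SAMPLE, enabled=False, interval_seconds=7200))
```

Run such a script against a copy of the file and validate the result before replacing the original.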

Managing indexing in xPlore administrator


You can perform the following administrative tasks in xPlore administrator.
• View indexing statistics.


Expand an instance in the tree and choose Indexing Service. Statistics are displayed in the right
panel: tasks completed, with a breakdown by document properties, and performance.
• Configure indexing across all instances.
Expand Services > Indexing Service in the tree. Click Configuration. You can configure the
various options described in Document processing and indexing service settings, page 175. The
default values have been optimized for most environments.
• Start or stop indexing
To start or stop indexing, select an instance in the tree and choose Indexing Service. Click Enable
or Disable.
• View the indexing queue
To view the queue, expand an instance in the tree and choose Indexing Service. The queue is
displayed. You can cancel any indexing batch requests in the queue.
Note: This queue is not the same as the index agent queue. You can view the index agent queue in
the index agent UI or in Documentum administrator.



Chapter 7
Managing Index Data

The following topics describe index data management:


• Configuring categories, page 83
• Managing categories, page 84
• Planning collections for scalability, page 84
• Viewing and configuring collections, page 85
• Managing storage locations, page 87
• Troubleshooting xDB, page 87
• Database performance, page 88
For information on managing domains, which is a system-wide task, refer to Managing domains, page 39.

Configuring categories
A category defines a class of documents and their XML structure. The category is defined in
indexserverconfig.xml and specifies the processing and semantics that are applied to the ingested
XML document. You can specify the XML elements that have text extraction, tokenization, and
storage of tokens. You also specify the indexes that are defined on the category and the XML elements
that are not indexed. More than one collection can map to a category. xPlore manages categories.
Table 10, page 83 describes the options that can be configured for each category. Categories are
defined and configured in indexserverconfig.xml, which is located in dsearch_home/config. Shut
down all xPlore instances before changing this file. Validate your changes using the validation tool
described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you
change this file. The paths in this configuration file are in XPath syntax and refer to the path within
the XML representation of the document. (All documents are submitted for ingestion in an XML
representation.) Specify an XPath value to the element whose content requires text extraction for
indexing.

Table 10. Category configuration options

Option
    Description

category-definitions
    Contains one or more category elements.

category
    Contains elements that govern category indexing.

properties/property track-location
    Specifies whether to track the location (index name) of the content in this category. For
    Documentum DFTXML representations of documents, the location is tracked in the tracking
    DB. Documentum ACLs and groups are not tracked because their index location is known.

Managing categories
Categories are defined in indexserverconfig.xml. Refer to Configuring categories, page 83 for
more information. When you create a collection, choose a category from the categories defined in
indexserverconfig.xml.
When you view the configuration of a collection, you see the assigned category. It cannot be changed
in xPlore administrator. To change the category, edit indexserverconfig.xml, which is located in
dsearch_home/config. Shut down all xPlore instances before changing this file. Validate your changes
using the validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
The indexes, text extraction settings, and compression setting for each category are also defined in
indexserverconfig.xml. For information on configuring these settings, refer to Modifying indexes,
page 76.

Planning collections for scalability


A collection is a logical grouping of tokenized content and associated full-text indexes within a
domain. For example, you have a collection that indexes all email documents. A collection contains
documents of a single category. There is generally a one-to-one mapping between a category and a
collection. A document category definition in indexserverconfig.xml describes what is indexed within
the documents that are submitted for indexing. This description declares the elements of a document
that are content references and the elements that are to be full-text indexed.
For Documentum environments, the Documentum index agent creates a domain for each source
repository, and documents are indexed to collections within that domain.
Specify the target collection for documents using one of the following methods, listed in order of
precedence in xPlore, highest first: custom routing class, API indexing option that specifies the
collection, or default collection. The documents are passed to the instance that has the specified
collection attached in index or index and search state. If the target collection is not specified, a
target instance and its default collection are selected in round-robin order.
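The order of precedence can be pictured with a small Python sketch. It is not the xPlore routing implementation: the class, the method names, and the rule that a custom router returning no collection falls through to the next method are all assumptions made for illustration.

```python
from itertools import cycle

class CollectionRouter:
    """Pick a target collection: custom routing, then API option, then default."""

    def __init__(self, default_collections, custom_router=None):
        if not default_collections:
            raise ValueError("at least one default collection is required")
        # One default collection per instance, visited in round-robin order.
        self._round_robin = cycle(default_collections)
        self._custom_router = custom_router

    def target(self, document, api_collection=None):
        # 1. A custom routing function has the highest precedence.
        if self._custom_router is not None:
            routed = self._custom_router(document)
            if routed:
                return routed
        # 2. A collection named in the indexing request comes next.
        if api_collection:
            return api_collection
        # 3. Otherwise take the next default collection in round-robin order.
        return next(self._round_robin)

if __name__ == "__main__":
    router = CollectionRouter(["default_a", "default_b"])
    print([router.target({}) for _ in range(3)])
    print(router.target({}, api_collection="mail"))
```

In the demonstration, unrouted documents alternate between the two default collections, while a request that names a collection bypasses the rotation.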
You can route a collection to high-speed storage for ingestion. As the data becomes less in demand,
you can detach the collection and move it to low-cost storage.


Viewing and configuring collections


To view the collections for a domain, choose Data Management and then choose the domain in the left
pane. In the right pane, you see each collection name, category, usage, state, and instances that the
collection is bound to. There is a red X next to the collection to delete it. For xPlore system collections,
the X is grayed out, and the collection cannot be deleted.

Viewing collection contents


To view the contents of a specific collection, choose Data Management and drill down to the
collection in the left pane. In the content pane, you see the following information about the collection:
• Library path in xDB
For more information, refer to xDB libraries, page 19.
• Document category
For more information on categories, refer to Categories, page 22.
• xPlore instances the collection is bound to
For more information, refer to Configuring collections, page 86.
• State
Valid states: index and search, index only, or search only. For more information, refer to Configuring
collections, page 86.
• Usage
Type of xDB library. Valid types: data (index), ApplicationInfo and SystemInfo.
• Current size on disk, in KB

Adding a collection
Choose a domain and then choose New collection to create a collection. After you have created the
collection, you can change collection state in the Configuration menu. You can set the following
properties for a new collection:
• Collection name
• Parent domain
• Usage: Type of xDB library. Valid types: data (index) or applicationinfo.
• Document category: Categories are defined in indexserverconfig.xml.
• Binding instance: Existing instances are listed.
To change the binding of a collection, refer to Configuring collections, page 86.
• Storage location: Choose a storage location from the dropdown list. To define a storage location,
refer to Managing storage locations, page 87.


Note: There is no default collection for secondary instances of xPlore. Create a collection in the
domain and then bind it to the secondary instance before indexing documents into the secondary
instance.

Deleting a collection
Choose a domain and then click X next to the collection you wish to delete. A collection must have
the state index_and_search or index_only to be deleted. Collections with the state search_only or off_line
cannot be deleted in xPlore administrator. To delete these collections, use the xDB admin tool.
Note: When you remove a collection, the data is not deleted from xDB.

Configuring collections
You can configure the following properties on a collection. Select a collection and then choose
Configuration. The Edit collection screen displays the collection name, parent domain, usage,
state, binding instance, and storage location.
• State: index and search, index only, or search only.
You can attach a collection in search only (read-only) mode to multiple instances for query load
balancing and scalability. You can set a collection to index only to repair the index.
Note: Users and administrators cannot query a collection that is set to index only state.
• Binding instance: Existing xPlore instances are listed. To change binding, first detach the
collection from its current instance. To change the binding on a failed instance, you must first
restore the collection to the same instance or to a spare instance.
Note: You cannot change the binding of a subcollection to a different instance from the parent
collection.
Choose a collection in the Data Management tree. In the right pane, click Detach. Then click
Configuration to change the binding. A collection with the state index_and_search or index_only
can be bound to only one instance. When the collection state is search_only, the collection can be
bound to multiple instances.
To remove a binding, set the state of the collection to search_only. If a binding instance is
unreachable, you cannot edit the binding.
• Storage location
To set up storage locations, refer to Managing storage locations, page 87.
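The state and binding rules above can be summarized in a short Python sketch. The functions are hypothetical helpers for illustration, not part of xPlore; they encode only the rules stated in this section.

```python
SINGLE_INSTANCE_STATES = {"index_and_search", "index_only"}
MULTI_INSTANCE_STATES = {"search_only"}

def binding_is_valid(state, instances):
    """Check a proposed binding against the collection state.

    index_and_search and index_only collections can be bound to only one
    instance; search_only collections can be bound to several.
    """
    if state in SINGLE_INSTANCE_STATES:
        return len(instances) <= 1
    if state in MULTI_INSTANCE_STATES:
        return True
    raise ValueError("unknown collection state: " + state)

def is_queryable(state):
    """A collection in index_only state cannot be queried."""
    if state not in SINGLE_INSTANCE_STATES | MULTI_INSTANCE_STATES:
        raise ValueError("unknown collection state: " + state)
    return state != "index_only"

if __name__ == "__main__":
    print(binding_is_valid("search_only", ["primary", "secondary"]))
    print(binding_is_valid("index_and_search", ["primary", "secondary"]))
    print(is_queryable("index_only"))
```

A check like this is useful when planning query load balancing: only the search_only state passes with more than one instance.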
You can perform the following actions on a collection:
• Attach or detach a collection.
A collection can be attached to one instance in index and search state (read-write) and to multiple
instances in search-only (read) state. To move or delete a collection, change the collection state to
search_only and then detach it. Choose a collection in the Data Management tree. In the right
pane, click Attach or Detach.
• Back up collections.


Choose a collection in the Data Management tree. Set the collection state to off_line. In the right
pane, click Backup.
You can specify the backup location path in indexserverconfig.xml. Shut down all xPlore instances
before changing this file. Edit the path attribute of the element admin-config/backup-location-path
with the path to your desired location. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
• View a list of documents in a collection.
Choose a collection in the Data Management tree. You can filter the list of indexed documents to
see whether a particular document was indexed. Click Name for an individual document to view
the XML content of the document.
• Query a collection.
Choose a collection in the Data Management tree. In the right pane, click Execute XQuery. Check
Get query debug to debug your query. The query optimizer is for technical support use.
• Restore a collection. Refer to the procedure To restore a collection, page 94.

Managing storage locations


xPlore stores data and indexes in an xDB database. The index for each collection can be routed to
a specific storage location.

Adding a storage location — To add a storage location using xPlore administrator, choose System
Overview in the tree. Click Global Configuration and then choose the Storage Management tab.
Click Add Storage. Enter a name and path and save the storage location. The storage location is
created with unlimited size.
After you create a storage location, you can select it when you create a domain or a new collection.
For a collection, you have the option of choosing a storage location different from the storage location
of the domain.

Troubleshooting xDB
If xDB fails to start up, you can force a start. Set the value of force-restart-xdb in
indexserver-bootstrap.properties to true. (This file is located in the WEB-INF/classes directory
of the application server instance, for example,
C:\xPlore\jboss4.3.0\server\DctmServer_PrimaryDsearch\deploy\dsearch.war\WEB-INF\classes.)
Restart the xPlore instance.
If this property does not exist in indexserver-bootstrap.properties, add the following line:
force-restart-xdb=true

This property will be removed after restart.
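Because the property is removed after each restart, you may want to script the change. The following Python sketch sets the property to true if it is present and appends it if it is missing. It operates on the text of the properties file; the function name is hypothetical, and reading and writing the actual file are left out.

```python
PROPERTY = "force-restart-xdb"

def ensure_force_restart(properties_text):
    """Return the properties text with force-restart-xdb set to true."""
    result = []
    found = False
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == PROPERTY:
            # Replace an existing setting, whatever its current value.
            result.append(PROPERTY + "=true")
            found = True
        else:
            result.append(line)
    if not found:
        result.append(PROPERTY + "=true")
    return "\n".join(result) + "\n"

if __name__ == "__main__":
    print(ensure_force_restart("some-other-key=value\n"), end="")
```

Apply the result to a copy of indexserver-bootstrap.properties first, and then restart the xPlore instance as described above.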

Caution: If you remove segments from xDB, your backups cannot be restored.


Database performance
To view xDB performance statistics, choose Data Management in the left panel and then choose
View DB Statistics.



Chapter 8
Backup and Restore

You must back up a domain or xPlore federation after you make xPlore environment changes such as
adding or deleting a collection or changing a collection binding. If you do not back up, then a restore
of the domain or xPlore federation will put the system in an inconsistent state. Perform all your
anticipated configuration changes before performing a full federation backup.
Before backup and after restore, perform a database consistency check. Select Data Management
in xPlore Administrator and then choose Check DB Consistency. This check determines whether
there are any corrupted or missing files such as configuration files or Lucene indexes. Lucene
indexes are checked to see whether they are consistent with the xDB records: tree segments, xDB
page owners, and xDB DOM nodes.
The following topics describe backup and restore procedures:
• Backup and high availability configurations, page 89
• Handling data corruption, page 91
• Rebuilding indexes, page 92
• Native xPlore backup and restore, page 92
• Snapshot (volume-based) backup and restore, page 95
• File-based backup and restore, page 96
• Scripted backup and restore utilities, page 96
For more detailed information on planning your backup, recovery, and high availability environment,
refer to Documentum xPlore Deployment Guide.

Backup and high availability configurations


Plan your backup and restore procedures based on your recovery time objective (RTO).
You can perform simple federation, domain, or collection backups using xPlore administrator or use
your preferred volume-based or file-based backup technologies. High availability and disaster
recovery planning is described in Documentum xPlore Deployment Guide.
There are three levels within the system at which you can perform backup and restore. Each level
supports a backup scope (full or cumulative) and disaster recovery technology (xDB, volume, or file):
• Collection
Using xPlore administrator, you can back up large collections, separating older and newer data
backups.
• Domain
Using xPlore administrator, you can restore the index of a single document source, such as a
Documentum repository, across all xPlore instances.
• Complete xPlore system (federation)
You can back up all xPlore collections using xPlore administrator, volume-based, or file-based
technologies.
All restore operations are performed off-line.

Caution: Before creating a backup, and after a restore operation, run the database consistency
checker (refer to Check database consistency, page 40).

Backup technologies — xPlore supports the following backup technologies:


• Native xDB backups
Can be incremental, cumulative (differential), or full, and hot (while running), warm (search only), or
cold (off-line). A cumulative backup contains all changes since the last full backup. You can back up
an xPlore federation, domain, or collection. These backups are performed through xPlore
administrator. Refer to Native xPlore backup and restore, page 92.
• File-based backups
Backup of the xPlore federation directories dsearch_home/data and dsearch_home/config and the
transaction log files in dsearch_home/dblog. Backup is warm or cold.
• Volume-based (snapshot) backups
Can be cumulative or full backup of disk blocks, requiring third-party product such as EMC
Timefinder. Backup is warm or cold.
Note: Incremental file-based backups are generally not recommended, since most files are touched
when they are opened. In addition, Windows file-based backup software requires exclusive access to
a file during backup, requiring a cold backup.
External automatic backup products such as EMC Networker are also supported. All backup and
restore commands are available as command-line interfaces (CLI) for scripting. Refer to Scripted
backup and restore utilities, page 96.
Table 11, page 90 describes the differences between supported backup combinations. Periodic full
backups are recommended in addition to differential backups.

Table 11. Backup scenarios

Level              Backup type    DR technology   Backup scope
collection         warm           xDB             full only
domain             warm           xDB             full only
xPlore federation  warm or hot    xDB             full, incremental, or cumulative
xPlore federation  cold or warm   volume*         full or cumulative
xPlore federation  cold or warm   file*           full only

Note: *For file-based and volume-based backups, back up the following files on each instance:
indexserverconfig.xml file, the xDB transaction log files in dsearch_home/dblog, and the database and
index files in dsearch_home/data. Each xPlore instance has one or more domains and a single xDB
transaction log for instance data recovery. Back up each instance in a multi-instance environment to a
single file, then restore the instance from this file.

Caution: If you remove segments from xDB, your backups cannot be restored.

Handling data corruption


You can detect data corruption in the following ways:
• An XhiveDataCorruptionException is reported in the xPlore server log.
• Run the consistency checker on the xPlore federation (refer to Check database consistency,
page 40).
Mark the offending collection as corrupted by setting its state to off_line. Choose the collection in the
tree, click Configuration, and then set the state.
A collection that is corrupted or unusable cannot be queried. It is silently skipped during query
processing.

Corrupt collection or database redo log — If a specific collection, or the database redo log, is
reported as corrupted on server startup, you have two options:
• Restore the federation, domain, or collection from a previous backup.
• Force the server to start up. The offending collection and its index will be marked as unusable
and its update operations will be ignored. Refer to Troubleshooting xDB, page 87. You can then
restore the corrupted collection or log.

Corrupt domain — If a domain index is corrupt, use xPlore administrator to set the domain mode
to maintenance. (You can also use the CLI dsearch-set-domain-mode or the API setDomainMode to
set the mode to maintenance.) In maintenance mode, the only allowed operations are repair and
consistency check. Queries are allowed only from xPlore administrator. Queries from a Documentum
client will be tried as NOFTDQL in the Content Server but will not be processed by xPlore.
Use xPlore administrator to detach the corrupted domain. To restore the domain, refer to To restore a
domain, page 93. When xPlore is restarted, the domain mode is always set to normal (maintenance
mode is not persisted to disk).


Recovering from a system crash — Figure 11, page 92 diagrams a typical workflow that responds to
a system crash:

Figure 11. System crash decision tree

Notes:
1. Refer to Verifying index migration with ftintegrity, page 122.
2. Refer to Chapter 8, Backup and Restore.
3. Refer to Rebuilding indexes, page 92.

Rebuilding indexes
Indexes for an xPlore federation must be rebuilt by a cleanup and reingestion process. Perform the
following steps to remove and clean up all indexes:
1. Shut down all xPlore instances.
2. Delete everything under dsearch_home/data.
3. Delete everything under dsearch_home/config except the file indexserverconfig.xml.
4. Start xPlore instances.
5. Re-feed the documents.
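
Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name clean_for_rebuild is hypothetical, dsearch_home is passed in as a path, and all xPlore instances are assumed to be shut down already. By default the function only reports what it would delete.

```python
import shutil
from pathlib import Path
from typing import List


def clean_for_rebuild(dsearch_home: str, dry_run: bool = True) -> List[str]:
    """List (and, if dry_run is False, delete) the entries from steps 2 and 3."""
    home = Path(dsearch_home)
    # Step 2: everything under dsearch_home/data.
    targets = sorted((home / "data").iterdir())
    # Step 3: everything under dsearch_home/config except indexserverconfig.xml.
    targets += sorted(
        p for p in (home / "config").iterdir()
        if p.name != "indexserverconfig.xml"
    )
    if not dry_run:
        for p in targets:
            if p.is_dir() and not p.is_symlink():
                shutil.rmtree(p)
            else:
                p.unlink()
    return [str(p) for p in targets]
```

Run the function once with the default dry_run=True and review the returned list before you run it with dry_run=False.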

Native xPlore backup and restore


After you change the federation or domain structure, such as adding or deleting a collection or
changing a collection binding, back up the xPlore federation or domain. If you do not back up, then a
restore of the domain or xPlore federation will put the system in an inconsistent state. Perform all
your anticipated configuration changes before performing a full federation backup.
Use xPlore administrator to back up the system. Select the level for your backup:
• xPlore federation
Choose Data Management in the tree and then click Backup.
• Domain
Choose Data Management in the tree, highlight the domain, and then click Backup.
• Collection
Choose Data Management in the tree, highlight the collection, and then click Backup.

Scripted backup and restore — The CLI for backup is dsearch-backup (federation, collection,
domain). Refer to Scripted backup and restore utilities, page 96. xPlore supports offline restore
only. The xPlore server must be shut down to restore a collection or an xPlore federation. If you are
restoring a full backup and an incremental backup, perform both restore procedures before restarting
the xPlore instances.

Incremental backups — By default, log files are deleted at each backup. For incremental backups,
change this setting before a full backup using the xDB admin tool. In the menu, choose Federation >
Change keep-log-file option. Enter the xPlore administrator password and check Keep log files.
When you change this setting, the log file from the full backup will not be deleted at the next
incremental backup.

To restore an xPlore federation


This procedure assumes that no system changes (new or deleted collections, changed bindings) have
occurred since backup. (You must do a full federation backup every time you make configuration
changes to the xPlore environment.)
1. Shut down all xPlore instances.
2. Clean up all existing data files:
• Delete everything under dsearch_home/data.
• Delete everything under dsearch_home/config except the file indexserverconfig.xml.
3. Run the data restore tool dsearch-restore.bat (Windows) or dsearch-restore.sh (Linux). This
executable is located in dsearch_home/dsearch/restore. Both arguments are optional. If you omit
backup_directory, the default location specified in indexserverconfig.xml is used.
Usage:
dsearch-restore federation backup_directory bootstrap_file

Caution: If you are restoring a full backup and an incremental backup, restore both before
restarting xPlore instances.
If you are restoring a federation and a collection, do the following:
1. Restore the federation.
2. Start up and shut down xPlore.
3. Restore the collection.
4. Restart the xPlore instances.

To restore a domain
This procedure replaces the index data with a backup copy. This procedure assumes that no system
changes (new or deleted collections, changed bindings) have occurred since backup. (Always back up
the xPlore federation after you change the xPlore environment.)
Backup-directory is optional. The default location is specified in indexserverconfig.xml.
1. Force-detach the domain using xPlore administrator. If you are scripting backup and restore,
use the CLI dsearch-force-detach. The type argument is domain.
dsearch-force-detach type hostname port domain-name

2. Generate the orphaned segment list. Use the CLI dsearch-list-orphaned-segments to list the segments
that will be orphaned after a restore operation. If an orphaned segment file is not specified, the
IDs of orphaned segments are sent to stdout.
dsearch-list-orphaned-segments collection|domain backup-directory
host port [orphaned-segment-file]

3. Shut down all xPlore instances.


4. Run the data restore tool dsearch-restore.bat (Windows) or dsearch-restore.sh (Linux). This
executable is located in dsearch_home/dsearch/restore. The default backup root directory
is recorded in indexserverconfig.xml as the value of the path attribute on the element
admin-config.backup-location. The argument bootstrap_file is optional.
Usage:
dsearch-restore domain backup_directory bootstrap_file
For example (on a single line):
dsearch-restore domain C:\xPlore\dsearch\backup\DSS_LH1\2010-05-05-00-55-52

5. Start xPlore instances.


6. If orphaned segments were reported before restore, run the purge segment utility to purge
those segments. Specify the absolute path to the file generated by dsearch-list-orphaned-segments
as orphaned-segment-file. If an orphaned segment file is not specified, the orphaned segment IDs
are read from stdin.
dsearch-purge-orphaned-segments [orphaned-segment-file]

7. Force the domain to attach. The type argument is domain.


dsearch-force-attach type hostname port domain-name

8. Perform a consistency check and test search. Select Data Management in xPlore Administrator
and then choose Check DB Consistency.
9. Set the domain to normal mode using xPlore administrator.
10. (Documentum environment) Run the ACL and group replication script to update any
changes since the backup. The script aclreplication_for_repositoryname.bat or .sh is located in
dsearch_home/setup/indexagent/tools. Edit the script before you run it to set the password and
(optional) xPlore domain.
11. (Documentum environment) Run ftintegrity. (Refer to Verifying index migration with ftintegrity,
page 122.)
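
The order of the scripted steps in this procedure can be sketched in Python as follows. This is a minimal sketch that only builds the command lines and does not run them. The command names and argument order come from the usage lines above. The function name domain_restore_plan and the "cli" and "manual" labels are hypothetical, and the .bat or .sh extension is omitted.

```python
def domain_restore_plan(host, port, domain, backup_dir, orphan_file):
    """Return the steps of a domain restore, in the order they must run."""
    port = str(port)
    return [
        ("cli", ["dsearch-force-detach", "domain", host, port, domain]),
        # List orphaned segments BEFORE the restore; purge them AFTER it.
        ("cli", ["dsearch-list-orphaned-segments", "domain", backup_dir,
                 host, port, orphan_file]),
        ("manual", "Shut down all xPlore instances."),
        ("cli", ["dsearch-restore", "domain", backup_dir]),
        ("manual", "Start xPlore instances."),
        ("cli", ["dsearch-purge-orphaned-segments", orphan_file]),
        ("cli", ["dsearch-force-attach", "domain", host, port, domain]),
        ("manual", "Run a consistency check, then set the domain to normal mode."),
    ]


if __name__ == "__main__":
    # All argument values below are placeholders for illustration.
    for kind, step in domain_restore_plan(
        "localhost", 9300, "defaultDomain", "C:/xPlore/backup", "C:/tmp/orphan.txt"
    ):
        print(kind, step if kind == "manual" else " ".join(step))
```

Keeping the plan as data makes it easy to review the sequence before you run any command against a live system.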

To restore a collection
This procedure replaces the index data with a backup copy. This procedure assumes that no system
changes (new or deleted collections, changed bindings) have occurred since backup. Back up the
xPlore federation after you change the xPlore environment.
1. Set the collection to off_line using xPlore administrator. Select the collection and click
Configuration.
2. Force-detach the collection using xPlore administrator. If you are scripting backup and restore,
use the CLI dsearch-force-detach. The type argument value is collection.
dsearch-force-detach type hostname port domain-name collection-name

3. Shut down all xPlore instances.


4. Run the data restore tool dsearch-restore.bat (Windows) or dsearch-restore.sh (Linux). This
executable is located in dsearch_home/dsearch/restore. The default backup root directory
is recorded in indexserverconfig.xml as the value of the path attribute on the element
admin-config.backup-location. The argument bootstrap_file is optional.
Usage:
dsearch-restore collection backup_directory bootstrap_file
For example (on a single line):
dsearch-restore collection C:\xPlore\dsearch\backup\DSS_LH1\group\2010-05-05-00-55-52

5. Start all xPlore instances.


6. Force the collection to attach. The type argument is collection.
dsearch-force-attach type hostname port domain-name collection-name

7. Perform a consistency check and test search. Select Data Management in xPlore Administrator
and then choose Check DB Consistency.
8. (Documentum environment) Run the ACL and group replication script to update any
changes since the backup. The script aclreplication_for_repositoryname.bat or .sh is located in
dsearch_home/setup/indexagent/tools. Edit the script before you run it to set the repository
name, repository user, password, xPlore primary instance host, xPlore port, and xPlore domain
(optional).
9. (Documentum environment) Run ftintegrity. (Refer to Verifying index migration with ftintegrity,
page 122.)

Snapshot (volume-based) backup and restore


Domain or collection backup by snapshot is not supported. You can back up the xPlore federation.
Requirements for snapshot backup and restore:
• Data files are on a single volume: All domain collections are on a single volume.
• Indexserverconfig.xml is not in the same path as the domain or collection you are backing up.
This file must be backed up separately from the volume backup.

To back up or restore a federation with a snapshot


This procedure assumes that no system changes (new or deleted collections, changed bindings) have
occurred since backup. Perform all your anticipated environment changes before backup. Make sure
you have sufficient disk space for the backup and for temporary space (twice the present index size).
1. Suspend ingestion for backup or restore:
a. Navigate to dsearch_home/dsearch/xhive/admin.
b. Launch the command line tool with the following command. You supply the administrator
password (same as xPlore administrator).
XHCommand suspend-diskwrites

2. Set all domains to the read_only state. The script to turn off indexing is described in Turning off
indexing or changing state, page 97.
3. Use your third-party backup software to back up or restore the system.
4. Resume xDB with the following command:


XHCommand suspend-diskwrites --resume

5. Set all domains to the reset state and then turn on indexing. (This state is not displayed anywhere
in xPlore administrator and is used only for the backup and restore utilities.) The script to turn on
indexing is described in Turning off indexing or changing state, page 97.
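
If you wrap this procedure in a script, make sure that disk writes are resumed even when the third-party backup fails. The following Python sketch shows one way to do that with a context manager. It is a sketch under stated assumptions: the name quiesced and the injected run callable are hypothetical, and the sketch does not handle the administrator password prompt of XHCommand.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(run, host, port, domains):
    """Suspend disk writes and set domains to read_only; undo both on exit.

    run is any callable that accepts one command line as a list of strings.
    """
    port = str(port)
    run(["XHCommand", "suspend-diskwrites"])
    try:
        for name in domains:
            run(["dsearch-set-state", "domain", host, port, name, "read_only"])
        yield
    finally:
        # Always resume xDB first, then turn indexing back on.
        run(["XHCommand", "suspend-diskwrites", "--resume"])
        for name in domains:
            run(["dsearch-set-state", "domain", host, port, name, "reset"])
```

Call your backup software inside the with block. In a dry run, pass the append method of a list as run to record the commands instead of executing them.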

File-based backup and restore


File-based domain or collection backup is not supported. You can back up the xPlore federation.
Perform all your anticipated environment changes before backup.

To back up or restore a federation with a file-based backup


1. Suspend ingestion:
a. Navigate to dsearch_home/dsearch/xhive/admin.
b. Launch the command line tool with the following command. You supply the administrator
password (same as xPlore administrator).
XHCommand suspend-diskwrites

2. Set all domains to the read_only state. The script to turn off indexing is described in Turning off
indexing or changing state, page 97.
3. Use your third-party backup software to back up or restore the system.
4. Resume xDB with the following command:
XHCommand suspend-diskwrites --resume
If you restore without purging orphaned segments, the xPlore primary instance may not start up.
In this case, you can force an xDB restart. Refer to Troubleshooting xDB, page 87.
5. Set all domains to the reset state. (This state is not displayed anywhere in xPlore administrator
and is used only for the backup and restore utilities.) The script to turn on indexing is described
in Turning off indexing or changing state, page 97.

Scripted backup and restore utilities


Backup and restore utilities *.bat (for Windows) and *.sh (for Linux and UNIX) are installed in
dsearch_home/dsearch/restore on the primary instance. All commands also have Java API equivalents.
(For information on the APIs, refer to Documentum xPlore Development Guide.) Use the following
backup and restore utilities in your scripting programs. If you run a script, run as the same
administrator user who started the instance.

Caution: Scripts require stdin input and cannot be executed in a pipeline.

In the following utilities, backup-directory is optional. The default location is specified in
indexserverconfig.xml.


Turning off indexing or changing state


The state scripts are installed in dsearch_home/dsearch/restore on the primary instance. Use the
script dsearch-set-state to set the state of a collection or domain.
The syntax to set domain state is the following. Valid states are: read_only or reset.
dsearch-set-state domain host port domain_name state

For example:
dsearch-set-state domain localhost 9300 defaultDomain read_only
dsearch-set-state domain localhost 9300 defaultDomain reset

The syntax to set collection state is the following. Valid states are: index_only, search_only,
index_and_search, or off_line.
dsearch-set-state collection host port domain_name collection_name state

For example:
dsearch-set-state collection localhost 9300 defaultDomain default1 search_only
dsearch-set-state collection localhost 9300 defaultDomain default1 index_and_search
dsearch-set-state collection localhost 9300 defaultDomain default1 index_only
dsearch-set-state collection localhost 9300 defaultDomain default1 off_line
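
A script can check the state name before it calls the utility. The following Python sketch builds the argument list for dsearch-set-state and rejects states that are not valid for the level. The valid states are taken from the syntax descriptions above. The function name set_state_args is hypothetical.

```python
VALID_STATES = {
    "domain": {"read_only", "reset"},
    "collection": {"index_only", "search_only", "index_and_search", "off_line"},
}


def set_state_args(level, host, port, domain, state, collection=None):
    """Build the dsearch-set-state argument list for a domain or collection."""
    if level not in VALID_STATES:
        raise ValueError(f"unknown level: {level}")
    if state not in VALID_STATES[level]:
        raise ValueError(f"{state} is not a valid {level} state")
    args = ["dsearch-set-state", level, host, str(port), domain]
    if level == "collection":
        if not collection:
            raise ValueError("a collection name is required")
        args.append(collection)
    args.append(state)
    return args
```

For example, set_state_args("domain", "localhost", 9300, "defaultDomain", "read_only") returns the same arguments as the first domain example above.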

Backup utilities
In the following utilities, backup-directory is optional. The default location is specified in
indexserverconfig.xml.

To back up a federation — Navigate to dsearch_home/dsearch/xhive/admin. Use the
dsearch-backup tool with the following syntax. The backup-type can be either full or incremental.
backup-directory is the location for backup files.
dsearch-backup federation hostname port backup-type [backup-directory]

For example:
dsearch-backup federation localhost 9300 full c:/xPlore/backup

To back up a domain — Turn off indexing by setting the domain state to read_only. (Refer to Turning
off indexing or changing state, page 97.) Then use the dsearch-backup tool with the following syntax.
dsearch-backup domain hostname port domain-name [backup-directory]

Turn indexing back on by setting the domain state to reset.

To back up a collection — Set the collection state to search_only. (Refer to Turning off indexing or
changing state, page 97.) Then use the dsearch-backup tool with the following syntax (on a single
line).
dsearch-backup collection hostname port domain-name
collection-name [backup-directory]

Turn indexing back on by setting the collection state to index_and_search.


Note: An off_line collection cannot be backed up.
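
The domain and collection backups follow the same pattern: turn off indexing, back up, and turn indexing back on. The following Python sketch builds the three command lines for either level. It only builds the commands and does not run them. The function name backup_plan is hypothetical, and the states are the ones named in the procedures above.

```python
def backup_plan(level, host, port, domain, collection=None, backup_dir=None):
    """Return the three command lines for a domain or collection backup."""
    port = str(port)
    if level == "domain":
        target, before, after = [domain], "read_only", "reset"
    elif level == "collection":
        if not collection:
            raise ValueError("a collection name is required")
        target, before, after = [domain, collection], "search_only", "index_and_search"
    else:
        raise ValueError(f"unsupported level: {level}")
    backup = ["dsearch-backup", level, host, port] + target
    if backup_dir:
        backup.append(backup_dir)  # optional; the default is in indexserverconfig.xml
    return [
        ["dsearch-set-state", level, host, port] + target + [before],
        backup,
        ["dsearch-set-state", level, host, port] + target + [after],
    ]
```

When you run the commands, issue the third one even if the backup fails, so that indexing is not left turned off.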


Purging orphaned segments


When you back up and then restore a domain or collection, some database segments may be
orphaned in the process. For example, if new collections are added after backup, and the parent
collection or federation is restored, the database segments used for the new collections are orphaned.
Note: When you restore a federation, all segments are wiped, so these utilities are not needed.
If you restore without purging orphaned segments, the xPlore primary instance may not start up. In
this case, you can force an xDB restart. Refer to Troubleshooting xDB, page 87.
Use the CLI dsearch-list-orphaned-segments to list the segments that will be orphaned after a restore
operation. If an orphaned segment file is not specified, the IDs of orphaned segments are sent to
stdout. Backup-directory is optional. The default location is specified in indexserverconfig.xml.
dsearch-list-orphaned-segments collection|domain backup-directory
host port [orphaned-segment-file]
For example:
dsearch-list-orphaned-segments collection c:/1Domain/collA/2009-10-29 localhost 9300
c:/tmp/orphan.txt

After restoring, run the CLI dsearch-purge-orphaned-segments to purge those segments. Specify the
absolute path to the file generated by dsearch-list-orphaned-segments as orphaned-segment-file. If an
orphaned segment file is not specified, the orphaned segment IDs are read from stdin.
dsearch-purge-orphaned-segments [orphaned-segment-file]

Restore utilities
Restoring a federation, domain, or collection — For a domain or collection, list the orphaned
segments before the restore operation so that you can purge them afterward. (Refer to Purging
orphaned segments, page 98.) Then follow the steps described in Native xPlore backup and restore,
page 92.
Note: If you restore without purging orphaned segments, the xPlore primary instance may not start
up. In this case, you can force an xDB restart. Refer to Troubleshooting xDB, page 87.



Chapter 9
Managing Searches

The search service receives queries from a search client in the form of XQuery statements. The query
is submitted to the Lucene
You can configure all search service parameters by choosing Global Configuration from the System
Overview panel in xPlore administrator. You can configure the same search service parameters on a
per-instance basis by choosing Search Service on an instance and then choosing Configuration.

Enabling or disabling search on an instance — You can enable or disable search by choosing an
instance of the search service in the left pane of the administrator. Click Disable (or Enable).

Canceling running queries — You can view a list of individual queries and cancel individual
queries. Choose an instance of the search service in the left pane of the administrator.
For information on query troubleshooting, refer to Troubleshooting search, page 136.
The following topics describe search management:
• Configuring search, page 99
• Viewing search statistics, page 100
• Configuring scoring and freshness, page 100
• Configuring query summary and highlighting, page 101
• Auditing queries, page 103
• Documentum Search, page 105

Configuring search
You can configure parameters for the search service in xPlore administrator. The default values have
been optimized for most environments. For details, refer to Search service settings, page 177.
The stop words list for each language is located in dsearch_home/dsearch/cps/cps_daemon/shared
libraries/rlp/etc. Some languages do not require a stop words list. Editing the stop words list is
not supported in this release.


Viewing search statistics


Use xPlore Administrator to view the following statistics. Choose Search Service and click an
instance:
• Accumulated number of executed queries
• Number of failed queries
• Number of pending queries
• Number of spooled queries
• Number of execution plans
• Number of streamed results
• Maximum query result batch size
• Total result bytes
• Maximum hits returned by a query
• Average number of results per query request
• Maximum query execution time
• Average duration of query execution
• Queries per second
Click Operations to enable or disable the search service on this instance.
Use reports for additional information about queries. Query auditing must be turned on to
accumulate the report data. (It is off by default, to save disk space.) Refer to Chapter 11, Using
Reports for more information.

Configuring scoring and freshness


xPlore uses the following ranking principles to score query results:
• If a search term appears more than once in a document, the hit is ranked higher than a document
in which the term occurs only once.
• If a search term appears in the metadata, the hit is ranked higher than when the term occurs
in the content.
• When the search criteria are linked with OR, a document with both terms is ranked higher than a
document with one term.
• If the search spans multiple instances, the results are merged based on score.
Results scores can be changed by boosting subpath metadata and by freshness of source documents.
Make these changes before you index documents. To apply changes to an existing index, you must
reindex your documents. Implement these scoring changes by editing indexserverconfig.xml, located
in dsearch_home/config. Shut down the xPlore instances before applying your changes. Validate your
changes using the validation tool described in Modifying indexserverconfig.xml, page 36.


Boosting metadata in scores — Scores for hits in metadata can be increased by adding a boost-value
attribute to a subpath element. The default boost-value (multiplier) is 1.0. In the following example, a
hit in the keywords metadata doubles the score for a result:
<sub-path returnable="true" boost-value="2.0" path="dmftmetadata/keywords" />
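
If you maintain indexserverconfig.xml with a script, the attribute can be set as in the following Python sketch. This is a sketch under stated assumptions: the function name set_boost is hypothetical, and the code assumes that the sub-path elements carry no XML namespace. Test it against a copy of the file and run the validation tool on the result.

```python
import xml.etree.ElementTree as ET


def set_boost(config_xml: str, path: str, boost: float) -> str:
    """Set boost-value on every sub-path element whose path attribute matches."""
    root = ET.fromstring(config_xml)
    for sub in root.iter("sub-path"):
        if sub.get("path") == path:
            sub.set("boost-value", str(boost))
    return ET.tostring(root, encoding="unicode")
```

The function takes and returns XML text, so you can try it on a small fragment before you apply it to the full configuration file.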

Boosting recent documents — The Documentum attribute r_modify_date is used to boost scores
in results. By default, a freshness boost is applied to the default collection. The multiplier is based
on how recent the document is. To remove this boost, set the property enable-freshness-score to false
on the parent category element. For example:
<category name="dftxml"><properties>
...
<property name="enable-freshness-score" value="false" />
</properties></category>

Configuring query summary and highlighting


The indexing service stores the content of each object as an XML node called dmftcontentref. For all
documents in which an indexed term has been found, xPlore retrieves the content node and computes
a summary. The summary is a phrase of text from the original indexed document that contains the
searched word. Search terms are highlighted in the summary.
Note: No summary is returned for content-less objects.

Configuring the summary length — xPlore returns a summary display window from the
summary computation text. The length of this window is specified as the value of the parameter
query-summary-display-length. This window within the summary text is returned to the client
application. If no search term is found in the summary text, a static summary of the specified length
from the beginning of the text is displayed and no terms are highlighted. The default value of this
parameter is 256. This means that 256 characters surrounding the search terms are returned as the
summary. Configure this value in xPlore administrator: Search Service > Configuration.
Summaries can be dynamic or static, depending on your needs for summary precision and query
performance.
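
The effect of query-summary-display-length can be illustrated with the following Python sketch. It is an illustration only and not the xPlore implementation: the function name summary_window is hypothetical, and centering the window on the first occurrence of the term is an assumption.

```python
def summary_window(text: str, term: str, length: int = 256) -> str:
    """Return up to `length` characters around the first occurrence of `term`.

    If the term is not found, return the first `length` characters instead,
    which corresponds to the static summary described above.
    """
    pos = text.lower().find(term.lower())
    if pos < 0:
        return text[:length]
    # Assumption: place the term roughly in the middle of the window.
    pad = max(0, (length - len(term)) // 2)
    start = max(0, pos - pad)
    # Do not let the window run past the end of the text.
    start = min(start, max(0, len(text) - length))
    return text[start:start + length]
```

A larger display length returns more context for each hit but also more data for each result row.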

Configuring dynamic summaries — Under certain conditions, a dynamic summary can be
computed. The summary is taken from the dmftcontentref element of DFTXML.
Because dynamic summaries require much more computation time than static summaries, the
parameters that govern dynamic summaries are configurable. Set the following parameters in
indexserverconfig.xml, which is located in dsearch_home/config. Stop all xPlore instances before
modifying this file. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.
• The property query-enable-dynamic-summary has a value of true. (Default: true) You can set this
value in xPlore administrator Global Configuration > Search Service Configuration.
• The first n rows of results as defined by max-dynamic-summary-threshold have a dynamic summary.
This setting is a property of the search-config element. The default is 50, so only the first 50 rows
will have dynamic summary, and any rows after that will have a static summary. If most users
will not go beyond the first page of results, set this value to the page size, for example, 10, for
faster performance.
• The size of the content is less than the value of extract-text-size-less-than. This
setting is an attribute on the save-tokens-for-summary-processing element in
category-definitions.category.do-text-extraction. The default value is -1 (all documents are
included). If this is set to a positive value, a static summary is returned for larger documents. For
faster summary calculation, set this value to a positive value.
• The query term appears within the first n characters as defined by the token-size attribute.
This setting is an attribute on the save-tokens-for-summary-processing element in
category-definitions.category.do-text-extraction. The default value is 65536 (64K). If the query
term is not found in this snippet, a static summary is returned and term hits are not highlighted.
A value of -1 indicates no maximum content size, but this negatively impacts performance. For
faster summary calculation, set this value lower.
• If security is evaluated in xPlore (not Content Server), and the security_mode property of the
dm_ftengine_config object is set to BROWSE, the user must have at least READ permission. Refer
to Configuring results summary security, page 49.
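
The interplay of the first four conditions can be sketched in Python as follows. This is a reading of the conditions listed above, not the xPlore implementation. The function name summary_kind and its parameter names are hypothetical, rows are counted from 0, and the security condition is left out.

```python
def summary_kind(row, content_size, term_offset, *, enabled=True,
                 threshold=50, size_limit=-1, token_size=65536):
    """Return "dynamic" or "static" for one result row.

    enabled     -- query-enable-dynamic-summary
    threshold   -- max-dynamic-summary-threshold (first n rows)
    size_limit  -- extract-text-size-less-than (-1: all documents)
    token_size  -- token-size (-1: no maximum)
    term_offset -- position of the query term in the text, or None if absent
    """
    if not enabled:
        return "static"
    if row >= threshold:
        return "static"
    if size_limit != -1 and content_size >= size_limit:
        return "static"
    if term_offset is None:
        return "static"
    if token_size != -1 and term_offset >= token_size:
        return "static"
    return "dynamic"
```

With the default values, for example, row 50 (the 51st result) gets a static summary, and so does a hit whose first occurrence lies beyond the first 65536 characters.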

Configuring static summaries — Static summaries are much faster to compute but less specific than
dynamic summaries. Static summaries are computed, even if you have enabled dynamic summaries,
when the summary conditions do not match the conditions configured for dynamic summaries.
(Refer to Configuring dynamic summaries, page 101). To route all summary computation to static
summaries, set query-enable-dynamic-summary to false in xPlore administrator. (Dynamic summaries
are enabled by default.) Choose the Search Service and click Configuration.
When dynamic summary is turned off, the first n characters of the document are displayed, where n is
the value of the parameter query-summary-display-length. Configure the size of the static summary
display window using xPlore administrator. Set the number of characters to display.
You can specify metadata elements that are displayed in a static summary. Set the following
parameters in indexserverconfig.xml, which is located in dsearch_home/config. Stop all xPlore
instances before modifying this file. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36.
• elements-for-static-summary
Child element of category-definitions.category. Sets the elements whose contents are evaluated
for a static summary. The max-size attribute sets the maximum size of the static summary.
Default: 65536 (bytes)
• element-name
Child element of elements-for-static-summary. Specifies an element whose content is analyzed
for a static summary.

Highlighting — The search terms, including lemmatized terms, are highlighted within the summary
that is returned to the client search application. Wildcard search terms are also highlighted. For
example, if the search term is ran*, then the word rant is highlighted.
Note: If a search term is in the document but not in the summary computation string, it will not be
visible (or highlighted) in the summary.
Highlighting does not preserve query context such as phrase search, AND search, NOT search, fuzzy
search, or range search. Each search term in the original query is highlighted separately.
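
The per-term highlighting behavior described above can be sketched as follows. This is a toy model, not xPlore's implementation: the function name highlight and the <b> markers are hypothetical, a trailing * is assumed to mean "any further word characters", and lemmatized forms are not modeled.

```python
import re


def highlight(summary, terms, open_tag="<b>", close_tag="</b>"):
    """Wrap each word of `summary` that matches any term (case-insensitive).

    Every term is matched on its own, so phrase, AND, and NOT context from
    the original query is deliberately ignored. A trailing * is treated as
    "any further word characters" (assumption), so ran* matches rant.
    """
    patterns = []
    for term in terms:
        if term.endswith("*"):
            patterns.append(re.escape(term[:-1]) + r"\w*")
        else:
            patterns.append(re.escape(term))
    if not patterns:
        return summary
    word_re = re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)
    return word_re.sub(lambda m: open_tag + m.group(0) + close_tag, summary)


print(highlight("He ran a rant about the annual report", ["ran*", "report"]))
```

Because the terms are applied independently, a phrase query for "annual report" would highlight annual and report wherever they occur in the summary, not only where they are adjacent.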

102 EMC Documentum xPlore Version 1.0 Administration Guide


Managing Searches

Summary, highlighting, and performance — Dynamic summaries have a performance impact.


Unselective queries can require massive processing to produce summaries. After the summary is
computed, the summary is reprocessed for highlighting, causing a second performance impact.

Auditing queries
Queries are audited to help identify problems. Auditing is on by default. To turn off auditing,
expand Diagnostics and troubleshooting in the left pane of xPlore administrator, choose Audit
records, and click Disable.
Audit records are saved in an xDB collection named AuditDB. You can view the audit record for a
selected date range using xPlore administrator. To view or create reports on the audit record, refer to
Chapter 11, Using Reports. Auditing provides the following information:
• The XQuery expression
• The library in which the hits were found
• The number of hits
• The number of items returned
• The amount of time to execute the query
• The time elapsed to fetch results
• The number of hits to be filtered by security
• The number of hits filtered out by security
• The number of Documentum groups in the cache
• The number of Documentum groups excluded from the cache
To configure audit record properties, stop all xPlore instances and edit indexserverconfig.xml. Make
your changes to the security.auditing element. If a property is not included in indexserverconfig.xml,
add it. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36. Back up the xPlore federation after you change this file.
• auditing.location element: Specifies a storage path for the auditing file. Attributes: name, path,
size-limit. Size limit units: K | M | G | T (KB, MB, GB, TB). Default: 2G.
• audit-config element: Configures auditing. Attributes: name, component, status, format, location.
• properties.property element. Name: audit-save-batch-size. Specifies how many records are
batched before a save. Default: 100.
To configure security cache sizes, refer to To change security cache sizes, page 48.
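
When planning disk space for the audit location, it can help to convert a size-limit value into bytes. The following is a minimal sketch. The helper name parse_size_limit is hypothetical, and it assumes binary multiples (1K = 1024 bytes), which this guide does not state.

```python
import re

# Assumption: units are binary multiples. The guide only says K | M | G | T.
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size_limit(value):
    """Convert a size-limit string such as '2G' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT])\s*", value.upper())
    if match is None:
        raise ValueError(f"unrecognized size-limit value: {value!r}")
    return int(match.group(1)) * UNIT_BYTES[match.group(2)]


print(parse_size_limit("2G"))
```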

To view a query in the audit records — Choose a report from the date selector, and then choose
View. Double-click a query of interest to view the XML entry in the report. The XQuery expression is
contained within the QUERY element.

Audit record format — An audit record has the following format in XML:
<event name="<event_name>" component="<component_name>">
<element-name>value</element-name>
...
</event>

Query events that are audited:


• QUERY
The original query string
• QUERY_OPTION
The query options, as name/value pairs
• LIBRARY_PATH
The xDB library in which the query was executed
• FETCH_COUNT
The number of results fetched by the client application
• TOTAL_HITS
The total number of results returned by xDB
• START_TIME
The time the query started
• EXEC_TIME
The duration of query execution in msec
• FETCH_TIME
The time for xPlore to fetch results from xDB, in msec
• STATUS
Whether the query succeeded
• The following security events are also recorded. For more information on these events, refer to
Changing the security cache sizes, page 168.
— TOTAL_INPUT_HITS_TO_FILTER
How many hits a query had before security filtering.
— HITS_FILTERED_OUT
How many hits were discarded because the user did not have permissions for the results.
— GROUP_IN_CACHE_HIT
How many times the group-in cache was probed for a query.
— GROUP_OUT_CACHE_HIT
How many times the group-out cache was probed for a query.
— GROUP_IN_CACHE_FILL
How many times the query added a group to the group-in cache.
— GROUP_OUT_CACHE_FILL


How many times the query added a group to the group-out cache.
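The audit record reports how many times these caches were hit for a query (GROUP_IN_CACHE_HIT,
GROUP_OUT_CACHE_HIT) and how many times the query added a group to them (GROUP_IN_CACHE_FILL,
GROUP_OUT_CACHE_FILL). For details on these configuration settings, refer to To change security
cache sizes, page 48.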
For XML records, each event is added to a root instance called AuditRecords. For example:
<event component="search" name="QUERY">
<QUERY_ID>PrimaryDsearch$27571452-1cd3-41c0-9f32-b75629e9be6e</QUERY_ID>
<QUERY>
(: Report Name: Get Query Text :)
<report title='Get query text'>
<header>
<column type="text">Query Id</column>
<column type="text">Query</column>
</header>
<rowset> {
( ( for $k in collection('audit')//event[ QUERY_ID = "PrimaryDsearch$54760bf2-428f-4ac0-a621-a249a7591132" ]
return
<row>
<cell> { $k/QUERY_ID/text() } </cell>
<cell> { $k/QUERY/text() } </cell>
</row>))
} </rowset>
</report>
</QUERY>
<USER_NAME>admin</USER_NAME>
<QUERY_OPTION APPLICATION_NAME="AdminReports" BATCH_SIZE="0" CACHED="true"
COLLECTION="/SystemDataDomain" DOMAIN="" EXECUTION_PLAN="false" LOCALE="en"
PARALLEL_EXECUTION="false" RETURN_SUMMARY="false" RETURN_TEXT="false"
SECURITY_EVAL="false" SECURITY_FILTER="" SPOOLING="false" STREAMING_RESULT="true"
SYSTEM_QUERY="true" TIMEOUT="0" WAIT_FOR_RESULTS="true"/>
<NODE_NAME>PrimaryDsearch</NODE_NAME>
<LIBRARY_PATH>/SystemDataDomain</LIBRARY_PATH>
<FETCH_COUNT>1</FETCH_COUNT>
<TOTAL_HITS>1</TOTAL_HITS>
<START_TIME>2010-04-07T19:06:09</START_TIME>
<EXEC_TIME>0</EXEC_TIME>
<FETCH_TIME>0</FETCH_TIME>
<TOTAL_TIME>0</TOTAL_TIME>
<STATUS>success</STATUS>
</event>
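
If you export audit records for offline analysis, an event like the one above can be summarized with a short script. The following is a sketch only: the function name summarize_event and the trimmed sample record are illustrative, and it assumes each event is available as a well-formed XML string whose timing elements hold integer milliseconds.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the audit event shown above.
SAMPLE_EVENT = """
<event component="search" name="QUERY">
  <QUERY_ID>PrimaryDsearch$27571452-1cd3-41c0-9f32-b75629e9be6e</QUERY_ID>
  <USER_NAME>admin</USER_NAME>
  <FETCH_COUNT>1</FETCH_COUNT>
  <TOTAL_HITS>1</TOTAL_HITS>
  <START_TIME>2010-04-07T19:06:09</START_TIME>
  <EXEC_TIME>0</EXEC_TIME>
  <FETCH_TIME>0</FETCH_TIME>
  <STATUS>success</STATUS>
</event>
"""


def summarize_event(xml_text):
    """Extract the timing and hit-count fields from one audit event."""
    event = ET.fromstring(xml_text.strip())

    def text(tag, default=""):
        node = event.find(tag)
        if node is None or node.text is None:
            return default
        return node.text.strip()

    return {
        "query_id": text("QUERY_ID"),
        "user": text("USER_NAME"),
        "total_hits": int(text("TOTAL_HITS", "0")),
        "fetch_count": int(text("FETCH_COUNT", "0")),
        "exec_ms": int(text("EXEC_TIME", "0")),
        "fetch_ms": int(text("FETCH_TIME", "0")),
        "succeeded": text("STATUS") == "success",
    }


print(summarize_event(SAMPLE_EVENT))
```

Sorting such summaries by exec_ms or fetch_ms is one way to spot slow queries; the reports in Chapter 11, Using Reports, cover the same data inside xPlore administrator.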

Documentum Search
The following topics describe Documentum indexing and query functionality and tasks in the xPlore
server.


Search engine configuration (dm_ftengine_config)


The Content Server query plugin settings are set during installation. Change them only if the Content
Server or xPlore environment changes. The following settings affect query processing:
• ftsearch_security_mode
Default: 1. Sets security evaluation in xPlore. Value of 0 sets evaluation in the Content Server. For
more information, refer to Documentum search results security, page 47.
• dsearch_result_batch_size
Default: 200. Sets the number of results fetched from xPlore in each batch.
• security_mode
Sets the security when summaries are displayed. For more information, refer to Configuring
results summary security, page 49.
• fast_wildcard_compatible
Default: false. Sets fragment search option. For more information, refer to Changing a
ft_engine_config parameter, page 106.
• folder_cache_limit
Default: 2000. Specifies the maximum number of folder IDs to include in the index probe for a
folder descend query.

Checking the dm_ftengine_config settings — Use iAPI, DQL, or DFC to check the
dm_ftengine_config object. To view existing parameters using iAPI in Documentum Administrator,
first get the object ID:
retrieve,c,dm_ftengine_config

Use the object ID to get the parameters:


?,c,select param_name, param_value from dm_ftengine_config where r_object_id='080a0d6880000d0d'

Changing a ft_engine_config parameter — Use iAPI, DQL, or DFC to modify the ft_engine_config
object. To add a parameter using iAPI in Documentum Administrator, use append similar to the
following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fast_wildcard_compatible
append,c,l,param_value
true
save,c,l

To change an existing parameter, use set similar to the following:


retrieve,c,dm_ftengine_config
set,c,l,param_name
fast_wildcard_compatible
set,c,l,param_value
false
save,c,l

To remove an existing parameter, use remove instead of set.
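
If you maintain several repositories, you may want to generate the iAPI lines instead of typing them. The following sketch only builds the text of the append sequence shown above; it does not connect to a repository. The helper name iapi_append_param is hypothetical.

```python
def iapi_append_param(name, value):
    """Return the iAPI lines that append one name/value pair to dm_ftengine_config.

    Mirrors the append sequence shown in this section. The result is plain
    text that you paste into an iAPI session yourself.
    """
    name, value = str(name), str(value)
    for part in (name, value):
        if "\n" in part or not part.strip():
            raise ValueError("parameter names and values must be single, non-empty lines")
    return "\n".join([
        "retrieve,c,dm_ftengine_config",
        "append,c,l,param_name",
        name,
        "append,c,l,param_value",
        value,
        "save,c,l",
    ])


print(iapi_append_param("fast_wildcard_compatible", "true"))
```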


Making types and attributes searchable


You can create or alter types to make them searchable. Configure them for full-text support.
Note: You cannot drop properties from a type definition using ALTER TYPE if the full-text index
contains existing indexed objects of the type or its subtypes.
Properties associated with aspects are not fulltext-indexed by default. If you wish to index them,
you must issue an ALTER ASPECT statement to identify the aspects you want indexed. For more
information on this statement, refer to Documentum Content Server DQL Reference Manual.
For lightweight sysobjects (LWSO) such as dm_message_archive, the client application must
configure searchable attributes. Use CREATE TYPE and ALTER TYPE FULLTEXT SUPPORT switches
to specify searchable attributes. For more information on this configuration, refer to Documentum
Content Server DQL Reference Manual.

Folder descend
Folder descend query performance can depend on folder hierarchy and data distribution across
folders. The following conditions can degrade query performance:
• Many folders, and a large portion of them are empty
Increase folder_cache_limit in the dm_ftengine_config object.
• The search predicate is unselective but the folder constraint is selective
Decrease folder_cache_limit in the dm_ftengine_config object.
The folder_cache_limit setting in the dm_ftengine_config object specifies the maximum number
of folder IDs probed. Default is 2000. If the folder descend condition evaluates to less than the
folder_cache_limit value, then folder IDs are pushed into the index probe. If the condition exceeds the
folder_cache_limit value, the folder constraint is evaluated separately for each result.
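
The effect of folder_cache_limit can be pictured as a simple threshold decision. The following is a conceptual sketch, not the query plugin's code: the function name and the returned labels are hypothetical, and the behavior when the folder count exactly equals the limit is an assumption because the text above does not specify it.

```python
DEFAULT_FOLDER_CACHE_LIMIT = 2000  # default stated in this guide


def folder_constraint_strategy(folder_id_count, folder_cache_limit=DEFAULT_FOLDER_CACHE_LIMIT):
    """Return how a folder descend constraint would be evaluated.

    'index_probe': folder IDs are pushed into the index probe.
    'per_result':  the folder constraint is evaluated separately for each result.
    Assumption: a count equal to the limit is treated like an exceeded limit.
    """
    if folder_id_count < 0:
        raise ValueError("folder_id_count must not be negative")
    if folder_id_count < folder_cache_limit:
        return "index_probe"
    return "per_result"


print(folder_constraint_strategy(150))
print(folder_constraint_strategy(50000))
```

Raising the limit moves more queries to the index probe path; lowering it moves more queries to per-result evaluation, which matches the two tuning recommendations above.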

DQL, DFC, and DFS queries


DFC-based client applications use the DFC query builder package to translate a query into an XQuery
statement. DFS similarly generates XQuery statements. Alternatively, your application can issue DQL
queries. The Content Server query plugin for xPlore translates the DQL into an XQuery expression.
Verity Query Language (VQL) is not supported by xPlore. With xPlore or DFC APIs, you can
rewrite some VQL queries to XQuery equivalents. For information on using these APIs, refer to
Documentum xPlore Development Guide.
Table 12, page 107 shows the differences between query results from DQL or DFC/DFS queries:

Table 12. Differences between DQL and DFC/DFS queries

DQL                                      DFC and DFS
No latency: attributes are evaluated     Latency between Content Server and xPlore
from the database
No latency for security evaluation       Latency for security but faster search results
No VQL equivalent                        Extended object search (VQL-type support)
No facets                                Facet support
No hit count                             Hit count

Disabling XQuery generation by DFC or DFS — You can disable XQuery generation by DFC or
DFS. This allows you to use a DQL hints file or hints in a DQL query. The hints file allows you to
specify certain conditions under which a database or standard query is done in place of a full-text
query.
Turn off XQuery generation by adding the following setting to dfc.properties on the DFC client
application:
dfc.search.fulltext.enabled=false

Routing a query to a collection using DQL


By default, DQL is not generated by DFC. You can route a query to a collection by implementing
DFC query builder APIs. Refer to "Building a query with query builder APIs" in Documentum Search
Development Guide for details.
If you have legacy client applications that generate DQL, however, you can route a query to a specific
collection. Use the DQL clause IN COLLECTION to specify the target of a SELECT statement. Use
one of the two following syntaxes. In the first, collection names are separated by underscores. In the
second, they are in quotation marks and separated by commas.
select attr from type SDC where … enable(fds_query_collection_collection1_collection2__...)

select attr from type SDC in collection ('collection1','collection2',...)

For example:
select r_object_id from dm_document search document contains 'report'
in collection ( 'default' ) enable(return_top 10)

Search for lightweight sysobjects (LWSOs)


Lightweight sysobjects group the attribute values that are identical for a large set of objects. This
redundant information is shared among the LWSOs from a single copy of the shared parent object.
For example, Enterprise A-Plus Financial Services receives many payment checks each day. They
record the images of the checks and store the payment information in sysobjects. They will retain this
information for several years and then get rid of it. All objects created on the same day can use a single
ACL, retention information, creation date, version, and other attributes. That information is held by
the shared parent object (a shareable type). The LWSO has information about the specific transaction.
Facets cannot be computed for queries of LWSOs. By default, the queries are executed on the Content
Server using DQL, not in xPlore. If search


FTDQL
The Webtop search components use the DFC query builder package to construct a query. The DFC
query builder adds the DQL hint TRY_FTDQL_FIRST. This hint prevents timeouts and resource
exceptions by querying the attributes portion of a query against the repository database. The query
builder also bypasses lemmatization by using a DQL hint for wildcard and phrase searches.
If wildcard attribute searches ("contains", "begins with", "ends with") have many results, they can
time out. You can configure attributes searches to go directly against the repository metadata, which
can be faster than the default behavior of the TRY_FTDQL_FIRST hint. In the following DQL hints
file example, FTDQL is turned off for object_name attribute queries:
<Rule>
<Condition>
<Where>
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<DisableFTDQL/>
</Rule>

The following query is generated when the user searches on the object name and also enters a string
into the Webtop full-text box. The string "technical" is queried in the full-text index and the query for
object_name containing "WDK" is queried against the database:
SELECT r_object_id,text,object_name,... FROM dm_document SEARCH DOCUMENT
CONTAINS 'technical' WHERE (UPPER(object_name) LIKE '%WDK%' ESCAPE '\')
AND (a_is_hidden = FALSE) ENABLE(NOFTDQL)

Note: If your hint contains an object type condition, the hint is applied only for that type and
its subtypes, not for the supertype.

Using DQL hints


If a DQL hints file is present on the application server, DFC reads it. DFC applies the hints to queries
based on conditions defined in the file. You can define conditions under which the hints are applied,
for example, for certain object types, attributes, or repositories.
Note: You must turn off XQuery generation by DFC or DFS in order to use a DQL hints file. Refer
to Disabling XQuery generation by DFC or DFS, page 108 for instructions on turning off XQuery
generation. Refer to DQL, DFC, and DFS queries, page 107 for information on the features that are
not available with DQL. Refer to Appendix C, DQL Hints File DTD for the hints file DTD.
If you turn off XQuery generation, you can use DQL hints in a DQL query or apply hints to all
queries using a DQL hints file.
Table 13, page 110 describes the behavior governed by these elements.

Hints file location

The DQL hints file location is specified in the DFC configuration file dfc.properties on the application
server host. The file must be named dfc.dqlhints.xml. If the file has been modified, it is reloaded
every two minutes. The following line could be added to dfc.properties to specify a Windows
location for the hints file:
dfc.dqlhints.file=C:/Documentum/config/dfc-dqlhints.xml

Alternatively, you can place a DQL hints file in the application server host system classpath or
specify its location as a Java system property, for example:
-Ddfc.dqlhints.file=path_to_hints_file

Use forward slashes for paths in a Java properties file (the backslash is an escape character). The
file can also be loaded from the classpath or the DFC data home directory on the application server host.

Hints file elements

The following elements are contained within a root <RuleSet> element to define the hints passed
to IDfQueryManager.

Table 13. DQL hints file elements

Element                  Description
<Rule>                   Can have zero to many <Condition> elements.
<DisableFullText/>       Disables full-text search on basic search or attributes for the
                         conditions in the rule.
<DisableFTDQL/>          Disables search for metadata in the FT index.
<Condition>              Child elements are ANDed.
<Select>, <Where>        Child <Attribute> elements can be ANDed (condition="all") or ORed
                         (condition="any").
<SelectOption>           Adds a permission, such as FOR READ or FOR BROWSE. For example,
                         FOR DELETE would limit the results of a query that meets the
                         condition to those documents on which the user has delete
                         permission. The following example applies to all Webtop queries:
                         <RuleSet>
                         <Rule>
                         <Condition>
                         <Where>
                         <Attribute operator="like">object_name</Attribute>
                         </Where>
                         </Condition>
                         <SelectOption>FOR DELETE</SelectOption>
                         <DisableFTDQL/>
                         </Rule>
                         </RuleSet>
<From>                   Child <Type> elements can be ANDed (condition="all") or ORed
                         (condition="any").
<Docbase>                The value of this element corresponds to a repository to which the
                         hint applies. The descend attribute is optional. Default: false. To
                         apply the DQL hint to a folder and all its subfolders, set
                         descend="true".
<Attribute>, <Type>,     Support Java regular expressions (java.util.regex.Pattern). For
<Docbase>                example, <Type>custom.*</Type> matches all type names beginning
                         with "custom".
<Attribute>              The operator value "like" represents DQL predicates CONTAINS and
                         LIKE. The value "is_null" represents DQL predicates NULL, NULLINT,
                         NULLSTRING, and NULLDATE.
<FulltextExpression>     Child of <Condition>. Set the mandatory exists attribute to "false"
                         to add ENABLE(NOFTDQL) to the query when there is no full-text
                         expression in the search.
<DQLHint>                Contains any valid DQL hint, including IN COLLECTION and
                         RETURN_TOP N. For the full list of DQL hints, refer to Documentum
                         Content Server DQL Reference Manual.

Hints file examples


Turning off attribute queries in the full-text index — To send all queries on attributes to the
database, define the following hint. The query must not contain a full-text search expression.
<RuleSet>
<Rule>
<Condition>
<FulltextExpression exists="false"/>
</Condition>
</Rule>
</RuleSet>
If you disable FTDQL for specific conditions that you have defined within the <Rule> element, the
attributes portion of the query that meets those conditions is issued against the database, not the
full-text index.
A temp table is populated with the full-text result. If the full-text query is unselective, then the temp
table is large, negatively impacting response time.

Turning off FTDQL for specific types — In the following example, attributes for the specified object
type are queried in the database, not the full-text index:
<RuleSet>
<Rule>
<Condition>
<From condition="any">
<Type>km_message</Type>
</From>
</Condition>
<DisableFTDQL/>
</Rule>
</RuleSet>

Adding multiple hints to queries — The following example adds two hints to wildcard queries
on either of two attributes:
<RuleSet>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">subject</Attribute>
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<DQLHint>ENABLE(SQL_DEF_RESULT_SET 100, NOFTDQL)</DQLHint>
<DisableFTDQL/>
</Rule>
</RuleSet>


Using multiple rules — In the following hints file, one rule applies to queries for one attribute,
the second rule applies to a different attribute:
<RuleSet>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">subject</Attribute>
</Where>
</Condition>
<DQLHint> ENABLE(SQL_DEF_RESULT_SET 100, NOFTDQL) </DQLHint>
<DisableFTDQL/>
</Rule>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<DQLHint> ENABLE(SQL_DEF_RESULT_SET 10) </DQLHint>
<DisableFTDQL/>
</Rule>
</RuleSet>

Make sure that your multiple rules are mutually exclusive when applied to a single query. If not, the
query generates a DQL syntax error. If the Webtop user adds both attributes to the query (subject and
object_name), this hints file example throws an error.
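
Before deploying a hints file, you can check whether more than one rule would apply to the same query. The following sketch is a simplified checker written for illustration, not a DFC utility. It looks only at <Where> conditions and attribute names, ignores the operator attribute and the <From>, <Docbase>, and <FulltextExpression> conditions, and assumes that a missing condition attribute means "all". The names HINTS, where_matches, and matching_rules are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The two-rule hints file from the example above, reduced to its conditions.
HINTS = """
<RuleSet>
  <Rule>
    <Condition>
      <Where condition="any">
        <Attribute operator="like">subject</Attribute>
      </Where>
    </Condition>
    <DisableFTDQL/>
  </Rule>
  <Rule>
    <Condition>
      <Where condition="any">
        <Attribute operator="like">object_name</Attribute>
      </Where>
    </Condition>
    <DisableFTDQL/>
  </Rule>
</RuleSet>
"""


def where_matches(where, query_attributes):
    """True if the <Where> element matches the attribute names used in a query."""
    patterns = [(a.text or "").strip() for a in where.findall("Attribute")]
    # Attribute values are treated as regular expressions that must match the whole name.
    hits = [any(re.fullmatch(p, name) for name in query_attributes) for p in patterns]
    if where.get("condition", "all") == "any":
        return any(hits)
    return all(hits)


def matching_rules(hints_xml, query_attributes):
    """Return the 1-based positions of all rules that would apply to the query."""
    root = ET.fromstring(hints_xml.strip())
    matched = []
    for position, rule in enumerate(root.findall("Rule"), start=1):
        wheres = rule.findall("Condition/Where")
        if all(where_matches(w, query_attributes) for w in wheres):
            matched.append(position)
    return matched


print(matching_rules(HINTS, ["subject"]))
print(matching_rules(HINTS, ["subject", "object_name"]))
```

A result with more than one rule position, as for a query on both subject and object_name, indicates rules that are not mutually exclusive.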

Enabling query routing in DFC


You can route queries to a particular collection to enhance search performance. You can route queries
using one of the following procedures:
• Disable XQuery generation, and use a DQL hints file.
Refer to Disabling XQuery generation by DFC or DFS, page 108 for more information.
• Customize the DFC search service to target a particular collection. Call addPartitionScope(source,
collection_name) in IDfQueryBuilder.
Refer to "Building a query with query builder APIs" in Documentum Search Development Guide
for details.

Changing VQL queries to XQuery expressions


By default, XML content of an input document is not indexed. You can change this by setting the
index-as-sub-path value to true on the xml-content element in indexserverconfig.xml. Shut down all
xPlore instances before changing this file. Validate your changes using the validation tool described
in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this
file. If your documents containing XML have already been indexed, they must be reindexed to
include the XML content.
Zone searching using Verity Query Language (VQL) was supported by older Documentum
applications but was not supported by FAST indexing. With xPlore or DFC APIs, you can rewrite
some VQL queries to XQuery equivalents. For information on using these APIs, refer to Documentum
xPlore Development Guide.
• Perform structured searches of XML documents using XQuery or the DFC interface IDfXQuery.
• Join different objects using DQL (NOFTDQL), XQuery, or the DFC interface IDfXQuery.
• Denormalize the relationship of a document to other objects or tables, such as email attachments,
using XQuery or the DFC interface IDfXQuery.
• Perform boolean searches using DQL, XQuery, or the DFC interface IDfXQuery.

Understanding search results


Content Server client applications issue queries through the DFC search service or through DQL.
Not all DQL operators are available through the DFC search service. In some cases, a DQL search of
the Server database will return different results than an xPlore search. The following information
provides an example of what is found for the DFC search service operators begins-with, ends-with,
and wildcard.
xPlore is case-insensitive and ignores white space or other special characters. Diacritics such as
accent marks are also ignored. By default, search terms are lemmatized or stemmed, unless they
are contained within a phrase.

Configuring search for fragments, wildcards, and like terms
Some parameters in the dm_ftengine_config object control search behavior. In xPlore, queries that
contain a wildcard are interpreted as containing an entire word, not a word fragment. For example,
a query for “computer*” matches “computer store” or “computer parts” but not “computers”. By
default, xPlore does not support search for word fragments, even for searches that contain a leading
or trailing wildcard. This enables much faster search results.
You can turn on search for word fragments.

Caution: Searches for word fragments are generally much slower than searches for entire
words. Memory consumption on the search server, and user experience, may not be acceptable.
Search precision is degraded by fragment search.

Turning on support for fragments — xPlore does not search for word fragments. For example,
a search for “car*” turns up “car” but not “careful.” The FAST indexing server supported word
fragment searches for leading and trailing wild cards in metadata and word fragment searches in
SEARCH DOCUMENT CONTAINS (SDC) full-text queries. DQL queries that contain the DQL
hint FT_CONTAIN_FRAGMENT in the where clause were converted to the search clause LIKE
'%word%'. For example, a search for com was converted to the clause LIKE '%com%', finding
documents containing committee or incoming.
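
The difference between whole-word matching and fragment matching can be illustrated with a toy model. It splits text on whitespace and compares lowercase strings; it is not how xPlore or FAST tokenizes or indexes content, and both function names are hypothetical.

```python
def whole_word_match(term, text):
    """Default behavior in this toy model: the term must equal an entire word."""
    return term.lower() in text.lower().split()


def fragment_match(term, text):
    """Fragment behavior, comparable to LIKE '%term%': the term may occur inside a word."""
    return any(term.lower() in word for word in text.lower().split())


for phrase in ("the car is parked", "a careful driver"):
    print(phrase, whole_word_match("car", phrase), fragment_match("car", phrase))
```

In this model, fragment matching has to inspect the inside of every word, which gives an intuition for why fragment search costs more and returns less precise results than whole-word search.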


You can set xPlore to backward compatibility for this behavior in FTDQL SDC queries and DQL
where clauses. Edit the dm_ftengine_config object in the Content Server. Add a param_name element
with the name fast_wildcard_compatible. Add the param_value element and set it to true.

Checking the dm_ftengine_config settings — Use iAPI, DQL, or DFC to check the
dm_ftengine_config object. To view existing parameters using iAPI in Documentum Administrator,
first get the object ID:
retrieve,c,dm_ftengine_config

Use the object ID to get the parameters:


?,c,select param_name, param_value from dm_ftengine_config where r_object_id='080a0d6880000d0d'

Adding legacy wildcard behavior to the ft_engine_config object — Use iAPI, DQL, or DFC to
modify the ft_engine_config object. To add a parameter using iAPI in Documentum Administrator,
use append as follows:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fast_wildcard_compatible
append,c,l,param_value
true
save,c,l

To change an existing parameter, use set as follows:


retrieve,c,dm_ftengine_config
set,c,l,param_name
fast_wildcard_compatible
set,c,l,param_value
false
save,c,l

To remove an existing parameter, use remove instead of set.

Turning off search lemmatization — xPlore supports search for similar or like terms, also
known as lemmatization, by default. To speed indexing and search performance, you can turn off
lemmatization for indexing. Refer to Disabling lemmatization, page 66. Validate your changes using
the validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
You can turn off lemmatization for individual queries by using the XQFT modifier “with stemming”
or “without stemming.” The XQFT default is “without stemming,” but the Documentum DQL default
is “with stemming.” To turn off stemming in Documentum queries, you must use a phrase search.

Tracing Documentum queries


There are four possible levels of tracing queries in Documentum environments. You can trace
subsystems with one of the following values:
• all
Traces everything (sum of cs, ftplugin, and ftengine).
• cs
Traces Content Server search operations such as initializing full-text in-memory objects and
the options used in a query.
• ftplugin
Traces the query plugin front end operations such as DQL translation to XQuery, calls to the
back end, and fetching of each result.
• ftengine
Traces back end operations such as HTTP transactions between the query plugin and xPlore, the
request stream sent to xPlore, the result stream returned from xPlore, and the query execution
plan.
• none
Turns off tracing.
You can trace queries using the MODIFY_TRACE apply method. To turn on tracing in iAPI, type the
following command:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,all

To turn off tracing in iAPI, type the following command:


apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,none

On Windows, this command controls tracing for all sessions. On UNIX and Linux, tracing
is session-specific. Trace messages are written to
$DOCUMENTUM/dba/log/fulltext/fttrace_<repository_name>.log. The log entry contains the following
information:
• Request query ID, so that you can find the translated query in the xPlore fulltext log
($DOCUMENTUM/dba/log/fulltext/fttrace_<repository_name>.log).
• The XQuery that was translated from DQL
• Query plan, if you have set tracing to all or ftengine. (The query plan is used to provide information
to EMC technical support.)
• The request and response streams, to diagnose communication errors or memory stream
corruption
• dm_ftengine_config options
Note: This information is not written to the log for test queries that are issued through xPlore
administrator.
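
To avoid typing mistakes when switching trace levels, you can generate the MODIFY_TRACE command text from the documented values. The following sketch only builds the string for pasting into iAPI. The helper name modify_trace_command is hypothetical, and the commands for cs, ftplugin, and ftengine are assumed to follow the same pattern as the all and none commands shown above.

```python
TRACE_LEVELS = ("all", "cs", "ftplugin", "ftengine", "none")


def modify_trace_command(level):
    """Return the iAPI apply command that sets full-text tracing to `level`."""
    if level not in TRACE_LEVELS:
        raise ValueError(f"unknown trace level {level!r}; expected one of {TRACE_LEVELS}")
    return f"apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,{level}"


print(modify_trace_command("ftplugin"))
```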



Chapter 10
Troubleshooting

For information on troubleshooting installation, refer to Documentum xPlore Deployment Guide.


Reports provide production troubleshooting support (refer to Chapter 11, Using Reports).
The following features aid in diagnosing and troubleshooting problems in indexing and searches:
• System troubleshooting
Troubleshooting system problems, page 118.
• Index agent troubleshooting
Troubleshooting the Documentum index agent, page 120
• Document processing troubleshooting (CPS)
Troubleshooting CPS, page 126 and Document processing (CPS) reports, page 150.
• Indexing troubleshooting
Troubleshooting indexing, page 132 and Indexing reports, page 151.
• Search troubleshooting
Troubleshooting search, page 136 and Search reports, page 151.
• Using logging or tracing
Logging, page 142 and Tracing, page 147.

Diagnostics and troubleshooting in xPlore administrator
Choose Diagnostics and troubleshooting in the left navigation tree. You see links for the following
features:
• Upload testing document
Testing upload and indexing, page 132
• Test search
Testing the query in xPlore administrator, page 136
• Test tokenization
Testing tokenization, page 127


• Audit records
Auditing queries, page 103
• Reports
Chapter 11, Using Reports

Troubleshooting system problems


The following topics describe system problem troubleshooting:
• Insufficient disk space
• Out of memory errors
• I/O errors, No such file or directory

Insufficient disk space


Investigate the following potential causes:
• xPlore is configured for incremental backups but they are not performed. The xDB redo
logs will grow until a full backup. Check the amount of space consumed by the xDB log in
dsearch_home/config/log. The log files will be very large and can even reach the size of the space
consumed by the data and index files.
Suggested workaround: Discard log files if you are not doing incremental backups. Use the xDB
admin tool. In the menu, choose Federation > Change keep-log-file option. Enter the xPlore
administrator password and uncheck Keep log files.
• The storage area for the indexes or logs is not large enough to handle the volume of data, or you
added indexing sources after the storage area was set up. This can also cause data corruption
and query failure.
Suggested workarounds:
— Allocate more disk space to xPlore instances.
— Use index agent filters to filter out content that does not need indexing. For example, you can
exclude certain object types or folders. (Refer to Using the index agent filters, page 56.)
Note: If you have already indexed content, it will not be removed from the index by the filters.
— Turn off or purge non-essential consumption of disk space like tracing, auditing, or metrics.
(Tracing consumes much more space than auditing and metrics.)
— Turn off the search service for an instance if it is not needed.
— Change a Documentum repository to index metadata only (not content).
— Use a compressed file system. (May impact performance.)
• Log files are too large. Check the space in dsearch_home/jboss4.3.0/server/instance_name/logs.


Suggested workarounds:
— Purge unneeded log files.
— Turn down the log level from Debug or Info to Warning.
— Add disk space.
• You are saving tokens but did not plan space for them. Saved tokens can save time on an
index rebuild, but they consume a large amount of disk space, more than five times the space
without save-tokens. Suggested workaround: Set the save-tokens option to false (the default) in
indexserverconfig.xml (refer to Troubleshooting lemmatization, page 67) and restart xPlore instances.
• You disabled content compression but did not allow enough space for it. The extracted content
from Documentum documents is compressed by default. If this is changed, the index size can
grow to several times larger. Suggested workaround: Add more disk space, or stop all xPlore
instances and add the following in indexserverconfig.xml if it has been removed:
<compress>
<for-element name="dmftcontentref"/>
</compress>

• Incomplete xPlore cleanup. The data file space grows too large.
Suggested workaround: Purge orphaned xDB files. For purge utilities, refer to Purging orphaned
segments, page 98.
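The space checks in the list above can be sketched as a small script. This is a minimal illustration, not an xPlore utility: the DSEARCH_HOME environment variable and the /opt/xPlore fallback are hypothetical, and the two directories are the log locations named in this section.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symbolic links
            try:
                total += os.path.getsize(path)
            except OSError:
                continue  # file was rotated or removed while walking
    return total


def report(paths):
    """Map each existing directory to its size in MB (1 MB = 1,000,000 bytes)."""
    return {p: directory_size_bytes(p) / 1_000_000 for p in paths if os.path.isdir(p)}


if __name__ == "__main__":
    # Hypothetical install root; substitute your own dsearch_home.
    dsearch_home = os.environ.get("DSEARCH_HOME", "/opt/xPlore")
    candidates = [
        os.path.join(dsearch_home, "config", "log"),        # xDB redo logs
        os.path.join(dsearch_home, "jboss4.3.0", "server"),  # instance logs
    ]
    for path, size_mb in sorted(report(candidates).items(), key=lambda kv: -kv[1]):
        print(f"{size_mb:12.1f} MB  {path}")
```

Run the script as the user who owns the xPlore installation so that all files are readable.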

Out of memory errors


If the application server has out of memory errors, you can increase the default heap size for the
JVM. Stop the application server. Edit the script that launches an xPlore server or index agent,
located in dsearch_home/jboss4.3.0/server. Increase the values of -Xms and -Xmx, save, and restart the
application server. (On Windows, each instance is installed as an automatic service. Stop the service,
edit the launch script, and restart the service.)
If you run a script, run as the same administrator user who started the instance.

I/O errors, No such file or directory


If you have installed multiple instances of xPlore, their storage areas must be accessible from all other
instances. If not, you will see an I/O error when you try to create a collection. Use the following
cleanup procedure:
1. Shut down all xPlore instances.
2. Edit xhivedatabase.bootstrap in dsearch_home/config. Change the binding node value to primary
for segments that have this problem.
3. Edit indexserverconfig.xml in dsearch_home/config and remove binding elements from the collection
that has the issue.
4. Restart xPlore instances.

Troubleshooting the Documentum index agent


The following topics describe troubleshooting the Documentum index agent.

Startup problems
Make sure the index agent web application is running. On Windows, verify that the Documentum
Indexagent service is running. On Linux, verify that you have instantiated the index agent using the
start script in dsearch_home/jboss4.3.0/server.
If the repository name is reported as null, restart the repository and the connection broker and
try again.
If you see a status 500 on the index agent UI, examine the stack trace for the index agent instance. If a
custom routing class cannot be resolved, this error appears in the browser:
org.apache.jasper.JasperException: An exception occurred processing JSP page
/action_dss.jsp at line 39
...
root cause
com.emc.documentum.core.fulltext.common.IndexServerRuntimeException:
com.emc.documentum.core.fulltext.client.index.FtFeederException:
Error while instantiating collection routing custom class...

If the index agent web application starts with port conflicts, stop the index agent with the script. If
you run a stop script, run as the same administrator user who started the instance. The index agent
locks several ports, and they are not released by closing the command window.

The index agent log


The index agent log uses log4j.properties. This file is located in the WEB-INF/classes directory of the
index agent WAR file. You can change the amount of information by setting the following properties:
• log4j.category.com.documentum.server.impl=DEBUG
• log4j.category.com.emc.documentum.core.fulltext=DEBUG,F1
The following type of indexing error will be reported in IndexAgent.log, which is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/logs.
2009-01-30 10:11:14,078 ERROR IndexingStatus [SubmitterThread][DM_INDEX_AGENT_PLUGIN]
Document represented by key 0881540f80000322
failed to index into collection knowledgeworker,
error:/indexserver/UpdateServlet
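
To collect the failed object IDs from a large IndexAgent.log, a script along the following lines can help. It is a sketch that assumes the message wording shown in the example above; the function names are illustrative and not part of the index agent.

```python
import re
from collections import Counter

# Matches the failure message shown above, tolerating line wraps between words.
FAILURE = re.compile(
    r"Document represented by key\s+([0-9a-f]{16})\s+"
    r"failed to index into collection\s+(\w+)"
)


def failed_documents(log_text):
    """Return (object_id, collection) pairs for every indexing failure found."""
    return FAILURE.findall(log_text)


def failures_per_collection(log_text):
    """Count indexing failures per target collection."""
    return Counter(collection for _object_id, collection in failed_documents(log_text))


if __name__ == "__main__":
    import sys

    # Usage: python script.py path/to/IndexAgent.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    for object_id, collection in failed_documents(text):
        print(object_id, collection)
```

The object IDs can be written to a text file and resubmitted through the index agent UI.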

Indexing status in the index agent UI


The index agent UI displays indexing status. On login, you can view information about the last
indexing operation: Date and time, total count, completed count, success count, warning count,
and failure count.

When you view Details during or after an indexing process, you see the following statistics:
• Active items: Error count, indexed content size, indexed count, last update timestamp, size,
and warnings count.
• Indexer plugin: Maximum call time
• Migration progress (if applicable): Processed docs and total docs.
• Averages: Pause time, KB/sec indexed, number of indexed docs/sec, plugin blocking max time.
• List of current internal index agent threads
When you start an indexing operation, a status summary is displayed until indexing has completed.
Click Refresh to update this summary. The summary disappears when indexing has completed. To
view more details of indexing in progress, click Details.
Table 14, page 121 compares the processing counts reported by the index agent and xPlore
administrator.

Table 14. Comparing index agent and xPlore administrator indexing metrics

Metric     Index agent                         xPlore administrator
Failed     Documents not submitted to xPlore   Errors in CPS processing; content and
                                               metadata not indexed. Count does not
                                               include failures of the index agent.
Warning    Metadata indexed but not content    Metadata indexed but not content
Success    Documents indexed by xPlore         Documents indexed by xPlore

Indexing status in Documentum Administrator


In Documentum Administrator, you can check the index agent queue. Navigate to Administration >
Indexing Management > Index Queue. The drop-down list displays Indexing failed, Indexing in
progress, Awaiting indexing, Warning, and All. From the Indexing failed display, you can find the
object ID and type, and the type of failure. Some types of errors are the following:
• [DM_FULLTEXT_E_SEARCH_GET_ROW_FAIL...]
Caused by incorrect query plugin (Content Server hotfix for 6.5 SPx)
• [DM_FULLTEXT_E_QUERY_IS_NOT_FTDQL...]
Caused by incorrect query plugin (Content Server hotfix for 6.5 SPx)
• [DM_FULLTEXT_E_EXEC_XQUERY_FAIL...]
There is nothing in the index.
To sort by queue state when there is a large queue, use the following DQL command in Documentum
Administrator:
select count(*), task_state from dmi_queue_item where name like '%fulltext%'
group by task_state

To check the indexing status of an object — The queue item ID for the document is available in the
details screen of the index agent UI. Use the following DQL to check the status of the queue item:
select task_name,item_id,task_state,message from dmi_queue_item where
name=username and event='FT re-index'

Restarting the index agent


If you must stop and restart the index agent before it has finished indexing a batch of documents
that you manually submitted (through the index agent UI), you must resubmit the indexing requests
that were not finished.

Verifying index migration with ftintegrity


Use these instructions for running the index verification tool after migration. The tool is a standalone
Java program that checks index integrity against repository documents. It verifies all types that are
registered to dmi_registry_table with the user dm_fulltext_index_user, comparing the object ID and
i_vstamp between the repository and xPlore.
To verify that indexing after migration has completed and normal indexing mode has ingested
documents properly, use the state of the index job. Refer to Running the state of the index job, page 60.
Use the tool to verify a migration or a restore operation.
Note: The ftintegrity tool references the default index agent filters properties file. (Refer to Using the
index agent filters, page 56.) The tool is not aware of object types that you have chosen not to index
through a custom filter. The object IDs of instances of those types appear in the generated results file.
For information on ftintegrity arguments, refer to the list of arguments for the state of the index job
(Table 5, page 61). The ftintegrity script calls this job.

To run the index migration verification tool


1. Navigate to dsearch_home/setup/indexagent/tools.
2. Open the script ftintegrity_for_repositoryname.bat (Windows) or ftintegrity_for_repositoryname.sh
(Linux) and edit the script. Substitute the repository instance owner password in the script. The
tool automatically resolves all parameters except for the password.
3. (Optional) If you have indexed using the default index agent filters, you can add the filters to
the ftintegrity parameters. This setting will significantly slow ftintegrity performance. (For
information on the filters, refer to Using the index agent filters, page 56.) Add the following
string to the end of the java command. A file ObjectId-filtered-out.txt will record all object IDs
that are filtered.
• Windows:
dm_fulltext_index_user %CONFIG_DIR%\filter.properties

For dual mode installations (FAST and xPlore), the user is dm_fulltext_index_user_01
• Unix/Linux:
dm_fulltext_index_user $CONFIG_DIR/filter.properties

For dual mode installations (FAST and xPlore), the user is dm_fulltext_index_user_01.
4. If you run a script, run as the same administrator user who started the instance. Launch the
ftintegrity script.
Output from the script is similar to the following:
2009/09/02 12:28:10:078 Connected to the docbase
2009/09/02 12:28:10:344 Index Server is running
2009/09/02 12:28:12:453 fetched 63 object from docbase for type dm_group
2009/09/02 12:28:12:453 fetched 0 objects from DSS for type dm_group
2009/09/02 12:28:29:721 fetched 12216 object from docbase for type dm_sysobject
2009/09/02 12:28:29:721 fetched 11185 objects from DSS for type dm_sysobject
2009/09/02 12:28:30:033 fetched 286 object from docbase for type dm_acl
2009/09/02 12:28:30:033 fetched 0 objects from DSS for type dm_acl
2009/09/02 12:28:30:033 11183 objects with match ivstamp in both DCTM and
Index Server
2009/09/02 12:28:30:033 2 objects with different ivstamp in DCTM and Index Server
2009/09/02 12:28:30:033 1380 objects in DCTM only
2009/09/02 12:28:30:033 0 objects in Index Server only
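
The per-type counts in this output can be compared with a short script. The following sketch assumes the line wording shown in the sample output above ("fetched N objects from docbase" and "from DSS"); the function names are illustrative and not part of ftintegrity.

```python
import re

# "object" is singular in the docbase lines of the sample output, hence "objects?".
FETCHED = re.compile(r"fetched (\d+) objects? from (docbase|DSS) for type (\w+)")


def counts_by_type(output):
    """Return {type_name: {"docbase": n, "DSS": m}} parsed from ftintegrity output."""
    counts = {}
    for number, source, type_name in FETCHED.findall(output):
        counts.setdefault(type_name, {"docbase": 0, "DSS": 0})[source] = int(number)
    return counts


def index_gaps(output):
    """Return {type_name: docbase_count - index_count} for types whose counts differ."""
    return {
        type_name: c["docbase"] - c["DSS"]
        for type_name, c in counts_by_type(output).items()
        if c["docbase"] != c["DSS"]
    }


if __name__ == "__main__":
    import sys

    # Usage: ftintegrity output piped or redirected to standard input.
    for type_name, gap in index_gaps(sys.stdin.read()).items():
        print(f"{type_name}: {gap} more objects in the repository than in the index")
```

For the sample output above, the gaps are 63 (dm_group), 1031 (dm_sysobject), and 286 (dm_acl), which together equal the 1380 objects reported in DCTM only.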

Results from the ftintegrity migration verification — The script generates four results files in the
tools directory:
• ObjectId-common-version-match.txt
This file contains the object IDs and i_vstamp values of all objects in the index and the repository
and having identical i_vstamp values in both places.
• ObjectId-common-version-mismatch.txt
This file records all objects in the index and the repository with identical object IDs but
nonmatching i_vstamp values. For each object, it records the object ID, i_vstamp value in the
repository, and i_vstamp value in the index.
The mismatch is on objects that were modified during or after migration. You can resubmit this
list after you start the index agent in normal mode. Click Object File and browse to the file.
• ObjectId-dctmOnly.txt
This report contains the object IDs and i_vstamp values of objects in the repository but not in
the index.
These objects could be documents that failed indexing, documents that were filtered out, or new
objects generated in the repository during or after migration. You can resubmit this list after you
start the index agent in normal mode. Click Object File and browse to the file.
To check whether filters were applied during migration, run the following DQL query. If one or
more rows are returned, a filter was applied.
select r_object_id,object_name,primary_class from dmc_module where any
a_interfaces='com.documentum.fc.indexagent.IDfCustomIndexFilter'

• ObjectId-indexOnly.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
These objects were removed from the repository during or after migration, before the event
has updated the index.

You can submit the ObjectId-common-version-mismatch.txt file in the index agent UI to see errors for
those objects. After you have started the index agent, check Index selected list of objects and then check
Object file. Navigate to the file and then choose Submit. Open xPlore Administrator > Reports and
choose Document processing error summary. The error codes and reasons are displayed.

Documents are not indexed


Is indexing enabled? — Documents are not indexed if the document type is not registered or is
not a subtype of a registered type. Check whether indexing is enabled (the type is a subtype of a
registered type). You can check whether indexing is enabled in Documentum Administrator by
viewing the type properties.
You can register or unregister a type through Documentum Administrator. The type must be
dm_sysobject or a subtype of it. If a type’s supertype is registered for indexing, the system displays
the Enable Indexing checkbox selected but disabled. You cannot clear the checkbox.

Is the format indexable? — Check the class property of the document format. Refer to Documentum
attributes that control indexing, page 53 for more information.

Is the document too large? — Check the content size. By default, the index agent filters out content
larger than 20 MB. The following message is logged in indexagent.log:
Content size for XXX exceeds limit of 20000000 skipping content
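
If you have file system access to a set of source files, you can estimate in advance which ones exceed the limit. The following is a sketch only: it assumes the files are reachable as ordinary files, and the constant simply repeats the 20000000-byte default from the message above.

```python
import os

CONTENT_SIZE_LIMIT = 20_000_000  # default limit in bytes, as shown in the log message


def oversized_files(root, limit=CONTENT_SIZE_LIMIT):
    """Return (path, size) pairs for files under root larger than limit, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or removed while scanning
            if size > limit:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: python script.py directory_to_scan
    for path, size in oversized_files(sys.argv[1]):
        print(f"{size:>14d}  {path}")
```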

Is there another cause? — Check the index agent log for any other error message for the document,
such as unsupported format (the most common).

Reindexing
You can submit for reindexing the lists of objects that are generated by ftintegrity (refer to Verifying
index migration with ftintegrity, page 122).

To check on the status of queue items that have been submitted for reindexing — Use the
following DQL. For username, specify the user who logged in to the index agent UI and started reindexing.
select task_name,item_id,task_state,message from dmi_queue_item where
name=username and event='FT re-index'

If task_state is done, the message will be “Successful batch...” If the task_state is failed, the message
will be “Incomplete batch...”

To resubmit one document for reindexing — Put the object ID into a temporary text file. Use the
index agent UI to submit the file: choose Index selected list of objects > Object File.

To remove queue items from reindexing — Use the following DQL. For username, specify the user
who logged in to the index agent UI and started reindexing.
delete dmi_queue_item object where name=username and
event='FT re-index'

Setting the index agent error threshold


When the index agent receives a response from xPlore, a counter is updated for each error message.
When the counter exceeds a configurable error_threshold, the index agent performs the configured
action, for example, stops the indexing. (This is the only action available in this release.) To edit the
error thresholds, stop the index agent instance and edit the file indexagent.xml. (This file is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.)
Locate the element error_configs, just after the closing tag of indexer_plugin_config. Each
error_config element contains the following elements:

Table 15. Index agent error configuration

Element name      Description
error_config      Contains error_code, error_threshold, time_threshold, and action elements.
error_code        Refer to Table 16, page 125.
error_threshold   Number of errors at which the action is executed.
time_threshold    Time in seconds at which to check the counter. If error_threshold is
                  exceeded, the action is executed.
action            Valid value: stop

Table 16. Error codes

error_code                  Description
UNSUPPORTED_DOCUMENT        Unsupported format
XML_ERROR                   XML parsing error for document content
DATA_NOT_AVAILABLE          No information available
PASSWORD_PROTECTED          Password protected or document encrypted
MISSING_DOCUMENT            RTS routing error
INDEX_ENGINE_NOT_RUNNING    xPlore indexing service not running

Cannot stop the index agent


If you have configured two index agents on the same host and port, you see the following error
message when you attempt to stop the agent:
Exception in thread "main" java.lang.SecurityException:
Failed to authenticate principal=admin, securityDomain=jmx-console

You can kill the JVM process and run the index agent configurator to give the agents different ports.

Cleaning up the Documentum index queue to restart


Using iAPI in Documentum Administrator, remove all dmi_queue_items with the following
command, inserting the instance owner for the value of name:
?,c,delete dmi_queue_item object where name = 'dmadmin'

Troubleshooting CPS
You can test upload processing by using the Test upload feature in xPlore administrator. For more
information, refer to Testing upload and indexing, page 132.

Reading CPS log files


CPS logging in an xPlore instance — If CPS is installed as an in-process service on an
xPlore instance, it shares the log4j.properties of the indexserver web application, for example,
dsearch_home/jboss4.3.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes.
The log files cps.log and cps_daemon.log are located in dsearch_home/jboss4.3.0/server/DctmServer_
PrimaryDsearch/logs.

CPS logging in a standalone instance — If CPS is installed as a standalone service, the
log4j.properties file is located in the CPS war file, in WEB-INF/classes. The log files cps.log and
cps_daemon.log are located in cps_home/jboss4.3.0/server/cps_instance_name/logs.

CPS log levels — The following log levels are available for CPS in order of decreasing amount of
information logged: debug, info, warn, and error. Set the log level to INFO to troubleshoot CPS. The
log output file is specified in the log4j.properties file of the instance.
Each CPS request is logged with the prefix PERFCPSTS#. You see prefixes for the following libraries
in CPS logging:
• CPS daemon: DAEMON-CORE
• Text extraction: DAEMON-TE STELLENT
• HTTP content retrieval: DAEMON-CF_HTTP
• Language identification: DAEMON-LI_RLI
• Language processing: DAEMON-LP_RLP
Following is an example from cps.log. (Remote CPS log is named cps_manager.log.)
2008-10-21 13:35:40,402 WARN [DAEMON-CORE-(1324)] max_batch_size in configuration
file is invalid. Use default:65536 instead.

Example: CPS performance by format — Use the timestamp difference between PERFCPSTS9
(Content fetching of the single request finished) and PERFCPSTS10 (Text extraction of the single
request finished) to find the processing time for a particular document.
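The timestamp subtraction can be scripted as follows. This sketch assumes that PERFCPSTS lines begin with the same 23-character timestamp as the cps.log example above (for example, 2008-10-21 13:35:40,402); the rest of each line is not inspected, and the function names are illustrative.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23  # length of "2008-10-21 13:35:40,402"


def parse_timestamp(line):
    """Parse the leading log timestamp of a cps.log line."""
    return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)


def extraction_seconds(fetch_finished_line, extraction_finished_line):
    """Seconds between the PERFCPSTS9 line and the PERFCPSTS10 line of one request."""
    delta = parse_timestamp(extraction_finished_line) - parse_timestamp(fetch_finished_line)
    return delta.total_seconds()
```

Pass the PERFCPSTS9 line and the PERFCPSTS10 line that belong to the same request. Grouping the results by document format gives an approximate processing time for each format.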

Separating log files for CPS instances


If you have more than one CPS instance, make sure that each instance specifies a different
log file path. The log output file is specified in the log4j.properties file of the instance.
If CPS is installed as a standalone service, the log4j.properties file is located in the CPS
war file, in WEB-INF/classes. If CPS is installed as an in-process service on an xPlore
instance, it shares the log4j.properties of the indexserver web application, for example,
C:\xPlore\jboss4.3.0\server\DctmServer_PrimaryDsearch\deploy\dsearch.war\WEB-INF\classes.

Verifying the CPS process


The CPS process (CPSDaemon) runs on port 64321. Make sure that the process has started.
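One way to check that something is listening on the port is a plain TCP connection attempt, as in the following sketch. A successful connection shows only that a process accepts connections on the port; it does not prove that the process is CPSDaemon. The host name localhost is an assumption.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


if __name__ == "__main__":
    # 64321 is the CPSDaemon port given in this guide; run this on the CPS host.
    print("listening" if port_is_listening("localhost", 64321) else "not listening")
```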

Testing tokenization
Test the tokenization of a word or phrase to see what is indexed. Expand Diagnostic and
Troubleshooting in the xPlore administrator tree and then choose Test tokenization. Different
tokenization rules are applied for each language. Uppercase characters are rendered as lowercase.
Special characters are replaced by white space.
Note: Test tokenization is not traced.
The results table displays the original input words. The root form is the token used for the index. The
Start and End offsets display the position in raw input. Components are displayed for languages that
support component decomposition, such as German.
Results may differ from tokenization of a full document. If the document language that is identified
during indexing does not match the language that is identified from the test, or the context of the
indexed document does not match the context of the text, the tokens can vary.

Testing CPS processing


Use the executable CASample in dsearch_home/dsearch/cps/cps_daemon/bin to test the processing
of a file. Syntax:
casample path_to_input_file

Many types of errors are flagged.


If there are processing errors for the file, they will be displayed after the processing statistics. A
corrupt file returns the following error. The XML element that contains the error is displayed:
*** Error: file is corrupt in xml_element.

If the CPS analyzer cannot identify the file type, it displays the following error. The XML element
that contains the error is displayed:
*** Error: no filter available for this file type in xml_element.

If the file is empty, the following error is displayed. The XML element that contains the error is
displayed:
*** Error: file is empty in xml_element.

Slow ingestion
Slow ingestion is most often seen during migration. If migration is spread over days, for example,
tens of millions of documents ingested over two weeks, slow ingestion may not be an issue. Most
ingestion issues can be resolved with planning, pre-production sizing, and benchmarking.
The following topics describe possible causes and workarounds for slow ingestion:
• Insufficient CPU
• Large documents
• Disk I/O issues
• Slow network
• Large number of Excel documents
• Virus checking software
• Interference by another guest OS
• Slow content storage area

Insufficient CPU

Content extraction and text analysis are CPU-intensive. CPU is consumed for each document
creation, update, or change in metadata. Check CPU consumption during ingestion. Suggested
workarounds: For migration, add temporary CPU capacity. For day-forward (ongoing) ingestion,
add permanent CPU or new CPS instances. CPS instances will be used in a round-robin order.

Large documents

Large documents can tie up a slow network. These documents also contain more text to process. Use
the xPlore administrator reports to see the average size of documents and how many documents
are ingested per hour. Document size is also reported by the State of repository report in Content
Server. For example, the Documents ingested per hour report shows the number of documents and
text bytes ingested. Divide bytes ingested by document count to get the average number of bytes per
document processed.
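As a worked example of this division, the following sketch uses made-up report values (1,200,000,000 text bytes and 40,000 documents in one hour); they are not taken from a real report.

```python
def average_bytes_per_document(bytes_ingested, document_count):
    """Average number of text bytes per ingested document."""
    if document_count <= 0:
        raise ValueError("document_count must be greater than zero")
    return bytes_ingested / document_count


if __name__ == "__main__":
    # Hypothetical values read from the Documents ingested per hour report.
    average = average_bytes_per_document(1_200_000_000, 40_000)
    print(f"{average:.0f} bytes per document")
```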
Two configuration properties affect the size of documents that are indexed and consequently the
ingestion performance:
• The index agent (Documentum only) limits the size of the documents submitted for indexing. This
limit is changed in indexagent.xml, in the WEB-INF/classes/ directory of the index agent WAR
file. You can change the contentSizeLimit parameter to a different value (in bytes). Stop the
index agent instance to change the size limit.
<parameter>
<parameter_name>contentSizeLimit</parameter_name>
<parameter_value>20000000</parameter_value>
</parameter>

• CPS limits the size of text that is indexed. A document can have a much greater size
(contentSizeLimit) compared to the indexable text within the document. You can change the value
of Max Text Threshold in the xPlore Administrator CPS configuration screen. Units are bytes
and the range is 5-40 MB.
Other suggested workarounds: Add CPU, memory, and possibly disk I/O capacity. Improve network
performance.

Disk I/O issues

You can detect disk I/O issues by looking at CPU utilization. Low CPU utilization and high I/O
response time indicate an I/O problem. Test the network by transferring large files or using Linux
dd (disk dump).
Suggested workarounds:
• NAS
Verify that the network has not been set as half duplex. Increase network bandwidth, improve
network I/O controllers on the xPlore host, or both.
• SAN (check in the following order)
1. Verify that the SAN has sufficient memory to handle the I/O rate.
2. Increase the number of drives available for the xPlore instance.
3. If the SAN is multiplexing a set of drives over multiple applications, move the "disk space"
to a less contentious set of drives.
4. If other measures have not resolved the problem, change underlying drives to solid state.

Slow network

A slow network between the Documentum Content Server and xPlore results in low CPU
consumption on the xPlore host even when the disk subsystem has a high capacity. File transfers via
FTP or network share are also slow, independent of xPlore operations.
Suggested workarounds: Verify that the network is not set to half duplex. Check for faulty hubs or
switches. Increase network capacity.

Large number of Excel documents

Microsoft Excel documents require the most processing of all text formats, due to the complexity of
extracting text from the spreadsheet structure. You can detect the number of Excel documents using
the State of repository report in Content Server.
Suggested workaround: Add temporary CPU for migration or permanent CPU for ongoing load.

Virus checking software

Virus checking software can lead to high disk I/O because it continually checks the changes in xPlore
file structures during indexing.
Workarounds: Exclude temp and xPlore working and data directories, or switch to Linux platform.

Interference by another guest OS

In a VM environment, the physical host may have several guest OSes. This can cause intermittent
slowness in indexing that is not due to format, document size, I/O capacity, or CPU capacity.
Workaround: Work with the infrastructure team to load balance the VMs appropriately.

Slow content storage area

Ingestion is dependent on the speed of the content source. This is especially noticeable during
migration. For example, you find that migration or ingestion takes much longer in production than in
development. Development is on a small volume of content on NAS but production content is on a
higher-latency device like Centera. You can determine the location of the original content by using
the State of the repository report in Content Server.
Workaround: Extend the migration time window.

CPS daemon fails to start


The CPS configuration may be invalid. Check to see whether you have changed the file
InstanceName_local_configuration.xml in dsearch_home/dsearch/cps/cps_daemon.
CPS may not start on an unsupported OS. Go to dsearch_home/dsearch/cps/cps_daemon/bin and try to
run the daemon directly to see whether it can be started.

CPS starts but fails to process some requests


Check the list of supported formats in Appendix G, Indexable Formats. Check an individual file by
submitting it to a test. Run casample.exe or casample.sh in dsearch_home/dsearch/cps/cps_daemon/bin.
If the files pass the casample test, try to resubmit them using the index agent UI.
If a collection has been set to read-only, and documents in that collection are submitted for updating,
the update will fail.

CPS starts but then has to restart


The CPS load is too high, causing out of memory errors. Check the use of memory on the CPS
instance. If the load is high, change the CPS configuration: Decrease the number of worker threads.
Resubmit any failed files using the Documentum index agent.

CPS configuration changes do not take effect


If you edit indexserverconfig.xml while xPlore instances are running, xPlore overwrites your changes.
Shut down all xPlore instances before changing this file. Validate your changes using the validation
tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you
change this file.

Handling concurrent large file ingestion


When two or more large files are processed by CPS at the same time, the CPS log file reports one of
the following errors (cps_daemon.log):
ERROR [Daemon-Core-(3400)] Exception happened, ACCESS_VIOLATION,
Attempt to read data at address 1 at (connection-handler-2)
...
FATAL [DAEMON-LP_RLP-(3440)] Not enough memory to process linguistic requests.
Error message: bad allocation

Use xPlore administrator to select the instance, and then Configuration. Change the following to
smaller values:
• Max text threshold
• Thread pool size
You can add a separate CPS instance that is dedicated to processing. This processor will not interfere
with query processing.

Troubleshooting indexing
You can use reports to troubleshoot indexing and content processing issues. Refer to Chapter
11, Using Reports for more information on these reports. The following topics describe general
troubleshooting tasks and specific indexing errors.

Testing upload and indexing


If xPlore administrator is running on the same instance as an index service, you can test upload
a document for indexing. To test upload in xPlore administrator, expand the Diagnostic and
troubleshooting tree and choose Upload testing document. You can test upload in the following
ways:
• Local File
Navigate to a local file and upload it.
• Remote File
Enter the path to a remote file that is accessible from the xPlore administrator host.
• Specify raw XML
Click Specify raw XML, and type raw XML such as DFTXML for testing.
When you click on the object name, you see the DFTXML. This DFTXML rendition is template-based,
not generated by the Documentum index agent. There may be slight differences in the DFTXML
when you submit the document through the Documentum index agent. To verify remote CPS, you
start with a sample DFTXML file, then edit the dmcontentref element. Set it as a path to the shared
storage, then paste the DFTXML in the text box.
Results are displayed. If the document has been successfully uploaded, it is available immediately
for search unless the indexing load is high. The object name in xPlore is created by concatenating
the file name and timestamp.

Checking network bandwidth and latency


Bandwidth or latency bottlenecks can degrade performance during the transfer of content from a
source file system to xPlore. Validate that file transfers take place with the expected speed. This issue
can be seen more frequently in virtual environments.

Checking the indexing log


The xPlore log dsearch.log is located in the logs subdirectory of the JBoss deployment directory. In
the following example, logging was set to INFO (not the default) in xPlore administrator.
The following excerpt from dsearch.log shows that a document was indexed and inserted into the
tracking database (StatusDB):

<event timestamp="2009-04-03 08:40:54,536" level="INFO" thread="http-0.0.0.0-9200-2"
logger="com.emc.documentum.core.fulltext.indexserver.core.index.FtIndexObject"
elapsedTime="1238773254536">
<message ><![CDATA[[INSERT_STATUSDB] insert into the StatusDB with docId 0965dd8980001dce
operationId primary$3cb3b293-8790-452e-af02-84c9502a45e4 status NEW (message) ]]>
</message>
</event>
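
Events of this kind can be read with a standard XML parser. The following sketch assumes that each event is a well-formed element like the example above and that the message wording (docId, operationId, status) matches the excerpt; the function name is illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Field layout taken from the INSERT_STATUSDB message shown above.
STATUS = re.compile(r"docId (\S+) operationId (\S+) status (\w+)")


def parse_status_event(event_xml):
    """Return a dict describing one StatusDB event, or None if the message differs."""
    event = ET.fromstring(event_xml)
    message = event.findtext("message", default="")
    # Collapse line wraps so the pattern can match across them.
    match = STATUS.search(" ".join(message.split()))
    if match is None:
        return None
    return {
        "timestamp": event.get("timestamp"),
        "level": event.get("level"),
        "doc_id": match.group(1),
        "operation_id": match.group(2),
        "status": match.group(3),
    }
```

For the event shown above, the function returns the document ID 0965dd8980001dce with the status NEW.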

Checking the status of a document


Using xPlore administrator, you can issue the following XQuery to find the indexing status of a
document. You must know the document ID.
for $i in collection('/<domainname>/dsearch/SystemInfo/StatusDB')/
trackinginfo/operation
where $i[@doc-id = '<document-id>']
return <r> <status>{$i/@status/data(.)}</status>
<message> {$i/data(.)} </message></r>

The status returned is one of the following:


• NEW
The indexing service has begun to process the request.
• ERROR
xPlore failed to process the request. Metadata and content cannot be indexed.
• DONE
The document has been processed successfully.
• WARN
Only the metadata was indexed.
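
If you check many documents, it can help to generate the status query from a template and to translate the returned status into its meaning. The following sketch assumes the query form and the status values shown above; the function names are hypothetical, and the script only builds the query text for pasting into xPlore administrator. It does not connect to xPlore.

```python
STATUS_MEANINGS = {
    "NEW": "The indexing service has begun to process the request.",
    "ERROR": "xPlore failed to process the request. Metadata and content cannot be indexed.",
    "DONE": "The document has been processed successfully.",
    "WARN": "Only the metadata was indexed.",
}


def build_status_query(domain, doc_id):
    """Return the status XQuery for one document in one domain."""
    for value in (domain, doc_id):
        # Reject quotes so that the value cannot break out of the string literal.
        if "'" in value or '"' in value:
            raise ValueError(f"unexpected quote character in {value!r}")
    return (
        f"for $i in collection('/{domain}/dsearch/SystemInfo/StatusDB')/\n"
        "trackinginfo/operation\n"
        f"where $i [@doc-id = '{doc_id}']\n"
        "return <r> <status>{$i/@status/data(.)}</status>\n"
        "<message> {$i/data(.)} </message></r>"
    )


def describe_status(status):
    """Map a returned status value to its documented meaning."""
    return STATUS_MEANINGS.get(status.strip().upper(), "Unknown status")


if __name__ == "__main__":
    print(build_status_query("DSS_LH1", "0965dd8980001dce"))
    print(describe_status("WARN"))
```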

Checking Documentum settings


Check the following settings:
• Make sure the user who starts the index agent has permission in the repository to read all content
that is indexed.
• Start the index agent application before you start the UI. You can use the following start script
for your environment:
dsearch_home\jboss4.3.0\server\startIndexagent.cmd
dsearch_home/jboss4.3.0/server/startIndexagent.sh

On Windows, the index agent instance is installed as an automatic service called Documentum
Index_agent Windows.
• Check for errors in the index agent status page at http://host:port/IndexAgent/.


Note: The index agent reports the number of documents that it processed (X). The success and
failure counts that xPlore reports should add up to X. The warning counts reported by xPlore and by
the index agent should match. Failures in xPlore are not reported back to the index agent.
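
The reconciliation described in this note is simple arithmetic. The following sketch shows the two checks; the function name and the sample counts are hypothetical, and you read the real counts from the index agent status page and from xPlore administrator.

```python
def reconcile_counts(agent_processed, xplore_success, xplore_failure,
                     agent_warnings, xplore_warnings):
    """Return a list of mismatches between index agent and xPlore counts."""
    problems = []
    if xplore_success + xplore_failure != agent_processed:
        problems.append(
            f"xPlore success + failure = {xplore_success + xplore_failure}, "
            f"but the index agent processed {agent_processed}"
        )
    if agent_warnings != xplore_warnings:
        problems.append(
            f"warning counts differ: index agent {agent_warnings}, xPlore {xplore_warnings}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical counts: 7 documents are unaccounted for and one warning is missing.
    for problem in reconcile_counts(100, 90, 3, 5, 4):
        print(problem)
```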

High save-to-search latency


The following issues can cause high latency between the time a document is created or updated and
the time the document is available for search:
• Index agent is down
• CPS restarts frequently
• Large documents tie up ingestion
• Ingestion batches are large
• Insufficient hardware resources

Index agent is down

When the index agent is down, documents cannot be indexed or searched. Detect this problem
by monitoring the size of the index agent queue. Use xPlore administrator to determine whether
documents were sent for ingestion. For example, the Documents ingested per hour report shows 0
for DocCount when the index agent is down.
Workaround: Configure multiple index agents for redundancy. Monitor the index agents and restart
when they fail.

CPS restarts frequently

Under certain conditions, CPS fails while processing a document. xPlore restarts the CPS process, but
the restart causes a delay. Restart is logged in cps.log and cps_daemon.log. For information on these
logs, refer to Reading CPS log files, page 126.

Large documents tie up ingestion

A large document in the ingestion pipeline can delay smaller documents that are further back in the
queue. Detect this issue using the Documents ingested per hour report in xPlore administrator.
(Only document size averages are reported.)
If a document is larger than the configured maximum limit for document size or text size, the
document is not indexed. The document metadata are indexed but the content is not. This is recorded
in the xPlore administrator report Content too large to index.
Workaround: Attempt to refeed a document that was too large. Increase the maximum size for
document processing. (Refer to Document size and performance, page 165.)


Large ingestion batches

During periods of high ingestion load, documents can take a long time to be processed. Review the
ingestion reports in xPlore administrator to find bytes processed and latency. Use dsearch.log to
determine when a specific document was ingested.
Workaround: Set up a dedicated index agent for the batch workload.

Hardware and virtual server resources

If CPU, disk I/O, or memory is highly utilized, increase the capacity. Performance on a virtual server
is somewhat slower than on a dedicated host. For a comparison of performance on various storage
types, refer to Storage types and locations, page 159.

Connection refused
If an API returns a connection refused error, check the value of the URL on the instance. Make sure
that it is valid and that indexing is turned on for the instance.
If you have to change the xPlore host name, do the following:
• Update indexserverconfig.xml with the new value of the URL attribute on the node element.
Shut down the xPlore instance before applying your changes. Validate your changes using the
validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
• Change the JBoss startup (script or service) so that it starts correctly. If you run a stop script, run
as the same administrator user who started the instance.

Changes to index configuration do not take effect


If you edit indexserverconfig.xml while xPlore instances are running, your changes are not applied.
Shut down all xPlore instances before changing this file. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.

Timing problems: Login ticket expired


All instances in an xPlore deployment must have their host clocks synchronized to the primary xPlore
instance host. Shut down all xPlore instances, synchronize clocks, and restart.


Troubleshooting search
When you set the search service log level to WARN, queries are logged. Refer to Query logging, page
146 for more information. If query auditing is enabled (the default), you can view or edit reports on
queries. Refer to Chapter 11, Using Reports for more information on query reports.

No queries allowed — An error message in the client can indicate that the xPlore search service
has not started:
The search has failed: The Full-text service is disabled

Testing the query in xPlore administrator


You can search on a full-text string in content or metadata. In xPlore administrator, expand the
Diagnostic and troubleshooting tree and choose Test search. To search for a keyword, choose
Keyword and enter the search string. If you enter multiple terms in the Keyword field, the XQuery
expression is generated using the AND condition.
To search using an XQuery expression, choose XQuery and enter the expression. Make sure that you
select the correct domain for the document. If you are copying a query from the log, you must remove
declare phrases like "declare option xhive:queryplan-debug 'true'" from the beginning of the query.
Note: The document must already be indexed before you can perform this test.
Security is not evaluated for results from a test search. As a result, the number of items returned
includes hits that the index server would remove after applying security. A status of success or fail
indicates whether the query executed; success does not indicate the presence of hits.
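
When you copy many queries from the log, you can strip the leading declare phrases with a short script before pasting the query into Test search. The following sketch assumes that each declaration in the log ends with a semicolon, as XQuery prolog syntax requires; the function name is hypothetical, and a declaration whose value itself contains a semicolon is not handled.

```python
import re

# A prolog declaration: optional whitespace, "declare", then everything up to the next ";".
_DECLARE = re.compile(r"\s*declare\s+[^;]*;", re.IGNORECASE)


def strip_leading_declarations(query):
    """Remove every 'declare ...;' phrase found at the beginning of the query."""
    position = 0
    while True:
        match = _DECLARE.match(query, position)
        if not match:
            break
        position = match.end()
    return query[position:].lstrip()


if __name__ == "__main__":
    logged = ("declare option xhive:queryplan-debug 'true'; "
              "for $i in /dmftdoc[. ftcontains 'test'] return $i")
    print(strip_leading_declarations(logged))
```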

Testing the query in Documentum iAPI or DQL


Try a query similar to the following:
api>?,c,SELECT text,object_name FROM dm_document SEARCH DOCUMENT CONTAINS 'test'
WHERE (a_is_hidden = FALSE)

Verifying the query plugin settings


Check the Content Server log after you start the Content Server. The file repository_name.log is
located in $DOCUMENTUM/dba/log. Look for a line that references a plugin with DSEARCH in the
name, similar to the following:
Mon Jun 14 21:53:50 2010 031000 [DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info:
"Loaded FT Query Plugin: ...C:\Documentum\product\6.5/bin/DSEARCHQueryPlugin.dll...

The Content Server query plugin properties of the dm_ftengine_config object are set during xPlore
configuration. If you have changed one of the properties, like the primary xPlore host, the plugin can
fail. Verify the plugin properties, especially dsearch_qrserver_host, with the following DQL:
1> select param_name, param_value from dm_ftengine_config
2> go

You see specific properties similar to the following:


param_name                  param_value
--------------------------  -----------
dsearch_qrygen_mode         both
fast_wildcard_compatible    true
query_plugin_mapping_file   C:\Documentum\fulltext\dsearch\dm_AttributeMapping.xml
dsearch_domain              DSS_LH1
dsearch_qrserver_host       Config8518VM0
dsearch_qrserver_port       9300
dsearch_qrserver_target     /dsearch/IndexServerServlet
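
If you save the DQL output to a file, a short script can confirm that the connection properties are present before you compare them with the actual xPlore host. The following sketch assumes the two-column output shown above; the function names and the list of required properties are illustrative choices, not part of any Documentum tool.

```python
REQUIRED = (
    "dsearch_domain",
    "dsearch_qrserver_host",
    "dsearch_qrserver_port",
    "dsearch_qrserver_target",
)


def parse_params(output):
    """Parse 'param_name param_value' rows into a dictionary."""
    params = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank line, or a single-token separator line
        name, value = parts[0], parts[1].strip()
        if name == "param_name" or set(name) == {"-"}:
            continue  # header row or dashed separator row
        params[name] = value
    return params


def check_params(params):
    """Return a list of problems found in the connection properties."""
    problems = [f"missing {name}" for name in REQUIRED if not params.get(name)]
    port = params.get("dsearch_qrserver_port", "")
    if port and not port.isdigit():
        problems.append(f"port is not numeric: {port}")
    return problems


if __name__ == "__main__":
    sample = """param_name param_value
-
dsearch_domain DSS_LH1
dsearch_qrserver_host Config8518VM0
dsearch_qrserver_port 9300
dsearch_qrserver_target /dsearch/IndexServerServlet
"""
    print(check_params(parse_params(sample)))
```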

Getting the query plan


The query plan can be useful to EMC tech support for evaluating slow queries. The query plan shows
which indexes were probed and the order in which they were probed. Use one of the following
options to save or fetch the query plan:
• xPlore search API
IDfXQuery.setSaveExecutionPlan(true)
IFtSearchSession.fetchExecutionPlan(requestId) //fetch query plan
• DFC query builder API
IDfXQuery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN,true)
IDfXQuery.getExecutionPlan(session)
• iAPI
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine
The execution plan is written to the full-text trace log, which is located in
$DOCUMENTUM/dba/log/fulltext/fttrace_repository_name.log.
Note: This information is not written to the log for test queries that are issued through xPlore
administrator.

Slow queries
Slow queries can be caused by the following:
• System is not warmed up
• Result sets are large
• xPlore security has been disabled, and security is applied in Content Server. (Default is native
xPlore security.)
• Group caches are not tuned
• Query result size is too large
• FAST-compatible wildcard behavior is enabled
• Insufficient CPU, disk I/O, or memory
• Query of too many collections (multiple repositories, or collections defined within a repository
domain)
• User is very underprivileged
• The query does not use the index

System is not warmed up or caches are too small

xPlore uses caches that reduce disk I/O. Response times are higher until the caches are loaded with
data.
Suggested workaround: Increase the size of the xDB buffer cache for higher query rates. Stop all
xPlore instances. Change the value of the property xhive-cache-pages in the engine-config element of
indexserverconfig.xml and restart the xPlore instances.

Result sets are large

Webtop users have a result maximum of 350, but custom clients may consume a larger result set.
Enable query auditing. Examine the number of results in the TopNSlowestQueries report for a
specific user and day. If the number of results is more than one thousand, the custom client may
be returning all the results.
Workaround: Change the client to consume a smaller number of results by closing the result
collection early or by using the DQL hint ENABLE(RETURN_TOP_N).

xPlore security is disabled (security applied in Content Server)

Content Server security is much slower than xPlore native security, because some or many results
that are passed to the Content Server are discarded. To detect the problem, enable query auditing.
Examine the number of results in the TopNSlowestQueries report for a specific user and day. If
the number of results is more than one thousand, xPlore security may be disabled and the user is
underprivileged. (When the user is underprivileged, many results are discarded.)
Workaround: Enable xPlore native security. Refer to Documentum search results security, page 47.

Group caches are not tuned

If a user is very underprivileged, or the user is a member of many groups, queries may be slow
because the group caches are too small. For instructions on configuring the caches, refer to
Configuring the security cache, page 48.
For underprivileged users, examine the group_out_cache_fill element in the query audit record. If the
value exceeds the not-in-groups-cache-size, then the cache is too small.


For users who are members of a large number of groups, examine the group_cache_cache_fill element
in the query audit record. If the value exceeds the groups-in-cache-size, then the cache is too small.

Query result size is too large

By default, xPlore gets the top 12,000 most relevant results per collection to support a facet window of
10,000 results. Webtop applications consume only 350 results, so the extra result memory is costly
for large user environments or multiple collections (multiple repositories). In an environment
with millions of documents and multiple collections, you could see longer response times or out
of memory messages.
Workaround: Open xdb.properties, which is located in the directory WEB-INF/classes of the primary
instance. Set the value of queryResultsWindowSize to a number smaller than 12000.
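
A rough calculation shows why the default window is costly. The following sketch computes an upper bound on the number of results held for running queries and compares it with the number that Webtop clients consume. It assumes that every query fills the window in every collection; the collection count and the number of concurrent queries are hypothetical values.

```python
DEFAULT_WINDOW = 12000   # default per-collection result window stated above
WEBTOP_CONSUMED = 350    # results that a Webtop client consumes per query


def results_held(window_size, collections, concurrent_queries):
    """Upper bound on results held across all running queries."""
    return window_size * collections * concurrent_queries


def results_consumed(concurrent_queries, per_query=WEBTOP_CONSUMED):
    """Results that the clients actually read."""
    return per_query * concurrent_queries


if __name__ == "__main__":
    # Hypothetical load: 4 collections, 50 concurrent queries.
    held = results_held(DEFAULT_WINDOW, collections=4, concurrent_queries=50)
    used = results_consumed(50)
    print(f"held: {held}, consumed: {used}, ratio: {held / used:.0f}x")
```

Lowering queryResultsWindowSize reduces the first figure proportionally, at the cost of a smaller facet window.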

FAST-compatible wildcard behavior is enabled

Many Documentum clients do not enable wildcard searches for word fragments like “car” for
“careful.” The FAST indexing server supported word fragment searches for leading and trailing
wildcards in metadata and word fragment searches in SEARCH DOCUMENT CONTAINS (SDC) full-text
queries. If you enable FAST-compatible wildcard behavior for your Documentum application, you
see slower queries when the query contains a wildcard.
For information on how to change this behavior, refer to Turning on support for fragments, page 113.

Insufficient CPU, disk I/O, or memory

Determine whether the system has only one or two cores and high query rate, or the system
is large but receives complex or unselective queries. Enable query auditing and examine the
TopNSlowestQueries report for the specific user name and day. Look for high query rates with
slow queries.
Workaround: Add more capacity.

Query of too many collections (multiple repositories, or collections
defined within a repository domain)

A query will probe each index for a repository (domain) sequentially. Results are collected across
repositories. To detect the problem, enable query auditing. Try the query across repositories and
then target it to a specific repository.
Suggested workarounds: Use IDfXQuery parallel queries (refer to Documentum xPlore Development
Guide), or use the ENABLE(fds_collection collectionname) hint or the IN COLLECTION clause in
DQL (refer to Routing a query to a collection using DQL, page 108).


User is very underprivileged

If the user is very underprivileged, tens of thousands of results may be discarded by the security
filter. To detect this, enable query auditing. Find the query using the TopNSlowestQueries report
for the specific user and day. If the number in the Documents filtered out column is very large,
it is a security cache issue.
Workaround: Queries can generally be made more selective. If this is not possible, organize the
repository so that the user has access to documents in certain containers such as rooms or cases, then
append the container IDs to the user’s query.

The query does not use the index

If the multi-path index is not used to service the query, then the query will run slowly. DQL and DFC
Search service queries always use the index. Some IDfXQuery-based queries may not use it. To detect
this issue, enable query auditing. Find the query using the TopNSlowestQueries report (with user
name and day). Get the user’s query id and get the query text by using the GetQueryText report.
Obtain the query plan to determine which indexes were probed, if any. (Provide the query plan to
EMC technical support for evaluation.) Rewrite the query to use the index.
Note: The query plan is not written to the log for test queries that are issued through xPlore
administrator.

Troubleshooting XML searches


Users can search for a specific element in an XML document. By default, XML content of an input
document is not indexed. You can change this in indexserverconfig.xml. Shut down all xPlore
instances before changing this file. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
If your documents containing XML have already been indexed, they must be reindexed to include
the XML content.
Verify the following settings in indexserverconfig.xml:
• Change the value of the store attribute on the xml-content element to embed.
• Change the value of the tokenize attribute on the xml-content element to true.
• Change the value of the index-as-sub-path attribute on the xml-content element to true.
• Verify the path value attribute on the xml-content element against the DFTXML path. (For the
DFTXML DTD, refer to Appendix B, Extensible Documentum DTD.) An XPath error can cause
the query to fail.

Debugging from Webtop


If the query fails to return expected results in Webtop, perform a Ctrl-click on the Edit button in the
results page. The query is displayed in the events history as a select statement similar to the following:


IDfQueryEvent(INTERNAL, DEFAULT): [dm_notes] returned [Start processing] at
[2010-06-30 02:31:00:176 -0700]
IDfQueryEvent(INTERNAL, NATIVEQUERY): [dm_notes] returned
[SELECT text,object_name,score,summary,r_modify_date,...
SEARCH DOCUMENT CONTAINS 'ctrl-click' WHERE (...]

If there is a processing error, the stack trace is shown.

Communication error or no collection available


If an API returns a connection refused error, check the value of the URL on the instance. Make sure
that it is valid and that search is turned on for the instance. If the search service is not enabled,
dsearch.log records the following exception:
com.emc.documentum.core.fulltext.common.search.FtSearchException:...
There is no node available to process this request type.

From Documentum DFC clients, the following exception is returned:


DfException: ..."EXEC_XQUERY failed with error:
ESS_DMSearch:ExecuteSearchPassthrough. Communication Error
Could not get node information using round-robin routing.

From Documentum DQL, the following error is returned:


dmFTSearchNew failed with error:
ESS_DMSearch:ExecuteSearch. Communication Error
Could not get node information using round-robin routing.

Error because you have changed the xPlore host — If you have to change the xPlore host name,
do the following:
• Update indexserverconfig.xml with the new value of the URL attribute on the node element.
Shut down all xPlore instances before applying your changes. Validate your changes using the
validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
• Change the JBoss startup (script or service) so that it starts correctly.

Foreign language not identified


Queries issued from Documentum clients are searched in the language of the session_locale. The
search client can set this through DFC or iAPI.

Changes to configuration not seen


If you edit indexserverconfig.xml while xPlore instances are running, your changes are overwritten by
xPlore. First shut down all xPlore instances and then apply your changes to this file. Validate your changes using the validation
tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after
you change this file.


Document indexed but not searchable


Try the following troubleshooting steps:
• Make sure the indexing status is DONE. (Refer to Checking the status of a document, page 133.)
• Verify that the document was indexed to the correct collection. The collection for each document is
recorded in the TrackingDB for the domain. Substitute the document ID in the following XQuery
expression, and execute it in the xDB admin tool:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name

• Set the save-tokens option to true for the target collection (refer to Troubleshooting lemmatization,
page 67) and restart xPlore, then reindex the document. Check the tokens in the Tokens library to see
whether the search term was properly indexed.

Logging
Logging can be configured for each service in xPlore administrator. Log levels can be set for indexing,
search, and xPlore administrator.
To set logging for a service, choose System Overview in the left panel. Choose Global Configuration
and then choose the Logging Configuration tab to configure logging.
Choose a log and set the tracing level for dsearch.log:
• dsearchadmin
Logs xPlore administrator operations
• dsearchindex
Logs indexing operations
• dsearchdefault
Sets the default log level
• dsearchsearch
Logs search operations
For CPS logging configuration, refer to CPS logging, page 143.
The following log levels are available. Levels are shown in increasing severity and decreasing
amounts of information, so that TRACE displays more than DEBUG, which displays more than
INFO. FATAL logs only the most severe errors.
• TRACE
• DEBUG
• INFO
• WARN
• ERROR
• FATAL

Caution: Logging can slow the system and consume disk space. In a production environment,
the system should run with minimal logging enabled.

Logging for individual instances must be configured in indexserverconfig.xml, which is located in
dsearch_home/config. Stop all xPlore instances before modifying this file. Validate your changes
using the validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
Log files can be displayed or downloaded from the logging screen.
You can log information for specific xPlore packages or classes. Enter the class and log level in the
properties file log4j.properties in the WEB-INF/classes directory of the xPlore WAR file.

Viewing logs in xPlore administrator


You can view indexing, search, CPS and xDB logs in xPlore administrator. Choose an instance in
the tree and click Logging. Indexing and search messages are logged to dsearch. Click the tab for
dsearch, cps, cps_daemon, or xdb to view the last part of the log. Click Download All Log Files
to get links for each log file.

CPS logging
CPS does not use the xPlore logging framework. A CPS instance that is embedded in an xPlore
instance uses the log4j.properties file in WEB-INF/classes of the dsearch web application. A
standalone CPS instance uses log4j.properties in the CPS web application, in the WEB-INF/classes
directory.
If you have installed more than one CPS instance on the same host, each instance has its own web
application and log4j.properties file. To avoid one instance log overwriting another, make sure each
file appender in log4j.properties points to a unique file path.
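
To confirm that no two CPS instances write to the same log file, you can compare the File values of the appenders in each log4j.properties file. The following sketch assumes appender entries of the form log4j.appender.<appenderName>.File=<path>; the function names, instance labels, and paths are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as: log4j.appender.CPS.File=C:/temp/xPlore/logs/cps.log
_FILE_KEY = re.compile(r"^\s*log4j\.appender\.([^.=\s]+)\.File\s*=\s*(.+?)\s*$")


def appender_files(properties_text):
    """Return {appender name: file path} for every file appender in the text."""
    files = {}
    for line in properties_text.splitlines():
        match = _FILE_KEY.match(line)
        if match:
            files[match.group(1)] = match.group(2)
    return files


def find_collisions(instances):
    """instances maps an instance label to the text of its log4j.properties.

    Returns {path: [(label, appender), ...]} for paths used more than once.
    """
    owners = defaultdict(list)
    for label, text in instances.items():
        for appender, path in appender_files(text).items():
            owners[path.replace("\\", "/")].append((label, appender))
    return {path: users for path, users in owners.items() if len(users) > 1}


if __name__ == "__main__":
    configs = {
        "cps1": "log4j.appender.CPS.File=C:/temp/xPlore/logs/cps.log",
        "cps2": "log4j.appender.CPS.File=C:/temp/xPlore/logs/cps.log",
    }
    print(find_collisions(configs))
```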

Log layout formats


Two formats are supported for logs: Text and XML.

Text layout — Message format:


%r %5p [%c{1}-(%t)] %m %x %Z%n

Key:


Table 17. Text layout arguments

Argument  Description
r         Number of milliseconds elapsed from the construction of the layout until the creation of the logging event
p         Priority of the logging event (maximum length is 5 characters)
c         Category of the logging event, typically the fully qualified class name. It is filtered to log only the class name.
t         Name of the thread that generated the logging event
m         Message from the application associated with the logging event
x         Context (NDC, nested diagnostic context) associated with the thread that generated the logging event, if the code is instrumented. For internal use only.
Z         Additional name-value pair information that was passed to the logging framework
n         Platform-dependent line separator character or characters

A sample log message from the CPS log:


2009-09-10 19:05:48,773 INFO [MANAGER-CPSManager-(RMI TCP Connection(40)-127.0.0.1)]
PERFCPSTS1 request get-metrics recv’d
2009-08-25 09:24:09,101 INFO [MANAGER-CPSConfigurationFileReader-(main)]
CPS version: 1.0.3.tst

Following is a sample log message when additional name-value pair information is available or
passed:
2009-08-25 09:24:09,101 INFO [ESSContext-(main)] testing
[message = xhive db has started ]

Specify your preferred text layout as the value of
log4j.appender.<appenderName>.layout.ConversionPattern in the log4j.properties file. (This file is
located in the indexserver WAR file, in the WEB-INF/classes directory.) The default layout is XML.
The log4j configuration resembles the following. Substitute your appender name for <appenderName>,
and substitute your preferred values for file size and log location. Line breaks are shown here for
readability but do not exist in the properties file:
log4j.appender.<appenderName>=org.apache.log4j.RollingFileAppender
log4j.appender.<appenderName>.MaxFileSize=10MB
log4j.appender.<appenderName>.MaxBackupIndex=10
log4j.appender.<appenderName>.File=C:/temp/xPlore/logs/fulltext.log
log4j.appender.<appenderName>.layout=
com.emc.documentum.core.fulltext.utils.log.ESSPatternLayout
log4j.appender.<appenderName>.layout.ConversionPattern=
%r %5p [%c{1}-(%t)] %m %x %Z%n

XML layout — For XML layout, log4j generates the message into an XML file. The layout class that
generates an XML log is com.emc.documentum.core.fulltext.utils.log.ESSXmlLayout.
Note: For XML log output, log4j does not generate the parent or root XML element. Add the parent
element before you parse the file with an XML parser.
Sample message (line breaks inserted for readability):
<event timestamp="2009-01-06 18:39:41,094" level="WARN" thread="main"
logger="com.emc.documentum.core.fulltext.indexserver.core.config.impl.
xmlfile.IndexCollectionConfig" elapsedTime="1231295981094">
<message><![CDATA[[CONF_NO_DEFAULT_LIBRARY] There is no default
library found for collection, [knowledgeworker]. The first library
in the list, [library1], is assumed as default.]]></message>
</event>
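
Because the XML log has no root element, a parser rejects the file as it stands. The following sketch wraps the log text in a root element before parsing and extracts the attributes shown in the sample message. The root element name events and the function name are arbitrary choices, and the sketch assumes that the log contains no XML declaration.

```python
import xml.etree.ElementTree as ET


def parse_log_events(log_text):
    """Parse XML-layout log text and return one dictionary per event."""
    # Add the missing parent element so that the text is well-formed XML.
    root = ET.fromstring("<events>" + log_text + "</events>")
    return [
        {
            "timestamp": event.get("timestamp"),
            "level": event.get("level"),
            "logger": event.get("logger"),
            "message": (event.findtext("message") or "").strip(),
        }
        for event in root.iter("event")
    ]


if __name__ == "__main__":
    sample = (
        '<event timestamp="2009-01-06 18:39:41,094" level="WARN" thread="main" '
        'logger="com.emc.documentum.core.fulltext.indexserver.core.config.impl.'
        'xmlfile.IndexCollectionConfig" elapsedTime="1231295981094">'
        "<message><![CDATA[[CONF_NO_DEFAULT_LIBRARY] There is no default "
        "library found for collection, [knowledgeworker].]]></message>"
        "</event>"
    )
    for entry in parse_log_events(sample):
        print(entry["level"], entry["message"])
```

For very large logs, read and parse the file in sections rather than loading it into memory at once.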

Log locations
xPlore uses Apache log4j, an open source module for logging. log4j has a set of logging configuration
options based on severity level. Information for specific packages can be logged. The xPlore custom
XML log4j appender logs messages into a file when you specify the log4j RollingFileAppender or into
xDB when you specify the XHiveDBAppender.
The following configuration logs messages to a file. Line breaks are shown here for readability
but do not exist in the properties file:
log4j.appender.<appenderName>=org.apache.log4j.RollingFileAppender
log4j.appender.<appenderName>.MaxFileSize=10MB
log4j.appender.<appenderName>.MaxBackupIndex=10
log4j.appender.<appenderName>.File=C:/temp/xPlore/logs/fulltext.log
log4j.appender.<appenderName>.layout=
com.emc.documentum.core.fulltext.utils.log.ESSXmlLayout

The following configuration logs messages to an xDB log. Line breaks are shown here for readability
but do not exist in the properties file:
log4j.appender.<appenderName>=
com.emc.documentum.core.fulltext.utils.log.XHiveDBAppender
log4j.appender.<appenderName>.filename=dsearch.log
log4j.appender.<appenderName>.fallBackAppender=org.apache.log4j.RollingFileAppender
log4j.appender.<appenderName>.libraryPath=root
log4j.appender.<appenderName>.buffer=10
log4j.appender.<appenderName>.layout=
com.emc.documentum.core.fulltext.utils.log.ESSXmlLayout

The following log4j parameters for the XHiveDBAppender are configurable:

Parameter         Description
filename          Name of the log file that is created in xDB. Default: dsearch.log. (Use with XHiveDBAppender.)
fallBackAppender  Appender used to log messages that are generated before xDB is available. (Use with XHiveDBAppender.)
libraryPath       Library destination for logs. Default: root.
buffer            Number of events to hold in the buffer before flushing to xDB. Default: 10 logged events.

xDB and Lucene logging


xDB and Lucene operations are logged in xDB.log, which is located in the primary instance directory
dsearch_home/server/DCTMServer_PrimaryDsearch/logs. Logging for xDB and Lucene operations is
configured in logging.properties, which is in the following directory of the xPlore primary instance:
deploy/dsearch.war/WEB-INF/classes. You can configure the following JDK logging settings:
• Log level
Valid values: SEVERE, WARNING (default), INFO, CONFIG, FINE, FINER, FINEST
• Path to log file

Query logging
The xPlore search service logs queries. For each query, the search service logs the following
information for all log levels:
• Start of query execution including the query statement
• Total results processed
• Total query time including query execution and result fetching

Tip: More query information is logged when native xPlore security (not Content Server security)
is enabled.

Set the log level in xPlore administrator. Open Services in the tree, expand and select Logging, and
click Configuration. You can set the log level independently for administration, indexing, search,
and default. Levels in decreasing amount of verbosity: TRACE, DEBUG, INFO, WARN (default),
ERROR, and FATAL.
To further configure logging, stop all xPlore instances and edit indexserverconfig.xml in
dsearch_home/config. You can set the maximum log file size and maximum number of backups.
A single line is logged for each batch of query results returned by the xPlore server. The log message
has the following form:
<date-time><Tracing Level><Class Name><Thread ID><Query ID>[
<main query options in concise form>]<total hits><execution time in milliseconds>

The following examples from dsearch.log show a query, total results processed, and total query time:
<event timestamp="2010-06-07 21:54:26,090" ...>
<message ><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d,
query-locale=en,query-string=let $j:= for $i score $s in /dmftdoc
[. ftcontains 'strange'] order by $s
descending return
<d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier } </d>
return subsequence($j,1,200) is running]]></message></event>

<event timestamp="2010-06-07 21:54:26,090" ...>
<message ><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d,
query thread started on xhive library=DSS_LH1/dsearch/Data//default]]></message></event>

<event timestamp="2010-06-07 21:54:26,324" ...>
<message ><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d
execution time=234 Milliseconds]]></message></event>
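
To find slow queries directly in dsearch.log, you can scan for the execution time messages and list the query IDs above a threshold. The following sketch relies only on the message wording shown in the example above; the function name and the threshold are hypothetical, and other xPlore versions may word the message differently.

```python
import re

# Matches: QueryID=<id> followed by "execution time=<n> Milliseconds"
_EXECUTION = re.compile(
    r"QueryID=(?P<query_id>\S+?),?\s+execution time=(?P<ms>\d+)\s+Milliseconds"
)


def slow_queries(log_text, threshold_ms):
    """Return (query ID, milliseconds) for queries at or above the threshold."""
    return [
        (match.group("query_id"), int(match.group("ms")))
        for match in _EXECUTION.finditer(log_text)
        if int(match.group("ms")) >= threshold_ms
    ]


if __name__ == "__main__":
    excerpt = (
        "<message ><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d\n"
        "execution time=234 Milliseconds]]></message></event>"
    )
    for query_id, milliseconds in slow_queries(excerpt, threshold_ms=200):
        print(query_id, milliseconds)
```

Use the query ID with the Get query text report to retrieve the XQuery expression for a slow query.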

Tracing
You can configure tracing in xPlore administrator. Expand the instance in the left panel and select
Tracing. Enable or disable tracing in the right panel. Tracing does not require a restart.
When you enable tracing, a detailed Java method call stack is logged in one file. From that file, you
can identify the methods that are called, with parameters and return values. Refer to the Documentum
xPlore Development Guide for more information on tracing.
To trace specific classes, edit indexserverconfig.xml, which is located in dsearch_home/config. Shut
down all xPlore instances before changing this file. Validate your changes using the validation tool
described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you
change this file. You can configure the name, location, and format of the log file for the logger and
its appender in indexserverconfig.xml or in the log4j.properties file. The log4j configuration takes
precedence.



Chapter 11
Using Reports

Reports provide indexing and query statistics, and they are also a troubleshooting tool. Chapter
10, Troubleshooting and Chapter 12, Performance and Disk Space describe how to use reports for
troubleshooting.
Statistics on content processing and indexing are stored in the metrics database. Use Reports to
query these statistics. Statistics for queries are stored in an audit record. Enable query auditing
to get reports on queries. (It is disabled by default.) Choose Diagnostic and Troubleshooting,
click Audit Records and then click Enable. For more information on configuring auditing, refer to
Auditing queries, page 103.

Running reports — To run reports, choose Diagnostic and Troubleshooting and then click Reports.
To generate Documentum reports that compare a repository to the index, refer to Running the state of
the index job, page 60.
Reports are described in the following topics:
• Types of reports, page 149
• Document processing (CPS) reports, page 150
• Indexing reports, page 151
• Search reports, page 151
• Editing a report, page 152

Types of reports
Table 18, page 149 describes the reports that are available in xPlore administrator.

Table 18. List of reports

Report title                        Description

Document processing error summary   Use first to determine the most common
                                    problems. Displays error code, count, and error
                                    text.
Document processing error detail    Drill down for error codes. Report for each code
                                    displays the request ID, domain, date and time,
                                    format, and error text.
Content too large to index          Displays format, count, average size, maximum
                                    size, and minimum size.
Documents ingested per month        Displays monthly totals for current year,
                                    including document count, bytes ingested,
                                    average processing latency, and CPS error count.
Documents ingested per day          Displays daily totals for current month.
Documents ingested per hour         Displays hourly totals for current day.
Top query terms                     Displays most common query terms including
                                    number of queries and average number of hits.
Top N slowest queries               Displays the slowest queries. Select N as
                                    Number of results to display. Optionally, you
                                    can get slowest queries for a specified user.
                                    You can sort by time to first result, processing
                                    time, number of results fetched, number of hits,
                                    number filtered out by security, and most recent.
Query counts by user                For each user, displays number of queries,
                                    average response time, and maximum and
                                    minimum response times (sortable columns).
Get query text                      Get the query ID from the report Top N slowest
                                    queries. Input to this report to get the XQuery
                                    expression.

Document processing (CPS) reports


Run the Document processing error summary report to find the count for each type of problem. The
error count for each type is listed in descending order. The following types of processing errors are
reported: request and fetch timeout, invalid path, fetching errors, password protection or encryption,
file damage, unsupported format, language and parts of speech detection, or document size.
View detailed reports for each type of processing error. For example, the Document processing error
detail report for Error code 770 (File corrupt) displays object ID, domain, date, time, format, and error
text. You can then locate the document in xPlore administrator by navigating to the domain and
filtering the default collection for the object ID. Using the object ID, you can view the metadata in
Content Server to determine the document owner or other relevant properties.
Run the report Content too large to index to see how many documents are being rejected for size. If
your indexing throughput is acceptable, you can increase the size of documents being indexed. For
more information, refer to Indexing performance, page 164.


Indexing reports
To view indexing rate, run the report Documents ingested per month/day/hour. The report shows Average
processing latency. The monthly report covers the current 12 months. The daily report covers the
current month. The hourly report covers the current day. From the hourly report, you can determine
your period of highest usage. You can divide the document count into bytes processed to find out the
average size of content ingested. For example, 2,822,469 bytes for 909 documents yields an average
size of 3105 bytes. This does not include non-indexable content.
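
The average-size arithmetic above can be sketched in a few lines of Python. The function name is hypothetical; the inputs are the document count and bytes ingested that you read from the report.

```python
def average_content_size(total_bytes: int, document_count: int) -> int:
    """Return the average ingested content size in bytes, rounded to the nearest byte."""
    if document_count <= 0:
        raise ValueError("document_count must be positive")
    return round(total_bytes / document_count)


# Figures from the example in the text: 2,822,469 bytes for 909 documents.
print(average_content_size(2_822_469, 909))
```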

Search reports
Enable auditing in xPlore administrator to view query reports.
Note: Queries in xPlore administrator are audited but are not reported by the query processing
reports.

Top N slowest queries — Find the slowest queries by selecting Top N slowest queries. To determine
how many queries are unselective, sort by Number of results fetched. (Note that this is limited by
default in Webtop to 350.)
Sort Top N slowest queries by Number of hits denied access by security filter to see how many
underprivileged users are experiencing slow queries due to security filtering. For information on
changing the security cache, refer to Changing the security cache sizes, page 168.

Get query text — To examine a slow or failed query by a user, get the query ID from Top N slowest
queries and then enter the query ID into Get query text. Examine the query text for possible problems.
The following example is a query with a slow response time. The user searched in Webtop for the
string "xplore" (line breaks added here):
declare option xhive:fts-analyzer-class
'com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer';
for $i score $s in collection('/DSS_LH1/dsearch/Data')
/dmftdoc[( ( ( (dmftmetadata//a_is_hidden = 'false') ) )
and ( (dmftinternal/i_all_types = '030a0d6880000105') )
and ( (dmftversions/iscurrent = 'true') ) )
and ( (. ftcontains ( (('xplore') with stemming) ) )) ]
order by $s descending return
<dmrow>{if ($i/dmftinternal/r_object_id) then $i/dmftinternal/r_object_id
else
<r_object_id/>}{if ($i/dmftsecurity/ispublic) then $i/dmftsecurity/ispublic
else <ispublic/>}{if ($i/dmftinternal/r_object_type) then $i/dmftinternal/r_object_type
else <r_object_type/>}{if ($i/dmftmetadata/*/owner_name)
then $i/dmftmetadata/*/owner_name
else <owner_name/>}{if ($i/dmftvstamp/i_vstamp) then $i/dmftvstamp/i_vstamp
else <i_vstamp/>}{if ($i/dmftsecurity/acl_name) then $i/dmftsecurity/acl_name
else <acl_name/>}{if ($i/dmftsecurity/acl_domain) then $i/dmftsecurity/acl_domain
else <acl_domain/>}<score dmfttype='dmdouble'>{$s}</score>{xhive:highlight(
$i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow>

Use the xDB admin tool to debug the query. For instructions on using xhadmin, refer to Using the xDB
admin tool, page 36.

Query counts by user — Use Query counts by user to determine which users are experiencing the
slowest query response times.


Editing a report
You can edit any of the xPlore reports. Select a report in xPlore administrator and click Save as.
Specify a unique file name and title for the report. Alternatively, you can write a new copy of the report
and save it to dsearch_home/jboss4.3.0/server/primary_instance/deploy/dsearchadmin.war/reports.
The new report will be picked up by xPlore administrator if you click somewhere else in xPlore
administrator and then click Reports.

Accessing the audit record — The audit record is stored in the xDB database for the xPlore
federation. You can filter the audit record by date using xPlore administrator. You can copy
the entire audit record using the xDB admin tool. Open the xDB tree and drill down to
root-library/SystemData/AuditDB/primary_instance_name/auditRecords.xml. For instructions on
using xhadmin, refer to Using the xDB admin tool, page 36.

Example 11-1. Sample edited report


This example edits the Query counts by user report to add a column for number of failed queries.
1. Open the audit record and view it as XML to see the fields that can be used in the report. For this
example, we select the TOTAL_HITS field.
2. Select the report Query counts by user and click Save as. Specify a unique file name and title
for the report.
3. Edit the saved report, which we saved as My query counts by user. (Click the pencil icon next to the
report in My Recently Saved Reports.)
4. After the <column> element whose value is Query Cnt, add the following column:
<column type="integer">Failed Queries</column>

5. Create a variable for failed queries and add it after the variable definition for successful queries
(for $j ...let $k ...). We can find the nodes in a QUERY element whose TOTAL_HITS value is
equal to zero to get the failed queries.
let $z := collection('AuditDB')//event[@component = "search" and @name = "QUERY"
and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME = $j
and TOTAL_HITS = 0]

6. Create a variable for the count of failed queries and add it after the variable for successful query
count (let $queryCnt...):
let $failedCnt := count($z)

7. Return the failed query count cell, after the query count cell (<cell> { $queryCnt } ...):
<cell> { $failedCnt } </cell>

8. Redefine the failed query variable to get a count for all users. Add this line after <rowset...>let $k...:
let $z := collection('AuditDB')//event[@component = "search" and @name = "QUERY"
and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME and TOTAL_HITS = 0]

9. Add the total count cell to this second rowset, after <cell> { $queryCnt } </cell>:
<cell> { $failedCnt } </cell>

10. Save and run the report. The result is similar to the following:


Figure 12. Customized report for query count

If your query has a syntax error, you will get a stack trace that identifies the line number of the error.
You can copy the text of your report into an XML editor that displays line numbers, for debugging.
If the query runs very slowly, it will time out after about one minute. You can run the same query in
the xDB admin tool.
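
If you copy the audit record out of xDB, the same zero-hit count can be checked outside xPlore administrator. The following Python sketch is illustrative only: the root element name and the sample data are hypothetical, and only the element and attribute names used in the example above (event, component, name, USER_NAME, TOTAL_HITS) are taken from this chapter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical audit export. The <auditRecords> wrapper and the values are
# invented for illustration; real audit records contain more fields.
SAMPLE_AUDIT = """
<auditRecords>
  <event component="search" name="QUERY">
    <USER_NAME>alice</USER_NAME><TOTAL_HITS>12</TOTAL_HITS>
  </event>
  <event component="search" name="QUERY">
    <USER_NAME>alice</USER_NAME><TOTAL_HITS>0</TOTAL_HITS>
  </event>
  <event component="search" name="QUERY">
    <USER_NAME>bob</USER_NAME><TOTAL_HITS>0</TOTAL_HITS>
  </event>
  <event component="search" name="QUERY">
    <USER_NAME>bob</USER_NAME><TOTAL_HITS>0</TOTAL_HITS>
  </event>
  <event component="index" name="INDEX">
    <USER_NAME>carol</USER_NAME><TOTAL_HITS>0</TOTAL_HITS>
  </event>
</auditRecords>
"""


def zero_hit_counts(audit_xml: str) -> Counter:
    """Count search QUERY events with TOTAL_HITS equal to zero, per user."""
    counts = Counter()
    for event in ET.fromstring(audit_xml).iter("event"):
        # Keep only query events from the search component, as the report does.
        if event.get("component") != "search" or event.get("name") != "QUERY":
            continue
        user = event.findtext("USER_NAME")
        hits = event.findtext("TOTAL_HITS")
        if user is not None and hits is not None and int(hits) == 0:
            counts[user] += 1
    return counts


print(dict(zero_hit_counts(SAMPLE_AUDIT)))
```

This mirrors the filter in the edited report, so the per-user totals should agree with the Failed Queries column for the same date range.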



Chapter 12
Performance and Disk Space

The following topics describe disk space usage and performance.


• Planning for performance, page 155
• Disk space and storage type, page 158
• System sizing, page 160
• Using metrics to evaluate performance, page 161
• System tuning, page 161
• Documentum index agent performance, page 163
• Indexing performance, page 164
• Search performance, page 166
In addition, use troubleshooting topics to help you find the cause for a specific problem:
• Troubleshooting system problems, page 118
• Troubleshooting the Documentum index agent, page 120
• Troubleshooting CPS, page 126 and Document processing (CPS) reports, page 150
• Troubleshooting indexing, page 132 and Indexing reports, page 151
• Troubleshooting search, page 136 and Search reports, page 151
• Logging, page 142 and Tracing, page 147

Planning for performance


Plan your system sizing to match your performance and availability requirements. Refer to the
system planning topic in Documentum xPlore Deployment Guide and the System Sizing Guide. This
information will help you plan for the number of hosts and storage. The following diagram shows
ingestion scaling. As you increase the number of documents in your system, or the rate at which
documents are added, first add memory, disk, or CPU, then add remote CPS or more JVM memory.
You can increase the number of collections for ingestion specificity. Add xPlore instances on the same
or different hosts to handle your last scaling needs.


Figure 13. Scaling ingestion throughput

Use the rough guidelines in the following diagram to help you plan scaling of search. The order of
adding resources is the same as for ingestion scaling.


Figure 14. Scaling number of users or query complexity in search

Improving search performance with time-based collections
You can plan for time-based collections, so that only recent documents are indexed. If most of your
documents are not changed after a specific time period, you can migrate data to collections based on
creation date, modification date, or a custom date attribute. (Route using a custom routing class or
index agent configuration.) You must also route queries to the appropriate collection by customizing
the DFC query builder. Refer to Filtering content and locations, page 55.
To determine whether a high percentage of your documents are not touched after a specific time
period, use two DQL queries to compare results:
1. Use the following DQL query to determine the number of documents that were last modified and
last accessed less than two years after they were created (change the DQL to meet your requirements):
select count(*) from dm_sysobject where
datediff(year,r_creation_date,r_access_date)<2 and
datediff(year,r_creation_date,r_modify_date)<2

2. Use the following DQL query to determine the number of documents in the repository:
select count(*) from dm_sysobject

3. Divide the result of step 1 by the result of step 2. If the number is high, for example, .8 (80%),
most documents are not modified or accessed more than two years after they were created.
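
The division in step 3 can be sketched as follows. This is a minimal Python illustration; the function name and the sample counts are hypothetical, and the two arguments are the results you obtain from the DQL queries in steps 1 and 2.

```python
def matching_fraction(step1_count: int, step2_count: int) -> float:
    """Return the share of repository documents matched by the step 1 query."""
    if step2_count <= 0:
        raise ValueError("step2_count must be positive")
    if not 0 <= step1_count <= step2_count:
        raise ValueError("step1_count must be between 0 and step2_count")
    return step1_count / step2_count


# Hypothetical counts: 800 documents matched out of 1000 in the repository.
fraction = matching_fraction(800, 1000)
print(f"{fraction:.0%} of documents matched the step 1 query")
```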

Disk space and storage type


Disk space allocation and storage type have a large impact on performance.

Planning for disk space


xPlore requires disk space for the following components. The first two require most of the xPlore
space.

Table 19. How xPlore uses disk space

Component               Space use                 Indexing                  Search

xDB                     DFTXML                    Next free space           Random access
                        representation of         consumed by disk          retrieval of particular
                        document content and      blocks for batches of     elements and
                        metadata, metrics,        XML files.                summary.
                        audit, and Document
                        ACLs and groups.
Lucene                  Stores an index of        Information is updated    Inverted index lookup,
                        content and metadata.     through inserts and       facet and security
                                                  merges.                   lookup.
xDB transaction (redo)  Stores transaction        Updates areas in xDB      Sometimes provides
log                     information.              from log.                 snapshot during
                                                                            retrieval.
Lucene temporary        Used for Lucene           Uncommitted data is       None
working area            updates of                stored to the log.
                        non-transactional data.

Estimating index size (Documentum environments) — The average size of indexable content
within a document varies from one document type to another and from one enterprise to another.
You must calculate the average size for your environment. The easiest estimate is to use the disk
space that was required for a Documentum indexing server with FAST. If you have not installed a
Documentum indexing server, you can use the following procedure to estimate index size.
1. Perform a query to find the average size of documents, grouped by a_content_type, for example:
select avg(r_full_content_size),a_content_type from dm_sysobject group by
a_content_type order by 1 desc


2. Perform a query to return 1000 documents in each format. Specify the average size range, that
is, r_full_content_size greater than (average less some value) and less than (average plus some
value). Make the plus/minus value a small percentage of the average size. For example:
select r_object_id,r_full_content_size from dm_sysobject
where r_full_content_size >(1792855 -1000) and
r_full_content_size <(1792855 +1000) and
a_content_type = 'zip' enable (return_top 1000)

3. Export these documents and index them into a new, clean xPlore installation.
4. Determine the size on disk of the dbfile and lucene-index directories in dsearch_home/data.
5. Extrapolate to your production size.
For example, you have ten indexable formats with an average size of 270 KB from a repository
containing 50000 documents. The Content Server footprint is approximately 12 GB. You get a sample
of 1000 documents of each format in the range of 190 to 210 KB. After export and indexing, these
10000 documents have an indexed footprint of 286 MB. Your representative sample was 20% of the
indexable content, so your calculated index footprint is 5 x sample_footprint=1.43 GB (dbfile 873
MB, lucene-index 593 MB).
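
The extrapolation in step 5 can be sketched in Python. This is a rough illustration that assumes the index footprint grows linearly with the number of documents, which holds only if the sample is representative; the function name is hypothetical.

```python
def extrapolate_index_footprint_mb(sample_footprint_mb: float,
                                   sample_docs: int,
                                   total_docs: int) -> float:
    """Scale the index size measured on a sample up to the whole repository."""
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    if total_docs < sample_docs:
        raise ValueError("total_docs must be at least sample_docs")
    # Linear scaling: a sample of 20% of the documents gives a factor of 5.
    return sample_footprint_mb * (total_docs / sample_docs)


# Figures from the example in the text: 10000 sampled documents, 286 MB on disk,
# 50000 documents in the repository.
print(extrapolate_index_footprint_mb(286, 10_000, 50_000), "MB")
```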

Disk space vs. indexing rebuild performance — If you save indexing tokens for faster index
rebuilding, they consume disk space. By default they are not saved. Edit indexserverconfig.xml and
set domain.collection.properties.property "save-tokens" to true for a collection.

Tuning xDB properties for disk space — You can set the following property in xdb.properties,
which is located in the directory WEB-INF/classes of the primary instance. If this property is not
listed, you can add it.
• TEMP_PATH
Temporary path for Lucene index. If not specified, the current system property java.io.tmpdir is
used.

Managing index disk space — To conserve disk space on the primary host, purge the status
database when the xPlore primary instance starts up. By default, the status DB is not purged. Refer to
Managing the status database, page 38.
If you have specified save-tokens for summary processing, edit indexserverconfig.xml to limit the
size of tokens that are saved. Set the maximum size of the element content in bytes as the value
of the attribute extract-text-size-less-than. Tokens will not be saved for larger documents. Set the
maximum size of tokens for the document as the value of the attribute token-size. For details on
extraction settings, refer to Table 7, page 76.
Insufficient disk space, page 118 describes specific troubleshooting for unexpected disk space
problems.

Storage types and locations


Choosing a storage type — Table 20, page 160 shows performance notes for various disk storage
types.


Table 20. Comparison of storage types performance

Function SAN NAS local disk iSCSI CFS


Used for Common Common Common Rare Rare
Content (content)
Server
Network Fiber Ethernet Local Ethernet Fiber
Performance Best Slower Good until I/O Slower Almost as fast
than SAN, limit reached than SAN, as SAN
improved improved
with 10GE with 10GE
High Requires Provides Requires Requires Provides
availability cluster shared drives complete dual cluster shared drives
technology for server system technology for server
takeover takeover
xPlore multi- Requires Drives already Requires Requires Drives already
instance network shared network network shared
shared drives shared drives shared drives

Managing storage locations — The data store locations for xDB libraries are configurable. The xDB
data stores and indexes can reside on a separate data store, SAN or NAS. Configure the storage
location for a collection in xPlore administrator. You can also add new storage locations through
xPlore administrator.

System sizing
You can plan system sizing for CPS processing, ingestion, and search.

Adding CPS instances — CPS processing of documents is typically the bottleneck in ingestion. CPS
also processes queries. You can add CPS instances either on the same host as the primary instance
or on additional hosts (vertical and horizontal scaling, respectively). A remote CPS instance does
not perform as well as a CPS instance on an indexing instance. The remote instance adds overhead
for the xPlore system.
To add CPS instances, run the xPlore configuration script and choose Create Content Processing
Service Only.

Sizing for search performance — You can size several components of an xPlore system for search
performance requirements:
• CPU capacity
• Memory for query caches
Using xPlore administrator, change the value of query-result-cache-size in search service
configuration and restart the search service.


Sizing for ingestion performance — You can size several components of an xPlore system for
performance requirements:
• CPU capacity
• I/O capacity (the number of disks that can write data simultaneously)
• Memory for temporary indexing usage

Sizing migration from FAST — When you compare sizing of the FAST indexing system to xPlore,
use the following guidelines:
• Size with the same allocations used for FAST, unless the FAST installation was very undersized or
you expect usage to change.
• Use VMWare-based deployments, which were not supported for FAST.
• Include sizing for changes to existing documents:
— A modification to a document requires the same CPU for processing as a new document.
— A versioned document requires the same (additional) space as the original version.
• Size for high availability and disaster recovery requirements.

Using metrics to evaluate performance


The following metrics are available in xPlore administrator to help identify specific performance
problems. Select an xPlore instance and then choose Indexing Service or Search Service to see the
metric.

Table 21. Indexing metrics mapped to performance problems

Metric Service Problem

Ingestion throughput Indexing Service Slow document indexing


throughputs (documents per
second)
Total number of documents Indexing Service Indexing runs out of disk space
indexed (or bytes)
Search response time Search Service Query timeouts or slow query
response

System tuning
Some system tuning requires editing of indexserverconfig.xml. (Refer to Modifying
indexserverconfig.xml, page 36.)

Excluding xPlore files from virus scanners — Performance of both indexing and search can be
degraded during virus scanning. Exclude xPlore directories, especially the dsearch_home/data
directory.


Tuning memory pools — xPlore uses four memory caches. The last three are part of the xPlore
instance memory and have a fixed size:
• OS buffer cache
Holds temporary files, xDB data, and Lucene index structures. Has largest impact on Lucene
index performance.
• xDB buffer cache
Stores XML file blocks for ingestion and query. Increase for higher query rates: Change the value
of the property xhive-cache-pages in the engine-config element of indexserverconfig.xml. Back up
the xPlore federation after you change this file.
• Lucene working memory
Used to process queries. Lucene working memory is consumed from the host JVM process.
Increasing the JVM memory may not affect performance.
• xPlore caches
Temporary cache to buffer results. Using xPlore administrator, change the value of
query-result-cache-size in search service configuration and restart the search service.
VMWare deployments require more instances than physical deployments. For example, VMWare is
limited to eight cores.

64-bit vs. 32-bit — 64-bit operating systems have advantages and disadvantages in an xPlore
installation:
• Advantages
— More memory is used to cache index structures for faster query access.
— More memory is available to index large documents.
— 64-bit supports higher ingestion and query rates.
• Disadvantages
— Per-object memory space is higher. If memory is low, a 32-bit VM will perform better.
— The size of the 64-bit VM is limited by garbage collection activity.

Sizing the disk I/O subsystem — xPlore supports local disk, SAN, and NAS storage. These storage
options do not have equal performance. For example, NAS devices send more data and packets
between the host and subsystem. Jumbo frame support is helpful as is higher bandwidth.

Compression — Indexes can be compressed to enhance performance. Compression uses more I/O
memory. The compress element in indexserverconfig.xml specifies which elements in the ingested
document have content compression to save storage space. Compressed content is about 30% of
submitted XML content. Compression may slow the ingestion rate by 10-20% when I/O capacity is
constrained. Refer to Modifying indexes, page 76.
If ingestion starts fast and gets progressively slower, set compression to false for subpath indexes in
indexserverconfig.xml.


Documentum index agent performance


The following topics describe tunable index agent settings and performance evaluation. Refer to
Troubleshooting the Documentum index agent, page 120 for specific troubleshooting topics.

Index agent settings


The following parameters in indexagent.xml can affect index agent performance. This file is located
in the WEB-INF/classes/ directory of the index agent WAR file. Do not change these values unless you
are directed to change them by EMC technical support.
• exporter.thread_count
Number of threads that extract metadata into DFTXML using DFC
• connectors.file_connector.batch_size
Number of items picked up for indexing when the index agent queries the repository for queue
items.
• exporter.queue_size
Internal queue of objects submitted for indexing
• indexer.queue_size
Queue of objects submitted for indexing
• indexer.callback_queue_size
Size of queue to hold requests sent to xPlore for indexing. When the queue reaches
this size, the index agent will wait until the callback queue has reached 100% less the
callback_queue_low_percent.
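
The callback queue behavior described in the last item can be illustrated with a short Python sketch. It assumes that "100% less the callback_queue_low_percent" means the queue must drain to that percentage of its configured size before the index agent resumes. The function name and the sample values are hypothetical and are not taken from indexagent.xml defaults.

```python
def callback_queue_resume_level(callback_queue_size: int,
                                callback_queue_low_percent: int) -> int:
    """Return the queue length at which submission resumes, under the stated assumption."""
    if callback_queue_size <= 0:
        raise ValueError("callback_queue_size must be positive")
    if not 0 <= callback_queue_low_percent <= 100:
        raise ValueError("callback_queue_low_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding for percentage values.
    return callback_queue_size * (100 - callback_queue_low_percent) // 100


# Hypothetical settings: a queue of 200 requests with a low percent of 10.
print(callback_queue_resume_level(200, 10))
```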

Measuring index agent performance


Verify index agent performance using the index agent UI details page. Find the details for Indexed
content KB/sec and Indexed documents/sec. All Averages measures the average time between index
agent startup and current run time. All Averages up to Last Activity measures the time between index
agent startup and last indexing activity.
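
The difference between the two averages can be sketched as follows. This Python example is an illustration of the definitions above, not the index agent's own calculation; the function name and timestamps are hypothetical.

```python
from datetime import datetime


def indexing_rates(document_count: int,
                   startup: datetime,
                   last_activity: datetime,
                   now: datetime) -> tuple[float, float]:
    """Return (documents/sec since startup, documents/sec up to last activity)."""
    since_startup = (now - startup).total_seconds()
    until_last_activity = (last_activity - startup).total_seconds()
    if since_startup <= 0 or until_last_activity <= 0:
        raise ValueError("startup must precede last_activity and now")
    return document_count / since_startup, document_count / until_last_activity


# Hypothetical run: 3600 documents, indexing stopped 10 minutes after startup,
# and the details page is read one hour after startup.
all_avg, active_avg = indexing_rates(
    3600,
    datetime(2010, 1, 1, 0, 0, 0),
    datetime(2010, 1, 1, 0, 10, 0),
    datetime(2010, 1, 1, 1, 0, 0),
)
print(all_avg, active_avg)
```

When the index agent has been idle for a long time, the average up to last activity is the more useful measure of indexing throughput.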

Adding index agent instances


You can add index agent instances to speed up indexing. Each index agent temporary storage area,
which you specify when you configure the index agent, must be accessible to all CPS instances. Refer
to Setting up index agents for ACLs and groups, page 54.


Indexing performance
Various factors affect the rate of indexing. You can tune some indexing and xDB parameters and adjust
allowable document size. For specific indexing issues, refer to Troubleshooting indexing, page 132.

Factors in indexing rate


The indexing rate is affected by the following major factors:
• The complexity of documents
For example, a simple text document containing thousands of words can take longer to index
than a much larger Microsoft Word document full of pictures. MS Excel files take much longer
to index due to their complex cell structure.
• The indexing server I/O subsystem capabilities
• The number of CPS instances
For heavy ingestion loads or high availability requirements, add CPS instances to increase content
processing bandwidth.
• The number of collections
Create multiple collections spread over multiple xPlore instances to scale xPlore. Documents can
be indexed into specific target collections. For best search performance, queries should also be
routed to specific collections. Refer to Documentum xPlore Development Guide for information on
custom indexing and query routing.
• Recovery during heavy ingestion
If the system crashes during a period of heavy ingestion, transactional recovery could take a long
time as it replays the log. The recovery process is single-threaded. This bottleneck can be avoided
by frequent incremental backups, which shorten the restore period. Alternatively, you can set up
an active/active high availability system so that failure in a single system does not disrupt business.
• Processor version
A 64-bit processor supports more domains, more collections, more users, and higher ingestion
rates than a 32-bit processor.

Tunable Indexing properties


The number of threads, batch size, TrackDB cache size, thread wait time, and queue size at each
stage of indexing impact ingestion performance. The biggest impact on ingestion rate is with
threadpool size and processing buffer size. You can configure CPS and indexing settings using xPlore
administrator. For a list of these properties, refer to Document processing and indexing service
settings, page 175. See also Content processing instance settings, page 173.
To scale up for large ingestion requirements or for high availability, add more CPS instances.


Document size and performance


Two configuration properties affect the size of documents that are indexed and consequently the
ingestion performance:
• The index agent (Documentum only) limits the size of the documents submitted for indexing.
This limit is changed in indexagent.xml, in the WEB-INF/classes/ directory of the index agent
WAR file. You can change the contentSizeLimit parameter to a different value (in bytes). Stop the
index agent instance to change the size limit.
<parameter>
<parameter_name>contentSizeLimit</parameter_name>
<parameter_value>20000000</parameter_value>
</parameter>

• CPS limits the size of text that is indexed. A document can have a much greater size
(contentSizeLimit) compared to the indexable text within the document. You can change the value
of Max Text Threshold in the xPlore Administrator CPS configuration screen. Units are bytes and
the range is 5-40 MB. Default: 10 MB.
You can configure multiple CPS instances so that a single CPS is not overwhelmed with load.
Documents will be submitted to CPS for processing in round-robin order.
Note: Increasing the maximum text size can negatively impact CPS memory consumption under
heavy load. In this case, the entire batch of submitted documents will fail.
For additional factors that impact disk space, refer to Insufficient disk space, page 118.
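
The two limits act in sequence, which the following Python sketch illustrates. It is a simplified model under stated assumptions: the default for content_size_limit is the value shown in the indexagent.xml fragment above, the default for max_text_threshold assumes that 10 MB means 10 x 1024 x 1024 bytes, and the function name and return labels are hypothetical.

```python
def check_size_limits(content_bytes: int,
                      text_bytes: int,
                      content_size_limit: int = 20_000_000,
                      max_text_threshold: int = 10 * 1024 * 1024) -> str:
    """Report which limit, if any, a document exceeds."""
    # The index agent checks the size of the whole content file first.
    if content_bytes > content_size_limit:
        return "exceeds index agent limit"
    # CPS then checks the size of the indexable text within the document.
    if text_bytes > max_text_threshold:
        return "exceeds CPS text threshold"
    return "within limits"


# Hypothetical documents: a 25 MB file, and a 5 MB file with 1 MB of text.
print(check_size_limits(25_000_000, 1_000))
print(check_size_limits(5_000_000, 1_000_000))
```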

Tunable xDB properties


Most applications do not need to modify xDB properties. The bottleneck for indexing is usually the
process of writing index files to disk, which you can address by increasing I/O capabilities. With the
guidance of Documentum technical support, you can set the following properties in xdb.properties,
which is located in the directory WEB-INF/classes of the primary instance. If these properties are
not listed, you can add them.
• cleaningInterval
Interval in seconds between LRU-based cache cleanup. Default: 120.
• cleanMergeInterval
Interval in seconds before a non-final merge into a fresh, new index. After committing, the index
and black list (change log) are dirty because part of them can be in the application cache or OS
system cache. During the xDB checkpoint process, the data can be flushed to disk and they are
clean. Default: 300.
• dirtyMergeInterval
Interval in seconds before a non-final, dirty merge. Default: 30.
• ramBufferSizeMB
Size in megabytes of the RAM buffer for document additions, updates, and deletions. For faster
indexing, use as large a RAM buffer as possible for the host. Default: 3.
• maxRamDirectorySize
Maximum RAM in bytes to be used for in-memory Lucene index. Higher values use more
memory and support faster indexing. Default: 3000000.
• mergeFactor
Number of index entries to keep in memory before storing to disk and how often segments are
merged. For example, a factor of 10 creates a new segment for every 10 XML documents added to
the index, and when the tenth segment has been added, the segments are merged. A high value
improves batch indexing and optimized search performance and uses more RAM. A low value
uses less memory and causes the index to be updated more often, slowing down indexing, but
searches on unoptimized indexes are faster. Default: 10.
Note: High values can cause a “too many open files” exception. You can increase the maximum
number of open files allowed on a UNIX or Linux host by increasing the ulimit setting.
• maxMergeDoc
Sets the maximum size of a segment that can be merged with other segments. Low values are
better for interactive indexing because this limits the length of merging pauses during indexing.
High values are better for batch indexing and faster searches. If RAM buffer size is exceeded
before max merge doc, then flush is triggered. Default: 1000000
• nonFinalMaxMergeSize
Maximum size of internal Lucene index that is eligible for merging, in bytes. Non-final merge is
executed frequently to reduce the number of file descriptors, memory consumption and sub-index
creation. Default: 300000000
• finalMergingInterval
Interval after which final sub-indexes are merged, usually once a day. Units are hours in 24–hour
time, minutes, and seconds. Default: midnight (24*60*60).
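
The mergeFactor description above can be illustrated with a toy simulation. This Python sketch models only the rule stated in the text (a new segment for every mergeFactor documents, and a merge when mergeFactor segments of the same size exist). It is a simplified model, not the actual Lucene or xDB merge policy, and the function name is hypothetical.

```python
def simulate_segments(document_count: int, merge_factor: int = 10) -> list[int]:
    """Return the size, in documents, of each segment left after indexing."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments: list[int] = []  # segment sizes, oldest first
    buffered = 0              # documents not yet written to a segment
    for _ in range(document_count):
        buffered += 1
        if buffered < merge_factor:
            continue
        # A new segment is created for every merge_factor documents.
        segments.append(buffered)
        buffered = 0
        # When the newest merge_factor segments have the same size, merge them.
        # A merge can trigger a further merge at the next size level.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
    return segments


print(simulate_segments(250, 10))
```

In this model a larger mergeFactor leaves more segments on disk between merges, which is consistent with the note about open file limits.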

Search performance
To measure query performance, you must enable auditing. You can also turn on tracing information
for query execution. Select an instance, choose Tracing, and then choose Enable.
For slow queries, refer to Slow queries, page 137.
Examine the query load to see if the system is overloaded. Run the report Top N slowest queries.
Examine the Start time column to see whether slow queries occur at a certain period during the day or
certain days of the month.
Save the query execution plan to find out whether you need an additional index on a metadata
element. (For more information on the query plan, refer to Getting the query plan, page 137.)
Documentum clients can save the plan with the following iAPI command:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine


Factors in query performance


The following features of full-text search can affect search performance:
• Single-box search
The default operator for multiple terms is AND. This can be configured to OR (the old Webtop
default), but performance can be much slower.
• Flexible metadata search (FTDQL)
Searches on multiple object attributes can affect performance, especially if the first term is
unselective.
• Leading or trailing wildcards
By default, xPlore does not match parts of words. For example, WHERE object_name LIKE 'foo%'
matches foo bar but not football. Support for fragment matches (leading and trailing wildcards)
can be enabled, but this impacts performance. A more limited support for leading wildcards in
metadata search can also be enabled.
• Security
Native xPlore security performs faster than security applied to results in the Content Server. The
latter option can be enabled, but this impacts performance.
• Number of documents
Documents can be routed to specific collections based on age or other criteria. When queries are
routed to a collection, performance is much better. Scaling to more instances on the same or
multiple hosts, as well as use of 64-bit hosts, can also improve search performance.
• Size of query result set
Results should be consumed in a paged display for good performance. Webtop limits results to
350, with a smaller page size (from 10 to 100). The first page of results loads while the remainder
of the 350 results are fetched. CenterStage limits results to 150. Paging is especially important
to limit result sets for underprivileged users.
• Number of collections
If queries are not run in parallel mode (across several collections at once), response time rises as the
number of collections rises. Queries can be targeted to specific collections to avoid this problem. If
you do not use targeted queries, try to limit the number of collections in your xPlore federation.
For information on parallel or targeted queries, refer to Documentum xPlore Development Guide.
• Caches empty on system startup
At startup, the query and security caches have not been filled, so response times are slower. Make
sure you have allocated sufficient memory for the file system buffer cache and good response
time from the I/O subsystem.
• Response times slower during heavy ingestion
This is usually an issue only during migration from FAST. If your environment has large batch
migrations once a month or quarterly, you can set the target collection or domain to index-only
during ingestion. Alternatively, you can schedule ingestion during an off-peak time.


Changing the security cache sizes


Monitor the query audit record to determine security performance. The value of
<TOTAL_INPUT_HITS_TO_FILTER> records how many hits a query had before security filtering.
The value of <HITS_FILTERED_OUT> shows how many hits were discarded because the user did
not have permissions for the results. The hits filtered out divided by the total number of hits is the
hit ratio. A low hit ratio indicates an underprivileged user, who often has slower query response
times than other users.
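
The ratio can be computed directly from the two audit values. The following Python sketch applies the definition exactly as stated in the paragraph above (hits filtered out divided by total hits). The function name is hypothetical, and reading the values out of the audit record is left to the caller.

```python
def hit_ratio(total_input_hits_to_filter: int, hits_filtered_out: int) -> float:
    """Ratio of HITS_FILTERED_OUT to TOTAL_INPUT_HITS_TO_FILTER for one query.

    Assumption: both values were read from the same query audit record.
    Returns 0.0 when the query produced no hits before security filtering.
    """
    if total_input_hits_to_filter < 0 or hits_filtered_out < 0:
        raise ValueError("hit counts cannot be negative")
    if hits_filtered_out > total_input_hits_to_filter:
        raise ValueError("cannot filter out more hits than the query returned")
    if total_input_hits_to_filter == 0:
        return 0.0
    return hits_filtered_out / total_input_hits_to_filter


if __name__ == "__main__":
    print(hit_ratio(1000, 250))
```

Tracking this value per user over time, rather than for a single query, is more likely to show whether the security caches need tuning.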
There are two caches that affect security performance: Groups that a user belongs to, and groups
that a user does not belong to. Cache sizes are configured in indexserverconfig.xml. The audit
record reports how many times these caches were hit for a query (GROUP_IN_CACHE_HIT,
GROUP_OUT_CACHE_HIT) and how many times the query added a group to the cache
(GROUP_IN_CACHE_FILL, GROUP_OUT_CACHE_FILL). For information on how to change these
configuration settings, refer to To change security cache sizes, page 48.
For underprivileged users, increase the not-in-groups cache size to reduce the number of times
this cache must be checked. For highly-privileged users (members of many groups), increase the
groups-in-cache size to reduce the number of times this cache must be checked.
If you have a large number of ACLs, increase the value of acl-cache-size (number of permission
sets in the cache).

Increasing query batch size


In a Documentum client application based on DFC, you can set the query batch size. Edit
dfc.properties on the search client to increase the value of dfc.batch_hint_size. Default: 50. Suggested
size: 350.

Tuning xDB properties for search


You can set the following properties in xdb.properties, which is located in the directory
WEB-INF/classes of the primary instance. If these properties are not listed, you can add them. Some
of these properties affect indexing performance as well as search performance.
• mergeFactor
Number of index entries to keep in memory before storing to disk and how often segments are
merged. For example, a factor of 10 creates a new segment for every 10 XML nodes added to the
index, and when the tenth segment has been added, the segments are merged. A high value
improves batch indexing and optimized search performance and uses more RAM. A low value
uses less memory and causes the index to be updated more often, slowing down indexing, but
searches on unoptimized indexes are faster. Default: 10.
Note: High values can cause a “too many open files” exception. You can increase the maximum
number of open files allowed on a UNIX or Linux host by raising the open-files limit (ulimit -n).
• queryResultsWindowSize
Result window for a single query. If the total result number is larger than the window, the
window size will be expanded twice for the next collecting round. Lower values can trigger
re-collection operations and increase query response time. Higher values can consume more
memory, especially for unselective queries. Default: 12000
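
To see how the window size trades re-collection rounds against memory, the following Python sketch counts collecting rounds for a given result total. It assumes that "expanded twice" means the window doubles on each new round; that reading, and the function name, are assumptions for illustration rather than documented xDB behavior.

```python
def collecting_rounds(total_results: int, window_size: int = 12000) -> int:
    """Count collecting rounds until the window covers all results.

    Assumption: the window doubles whenever the total number of results
    is larger than the current window.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    rounds = 1
    while total_results > window_size:
        window_size *= 2   # assumed expansion rule: double for the next round
        rounds += 1
    return rounds


if __name__ == "__main__":
    for total in (10000, 50000, 500000):
        print(total, collecting_rounds(total))
```

With the default of 12000, a query returning 50000 results would need four rounds under this assumption, while a query returning 10000 results needs only one.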



Appendix A
Configuration settings for CPS,
Indexing, and Search

The settings for CPS, indexing, and search services are described in the following topics:
• Documentum index agent parameters, page 171
• Content processing instance settings, page 173
• Document processing and indexing service settings, page 175
• Search service settings, page 177

Documentum index agent parameters


The index agent configuration file indexagent.xml is located in dsearch_home/jboss4.3.0/server/
DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. Most of these parameters are set
at optimal settings for all environments.
Table 22, page 171 describes the index agent parameters that you can configure within the parent
element indexer_plugin_config/generic_indexer.parameter_list.

Table 22. Indexagent configuration parameters in generic_indexer.parameter_list

Parameter Description
acl_exclusion_list Add this parameter to exclude specific
ACL attributes from indexing. Contains an
acl_attributes_exclude_list element. Check with
technical support before you add this list.
acl_attributes_exclude_list Specifies a space delimited list of ACL attributes
that will not be indexed.
dsearch_qrserver_host Fully qualified host name or IP address of host
for xPlore server
dsearch_qrserver_port Port used by xPlore server. Default is 9200
dsearch_domain Repository name
group_exclusion_list Add this parameter to exclude specific
group attributes from indexing. Contains a
group_attributes_exclude_list element. Check
with technical support before you add this list.
group_attributes_exclude_list Specifies a space-delimited list of group
attributes that will not be indexed.
index_type_mode Object types to be indexed. Values: both
(default) | aclgroup | sysobject. If you use two
index agents, each can index either ACLs or
sysobjects.
max_requests_in_batch Maximum number of objects to be indexed in a
batch. Default: 5
max_batch_wait_msec Maximum wait time in milliseconds for a
batch to reach the max_requests_in_batch
size. When this timeout is reached, the batch
is submitted to xPlore. The default setting
(1) is for high indexing throughput. If your
index agent has a low ingestion rate of
documents and you want to have low latency,
reduce both max_requests_in_batch and
max_batch_wait_msec.
max_pending_requests Maximum number of indexing requests in the
queue. Default: 10000
max_tries Maximum number of tries to add the request
to the internal queue when the queue is full.
Default: 2
group_attributes_exclude_list Attributes of a group to exclude from indexing

Table 23, page 172 describes general index agent runtime settings. Requests for indexing pass from
the exporter queue to the indexer queue to the callback queue.

Table 23. Index agent runtime configuration in indexer_plugin_config.indexer

Parameter Description
queue_size Size of queue for indexing requests. When the
queue reaches this limit, the index agent will
wait for the queue to be lower than queue_size
less (queue_size * queue_low_percent).
For example, if the queue_size is 500 and
queue_low_percent is 10%, then the agent will
resume indexing when the queue is lower than
500 - (500 * .1) = 450.
queue_low_percent Percent of queue size at which the index agent
will resume processing the queue.
callback_queue_size Size of queue to hold requests sent to xPlore
for indexing. When the queue reaches this
size, the index agent will wait until the
callback queue has reached 100% less the
callback_queue_low_percent.
callback_queue_low_percent Percent of callback queue size at which the index
agent will resume sending requests to xPlore.
wait_time Time in seconds that the indexing thread waits
before reading the next item in the indexing
queue.
thread_count Number of threads to be used by index agent.
shutdown_timeout Time the index agent should wait for thread
termination and cleanup before shutdown
runaway_timeout Timeout for runaway query.
partition_config You can add this element and its contents,
described below, if you want to map partitions
to specific collections. Refer to Mapping Content
Server storage areas to collections, page 60 for
more information.
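
The resume point for queue_size and queue_low_percent in Table 23 is simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes queue_low_percent is given as a percentage number (10 for 10%).

```python
def resume_threshold(queue_size: int, queue_low_percent: float) -> float:
    """Queue length below which the index agent resumes indexing.

    Implements queue_size - (queue_size * queue_low_percent / 100).
    Assumption: queue_low_percent is a percentage in the range 0-100.
    """
    if not 0 <= queue_low_percent <= 100:
        raise ValueError("queue_low_percent must be between 0 and 100")
    return queue_size * (100 - queue_low_percent) / 100


def should_resume(current_length: int, queue_size: int, queue_low_percent: float) -> bool:
    """True when the queue has drained below the resume threshold."""
    return current_length < resume_threshold(queue_size, queue_low_percent)


if __name__ == "__main__":
    print(resume_threshold(500, 10))
```

For the example in the table, a queue_size of 500 and a queue_low_percent of 10 give a threshold of 450, so the agent would resume once fewer than 450 requests remain queued.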

Miscellaneous index agent parameters:

Table 24. Other index agent parameters

Parameter Description
contentSizeLimit In exporter.parameter_list. Sets the maximum
size for documents to be sent for indexing. The
value is in bytes. Default: 20MB.

Content processing instance settings


You can configure the following CPS settings for each instance in xPlore administrator. The default
values have been optimized for most environments.
• Connection pool size: Maximum number of concurrent connections. Valid values: 1-100. Default:
4.
Increasing the number of connections will consume more memory. Decreasing may slow
ingestion.
• Port number: Listener port for CPS daemon, used by the CPS manager. Default: 64321.
This value is set during xPlore configuration.
• Daemon path: Specifies the path to the installed CPS daemon (read-only).
This value is set during xPlore configuration.
• Keep intermediate temp file: Keep content in a temporary CPS folder for debugging.
Enabling temp file has a large impact on performance. Disable (default) to remove temporary files
after the specified time in seconds. Time range in seconds: 1-604800 (1 week).
• Restart threshold: Check After processed... and specify the number of requests after which
to restart the CPS daemon.
Disable if you do not want the daemon restarted. Decreasing the number may impact performance.
• Heartbeat: Interval in seconds between the CPS manager and daemon.
Range: 1-600. Default: 60.
• Embedded return: Check Yes (default) to return embedded results to the buffer. Check No to return
results to a file, and specify the file path for export.
Embedded return increases communication time and, consequently, impacts ingestion.
• Export file path: Valid URI at which to store CPS processing results, for example, file:///c:/.
If the results are larger than Result buffer threshold, they are saved in this path. This setting does
not apply to remote CPS instances, because the processing results are always embedded in the
return to xPlore.
• Result buffer size threshold: Number of bytes at which the result buffer returns results to file.
Valid values: 8 - 16MB. Default: 1MB (1048576 bytes). A larger value can speed processing but
can cause more instability.
• Processing buffer size threshold: Specifies the number of bytes of the internal memory chunk
used to process small documents.
If this threshold is exceeded, a temporary file is created for processing. Valid values: 100KB-10MB.
Default: 2MB (2097152 bytes). Increase the value to speed processing. Consumes more memory.
• Load file to memory: Check to load the submitted file into memory for processing. Uncheck to
pass the file to a plug-in analyzer for processing (for example, the Documentum index agent).
• Batch in batch count: Average number of batch requests in a batch request.
Range: 1-100. Default: 5. CPS assigns the number of Connection pool threads for each
batch_in_batch count. For example, a batch_in_batch of 5 and a connection_pool_size of 5
result in 25 threads.
• Thread pool size: Number of threads used to process a single incoming request such as text
extraction and linguistic processing.
Range: 1-100. Default: 10. A larger size can speed ingestion when the CPU is not under heavy load
but can cause instability under heavy CPU load.
• System language: ISO 639-1 language code that specifies the language for CPS.
Refer to Appendix E, Indexable Languages for codes.
• Max text threshold: Sets the size limit, in bytes, for documents to be tokenized.
Range: 5-40 MB expressed in bytes. Default: 20 MB. Larger values can slow ingestion rate and
cause more instability.
Note: This threshold is applied to the size of the document including expanded attachments. For
example, if an email has a zip attachment, the zip file is expanded to evaluate document size. If
you increase this threshold, ingestion performance may degrade under heavy load.
• Illegal char file: Specifies the URI of a file that defines illegal characters.
To create a token separator, xPlore replaces illegal characters with white space. This list is
configurable.
• Request time out: Number of seconds before a single request times out.
Range: 60-3600. Default: 600.
• Daemon standalone: Check to stop daemon if no manager connects to it. Default: unchecked.
• IP version: Internet Protocol version of the host machine. Values: IPv4 or IPv6. Dual stack is
not supported.
• Use express queue: This queue contains admin requests and query requests. (Queries are
processed for language identification, lemmatization, and tokenization.) The express queue has
priority over the regular queue. Set the maximum number of requests in the queue. Default: 128.
• The regular queue processes indexing requests. Set the maximum number of requests in the
queue. Default: 1024.
• When the token count is zero and the extracted text is larger than the configured threshold,
a warning is logged.
You can configure the following additional parameters in the CPS configuration file configuration.xml,
which is located in the CPS instance directory dsearch_home/dsearch/cps/cps_daemon:
• language_identification: The number of bytes used for language identification can be configured
in the CPS configuration file as the value of max_process_byte. The bytes are analyzed from the
beginning of the file. A larger number slows the ingestion process. A smaller number increases
the risk of language misidentification. Default: 1000.
• max_batch_size: Limit for the number of requests in a batch. Valid values: 2 - 65538 (default: 1024).
Note: The index agent also has batch size parameters.
• max_text_threshold: The upper limit in bytes for documents that are tokenized. Above this size,
only the document metadata is tokenized. Default: 10485760 (10 MB).
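
The thread arithmetic given earlier in this section for Batch in batch count can be checked with a short Python sketch. The function name is hypothetical. The calculation (Connection pool size multiplied by Batch in batch count) and the 1-100 ranges follow the setting descriptions above.

```python
def cps_thread_count(connection_pool_size: int, batch_in_batch: int) -> int:
    """Threads CPS would assign: one pool's worth per batch_in_batch count.

    Both settings are documented with a valid range of 1-100.
    """
    for name, value in (("connection_pool_size", connection_pool_size),
                        ("batch_in_batch", batch_in_batch)):
        if not 1 <= value <= 100:
            raise ValueError(f"{name} must be between 1 and 100")
    return connection_pool_size * batch_in_batch


if __name__ == "__main__":
    print(cps_thread_count(5, 5))
```

Because the two settings multiply, raising both at once increases the thread count, and therefore memory use, much faster than raising either one alone.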

Document processing and indexing service settings

You can configure the following settings for the CPS and indexing services in xPlore administrator.
For per-instance CPS settings, refer to Content processing instance settings, page 173. (The
per-instance CPS settings relate to the particular instance and do not overlap with the CPS settings in
Indexing Service configuration.) For CPS and indexing processing settings, choose Indexing Service in
the tree and click Configuration. The default values have been optimized for most environments.
• CPS
— CPS-requests-max-size: Maximum size of CPS queue. Default: 1000.
— CPS-requests-batch-size: Maximum number of CPS requests in a batch. Default: 5.
— CPS-threadpool-core-size: Minimum number of threads used to process a single incoming
request. Valid values: 1 - 100. Default: 10.
Note: If you decrease the threadpool size, ingestion rate can slow down.
— CPS-threadpool-max-size: Maximum number of threads used to process a single incoming
request. Valid values: 1 - 100. Default: 100.
— CPS-thread-wait-time: Time in milliseconds to accumulate requests in a batch. Range:
1-2147483647. Default: 1000.
— CPS-executor-queue-size: Maximum size of CPS queue before spawning a new worker
thread. Default: 10.
— CPS-executor-retry-wait-time: Wait time in milliseconds after queue and worker thread
maximums have been reached. Range: 1-2147483647. Default: 1000.
• Indexing
— index-requests-max-size: Maximum size of internal index queue. Default: 1000.
— index-requests-batch-size: Maximum number of index requests in a batch. Default: 10.
— index-threadpool-core-size: Minimum number of threads used to process a single incoming
request. Valid values: 1 - 100. Default: 10.
— index-threadpool-max-size: Maximum number of threads used to process a single incoming
request. Valid values: 1 - 100. Default: 100.
— index-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a
batch. Range: 1-2147483647. Default: 1000.
— index-executor-queue-size: Maximum size of index queue before spawning a new worker
thread. Default: 10.
— index-executor-retry-wait-time: Wait time in milliseconds after index queue and worker thread
maximums have been reached. Default: 1000.
— status-requests-batch-size: Maximum number of status update requests in a batch. Default:
1000.
— status-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a
batch. Default: 1000.
— index-check-duplicate-at-ingestion: Set to true to check for duplicate documents. May slow
ingestion. Default: true.

Search service settings


You can configure the following settings for the search service in xPlore administrator. The default
values have been optimized for most environments:
• query-default-locale: Default locale for queries. Refer to Appendix E, Indexable Languages for
language codes and to your release notes for supported languages in this release. Default: en
(English).
• query-default-result-batch-size: Default size of result batches that are sent to the client. Default:
200. In a Documentum environment, this setting is overridden by dfc.batch_hint_size in
dfc.properties.
• query-result-cache-size: Default size of results buffer. When this limit is reached, no more results
are fetched from xDB until the client asks for more results. Default: 200.
• query-result-spool-location: Path to location at which to spool results. Default:
dsearch_home/dsearch/spool
• query-default-timeout: Interval in milliseconds for a query to time out. Default: 60000.
• query-threadpool-core-size: Minimum number of threads used to process incoming requests.
Threads are allocated at startup, and idle threads are removed down to this minimum number.
Valid values: 1 - 100. Default: 10.
Note: If you decrease the threadpool size, search performance can decrease.
• query-threadpool-max-size: Maximum number of threads used to process incoming requests.
After this limit is reached, service is denied to additional requests. Valid values: 1 - 100. Default:
100.
• query-threadpool-queue-size: Maximum number in threadpool queue before spawning a new
worker thread. Default: 0.
• query-threadpool-keepalive-time: Interval after which idle threads are terminated. Default:
600000.
• query-threadpool-keep-alive-time-unit: Unit of time for query-thread-pool-keep-alive-time.
Default: milliseconds.
• query-executor-retry-interval: Wait time in milliseconds after search queue and worker thread
maximums have been reached. Default: 100.
• query-executor-retry-limit: Number of times to retry query. Default: 3.
• query-thread-sync-interval: Used for xPlore internal synchronization. Interval after which
results fetching is suspended when the result cache is full. For a value of 0, the thread waits
indefinitely until space is available in the cache (freed up when the client application retrieves
results). Default: 100 units.
• query-thread-max-idle-interval: Query thread is freed up for reuse after this interval, because
the client application has not retrieved the result. (Threads are freed immediately after a result is
retrieved.) Default: 3600000.
• query-summary-default-highlighter: Class that determines summary and highlighting. Default:
com.emc.documentum.core.fulltext.indexserver.services.summary.DefaultSummary. Refer to
Configuring query summary and highlighting, page 101.
• query-summary-display-length: Number of characters to return as a dynamic summary. Default:
64. Refer to Configuring query summary and highlighting, page 101.
• query-summary-highlight-begin-tag: HTML tag to insert at beginning of summary. Default:
empty string. Refer to Configuring query summary and highlighting, page 101.
• query-summary-highlight-end-tag: HTML tag to insert at end of summary. Default: empty string.
Refer to Configuring query summary and highlighting, page 101.
• query-enable-dynamic-summary: If context is not important, set to false to return as a summary
the first n chars defined by the query-summary-display-length configuration parameter. For
summaries evaluated in context, set to true (default). Refer to Configuring query summary and
highlighting, page 101.
• query-index-covering-values: Supports Documentum DQL evaluation. Do not change unless
tech support directs you to do this.
• query-facet-max-result-size: Documentum only. Sets the maximum number of results used to
compute facet values. For example, if query-facet-max-result-size=12, only 12 results for all facets
in a query are returned. If a query has many facets, the number of results per facet is reduced
accordingly. Default: 10000.
Note: Result set size cannot be limited. It is up to the client application to limit the number of results
that are fetched.



Appendix B
Extensible Documentum DTD

Documentum repository content is stored in XML format. Table 25, page 179 displays the partial
DTD. Customer-defined elements and attributes can be added to this DTD as children of dmftcustom.
Each element specifies an attribute of the object type. The object type is the element in the path
dmftdoc/dmftmetadata/type_name, for example, dmftdoc/dmftmetadata/dm_document.
The root element of DFTXML is dmftdoc. Table 25, page 179 describes the top-level elements under
dmftdoc. This DTD is subject to change.

Table 25. DFTXML top-level elements

Element Description
dmftkey Contains Documentum object ID (r_object_id)
dmftmetadata Contains elements for all indexable attributes
from the standard Documentum object model,
including custom object types. Each attribute is
modeled as an element and value. Repeating
attributes repeat the element name and contain
a unique value. Some metadata, such as
r_object_id, are repeated in other elements as
noted.
dmftvstamp Contains the internal version stamp (i_vstamp)
attribute.
dmftsecurity Contains security attributes from the object
model plus computed attributes: acl_name,
acl_domain, and ispublic.
dmftinternal Contains attributes used internally for query
processing.
dmftversions Contains version labels and iscurrent for the
object if it is a sysobject.
dmftfolders Contains the folder ID and folder parents.
dmftcontents Contains content-related attributes and one
or more pointers to content files. The actual
content can be stored within the child element
dmftcontent as a CDATA section.
dmftcustom Contains searchable information supplied by
custom applications. (Requires a TBO.)
dmftdsearchinternals Contains tokens used by static and dynamic
summaries.

To find the path of a specific attribute in DFTXML, use a Documentum client to look up the object
ID of a custom object in the repository. Using xPlore administrator, open the target collection and
paste the object ID into the Filter word box. Click the resulting document to see the DFTXML
representation. Following is a sample DFTXML representation of a custom object type:
<?xml version="1.0"?>
<dmftdoc dmftkey="090a0d6880008848" dss_tokens=":dftxml:1">
<dmftkey>090a0d6880008848</dmftkey>
<dmftmetadata>
<dm_sysobject>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<object_name dmfttype="dmstring">mylog.txt</object_name>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<r_creation_date dmfttype="dmdate">2010-04-09T21:40:47</r_creation_date>
<r_modify_date dmfttype="dmdate">2010-04-09T21:40:47</r_modify_date>
<r_modifier dmfttype="dmstring">Administrator</r_modifier>
<r_access_date dmfttype="dmdate"/>
<a_is_hidden dmfttype="dmbool">false</a_is_hidden>
<i_is_deleted dmfttype="dmbool">false</i_is_deleted>
<a_retention_date dmfttype="dmdate"/>
<a_archive dmfttype="dmbool">false</a_archive>
<a_link_resolved dmfttype="dmbool">false</a_link_resolved>
<i_reference_cnt dmfttype="dmint">1</i_reference_cnt>
<i_has_folder dmfttype="dmbool">true</i_has_folder>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
<r_link_cnt dmfttype="dmint">0</r_link_cnt>
<r_link_high_cnt dmfttype="dmint">0</r_link_high_cnt>
<r_assembled_from_id dmfttype="dmid">0000000000000000</r_assembled_from_id>
<r_frzn_assembly_cnt dmfttype="dmint">0</r_frzn_assembly_cnt>
<r_has_frzn_assembly dmfttype="dmbool">false</r_has_frzn_assembly>
<r_is_virtual_doc dmfttype="dmint">0</r_is_virtual_doc>
<i_contents_id dmfttype="dmid">060a0d688000ec61</i_contents_id>
<a_content_type dmfttype="dmstring">crtext</a_content_type>
<r_page_cnt dmfttype="dmint">1</r_page_cnt>
<r_content_size dmfttype="dmint">130524</r_content_size>
<a_full_text dmfttype="dmbool">true</a_full_text>
<a_storage_type dmfttype="dmstring">filestore_01</a_storage_type>
<i_cabinet_id dmfttype="dmid">0c0a0d6880000105</i_cabinet_id>
<owner_name dmfttype="dmstring">Administrator</owner_name>
<owner_permit dmfttype="dmint">7</owner_permit>
<group_name dmfttype="dmstring">docu</group_name>
<group_permit dmfttype="dmint">5</group_permit>
<world_permit dmfttype="dmint">3</world_permit>
<i_antecedent_id dmfttype="dmid">0000000000000000</i_antecedent_id>
<i_chronicle_id dmfttype="dmid">090a0d6880008848</i_chronicle_id>
<i_latest_flag dmfttype="dmbool">true</i_latest_flag>
<r_lock_date dmfttype="dmdate"/>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<i_branch_cnt dmfttype="dmint">0</i_branch_cnt>
<i_direct_dsc dmfttype="dmbool">false</i_direct_dsc>
<r_immutable_flag dmfttype="dmbool">false</r_immutable_flag>
<r_frozen_flag dmfttype="dmbool">false</r_frozen_flag>
<r_has_events dmfttype="dmbool">false</r_has_events>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>


<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<i_is_reference dmfttype="dmbool">false</i_is_reference>
<r_creator_name dmfttype="dmstring">Administrator</r_creator_name>
<r_is_public dmfttype="dmbool">true</r_is_public>
<r_policy_id dmfttype="dmid">0000000000000000</r_policy_id>
<r_resume_state dmfttype="dmint">0</r_resume_state>
<r_current_state dmfttype="dmint">0</r_current_state>
<r_alias_set_id dmfttype="dmid">0000000000000000</r_alias_set_id>
<a_is_template dmfttype="dmbool">false</a_is_template>
<r_full_content_size dmfttype="dmdouble">130524</r_full_content_size>
<a_is_signed dmfttype="dmbool">false</a_is_signed>
<a_last_review_date dmfttype="dmdate"/>
<i_retain_until dmfttype="dmdate"/>
<i_partition dmfttype="dmint">0</i_partition>
<i_is_replica dmfttype="dmbool">false</i_is_replica>
<i_vstamp dmfttype="dmint">0</i_vstamp>
<webpublish dmfttype="dmbool">false</webpublish>
</dm_sysobject>
</dmftmetadata>
<dmftvstamp>
<i_vstamp dmfttype="dmint">0</i_vstamp>
</dmftvstamp>
<dmftsecurity>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<ispublic dmfttype="dmbool">true</ispublic>
</dmftsecurity>
<dmftinternal>
<docbase_id dmfttype="dmstring">658792</docbase_id>
<server_config_name dmfttype="dmstring">DSS_LH1</server_config_name>
<contentid dmfttype="dmid">060a0d688000ec61</contentid>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<i_all_types dmfttype="dmid">030a0d68800001d7</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000129</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000105</i_all_types>
<i_dftxml_schema_version dmfttype="dmstring">5.3</i_dftxml_schema_version>
</dmftinternal>
<dmftversions>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<iscurrent dmfttype="dmbool">true</iscurrent>
</dmftversions>
<dmftfolders>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
</dmftfolders>
<dmftcontents>
<dmftcontent>
<dmftcontentattrs>
<r_object_id dmfttype="dmid">060a0d688000ec61</r_object_id>
<page dmfttype="dmint">0</page>
<i_full_format dmfttype="dmstring">crtext</i_full_format>
</dmftcontentattrs>
<dmftcontentref content-type="text/plain" islocalcopy="true" lang="en"
encoding="US-ASCII" summary_tokens="dmftsummarytokens_0">
<![CDATA[...]]></dmftcontentref>
</dmftcontent>
</dmftcontents>
<dmftdsearchinternals dss_tokens="excluded">
<dmftstaticsummarytext dss_tokens="excluded"><![CDATA[mylog.txt ]]>
</dmftstaticsummarytext>
<dmftsummarytokens_0 dss_tokens="excluded"><![CDATA[1Tkns ...]]>
</dmftsummarytokens_0>
</dmftdsearchinternals>


</dmftdoc>
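
As an illustration of how the dftxml structure shown above can be read programmatically, the following Python sketch pulls a few identifying values out of a dftxml string. It is a minimal sketch, not part of xPlore: the `SAMPLE` fragment is a trimmed, hypothetical document that reuses element names and values from the example above, and the `summarize` helper is an assumption introduced for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical dftxml fragment that reuses element names and values
# from the sample above; a real document carries many more elements.
SAMPLE = """<dmftdoc>
  <dmftsecurity>
    <acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
    <ispublic dmfttype="dmbool">true</ispublic>
  </dmftsecurity>
  <dmftinternal>
    <r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
    <r_object_type dmfttype="dmstring">techpubs</r_object_type>
  </dmftinternal>
  <dmftversions>
    <r_version_label dmfttype="dmstring">1.0</r_version_label>
    <r_version_label dmfttype="dmstring">CURRENT</r_version_label>
  </dmftversions>
</dmftdoc>"""


def summarize(dftxml_text):
    """Collect a few identifying values from a dftxml string (hypothetical helper)."""
    root = ET.fromstring(dftxml_text)
    return {
        "object_id": root.findtext("dmftinternal/r_object_id"),
        "object_type": root.findtext("dmftinternal/r_object_type"),
        "acl_name": root.findtext("dmftsecurity/acl_name"),
        # r_version_label repeats, so gather every occurrence.
        "version_labels": [
            e.text for e in root.findall("dmftversions/r_version_label")
        ],
    }


summary = summarize(SAMPLE)
```

Repeating elements such as `r_version_label` and `i_all_types` are collected with `findall`, while single-valued elements are read with `findtext`.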



Appendix C
DQL Hints File DTD

The following is the DTD for the DFC hints file. For more information on the elements in this
DTD, refer to Hints file elements, page 110.
<!ELEMENT RuleSet (Rule*)>
<!ELEMENT Rule (Condition?, DQLHint?, SelectOption?, DisableFullText?, DisableFTDQL?)>
<!ELEMENT Condition (Select?, From?, Where?, Docbase?, FulltextExpression?)>
<!ELEMENT DQLHint (#PCDATA)>
<!ELEMENT SelectOption (#PCDATA)>
<!ELEMENT DisableFullText EMPTY>
<!ELEMENT DisableFTDQL EMPTY>
<!ELEMENT Select (Attribute+)>
<!ATTLIST Select condition (all | any) "all">
<!ELEMENT From (Type+)>
<!ATTLIST From condition (all | any) "all">
<!ELEMENT Where (Attribute+)>
<!ATTLIST Where condition (all | any) "all">
<!ELEMENT Docbase (Name+)>
<!ELEMENT FulltextExpression EMPTY>
<!ATTLIST FulltextExpression exists (true | false) #REQUIRED>
<!ELEMENT Attribute (#PCDATA)>
<!ATTLIST Attribute operator
(equal | not_equal | greater_than | greater_equal | less_than | less_equal |
like | not_like | is_null |
is_not_null | in | not_in | between)
#IMPLIED>
<!ELEMENT Type (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ATTLIST Name descend (true | false) #IMPLIED>
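
To make the element nesting in this DTD concrete, the following Python sketch embeds a small hints file and checks that every child element is one the DTD permits under its parent. It is a minimal sketch under stated assumptions: the rule itself (disabling FTDQL for `dm_document` queries that use `like` on `object_name`) is a hypothetical example, the `ALLOWED_CHILDREN` table is transcribed by hand from the DTD, and the `unexpected_children` helper is introduced for illustration. It checks nesting only, not element order, cardinality, or attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical hints file; element and attribute names come from the DTD,
# the values are illustrative assumptions.
HINTS_XML = """<RuleSet>
  <Rule>
    <Condition>
      <From condition="any">
        <Type>dm_document</Type>
      </From>
      <Where>
        <Attribute operator="like">object_name</Attribute>
      </Where>
    </Condition>
    <DisableFTDQL/>
  </Rule>
</RuleSet>"""

# Child elements each parent may contain, transcribed from the DTD.
# Elements not listed here (Attribute, Type, Name, ...) take no children.
ALLOWED_CHILDREN = {
    "RuleSet": ["Rule"],
    "Rule": ["Condition", "DQLHint", "SelectOption",
             "DisableFullText", "DisableFTDQL"],
    "Condition": ["Select", "From", "Where", "Docbase", "FulltextExpression"],
    "Select": ["Attribute"],
    "From": ["Type"],
    "Where": ["Attribute"],
    "Docbase": ["Name"],
}


def unexpected_children(root):
    """Return (parent, child) tag pairs that the DTD does not allow."""
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, [])
        for child in parent:
            if child.tag not in allowed:
                problems.append((parent.tag, child.tag))
    return problems


root = ET.fromstring(HINTS_XML)
problems = unexpected_children(root)
```

An empty `problems` list means the nesting matches the DTD. A validating XML parser is still needed for a complete check against the DTD.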



Appendix D
Tracking and Status XQueries

You can issue the following XQuery expressions against the tracking database for each domain.
Many of these expressions are available in xPlore administrator or as audit reports. These XQuery
expressions can be submitted in the xDB console.

Object count from tracking DB —


• Get object count in a collection
count(//trackinginfo/document[collection-name="<Collection_name>"])

For example:
for $i in collection("dsearch/SystemInfo/TrackingDB/TestCustomType")
return count($i//trackinginfo/document)

• Get object count in library


count(//trackinginfo/document[library-path="<LibraryPath>"])

• Get object count in all collections (all indexed objects)


count(//trackinginfo/document)

For example:
for $i in collection("dsearch/SystemInfo")
return count($i//trackinginfo/document)

Find documents —
• Find collection in which a document is indexed
//trackinginfo/document[@id="<DocumentId>"]/collection-name/string(.)

For example:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name

• Find library in which a document is indexed


//trackinginfo/document[@id="<DocumentId>"]/library-path/string(.)

• Get tracking information for a document


//trackinginfo/document[@id="<DocumentId>"]


Status information — Get operations and status information for a document


//trackinginfo/operation[@doc-id="<DocumentId>"]
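
The placeholders in the expressions above (`<Collection_name>`, `<LibraryPath>`, `<DocumentId>`) must be replaced with real values before a query is submitted. The following Python sketch shows one way to fill them in safely. It is a minimal sketch: the `TEMPLATES` keys and the `build_query` helper are hypothetical names introduced here, and the sketch only builds the query text; it does not connect to xDB. Per the XQuery string literal rules, a double quote inside a double-quoted literal is written as two double quotes and an ampersand as `&amp;`.

```python
# Query templates copied from the expressions above; {value} marks the
# placeholder. The dictionary keys are hypothetical labels.
TEMPLATES = {
    "count_in_collection":
        'count(//trackinginfo/document[collection-name="{value}"])',
    "count_in_library":
        'count(//trackinginfo/document[library-path="{value}"])',
    "collection_of_document":
        '//trackinginfo/document[@id="{value}"]/collection-name/string(.)',
    "library_of_document":
        '//trackinginfo/document[@id="{value}"]/library-path/string(.)',
    "tracking_info":
        '//trackinginfo/document[@id="{value}"]',
    "operations":
        '//trackinginfo/operation[@doc-id="{value}"]',
}


def build_query(name, value):
    """Fill a template, escaping characters that are special in an
    XQuery double-quoted string literal."""
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return TEMPLATES[name].format(value=escaped)


query = build_query("tracking_info", "TestCustomType_txt1276106246060")
```

The resulting string can then be pasted into the xDB console.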



Appendix E
Indexable Languages

The following languages can be indexed. For a list of supported languages, refer to the release
notes for this release.
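Unless noted, a language is analyzed for tokenization, part-of-speech (POS) tagging, sentence
boundary detection (SBD), base noun phrase detection (BNP), stemming, compound analysis, and
alternative readings. An entry such as "no compounds, readings" means that neither compound
analysis nor alternative readings are performed for that language. Some languages not in this
list are identified but not indexed.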

Table 26. List of indexable languages

Language                 What is indexed

Arabic                   no compounds, readings
Chinese (Simplified)     no stemming, compounds. Readings are pinyin transcriptions.
Chinese (Traditional)    no stemming, compounds. Readings are pinyin transcriptions.
Czech                    no BNP, compounds, readings
Dutch                    no readings
English                  no compounds, readings
Farsi (Persian)          no POS, BNP, compounds, readings
French                   no compounds, readings
German                   no readings
Greek                    no BNP, compounds, readings
Hungarian                no BNP, readings
Italian                  no compounds, readings
Japanese                 Readings are Furigana transcriptions rendered in Hiragana.
Korean                   no BNP, readings
Polish                   no BNP, compounds, readings
Portuguese               no compounds, readings
Russian                  no BNP, compounds, readings
Spanish                  no compounds, readings
Urdu                     no BNP, compounds, readings



Appendix F
Indexable Encodings

Documents must be encoded with one of the following encoding types:


ASCII
ARABIC
BIG5
GB2312
ISO_8859_1
ISO_8859_2
ISO_10646_UCS_2
LATIN1
LATIN2
UCS2
UCS_2
UNICODE
UTF7
UTF_7
UTF8
UTF_8
UTF8BOM
UTF32
UTF_32
UTF_16



Appendix G
Indexable Formats

The following tables list the formats that can be indexed. Some listed formats are indexed on
file ID or metadata only, as noted.

Table 27. Indexable word processing and text formats

Adobe FrameMaker (MIF) Versions 3.0-6.0
Adobe Illustrator Postscript Level 2
Ami (Lotus), Ami Pro for OS2, Ami Pro for Windows 2.0 and 3.0
ANSI Text 7 & 8 bit
ASCII Text 7 & 8 bit
DEC DX through 4.0, DX Plus 4.0, 4.1
DOS character set
EBCDIC
Enable 3.0-4.5
First Choice WP 1.0, 3.0
Framework WP 3.0
Hangul Versions 97-2007
HTML (not CSS rendering) 1.0-4.0
IBM DCA/RFT and FFT
IBM DisplayWrite 2.0-5.0
IBM Writing Assistant 1.01
Ichitaro 5.0, 6.0, 8.0-13.0, 2004
JustWrite through 3.0
Legacy 1.1
Lotus Manuscript through 2.0
Lotus Word Pro (non-Windows and Windows) 97-Millennium 9.6
Macintosh character set
MacWrite II 1.1
MASS11 through 8.0
IBM FFT All versions
Microsoft Publisher file ID only 2003–2007
Microsoft Word (DOS) 4.0-6.0
Microsoft Word (Mac) 4.0-6.0, 98-2008
Microsoft Word (Windows) 1.0-2007, 98-J
Microsoft WordPad
Microsoft Works (DOS) 2.0
Microsoft Works (Mac) 2.0
Microsoft Works (Windows) 3.0-4.0
Microsoft Windows Write 1.0-3.0
MultiMate through 4.0, MultiMate Advantage 2.0
Navy DIF
Nota Bene 3.0
Novell Perfect Works 2.0
Office Writer 4.0-6.0
OpenOffice Writer (Windows and UNIX) 1.1-2.0
PC File Doc 5.0
PFS:Write A, B
Professional Write (DOS) 1.0-2.0
Professional Write Plus (Windows) 1.0
Q & A Write 2.0, 3.0
Rich Text Format (RTF) All versions
Samna Word IV 1.0-3.0, IV+
Signature 1.0
SmartWare II WP 1.02
Sprint 1.0
StarOffice Writer 5.2-8.0
Total Word 1.2
Unicode Text 3.0, 4.0
UTF-8
Wang IWP through 2.6
Wireless Markup Language
WordMARC Composer, Composer+, Word Processor
WordPerfect (DOS) 4.2
WordPerfect (Mac) 1.02-3.1
WordPerfect (Windows) 5.1-X3
WordStar (DOS) 3.0-7.0
WordStar 2000 (DOS) 2.0-3.0
WordStar (Windows) 1.0
XML (text only)
XHTML (file ID only) 1.0
XyWrite through III+

Table 28. Indexable database formats

DataEase Version 4.x
dBASE III, IV, V
First Choice DB through 3.0
Framework DB 3.0
Microsoft Access 1.0-2.0
Microsoft Works (DOS) 1.0, 2.0
Microsoft Works (Mac) 2.0
Microsoft Works (Windows) 3.0-4.0
Paradox (DOS) 2.0-4.0
Paradox (Windows) 1.0
Q & A through 2.0
R:BASE 5000 and System V
Reflex 2.0
SmartWare II DB 1.02

Table 29. Indexable spreadsheet formats

Enable 3.0-4.5
First Choice SS through 3.0
Framework SS 3.0
Lotus 1-2-3 through Millennium 9.6
Lotus 1-2-3 (OS/2) 2.0
Lotus 1-2-3 Charts (DOS & Windows) through 5.0
Lotus 1-2-3 for SmartSuite Versions 97 - Millennium 9.6
Lotus Symphony 1.x
Microsoft Excel Charts Versions 2.x-2007
Microsoft Excel (Mac) 98-2008
Microsoft Excel (Windows) 3.0-2007
Microsoft Excel (Windows) File ID only 2007 binary
Microsoft Multiplan Version 4.0
Microsoft Works (Windows) 3.0, 4.0
Microsoft Works (DOS and Mac) 2.0
Novell Perfect Works 2.0
OpenOffice Calc 1.1-2.0
PFS:Professional Plan 1.0
Quattro Pro (DOS) through 5.0
Quattro Pro (Windows) through X3
SmartWare SS
SmartWare II 1.02
StarOffice Calc 5.2-8.0
SuperCalc 5.0
Symphony through 2.0
VP Planner 1.0

Table 30. Indexable presentation formats

Corel Presentations 6.0-X3
Harvard Graphics (DOS) 3.0
IBM Lotus Symphony Presentations 1.x
Lotus Freelance (Windows) 1.0-Millennium 9.6
Lotus Freelance (OS/2) 2.0
Microsoft PowerPoint (Windows) 3.0-2007
Microsoft PowerPoint (Mac) 4.0-2008
Novell Presentations 3.0, 7.0
OpenOffice Impress 1.1, 2.0
StarOffice Impress 5.2-8.0
WordPerfect Presentations

Table 31. Indexable graphics formats (vector and raster)

Adobe FrameMaker Graphics (FMV) 3.0-5.0
Adobe Illustrator 4.0-7.0, 9.0
Adobe Illustrator XMP only 11-13 (CS 1-3)
Adobe InDesign and InDesign Interchange XMP only 3.0-5.0 (CS 1-3)
Adobe PDF 1.0-1.7 (Acrobat 1-9) except PDF Packages or PDF Portfolios
Adobe Photoshop 4.0
Adobe Photoshop XMP only 8.0-10.0 (CS 1-3)
Ami Draw SDW
AutoCAD Drawing 2.5, 2.6, 9.0-14.0, 2000i-2007
AutoShade Rendering (RND) 2.0
bitmap (Windows BMP)
CALS Raster (GP4) Type I and Type II
Corel Draw Clipart format (CMX) 5.0, 7.0
Corel Draw (CDR) 2.0-9.0
Computer Graphics Metafile (CGM) ANSI, CALS, NIST
Encapsulated PostScript (EPS) TIFF header only
Escher graphics
GEM Image (IMG bitmap) and GEM File (vector)
Graphics Interchange Format (GIF) All versions
Harvard Graphics Chart DOS 2.0-3.0
Harvard Graphics for Windows
HP Graphics Language (HPGL) 2.0
IBM Graphics Data Format (GDF) 1.0
IBM Picture Interchange Format (PIF) 1.0
IGES Drawing 5.1-5.3
JBIG2 JBIG2 graphic embeddings in PDF files except PDF Packages or PDF Portfolios
JFIF (JPEG not in TIFF format)
JPEG
JPEG 2000 JP2
Kodak Flash Pix (FPX)
Kodak Photo CD (PCD) 1.0
Lotus PIC
Lotus Snapshot
Macintosh PICT1 & PICT2 bitmap only
MacPaint (PNTG)
Micrografx Draw (DRW) through 4.0
Micrografx Designer (DRW, DSF) through 3.1, 6.0
Microsoft Windows Cursor
Microsoft Windows Icon
Microsoft XPS text only
Novell PerfectWorks Draw 2.0
OpenOffice Draw 1.1-3.0
OS/2 Bitmap, Warp Bitmap
Paint Shop Pro (PSP) Windows 32 only
PC Paintbrush (PCX and DCX)
Portable Bitmap (PBM), Portable Graymap (PGM), Portable Network Graphics (PNG), Portable
Pixmap (PPM)
Progressive JPEG
StarOffice/OpenOffice Draw 6.x-8.0
Sun Raster (SRS)
TIFF groups 5 and 6, TIFF CCITT Group 3 & 4
Truevision TGA (TARGA) 2.0
Visio (Page Preview WMF/EMF) 4.0
Visio 5.0–2007
Visio XML, VSX (file ID only) 2007
Wireless graphics format (WBMP)
Windows Enhanced Metafile (EMF)
Windows Metafile (WMF)
WordPerfect Graphics (WPG & WPG2) 1.0, 2.0-10.0
X-Windows Bitmap (XBM), Dump (XWD), Pixmap (XPM) x10 compatible

Table 32. Indexable compressed formats

GZIP (Unix)
LZA Self Extracting Compress
LZH Compress
Microsoft Office Binder 95, 97
RAR 1.5, 2.0, 2.9
Self-extracting .exe
UUEncode
UNIX Compress
UNIX TAR
ZIP PKZip and WinZip

Table 33. Indexable email formats

Encoded mail messages MHT, multipart alternative/digest/mixed/news group/signed, TNEF
IBM Lotus Notes Domino XML (DXL) 8.5
IBM Lotus Notes NSF (file ID only) 7.x, 8.x
Microsoft Outlook Express (EML)
Microsoft Outlook Folder (PST) 97-2007, Mac 2001
Microsoft Outlook Forms Template (OFT) 97-2007
Microsoft Outlook Message (MSG) 97-2007
MIME-encoded mail messages

Table 34. Indexable multimedia formats

AVI (metadata only)
Flash (text only) 6.x, 7.x, Lite
MP3 (ID3 metadata only)
MPEG-1 (file ID only) audio layer 3V ID3 v1 and v2, video layer v2 and v3
MPEG-2 (file ID only) audio
MPEG-4, MPEG-7 metadata only
QuickTime metadata only
Real Media (file ID only)
WAV metadata only
Windows Media metadata only ASF, DVR-MS, Audio WMA, Video WMV

Table 35. Other indexable formats

Microsoft Project (text) 98-2003, 2007 (file ID only)
Microsoft Windows DLL and executable
vCalendar, vCard 2.1
Yahoo! Messenger 6.x-8


xPlore Glossary

category
A category defines a class of documents and their XML structure.

collection
A collection is a logical group of XML documents that is physically stored in an xDB library.
A collection represents the most granular data management unit within xPlore.

content processing service
See CPS.

CPS
The content processing service (CPS) retrieves indexable content from content sources and
determines the document format and primary language. CPS parses the content into index
tokens that xPlore can process into full-text indexes.

domain
A domain is a separate, independent group of collections within an xPlore deployment.

DQL
Documentum Query Language, used by many Content Server clients.

FTDQL
Full-text Documentum Query Language.

ftintegrity
A standalone Java program that checks index integrity against Content Server repository
documents. The ftintegrity script calls the state of the index job in the Content Server.

full-text index
Index structure that tracks terms and their occurrence in a document.

index agent
Documentum application that receives indexing requests from the Content Server. The agent
prepares and submits to xPlore an XML representation of the document to be indexed.

ingestion
Process in which xPlore receives an XML representation of a document and processes it
into an index.


instance
An xPlore instance is one deployment of the xPlore WAR file to an application server container.
You can have multiple instances on the same host (vertical scaling), although it is more
common to have one xPlore instance per host (horizontal scaling). The following processes
can run in an xPlore instance: CPS, indexing, search, xPlore administrator.

lemmatization
Lemmatization is a normalization process in which the lemmatizer finds a canonical or
dictionary form for a word, called a lemma. Unless lemmatization is turned off, both
indexed content and the terms in search queries are lemmatized.

Lucene
Apache open-source, Java-based full-text indexing and search engine.

node
In xPlore and xDB, node is sometimes used to denote instance. It does not denote host.

persistence library
Saves CPS, indexing, and search metrics. Configurable in indexserverconfig.xml.

state of the index job
Repository configuration installs the state of the index job. This job is run from Documentum
Administrator. The ftintegrity script calls this job, which reports on index completeness,
status, and indexing failures.

status library
A status library reports on indexing status for a domain. There is one status library for
each domain.

stop words
Stop words are words that are filtered out before indexing, to reduce the size of the index
and to prevent searches on common words.

text extraction
Identification of terms in a content file.

token
Piece of an input string defined by semantic processing rules.

tracking library
An xDB tracking library records the object IDs and location of content that has been indexed.
There is one tracking database for each domain.

transactional support
Small in-memory indexes are created in rapid transactional updates, then merged into
larger indexes. When an index is written to disk, it is considered clean. Committed and
uncommitted data before the merge is searchable along with the on-disk index.


watchdog service
Installed by the xPlore installer, the watchdog service pings all xPlore instances and sends
an email notification when an instance does not respond.

xDB
xDB is a database that enables high-speed storage and manipulation of many XML
documents. In xPlore, an xDB library stores a collection as a Lucene index and manages the
indexes on the collection. The XML content of indexed documents can optionally be stored.

XQFT
W3C full-text XQuery and XPath extensions described in XQuery and XPath Full Text
1.0. Support for XQFT includes logical full-text operators, wildcard option, anyall option,
positional filters, and score variables.

XQuery
W3C standard query language that is designed to query XML data. xPlore receives XQuery
expressions that are compliant with the XQuery standard and returns results.



Index

A
architecture
    logical, 21
    physical, 17
attach domain, 40
audit
    queries, 103
audit record, 152

B
backup
    file-based, 96
    incremental, 93
    overview, 89
    planning, 89
    scripted, 97
    snapshot, 95
    volume-based, 95
backup-directory, 97
batch_hint_size, 177

C
cache
    Documentum groups and ACLs, 48
CASample, 127
case sensitivity, 71
categories
    configure, 83
    manage, 84
category
    Documentum, 25
    overview, 22
collection
    configure, 86
    create, 85
    delete, 85 to 86
    global, 23
    overview, 22
    rebuild, 92
    restore with xDB, 94
collection backup
    scripted, 97
collections
    Documentum, 26
    scalability, 84
    storage areas, mapping, 60
connectors_batch_size, 163
consistency
    index, 40
CONTAINS WORD, 113
Content Server
    indexing, 27, 29
content storage areas
    Documentum, mapping, 58
Content too large report, 150
CPS
    configure, 65
    logging, 143
    overview, 63
    status, 65
    test processing, 127
    troubleshoot, 126

D
data model
    Documentum full-text, 26
DB statistics, 40
detach domain, 40
DFC
    compared to DQL, 107
DFS queries
    compared to DQL, 107
disk areas in xPlore, 17
disk space, 158
dm_ftengine_config
    security, 47
    settings, 106
    summary security_mode, 49
dm_fulltext_collection, 26
dm_fulltext_index_user, 29
dmi_registry, 29
document
    size, limit, 165
Document processing reports, 150
document size
    maximum, for ingestion, 174
Documentum
    categories, 25
    domains, 25
    index server options, 14
    indexing overview, 53
    search overview, 105
    search results security, 47
domain
    attach or detach, 40
    configure, 40
    create, 39
    Documentum, 25
    overview, 21
    reset state, 96
    restore with xDB, 93
domain backup
    scripted, 97
domains
    configuration, 39
DQL
    collection hint, 108
    compared to DFC/DFS, 107
    hints file, 109
DQL queries
    compared to DFC/DFS, 107
dsearch-backup, 97
dsearch-list-orphaned-segments, 98
dsearch-purge-orphaned-segments, 98
dsearch-set-state, 97

E
events
    register, 29
exclude object types, 55
execute XQuery, 39
exporter queue_size, 163
exporter_thread_count, 163

F
FAST
    backward compatibility, 113
    migration, sizing, 161
federation
    restore with xDB, 93
federation backup
    scripted, 97
file store
    mapping, 58
file stores
    map to collections, 60
file-based
    backup and restore, 96
filters
    by type or location, 56
folder descend, 107
format
    exclude from indexing, 55
fragment
    FAST compatibility, 113
freshness
    search results configuration, 100
FT_CONTAIN_FRAGMENT, 113
ftintegrity
    running, 122
full-text indexing
    Content Server documents, 27, 29
    index server, 29
    overview, 27
    software installation, 27, 29
    verifying indexes, 122
    xPlore, 29
fulltext indexes
    state of the index job, 60
fuzzy search, 71

G
Get query text, 151
getfile, 58
getpath, 58
global configuration, 33

H
highlighting, 101
    in results summary, 102

I
incremental backup, 93
index
    consistency, 40
    rebuild, 92
    remove docs, 58
index agent
    configuration, 171
    configure, 53
    error threshold, 125
    filters, 56
    multiple, 54
    performance, 163
    reindexing, 124
    restart, 122
    role in indexing process, 27, 29
    troubleshoot, 120
index selected list, 58
index server
    role in indexing process, 27, 29
index servers
    Documentum integration, 14
index-value-leaf-node-only, 80
indexagent.xml, 171
indexer queue_size, 163
indexes
    create, 76
    overview, 20
indexing
    components, 17
    exclude from, 55
    manage, 75
    metadata only, 55
    performance, 164
    queue items, 29
    resubmit, 58
    tasks, 81
    troubleshoot, 132
indexing report, 151
indexserverconfig.xml
    Documentum categories, 25
    Documentum domains, 25
    modifying, 36
ingestion
    slow, 128
installing indexing software, 27, 29
instance
    activate spare, 42
    deactivate, 42
    display information, 37
    overview, 18
instances
    get status, 42

J
jobs
    state of the index, 60

L
lemmatization
    configure, 66
    managing, 65
    troubleshooting, 67
logging
    CPS, 143
    formats, 143
    location, 145
    log4j, 142
    overview, 142
    queries, 146
    xDB, 146
    xDB and Lucene, 146
login
    xPlore administrator, 32
Lucene
    and xDB, 19
    logging, 146

M
metadata
    boost in results, 101
metrics
    and performance, 161
    indexing, view and configure, 80
    persistence of, 38

O
object type
    exclude from indexing, 55
orphaned segments
    purge, 98

P
performance
    language identification, 175
    limit content size, 128, 164
    local filestore map, 59
    metrics, 161
    purge status DB, 38
    query summary, 103
    xDB, 88
    xDB properties, 168
persistence
    of metrics, 38
primary instance
    replace, 43
purge orphaned segments, 98

Q
query
    audit, 103
    route to a collection, 108, 112
    troubleshoot, 136
Query counts by user, 151
query plan, 137
query summary, 101
queue
    indexing, clean up, 126
queue items, 29

R
rate
    factors in indexing, 164
rebuild index, 92
recent documents
    boost in results, 101
remove from index, 58
report
    Content too large, 150
    document processing, 150
    Documents ingested per..., 151
    editing, 152
    Get query text, 151
    Query counts by user, 151
    Top N slowest queries, 151
reports
    ingestion, 149
    list of, 149
reset
    domain state, 96
restore
    collection, with xDB, 94
    domain, with xDB, 93
    federation, with xDB, 93
    file-based, 96
    overview, 89, 92
    snapshot, 95
    volume-based, 95
result set size, 178

S
save-tokens, 68
scalability
    collections, 84
    of indexing, 75
scoring
    search results configuration, 100
script
    backup and restore, 96
    turn off indexing, 97
SDC. See SEARCH DOCUMENT CONTAINS
search
    scoring and freshness, 100
    slow, 137
    troubleshoot, 136
SEARCH DOCUMENT CONTAINS, 113
search management tasks, 99
search reports, 151
security, 47
    Documentum cache, 48
    Documentum, configuring, 47
security_mode, 49
segments
    purge orphans, 98
size
    configuring limits for documents, 128
    content size limit, in Documentum, 54
    document limit, 165
    maximum, for ingestion, 174
sizing
    CPS, 160
    ingestion, 161
    migration from FAST, 161
    search, 160
skip
    content extraction, 76
slow ingestion, 128
slow queries, 137
snapshot
    backup and restore, 95
spare instance
    activate, 42
    deactivate, 42
special characters
    as word boundaries, 69
stateofindex, 60
statistics
    query, 100
status
    CPS, 65
status DB
    purge, 38
stop words, 71
storage locations
    manage, 87
summary
    dynamic, 101
    overview, 101
    performance, 103
    static, 102
system
    managing, 37
    topology, 37
system management tasks, 31

T
test search, 136
ticket
    login, expired, 135
tokenization
    language, 73
    special characters, 69
Top N slowest queries, 151
tracing, 147
troubleshoot
    CPS, 126
    index agent, 120
    indexing, 132
    query, 136

U
upload testing document, 132

V
volume-based
    backup and restore, 95

W
watchdog service, 45
Webtop
    query debugging, 140
white space, 65
wild card
    highlighting, 101
wildcards, 71

X
xDB
    overview, 19
    performance tuning, 168
xDB admin tool, 36
XHadmin, 36
xPlore administrator
    login, 32
xPlore server
    locations, 17

Z
zone search, 112
