EMC Documentum xPlore Version 1.0
Administration Guide
EMC Corporation
Corporate Headquarters:
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
Copyright © 2010 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change
without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS
OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY
DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
Table of Contents
Preface ................................................................................................................................ 11
Chapter 1 Overview of xPlore ...................................................................................... 13
Features and limitations .................................................................................... 13
Indexing features .......................................................................................... 13
Indexing limitations ...................................................................................... 13
Search features ............................................................................................. 14
Search limitations ......................................................................................... 14
Administration ............................................................................................. 14
Indexing and search: FAST and xPlore compared ........................................... 14
Administration differences ........................................................................ 15
Indexing differences ................................................................................. 15
Search differences ..................................................................................... 15
Architectural overview ..................................................................................... 16
xPlore physical and logical architecture.............................................................. 17
Physical architecture ..................................................................................... 17
xPlore disk areas ...................................................................................... 17
xPlore instances ........................................................................................ 18
xDB libraries ............................................................................................ 19
Indexes .................................................................................................... 20
Logical architecture ...................................................................................... 21
Physical and logical component mapping ...................................................... 24
Documentum domains and categories ............................................................... 25
Documentum collections data model (dm_fulltext_collection) ............................. 26
How Content Server documents are indexed ...................................................... 27
How Content Server documents are queried ...................................................... 29
Appendix A Configuration settings for CPS, Indexing, and Search .............................. 171
Documentum index agent parameters.............................................................. 171
Content processing instance settings ................................................................ 173
Document processing and indexing service settings .......................................... 175
Search service settings .................................................................................... 177
List of Figures
List of Tables
This guide describes the configuration and administration of Documentum xPlore. These tasks
include system monitoring, index configuration and management, query configuration and
management, auditing and security, and Documentum integration.
The documentation set also contains release notes, an installation guide, and a development guide.
These documents are available as PDF downloads on the EMC download site and as HTML within
the xPlore infocenter web application that is installed with xPlore. The infocenter is available from
the Help button in the xPlore administrator tool.
Intended Audience
This guide contains information for xPlore administrators. The overview information is also helpful
to developers who are creating indexing or query customizations.
An administrator must be familiar with the installation guide, which describes the initial configuration
of the xPlore installation. For Documentum product users, this guide assumes familiarity with EMC
Documentum Content Server administration when Documentum functionality is discussed.
Revision History
The following changes have been made to this document.
Additional documentation
This guide provides overview and administration information. For information on installation
and development, refer to:
• Documentum xPlore Release Notes
• Documentum xPlore Deployment Guide
• Documentum xPlore Development Guide
For additional information on Content Server installation and Documentum search client
applications, refer to:
• Documentum Content Server Installation Guide
• Documentum Search Development Guide
Documentum xPlore is a multi-instance, scalable, high-performance, full-text index server that can be
configured for high availability and disaster recovery.
The following topics are described in this overview:
• Features and limitations, page 13
• Architectural overview, page 16
• xPlore physical and logical architecture, page 17
• Documentum domains and categories, page 25
• Documentum collections data model (dm_fulltext_collection), page 26
• How Content Server documents are indexed, page 27
• How Content Server documents are queried, page 29
Indexing features
Collection topography — xPlore supports creating collections online, and collections can span
multiple file systems.
Transactional updates and purges — xPlore supports transactional updates and purges of indexes
as well as transactional commit notification to the caller.
Multithreaded insertion into indexes — xPlore ingestion through multiple threads supports
vertical scaling on the same host.
Indexing limitations
Batch failure — Indexing requests are processed in batches. When one request in a batch fails,
the entire batch fails.
Lemmatization — xPlore supports lemmatization, but you cannot configure the parts of speech
that are lemmatized.
Search features
Case sensitivity — xPlore queries are lower-cased (rendered case-insensitive).
Faceted search — Facets in xPlore are computed over the entire result set or over a configurable
number of results.
Security evaluation — When a user performs a search, permissions are evaluated for each result.
Security can be evaluated in the xPlore full-text engine before results are returned to Content Server,
resulting in faster query results. This feature is turned on by default and can be configured or
turned off.
Native XQuery syntax — The xPlore full-text engine supports XQuery syntax.
Search limitations
Search topic — Zone searching (search topic in Documentum DQL) searches defined regions of an
XML document, for example, all child elements and attributes enclosed within an element. xPlore
does not support zone searching of attributes, although individual elements and their attributes can
be indexed and searched. You can configure xPlore to index XML content that is within an input
document, which will allow zone searching through XQuery or DQL.
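For example, assuming the input XML is indexed and contains a chapter element (a hypothetical element name; actual names depend on your documents and category configuration), a zone search over that element could be expressed in XQuery as:
/dmftdoc//chapter[. ftcontains 'foo']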
XML attributes — xPlore does not index attribute values on XML elements. This refers to the
input XML. For example, in the stored DFTXML representations of Documentum documents, you
cannot find all documents for which the value of the dmfttype attribute of the element acl_name is
"dmstring."
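To illustrate the limitation with the example above, an XQuery predicate such as the following (an illustrative sketch) cannot be answered from the index, because the value of the dmfttype attribute is never indexed:
/dmftdoc//acl_name[@dmfttype = "dmstring"]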
Administration
xPlore has an administration console.
Administration differences
Many features in xPlore are configurable through xPlore administrator. These features were not
configurable in FAST. Additionally, administrative tasks are exposed through Java APIs.
Ports required — During xPlore instance configuration, the installer prompts for the HTTP port for
the JBoss instance (base port) and validates that the next 100 consecutive ports are available. During
index agent configuration, the installer prompts for the HTTP port for the index agent JBoss instance and
validates that the next 20 consecutive ports are available. FAST used 4000 ports.
High availability — xPlore supports N+1, active/passive with clusters, and active/active shared data
configurations. FAST supports only active/active.
Disaster recovery — xPlore supports online backup, including full and incremental. FAST supports
only offline (cold) backup.
SAN and NAS — xPlore supports SAN and NAS. FAST supports SAN only.
64-bit address space — 64-bit systems are supported in xPlore but not in FAST.
Indexing differences
Backup and restore — xPlore supports warm backups and spare indexing instances. xPlore also
supports active/passive clusters for high availability.
Disaster recovery — xPlore automatically restarts content processing in case of a CPS crash. In the
case of a VM crash, the xPlore watchdog sends an email notification.
Transactional updates and purges — xPlore supports transactional updates and purges as well as
transactional commit notification to the caller. FAST does not.
Collection topography — xPlore supports creating collections online, and collections can span
multiple file systems. These features are not supported by FAST.
Lemmatization — FAST supports configuration for which parts of speech are lemmatized. In
xPlore, lemmatization is enabled or disabled.
Search differences
One-box search — Searches from the Webtop client default to ANDed query terms in xPlore. In
FAST, they defaulted to OR, resulting in many more non-specific hits.
Query a specific collection — Targeted queries are supported in xPlore but not FAST.
Folder descend — Folder descend queries are optimized in xPlore but not in FAST.
Security evaluation — Security is evaluated by default in the xPlore full-text engine before results
are returned to Content Server, resulting in faster query results. FAST returns results to the Content
Server, resulting in many hits that the user is not able to view.
Underprivileged users — Queries from underprivileged users are optimized in xPlore but not in FAST.
Native XQuery syntax — XQuery syntax is supported by the xPlore full-text engine.
Facets — Facets are limited to 350 hits in FAST, but xPlore supports many more hits.
Hit count — FAST returns the total number of hits before returning results. xPlore does not.
Search topic — Zone searching (search topic) searches defined regions of an XML document,
for example, all child elements and attributes enclosed within an element. Zone searching is not
supported by xPlore, although individual elements and their attributes can be indexed. Zone
searching is supported by FAST for backward compatibility. Zone searches do not span entities nor
do they return the contents of the zone.
XML attributes — Attribute values on XML elements are part of the FAST binary index. They
are not indexed by xPlore.
Wildcards — FAST matches fragments of words in wildcard searches, for example, in the Webtop
one-box search. xPlore matches whole words only. In advanced search with xPlore, you can use
wildcards to search for attributes. For example, run* produces a hit for "run fast" or "running" but
not for "runt" or "prune". You can revert to fragment search in xPlore for both one-box and attribute
search, but performance is slower.
Special characters — Special character lists are configurable. The default in xPlore differs from
FAST when terms such as email addresses or contractions are tokenized. For example, in FAST, an
email address will be split up into separate tokens with the period and @ as boundaries. However, in
xPlore, only the @ will serve as the boundary, since the dot is considered a "context" character.
Architectural overview
xPlore provides query and indexing services that can be integrated into external content sources such
as the Documentum content management system. External content source clients like Webtop or
CenterStage, or custom Documentum DFC clients, can send indexing requests to xPlore.
Each document source is configured as a domain in xPlore. You can set up domains using xPlore
administrator. For Documentum environments, the Documentum index agent creates a domain for
each repository and a default collection within that domain.
Documents are provided in an XML representation to xPlore for indexing through the indexing APIs.
In a Documentum environment, the Documentum index agent prepares an XML representation of
each document to be indexed. The document is assigned to a category, and each category corresponds
to one or more collections as defined in xPlore. To support faceted search in Documentum
repositories, you can define a special type of index called an implicit composite index.
xPlore instances are web application instances that reside on application servers. When an xPlore
instance receives an indexing request, it uses the document category to determine what should be
tokenized and saved to the index. The content is fetched by a local or remote instance of the content
processing service (CPS). CPS detects the primary language and format of a document. CPS then
extracts indexable content from the request stream and parses it into tokens. The tokens are used
for building a full-text index.
xPlore manages the full-text index. An external Apache Lucene full-text index is embedded into
the EMC XML database (xDB). xDB tracks indexing and updates requests, recording the status of
requests and the location of indexed content. xDB provides transactional updates to the Lucene
index. Indexes are still searchable during updates.
When an instance receives a query request, the request is processed on all instances, then the query
results are returned.
xPlore provides a web-based administration console.
Physical architecture
The xPlore index service and search service are deployed as a WAR file to a JBoss application server
that is included in the xPlore installer. xPlore administrator and online help are installed as war files
in the same JBoss application server. The index is stored in the storage location that was selected
during configuration of xPlore.
xPlore creates disk areas for xDB data and redo log, the Lucene index, a temp area, xPlore
configuration and utilities, and index agent content staging. Table 1, page 18 describes how these
areas are used during indexing and search. xPlore runtime files and instances are described in xPlore
instances, page 18. xPlore configuration is described throughout this administration guide.
xPlore instances
An xPlore instance is one deployment of the xPlore WAR file to an application server container. You
can have multiple instances on the same host (vertical scaling), although it is more common to have
one xPlore instance per host (horizontal scaling). You create an instance by running the xPlore
installer. You manage instances in xPlore administrator.
Note: All instances in an xPlore deployment must have their host clocks synchronized to the primary
xPlore instance host.
An instance can be configured to enable one or more of the following features:
• Content processing service (CPS)
• Indexing service
• Search service
• xPlore Administrator (includes analytics, instance, and data management services)
• Spare
A spare instance can be manually activated to take over for a disabled instance. Refer to Managing
spare and failed instances, page 42 for more information.
The first instance that is installed is designated as the primary instance. Secondary instances can be
added after the primary instance has been installed. The primary instance must be installed and
running when you install a secondary instance.
xDB libraries
xDB is a database that enables high-speed storage and manipulation of many XML documents.
An xDB library has a hierarchical structure similar to an OS directory. The library is a logical
container for other libraries or XML documents. The library corresponds to a collection in xPlore
with additional metadata such as category, usage, and properties. An xDB library stores an xPlore
collection as a Lucene index, optionally including the XML content that is indexed. xPlore manages
the indexes on the collection.
xDB manages the following libraries for xPlore:
• The root library contains a SystemData library with metrics and audit databases. These databases record
metrics and audit queries by xPlore instance.
• Each domain contains an xDB tracking library (database) that records the content that has been
indexed.
• Each domain contains a status library (database) that reports indexing status for the domain.
• Each domain contains one or more data libraries. The default library is the first that is created for
a domain.
When xPlore processes an XML representation of an input document and supplies tokens to xDB,
xDB stores them into a Lucene index. Optionally, xPlore can be configured to store the content
along with the tokens. A tracking database in xDB manages deletes and updates to the index. For
Documentum, this means that when documents are updated or deleted, changes to the index are
propagated. When xPlore supplies XQuery expressions to xDB, xDB passes them to the Lucene index.
xDB tracks the location of documents in order to query the correct index. xDB also manages parallel
dispatching of queries to more than one Lucene index. For example, if you have set up multiple
collections on different storage locations, the query is processed in parallel rather than sequentially.
xDB and the Lucene index are diagrammed in Figure 1, page 20.
An xDB library is stored on a data store. If you install more than one instance of xPlore, the storage
locations should be accessible by all instances. The xDB data stores and indexes can reside on a
separate data store, SAN or NAS. The locations are configurable in xPlore administrator. If you do
not have heavy performance requirements, xDB and the indexes can reside on the same data store.
Indexes
You can configure none, one, or multiple indexes on a collection. An explicit index can be created
based on values of XML elements, paths within the XML document, path-value combination, or
full-text content. For example, the following is a value-indexed field:
/dmftdoc[dmftmetadata//object_name="foo"]
The following is a tokenized, full-text field:
/dmftdoc[dmftmetadata//object_name ftcontains 'foo']
xPlore manages an implicit index. xDB performs the index management within xPlore and provides
support for more search capabilities than standard Lucene index searches. Indexes can be compressed
to enhance performance.
Indexes are defined and configured in indexserverconfig.xml. (This file is located in
dsearch_home/config on the primary instance. Stop all xPlore instances to edit this file. Validate your
changes using the validation tool described in Modifying indexserverconfig.xml, page 36.) Back up
the xPlore federation after you change this file.
Table 2, page 21 describes the function of Lucene directories and their files on the file system.
Logical architecture
A domain contains indexes for one or more categories of documents. A category is logically
represented as a collection. Each collection contains indexes on the content and on metadata for
which indexes have been defined. When a document is indexed, it is assigned to a category or
class of documents. The category can have one or more collections, with various kinds of indexes
defined on these collections.
The Documentum index agent creates a domain for the repository to which it connects. This domain
receives indexing requests from the Documentum index agent.
Categories — A category defines how a class of documents is indexed. All documents submitted for
ingestion must be in XML format. (For example, the Documentum index agent prepares an XML
version for Documentum repository indexing.) The category is defined in indexserverconfig.xml and
managed by xPlore. A category definition specifies the processing and semantics that are applied to an
ingested XML document. You can specify the XML elements that are used for language identification.
You can specify the elements that have compression, text extraction, tokenization, and storage of
tokens. You also specify the indexes that are defined on the category and the XML elements that are
not indexed. A category can map to more than one collection.
Using xPlore Administrator, you can define a collection and its category, back up the collection, and
change binding and state. If a collection has been configured to store XML tokens, the collection index
can be rebuilt without reingestion.
The metrics and audit systems use collections in a domain named SystemData. You can view this
domain and its collections in xPlore administrator. One metrics database and one audit database are defined. Each
database has a subcollection for each xPlore instance.
Example — A document is submitted for indexing. The client indexing application, for example,
Documentum index agent, has not specified the target collection for the document. If the document
exists, the index service updates the document. If it is a new document, the document is assigned
to an instance in round-robin order. If that instance has more than one
collection, collection routing is applied. If collection routing is not supplied by a client routing
class, the document is assigned to a collection in round-robin order.
Figure 5, page 25 shows the database structure for the two example instances.
• The entire xPlore federation library is stored in the xDB root library.
• One content source (Documentum repository A) is mapped to a domain library. The library is
stored in a defined storage area on either instance.
• A second repository, Repository B, has its own domain.
• All xPlore domains share the system metrics and audit databases (SystemData library in xDB with
libraries MetricsDB and AuditDB). The metrics and audit databases have a subcollection for
each xPlore instance.
• The ApplicationInfo library contains Documentum ACL and group collections for a specific
domain (repository).
• The SystemInfo library has two subcollections: TrackingDB and StatusDB. Each collection in
TrackingDB matches a collection in Data and is bound to the same instance as that data collection.
There is a subcollection in StatusDB for each xPlore instance. The instance-specific subcollection
has a file status.xml that contains processing information for objects that are being processed
by the instance.
• The Data collection has a default subcollection.
Documentum categories — A document category defines the characteristics of XML documents that
belong to that category and their processing. All documents are sent to a specific index based on
the document category. For example, xPlore pre-defines a category called dftxml that defines the
indexes. All Documentum indexable content and metadata are sent to this category. If your custom
types need special configuration and a separate index, create custom categories for them.
The following Documentum categories are defined within the <domain> element in
indexserverconfig.xml, which is located in dsearch_home/config. Shut down all xPlore instances
before changing this file. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
• dftxml
XML representation of object metadata and content for full text indexing
• acl
ACLs that are defined in the repository are indexed so that security can be evaluated in the full-text
engine. Refer to Documentum search results security, page 47 for more information.
• group
Groups defined in the repository are indexed to evaluate security in the full-text engine.
For more information on categories, refer to Categories, page 22.
Attributes of dm_fulltext_collection include:
• physical_indexes (ID): List of object IDs of indexes within the collection.
• mode (integer): Allowed operation mode. The valid value 0 permits read/write (index and search). The default is 0.
• index_root_location (string(255)): Name of a dm_location object.
• r_object_count (double): Number of objects in the collection.
• r_partition_name (string(256)): Name of the partition for the collection.
Enabling indexing for an object type — Queue items for indexing are generated by events
in dmi_registry for the user dm_fulltext_index_user. The following events are registered for
dm_fulltext_index_user to generate indexing events by default:
• dm_sysobject: dm_save, dm_checkin, dm_destroy, dm_saveasnew, dm_move_content
• dm_acl: dm_save, dm_destroy, dm_saveasnew
• dm_group: dm_save, dm_destroy
Use Documentum Administrator to change the fulltext registration for an object type. Select the type,
view the properties, and for the property Enable indexing, check Register for indexing. To change
specific events that are registered for fulltext, you must use the DFC API registerEvent().
Note: The type must be dm_sysobject or its subtype.
Reindexing — The index agent does not recreate all the queue items for reindexing. Instead, it
creates a watermark queue item (type dm_ftwatermark) to indicate the progress of reindexing. It
picks up all the objects for indexing in batches by running a query. The index agent updates the
watermark as it completes each batch. When the reindexing is completed, the watermark queue item
is updated to 'done' status.
You can submit for reindexing one or all documents that failed indexing. In Documentum
Administrator, open Indexing Management > Index Queue. Choose Tools > Resubmit all failed
queue items, or select a queue item and choose Tools > Resubmit queue item.
1. Client application submits a DQL query to the Documentum Content Server. If the client
application uses DFC to create the query, DFC translates the query into XQuery syntax.
2. Otherwise, Content Server transmits the DQL query to the query plugin, which translates the query into XQuery
syntax.
3. The query plugin transmits batches of HTTP messages containing XQuery statements to the
xPlore search service.
4. CPS identifies the primary language of the query, tokenizes it, and passes it to xDB. xDB then
breaks the query into XQuery clauses for full-text (using ftcontains) and metadata (using value
constraints). The query is executed in the Lucene index. The query is executed against all
collections unless a collection is specified in the query.
5. xDB applies the security filter. If configured, the Documentum security filter applies ACL and
group permissions to results.
6. The results are returned in batches, with summary, highlighting, and facets.
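As a simplified illustration of steps 1 and 2, consider a DQL full-text query and a sketch of the XQuery form it might be translated into (the actual generated XQuery contains additional clauses, for example for security, summaries, and facets):
SELECT r_object_id FROM dm_document SEARCH DOCUMENT CONTAINS 'foo'
The translated expression is similar to:
/dmftdoc[. ftcontains 'foo']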
Most system administration tasks are available in xPlore administrator. When you open xPlore
administrator, you see the navigation tree and the system overview page. You can open
administration pages for system-wide services, instance-specific services, data management, and
diagnostics and troubleshooting.
When you open a service page, such as indexing service, the actions apply to all indexing services
in the xPlore installation. To change the indexing service configuration for a specific instance, open
the instance in the navigation tree and then choose the service.
For information on system troubleshooting, refer to Troubleshooting system problems, page 118.
The following topics describe system management:
• Using xPlore administrator, page 32
• Global configuration, page 33
• Tasks outside xPlore administrator, page 33
• Managing disk space, page 35
• Using the xDB admin tool, page 36
• Modifying indexserverconfig.xml, page 36
• Displaying and configuring the system, page 37
• Configuring system metrics, page 38
• Starting and stopping the system, page 38
• Managing the status database, page 38
• Managing domains, page 39
• Managing instances, page 40
• Managing spare and failed instances, page 42
• Using the watchdog service, page 45
For information on backup and restore, refer to Chapter 8, Backup and Restore.
• host: DNS name of the computer on which the xPlore primary instance is installed.
• port: xPlore primary instance port (default: 9300).
• password: xPlore administrator password that was used during installation of the primary
instance.
2. Specify values in these fields and click OK.
Viewing services
Expand Services in xPlore administrator.
• Click Indexing Service to view all indexing service instances in the xPlore federation.
For information on configuring the indexing service for a specific instance, expand Instances >
Instance_name > Indexing Service.
• Click Search Service to view all search service instances in the xPlore federation.
For information on configuring the search service for a specific instance, expand Instances
> Instance_name > Search Service.
• Click Content Processing Service to view all CPS instances in the xPlore federation.
For information on configuring CPS for a specific instance, expand Instances > Instance_name >
Content Processing Service.
• Click Logging to configure system-wide logging.
For information on configuring logging for a specific instance, expand Instances > Instance_name >
Logging.
• Click Tracing to configure system-wide tracing.
For information on configuring tracing for a specific instance, expand Instances > Instance_name
> Tracing.
Global configuration
Click Global Configuration to configure the following system-wide settings:
• Storage management
Managing storage locations, page 87
• Index service configuration
Document processing and indexing service settings, page 175
• Search service configuration
Search service settings, page 177
• Logging configuration
Logging, page 142
Indexing tasks in the Documentum environment — The following index agent tasks are performed
outside xPlore administrator.
• Limit content size for indexing (refer to Configuring the index agent, page 53.)
• Exclude ACL and group attributes from indexing (refer to Configuring the index agent, page 53.)
• Map file stores in shared directories (refer to Mapping file stores and content, page 58).
• Install additional index agents (refer to Setting up index agents for ACLs and groups, page 54).
• Map partitions to specific collections (refer to Mapping Content Server storage areas to collections,
page 60).
• Verify index agent migration (refer to Verifying index migration with ftintegrity, page 122).
• Customize indexing and query routing, filter object types, and inject metadata (refer to
Documentum xPlore Development Guide).
Search tasks in the Documentum environment — The following search configuration tasks are
performed outside xPlore administrator.
• Turn off xPlore native security (refer to Documentum search results security, page 47).
• Make types and attributes searchable (refer to Making types and attributes searchable, page 107).
• Turn off XQuery generation to support certain DQL operations (refer to Disabling XQuery
generation by DFC or DFS, page 108).
• Configure search for fragments, wildcards, and like terms (refer to Configuring search for
fragments, wildcards, and like terms, page 113).
• Route a query to a specific collection (refer to Enabling query routing in DFC, page 112).
• Turn on tracing for the Documentum query plugin (refer to Tracing Documentum queries, page
114).
• Customize facets and queries (refer to "Documentum customizations" in Documentum xPlore
Development Guide).
Caution: Do not use xhadmin to rebuild an index or change files that are used by xPlore. This
tool is not aware of xPlore configuration settings in indexserverconfig.xml.
After login, you see the tree in the left pane, which shows segments, users, groups, and libraries:
You can expand the root library to find a library and a collection of interest, then highlight a particular
indexed document to see its XML rendition. To query a library or collection, use the search icon at
the top of the admin client. The query window has tabs to show the results tree, debug the query,
and optimize the query.
Modifying indexserverconfig.xml
Some tasks are not available in xPlore administrator. These rarely-needed tasks require manual
editing of indexserverconfig.xml. This file is located in dsearch_home/config. Stop all instances in the
xPlore system before modifying this file.
Validate your changes using the tool validateConfigurationFile.bat or validateConfigurationFile.sh.
This tool is located in dsearch_home/dsearch/xhive/admin on the primary xPlore instance. From the
command line, type the following. Substitute your path to indexserverconfig.xml.
validateConfigurationFile.bat path_to_config_file
For example:
validateConfigurationFile.bat C:\xPlore\config\indexserverconfig.xml
Caution: Make your changes to this file using an XML editor. Changes must be encoded
in UTF-8. A simple text editor such as Notepad may insert characters using the native OS
encoding, causing validation to fail.
Note: Up-to-date metrics are available after an interval of the wait timeout plus 60 seconds. For
example, with a wait-timeout of 10 seconds, the latest metrics are available 70 seconds later. If
wait-timeout is too small, frequent writes to the metrics service database may affect xPlore
performance.
4. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36. Back up the xPlore federation after you change this file. The statusdb-cache-size property
can be configured for each instance. In the following example, the cache size is set to 1000 instead
of the default 10000 bytes:
<node ...>
<properties>
<property value="1000" name="statusdb-cache-size"/>
</properties>
</node>
To conserve disk space on the primary host, you can purge the status database when the xPlore
primary instance starts up. By default, the status DB is not purged. To change this property, edit
indexserverconfig.xml. (This file is located in dsearch_home/config. Shut down the xPlore instance
before applying your changes. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.) Back up the xPlore federation after you change this file. Set the value
of the purge-statusdb-on-startup attribute on the index-server-configuration element to true.
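As a sketch based on the description above (surrounding attributes elided), the setting would look like:

```xml
<index-server-configuration ... purge-statusdb-on-startup="true">
  ...
</index-server-configuration>
```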
Managing domains
A domain is a separate, independent, logical, or structural grouping of collections. Domains are
managed through the Data Management screen in xPlore administrator. The Documentum index
agent creates a domain for the repository to which it connects. This domain receives indexing
requests from the repository.
To delete a domain, you must remove the domain library using the xDB admin tool.
When you select a domain, you can create a collection, configure the domain, or run an XQuery
in the domain.
Execute XQuery
You can query a domain or collection with Execute XQuery in xPlore administrator. Enter your
XQuery expression in the input area. The options get query plan and get optimizer debug provide
information to EMC technical support.
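For example, a simple query against the dftxml structure used elsewhere in this guide could be entered here (the search term is a placeholder):

```xquery
for $i in /dmftdoc[. ftcontains 'invoice']
return $i/dmftmetadata//object_name
```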
Create a domain
To create a domain, select Data Management in the left panel and then click New Domain in the right
panel. Choose a default document category. (Categories are specified in indexserverconfig.xml.)
Choose a storage location from the dropdown list. (To create a new storage location, refer to
Managing storage locations, page 87.)
Use a custom routing class to route documents to a domain that you have created. The Documentum
index agent creates a domain for each repository source and routes documents to the domain
collection using a routing class. For information on custom routing classes, refer to Documentum
xPlore Development Guide.
Configure a domain
To configure a domain, select the domain in the left panel and then click Configuration. The
document category and storage location are displayed (read-only). You can set the runtime mode
as normal (default) or maintenance (for a corrupt domain). The mode does not persist across xPlore
sessions; it reverts to normal when xPlore restarts.
For more information on maintenance mode, refer to Corrupt domain, page 91.
Managing instances
An xPlore instance is a web application instance that resides on an application server. In xPlore
administrator, click Instances to see a list of instances in the right content pane. You manage an
instance by selecting the instance in the left panel and then selecting the desired operation.
When you select an instance in xPlore administrator, the following instance information is displayed:
• OS information: Host name, status, OS, and architecture
• JVM information: Version, active thread count, and number of classes loaded.
• xPlore information: Instance version, instance type, and state
Services for the instance are accessible on the left: Indexing, search, CPS, logging, and tracing.
Collections that are bound to the instance are listed on the right. Click on a collection to go to the
Data Management view of the collection.
The application server instance name for each xPlore instance is recorded in indexserverconfig.xml.
If you change the name of the JBoss instance, you must change the value of the attribute
appserver-instance-name on the node element for that instance. This attribute is used for registering
and unregistering instances. Back up the xPlore federation after you change this file.
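For example, a node element carrying this attribute might look like the following sketch (the instance and host names are hypothetical):

```xml
<node name="node2" hostname="myhost" appserver-instance-name="DctmServer_DSearch2">
  ...
</node>
```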
Note: All instances in an xPlore deployment must have their host clocks synchronized to the primary
xPlore instance host.
Configure an instance
You select the storage location for an instance when you configure xPlore for the instance.
You can configure the indexing service, search service, or content processing service for a secondary
instance. Stop the instance before changing its configuration. Select an instance in xPlore
administrator and then click Stop Instance.
To configure the indexing service, search service, or CPS for an instance, click the appropriate icon
in the left panel.
Note: You cannot configure the primary instance after you stop it. You must configure it manually.
Configure the primary instance — You can set the following attributes on the primary instance
element in indexserverconfig.xml, which is located in dsearch_home/config. Shut down the xPlore
instance before applying your changes. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
• xdb-listener-port.
By default, the xDB listener port is set during xPlore installation.
<node name="primary" hostname="localhost" xdb-listener-port="9330">
...
</node>
• primaryNode attribute:
Set to true
• admin-rmi-port
Specify the port at which other instances connect to xPlore administrator. By default, this value is
set to the port number of the JBoss connector + 31. Default: 9331
• url
Specify the URL of the primary instance, used to set connections with additional instances.
Caution: Do not change the value of application-instance-name. This is the name of the
instance web application.
4. Locate the spare node element in indexserverconfig.xml. (The status attribute is set to spare.)
• Set the status to normal.
• Change the value of the primaryNode attribute to true.
• Change the value of the name attribute to the name of your previous primary instance, for
example, PrimaryDsearch.
Caution: You cannot replace a primary instance with a different name. Do not change
the value of the appserver-instance-name of the primary node in indexserverconfig.xml.
• Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36.
5. Edit indexserver-bootstrap.properties in the web application for the new primary instance, for
example, dsearch_home/jboss4.3.0/server/DctmServer_Spare/deploy/dsearch.war/WEB-INF/
classes. Change the value of the node-name property to PrimaryDsearch.
6. Change the xDB properties in xdb.properties. This file is in the directory WEB-INF/classes of the
new primary instance. Change the entries to match your new primary instance, for example:
XHIVE_BOOTSTRAP=xhive://Config8518VM0:9430
...
XHIVE_FEDERATION=C:/xPlore/config/XhiveDatabase.bootstrap
...
XHIVE_SERVER_PORT=9430
7. Edit xdb.properties in all other xPlore instances to reference the new primary instance.
8. Start the xPlore primary instance, then start the secondary instances.
9. Back up the federation.
10. Update all clients, such as xDB admin tool, index agent, and query plugin, to point to the new
primary instance name.
• xDB admin tool
Edit xh_runner.bat (Windows) or xh_runner.sh (Linux) in dsearch_home/xhive/admin. Your
new values must match those in indexserverconfig.xml for the new primary instance.
— Change the path for XHIVE_HOME to the path to the new primary instance web
application.
— Change the host name in XHIVE_BOOTSTRAP=xhive:// to match the hostname attribute
for the new instance (in indexserverconfig.xml). Change the port to match the value of the
attribute xdb-listener-port on the new instance. For example:
set XHIVE_BOOTSTRAP=xhive://NewHost:9430
— To set the host name, enter your new host name at the SET command line:
retrieve,c,dm_ftengine_config
set,c,l,param_value[3]
SET>new_hostname
save,c,l
• Update the environment variable DSS_INSTANCE on the new primary instance to point to
the path of the new instance. This environment variable is used by the restore scripts. (For
more information about the scripts, refer to Scripted backup and restore utilities, page 96.) For
example:
dsearch_home/jboss4.3.0/server/DctmServer_Spare/deploy/dsearch.war/WEB-INF
To turn off the watchdog service — On Windows hosts, stop the watchdog service: Documentum
Search Services Watchdog. On UNIX and Linux hosts, run the script stopWatchdog.sh in
dsearch_home/watchdog. If you run a stop script, run as the same administrator user who started
the instance.
To restart the watchdog service — On Windows hosts, start the watchdog service: Documentum
Search Services Watchdog. On UNIX and Linux hosts, run the script startWatchdog.sh in
dsearch_home/watchdog.
xPlore does not have a security subsystem. Anyone with access to the xPlore host port can connect to
it. You must secure the xPlore environment using network security components such as a firewall
and restriction of network access. Secure the xPlore administrator port and open it only to specific
client hosts.
4. If necessary, change the Groups-in cache cleanup interval by adding a property to the
security-filter-class properties. The default is 7200 sec (2 hours).
<property name="groupcache-clean-interval" value="7200"/>
5. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36. Back up the xPlore federation after you change this file.
Troubleshooting security
The following topics describe troubleshooting of Documentum security in search results.
For example:
<message Total not-in-groups cache hits="0" Number of matching group probes="0"
Total ACL cache hits="0" Number of ACL index probes="0" Total groups-in cache
hits="0" Total values from data page="6" Total values from index keys="0"
Number of group probes="3" Minimum permit level="2" Filter output="2"
Filter input="2"><![CDATA[]]></message>
Verify that the ACL IDs are registered for the events dm_save, dm_destroy, dm_saveasnew and the
group IDs are registered for the events dm_save and dm_destroy, for example:
?,c,select registered_id,event from dmi_registry where user_name='dm_fulltext_index_user'
• Make sure counter.xml has not been deleted from the collection domain_name/Data/
ApplicationInfo/group. If it has, restart xPlore.
• Try the query with Content Server security turned on. (Refer to To turn off security filtering in
the xPlore server, page 47.)
• Summary may be blank if the summary security mode is set to BROWSE. (Refer to Configuring
results summary security, page 49.)
The following topics describe Documentum indexing functionality and tasks in the xPlore server. For
information on troubleshooting, refer to Troubleshooting the Documentum index agent, page 120.
For information on creating custom indexes, refer to "Creating custom indexes" in Documentum
xPlore Development Guide.
The index agent configuration file, indexagent.xml, is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
For descriptions of the settings, refer to Documentum index agent parameters, page 171.
Note: If you change parameters in indexagent.xml, stop and restart the index agent for the
parameters to take effect.
Limit content size for indexing — You can set a maximum size for content that is indexed. This is
the actual document size, not the size of the text within the content. To set the maximum content
size, edit the contentSizeLimit parameter within the parent element exporter. The value is in bytes.
Default: 20MB.
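As a sketch (the exact placement within the exporter element of indexagent.xml is an assumption based on the description above), a 20 MB limit expressed in bytes would look like:

```xml
<exporter>
  ...
  <!-- 20971520 bytes = 20 MB, the default -->
  <contentSizeLimit>20971520</contentSizeLimit>
  ...
</exporter>
```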
Exclude ACL and group attributes from indexing — By default, all attributes of ACLs
and groups are indexed. You can specify that certain attributes of ACLs and groups are not
indexed. Add an acl_exclusion_list and group_exclusion_list element to the parent element
indexer_plugin_config/generic_indexer.parameter_list. These elements are described in Table 22,
page 171.
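A sketch of the exclusion elements, assuming the parent structure described above (the attribute names shown are illustrative only):

```xml
<indexer_plugin_config>
  <generic_indexer>
    <parameter_list>
      ...
      <!-- illustrative attribute names; see Table 22, page 171 -->
      <acl_exclusion_list>r_accessor_permit</acl_exclusion_list>
      <group_exclusion_list>users_names</group_exclusion_list>
    </parameter_list>
  </generic_indexer>
</indexer_plugin_config>
```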
Change the local content storage location — When you configured the index agent,
you selected a local content temporary staging location. You can change this location
by editing the local_content_area element in indexagent.xml. This file is located in
dsearch_home/jboss4.3.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
Restart the index agent web application after editing this file.
Caution: For multi-instance xPlore, the temporary staging area for the index agent must be
accessible from all xPlore instances.
<parameter_value>aclgroup</parameter_value>
</parameter>
</parameter_list>
</generic_indexer>
</indexer_plugin_config>
In the indexagent.xml for sysobjects (the original index agent), add a similar parameter set. Set the
value of parameter_name to index_type_mode, and set the value of parameter_value to sysobject. Restart
both index agents. (To restart, navigate to dsearch_home/jboss4.3.0/server. Run stopIndexagent.cmd and
stopIndexagent2.cmd, then run startIndexagent.cmd and startIndexagent2.cmd.)
4. To configure excluded folders, type a comma-delimited list of folder paths for the key
FoldersToExclude. By default, temp and system folders Jobs and Reports are excluded.
5. Save the file and restart the index agent application server.
Note: Documents indexed before the filters are installed are not filtered.
Testing whether the filters are installed — Use the following DQL statement. If the filters are
installed, a list of object IDs and names of the filters is returned:
select r_object_id,object_name from dmc_module where any a_interfaces='com.documentum.fc.indexagent.IDfCustomIndexFilter'
You can verify that the filters are loaded by the index agent in the index agent log, which is located in
the logs subdirectory of the index agent deployment directory in the JBoss application server. The
following example from the log shows that the FoldersToExclude filter was loaded:
2010-06-09 10:49:14,693 INFO FileConfigReader [http-0.0.0.0-9820-1] Filter FoldersToExclude Value: /System/Sysadmin/Reports, /System/Sysadmin/Jobs
Invoking the filters in ftintegrity and stateofindex — To invoke the index agent filters when you
run ftintegrity, follow the instructions in Verifying index migration with ftintegrity, page 122. To
invoke the filters when you run the stateofindex job, refer to Running the state of the index job, page
60. Both scripts generate a file ObjectId-filtered-out.txt that records all IDs of filtered-out objects.
To remove from the index documents that have already been indexed, refer to Removing entries
from the index, page 58.
There are two configuration options for mapping file stores. In one, the file system paths to the
content are identical on the Content Server host and the xPlore index server host. In the other,
the paths are different.
Note: You cannot map the remote components of a distributed store, because content is moved to
the primary site for indexing. You also cannot map contents of a turbo storage area, an encrypted
store, or an external store.
Note: You must update the file_system_path attribute of the dm_location object in the
repository to match this local_mount value, and then restart the Content Server.
3. Save indexagent.xml and restart the index agent. (The application server containing the index
agent must be running.)
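Following the iAPI pattern shown elsewhere in this guide, updating file_system_path on the dm_location object (as required by the note above) might look like this sketch; the object name is a placeholder, and the path must match your local_mount value:

```
retrieve,c,dm_location where object_name='content_storage_01_location'
set,c,l,file_system_path
SET>\\192.168.195.129\DCTM\data\ftwinora\content_storage_01
save,c,l
```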
Tip: For better performance, you can mount the content storage to the xPlore index server host and
set all_filestores_local to true. Create a local file store map as shown in the following example:
<all_filestores_local>true</all_filestores_local>
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>\\192.168.195.129\DCTM\data\ftwinora\content_storage_01</local_mount>
</local_filestore>
</local_filestore_map>
Argument Description
-batchsize value Number of objects to be retrieved from the index in each batch.
Default: False. Cannot use this argument with the ftintegrity script.
-StartDate Local start date of sysobject r_modify_date, for range comparison.
Format: MM/dd/yyyy HH:mm:ss
-timeout Number of minutes to time out the session. Default: 1.
-usefilter value Invokes a custom filter. xPlore filters are not invoked. (For xPlore
filters, refer to Using the index agent filters, page 56.) The
default value is F.
Reports from state of the index job — The job generates a job report, FTStateofIndexDoc.txt, and
four results files. The FTStateofIndexDoc.txt contains information about the job execution, like the job
reports generated by other administration jobs. The four results files are:
• ObjectId-common-version-match.txt
This file contains the object IDs and i_vstamp values of all objects that are in both the index and
the repository and have identical i_vstamp values in both places.
• ObjectId-common-version-mismatch.txt
This file records all objects in the index and the repository with identical object IDs but
nonmatching i_vstamp values. For each object, it records the object ID, i_vstamp value in the
repository, and i_vstamp value in the index.
• ObjectId-dctmOnly.txt
This report contains the object IDs and i_vstamp values of objects in the repository but not in
the index.
• ObjectId-indexOnly.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
The report and result files are in %DOCUMENTUM%\dba\log\sessionID\sysadmin
($DOCUMENTUM/dba/log/sessionID/sysadmin).
Note: You can also use ftintegrity to check the consistency between the repository and the xPlore
index. (Refer to Verifying index migration with ftintegrity, page 122.) To disable the FTStateOfIndex
job, enter the following using iAPI in Documentum Administrator:
Iapi>retrieve,c,dm_job where object_name='dm_FTStateOfIndex'
Iapi>set,c,l,is_inactive
SET>T
Iapi>save,c,l
Caution: The remote instance must be on the same operating system as other xPlore instances.
For example:
http://DR:8080/services
In this same screen, specify whether the CPS instance will be used to process indexing requests
(the index option), search requests (the search option), or both (the all option).
2. Start the CPS instance using the start script startCPS.bat or startCPS.sh in dsearch_home/jboss4.3.0/
server. (On Windows, the standalone instance is installed as an automatic service.)
3. Test the remote CPS service using the WSDL testing page, with the following syntax:
http://hostname:port/services/cps/ContentProcessingService?wsdl
Note: When you install CPS on a host remote from the xPlore indexing server, make sure the location
specified in export_path in cps configuration.xml is accessible by xPlore.
You can configure some of the CPS settings in xPlore administrator. Click Configuration. For more
information, refer to Content processing instance settings, page 173.
The default settings have been optimized for most environments. You may require technical support
to evaluate the effects of changes to these settings. For a description of these settings, refer to Content
processing instance settings, page 173.
White space
Word separation is first identified by white space such as a space separator or line feed. Subsequently,
special characters are substituted with white space. Refer to Special characters, page 69.
For Asian languages, white space is not used. Content is tokenized by entity recognition and logical
fragments.
Lemmatization
Lemmatization is a normalization process that reduces a word to its canonical form. For example, a
word like books is normalized into book by removing the plural marker. Am, are, and is are normalized
to “be.” This behavior contrasts with stemming, a different normalization process in which stemmed
words are reduced to a string that sometimes is not a valid word. For example, ponies becomes poni.
xPlore uses an indexing analyzer that performs lemmatization. Studies have found that some form of
stemming or lemmatization is almost always helpful in search.
Lemmatization is applied to indexed documents and to queries. Lemmatization analyzes a word
for its context (part of speech), and the canonical form of a word (lemma) is indexed. The extracted
lemmas are actual words.
Note: Two forms of the same word may not be lemmatized to the same canonical form. For example,
“singing” is lemmatized to the noun form “singing,” and “sing” is lemmatized to the verb form
“sing.” A search on “singing” without context will not find content containing “sing.”
Lemmatization saves both the indexed term and its canonical form in the index, effectively doubling
the size of the index.
Disabling lemmatization — To turn off lemmatization for both indexing and search, add
an enable-lemmatization attribute to the domain element in indexserverconfig.xml. Set the
value to false. (This file is located in dsearch_home/config. Shut down the xPlore instance before
applying your changes. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.) Back up the xPlore federation after you change this file.
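For example (the domain name is hypothetical; other attributes are elided):

```xml
<domain name="myrepo" enable-lemmatization="false">
  ...
</domain>
```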
Configuring lemmatization for specific categories and elements — Lemmatization for specific
categories and elements can be configured in indexserverconfig.xml. Within a category element,
add or edit a linguistic-process element. (You must shut down the xPlore instance before
applying your changes. Validate your changes using the validation tool described in Modifying
indexserverconfig.xml, page 36.) Back up the xPlore federation after you change this file. This
element can specify elements or their attributes that are lemmatized when indexed, as shown in the
following table of child elements. If you do not configure the linguistic-process element, then all
input XML fields will be processed.
Element Description
element-with-name The name attribute on this element specifies the
name of an element that contains lemmatizable
content.
save-tokens-for-summary-processing Child of element-with-name. If this element
exists, the parent element tokens are saved.
They are used in determining a summary or
highlighting. Specify the maximum size of
documents in bytes as the value of the attribute
extract-text-size-less-than. Tokens will not be
saved for larger content. Set the maximum size
of tokens for the element as the value of the
attribute token-size.
element-with-attribute The name attribute on this element specifies the
name of an attribute on an element. The value
attribute contains a value of the attribute. When
the value is matched, the element content is
lemmatized.
element-for-language-identification Specifies an input element that is used by CPS to
identify the language of the document.
Caution: If you wish to apply your lemmatization changes to the existing index, you must
reindex your documents.
In the following example from indexserverconfig.xml, the content of an input element with the
attribute dmfttype with a value of dmstring is lemmatized. An input element with the name dmftcustom
is processed if the extracted text does not exceed 262144 bytes. Several elements are specified for
language identification.
<linguistic-process>
<element-with-attribute name="dmfttype" value="dmstring"/>
<element-with-name name="dmftcustom">
<save-tokens-for-summary-processing extract-text-size-less-than="262144" token-size="65536"/>
</element-with-name>
<element-for-language-identification name="object_name"/>
...
</linguistic-process>
Troubleshooting lemmatization — If a query does not return expected results, examine the
following:
• Test the query phrase or terms for lemmatization and compare to the lemmatization in the context
of the document. (You can test each sample using xPlore administrator Test Tokenization.)
• View the query tokens by setting the dsearch logger level to DEBUG using xPlore administrator.
Expand Services > Logging and click Configuration. Set the log level for dsearchsearch. Tokens are
saved in dsearch.log.
• Check whether some parts of the input were not tokenized because they were excluded from
lemmatization, for example, because the text size exceeds the configured value of the
extract-text-size-less-than attribute.
• Check whether a sub-path excludes the element from search. The sub-path attribute full-text-search
is set to false.
• If you have configured a collection to save tokens, you can view them in the xDB admin tool.
(Refer to Using the xDB admin tool, page 36.) Token files are generated under the Tokens library,
located at the same level as the Data library. You can also view tokens in the stored DFTXML
using xPlore administrator if dynamic summary processing is enabled. (The number of tokens
stored in the DFTXML depends on the configured amount of tokens to save.) Click on a document
in a collection to see the DFTXML. Figure 9, page 68 displays tokens in xPlore administrator:
Configuring a collection to save tokens — To save tokens of metadata and content, set the property
save-tokens to true for the collection. The default is false. (Refer to Modifying indexserverconfig.xml,
page 36 for instructions on modifying indexserverconfig.xml.) For example:
<collection document-category="dftxml" usage="Data" name="default">
<properties>
<property value="true" name="save-tokens" />
</properties>
</collection>
The tokens database stores the original and root forms of the text, the components of compound
words, the starting and ending offsets relative to the field that contains the text, and whether each
token was identified as a stop word.
Special characters
Special characters are used to break text into meaningful tokens. Two types of special characters
are defined in xPlore:
• Characters that are treated as white space
The default special characters are defined in indexserverconfig.xml as the value of the
special-characters attribute on the content-processing-services element:
@#$%^_~`*&;:()-+=<>/\[]{}
White space is substituted for these characters. For example, a phrase extract-text is tokenized as
extract and text, and a search for either term finds the document.
• Characters that are required for context (punctuation)
The default context characters are defined in indexserverconfig.xml as the value of the
context-characters attribute of the content-processing-services element:
!,.;?'"
White space is substituted after the parts of speech have been identified. For example, the email
address john.smith@emc.com contains a special character (@) and two instances of a context
special character ( . ) Because the context special character . is not punctuation in this example,
it is not replaced as white space. The string is tokenized as two tokens: john.smith emc.com
For the phrase “John Smith is working for EMC.” the period is filtered out because it functions
as a context special character (punctuation).
<content-processing-services ...context-characters="!,.;?'&quot;" ...>
These characters are context-sensitive and cannot be used for tokenization until the part of speech
has been identified.
Queries that contain special characters — When a string containing a special character is indexed,
the tokens are stored next to each other in the index. A search for the string is treated as a phrase
search. For example, an index of home_base stores home and base next to each other. A search for
home_base finds the containing document but does not find other documents containing home or
base but not both.
Troubleshooting — If you edit a special characters list, you must reindex all your documents to
apply the new tokenization rules. If a query fails, check to see whether it contains a special character.
Case sensitivity
All characters are stored as lowercase in the index. For example, the phrase “I’m runNiNg iN THE
Rain” is lemmatized and tokenized as “I be run in the rain.”
Case sensitivity is not configurable.
Stop words
Stop words are words that are filtered out before indexing or query tokenization, to save the size of
the index and to prevent searches on common words. The stop words list for each language is located
in dsearch_home/dsearch/cps/cps_daemon/shared libraries/rlp/etc. Some languages do not require a
stop words list. Stop words are removed from phrase searches. This can cause phrase searches that
contain a stop word to fail. For example, a document that contains the phrase “be safe” would not
be found with a search for “be safe,” because the “be” is removed and a null set is intersected with
the documents that contain “safe.”
Editing the stop words list is not supported in this release.
Enabling stop words — Stop words are not enabled in this release. To enable stop words,
set the value of the property filter_stop_word (child of linguistic_processing) to true in the file
InstanceName_local_configuration.xml where InstanceName is the name of the instance in which CPS is
running. The file is located in dsearch_home/dsearch/cps/cps_daemon.
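As a sketch, assuming the property is expressed as a simple element value in InstanceName_local_configuration.xml (the exact element form is an assumption):

```xml
<linguistic_processing>
  ...
  <filter_stop_word>true</filter_stop_word>
  ...
</linguistic_processing>
```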
• If a plus sign follows a period (.+), one or more characters are matched.
• If two comma-separated numbers enclosed by curly braces follow a period, (.{n,m}), a specified
range of characters (at least n characters and no more than m characters) is matched.
To escape a wildcard character in an XQuery statement, prefix it with a backslash. For example, a
query containing hotmail.com would be escaped to hotmail\.com.
Following are sample queries with wildcards.
To match a single word with a wildcard — To match glance with a wildcard, use syntax similar
to the following:
for $i in /dmftdoc[//object_name ftcontains 'g.*nce' with wildcards] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }
To find documents with two words — To match two words in a document, use syntax similar
to the following:
for $i in /dmftdoc[. ftcontains 'corporate' ftand 'profile'] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }
To find documents with a phrase — To match a phrase in a document, use syntax similar to the
following:
for $i in /dmftdoc[. ftcontains {'corporate','profile'} phrase] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }
To find an exact match — To match an exact text in a document, use syntax similar to the following:
for $i in /dmftdoc[//object_name='bugs.xls'] return
{$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier }
Query operators
Operators in XQuery expressions and DQL are interpreted in the following ways:
• XQuery operators
— The value operators = != < > specify a value comparison search. Search terms do not need to be
tokenized. These operators can be used for exact match or range searching on dates and IDs.
Any subpath that can be searched with a value operator should have the value-comparison
attribute set to true in the corresponding subpath configuration in indexserverconfig.xml. For
example, an improper configuration of the r_modify_date attribute sets full-text-search to
true. A date of '2010-04-01T06:55:29' is then tokenized into five tokens: '2010' '04' '01T06' '55' '29'. A
search for '04' returns any document modified in April, giving the user many non-relevant
results. Therefore, r_modify_date should have only value-comparison set to true. The date
attribute is then indexed as one token, and a search for '04' does not hit all documents modified
in April.
— The ftcontains operator (XQFT syntax) specifies that the search term is tokenized before
searching against the index.
Any subpath that can be searched by ftcontains should have the full-text-search attribute set to
true in the corresponding subpath configuration in indexserverconfig.xml.
• DQL operators
All string attributes are searched with the ftcontains operator in XQuery. All other attribute types
use value operators (= != < >).
• In DQL, dates are automatically normalized to UTC representation when translated to XQuery.
With IDfXQuery, it is the application’s responsibility to specify dates in UTC to match the format
in DFTXML.
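The two subpath configurations described above can be sketched as follows. This is a hypothetical fragment of indexserverconfig.xml: the attribute names value-comparison and full-text-search come from the text above, but other attributes of the sub-path element are omitted and the paths are examples only:

```xml
<!-- Sketch only. r_modify_date is searched with value operators,
     so value-comparison is true and full-text-search is false;
     object_name is searched with ftcontains, so full-text-search is true. -->
<sub-path path="dmftmetadata//r_modify_date"
          value-comparison="true" full-text-search="false"/>
<sub-path path="dmftmetadata//object_name"
          value-comparison="false" full-text-search="true"/>
```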
Language
The language of the content plays a role in how the document is tokenized. During indexing, CPS
identifies the language of the document and uses this information for linguistic analysis. During a
query, the session locale is used as the language for linguistic analysis. If the language identified
during indexing does not match the language used during querying, different tokens can be
generated, resulting in no query results.
How to check the identified language of an indexed document — Use xPlore administrator to view
the DFTXML of a document. (Click the document in the collection view, under Data Management.)
The language is specified in the lang attribute on the dmftcontentref element. For example:
<dmftcontentref content-type="" lang="en" encoding="utf-16le" ...>
How to check the session locale of a query — Look at the xPlore log event that prints the query
string. The event includes the query-locale setting used for the query. For example:
<event timestamp...>
<message >
<![CDATA[QueryID=primary$f20cc611-14bb-41e8-8b37-2a4f1e135c70,query-locale=en,...>
How to change the session locale of a query — The session_locale attribute on a Documentum
session is automatically set based on the OS environment. You can change it per session in DFC or iAPI
in order to search for documents in a different language. Use iAPI to change the session_locale:
set,c,sessionconfig,session_locale
In DFC, use IDfSession.getSessionConfig() to get the session config and call
IDfTypedObject.setString("session_locale", locale) on the session config object.
5. To prevent a word that is also listed in a system dictionary from being decomposed, set
com.basistech.cla.favor_user_dictionary to true.
The indexing service receives batches of requests to index from a custom indexing client. The
index requests are passed to the content processing service, which extracts tokens for indexing and
returns them to the indexing service. You can configure all indexing parameters by choosing Global
Configuration from the System Overview panel in xPlore administrator. You can configure the
same indexing parameters on a per-instance basis by choosing Indexing Service on an instance and
then choosing Configuration.
For information on indexing troubleshooting, refer to Troubleshooting indexing, page 132.
The following topics describe common tasks in the indexing process.
• Indexing scalability, page 75
• Modifying indexes, page 76
• Viewing and configuring indexing metrics, page 80
• Managing indexing in xPlore administrator, page 81
• Chapter 6, Managing Indexing
• Chapter 4, Managing the Index Agent
For information on managing index data, such as collections and categories, libraries and storage
location, refer to Chapter 7, Managing Index Data. For information on configuring indexing
performance, refer to Indexing performance, page 164.
Indexing scalability
For vertical scaling, each indexing operation is implemented using a ThreadPoolExecutor from the
Java 1.5 concurrent thread package. The executor spawns or terminates threads based on the request load.
You can configure the core and maximum thread pool sizes in xPlore administrator.
You can achieve horizontal scalability by adding xPlore instances and binding collections to different
instances.
Modifying indexes
Modify indexes by editing indexserverconfig.xml, which is located in dsearch_home/config. By
default, Documentum content and metadata are indexed. You can tune the indexing configuration
for specific needs. Shut down the xPlore instance before applying your changes. Validate your
changes using the validation tool described in Modifying indexserverconfig.xml, page 36. Back up
the xPlore federation after you change this file. A full-text index can be created as a path-value
index with the FULL_TEXT option.
For information on creating Documentum indexes, refer to "Creating custom indexes" in Documentum
xPlore Development Guide.
Option — Description

do-text-extraction — Contains one or more for-element-with-name elements that define content or
metadata that should be extracted for indexing.

for-element-with-name — Specifies the names of elements that set tokenization and handling of
embedded XML.

for-element-with-name/xml-content — When a document to be indexed contains XML content, you
must specify how that content should be handled. It can be tokenized or not (tokenize="true | false").
It can be stored within the input document or separately (store="embed | separate | none"). Separate
storage is not supported in this release.

for-element-with-name/save-tokens-for-summary-processing — Sets tokenization of content in
specific elements for summaries, for example, dmftcontentref (content of a Documentum document).
Specify the maximum size of documents in bytes as the value of the attribute
extract-text-size-less-than. Tokens are not saved for larger content. Set the maximum size of tokens
for the element as the value of the attribute token-size.

xml-content on-embed-error — Specifies how to handle parsing errors, such as syntax errors or
external entity access, when XML content is embedded. Valid values: embed_as_cdata | ignore | fail.
The option embed_as_cdata stores the entire XML content as a CDATA sub-node of the specified
node. The ignore option does not store the XML content. With the fail option, the content is not
searchable.

xml-content index-as-sub-path — Boolean parameter that specifies whether the path is stored with
XML content when the xml-content embed attribute is set to true.

xml-content file-limit — Sets the maximum size of embedded XML content.

compress — Compresses the text value of specified elements to save storage space. Compressed
content is about 30% of the size of the submitted XML content. Compression may slow the ingestion
rate by 10-20%.

compress/for-element — Using XPath notation, specifies the XML node of the input document that
contains text values to be compressed.
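The options above can be combined into a single text-extraction configuration. The following is an illustrative sketch only: the nesting and attribute placement are inferred from the option paths in the table, and all values are examples rather than defaults:

```xml
<!-- Sketch of a do-text-extraction configuration; values are illustrative. -->
<do-text-extraction>
  <for-element-with-name name="dmftcontentref">
    <!-- Save tokens for summary processing for content up to ~10 MB,
         with a 64K token window -->
    <save-tokens-for-summary-processing
        extract-text-size-less-than="10485760" token-size="65536"/>
    <!-- Tokenize embedded XML content and store it in the input document;
         fall back to CDATA storage on parsing errors -->
    <xml-content tokenize="true" store="embed" on-embed-error="embed_as_cdata"/>
  </for-element-with-name>
</do-text-extraction>
<!-- Compress the text value of the content node to save storage -->
<compress>
  <for-element path="dmftdoc/dmftcontentref"/>
</compress>
```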
Defining an index
Indexes are configured within an indexes element. (The path is category-definitions.category.indexes.)
Four types of indexes can be configured: fulltext-index, value-index, path-value index, and multi-path
index.
By default, multi-path indexes do not have all content indexed. If an element does not match a
configuration option, it is not indexed. To index all element content in a multi-path index, add a
sub-path element on //*. For example, to index all metadata content, use the path dmftmetadata//*.
The following child elements of node.indexes.index define an index.
Note: If tokenization is excluded for a specific attribute, search term matches for xPlore and FAST
return a different number of results. FAST indexes all attributes.
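A multi-path index that indexes all metadata content, as described above, could be sketched as follows. This is a hypothetical fragment under category-definitions.category.indexes; the index name and the type attribute value are assumptions for illustration:

```xml
<!-- Sketch: a multi-path index with a sub-path on dmftmetadata//*
     so that all metadata element content is indexed. -->
<indexes>
  <index name="metadata-index" type="multi-path">
    <sub-path path="dmftmetadata//*"/>
  </index>
</indexes>
```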
Modifying subpaths
A subpath definition in indexserverconfig.xml specifies the path to an element for which the path
information should be saved with the indexed value. A subpath increases index size while enhancing
performance. For most Documentum applications, you do not need to modify the definitions of the
subpath indexes, except for the following use cases:
• Adding facet values to be stored in the index.
• Adding paths for dmftcustom area elements.
• Adding paths for XQuery of XML content.
• Modifying the capabilities of existing subpaths, such as supporting leading wildcard searches
for certain paths.
For these use cases, refer to Defining an index, page 77.
The interval at which metrics are saved is configurable as the value of the interval attribute. (The
unit is seconds.) The age after which metrics are purged is configured as the value of the
delete-older-than property. By default, metrics older than 90 days are purged.
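An assumed shape for these settings is sketched below. The element nesting is inferred from the attribute and property names in the text above and may not match the actual file exactly; the interval value is an example:

```xml
<!-- Sketch: save metrics every 120 seconds, purge metrics older than 90 days. -->
<metrics interval="120">
  <property name="delete-older-than" value="90"/>
</metrics>
```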
Expand an instance in the tree and choose Indexing Service. Statistics are displayed in the right
panel: tasks completed, with a breakdown by document properties, and performance.
• Configure indexing across all instances.
Expand Services > Indexing Service in the tree. Click Configuration. You can configure the
various options described in Document processing and indexing service settings, page 175. The
default values have been optimized for most environments.
• Start or stop indexing
To start or stop indexing, select an instance in the tree and choose Indexing Service. Click Enable
or Disable.
• View the indexing queue
To view the queue, expand an instance in the tree and choose Indexing Service. The queue is
displayed. You can cancel any indexing batch requests in the queue.
Note: This queue is not the same as the index agent queue. You can view the index agent queue in
the index agent UI or in Documentum administrator.
Configuring categories
A category defines a class of documents and their XML structure. The category is defined in
indexserverconfig.xml and specifies the processing and semantics that are applied to the ingested
XML document. You can specify the XML elements that have text extraction, tokenization, and
storage of tokens. You also specify the indexes that are defined on the category and the XML elements
that are not indexed. More than one collection can map to a category. xPlore manages categories.
Table 10, page 83 describes the options that can be configured for each category. Categories are
defined and configured in indexserverconfig.xml, which is located in dsearch_home/config. Shut
down all xPlore instances before changing this file. Validate your changes using the validation tool
described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you
change this file. The paths in this configuration file are in XPath syntax and refer to the path within
the XML representation of the document. (All documents are submitted for ingestion in an XML
representation.) Specify an XPath value to the element whose content requires text extraction for
indexing.
Option Description
category-definitions Contains one or more category elements.
Option Description
category Contains elements that govern category
indexing.
properties/property track-location Specifies whether to track the location (index
name) of the content in this category. For
Documentum DFTXML representations of
documents, the location is tracked in the tracking
DB. Documentum ACLs and groups are not
tracked because their index location is known.
Managing categories
Categories are defined in indexserverconfig.xml. Refer to Configuring categories, page 83 for
more information. When you create a collection, choose a category from the categories defined in
indexserverconfig.xml.
When you view the configuration of a collection, you see the assigned category. It cannot be changed
in xPlore administrator. To change the category, edit indexserverconfig.xml, which is located in
dsearch_home/config. Shut down all xPlore instances before changing this file. Validate your changes
using the validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
The indexes, text extraction settings, and compression setting for each category are also defined in
indexserverconfig.xml. For information on configuring these settings, refer to Modifying indexes,
page 76.
Adding a collection
Choose a domain and then choose New collection to create a collection. After you have created the
collection, you can change collection state in the Configuration menu. You can set the following
properties for a new collection:
• Collection name
• Parent domain
• Usage: Type of xDB library. Valid types: data (index) or applicationinfo.
• Document category: Categories are defined in indexserverconfig.xml.
• Binding instance: Existing instances are listed.
To change the binding of a collection, refer to Configuring collections, page 86.
• Storage location: Choose a storage location from the dropdown list. To define a storage location,
refer to Managing storage locations, page 87.
Note: There is no default collection for secondary instances of xPlore. Create a collection in the
domain and then bind it to the secondary instance before indexing documents into the secondary
instance.
Deleting a collection
Choose a domain and then click X next to the collection you wish to delete. A collection must have
the state index_and_search or index_only to be deleted. Collections with the state search_only or off_line
cannot be deleted in xPlore administrator. To delete these collections, use the xDB admin tool.
Note: When you remove a collection, the data is not deleted from xDB.
Configuring collections
You can configure the following properties on a collection. Select a collection and then choose
Configuration. The Edit collection screen displays the collection name, parent domain, usage,
state, binding instance, and storage location.
• State: index and search, index only, or search only.
You can attach a collection in search only (read-only) mode to multiple instances for query load
balancing and scalability. You can set a collection to index only to repair the index.
Note: Users and administrators cannot query a collection that is set to index only state.
• Binding instance: Existing xPlore instances are listed. To change binding, first detach the
collection from its current instance. To change the binding on a failed instance, you must first
restore the collection to the same instance or to a spare instance.
Note: You cannot change the binding of a subcollection to a different instance from the parent
collection.
Choose a collection in the Data Management tree. In the right pane, click Detach. Then click
Configuration to change the binding. A collection with the state index_and_search or index_only
can be bound to only one instance. When the collection state is search_only, the collection can be
bound to multiple instances.
To remove a binding, set the state of the collection to search_only. If a binding instance is
unreachable, you cannot edit the binding.
• Storage location
To set up storage locations, refer to Managing storage locations, page 87.
You can perform the following actions on a collection:
• Attach or detach a collection.
A collection can be attached to one instance in index and search state (read-write) and to multiple
instances in search-only (read) state. To move or delete a collection, change the collection state to
search_only and then detach it. Choose a collection in the Data Management tree. In the right
pane, click Attach or Detach.
• Back up collections.
Choose a collection in the Data Management tree. Set the collection state to off_line. In the right
pane, click Backup.
You can specify the backup location path in indexserverconfig.xml. Shut down all xPlore instances
before changing this file. Edit the path attribute of the element admin-config/backup-location-path
with the path to your desired location. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you change this file.
• View a list of documents in a collection.
Choose a collection in the Data Management tree. You can filter the list of indexed documents to
see whether a particular document was indexed. Click Name for an individual document to view
the XML content of the document.
• Query a collection.
Choose a collection in the Data Management tree. In the right pane, click Execute XQuery. Check
Get query debug to debug your query. The query optimizer is for technical support use.
• Restore a collection. Refer to the procedure To restore a collection, page 94.
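The backup location mentioned in the backup step above is set in indexserverconfig.xml, on the path attribute of the admin-config/backup-location-path element. A sketch, with an example path:

```xml
<!-- Sketch of the backup location setting in indexserverconfig.xml;
     the path value is illustrative. -->
<admin-config>
  <backup-location-path path="c:/xPlore/backup"/>
</admin-config>
```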
Adding a storage location — To add a storage location using xPlore administrator, choose System
Overview in the tree. Click Global Configuration and then choose the Storage Management tab.
Click Add Storage. Enter a name and path and save the storage location. The storage location is
created with unlimited size.
After you create a storage location, you can select it when you create a domain or a new collection.
For a collection, you have the option of choosing a storage location different from the storage location
of the domain.
Troubleshooting xDB
If xDB fails to start up, you can force a start. Set the value of force-restart-xdb in
indexserver-bootstrap.properties to true. (This file is located in the WEB-INF/classes directory
of the application server instance, for example, C:\xPlore\jboss4.3.0\server\DctmServer_
PrimaryDsearch\deploy\dsearch.war\WEB-INF\classes.) Restart the xPlore instance.
If this property does not exist in indexserver-bootstrap.properties, add the following line:
force-restart-xdb=true
Caution: If you remove segments from xDB, your backups cannot be restored.
Database performance
To view xDB performance statistics, choose Data Management in the left panel and then choose
View DB Statistics.
You must back up a domain or xPlore federation after you make xPlore environment changes such as
adding or deleting a collection or changing a collection binding. If you do not back up, then a restore
of the domain or xPlore federation will put the system in an inconsistent state. Perform all your
anticipated configuration changes before performing a full federation backup.
Before backup and after restore, perform a database consistency check. Select Data Management
in xPlore Administrator and then choose Check DB Consistency. This check determines whether
there are any corrupted or missing files such as configuration files or Lucene indexes. Lucene
indexes are checked to see whether they are consistent with the xDB records: tree segments, xDB
page owners, and xDB DOM nodes.
The following topics describe backup and restore procedures:
• Backup and high availability configurations, page 89
• Handling data corruption, page 91
• Rebuilding indexes, page 92
• Native xPlore backup and restore, page 92
• Snapshot (volume-based) backup and restore, page 95
• File-based backup and restore, page 96
• Scripted backup and restore utilities, page 96
For more detailed information on planning your backup, recovery, and high availability environment,
refer to Documentum xPlore Deployment Guide.
Using xPlore administrator, you can back up large collections, separating older and newer data
backups.
• Domain
Using xPlore administrator, you can restore the index of a single document source, such as a
Documentum repository, across all xPlore instances.
• Complete xPlore system (federation)
You can back up all xPlore collections using xPlore administrator, volume-based, or file-based
technologies.
All restore operations are performed off-line.
Caution: Before creating a backup, and after a restore operation, run the database consistency
checker (refer to Check database consistency, page 40).
Note: For file-based and volume-based backups, back up the following files on each instance: the
indexserverconfig.xml file, the xDB transaction log files in dsearch_home/dblog, and the database and
index files in dsearch_home/data. Each xPlore instance has one or more domains and a single xDB
transaction log for instance data recovery. Back up each instance in a multi-instance environment to a
single file, then restore the instance from this file.
Caution: If you remove segments from xDB, your backups cannot be restored.
Corrupt collection or database redo log — If a specific collection, or the database redo log, is
reported as corrupted on server startup, you have one of two options:
• Restore the federation, domain, or collection from a previous backup.
• Force the server to start up. The offending collection and its index will be marked as unusable
and its update operations will be ignored. Refer to Troubleshooting xDB, page 87. You can then
restore the corrupted collection or log.
Corrupt domain — If a domain index is corrupt, use xPlore administrator to set the domain mode
to maintenance. (You can also use the CLI dsearch-set-domain-mode or the API setDomainMode to
set the mode to maintenance.) In maintenance mode, the only allowed operations are repair and
consistency check. Queries are allowed only from xPlore administrator. Queries from a Documentum
client will be tried as NOFTDQL in the Content Server but will not be processed by xPlore.
Use xPlore administrator to detach the corrupted domain. To restore the domain, refer to To restore a
domain, page 93. When xPlore is restarted, the domain mode is always set to normal (maintenance
mode is not persisted to disk).
Recovering from a system crash — Figure 11, page 92 diagrams a typical workflow that responds to
a system crash:
Notes:
1. Verifying index migration with ftintegrity, page 122.
2. Refer to Chapter 8, Backup and Restore.
3. Refer to Rebuilding indexes, page 92.
Rebuilding indexes
Indexes for an xPlore federation must be rebuilt by a cleanup and reingestion process. Perform the
following steps to remove and clean up all indexes:
1. Shut down all xPlore instances.
2. Delete everything under dsearch_home/data.
3. Delete everything under dsearch_home/config except the file indexserverconfig.xml.
4. Start xPlore instances.
5. Re-feed the documents.
Scripted backup and restore — The CLI for backup is dsearch-backup (federation, collection,
domain). Refer to Scripted backup and restore utilities, page 96. xPlore supports offline restore
only. The xPlore server must be shut down to restore a collection or an xPlore federation. If you are
restoring a full backup and an incremental backup, perform both restore procedures before restarting
the xPlore instances.
Incremental backups — By default, log files are deleted at each backup. For incremental backups,
change this setting before a full backup using the xDB admin tool. In the menu, choose Federation >
Change keep-log-file option. Enter the xPlore administrator password and check Keep log files.
When you change this setting, the log file from the full backup will not be deleted at the next
incremental backup.
Caution: If you are restoring a full backup and an incremental backup, restore both before
restarting xPlore instances.
If you are restoring a federation and a collection, do the following:
1. Restore the federation.
2. Start up and shut down xPlore.
3. Restore the collection.
4. Restart the xPlore instances.
To restore a domain
This procedure replaces the index data with a backup copy. This procedure assumes that no system
changes (new or deleted collections, changed bindings) have occurred since backup. (Always back up
the xPlore federation after you change the xPlore environment.)
Backup-directory is optional. The default location is specified in indexserverconfig.xml.
1. Force-detach the domain using xPlore administrator. If you are scripting backup and restore,
use the CLI dsearch-force-detach. The type argument is domain.
dsearch-force-detach type hostname port domain-name
2. Generate the orphaned segment list. Use the CLI dsearch-list-orphaned-segments to list the segments
that will be orphaned after a restore operation. If an orphaned segment file is not specified, the
IDs of orphaned segments are written to stdout.
8. Perform a consistency check and test search. Select Data Management in xPlore Administrator
and then choose Check DB Consistency.
9. Set the domain to normal mode using xPlore administrator.
10. (Documentum environment) Run the ACL and group replication script to update any
changes since the backup. The script aclreplication_for_repositoryname.bat or .sh is located in
dsearch_home/setup/indexagent/tools. Edit the script before you run it to set the password and,
optionally, the xPlore domain.
11. (Documentum environment) Run ftintegrity. (Refer to Verifying index migration with ftintegrity,
page 122.)
To restore a collection
This procedure replaces the index data with a backup copy. This procedure assumes that no system
changes (new or deleted collections, changed bindings) have occurred since backup. Back up the
xPlore federation after you change the xPlore environment.
1. Set the collection to off_line using xPlore administrator. Select the collection and click
Configuration.
2. Force-detach the collection using xPlore administrator. If you are scripting backup and restore,
use the CLI dsearch-force-detach. The type argument value is collection.
dsearch-force-detach type hostname port domain-name collection-name
7. Perform a consistency check and test search. Select Data Management in xPlore Administrator
and then choose Check DB Consistency.
8. (Documentum environment) Run the ACL and group replication script to update any
changes since the backup. The script aclreplication_for_repositoryname.bat or .sh is located in
dsearch_home/setup/indexagent/tools. Edit the script before you run it to set the repository
name, repository user, password, xPlore primary instance host, xPlore port, and xPlore domain
(optional).
9. (Documentum environment) Run ftintegrity. (Refer to Verifying index migration with ftintegrity,
page 122.)
2. Set all domains to the read_only state. The script to turn off indexing is described in Turning off
indexing or changing state, page 97.
3. Use your third-party backup software to back up or restore the system.
4. Resume xDB with the following command:
5. Set all domains to the reset state and then turn on indexing. (This state is not displayed anywhere
in xPlore administrator and is used only for the backup and restore utilities.) The script to turn on
indexing is described in Turning off indexing or changing state, page 97.
2. Set all domains to the read_only state. The script to turn off indexing is described in Turning off
indexing or changing state, page 97.
3. Use your third-party backup software to back up or restore the system.
4. Resume xDB with the following command:
XHCommand suspend-diskwrites --resume
If you restore without purging orphaned segments, the xPlore primary instance may not start up.
In this case, you can force an xDB restart. Refer to Troubleshooting xDB, page 87.
5. Set all domains to the reset state. (This state is not displayed anywhere in xPlore administrator
and is used only for the backup and restore utilities.) The script to turn on indexing is described
in Turning off indexing or changing state, page 97.
For example:
dsearch-set-state domain localhost 9300 defaultDomain read_only
dsearch-set-state domain localhost 9300 defaultDomain reset
The syntax to set collection state is the following. Valid states are: index_only, search_only,
index_and_search, or off_line.
dsearch-set-state collection host port domain_name collection_name state
For example:
dsearch-set-state collection localhost 9300 defaultDomain default1 search_only
dsearch-set-state collection localhost 9300 defaultDomain default1 index_and_search
dsearch-set-state collection localhost 9300 defaultDomain default1 index_only
dsearch-set-state collection localhost 9300 defaultDomain default1 off_line
Backup utilities
In the following utilities, backup-directory is optional. The default location is specified in
indexserverconfig.xml.
For example:
dsearch-backup federation localhost 9300 full c:/xPlore/backup
To back up a domain — Turn off indexing by setting the domain state to read_only. (Refer to Turning
off indexing or changing state, page 97.) Then use the dsearch-backup tool with the following syntax.
dsearch-backup domain hostname port domain-name [backup-directory]
To back up a collection — Set the collection state to search_only. (Refer to Turning off indexing or
changing state, page 97.) Then use the dsearch-backup tool with the following syntax (on a single
line).
dsearch-backup collection hostname port domain-name
collection-name [backup-directory]
After restoring, run the CLI dsearch-purge-orphaned-segments to purge those segments. Specify the
absolute path to the file generated by dsearch-list-orphaned-segments as orphaned-segment-file. If an
orphaned segment file is not specified, the orphaned segment IDs are read from stdin.
dsearch-purge-orphaned-segments [orphaned-segment-file]
Restore utilities
Restoring a federation, domain, or collection — For a domain or collection, be sure to purge
orphaned segments before a restore operation. (Refer to Purging orphaned segments, page 98.) Then
follow the steps described in Native xPlore backup and restore, page 92.
Note: If you restore without purging orphaned segments, the xPlore primary instance may not start
up. In this case, you can force an xDB restart. Refer to Troubleshooting xDB, page 87.
The search service receives queries from a search client in the form of XQuery statements. The query
is submitted to the Lucene
You can configure all search service parameters by choosing Global Configuration from the System
Overview panel in xPlore administrator. You can configure the same search service parameters on a
per-instance basis by choosing Search Service on an instance and then choosing Configuration.
Enabling or disabling search on an instance — You can enable or disable search by choosing an
instance of the search service in the left pane of the administrator. Click Disable (or Enable).
Canceling running queries — You can view a list of individual queries and cancel individual
queries. Choose an instance of the search service in the left pane of the administrator.
For information on query troubleshooting, refer to Troubleshooting search, page 136.
The following topics describe search management:
• Configuring search, page 99
• Viewing search statistics, page 100
• Configuring scoring and freshness, page 100
• Configuring query summary and highlighting, page 101
• Auditing queries, page 103
• Documentum Search, page 105
Configuring search
You can configure parameters for the search service in xPlore administrator. The default values have
been optimized for most environments. For details, refer to Search service settings, page 177.
The stop words list for each language is located in
dsearch_home/dsearch/cps/cps_daemon/shared_libraries/rlp/etc. Some languages do not require a
stop words list. Editing the stop words list is not supported in this release.
Boosting metadata in scores — Scores for hits in metadata can be increased by adding a boost-value
attribute to a subpath element. The default boost-value (multiplier) is 1.0. In the following example, a
hit in the keywords metadata doubles the score for a result:
<sub-path returnable="true" boost-value="2.0" path="dmftmetadata/keywords" />
Boosting recent documents — The Documentum attribute r_modify_date is used to boost scores
in results. By default, a freshness boost is applied to the default collection. The multiplier is based
on how recent the document is. To remove this boost, set the property enable-freshness-score to false
on the parent category element. For example:
<category name="dftxml"><properties>
...
<property name="enable-freshness-score" value="false" />
</properties></category>
Configuring the summary length — xPlore returns a summary display window from the
summary computation text. The length of this window is specified as the value of the parameter
query-summary-display-length. This window within the summary text is returned to the client
application. If no search term is found in the summary text, a static summary of the specified length
from the beginning of the text is displayed and no terms are highlighted. The default value of this
parameter is 256. This means that 256 characters surrounding the search terms are returned as the
summary. Configure this value in xPlore administrator: Search Service > Configuration.
Summaries can be dynamic or static, depending on your needs for summary precision and query
performance. A dynamic summary is returned only when the following conditions are met:
• The size of the content is less than the value of extract-text-size-less-than. This
setting is an attribute on the save-tokens-for-summary-processing element in
category-definitions.category.do-text-extraction. The default value is -1 (all documents are
included). If this is set to a positive value, a static summary is returned for larger documents. For
faster summary calculation, set this value to a positive value.
• The query term appears within the first n characters as defined by the token-size attribute.
This setting is an attribute on the save-tokens-for-summary-processing element in
category-definitions.category.do-text-extraction. The default value is 65536 (64K). If the query
term is not found in this snippet, a static summary is returned and term hits are not highlighted.
A value of -1 indicates no maximum content size, but this negatively impacts performance. For
faster summary calculation, set this value lower.
• If security is evaluated in xPlore (not Content Server), and the security_mode property of the
dm_ftengine_config object is set to BROWSE, the user must have at least READ permission. Refer
to Configuring results summary security, page 49.
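The first two conditions map to attributes on the save-tokens-for-summary-processing element in
indexserverconfig.xml. The following fragment is an illustrative sketch only: the element nesting is
inferred from the path category-definitions.category.do-text-extraction described above, and the
attribute values shown (a 10 MB size cutoff, the default token-size) are examples, not
recommendations. Verify the exact structure against your installed indexserverconfig.xml.

```xml
<category name="dftxml">
  <do-text-extraction>
    <!-- Illustrative sketch: extract-text-size-less-than limits dynamic
         summaries to documents under ~10 MB; token-size (default 65536)
         bounds the snippet in which query terms are located for highlighting. -->
    <save-tokens-for-summary-processing
        extract-text-size-less-than="10485760"
        token-size="65536"/>
  </do-text-extraction>
</category>
```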
Configuring static summaries — Static summaries are much faster to compute but less specific than
dynamic summaries. Static summaries are computed, even if you have enabled dynamic summaries,
when the summary conditions do not match the conditions configured for dynamic summaries.
(Refer to Configuring dynamic summaries, page 101). To route all summary computation to static
summaries, set query-enable-dynamic-summary to false in xPlore administrator. (Dynamic summaries
are enabled by default.) Choose the Search Service and click Configuration.
When dynamic summary is turned off, the first n characters of the document are displayed, where n is
the value of the parameter query-summary-display-length. Configure the size of the static summary
display window using xPlore administrator. Set the number of characters to display.
You can specify metadata elements that are displayed in a static summary. Set the following
parameters in indexserverconfig.xml, which is located in dsearch_home/config. Stop all xPlore
instances before modifying this file. Validate your changes using the validation tool described in
Modifying indexserverconfig.xml, page 36.
• elements-for-static-summary
Child element of category-definitions.category. Sets the elements whose contents are evaluated
for a static summary. The max-size attribute sets the maximum size of the static summary.
Default: 65536 (bytes)
• element-name
Child element of elements-for-static-summary. Specifies an element whose content is analyzed
for a static summary.
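Put together, the two settings above might look like the following fragment of indexserverconfig.xml.
This is a hedged sketch: the element-name values are only illustrations of the kind of paths you
might list (dmftcontentref is the content element mentioned elsewhere in this guide), not required
values.

```xml
<elements-for-static-summary max-size="65536">
  <!-- Each element-name names an element whose content is analyzed
       for the static summary (paths shown are illustrative). -->
  <element-name>dmftmetadata//object_name</element-name>
  <element-name>dmftcontentref</element-name>
</elements-for-static-summary>
```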
Highlighting — The search terms, including lemmatized terms, are highlighted within the summary
that is returned to the client search application. Wildcard search terms are also highlighted. For
example, if the search term is ran*, then the word rant is highlighted.
Note: If a search term is in the document but not in the summary computation string, it is not
visible (or highlighted) in the summary.
Highlighting does not preserve query context such as phrase search, AND search, NOT search, fuzzy
search, or range search. Each search term in the original query is highlighted separately.
Auditing queries
Queries are audited to help identify problems. Auditing is on by default. To turn off auditing,
expand Diagnostic and troubleshooting in the xPlore administrator left pane, choose Audit
records, and then click Disable.
Audit records are saved in an xDB collection named AuditDB. You can view the audit record for a
selected date range using xPlore administrator. To view or create reports on the audit record, refer to
Chapter 11, Using Reports. Auditing provides the following information:
• The XQuery expression
• The library in which the hits were found
• The number of hits
• Number of hits filtered out by security
• The number of items returned
• The amount of time to execute the query
• The time elapsed to fetch results
• Number of hits to be filtered by security
• Number of Documentum groups in the cache
• Number of Documentum groups excluded from the cache
To configure audit record properties, stop all xPlore instances and edit indexserverconfig.xml. Make
your changes to the security.auditing element. If the property is not included in indexserverconfig,
add it. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 36. Back up the xPlore federation after you change this file.
• auditing.location element: Specifies a storage path for the auditing file. Attributes: name, path,
size-limit. Size limit units: K | M | G | T (KB, MB, GB, TB). Default: 2G.
• audit-config element: Configures auditing. Attributes: name, component, status, format, location.
• properties.property element. Name: audit-save-batch-size. Specifies how many records are
batched before a save. Default: 100.
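As a sketch, the security.auditing section of indexserverconfig.xml might combine these elements as
follows. This example is assembled from the element and attribute names listed above; the attribute
values (the path, the location name "audit-store", and the status value "active") are placeholders,
so verify them against your installed file before editing.

```xml
<security>
  <auditing>
    <!-- Placeholder values for illustration only -->
    <location name="audit-store" path="C:/xPlore/data/audit" size-limit="2G"/>
    <audit-config name="query-audit" component="search" status="active"
        format="xml" location="audit-store"/>
    <properties>
      <property name="audit-save-batch-size" value="100"/>
    </properties>
  </auditing>
</security>
```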
To configure security cache sizes, refer to To change security cache sizes, page 48.
To view a query in the audit records — Choose a report from the date selector, and then choose
View. Double-click a query of interest to view the XML entry in the report. The XQuery expression is
contained within the QUERY element.
Audit record format — An audit record has the following format in XML:
<event name="<event_name>" component="<component_name>">
<element_name>value</element_name>
...
</event>
One audited value is the number of times the query added a group to the group-out cache. The audit
record also reports how many times the security caches were hit for a query. For details on
these configuration settings, refer to To change security cache sizes, page 48.
For XML records, each event is added to a root instance called AuditRecords. For example:
<event component="search" name="QUERY">
<QUERY_ID>
PrimaryDsearch$27571452-1cd3-41c0-9f32-b75629e9be6e
</QUERY_ID>
<QUERY>
return
<row>
<cell> { $k/QUERY_ID/text() } </cell>
<cell> { $k/QUERY/text() } </cell>
</row>))
} </rowset>
</report>
</QUERY>
<USER_NAME>admin</USER_NAME>
<QUERY_OPTION APPLICATION_NAME="AdminReports" BATCH_SIZE="0" CACHED="true"
COLLECTION="/SystemDataDomain" DOMAIN="" EXECUTION_PLAN="false" LOCALE="en"
PARALLEL_EXECUTION="false" RETURN_SUMMARY="false" RETURN_TEXT="false"
SECURITY_EVAL="false" SECURITY_FILTER="" SPOOLING="false" STREAMING_RESULT="true"
SYSTEM_QUERY="true" TIMEOUT="0" WAIT_FOR_RESULTS="true"/>
<NODE_NAME>PrimaryDsearch</NODE_NAME>
<LIBRARY_PATH>/SystemDataDomain</LIBRARY_PATH>
<FETCH_COUNT>1</FETCH_COUNT>
<TOTAL_HITS>1</TOTAL_HITS>
<START_TIME>2010-04-07T19:06:09</START_TIME>
<EXEC_TIME>0</EXEC_TIME>
<FETCH_TIME>0</FETCH_TIME>
<TOTAL_TIME>0</TOTAL_TIME>
<STATUS>success</STATUS>
</event>
Documentum Search
The following topics describe Documentum indexing and query functionality and tasks in the xPlore
server.
Checking the dm_ftengine_config settings — Use iAPI, DQL, or DFC to check the
dm_ftengine_config object. To view existing parameters using iAPI in Documentum Administrator,
first get the object ID:
retrieve,c,dm_ftengine_config
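After retrieve makes dm_ftengine_config the current object, a typical next step is to list its
parameters with the standard iAPI dump command (c is the session, l refers to the last retrieved
object):

```
retrieve,c,dm_ftengine_config
dump,c,l
```

The dump output lists the repeating attributes param_name and param_value, where each param_name
entry pairs with the param_value entry at the same index.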
Changing a dm_ftengine_config parameter — Use iAPI, DQL, or DFC to modify the dm_ftengine_config
object. To add a parameter using iAPI in Documentum Administrator, use append similar to the
following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fast_wildcard_compatible
append,c,l,param_value
true
save,c,l
Folder descend
Folder descend query performance can depend on folder hierarchy and data distribution across
folders. The following conditions can degrade query performance:
• Many folders, and a large portion of them are empty
Increase folder_cache_limit in the dm_ftengine_config object.
• The search predicate is unselective but the folder constraint is selective
Decrease folder_cache_limit in the dm_ftengine_config object.
The folder_cache_limit setting in the dm_ftengine_config object specifies the maximum number
of folder IDs probed. Default is 2000. If the folder descend condition evaluates to less than the
folder_cache_limit value, then folder IDs are pushed into the index probe. If the condition exceeds the
folder_cache_limit value, the folder constraint is evaluated separately for each result.
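To change folder_cache_limit, you can follow the same iAPI append pattern used elsewhere in this
chapter for dm_ftengine_config parameters. The value 5000 below is only an example. Note that this
sketch appends a new name/value pair; if folder_cache_limit already exists on your object, modify
the existing param_value at its index instead of appending a duplicate.

```
retrieve,c,dm_ftengine_config
append,c,l,param_name
folder_cache_limit
append,c,l,param_value
5000
save,c,l
```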
Disabling XQuery generation by DFC or DFS — You can disable XQuery generation by DFC or
DFS. This allows you to use a DQL hints file or hints in a DQL query. The hints file allows you to
specify certain conditions under which a database or standard query is done in place of a full-text
query.
Turn off XQuery generation by adding the following setting to dfc.properties on the DFC client
application:
dfc.search.fulltext.enabled=false
For example, the following DQL query includes hints:
select r_object_id from dm_document search document contains ’report’
in collection ( ’default’ ) enable(return_top 10)
FTDQL
The Webtop search components use the DFC query builder package to construct a query. The DFC
query builder adds the DQL hint TRY_FTDQL_FIRST. This hint prevents timeouts and resource
exceptions by querying the attributes portion of a query against the repository database. The query
builder also bypasses lemmatization by using a DQL hint for wildcard and phrase searches.
If wildcard attribute searches ("contains", "begins with", "ends with") have many results, they can
time out. You can configure attribute searches to go directly against the repository metadata, which
can be faster than the default behavior of the TRY_FTDQL_FIRST hint. In the following DQL hints
file example, FTDQL is turned off for object_name attribute queries:
<Rule>
<Condition>
<Where>
<Attribute operator="like">object_name
</Attribute>
</Where>
</Condition>
<DisableFTDQL/>
</Rule>
The following query is generated when the user searches on the object name and also enters a string
into the Webtop full-text box. The string "technical" is queried in the full-text index and the query for
object_name containing "WDK" is queried against the database:
SELECT r_object_id,text,object_name,... FROM dm_document SEARCH DOCUMENT
CONTAINS 'technical' WHERE (UPPER(object_name) LIKE '%WDK%' ESCAPE '\')
AND (a_is_hidden = FALSE) ENABLE(NOFTDQL)
Note: If your hint contains an object type condition, the hint is applied only for that type and
its subtypes, not for the supertype.
The DQL hints file location is specified in the DFC configuration file dfc.properties on the application
server host. The file must be named dfc.dqlhints.xml. If the file has been modified, it is reloaded
every two minutes. The following line could be added to dfc.properties to specify a Windows
location for the hints file:
dfc.dqlhints.file=C:/Documentum/config/dfc.dqlhints.xml
Alternatively, you can place a DQL hints file in the application server host system classpath or as
a system environment variable, for example:
-Ddfc.dqlhints.file=path_to_hints_file
Use forward slashes for paths in a Java properties file (the backslash is an escape character).
Alternatively, the file can be loaded from the classpath or the DFC data home directory on the
application server host.
The following elements are contained within a root <RuleSet> element to define the hints passed
to IDfQueryManager.
Element Description
<Rule> Can have zero to many <Condition> elements.
<DisableFullText/> Disables full-text search on basic search or attributes for the conditions
in the rule.
<DisableFTDQL/> Disables search for metadata in the full-text index.
<Condition> Child elements are ANDed.
<Select>, <Where> Child <Attribute> elements can be ANDed (condition="all") or ORed
(condition="any").
<SelectOption> Adds a permission, for example, FOR READ or FOR BROWSE. For
example, FOR DELETE would limit the results of a query that meets the
condition to those documents on which the user has delete permission.
<From> Child <Type> elements can be ANDed (condition="all") or ORed
(condition="any").
<Docbase> The value of this element corresponds to a repository to which the hint
applies. The descend attribute is optional. Default=false. To apply the DQL
hint to a folder and all its subfolders, set descend="true".
<Attribute>, <Type>, <Docbase> Support Java regular expressions (java.util.regex.Pattern).
For example, <Type>custom.*</Type> matches all type names beginning with "custom".
<Attribute> Operator "like" represents the DQL predicates CONTAINS and LIKE.
The value "is_null" represents the DQL predicates NULL, NULLINT,
NULLSTRING, and NULLDATE.
<FulltextExpression> Child of <Condition>. Set the mandatory exists attribute to "false" to add
ENABLE(NOFTDQL) to the query when there is no full-text expression in
the search.
<DQLHint> Contains any valid DQL hint, including IN COLLECTION and
RETURN_TOP N. For the full list of DQL hints, refer to the Documentum
Content Server DQL Reference Manual.
The following example applies FOR DELETE and NOFTDQL to all Webtop queries on object_name:
<RuleSet>
<Rule>
<Condition>
<Where>
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<SelectOption>FOR DELETE</SelectOption>
<DisableFTDQL/>
</Rule>
</RuleSet>
Turning off FTDQL for specific types — In the following example, attributes for the specified object
type are queried in the database, not the full-text index:
<RuleSet>
<Rule>
<Condition>
<From condition="any">
<Type>km_message</Type>
</From>
</Condition>
<DisableFTDQL/>
</Rule>
</RuleSet>
Adding multiple hints to queries — The following example adds two hints to wildcard queries
on either of two attributes:
<RuleSet>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">subject</Attribute>
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<DQLHint>ENABLE(SQL_DEF_RESULT_SET 100, NOFTDQL)</DQLHint>
<DisableFTDQL/>
</Rule>
</RuleSet>
Using multiple rules — In the following hints file, one rule applies to queries for one attribute,
the second rule applies to a different attribute:
<RuleSet>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">subject</Attribute>
</Where>
</Condition>
<DQLHint> ENABLE(SQL_DEF_RESULT_SET 100, NOFTDQL) </DQLHint>
<DisableFTDQL/>
</Rule>
<Rule>
<Condition>
<Where condition="any">
<Attribute operator="like">object_name</Attribute>
</Where>
</Condition>
<DQLHint> ENABLE(SQL_DEF_RESULT_SET 10) </DQLHint>
<DisableFTDQL/>
</Rule>
</RuleSet>
Make sure that your multiple rules are mutually exclusive when applied to a single query. If not, the
query generates a DQL syntax error. If the Webtop user adds both attributes to the query (subject and
object_name), this hints file example throws an error.
some VQL queries to XQuery equivalents. For information on using these APIs, refer to Documentum
xPlore Development Guide.
• Perform structured searches of XML documents using XQuery or the DFC interface IDfXQuery.
• Join different objects using DQL (NOFTDQL), XQuery, or the DFC interface IDfXQuery.
• Denormalize the relationship of a document to other objects or tables, such as email attachments,
using XQuery or the DFC interface IDfXQuery.
• Perform boolean searches using DQL, XQuery, or the DFC interface IDfXQuery.
Caution: Searches for word fragments are generally much slower than searches for entire
words. Memory consumption on the search server, and user experience, may not be acceptable.
Search precision is degraded by fragment search.
Turning on support for fragments — xPlore does not search for word fragments. For example,
a search for “car*” turns up “car” but not “careful.” The FAST indexing server supported word
fragment searches for leading and trailing wild cards in metadata and word fragment searches in
SEARCH DOCUMENT CONTAINS (SDC) full-text queries. DQL queries that contain the DQL
hint FT_CONTAIN_FRAGMENT in the where clause were converted to the search clause LIKE
’%word%’. For example, a search for com was converted to the clause LIKE ’%com%’, finding
documents containing committee or incoming.
You can set xPlore to backward compatibility for this behavior in FTDQL SDC queries and DQL
where clauses. Edit the dm_ftengine_config object in the Content Server. Add a param_name element
with the name fast_wildcard_compatible. Add the param_value element and set it to true.
Adding legacy wildcard behavior to the dm_ftengine_config object — Use iAPI, DQL, or DFC to
modify the dm_ftengine_config object. To add a parameter using iAPI in Documentum Administrator,
use append as follows:
use append as follows:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fast_wildcard_compatible
append,c,l,param_value
true
save,c,l
Turning off search lemmatization — xPlore supports search for similar or like terms, also
known as lemmatization, by default. To speed indexing and search performance, you can turn off
lemmatization for indexing. Refer to Disabling lemmatization, page 66. Validate your changes using
the validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
You can turn off lemmatization for individual queries by using the XQFT modifier “with stemming”
or “without stemming.” The XQFT default is “without stemming,” but the Documentum DQL default
is “with stemming.” To turn off stemming in Documentum queries, you must use a phrase search.
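For example, assuming the term "running" should match only that exact form, a phrase search
(double quotes inside the DQL string literal) avoids lemmatized matches such as "run" or "ran".
This query shape is a sketch based on the SEARCH DOCUMENT CONTAINS examples earlier in this
chapter; verify the quoting against your client application's query builder.

```
select r_object_id from dm_document
search document contains '"running"'
```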
Traces Content Server search operations such as initializing full-text in-memory objects and
the options used in a query.
• ftplugin
Traces the query plugin front end operations such as DQL translation to XQuery, calls to the
back end, and fetching of each result.
• ftengine
Traces back end operations such as HTTP transactions between the query plugin and xPlore, the
request stream sent to xPlore, the result stream returned from xPlore, and the query execution
plan.
• none
You can trace queries using the MODIFY_TRACE apply method. To turn on tracing in iAPI, type the
following command:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,all
On Windows, this command controls tracing for all sessions. On UNIX and Linux, tracing
is session-specific. Trace messages are written to $DOCUMENTUM/dba/log/fulltext/fttrace_
<repository_name>.log. The log entry contains the following information:
• Request query ID, so that you can find the translated query in the xPlore fulltext log
($DOCUMENTUM/dba/log/fulltext/fttrace_repository_name.log).
• The XQuery that was translated from DQL
• Query plan, if you have set tracing to all or ftengine. (The query plan is used to provide information
to EMC technical support.)
• The request and response streams, to diagnose communication errors or memory stream
corruption
• dm_ftengine_config options
Note: This information is not written to the log for test queries that are issued through xPlore
administrator.
• Audit records
Auditing queries, page 103
• Reports
Chapter 11, Using Reports
Suggested workarounds:
— Purge unneeded log files.
— Turn down the log level from Debug or Info to Warning.
— Add disk space.
• You are saving tokens but did not plan disk space for them. Saved tokens could potentially save time on an
index rebuild, but they consume a large amount of disk space, more than five times the space
without save-tokens. Suggested workaround: Set the save-tokens option to false (the default) in
indexserverconfig.xml (Troubleshooting lemmatization, page 67) and restart xPlore instances.
• You disabled content compression but did not allow enough space for it. The extracted content
from Documentum documents is compressed by default. If this is changed, the index size can
grow to several times larger. Suggested workaround: Add more disk space, or stop all xPlore
instances and add the following in indexserverconfig.xml if it has been removed:
<compress>
<for-element name="dmftcontentref"/>
</compress>
• Incomplete xPlore cleanup. The data file space grows too large.
Suggested workaround: Purge orphaned xDB files. For purge utilities, refer to Purging orphaned
segments, page 98.
Startup problems
Make sure the index agent web application is running. On Windows, verify that the Documentum
Indexagent service is running. On Linux, verify that you have instantiated the index agent using the
start script in dsearch_home/jboss4.3.0/server.
If the repository name is reported as null, restart the repository and the connection broker and
try again.
If you see a status 500 on the index agent UI, examine the stack trace for the index agent instance. If a
custom routing class cannot be resolved, this error appears in the browser:
org.apache.jasper.JasperException: An exception occurred processing JSP page
/action_dss.jsp at line 39
...
root cause
com.emc.documentum.core.fulltext.common.IndexServerRuntimeException:
com.emc.documentum.core.fulltext.client.index.FtFeederException:
Error while instantiating collection routing custom class...
If the index agent web application starts with port conflicts, stop the index agent with the stop
script. Run the script as the same administrator user who started the instance. The index agent
locks several ports, and they are not released by closing the command window.
When you view Details during or after an indexing process, you see the following statistics:
• Active items: Error count, indexed content size, indexed count, last update timestamp, size,
and warnings count.
• Indexer plugin: Maximum call time
• Migration progress (if applicable): Processed docs and total docs.
• Averages: Pause time, KB/sec indexed, number of indexed docs/sec, plugin blocking max time.
• List of current internal index agent threads
When you start an indexing operation, a status summary is displayed until indexing has completed.
Click Refresh to update this summary. The summary disappears when indexing has completed. To
view more details of indexing in progress, click Details.
Table 14, page 121 compares the processing counts reported by the index agent and xPlore
administrator.
Table 14. Comparing index agent and xPlore administrator indexing metrics
To check the indexing status of an object — The queue item ID for the document is available in the
details screen of the index agent UI. Use the following DQL to check the status of the queue item:
select task_name,item_id,task_state,message from dmi_queue_item where
name='username' and event='FT re-index'
For dual mode installations (FAST and xPlore), the user is dm_fulltext_index_user_01
• Unix/Linux:
dm_fulltext_index_user $CONFIG_DIR/filter.properties
This setting generates a file ObjectId-filtered-out.txt that records all IDs of filtered-out objects.
4. If you run a script, run it as the same administrator user who started the instance. Launch the
ftintegrity script.
Output from the script is similar to the following:
2009/09/02 12:28:10:078 Connected to the docbase
2009/09/02 12:28:10:344 Index Server is running
2009/09/02 12:28:12:453 fetched 63 object from docbase for type dm_group
2009/09/02 12:28:12:453 fetched 0 objects from DSS for type dm_group
2009/09/02 12:28:29:721 fetched 12216 object from docbase for type dm_sysobject
2009/09/02 12:28:29:721 fetched 11185 objects from DSS for type dm_sysobject
2009/09/02 12:28:30:033 fetched 286 object from docbase for type dm_acl
2009/09/02 12:28:30:033 fetched 0 objects from DSS for type dm_acl
2009/09/02 12:28:30:033 11183 objects with match ivstamp in both DCTM and
Index Server
2009/09/02 12:28:30:033 2 objects with different ivstamp in DCTM and Index Server
2009/09/02 12:28:30:033 1380 objects in DCTM only
2009/09/02 12:28:30:033 0 objects in Index Server only
Results from the ftintegrity migration verification — The script generates four results files in the
tools directory:
• ObjectId-common-version-match.txt
This file contains the object IDs and i_vstamp values of all objects in the index and the repository
and having identical i_vstamp values in both places.
• ObjectId-common-version-mismatch.txt
This file records all objects in the index and the repository with identical object IDs but
nonmatching i_vstamp values. For each object, it records the object ID, i_vstamp value in the
repository, and i_vstamp value in the index.
The mismatch is on objects that were modified during or after migration. You can resubmit this
list after you start the index agent in normal mode. Click Object File and browse to the file.
• ObjectId-dctmOnly.txt
This report contains the object IDs and i_vstamp values of objects in the repository but not in
the index.
These objects could be documents that failed indexing, documents that were filtered out, or new
objects generated in the repository during or after migration. You can resubmit this list after you
start the index agent in normal mode. Click Object File and browse to the file.
To check whether filters were applied during migration, run the following DQL query. If one or
more rows are returned, a filter was applied.
select r_object_id,object_name,primary_class from dmc_module where any
a_interfaces='com.documentum.fc.indexagent.IDfCustomIndexFilter'
• ObjectId-indexOnly.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
These objects were removed from the repository during or after migration, before the delete event
updated the index.
You can input the ObjectId-common-version-mismatch.txt file into the index agent UI to see errors for
those files. After you have started the index agent, check Index selected list of objects and then check
Object file. Navigate to the file and then choose Submit. Open xPlore Administrator > Reports and
choose Document processing error summary. The error codes and reasons are displayed.
Is the format indexable? — Check the class property of the document format. Refer to Documentum
attributes that control indexing, page 53 for more information.
Is the document too large? — Check the content size. By default, the index agent filters out content
larger than 20 MB. The following message is logged in indexagent.log:
Content size for XXX exceeds limit of 20000000 skipping content
Is there another cause? — Check the index agent log for any other error message for the document,
such as unsupported format (the most common).
Reindexing
You can submit for reindexing the lists of objects that are generated by ftintegrity (Verifying index
migration with ftintegrity, page 122.)
To check on the status of queue items that have been submitted for reindexing — Use the
following DQL. For username, specify the user logged into the index agent UI and started reindexing.
select task_name,item_id,task_state,message from dmi_queue_item where
name='username' and event='FT re-index'
If task_state is done, the message will be “Successful batch...” If the task_state is failed, the message
will be “Incomplete batch...”
To resubmit one document for reindexing — Put the object ID into a temporary text file. Use the
index agent UI to submit the upload: Choose Index selected list of objects >Object File option.
To remove queue items from reindexing — Use the following DQL. For username, specify the user
logged into the index agent UI and started reindexing.
delete dmi_queue_item object where name='username' and
event='FT re-index'
error_code Description
UNSUPPORTED_DOCUMENT Unsupported format
XML_ERROR XML parsing error for document content
DATA_NOT_AVAILABLE No information available
PASSWORD_PROTECTED Password protected or document encrypted
MISSING_DOCUMENT RTS routing error
INDEX_ENGINE_NOT_RUNNING xPlore indexing service not running
You can kill the JVM process and run the index agent configurator to give the agents different ports.
Troubleshooting CPS
You can test upload processing by using the Test upload feature in xPlore administrator. For more
information, refer to Testing upload and indexing, page 132.
CPS log levels — The following log levels are available for CPS in order of decreasing amount of
information logged: debug, info, warn, and error. Set the log level to INFO to troubleshoot CPS. The
log output file is specified in the log4j.properties file of the instance.
Each CPS request is logged with the prefix PERFCPSTS#. You see prefixes for the following libraries
in CPS logging:
• CPS daemon: DAEMON-CORE
• Text extraction: DAEMON-TE STELLENT
• HTTP content retrieval: DAEMON-CF_HTTP
• Language identification: DAEMON-LI_RLI
• Language processing: DAEMON-LP_RLP
Following is an example from cps.log. (Remote CPS log is named cps_manager.log.)
2008-10-21 13:35:40,402 WARN [DAEMON-CORE-(1324)] max_batch_size in configuration
file is invalid. Use default:65536 instead.
Example: CPS performance by format — Use the timestamp difference between PERFCPSTS9
(Content fetching of the single request finished) and PERFCPSTS10 (Text extraction of the single
request finished) to find the processing time for a particular document.
Testing tokenization
Test the tokenization of a word or phrase to see what is indexed. Expand Diagnostic and
Troubleshooting in the xPlore administrator tree and then choose Test tokenization. Different
tokenization rules are applied for each language. Uppercase characters are rendered as lowercase.
Special characters are replaced by white space.
Note: Test tokenization is not traced.
The results table displays the original input words. The root form is the token used for the index. The
Start and End offsets display the position in raw input. Components are displayed for languages that
support component decomposition, such as German.
Results may differ from tokenization of a full document. If the document language that is identified
during indexing does not match the language that is identified from the test, or the context of the
indexed document does not match the context of the text, the tokens can vary.
If the CPS analyzer cannot identify the file type, it displays the following error. The XML element
that contains the error is displayed:
*** Error: no filter available for this file type in xml_element.
If the file is empty, the following error is displayed. The XML element that contains the error is
displayed:
*** Error: file is empty in xml_element.
Slow ingestion
Slow ingestion is most often seen during migration. If migration is spread over days, for example,
tens of millions of documents ingested over two weeks, slow ingestion may not be an issue. Most
ingestion issues can be resolved with planning, pre-production sizing, and benchmarking.
The following topics describe possible causes and workarounds for slow ingestion:
• Insufficient CPU
• Large documents
• Disk I/O issues
• Slow network
• Large number of Excel documents
• Virus checking software
• Interference by another guest OS
• Slow content storage area
Insufficient CPU
Content extraction and text analysis are CPU-intensive. CPU is consumed for each document
creation, update, or change in metadata. Check CPU consumption during ingestion. Suggested
workarounds: For migration, add temporary CPU capacity. For day-forward (ongoing) ingestion,
add permanent CPU or new CPS instances. CPS instances will be used in a round-robin order.
Large documents
Large documents can tie up a slow network. These documents also contain more text to process. Use
the xPlore administrator reports to see the average size of documents and how many documents
are ingested per hour. Document size is also reported by the State of repository report in Content
Server. For example, the Documents ingested per hour report shows the number of documents and
text bytes ingested. Divide bytes ingested by the document count to get the average number of
bytes per document processed.
Two configuration properties affect the size of documents that are indexed and consequently the
ingestion performance:
• Indexing agent (Documentum only) limits the size of the documents submitted for indexing. This
limit is changed in indexagent.xml, in the WEB-INF/classes/ directory of the index agent WAR
file. You can change the contentSizeLimit parameter to a different value (in bytes). Stop the
index agent instance to change the size limit.
<parameter>
<parameter_name>contentSizeLimit</parameter_name>
<parameter_value>20000000</parameter_value>
</parameter>
• CPS limits the size of text that is indexed. A document can have a much greater size
(contentSizeLimit) compared to the indexable text within the document. You can change the value
of Max Text Threshold in the xPlore Administrator CPS configuration screen. Units are bytes
and the range is 5-40 MB.
Other suggested workarounds: Add CPU, memory, and possibly disk I/O capacity. Improve network
performance.
Disk I/O issues
You can detect disk I/O issues by looking at CPU utilization. Low CPU utilization combined with
high I/O response time indicates an I/O problem. Test the network by transferring large files or by
using the Linux dd (disk dump) utility.
Suggested workarounds:
• NAS
Verify that the network has not been set to half duplex. Increase network bandwidth and/or
install improved network I/O controllers on the xPlore host.
• SAN (check in the following order)
1. Verify that the SAN has sufficient memory to handle the I/O rate.
2. Increase the number of drives available for the xPlore instance.
3. If the SAN is multiplexing a set of drives over multiple applications, move the disk space
to a less contended set of drives.
4. If other measures have not resolved the problem, change the underlying drives to solid state.
Slow network
A slow network between the Documentum Content Server and xPlore results in low CPU
consumption on the xPlore host even when the disk subsystem has a high capacity. File transfers via
FTP or network share are also slow, independent of xPlore operations.
Suggested workarounds: Verify that network is not set to half duplex. Check for faulty hubs or
switches. Increase network capacity.
Large number of Excel documents
Microsoft Excel documents require the most processing of all text formats, due to the complexity of
extracting text from the spreadsheet structure. You can determine the number of Excel documents
using the State of repository report in Content Server.
Suggested workaround: Add temporary CPU for migration or permanent CPU for ongoing load.
Virus checking software
Virus checking software can lead to high disk I/O because it continually checks the changes in xPlore
file structures during indexing.
Workarounds: Exclude the temp and the xPlore working and data directories from virus scanning, or
switch to a Linux platform.
Interference by another guest OS
In a VM environment, the physical host may have several guest OSes. This can cause intermittent
slowness in indexing that is not due to format, document size, I/O capacity, or CPU capacity.
Workaround: Work with the infrastructure team to load balance the VMs appropriately.
Slow content storage area
Ingestion is dependent on the speed of the content source. This is especially noticeable during
migration. For example, you find that migration or ingestion takes much longer in production than in
development: development uses a small volume of content on NAS, but production content is on a
higher-latency device like Centera. You can determine the location of the original content by using
the State of the repository report in Content Server.
Workaround: Extend the migration time window.
Use xPlore administrator to select the instance, and then Configuration. Change the following to
smaller values:
• Max text threshold
• Thread pool size
You can add a separate CPS instance that is dedicated to content processing. This instance does not
interfere with query processing.
Troubleshooting indexing
You can use reports to troubleshoot indexing and content processing issues. Refer to Chapter
11, Using Reports for more information on these reports. The following topics describe general
troubleshooting tasks and specific indexing errors.
On Windows, the index agent instance is installed as an automatic Windows service named
Documentum Index_agent.
• Check for errors in the index agent status page at http://host:port/IndexAgent/.
Note: The index agent reports processing X documents. xPlore reports success and failure counts
that should add up to X. Warning counts reported in xPlore and the index agent should match.
Failures in xPlore are not reported back to the index agent.
When the index agent is down, new documents are not indexed and therefore cannot be found in
searches. Detect this problem by monitoring the size of the index agent queue. Use xPlore
administrator to determine whether
documents were sent for ingestion. For example, the Documents ingested per hour report shows 0
for DocCount when the index agent is down.
Workaround: Configure multiple index agents for redundancy. Monitor the index agents and restart
when they fail.
Under certain conditions, CPS fails while processing a document. xPlore restarts the CPS process, but
the restart causes a delay. Restart is logged in cps.log and cps_daemon.log. For information on these
logs, refer to Reading CPS log files, page 126.
A large document in the ingestion pipeline can delay smaller documents that are further back in the
queue. Detect this issue using the Documents ingested per hour report in xPlore administrator.
(Only document size averages are reported.)
If a document is larger than the configured maximum limit for document size or text size, the
content is not indexed. The document metadata is indexed but the content is not. This is recorded
in the xPlore administrator report Content too large to index.
Workaround: Increase the maximum size for document processing, and then refeed the document
that was too large. (Refer to Document size and performance, page 165.)
During periods of high ingestion load, documents can take a long time to be processed. Review the
ingestion reports in xPlore administrator to find bytes processed and latency. Use dsearch.log to
determine when a specific document was ingested.
Workaround: Set up a dedicated index agent for the batch workload.
If CPU, disk I/O, or memory is highly utilized, increase the capacity. Performance on a virtual server
is somewhat slower than on a dedicated host. For a comparison of performance on various storage
types, refer to Storage types and locations, page 159.
Connection refused
If an API returns a connection refused error, check the value of the URL on the instance. Make sure
that it is valid and that indexing is turned on for the instance.
If you have to change the xPlore host name, do the following:
• Update indexserverconfig.xml with the new value of the URL attribute on the node element.
Shut down the xPlore instance before applying your changes. Validate your changes using the
validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
• Change the JBoss startup (script or service) so that it starts correctly. If you run a stop script, run
it as the same administrator user who started the instance.
Troubleshooting search
When you set the search service log level to WARN, queries are logged. Refer to Query logging, page
146 for more information. If query auditing is enabled (the default), you can view or edit reports on
queries. Refer to Chapter 11, Using Reports for more information on query reports.
No queries allowed — An error message in the client can indicate that the xPlore search service
has not started:
The search has failed: The Full-text service is disabled
The Content Server query plugin properties of the dm_ftengine_config object are set during xPlore
configuration. If you have changed one of the properties, such as the primary xPlore host, the plugin
can fail. Verify the plugin properties, especially qrserverhost, with the following DQL:
1> select param_name, param_value from dm_ftengine_config
2> go
Slow queries
Slow queries can be caused by the following:
• System is not warmed up
• Result sets are large
• xPlore security has been disabled, and security is applied in Content Server. (Default is native
xPlore security.)
• Group caches are not tuned
• Query result size is too large
• FAST-compatible wildcard behavior is enabled
xPlore uses caches that reduce disk I/O. Response times are higher until the caches are loaded with
data.
Suggested workaround: Increase the size of the xDB buffer cache for higher query rates. Stop all
xPlore instances. Change the value of the property xhive-cache-pages in the engine-config element of
indexserverconfig.xml and restart the xPlore instances.
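As a rough sketch only (the exact nesting inside indexserverconfig.xml varies by version, and the value shown is illustrative), the setting looks something like:

```
<!-- Sketch: xDB buffer cache size, in pages, set through the
     xhive-cache-pages property of the engine-config element in
     indexserverconfig.xml. Nesting and value are illustrative;
     validate the file with the validation tool before restarting. -->
<engine-config>
    <property name="xhive-cache-pages" value="65536"/>
</engine-config>
```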
Webtop users have a result maximum of 350, but custom clients may consume a larger result set.
Enable query auditing. Examine the number of results in the TopNSlowestQueries report for a
specific user and day. If the number of results is more than one thousand, the custom client may
be returning all the results.
Workaround: Change the client to consume a smaller number of results by closing the result
collection early or by using the DQL hint ENABLE(RETURN_TOP_N).
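For illustration, a DQL query using the hint might look like the following sketch (the object type and the value 100 are placeholders; elsewhere in this guide the hint appears as enable (return_top N)):

```
-- Sketch: ask the server to return at most 100 results to the client
SELECT r_object_id, object_name FROM dm_document
SEARCH DOCUMENT CONTAINS 'xplore'
ENABLE (RETURN_TOP 100)
```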
Content Server security is much slower than xPlore native security, because some or many results
that are passed to the Content Server are discarded. To detect the problem, enable query auditing.
Examine the number of results in the TopNSlowestQueries report for a specific user and day. If
the number of results is more than one thousand, xPlore security may be disabled and the user is
underprivileged. (When the user is underprivileged, many results are discarded.)
Workaround: Enable xPlore native security. Refer to Documentum search results security, page 47.
If a user is severely underprivileged, or the user is a member of many groups, queries may be slow
due to small group caches. For instructions on configuring the caches, refer to Configuring the
security cache, page 48.
For underprivileged users, examine the group_out_cache_fill element in the query audit record. If the
value exceeds the not-in-groups-cache-size, then the cache is too small.
For users who are members of a large number of groups, examine the group_cache_cache_fill element
in the query audit record. If the value exceeds the groups-in-cache-size, then the cache is too small.
By default, xPlore gets the top 12,000 most relevant results per collection to support a facet window of
10,000 results. Webtop applications consume only 350 results, so the extra result memory is costly
for large user environments or multiple collections (multiple repositories). In an environment
with millions of documents and multiple collections, you could see longer response times or out
of memory messages.
Workaround: Open xdb.properties, which is located in the directory WEB-INF/classes of the primary
instance. Set the value of queryResultsWindowSize to a number smaller than 12000.
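A minimal sketch of the edit (the value 2000 is illustrative; choose a window at least as large as the facet window your clients need):

```
# xdb.properties, in WEB-INF/classes of the primary instance
# Sketch: shrink the per-collection result window from the 12000 default
queryResultsWindowSize=2000
```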
Many Documentum clients do not enable wildcard searches for word fragments like “car” for
“careful.” The FAST indexing server supported word-fragment searches with leading and trailing
wildcards in metadata and word-fragment searches in SEARCH DOCUMENT CONTAINS (SDC)
full-text queries. If you enable FAST-compatible wildcard behavior for your Documentum
application, queries that contain a wildcard are slower.
For information on how to change this behavior, refer to Turning on support for fragments, page 113.
Determine whether the system has only one or two cores and a high query rate, or whether the
system is large but receives complex or unselective queries. Enable query auditing and examine the
TopNSlowestQueries report for the specific user name and day. Look for high query rates with
slow queries.
Workaround: Add more capacity.
A query will probe each index for a repository (domain) sequentially. Results are collected across
repositories. To detect the problem, enable query auditing. Try the query across repositories and
then target it to a specific repository.
Suggested workarounds: Use IDfXQuery parallel queries (refer to Documentum xPlore Development
Guide) , or use the ENABLE(fds_collection collectionname) hint or the IN COLLECTION clause in
DQL (refer to Routing a query to a collection using DQL, page 108).
If the user is very underprivileged, tens of thousands of results may be discarded by the security
filter. To detect this, enable query auditing. Find the query using the TopNSlowestQueries report
for the specific user and day. If the number in the Documents filtered out column is very large,
it is a security cache issue.
Workaround: Queries can generally be made more selective. If this is not possible, organize the
repository so that the user has access to documents in certain containers, such as rooms or cases,
and then append the container IDs to the user’s query.
If the multi-path index is not used to service the query, then the query will run slowly. DQL and DFC
Search service queries always use the index. Some IDfXQuery-based queries may not use it. To detect
this issue, enable query auditing. Find the query using the TopNSlowestQueries report (with user
name and day). Get the user’s query id and get the query text by using the GetQueryText report.
Obtain the query plan to determine which indexes were probed, if any. (Provide the query plan to
EMC technical support for evaluation.) Rewrite the query to use the index.
Note: The query plan is not written to the log for test queries that are issued through xPlore
administrator.
Error because you have changed the xPlore host — If you have to change the xPlore host name,
do the following:
• Update indexserverconfig.xml with the new value of the URL attribute on the node element.
Shut down all xPlore instances before applying your changes. Validate your changes using the
validation tool described in Modifying indexserverconfig.xml, page 36. Back up the xPlore
federation after you change this file.
• Change the JBoss startup (script or service) so that it starts correctly.
• Set the save-tokens option to true for the target collection (Troubleshooting lemmatization, page
67) and restart xPlore, then reindex the document. Check the tokens in the Tokens library to see
whether the search term was properly indexed.
Logging
Logging can be configured for each service in xPlore administrator. Log levels can be set for indexing,
search, and xPlore administrator.
To set logging for a service, choose System Overview in the left panel. Choose Global Configuration
and then choose the Logging Configuration tab to configure logging.
Choose a log and set the tracing level for dsearch.log:
• dsearchadmin
Logs xPlore administrator operations
• dsearchindex
Logs indexing operations
• dsearchdefault
Sets the default log level
• dsearchsearch
Logs search operations
For CPS logging configuration, refer to CPS logging, page 143.
The following log levels are available. Levels are shown in increasing severity and decreasing
amounts of information, so that TRACE displays more than DEBUG, which displays more than
INFO. FATAL logs only the most severe errors.
• TRACE
• DEBUG
• INFO
• WARN
• ERROR
• FATAL
Caution: Logging can slow the system and consume disk space. In a production environment,
the system should run with minimal logging enabled.
CPS logging
CPS does not use the xPlore logging framework. A CPS instance that is embedded in an xPlore
instance uses the log4j.properties file in WEB-INF/classes of the dsearch web application. A
standalone CPS instance uses log4j.properties in the CPS web application, in the WEB-INF/classes
directory.
If you have installed more than one CPS instance on the same host, each instance has its own web
application and log4j.properties file. To avoid one instance log overwriting another, make sure each
file appender in log4j.properties points to a unique file path.
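For example (the appender name cpsfile and the paths are illustrative), the file appenders of two instances on one host should point to different files so that neither overwrites the other:

```
# log4j.properties of CPS instance 1
log4j.appender.cpsfile=org.apache.log4j.RollingFileAppender
log4j.appender.cpsfile.File=C:/xPlore/logs/cps_instance1.log

# log4j.properties of CPS instance 2
log4j.appender.cpsfile=org.apache.log4j.RollingFileAppender
log4j.appender.cpsfile.File=C:/xPlore/logs/cps_instance2.log
```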
Key:
• %r: Number of milliseconds elapsed from the construction of the layout until the creation of
the logging event
• %p: Priority (level) of the logging event (maximum length is 5 characters)
• %c: Category of the logging event, typically the fully qualified class name. It is filtered to log
just the class name
• %t: Name of the thread that generated the logging event
• %m: Message from the application associated with the logging event
• %x: Context (NDC, nested diagnostic context) associated with the thread that generated the
logging event, if the code is instrumented. For internal use only.
Following is a sample log message when additional name-value pair information is available or
passed:
2009-08-25 09:24:09,101 INFO [ESSContext-(main)] testing
[message = xhive db has started ]
The layout conversion pattern is configured in log4j.properties (line break inserted for readability):
log4j.appender.<appenderName>.layout.ConversionPattern=
%r %5p [%c{1}-(%t)] %m %x %Z%n
XML layout — For XML layout, log4j generates the message into an XML file. The appender that
generates an XML log is com.emc.documentum.core.fulltext.utils.log.ESSXmlLayout.
Note: For XML log output, log4j does not generate the parent or root XML element. Add the parent
element before parsing the file by an XML parser.
Sample message (line breaks inserted for readability):
<event timestamp="2009-01-06 18:39:41,094" level="WARN" thread="main"
logger="com.emc.documentum.core.fulltext.indexserver.core.config.impl.
xmlfile.IndexCollectionConfig" elapsedTime="1231295981094">
<message><![CDATA[[CONF_NO_DEFAULT_LIBRARY] There is no default
library found for collection, [knowledgeworker]. The first library
in the list, [library1], is assumed as default.]]></message>
</event>
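Because log4j emits a sequence of event elements with no root, a parser needs a wrapper element first. A minimal Python sketch (the event text is the abbreviated sample above; reading the log file from disk is omitted):

```python
import xml.etree.ElementTree as ET

# log4j XML output is a sequence of <event> elements with no root element,
# so wrap the raw log text in a parent element before parsing.
log_fragment = '''<event timestamp="2009-01-06 18:39:41,094" level="WARN"
  thread="main" logger="IndexCollectionConfig" elapsedTime="1231295981094">
<message><![CDATA[[CONF_NO_DEFAULT_LIBRARY] There is no default
library found for collection, [knowledgeworker].]]></message>
</event>'''

root = ET.fromstring("<events>" + log_fragment + "</events>")
for event in root.findall("event"):
    print(event.get("level"), event.findtext("message"))
```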
Log locations
xPlore uses Apache log4j, an open source module for logging. log4j has a set of logging configuration
options based on severity level. Information for specific packages can be logged. The xPlore custom
XML log4j appender logs messages into a file when you specify the log4j RollingFileAppender or into
xDB when you specify the XHiveDbAppender.
The following configuration logs messages to a file. Line breaks are shown here for readability
but do not exist in the properties file:
log4j.appender.<appenderName>=org.apache.log4j.RollingFileAppender
log4j.appender.<appenderName>.MaxFileSize=10MB
log4j.appender.<appenderName>.MaxBackupIndex=10
log4j.appender.<appenderName>.File=C:/temp/xPlore/logs/fulltext.log
log4j.appender.<appenderName>.layout=
com.emc.documentum.core.fulltext.utils.log.ESSXmlLayout
The following configuration logs messages to an xDB log. Line breaks are shown here for readability
but do not exist in the properties file:
log4j.appender.<appenderName>=
com.emc.documentum.core.fulltext.utils.log.XHiveDBAppender
log4j.appender.<appenderName>.filename=dsearch.log
log4j.appender.<appenderName>.fallBackAppender=org.apache.log4j.RollingFileAppender
log4j.appender.<appenderName>.libraryPath=root
log4j.appender.<appenderName>.buffer=10
log4j.appender.<appenderName>.layout=
com.emc.documentum.core.fulltext.utils.log.ESSXmlLayout
Query logging
The xPlore search service logs queries. For each query, the search service logs the following
information for all log levels:
• Start of query execution including the query statement
• Total results processed
• Total query time including query execution and result fetching
Tip: More query information is logged when native xPlore security (not Content Server security)
is enabled.
Set the log level in xPlore administrator. Open Services in the tree, expand and select Logging, and
click Configuration. You can set the log level independently for administration, indexing, search,
and default. Levels, in decreasing order of verbosity, are TRACE, DEBUG, INFO, WARN (default),
ERROR, and FATAL.
To further configure logging, stop all xPlore instances and edit indexserverconfig.xml in
dsearch_home/config. You can set the maximum log file size and maximum number of backups.
A single line is logged for each batch of query results returned by the xPlore server. The log message
has the following form:
<date-time><Tracing Level><Class Name><Thread ID><Query ID>[
<main query options in concise form>]<total hits><execution time in milliseconds>
The following examples from dsearch.log show a query, total results processed, and total query time:
<event timestamp="2010-06-07 21:54:26,090" ...>
<message ><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d,
query-locale=en,query-string=let $j:= for $i score $s in /dmftdoc
[. ftcontains ’strange’] order by $s
descending return
Tracing
You can configure tracing in xPlore administrator. Expand the instance in the left panel and select
Tracing. Enable or disable tracing in the right panel. Tracing does not require a restart.
When you enable tracing, a detailed Java method call stack is logged in one file. From that file, you
can identify the methods that are called, with parameters and return values. Refer to the Documentum
xPlore Development Guide for more information on tracing.
To trace specific classes, edit indexserverconfig.xml, which is located in dsearch_home/config. Shut
down all xPlore instances before changing this file. Validate your changes using the validation tool
described in Modifying indexserverconfig.xml, page 36. Back up the xPlore federation after you
change this file. You can configure the name, location, and format of the log file for the logger and
its appender in indexserverconfig.xml or in the log4j.properties file. The log4j configuration takes
precedence.
Reports provide indexing and query statistics, and they are also a troubleshooting tool. Chapter
10, Troubleshooting and Chapter 12, Performance and Disk Space describe how to use reports for
troubleshooting.
Statistics on content processing and indexing are stored in the metrics database. Use Reports to
query these statistics. Statistics for queries are stored in an audit record. Enable query auditing
to get reports on queries. (It is disabled by default.) Choose Diagnostic and Troubleshooting,
click Audit Records, and then click Enable. For more information on configuring auditing, refer to
Auditing queries, page 103.
Running reports — To run reports, choose Diagnostic and Troubleshooting and then click Reports.
To generate Documentum reports that compare a repository to the index, refer to Running the state of
the index job, page 60.
Reports are described in the following topics:
• Types of reports, page 149
• Document processing (CPS) reports, page 150
• Indexing reports, page 151
• Search reports, page 151
• Editing a report, page 152
Types of reports
Table 18, page 149 describes the reports that are available in xPlore administrator.
Indexing reports
To view indexing rate, run the report Documents ingested per month/day/hour. The report shows Average
processing latency. The monthly report covers the current 12 months. The daily report covers the
current month. The hourly report covers the current day. From the hourly report, you can determine
your period of highest usage. You can divide bytes processed by the document count to find the
average size of content ingested. For example, 2,822,469 bytes for 909 documents yields an average
size of 3105 bytes. This does not include non-indexable content.
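The arithmetic from the example above, as a quick sketch:

```python
# Average ingested document size = bytes processed / document count,
# using the figures from the Documents ingested per hour example above.
bytes_processed = 2_822_469
doc_count = 909
avg_size = bytes_processed / doc_count
print(f"{avg_size:.0f} bytes per document")  # about 3105
```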
Search reports
Enable auditing in xPlore administrator to view query reports.
Note: Queries in xPlore administrator are audited but are not reported by the query processing
reports.
Top N slowest queries — Find the slowest queries by selecting Top N slowest queries. To determine
how many queries are unselective, sort by Number of results fetched. (Note that this is limited by
default in Webtop to 350.)
Sort Top N slowest queries by Number of hits denied access by security filter to see how many
underprivileged users are experiencing slow queries due to security filtering. For information on
changing the security cache, refer to Changing the security cache sizes, page 168.
Get query text — To examine a slow or failed query by a user, get the query ID from Top N slowest
queries and then enter the query ID into Get query text. Examine the query text for possible problems.
The following example is a slow query response time. The user searched in Webtop for the string
"xplore" (line breaks added here):
declare option xhive:fts-analyzer-class ’com.emc.documentum.core.fulltext.indexserver
.core.index.xhive.IndexServerAnalyzer’; for $i score $s in collection(’
/DSS_LH1/dsearch/Data’) /dmftdoc[( ( ( (dmftmetadata//a_is_hidden = ’false’) ) )
and ( (dmftinternal/i_all_types = ’030a0d6880000105’) )
and ( (dmftversions/iscurrent = ’true’) ) )
and ( (. ftcontains ( ((’xplore’) with stemming) ) )) ]
order by $s descending return
<dmrow>{if ($i/dmftinternal/r_object_id) then $i/dmftinternal/r_object_id
else
<r_object_id/>}{if ($i/dmftsecurity/ispublic) then $i/dmftsecurity/ispublic
else <ispublic/>}{if ($i/dmftinternal/r_object_type) then $i/dmftinternal/r_object_type
else <r_object_type/>}{if ($i/dmftmetadata/*/owner_name)
then $i/dmftmetadata/*/owner_name
else <owner_name/>}{if ($i/dmftvstamp/i_vstamp) then $i/dmftvstamp/i_vstamp
else <i_vstamp/>}{if ($i/dmftsecurity/acl_name) then $i/dmftsecurity/acl_name
else <acl_name/>}{if ($i/dmftsecurity/acl_domain) then $i/dmftsecurity/acl_domain
else <acl_domain/>}<score dmfttype=’dmdouble’>{$s}</score>{xhive:highlight(
$i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow>
Use the xDB admin tool to debug the query. For instructions on using xhadmin, refer to Using the xDB
admin tool, page 36.
Query counts by user — Use Query counts by user to determine which users are experiencing the
slowest query response times.
Editing a report
You can edit any of the xPlore reports. Select a report in xPlore administrator and click Save as.
Specify a unique file name and title for the report. Alternatively, you can write a new copy of the report
and save it to dsearch_home/jboss4.3.0/server/primary_instance/deploy/dsearchadmin.war/reports.
xPlore administrator picks up the new report when you click elsewhere in xPlore administrator
and then click Reports.
Accessing the audit record — The audit record is stored in the xDB database for the xPlore
federation. You can filter the audit record by date using xPlore administrator. You can copy
the entire audit record using the xDB admin tool. Open the xDB tree and drill down to
root-library/SystemData/AuditDB/primary_instance_name/auditRecords.xml. For instructions on
using xhadmin, refer to Using the xDB admin tool, page 36.
5. Create a variable for failed queries and add it after the variable definition for successful queries
(for $j ...let $k ...). To find the failed queries, select the nodes in a QUERY element whose
TOTAL_HITS value is equal to zero.
let $z := collection(’AuditDB’)//event[@component = "search" and @name = "QUERY"
and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME = $j
and TOTAL_HITS = 0]
6. Create a variable for the count of failed queries and add it after the variable for successful query
count (let $queryCnt...):
let $failedCnt := count($z)
7. Return the failed query count cell, after the query count cell (<cell> { $queryCnt } ...):
<cell> { $failedCnt } </cell>
8. Redefine the failed query variable to get a count for all users. Add this line after <rowset...>let $k...:
let $z := collection(’AuditDB’)//event[@component = "search" and @name = "QUERY"
and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME and TOTAL_HITS = 0]
9. Add the total count cell to this second rowset, after <cell> { $queryCnt } </cell>:
<cell> { $failedCnt } </cell>
10. Save and run the report. The result is similar to the following:
If your query has a syntax error, you will get a stack trace that identifies the line number of the error.
You can copy the text of your report into an XML editor that displays line numbers, for debugging.
If the query runs very slowly, it will time out after about one minute. You can run the same query in
the xDB admin tool.
Use the rough guidelines in the following diagram to help you plan scaling of search. The order of
adding resources is the same as for ingestion scaling.
2. Use the following DQL query to determine the number of documents in the repository:
3. Divide the results of step 1 by the results of step 2. If the number is high, for example, 0.8
(80%), most documents were modified and accessed in the past two years.
Estimating index size (Documentum environments) — The average size of indexable content
within a document varies from one document type to another and from one enterprise to another.
You must calculate the average size for your environment. The easiest estimate is to use the disk
space that was required for a Documentum indexing server with FAST. If you have not installed a
Documentum indexing server, you can use the following procedure to estimate index size.
1. Perform a query to find the average size of documents, grouped by a_content_type, for example:
select avg(r_full_content_size),a_content_type from dm_sysobject group by
a_content_type order by 1 desc
2. Perform a query to return 1000 documents in each format. Specify the average size range, that
is, r_full_content_size greater than (average minus some value) and less than (average plus some
value). Make the plus/minus value a small percentage of the average size. For example:
select r_object_id,r_full_content_size from dm_sysobject
where r_full_content_size > (1792855 - 1000) and
r_full_content_size < (1792855 + 1000) and
a_content_type = ’zip’ enable (return_top 1000)
3. Export these documents and index them into a new, clean xPlore installation.
4. Determine the size on disk of the dbfile and lucene-index directories in dsearch_home/data.
5. Extrapolate to your production size.
For example, you have ten indexable formats with an average size of 270 KB from a repository
containing 50,000 documents. The Content Server footprint is approximately 12 GB. You get a
sample of 1000 documents of each format in the range of 190 to 210 KB. After export and indexing,
these 10,000 documents have an indexed footprint of 286 MB. Your representative sample was 20%
of the indexable content, so your calculated index footprint is 5 x sample_footprint = 1.43 GB
(dbfile 873 MB, lucene-index 593 MB).
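The extrapolation in the worked example can be sketched as:

```python
# Index-size extrapolation from the worked example above.
sample_docs = 10_000            # 1000 documents x 10 formats
total_indexable_docs = 50_000   # repository size
sample_footprint_mb = 286       # dbfile + lucene-index for the sample

scale = total_indexable_docs / sample_docs        # sample was 20%, so 5x
estimated_index_mb = sample_footprint_mb * scale  # 1430 MB, about 1.43 GB
print(f"{estimated_index_mb:.0f} MB")
```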
Disk space vs. indexing rebuild performance — If you save indexing tokens for faster index
rebuilding, they consume disk space. By default they are not saved. Edit indexserverconfig.xml and
set domain.collection.properties.property "save-tokens" to true for a collection.
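A sketch of the corresponding XML (the surrounding element layout is illustrative and the collection name is a placeholder; validate indexserverconfig.xml after editing and back up the federation):

```
<!-- Sketch only: enable token saving for one collection in
     indexserverconfig.xml; element nesting is illustrative. -->
<collection name="mycollection">
    <properties>
        <property name="save-tokens" value="true"/>
    </properties>
</collection>
```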
Tuning xDB properties for disk space — You can set the following property in xdb.properties,
which is located in the WEB-INF/classes directory of the primary instance. If this property is not
listed, you can add it.
• TEMP_PATH
Temporary path for Lucene index. If not specified, the current system property java.io.tmpdir is
used.
Managing index disk space — To conserve disk space on the primary host, purge the status
database when the xPlore primary instance starts up. By default, the status DB is not purged. Refer
to Managing the status database, page 38.
If you have specified save-tokens for summary processing, edit indexserverconfig.xml to limit the
size of tokens that are saved. Set the maximum size of the element content in bytes as the value
of the attribute extract-text-size-less-than. Tokens will not be saved for larger documents. Set the
maximum size of tokens for the document as the value of the attribute token-size. For details on
extraction settings, refer to Table 7, page 76.
Insufficient disk space, page 118 describes specific troubleshooting for unexpected disk space
problems.
Managing storage locations — The data store locations for xDB libraries are configurable. The xDB
data stores and indexes can reside on separate storage: local disk, a SAN, or a NAS. Configure the
storage location for a collection in xPlore administrator. You can also add new storage locations
through xPlore administrator.
System sizing
You can plan system sizing for CPS processing, ingestion, and search.
Adding CPS instances — CPS processing of documents is typically the bottleneck in ingestion. CPS
also processes queries. You can add CPS instances either on the same host as the primary instance
or on additional hosts (vertical and horizontal scaling, respectively). A remote CPS instance does
not perform as well as a CPS instance on an indexing instance, because the remote instance adds
overhead to the xPlore system.
To add CPS instances, run the xPlore configuration script and choose Create Content Processing
Service Only.
Sizing for search performance — You can size several components of an xPlore system for search
performance requirements:
• CPU capacity
• Memory for query caches
Using xPlore administrator, change the value of query-result-cache-size in search service
configuration and restart the search service.
Sizing for ingestion performance — You can size several components of an xPlore system for
ingestion performance requirements:
• CPU capacity
• I/O capacity (the number of disks that can write data simultaneously)
• Memory for temporary indexing usage
Sizing migration from FAST — When you compare sizing of the FAST indexing system to xPlore,
use the following guidelines:
• Size with the same allocations used for FAST, unless the FAST installation was very undersized or
you expect usage to change.
• You can use VMware-based deployments, which were not supported for FAST.
• Include sizing for changes to existing documents:
— A modification to a document requires the same CPU for processing as a new document.
— A versioned document requires the same (additional) space as the original version.
• Size for high availability and disaster recovery requirements.
System tuning
Some system tuning requires editing of indexserverconfig.xml. (Refer to Modifying
indexserverconfig.xml, page 36.)
Excluding xPlore files from virus scanners — Performance of both indexing and search can be
degraded during virus scanning. Exclude xPlore directories, especially the dsearch_home/data
directory.
Tuning memory pools — xPlore uses four memory caches. The last three are part of the xPlore
instance memory and have a fixed size:
• OS buffer cache
Holds temporary files, xDB data, and Lucene index structures. Has largest impact on Lucene
index performance.
• xDB buffer cache
Stores XML file blocks for ingestion and query. Increase for higher query rates: Change the value
of the property xhive-cache-pages in the engine-config element of indexserverconfig.xml. Back up
the xPlore federation after you change this file.
• Lucene working memory
Used to process queries. Lucene working memory is consumed from the host JVM process.
Increasing the JVM memory may not affect performance.
• xPlore caches
Temporary cache to buffer results. Using xPlore administrator, change the value of
query-result-cache-size in search service configuration and restart the search service.
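The xhive-cache-pages change described for the xDB buffer cache above can be sketched as follows. Whether the setting is expressed as an attribute or as a child property of the engine-config element may vary by version, and the value shown is illustrative, not a recommendation:

```xml
<!-- indexserverconfig.xml sketch: a larger xhive-cache-pages value enlarges
     the xDB buffer cache for higher query rates. 65536 is illustrative. -->
<engine-config>
  <property name="xhive-cache-pages" value="65536"/>
</engine-config>
```

Remember to back up the xPlore federation after changing indexserverconfig.xml.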
VMware deployments require more instances than physical deployments. For example, a VMware
virtual machine is limited to eight cores.
64-bit vs. 32-bit — 64-bit operating systems have advantages and disadvantages in an xPlore
installation:
• Advantages
— More memory is used to cache index structures for faster query access.
— More memory is available to index large documents.
— 64-bit supports higher ingestion and query rates.
• Disadvantages
— Per-object memory space is higher. If memory is low, a 32-bit VM will perform better.
— The size of the 64-bit VM is limited by garbage collection activity.
Sizing the disk I/O subsystem — xPlore supports local disk, SAN, and NAS storage. These storage
options do not have equal performance. For example, NAS devices send more data and packets
between the host and subsystem. Jumbo frame support is helpful, as is higher bandwidth.
Compression — Indexes can be compressed to enhance performance, although compression consumes
additional memory during I/O. The compress element in indexserverconfig.xml specifies which
elements in the ingested document have their content compressed to save storage space. Compressed
content is about 30% of the size of the submitted XML content. Compression may slow the ingestion
rate by 10-20% when I/O capacity is constrained. Refer to Modifying indexes, page 76.
If ingestion starts fast and gets progressively slower, set compression to false for subpath indexes in
indexserverconfig.xml.
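For example, a sub-path index definition might carry the setting as follows. The path and surrounding markup are illustrative; refer to Modifying indexes, page 76 for the actual index definitions in your file:

```xml
<!-- Illustrative sub-path index with compression disabled, to keep
     ingestion rates steady when I/O capacity is constrained. -->
<sub-path path="dmftmetadata//object_name" compress="false"/>
```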
Indexing performance
Various factors affect the rate of indexing. You can tune some indexing and xDB parameters and adjust
allowable document size. For specific indexing issues, refer to Troubleshooting indexing, page 132.
• CPS limits the size of text that is indexed. A document can have a much greater size
(contentSizeLimit) compared to the indexable text within the document. You can change the value
of Max Text Threshold in the xPlore Administrator CPS configuration screen. Units are bytes and
the range is 5-40 MB. Default: 10 MB.
You can configure multiple CPS instances so that a single CPS is not overwhelmed with load.
Documents will be submitted to CPS for processing in round-robin order.
Note: Increasing the maximum text size can increase CPS memory consumption under heavy load.
If CPS runs out of memory, the entire batch of submitted documents fails.
For additional factors that impact disk space, refer to Insufficient disk space, page 118.
Maximum RAM in bytes to be used for in-memory Lucene index. Higher values use more
memory and support faster indexing. Default: 3000000.
• mergeFactor
Number of index entries to keep in memory before storing to disk, and how often segments are
merged. For example, a factor of 10 creates a new segment for every 10 XML documents added to
the index; when the tenth segment has been added, the segments are merged. A high value
improves batch indexing and search performance on optimized indexes, but uses more RAM. A
low value uses less memory and causes the index to be updated more often, slowing down
indexing, but searches on unoptimized indexes are faster. Default: 10.
Note: High values can cause a “too many open files” exception. You can increase the maximum
number of open files allowed on a UNIX or Linux host by increasing the ulimit setting.
• maxMergeDoc
Sets the maximum size of a segment that can be merged with other segments. Low values are
better for interactive indexing because this limits the length of merging pauses during indexing.
High values are better for batch indexing and faster searches. If the RAM buffer size is exceeded
before maxMergeDoc is reached, a flush is triggered. Default: 1000000.
• nonFinalMaxMergeSize
Maximum size of internal Lucene index that is eligible for merging, in bytes. Non-final merge is
executed frequently to reduce the number of file descriptors, memory consumption and sub-index
creation. Default: 300000000
• finalMergingInterval
Interval after which final sub-indexes are merged, usually once a day. Units are hours in 24-hour
time, minutes, and seconds. Default: midnight (24*60*60).
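Taken together, and assuming these parameters are expressed as attributes of a single index-settings element (the element name below is hypothetical), the documented defaults would read:

```xml
<!-- Hypothetical container element; parameter names and default values are
     taken from the descriptions above. finalMergingInterval is omitted because
     its unit format (hours, minutes, seconds) is version-dependent. -->
<index-settings
    mergeFactor="10"
    maxMergeDoc="1000000"
    nonFinalMaxMergeSize="300000000"/>
```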
Search performance
To measure query performance, you must enable auditing. You can also turn on tracing information
for query execution. Select an instance, choose Tracing, and then choose Enable.
For slow queries, refer to Slow queries, page 137.
Examine the query load to see if the system is overloaded. Run the report Top N slowest queries.
Examine the Start time column to see whether slow queries occur at a certain period during the day or
certain days of the month.
Save the query execution plan to find out whether you need an additional index on a metadata
element. (For more information on the query plan, refer to Getting the query plan, page 137.)
Documentum clients can save the plan with the following iAPI command:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine
Result window for a single query. If the total result number is larger than the window, the
window size will be expanded twice for the next collecting round. Lower values can trigger
re-collection operations and increase query response time. Higher values can consume more
memory, especially for unselective queries. Default: 12000
The settings for CPS, indexing, and search services are described in the following topics:
• Documentum index agent parameters, page 171
• Content processing instance settings, page 173
• Document processing and indexing service settings, page 175
• Search service settings, page 177
Parameter Description
acl_exclusion_list Add this parameter to exclude specific
ACL attributes from indexing. Contains an
acl_attributes_exclude_list element. Check with
technical support before you add this list.
acl_attributes_exclude_list Specifies a space-delimited list of ACL attributes
that will not be indexed.
dsearch_qrserver_host Fully qualified host name or IP address of host
for xPlore server
dsearch_qrserver_port Port used by xPlore server. Default is 9200
dsearch_domain Repository name
group_exclusion_list Add this parameter to exclude specific
group attributes from indexing. Contains a
group_attributes_exclude_list element. Check
with technical support before you add this list.
group_attributes_exclude_list Specifies a space-delimited list of group
attributes that will not be indexed.
index_type_mode Object types to be indexed. Values: both
(default) | aclgroup | sysobject. If you use two
index agents, each can index either ACLs or
sysobjects.
max_requests_in_batch Maximum number of objects to be indexed in a
batch. Default: 5
max_batch_wait_msec Maximum wait time in milliseconds for a
batch to reach the max_requests_in_batch
size. When this timeout is reached the batch
is submitted to xPlore. The default setting
(1) is for high indexing throughput. If your
Index Agent has a low ingestion rate of
documents and you want to have low latency,
reduce both max_requests_in_batch and
max_submission_timeout_sec.
max_pending_requests Maximum number of indexing requests in the
queue. Default: 10000
max_tries Maximum number of tries to add the request
to the internal queue when the queue is full.
Default: 2
Table 23, page 172 describes general index agent runtime settings. Requests for indexing pass from
the exporter queue to the indexer queue to the callback queue.
Parameter Description
queue_size Size of queue for indexing requests. When the
queue reaches this limit, the index agent will
wait for the queue to be lower than queue_size
less (queue_size * queue_low_percent).
For example, if the queue_size is 500 and
queue_low_percent is 10%, then the agent will
resume indexing when the queue is lower than
500 - (500 * .1) = 450.
queue_low_percent Percent of queue size at which the index agent
will resume processing the queue.
callback_queue_size Size of queue to hold requests sent to xPlore
for indexing. When the queue reaches this
size, the index agent waits until the callback
queue falls below (100% less
callback_queue_low_percent) of this size.
callback_queue_low_percent Percent of callback queue size at which the index
agent will resume sending requests to xPlore.
wait_time Time in seconds that the indexing thread waits
before reading the next item in the indexing
queue.
thread_count Number of threads to be used by index agent.
shutdown_timeout Time the index agent should wait for thread
termination and cleanup before shutdown
runaway_timeout Timeout for runaway query.
partition_config You can add this element and its contents,
described below, if you want to map partitions
to specific collections. Refer to Mapping Content
Server storage areas to collections, page 60 for
more information.
contentSizeLimit In exporter.parameter_list. Sets the maximum
size for documents to be sent for indexing. The
value is in bytes. Default: 20MB.
• Keep intermediate temp file: Keep content in a temporary CPS folder for debugging.
Enabling the temp file has a large impact on performance. Disable (default) to remove temporary
files after the specified time in seconds. Time range in seconds: 1-604800 (1 week).
• Restart threshold: Check After processed... and specify the number of requests after which
to restart the CPS daemon.
Disable if you do not want the daemon restarted. Decreasing the number may impact performance.
• Heartbeat: Interval in seconds between the CPS manager and daemon.
Range: 1-600. Default: 60.
• Embedded return: Check Yes (default) to return embedded results to the buffer. Check No to
return results to a file, and specify the file path for export.
Embedded return increases communication time and, consequently, impacts ingestion.
• Export file path: Valid URI at which to store CPS processing results, for example, file:///c:/.
If the results are larger than Result buffer threshold, they are saved in this path. This setting does
not apply to remote CPS instances, because the processing results are always embedded in the
return to xPlore.
• Result buffer size threshold: Number of bytes at which the result buffer returns results to file.
Valid values: 8 - 16MB. Default: 1MB (1048576 bytes). A larger value can accelerate processing
but can cause more instability.
• Processing buffer size threshold: Specifies the number of bytes of the internal memory chunk
used to process small documents.
If this threshold is exceeded, a temporary file is created for processing. Valid values: 100KB-10MB.
Default: 2MB (2097152 bytes). Increasing the value speeds processing but consumes more memory.
• Load file to memory: Check to load the submitted file into memory for processing. Uncheck to
pass the file to a plug-in analyzer for processing (for example, the Documentum index agent).
• Batch in batch count: Average number of batch requests in a batch request.
Range: 1-100. Default: 5. CPS assigns the number of Connection pool threads for each
batch_in_batch count. For example, defaults of batch_in_batch of 5 and connection_pool_size of 5
result in 25 threads.
• Thread pool size: Number of threads used to process a single incoming request such as text
extraction and linguistic processing.
Range: 1-100. Default: 10. A larger size can speed ingestion when the CPU is not under heavy
load, but causes instability at heavy CPU load.
• System language: ISO 639-1 language code that specifies the language for CPS.
Refer to Appendix E, Indexable Languages for codes.
• Max text threshold: Sets the size limit, in bytes, for documents to be tokenized.
Range: 5-40 MB expressed in bytes. Default: 20 MB. Larger values can slow ingestion rate and
cause more instability.
Note: This threshold is applied to the size of the document including expanded attachments. For
example, if an email has a zip attachment, the zip file is expanded to evaluate document size. If
you increase this threshold, ingestion performance may degrade under heavy load.
• Illegal char file: Specifies the URI of a file that defines illegal characters.
To create a token separator, xPlore replaces illegal characters with white space. This list is
configurable.
• Request time out: Number of seconds before a single request times out.
Range: 60-3600. Default: 600.
• Daemon standalone: Check to stop daemon if no manager connects to it. Default: unchecked.
• IP version: Internet Protocol version of the host machine. Values: IPv4 or IPv6. Dual stack is
not supported.
• Use express queue: This queue contains admin requests and query requests. (Queries are
processed for language identification, lemmatization, and tokenization.) The express queue has
priority over the regular queue. Set the maximum number of requests in the queue. Default: 128.
• The regular queue processes indexing requests. Set the maximum number of requests in the
queue. Default: 1024.
• When the token count is zero and the extracted text is larger than the configured threshold,
a warning is logged.
You can configure the following additional parameters in the CPS configuration file configuration.xml,
which is located in the CPS instance directory dsearch_home/dsearch/cps/cps_daemon:
• language_identification: The number of bytes used for language identification can be configured
in the CPS configuration file as the value of max_process_byte. The bytes are analyzed from the
beginning of the file. A larger number slows the ingestion process. A smaller number increases
the risk of language misidentification. Default: 1000.
• max_batch_size: Limit for the number of requests in a batch. Valid values: 2 - 65538 (default: 1024).
Note: The index agent also has batch size parameters.
• max_text_threshold: The upper limit in bytes for documents that are tokenized. Above this size,
only the document metadata is tokenized. Default: 10485760 (10 MB).
• query-thread-max-idle-interval: Query thread is freed up for reuse after this interval, because
the client application has not retrieved the result. (Threads are freed immediately after a result is
retrieved.) Default: 3600000.
• query-summary-default-highlighter: Class that determines summary and highlighting. Default:
com.emc.documentum.core.fulltext.indexserver.services.summary.DefaultSummary. Refer to
Configuring query summary and highlighting, page 101.
• query-summary-display-length: Number of characters to return as a dynamic summary. Default:
64. Refer to Configuring query summary and highlighting, page 101.
• query-summary-highlight-begin-tag: HTML tag to insert at beginning of summary. Default:
empty string. Refer to Configuring query summary and highlighting, page 101.
• query-summary-highlight-end-tag: HTML tag to insert at end of summary. Default: empty string.
Refer to Configuring query summary and highlighting, page 101.
• query-enable-dynamic-summary: If context is not important, set to false to return as a summary
the first n chars defined by the query-summary-display-length configuration parameter. For
summaries evaluated in context, set to true (default). Refer to Configuring query summary and
highlighting, page 101.
• query-index-covering-values: Supports Documentum DQL evaluation. Do not change unless
tech support directs you to do this.
• query-facet-max-result-size: Documentum only. Sets the maximum number of results used to
compute facet values. For example, if query-facet-max-result-size=12, only 12 results for all facets
in a query are returned. If a query has many facets, the number of results per facet is reduced
accordingly. Default: 10000.
Note: Result set size cannot be limited. It is up to the client application to limit the number of results
that are fetched.
Documentum repository content is represented for indexing in XML format (DFTXML). Table 25,
page 179 displays the partial DTD. Customer-defined elements and attributes can be added to this
DTD as children of dmftcustom.
Each element specifies an attribute of the object type. The object type is the element in the path
dmftdoc/dmftmetadata/type_name, for example, dmftdoc/dmftmetadata/dm_document.
The root element of DFTXML is dmftdoc. Table 25, page 179 describes the top-level elements under
dmftdoc. This DTD is subject to change.
Element Description
dmftkey Contains Documentum object ID (r_object_id)
dmftmetadata Contains elements for all indexable attributes
from the standard Documentum object model,
including custom object types. Each attribute is
modeled as an element and value. Repeating
attributes repeat the element name and contain
a unique value. Some metadata, such as
r_object_id, is repeated in other elements as
noted.
dmftvstamp Contains the internal version stamp (i_vstamp)
attribute.
dmftsecurity Contains security attributes from the object
model plus computed attributes: acl_name,
acl_domain, and ispublic.
dmftinternal Contains attributes used internally for query
processing.
dmftversions Contains version labels and iscurrent for the
object if it is a sysobject.
dmftfolders Contains the folder ID and folder parents.
dmftcontents Contains content-related attributes and one
or more pointers to content files. The actual
content can be stored within the child element
dmftcontent as a CDATA section.
dmftcustom Contains searchable information supplied by
custom applications. (Requires a TBO.)
dmftsearchinternals Contains tokens used by static and dynamic
summaries.
To find the path of a specific attribute in DFTXML, use a Documentum client to look up the object
ID of a custom object in the repository. Using xPlore administrator, open the target collection and
paste the object ID into the Filter word box. Click the resulting document to see the DFTXML
representation. Following is a sample DFTXML representation of a custom object type:
<?xml version="1.0"?>
<dmftdoc dmftkey="090a0d6880008848" dss_tokens=":dftxml:1">
<dmftkey>090a0d6880008848</dmftkey>
<dmftmetadata>
<dm_sysobject>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<object_name dmfttype="dmstring">mylog.txt</object_name>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<r_creation_date dmfttype="dmdate">2010-04-09T21:40:47</r_creation_date>
<r_modify_date dmfttype="dmdate">2010-04-09T21:40:47</r_modify_date>
<r_modifier dmfttype="dmstring">Administrator</r_modifier>
<r_access_date dmfttype="dmdate"/>
<a_is_hidden dmfttype="dmbool">false</a_is_hidden>
<i_is_deleted dmfttype="dmbool">false</i_is_deleted>
<a_retention_date dmfttype="dmdate"/>
<a_archive dmfttype="dmbool">false</a_archive>
<a_link_resolved dmfttype="dmbool">false</a_link_resolved>
<i_reference_cnt dmfttype="dmint">1</i_reference_cnt>
<i_has_folder dmfttype="dmbool">true</i_has_folder>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
<r_link_cnt dmfttype="dmint">0</r_link_cnt>
<r_link_high_cnt dmfttype="dmint">0</r_link_high_cnt>
<r_assembled_from_id dmfttype="dmid">0000000000000000</r_assembled_from_id>
<r_frzn_assembly_cnt dmfttype="dmint">0</r_frzn_assembly_cnt>
<r_has_frzn_assembly dmfttype="dmbool">false</r_has_frzn_assembly>
<r_is_virtual_doc dmfttype="dmint">0</r_is_virtual_doc>
<i_contents_id dmfttype="dmid">060a0d688000ec61</i_contents_id>
<a_content_type dmfttype="dmstring">crtext</a_content_type>
<r_page_cnt dmfttype="dmint">1</r_page_cnt>
<r_content_size dmfttype="dmint">130524</r_content_size>
<a_full_text dmfttype="dmbool">true</a_full_text>
<a_storage_type dmfttype="dmstring">filestore_01</a_storage_type>
<i_cabinet_id dmfttype="dmid">0c0a0d6880000105</i_cabinet_id>
<owner_name dmfttype="dmstring">Administrator</owner_name>
<owner_permit dmfttype="dmint">7</owner_permit>
<group_name dmfttype="dmstring">docu</group_name>
<group_permit dmfttype="dmint">5</group_permit>
<world_permit dmfttype="dmint">3</world_permit>
<i_antecedent_id dmfttype="dmid">0000000000000000</i_antecedent_id>
<i_chronicle_id dmfttype="dmid">090a0d6880008848</i_chronicle_id>
<i_latest_flag dmfttype="dmbool">true</i_latest_flag>
<r_lock_date dmfttype="dmdate"/>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<i_branch_cnt dmfttype="dmint">0</i_branch_cnt>
<i_direct_dsc dmfttype="dmbool">false</i_direct_dsc>
<r_immutable_flag dmfttype="dmbool">false</r_immutable_flag>
<r_frozen_flag dmfttype="dmbool">false</r_frozen_flag>
<r_has_events dmfttype="dmbool">false</r_has_events>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<i_is_reference dmfttype="dmbool">false</i_is_reference>
<r_creator_name dmfttype="dmstring">Administrator</r_creator_name>
<r_is_public dmfttype="dmbool">true</r_is_public>
<r_policy_id dmfttype="dmid">0000000000000000</r_policy_id>
<r_resume_state dmfttype="dmint">0</r_resume_state>
<r_current_state dmfttype="dmint">0</r_current_state>
<r_alias_set_id dmfttype="dmid">0000000000000000</r_alias_set_id>
<a_is_template dmfttype="dmbool">false</a_is_template>
<r_full_content_size dmfttype="dmdouble">130524</r_full_content_size>
<a_is_signed dmfttype="dmbool">false</a_is_signed>
<a_last_review_date dmfttype="dmdate"/>
<i_retain_until dmfttype="dmdate"/>
<i_partition dmfttype="dmint">0</i_partition>
<i_is_replica dmfttype="dmbool">false</i_is_replica>
<i_vstamp dmfttype="dmint">0</i_vstamp>
<webpublish dmfttype="dmbool">false</webpublish>
</dm_sysobject>
</dmftmetadata>
<dmftvstamp>
<i_vstamp dmfttype="dmint">0</i_vstamp>
</dmftvstamp>
<dmftsecurity>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<ispublic dmfttype="dmbool">true</ispublic>
</dmftsecurity>
<dmftinternal>
<docbase_id dmfttype="dmstring">658792</docbase_id>
<server_config_name dmfttype="dmstring">DSS_LH1</server_config_name>
<contentid dmfttype="dmid">060a0d688000ec61</contentid>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<i_all_types dmfttype="dmid">030a0d68800001d7</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000129</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000105</i_all_types>
<i_dftxml_schema_version dmfttype="dmstring">5.3</i_dftxml_schema_version>
</dmftinternal>
<dmftversions>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<iscurrent dmfttype="dmbool">true</iscurrent>
</dmftversions>
<dmftfolders>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
</dmftfolders>
<dmftcontents>
<dmftcontent>
<dmftcontentattrs>
<r_object_id dmfttype="dmid">060a0d688000ec61</r_object_id>
<page dmfttype="dmint">0</page>
<i_full_format dmfttype="dmstring">crtext</i_full_format>
</dmftcontentattrs>
<dmftcontentref content-type="text/plain" islocalcopy="true" lang="en"
encoding="US-ASCII" summary_tokens="dmftsummarytokens_0">
<![CDATA[...]]></dmftcontentref>
</dmftcontent>
</dmftcontents>
<dmftdsearchinternals dss_tokens="excluded">
<dmftstaticsummarytext dss_tokens="excluded"><![CDATA[mylog.txt ]]>
</dmftstaticsummarytext>
<dmftsummarytokens_0 dss_tokens="excluded"><![CDATA[1Tkns ...]]>
</dmftsummarytokens_0>
</dmftdsearchinternals>
</dmftdoc>
Following is the DFC hints file DTD. For more information on this DTD, refer to Hints file elements,
page 110.
<!ELEMENT RuleSet (Rule*)>
<!ELEMENT Rule (Condition?, DQLHint?, SelectOption?, DisableFullText?, DisableFTDQL?)>
<!ELEMENT Condition (Select?, From?, Where?, Docbase?, FulltextExpression?)>
<!ELEMENT DQLHint (#PCDATA)>
<!ELEMENT SelectOption (#PCDATA)>
<!ELEMENT DisableFullText EMPTY>
<!ELEMENT DisableFTDQL EMPTY>
<!ELEMENT Select (Attribute+)>
<!ATTLIST Select condition (all | any) "all">
<!ELEMENT From (Type+)>
<!ATTLIST From condition (all | any) "all">
<!ELEMENT Where (Attribute+)>
<!ATTLIST Where condition (all | any) "all">
<!ELEMENT Docbase (Name+)>
<!ELEMENT FulltextExpression EMPTY>
<!ATTLIST FulltextExpression exists (true | false) #REQUIRED>
<!ELEMENT Attribute (#PCDATA)>
<!ATTLIST Attribute operator
(equal | not_equal | greater_than | greater_equal | less_than | less_equal |
like | not_like | is_null |
is_not_null | in | not_in | between)
#IMPLIED>
<!ELEMENT Type (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ATTLIST Name descend (true | false) #IMPLIED>
You can issue the following XQuery expressions against the tracking database for each domain.
Many of these expressions are available in xPlore administrator or as audit reports. These XQuery
expressions can be submitted in the xDB console.
For example:
for $i in collection("dsearch/SystemInfo/TrackingDB/TestCustomType")
return count($i//trackinginfo/document)
For example:
for $i in collection("dsearch/SystemInfo")
return count($i//trackinginfo/document)
Find documents —
• Find collection in which a document is indexed
//trackinginfo/document[@id="<DocumentId>"]/collection-name/string(.)
For example:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name
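Building on the same trackinginfo structure, the following sketch counts indexed documents per collection. It is untested; the collection path and element names are taken from the queries above:

```
for $name in distinct-values(
    collection("dsearch/SystemInfo")//trackinginfo/document/collection-name/string(.))
return
  <collection name="{$name}">{
    count(collection("dsearch/SystemInfo")//trackinginfo/document[collection-name = $name])
  }</collection>
```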
The following languages can be indexed. For a list of supported languages, refer to the release
notes for this release.
Unless noted, a language is analyzed for tokenization, part of speech tagging, sentence boundary
detection (SBD), base noun phrase detection (BNP), stemming, compound analysis, and alternative
readings. Some languages not in this list are identified but not indexed.
The following tables list the formats that can be indexed. Some formats that are listed are indexed
on file ID or metadata only, as noted.
Enable 3.-4.5
First Choice SS through 3.0
Framework SS 3.0
Lotus 1-2-3 through Millennium 9.6
Lotus 1-2-3 (OS/2) 2.0
Lotus 1-2-3 Charts (DOS & Windows) through 5.0
Lotus 1-2-3 for SmartSuite Versions 97 - Millennium 9.6
Lotus Symphony 1.x
Microsoft Excel Charts Versions 2.x-2007
Microsoft Excel (Mac) 98-2008
GZIP (Unix)
LZA Self Extracting Compress
LZH Compress
Microsoft Office Binder 95, 97
RAR 1.5, 2.0, 2.9
Self-extracting .exe
UUEncode
UNIX Compress
UNIX TAR
ZIP PKZip and WinZip
category
A category defines a class of documents and their XML structure.
collection
A collection is a logical group of XML documents that is physically stored in an xDB library.
A collection represents the most granular data management unit within xPlore.
CPS
The content processing service (CPS) retrieves indexable content from content sources and
determines the document format and primary language. CPS parses the content into index
tokens that xPlore can process into full-text indexes.
domain
A domain is a separate, independent group of collections within an xPlore deployment.
DQL
Documentum Query Language, used by many Content Server clients
FTDQL
Full-text Documentum Query Language
ftintegrity
A standalone Java program that checks index integrity against Content Server repository
documents. The ftintegrity script checks the state of the index job in the Content Server.
full-text index
Index structure that tracks terms and their occurrence in a document.
index agent
Documentum application that receives indexing requests from the Content Server. The agent
prepares and submits to xPlore an XML representation of the document to be indexed.
ingestion
Process in which xPlore receives an XML representation of a document and processes it
into an index.
instance
An xPlore instance is one deployment of the xPlore WAR file to an application server container.
You can have multiple instances on the same host (vertical scaling), although it is more
common to have one xPlore instance per host (horizontal scaling). The following processes
can run in an xPlore instance: CPS, indexing, search, and xPlore administrator.
lemmatization
Lemmatization is a normalization process in which the lemmatizer finds a canonical or
dictionary form for a word, called a lemma. Content that is indexed is also lemmatized
unless lemmatization is turned off. Terms in search queries are also lemmatized unless
lemmatization is turned off.
Lucene
Apache open-source, Java-based full-text indexing and search engine.
node
In xPlore and xDB, node is sometimes used to denote instance. It does not denote host.
persistence library
Saves CPS, indexing, and search metrics. Configurable in indexserverconfig.xml.
status library
A status library reports on indexing status for a domain. There is one status library for
each domain.
stop words
Stop words are words that are filtered out before indexing, to save the size of the index
and to prevent searches on common words.
text extraction
Identification of terms in a content file.
token
Piece of an input string defined by semantic processing rules.
tracking library
An xDB tracking library records the object IDs and location of content that has been indexed.
There is one tracking database for each domain.
transactional support
Small in-memory indexes are created in rapid transactional updates, then merged into
larger indexes. When an index is written to disk, it is considered clean. Committed and
uncommitted data before the merge is searchable along with the on-disk index.
watchdog service
Installed by the xPlore installer, the watchdog service pings all xPlore instances and sends
an email notification when an instance does not respond.
xDB
xDB is a database that enables high-speed storage and manipulation of many XML
documents. In xPlore, an xDB library stores a collection as a Lucene index and manages the
indexes on the collection. The XML content of indexed documents can optionally be stored.
XQFT
W3C full-text XQuery and XPath extensions described in XQuery and XPath Full Text
1.0. Support for XQFT includes logical full-text operators, wildcard option, anyall option,
positional filters, and score variables.
XQuery
W3C standard query language that is designed to query XML data. xPlore receives XQuery
expressions that are compliant with the XQuery standard and returns results.
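The XQFT operators listed above can be illustrated with a short query sketch. This is not a query from this guide; the collection path and element names are hypothetical, shown only to demonstrate the full-text operators, the stemming option, and the score variable.

```xquery
(: Illustrative sketch only: collection path and element names are hypothetical. :)
(: Binds a relevance score to $s and filters with the ftcontains operator. :)
for $doc score $s in collection("/SystemData/SampleDomain")/dmftdoc
where $doc//title ftcontains "backup" ftand "restore" using stemming
order by $s descending
return $doc
```

The `ftand` operator requires both terms; `using stemming` matches morphological variants, and `score $s` exposes the relevance value for ranking.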
A
architecture
  logical, 21
  physical, 17
attach domain, 40
audit
  queries, 103
audit record, 152

B
backup
  file-based, 96
  incremental, 93
  overview, 89
  planning, 89
  scripted, 97
  snapshot, 95
  volume-based, 95
backup-directory, 97
batch_hint_size, 177

C
cache
  Documentum groups and ACLs, 48
CASample, 127
case sensitivity, 71
categories
  configure, 83
  manage, 84
category
  Documentum, 25
  overview, 22
collection
  configure, 86
  create, 85
  delete, 85 to 86
  global, 23
  overview, 22
  rebuild, 92
  restore with xDB, 94
collection backup
  scripted, 97
collections
  Documentum, 26
  scalability, 84
  storage areas, mapping, 60
connectors_batch_size, 163
consistency
  index, 40
CONTAINS WORD, 113
Content Server
  indexing, 27, 29
content storage areas
  Documentum, mapping, 58
Content too large report, 150
CPS
  configure, 65
  logging, 143
  overview, 63
  status, 65
  test processing, 127
  troubleshoot, 126

D
data model
  Documentum full-text, 26
DB statistics, 40
detach domain, 40
DFC
  compared to DQL, 107
DFS queries
  compared to DQL, 107
disk areas in xPlore, 17
disk space, 158
dm_ftengine_config
  security, 47
  security_mode, 49
  settings, 106

F
FAST

I
incremental backup, 93
index
  consistency, 40
  rebuild, 92
  remove docs, 58
index agent
  configuration, 171
  configure, 53
  error threshold, 125
  filters, 56
  multiple, 54
  performance, 163
  reindexing, 124
  restart, 122
  role in indexing process, 27, 29
  troubleshoot, 120
index selected list, 58
index server
  role in indexing process, 27, 29
index servers
  Documentum integration, 14
index-value-leaf-node-only, 80
indexagent.xml, 171
indexer queue_size, 163
indexes
  create, 76
  overview, 20
indexing
  components, 17
  exclude from, 55
  manage, 75
  metadata only, 55
  performance, 164
  queue items, 29
  resubmit, 58
  tasks, 81
  troubleshoot, 132
indexing report, 151
indexserverconfig.xml
  Documentum categories, 25
  Documentum domains, 25
  modifying, 36
ingestion
  slow, 128
installing indexing software, 27, 29
instance
  activate spare, 42
  deactivate, 42
  display information, 37
  overview, 18
instances
  get status, 42

J
jobs
  state of the index, 60

L
lemmatization
  configure, 66
  managing, 65
  troubleshooting, 67
logging
  CPS, 143
  formats, 143
  location, 145
  log4j, 142
  overview, 142
  queries, 146
  xDB, 146
  xDB and Lucene, 146
login
  xPlore administrator, 32
Lucene
  and xDB, 19
  logging, 146

M
metadata
  boost in results, 101
metrics
  and performance, 161
  indexing, view and configure, 80
  persistence of, 38

O
object type
  exclude from indexing, 55
orphaned segments
  purge, 98

P
performance
  language identification, 175
  limit content size, 128, 164
  local filestore map, 59
  metrics, 161
  purge status DB, 38
  query summary, 103
  xDB, 88

S
status
  CPS, 65
status DB
  purge, 38
stop words, 71
storage locations
  manage, 87
summary
  dynamic, 101
  overview, 101
  performance, 103
  static, 102
system
  managing, 37
  topology, 37
system management tasks, 31

T
test search, 136
ticket
  login, expired, 135
tokenization
  language, 73
  special characters, 69
Top N slowest queries, 151
tracing, 147
troubleshoot
  CPS, 126
  index agent, 120
  indexing, 132
  query, 136

U
upload testing document, 132

V
volume-based
  backup and restore, 95

W
watchdog service, 45
Webtop
  query debugging, 140
white space, 65
wild card
  highlighting, 101
wildcards, 71

X
xDB
  overview, 19
  performance tuning, 168
xDB admin tool, 36
XHadmin, 36
xPlore administrator
  login, 32
xPlore server
  locations, 17

Z
zone search, 112