Classification
DEPLOYMENT GUIDE
Document Information
Trademarks
Thales CipherTrust Data Discovery and Classification is powered by Groundlabs.
All intellectual property is protected by copyright. All trademarks and product names used or referred to are the
copyright of their respective owners. No part of this document may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, chemical, photocopy, recording or
otherwise without the prior written permission of Thales.
Disclaimer
Thales makes no representations or warranties with respect to the contents of this document and specifically
disclaims any implied warranties of merchantability or fitness for any particular purpose. Furthermore, Thales
reserves the right to revise this publication and to make changes from time to time in the content hereof without
the obligation upon Thales to notify any person or organization of any such revisions or changes.
We have attempted to make these documents complete, accurate, and useful, but we cannot guarantee them
to be perfect. When we discover errors or omissions, or they are brought to our attention, we endeavor to
correct them in succeeding releases of the product.
You are responsible for ensuring your own compliance with various laws and regulations, including but not
limited to any data privacy or data protection regulation. You are solely responsible for obtaining advice from
competent legal counsel to assist you in the identification and interpretation of any relevant laws and
regulations that may affect your business and the implementation of any actions you may need to take to
ensure you meet your compliance obligations with respect to such laws and regulations.
The software, the products, services, and any other capabilities described or provided herein are not suitable
for all situations and may have restricted availability or applicability. Thales does not provide legal, accounting,
or auditing advice, nor does it represent or warrant that its software, services, or products will ensure that you
are in compliance with any law or regulation.
Thales invites constructive comments on the contents of this document. Send your comments, together with
your personal and/or company details to the address below.
Address: Thales
         4690 Millennium Drive
         Belcamp, Maryland 21017
         USA
Phone:   US 1-800-545-6608
         International 1-410-931-7520
Email:   technical.support.DIS@thalesgroup.com
Document Information
DDC Deployment Architecture
Supported Data Stores
Where to Install the DDC Agents
How DDC Uses Hadoop
Deployment Prerequisites
Installing CipherTrust Manager
Installing and Configuring Hadoop
Configuring DNS Connectivity
Agent Configurations
Agent Compatibility and Installers
Installing Agents
Uninstalling Agents
Uninstalling Agents from RHEL
Uninstalling Agents from Debian
Uninstalling Agents from Windows
Upgrading Agents
Audience
This document is intended for Thales CipherTrust Data Discovery and Classification (DDC) users responsible
for classification of data discovered on data stores. It is assumed that the users of this document are proficient
with security and data discovery concepts.
All products manufactured and distributed by Thales are designed to be installed, operated, and maintained by
personnel who have the knowledge, training, and qualifications required to safely perform the tasks assigned
to them. The information, processes, and procedures contained in this document are intended for use by
trained and qualified personnel only.
Thales designs data security products for use by file server administrators, network administrators, security
engineers, database administrators, application developers, and other technology professionals responsible
for daily operations in support of data security.
Organization
The Thales CipherTrust Data Discovery and Classification Deployment Guide contains the following sections:
1. "DDC Deployment Architecture"
Contains a visual overview (a diagram) of a typical DDC deployment.
2. "Software and Hardware Requirements"
Lists the software and hardware requirements for the DDC Server and Agents.
3. "Deployment Prerequisites"
Lists the requirements that must be fulfilled before you start the DDC deployment.
Document Conventions
This section describes the formatting conventions used in this user guide to indicate hyperlinks, special notes,
important information, tips, and warnings.
Hyperlinks
By default, hyperlinked text appears in purple.
For example: All technical document templates can be found on the Technical Writing Community page.
Notifications
This user guide uses notes, tips, and warnings to alert you to important information that may help you to
complete your task, or prevent personal injury, damage to the equipment, or data loss.
Notes
Notes are used to alert you to important or helpful information. These elements use the following format:
NOTE Take note. Notes contain important or helpful information that you want to make
stand out to the user.
Cautions
Cautions are used to alert you to important information that may help prevent unexpected results or data loss.
These elements use the following format:
CAUTION! Exercise caution. Caution alerts contain important information that may help
prevent unexpected results or data loss.
Warnings
Warnings are used to alert you to the potential for catastrophic data loss or personal injury. These elements
use the following format:
**WARNING** Be extremely careful and obey all safety and security measures. In
this situation you might do something that could result in catastrophic data loss or
personal injury.
Convention | Description
italic | The italic attribute is used for emphasis or to indicate a related document. (See the Thales CipherTrust Data Discovery and Classification Customer Release Notes for more information.)
Double quote marks | Double quote marks enclose references to other sections within the document. For example: Refer to "Disclaimer" on page 2.
<variable> | In command descriptions, angle brackets represent variables. You must substitute a value for command line arguments that are enclosed in angle brackets.
Related Documents
The following documents contain related or additional information:
> Thales CipherTrust Data Discovery and Classification Administrator Guide
> Thales Data Platform Installation Guide
> Thales CipherTrust Data Discovery and Classification Customer Release Notes
You can view or download the latest version of the Customer Release Notes (CRN) for this release at this location:
https://supportportal.thalesgroup.com
DDC Deployment Architecture
This section describes the main components of Thales CipherTrust Data Discovery and Classification (DDC)
and how they operate together to provide the DDC solution. Before you start the deployment, review the
diagram in this section to see what a typical DDC deployment looks like. The concepts used in the diagram
are introduced in later sections of this document and explained in detail in the "DDC Administration Guide".
Software Requirements
DDC Agents for Debian require Debian kernel versions 3.x and higher.
Source | Destination | Protocol | Port | Connection | Description
Agents | CipherTrust Manager | TCP | 11117 | Persistent | Allow traffic between Agents and the CipherTrust Manager appliance. Agents initiate the communication and keep persistent connections.
Agents | IBM DB2 | TCP | 50000 | Non-persistent | Allow traffic between Agents and the IBM DB2 database store. Agents initiate the communication and need the port during the current session.
Agents | Microsoft SQL | TCP | 1433 | Non-persistent | Allow traffic between Agents and the Microsoft SQL database store. Agents initiate the communication and need the port during the current session.
Agents | Oracle | TCP | 1521 | Non-persistent | Allow traffic between Agents and the Oracle database store. Agents initiate the communication and need the port during the current session.
Agents | NFS server | TCP or UDP | 2049** | Non-persistent | Allows scanning of NFS file shares.
Agents | Hadoop Scanning | TCP | 8020, 50075 and 50010 | Non-persistent | Allow traffic between Agents and Hadoop cluster nodes. Agents initiate the communication and need the ports during the current session.
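Before installing Agents, it can help to confirm that the listed TCP ports are actually reachable from the Agent host. The following is a minimal sketch in Python, not part of the product: the dictionary labels and the function names are illustrative assumptions, and only the port numbers come from the table above. It tests TCP only, so it does not cover NFS over UDP.

```python
import socket

# Port numbers copied from the table above; the keys are illustrative labels.
AGENT_PORTS = {
    "CipherTrust Manager": [11117],
    "IBM DB2": [50000],
    "Microsoft SQL": [1433],
    "Oracle": [1521],
    "NFS server": [2049],
    "Hadoop": [8020, 50075, 50010],
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_targets(targets):
    """Probe every port for each label in targets.

    targets maps a label from AGENT_PORTS to the hostname to probe.
    Returns a dict keyed by (host, port) with True/False reachability.
    """
    results = {}
    for label, host in targets.items():
        for port in AGENT_PORTS[label]:
            results[(host, port)] = tcp_port_open(host, port)
    return results
```

For example, calling check_targets({"CipherTrust Manager": "cm.example.com"}) from the Agent host (the hostname here is a placeholder) reports whether port 11117 is reachable. A False result can mean a firewall rule, a stopped service, or a DNS problem, so treat it as a prompt for further checks rather than a diagnosis.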
In addition to supporting Hadoop as a data store, CipherTrust Manager uses Hadoop as an external database to store and
process the scan results. CipherTrust Manager initiates the communication and needs these ports to be open
during the current session:
Source | Destination | Protocol | Port | Connection | Description
CipherTrust Manager | Hadoop*** | TCP | 8443 and 8765 | Non-persistent | Allow traffic between TDP cluster nodes and the CipherTrust Manager appliance. DDC supports Hadoop NameNodes and Apache Knox.
• 137 (UDP)
• 138 (UDP)
• 139 (TCP)
** NFSv4 requires only port 2049 (TCP only). NFSv3 and older must allow connections on the following ports:
Deployment Prerequisites
> CipherTrust Manager must be installed, configured, and accessible through the GUI (also called the
console).
> Hadoop must be installed and configured with a Phoenix Query Server (PQS) mapped to HBase.
> You must also have Apache Knox installed and configured for Hadoop.
Knox must also be DNS addressable, through a network DNS or by adding the DNS entry as described in
the "CipherTrust Manager Administration Guide" ("Concepts" > "DNS Hosts" > "Configuring DNS Hosts"
section).
From then on, the active DDC node will store all the configuration settings in its database. The database of the
active node gets replicated over all cluster nodes, so every cluster member has an identical copy of the
database. All the copies get synchronized and updated every time new data is inserted into the copy on the
active node.
> To assign the active DDC node by using the CipherTrust Manager UI, follow this procedure:
a. Log in to the CipherTrust Manager node that you want to make the active DDC node.
b. Click the Data Discovery and Classification link to open the DDC app.
You should see the message "The current node is inactive. The node must be activated to use DDC."
c. Click the Activate button below the message.
The CipherTrust Manager node becomes the active DDC node.
> To assign the active DDC node by using the CipherTrust Manager command line, you need the ksctl tool
installed and configured. For information on installing and configuring the tool, refer to the "Interfaces > CLI"
section in the "DDC Administration Guide".
Connect to the CipherTrust Manager node that you want to make the active DDC node, and issue this
command:
ksctl ddc active-node register
After you assign an active DDC node, you can perform all DDC-related tasks through that node. The other
nodes (the non-active nodes) do not allow you to work with DDC. When you log in to a CipherTrust Manager node
that is a non-active DDC node and open the Data Discovery and Classification application, this message
is displayed:
"You are currently connected to an inactive node. You must switch to the active node <active DDC node IP
address> to run DDC."
The output shows the IP address of the cluster's active node. Use this IP address to configure a DNS entry for
the active CipherTrust Manager. Use that DNS entry to configure the Agents and access the CipherTrust
Manager UI.
{
  "public_address": "mycluster.thalesgroup.com",
  "host": "10.45.102.101"
}
If the CipherTrust Manager appliance is not in a cluster, the command returns the following error:
{
  "code": 15,
  "codeDesc": "NCERRBadRequest: Bad HTTP request",
  "message": "oleander is not in cluster mode"
}
In this case, just use the IP address (or DNS entry) of this single CipherTrust Manager node.
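If you script this lookup, the two responses shown above can be told apart by their fields. The following is a minimal sketch in Python under stated assumptions: the function name is hypothetical, and the only inputs it understands are the two JSON shapes reproduced above (a "host" field for a cluster, and a "message" containing "not in cluster mode" for a standalone appliance).

```python
import json


def active_node_host(output):
    """Return the active node's "host" value, or None for a standalone appliance.

    output is the JSON text printed by the command. A response whose
    message says the appliance is "not in cluster mode" is treated as a
    standalone node; any other response raises ValueError.
    """
    data = json.loads(output)
    if "host" in data:
        return data["host"]
    if "not in cluster mode" in data.get("message", ""):
        return None
    raise ValueError(f"unexpected response: {data!r}")
```

When the function returns None, fall back to the IP address or DNS entry of the single CipherTrust Manager node, as described above.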
NOTE Only users with access to the root domain can view and modify the
Hadoop Services configuration. For more information about domains, refer to the "Thales
CipherTrust Manager Administrator Guide".
Configuring HBase
1. To configure DDC for HBase, click the PQS tab and configure the PQS settings:
a. Hostname and Port - the connection details of the Knox server.
Use the hostname of your Knox server, not its IP address, because the server certificate that you
import later is hostname-based. The default port is 8443.
Knox must also be DNS addressable, through a network DNS or by adding the DNS entry as described
in the "CipherTrust Manager Administration Guide" ("Concepts" > "DNS Hosts" > "Configuring DNS
Hosts" section).
b. URI - the PQS path as configured in Knox. The path consists of the Knox server part and the PQS part.
For example:
/gateway/default/avatica
where "gateway" is the Knox part, "default" is the topology name, and "avatica" is the service name.
NOTE If you are not using the default topology, use your topology name instead of
"default" in the URI.
TIP To avoid potential issues, create the PQS schema before performing this configuration
step. Refer to the "Thales Data Platform Installation Guide",
section "12. Creating the PQS Schema".
d. Server Certificate - Knox uses HTTPS, so DDC connects to the Hadoop Services only if a valid and
trusted certificate is used. If the certificate was signed by a publicly recognized CA, DDC trusts it
automatically; otherwise, you must import it manually here.
Click the Choose File button in the Select Server Certificate section and import the server certificate.
IMPORTANT: Use the same certificate as you exported earlier in "13. Exporting the Knox Server
Certificate" in the "Thales Data Platform Installation Guide".
TIP
> If the certificate is self-signed, first export it on the Knox server and then import it
on the machine where you are running the CipherTrust Manager console. The certificate
is then permanently stored in DDC. You do not need to repeat this step on another DDC
client machine.
> Binary format certificates are not accepted. You can import only a plain-text certificate
(for example, a Base-64 encoded X.509 certificate).
e. In the AUTHENTICATION section, enter the Username and Password of a Knox authorized
user.
IMPORTANT: Use the same authentication details as you configured earlier in "9. Configuring Knox" in
the "Thales Data Platform Installation Guide".
2. Click Save Changes to save the configuration.
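The URI described in step b is a simple join of three parts. The following is a minimal sketch in Python of how the value could be composed and sanity-checked before typing it into the form. The function names are hypothetical, and the hostname check only mirrors the rule that a hostname, not an IP address, must be used; DDC itself performs no such call.

```python
import ipaddress


def knox_uri(service, topology="default", gateway="gateway"):
    """Join the Knox part, the topology name, and the service path into a URI."""
    parts = [gateway, topology, service]
    return "/" + "/".join(part.strip("/") for part in parts)


def knox_url(hostname, service, port=8443, topology="default"):
    """Build the full HTTPS URL, rejecting an IP address given as the hostname."""
    try:
        ipaddress.ip_address(hostname)
    except ValueError:
        pass  # Not an IP address, so treat it as a hostname.
    else:
        raise ValueError("use the Knox hostname, not its IP address")
    return f"https://{hostname}:{port}{knox_uri(service, topology)}"
```

With the defaults, knox_uri("avatica") yields /gateway/default/avatica, which matches the example in step b. Pass a different topology argument if you are not using the default topology.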
Configuring HDFS
1. To configure DDC for HDFS, click the HDFS tab, and configure the HDFS settings:
a. Hostname and Port - the connection details of the Knox server.
Use the hostname of your Knox server, not its IP address, because the server certificate that you
import later is hostname-based. The default port is 8443.
Knox must also be DNS addressable, through a network DNS or by adding the DNS entry as described
in the "CipherTrust Manager Administration Guide" ("Concepts" > "DNS Hosts" > "Configuring DNS
Hosts" section).
b. URI - the path to HDFS as configured in Knox. The path consists of the Knox server part and the HDFS part.
For example:
/gateway/default/webhdfs/v1
where "gateway" is the Knox part, "default" is the topology name, and "webhdfs/v1" is the HDFS part.
NOTE If you are not using the default topology, use your topology name instead of
"default" in the URI.
c. Folder - enter the DDC directory in HDFS that you created earlier (for example,
/ciphertrust_ddc).
IMPORTANT: Use the same directory name as you configured earlier in "11. Creating DDC Directory
Under HDFS" in the "Thales Data Platform Installation Guide".
d. Server Certificate - because Knox requires HTTPS, you must import the server
certificate here.
Click the Choose File button in the Select Server Certificate section and import the server certificate.
IMPORTANT: Use the same certificate as you exported earlier in "13. Exporting the Knox Server
Certificate" in the "Thales Data Platform Installation Guide".
NOTE
> The certificate is typically self-signed, so you must first export it on the Knox server and
then import it on the machine where you are running the CipherTrust Manager console. The
certificate is then permanently stored in DDC. You do not need to repeat this step on
another DDC client machine.
> Binary format certificates are not accepted. You can import only a plain-text certificate
(for example, a Base-64 encoded X.509 certificate).
e. In the AUTHENTICATION section, enter the Username and Password of a Knox authorized
user.
IMPORTANT: Use the same authentication details as you configured earlier in "9. Configuring Knox" in
the "Thales Data Platform Installation Guide".
2. Click Save Changes to save the configuration.
The connection was successful if no error is returned and this message is displayed:
“Success. HDFS settings have been updated”
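Because binary format certificates are rejected, it can be useful to check the exported Knox certificate file before importing it. The following is a minimal sketch in Python using only the standard ssl module. The function names are hypothetical, and the check looks only at the PEM header and footer lines; it does not validate the certificate contents or its trust chain.

```python
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def is_pem_certificate(data):
    """Return True if the bytes look like a plain-text (PEM) certificate."""
    try:
        text = data.decode("ascii").strip()
    except UnicodeDecodeError:
        return False  # Binary (DER) data is not plain ASCII text.
    return text.startswith(PEM_HEADER) and text.endswith(PEM_FOOTER)


def ensure_pem(data):
    """Return PEM text, converting from binary DER when necessary."""
    if is_pem_certificate(data):
        return data.decode("ascii")
    # DER_cert_to_PEM_cert Base-64 encodes the bytes and adds the PEM markers.
    return ssl.DER_cert_to_PEM_cert(data)
```

Read the exported file in binary mode and pass its contents to ensure_pem; the returned text is in the Base-64 encoded form that the import dialog expects.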
Agent Configurations
DDC supports two types of Agent configurations:
> Local: Agent is installed and configured directly on the machine that contains sensitive data.
> Proxy: Agent is installed and configured on a proxy machine that is used to scan sensitive data on other
machines.
The instructions to install and configure Agents in both types of configurations are the same.
Data Store Type | Data Store | Agent Configuration | Agent Installer
 | Debian-based distros | Local | Debian (.deb) - 32 bit, 64 bit, 64 bit with database support (compatible with Debian 9 only)
Database Storage | IBM DB2 11.1 and higher | Local, Proxy | Windows (.msi) - 32 bit with database support, 64 bit with database support
Database Storage | Microsoft SQL 2005 and higher | Local, Proxy | Windows (.msi) - 32 bit with database support, 64 bit with database support
Database Storage | Oracle 9 and higher | Local, Proxy | Windows (.msi) - 32 bit with database support, 64 bit with database support
Network Storage | Unix File Share (NFS) | Proxy | Debian (.deb) - 32 bit, 64 bit, 64 bit with database support (compatible with Debian 9 only); RHEL (.rpm) - 32 bit, 64 bit
Big Data | Hadoop Cluster | Proxy | Debian (.deb) 64 bit with database support (compatible with Ubuntu 18 only)*
* Linux 3 64-bit "database runtime" version with additional packages for use with Hadoop Clusters only. For Hadoop as data
store, Agents must have the following packages installed:
• libaio1
• libgsasl7
• libxml2
• libprotobuf10
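The package list above can be verified on the Agent host before a Hadoop scan is attempted. The following is a minimal sketch in Python, assuming a Debian-based host where the standard dpkg-query tool is available. The function names are hypothetical; only the package names come from the list above.

```python
import subprocess

# Package names copied from the list above.
REQUIRED_PACKAGES = ["libaio1", "libgsasl7", "libxml2", "libprotobuf10"]


def missing_packages(installed, required=REQUIRED_PACKAGES):
    """Return the required packages that are absent from installed."""
    installed = set(installed)
    return [name for name in required if name not in installed]


def is_installed(name):
    """Ask dpkg whether a package is fully installed (Debian-based hosts only)."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Status}", name],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return False  # dpkg-query is not available on this host.
    return result.returncode == 0 and "install ok installed" in result.stdout
```

A typical use is missing_packages(name for name in REQUIRED_PACKAGES if is_installed(name)), which returns the names that still need to be installed.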
Installing Agents
This section provides procedures for installing Agents for all supported operating systems.
1. Download the latest version of a compatible Agent installation file from the Thales Customer Support Portal.
The "Agent Compatibility and Installers" section lists all the data stores and their matching installer
packages.
2. Save the downloaded installer on the host machine where you want to install the Agent.
3. Follow the appropriate installation procedure for your operating system.
TIP Before you begin the installation, make sure that CipherTrust Manager is reachable from
the host where you are installing the Agent.
For example:
# rpm -ivh er2-2.0.31-linux26-rh-x64.rpm
The package name may differ depending on your system's architecture and Agent type.
3. Connect the Agent to the active CipherTrust Manager node:
# er2-config -i <hostname|ip_address>
If the connection has been correctly configured, you should see the following message:
Testing connection setting...
Test SUCCESS. Saving settings
Configuration updated, please restart agent service
The configuration has been saved. Please restart the agent for the changes to take effect.
• Option 2
# /etc/init.d/er2-agent stop
# /etc/init.d/er2-agent start
NOTE The installation script creates an erecon user in the erecon group. Ensure
that this user (or group) can read all the files to be scanned. For security reasons, the account
has its password locked so that it is used only by the DDC scanning agent.
For example:
sudo dpkg -i er2_2.0.31-linux26-x64.deb
The package name may differ depending on your system's architecture and Agent type.
3. Connect the Agent to the active CipherTrust Manager node:
sudo er2-config -i <hostname|ip_address>
If the connection has been correctly configured, you should see the following message:
Testing connection setting...
Test SUCCESS. Saving settings
Configuration updated, please restart agent service
The configuration has been saved. Please restart the agent for the changes to take effect.
• Option 2
sudo /etc/init.d/er2-agent stop
sudo /etc/init.d/er2-agent start
NOTE The installation script creates an erecon user in the erecon group. Ensure
that this user (or group) can read all the files to be scanned. For security reasons, the account
has its password locked so that it is used only by the DDC scanning agent.
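When Agents are installed on many hosts, the connection step can be scripted and its result checked automatically. The following is a minimal sketch in Python under stated assumptions: it assumes that er2-config runs non-interactively when given the -i option and prints the "Test SUCCESS" line shown above on success. The function names are hypothetical, and the script must run with the same privileges as the manual command.

```python
import subprocess

# Success line as shown in the sample output above.
SUCCESS_MARKER = "Test SUCCESS"


def connection_test_succeeded(output):
    """Return True if any output line starts with the success marker."""
    return any(
        line.strip().startswith(SUCCESS_MARKER) for line in output.splitlines()
    )


def connect_agent(manager):
    """Run er2-config -i <manager> and report whether the test succeeded.

    manager is the hostname or IP address of the active CipherTrust Manager node.
    """
    result = subprocess.run(
        ["er2-config", "-i", manager],
        capture_output=True, text=True, check=False,
    )
    return connection_test_succeeded(result.stdout)
```

If connect_agent returns True, restart the Agent service as described in the procedure so that the saved configuration takes effect.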
NOTE The installer creates a service called Enterprise Recon 2 Agent that runs under the
Local System user account.
Uninstalling Agents
Uninstalling Agents from RHEL
To uninstall a DDC Agent:
1. Stop the DDC Agent.
/etc/init.d/er2-agent stop
TIP Alternatively, to uninstall a DDC Agent from CLI, run the following commands as
Administrator:
net stop "Enterprise Recon 2 Agent (<ARCH>)"
wmic product where name="Enterprise Recon 2 Agent (<ARCH>)" uninstall
Upgrading Agents
Agents do not require an upgrade unless you need a feature that is available only in a newer version of the Agent.
Older versions of the Agent are compatible with newer versions of the Server. To upgrade an Agent, uninstall
the existing Agent and then install the new version.
> For instructions on uninstalling Agents, see:
• "Uninstalling Agents from RHEL"
• "Uninstalling Agents from Debian"
• "Uninstalling Agents from Windows"
> For instructions on installing Agents, see:
• "Installing Agents"