
Thales CipherTrust Data Discovery and Classification
DEPLOYMENT GUIDE
Document Information

Product Version 2.2.0

Document Number 007-000727-002, Rev A

Release Date 17 February 2021

Trademarks
Thales CipherTrust Data Discovery and Classification is powered by Groundlabs.
All intellectual property is protected by copyright. All trademarks and product names used or referred to are the
copyright of their respective owners. No part of this document may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, chemical, photocopy, recording or
otherwise without the prior written permission of Thales.

Disclaimer
Thales makes no representations or warranties with respect to the contents of this document and specifically
disclaims any implied warranties of merchantability or fitness for any particular purpose. Furthermore, Thales
reserves the right to revise this publication and to make changes from time to time in the content hereof without
the obligation upon Thales to notify any person or organization of any such revisions or changes.
We have attempted to make these documents complete, accurate, and useful, but we cannot guarantee them
to be perfect. When we discover errors or omissions, or they are brought to our attention, we endeavor to
correct them in succeeding releases of the product.
You are responsible for ensuring your own compliance with various laws and regulations, including but not
limited to any data privacy or data protection regulation. You are solely responsible for obtaining advice from
competent legal counsel to assist you in the identification and interpretation of any relevant laws and
regulations that may affect your business and the implementation of any actions you may need to take to
ensure you meet your compliance obligations with respect to such laws and regulations.
The software, the products, services, and any other capabilities described or provided herein are not suitable
for all situations and may have restricted availability or applicability. Thales does not provide legal, accounting,
or auditing advice, nor does it represent or warrant that its software, services, or products will ensure that you
are in compliance with any law or regulation.
Thales invites constructive comments on the contents of this document. Send your comments, together with
your personal and/or company details to the address below.

Thales CipherTrust Data Discovery and Classification 2.2.0 : Deployment Guide


17 February 2021, Copyright © 2021 Thales Group. All rights reserved. 2
Contact Method Contact Information

Address Thales
4690 Millennium Drive
Belcamp, Maryland 21017
USA

Phone US 1-800-545-6608

International 1-410-931-7520

Email technical.support.DIS@thalesgroup.com

Technical Support https://supportportal.thalesgroup.com


Customer Portal Existing customers with a Technical Support Customer Portal account can log in to manage
incidents, get the latest software upgrades, and access the Thales Knowledge Base.

CONTENTS

Document Information 2

About this Document 6


Audience 6
What's in This Guide 6
Organization 6
Document Conventions 7
Hyperlinks 7
Notifications 7
Command Syntax and Typeface Conventions 8
Related Documents 9

DDC Deployment Architecture 10
Supported Data Stores 10
Where to Install the DDC Agents 11
How DDC Uses Hadoop 11

Software and Hardware Requirements 12


Hardware Requirements 12
Software Requirements 12
Ports Used for Communication 12

Deployment Prerequisites 15
Installing CipherTrust Manager 15
Installing and Configuring Hadoop 15
Configuring DNS Connectivity 16

Running DDC in a Cluster 17


Deploying DDC into a New Environment 17
Deploying DDC into an Existing Cluster 17
Assigning the Active DDC Node 18
Identifying the Active DDC Node 18

Configuring CipherTrust Manager 20


Configuring HBase 20
Configuring HDFS 22

Agent Configurations 24
Agent Compatibility and Installers 24

Installing Agents 26

Installing Agents on RHEL 26
Installing Agents on Debian 27
Installing Agents on Windows 27

Uninstalling Agents 29
Uninstalling Agents from RHEL 29
Uninstalling Agents from Debian 29
Uninstalling Agents from Windows 29

Upgrading Agents 30

Hardening the Deployment 31


General Hardening Recommendations 31
Certificate Security Recommendations 31
Securing the Hadoop Configuration 32

About this Document
This introductory section identifies the audience, provides a brief summary of the contents of this guide, and
discusses the documentation conventions used. It contains the following sections:
> "Audience" below
> "What's in This Guide" below
> "Organization" below
> "Document Conventions" on the next page

Audience
This document is intended for Thales CipherTrust Data Discovery and Classification (DDC) users responsible
for classification of data discovered on data stores. It is assumed that the users of this document are proficient
with security and data discovery concepts.
All products manufactured and distributed by Thales are designed to be installed, operated, and maintained by
personnel who have the knowledge, training, and qualifications required to safely perform the tasks assigned
to them. The information, processes, and procedures contained in this document are intended for use by
trained and qualified personnel only.
Thales designs data security products for use by file server administrators, network administrators, security
engineers, database administrators, application developers, and other technology professionals responsible
for daily operations in support of data security.

What's in This Guide


This guide provides information on deploying the Thales CipherTrust Data Discovery and Classification
(Thales DDC) solution. It describes the requirements that must be met before installing and configuring the
DDC Agents on host machines, and it provides a list of compatible installers for different types of data stores.
Finally, it provides guidelines on enhancing the performance and security of the DDC solution. The DDC
documentation does not include information that is already covered in the CipherTrust Manager
documentation, available through the Product Documentation link on the CipherTrust Manager GUI login
page.

Organization
The Thales CipherTrust Data Discovery and Classification Deployment Guide contains the following sections:
1. "DDC Deployment Architecture" on page 10
Contains a visual overview (a diagram) of a typical DDC deployment.
2. "Software and Hardware Requirements" on page 12
Lists the software and hardware requirements for the DDC Server and Agents.
3. "Deployment Prerequisites" on page 15
Lists requirements that must be fulfilled before you start the DDC deployment.


4. "Running DDC in a Cluster" on page 17


Provides information and considerations for deploying DDC in a CipherTrust Manager cluster.
5. "Configuring CipherTrust Manager" on page 20
Provides instructions to configure DDC so that it can use Hadoop for storing its data.
6. "Agent Configurations" on page 24
Lists all supported installers for different types of data stores.
7. "Installing Agents" on page 26
Provides instructions to install DDC Agents on Windows and Linux hosts.
8. "Uninstalling Agents" on page 29
Provides instructions to uninstall DDC Agents from Windows and Linux hosts.
9. "Upgrading Agents" on page 30
Provides instructions for upgrading DDC Agents on supported platforms.
10. "Hardening the Deployment" on page 31
Provides guidelines to enhance the performance and security of your DDC deployment.

Document Conventions
This section describes the formatting conventions used in this user guide to indicate hyperlinks, special notes,
important information, tips, and warnings.

Hyperlinks
By default, hyperlinked text appears in purple.
For example: All technical document templates can be found on the Technical Writing Community page.

Notifications
This user guide uses notes, tips, and warnings to alert you to important information that may help you to
complete your task, or prevent personal injury, damage to the equipment, or data loss.

Notes
Notes are used to alert you to important or helpful information. These elements use the following format:

NOTE Take note. Notes contain important or helpful information that you want to make
stand out to the user.

Cautions
Cautions are used to alert you to important information that may help prevent unexpected results or data loss.
These elements use the following format:


CAUTION! Exercise caution. Caution alerts contain important information that may help
prevent unexpected results or data loss.

Warnings
Warnings are used to alert you to the potential for catastrophic data loss or personal injury. These elements
use the following format:

WARNING Be extremely careful and obey all safety and security measures. In
this situation you might do something that could result in catastrophic data loss or
personal injury.

Command Syntax and Typeface Conventions


Convention Description

bold The bold attribute is used to indicate the following:


> Button names (Click Save As.)
> Check box and radio button names (Select the Print Duplex check box.)
> Dialog box titles (On the Protect Document dialog box, click Yes.)
> Field names (User Name: Enter the name of the user.)
> Menu names (On the File menu, click Save.) (Click Menu > Go To > Folders.)
> User input (In the Date box, type April 1.)

italic The italic attribute is used for emphasis or to indicate a related document. (See the
Thales CipherTrust Data Discovery and Classification Customer Release Notes for
more information.)

Double quote marks Double quote marks enclose references to other sections within the document.
For example: Refer to "Disclaimer" on page 2.

<variable> In command descriptions, angle brackets represent variables. You must substitute a
value for command line arguments that are enclosed in angle brackets.

[ optional ] or [ <optional> ] Square brackets enclose optional keywords or <variables> in a command line
description. Optionally enter the keyword or <variable> that is enclosed in square brackets, if it is necessary
or desirable to complete the task.

[ a | b | c ] or [<a> | <b> | <c>] Square brackets enclose optional alternate keywords or variables in a
command line description. Choose one command line argument enclosed within the brackets, if desired.
Choices are separated by vertical (OR) bars.

{ a | b | c } or { <a> | <b> | <c> } Braces enclose required alternate keywords or <variables> in a command
line description. You must choose one command line argument enclosed within the braces. Choices are
separated by vertical (OR) bars.


Related Documents
The following documents contain related or additional information:
> Thales CipherTrust Data Discovery and Classification Administrator Guide
> Thales Data Platform Installation Guide
> Thales CipherTrust Data Discovery and Classification Customer Release Notes
You can view or download the latest version of the Customer Release Notes (CRN) for this release at this location:
https://supportportal.thalesgroup.com


DDC Deployment Architecture
This section describes the main components of Thales CipherTrust Data Discovery and Classification (DDC)
and how they work together to provide the DDC solution. Before you start the actual deployment, review the
diagram in this section to see what a typical DDC deployment looks like. The concepts used in the diagram are
introduced in later sections of this document and explained in detail in the "DDC Administration Guide".

Figure: Typical DDC deployment architecture
* A Windows Proxy is needed to connect to databases.


At the heart of the DDC solution is CipherTrust Manager, which runs the DDC Server. From here, users
interact with the DDC GUI or use the DDC APIs to create classification profiles, add data stores, launch scans,
and generate reports.

Supported Data Stores


DDC supports a number of different data stores:
> NFS shares
> CIFS shares
> Databases
> Hadoop storage
> Local storage (Windows and Linux)
To access these data stores, the DDC Server communicates with one or more DDC Agents. The DDC Agent is
a software component that scans a data store for Infotypes (such as credit card numbers, email addresses, and
so on) that are part of a classification profile. All collected data is sent from the Agent to the DDC Server,
which stores the data, together with any user-requested reports, on an external Hadoop cluster.


Where to Install the DDC Agents


Generally speaking, if you are scanning data stores that are local to a Windows or Linux host (no network
shares), install the DDC Agent on the server where the data is located. For all other types of storage (top part
of the figure), install the DDC Agent on a proxy server.
For example, to scan an NFS share, mount the share on the proxy server and install the DDC Agent on that
proxy server. When you create the scan, specify the mount point of the NFS share. For DDC Agent
requirements and the types of data stores supported, see "Agent Configurations" on page 24. For information
on securing the deployment, refer to "Hardening the Deployment" on page 31.
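The placement rule above can be summarized as a small decision function. This is a minimal Python sketch for illustration only: the function name and the data store labels are hypothetical, and the Windows-proxy case for databases comes from the note on the deployment diagram.

```python
def agent_placement(store_type: str) -> str:
    """Return where to install the DDC Agent for a given data store type.

    Hypothetical helper: the store-type labels are invented for this sketch.
    """
    local_stores = {"local_windows", "local_linux"}
    proxied_stores = {"nfs", "cifs", "hadoop"}
    if store_type in local_stores:
        # Local storage: install the Agent on the server that holds the data.
        return "data host"
    if store_type == "database":
        # Databases are reached through a Windows proxy.
        return "windows proxy"
    if store_type in proxied_stores:
        # Network storage: install the Agent on a proxy server
        # (for NFS, mount the share on that proxy and scan the mount point).
        return "proxy"
    raise ValueError(f"unknown data store type: {store_type!r}")


print(agent_placement("nfs"))  # proxy
```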

How DDC Uses Hadoop


DDC uses Hadoop to generate reports from scans and to store scan results (report data). Thales Data Platform
(TDP) is the only Hadoop flavor currently supported for this purpose. This cluster is separate from any Hadoop
cluster that DDC scans as a data store, which is where users keep their own data.
DDC can query HDFS directly, but it requires Phoenix Query Server (PQS) to interface with Hadoop's HBase
and benefit from its multitenancy features.
The DDC Server retrieves the scan results from the DDC Agent and stores this information in TDP together with
any generated reports. Your TDP cluster must be highly available to avoid losing any data store scans or
reports.
DDC also requires Apache Knox as the single point of access to the TDP cluster (both HBase and HDFS). Knox
ensures that all communication is protected with TLS and handles authentication, so you only need to connect
DDC to Knox. For information on configuring DDC to use TDP, see "Configuring CipherTrust Manager" on
page 20.


Software and Hardware Requirements


Hardware Requirements
CipherTrust Manager requirements
DDC is only supported when running CipherTrust Manager as a Virtual Machine. The CipherTrust Manager
VM has the following requirements:
> RAM: 16 GB minimum
> CPU: 4 cores. It is recommended to add extra cores if the average CPU usage is above 50% or CPU load is
above 80% for extended periods of time.
> Disk space: at least 256 GB
Agent requirements
Each concurrent DDC scan requires one core and typically less than 1 GB of RAM. Agents do not launch
concurrent Local Storage scans. When running Local Storage scans, Linux Agents require a minimum of 1 core
and 1 GB of RAM, and Windows Agents require 2 cores and 4 GB of RAM.
These figures cover only the DDC scanning Agent. The operating system requires additional resources,
usually 1-2 cores and 2-4 GB of RAM; also consider the requirements of any other services running on the
host.
Note that an Agent running on a server can act both as a local Agent for scanning that server and as a proxy
Agent for scanning other data stores. In that case, monitor the Agent's resource consumption while scans are
running, as needed.
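The figures above can be combined into a rough sizing estimate for an Agent host. The Python sketch below is a hypothetical planning aid, not a Thales tool. It assumes 1 core and 1 GB of RAM per concurrent proxy scan (the guide says "typically less than 1 GB", so 1 GB is treated as a ceiling), at most one Local Storage scan at a time, and the upper end of the operating system estimate (2 cores, 4 GB). Other services on the host are not included.

```python
# Per-scan and per-OS figures from this section, as (cores, GB of RAM).
PROXY_SCAN = (1, 1)                                # per concurrent scan; RAM is a ceiling
LOCAL_SCAN = {"linux": (1, 1), "windows": (2, 4)}  # one Local Storage scan at a time
OS_OVERHEAD = (2, 4)                               # upper end of 1-2 cores / 2-4 GB


def agent_host_size(os_name: str, proxy_scans: int, local_scan: bool) -> tuple[int, int]:
    """Estimate (cores, GB of RAM) for a DDC Agent host.

    Hypothetical planning helper; other services on the host are not included.
    """
    if os_name not in LOCAL_SCAN:
        raise ValueError(f"unsupported OS: {os_name!r}")
    if proxy_scans < 0:
        raise ValueError("proxy_scans must not be negative")
    cores, ram = OS_OVERHEAD
    cores += proxy_scans * PROXY_SCAN[0]
    ram += proxy_scans * PROXY_SCAN[1]
    if local_scan:
        local_cores, local_ram = LOCAL_SCAN[os_name]
        cores += local_cores
        ram += local_ram
    return cores, ram


# A Windows Agent acting as a local Agent and running 3 concurrent proxy scans.
print(agent_host_size("windows", proxy_scans=3, local_scan=True))  # (7, 11)
```

Budgeting at the upper end of each range keeps the estimate conservative; if monitoring shows lower consumption, the host can be sized down.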

Software Requirements
DDC Agents for Debian require kernel version 3.x or later.

Ports Used for Communication


This section lists the ports that must be open for communication among Agents, data stores, and CipherTrust
Manager. Configure your firewalls to allow this communication.
The following table lists the ports that Agents use to connect to CipherTrust Manager and to data stores:

Initiator | Receiver | Protocol | Port(s) | Connection Type | Description
Agents | CipherTrust Manager | TCP | 11117 | Persistent | Allows traffic between Agents and the CipherTrust Manager appliance. Agents initiate the communication and keep persistent connections.
Agents | IBM DB2 | TCP | 50000 | Non-persistent | Allows traffic between Agents and the IBM DB2 database store. Agents initiate the communication and need the port during the current session.
Agents | Microsoft SQL | TCP | 1433 | Non-persistent | Allows traffic between Agents and the Microsoft SQL database store. Agents initiate the communication and need the port during the current session.
Agents | Oracle | TCP | 1521 | Non-persistent | Allows traffic between Agents and the Oracle database store. Agents initiate the communication and need the port during the current session.
Agents | CIFS/SMB server | TCP | 445* | Non-persistent | Allows scanning of Windows remote CIFS file shares.
Agents | NFS server | TCP or UDP | 2049** | Non-persistent | Allows scanning of NFS file shares.
Agents | Hadoop Scanning | TCP | 8020, 50075, and 50010 | Non-persistent | Allows traffic between Agents and Hadoop cluster nodes. Agents initiate the communication and need the ports during the current session.

In addition to scanning Hadoop as a data store, CipherTrust Manager uses Hadoop as an external database to
store and process the scan results. CipherTrust Manager initiates the communication and needs the following
ports to be open during the current session:

Initiator | Receiver | Protocol | Port(s) | Connection Type | Description
CipherTrust Manager | Hadoop*** | TCP | 8443 and 8765 | Non-persistent | Allows traffic between TDP cluster nodes and the CipherTrust Manager appliance. DDC supports Hadoop NameNodes and Apache Knox.

* For Windows 2000 and older, the following additional ports are also required:
• 137 (UDP)
• 138 (UDP)
• 139 (TCP)
** NFSv4 requires only port 2049 (TCP only). NFSv3 and older must allow connections on the following ports:
• 111 (TCP or UDP)
• Dynamic ports assigned by rpcbind
*** Thales Data Platform (TDP) is the only Hadoop flavor currently supported.
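To confirm that firewalls allow the TCP connections listed above, you can run a quick reachability test from the initiating host (an Agent host for most rows, or a host on CipherTrust Manager's network for the TDP row). The following is a minimal Python sketch, not a Thales tool; the dictionary keys are labels invented for this example. It covers TCP only: UDP ports (137, 138, NFS over UDP, 111) and rpcbind-assigned dynamic ports cannot be verified with a TCP connect.

```python
import socket

# TCP ports from the tables above. Keys are labels invented for this sketch.
DDC_TCP_PORTS = {
    "ciphertrust_manager": [11117],        # Agents -> CipherTrust Manager
    "ibm_db2": [50000],
    "microsoft_sql": [1433],
    "oracle": [1521],
    "cifs_smb": [445],
    "nfs": [2049],
    "hadoop_scanning": [8020, 50075, 50010],
    "tdp": [8443, 8765],                   # CipherTrust Manager -> TDP
}


def check_tcp_ports(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Attempt a TCP connection to each port on `host`.

    True means the TCP handshake completed (a listener is reachable through
    any firewalls); False covers refused connections, timeouts, and
    resolution errors. It does not prove that the right service is listening.
    """
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

For example, on an Agent host, `check_tcp_ports("ctm.example.com", DDC_TCP_PORTS["ciphertrust_manager"])` tests the persistent Agent connection, where ctm.example.com is a placeholder for your CipherTrust Manager DNS entry.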


Deployment Prerequisites
> CipherTrust Manager must be installed, configured, and accessible through the GUI (also called the
console).
> Hadoop must be installed and configured with a Phoenix Query Server (PQS) mapped to HBase.
> Apache Knox must be installed and configured for Hadoop.
Knox must also be DNS-addressable, either through a network DNS server or by adding a DNS entry as
described in the "CipherTrust Manager Administration Guide" ("Concepts" > "DNS Hosts" > "Configuring DNS
Hosts" section).

Installing CipherTrust Manager


DDC ships as a module of CipherTrust Manager with a trial license already installed, so no additional
installation should be required in CipherTrust Manager. If CipherTrust Manager is not yet installed, or you
cannot find DDC in the list of installed licenses, contact Thales or refer to the CipherTrust Manager product
documentation for instructions.

Installing and Configuring Hadoop


Thales Data Platform 3.1.5 (TDP) is a Big Data platform based on Hadoop technology. We recommend
running a five-node TDP 3.1.5 cluster with the following services available:
> HDFS
> HBase
> Phoenix Query Server (PQS) - available on at least one node
> Knox
We recommend 2 name nodes and 3 data nodes. Each node should have the following minimum hardware
configuration:
> 8 CPUs / vCPUs
> 16 GB RAM
> 100 GB of disk
To install TDP 3.1.5, refer to the "Thales Data Platform Installation Guide" and complete all the steps there
before continuing with the DDC installation.
For information about Hadoop, refer to the official HDP documentation page:
http://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/index.html
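Before installing TDP, you can check a planned cluster against the recommendations above. This Python sketch is a hypothetical planning aid: the Node class, the "name"/"data" role labels, and the service names are chosen for this example and are not part of TDP or DDC.

```python
from dataclasses import dataclass, field

# Minimums and services recommended in this section (TDP 3.1.5).
MIN_CPUS, MIN_RAM_GB, MIN_DISK_GB = 8, 16, 100
REQUIRED_SERVICES = {"HDFS", "HBase", "PQS", "Knox"}


@dataclass
class Node:
    name: str
    role: str                 # "name" or "data" (labels chosen for this sketch)
    cpus: int
    ram_gb: int
    disk_gb: int
    services: set[str] = field(default_factory=set)


def plan_issues(nodes: list[Node]) -> list[str]:
    """List the ways a planned TDP cluster falls short of the recommendations."""
    issues = []
    roles = [n.role for n in nodes]
    if roles.count("name") < 2:
        issues.append("fewer than 2 name nodes")
    if roles.count("data") < 3:
        issues.append("fewer than 3 data nodes")
    for n in nodes:
        if n.cpus < MIN_CPUS:
            issues.append(f"{n.name}: fewer than {MIN_CPUS} CPUs")
        if n.ram_gb < MIN_RAM_GB:
            issues.append(f"{n.name}: less than {MIN_RAM_GB} GB RAM")
        if n.disk_gb < MIN_DISK_GB:
            issues.append(f"{n.name}: less than {MIN_DISK_GB} GB disk")
    # Each service must be available on at least one node (PQS included).
    available = set().union(*(n.services for n in nodes))
    for svc in sorted(REQUIRED_SERVICES - available):
        issues.append(f"service not deployed: {svc}")
    return issues
```

An empty list means the plan meets the recommended minimums; it says nothing about high availability settings, which are covered in the "Thales Data Platform Installation Guide".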


Configuring DNS Connectivity


To ensure uninterrupted connectivity to the UI, it is recommended to create a DNS entry for the active
CipherTrust Manager node and use this entry when configuring the Agents and accessing the CipherTrust
Manager UI. This way the Agents will be able to reach CipherTrust Manager even if its IP address changes.
> Standalone CipherTrust Manager: If the CipherTrust Manager appliance is standalone (not in a cluster),
configure a DNS entry with its IP address.
> Clustered CipherTrust Manager: In a CipherTrust Manager cluster, configure a DNS entry with the IP
address of the cluster's "active" node. See "Identifying the Active DDC Node" on page 18 for details.
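After creating the DNS entry, you can check from an Agent host that it resolves to the expected address. The Python sketch below uses the local resolver through `socket.getaddrinfo`; the function name is hypothetical.

```python
import socket


def dns_points_to(name: str, expected_ip: str) -> bool:
    """Return True if `name` resolves to `expected_ip` via the local resolver."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False  # the name does not resolve at all
    # Each entry's sockaddr starts with the address string (IPv4 or IPv6).
    resolved = {info[4][0] for info in infos}
    return expected_ip in resolved
```

For example, `dns_points_to("ddc.example.com", "10.45.102.101")` checks a placeholder DNS name against the active node address reported by `ksctl ddc active-node info`. Because the Agents use whatever the local resolver returns, run the check on the Agent hosts themselves.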


Running DDC in a Cluster


Although running a DDC cluster is not currently supported, Agents can be deployed into a CipherTrust
Manager cluster. In this configuration, one of the CipherTrust Manager nodes must be assigned as the active
DDC node, and all the Agents report to it. This configuration does not support failover: if the active node fails,
DDC fails with it and does not work for as long as the active node is down. The only way to resume DDC
operations without any data loss is to restore the original active node.
There are two possible scenarios for running DDC in a cluster:
> "Greenfield" deployment - in this scenario DDC is deployed into a completely new environment. See
"Deploying DDC into a New Environment" below for more information.
> "Brownfield" deployment - in this scenario DDC is deployed into an existing CipherTrust Manager
cluster. See "Deploying DDC into an Existing Cluster" below for more information.
In both scenarios, DDC needs an active DDC node to function in a cluster, so you must assign the active node
and know how to identify it. For information on assigning the active node and identifying the IP
address/hostname of the active node, refer to "Assigning the Active DDC Node" on the next page and
"Identifying the Active DDC Node" on the next page.

Deploying DDC into a New Environment


If there is no existing CipherTrust Manager server that you could connect the Agents to, start by deploying
CipherTrust Manager: you need a running, active CipherTrust Manager server to complete the Agent
installation and configuration. If you plan to form a CipherTrust Manager cluster at some point and use it with
DDC, plan your deployment carefully in advance.
It is strongly recommended to create any planned cluster of CipherTrust Manager nodes before deploying
DDC into it. Creating a CipherTrust Manager cluster from CipherTrust Manager servers that already host DDC
risks losing the data that DDC collected before the cluster was created. This is because every new
CipherTrust Manager node added to the cluster has its DDC database erased and replaced with a copy of the
active node's database.
This can be destructive if, for example, you initially set up several independent DDC systems on unclustered
CipherTrust Manager servers (for example, to monitor different segments of the network) and later decide to
form a cluster from them. In that case, every CipherTrust Manager server acting as a DDC server, except the
active node, loses all its DDC data and settings.

Deploying DDC into an Existing Cluster


Deploying DDC into an existing CipherTrust Manager cluster is relatively straightforward. After the installation,
make one of the CipherTrust Manager nodes the active DDC node, and then connect all the Agents to that
active DDC node to complete their configuration.


From then on, the active DDC node stores all the configuration settings in its database. The database of the
active node is replicated to all cluster nodes, so every cluster member has an identical copy of the database.
The copies are synchronized every time new data is inserted into the copy on the active node.

Assigning the Active DDC Node


DDC always requires one active node, regardless of the number of CipherTrust Manager nodes (whether it is a
single CipherTrust Manager node or a cluster of two or more CipherTrust Manager nodes). The DDC Agents
point to this active node (through DNS), and only this node responds to DDC GUI operations.
To create an active DDC node, you designate one of the CipherTrust Manager nodes as the active DDC node.
This is a manual procedure that can be performed either through the CipherTrust Manager UI or from the
command line. In both cases, you need CipherTrust Manager admin rights. Assigning the active DDC node
does not affect normal CipherTrust Manager cluster operation.

NOTE The assignment of a DDC active node cannot be undone!

> To assign the active DDC node by using the CipherTrust Manager UI, follow this procedure:
a. Log in to the CipherTrust Manager node that you want to make the active DDC node.
b. Click the Data Discovery and Classification link to open the DDC app.
You should see the "The current node is inactive. The node must be activated to use DDC." message.
c. Click the Activate button below the message.
The CipherTrust Manager node becomes the active DDC node.
> To assign the active DDC node by using the CipherTrust Manager command line, you need the ksctl tool
installed and configured. For information on installing and configuring the tool, refer to the "Interfaces > CLI"
section in the "DDC Administration Guide".
Connect to the CipherTrust Manager node that you want to make the active DDC node, and issue this
command:
ksctl ddc active-node register

After you assign an active DDC node, you perform all DDC-related tasks through that node. The other
(non-active) nodes do not allow you to work with DDC. When you log in to a non-active DDC node and open the
Data Discovery and Classification application, you see this message:
"You are currently connected to an inactive node. You must switch to the active node <active DDC node IP
address> to run DDC."

Identifying the Active DDC Node


This section shows how to locate the active node. The procedure requires the ksctl tool, configured to access
any of the cluster nodes. For detailed information on installing and configuring the ksctl tool, refer to the
"Interfaces > CLI" section in the "DDC Administration Guide".
> To find the IP address of the active CipherTrust Manager node, run this command:


ksctl ddc active-node info

The output shows the IP address of the cluster's active node. Use this IP address to configure a DNS entry for
the active CipherTrust Manager node, and use that DNS entry to configure the Agents and access the
CipherTrust Manager UI. For example:
{
"public_address": "mycluster.thalesgroup.com",
"host": "10.45.102.101"
}
If the CipherTrust Manager appliance is not in a cluster, the command returns the following error:
{
"code": 15,
"codeDesc": "NCERRBadRequest: Bad HTTP request",
"message": "oleander is not in cluster mode"
}
In this case, just use the IP address (or DNS entry) of this single CipherTrust Manager node.
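If you script this step, the two sample outputs above can be told apart by their keys. The Python sketch below assumes that the command's JSON output has already been captured as a string (how ksctl reports the error case, for example its exit status, is not covered here). The function name is hypothetical; the key names and the error code are taken from the samples above.

```python
import json
from typing import Optional


def active_node_address(output: str) -> Optional[str]:
    """Return the active node's IP from `ksctl ddc active-node info` output.

    Returns None for the "not in cluster mode" error, meaning the appliance is
    standalone and its own address (or DNS entry) should be used instead.
    """
    data = json.loads(output)
    if "host" in data:
        return data["host"]
    if data.get("code") == 15 and "not in cluster mode" in data.get("message", ""):
        return None
    raise ValueError(f"unexpected output: {data}")


sample = '{"public_address": "mycluster.thalesgroup.com", "host": "10.45.102.101"}'
print(active_node_address(sample))  # 10.45.102.101
```

Any other error is raised rather than silently treated as a standalone appliance, so an unexpected response is not mistaken for a valid configuration.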


Configuring CipherTrust Manager


To configure DDC with Hadoop, perform the following steps:
1. Type the CipherTrust Manager URL in the browser and log on to the CipherTrust Manager console as the
administrator.
2. Click the Data Discovery link to open the Data Discovery configuration screen.
3. In the sidebar on the left, click Settings > Hadoop Services. The Hadoop Services page is displayed.
This page contains two tabs, PQS and HDFS. By default, PQS is the active tab.

NOTE Only users with access to the root domain can view and modify the Hadoop Services
configuration. For more information about domains, refer to the "Thales CipherTrust Manager
Administrator Guide".

WARNING Once configured, the Hadoop Services settings must not be modified, or you will
lose access to all data.

Configuring HBase
1. To configure DDC for HBase, click the PQS tab and configure the PQS settings:
a. Hostname and Port - the connection details of the Knox server.


You must use the hostname of your Knox server, not its IP address, because the server certificate that you
import later is hostname-based. The default port is 8443.
Knox must also be DNS-addressable, either through a network DNS server or by adding a DNS entry as
described in the "CipherTrust Manager Administration Guide" ("Concepts" > "DNS Hosts" > "Configuring DNS
Hosts" section).
b. URI - the PQS path as configured in Knox. The path consists of the Knox part and the PQS part. For
example:
/gateway/default/avatica
where "gateway" is the Knox part, "default" is the topology name, and "avatica" is the service name.

NOTE If you are not using the default topology, replace "default" in the URI with your topology
name.

c. Schema - the PQS schema, as mapped to the HBase namespace.

TIP To avoid potential issues, create this schema before you perform this configuration step. Refer to the "Thales Data Platform Installation Guide", section "12. Creating the PQS Schema".

d. Server Certificate - Knox uses HTTPS for connections, so DDC connects to the Hadoop Services only if a valid, trusted certificate is used. If this certificate was signed by a public, recognized CA, DDC trusts it automatically; otherwise, you must import it manually here.
Click the Choose File button in the Select Server Certificate section and import the server certificate.
IMPORTANT: Use the same certificate that you exported earlier in "13. Exporting the Knox Server Certificate" in the "Thales Data Platform Installation Guide".

TIP
> If the certificate is self-signed, you must first export it on the Knox server and then import it on the machine where you are running the CipherTrust Manager console. The certificate is then permanently stored in DDC, so you do not need to repeat this step on another DDC client machine.
> Binary-format (DER) certificates are not accepted. You can only import a plain-text certificate (for example, a Base64-encoded X.509 certificate).

e. In the AUTHENTICATION section, enter the Username and Password of a Knox-authorized user.
IMPORTANT: Use the same authentication details that you configured earlier in "9. Configuring Knox" in the "Thales Data Platform Installation Guide".
2. Click Save Changes to save the configuration.
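Before uploading the certificate, you can sanity-check the PQS values from a shell. The following is a minimal Bash sketch, assuming the /gateway/<topology>/<service> layout described above and OpenSSL on the workstation; knox_url and is_pem_cert are hypothetical helper names, not part of DDC or Knox.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for preparing the PQS settings; not part of DDC or Knox.

# Build the HTTPS URL that DDC reaches for a Knox-proxied service.
# Arguments: Knox hostname (not an IP address), port, topology, service path.
knox_url() {
  printf 'https://%s:%s/gateway/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Succeed only if the file contains a plain-text (PEM, Base64) certificate.
# A binary (DER) certificate fails this check and must be converted first.
is_pem_cert() {
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example:
#   knox_url knox.example.com 8443 default avatica
#   is_pem_cert knox.pem || openssl x509 -inform DER -in knox.der -out knox.pem
```

The openssl x509 -inform DER conversion in the example applies only if your exported certificate is binary. To confirm that the certificate was issued for the Knox hostname you entered, you can inspect it with openssl x509 -in knox.pem -noout -text.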


Configuring HDFS

1. To configure DDC for HDFS, click the HDFS tab and configure the HDFS settings:
a. Hostname and Port - the connection details of the Knox server.
You must use the hostname of your Knox server, not its IP address, because the server certificate that you import later is hostname-based. The default port is 8443.
Knox must also be DNS-addressable, either through a network DNS or by adding a DNS entry as described in the "CipherTrust Manager Administration Guide" ("Concepts" > "DNS Hosts" > "Configuring DNS Hosts" section).
b. URI - the path to HDFS as configured in Knox. The path consists of the Knox gateway part and the HDFS part. For example:
/gateway/default/webhdfs/v1
where "gateway" is the Knox gateway path, "default" is the topology name, and "webhdfs/v1" is the HDFS part.

NOTE If you are not using the default topology, replace "default" in the URI with your topology name.

c. Folder - enter the DDC directory in HDFS that you created earlier (for example, /ciphertrust_ddc).
IMPORTANT: Use the same directory name that you configured earlier in "11. Creating DDC Directory Under HDFS" in the "Thales Data Platform Installation Guide".
d. Server Certificate - Knox enforces HTTPS, so you must import the server certificate here.
Click the Choose File button in the Select Server Certificate section and import the server certificate.
IMPORTANT: Use the same certificate that you exported earlier in "13. Exporting the Knox Server Certificate" in the "Thales Data Platform Installation Guide".


NOTE
> The certificate is typically self-signed, so you must first export it on the Knox server and then import it on the machine where you are running the CipherTrust Manager console. The certificate is then permanently stored in DDC, so you do not need to repeat this step on another DDC client machine.
> Binary-format (DER) certificates are not accepted. You can only import a plain-text certificate (for example, a Base64-encoded X.509 certificate).

e. In the AUTHENTICATION section, enter the Username and Password of a Knox-authorized user.
IMPORTANT: Use the same authentication details that you configured earlier in "9. Configuring Knox" in the "Thales Data Platform Installation Guide".
2. Click Save Changes to save the configuration.
The connection is successful if no error is returned and the following message is displayed:
“Success. HDFS settings have been updated”
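You can also confirm that the Knox user can list the DDC folder through WebHDFS before or after saving the HDFS settings. This is a sketch, assuming the /gateway/<topology>/webhdfs/v1 layout shown above and a PEM copy of the Knox certificate; webhdfs_url and check_ddc_folder are hypothetical names. op=LISTSTATUS is the standard WebHDFS directory-listing operation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for the HDFS settings; not part of DDC.

# Build the WebHDFS LISTSTATUS URL for a folder, as proxied by Knox.
# Arguments: Knox hostname, port, topology, HDFS folder (absolute path).
webhdfs_url() {
  printf 'https://%s:%s/gateway/%s/webhdfs/v1%s?op=LISTSTATUS\n' "$1" "$2" "$3" "$4"
}

# List the folder through Knox, trusting only the given PEM certificate.
# curl is given just the user name, so it prompts for the password.
# Arguments: Knox hostname, port, topology, folder, Knox user, PEM certificate.
check_ddc_folder() {
  curl -sS --fail --cacert "$6" -u "$5" "$(webhdfs_url "$1" "$2" "$3" "$4")"
}

# Example:
#   check_ddc_folder knox.example.com 8443 default /ciphertrust_ddc ddc_user knox.pem
```

A JSON response containing FileStatuses indicates that the path, credentials, and certificate were accepted. Because -u carries only the user name, the password stays out of the shell history and the process list.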


Agent Configurations
DDC supports two types of Agent configurations:
> Local: Agent is installed and configured directly on the machine that contains sensitive data.
> Proxy: Agent is installed and configured on a proxy machine that is used to scan sensitive data on other
machines.
The instructions to install and configure Agents in both types of configurations are the same.

Agent Compatibility and Installers


The following table lists the supported Agent installers for each type of data store, platform, and database, to help you select the appropriate installer for your data store.

| Data Store Category | Data Store Type | Agent Configuration | Agent Installer Packages |
|---|---|---|---|
| Local Storage | RHEL, CentOS | Local | RHEL (.rpm) - 32 bit, 64 bit |
| Local Storage | Debian-based distros | Local | Debian (.deb) - 32 bit, 64 bit, 64 bit with database support (compatible with Debian 9 only) |
| Local Storage | Windows | Local | Windows (.msi) - 32 bit, 64 bit, 32 bit with database support, 64 bit with database support |
| Database Storage | IBM DB2 11.1 and higher | Local, Proxy | Windows (.msi) - 32 bit with database support, 64 bit with database support |
| Database Storage | Microsoft SQL 2005 and higher | Local, Proxy | Windows (.msi) - 32 bit with database support, 64 bit with database support |
| Database Storage | Oracle 9 and higher | Local, Proxy | Windows (.msi) - 32 bit with database support, 64 bit with database support |
| Network Storage | Unix File Share (NFS) | Proxy | Debian (.deb) - 32 bit, 64 bit, 64 bit with database support (compatible with Debian 9 only); RHEL (.rpm) - 32 bit, 64 bit |
| Network Storage | Windows Share (SMB, CIFS) | Proxy | Windows (.msi) - 32 bit, 64 bit, 32 bit with database support, 64 bit with database support |
| Big Data | Hadoop Cluster | Proxy | Debian (.deb) - 64 bit with database support (compatible with Ubuntu 18 only)* |

* Linux 3 64-bit "database runtime" version with additional packages, for use with Hadoop Clusters only. When Hadoop is the data store, the Agent host must have the following packages installed:

• libaio1
• libgsasl7
• libxml2
• libprotobuf10
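On the Ubuntu 18 proxy host, you can check for these packages before installing the Agent. A minimal sketch, assuming a Debian-family system where dpkg-query and apt-get are available; missing_packages is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Packages that this guide lists as required on a Hadoop proxy Agent host.
REQUIRED_PACKAGES="libaio1 libgsasl7 libxml2 libprotobuf10"

# Print each named package that dpkg does not report as installed.
missing_packages() {
  local pkg
  for pkg in "$@"; do
    if ! dpkg-query -W --showformat='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
      echo "$pkg"
    fi
  done
}

# Example:
#   missing=$(missing_packages $REQUIRED_PACKAGES)
#   [ -z "$missing" ] || sudo apt-get install -y $missing
```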


Installing Agents
This section provides procedures for installing Agents for all supported operating systems.
1. Download the latest version of a compatible Agent installation file from the Thales Customer Support Portal.
The "Agent Compatibility and Installers" section on page 24 lists all the data stores and their matching installer packages.
2. Save the downloaded installer on the host machine where you want to install the Agent.
3. Follow the appropriate installation procedure for your operating system.

TIP Before you begin the installation, make sure that CipherTrust Manager is reachable from
the host where you are installing the Agent.
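One way to follow the tip above is a plain TCP connection test from the Agent host. This guide does not state which port the Agent uses to reach CipherTrust Manager, so the port is a parameter here; take the value from your CipherTrust Manager documentation or firewall rules. The sketch assumes Bash (for its /dev/tcp redirection) and the coreutils timeout command; check_reachable is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-installation check: can this host open a TCP connection
# to CipherTrust Manager? Arguments: hostname or IP address, TCP port.
check_reachable() {
  local host=$1 port=$2
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: check_reachable <host> <port>" >&2
    return 2
  fi
  # Open file descriptor 3 on Bash's /dev/tcp pseudo-device; give up after 5 s.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "NOT reachable: $host:$port" >&2
    return 1
  fi
}
```

A successful TCP connection shows only that no firewall blocks the port; the er2-config -t step of the installation remains the authoritative connection test.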

Installing Agents on RHEL


1. Navigate to the location where the Agent installation package (.rpm) is stored.
2. Log in as a user with root privileges (su) and install the Agent by using the following command:
#rpm -ivh er2-2.x.xx-xxxx.rpm

For example:
#rpm -ivh er2-2.0.31-linux26-rh-x64.rpm

The package name that you use with the command may be different and depends on your system's
architecture and Agent type.
3. Connect the Agent to the active CipherTrust Manager node:
#er2-config -i <hostname|ip_address>

where <hostname|ip_address> is the IP address or hostname of the CipherTrust Manager node.
4. Test the connection settings (on the data store that is using this host).
#er2-config -t

If the connection has been correctly configured, you should see the following message:
Testing connection setting...
Test SUCCESS. Saving settings
Configuration updated, please restart agent service
The configuration has been saved. Please restart the agent for the changes to take effect.

5. Restart the Agent:


• Option 1
#/etc/init.d/er2-agent restart

• Option 2
#/etc/init.d/er2-agent stop
#/etc/init.d/er2-agent start


NOTE The installation script creates an erecon user in the erecon group. Make sure that this user (or group) can read all the files to be scanned. For security reasons, the account's password is locked so that the account is used solely by the DDC scanning agent.
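The note above asks you to make sure that the erecon user can read every file to be scanned. As a sketch, assuming GNU find (for -readable) and the setfacl utility from the acl package, the following hypothetical helpers list files that a user cannot read and grant a group read access through POSIX ACLs. ACLs are only one option; adjusting group ownership or file permissions works as well.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the erecon read-access requirement; not part of DDC.

# List everything under a path that the given user cannot read.
# Arguments: user name (for example, erecon), path to scan.
unreadable_by() {
  local user=$1 path=$2
  if [ "$user" = "$(id -un)" ]; then
    find "$path" ! -readable -print 2>/dev/null
  else
    # Run as root: root can use sudo -u erecon even though the password is locked.
    sudo -u "$user" find "$path" ! -readable -print 2>/dev/null
  fi
}

# Give a group read access to a tree (X adds traverse permission on directories),
# plus a default ACL on directories so that files created later inherit it.
# Arguments: group name (for example, erecon), path.
grant_group_read() {
  setfacl -R -m "g:$1:rX" "$2" &&
    find "$2" -type d -exec setfacl -d -m "g:$1:rX" {} +
}

# Example (as root):
#   unreadable_by erecon /srv/data
#   grant_group_read erecon /srv/data
```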

Installing Agents on Debian


1. Navigate to the location where the Agent installation (.deb) package is stored.
2. Install the Agent by using the following command:
sudo dpkg -i er2_2.x.xx-xxxx.deb

For example:
sudo dpkg -i er2_2.0.31-linux26-x64.deb

The package name that you use with the command may be different and depends on your system's
architecture and Agent type.
3. Connect the Agent to the active CipherTrust Manager node:
sudo er2-config -i <hostname|ip_address>

where <hostname|ip_address> is the IP address or hostname of the CipherTrust Manager node.
4. Test the connection settings (on the data store that is using this host).
sudo er2-config -t

If the connection has been correctly configured, you should see the following message:
Testing connection setting...
Test SUCCESS. Saving settings
Configuration updated, please restart agent service
The configuration has been saved. Please restart the agent for the changes to take effect.

5. Restart the Agent:


• Option 1
sudo /etc/init.d/er2-agent restart

• Option 2
sudo /etc/init.d/er2-agent stop
sudo /etc/init.d/er2-agent start

NOTE The installation script creates an erecon user in the erecon group. Make sure that this user (or group) can read all the files to be scanned. For security reasons, the account's password is locked so that the account is used solely by the DDC scanning agent.
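The RHEL and Debian procedures above differ only in the package command. As a sketch, assuming a root shell and that er2-config exits with a non-zero status when a step fails (this guide does not say so), the steps can be wrapped as follows; install_agent and run_step are hypothetical names. DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented install steps for RHEL (.rpm)
# and Debian (.deb) hosts. Run as root.

# Run a command, or only print it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: path to the Agent package, CipherTrust Manager hostname or IP address.
install_agent() {
  local pkg=$1 cm=$2
  case $pkg in
    *.rpm) run_step rpm -ivh "$pkg" ;;
    *.deb) run_step dpkg -i "$pkg" ;;
    *) echo "unsupported package type: $pkg" >&2; return 2 ;;
  esac || return 1
  run_step er2-config -i "$cm" || return 1   # connect to the active CipherTrust Manager node
  run_step er2-config -t || return 1         # test and save the connection settings
  run_step /etc/init.d/er2-agent restart     # restart so the new settings take effect
}

# Example:
#   DRY_RUN=1 install_agent er2_2.0.31-linux26-x64.deb cm.example.com
```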

Installing Agents on Windows


1. Log in as administrator to the host machine where you want to install the Agent.
2. Run the Agent installer.
3. In the Welcome screen of the setup wizard, click Next to continue.


4. The End-User Licence Agreement (EULA) screen is displayed.


a. Read the license agreement and select I accept the terms in the Licence Agreement.
b. Click Next to continue.
5. In the Choose Setup Type screen, select the Install option for the standard installation and click Next to
continue.
6. The Ready to Install screen is displayed.
a. Click Install to install the product in the default location.
b. If the User Account Control dialog box appears, click Yes to confirm.
The installation begins and the progress is shown under the Status progress bar.
7. During the installation, in a separate Node Configuration window, you are asked for the connection
details of the active CipherTrust Manager node.
a. Master server IP address or host name - specify the IP address or host name of the CipherTrust
Manager node.
b. Master server public key and Target Group - skip this configuration part as it is optional and currently
not used.
c. Click Test Connection to test the connection between the Agent and CipherTrust Manager.
– If the connection is properly configured, a confirmation will appear stating "Connectivity test is
successful". Click OK to close the prompt.
– If the connectivity test fails, click OK to close the prompt, make sure that CipherTrust Manager is
reachable from the Agent host, and retry the test.
d. Click Finish to complete the configuration.
8. After a successful Agent installation, click the Finish button to exit the wizard and complete the installation.

NOTE The installer creates a service called Enterprise Recon 2 Agent that runs under the
Local System user account.


Uninstalling Agents
Uninstalling Agents from RHEL
To uninstall a DDC Agent:
1. Stop the DDC Agent.
/etc/init.d/er2-agent stop

2. Uninstall the DDC Agent.


rpm -e er2

Uninstalling Agents from Debian


To uninstall a DDC Agent:
1. Stop the DDC Agent.
sudo /etc/init.d/er2-agent stop

2. Uninstall the DDC Agent.


sudo dpkg --remove er2

Uninstalling Agents from Windows


To uninstall a DDC Agent, you must be logged on as Administrator to the host where the Agent is running.
1. Navigate to the Control Panel > Programs and Features.
2. Locate the Enterprise Recon 2 Agent in the list of installed programs.
3. Right click the Agent and select Uninstall.
4. In the dialog box that is displayed, select the option to automatically close the Enterprise Recon 2 Agent application, and click OK to continue.
5. Follow the wizard to complete the uninstallation.

TIP Alternatively, to uninstall a DDC Agent from the command line, run the following commands as Administrator:
net stop "Enterprise Recon 2 Agent (<ARCH>)"
wmic product where name="Enterprise Recon 2 Agent (<ARCH>)" call uninstall


Upgrading Agents
Agents do not require an upgrade unless you need a feature that is available only in a newer version of the Agent. Older versions of the Agent are compatible with newer versions of the Server. To upgrade an Agent, uninstall it and then install the new version.
> For instructions on uninstalling Agents, see:
• "Uninstalling Agents from RHEL" on page 29
• "Uninstalling Agents from Debian" on page 29
• "Uninstalling Agents from Windows" on page 29
> For instructions on installing Agents, see:
• "Installing Agents on RHEL" on page 26
• "Installing Agents on Debian" on page 27
• "Installing Agents on Windows" on page 27


Hardening the Deployment


CipherTrust Manager should be deployed into as secure an environment as possible. Every effort has been made to make CipherTrust Manager secure; however, you should take additional precautions, especially when CipherTrust Manager is deployed into an untrusted environment. Refer to "Hardening Guidelines" in the "CipherTrust Manager Deployment Guide" for details.

General Hardening Recommendations


When a scan runs, all the data is transferred from the data stores to the Agents so that the Agents can read it and discover info types. As a result, potentially sensitive data travels between data stores and Agents. Although we cannot provide a comprehensive list of actions to secure this data in transit, for security and performance reasons we offer at least the following recommendations:
> As a general rule, take all necessary actions to eliminate or minimize the transfer of data between data stores and Agents.
> Install a separate Agent in each subnet that contains sensitive data, and configure the firewalls to block connections to the sensitive data stores from Agents in other subnets. This ensures that sensitive data does not leave the subnet and that company policies are respected.
> Agents should be in the same network security zone as the data stores they scan. This ensures that data does not cross network security zone boundaries.
> For Big Data stores, Agents should be in the same subnet as the data stores. This prevents the large volume of scanned data from overloading firewalls.
> Use Local Agents, when possible, for data stores that hold large amounts of data. This ensures that the data never leaves the server.
> Contact your corporate security teams if data stores reside in different network security zones to determine
where to locate the CipherTrust Manager appliance.
> Agents should be hardened to comply with the company security policies for the sensitivity of the data
located in the data stores they will scan, as all the scanned data will be transferred to the Agents.
> DDC should be configured with secure protocols (for example, using TLS) to connect to data stores
whenever possible, to ensure that the data travels protected with channel encryption.
> Use firewalls to restrict the access to data stores, so only the allowed computers can connect to their
endpoints.

Certificate Security Recommendations


> Certificate generation
• Certificate private keys must not be generated or transformed on public web services.
• The certificate format must not be transformed on public web services.
• Cryptographic algorithms and key length must satisfy a minimum security strength of 112 bits.
– For symmetric keys, we recommend using at least AES-128.
– For asymmetric keys, use DSA-2048, RSA-2048, or ECDSA with a key size of 224 to 255 bits (or larger).
• The certificate validity and key usage period should not exceed 2 years.
• Digital signatures must use a strong hash function. We recommend using at least SHA-256.
> Certificate usage
• Users must keep their private keys and related activation data secret.
• Receivers of a certificate must verify it.
• Users must respect the PKI Certificate Policy requirements when using a certificate.
• Do not share wildcard certificates across different security contexts or security levels.
> Certificate revocation
• Subscribers must revoke their certificates when necessary (for example, when a certificate is compromised).
> Certificate renewal
• Subscribers must renew their certificates before they expire.
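Some of the generation rules above can be checked locally with OpenSSL, which also keeps key handling off public web services. This is a minimal sketch, assuming OpenSSL and GNU date; cert_validity_days and check_cert are hypothetical helpers that check only the validity period, key size, and signature hash, not the usage, revocation, or renewal rules.

```bash
#!/usr/bin/env bash
# Hypothetical certificate check against part of the recommendations above.

# Print the certificate's validity period in whole days. Argument: PEM file.
cert_validity_days() {
  local start end
  start=$(openssl x509 -in "$1" -noout -startdate | cut -d= -f2)
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  echo $(( ( $(date -d "$end" +%s) - $(date -d "$start" +%s) ) / 86400 ))
}

# Report violations of: validity <= 2 years (731 days, allowing a leap day),
# key size >= 2048 bits (>= 224 bits for ECDSA), and a SHA-256 or stronger
# signature hash. Returns 1 if any check fails. Argument: PEM file.
check_cert() {
  local cert=$1 text days bits min sig status=0
  text=$(openssl x509 -in "$cert" -noout -text) || return 2
  days=$(cert_validity_days "$cert")
  if [ "$days" -gt 731 ]; then echo "validity too long: $days days"; status=1; fi
  bits=$(printf '%s\n' "$text" | sed -n 's/.*Public-Key: (\([0-9]*\) bit).*/\1/p' | head -n 1)
  case $text in
    *id-ecPublicKey*) min=224 ;;
    *) min=2048 ;;
  esac
  if [ "${bits:-0}" -lt "$min" ]; then echo "key too short: ${bits:-unknown} bits"; status=1; fi
  sig=$(printf '%s\n' "$text" | grep -m 1 'Signature Algorithm:')
  case $sig in
    *[Ss][Hh][Aa]256*|*[Ss][Hh][Aa]384*|*[Ss][Hh][Aa]512*) ;;
    *) echo "weak or unknown signature hash:${sig#*:}"; status=1 ;;
  esac
  return $status
}
```

To create a key and certificate signing request locally, a command such as openssl req -new -newkey rsa:2048 -nodes -sha256 -keyout knox.key -out knox.csr can be used; the file names are placeholders.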

Securing the Hadoop Configuration


Here are some general guidelines for securing your Hadoop deployment. These recommendations apply to any Hadoop cluster.
> The Hadoop cluster should only be deployed inside a corporate or trusted network. If you deploy to the cloud, you must use a VPC network.
> None of the Hadoop nodes should be exposed to the Internet.
> To access services from the Internet, we recommend using VPN access. If VPN is not available, use jump servers or bastion hosts. For example, do not expose SSH (port 22) directly to the Internet; use a jump server or a bastion host instead.
> To access web services, such as DDC or Ambari, use a reverse proxy or a load balancer. These devices normally sit in a DMZ (demilitarized zone) and forward only specific TCP ports to the Hadoop cluster.
> For Hadoop clusters, configure the replication channels to use transport-level encryption.
Additionally, for the DDC-Hadoop configuration (PQS/HDFS), the following guidelines help you set secure credentials and a secure TLS configuration. For details and instructions, refer to your Hadoop documentation.
> Secure credentials
• Hadoop credentials must be rotated regularly.
• Credentials must be different for every installation.
• Passwords must meet the following security requirements:
– At least 10 characters in length.
– Contains both uppercase (A-Z) and lowercase (a-z) alphabetic characters.
– Has at least one numeric (0-9) character.
– Has at least one special character (! @ # $ % ^ & * ( ) _ - + = , . / < > ? ; ' : " [ ] \ { } | ~ `).
– Does not contain spaces or tabs.
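The password rules above can be expressed as a small shell check, for example to validate a new Knox password before rotating it. A minimal sketch; check_password is a hypothetical name, and it relies on the POSIX [:punct:] class in the C locale containing exactly the 32 special characters listed above.

```bash
#!/usr/bin/env bash
# Hypothetical checker for the password rules above; not part of Hadoop or DDC.
# The subshell body keeps LC_ALL=C local, so the character classes are ASCII-only.
check_password() (
  LC_ALL=C
  pw=$1
  [ "${#pw}" -ge 10 ] || exit 1                    # at least 10 characters
  case $pw in *[[:upper:]]*) ;; *) exit 1 ;; esac  # an uppercase letter
  case $pw in *[[:lower:]]*) ;; *) exit 1 ;; esac  # a lowercase letter
  case $pw in *[[:digit:]]*) ;; *) exit 1 ;; esac  # a digit
  case $pw in *[[:punct:]]*) ;; *) exit 1 ;; esac  # a special character
  case $pw in *[[:blank:]]*) exit 1 ;; esac        # no spaces or tabs
  exit 0
)

# Example:
#   check_password 'Str0ng!Passw0rd' && echo "meets the policy"
```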


> Secure TLS configuration
• Disable SSL and TLS 1.0 and 1.1.
• Use secure renegotiation or disable renegotiation.
• Disable TLS compression.
• Use Authenticated Encryption cipher suites.
• Use cipher suites with strong key exchange.
• Do not use cipher suites with known vulnerabilities.
• The cipher suite order must be explicitly defined by the user.
• Use Perfect Forward Secrecy (PFS).
• To check the configuration, use the testssl.sh tool, which you can download from https://testssl.sh/.
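For a quick check of the protocol rule, OpenSSL's s_client can attempt a handshake with each protocol version. A minimal sketch, assuming an OpenSSL build that supports the -tls1 through -tls1_3 options; probe_tls_versions is a hypothetical name. A "not accepted" result for TLS 1.0 or 1.1 can also come from the local OpenSSL refusing those versions, so treat testssl.sh as the authoritative check.

```bash
#!/usr/bin/env bash
# Hypothetical quick probe: which TLS protocol versions does an endpoint accept?
# Arguments: hostname, port.
probe_tls_versions() {
  local host=$1 port=$2 v
  for v in tls1 tls1_1 tls1_2 tls1_3; do
    # Empty stdin makes s_client close right after the handshake attempt.
    if echo | timeout 10 openssl s_client -connect "$host:$port" -servername "$host" "-$v" >/dev/null 2>&1; then
      echo "$v: accepted"
    else
      echo "$v: not accepted"
    fi
  done
}

# Example (only tls1_2 and tls1_3 should be accepted):
#   probe_tls_versions knox.example.com 8443
```

With testssl.sh, the equivalent full check of a Knox endpoint is, for example, ./testssl.sh knox.example.com:8443.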
