HP Vertica Documentation

HP Vertica Analytics Platform


Software Version: 7.1.x

Document Release Date: 7/21/2016

Legal Notices
Warranty
The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be
construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
The information contained herein is subject to change without notice.

Restricted Rights Legend


Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer
Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial
license.

Copyright Notice
Copyright 2006 - 2013 Hewlett-Packard Development Company, L.P.

Trademark Notices
Adobe is a trademark of Adobe Systems Incorporated.
Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.
UNIX is a registered trademark of The Open Group.


Contents
Supported Platforms
New Features
Concepts Guide
Getting Started Guide
Installation Guide
Administrator's Guide
Analyzing Data Guide
Flex Tables Guide
SQL Reference Manual
Extending HP Vertica Guide
Connecting to HP Vertica Guide
HP Vertica for SQL on Hadoop
Hadoop Integration Guide
HP Vertica Place Guide
HP Vertica Pulse Guide
Best Practices for OEM Customers
Connectivity Pack for Microsoft
Glossary
Third-Party Software Acknowledgements

Supported Platforms

HP Vertica Server and HP Vertica Management Console


Supported Operating Systems
Vertica Analytics Platform 7.1.x runs on the following 64-bit operating systems on the x86_64
architecture:

Red Hat Enterprise Linux 5.x* and 6.0 up to and including 6.7

Note: Red Hat Enterprise Linux 7 is supported only on the AMI supplied by Hewlett-Packard
for users who want to run HP Vertica on Amazon Web Services (AWS).

SUSE Linux Enterprise Server 11.0 through 11.0 SP3

Oracle Enterprise Linux 6 - Red Hat Compatible Kernel only - HP Vertica does not support the
Unbreakable Enterprise Kernel (kernels with a uek suffix)

Debian Linux 6* and 7.0 through 7.5

CentOS 5.x* and 6.0 up to and including 6.7

Ubuntu 12.04 LTS and 14.04 LTS

* HP is phasing out support for these platforms. See End of Support Plan for more information.

Supported File Systems


Vertica Analytics Platform Enterprise Edition has been extensively tested on all supported Linux
platforms running ext3 or ext4 file systems. While other file systems have been successfully
deployed by some customers, Vertica Analytics Platform cannot guarantee performance or stability
of the product on these file systems. In certain support situations, you may be asked to migrate off
of these untested file systems to help you troubleshoot or fix an issue. In particular, several file
corruption issues have been linked to the use of XFS with HP Vertica; we strongly recommend not
using it in production.
Note: Vertica Analytics Platform 7.1.x does not support Linux Logical Volume Manager (LVM).

Supported Browsers for HP Vertica Management Console


Vertica Analytics Platform 7.1.x Management Console is supported on the following Web browsers:

Internet Explorer 10 and later

Firefox 24 and later

Google Chrome 27 and later

HP Vertica Server and Management Console Compatibility
Each version of Vertica Analytics Platform Management Console is compatible only with the
matching version of the Vertica Analytics Platform server. For example, the Vertica Analytics
Platform 7.1 server is supported with Vertica Analytics Platform 7.1 Management Console only.

HP Vertica 7.1.x Client drivers


HP Vertica provides JDBC, ODBC, and ADO.NET client drivers. You can choose to download:

Client packages: All client packages contain the ODBC driver, the JDBC driver, and the HP
Vertica vsql client software for the specified operating system. 64-bit packages contain both
32- and 64-bit installers.

The ADO.NET driver is not available in a client package, but rather is part of the HP Vertica
Microsoft Connectivity Pack. If the installer detects the presence of Microsoft SQL Server,
ADO.NET is installed with additional software for Microsoft connectivity. If the installer does not
detect the presence of SQL Server, only the ADO.NET driver is installed.

Individual drivers: You can download Vertica ODBC, JDBC, and ADO.NET drivers individually.

See HP Vertica Driver/Server Compatibility to see which server versions are compatible with
Vertica 6.1.x drivers.

ADO.NET Driver
The ADO.NET driver is supported on the following platforms:

Platform                    Processor       Supported Versions
Microsoft Windows           x86 (32-bit)    Windows 7, Windows 8
Microsoft Windows           x64 (64-bit)    Windows 7, Windows 8
Microsoft Windows Server    x86 (32-bit)    2008 R2
Microsoft Windows Server    x64 (64-bit)    2008 R2, 2012

.NET Requirements (all platforms): Microsoft .NET Framework 3.5 SP1 or later

JDBC Driver
All JDBC drivers are supported on any Java 5-compliant platform. (Java 5 is the minimum.)

ODBC Driver
Vertica Analytics Platform provides both 32-bit and 64-bit ODBC drivers. HP Vertica 7.1.x ODBC
drivers are supported on the following platforms:

Platform                              Processor       Supported Versions
Microsoft Windows                     x86 (32-bit)    Windows 7, Windows 8
Microsoft Windows                     x64 (64-bit)    Windows 7, Windows 8
Microsoft Windows Server              x86 (32-bit)    2008, 2008 R2
Microsoft Windows Server              x64 (64-bit)    2008, 2008 R2, 2012
Red Hat Enterprise Linux              x86_64          5* and 6
SUSE Linux Enterprise                 x86_64          11
Oracle Enterprise Linux               x86_64
(Red Hat Compatible Kernel only)
CentOS                                x86_64          5* and 6
Ubuntu                                x86_64          12.04 LTS
AIX                                   PowerPC         5.3 and 6.1
HP-UX                                 IA-64           11i V3
Solaris                               SPARC           10
Mac OS X                              x86_64          10.7, 10.8, and 10.9

Driver Manager

Microsoft Windows and Microsoft Windows Server:
    Microsoft ODBC MDAC 2.8
Red Hat Enterprise Linux, SUSE Linux Enterprise, Oracle Enterprise Linux, CentOS, and Ubuntu:
    iODBC 3.52.6 or later
    unixODBC 2.2.14 or later
    DataDirect 5.3 and 6.1 or later
AIX:
    iODBC 3.52.6 or later
    unixODBC 2.3.0 or later
    DataDirect 5.3 and 6.1 or later
HP-UX, Solaris, and Mac OS X:
    iODBC 3.52.6 or later
    unixODBC 2.2.14 or later
    DataDirect 5.3 and 6.1 or later

* HP is phasing out support for these platforms. See End of Support Plan for more information.

Vertica Analytics Platform Driver/Server Compatibility


The following table indicates the Vertica Analytics Platform driver versions that are supported by
different Vertica Analytics Platform server versions.
SHA password security is supported on client driver and server versions 7.1.x only.

Client Driver Version    Compatible Server Versions
6.1.x                    6.1.x, 7.0.x, 7.1.x
7.0.x                    7.0.x, 7.1.x
7.1.x                    7.1.x
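
The compatibility table above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical and are not part of any HP Vertica client API, and it assumes version strings of the form major.minor.patch.

```python
# Compatibility matrix transcribed from the driver/server table above.
COMPATIBLE_SERVERS = {
    "6.1.x": {"6.1.x", "7.0.x", "7.1.x"},
    "7.0.x": {"7.0.x", "7.1.x"},
    "7.1.x": {"7.1.x"},
}


def release_series(version):
    """Reduce a version such as '7.1.2' to its release series, '7.1.x'."""
    major, minor = version.split(".")[:2]
    return "{}.{}.x".format(major, minor)


def is_supported(driver_version, server_version):
    """Return True if the table lists the server series for the driver series."""
    servers = COMPATIBLE_SERVERS.get(release_series(driver_version), set())
    return release_series(server_version) in servers


def supports_sha_passwords(driver_version, server_version):
    """SHA password security needs 7.1.x on both the client driver and the server."""
    return (release_series(driver_version) == "7.1.x"
            and release_series(server_version) == "7.1.x")
```

For example, is_supported("6.1.3", "7.1.2") is true because older drivers work with newer servers, while is_supported("7.1.2", "7.0.1") is false.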

vsql Client
The HP Vertica vsql client is provided in all client packages. You cannot download vsql separately.
The vsql client is supported on the following platforms:
Operating System                                              Processor
Microsoft Windows                                             x86, x64
    Windows 2008 & 2008 R2, all variants
    Windows 2012, all variants
    Windows 7, all variants
    Windows 8.0, all variants
Red Hat Enterprise Linux 5                                    x86, x64, IA64
Red Hat Enterprise Linux 6                                    x86, x64
SUSE Linux Enterprise 11                                      x86, x64
Oracle Enterprise Linux 6 (Red Hat Compatible Kernel only)    x86, x64
CentOS 5                                                      x86, x64, IA64
CentOS 6                                                      x86, x64
Ubuntu 12.04 LTS                                              x86, x64
Solaris 10                                                    x86, x64, SPARC
AIX 5.3, 6.1                                                  PowerPC
HP-UX 11i V3                                                  IA32, IA64
Mac OS X 10.7, 10.8, 10.9                                     x86, x64

Perl and Python Requirements


You can use HP Vertica's ODBC driver to connect applications written in Perl or Python to the
Vertica Analytics Platform.

Perl
To use Perl with HP Vertica, you must install the Perl driver modules (DBI and DBD::ODBC) and an
HP Vertica ODBC driver on the machine where Perl is installed. The following table lists the Perl
versions supported with HP Vertica 7.1.x.

Perl Version:           5.8, 5.10
Perl Driver Modules:    DBI driver version 1.609, DBD::ODBC version 1.22
ODBC Requirements:      See HP Vertica drivers table.

Python
To use Python with HP Vertica, you must install the pyodbc module and an HP Vertica ODBC
driver on the machine where Python is installed. The following table lists the Python versions
supported with HP Vertica 7.1.x:

Python Version    Python Driver Module    ODBC Requirements
2.4.6             pyodbc 2.1.6            See HP Vertica ODBC drivers table.
2.7.3             pyodbc 3.0.6
3.3.4             pyodbc 3.0.7
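
As a minimal sketch of how a Python application might connect through pyodbc, the example below builds a connection string from the standard ODBC keywords DSN, UID, and PWD and runs one query. The DSN name, user name, and helper function names are assumptions made for illustration; a DSN must already be configured for the HP Vertica ODBC driver, and the pyodbc module must be installed.

```python
def build_connection_string(dsn, user, password):
    """Assemble an ODBC connection string from the DSN, UID, and PWD keywords."""
    return "DSN={};UID={};PWD={}".format(dsn, user, password)


def fetch_version(connection_string):
    """Connect through pyodbc and return the server's version string."""
    import pyodbc  # imported lazily so the helper above works without the module

    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT version()")
        return cursor.fetchone()[0]
    finally:
        connection.close()
```

A call such as fetch_version(build_connection_string("VerticaDSN", "dbadmin", "secret")) would then return the version reported by the server, provided the hypothetical DSN exists.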

HP Vertica SDKs
This section details software requirements for running User Defined Extensions (UDxs) developed
using the HP Vertica SDKs.

C++ SDK
The HP Vertica cluster does not have any special requirements for running UDxs written in C++.

Java SDK
Your HP Vertica cluster must have a Java runtime installed to run UDxs developed using the HP
Vertica Java SDK. HP has tested the following Java Runtime Environments (JREs) with this
version of the HP Vertica Java SDK:

Oracle Java Platform Standard Edition 6 (version number 1.6)

Oracle Java Platform Standard Edition 7 (version number 1.7)

OpenJDK 6 (version number 1.6)

OpenJDK 7 (version number 1.7)

R Language Pack
The HP Vertica R Language Pack provides version 3.0.0 of the R runtime and associated libraries
for interfacing with HP Vertica. You install the R Language Pack on the HP Vertica server.

HP Vertica Integrations for Hadoop


HP Vertica 7.1.x is supported with these Hadoop distributions:
Distribution                       Version
Cloudera (CDH)                     4.6, 5.2, 5.3
Hortonworks Data Platform (HDP)    2.1, 2.2 (without HCatalog)
MapR                               3.1.1, 4.0

For HDP, if you want to use the HCatalog Connector, use version 2.1. HDP 2.2 introduces a newer
version of Hive that is not yet supported with HCatalog. However, you can use the ORC Reader,
an alternative to the HCatalog Connector, with HDP 2.2.

HP Vertica Connector for Hadoop MapReduce


HP provides a module specific to Hadoop for HP Vertica client machines.
HP Vertica provides two different Connectors: one for Apache Hadoop 1.0.0 and earlier versions,
and the other for Apache Hadoop 2.0.0. The table below details supported software versions.

HP Vertica Connector for Apache Hadoop MapReduce for Hadoop 1.0.0 and earlier
    Apache Hadoop and Pig Combinations:    Apache Hadoop 0.20.2 and Pig 0.7.0
                                           Apache Hadoop 0.20.205.0 and Pig 0.9.1
                                           Apache Hadoop 1.0.0 and Pig 0.9.2
    Cloudera Distribution Versions:        Cloudera Distribution Including Apache Hadoop (CDH) 3
    Hortonworks Data Platform Versions:    Hortonworks Data Platform (HDP) 1.1.1
    MapR:                                  MapR M5 Edition

HP Vertica Connector for Apache Hadoop MapReduce for Hadoop 2.0.0
    Apache Hadoop and Pig Combinations:    Apache Hadoop 2.0.0 and Pig 0.10.0
    Cloudera Distribution Versions:        Cloudera Distribution Including Apache Hadoop (CDH) 4
    Hortonworks Data Platform Versions:    N/A
    MapR:                                  N/A

Packs, Plug-Ins, and Connectors for HP Vertica Client Machines
HP provides the following optional modules for HP Vertica client machines.

MS Connectivity Pack
The HP Vertica Microsoft Connectivity Pack is supported on the following platforms:

Operating System Versions
    Windows 2003 & 2003 R2, all variants
    Windows 2008 R2 (and later)
    Windows 7, all variants
    Windows 8, all variants
    Windows Server 2012, all variants

Processor
    x86_x64

Visual Studio Requirements and .NET Requirements (versions required by ADO.NET and OLE DB
drivers for specific versions of Visual Studio)
    Visual Studio 2008 and Visual Studio 2008 SDK:
        Microsoft .NET Framework 3.5 SP1
    Visual Studio 2010 and Visual Studio 2010 SDK:
        Microsoft .NET Framework 3.5 SP1 and Microsoft .NET Framework 4.0 (both needed)
    Visual Studio 2012 and Visual Studio 2012 SDK:
        Microsoft .NET Framework 3.5 SP1 and Microsoft .NET Framework 4.5 (both needed)

SQL Server Requirements
    SQL Server 2008: R2, SP2, R2 & SP2
    SQL Server 2012

Visual Studio 2008 (and 2008 SDK) must be used with SQL Server 2008.

Informatica PowerCenter Plug-In


The HP Vertica plug-in for Informatica PowerCenter is supported on the following platforms:

Plug-in Version:                     7.x
Informatica PowerCenter Versions:    9.x
HP Vertica Versions:                 6.x (limited functionality)
                                     7.x (all enhancements)
Operating System:                    Microsoft Windows
                                         Windows 2003 & 2003 R2, all variants
                                         Windows 2008 & 2008 R2, all variants
                                         Windows 7, all variants
                                     Red Hat Enterprise Linux 5 (32 and 64 bit)
                                     Solaris
                                     AIX
                                     HP-UX

HP Vertica on Amazon Web Services


HP provides a preconfigured AMI for users who want to run a Vertica Analytics Platform on
Amazon Web Services (AWS). This HP-supplied AMI is the officially-supported version of the
Vertica Analytics Platform for AWS. It allows users to configure their own storage and has been
configured for and tested on AWS.
Note: HP develops AMIs for HP Vertica on a different schedule than the core software product
release. Therefore, AMIs will be available after the HP Vertica software release, but not at the
same time.

HP Vertica in a Virtualized Environment


HP Vertica runs in the following virtualization environment:

Host

VMware version 5.5

The number of virtual machines per host does not exceed the number of physical processors

CPU frequency scaling turned off at the host level and for each virtual machine

VMware parameters for hugepages set at version 5.5 defaults

Input/Output

Measured by vioperf concurrently on all Vertica nodes. When running vioperf, provide the
duration=2min option and start on all nodes concurrently.

25 megabytes per second per core of write

20+20 megabytes per second per core of rewrite

40 megabytes per second per core of read

150 seeks per second of latency (SkipRead)

Thick provisioned disk, or pass-through-storage

Network

Dedicated 10G NIC for each Virtual Machine

No oversubscription at the switch layer, verified with vnetperf

Processor

Architecture of Sandy Bridge (HP Gen8 or higher)

8 or more virtual cores per virtual machine

No oversubscription

vcpuperf time of no more than 12 seconds (approximately equivalent to a 2.2 GHz clock speed)

Memory

Pre-allocate and reserve memory for the VM

4 GB per virtual core of the virtual machine
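
The per-core figures above scale with the size of each virtual machine. The Python sketch below multiplies them out for a given number of virtual cores. It is only an arithmetic aid: the function and key names are hypothetical, the "20+20" rewrite figure is kept as two separate components, and the memory figure is assumed to be in gigabytes.

```python
# Per-core figures taken from the tested configuration described above.
WRITE_MB_PER_CORE = 25
REWRITE_MB_PER_CORE = (20, 20)  # the "20+20" figure, kept as two components
READ_MB_PER_CORE = 40
MEMORY_GB_PER_CORE = 4
MIN_VIRTUAL_CORES = 8


def vm_targets(virtual_cores):
    """Scale the per-core I/O and memory figures to one virtual machine."""
    if virtual_cores < MIN_VIRTUAL_CORES:
        raise ValueError("the tested configuration uses 8 or more virtual cores")
    return {
        "write_mb_per_s": WRITE_MB_PER_CORE * virtual_cores,
        "rewrite_mb_per_s": tuple(v * virtual_cores for v in REWRITE_MB_PER_CORE),
        "read_mb_per_s": READ_MB_PER_CORE * virtual_cores,
        "memory_gb": MEMORY_GB_PER_CORE * virtual_cores,
    }
```

For an 8-core virtual machine this gives 200 MB/s of write, 320 MB/s of read, and 32 GB of reserved memory, which can be compared against the vioperf results for that node.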

HP has tested the configuration above. While other virtualization configurations may have been
successfully deployed by customers in development environments, performance of these
configurations may vary. If you choose to run HP Vertica on a different virtualization configuration
and experience an issue, the HP Vertica Support team may ask you to reproduce the issue using
the configuration described above, or in a bare-metal environment, to aid in troubleshooting.
Depending on the details of the case, the Support team may also ask you to enter a support ticket
with your virtualization vendor.

End of Support Plan


HP Vertica 7.1.x is the last release for which HP will support the component/platform combinations
listed below. HP Vertica 7.1.x service packs and hotfixes will continue to include these
combinations. However, new minor and major versions (7.x and 8.x and later) will not include
support for these combinations.

SERVER

Platform        End of Support Version
RHEL            5.x
CentOS          5.x
Debian Linux

ODBC Client Driver

Platform                    Processor    End of Support Version
Red Hat Enterprise Linux    x86_64
CentOS                      x86_64

New Features

New Features and Changes in HP Vertica 7.1 SP2 (7.1.2)


Read the topics in this section for information about new and changed functionality in HP Vertica
7.1 SP2 (7.1.2).

New HP Vertica Place Functionality


The following functions have been added to HP Vertica Place:

STV_Refresh_Index: Appends newly added or updated polygons to an existing spatial index.

STV_LineStringPoint: Retrieves the vertices of a linestring or multilinestring.

New HP Vertica Pulse Functionality



Case-Sensitive Sentiment Analysis

Concurrent User Defined Dictionaries

Dictionary And Mapping Labels

New HP Vertica Pulse Functions

Case-Sensitive Sentiment Analysis


By default, Pulse ignores upper and lower case when performing sentiment analysis. 'ERROR'
produces the same results as 'error'. In version 7.1 SP2 and later, you can specify a case setting for
a word by using the $Case parameter when you add that word to a dictionary. For example, to
identify Apple rather than apple, you would add the following:
=> insert into pulse.white_list_en values('$Case(Apple)');
=> commit;

Concurrent User Defined Dictionaries


In version 7.1 SP2 and later, users can apply dictionaries on a per-user basis. Any number of Pulse
users can concurrently apply different sets of dictionaries without conflicts and without disrupting
other users' sessions. Each user can have one dictionary of each type loaded at any given time. If a
user does not specify a dictionary of a given type, Pulse uses the default dictionary for that type.

Dictionary And Mapping Labels


You can apply a label to any user defined dictionary or mapping when you load that object. Labels
enable you to perform sentiment analysis against a predetermined set of dictionaries and mappings
without having to specify a list of dictionaries. For example, you might have a set of dictionaries
labeled "music" and a set labeled "movies." The default user dictionaries automatically have a label
of "default."
A single dictionary or mapping can have multiple labels. For example, you might label a white list of
artists as both "painters" and "renaissance." You could load the dictionary by loading either label. A
label can only apply to one dictionary of each type. For example, you cannot have two stop words
dictionaries that share the same label. If you apply a label to multiple dictionaries of the same type,
the most recently applied label prevails.
You can view the labels associated with your current dictionaries using the
GetAllLoadedDictionaries() function. You can view the label associated with your current mapping
using the GetLoadedMapping() function.
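
The labeling rules above (a dictionary can carry several labels, but a label maps to at most one dictionary of each type, and the most recently applied label prevails) can be modeled with a small lookup table. The following Python sketch is a toy model written for illustration; the class and method names are hypothetical and are not Pulse functions.

```python
class LabelRegistry:
    """Toy model: each (label, dictionary type) pair maps to one dictionary."""

    def __init__(self):
        self._by_label_and_type = {}

    def apply_label(self, label, dictionary_type, dictionary_name):
        # Applying the same label to another dictionary of the same type
        # replaces the earlier entry, so the most recent label prevails.
        self._by_label_and_type[(label, dictionary_type)] = dictionary_name

    def dictionaries_for(self, label):
        """Return the dictionaries that loading this label would select."""
        return {
            dict_type: name
            for (lbl, dict_type), name in self._by_label_and_type.items()
            if lbl == label
        }
```

In this model, labeling a white list named artists as both "painters" and "renaissance" makes it reachable through either label, while labeling a second stop words dictionary as "painters" displaces the first.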

New HP Vertica Pulse Functions


HP Vertica includes the following new functions:

GetAllDictionarySetLabels()

GetAllLoadedDictionaries()

UnloadLabeledDictionary()

UnloadLabeledDictionarySet()

UnloadLabeledMapping()

OLE DB Driver Added for SSAS


This release adds the OLE DB driver to the HP Vertica Connectivity Pack for Microsoft. The
Connectivity Pack offers both 32- and 64-bit versions of the driver. Use the OLE DB driver only with
the Microsoft component, SQL Server Analysis Services (SSAS).
For information on the OLE DB properties and using the driver with SSAS, see the
Connectivity Pack for Microsoft guide.
Download the latest HP Vertica Connectivity Pack for Microsoft from my.vertica.com (logon
required).

Row Counts for External Tables


HP Vertica 7.1.2 provides the capability to calculate the exact number of rows in an external table.
The Optimizer uses this count to optimize for queries that access external tables.
In particular, if an external table is part of a join, the Optimizer can now identify the smaller table to
be used as the inner input to the join. Identifying the inner input results in better query performance.
You can use the following functions to analyze and drop the row count statistic for external tables:

ANALYZE_EXTERNAL_ROW_COUNT

DROP_EXTERNAL_ROW_COUNT
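
To see why a known row count helps, consider a simple hash join, which builds an in-memory table from one input and streams the other input past it. The Python sketch below is a toy model and not the HP Vertica Optimizer: it assumes the inner input is the side used to build the hash table, and the function names and sample table names are hypothetical.

```python
def choose_join_inputs(row_counts, left, right):
    """Return (outer, inner), using the table with fewer rows as the inner input."""
    if row_counts[left] <= row_counts[right]:
        return right, left
    return left, right


def hash_join(outer_rows, inner_rows, key):
    """Build a hash table on the inner rows, then probe it with each outer row."""
    table = {}
    for row in inner_rows:
        table.setdefault(row[key], []).append(row)
    joined = []
    for outer in outer_rows:
        for inner in table.get(outer[key], []):
            merged = dict(inner)
            merged.update(outer)
            joined.append(merged)
    return joined
```

With an accurate count, a million-row external table joined to a 50-row dimension table keeps the small table on the build side, so the hash table stays small.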

Flex Tables Load and Query


This section presents changes and additions to the HP Vertica flex tables.

New Parameters for fjsonparser


This release includes new parameters for the fjsonparser:

start_point_occurrence

key_separator

suppress_nonalphanumeric_key_chars

For a description of these parameters, see FJSONPARSER in the Flex Tables Guide. For examples of
using the new parameters, see Loading JSON Data.

New Parameter for fdelimitedparser and mapdelimitedextractor Function

HP Vertica 7.1.2 has a new parameter for the flex parser fdelimitedparser and the
mapdelimitedextractor function.
The new parameter is treat_empty_val_as_null. The default value is true.
When loading delimited data, the fdelimitedparser (or mapdelimitedextractor function) now
stores a NULL for any empty value, rather than an empty string (''). To continue using
empty strings when loading delimited data keys without a value, as in previous releases, specify
treat_empty_val_as_null=false.
See FDELIMITEDPARSER and MAPDELIMITEDEXTRACTOR in the Flex Tables Guide.
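
The effect of the parameter can be illustrated outside the database. The Python sketch below is not the parser itself: parse_delimited is a hypothetical helper, and the pipe delimiter and sample header are assumptions. It only mimics how an empty field becomes either a NULL (None in Python) or an empty string.

```python
def parse_delimited(header, line, delimiter="|", treat_empty_val_as_null=True):
    """Map header keys to the values on one delimited line."""
    keys = header.split(delimiter)
    values = line.split(delimiter)
    record = {}
    for key, value in zip(keys, values):
        if value == "" and treat_empty_val_as_null:
            record[key] = None  # empty value loaded as NULL (the new default)
        else:
            record[key] = value  # earlier behavior keeps the empty string
    return record
```

With the default setting, the line 1||Boston under the header id|name|city yields a NULL for name. Passing treat_empty_val_as_null=False yields an empty string instead.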

Improvements to GROUP BY Aggregation


HP Vertica 7.1.2 includes several major improvements to the GROUP BY aggregation
functionality:

The new CUBE aggregation performs the aggregation for every combination (subset) of the
CUBE expressions.

The new GROUPING SETS aggregation allows you to specify exactly which groupings of
aggregations you need.

You can now use multiple ROLLUP, CUBE, or GROUPING SETS aggregations in a single
GROUP BY clause.

For a description of how these aggregations integrate with multi-level aggregations, see
Aggregating GROUP BY Results.
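
In standard SQL terms, CUBE over n expressions produces one grouping for every subset of those expressions, ROLLUP produces one grouping for every leading prefix, and GROUPING SETS lets you list the groupings yourself. The Python sketch below enumerates those groupings and applies one of them to in-memory rows. It illustrates the semantics only; the function names and sample columns are made up for the example.

```python
from itertools import combinations


def cube_grouping_sets(expressions):
    """Every subset of the expressions, from the full set down to the grand total."""
    exprs = tuple(expressions)
    return [combo
            for size in range(len(exprs), -1, -1)
            for combo in combinations(exprs, size)]


def rollup_grouping_sets(expressions):
    """Every leading prefix of the expressions, ending with the grand total."""
    exprs = tuple(expressions)
    return [exprs[:size] for size in range(len(exprs), -1, -1)]


def aggregate(rows, grouping_set, measure):
    """Sum one measure for a single grouping set."""
    totals = {}
    for row in rows:
        key = tuple(row[column] for column in grouping_set)
        totals[key] = totals.get(key, 0) + row[measure]
    return totals
```

For the expressions region and year, cube_grouping_sets returns four groupings, (region, year), (region), (year), and the empty grouping, while rollup_grouping_sets omits (year).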

New Hadoop and Storage Location Features


This release adds the following Hadoop-related features:
l

Kerberos integration for Hadoop. See Using Kerberos with Hadoop. This has been tested with
Hortonworks (HDP 2.2), Cloudera (CDH 5.2 and 4.7), and MapR (4.0.1).

Direct reading of ORC files, a native Hadoop format, for improved efficiency. See Reading ORC
Files Directly.

Sharing of HCatalog configuration information. The hcatUtil script now copies HCatalog
configuration parameters from Hadoop so they do not need to be explicitly specified to
CREATE HCATALOG SCHEMA. See Configuring HP Vertica for HCatalog.

This release also adds improvements to the management of storage locations. Changes can now
be enforced immediately instead of waiting for the next automatic update, which simplifies the
process of retiring a storage location. Storage-location operations can also now be applied to all
nodes with a single command. See Managing Storage Locations and the corresponding metafunctions in the SQL Reference Manual.

Using MC to Import HP Vertica in a Hadoop Environment


As of HP Vertica 7.1.2, you can use Management Console to connect to and monitor an HP Vertica
database that resides in an Apache Hadoop environment.

Connect to an Apache Ambari server through the "Connect using an Ambari server to import
Vertica within a Hadoop environment" button under the Provision section of Management
Console.

Monitor and update your Hadoop environment from the Management Console Existing
Infrastructure page.

For detailed instructions, see Import and Monitor HP Vertica in a Hadoop Environment.

New Installer for ODBC Driver on Mac OS X


This release adds an installer for the HP Vertica ODBC driver on Mac OS X. The installer
is packaged as a .pkg file. You can run it as a regular Mac OS X installer or silently. This
driver is compatible with both 32-bit and 64-bit applications.
For more information on installing the ODBC driver on Mac OS X, see Installing the ODBC Driver
on Mac OS X.
Download the latest HP Vertica ODBC driver on Mac OS X from my.vertica.com (logon required).

MC Manager and Associate Roles


This release adds new roles to Management Console for HP Vertica on Demand users.
The Manager role allows users to:

Monitor databases on which the user has privileges


View the database overview and activity pages

Monitor the node state

View messages and mark them read or unread

View database settings.

Anything database-related beyond that scope depends on the mapped database user's privileges
granted on the database through GRANT statements.

The Associate role allows full privileges to monitor activity and messages on databases managed
by MC. Associate users may have other database privileges, depending on the database user
account to which they are mapped. These privileges can include modifying settings, installing
licenses, and viewing the database designer.
See About MC Users and About MC Privileges and Roles.

Licensing Changes
HP Vertica no longer counts each delimiter character as a 1-byte value against your license data
limit. Running a license audit after upgrading reports less total data than the previous version did.

Documentation Updates
Documentation on partially sorted GROUP BY was erroneously included in the documentation and
has been removed. This functionality is currently not included in the product.

New Features and Changes in HP Vertica 7.1 SP1 (7.1.1)


Read the topics in this section for information about new and changed functionality in HP Vertica
7.1 SP1 (7.1.1).

Client Authentication in Catalog


Client authentication information is now stored in the database catalog. Authentication information
is no longer stored in the individual vertica.conf files on each node. Any authentication
information that still exists in vertica.conf is obsolete. You can no longer use Administration
Tools to define client authentication.
When you upgrade to HP Vertica 7.1.1, the client authentication records in vertica.conf are
automatically converted and stored in the catalog on every node. There are a few steps that you
must take to fully configure authentication.
For details, see Upgrading Client Authentication in HP Vertica.
Storing client authentication information in the catalog has several benefits:

All client authentication records are stored in the same location using a consistent format.

Any changes made to client authentication while a node is down are automatically recovered when
the node restarts.

You use SQL commands to:

Create and manage authentication information.

Grant and revoke authentication methods for individual users or for user roles.

Assign a default authentication method. A user who has not been granted an authentication
method is authenticated with the default method.

Temporarily grant or revoke authentication methods for specific users and user roles or for all
users and user roles.

For detailed information, see How Client Authentication Works.

Flex Tables Load and Query


This release includes several changes and additions to Flex Tables. For more details about each of
the changes and new features, see the Flex Tables Guide.

New fdelimitedpairparser Parser


This release includes fdelimitedpairparser, a simplified version of the fcefparser. Use the
fdelimitedpairparser to parse key-value pairs.

Updated fcefparser Parser


The fcefparser now fully supports log files that are compliant with the HP ArcSight Common
Event Format (CEF). This release includes the following changes and additions:

Changes default separator from a comma (,) to an empty string ('')

Support for CEF prefix sections

Follows HP ArcSight CEF rules for escape characters, which are:

\n

\\

New line \.)

New MAPLOOKUP() buffer_size Parameter


The maplookup() function supports a new, optional buffer_size parameter. Use this parameter to
specify the maximum byte size of column values processed by maplookup().

Secure Password File


HP Vertica 7.1 SP1 now stores saved passwords in a secure file. The vbr.py backup configuration
utility creates a user-named file that contains the database password as well as the password to
the rsync service account. Only the dbadmin, or members of that user's Linux permission group,
can view the contents of the file. If you alter the file permissions from their default values (xx0), HP
Vertica backup or restore actions fail with an error message. If you do not save any passwords,
running vbr.py still creates the password file, but leaves the file empty.
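
A quick way to confirm that a password file still denies access to users outside the owner and group is to inspect its permission bits. The Python sketch below reads the "xx0" notation above as "the last octal digit is 0", which is an interpretation rather than something vbr.py documents here, and the helper names are hypothetical.

```python
import os
import stat


def others_have_no_access(mode):
    """True when the permission bits grant nothing to 'other' users (xx0)."""
    return (stat.S_IMODE(mode) & stat.S_IRWXO) == 0


def check_password_file(path):
    """Apply the same test to a file on disk."""
    return others_have_no_access(os.stat(path).st_mode)
```

Modes such as 660 or 640 pass this check, while 644 fails because it grants read access to all users.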

Rsync Service Account


HP Vertica 7.1 SP1 now has the ability to run rsync under an authenticated service account. If you
choose to save the password for the rsync account, HP Vertica stores it in the secure password
configuration file.

New System Table


The HP Vertica 7.1.1 release includes one new system table in the V_MONITOR schema,
PARTITION_COLUMNS. This table shows, for each projection of each partitioned table, the disk space
used by each column on each node.
For detailed information, see the SQL Reference Manual.

New Configuration Parameter Options for SET, CLEAR, and SHOW
HP Vertica 7.1 SP1 introduces a new way of setting, clearing, and viewing configuration
parameters.
You can now set and clear some parameter values at the database, node, or session level using
ALTER statements.
Additionally, the new SHOW SESSION, SHOW CURRENT, SHOW NODE, and SHOW
DATABASE statements allow you to view configuration parameter settings at different levels.

This version also introduces the new SESSION_PARAMETERS system table, which provides
information about configuration parameters at the session level.
For more information, see Setting and Clearing Configuration Parameters.

Command-line Distribution of Configuration Files


You can now distribute HP Vertica configuration files from the command line, as follows:
$ admintools -t distribute_config_files

This option provides the same functionality currently available through the Administration Tools
interface.

New scrutinize Functionality


For detailed information, see the Administrator's Guide, Collecting Diagnostics (scrutinize
Command).

New Options
The following options are new to the scrutinize diagnostics tool:

--diag-dump

Collects HP Vertica queries, system tables, and Data Collector tables. This
option replicates the behavior of the superseded collect_diag_dump tool.

--diagnostics

Collects log file data and runs commands against HP Vertica and its host
system. This option replicates the behavior of the superseded diagnostics
utility.

--log-limit limit

Limits how much data is collected from HP Vertica logs, where limit
specifies in gigabytes how much log data to collect, starting from the most
recent log entry.

--tmpdir=path

Specifies a temporary directory to use on all nodes. Use this option to ensure
that the temporary directory has enough free space so scrutinize can
complete execution.

Enhanced collection capabilities


scrutinize now performs the following tasks:

Collects cluster hardware configuration from systems where the lshw utility is installed.

Collects Kerberos configuration data if it is installed on the cluster.

Export of License Audit Results to CSV


You can use the admintools license_audit tool to audit a database and export the results to a
CSV file.

Projections Quick Stat for Management Console


In HP Vertica 7.1, Management Console introduces a new Quick Stat for the Overview Page. This
projections Quick Stat displays the total projection count and number of unsegmented and unsafe
projections for the database's schema with the largest number of projections.
For information about the Overview Page in Management Console, see Viewing the Overview
Page.

JDBC Routable Query (Key/Value) API Improvements


In HP Vertica 7.1 SP1, the JDBC Key/Value API has been renamed to the JDBC Routable Query
API. The API introduces a new class called VerticaRoutableExecutor, which allows you to query
single nodes using SQL syntax and supports JOIN and GROUP BY operations. For details,
see Routing JDBC Queries Directly to a Single Node.

Pulse Multilingual Support


HP Vertica Pulse 7.1 SP1 now supports analyzing Spanish text. For details see Multilingual Pulse.

New Features and Changes in HP Vertica 7.1.0


This guide describes new features and changes that were added to the Vertica Analytics Platform
7.1.0 release.
For a list of known and resolved issues, see the HP Vertica 7.1.x Release Notes, available at
http://www.vertica.com/documentation

Live Aggregate Projections and Projections with Expressions
HP Vertica 7.1 provides three new kinds of projections that aggregate data from large and
frequently updated tables. You query aggregated data from a live aggregate projection instead of
querying an anchor table and then aggregating the results. This approach eliminates resource-intensive computations because HP Vertica performs the calculations when it loads the data, not
each time you query the projection.

New Live Aggregate Projections Capability


HP Vertica 7.1 introduces live aggregate projections. A live aggregate projection contains column
values that have been aggregated from columns in its anchor table.
Querying data from a live aggregate projection is usually faster than querying data from an anchor
table and then aggregating it. Because the data is already aggregated, when you query the live
aggregate projection, reaggregation is not necessary. You receive the same results that you get
when querying from the anchor table.
When you create a live aggregate projection for a table and load data into the table, HP Vertica
aggregates the data from the anchor table and loads it into the live aggregate projection. On
subsequent loads, HP Vertica updates both the anchor table and the live aggregate projection.
When working with live aggregate projections, keep in mind:

When you create a live aggregate projection, you can no longer delete, update, or merge data in
the anchor table or the anchor projection. You can only add data to them.

A Top-K projection is a specific type of live aggregate projection.

The capability to create live aggregate projections is enabled by default.
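As a minimal sketch of the behavior described above, the following statements create a live aggregate projection that keeps a count of clicks per page and day. The clicks table, its columns, and the clicks_agg projection name are hypothetical and serve only as an illustration:

```sql
-- Hypothetical anchor table; the names are illustrative only.
CREATE TABLE clicks (
    user_id    INTEGER,
    page_id    INTEGER,
    click_time TIMESTAMP NOT NULL
);

-- Live aggregate projection: the counts are maintained as data is loaded.
CREATE PROJECTION clicks_agg AS
    SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks
    FROM clicks
    GROUP BY page_id, click_time::DATE;

-- Query the preaggregated data directly from the projection.
SELECT page_id, click_date, num_clicks FROM clicks_agg;
```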

For details, see:

Retrieving Aggregated Data from Tables

Live Aggregate Projections

Top-K Projections

New Top-K Projections Capability


HP Vertica 7.1 introduces Top-K projections. Top-K projections are a type of live aggregate
projection, also new in HP Vertica 7.1. Top-K projections improve the performance of queries that
retrieve the top k rows per partition of selected rows. These queries are called Top-K queries.
For example, the following Top-K query retrieves the four most recent readings from a gas meter,
partitioned by meter_id and metric:
=> SELECT meter_id, metric, reading_time, reading_value
FROM readings
LIMIT 4 OVER (PARTITION BY meter_id, metric ORDER BY reading_time DESC);

For optimal performance of Top-K queries, create a Top-K projection that aggregates the data in the
table for fast access. Querying the preaggregated data directly from the Top-K projection is usually
faster than querying the data from the anchor table and then calculating the top k rows.
When you create a Top-K projection, keep in mind:

After you create a Top-K projection for a table, you can no longer delete, update, or merge data in
the anchor table or anchor projection. You can only add data to them.

The capability to create Top-K projections is enabled by default.
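A Top-K projection that matches the gas meter query shown earlier might look like the following sketch. The projection name readings_topk and the projection column names are hypothetical; the readings table and its columns are taken from the example query:

```sql
-- Top-K projection over the readings table from the example query.
-- The projection name and its column names are hypothetical.
CREATE PROJECTION readings_topk (meter_id, metric, recent_time, recent_value) AS
    SELECT meter_id, metric, reading_time, reading_value
    FROM readings
    LIMIT 4 OVER (PARTITION BY meter_id, metric ORDER BY reading_time DESC);
```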

For details, see:


Retrieving Aggregated Data from Tables

Top-K Projections

New Capability to Use Expressions in Projection Definitions


In HP Vertica 7.1, you can create projections that use expressions in the column definitions. The
expression calculates values based on data in the anchor table and stores it in the projection. When
you load data into an anchor table that has a projection that uses expressions, HP Vertica
calculates the values using the data from that table. HP Vertica then inserts the calculated data into
the new projection.

Projections with expressions behave the same as normal projections, except you cannot perform
any merge operations on the anchor table.
A projection that uses expressions is not a live aggregate projection unless it aggregates the data in
one or more columns. If it does not aggregate data, you can continue to delete and update data in
the anchor table, but you cannot perform merge operations on the anchor table. If the projection
with expressions does aggregate data, it behaves like a live aggregate projection.
Using expressions when defining projections allows you to sort the data based on the calculated
results of an expression.
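The following sketch shows one way a projection with an expression could be defined. The values_table table, its columns, and the values_product projection are hypothetical names used only for illustration:

```sql
-- Hypothetical anchor table.
CREATE TABLE values_table (a INTEGER, b INTEGER);

-- The third projection column stores the calculated product of a and b.
CREATE PROJECTION values_product (a, b, product_value) AS
    SELECT a, b, a*b
    FROM values_table
    SEGMENTED BY HASH(a) ALL NODES KSAFE;
```

Because this projection does not aggregate data, it is not a live aggregate projection.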
For details, see:

Retrieving Aggregated Data from Tables

Projections with Expressions

Security Model
HP Vertica 7.1 introduces these security features:

New and Modified Client Authentication Methods

Client Authentication Information Now Stored in Database Catalog

SHA-512 Now Available for Hash Authentication

Configuring Client Authentication with Key/Value Pairs

SSL Changes

New and Modified Client Authentication Methods


In HP Vertica 7.1, there are several changes to the client authentication methods that the database
supports:

GSS-API (Generic Security Services API) allows you to authenticate clients to Kerberos v5.

Note: For details about GSS-API implementation, see RFC 2744 and RFC 4121.

Hash authentication is new; it replaces MD5 authentication. It allows you to store passwords
using the MD5 algorithm or the more secure SHA-512 algorithm. You configure hash
authentication and then define user-level and system-level parameters to specify which
algorithm to use.

TLS (Transport Layer Security) authentication now uses OpenSSL 0.9.8za.

You can use TLS for any authentication method:

The HOST NO TLS syntax replaces hostnossl.

The HOST TLS syntax replaces hostssl.

You can now self-authenticate a client. To do this:


Use an SSL client certificate whose Common Name (CN) field contains the user name for the
target database.

Create an authentication method that uses an auth_type of tls with the authentication
method HOST TLS.
For information, see Configuring TLS Authentication.

There are several new parameters for configuring LDAP. For details, see General
LDAP Parameters.

The following client authentication methods have no changes:


Password. Password authentication sends passwords over the network in clear text. HP
Vertica recommends that you use hash authentication with the SHA-512 algorithm instead.

Ident. The Ident protocol authenticates a database user with their system user name.

Trust. When you grant trust authentication to a user, they can always log in without a password.

Reject. When you grant reject authentication to a user, they can never log in.

For more information, see Client Authentication.

Client Authentication Information Now Stored in Database Catalog


In HP Vertica 7.1, client authentication information is now stored in the database catalog.
Authentication information is no longer stored in the individual vertica.conf files on each node.
Any authentication information that still exists in vertica.conf is obsolete. You can no longer use
Administration Tools to define client authentication.

When you upgrade to HP Vertica 7.1, the vertica.conf file still contains all the client
authentication information, but your database does not use it. No client authentication is configured
for your database, and anyone can connect to the database.
You need to reconfigure all client authentication as described in Upgrading Client Authentication
Records from vertica.conf to the Catalog. Once you have reconfigured this information, you should
delete the client authentication records from the vertica.conf file.
This feature has several benefits:

All client authentication records are stored in the same location using a consistent format.

Any changes made to client authentication while a node is down are automatically recovered when
the node restarts.

You use SQL commands to:

Create and manage authentication information.

Grant and revoke authentication methods for individual users or for user roles.

Assign a default authentication method. A user who has not been granted an authentication
method is authenticated with the default method.

Temporarily grant or revoke authentication methods for specific users and user roles or for all
users and user roles.
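The tasks listed above can be sketched with the authentication statements as follows. The method name v_hash, the user user1, and the host address range are hypothetical values chosen for illustration:

```sql
-- Create a hash authentication method for clients in a hypothetical address range.
CREATE AUTHENTICATION v_hash METHOD 'hash' HOST '10.0.0.0/8';

-- Grant the method to an individual user (user1 is hypothetical).
GRANT AUTHENTICATION v_hash TO user1;

-- Make the method the default by granting it to PUBLIC.
GRANT AUTHENTICATION v_hash TO PUBLIC;

-- Temporarily disable the method, then enable it again.
ALTER AUTHENTICATION v_hash DISABLE;
ALTER AUTHENTICATION v_hash ENABLE;

-- Revoke the method from the individual user.
REVOKE AUTHENTICATION v_hash FROM user1;
```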

SHA-512 Now Available for Hash Authentication


HP Vertica 7.1 provides the capability to use the Secure Hash Algorithm (specifically, SHA-512)
for password hashing.
When you upgrade to HP Vertica 7.1, MD5 is the default hash algorithm. The SHA-512 algorithm provides more secure hashing than MD5.
You configure the hash algorithm in one of two ways:

At the system level, set the SecurityAlgorithm configuration parameter. Valid values are:

'NONE'

'MD5'

'SHA512'

This setting applies to all users, unless the DBADMIN has set the user-level parameter to 'MD5'
or 'SHA512'. In that case, the user-level value overrides the system-level value.

At the user level, use ALTER USER to set the Security_Algorithm user parameter. Valid values
are the same as for the system-level parameter:

'NONE'

'MD5'

'SHA512'

If the value of the user-level parameter Security_Algorithm is 'NONE', HP Vertica uses the
algorithm specified in the system-level parameter, SecurityAlgorithm. If both parameters are
'NONE', HP Vertica uses the MD5 algorithm.
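As a sketch of the two configuration levels, the statements below set SHA-512 for the whole database and then for a single user. The database name exampledb, the user user1, and the password are hypothetical; supplying a new password in the same ALTER USER statement is an assumption made here so that the password is stored with the new algorithm:

```sql
-- System level: applies to every user whose user-level parameter is 'NONE'.
ALTER DATABASE exampledb SET SecurityAlgorithm = 'SHA512';

-- User level: overrides the system-level value for one hypothetical user.
ALTER USER user1 SECURITY_ALGORITHM 'SHA512' IDENTIFIED BY 'newpassword';
```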
Important: Before you upgrade, read about the issues you might encounter in Upgrade
Considerations for Hash Authentication.

See Also

Hash Authentication Parameters

How to Configure Hash Authentication

Configuring Client Authentication with Key/Value Pairs


In HP Vertica 7.1, you specify client authentication parameters as key/value pairs, as follows:
parameter_name = parameter_value

You use the new ALTER AUTHENTICATION statement to set and change LDAP and Ident
parameters. The following examples set parameters using key/value pairs:
=> ALTER AUTHENTICATION ident1 SET system_users='root';
=> ALTER AUTHENTICATION Ldap1 SET host='ldap://172.16.65.177',
binddn_prefix='cn=', binddn_suffix=',dc=qa_domain,dc=com';

For more information, see:

ALTER AUTHENTICATION

Configuring LDAP Authentication

Configuring Ident Authentication

SSL Changes
HP Vertica 7.1.0 introduces these changes to SSL:

New Parameters for Holding SSL Certificates and Key Files

Client Certificate Authentication

DSN Changes for ODBC

New Parameters Hold the Contents of Your SSL Certificate and Key Files
HP Vertica 7.1 adds three security configuration parameters to hold the contents of the certificate
and key files:

SSLPrivateKey: Set this parameter to the contents of your server's private key (server.key)
file. The value of this parameter is visible only to dbadmin users.

SSLCertificate: Set this parameter to the contents of your server's SSL certificate (server.crt)
file.

SSLCA: Set this parameter to the contents of the SSL certificate authority (root.crt) file if you are
using mutual authentication.

These new parameters simplify distribution of your server certificate file (server.crt) and private
key (server.key) to server hosts in your cluster. Before HP Vertica 7.1, the server certificate and
private key had to be copied to the HP Vertica catalog directory of each server host in your cluster.
Note: The SSLPrivateKey and SSLCertificate parameters are automatically set during the 7.1
upgrade, if EnableSSL=1 before the upgrade.
For a listing and description of these new configuration parameters and all other security
parameters, refer to the Administrator's Guide, Security Parameters.

Setting the New SSL Parameters


After you create your certificate and key files, set these parameters from the command line using
the ALTER DATABASE statement. When you use ALTER DATABASE to set the new parameters, you must
include the actual contents of the file, not the filename:
=> ALTER DATABASE exampleDB SET SSLPrivateKey = '<contents of server.key file>';
For more information on using this approach, see the SQL Reference Manual topic, ALTER
DATABASE.
Alternatively, use Admintools to point to and distribute the files as in previous releases; Admintools
sets the value of the parameters for you. For more information on using Admintools to distribute
SSL certificates and certifications, refer to the Administrator's Guide, Redistributing Configuration
Files to Nodes.
If you are upgrading from a previous version of HP Vertica and have already distributed your
certificate and key files, you must still use one of these approaches to prepare your parameters
for upgrade.

Common Questions
The following table provides answers to common questions on setting the new parameters.
Question: Does the database have to be started to use Admintools to distribute certificate and key
files?
Response: No. The database does not have to be started. In Admintools, point to the locations of
the certificate and key files, and Admintools sets the new parameters for you.

Question: If you are upgrading to HP Vertica 7.1 and have already distributed your certificate and
key files, would you still need to use Admintools to point to the location of the certificate and key
files?
Response: Yes. HP Vertica looks for the content of the certificate and key files within the
parameters. You must use Admintools once again to point to the location of the certificate and key
files so that the parameters are set for the upgrade. Alternatively, you can use ALTER DATABASE to
set the parameters.

Question: If you use ALTER DATABASE to set the new parameters, do you use the filename or the
actual contents of the file?
Response: Include the actual contents of the file when you set a parameter with the ALTER DATABASE
statement:
ALTER DATABASE exampledb SET SSLPrivateKey = '<contents of server.key file>';

Client Certificate Authentication


Database users can now self-authenticate by using an SSL client certificate whose Common
Name (CN) field contains their user name for the target database. To enable user self-authentication, you set the CN to the database user name for the target database when you create
the client key. For information on creating certificates and keys, refer to the Administrator's Guide,
Generating SSL Certificates and Keys.
After you set the CN to the database user name, you can create an authentication method using the
CREATE AUTHENTICATION statement together with the tls authentication method. You then
associate the method with the user using the GRANT AUTHENTICATION statement.
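The two statements described above can be sketched as follows. The method name v_tls, the user user1, and the host address range are hypothetical; the example assumes that the client certificate's CN field is set to user1:

```sql
-- Authentication method of type tls that applies to TLS connections (HOST TLS).
CREATE AUTHENTICATION v_tls METHOD 'tls' HOST TLS '0.0.0.0/0';

-- Associate the method with the user named in the certificate's CN field.
GRANT AUTHENTICATION v_tls TO user1;
```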

DSN Changes for ODBC


HP Vertica 7.1 adds new ODBC DSN parameters for certificate and key
file locations. You can use these parameters, SSLCertFile and SSLKeyFile, to point to the
locations of the client's public certificate file and the client's private key file, respectively. For a
listing of these and all the DSN parameters, refer to the Connecting to HP Vertica Guide section,
DSN Parameters.
For information on setting up a DSN, refer to the section, Setting Up a DSN.
For Windows ODBC, HP Vertica 7.1 also adds connection string properties within the ODBC Data
Source Administrator GUI, including SSL cert file and SSL key file fields under the Client Setting
tab.

Vertica Configuration File (vertica.conf) Changes


HP Vertica 7.1 introduces some changes to the vertica.conf files.
Previously, the vertica.conf files stored configuration parameter information. As of 7.1,
configuration parameter values are stored in the database catalog. By storing the values in the
database catalog, the state of configuration parameters remains consistent over all nodes, even if a
node was down when a parameter value changed.
This feature also eliminates the need for hand-editing node vertica.conf files because node-level
changes can be made in the catalog using SET_CONFIG_PARAMETER. In the past, you needed to log
on to each node and hand-edit its vertica.conf file to specify node-level parameter values, which
is risky and tedious.
Important: Hewlett-Packard now encourages the use of the SET and CLEAR parameters in
the ALTER NODE, ALTER DATABASE, and ALTER SESSION statements to set and clear
configuration parameters. See Setting and Clearing Configuration Parameters for more
information.
Specifically, client authentication information is now stored in the catalog, not in the vertica.conf
files. When you upgrade to HP Vertica 7.1, the vertica.conf file still contains the client
authentication information defined by the ClientAuthentication configuration parameter, but your
database does not use it. No client authentication is configured for your database, and anyone can
connect to the database.
For instructions on configuring client authentication after you upgrade to HP Vertica 7.1, see
Upgrading Client Authentication Records from vertica.conf to the Catalog.
For more information about configuration parameters, see Configuration Parameters in the
Administrator's Guide.

SDK Enhancements
HP Vertica 7.1.0 introduces these SDK enhancements:

Change to SDK API Binary Compatibility Policy

Increase in Allowed Number of Arguments for UDxs

Ability to Specify Dependency Path for UDx Libraries

Change to SDK API Binary Compatibility Policy


HP may now make changes to the SDK API in the first service pack for each major release of the
HP Vertica Analytics Platform. This change allows HP to fix any issues found after the release of a
new version of the HP Vertica server. Without this policy change, HP would not be able to resolve
issues that require a modification of the SDK API until the next major software release.
SDK API changes require that UDx libraries be recompiled in order to work with the new version of
the API. See UDx Library Compatibility with New Server Versions in the Extending HP Vertica
Guide for more information.

Allowed Number of Arguments for UDxs Increased


The maximum number of arguments that a UDx can accept has increased from 32 to 500.

Ability to Specify Dependency Paths for UDx Libraries


In previous versions of HP Vertica, you had two options to handle libraries on which your UDx
depended:

Bundle the library by statically linking it into your UDx library.

Manually copy the library to every node in your cluster to a location where the UDx could find it.
For example, copy a Java library to each node in your cluster. Then you update each node's
CLASSPATH to include the library's directory.

In HP Vertica Version 7.1.x, you can specify one or more directories containing support library files
using the new DEPENDS keyword in the CREATE LIBRARY statement. HP Vertica handles
these support libraries the same way it treats the UDx library itself. It copies the libraries to each
node in the cluster in a location where the UDx library can access them. For Java UDx libraries,
you can use this new feature as a way to set a Java classpath for each UDx library.
For more information, see Handling Java UDx Dependencies, Compiling Your C++ UDx, and
CREATE LIBRARY.
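A minimal sketch of the DEPENDS keyword for a Java UDx library follows. The library name and both file system paths are hypothetical:

```sql
-- Load a Java UDx library together with the support files it depends on.
-- The library name and both paths are hypothetical.
CREATE LIBRARY my_udx_lib AS '/home/dbadmin/my_udx.jar'
    DEPENDS '/home/dbadmin/udx_deps/*'
    LANGUAGE 'Java';
```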

Flex Tables Load and Query


This release includes several changes and additions to Flex Tables. For more details about each of
the changes and new features, see the Flex Tables Guide.

Polystructured Data Support


This release adds polystructured data support for nested arrays and arbitrarily-deep map data. To
support polystructured data, several additions and changes to flex table functionality have been
made in the following areas:

Usability improvements in flex parsers and functions (new parameters)

New functions to extract semi-structured data from the database directly into a Vmap, without
first having to write such data to disk and load it with COPY and the flex parsers

Polystructured data query support by pass-through arguments in the MAPITEMS function

New Polystructured Data Extractor Functions


Three new scalar functions support polystructured data, returning a single row of map data from
existing database content. Each extractor function name includes the type of data it parses:

MAPDELIMITEDEXTRACTOR

MAPJSONEXTRACTOR

MAPREGEXEXTRACTOR

Each function processes a specific type of data, using identical arguments to the corresponding
flex table parser (FDELIMITEDPARSER, FJSONPARSER, and FREGEXPARSER). Use the
extractor functions to extract semi-structured data from database contents (such as an existing
table column or data returned from an expression) or from a string that you enter directly. The
function returns a single VMap row that you can use in a flex or regular table column.
For more information, see the Flex Extractor Functions Reference in the Flex Tables Guide.
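As a hedged illustration of an extractor function applied to existing table content, the following sketch builds map data from a text column and displays it with MAPTOSTRING. The raw_events table and its json_text column are hypothetical:

```sql
-- Hypothetical table that holds one JSON document per row in a text column.
CREATE TABLE raw_events (json_text VARCHAR(2000));

-- Build map data from each JSON string and display it as readable text.
SELECT MAPTOSTRING(MAPJSONEXTRACTOR(json_text)) FROM raw_events;
```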

New Regular Expressions Parser


This release includes a new Flex table parser for regular expressions.
For more information, see the FREGEXPARSER reference page.

New Options for FJSONPARSER


The FJSONPARSER has these new options:

start_point

reject_on_empty_key

omit_empty_keys

reject_on_materialized_type_error

Each of these options is described in the FJSONPARSER reference page.

New Options for FDELIMITEDPARSER


The FDELIMITEDPARSER has these new options:

reject_on_empty_key

omit_empty_keys

reject_on_materialized_type_error

Each of these options is described in the FDELIMITEDPARSER reference page.

Changes to Flex Data Functions


The following Flex data functions now ignore empty keys:

BUILD_FLEXTABLE_VIEW

COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW

MATERIALIZE_FLEXTABLE_COLUMNS

Updates to MAPITEMS() Function


The MAPITEMS() flex map function now supports pass-through arguments. For more information,
see the MAPITEMS reference page.

Improved Performance Using PARTITION BEST Option


For any flex map functions that require the use of an OVER() clause, HP Vertica recommends the
PARTITION BEST option. This option results in multi-threaded operations across all available nodes,
which can significantly improve performance when processing large amounts of raw data in flex
tables.
All of the flex table documentation examples now include the PARTITION BEST usage in the OVER()
clauses.
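For example, a flex map function call with the recommended option might look like the following sketch, where darkdata is a hypothetical flex table and __raw__ is its map data column:

```sql
-- darkdata is a hypothetical flex table; __raw__ holds its map data.
SELECT MAPITEMS(__raw__) OVER (PARTITION BEST) FROM darkdata;
```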

Data Management
HP Vertica 7.1.0 introduces these Data Management features:

COPY Statement Changes

Log Projection Data in Database Designer

Swap Partitions Atomically

Column Encoding Compression Options

COPY Statement Changes


In this HP Vertica release, using COPY to bulk load data has these changes:

Apportioned load support. The COPY statement already supported multi-threaded bulk loading
on a single node (parallel load). Now, load also occurs across multiple nodes to increase
performance.

LZO format support. COPY now accepts LZO files, as it does other formats, such as BZIP and
GZIP files. For more information, see COPY in the SQL Reference Manual.
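A load from an LZO file can be sketched as follows. The table name, file path, and delimiter are hypothetical:

```sql
-- Load an LZO-compressed, pipe-delimited file (hypothetical table and path).
COPY store_sales FROM '/data/store_sales.dat.lzo' LZO DELIMITER '|';
```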

Logging Projection Data


HP Vertica 7.1 allows you to log information about the projections that the Optimizer recommends.
Database Designer considers these projections when creating a design. HP Vertica stores this
information in two Data Collector (DC) tables:

DC_DESIGN_PROJECTION_CANDIDATES

DC_DESIGN_QUERY_PROJECTION_CANDIDATES

When you enable logging and start Database Designer, the Optimizer proposes a set of ideal
projections based on the options that you specify. The logs contain information about:

Whether or not the projections are actually created when the design is deployed.

How projections are optimized.

Whether the projections are created with or without the ideal criteria that the Optimizer
identified.

If you do not deploy the design immediately, review the logs to determine if you want to change the
design and proposed projections. If Database Designer deployed the design, you can still manually
create some of the projections that Database Designer did not create in the deployed design.
By default, logging the Database Designer design data is disabled. To enable it, turn on the
configuration parameter DBDLogInternalDesignProcess:
=> ALTER DATABASE exampledb SET DBDLogInternalDesignProcess = '1';

For more information, see Logging Projection Data for Database Designer.

New Capability to Swap Partitions Atomically


HP Vertica 7.1 allows you to swap the partitions of two tables that have identical definitions using
the SWAP_PARTITIONS_BETWEEN_TABLES function. Use this feature if you are loading new
or updated data into a table on a regular basis.
You can swap partitions only between tables that have the same:

Column definitions

Segmentation

Partitioning expression

Number of projections

Projection sort order

Swapping partitions is an atomic operation. The swapping operation moves the data to a new
partition and then swaps the new partition with the partition in the target table while maintaining
the integrity of the data. If any task in the swap operation fails, the entire operation fails.
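A call to the function might look like the following sketch. The table names and the partition key range are hypothetical, and both tables are assumed to meet the requirements listed above:

```sql
-- Swap the partitions with keys 2008 through 2009 between a staging table
-- and its target table. Table names and key values are hypothetical.
SELECT SWAP_PARTITIONS_BETWEEN_TABLES('sales_staging', 2008, 2009, 'sales');
```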
For more information, see:

Swapping Partitions

Tutorial for Swapping Partitions

SWAP_PARTITIONS_BETWEEN_TABLES

New Column Encoding Compression Options


This release of HP Vertica adds two new table column encoding options: BZIP_COMP and
GZIP_COMP. These options let you use bzip2 and gzip compression on table columns. They result in
high compression, but also require more CPU time to process, especially when loading data. See
Encoding-Type in the SQL Reference Manual for more information.
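As an illustration, the new encoding options could be applied in a table definition like this sketch. The table and column names are hypothetical:

```sql
-- Hypothetical table: one wide text column uses gzip, another uses bzip2.
CREATE TABLE web_log (
    log_time TIMESTAMP,
    request  VARCHAR(2000) ENCODING GZIP_COMP,
    response VARCHAR(4000) ENCODING BZIP_COMP
);
```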

Usability and Administration


HP Vertica 7.1.0 introduces these Usability and Administration features:

Active Standby Nodes

Management API

New Configuration Parameter Storage and Functionality

Dynamic Resource Pool Routing

New Configuration Parameters

Active Standby Nodes


An active standby node is a specialized type of HP Vertica node. An active standby node exists as
a backup node, ready to replace a failed node. Unlike standard HP Vertica nodes, an active standby
node does not perform computations or contain data. If a standard (permanent) node fails, an active
standby node can replace the failed node after the failed node exceeds the failover time limit. When
it takes the place of a failed node, the active standby node contains all of the projections and
performs all of the calculations of the replaced node.
To deploy active standby nodes automatically, you must first configure the
FailoverToStandbyAfter parameter. If possible, HP Vertica selects a standby node from the
same fault group as the failed node. Otherwise, HP Vertica randomly selects an available active
standby node.

If you are an administrator, you can manually replace a failed node using the ALTER NODE
command. You can specify a particular standby node to replace a failed node, or you can allow HP
Vertica to choose a node. As with automatic node replacement, HP Vertica defaults to a standby
node from the same fault group as the failed node. If the fault group has no available standby nodes,
HP Vertica selects any available active standby node.
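The automatic and manual replacement options can be sketched as follows. The database name, the node names, and the '5 minutes' interval are hypothetical, and the format of the FailoverToStandbyAfter value is an assumption; the two ALTER NODE statements are alternatives, not steps to run together:

```sql
-- Automatic replacement: set the failover time limit (hypothetical value).
ALTER DATABASE exampledb SET FailoverToStandbyAfter = '5 minutes';

-- Manual replacement, letting HP Vertica choose the standby node.
ALTER NODE v_exampledb_node0003 REPLACE;

-- Manual replacement, naming the standby node to use instead.
ALTER NODE v_exampledb_node0003 REPLACE WITH v_exampledb_node0007;
```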

Management API
HP Vertica 7.1 introduces the Management API. The Management API is a REST API that can be
used to manage HP Vertica databases with scripts or applications that understand REST and
JSON.
The Management API provides a subset of the database administration tools found in the
Management Console. Features include the ability to:

View host and network details for each node

View details on database version, structure, and state

View and update licenses

Start and stop databases on the entire cluster or individual nodes

View details about available database backups, start a new database backup, and restore
backups from specific archives

View and set database configuration parameters

Add hosts to the database, remove hosts from the database, and replace hosts with standby
hosts

Create and delete databases

View a list of running jobs

Run rebalance or workload analyzer on a database

Create and manage event listeners (also known as webhooks)

For complete details see Management API.

New Configuration Parameter Storage and Functionality


HP Vertica 7.1 introduces a new way of storing and setting configuration parameters. Configuration
parameter values are now stored in the database catalog, rather than in individual vertica.conf
files on each node, and can be set at the database or node level using SET_CONFIG_PARAMETER.
Previously, because parameter values were stored in individual vertica.conf files and the
SET_CONFIG_PARAMETER statement acted only on up nodes, a down node could have
inconsistent parameter values when it returned to the cluster. Additionally, there was no way to tell
which vertica.conf file had the most up-to-date information.
By storing the values in the database catalog, the state of configuration parameters remains
consistent over all nodes. When a down node returns to the cluster, it retrieves the latest catalog
file and thus adopts the most up-to-date parameter values. The catalog storage feature also
assures that all nodes agree on what all other nodes' values are.
This new feature allows node-level parameter configuration through a new SET_CONFIG_PARAMETER
argument called node_name:
SET_CONFIG_PARAMETER (parameter_name, value, node_name)
This eliminates the need for hand-editing node vertica.conf files. In the past, you needed to log
on to each node and hand-edit its vertica.conf file to specify node-level parameter values. This is
discouraged because it is risky and tedious. The new feature allows you to set any one node's
parameter values from any other node using SET_CONFIG_PARAMETER.
Important: Hewlett-Packard now encourages the use of the SET and CLEAR parameters in
the ALTER NODE, ALTER DATABASE, and ALTER SESSION statements to set and clear
configuration parameters. See Setting and Clearing Configuration Parameters for more
information.
This version also introduces new columns to the CONFIGURATION_PARAMETERS system table:

catalog_value: value currently stored in the catalog.

database_value: value set at the database level. Shows the default value if no value is set at the
database level.

source: level at which the catalog_value is set.

is_mismatch: true if current_value and catalog_value are different.

groups: displays the group(s) to which the parameter belongs (for example, security parameters).

For more information, see Setting and Clearing Configuration Parameters.
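One way the new columns might be used is to look for parameters whose running value differs from the value in the catalog. This is a sketch only: node_name and parameter_name are assumed to be existing columns of the table, and is_mismatch is assumed to be a Boolean column.

```sql
-- List parameters whose current value differs from the catalog value.
SELECT node_name,
       parameter_name,
       current_value,
       catalog_value,
       source
FROM configuration_parameters
WHERE is_mismatch
ORDER BY node_name, parameter_name;
```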

Dynamic Resource Pool Routing


This release introduces two new features related to creating or altering a resource pool. The new
features allow you to dynamically reroute queries to a secondary resource pool, and designate a
pool to hold queued queries until a later time.

CASCADE TO Parameter
With the new CASCADE TO parameter of CREATE RESOURCE POOL and ALTER RESOURCE POOL, you can
indicate a secondary resource pool to which queries can cascade. When a query reaches the initial
pool's RUNTIMECAP, the query cascades to your defined secondary pool where it executes.

PRIORITY HOLD Option
The new HOLD value option for the PRIORITY parameter sets a pool's priority to -999 so queries in
that pool queue until QUEUETIMEOUT is reached. You can set PRIORITY HOLD on a secondary pool if
you want a query that exceeds the initial pool's RUNTIMECAP to abort and stay on hold until a later
time.
For more information, see the RESOURCE_POOLS system table in the SQL Reference Manual
and Defining Secondary Resource Pools in the Administrator's Guide.
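The two options can be combined as in the following sketch. The pool names quick_pool and held_pool and the specific limits are illustrative assumptions, not recommended settings.

```sql
-- Secondary pool: PRIORITY HOLD keeps queries queued until QUEUETIMEOUT (in seconds) is reached.
CREATE RESOURCE POOL held_pool PRIORITY HOLD QUEUETIMEOUT 3600;

-- Initial pool: a query that reaches the one-minute RUNTIMECAP cascades to held_pool.
CREATE RESOURCE POOL quick_pool RUNTIMECAP '1 minute' CASCADE TO held_pool;

-- An existing pool can be given a secondary pool with ALTER RESOURCE POOL.
ALTER RESOURCE POOL quick_pool CASCADE TO held_pool;
```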

New Configuration Parameters


This section lists new configuration parameters in HP Vertica 7.1. See Configuration Parameters in
the Administrator's Guide for details.
HP Vertica 7.1 adds the following configuration parameters:
• DBDLogInternalDesignProcess: When you enable DBDLogInternalDesignProcess,
  Database Designer logs data about the projections that the Optimizer proposes. These
  projections are created when Database Designer deploys the design.
• EnableGroupByProjections: When you set EnableGroupByProjections to '1', you can
  create live aggregate projections. EnableGroupByProjections is enabled by default. For more
  information, see Live Aggregate Projections.
• EnableTopKProjections: When you set EnableTopKProjections to '1', you can create Top-K
  projections that allow you to retrieve Top-K data quickly. EnableTopKProjections is enabled by
  default. For more information, see Top-K Projections.
• EnableExprsInProjections: When you set EnableExprsInProjections to '1', you can create
  projections that use expressions to calculate column values. EnableExprsInProjections is
  enabled by default. For more information, see Projections with Expressions.
• SSLCA: You set the SSL certificate authority by providing the contents of the certificate
  authority root.crt file. For more information, see Security Parameters.
• SSLCertificate: You set the SSL certificate by providing the contents of the server.crt file.
  For more information, see Security Parameters.
• SSLPrivateKey: You set the server's private key by providing the contents of the server.key
  file. Note that only the value of this parameter is visible to dbadmin users. For more information,
  see Security Parameters.
• SecurityAlgorithm: Use this parameter to set the algorithm for the function that hash
  authentication uses: MD5 or SHA512.
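As an illustration, these parameters can be set like any other configuration parameter. The values shown are examples only, and the two-argument form of SET_CONFIG_PARAMETER is assumed here.

```sql
-- Disable creation of Top-K projections, then re-enable it ('1' is the default).
SELECT SET_CONFIG_PARAMETER('EnableTopKProjections', 0);
SELECT SET_CONFIG_PARAMETER('EnableTopKProjections', 1);

-- Choose the hash function used for hash authentication.
SELECT SET_CONFIG_PARAMETER('SecurityAlgorithm', 'SHA512');
```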

Access Policy
With HP Vertica, you can now control access to data in a table at the column level. The new access
policy feature adds a further level of security. Implementing this capability across your HP Vertica
database can prevent unauthorized users from retrieving sensitive information.
You implement secure access to your HP Vertica table columns by creating users, roles,
and privileges. Once the required privileges exist, you choose which entities to associate with
table columns. Only users and roles granted the relevant access can view and operate on the data
that the column contains.
For more information, see the following in the Administrator's Guide:
• Creating Secure Access Policies
• Using Access Policies
• Limitations on Creating Access Policies with Projections
• Schema and Table Privileges
• Database-Level Column Security
• Table-Level Column Security

See Also
SQL For New Access Policy

System Tables
This section lists the new and modified system tables in this release. See the SQL Reference
Manual for details.

New and Changed System Tables


This section describes any new or changed system tables in both V_MONITOR and V_CATALOG
schemas for HP Vertica 7.1.

Modified in V_MONITOR Schema
The LOGIN_FAILURES table has one new column:
• CLIENT_AUTHENTICATION_NAME

The SESSIONS table has the following new columns:
• CLIENT_AUTHENTICATION_NAME
• CLIENT_AUTHENTICATION
• JVM_MEMORY_KB

New in V_CATALOG Schema
The following system tables were added in this release:
• CLIENT_AUTH
• CLIENT_AUTH_PARAMS
• PASSWORD_AUDITOR
• USER_CLIENT_AUTH

Modified in V_CATALOG Schema
The LICENSE_AUDITS system table has one new column:
• LICENSE_NAME

The PROJECTION_COLUMNS system table has the following new columns:
• IS_EXPRESSION
• IS_AGGREGATE
• COLUMN_EXPRESSION

The PROJECTIONS system table has the following new columns:
• SEGMENT_EXPRESSION
• HAS_EXPRESSIONS
• IS_AGGREGATE
• AGGREGATE_TYPE

The TABLES system table has one new column:
• HAS_AGGREGATE_PROJECTION
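The new columns can be queried like any other catalog column. In this sketch the new columns are assumed to be Boolean flags, and table_schema, table_name, projection_name, and projection_column_name are assumed to be existing columns of these tables.

```sql
-- Tables that have at least one aggregate projection.
SELECT table_schema, table_name
FROM v_catalog.tables
WHERE has_aggregate_projection;

-- Projection columns whose values are calculated from an expression.
SELECT projection_name, projection_column_name, column_expression
FROM v_catalog.projection_columns
WHERE is_expression;
```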

Management Console
This section lists the new functionality HP Vertica Analytics Platform introduced to Management
Console.

New Overview Page in Management Console


HP Vertica 7.1 introduces a redesigned Overview page in Management Console. This page offers a
more comprehensive view of your database. Quickly navigate the Overview page with three new
tabs: Status Summary, Systems Health, and Query Synopsis, which each contain charts that
summarize the state of your system. The QuickStats widgets on the right of the page give you
at-a-glance information, from node health and current queries to disk health and license status.

Monitoring Running Queries in Management Console


HP Vertica 7.1 introduces a new chart on the Activity page that monitors running queries through
Management Console.
The Query Monitoring page displays running queries, queued queries, successfully run queries, and
failed queries. You can profile a query or cancel a running query from this page. Charts at the
bottom of the page display aggregate query usage by node or user.
For more information about the Query Monitoring page, see Monitoring Running Queries in the
Administrator's Guide.

Resource Pool Monitoring and Configuration in MC


HP Vertica 7.1 introduces enhancements for monitoring and configuring resource pools. These new
features help you manage workloads by providing resource pool configuration options and visual
representations of resource usage through Management Console.
The Management Console Activity page now displays graphs that show:
• Average historical resource usage by pool across all nodes.
• Total resource pool memory usage by node.
• Average query queue time and execution time per pool across all nodes.
• Total number of resource requests that were rejected on a specified pool across all nodes.

From the MC Configuration page, you can now change certain pool parameters.

MC Explain Plan and Profiling Views


The HP Vertica 7.1 release introduces new features in Management Console that enhance
EXPLAIN plan and profile viewing. These new features allow you to analyze and monitor multiple
query attributes in a flexible and accessible visual environment.
With these new features, you can:
• Change query input and output settings
• Specify the query you want to profile by transaction ID and statement ID
• View counter information at the node and operator level
• Monitor the progress of the query visually in the form of a tree
• Run profile analysis

Database Designer Enhancements in Management Console


HP Vertica 7.1 introduces enhancements to Database Designer in Management Console. These
new features improve usability and provide Database Designer functionality that was not previously
available in Management Console.
These features appear in Database Designer in Management Console:
• The option to turn on column correlation detection for a design appears in the General tab, or in
  the Options screen when using the Database Designer Wizard.
• When selecting queries to include in a design, the queries view in the Queries tab can be sorted
  by most recent, most frequent, and longest running.
• Icons now indicate the optimization status of design queries (optimized, partially optimized, or
  unoptimized) in the Queries tab.

For more information about Database Designer in Management Console, see Using Management
Console to Create a Design in the Administrator's Guide.

Spread Retransmit Rate in Management Console


HP Vertica 7.1 introduces spread retransmit rate monitoring in Management Console. You can now
enable a spread retransmit rate check in Management Console Settings, and set an alert for spikes
in the spread retransmit rate that exceed a customizable threshold.

Clock Skew in Management Console


HP Vertica 7.1 introduces clock skew monitoring in Management Console. A new widget on the
Overview page alerts you to clock skew among nodes, which can interfere with Management
Console's ability to accurately monitor database activity.

Table Treemap Enhancements in Management Console


HP Vertica 7.1 includes the addition of a projections summary to the Table Treemap page in
Management Console's Activity page. The new projections summary displays the number of total
projections; skewed projections; unsafe projections; unsegmented projections; unused projections;
and storage containers for projections.
For more information on the Table Treemap chart in Management Console, see Monitoring Table
Utilization.

SQL Enhancements and Changes


This section contains the SQL enhancements made in HP Vertica Analytics Platform 7.1.

New REBALANCE_TABLE Function


Synchronously rebalances data in the specified table.
A rebalance operation performs the following tasks:
• Distributes data based on:
  - user-defined fault groups, if specified
  - large cluster automatic fault groups
• Redistributes the database projections' data across all nodes

When to Rebalance
Rebalancing is useful (or necessary) after you:
• Mark one or more nodes as ephemeral in preparation for removing them from the cluster
• Add one or more nodes to the cluster so that HP Vertica can populate the empty nodes with data
• Change the scaling factor of an elastic cluster, which determines the number of storage
  containers used to store a projection across the database
• Set the control node size or realign control nodes on a large cluster layout
• Add nodes to or remove nodes from a fault group

Syntax
REBALANCE_TABLE('schema.table_name')

Parameters
[schema]      [Optional] Specifies the name of the schema that contains the table that you want to
              rebalance. If you do not specify a schema name, the statement checks for matching
              tables in the first schema listed in the current search_path.
table_name    Specifies the table that you want to rebalance.

Privileges
Must be a superuser.

Example
The following command shows how to rebalance data on the specified table.
=> SELECT REBALANCE_TABLE('online_sales.online_sales_fact');
 REBALANCE_TABLE
-------------------
 REBALANCED
(1 row)

New SQL ROLLUP Function


HP Vertica 7.1 adds the Online Analytic Processing (OLAP) ROLLUP extension to the GROUP BY
clause. The ROLLUP extension, and its accompanying grouping functions, let you
perform multi-level aggregates in a single query.

How ROLLUP Works
ROLLUP performs aggregate operations on multiple levels in a hierarchy, creating subtotals. HP
Vertica 7.1 then aggregates (or rolls up) these subtotals from the most detailed level to a grand total.
ROLLUP takes an ordered list of grouping expressions as its arguments.
To aggregate data, ROLLUP performs these operations:
• Calculates the standard aggregate values in the GROUP BY clause.
• Creates progressively higher-level subtotals by moving from right to left through the grouping
  columns, starting with the lowest level.
• Continues this process until all the grouping columns have been incorporated and aggregations
  computed. It then reports a grand total.

Using ROLLUP is equivalent to combining a set of GROUP BY queries with UNION ALL. With ROLLUP,
you specify only the required groupings in the GROUP BY clause. This approach allows more
efficient analysis and improved database performance.
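The equivalence between ROLLUP and a set of GROUP BY queries can be sketched as follows. The expenses table and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table: expenses(category VARCHAR, expense_year INT, amount NUMERIC).

-- One query with ROLLUP: detail rows, per-category subtotals, and a grand total.
SELECT category, expense_year, SUM(amount) AS total
FROM expenses
GROUP BY ROLLUP(category, expense_year);

-- Roughly the same result written by hand with UNION ALL.
SELECT category, expense_year, SUM(amount) AS total
FROM expenses
GROUP BY category, expense_year
UNION ALL
SELECT category, NULL, SUM(amount)      -- subtotal per category
FROM expenses
GROUP BY category
UNION ALL
SELECT NULL, NULL, SUM(amount)          -- grand total
FROM expenses;
```

The grouping columns are rolled up from right to left, so expense_year is aggregated away before category.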

ROLLUP Grouping Functions
ROLLUP supports the following grouping functions:
• GROUPING()
• GROUPING_ID()
• GROUP_ID()

These grouping functions identify the group to which each row belongs and enable sorting subtotal
rows and filtering results.
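As a sketch of how a grouping function might be used to label and sort subtotal rows, the query below again assumes a hypothetical expenses table. GROUPING() returns 1 when the column has been aggregated away in that row and 0 otherwise.

```sql
-- Hypothetical table: expenses(category VARCHAR, expense_year INT, amount NUMERIC).
SELECT category,
       expense_year,
       SUM(amount)            AS total,
       GROUPING(category)     AS category_rolled_up,  -- 1 only on the grand-total row
       GROUPING(expense_year) AS year_rolled_up       -- 1 on subtotal and grand-total rows
FROM expenses
GROUP BY ROLLUP(category, expense_year)
ORDER BY category_rolled_up, category, year_rolled_up, expense_year;
```

Sorting on the flags places each category's subtotal after its detail rows and the grand total last.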

See Also
• GROUP BY Clause
• GROUPING
• GROUP_ID
• GROUPING_ID

New Window Partition Clause Options


HP Vertica 7.1 adds a new option to the window_partition_clause named PARTITION BEST and
renames PARTITION AUTO to PARTITION NODES. Previously, you could use PARTITION AUTO within
the window partition clause of an OVER() clause to add parallelism during processing for
improved performance.
In HP Vertica 7.1, PARTITION BEST provides increased performance for multi-threaded queries across
multiple nodes. PARTITION NODES (formerly PARTITION AUTO) provides increased
performance for single-threaded queries across multiple nodes. The term PARTITION AUTO is
deprecated in this release.
For complete details, see window_partition_clause.
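A minimal sketch of the two options follows. It assumes a user-defined transform function named my_transform and a table named events, both of which are hypothetical. Which option performs better depends on whether the function is multi-threaded.

```sql
-- my_transform and events are hypothetical names used only for illustration.

-- Let HP Vertica choose the partitioning for a multi-threaded function.
SELECT my_transform(event_text) OVER (PARTITION BEST) FROM events;

-- Replacement for the deprecated PARTITION AUTO, for a single-threaded function.
SELECT my_transform(event_text) OVER (PARTITION NODES) FROM events;
```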

SQL For New Access Policy


Use the following new SQL statements to manage the new access policy. The access policy
restricts access to sensitive information in a column to only those users authorized to view that
data.
• CREATE ACCESS POLICY: creates an access policy for one or more columns in a table.
• ALTER ACCESS POLICY: specifies actions that can be performed on an access policy.
• DROP ACCESS POLICY: deletes an access policy.
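The following sketch shows how the three statements might fit together. The customers table, its ssn column, and the manager role are hypothetical, and the clause order shown is an assumption to verify against the SQL Reference Manual.

```sql
-- Hypothetical table customers(name, ssn) and hypothetical role manager.
-- Users with the manager role enabled see the real value; everyone else sees NULL.
CREATE ACCESS POLICY ON customers
FOR COLUMN ssn
CASE WHEN ENABLED_ROLE('manager') THEN ssn ELSE NULL END
ENABLE;

-- Temporarily switch the policy off, then remove it.
ALTER ACCESS POLICY ON customers FOR COLUMN ssn DISABLE;
DROP ACCESS POLICY ON customers FOR COLUMN ssn;
```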

Performance
In this release, HP Vertica introduces numerous infrastructure improvements to increase
performance across the entire platform.

WITH Clause Materialization Option


HP Vertica 7.1.x provides an additional WITH clause evaluation method that materializes the
WITH clause subquery results into a temporary table.
By default, HP Vertica uses inline expansion to implement all WITH clauses, evaluating them each
time they are referenced in the main query. The new functionality evaluates WITH clause
subqueries once and stores the results in a local temporary table. The optimizer then references the
temporary table as often as necessary while processing the main query. The table is dropped
when the main query completes.
HP Vertica continues to use the inline expansion method unless you enable the new
functionality with add_vertica_options:
SELECT add_vertica_options('OPT', 'ENABLE_WITH_CLAUSE_MATERIALIZATION');

To disable the materialization method, use clr_vertica_options:


SELECT clr_vertica_options('OPT', 'ENABLE_WITH_CLAUSE_MATERIALIZATION');

Note: You must be a superuser to enable the materialization method.


For more information, see WITH Clauses in SELECT in the Administrator's Guide, and WITH
Clause in the SQL Reference Manual.
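Materialization is most likely to matter when the main query references the same WITH clause more than once, as in this sketch. The sales table and its columns are hypothetical.

```sql
-- Hypothetical table: sales(region VARCHAR, amount NUMERIC).
WITH regional_sales AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total
FROM regional_sales                                      -- first reference
WHERE total > (SELECT AVG(total) FROM regional_sales);   -- second reference
```

With inline expansion, the regional_sales subquery is evaluated for each reference. With materialization enabled, it is evaluated once and both references read the temporary table.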

Improved Performance for Simple SELECT Queries


HP Vertica 7.1 now provides better performance for simple SELECT queries. If HP Vertica detects
that the query can be answered by a single node, then the initiating node routes the query plan
directly to that node instead of running a distributed query plan across all nodes in the cluster. This
results in increased performance for simple SELECT queries.

Concurrent Small Queries


HP Vertica 7.1 improves performance for workloads consisting of many short queries. Such queries
now experience less lock and mutex contention, so they perform better and use modern CPUs more
effectively.

HP Vertica and Hadoop Integration Updates


This section explains new and updated features for integrating HP Vertica and Apache Hadoop.

Storage Location for HDFS


HP Vertica Version 7.1.x expands its ability to integrate with Apache Hadoop by introducing the HP
Vertica Storage Location for HDFS. This feature lets you create a storage location on HDFS. Once
created, HP Vertica can store ROS containers on HDFS the same way it stores them on native
Linux filesystems.

You can use the Storage Location for HDFS feature to free storage space on your HP Vertica
cluster by moving less-queried data to HDFS. The data remains available for queries, although at
the cost of slower performance.
HDFS is slower than native Linux filesystems. Use HDFS-based storage locations to store data
only when you can afford to have queries on that data perform slower than queries on data stored
natively. Because of the slower performance of HDFS, you must never use HDFS-based storage
locations to store temporary files.
See Storage Location for HDFS in the Hadoop Integration Guide for more information.
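As a hedged sketch of how an HDFS storage location might be created and used, the statements below assume a WebHDFS endpoint at hadoop.example.com, a label named coldstorage, and a table named older_sales. All three names are placeholders, and the exact options should be checked in the Hadoop Integration Guide.

```sql
-- Create a shared storage location on HDFS for data only (never for temporary files).
CREATE LOCATION 'webhdfs://hadoop.example.com:50070/user/dbadmin'
    ALL NODES SHARED USAGE 'data' LABEL 'coldstorage';

-- Move a less-queried table to the HDFS location by assigning it a storage policy.
SELECT SET_OBJECT_STORAGE_POLICY('older_sales', 'coldstorage');
```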

Changes to the HP Vertica Connector for HDFS


The HP Vertica Connector for HDFS has increased performance and better fault tolerance in this
release:
• The connector now determines if the file it is retrieving from HDFS is split across Hadoop nodes.
  If it is split, then the connector connects to the Hadoop nodes storing each piece of the file and
  retrieves and reassembles the pieces itself. Previously, the Hadoop nodes would reassemble
  the file before sending it to the connector. Having the connector directly access the pieces and
  reassemble the file removes the delay caused by having the file's contents transferred between
  nodes in the Hadoop cluster before being sent to HP Vertica.
• When the connector encounters an error while retrieving data from HDFS, it now automatically
  retries the request. Previously, errors would often result in the query failing.
• The connector now monitors the data transfer speed from the Hadoop cluster. If any connection
  slows below a threshold (1 MB per second by default), the connector breaks it and tries to
  retrieve the data from another node in the cluster. You can manually set the lowest allowable
  data transfer rate. See HP Vertica Connector for HDFS Troubleshooting Tips for more
  information.

Changes to the HCatalog Connector


You can use the HCatalog Connector with all supported versions of Hadoop and Hive. To support
this use, HP Vertica requires some additional, one-time configuration. For details, see Configuring
Vertica for HCatalog.

Text Search
HP Vertica 7.1 introduces functionality that lets you perform a text search on a field within a table.
The text search feature allows you to quickly search the contents of a single CHAR, VARCHAR, or
LONG VARCHAR field within a table to locate a specific keyword. This feature is best used on
columns that are queried repeatedly for their contents. After the text index is created,
DML operations on the source table are slightly slower because the text index must be kept in sync
with the source table. Regular queries on the source table are not affected.
This functionality adds two new SQL statements, CREATE TEXT INDEX and DROP TEXT INDEX.
For more information and a detailed example, see Text Search Overview.
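A minimal sketch of the workflow might look like the following. The t_log table is hypothetical. The token and doc_id column names of the index and the v_txtindex.StemmerCaseInsensitive function are assumptions recalled from the text search documentation and should be verified there.

```sql
-- Hypothetical source table: t_log(id INT PRIMARY KEY, message VARCHAR(2000)).
CREATE TEXT INDEX t_log_index ON t_log (id, message);

-- Look up the keyword in the index, then fetch the matching source rows.
SELECT id, message
FROM t_log
WHERE id IN (
    SELECT doc_id
    FROM t_log_index
    WHERE token = v_txtindex.StemmerCaseInsensitive('error')
);

-- Remove the index when it is no longer needed.
DROP TEXT INDEX t_log_index;
```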

Scrutinize
The scrutinize command now accepts more options, so that you can produce a custom diagnostic
report that best suits your needs.
The new options are:
• --version
• --no-containers
• --no-active-queries
• --with-active-queries
• --begin
• --end

For more information, see Collecting Diagnostics (scrutinize Command).

IPv6 Client Support


HP Vertica 7.1 introduces support for IPv6 client connections, supported on these platforms:
• Linux
• Windows
• Macintosh

See the Supported Platforms guide for specifics on these platforms.


HP Vertica clients running on IPv6 networks can now connect to HP Vertica servers that have an
IPv6 interface. No server configuration is necessary to enable this feature as HP Vertica listens for
client connections on all network interfaces available on the server.

Note that IPv6 is not supported on the following client platforms in this release:
• Solaris
• AIX
• HP-UX

The Connecting to HP Vertica Guide contains details for configuring clients to connect to an IPv6
interface on an HP Vertica server.

Python 3 Support
HP Vertica 7.1 introduces support for Python 3 and newer versions of PyODBC. The specific
versions supported are detailed in the Supported Platforms guide. See Perl and Python
Requirements.

Routable Connection API Support for Numeric Data Type


HP Vertica 7.1 introduces support for the numeric data type in the Routable Connection API.

HP Vertica Place
HP Vertica 7.1 includes HP Vertica Place, an analytics package that provides users with the ability
to manipulate geospatial data. With 7.1.x, HP Vertica Place is now generally available. Previously,
HP Vertica Place was only available as part of the HP Vertica Innovations program.
You can download HP Vertica Place from my.vertica.com/downloads.
HP Vertica Place Features:
• Over 50 geospatial SQL functions.
• 100% accuracy and faster spatial index creation.
• Improved performance of spatial joins with points and polygons.
• HP Vertica Place will be available on all supported HP Vertica platforms after 7.1.

For more information, see HP Vertica Place: An Overview.


Note: The HP Vertica Place version must be the same as the version of the HP Vertica server.

HP Vertica Pulse
HP Vertica 7.1 includes support for HP Vertica Pulse. HP Vertica Pulse is an add-on package that
provides a suite of functions that allow you to analyze and extract the sentiment from English
language text directly from your HP Vertica database.
HP Vertica Pulse features include:
• Attribute-based sentiment scoring: Pulse scores the sentiment of attributes in a sentence.
  Attributes are generally nouns and are automatically discovered by Pulse. Pulse typically scores
  sentiment from a range of -1 (negative sentiment) to +1 (positive sentiment). A sentiment of 0 is
  considered neutral. Scoring individual attributes in a sentence instead of scoring the sentence as
  a whole provides a more granular analysis of the text. For example, consider the sentence "The
  quick brown fox jumped over the lazy dog." It would be difficult to score the sentiment of the
  sentence as a whole, but if you score on the attributes fox and dog, you could say the
  sentiment on the fox is positive (the fox is quick), and the sentiment on the dog is negative
  (the dog is lazy).
• Tuning to your domain: Pulse provides functionality to recognize attributes that are specific
  to your domain. For example, you can add the name of your product or company to a 'white_list'
  so that it is discovered by Pulse.
• Tuning of how sentiment is scored: Pulse includes user-dictionaries of words that are used
  to help score sentiment. You can alter these user-dictionaries to fine-tune the way your text is
  analyzed.
• Filtering of attributes you are not interested in: Pulse supports a special 'stop words'
  user-dictionary to indicate attributes that should not be analyzed. Alternatively, you can choose
  to score sentiment only on attributes defined in your white_list.
• Synonym mappings: Pulse provides customizable mappings so that you can map synonyms
  to a base word, and then normalize the analysis for the synonyms to the base word. For
  example, you can map Hewlett Packard to HP.

HP Vertica Pulse requires that Java and the HP Vertica Java Support Package are installed on all
nodes in the HP Vertica cluster.
Note: The HP Vertica Pulse version must be the same as the version of the HP Vertica server.
For more information, see About the HP Vertica Pulse Package.

Documentation Updates
HP Vertica 7.1.x introduces a number of changes to the standard documentation set.

Programmer's Guide Separated Into Individual Guides


The Programmer's Guide has been broken into several smaller guides, to better target separate
audiences. These new guides are:
• Analyzing Data explains how to write queries in HP Vertica SQL.
• Connecting to HP Vertica explains how to use the vsql client and the HP Vertica client libraries.
• Extending HP Vertica explains how to use SQL macros and the SDK to add analytic features to
  HP Vertica.

Documentation Set Available in a Single PDF File


The HP Vertica documentation set is now available in its entirety in a single PDF file.
Using the new PDF gives you all HP Vertica documents in one file while working offline. You can
search across all guides, and all cross-book hyperlinks are now live. (Separate PDF docs do not
support live links to other guides.)
The documentation set continues to be available online, in HTML. You can also access all guides as
individual PDF files.

New Location for HP Vertica on Amazon Web Services Guide


The HP Vertica on Amazon Web Services Guide has moved to a new location:
http://www.vertica.com/resources-for-technology-partner-integrations/
The new location allows more flexibility when updating this documentation. (HP releases
HP Vertica AMIs on a slightly different schedule than the software release schedule.)

Deprecated Functionality in This Release


In version 7.1, the following HP Vertica functionality has been deprecated.

• The Geospatial Package SQL functions.
• The ADD_LOCATION() function, which has been replaced by the CREATE LOCATION statement.
• The JavaClassPathForUDx configuration parameter. Instead of setting this parameter, you now
  set a class path on each UDx library. Set this per-library class path using the new
  DEPENDS keyword of the CREATE LIBRARY and ALTER LIBRARY statements.
• The verticaConfig vbr.py configuration option.
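The replacements for two of the deprecated items might look like the sketch below. All paths and the library name are placeholders, and the clause order is an assumption to check against the SQL Reference Manual.

```sql
-- Instead of ADD_LOCATION(): create the storage location with a statement.
-- The path is a placeholder.
CREATE LOCATION '/secondary/vertica_data' ALL NODES USAGE 'DATA';

-- Instead of JavaClassPathForUDx: declare supporting JARs on the library itself.
-- The library name and both paths are placeholders.
CREATE LIBRARY my_udx_lib AS '/home/dbadmin/my_udx.jar'
    DEPENDS '/home/dbadmin/libs/support.jar'
    LANGUAGE 'JAVA';
```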

See Also
For information about the meaning of obsolete and deprecated functionality, see Retired
Functionality.

Retired Functionality
This section describes the two phases HP follows to retire HP Vertica functionality.
Deprecated. HP Vertica declares a feature deprecated in a major or minor release. Deprecated
features remain in the product and are functional. However, users accessing the feature receive
messages informing them that the feature will be removed in the following major or minor release.
Features are identified as deprecated in HP Vertica release notes and in feature documentation; all
documentation remains accessible. HP announces feature deprecation only in major and minor
releases.
Removed. HP removes a feature in the major or minor release immediately following the
deprecation announcement. Users can no longer access the functionality. The feature removal is
announced in HP Vertica release notes. All feature documentation is removed.
The following functionality has been deprecated or removed in past and present versions:

Functionality

Component

Deprecated Version

Removed Version

Geospatial Package SQL Functions:
• BB_WITHIN
• BEARING
• CHORD_TO_ARC
• DWITHIN
• ECEF_CHORD
• ECEF_x
• ECEF_y
• ECEF_z
• ISLEFT
• KM2MILES
• LAT_WITHIN
• LL_WITHIN
• LLD_WITHIN
• LON_WITHIN
• MILES2KM
• RADIUS_LON
• RADIUS_M
• RADIUS_N
• RADIUS_R
• RADIUS_Ra
• RADIUS_Rc
• RADIUS_Rv
• RADIUS_SI
• RAYCROSSING
• WGS84_a
• WGS84_b
• WGS84_e2
• WGS84_f
• WGS84_if
• WGS84_r1

Server

7.1

7.2

verticaConfig vbr.py configuration option

Server

7.1

JavaClassPathForUDx configuration parameter

Server

7.1

ADD_LOCATION()

Server

7.1

bwlimit

Server

7.1

EXECUTION_ENGINE_PROFILES counters file

Server

7.0

MERGE_PARTITIONS()

Server

7.0

Administration Tools option check_spread

Server,

7.0

handles, memory allocated, and memory reserved

clients
krb5 client authentication method

All clients

7.0

Pload Library

Server

7.0

USESINGLETARGET

Server

7.0

scope parameter of CLEAR_PROFILING

Server

6.1

7.1

IMPLEMENT_TEMP_DESIGN()

Server,

6.1

clients
USER_TRANSFORMS user table

Server

6.0

UPDATE privileges on sequences

Server

6.0

Query Repository, which includes:

Server

6.0

Server

6.0

SYS_DBA.QUERY_REPO table
Functions:
l

CLEAR_QUERY_REPOSITORY()

SAVE_QUERY_REPOSITORY()

Configuration parameters:
l

CleanQueryRepoInterval

QueryRepoMemoryLimit

QueryRepoRetentionTime

QueryRepositoryEnabled

SaveQueryRepoInterval

QueryRepoSchemaName

QueryRepoTableName

See Notes section below table.


RESOURCE_ACQUISITIONS_HISTORY system
table
Volatility and NULL behavior parameters of CREATE

Server

6.1

Server

6.0

FUNCTION
Ganglia on Red Hat 4

copy_vertica_database.sh

Server

restore.sh

Server

backup.sh

Server

LCOPY (see Note section below table)

Server,
clients

4.1 (Client)
5.1 (Server)

5.1
(Client)

MergeOutPolicySizeList

Server

4.1

5.0

EnableStrataBasedMrgOutPolicy

Server

4.1

5.0

ReportParamSuccess

All clients

4.1

5.0

BatchAutoComplete

All clients

4.1

5.0

use35CopyParameters

ODBC,

4.1

5.0

JDBC
clients
getNumAcceptedRows
getNumRejectedRows

ODBC,

5.0

JDBC
clients

MANAGED load (server keyword and related client

Server,

5.0

parameter)

clients

EpochAdvancementMode

Server

4.1

5.0

VT_ tables

Server

4.1

5.0

RefreshHistoryDuration

Server

4.1

5.0

Notes
• While the HP Vertica Geospatial package has been deprecated, it has been replaced by
  HP Vertica Place. This analytics package is available at my.vertica.com/downloads.
• LCOPY: Supported by the 5.1 server to maintain backward compatibility with the 4.1 client
  drivers.
• Query Repository: You can still monitor query workloads with the following system tables:
  - QUERY_PROFILES
  - SESSION_PROFILES
  - EXECUTION_ENGINE_PROFILES
  In addition, HP Vertica Version 6.0 introduced new robust, stable workload-related system tables:
  - QUERY_REQUESTS
  - QUERY_EVENTS
  - RESOURCE_ACQUISITIONS
  The RESOURCE_ACQUISITIONS system table captures historical information.
• Use the Kerberos gss method for client authentication, instead of krb5. See Configuring
  Kerberos Authentication.
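As an example of monitoring workloads without the Query Repository, a query along these lines lists the most recent requests. The column names are assumptions based on the QUERY_REQUESTS system table and should be confirmed in the SQL Reference Manual.

```sql
-- Ten most recent requests, with their duration and outcome.
SELECT user_name,
       request_type,
       request_duration_ms,
       success,
       request
FROM v_monitor.query_requests
ORDER BY start_timestamp DESC
LIMIT 10;
```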

Concepts Guide

The HP Vertica Approach


HP Vertica is built on a foundation of four distinguishing characteristics (the four Cs):

Column storage: store data to optimize for access.

Compression: store more data in less physical storage.

Clustering: scale your database easily, whether you need three nodes or 300.

Continuous performance: optimize the database on an ongoing basis, automatically.

The following sections describe these key characteristics.

Column Storage
HP Vertica stores data the way it is typically queried for best performance. Column storage is ideal
for read-intensive workloads because it can dramatically reduce disk I/O compared to the more typical row-based storage.

See Column Store Architecture with FlexStore for more information.

Data Encoding and Compression


HP Vertica uses both encoding and compression to optimize query performance and save storage
space. However, in your HP Vertica database, they have different meanings.
Encoding is the process of converting data into a standard format. HP Vertica uses a number of
different encoding strategies, depending on column data type, table cardinality, and sort order.
Encoding increases performance because there is less disk I/O during query execution. In addition,
you can store more data in less space.

Compression is the process of transforming data into a compact format. HP Vertica uses several
different compression methods and automatically chooses the best one for the data being
compressed. Using compression, HP Vertica stores more data, provides more views, and uses
less hardware than other databases. Compression rates of 50-90% are possible. Using
compression lets you keep much more historical data in physical storage.

For more information, see Data Encoding and Compression.

Clustering
Clustering is used for scaling and redundancy. You can scale your database cluster easily by
adding more hardware, and you can improve reliability by distributing and replicating data across
your cluster.

Data (specifically, column data) is distributed across nodes in a cluster, so if one node becomes
unavailable the database continues to operate. When a node is added to the cluster, or comes back
online after being unavailable, it automatically queries other nodes to update its local data.
A projection is a set of columns with the same sort order, defined by a column to sort by or a
sequence of columns to sort by. A projection is similar to a sorted table, with possibly extra
columns brought in from joined tables, like a materialized view. Like an index or materialized view in
a traditional database, a projection works behind the scenes to speed up queries. You write queries
in terms of the original tables. It is these projections that are distributed among the nodes in your
cluster.
Projections are also replicated across nodes, ensuring that if one node becomes unavailable,
another copy of the data remains available. This is called K-Safety and is described in K-Safety.
Automatic data replication, failover, and recovery provide for active redundancy, which increases
performance. Nodes recover automatically by querying the system.

Continuous Performance
HP Vertica queries and loads data continuously (24x7) with virtually no database administration.

With concurrent loading and querying, you get real-time views and eliminate the need for nightly
load windows. On-the-fly schema changes allow you to add columns and projections without
shutting down your database; HP Vertica manages updates while keeping the database available.
In other words, you don't have to be a database expert to load a large database and start using it,
though we recommend that you become familiar with the key concepts and design principles.

HP Vertica Components
This section describes the unique components that make up HP Vertica.

Column Store Architecture with FlexStore


Traditionally, databases were designed for OLTP and used a row-store architecture. To process a
query, a row store reads all of the columns in all of the tables named in the query, regardless of how
wide the tables might be or how many columns are actually needed. Often, analytic queries access
only two or three columns from tables containing up to several hundred columns, resulting in a lot of
unnecessary data retrieval.
Unlike other RDBMS systems, HP Vertica reads the columns from database objects called
projections, which are described in the Physical Schema section of this guide. No resources are
wasted by reading large numbers of unused columns. Every byte of data is used by the execution
engine. For example, consider this simple two-table schema:

Suppose you want to run this query:


SELECT A, C, N
FROM Table1 JOIN Table2
ON H = J;

A row store must read 16 columns (A through H and J through Q) from physical storage for each
record in the result set. A column store with a query-specific projection reads only three columns: A,
C, and N.
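
The column counts in this example can be reproduced with a small sketch. It assumes, as the text implies, that Table1 holds columns A through H and Table2 holds columns J through Q; the function names are hypothetical and only model which columns each kind of store has to read.

```python
# Toy model of the two-table example: Table1 has columns A..H, Table2 has J..Q.
TABLE1 = list("ABCDEFGH")
TABLE2 = list("JKLMNOPQ")


def columns_read_row_store(tables):
    """A row store reads every column of every table named in the query."""
    return [col for table in tables for col in table]


def columns_read_column_store(projection_columns):
    """A column store reads only the columns held by the chosen projection."""
    return list(projection_columns)


row_cols = columns_read_row_store([TABLE1, TABLE2])
# Assumes a query-specific projection that holds exactly A, C, and N.
col_cols = columns_read_column_store(["A", "C", "N"])

print(len(row_cols), "columns read by the row store")     # 16
print(len(col_cols), "columns read by the column store")  # 3
```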

How FlexStore Enhances Your Column-Based Architecture
FlexStore is a combination of physical design, database storage, and query execution
techniques. HP Vertica applies FlexStore to the database to optimize it for the current analytic
workload. These techniques include:

Column grouping. Refers to a technique for storing column data together to optimize I/O
during query processing. Such groupings can be advantageous for correlated columns and for
columns that are always accessed together for projecting, but not for filtering or joining. Grouped
columns also benefit from special compression and retrieval techniques. An example might be
bid and ask prices in a TickStore database. Column grouping is described in the CREATE
PROJECTION statement's GROUPED clause.

Intelligent disk use. Lets you optimize performance by placing frequently needed disk resources
on faster media. This includes mixing solid-state and rotating "disk" storage in the database
nodes. You can prioritize disk use for:

    data versus temporary storage

    storage for columns in a projection

See Working With Storage Locations in the Administrator's Guide for details.

Fast deletes. Refers to projection design techniques to speed up delete processing, together
with the function EVALUATE_DELETE_PERFORMANCE() to help identify potential delete
problems. See Optimizing DELETEs and UPDATEs for Performance in the Administrator's
Guide for details.

Architecture of the HP Vertica Cluster


Terminology
In HP Vertica, the physical architecture is designed to distribute physical storage and to allow
parallel query execution over a potentially large collection of computing resources.
The most important terms to understand are host, instance, node, cluster, and database:
Host
A computer system with a 32-bit (non-production use only) or 64-bit Intel or AMD processor, RAM,
hard disk, and TCP/IP network interface (IP address and hostname). Hosts share neither disk
space nor main memory with each other.
Instance
An instance of HP Vertica consists of the running HP Vertica process and disk storage (catalog and
data) on a host. Only one instance of HP Vertica can be running on a host at any time.
Node
A host configured to run an instance of HP Vertica. It is a member of the database cluster. For a
database to be able to recover from the failure of a node, the cluster requires at least three nodes. HP
recommends that you use a minimum of four nodes.
Cluster
Refers to the collection of hosts (nodes) bound to a database. A cluster is not part of a database
definition and does not have a name.
Database
A cluster of nodes that, when active, can perform distributed data storage and SQL statement
execution through administrative, interactive, and programmatic user interfaces.
Note: Although you can define more than one database on a cluster, HP Vertica supports
running only one database per cluster at a time.

Data Encoding and Compression


HP Vertica uses both encoding and compression to optimize query performance and save storage
space. However, in your HP Vertica database, they have different meanings.

Encoding
Encoding is the process of converting data into a standard format. In HP Vertica, encoded data can
be processed directly, but compressed data cannot. HP Vertica uses a number of different
encoding strategies, depending on column data type, table cardinality, and sort order. Encoding
increases performance because there is less disk I/O during query execution. In addition, you can
store more data in less space. Encoding is not the same as compression.
HP Vertica operates on the encoded data representation whenever possible for best performance. It
also passes encoded values to other operations, saving memory bandwidth.
For optimal encoding in your physical schema, run Database Designer. Database Designer
analyzes the data in each column. Depending on your design optimization objective, Database
Designer recommends encoding types for each column in the proposed projections.
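
As a rough illustration of why encoded data can be processed directly, the sketch below uses run-length encoding on a sorted, low-cardinality column and answers a count without expanding the runs. The column contents and function names are hypothetical, and this is a simplified model rather than HP Vertica's storage format.

```python
from itertools import groupby


def rle_encode(sorted_values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(sorted_values)]


def count_equal(encoded, target):
    """Count matching rows by reading run lengths, without expanding the runs."""
    return sum(length for value, length in encoded if value == target)


# Hypothetical sorted, low-cardinality column (for example, a state code).
column = ["CA"] * 4 + ["MA"] * 2 + ["NY"] * 3
encoded = rle_encode(column)

print(encoded)                     # [('CA', 4), ('MA', 2), ('NY', 3)]
print(count_equal(encoded, "NY"))  # 3
```

Nine stored values shrink to three pairs, and the predicate is evaluated once per run instead of once per row.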

Compression
Compression is the process of transforming data into a compact format. Compressed data cannot
be directly processed; it must first be decompressed. HP Vertica uses integer packing for
unencoded integers and LZO for compressible data. Although compression is generally considered
to be a form of encoding, the terms have different meanings in HP Vertica.

The size of a database is often limited by the availability of storage resources. Typically, when a
database exceeds its size limitations, the administrator archives data that is older than a specific
historical threshold.
The extensive use of compression allows a column store to occupy substantially less storage than
a row store. In a column store, every value stored in a column of a projection has the same data
type. This greatly facilitates compression, particularly in sorted columns. In a row store, each value
of a row can have a different data type, resulting in a much less effective use of compression.
HP Vertica's efficient storage allows the database administrator to keep much more historical data
in physical storage. In other words, the archiving threshold can be set to a much earlier date than in
a less efficient store.
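
The general idea behind integer packing, mentioned earlier in this section, can be sketched as follows. This is a minimal illustration that stores every value of a column in the fewest whole bytes that fit the largest value; the on-disk format HP Vertica actually uses is not described here, and the function names are hypothetical.

```python
def pack_integers(values):
    """Store non-negative integers using the fewest whole bytes that fit them all."""
    width = max(1, (max(values).bit_length() + 7) // 8)
    return width, b"".join(v.to_bytes(width, "little") for v in values)


def unpack_integers(width, packed):
    """Recover the original integers; packed data must be unpacked before use."""
    return [
        int.from_bytes(packed[i:i + width], "little")
        for i in range(0, len(packed), width)
    ]


column = [3, 250, 17, 64, 199]  # hypothetical integer column
width, packed = pack_integers(column)

print(width, len(packed))                        # 1 5
print(unpack_integers(width, packed) == column)  # True
```

Because every value in a column has the same type, one width can serve the whole column, which is harder to arrange when values of different types are interleaved row by row.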

K-Safety
K-safety is a measure of fault tolerance in the database cluster. The value K represents the number
of replicas of the data in the database that exist in the database cluster. These replicas allow other
nodes to take over for failed nodes, allowing the database to continue running while still ensuring
data integrity. If more than K nodes in the database fail, some of the data in the database may
become unavailable. In that case, the database is considered unsafe and automatically shuts
down.
It is possible for an HP Vertica database to have more than K nodes fail and still continue running
safely, because the database continues to run as long as every data segment is available on at
least one functioning cluster node. Potentially, just under half the nodes in a database with a K-safety
level of 1 could fail without causing the database to shut down. As long as the data on each failed
node is available from another active node, the database continues to run.
Note: If half or more of the nodes in the database cluster fail, the database will automatically
shut down even if all of the data in the database is technically available from replicas. This
behavior prevents issues due to network partitioning.
In HP Vertica, the value of K can be zero (0), one (1), or two (2). If a database that has a K-safety of
one (K=1) loses a node, the database continues to run normally. Potentially, the database could
continue running if additional nodes fail, as long as at least one other node in the cluster has a copy
of the failed node's data. Increasing K-safety to 2 ensures that HP Vertica can run normally if any
two nodes fail. When the failed node or nodes return and successfully recover, they can participate
in database operations again.
The physical schema design must meet certain requirements. To create designs that are K-safe,
HP recommends using the Database Designer.

Buddy Projections
To provide K-safety, HP Vertica creates buddy projections, which are copies
of segmented projections that are distributed across database nodes. (See Projection
Segmentation.) HP Vertica ensures that segments that contain the same data are distributed to
different nodes. This ensures that if a node goes down, all the data is available on the remaining
nodes.

K-Safety Example

The diagram above shows a 5-node cluster that has a K-safety level of 1. Each of the nodes
contains buddy projections for the data stored in the next higher node (node 1 has buddy projections
for node 2, node 2 has buddy projections for node 3, and so on). Any of the nodes in the cluster
could fail, and the database would still be able to continue running (although with lower
performance, since one of the nodes has to handle its own workload and the workload of the failed
node).

If node 2 fails, node 1 handles requests on its behalf using its replica of node 2's data, in addition to
performing its own role in processing requests. The fault tolerance of the database will fall from 1 to
0, since the failure of a single additional node could cause the database to become unsafe. In this example, if either node 1
or node 3 fails, the database would become unsafe because not all of its data would be available. If
node 1 fails, then node 2's data will no longer be available. If node 3 fails, its data will no longer be
available, because node 2 is also down and could not fill in for it. In this case, nodes 1 and 3 are
considered critical nodes. In a database with a K-safety level of 1, the node that contains the
buddy projection of a failed node and the node whose buddy projections were on the failed node will
always become critical nodes.

With node 2 down, either node 4 or 5 in the cluster could fail and the database would still have all of
its data available. For example, if node 4 fails, node 3 is able to use its buddy projections to fill in for
it. In this situation, any further loss of nodes would result in a database shutdown, since all of the
nodes in the cluster are now critical nodes. (In addition, if one more node were to fail, half or more of
the nodes would be down, requiring HP Vertica to automatically shut down, no matter if all of the
data were available or not.)

In a database with a K-safety level of 2, any node in the cluster could fail after node 2 and the
database would be able to continue running. For example, if in the 5-node cluster each node
contained buddy projections for both its neighbors (for example, node 1 contained buddy projections
for both node 5 and node 2), then nodes 2 and 3 could fail and the database could continue running.
Node 1 could fill in for node 2, and node 4 could fill in for node 3. Because more than half of
the nodes in the cluster must be available for the database to continue running, the cluster
could not continue running if node 5 were to fail as well, even though nodes 1 and 4 both have buddy
projections for its data.
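
The five-node scenarios in this example can be checked with a small model. The sketch assumes the ring layout described in the example (with K=1 each node also holds the data of the next higher node; with K=2 it holds the data of both neighbors) and applies the two rules from the text: every segment needs a live copy, and fewer than half the nodes may be down. The function names are hypothetical and the model does not describe HP Vertica internals.

```python
def holders(segment, n_nodes, k):
    """Nodes that store a copy of `segment` in this toy ring layout.

    Assumed layout (taken from the example): with K=1 node i also holds the
    data of node i+1; with K=2 it holds the data of both neighbors.
    """
    offsets = {0: [0], 1: [0, -1], 2: [0, -1, 1]}[k]
    return {(segment - 1 + off) % n_nodes + 1 for off in offsets}


def database_up(failed, n_nodes, k):
    """True if fewer than half the nodes are down and all data is reachable."""
    if 2 * len(failed) >= n_nodes:  # half or more of the nodes are down
        return False
    return all(holders(s, n_nodes, k) - set(failed)
               for s in range(1, n_nodes + 1))


def critical_nodes(failed, n_nodes, k):
    """Live nodes whose loss would make some data unavailable."""
    live = set(range(1, n_nodes + 1)) - set(failed)
    return {
        node for node in live
        if any(holders(s, n_nodes, k) - set(failed) == {node}
               for s in range(1, n_nodes + 1))
    }


print(database_up({2}, 5, 1))                # True
print(sorted(critical_nodes({2}, 5, 1)))     # [1, 3]
print(database_up({1, 2}, 5, 1))             # False: node 2's data is lost
print(database_up({2, 4}, 5, 1))             # True
print(sorted(critical_nodes({2, 4}, 5, 1)))  # [1, 3, 5]
print(database_up({2, 3}, 5, 2))             # True
print(database_up({2, 3, 5}, 5, 2))          # False: half or more nodes down
```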

Monitoring K-Safety
Monitoring tables can be accessed programmatically to enable external actions, such as alerts.
You monitor the K-safety level by polling the CURRENT_FAULT_TOLERANCE column of the SYSTEM table
and checking its value. See SYSTEM in the SQL Reference Manual.
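
A polling check of this kind might look like the sketch below. It assumes a Python DB-API 2.0 cursor obtained from whichever client driver you use, and it assumes the SYSTEM table exposes DESIGNED_FAULT_TOLERANCE and CURRENT_FAULT_TOLERANCE columns; confirm the column names in the SQL Reference Manual. The FakeCursor class is a stand-in so the example runs without a database.

```python
def k_safety_status(cursor):
    """Return (designed, current) fault tolerance using a DB-API 2.0 cursor.

    Assumption: the SYSTEM table exposes DESIGNED_FAULT_TOLERANCE and
    CURRENT_FAULT_TOLERANCE columns; check the SQL Reference Manual.
    """
    cursor.execute(
        "SELECT designed_fault_tolerance, current_fault_tolerance FROM system"
    )
    designed, current = cursor.fetchone()
    return designed, current


def needs_alert(designed, current):
    """Flag the cluster when its current fault tolerance is below the design."""
    return current < designed


class FakeCursor:
    """Stand-in cursor so the sketch runs without a database connection."""

    def execute(self, sql):
        self.sql = sql

    def fetchone(self):
        return (1, 0)  # pretend one node is down in a K=1 design


designed, current = k_safety_status(FakeCursor())
print(needs_alert(designed, current))  # True
```

An external monitor could call k_safety_status on a schedule and raise an alert whenever needs_alert returns True.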

High Availability and Recovery


HP Vertica's unique approach to failure recovery is based on the distributed nature of a database.
An HP Vertica database is said to be K-safe if any node can fail at any given time without causing
the database to shut down. When the lost node comes back online and rejoins the database, it
recovers its lost objects by querying the other nodes. See Managing Nodes and Monitoring
Recovery in the Administrator's Guide.
In HP Vertica, the value of K can be 0, 1, or 2. If a database that has a K-safety of one (K=1) loses a
node, the database continues to run normally. Potentially, the database could continue running if
additional nodes fail, as long as at least one other node in the cluster has a copy of the failed node's
data. Increasing K-safety to 2 ensures that HP Vertica can run normally if any two nodes fail. When
the failed node or nodes return and successfully recover, they can participate in database
operations again.

High Availability With Projections


To ensure high availability and recovery for database clusters of three or more nodes, HP Vertica:

Replicates small, unsegmented projections

Creates buddy projections for large, segmented projections.

Replication (Unsegmented Projections)


When it creates projections, Database Designer does not segment projections for small tables;
rather it replicates them, creating and storing duplicates of these projections on all nodes within the
database.
Replication ensures:

Distributed query execution across multiple nodes.

High availability and recovery. In a K-safe database, replicated projections serve as buddy
projections. This means that a replicated projection on any node can be used for recovery.

Note: We recommend you use Database Designer to create your physical schema. If you
choose not to, be sure to segment all large tables across all database nodes, and replicate
small, unsegmented table projections on all database nodes.
The following illustration shows two projections, B and C, replicated across a three-node cluster.

Buddy Projections (Segmented Projections)


HP Vertica creates buddy projections, which are copies of segmented projections that are
distributed across database nodes. (See Projection Segmentation.) HP Vertica ensures that
segments that contain the same data are distributed to different nodes. This ensures that if a node
goes down, all the data is available on the remaining nodes. HP Vertica distributes segments to
different nodes by using offsets. For example, segments that comprise the first buddy projection
(A_BP1) would be offset from projection A by one node and segments from the second buddy
projection (A_BP2) would be offset from projection A by two nodes.
The following illustration shows the segmentation for a projection called A and its buddy
projections, A_BP1 and A_BP2, for a three-node cluster.

The following illustration shows how HP Vertica uses offsets to ensure that every node has a full
set of data for the projection.

This example illustrates how one projection and its buddies are segmented across nodes.
However, each node can store a collection of segments from various projections.
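
The offset scheme can be sketched for the three-node case. The direction of the shift and the function name are assumptions made for illustration; the point is only that shifting each buddy by a different number of nodes leaves every node with a full set of segments.

```python
def place_segments(n_nodes, offset):
    """Map each node to the segment it stores when segments shift by `offset`."""
    return {(seg + offset) % n_nodes + 1: seg + 1 for seg in range(n_nodes)}


n_nodes = 3
layout = {
    "A": place_segments(n_nodes, 0),      # base projection
    "A_BP1": place_segments(n_nodes, 1),  # first buddy, offset by one node
    "A_BP2": place_segments(n_nodes, 2),  # second buddy, offset by two nodes
}

for node in range(1, n_nodes + 1):
    segments = sorted(layout[name][node] for name in layout)
    print(f"node {node} stores segments {segments}")
```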

High Availability With Fault Groups


Fault groups let you configure HP Vertica for your physical cluster layout in order to reduce the risk
of correlated failures inherent in your environment. Correlated failures occur when two or more
nodes fail as a result of a single failure event. These failures often occur due to shared resources,
such as power, networking, or storage.
Although correlated failure scenarios cannot always be avoided, HP Vertica helps you minimize the
risk of failure by letting you define fault groups on your cluster. HP Vertica then uses your fault
group definitions to distribute data segments across the cluster, so the database stays up if a
single failure event occurs.
The following list describes some of the causes of correlated failures:

Servers in the same data center machine rack:

    Power loss to a machine rack could cause all nodes on those servers to fail

    User error during maintenance could affect an entire machine rack

Multiple virtual machines that reside on a single VM host server

Nodes that use other shared infrastructure that could cause correlated failures of a subset of
nodes

Note: If your entire cluster is connected through a single network switch, that switch is a
single point of failure. Fault groups cannot help with single-point failures.
HP Vertica supports complex, hierarchical fault groups of different shapes and sizes. Fault groups
are integrated with elastic cluster and large cluster arrangements to provide added cluster flexibility
and reliability.
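
One way to reason about a fault group layout is to check whether any single group holds every copy of some data segment. The sketch below uses a hypothetical six-node cluster in three racks, with hypothetical node and segment names. It checks data availability only and ignores the separate rule that the database shuts down when half or more of the nodes fail.

```python
def survives_any_group_failure(replica_nodes, fault_groups):
    """True if every segment keeps a copy outside each single fault group.

    replica_nodes: segment name -> set of nodes holding a copy of it.
    fault_groups:  group name   -> set of nodes that can fail together.
    """
    for group_nodes in fault_groups.values():
        for nodes in replica_nodes.values():
            if nodes <= group_nodes:  # every copy sits inside the failed group
                return False
    return True


# Hypothetical six-node cluster spread over three racks.
racks = {
    "rack1": {"n1", "n2"},
    "rack2": {"n3", "n4"},
    "rack3": {"n5", "n6"},
}

same_rack = {"seg1": {"n1", "n2"}, "seg2": {"n3", "n4"}, "seg3": {"n5", "n6"}}
cross_rack = {"seg1": {"n1", "n3"}, "seg2": {"n2", "n5"}, "seg3": {"n4", "n6"}}

print(survives_any_group_failure(same_rack, racks))   # False
print(survives_any_group_failure(cross_rack, racks))  # True
```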

Automatic fault groups


When you configure a cluster of 120 nodes or more, HP Vertica automatically creates fault groups
around control nodes. Control nodes are a subset of cluster nodes that manage spread (control
messaging). HP Vertica places nodes that share a control node in the same fault group. See Large
Cluster in the Administrator's Guide for details.

User-defined fault groups


If your cluster layout has the potential for correlated failures, or if you want to influence which
cluster hosts manage control messaging, you should define your own fault groups.

Example cluster topology


The following image provides an example of hierarchical fault groups configured on a single cluster:

Fault group FG-A contains nodes only.

Fault group FG-B (parent) contains child fault groups FG-C and FG-D. Each child fault group also
contains nodes.

Fault group FG-E (parent) contains child fault groups FG-F and FG-G. The parent fault group FG-E
also contains nodes.

How to create fault groups


Before you define fault groups, you must have a thorough knowledge of your physical cluster
layout. Fault groups require careful planning.
The simplest way to define fault groups is to create an input file of your cluster arrangement. You
pass the file to an HP Vertica-supplied script, which returns to the console the SQL statements you
need to run. See Fault Groups in the Administrator's Guide for details.

Hybrid Storage Model


To support Data Manipulation Language (DML) commands (INSERT, UPDATE, and DELETE) and
bulk load operations (COPY), intermixed with queries in a typical data warehouse workload, HP
Vertica implements the storage model shown in the illustration below. This model is the same on
each HP Vertica node.

Write Optimized Store (WOS) is a memory-resident data structure for storing data loaded (or
removed) using the INSERT, UPDATE, DELETE, and COPY statements, without /*+direct*/ hints.
Like the Read Optimized Store (ROS), the WOS is arranged by projection. To support very fast
data load speeds, the WOS stores records without data compression or indexing. A projection in
the WOS is sorted only when it is queried. It remains sorted as long as no further data is inserted
into it. The WOS organizes data by epoch and holds both committed and uncommitted transaction
data.
The Tuple Mover (TM) is the HP Vertica database optimizer component that moves data from
memory (WOS) to disk (ROS). The TM also combines small ROS containers into larger ones, and
purges deleted data. During moveout operations, the TM is also responsible for adhering to any
storage policies that are in effect for the storage location. The Tuple Mover runs in the background,
performing some tasks automatically (ATM) at time intervals determined by its configuration
parameters. For information about changing the TM configuration parameters, see Tuple Mover
Parameters in the Administrator's Guide.
The Read Optimized Store (ROS) is a highly optimized, read-oriented, disk storage structure,
organized by projection. The ROS makes heavy use of compression and indexing. You can use the
COPY statement's DIRECT parameter, or an INSERT statement with the /*+direct*/ hint, to load data
directly into the ROS.
Note: HP Vertica allows optional spaces before and after the plus sign in direct hints (between
the /* and the +).
A grouped ROS is a highly-optimized, read-oriented physical storage structure organized by
projection. A grouped ROS makes heavy use of compression and indexing. Unlike a ROS,
however, a grouped ROS stores data for two or more grouped columns in one disk file.
The COPY command is designed for bulk load operations and can load data into the WOS or the
ROS.
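
The flow of data between the WOS and the ROS can be modeled with a toy class. This is a sketch of the roles described above, not of the real implementation: the class and method names are hypothetical, rows are plain integers, and purging of deleted data, epochs, compression, and storage policies are left out.

```python
import heapq


class ProjectionStorage:
    """Toy model of one projection's storage on one node."""

    def __init__(self):
        self.wos = []  # in-memory rows, kept in arrival order
        self.ros = []  # sorted, immutable containers standing in for disk

    def insert(self, row, direct=False):
        """Load a row into the WOS, or straight into the ROS when direct=True."""
        if direct:
            self.ros.append((row,))
        else:
            self.wos.append(row)

    def moveout(self):
        """Tuple Mover step: sort the WOS contents into one new ROS container."""
        if self.wos:
            self.ros.append(tuple(sorted(self.wos)))
            self.wos = []

    def mergeout(self):
        """Tuple Mover step: combine small ROS containers into a larger one."""
        if len(self.ros) > 1:
            self.ros = [tuple(heapq.merge(*self.ros))]

    def scan(self):
        """A query sees rows from both stores; the WOS is sorted when queried."""
        return list(heapq.merge(*self.ros, sorted(self.wos)))


store = ProjectionStorage()
for value in (5, 1, 4):
    store.insert(value)
store.moveout()                 # ROS: [(1, 4, 5)]
for value in (3, 2):
    store.insert(value)
before_moveout = store.scan()   # reads the ROS and the WOS together
store.moveout()                 # ROS: [(1, 4, 5), (2, 3)]
store.insert(6, direct=True)    # bypasses the WOS
store.mergeout()

print(before_moveout)  # [1, 2, 3, 4, 5]
print(store.ros)       # [(1, 2, 3, 4, 5, 6)]
```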

Logical Schema
Designing a logical schema for an HP Vertica database is no different from designing for any other
SQL database. A logical schema consists of objects such as schemas, tables, views, and
referential integrity constraints that are visible to SQL users. HP Vertica supports any relational
schema design of your choice.
For more information, see Designing a Logical Schema in the Administrator's Guide.

Physical Schema
In traditional database architectures, data is primarily stored in tables. Additionally, secondary
tuning structures such as index and materialized view structures are created for improved query
performance. In contrast, tables do not occupy any physical storage at all in HP Vertica. Instead,
physical storage consists of collections of table columns called projections.
Projections store data in a format that optimizes query execution. They are similar to materialized
views in that they store result sets on disk rather than compute them each time they are used in a
query. The result sets are automatically refreshed whenever data values are inserted or loaded.
Using projections provides the following benefits:

Projections compress and encode data to greatly reduce the space required for storing data.
Additionally, HP Vertica operates on the encoded data representation whenever possible to
avoid the cost of decoding. This combination of compression and encoding optimizes disk
space while maximizing query performance. See Projection Performance.

Projections can be segmented or replicated across database nodes depending on their size. For
instance, projections for large tables can be segmented and distributed across all nodes.
Unsegmented projections for small tables can be replicated across all nodes in the database.
See Projection Performance.

Projections are transparent to end-users of SQL. The HP Vertica query optimizer automatically
picks the best projections to use for any query.

Projections also provide high availability and recovery. To ensure high availability and recovery,
HP Vertica duplicates table columns on at least K+1 nodes within the cluster. Thus, if one
machine fails in a K-Safe environment, the database continues to operate normally using
duplicate data on the remaining nodes. Once the node resumes its normal operation, it
automatically recovers its data and lost objects by querying other nodes. See High Availability
and Recovery for an overview of this feature and High Availability With Projections for an
explanation of how HP Vertica uses projections to ensure high availability and recovery.

How Projections Are Created


For each table in the database, HP Vertica requires a minimum of one projection, called a
superprojection. A superprojection is a projection for a single table that contains all the columns in
the table.
To get your database up and running quickly, HP Vertica automatically creates a superprojection
when you load or insert data into an existing table created using the CREATE TABLE or CREATE
TEMPORARY TABLE statement.
By creating a superprojection for each table in the database, HP Vertica ensures that all SQL
queries can be answered. Default superprojections do not exploit the full power of HP Vertica.
Therefore, HP recommends loading a sample of your data and then running the Database
Designer to create optimized projections. Database Designer creates new projections that optimize
your database based on its data statistics and the queries you use. The Database Designer:
1. Analyzes your logical schema, sample data, and sample queries (optional)
2. Creates a physical schema design (projections) in the form of a SQL script that can be
deployed automatically or manually
In most cases, the designs created by the Database Designer provide excellent query performance
within physical constraints. The Database Designer uses sophisticated strategies to provide
excellent ad-hoc query performance while using disk space efficiently. If you prefer, you can design
custom projections.
For more information about creating projections, see Creating a Database Design in the
Administrator's Guide.

Anatomy of a Projection
The CREATE PROJECTION statement defines the individual elements of a projection, as the
following graphic shows.

The previous example contains the following significant elements:

Column List and Encoding


Lists every column in the projection and defines the encoding for each column. Unlike traditional
database architectures, HP Vertica operates on encoded data representations. Therefore, HP
recommends that you use data encoding because it results in less disk I/O.

Base Query
Identifies all the columns to incorporate in the projection through column name and table name
references. The base query for large table projections can contain PK/FK joins to smaller tables.

Sort Order
The sort order optimizes for a specific query or commonalities in a class of queries based on the
query predicate. The best sort orders are determined by the WHERE clauses. For example, if a
projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of
the needed data is found together in the sort order, so the query runs almost instantaneously.
You can also optimize a query by matching the projection's sort order to the query's GROUP BY
clause. If you do not specify a sort order, HP Vertica uses the order in which columns are specified
in the column definition as the projection's sort order.
The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values
so that a disk read can pick up many results at once. For maximum performance, do not sort
projections on LONG VARBINARY and LONG VARCHAR columns.
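
The effect of a matching sort order can be illustrated with a binary search over data kept in (x, y) order. The rows and the lookup function are hypothetical; the sketch only shows that rows satisfying x=1 AND y=2 sit next to each other, so they can be located without scanning the rest of the data.

```python
from bisect import bisect_left, bisect_right

# Hypothetical projection data stored in (x, y) sort order; z is a payload column.
rows = sorted([
    (1, 1, "a"), (1, 2, "b"), (1, 2, "c"), (1, 3, "d"),
    (2, 1, "e"), (2, 2, "f"), (3, 2, "g"),
])
keys = [(x, y) for x, y, _ in rows]


def lookup(x, y):
    """Return the contiguous slice of rows matching x = ? AND y = ?."""
    start = bisect_left(keys, (x, y))
    end = bisect_right(keys, (x, y))
    return rows[start:end]


print(lookup(1, 2))  # [(1, 2, 'b'), (1, 2, 'c')]
```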

Segmentation
The segmentation clause determines whether a projection is segmented across nodes within the
database. Segmentation distributes contiguous pieces of projections, called segments, for large
and medium tables across database nodes. Segmentation maximizes database performance by
distributing the load. Use SEGMENTED BY HASH to segment large table projections.
For small tables, use the UNSEGMENTED keyword to direct HP Vertica to replicate these tables,
rather than segment them. Replication creates and stores identical copies of projections for small
tables across all nodes in the cluster. Replication ensures high availability and recovery.
For maximum performance, do not segment projections on LONG VARBINARY and LONG
VARCHAR columns.
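
The difference between segmenting and replicating can be sketched as follows. The node names and tables are hypothetical, and zlib.crc32 is only a stand-in hash chosen because it is deterministic; it is not the hash function HP Vertica uses.

```python
import zlib

NODES = ["node1", "node2", "node3"]  # hypothetical three-node cluster


def node_for(key, nodes=NODES):
    """Pick a node from a hash of the segmentation key.

    zlib.crc32 is only a stand-in; it is not the hash function HP Vertica uses.
    """
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]


def place(rows, key_index, segmented=True, nodes=NODES):
    """Return node -> rows, segmenting large tables and replicating small ones."""
    placement = {node: [] for node in nodes}
    for row in rows:
        targets = [node_for(row[key_index], nodes)] if segmented else nodes
        for node in targets:
            placement[node].append(row)
    return placement


big_table = [(order_id, order_id % 7) for order_id in range(9000)]
small_table = [("CA", "California"), ("MA", "Massachusetts")]

segmented = place(big_table, key_index=0)
replicated = place(small_table, key_index=0, segmented=False)

print({node: len(rows) for node, rows in segmented.items()})
print({node: len(rows) for node, rows in replicated.items()})
```

Each row of the large table lands on exactly one node, while every node receives a full copy of the small table.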

Projection Concepts
For each table in the database, HP Vertica requires a projection, called a superprojection. A
superprojection is a projection for a single table that contains all the columns in the table. By
creating a superprojection for each table in the database, HP Vertica ensures that all SQL queries
can be answered.
In addition to superprojections, you can optimize your queries by creating one or more projections
that contain only the subset of table columns required to process the query. These projections are
called query-specific projections.
Projections can contain joins between tables that are connected by PK/FK constraints. These
projections are called pre-join projections. Pre-join projections can have only inner joins between
tables on their primary and foreign key columns. Outer joins are not allowed. Pre-join projections
provide a significant performance advantage over joining tables at query run-time.

Projection Performance
HP Vertica provides the following methods for maximizing the performance of all projections:

Encoding and Compression


HP Vertica operates on encoded data representations. Therefore, HP encourages you to use data
encoding whenever possible because it results in less disk I/O and requires less disk space. For a
description of the available encoding types, see Encoding-Type in the SQL Reference Manual.
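A minimal sketch of requesting an encoding in a projection definition, assuming a hypothetical orders table whose status column has few distinct values. RLE (run-length encoding) is one of the encoding types; whether it suits a given column depends on the data, so treat the choice here as illustrative only.

```sql
-- Hypothetical projection: run-length encode a sorted,
-- low-cardinality column; the other columns use the default encoding.
CREATE PROJECTION orders_by_status (
    status ENCODING RLE,
    order_id,
    amount
) AS
  SELECT status, order_id, amount
  FROM orders
  ORDER BY status, order_id;
```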

Sort Order
The sort order optimizes for a specific query or commonalities in a class of queries based on the
query predicate. For example, if the WHERE clause of a query is (x=1 AND y=2) and a projection is
sorted on (x, y), the query runs almost instantaneously. Sorting a projection can also optimize a
GROUP BY query: match the projection's sort order to the query's GROUP BY clause.

Segmentation
Segmentation distributes contiguous pieces of projections, called segments, for large tables across
database nodes. This maximizes database performance by distributing the load. See Projection
Segmentation.
In many cases, the performance gain for superprojections provided through these methods is
sufficient that creating additional query-specific projections is unnecessary.

Projection Segmentation
Projection segmentation splits individual projections into chunks of data of similar size, called
segments. One segment is created for and stored on each node. Projection segmentation provides
high availability and recovery and optimizes query execution. Specifically, it:
• Ensures high availability and recovery through K-Safety.
• Spreads the query execution workload across multiple nodes.
• Allows each node to be optimized for different query workloads.


HP Vertica segments large tables to spread the query execution workload across multiple nodes.
HP Vertica does not segment small tables; instead, HP Vertica replicates small projections,
creating a duplicate of each unsegmented projection on each node.

Hash Segmentation
HP Vertica uses hash segmentation to segment large projections. Hash segmentation allows you
to segment a projection based on a built-in hash function that provides even distribution of data
across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed
consists of one or more column values, each having a large number of unique values and an
acceptable amount of skew in the value distribution. Primary key columns that meet the criteria
could be an excellent choice for hash segmentation.
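One way to check whether a candidate column meets these criteria is sketched below. The table and column names are hypothetical, and the second query assumes the V_MONITOR.PROJECTION_STORAGE system table exposes node_name, projection_name, row_count, and anchor_table_name columns; verify the column names in the SQL Reference Manual before relying on it.

```sql
-- A segmentation column should have many distinct values
-- relative to the total number of rows.
SELECT COUNT(DISTINCT sale_id) AS distinct_values,
       COUNT(*)                AS total_rows
FROM store_sales;                              -- hypothetical table

-- After loading, compare rows stored per node for each projection.
-- Roughly equal counts suggest the hash distributes the data evenly.
SELECT node_name, projection_name, SUM(row_count) AS row_count
FROM v_monitor.projection_storage
WHERE anchor_table_name = 'store_sales'
GROUP BY node_name, projection_name
ORDER BY projection_name, node_name;
```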

Projection Naming
HP Vertica uses a standard naming convention for projections. The first part of the projection name
is the name of the associated table, followed by characters that HP Vertica appends to the table
name; this string is called the projection's base name. All buddy projections have the same base
name so they can be identified as a group.
HP Vertica then appends a suffix that indicates the projection type. The projection type suffix,
described in the following table, can be:
• _super
• _<node_name>
• _b<offset>
Projection Type                       Suffix          Examples

Unsegmented or segmented (when        _super          Example:
only one auto projection was                          customer_dimension_vmart_super
created with the table)                               Unique name example:
                                                      customer_dimension_vmart_super_v1

Replicated (unsegmented) on all       _<node_name>    Example:
nodes                                                 customer_dimension_vmart_node01
                                                      customer_dimension_vmart_node02
                                                      customer_dimension_vmart_node03
                                                      Unique name example:
                                                      customer_dimension_vmart_v1_node01
                                                      customer_dimension_vmart_v1_node02
                                                      customer_dimension_vmart_v1_node03

Segmented (when multiple buddy        _b<offset>      Example:
projections were created with                         customer_dimension_vmart_b0
the table)                                            customer_dimension_vmart_b1
                                                      Unique name example:
                                                      customer_dimension_vmart_v1_b0
                                                      customer_dimension_vmart_v2_b1

If the projection-naming convention would result in a duplicate name, HP Vertica automatically
appends v1 or v2 to the projection name. HP Vertica uses this naming convention for projections
created by the CREATE TABLE statement or by the Database Designer.
Note: If the projection name exceeds the maximum length, HP Vertica truncates the projection
name.
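To see how these names appear for a particular table, a query along the following lines may help. It is a sketch that assumes the V_CATALOG.PROJECTIONS system table provides the projection_basename, projection_name, node_name, and is_super_projection columns; the table name is taken from the examples above.

```sql
-- List the projections anchored on one table. Buddy projections
-- share a base name, so sorting by base name groups them together.
SELECT projection_basename,
       projection_name,
       node_name,
       is_super_projection
FROM v_catalog.projections
WHERE anchor_table_name = 'customer_dimension'
ORDER BY projection_basename, projection_name;
```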

Database Setup
The process of setting up an HP Vertica database is described in detail in Configuring the
Database. It involves the following tasks:

Prepare SQL Scripts and Data Files
The first part of the setup procedure can be done well before HP Vertica is installed. It consists of
preparing the following files:
• Logical schema script
• Loadable data files
• Load scripts
• Sample query script (training set)

Create the Database
This part requires that HP Vertica be installed on at least one host. The following tasks are not in
sequential order.
• Use the Administration Tools to:
  ▪ Create a database
  ▪ Connect to the database
• Use the Database Designer to design the physical schema.
• Use the vsql interactive interface to run SQL scripts that:
  ▪ Create tables and constraints
  ▪ Create projections

Test the Empty Database
• Test for sufficient projections using the sample query script
• Test the projections for K-safety

Test the Partially-Loaded Database
• Load the dimension tables
• Partially load the fact table
• Check system resource usage
• Check query execution times
• Check projection usage


Complete the Fact Table Load
• Monitor system usage
• Complete the fact table load

Set up Security
For security-related tasks, see Implementing Security.
• [Optional] Set up SSL
• [Optional] Set up client authentication
• Set up database users and privileges

Set up Incremental Loads


Set up periodic ("trickle") loads.

Database Connections
You can connect to an HP Vertica database in the following ways:
• Interactively using the vsql client, as described in Using vsql in the Administrator's Guide.
  vsql is a character-based, interactive, front-end utility that lets you type SQL statements and
  see the results. It also provides a number of meta-commands and various shell-like features that
  facilitate writing scripts and automating a variety of tasks.
  You can run vsql on any node within a database. To start vsql, use the Administration Tools or
  the shell command described in Using vsql.
• Programmatically using the JDBC driver provided by HP Vertica, as described in Programming
  JDBC Client Applications in the Connecting to HP Vertica Guide.
  An abbreviation for Java Database Connectivity, JDBC is a call-level application programming
  interface (API) that provides connectivity between Java programs and data sources (SQL
  databases and other non-relational data sources, such as spreadsheets or flat files). JDBC is
  included in the Java 2 Standard and Enterprise editions.
• Programmatically using the ODBC driver provided by HP Vertica, as described in Programming
  ODBC Client Applications in the Connecting to HP Vertica Guide.
  An abbreviation for Open DataBase Connectivity, ODBC is a standard application programming
  interface (API) for access to database management systems.
• Programmatically using the ADO.NET driver provided by HP Vertica, as described in
  Programming ADO.NET Applications in the Connecting to HP Vertica Guide.
  The HP Vertica driver for ADO.NET allows applications written in C# and Visual Studio to read
  data from, update, and load data into HP Vertica databases. It provides a data adapter that
  facilitates reading data from a database into a data set, and then writing changed data from the
  data set back to the database. It also provides a data reader (VerticaDataReader) for reading
  data and autocommit functionality for committing transactions automatically.
• Programmatically using Perl and the DBI driver, as described in Programming Perl Client
  Applications in the Connecting to HP Vertica Guide.
  Perl is a free, stable, open source, cross-platform programming language licensed under its
  Artistic License, or the GNU General Public License (GPL).
• Programmatically using Python and the pyodbc driver, as described in Programming Python
  Client Applications in the Connecting to HP Vertica Guide.
  Python is a free, agile, object-oriented, cross-platform programming language designed to
  emphasize rapid development and code readability.

HP recommends that you deploy HP Vertica as the only active process on each machine in the
cluster and connect to it from applications on different machines. HP Vertica expects to use all
available resources on the machine, and to the extent that other applications are also using these
resources, suboptimal performance could result.

The Administration Tools


HP Vertica provides a set of tools that allows you to perform administrative tasks quickly and
easily. Most of the database administration tasks in HP Vertica can be done using the
Administration Tools.
Always run the Administration Tools using the Database Administrator account on the
Administration host, if possible. Make sure that no other Administration Tools processes are
running.
If the Administration host is unresponsive, run the Administration Tools on a different node in the
cluster. That node permanently takes over the role of Administration host.


A man page is available for admintools. If you are running as the dbadmin user, simply type: man
admintools. If you are running as a different user, type: man -M /opt/vertica/man admintools.

Running the Administration Tools


At the Linux command line:
$ /opt/vertica/bin/admintools [ -t | --tool ] toolname [ options ]

toolname    Is one of the tools described in the Administration Tools Reference.

options     -h | --help        Shows a brief help message and exits.
            -a | --help_all    Lists all command-line subcommands and options as
                               described in Writing Administration Tools Scripts.

If you omit the toolname and options parameters, the Main Menu dialog box appears inside your
console or terminal window with a dark blue background and a title on top. The screen captures
used in this documentation set are cropped down to the dialog box itself, as shown below.

If you are unfamiliar with this type of interface, read Using the Administration Tools Interface before
you do anything else.


First Time Only


The first time you log in as the Database Administrator and run the Administration Tools, the user
interface displays.
1. In the EULA (end-user license agreement) window, type accept to proceed.
A window displays, requesting the location of the license key file you downloaded from the HP
Web site. The default path is /tmp/vlicense.dat.
2. Type the absolute path to your license key (for example, /tmp/vlicense.dat) and click OK.

Between Dialogs
While the Administration Tools are working, you see the command line processing in a window
similar to the one shown below. Do not interrupt the processing.


Management Console
Management Console (MC) is a database management tool that provides a unified view of your HP
Vertica cluster. Using a browser as a single point of access, you can create, import, manage, and
monitor multiple databases on one or more clusters. You can also create and manage MC users.
You then can map these users to an HP Vertica database and manage them through the MC
interface.

What You Can Do with Management Console


Create...
• A database cluster on hosts that do not have HP Vertica installed
• Multiple HP Vertica databases on one or more clusters from a single point of control
• MC users and grant them access to MC and databases managed by MC
Configure...
• Database parameters and user settings dynamically
• Resource pools
Monitor...
• License usage and conformance
• Dynamic metrics about your database cluster
• Resource pools
• User information and activity on MC
• Alerts by accessing a single message box of alerts for all managed databases
• Recent databases and clusters through a quick link
• Multiple HP Vertica databases on one or more clusters from a single point of control
Import or Export...
• Export all database messages or log/query details to a file
• Import multiple HP Vertica databases on one or more clusters from a single point of control
Troubleshoot...
• MC-related issues through a browser

Management Console provides some, but not all, of the functionality that Administration Tools
provides. Management Console also includes extended functionality not available in admintools.
This additional functionality includes a graphical view of your HP Vertica database and detailed
monitoring charts and graphs. See Administration Tools and Management Console in the
Administrator's Guide for more information.

Getting MC
Download the HP Vertica server RPM and the MC package from myVertica Portal. You then have
two options:
• Install HP Vertica and MC at the command line, and import one or more HP Vertica database
  clusters into the MC interface
• Install HP Vertica directly through MC

See the Installation Guide for details.

What You Need to Know


If you plan to use MC, review the following topics in the Administrator's Guide:
If you want to ...                                    See ...
Create a new, empty HP Vertica database               Create a Database on a Cluster
Import an existing HP Vertica database cluster        Managing Database Clusters on MC
into MC
Understand how MC users differ from database users    About MC Users
Read about the MC privilege model                     About MC Privileges and Roles
Create new MC users                                   Creating an MC User
Grant MC users privileges on one or more HP           Granting Database Access to MC Users
Vertica databases managed by MC
Use HP Vertica functionality through the MC           Using Management Console
interface
Monitor MC and HP Vertica databases managed by MC     Monitoring HP Vertica Using Management Console
Monitor and configure Resource Pools                  Monitoring and Configuring Resource Pools in
                                                      Management Console

Management Console Architecture


MC accepts HTTP requests from a client web browser, gathers information from the HP Vertica
database cluster, and returns that information to the browser for monitoring.

MC Components
The primary components that drive Management Console are an application/web server and agents
that get installed on each node in the HP Vertica cluster.
The following diagram is a logical representation of MC, the MC user's interface, and the database
cluster nodes.


Application/web Server
The application server hosts MC's web application and uses port 5450 for node-to-MC
communication. It performs the following jobs:
• Manage one or more HP Vertica database clusters
• Send rapid updates from MC to the web browser
• Store and report MC metadata, such as alerts and events, current node state, and MC users, on
  a lightweight, embedded (Derby) database
• Retain workload history

MC Agents
MC agents are internal daemon processes that run on each HP Vertica cluster node. The default
agent port, 5444, must be available for MC-to-node and node-to-node communications. Agents
monitor MC-managed HP Vertica database clusters and communicate with MC to provide the
following functionality:
• Provide local access, command, and control over database instances on a given node, using
  functionality similar to Administration Tools
• Report log-level data from the Administration Tools and Vertica log files
• Cache details from long-running jobs (such as create/start/stop database operations) that you
  can view through your browser
• Track changes to data-collection and monitoring utilities and communicate updates to MC
• Communicate between all cluster nodes and MC through a webhook subscription, which
  automates information sharing and reports on cluster-specific issues like node state, alerts,
  events, and so on

See Also
• Monitoring HP Vertica Using MC


Management Console Security


Through a single point of control, the Management Console (MC) platform is designed to manage
multiple HP Vertica clusters, all of which might have differing levels and types of security, such as
user names and passwords and LDAP authentication. You can also manage MC users who have
varying levels of access across these components.

OAuth and SSL


Management Console (MC) uses a combination of OAuth (Open Authorization), Secure Socket
Layer (SSL), and locally-encrypted passwords to secure HTTPS requests between a user's
browser and MC, and between MC and the agents. Authentication occurs through MC and between
agents within the cluster. Agents also authenticate and authorize jobs.
The MC configuration process sets up SSL automatically, but you must have the openssl package
installed on your Linux environment first.

See the following topics in the Administrator's Guide for more information:
• SSL Prerequisites
• Implementing SSL
• Generating Certificates and Keys for MC
• Importing a New Certificate to MC

User Authentication and Access


MC provides two authentication schemes for users: LDAP or MC. You can use only one method at
a time. For example, if you choose LDAP, all MC users are authenticated against your
organization's LDAP server.
You set up LDAP authentication through MC Settings > Authentication on the MC interface.
Note: MC uses LDAP data for authentication purposes only; it does not modify user
information in the LDAP repository.


The MC authentication method stores MC user information internally and encrypts passwords.
These MC users are not system (Linux) users; they are accounts that have access to MC and,
optionally, to one or more MC-managed HP Vertica databases through the MC interface.
Management Console also has rules for what users can see when they sign in to MC from a client
browser. These rules are governed by access levels, each of which is made up of a set of roles.

See Also
• About MC Users
• About MC Privileges and Roles
• Creating an MC User

Management Console Home Page


The MC Home page is the entry point to all MC-managed HP Vertica database clusters and MC
users. Information on this page, as well as throughout the MC interface, will appear or be hidden,
based on the signed-on user's permissions (access levels). Layout and navigation are described in
Using Management Console.

Database Designer
HP Vertica's Database Designer is a tool that:
• Analyzes your logical schema, sample data, and, optionally, your sample queries.
• Creates a physical schema design (a set of projections) that can be deployed automatically (or
  manually).
• Can be used by anyone without specialized database knowledge (even business users can run
  Database Designer).
• Can be run and re-run any time for additional optimization without stopping the database.

There are three ways to run Database Designer:
• Using the Management Console, as described in Using Management Console to Create a
  Design.
• Programmatically, using the steps described in About Running HP Vertica Programmatically.
• Using the Administration Tools by selecting Configuration Menu > Run Database Designer.
  For details, see Using the Administration Tools to Create a Design.

There are two types of designs you can create with Database Designer:
• A comprehensive design, which allows you to create new projections for all tables in your
  database.
• An incremental design, which creates projections for all tables referenced in the queries you
  supply.

Some of the benefits that Database Designer provides:
• Accepts up to 100 queries in the query input file for an incremental design.
• Accepts unlimited queries for a comprehensive design.
• Produces higher quality designs by considering UPDATE and DELETE statements.

In most cases, the designs created by Database Designer provide optimal query performance
within physical constraints. Database Designer uses sophisticated strategies to provide optimal
query performance and data compression.


See Also
• Physical Schema
• Creating a Database Design

Database Security
HP Vertica secures access to the database and its resources by enabling you to control who has
access to the database and what they are authorized to do with database resources once they have
gained access. See Implementing Security.

Data Loading and Modification


The SQL Data Manipulation Language (DML) commands INSERT, UPDATE, and DELETE
perform the same functions in HP Vertica as they do in row-oriented databases. These commands
follow the SQL-92 transaction model and can be intermixed.
In HP Vertica, the COPY statement is designed for bulk loading data into the database. COPY
reads data from text files or data pipes and inserts it into WOS (memory) or directly into the ROS
(disk). COPY can load compressed formats such as GZIP and LZO. COPY automatically commits
itself and any current transaction but is not atomic; some rows could be rejected. Note that COPY
does not automatically commit when copying data into temporary tables.
You can use the COPY statement's NO COMMIT option to prevent COPY from committing a
transaction when it finishes copying data. You often want to use this option when sequentially
running several COPY statements to ensure the data in the bulk load is either committed or rolled
back at the same time. Also, combining multiple smaller data loads into a single transaction allows
HP Vertica to load the data more efficiently. See the COPY statement in the SQL Reference
Manual for more information.
You can use multiple, simultaneous database connections to load and/or modify data.
For more information about bulk loading, see Bulk Loading Data.
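The NO COMMIT pattern described above can be sketched as follows. The table name, file paths, and delimiter are hypothetical; only the COPY statement and its NO COMMIT option come from the text.

```sql
-- Load two files as one unit of work. NO COMMIT keeps each COPY
-- from committing the transaction when it finishes.
COPY store_sales FROM '/data/sales_part1.dat' DELIMITER '|' NO COMMIT;
COPY store_sales FROM '/data/sales_part2.dat' DELIMITER '|' NO COMMIT;

-- Commit both loads together.
COMMIT;
-- To discard both loads instead, issue ROLLBACK in place of COMMIT.
```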

Workload Management
HP Vertica provides a sophisticated resource management scheme that allows diverse, concurrent
workloads to run efficiently on the database. For basic operations, the built-in GENERAL pool is
pre-configured based on RAM and machine cores, but you can customize this pool to handle
specific concurrency requirements.


You can also define new resource pools that you configure to limit memory usage, concurrency,
and query priority. You can then optionally restrict each database user to use a specific resource
pool, which controls the memory resources used by their requests.
User-defined pools are useful if you have competing resource requirements across different
classes of workloads. Example scenarios include:
• A large batch job takes up all server resources, leaving small jobs that update a web page to
  starve, which can degrade user experience.
  In this scenario, you can create a resource pool to handle web page requests and ensure users
  get resources they need. Another option is to create a limited resource pool for the batch job, so
  the job cannot use up all system resources.
• A certain application has lower priority than other applications, and you would like to limit the
  amount of memory and number of concurrent users for the low-priority application.
  In this scenario, you could create a resource pool with an upper limit on the query's memory and
  associate the pool with users of the low-priority application.

For more information, best practices, and additional scenarios, see Managing Workload Resources
in the Administrator's Guide.
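The low-priority scenario above might be set up along these lines. This is a sketch under assumptions: the pool name, the user name, and the limit values are invented for illustration, and suitable limits depend on your hardware and workload.

```sql
-- Pool for the low-priority application: cap memory and concurrency,
-- and rank its queries below the default priority.
CREATE RESOURCE POOL low_priority_pool
  MAXMEMORYSIZE '2G'
  MAXCONCURRENCY 5
  PRIORITY -10;

-- Allow a (hypothetical) user to use the pool, then make it
-- the pool that the user's requests run in.
GRANT USAGE ON RESOURCE POOL low_priority_pool TO report_user;
ALTER USER report_user RESOURCE POOL low_priority_pool;
```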

HP Vertica Analytics Platform (7.1.x)

Page 110 of 5055

HP Vertica Documentation

SQL Overview
An abbreviation for Structured Query Language, SQL is a widely-used, industry standard data
definition and data manipulation language for relational databases.
Note: In HP Vertica, use a semicolon to end a statement or to combine multiple statements on
one line.

HP Vertica Support for ANSI SQL Standards


HP Vertica SQL supports a subset of ANSI SQL-99.
See BNF Grammar for SQL-99

Support for Historical Queries


Unlike most databases, the DELETE command in HP Vertica does not delete data; it marks
records as deleted. The UPDATE command performs an INSERT and a DELETE. This behavior is
necessary for historical queries. See Historical (Snapshot) Queries in the Analyzing Data Guide.

Joins
HP Vertica supports typical data warehousing query joins. For details, see Joins in the Analyzing
Data Guide.
HP Vertica also provides the INTERPOLATE predicate, which allows for a special type of join. The
event series join is an HP Vertica SQL extension that lets you analyze two event series when their
measurement intervals don't align precisely, such as when timestamps don't match. These joins
provide a natural and efficient way to query misaligned event data directly, rather than having to
normalize the series to the same measurement interval. See Event Series Joins in the Analyzing
Data Guide for details.
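A minimal sketch of an event series join, assuming two hypothetical tables, bid and ask, that each record a timestamp (ts), a symbol, and a price at times that do not line up. Where one series has no row at a given timestamp, the INTERPOLATE predicate fills the gap with that series' previous value.

```sql
-- Join two event series whose timestamps do not align.
SELECT b.ts     AS bid_ts,
       a.ts     AS ask_ts,
       b.symbol,
       b.price  AS bid_price,
       a.price  AS ask_price
FROM bid b
FULL OUTER JOIN ask a
  ON b.symbol = a.symbol
 AND b.ts INTERPOLATE PREVIOUS VALUE a.ts;
```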

Transactions
Session-scoped isolation levels determine transaction characteristics for transactions within a
specific user session. You set them through the SET SESSION CHARACTERISTICS command.
Specifically, they determine what data a transaction can access when other transactions are
running concurrently. See Transactions in the Concepts Guide.


About Query Execution


When you submit a query, the initiator chooses the projections to use, optimizes and plans the
query execution, and logs the SQL statement to its log. Planning and optimization are quick,
requiring at most a few milliseconds.
Based on the tables and projections chosen, the query plan that the optimizer produces is
decomposed into mini-plans. These mini-plans are distributed to the other nodes, known as
executors, to handle, for example, other segments of a segmented fact table. (The initiator node
typically does executor work as well.) The nodes process the mini-plans in parallel, interspersed
with data movement operations.
The query execution proceeds in data-flow style, with intermediate result sets (rows) flowing
through network connections between the nodes as needed. Some, but not all, of the tasks
associated with a query are recorded in the executors' log files.
In the final stages of executing a query plan, some wrap-up work is done at the initiator, such as:
• Combining results in a grouping operation
• Merging multiple sorted partial result sets from all the executors
• Formatting the results to return to the client

The initiator has a little more work to do than the other nodes, but if the projections are well designed
for the workload, the nodes of the cluster share most of the work of executing expensive queries.
Some small queries, for example, queries on replicated dimension tables, can be executed locally.
In these types of queries, the query planning avoids unnecessary network communication.
For detailed information about writing and executing queries, see Writing Queries in the Analyzing
Data Guide.
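To inspect the plan that the optimizer produces for a query, you can prefix the query with EXPLAIN. The table and columns below are hypothetical; the output format is not reproduced here.

```sql
-- Display the query plan instead of running the query.
-- The plan shows, among other things, which projections were chosen.
EXPLAIN
SELECT store_id, SUM(amount) AS total_amount
FROM store_sales
GROUP BY store_id;
```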

Snapshot Isolation Mode


HP Vertica can run any SQL query in snapshot isolation mode in order to obtain the fastest possible
execution. To be precise, snapshot isolation mode is actually a form of a historical query. The
syntax is:
AT EPOCH LATEST SELECT...

The command queries all data in the database up to but not including the current epoch without
holding a lock or blocking write operations, which could cause the query to miss rows loaded by
other users up to (but no more than) a specific number of minutes before execution.


Historical Queries
HP Vertica can run a query from a snapshot of the database taken at a specific date and time or at a
specific epoch. The syntax is:
AT TIME 'timestamp' SELECT...
AT EPOCH epoch_number SELECT...
AT EPOCH LATEST SELECT...

The command queries all data in the database up to and including the specified epoch or the epoch
representing the specified date and time, without holding a lock or blocking write operations. The
specified TIMESTAMP and epoch_number values must be greater than or equal to the Ancient
History Mark epoch.
Historical queries, also known as snapshot queries, are useful because they access data in past
epochs only. Historical queries do not need to hold table locks or block write operations because
they do not return the absolute latest data. Their content is private to the transaction and valid only
for the length of the transaction.
Historical queries behave in the same manner regardless of transaction isolation level. Historical
queries observe only committed data, even excluding updates made by the current transaction,
unless those updates are to a temporary table.
Be aware that there is only one snapshot of the logical schema. This means that any changes you
make to the schema are reflected across all epochs. If, for example, you add a new column to a
table and you specify a default value for the column, all historical epochs display the new column
and its default value.
The DELETE command in HP Vertica does not actually delete data; it marks records as deleted.
(The UPDATE command is actually a combined INSERT and a DELETE.) Thus, you can control
how much deleted data is stored on disk. For more information, see Managing Disk Space in the
Administrator's Guide.
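The three forms of historical query can be sketched as follows. The table name, timestamp, and epoch number are placeholders; GET_AHM_EPOCH is used here on the assumption that it returns the Ancient History Mark epoch, which is the lower bound for the values you can specify.

```sql
-- Find the oldest epoch that a historical query can reach.
SELECT GET_AHM_EPOCH();

-- Query committed data as of a date and time.
AT TIME '2016-07-01 09:00:00' SELECT COUNT(*) FROM store_sales;

-- Query committed data as of a specific epoch (42 is a placeholder).
AT EPOCH 42 SELECT COUNT(*) FROM store_sales;

-- Query the latest committed snapshot.
AT EPOCH LATEST SELECT COUNT(*) FROM store_sales;
```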


Transactions
When transactions in multiple user sessions concurrently access the same data, session-scoped
isolation levels determine what data each transaction can access.
A transaction retains its isolation level until it completes, even if the session's transaction isolation
level changes mid-transaction. HP Vertica internal processes (such as the Tuple Mover and
refresh operations) and DDL operations are always run at SERIALIZABLE isolation level to ensure
consistency.
The HP Vertica query parser supports standard ANSI SQL-92 isolation levels as follows:
• READ UNCOMMITTED: Automatically interpreted as READ COMMITTED
• READ COMMITTED (default)
• REPEATABLE READ: Automatically interpreted as SERIALIZABLE
• SERIALIZABLE

Transaction isolation levels READ COMMITTED and SERIALIZABLE differ as follows:


Isolation level    Dirty read      Non-repeatable read    Phantom read
READ COMMITTED     Not Possible    Possible               Possible
SERIALIZABLE       Not Possible    Not Possible           Not Possible

You can set separate isolation levels for the database and individual transactions.
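As a sketch, the isolation level for the current session can be changed and then checked as shown below. The statements are given under the assumption that the full form of the SET SESSION CHARACTERISTICS command is as written; see the SQL Reference Manual for the authoritative syntax.

```sql
-- Use SERIALIZABLE for transactions started in this session.
-- A transaction already in progress keeps its original level.
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Display the session's current isolation level.
SHOW TRANSACTION_ISOLATION;
```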

Implementation Details
HP Vertica supports conventional SQL transactions with standard ACID properties:

ANSI SQL-92 style implicit transactions. You do not need to run a BEGIN or START
TRANSACTION command.

No redo/undo log or two-phase commits.

The COPY command automatically commits itself and any current transaction (except when
loading temporary tables). It is generally good practice to commit or roll back the current
transaction before you use COPY. This step is optional for DDL statements, which are autocommitted.


Rollback
Transaction rollbacks restore a database to an earlier state by discarding changes made by that
transaction. Statement-level rollbacks discard only the changes initiated by the reverted
statements. Transaction-level rollbacks discard all changes made by the transaction.
With a ROLLBACK statement, you can explicitly roll back to a named savepoint within the
transaction, or discard the entire transaction. HP Vertica can also initiate automatic rollbacks in two
cases:

An individual statement returns an ERROR message. In this case, HP Vertica rolls back the
statement.

DDL errors, systemic failures, deadlocks, and resource constraints return a ROLLBACK
message. In this case, HP Vertica rolls back the entire transaction.

Explicit and automatic rollbacks always release any locks that the transaction holds.

Savepoints
A savepoint is a special marker inside a transaction that allows commands that execute after the
savepoint to be rolled back. The transaction is restored to the state that preceded the savepoint.
HP Vertica supports two types of savepoints:

An implicit savepoint is automatically established after each successful command within a
transaction. This savepoint is used to roll back the next statement if it returns an error. A
transaction maintains one implicit savepoint, which it rolls forward with each successful
command. Implicit savepoints are available to HP Vertica only and cannot be referenced
directly.

Named savepoints are labeled markers within a transaction that you set through SAVEPOINT
statements. A named savepoint can later be referenced in the same transaction through
RELEASE SAVEPOINT, which destroys it, and ROLLBACK TO SAVEPOINT, which rolls
back all operations that followed the savepoint. Named savepoints can be especially useful in
nested transactions: a nested transaction that begins with a savepoint can be rolled back
entirely, if necessary.
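
The interplay of implicit and named savepoints can be sketched with a toy model. The Python class below is an illustration under simplifying assumptions, not HP Vertica's implementation: ToyTransaction and its methods are hypothetical, and each "change" is just a string in a list.

```python
class ToyTransaction:
    """Toy model of implicit and named savepoints (illustration only)."""

    def __init__(self):
        self.changes = []      # ordered changes made by the transaction
        self.savepoints = {}   # savepoint name -> number of changes at that point

    def execute(self, statement_changes, fail=False):
        """Apply one statement; if it fails, undo only that statement."""
        implicit = len(self.changes)        # implicit savepoint before the statement
        self.changes.extend(statement_changes)
        if fail:
            del self.changes[implicit:]     # statement-level rollback
            return False
        return True

    def savepoint(self, name):
        """Like SAVEPOINT name: remember the current position."""
        self.savepoints.pop(name, None)     # re-using a name moves it to the end
        self.savepoints[name] = len(self.changes)

    def rollback_to(self, name):
        """Like ROLLBACK TO SAVEPOINT name: discard everything that followed it."""
        position = self.savepoints[name]
        del self.changes[position:]
        names = list(self.savepoints)
        for later in names[names.index(name) + 1:]:
            del self.savepoints[later]      # savepoints set afterwards are gone too

    def release(self, name):
        """Like RELEASE SAVEPOINT name: destroy the marker, keep the changes."""
        del self.savepoints[name]


txn = ToyTransaction()
txn.execute(["insert row 1"])
txn.savepoint("before_batch")
txn.execute(["insert row 2", "insert row 3"])
txn.execute(["insert row 4"], fail=True)   # only this statement is undone
txn.rollback_to("before_batch")
print(txn.changes)
```

In this model the failed statement is undone on its own, and the later rollback to before_batch discards rows 2 and 3 while keeping row 1.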

READ COMMITTED Isolation


When the isolation level READ COMMITTED is in effect, a SELECT query obtains a snapshot of
committed data at the transaction's start. Subsequent queries during the current transaction also
see the results of uncommitted updates that already executed in the same transaction.
DML statements acquire write locks to prevent other READ COMMITTED transactions from modifying
the same data. SELECT statements do not acquire locks, so concurrent transactions can obtain read
and write access to the same selection.
READ COMMITTED is the default isolation level. For most queries, this isolation level balances
database consistency and concurrency. However, this isolation level can allow one transaction to
change the data that another transaction is in the process of accessing. This can yield
nonrepeatable and phantom reads. Applications with complex queries and updates that require a
more consistent view of the database should use SERIALIZABLE isolation.
The following graphic shows how READ COMMITTED isolation might control how concurrent
transactions read and write the same data:

READ COMMITTED isolation maintains exclusive write locks until a transaction ends, as shown in the
following graphic:


See Also

LOCKS

SET SESSION CHARACTERISTICS

Configuration Parameters

SERIALIZABLE Isolation
SERIALIZABLE is the strictest SQL transaction isolation level. While this isolation level permits
transactions to run concurrently, it creates the effect that transactions are running in serial order.
Transactions acquire locks for read and write operations, which ensures that successive SELECT
commands within a single transaction always produce the same results. Because SERIALIZABLE
isolation provides a consistent view of data, it is useful for applications that require complex queries
and updates. However, it reduces concurrency. For example, it blocks queries during a bulk load.
SERIALIZABLE isolation establishes the following locks:


Table-level read locks are acquired on selected tables and released at the end of the transaction.
This prevents one transaction from modifying rows while they are being read by another
transaction.

Table-level write locks are acquired on update and are released at the end of the transaction.
This behavior prevents one transaction from reading another transaction's changes to rows
before those changes are committed.

At the start of a transaction, a SELECT statement obtains a snapshot of the selection's committed
data. It also sees the results of updates that are run within its transaction before they are
committed.
The following example shows how concurrent transactions that both have SERIALIZABLE isolation
levels handle locking:

Applications that use SERIALIZABLE must be prepared to retry transactions due to serialization
failures, which are often due to deadlocks. When a deadlock occurs, any transaction awaiting a
lock automatically times out after five minutes. The following graphic shows how deadlock might
occur, and how HP Vertica handles it:


Note: SERIALIZABLE isolation does not apply to temporary tables. No locks are required for
these tables, as they are isolated by their transaction scope.
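
Because SERIALIZABLE transactions can fail on deadlocks or lock timeouts, client code usually wraps them in a retry loop. The following Python sketch shows one possible shape for such a loop. SerializationFailure is a hypothetical stand-in for whatever error your client driver raises, and run_with_retry and flaky_transaction are illustrative names, not part of any HP Vertica API.

```python
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error caused by deadlock or lock timeout."""


def run_with_retry(transaction, max_attempts=3, delay_seconds=0.0):
    """Call transaction() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise                              # give up and surface the failure
            time.sleep(delay_seconds * attempt)    # simple linear backoff


# Simulated transaction that fails twice before it succeeds.
calls = []


def flaky_transaction():
    calls.append(1)
    if len(calls) < 3:
        raise SerializationFailure("simulated lock timeout")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, "after", len(calls), "attempts")
```

The callable passed to run_with_retry should contain the whole transaction, because a rollback triggered by a deadlock discards all of the transaction's changes, so every statement has to be issued again.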


International Languages and Character Sets


This section describes how HP Vertica handles internationalization and character sets.

Unicode Character Encoding


UTF-8 is an abbreviation for Unicode Transformation Format-8 (where 8 equals 8-bit) and is a
variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. UTF-8
can represent any character in the Unicode standard, and its encoding of the ASCII character
range is identical to ASCII. Software that handles ASCII but preserves other byte values
therefore requires little or no change to work with UTF-8.
All input data received by the database server is expected to be in UTF-8, and all data output by HP
Vertica is in UTF-8. The ODBC API operates on data in UCS-2 on Windows systems, and normally
UTF-8 on Linux systems. JDBC and ADO.NET APIs operate on data in UTF-16. The client drivers
automatically convert data to and from UTF-8 when sending to and receiving data from HP Vertica
using API calls. The drivers do not transform data loaded by executing a COPY or COPY LOCAL
statement.
See Implement Locales for International Data Sets in the Administrator's Guide for details.
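
The variable-length nature of UTF-8 and its compatibility with ASCII are easy to observe from any client language. The following Python sketch uses only the standard library; the sample characters are arbitrary choices made for illustration.

```python
# Sample characters written as escapes so that the source stays ASCII:
# "A", e with acute accent, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes that each character occupies when encoded as UTF-8.
widths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, width in widths.items():
    print(f"U+{ord(ch):04X} uses {width} byte(s): {ch.encode('utf-8').hex()}")

# ASCII text produces identical bytes under ASCII and UTF-8.
ascii_compatible = "vertica".encode("ascii") == "vertica".encode("utf-8")
print(ascii_compatible)
```

A character can occupy from one to four bytes, which matters when you size VARCHAR columns for non-ASCII data, because the number of characters and the number of bytes can differ.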

Locales
The locale is a parameter that defines the user's language, country, and any special variant
preferences, such as collation. HP Vertica uses the locale to determine the behavior of certain
string functions. The locale also determines the collation for various SQL commands that require
ordering and comparison, such as GROUP BY, ORDER BY, joins, and the analytic ORDER BY
clause.
By default, the locale for your HP Vertica database is en_US@collation=binary (English US). You
can define a new default locale that is used for all sessions on the database. You can also override
the locale for individual sessions. However, projections are always collated using the default
en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is
applied at query time.
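
To see what a binary collation means in practice, the following Python sketch orders a few strings by the values of their UTF-8 bytes. It is an approximation for illustration, assuming that binary collation compares encoded byte values; the word list is made up.

```python
# "\u00e9clair" is "eclair" with an acute accent on the first letter.
words = ["banana", "Zebra", "\u00e9clair", "apple"]

# Binary collation: order strings by the numeric values of their UTF-8 bytes.
binary_sorted = sorted(words, key=lambda s: s.encode("utf-8"))

print(binary_sorted)
```

Under this ordering, uppercase letters sort before lowercase letters and accented letters sort after all unaccented ones, which is typically not the order a language-aware collation would produce.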
You can set the locale through ODBC, JDBC, and ADO.NET.
See the following topics in the Administrator's Guide for details:

Implement Locales for International Data Sets

Supported Locales in the Appendix


String Functions
HP Vertica provides string functions to support internationalization. Unless otherwise specified,
these string functions can optionally specify whether VARCHAR arguments should be interpreted
as octet (byte) sequences, or as (locale-aware) sequences of characters. To do so, add
"USING OCTETS" or "USING CHARACTERS" (the default) as a parameter to the function.
See String Functions in the SQL Reference Manual for details.
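
The difference between the two interpretations can be modeled outside the database. In the Python sketch below, length_using is a hypothetical helper that imitates a length function with a USING OCTETS or USING CHARACTERS option. It treats each Unicode code point as one character, which is a simplification of locale-aware behavior.

```python
def length_using(text: str, unit: str = "CHARACTERS") -> int:
    """Return the length of text in characters (default) or in UTF-8 octets."""
    if unit == "CHARACTERS":
        return len(text)                    # one per Unicode code point
    if unit == "OCTETS":
        return len(text.encode("utf-8"))    # one per encoded byte
    raise ValueError("unit must be 'CHARACTERS' or 'OCTETS'")


word = "\u00e9t\u00e9"  # three characters, two of which are accented
print(length_using(word), length_using(word, "OCTETS"))
```

For pure ASCII input both units give the same answer; the results diverge as soon as the data contains multi-byte characters.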

Character String Literals


By default, string literals ('...') treat backslashes literally, as specified in the SQL standard.
Tip: If you have used previous releases of HP Vertica and you do not want string literals to
treat backslashes literally (for example, you are using a backslash as part of an escape
sequence), you can turn off the StandardConformingStrings configuration parameter. See
Internationalization Parameters in the Administrator's Guide. You can also use the
EscapeStringWarning parameter to locate backslashes that have been incorporated into
string literals, so that you can remove them.
See Character String Literals in the SQL Reference Manual for details.


Extending HP Vertica
HP Vertica lets you extend its capabilities through several different features:

User-Defined SQL Functions let you define a function using HP Vertica SQL statements.

User Defined Extensions and User Defined Functions are high-performance extensions to HP
Vertica's capabilities you develop using the HP Vertica Software Development Kit (SDK).

External Procedures let you pipe data from HP Vertica through external programs or shell scripts
to perform some form of processing on it.

User-Defined SQL Functions


User-Defined SQL Functions let you define and store commonly used SQL expressions as a
function. User-Defined SQL Functions are useful for executing complex queries and combining HP
Vertica built-in functions. In a query, you simply call the function by the name you assigned.
A User-Defined SQL Function can be used anywhere in a query where an ordinary SQL expression
can be used, except in the table partition clause or the projection segmentation clause.

User Defined Extensions and User Defined Functions


User Defined Extension (UDx) refers to all extensions to HP Vertica developed using the APIs in
the HP Vertica SDK. UDxs encompass functions such as User Defined Scalar Functions
(UDSFs), and utilities such as the User Defined Load (UDL) feature that lets you create custom data
load routines.
Thanks to their tight integration with HP Vertica, UDxs usually have better performance than
User-Defined SQL Functions or External Procedures.
User Defined Functions (UDFs) are a specific type of UDx. You use them in SQL statements to
process data similarly to HP Vertica's own built-in functions. They give you the power of creating
your own functions that run just slightly slower than HP Vertica's own functions.
The HP Vertica SDK uses the term UDx extensively, even for APIs that deal exclusively with
developing UDFs.


Get Started
To get started using HP Vertica, follow the steps presented in the Getting Started Guide. The
tutorial requires that you install HP Vertica on one or more hosts as described in the Installation
Guide.


Getting Started Guide


Using the Getting Started Guide


The purpose of the Getting Started Guide is to show you how to set up an HP Vertica example
database and run simple queries that perform common database tasks.

Who Should Use This Guide?


The Getting Started Guide targets anyone who wants to learn how to create and run an HP Vertica
database. This guide requires no special knowledge at this point, although a rudimentary knowledge
of basic SQL commands is useful when you begin to run queries.

What You Need


The examples provided in this guide require that you:

Installed HP Vertica on one host or a cluster of hosts. Hewlett-Packard recommends a minimum
of three hosts in the cluster.

OR

Obtained a Virtual Machine (VM) with HP Vertica installed on it.

For further instructions regarding installation, see the Installation Guide.

Accessing Your Database


You access your database either using an SSH client or through the terminal utility in your Linux
Console, such as vsql. Throughout this guide you use the following user interfaces:

The Linux command line (shell) interface

The HP Vertica Administration Tools (See Running the Administration Tools in this guide for
details.)

The vsql client interface (See Using vsql.)

The HP Vertica Management Console (See Using Management Console in this guide for
details.)


Downloading and Starting the Virtual Machine


HP Vertica is available as a Virtual Machine (VM) that is pre-installed on a 64-bit CentOS image and
comes with a license for 500 GB of data storage.
The VM image is preconfigured with the following hardware settings:

1 CPU

1024 MB RAM

50 GB Hard Disk (SCSI, not preallocated, single file storage)

Bridged Networking

Downloading a VM
The HP Vertica VM is available both as an OVF template (for VMware vSphere 4.0) and as a
VMDK file (for VMware Server 2.0 and VMware Workstation 7.0). Download and install the
appropriate file for your VMware deployment from the myVertica portal at
http://www.vertica.com/documentation (registration required).

Starting the VM
1. Open the appropriate HP Vertica VM image file in VMware. For example, open the VMX file if
you are using VMware Workstation, or the OVF template if you are using VMware vSphere.
2. Navigate to the settings for the VM image and adjust the network settings so that they are
compatible with your VM.
3. Start the VM. For example, in VMware Workstation, select VM > Power > Power On.

Checking for HP Vertica Updates


The VM image might not include the latest available HP Vertica release. After you install and start
your VM, verify the version of HP Vertica with the following command.
$ rpm -qa | grep vertica
The RPM package name that the command returns contains the version and build numbers. If there
is a later version of HP Vertica, download it from the myVertica portal at
https://my.vertica.com/downloads (registration required). Upgrade instructions are provided in the
Installation Guide.
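
If you want to compare the installed version with a newer release in a script, you can split the package name into its parts. The Python sketch below assumes the conventional RPM naming scheme name-version-release.arch. The sample package name is hypothetical, so substitute the string that rpm -qa actually prints on your host. parse_rpm_name and is_newer are illustrative helper names.

```python
import re

# Conventional RPM naming: name-version-release.arch
NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_rpm_name(package: str) -> dict:
    """Split an RPM package name into name, version, release, and arch."""
    match = NVRA.match(package)
    if match is None:
        raise ValueError(f"not a name-version-release.arch string: {package!r}")
    return match.groupdict()


def version_tuple(version: str) -> tuple:
    """Turn a dotted numeric version such as '7.1.2' into (7, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(installed: str, available: str) -> bool:
    """Return True if the available version is later than the installed one."""
    return version_tuple(available) > version_tuple(installed)


# Hypothetical package name, used only to exercise the parser.
info = parse_rpm_name("vertica-7.1.2-0.x86_64")
print(info["version"], info["release"])
```

Comparing versions as integer tuples avoids the pitfall of string comparison, where "7.1.10" would sort before "7.1.2". The helper assumes purely numeric version components.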


Types of Database Users


Every HP Vertica database has one or more users. When users connect to a database, they must
log on with valid credentials (username and password) that a database administrator defines.
Database users own the objects they create in a database, such as tables, procedures, and storage
locations. By default, all users have the right to create temporary tables in a database.
In an HP Vertica database, there are three types of users:

Database administrator (dbadmin)

Object owner

Everyone else (PUBLIC)

dbadmin User
When you create a new database, a single database administrator account, dbadmin, is
automatically created along with a PUBLIC role. The database administrator bypasses all
permission checks, including all GRANT/REVOKE authorizations, and has the authority to
perform all database operations. The same applies to any user who is granted the
PSEUDOSUPERUSER role.
Note: Although the dbadmin user has the same name as the Linux database administrator
account, do not confuse the concept of a database administrator with a Linux superuser (root)
privilege; they are not the same. A database administrator cannot have Linux superuser
privileges.

Object Owner
An object owner is the user who creates a particular database object; the owner can perform any
operation on that object. By default, only an owner or a database administrator can act on a
database object. In order to allow other users to use an object, the owner or database administrator
must grant privileges to those users using one of the GRANT statements. Object owners are
PUBLIC users for objects that other users own.

PUBLIC User
All users who are neither database administrators nor object owners are PUBLIC users. Newly
created users do not have access to schema PUBLIC by default. Make sure to GRANT USAGE ON
SCHEMA PUBLIC to all users you create.
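
The three user types amount to a simple decision rule for whether a user may act on an object. The Python sketch below models that rule only for illustration. DbObject, can_act, and the user and privilege names are hypothetical, and the model ignores roles, schemas, and everything else a real authorization check involves.

```python
from dataclasses import dataclass, field


@dataclass
class DbObject:
    """Toy database object with an owner and explicit grants."""
    name: str
    owner: str
    grants: dict = field(default_factory=dict)  # user -> set of privileges


def can_act(user: str, privilege: str, obj: DbObject, superusers=("dbadmin",)) -> bool:
    """Decide whether user may exercise privilege on obj."""
    if user in superusers:
        return True                  # the administrator bypasses permission checks
    if user == obj.owner:
        return True                  # the owner can perform any operation
    # Everyone else needs an explicit grant.
    return privilege in obj.grants.get(user, set())


table = DbObject(name="store.store_sales_fact", owner="alice")
print(can_act("bob", "SELECT", table))    # no grant yet
table.grants["bob"] = {"SELECT"}          # models a grant from the owner
print(can_act("bob", "SELECT", table))
```

In this model bob can read the table only after the grant, and still cannot insert into it, while alice and dbadmin never need a grant.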


Logging in as dbadmin
The first time you boot the VM you are automatically logged in and a web page displays further
instructions. To log back into the VM, use the following username and password.

Username: dbadmin

Password: password

Root Password: password

Important: The dbadmin user has sudo privileges. Be sure to change the dbadmin and root
passwords with the Linux passwd command.

Using the HP Vertica Interfaces


HP Vertica provides a set of tools that allows you to perform administrative tasks quickly and
easily. The administration tasks in HP Vertica can be done using the Management Console (MC) or
the Administration Tools. The MC provides a unified view of your HP Vertica cluster through a
browser connection, while the Administration Tools are implemented using Dialog, a graphical user
interface that works in terminal (character-cell) windows.

Using Management Console


The Management Console provides some, but not all, of the functionality that the Administration
Tools provide. In addition, the MC provides extended functionality not available in the
Administration Tools, such as a graphical view of your HP Vertica database and detailed monitoring
charts and graphs.
Most of the information you need to use MC is available on the MC interface, as seen in the
following two screenshots. For installation instructions, see Installing and Configuring Management
Console in the Installation Guide. For an introduction to MC functionality, architecture, and security,
see Management Console in the Concepts Guide.


Running the Administration Tools


A man page is available for convenient access to Administration Tools details. If you are running as
the dbadmin user, type man admintools. If you are running as a different user, type man -M
/opt/vertica/man admintools. If possible, always run the Administration Tools using the
database administrator account (dbadmin) on the administration host.
The Administration Tools interface responds to mouse clicks in some terminal windows,
particularly local Linux windows, but you might find that it responds only to keystrokes. For a quick
reference to keystrokes, see Using Keystrokes in the Administration Tools Interface in this guide.
When you run Administration Tools, the Main Menu dialog box appears with a dark blue
background and a title on top. The screen captures used in this documentation set are cropped
down to the dialog box itself, as shown in the following screenshot.

First Time Only


The first time you log in as the database administrator and run the Administration Tools, complete
the following steps.
1. In the EULA (end-user license agreement) window, type accept to proceed. A window
displays, requesting the location of the license key file you downloaded from the HP Web site.
The default path is /tmp/vlicense.dat.
2. Type the absolute path to your license key (for example, /tmp/vlicense.dat) and click OK.
3. To return to the command line, select Exit and click OK.

Using Keystrokes in the Administration Tools Interface


The following table is a quick reference to keystroke usage in the Administration Tools interface.
See Using the Administration Tools in the Administrator's Guide for full details.

Return           Run selected command.
Tab              Cycle between OK, Cancel, Help, and menu.
Up/Down Arrow    Move cursor up and down in menu, window, or help file.
Space            Select item in list.
Character        Select corresponding command from menu.

Introducing the VMart Example Database


HP Vertica ships with a sample multi-schema database called the VMart Example Database,
which represents a database that might be used by a large supermarket (VMart) to access
information about its products, customers, employees, and online and physical stores. Using this
example, you can create, run, optimize, and test a multi-schema database.
The VMart database contains the following schemas:

public (automatically created in any newly created HP Vertica database)

store

online_sales

VMart Database Location and Scripts


If you installed HP Vertica from the RPM package, the VMart schema is installed in the
/opt/vertica/examples/VMart_Schema directory. This folder contains the following script files
that you can use to get started quickly. Use the scripts as templates for your own applications.
Script/file name           Description
vmart_count_data.sql       SQL script that counts rows of all example database tables, which
                           you can use to verify load.
vmart_define_schema.sql    SQL script that defines the logical schema for each table and
                           referential integrity constraints.
vmart_gen.cpp              Data generator source code (C++).
vmart_gen                  Data generator executable file.
vmart_load_data.sql        SQL script that loads the generated sample data to the
                           corresponding tables using COPY DIRECT.
vmart_queries.sql          SQL script that contains concatenated sample queries for use as a
                           training set for the Database Designer.
vmart_query_##.sql         SQL scripts that contain individual queries; for example,
                           vmart_query_01.sql through vmart_query_09.sql.
vmart_schema_drop.sql      SQL script that drops all example database tables.

For more information about the schema, tables, and queries included with the VMart example
database, see the Appendix.


Installing and Connecting to the VMart Example Database
Follow the steps in this section to create the fully functioning, multi-schema VMart example
database that you'll use to run sample queries. The number of example databases you create within
a single HP Vertica installation is limited only by the disk space available on your system; however,
Hewlett-Packard strongly recommends that you start only one example database at a time to avoid
unpredictable results.
HP Vertica provides two options to install the example database:

A quick installation that lets you create the example database and start using it immediately.
See Quick Installation Using a Script in this guide for details. Use this method to bypass the
schema and table creation processes and start querying immediately.

An advanced-but-simple example database installation using the Administration Tools interface.
See Advanced Installation in this guide for details. Use this method to better understand the
database creation process and practice creating schema and tables, and loading data.

Note: Both installation methods create a database named VMart. If you try both installation
methods, you will either need to drop the VMart database you created (see Restoring the
Status of Your Host in this guide) or create the subsequent database with a new name.
However, Hewlett-Packard strongly recommends that you start only one example database at
a time to avoid unpredictable results.
This tutorial uses HP Vertica-provided queries, but you can follow the same set of procedures
later, when you create your own design and use your own queries file.
After you install the VMart database, the database is already running. Connect to it using the
steps in Step 3: Connecting to the Database.

Quick Installation Using a Script


The script you need to perform a quick installation is located in /opt/vertica/sbin and is called
install_example. This script creates a database on the default port (5433), generates data,
creates the schema and a default superprojection, and loads the data. The folder also contains a
delete_example script, which stops and drops the database.
1. In a terminal window, log in as the database administrator.
$ su dbadmin
Password: (your password)


2. Change to the /examples directory.
$ cd /opt/vertica/examples
3. Run the install script:
$ /opt/vertica/sbin/install_example VMart
After installation, you should see the following:
[dbadmin@localhost examples]$ /opt/vertica/sbin/install_example VMart
Installing VMart example example database
Mon Jul 22 06:57:40 PDT 2013
Creating Database
Completed
Generating Data. This may take a few minutes.
Completed
Creating schema
Completed
Loading 5 million rows of data. Please stand by.
Completed
Removing generated data files
Example data

The example database log files, ExampleInstall.txt and ExampleDelete.txt, are written to
/opt/vertica/examples/log.
To start using your database, continue to Connecting to the Database in this guide. To drop the
example database, see Restoring the Status of Your Host in this guide.

Advanced Installation
To perform an advanced-but-simple installation, set up the VMart example database environment
and then create the database using the Administration Tools or Management Console.
Note: If you installed the VMart database using the quick installation method, you cannot
complete the following steps because the database has already been created.
To try the advanced installation, drop the example database (see Restoring the Status of Your
Host in this guide) and perform the advanced installation, or create a new example database
with a different name. However, Hewlett-Packard strongly recommends that you install only
one example database at a time to avoid unpredictable results.
The advanced installation requires the following steps:

Step 1: Setting Up the Example Environment

Step 2: Creating the Example Database

Step 3: Connecting to the Database

Step 4: Defining the Database Schema

Step 5: Loading Data

Step 1: Setting Up the Example Environment


1. Stop all databases running on the same host on which you plan to install your example
database.
If you are unsure if other databases are running, run the Administration Tools and select View
Cluster State. The State column should show DOWN values on pre-existing databases.
If databases are running, click Stop Database in the Main Menu of the Administration Tools
interface and click OK.
2. In a terminal window, log in as the database administrator:
$ su dbadmin
Password:

3. Change to the /VMart_Schema directory.


$ cd /opt/vertica/examples/VMart_Schema

Do not change directories while following this tutorial. Some steps depend on being in a
specific directory.
4. Run the sample data generator.
$ ./vmart_gen

5. Let the program run with the default parameters, which you can review in the README file.
Using default parameters
datadirectory = ./


numfiles = 1
seed = 2
null = ' '
timefile = Time.txt
numfactsalesrows = 5000000
numfactorderrows = 300000
numprodkeys = 60000
numstorekeys = 250
numpromokeys = 1000
numvendkeys = 50
numcustkeys = 50000
numempkeys = 10000
numwarehousekeys = 100
numshippingkeys = 100
numonlinepagekeys = 1000
numcallcenterkeys = 200
numfactonlinesalesrows = 5000000
numinventoryfactrows = 300000
gen_load_script = false
Data Generated successfully !

6. If the vmart_gen executable does not work correctly, recompile it as follows, and run the
sample data generator script again.
$ g++ vmart_gen.cpp -o vmart_gen
$ chmod +x vmart_gen
$ ./vmart_gen


Step 2: Creating the Example Database


To create the example database, use the Administration Tools or Management Console, as
described in this section.

Creating the Example Database Using the Administration Tools


In this procedure, you create the example database using the Administration Tools. To use the
Management Console, go to the next section.
Note: If you have not used Administration Tools before, see Running the Administration Tools
in this guide.

1. Run the Administration Tools.


$ /opt/vertica/bin/admintools

or simply type admintools.


2. From the Administration Tools Main Menu, click Configuration Menu and click OK.
3. Click Create Database and click OK.
4. Name the database VMart and click OK.

5. Click OK to bypass the password and click Yes to confirm.


There is no need for a database administrator password in this tutorial. When you create a
production database, however, always specify an administrator password. Otherwise, the
database is permanently set to trust authentication (no passwords).
6. Select the hosts you want to include from your HP Vertica cluster and click OK.


This example creates the database on a one-host cluster. Hewlett-Packard recommends a
minimum of three hosts in the cluster. If you are using the HP Vertica Community Edition, you
are limited to three nodes.

7. Click OK to select the default paths for the data and catalog directories.

Catalog and data paths must contain only alphanumeric characters and cannot have leading
space characters. Failure to comply with these restrictions could result in database creation
failure.

When you create a production database, you'll likely specify locations other than the default.
See Prepare Disk Storage Locations in the Administrator's Guide for more information.

8. Since this tutorial uses a one-host cluster, a K-safety warning appears. Click OK.

9. Click Yes to create the database.


During database creation, HP Vertica automatically creates a set of node definitions based on
the database name and the names of the hosts you selected and returns a success message.
10. Click OK to close the Database VMart created successfully message.

Creating the Example Database Using the Management Console


In this procedure, you create the example database using the Management Console. To use the
Administration Tools, follow the steps in the preceding section.
Note: To use Management Console, the console should already be installed and you should be
familiar with its concepts and layout. See Using Management Console in this guide for a brief
overview, or for detailed information, see Management Console in the Concepts Guide and
Installing and Configuring Management Console in the Installation Guide.

1. Connect to Management Console and log in.


2. On the Home page, under Tasks, click Database and Clusters.


3. Click to select the appropriate existing cluster and click Create Database.

4. Follow the on-screen wizard, which prompts you to provide the following information:
   • Database name, which must be between 3 and 25 characters, starting with a letter, and
     followed by any combination of letters, numbers, or underscores.
   • (Optional) database administrator password for the database you want to create and
     connect to.
   • IP address of a node in your database cluster, typically the IP address of the administration
     host.

5. Click Next.
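
The database-name rule in step 4 can be checked before you start the wizard. The following Python sketch is illustrative only: it assumes the length limit reads as 3 to 25 characters and that "letters" means ASCII letters, and is_valid_db_name is a hypothetical helper, not part of HP Vertica.

```python
import re

# Rule from the wizard text: 3 to 25 characters, first character a letter,
# then any mix of letters, digits, or underscores.
# Assumption: only ASCII letters count as "letters".
_DB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,24}")


def is_valid_db_name(name: str) -> bool:
    """Return True if name satisfies the naming rule described above."""
    return _DB_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("VMart", "vm", "1mart", "sales_2016"):
        print(candidate, is_valid_db_name(candidate))
```

A name such as VMart passes, while a two-character name or a name that starts with a digit is rejected.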

Step 3: Connecting to the Database


Regardless of the installation method you used, follow these steps to connect to the database.
1. As dbadmin, run the Administration Tools.
$ /opt/vertica/bin/admintools
or simply type admintools.
2. If you are already in the Administration Tools, navigate to the Main Menu page.
3. Select Connect to Database and click OK.

To configure and load data into the VMart database, complete the following steps:
• Step 4: Defining the Database Schema
• Step 5: Loading Data

If you installed the VMart database using the Quick Installation method, the schema, tables,
and data are already defined. You can choose to drop the example database (see Restoring the
Status of Your Host in this guide) and perform the Advanced Installation, or continue straight to
Querying Your Data in this guide.


Step 4: Defining the Database Schema


The VMart database installs with sample scripts that contain SQL commands representing
queries that might be used in a real business. The vmart_define_schema.sql script defines the
VMart schema and creates tables. You must run this script before you load data into the VMart
database.
This script performs the following tasks:
• Defines two schemas in the VMart database: online_sales and store.
• Defines tables in both schemas.
• Defines constraints on those tables.


VMart=> \i vmart_define_schema.sql
CREATE SCHEMA
CREATE SCHEMA
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
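
The tutorial scripts can also be run non-interactively. The following Python sketch is one possible way to do so, under stated assumptions: vsql is on the PATH, the scripts live in /opt/vertica/examples/VMart_Schema, and the VMart database has no password. The helper names are hypothetical. It relies on the vsql -d, -U, and -f options and runs the schema script before the load script, because the tables must exist before data is loaded.

```python
import subprocess
from typing import List

# Assumption: the example scripts are stored in this directory.
SCRIPT_DIR = "/opt/vertica/examples/VMart_Schema"

# The schema script must run before the load script.
TUTORIAL_SCRIPTS = ["vmart_define_schema.sql", "vmart_load_data.sql"]


def vsql_script_command(script: str, database: str = "VMart",
                        user: str = "dbadmin") -> List[str]:
    """Build a vsql command line that executes one SQL script file."""
    return ["vsql", "-d", database, "-U", user, "-f", script]


def run_tutorial_scripts() -> None:
    """Run the tutorial scripts in order, stopping at the first failure."""
    for script in TUTORIAL_SCRIPTS:
        # Run from the script directory so relative paths in the scripts resolve.
        subprocess.run(vsql_script_command(script), cwd=SCRIPT_DIR, check=True)
```

Calling run_tutorial_scripts() as the dbadmin user would execute both scripts in sequence.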

Step 5: Loading Data


Now that you have created the schemas and tables, you can load data into the tables by running the
vmart_load_data.sql script. This script loads data from the 15 .tbl text files in
/opt/vertica/examples/VMart_Schema into the tables that vmart_define_schema.sql created.
It might take several minutes to load the data on a typical hardware cluster. Check the load status
by monitoring the vertica.log file, as described in Monitoring Log Files in the Administrator's
Guide.


VMart=> \i vmart_load_data.sql
 Rows Loaded
-------------
        1826
(1 row)
 Rows Loaded
-------------
       60000
(1 row)
 Rows Loaded
-------------
         250
(1 row)
 Rows Loaded
-------------
        1000
(1 row)
 Rows Loaded
-------------
          50
(1 row)
 Rows Loaded
-------------
       50000
(1 row)
 Rows Loaded
-------------
       10000
(1 row)
 Rows Loaded
-------------
         100
(1 row)
 Rows Loaded
-------------
         100
(1 row)
 Rows Loaded
-------------
        1000
(1 row)
 Rows Loaded
-------------
         200
(1 row)
 Rows Loaded
-------------
     5000000
(1 row)
 Rows Loaded
-------------
      300000
(1 row)
VMart=>
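
If you capture the load output in a file, you can total the row counts to confirm that every table received data. The following Python sketch is a minimal illustration; rows_loaded is a hypothetical helper, and the sample text reproduces only the first two results shown above.

```python
import re
from typing import List


def rows_loaded(vsql_output: str) -> List[int]:
    """Extract each "Rows Loaded" count from captured vsql output."""
    counts = []
    for line in vsql_output.splitlines():
        # A count line holds only digits, optionally preceded by dashes or spaces.
        match = re.fullmatch(r"[-\s]*(\d+)\s*", line)
        if match:
            counts.append(int(match.group(1)))
    return counts


SAMPLE = """\
 Rows Loaded
-------------
        1826
(1 row)
 Rows Loaded
-------------
       60000
(1 row)
"""

counts = rows_loaded(SAMPLE)
print(counts, sum(counts))
```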

Querying Data
The VMart database installs with sample scripts that contain SQL commands that represent
queries that might be used in a real business. Use basic SQL commands to query the database, or
try out the following command. Once you're comfortable running the example queries, you might
want to write your own.
Note: The data that your queries return might differ from the example output shown in this
guide because the sample data generator is random.
Type the following SQL command to return the values for the five products with the lowest fat content
in the Dairy department. The command selects the distinct fat content values of Dairy department
products from the product_dimension table in the public schema, orders them from low to high, and
limits the output to the first five (the five lowest fat contents).
VMart=> SELECT fat_content
        FROM ( SELECT DISTINCT fat_content
               FROM product_dimension
               WHERE department_description
               IN ('Dairy') ) AS food
        ORDER BY fat_content
        LIMIT 5;

Your results will be similar to the following.


 fat_content
-------------
          80
          81
          82
          83
          84
(5 rows)

The preceding example is from the vmart_query_01.sql file. You can execute more sample
queries using the scripts that installed with the VMart database or write your own. For a list of the
sample queries supplied with HP Vertica, see the Appendix.
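
To see how the DISTINCT subquery, ORDER BY, and LIMIT clauses interact, you can try the same query shape on a tiny data set. The following Python sketch uses an in-memory SQLite database as a stand-in for HP Vertica; the rows are made up for illustration and are not the VMart sample data.

```python
import sqlite3

# SQLite stands in for HP Vertica here; only the query shape is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_dimension ("
    "department_description TEXT, fat_content INTEGER)"
)

# Hypothetical rows: duplicates and a non-Dairy product are included on purpose.
rows = [
    ("Dairy", 84), ("Dairy", 80), ("Dairy", 82), ("Dairy", 81),
    ("Dairy", 83), ("Dairy", 80), ("Dairy", 90), ("Bakery", 10),
]
conn.executemany("INSERT INTO product_dimension VALUES (?, ?)", rows)

QUERY = """
SELECT fat_content
FROM ( SELECT DISTINCT fat_content
       FROM product_dimension
       WHERE department_description IN ('Dairy') ) AS food
ORDER BY fat_content
LIMIT 5
"""

lowest_five = [row[0] for row in conn.execute(QUERY)]
print(lowest_five)  # prints [80, 81, 82, 83, 84]
```

The duplicate value 80 appears once because of DISTINCT, the Bakery row is filtered out, and the value 90 falls outside the five-row limit.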


Backing Up and Restoring the Database


HP Vertica supplies a comprehensive utility, the vbr.py Python script, that lets you back up and
restore a full database, as well as create snapshots of specific schemas or tables. The vbr.py utility
creates backup directories during its initial execution; subsequent runs create
subdirectories.
The following information introduces the backup and restore functions. For more
detailed information, see Backing Up and Restoring the Database in the Administrator's Guide.

Backing Up the Database


Use vbr.py to save your data to a variety of locations:
• A local directory on the nodes in a cluster
• One or more hosts outside of the cluster
• A different HP Vertica cluster (effectively cloning your database)

Note: Creating a database backup on a different cluster does not provide disaster recovery.
The cloned database you create with vbr.py is entirely separate from the original, and is not
kept synchronized with the database from which it is cloned.

When to Back up the Database


In addition to any guidelines established by your organization, Hewlett-Packard recommends that
you back up your database:
• Before you upgrade HP Vertica to another release.
• Before you drop a partition.
• After you load a large volume of data.
• If the epoch in the latest snapshot is earlier than the current ancient history mark (AHM).
• Before and after you add, remove, or replace nodes in your database cluster.
• After recovering a cluster from a crash.


Note: When you restore a database snapshot, you must restore to a cluster that is identical to
the one where you created the snapshot. For this reason, always create a new snapshot after
adding, removing, or replacing nodes.
Ideally, create regular backups of your full database. You can run the HP Vertica vbr.py utility from
a cron job or other task scheduler.

Creating the Backup Configuration File


The vbr.py utility uses a configuration file for the information required to back up and restore a
full- or object-level snapshot. The configuration file defines where the database backup is saved, the
temporary directories it uses, and which nodes, schemas, and/or tables in the database are to be
backed up. You cannot run vbr.py without a configuration file, and no default file exists.
To invoke the script to set up a configuration file, enter this command:
$ vbr.py --setupconfig

The script prompts you to answer the following questions regarding the configuration file. Press
Enter to accept the default value in parentheses. See VBR Configuration File Reference in the
Administrator's Guide for information about specific questions.
Snapshot name (backup_snapshot): Example_backup
Backup vertica configurations? (n) [y/n]: y
Number of restore points (1): 1
Specify objects (no default):
Vertica user name (dbadmin): dbadmin
Save password to avoid runtime prompt? (n) [y/n]: y
Password to save in vbr config file (no default): password
Node v_vmart_node0001
Backup host name (no default): localhost
Backup directory (no default): /home/dbadmin
Config file name (backup_snapshot.ini): exampleBackup.ini
Change advanced settings? (n) [y/n]: n
Saved vbr configuration to exampleBackup.ini.

After you answer the required questions, vbr.py generates a configuration file with the information
you supplied. Use the Config file name you specified when you run the --task backup or other
commands. The vbr.py utility uses the configuration file contents for both backup and restore
tasks.
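
Because the generated configuration file is an .ini file, you can inspect its settings with a standard INI parser. The following Python sketch is illustrative only: the section name and the snapshotName key in the sample text are assumptions about the file layout, and find_option is a hypothetical helper. Only the restorePointLimit parameter name comes from this guide. The helper searches every section so that it does not depend on any particular section name.

```python
import configparser
from typing import Optional


def find_option(config_text: str, option: str) -> Optional[str]:
    """Return the first value found for option in any section, or None."""
    # Disable interpolation so values such as passwords may contain "%".
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    for section in parser.sections():
        if parser.has_option(section, option):
            return parser.get(section, option)
    return None


# Hypothetical file contents: the section and key layout is an assumption.
SAMPLE_CONFIG = """\
[Misc]
snapshotName = Example_backup
restorePointLimit = 1
"""

print(find_option(SAMPLE_CONFIG, "restorePointLimit"))
```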

Creating Full and Incremental Backups


Before you create a database backup, ensure the following:

• Your database is running.
• All of the backup hosts are up and available.
• The backup location host has sufficient disk space to store the snapshots.
• The user who starts the utility has write access to the target directories on the host backup
  location.

Run the vbr.py script from a terminal using the database administrator account from an initiator
node in your database cluster. You cannot run the utility as root.
Use the --task backup and --config-file filename directives as shown in this example.
$ vbr.py --task backup --config-file exampleBackup.ini
Copying
[===============================================] 100%
All child processes terminated successfully.
Committing changes on all backup sites
backup done!

By default, there is no screen output other than the progress bar.


If you do not specify a configuration file, the vbr utility searches for one at this location:
/opt/vertica/config/vbr.ini

If the utility does not find a configuration file at this location, it fails with an error and exits.
The first time you run the vbr.py utility, it performs a full backup; subsequent runs with the same
configuration file create an incremental snapshot. When creating incremental snapshots, the utility
copies new storage containers, which can include data that existed the last time you backed up the
database, along with new and changed data since then. By default, vbr.py saves one archive
backup, unless you set the restorePointLimit parameter value in the configuration file to a value
greater than 1.
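
If you schedule backups, a small wrapper can assemble the vbr.py command line and refuse to run as root. The following Python sketch is a minimal illustration under stated assumptions: vbr.py is on the PATH, the helper names are hypothetical, and only the backup and restore tasks described in this guide are accepted.

```python
import os
import subprocess
from typing import List


def vbr_command(task: str, config_file: str) -> List[str]:
    """Build a vbr.py command line for the tasks covered in this guide."""
    if task not in ("backup", "restore"):
        raise ValueError("task must be 'backup' or 'restore'")
    return ["vbr.py", "--task", task, "--config-file", config_file]


def run_vbr(task: str, config_file: str) -> None:
    """Run vbr.py, refusing to start when the current user is root."""
    if os.geteuid() == 0:
        raise PermissionError("run vbr.py as the database administrator, not root")
    subprocess.run(vbr_command(task, config_file), check=True)


def cron_line(minute: int, hour: int, config_file: str) -> str:
    """Format a crontab entry that runs a daily backup at the given time."""
    return f"{minute} {hour} * * * " + " ".join(vbr_command("backup", config_file))


print(cron_line(0, 2, "/home/dbadmin/exampleBackup.ini"))
```

The printed line could be added to the database administrator's crontab to run a backup every day at 02:00.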

Restoring the Database


To restore a full database snapshot, ensure that:
• The database is down.
• All of the backup hosts are up and available.
• The backup directory exists and contains the snapshots from which to restore.
• The cluster to which you are restoring the backup has the same number of hosts as the one used
  to create the snapshot; the node names and the IP addresses must also be identical.
• The database you are restoring already exists on the cluster to which you are restoring data; the
  database can be completely empty, without any data or schema. As long as the database name
  matches the name in the snapshot, and all of the node names match the names of the nodes,
  you can restore to it.

To begin a full database snapshot restore, log in using the database administrator's account. You
cannot run the utility as root.
To restore the most recent snapshot, run vbr.py with the --task restore directive and the
configuration file that was used to create the snapshot.
$ vbr.py --task restore --config-file exampleBackup.ini
Copying...
[==================================================] 100%
All child processes terminated successfully.
restore done!

You can restore a snapshot only to the database from which it was taken. You cannot restore a
snapshot into an empty database.
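
The requirement that the target cluster match the snapshot cluster (same number of hosts, identical node names and IP addresses) can be checked before you attempt a restore. The following Python sketch is illustrative only: restore_blockers is a hypothetical helper, and the node-to-address mappings are supplied by the caller rather than read from HP Vertica.

```python
from typing import Dict, List


def restore_blockers(snapshot_nodes: Dict[str, str],
                     target_nodes: Dict[str, str]) -> List[str]:
    """Compare node-name-to-IP mappings and list every mismatch found."""
    problems = []
    if len(snapshot_nodes) != len(target_nodes):
        problems.append(
            f"host count differs: {len(snapshot_nodes)} in snapshot, "
            f"{len(target_nodes)} in target"
        )
    for name, ip in sorted(snapshot_nodes.items()):
        if name not in target_nodes:
            problems.append(f"node {name} is missing from the target cluster")
        elif target_nodes[name] != ip:
            problems.append(
                f"node {name} has IP {target_nodes[name]}, expected {ip}"
            )
    return problems


# Hypothetical addresses, used only to demonstrate the check.
snapshot = {"v_vmart_node0001": "10.0.0.1"}
print(restore_blockers(snapshot, {"v_vmart_node0001": "10.0.0.1"}))
print(restore_blockers(snapshot, {"v_vmart_node0001": "10.0.0.9"}))
```

An empty list means that no mismatch was detected by this check; it does not verify the other prerequisites listed above.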

Using Database Designer to Create a Comprehensive Design
The HP Vertica Database Designer:
• Analyzes your logical schema, sample data, and, optionally, your sample queries.
• Creates a physical schema design (a set of projections) that can be deployed automatically or
  manually.
• Can be used by anyone without specialized database knowledge.
• Can be run and rerun any time for additional optimization without stopping the database.
• Uses strategies to provide optimal query performance and data compression.

Use Database Designer to create a comprehensive design, which allows you to create new
projections for all tables in your database.


You can also use Database Designer to create an incremental design, which creates projections for
all tables referenced in the queries you supply. For more information, see Incremental Design in the
Administrator's Guide.
You can create a comprehensive design with Database Designer using Management Console or
through Administration Tools. You can also choose to run Database Designer programmatically
(see About Running Database Designer Programmatically).
This section shows you how to:
• Run Database Designer with Management Console
• Run Database Designer with Administration Tools

Running Database Designer with Management Console


In this procedure, you create a comprehensive design with Database Designer through the
Management Console interface. If, in the future, you have a query that you want to optimize, you
can create an enhanced (incremental) design with additional projections. You can tune these
projections specifically for the query you provide. See Comprehensive Design in the
Administrator's Guide for more information.
Note: To run Database Designer outside Administration Tools, you must be a dbadmin user. If
you are not a dbadmin user, you must have the DBDUSER role assigned to you and own the
tables for which you are designing projections.
You can choose to create the design manually or use the wizard. To create a design manually, see
Creating a Design Manually in the Administrator's Guide.
Set your browser so that it does not cache pages. If a browser caches pages, you may not be able
to see the new design added.
Follow these steps to use the wizard to create the comprehensive design in Management Console:
1. Log in to Management Console.
2. Verify that your database is up and running.
3. Choose the database for which you want to create the design. You can find the database under
the Recent Databases section or by clicking the Databases and Clusters page.
The database overview page opens:


4. At the bottom of the screen, click the Design button.


5. In the New Design dialog box, enter the design name.

6. Click Wizard to continue.


7. Create an initial design. For Design Type, select Comprehensive and click Next.
8. In the Optimization Objective window, select Balance Load and Performance to create a
design that is balanced between database size and query performance. Click Next.


9. Select the schemas. Because the VMart design is a multi-schema database, select all three
schemas (public, store, and online_sales) for your design in the Select Sample Data window.
Click Next.

If you include a schema that contains tables without data, the design could be suboptimal. You
can choose to continue, but HP Vertica recommends that you deselect the schemas that
contain empty tables before you proceed.
10. Choose the K-safety value for your design. The K-safety value determines the number of buddy
projections you want Database Designer to create.
11. Submit query files to Database Designer in one of two ways:


a. Supply your own query files by selecting the Browse button.


b. Click Use Query Repository, which submits recently executed queries from the
QUERY_REQUESTS system table.
Click Next.
12. In the Execution Options window, select all the options you want. You can select all three
options or fewer.

The three options are:
   • Analyze statistics: Select this option to analyze statistics automatically after the design
     is deployed, which helps Database Designer make better decisions for its proposed design.
   • Auto-build: Select this option to run Database Designer as soon as you complete the
     wizard. This option only builds the proposed design.
   • Auto-deploy: Select this option for auto-build designs that you want to deploy automatically.

13. Click Submit Design.


The Database Designer page opens:
   • If you chose to automatically deploy your design, Database Designer executes in the
     background.
   • If you did not select the Auto-build or Auto-deploy options, you can click Build Design or
     Deploy Design on the Database Designer page.


14. In the My Designs pane, view the status of your design:


   • When the deployment completes, the My Designs pane shows Design Deployed.
   • The event history window shows the details of the design build and deployment.

To run Database Designer with Administration Tools, see Run Database Designer with
Administration Tools in this guide.

Run Database Designer with Administration Tools


In this procedure, you create a comprehensive design with Database Designer using the
Administration Tools interface. If, in the future, you have a query that you want to optimize, you can
create an enhanced (incremental) design with additional projections. You can tune these projections
specifically for the query you provide. See Incremental Design in the Administrator's Guide for more
information.
Follow these steps to create the comprehensive design using Database Designer in Administration
Tools:
1. If you are not in Administration Tools, exit the vsql session and access Administration Tools:
   • Type \q to exit vsql.
   • Type admintools to access the Administration Tools Main Menu.

2. Start the database for which you want to create a design.


3. From the Main Menu, click Configuration Menu and then click OK.


4. From the Configuration Menu, click Run Database Designer and then click OK.
5. When the Select a database for design dialog box opens, select VMart and then click OK.

If you are prompted to enter the password for the database, click OK to bypass the message.
Because no password was assigned when you installed the VMart database, you do not need
to enter one now.
6. Click OK to accept the default directory for storing Database Designer output and log files.

7. In the Database Designer window, enter a name for the design, for example, vmart_design,
and click OK. Design names can contain only alphanumeric characters or underscores. No
other special characters are allowed.


8. Create a complete initial design. In the Design Type window, click Comprehensive and click
OK.

9. Select the schemas. Because the VMart design is a multi-schema database, you can select all
three schemas (online_sales, public, and store) for your design. Click OK.

If you include a schema that contains tables without data, the Administration Tools notifies you
that designing for tables without data could be suboptimal. You can choose to continue, but
Hewlett-Packard recommends that you deselect the schemas that contain empty tables before
you proceed.
10. In the Design Options window, accept all three options and click OK.


The three options are:
   • Optimize with queries: Supplying the Database Designer with queries is especially
     important if you want to optimize the database design for query performance.
     Hewlett-Packard recommends that you limit the design input to 100 queries.
   • Update statistics: Accurate statistics help the Database Designer choose the best strategy
     for data compression. If you select this option, the database statistics are updated to
     maximize design quality.
   • Deploy design: The new design deploys automatically. During deployment, new projections
     are added, some existing projections are retained, and any unnecessary existing projections
     are removed. Any new projections are refreshed to populate them with data.

11. Because you selected the Optimize with queries option, you must enter the full path to the
file containing the queries that will be run on your database. In this example, it is:
/opt/vertica/examples/VMart_Schema/vmart_queries.sql

The queries in the query file must be delimited with a semicolon (;).
12. Choose the K-safety value you want and click OK. The design K-safety determines the
number of buddy projections you want Database Designer to create.
If you create a comprehensive design on a single node, you are not prompted to enter a
K-safety value.
13. In the Optimization Objective window, select Balanced query/load performance to create
a design that is balanced between database size and query performance. Click OK.


14. When the informational message displays, click Proceed.


Database Designer automatically performs these actions:
   • Sets up the design session.
   • Examines table data.
   • Loads queries from the query file you provided (in this example,
     /opt/vertica/examples/VMart_Schema/vmart_queries.sql).
   • Creates the design.
   • Deploys the design or saves a SQL file containing the commands to create the design, based
     on your selections in the Design Options window.
Depending on system resources, the design process could take several minutes. You should
allow this process to complete uninterrupted. If you must cancel the session, use Ctrl+C.

15. When Database Designer finishes, press Enter to return to the Administration Tools menu.
Examine the steps taken to create the design. The files are in the directory you specified to
store the output and log files. In this example, that directory is
/opt/vertica/examples/VMart_Schema. For more information about the script files, see
What Is a Design? in the Administrator's Guide.


For additional information about managing your designs, see Creating a Database Design in the
Administrator's Guide.
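
Two of the inputs in the procedure above have simple rules that you can check in advance: design names can contain only alphanumeric characters or underscores, and the query file must delimit queries with semicolons, with a recommended limit of 100 queries. The following Python sketch is a minimal illustration with hypothetical helper names. The split is deliberately naive and does not handle semicolons inside string literals or comments.

```python
import re
from typing import List

MAX_QUERIES = 100  # recommended limit on design input quoted in this guide


def is_valid_design_name(name: str) -> bool:
    """Return True if name holds only ASCII letters, digits, or underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def split_queries(sql_text: str) -> List[str]:
    """Split on semicolons (naive: ignores quoting and comments)."""
    return [part.strip() for part in sql_text.split(";") if part.strip()]


def over_recommended_limit(sql_text: str) -> bool:
    """Return True if the file holds more queries than the recommended limit."""
    return len(split_queries(sql_text)) > MAX_QUERIES


sample = "SELECT 1;\nSELECT 2;\n"
print(is_valid_design_name("vmart_design"), len(split_queries(sample)))
```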

Restoring the Status of Your Host


When you finish the tutorial, you can restore your host machines to their original state. Use the
following instructions to clean up your host and start over from scratch.

Stopping and Dropping the Database


Follow these steps to stop and/or drop your database. A database must be stopped before it can be
dropped.
1. If connected to the database, disconnect by typing \q.
2. In the Administration Tools Main Menu dialog box, click Stop Database and click OK.
3. In the Select database to stop window, select the database you want to stop and click OK.
4. After stopping the database, click Configuration Menu and click OK.
5. Click Drop Database and click OK.
6. In the Select database to drop window, select the database you want to drop and click OK.
7. Click Yes to confirm.
8. In the next window type yes (lowercase) to confirm and click OK.
Alternatively, use the delete_example script, which stops and drops the database:
1. If connected to the database, disconnect by typing \q.
2. In the Administration Tools Main Menu dialog box, select Exit.
3. Log in as the database administrator.
4. Change to the /examples directory.
$ cd /opt/vertica/examples


5. Run the delete_example script.


$ /opt/vertica/sbin/delete_example VMart

Uninstalling HP Vertica
Perform the steps in Uninstalling HP Vertica in the Installation Guide.

Optional Steps
You can also choose to:
• Remove the dbadmin account on all cluster hosts.
• Remove any example database directories you created.

Changing the GUI Appearance


The appearance of the Graphical User Interface (GUI) depends on the color and font settings used
by your terminal window. The screen captures in this document were made using the default color
and font settings in a PuTTY terminal application running on a Windows platform.
Note: If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell,
make sure your window is at least 81 characters wide and 23 characters high.
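
You can confirm the window size from a script before launching the Administration Tools. The following Python sketch is illustrative; window_large_enough is a hypothetical helper, and shutil.get_terminal_size() falls back to a default size when no terminal is attached.

```python
import shutil

# Minimum window size stated in the note above.
MIN_COLUMNS, MIN_LINES = 81, 23


def window_large_enough(columns: int, lines: int) -> bool:
    """Return True if the window meets the minimum width and height."""
    return columns >= MIN_COLUMNS and lines >= MIN_LINES


if __name__ == "__main__":
    size = shutil.get_terminal_size()
    print(size.columns, size.lines, window_large_enough(size.columns, size.lines))
```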
If you are using PuTTY, take these steps to make the Administration Tools look like the screen
captures in this document.
1. In a PuTTY window, right-click the title area and select Change Settings.
2. Create or load a saved session.
3. In the Category dialog, click Window > Appearance.
4. In the Font settings, click the Change button.
5. Select Font: Courier New, Font style: Regular, Size: 10.
6. Click Apply.
Repeat these steps for each existing session that you use to run the Administration Tools.
You can also change the translation to support UTF-8.


1. In a PuTTY window, right-click the title area and select Change Settings.
2. Create or load a saved session.
3. In the Category dialog, click Window > Translation.
4. In the Received data assumed to be in which character set drop-down menu, select UTF-8.
5. Click Apply.

Appendix: VMart Example Database Schema, Tables, and Scripts
The Appendix provides detailed information about the VMart example database's schema, tables,
and scripts.
The VMart example database contains three different schemas:
• public
• store
• online_sales

The term schema has several related meanings in HP Vertica:


• In SQL statements, a schema refers to a named namespace for a logical schema.
• Logical schema refers to a set of tables and constraints.
• Physical schema refers to a set of projections.

Each schema contains tables that are created and loaded during database installation. See the
schema maps for a list of tables and their contents:
• public Schema Map
• store Schema Map
• online_sales Schema Map

The VMart database installs with sample scripts that contain SQL commands that represent
queries that might be used in a real business. The sample scripts are available in the Sample
Scripts section of this Appendix. Once you're comfortable running the example queries, you might
want to write your own.

Tables
The three schemas in the VMart database include the following tables:
public Schema           store Schema         online_sales Schema
inventory_fact          store_orders_fact    online_sales_fact
customer_dimension      store_sales_fact     call_center_dimension
date_dimension          store_dimension      online_page_dimension
employee_dimension
product_dimension
promotion_dimension
shipping_dimension
vendor_dimension
warehouse_dimension

public Schema Map


The public schema is a snowflake schema. The following graphic illustrates the public schema
and its relationships with tables in the online_sales and store schemas.


inventory_fact
This table contains information about each product in inventory.


Column Name                     Data Type      NULLs
date_key                        INTEGER        No
product_key                     INTEGER        No
product_version                 INTEGER        No
warehouse_key                   INTEGER        No
qty_in_stock                    INTEGER        No
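
A value of No in the NULLs column means that the column rejects NULL values. The following Python sketch illustrates the effect using an in-memory SQLite database as a stand-in for HP Vertica. The table definition is reconstructed from the column list above, not copied from vmart_define_schema.sql, and the inserted values are made up.

```python
import sqlite3

# Assumption: every column is declared NOT NULL, matching the table above.
DDL = """
CREATE TABLE inventory_fact (
    date_key        INTEGER NOT NULL,
    product_key     INTEGER NOT NULL,
    product_version INTEGER NOT NULL,
    warehouse_key   INTEGER NOT NULL,
    qty_in_stock    INTEGER NOT NULL
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)


def insert_is_rejected(row: tuple) -> bool:
    """Try to insert one row; return True if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO inventory_fact VALUES (?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


print(insert_is_rejected((1, 10, 1, 3, 250)))   # complete row
print(insert_is_rejected((2, 10, 1, 3, None)))  # NULL in qty_in_stock
```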

customer_dimension
This table contains information about all the retail chain's customers.
Column Name                     Data Type      NULLs
customer_key                    INTEGER        No
customer_type                   VARCHAR(16)    Yes
customer_name                   VARCHAR(256)   Yes
customer_gender                 VARCHAR(8)     Yes
title                           VARCHAR(8)     Yes
household_id                    INTEGER        Yes
customer_address                VARCHAR(256)   Yes
customer_city                   VARCHAR(64)    Yes
customer_state                  CHAR(2)        Yes
customer_region                 VARCHAR(64)    Yes
marital_status                  VARCHAR(32)    Yes
customer_age                    INTEGER        Yes
number_of_children              INTEGER        Yes
annual_income                   INTEGER        Yes
occupation                      VARCHAR(64)    Yes
largest_bill_amount             INTEGER        Yes
store_membership_card           INTEGER        Yes
customer_since                  DATE           Yes
deal_stage                      VARCHAR(32)    Yes
deal_size                       INTEGER        Yes
last_deal_update                DATE           Yes

date_dimension
This table contains information about dates. It is generated from a file containing correct date/time
data.
Column Name                     Data Type      NULLs
date_key                        INTEGER        No
date                            DATE           Yes
full_date_description           VARCHAR(18)    Yes
day_of_week                     VARCHAR(9)     Yes
day_number_in_calendar_month    INTEGER        Yes
day_number_in_calendar_year     INTEGER        Yes
day_number_in_fiscal_month      INTEGER        Yes
day_number_in_fiscal_year       INTEGER        Yes
last_day_in_week_indicator      INTEGER        Yes
last_day_in_month_indicator     INTEGER        Yes
calendar_week_number_in_year    INTEGER        Yes
calendar_month_name             VARCHAR(9)     Yes
calendar_month_number_in_year   INTEGER        Yes
calendar_year_month             CHAR(7)        Yes
calendar_quarter                INTEGER        Yes
calendar_year_quarter           CHAR(7)        Yes
calendar_half_year              INTEGER        Yes
calendar_year                   INTEGER        Yes
holiday_indicator               VARCHAR(10)    Yes
weekday_indicator               CHAR(7)        Yes
selling_season                  VARCHAR(32)    Yes

employee_dimension
This table contains information about all the people who work for the retail chain.
Column Name                     Data Type      NULLs
employee_key                    INTEGER        No
employee_gender                 VARCHAR(8)     Yes
courtesy_title                  VARCHAR(8)     Yes
employee_first_name             VARCHAR(64)    Yes
employee_middle_initial         VARCHAR(8)     Yes
employee_last_name              VARCHAR(64)    Yes
employee_age                    INTEGER        Yes
hire_date                       DATE           Yes
employee_street_address         VARCHAR(256)   Yes
employee_city                   VARCHAR(64)    Yes
employee_state                  CHAR(2)        Yes
employee_region                 CHAR(32)       Yes
job_title                       VARCHAR(64)    Yes
reports_to                      INTEGER        Yes
salaried_flag                   INTEGER        Yes
annual_salary                   INTEGER        Yes
hourly_rate                     FLOAT          Yes
vacation_days                   INTEGER        Yes

product_dimension
This table describes all products sold by the department store chain.
Column Name                     Data Type      NULLs
product_key                     INTEGER        No
product_version                 INTEGER        No
product_description             VARCHAR(128)   Yes
sku_number                      CHAR(32)       Yes
category_description            CHAR(32)       Yes
department_description          CHAR(32)       Yes
package_type_description        CHAR(32)       Yes
package_size                    CHAR(32)       Yes
fat_content                     INTEGER        Yes
diet_type                       CHAR(32)       Yes
weight                          INTEGER        Yes
weight_units_of_measure         CHAR(32)       Yes
shelf_width                     INTEGER        Yes
shelf_height                    INTEGER        Yes
shelf_depth                     INTEGER        Yes
product_price                   INTEGER        Yes
product_cost                    INTEGER        Yes
lowest_competitor_price         INTEGER        Yes
highest_competitor_price        INTEGER        Yes
average_competitor_price        INTEGER        Yes
discontinued_flag               INTEGER        Yes

promotion_dimension
This table describes every promotion ever done by the retail chain.
Column Name                     Data Type      NULLs
promotion_key                   INTEGER        No
promotion_name                  VARCHAR(128)   Yes
price_reduction_type            VARCHAR(32)    Yes
promotion_media_type            VARCHAR(32)    Yes
ad_type                         VARCHAR(32)    Yes
display_type                    VARCHAR(32)    Yes
coupon_type                     VARCHAR(32)    Yes
ad_media_name                   VARCHAR(32)    Yes
display_provider                VARCHAR(128)   Yes
promotion_cost                  INTEGER        Yes
promotion_begin_date            DATE           Yes
promotion_end_date              DATE           Yes

shipping_dimension
This table contains information about shipping companies that the retail chain uses.
Column Name                     Data Type      NULLs
shipping_key                    INTEGER        No
ship_type                       CHAR(30)       Yes
ship_mode                       CHAR(10)       Yes
ship_carrier                    CHAR(20)       Yes

vendor_dimension
This table contains information about each vendor that provides products sold through the retail
chain.
Column Name      | Data Type   | NULLs
-----------------+-------------+------
vendor_key       | INTEGER     | No
vendor_name      | VARCHAR(64) | Yes
vendor_address   | VARCHAR(64) | Yes
vendor_city      | VARCHAR(64) | Yes
vendor_state     | CHAR(2)     | Yes
vendor_region    | VARCHAR(32) | Yes
deal_size        | INTEGER     | Yes
last_deal_update | DATE        | Yes

warehouse_dimension
This table provides information about each of the chain's warehouses.
Column Name       | Data Type    | NULLs
------------------+--------------+------
warehouse_key     | INTEGER      | No
warehouse_name    | VARCHAR(20)  | Yes
warehouse_address | VARCHAR(256) | Yes
warehouse_city    | VARCHAR(60)  | Yes
warehouse_state   | CHAR(2)      | Yes
warehouse_region  | VARCHAR(32)  | Yes

store Schema Map


The store schema is a snowflake schema that contains information about the retail chain's
brick-and-mortar stores. The following graphic illustrates the store schema and its relationship with
tables in the public schema.

store_orders_fact
This table contains information about all orders made at the company's brick-and-mortar stores.
Column Name            | Data Type   | NULLs
-----------------------+-------------+------
product_key            | INTEGER     | No
product_version        | INTEGER     | No
store_key              | INTEGER     | No
vendor_key             | INTEGER     | No
employee_key           | INTEGER     | No
order_number           | INTEGER     | No
date_ordered           | DATE        | Yes
date_shipped           | DATE        | Yes
expected_delivery_date | DATE        | Yes
date_delivered         | DATE        | Yes
quantity_ordered       | INTEGER     | Yes
quantity_delivered     | INTEGER     | Yes
shipper_name           | VARCHAR(32) | Yes
unit_price             | INTEGER     | Yes
shipping_cost          | INTEGER     | Yes
total_order_cost       | INTEGER     | Yes
quantity_in_stock      | INTEGER     | Yes
reorder_level          | INTEGER     | Yes
overstock_ceiling      | INTEGER     | Yes

store_sales_fact
This table contains information about all sales made at the company's brick-and-mortar stores.
Column Name                | Data Type   | NULLs
---------------------------+-------------+------
date_key                   | INTEGER     | No
product_key                | INTEGER     | No
product_version            | INTEGER     | No
store_key                  | INTEGER     | No
promotion_key              | INTEGER     | No
customer_key               | INTEGER     | No
employee_key               | INTEGER     | No
pos_transaction_number     | INTEGER     | No
sales_quantity             | INTEGER     | Yes
sales_dollar_amount        | INTEGER     | Yes
cost_dollar_amount         | INTEGER     | Yes
gross_profit_dollar_amount | INTEGER     | Yes
transaction_type           | VARCHAR(16) | Yes
transaction_time           | TIME        | Yes
tender_type                | VARCHAR(8)  | Yes

store_dimension
This table contains information about each brick-and-mortar store within the retail chain.
Column Name            | Data Type    | NULLs
-----------------------+--------------+------
store_key              | INTEGER      | No
store_name             | VARCHAR(64)  | Yes
store_number           | INTEGER      | Yes
store_address          | VARCHAR(256) | Yes
store_city             | VARCHAR(64)  | Yes
store_state            | CHAR(2)      | Yes
store_region           | VARCHAR(64)  | Yes
floor_plan_type        | VARCHAR(32)  | Yes
photo_processing_type  | VARCHAR(32)  | Yes
financial_service_type | VARCHAR(32)  | Yes
selling_square_footage | INTEGER      | Yes
total_square_footage   | INTEGER      | Yes
first_open_date        | DATE         | Yes
last_remodel_date      | DATE         | Yes
number_of_employees    | INTEGER      | Yes
annual_shrinkage       | INTEGER      | Yes
foot_traffic           | INTEGER      | Yes
monthly_rent_cost      | INTEGER      | Yes

online_sales Schema Map


The online_sales schema is a snowflake schema that contains information about the retail
chains. The following graphic illustrates the online_sales schema and its relationship with tables
in the public schema.

online_sales_fact
This table describes all the items purchased through the online store front.
Column Name                | Data Type   | NULLs
---------------------------+-------------+------
sale_date_key              | INTEGER     | No
ship_date_key              | INTEGER     | No
product_key                | INTEGER     | No
product_version            | INTEGER     | No
customer_key               | INTEGER     | No
call_center_key            | INTEGER     | No
online_page_key            | INTEGER     | No
shipping_key               | INTEGER     | No
warehouse_key              | INTEGER     | No
promotion_key              | INTEGER     | No
pos_transaction_number     | INTEGER     | No
sales_quantity             | INTEGER     | Yes
sales_dollar_amount        | FLOAT       | Yes
ship_dollar_amount         | FLOAT       | Yes
net_dollar_amount          | FLOAT       | Yes
cost_dollar_amount         | FLOAT       | Yes
gross_profit_dollar_amount | FLOAT       | Yes
transaction_type           | VARCHAR(16) | Yes

call_center_dimension
This table describes all the chain's call centers.
Column Name     | Data Type    | NULLs
----------------+--------------+------
call_center_key | INTEGER      | No
cc_closed_date  | DATE         | Yes
cc_open_date    | DATE         | Yes
cc_name         | VARCHAR(50)  | Yes
cc_class        | VARCHAR(50)  | Yes
cc_employees    | INTEGER      | Yes
cc_hours        | CHAR(20)     | Yes
cc_manager      | VARCHAR(40)  | Yes
cc_address      | VARCHAR(256) | Yes
cc_city         | VARCHAR(64)  | Yes
cc_state        | CHAR(2)      | Yes
cc_region       | VARCHAR(64)  | Yes

online_page_dimension
This table describes all the pages in the online store front.

Column Name      | Data Type    | NULLs
-----------------+--------------+------
online_page_key  | INTEGER      | No
start_date       | DATE         | Yes
end_date         | DATE         | Yes
page_number      | INTEGER      | Yes
page_description | VARCHAR(100) | Yes
page_type        | VARCHAR(100) | Yes

Sample Scripts
You can create your own queries, but the VMart example directory includes sample query script
files to help you get started quickly.
You can find the following sample scripts in /opt/vertica/examples/VMart_Schema.
To run any of the scripts, enter:
=> \i <script_name>

Alternatively, type the commands from the script file manually.


Note: The data that your queries return might differ from the example output shown in this
guide because the sample data generator is random.
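The scripts can also be run from the operating system shell rather than from inside a vsql session. The following is a minimal sketch, assuming that vsql is on your PATH and can connect to the VMart database with its default connection settings. The list_query_scripts helper and the SCRIPT_DIR variable are illustrative names, not part of HP Vertica.

```shell
# Assumption: the scripts live in the directory named in this guide.
SCRIPT_DIR=${SCRIPT_DIR:-/opt/vertica/examples/VMart_Schema}

# Print the sample query scripts found in a directory, one per line.
list_query_scripts() {
    for f in "$1"/vmart_query_*.sql; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Run each script with vsql -f, which reads commands from a file
# in the same way that \i does inside an interactive session.
if command -v vsql >/dev/null 2>&1; then
    list_query_scripts "$SCRIPT_DIR" | while IFS= read -r script; do
        echo "== $script"
        vsql -f "$script"
    done
fi
```

If vsql is not installed on the host, the sketch does nothing.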

vmart_query_01.sql
-- vmart_query_01.sql
-- FROM clause subquery
-- Return the values for five products with the
-- lowest-fat content in the Dairy department
SELECT fat_content
FROM (
  SELECT DISTINCT fat_content
  FROM product_dimension
  WHERE department_description
  IN ('Dairy') ) AS food
ORDER BY fat_content
LIMIT 5;

Output

 fat_content
-------------
          80
          81
          82
          83
          84
(5 rows)

vmart_query_02.sql
-- vmart_query_02.sql
-- WHERE clause subquery
-- Asks for all orders placed by stores located in Massachusetts
-- and by vendors located elsewhere before March 1, 2003:
SELECT order_number, date_ordered
FROM store.store_orders_fact orders
WHERE orders.store_key IN (
  SELECT store_key
  FROM store.store_dimension
  WHERE store_state = 'MA')
AND orders.vendor_key NOT IN (
  SELECT vendor_key
  FROM public.vendor_dimension
  WHERE vendor_state = 'MA')
AND date_ordered < '2003-03-01';

Output
 order_number | date_ordered
--------------+--------------
        53019 | 2003-02-10
       222168 | 2003-02-05
       160801 | 2003-01-08
       106922 | 2003-02-07
       246465 | 2003-02-10
       234218 | 2003-02-03
       263119 | 2003-01-04
        73015 | 2003-01-01
       233618 | 2003-02-10
        85784 | 2003-02-07
       146607 | 2003-02-07
       296193 | 2003-02-05
        55052 | 2003-01-05
       144574 | 2003-01-05
       117412 | 2003-02-08
       276288 | 2003-02-08
       185103 | 2003-01-03
       282274 | 2003-01-01
       245300 | 2003-02-06
       143526 | 2003-01-04
        59564 | 2003-02-06
...

vmart_query_03.sql
-- vmart_query_03.sql
-- Noncorrelated subquery
-- Requests female and male customers with the maximum
-- annual income from customers
SELECT customer_name, annual_income
FROM public.customer_dimension
WHERE (customer_gender, annual_income) IN (
  SELECT customer_gender, MAX(annual_income)
  FROM public.customer_dimension
  GROUP BY customer_gender);

Output
 customer_name    | annual_income
------------------+---------------
 James M. McNulty |        999979
 Emily G. Vogel   |        999998
(2 rows)

vmart_query_04.sql
-- vmart_query_04.sql
-- IN predicate
-- Find all products supplied by stores in MA
SELECT DISTINCT s.product_key, p.product_description
FROM store.store_sales_fact s, public.product_dimension p
WHERE s.product_key = p.product_key
AND s.product_version = p.product_version AND s.store_key IN (
SELECT store_key
FROM store.store_dimension
WHERE store_state = 'MA')
ORDER BY s.product_key;

Output
 product_key | product_description
-------------+----------------------------------------
           1 | Brand #1 butter
           1 | Brand #2 bagels
           2 | Brand #3 lamb
           2 | Brand #4 brandy
           2 | Brand #5 golf clubs
           2 | Brand #6 chicken noodle soup
           3 | Brand #10 ground beef
           3 | Brand #11 vanilla ice cream
           3 | Brand #7 canned chicken broth
           3 | Brand #8 halibut
           3 | Brand #9 camera case
           4 | Brand #12 rash ointment
           4 | Brand #13 low fat milk
           4 | Brand #14 chocolate chip cookies
           4 | Brand #15 silver polishing cream
           5 | Brand #16 cod
           5 | Brand #17 band aids
           6 | Brand #18 bananas
           6 | Brand #19 starch
           6 | Brand #20 vegetable soup
           6 | Brand #21 bourbon
...

vmart_query_05.sql
-- vmart_query_05.sql
-- EXISTS predicate
-- Get a list of all the orders placed by all stores on
-- January 2, 2003 for the vendors with records in the
-- vendor_dimension table
SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact
WHERE EXISTS (
  SELECT 1
  FROM public.vendor_dimension
  WHERE public.vendor_dimension.vendor_key = store.store_orders_fact.vendor_key)
AND date_ordered = '2003-01-02';

Output
 store_key | order_number | date_ordered
-----------+--------------+--------------
        98 |       151837 | 2003-01-02
       123 |       238372 | 2003-01-02
       242 |       263973 | 2003-01-02
       150 |       226047 | 2003-01-02
       247 |       232273 | 2003-01-02
       203 |       171649 | 2003-01-02
       129 |        98723 | 2003-01-02
        80 |       265660 | 2003-01-02
       231 |       271085 | 2003-01-02
       149 |        12169 | 2003-01-02
       141 |       201153 | 2003-01-02
         1 |        23715 | 2003-01-02
       156 |        98182 | 2003-01-02
        44 |       229465 | 2003-01-02
       178 |       141869 | 2003-01-02
       134 |        44410 | 2003-01-02
       141 |       129839 | 2003-01-02
       205 |        54138 | 2003-01-02
       113 |        63358 | 2003-01-02
        99 |        50142 | 2003-01-02
        44 |       131255 | 2003-01-02
...

vmart_query_06.sql
-- vmart_query_06.sql
-- EXISTS predicate
-- Orders placed by the vendor who got the best deal
-- on January 4, 2004
SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact ord, public.vendor_dimension vd
WHERE ord.vendor_key = vd.vendor_key
AND vd.deal_size IN (
  SELECT MAX(deal_size)
  FROM public.vendor_dimension)
AND date_ordered = '2004-01-04';

Output
 store_key | order_number | date_ordered
-----------+--------------+--------------
        45 |       202416 | 2004-01-04
        24 |       250295 | 2004-01-04
       121 |       251417 | 2004-01-04
       198 |        75716 | 2004-01-04
       166 |        36008 | 2004-01-04
        27 |       150241 | 2004-01-04
       148 |       182207 | 2004-01-04
         9 |       188567 | 2004-01-04
       113 |        66017 | 2004-01-04
...

vmart_query_07.sql
-- vmart_query_07.sql
-- Multicolumn subquery
-- Which products have the highest cost,
-- grouped by category and department
SELECT product_description, sku_number, department_description
FROM public.product_dimension
WHERE (category_description, department_description, product_cost) IN (
  SELECT category_description, department_description,
  MAX(product_cost) FROM product_dimension
  GROUP BY category_description, department_description);

Output

 product_description        | sku_number | department_description
----------------------------+------------+------------------------
 Brand #601 steak           | SKU-#601   | Meat
 Brand #649 brooms          | SKU-#649   | Cleaning supplies
 Brand #677 veal            | SKU-#677   | Meat
 Brand #1371 memory card    | SKU-#1371  | Photography
 Brand #1761 catfish        | SKU-#1761  | Seafood
 Brand #1810 frozen pizza   | SKU-#1810  | Frozen Goods
 Brand #1979 canned peaches | SKU-#1979  | Canned Goods
 Brand #2097 apples         | SKU-#2097  | Produce
 Brand #2287 lens cap       | SKU-#2287  | Photography
...

vmart_query_08.sql
-- vmart_query_08.sql
-- Using pre-join projections to answer subqueries
-- between online_sales_fact and online_page_dimension
SELECT page_description, page_type, start_date, end_date
FROM online_sales.online_sales_fact f, online_sales.online_page_dimension d
WHERE f.online_page_key = d.online_page_key
AND page_number IN
(SELECT MAX(page_number)
FROM online_sales.online_page_dimension)
AND page_type = 'monthly' AND start_date = '2003-06-02';

Output
 page_description           | page_type | start_date | end_date
----------------------------+-----------+------------+------------
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
 Online Page Description #1 | monthly   | 2003-06-02 | 2003-06-11
(12 rows)

vmart_query_09.sql
-- vmart_query_09.sql
-- Equi join
-- Joins online_sales_fact table and the call_center_dimension
-- table with the ON clause
SELECT sales_quantity, sales_dollar_amount, transaction_type, cc_name
FROM online_sales.online_sales_fact
INNER JOIN online_sales.call_center_dimension
ON (online_sales.online_sales_fact.call_center_key
  = online_sales.call_center_dimension.call_center_key
  AND sale_date_key = 156)
ORDER BY sales_dollar_amount DESC;

Output
 sales_quantity | sales_dollar_amount | transaction_type | cc_name
----------------+---------------------+------------------+-----------------
              7 |                 589 | purchase         | Central Midwest
              8 |                 589 | purchase         | South Midwest
              8 |                 589 | purchase         | California
              1 |                 587 | purchase         | New England
              1 |                 586 | purchase         | Other
              1 |                 584 | purchase         | New England
              4 |                 584 | purchase         | New England
              7 |                 581 | purchase         | Mid Atlantic
              5 |                 579 | purchase         | North Midwest
              8 |                 577 | purchase         | North Midwest
              4 |                 577 | purchase         | Central Midwest
              2 |                 575 | purchase         | Hawaii/Alaska
              4 |                 573 | purchase         | NY Metro
              4 |                 572 | purchase         | Central Midwest
              1 |                 570 | purchase         | Mid Atlantic
              9 |                 569 | purchase         | Southeastern
              1 |                 569 | purchase         | NY Metro
              5 |                 567 | purchase         | Other
              7 |                 567 | purchase         | Hawaii/Alaska
              9 |                 567 | purchase         | South Midwest
              1 |                 566 | purchase         | New England
...

Installation Guide

Installation Overview and Checklist


This page provides an overview of installation tasks. Carefully review and follow the instructions in
all sections in this topic.

Important Notes
• HP Vertica supports only one running database per cluster.

• HP Vertica supports installation on one, two, or multiple nodes. The steps for Installing HP
  Vertica are the same, no matter how many nodes are in the cluster.

• Prerequisites listed in Before You Install HP Vertica are required for all HP Vertica
  configurations.

• Only one instance of HP Vertica can be running on a host at any time.

• To run the install_vertica script, or to add, update, or delete nodes, you must be
  logged in as root, or use sudo as a user with all privileges. You must run the script for all
  installations, including upgrades and single-node installations.

Installation Scenarios
The four main scenarios for installing HP Vertica on hosts are:
• A single-node install, where HP Vertica is installed on a single host as a localhost process. This
  form of install cannot be expanded to more hosts later on and is typically used for development
  or evaluation purposes.

• Installing to a cluster of physical host hardware. This is the most common scenario when
  deploying HP Vertica in a testing or production environment.

• Installing on Amazon Web Services (AWS). When you choose the recommended Amazon
  Machine Image (AMI), Vertica is installed when you create your instances. For the
  AWS-specific installation procedure, see Installing and Running Vertica on AWS: The Detailed
  Procedure rather than using the steps for installation and upgrade that appear in this guide.

• Installing to a local cluster of virtual host hardware. This is similar to installing on physical
  hosts, but with network configuration differences.

Before You Install


Before You Install HP Vertica describes how to construct a hardware platform and prepare Linux for
HP Vertica installation.
These preliminary steps are broken into two categories:
• Configuring Hardware and Installing Linux

• Configuring the Network

Install or Upgrade HP Vertica


Once you have completed the steps in the Before You Install HP Vertica section, you are ready to
run the install script.
Installing HP Vertica describes how to:
• Back up any existing databases.

• Download and install the HP Vertica RPM package.

• Install a cluster using the install_vertica script.

• [Optional] Create a properties file that lets you install HP Vertica silently.

Note: This guide provides additional manual procedures in case you encounter installation
problems.

Upgrading HP Vertica to a New Version describes the steps for upgrading to a more recent
version of the software.

Upgrading HP Vertica from Community Edition to Enterprise Edition describes the steps for
upgrading HP Vertica to an evaluation or Enterprise Edition version of the software.

Post-Installation Tasks
After You Install HP Vertica describes subsequent steps to take after you've run the installation
script. Some of the steps can be skipped based on your needs:

• Install the license key.

• Verify that kernel and user parameters are correctly set.

• Install the vsql client application on non-cluster hosts.

• Resolve any SUSE10 issues during spread configuration.

• Use the HP Vertica documentation online, or download and install HP Vertica documentation.
  Find the online documentation and documentation packages to download at
  http://www.vertica.com/documentation.

• Install client drivers.

• Extend your installation with HP Vertica packages.

• Install or upgrade the Management Console.

Get started!
• Read the Concepts Guide for a high-level overview of the HP Vertica Analytics Platform.

• Proceed to Installing and Connecting to the VMart Example Database in the Getting Started
  Guide, where you will be guided through setting up a database, loading sample data, and running
  sample queries.

About HP Vertica-created Linux Users and Their Privileges
This topic describes the Linux accounts that the installer creates and configures so HP Vertica can
run. When you install HP Vertica, the installation script optionally creates the following Linux user
and group:
• dbadmin: Administrative user

• verticadba: Group for DBA users

dbadmin and verticadba are the default names. If you want to change what these Linux accounts
are called, you can do so using the installation script. See Installing HP Vertica with the
install_vertica Script for details.

Before You Install HP Vertica


The user who runs the HP Vertica installer must have sudo privileges on all cluster nodes. See the
following topics for more information:
• Installation Overview and Checklist

• General Hardware and OS Requirements and Recommendations

When You Install HP Vertica


The Linux dbadmin user owns the database catalog and data storage on disk. When you run the
install script, HP Vertica creates this user on each node in the database cluster. It also adds
dbadmin to the Linux dbadmin and verticadba groups, and configures the account as follows:
• Configures and authorizes dbadmin for passwordless SSH between all cluster nodes. SSH
  must be installed and configured to allow passwordless logins. See Enable Secure Shell (SSH)
  Logins.

• Sets the dbadmin user's shell to /bin/bash, which is required to run scripts such as
  install_vertica and the Administration Tools.

• Provides read-write-execute permissions on the following directories:

  ◦ /opt/vertica/*

  ◦ /home/dbadmin: the default directory for database data and catalog files (configurable
    through the install script)

Note: The HP Vertica installation script also creates an HP Vertica database superuser named
dbadmin. They share the same name, but they are not the same; one is a Linux user and the
other is an HP Vertica user. See DBADMIN User in the Administrator's Guide for information
about the database superuser.

After You Install HP Vertica


Root or sudo privileges are not required to start or run HP Vertica after the installation process
completes.
The dbadmin user can log in and perform HP Vertica tasks, such as creating a database,
installing/changing the license key, or installing drivers. If dbadmin wants database directories in a
location that differs from the default, the root user (or a user with sudo privileges) must create the
requested directories and change ownership to the dbadmin user.
HP Vertica prevents administration by users other than the dbadmin user (or the user name you
specified during the installation process, if not dbadmin). Only this user can run Administration
Tools.

See Also
• Installation Overview and Checklist

• Before You Install HP Vertica

• Platform Requirements and Recommendations

• Enable Secure Shell (SSH) Logins

Before You Install HP Vertica


Complete all of the tasks in this section before you install HP Vertica. When you have completed
this section, proceed to Installing HP Vertica.

Platform Requirements and Recommendations


You must verify that your servers meet the platform requirements described in Supported
Platforms. The Supported Platforms topics detail supported versions for the following:
• OS for Server and Management Console (MC)

• Supported Browsers for MC

• HP Vertica driver compatibility

• Hadoop

• Various plug-ins

BASH Shell
All shell scripts included in HP Vertica must run under the BASH shell. If you are on a Debian
system, then the default shell can be DASH. DASH is not supported. Change the shell for root and
for the dbadmin user to BASH with the chsh command.
For example:
# getent passwd | grep root
root:x:0:0:root:/root:/bin/dash
# chsh
Changing shell for root.
New shell [/bin/dash]: /bin/bash
Shell changed.

Then, as root, change the symbolic link for /bin/sh from /bin/dash to /bin/bash:
# rm /bin/sh
# ln -s /bin/bash /bin/sh

Log out and back in for the change to take effect.
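Before you change anything, you can confirm which accounts actually need attention. The following is a minimal sketch that reports the login shell recorded for root and dbadmin. The function names are illustrative and not part of HP Vertica, and the sketch assumes that getent is available, as it is in the example above.

```shell
# Hypothetical helper names; not part of HP Vertica.

# Print the login shell (seventh field) from a passwd-format line.
shell_from_passwd_line() {
    printf '%s\n' "$1" | cut -d: -f7
}

# Print the login shell recorded for a user; fail if the user is unknown.
login_shell_of() {
    entry=$(getent passwd "$1") || return 1
    shell_from_passwd_line "$entry"
}

# Succeed if the given shell path is bash.
is_bash() {
    case "$1" in
        */bash) return 0 ;;
        *) return 1 ;;
    esac
}

for user in root dbadmin; do
    sh_path=$(login_shell_of "$user") || { echo "$user: no such account"; continue; }
    if is_bash "$sh_path"; then
        echo "$user: $sh_path (ok)"
    else
        echo "$user: $sh_path (change with chsh)"
    fi
done
```

Run the check on every host in the cluster, because the default shell can differ from host to host.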

Install the Latest Vendor Specific System Software


Install the latest vendor drivers for your hardware. For HP Servers, update to the latest versions for:

• HP ProLiant Smart Array Controller Driver (cciss)

• Smart Array Controller Firmware

• HP Array Configuration Utility (HP ACU CLI)

Data Storage Recommendations


• All internal drives connect to a single RAID controller.

• The RAID array should form one hardware RAID device as a contiguous /data volume.

Validation Utilities
HP Vertica provides several validation utilities that validate the performance on prospective hosts.
The utilities are installed when you install the HP Vertica RPM, but you can use them before you
run the install_vertica script. See Validation Scripts for more details on running the utilities and
verifying that your hosts meet the recommended requirements.

General Hardware and OS Requirements and Recommendations


Hardware Recommendations
The HP Vertica Analytic Database is based on a massively parallel processing (MPP), shared-nothing architecture, in which the query processing workload is divided among all nodes of the
Vertica database. HP highly recommends using a homogeneous hardware configuration for your
HP Vertica cluster; that is, each node of the cluster should be similar in CPU, clock speed, number
of cores, memory, and operating system version.
Note that HP has not tested HP Vertica on clusters made up of nodes with disparate hardware
specifications. While it is expected that an HP Vertica database would functionally work in a mixed
hardware configuration, performance will most certainly be limited to that of the slowest node in the
cluster.
Detailed hardware recommendations are available in the HP Vertica Hardware Planning Guide.

Platform OS Requirements
Important! Deploy HP Vertica as the only active process on each host, other than Linux
processes or software explicitly approved by HP Vertica. HP Vertica cannot be collocated with
other software. Remove or disable all non-essential applications from cluster hosts.

You must verify that your servers meet the platform requirements described in HP Vertica Server
and HP Vertica Management Console.

Verify Sudo
HP Vertica uses the sudo command during installation and some administrative tasks. Ensure that
sudo is available on all hosts with the following command:
# which sudo
/usr/bin/sudo

If sudo is not installed, browse to the Sudo Main Page and install sudo on all hosts.
When you use sudo to install HP Vertica, the user that performs the installation must have
privileges on all nodes in the cluster.
Configuring sudo with privileges for the individual commands can be a tedious and error-prone
process; thus, the HP Vertica documentation does not include every possible sudo command that
you can include in the sudoers file. Instead, HP recommends that you temporarily elevate the sudo
user to have all privileges for the duration of the install.
Note: See the sudoers and visudo man pages for the details on how to write/modify a sudoers
file.
To allow root sudo access on all commands as any user on any machine, use visudo as root to edit
the /etc/sudoers file and add this line:
## Allow root to run any commands anywhere
root    ALL=(ALL) ALL

After the installation completes, remove (or reset) sudo privileges to the pre-installation settings.

Prepare Disk Storage Locations


You must create and specify directories in which to store your catalog and data files (physical
schema). You can specify these locations when you install or configure the database, or later
during database operations.
The directory you specify for your catalog files (the catalog path) is used across all nodes. That is, if
you specify /home/catalog for your catalog files, HP Vertica will use /home/catalog as the catalog
path on all nodes. The catalog directory should always be separate from any data files.
Note: Do not use a shared directory for more than one node. Data and catalog directories must
be distinct for each node. Multiple nodes must not be allowed to write to the same data or
catalog directory.
The same is true for your data path. If you specify that your data should be stored in /home/data,
HP Vertica ensures this is the data path used on all database nodes.
Do not use a single directory to contain both catalog and data files. You can store the catalog and
data directories on different drives, which can be either on drives local to the host (recommended for
the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.
Both the catalog and data directories must be owned by the database administrator.
Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your
database. The database creation process in admintools creates the actual directories, but the
parent directory must exist on all nodes.
You do not need to specify a disk storage location during installation. However, you can do so by
using the --data-dir parameter of the install_vertica script. See Specifying Disk Storage Location
During Installation.
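Two of the rules above are easy to check before you create a database: the parent directory of each path must already exist, and the catalog and data paths must not be the same directory. The following is a minimal sketch of such a check. The function names are illustrative, and the paths are the example values used in this topic.

```shell
# Illustrative pre-flight check; the function names are hypothetical.

# Succeed if the parent directory of a planned path already exists.
parent_exists() {
    [ -d "$(dirname "$1")" ]
}

# Succeed if the catalog and data paths are different.
paths_distinct() {
    [ "$1" != "$2" ]
}

# Example paths taken from this topic; adjust them for your cluster.
catalog=/home/catalog
data=/home/data

paths_distinct "$catalog" "$data" || echo "catalog and data paths must differ"
for path in "$catalog" "$data"; do
    if parent_exists "$path"; then
        echo "$path: parent directory exists"
    else
        echo "$path: parent directory is missing"
    fi
done
```

Because the parent directory must exist on all nodes, run the check on each node rather than only on the host where you run admintools.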

See Also
• Specifying Disk Storage Location on MC

• Specifying Disk Storage Location During Database Creation

• Configuring Disk Usage to Optimize Performance

• Using Shared Storage With HP Vertica

Disk Space Requirements for HP Vertica


In addition to actual data stored in the database, HP Vertica requires disk space for several data
reorganization operations, such as mergeout and managing nodes in the cluster. For best results,
HP recommends that disk utilization per node be no more than sixty percent (60%) for a K-Safe=1
database to allow such operations to proceed.
In addition, disk space is temporarily required by certain query execution operators, such as hash
joins and sorts, in the case when they cannot be completed in memory (RAM). Such operators
might be encountered during queries, recovery, refreshing projections, and so on. The amount of
disk space needed (known as temp space) depends on the nature of the queries, amount of data on
the node and number of concurrent users on the system. By default, any unused disk space on the
data disk can be used as temp space. However, HP recommends provisioning temp space
separate from data disk space. See Configuring Disk Usage to Optimize Performance.
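The sixty percent guideline can be checked per node with df. The following is a minimal sketch, assuming a POSIX df and awk. The function names are illustrative, and the path passed to check_disk is an assumption: substitute the mount point that holds your data directory.

```shell
# Illustrative sketch; the helper names are hypothetical.

# Succeed if used/total is strictly greater than the limit (integer percent).
exceeds_limit() {
    used=$1 total=$2 limit=$3
    [ $((used * 100)) -gt $((total * limit)) ]
}

# Report whether the filesystem that holds a path is above the limit.
# df -Pk prints one data line: filesystem, 1K blocks, used, available, ...
check_disk() {
    path=$1
    limit=${2:-60}
    usage=$(df -Pk "$path" | awk 'NR == 2 { print $3, $2 }') || return 2
    used=${usage% *}
    total=${usage#* }
    if exceeds_limit "$used" "$total" "$limit"; then
        echo "$path: above ${limit}% utilization"
    else
        echo "$path: within ${limit}% utilization"
    fi
}

# Assumption: "/" stands in for the data volume on this host.
check_disk /
```

The comparison uses used blocks against total blocks, so the figure can differ slightly from the capacity column that df prints, which excludes blocks reserved for root.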

Configuring the Network


This group of steps involves configuring the network. These steps differ depending on your
installation scenario. A single-node installation requires little network configuration, since the single
instance of the HP Vertica server does not need to communicate with other nodes in a cluster. For
cluster and cloud install scenarios, you must make several decisions regarding your configuration.
HP Vertica supports server configuration with multiple network interfaces. For example, you might
want to use one as a private network interface for internal communication among cluster hosts (the
ones supplied via the --hosts option to install_vertica) and a separate one for client
connections.
Important: HP Vertica performs best when all nodes are on the same subnet and have the
same broadcast address for one or more interfaces. A cluster that has nodes on more than one
subnet can experience lower performance due to the network latency associated with a
multi-subnet system at high network utilization levels.

Important Notes
• Network configuration is exactly the same for single nodes as for multi-node clusters, with one
  special exception. If you install HP Vertica on a single host machine that is to remain a
  permanent single-node configuration (such as for development or Proof of Concept), you can
  install HP Vertica using localhost or the loopback IP (typically 127.0.0.1) as the value for
  --hosts. Do not use the hostname localhost in a node definition if you are likely to add nodes to
  the configuration later.

• If you are using a host with multiple network interfaces, configure HP Vertica to use the address
  that is assigned to the NIC that is connected to the other cluster hosts.

• Use a dedicated gigabit switch. If you do not, performance could be severely affected.

• Do not use DHCP dynamically-assigned IP addresses for the private network. Use only static
  addresses or permanently-leased DHCP addresses.

Optionally Run Spread on Separate Control Network


If your query workloads are network intensive, you can use the --control-network parameter with
the install_vertica script (see Installing HP Vertica with the install_vertica Script) to allow
spread communications to be configured on a subnet that is different from other HP Vertica data
communications.

The --control-network parameter accepts either the default value or a broadcast network IP
address (for example, 192.168.10.255 ).

Configure SSH
l

Verify that root can use Secure Shell (SSH) to log in (ssh) to all hosts that are included in the
cluster. SSH (SSH client) is a program for logging into a remote machine and for running
commands on a remote machine.

If you do not already have SSH installed on all hosts, log in as root on each host and install it
before installing HP Vertica. You can download a free version of the SSH connectivity tools from
OpenSSH.

Make sure that /dev/pts is mounted. Installing HP Vertica on a host that is missing the mount
point /dev/pts could result in the following error when you create a database:
TIMEOUT ERROR: Could not login with SSH. Here is what SSH said:Last login: Sat Dec 15
18:05:35 2007 from node01

Allow Passwordless SSH Access for the Dbadmin User


The dbadmin user must be authorized for passwordless ssh. In typical installs, you won't need to
change anything; however, if you set up your system to disallow passwordless login, you'll need to
enable it for the dbadmin user. See Enable Secure Shell (SSH) Logins.

Ensure Ports Are Available


Verify that ports required by HP Vertica are not in use by running the following command as the root
user and comparing it with the ports required in Firewall Considerations below:
netstat -atupn
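
The comparison described above can be scripted. The following is a minimal sketch, not part of the HP Vertica tooling: the function name ports_in_use and the REQUIRED_PORTS variable are hypothetical, and the port list is copied by hand from the table in Firewall Considerations. It reads netstat -atupn output on standard input and prints each required protocol/port pair that already appears as a local address.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HP Vertica).
# Required ports as protocol/port pairs, copied from Firewall Considerations.
REQUIRED_PORTS="tcp/22 tcp/5433 tcp/5434 udp/5433 tcp/5444 tcp/5450 tcp/4803 udp/4803 udp/4804 udp/6543"

# Reads `netstat -atupn` output on stdin and prints each required
# protocol/port pair that is already bound as a local address.
ports_in_use() {
    awk -v want="$REQUIRED_PORTS" '
        BEGIN {
            n = split(want, w, " ")
            for (i = 1; i <= n; i++) need[w[i]] = 1
        }
        $1 ~ /^(tcp|udp)/ {
            proto = substr($1, 1, 3)      # tcp6 and udp6 count as tcp and udp
            k = split($4, part, ":")      # local address is host:port
            key = proto "/" part[k]
            if ((key in need) && !(key in seen)) {
                seen[key] = 1
                print key
            }
        }'
}

# Usage (as root): netstat -atupn | ports_in_use
```

Port 22 normally appears in the output because sshd is expected to be listening on it; any other pair that is printed deserves a closer look before you install.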

Firewall Considerations
HP Vertica requires several ports to be open on the local network. HP Vertica does not recommend
placing a firewall between nodes (all nodes should be behind a firewall), but if you must use a
firewall between nodes, ensure the following ports are available:

Port   Protocol   Service                         Notes
22     TCP        sshd                            Required by Administration Tools and the Management
                                                  Console Cluster Installation wizard.
5433   TCP        HP Vertica                      HP Vertica client (vsql, ODBC, JDBC, etc) port.
5434   TCP        HP Vertica                      Intra-cluster communication. HP Vertica opens the
                                                  HP Vertica client port +1 (5434 by default) for
                                                  intra-cluster communication, such as during a plan.
                                                  If the port +1 from the default client port is not
                                                  available, then HP Vertica opens a random port for
                                                  intra-cluster communication.
5433   UDP        HP Vertica                      HP Vertica spread monitoring.
5444   TCP        HP Vertica Management Console   MC-to-node and node-to-node (agent) communications
                                                  port. See Changing MC or Agent Ports.
5450   TCP        HP Vertica Management Console   Port used to connect to MC from a web browser and
                                                  allows communication from nodes to the MC
                                                  application/web server. See Connecting to
                                                  Management Console.
4803   TCP        Spread                          Client connections.
4803   UDP        Spread                          Daemon to Daemon connections.
4804   UDP        Spread                          Daemon to Daemon connections.
6543   UDP        Spread                          Monitor to Daemon connection.

Operating System Configuration Task Overview


This topic provides a high-level overview of the OS settings required for HP Vertica. Each item
provides a link to additional details about the setting and detailed steps on making the configuration
change. The installer tests for all of these settings and provides hints, warnings, and failures if the
current configuration does not meet HP Vertica requirements.

Before you Install the Operating System


Configuration               Description
Supported Platforms         Verify that your servers meet the platform requirements described in
                            HP Vertica 7.1 Supported Platforms. Unsupported operating systems are
                            detected by the installer.
LVM                         Linux Logical Volume Manager (LVM) is not supported on partitions that
                            contain HP Vertica files.
Filesystem                  The filesystem for the HP Vertica data and catalog directories must be
                            formatted as ext3 or ext4.
Swap Space                  A 2GB swap partition is required. Partition the remaining disk space in
                            a single partition under "/".
Disk Block Size             The disk block size for the HP Vertica data and catalog directories
                            should be 4096 bytes (the default for ext3 and ext4 filesystems).
Memory                      For more information on sizing your hardware, see the HP Vertica
                            Hardware Planning Guide.

Firewall Considerations
Configuration               Description
Firewall/Ports              Firewalls, if present, must be configured so as not to interfere with
                            HP Vertica.

General Operating System Configuration - Automatically Configured by Installer
These general OS settings are automatically made by the installer if they do not meet HP Vertica
requirements. You can prevent the installer from automatically making these configuration changes
by using the --no-system-configuration parameter for the install_vertica script.


Configuration               Description
Nice Limits                 The database administration user must be able to nice processes back to
                            the default level of 0.
min_free_kbytes             The vm.min_free_kbytes setting in /etc/sysctl.conf must be configured
                            sufficiently high. The specific value depends on your hardware
                            configuration.
User Open Files Limit       The open file limit for the dbadmin user should be at least 65536 or the
                            amount of RAM in MB (one open file per MB of RAM), whichever is greater.
System Open File Limits     The maximum number of files open on the system must be at least the
                            amount of memory in MB, and not less than 65536.
Pam Limits                  /etc/pam.d/su must contain the line:
                            session required pam_limits.so
                            This allows for the conveying of limits to commands run with the su
                            command.
Address Space Limits        The address space limits (as setting) defined in
                            /etc/security/limits.conf must be unlimited for the database
                            administrator.
File Size Limits            The file size limits (fsize setting) defined in
                            /etc/security/limits.conf must be unlimited for the database
                            administrator.
User Process Limits         The nproc setting defined in /etc/security/limits.conf must be 1024 or
                            the amount of memory in MB, whichever is greater.
Maximum Memory Maps         The vm.max_map_count in /etc/sysctl.conf must be 65536 or the amount of
                            memory in KB / 16, whichever is greater.

General Operating System Configuration - Manual Configuration


The following general OS settings must be done manually.
Configuration               Description
Disk Readahead              The disk readahead must be at least 2048. The specific value depends on
                            your hardware configuration.
NTP Services                The NTP daemon must be enabled and running.
SELinux                     SELinux must be disabled or run in permissive mode.
CPU Frequency Scaling       HP Vertica recommends that you disable CPU Frequency Scaling.
                            Important: Your systems may use significantly more energy when CPU
                            frequency scaling is disabled.
Transparent Hugepages       Transparent Hugepages should be disabled or set to madvise.
I/O Scheduler               The I/O Scheduler for disks used by HP Vertica must be set to deadline
                            or noop.
Support Tools               Several optional packages can be installed to assist HP Vertica support
                            when troubleshooting your system.

System User Requirements


The following tasks pertain to the configuration of the system user required by HP Vertica.
Configuration               Required Setting(s)
System User Requirements    The installer automatically creates a user with the correct settings. If
                            you specify a user with --dba-user, then the user must conform to the
                            requirements for the HP Vertica system user.
LANG Environment Settings   The LANG environment variable must be set and valid for the database
                            administration user.
TZ Environment Settings     The TZ environment variable must be set and valid for the database
                            administration user.

Before You Install The Operating System


The topics in this section detail system settings that must be configured when you install the
operating system. These settings cannot be easily changed after the operating system is installed.

Supported Platforms
The HP Vertica installer checks the type of operating system that is installed. If the operating
system does not match one of the supported operating systems (See HP Vertica Server and HP
Vertica Management Console), or the operating system cannot be determined, then the installer
halts.
The installer generates one of the following issue identifiers if it detects an unsupported operating
system:

[S0320] - Fedora OS is not supported.

[S0321] - The version of RedHat is not supported.

[S0322] - The version of Ubuntu/Debian is not supported.

[S0323] - The operating system could not be determined. The unknown operating system is not
supported because it does not match the list of supported operating systems.

LVM Warning
HP Vertica does not support LVM (Logical Volume Manager) on any drive where database (catalog
and data) files are stored. The installer reports this issue with the identifier: S0170.

Filesystem Requirement
HP Vertica requires that your Linux filesystem be either ext3 or ext4. All other filesystem types are
unsupported. The installer reports this issue with the identifier S0160.

Swap Space Requirements


HP Vertica requires a swap partition of at least 2 GB, regardless of the amount of RAM installed
on your system. The installer reports this issue with identifier S0180.
For typical installations HP Vertica recommends that you partition your system with a 2GB primary
partition for swap regardless of the amount of installed RAM. Larger swap space is acceptable, but
unnecessary.
Note: Do not place a swap file on a disk containing the HP Vertica data files. If a host has only
two disks (boot and data), put the swap file on the boot disk.

If you do not have at least a 2 GB swap partition then you may experience performance issues
when running HP Vertica.
You typically define the swap partition when you install Linux. See your platform's documentation
for details on configuring the swap partition.

Disk Block Size Requirements


HP Vertica recommends that the disk block size be 4096 bytes, which is generally the default on
ext3 and ext4 filesystems. The installer reports this issue with the identifier S0165.
The disk block size is set when you format your file system. Changing the block size requires a reformat.

Memory Requirements
HP Vertica requires, at a minimum, 1GB of RAM per logical processor. The installer reports this
issue with the identifier S0190.
However, for performance reasons, you typically require more RAM than the minimum. For more
information on sizing your hardware, see the HP Vertica Hardware Planning Guide.

Firewall Considerations
HP Vertica requires multiple ports be open between nodes. You may use a firewall (IP Tables) on
Redhat/CentOS and Ubuntu/Debian based systems. Note that firewall use is not supported on
SuSE systems and that SuSE systems must disable the firewall. The installer reports issues found
with your IP tables configuration with the identifiers N0010 (for systems that use IP Tables) and
N011 (for SuSE systems).
The installer checks the IP tables configuration and issues a warning if there are any configured
rules or chains. The installer does not detect if the configuration may conflict with HP Vertica. It is
your responsibility to verify that your firewall allows traffic for HP Vertica as described in Ensure
Ports Are Available.
Note: The installer does not check NAT entries in iptables.
You can modify your firewall to allow for HP Vertica network traffic, or you can disable the firewall if
your network is secure. Note that firewalls are not supported for HP Vertica systems running on
SuSE.

RedHat And CentOS Systems


For details on how to configure iptables and allow specific ports to be open, see the
platform-specific documentation for your platform:

RedHat: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_
Linux/6/html/Security_Guide/sect-Security_Guide-IPTables.html

CentOS: http://wiki.centos.org/HowTos/Network/IPTables

To disable iptables, run the following commands as root or sudo:


service iptables save
service iptables stop
chkconfig iptables off

To disable iptables if you are using the ipv6 versions of iptables, run the following commands as
root or sudo:
service ip6tables save
service ip6tables stop
chkconfig ip6tables off

Ubuntu and Debian Based Systems


For details on how to configure iptables and allow specific ports to be open, see the
platform-specific documentation for your platform:

Debian: https://wiki.debian.org/iptables

Ubuntu: https://help.ubuntu.com/12.04/serverguide/firewall.html.

Note: Ubuntu uses the ufw program to manage iptables.


To disable iptables on Debian, run the following commands as root or sudo:
/etc/init.d/iptables stop
update-rc.d -f iptables remove

To disable iptables on Ubuntu, run the following command:


sudo ufw disable

SuSE Systems
The firewall must be disabled on SUSE systems. To disable the firewall on SuSE systems, run the
following command:
/sbin/SuSEfirewall2 off

Port Availability
The install_vertica script checks that required ports are open and available to HP Vertica. The
installer reports any issues with the identifier: N0020.

Port Requirements
The following table lists the ports required by HP Vertica.
Port   Protocol   Service                         Notes
22     TCP        sshd                            Required by Administration Tools and the Management
                                                  Console Cluster Installation wizard.
5433   TCP        HP Vertica                      HP Vertica client (vsql, ODBC, JDBC, etc) port.
5434   TCP        HP Vertica                      Intra-cluster communication. HP Vertica opens the
                                                  HP Vertica client port +1 (5434 by default) for
                                                  intra-cluster communication, such as during a plan.
                                                  If the port +1 from the default client port is not
                                                  available, then HP Vertica opens a random port for
                                                  intra-cluster communication.
5433   UDP        HP Vertica                      HP Vertica spread monitoring.
5444   TCP        HP Vertica Management Console   MC-to-node and node-to-node (agent) communications
                                                  port. See Changing MC or Agent Ports.
5450   TCP        HP Vertica Management Console   Port used to connect to MC from a web browser and
                                                  allows communication from nodes to the MC
                                                  application/web server. See Connecting to
                                                  Management Console.
4803   TCP        Spread                          Client connections.
4803   UDP        Spread                          Daemon to Daemon connections.
4804   UDP        Spread                          Daemon to Daemon connections.
6543   UDP        Spread                          Monitor to Daemon connection.

General Operating System Configuration - Automatically Configured by the Installer
These general Operating System settings are automatically made by the installer if they do not
meet HP Vertica requirements. You can prevent the installer from automatically making these
configuration changes by using the --no-system-configuration parameter for the install_vertica
script.

sysctl
During installation, HP Vertica attempts to automatically change various OS level settings. The
installer may not change values on your system if they exceed the threshold required by the
installer. You can prevent the installer from automatically making these configuration changes by
using the --no-system-configuration parameter for the install_vertica script.
To permanently edit certain settings and prevent them from reverting on reboot, use sysctl.
The sysctl settings relevant to the installation of HP Vertica include:
vm.min_free_kbytes

fs.file-max

vm.max_map_count

Permanently Changing Settings with sysctl:


1. As the root user, open the /etc/sysctl.conf file:
# vi /etc/sysctl.conf

2. Enter a parameter and value:


parameter = value

For example, to set the parameter and value for fs.file-max to meet HP Vertica requirements,
enter:
fs.file-max = 65536

3. Save your changes, and close the /etc/sysctl.conf file.


4. As the root user, reload the config file:
# sysctl -p
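
The manual edit in steps 1 through 3 can also be scripted. The sketch below is illustrative only: set_sysctl_conf is a hypothetical helper name, not an HP Vertica or sysctl command. It removes any existing assignment of the parameter from the file and appends the new one, so running it twice does not leave duplicate entries.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing any existing assignment of the same key.
# Arguments: file, parameter name, value.
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Keep every line whose parameter name differs from the key.
    awk -F= -v k="$key" '{
        name = $1
        gsub(/[ \t]/, "", name)
        if (name != k) print
    }' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (as root): set_sysctl_conf /etc/sysctl.conf fs.file-max 65536 && sysctl -p
```

Commented-out lines are left untouched because their parameter name, including the leading #, does not match the key.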

Identifying Settings Added by the Installer


You can see whether the installer has added a setting by opening the /etc/sysctl.conf file:
# vi /etc/sysctl.conf

If the installer has added a setting, the following line appears:


# The following 1 line added by Vertica tools. 2015-02-23 13:20:29
parameter = value

Nice Limits Configuration


The HP Vertica system user (dbadmin by default) must be able to raise and lower the priority of HP
Vertica processes. To do this, the nice option in the /etc/security/limits.conf file must include
an entry for the dbadmin user. The installer reports this issue with the identifier: S0010.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.

Note: HP Vertica never raises priority above the default level of 0. However, HP Vertica does
lower the priority of certain HP Vertica threads and needs to be able to raise the priority of these
threads back up to the default level. This setting allows HP Vertica to raise the priorities back
to the default level.

All Systems
To set the Nice Limit configuration for the dbadmin user, edit /etc/security/limits.conf and
add the following line. Replace dbadmin with the name of your system user.
dbadmin - nice 0

min_free_kbytes Setting
This topic details how to update the min_free_kbytes setting so that it is within the range supported
by HP Vertica. The installer reports this issue with the identifier: S0050 if the setting is too low, or
S0051 if the setting is too high.
The vm.min_free_kbytes setting configures the page reclaim thresholds. When this number is
increased, the system starts reclaiming memory earlier; when it is lowered, it starts reclaiming
memory later. The default min_free_kbytes is calculated at boot time based on the number of pages
of physical RAM available on the system.
The setting must be the greater of:
The default value configured by the system, or

4096, or

The value produced by the command in step 2 of the procedure below.

The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.

All Systems
To manually set min_free_kbytes:

1. Determine the current/default setting with the following command:


/sbin/sysctl vm.min_free_kbytes

2. If the result of the previous command is No such file or directory or the default value is
less than 4096, then run the command below:
memtot=`grep MemTotal /proc/meminfo | awk '{printf "%.0f",$2}'`
echo "scale=0;sqrt ($memtot*16)" | bc

3. Edit or add the vm.min_free_kbytes setting in /etc/sysctl.conf, using the value from the
output of the previous command.
# The min_free_kbytes setting
vm.min_free_kbytes=5572

4. Run sysctl -p to apply the changes in sysctl.conf immediately.

Note: These steps will need to be replicated for each node in the cluster.
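
The rule above (the greater of the system default, 4096, and the value from the bc command) can be sketched as a small function. This is an illustration under stated assumptions, not the installer's own logic: the function names are hypothetical, and awk is used in place of bc to take the integer part of the square root.

```shell
#!/bin/sh
# Hypothetical helper: integer part of sqrt(16 * MemTotal), MemTotal in KB.
sqrt_rule_kb() {
    awk -v m="$1" 'BEGIN { printf "%d\n", int(sqrt(m * 16)) }'
}

# Prints the greatest of the system default, 4096, and the sqrt-based value.
# Arguments: MemTotal in KB, current vm.min_free_kbytes value.
required_min_free_kbytes() {
    best=4096
    if [ "$2" -gt "$best" ]; then best=$2; fi
    calc=$(sqrt_rule_kb "$1")
    if [ "$calc" -gt "$best" ]; then best=$calc; fi
    echo "$best"
}

# Usage:
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   cur_kb=$(/sbin/sysctl -n vm.min_free_kbytes)
#   required_min_free_kbytes "$mem_kb" "$cur_kb"
```

For example, a host reporting a MemTotal of 16000000 KB and a default below 16000 would get 16000, because the square root of 16 x 16000000 is 16000.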

User Max Open Files Limit


This topic details how to change the user max open-files limit setting to meet HP Vertica
requirements. The installer reports this issue with the identifier: S0060.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.
HP Vertica requires that the dbadmin user not be limited when opening files. The open file limit
should be at least 65536 or the amount of RAM in MB (one open file per MB of RAM), whichever is
greater. HP Vertica sets this to the minimum recommended value of 65536 or the amount of RAM in
MB.

All Systems
The open file limit can be determined by running ulimit -n as the dbadmin user. For example:
dbadmin@localhost:$ ulimit -n
65536

To manually set the limit, edit /etc/security/limits.conf and edit/add the line for the nofile
setting for the user you configured as the database admin (default dbadmin). The setting must be at
least 65536.
dbadmin - nofile 65536

Note: There is also an open file limit on the system. See System Max Open Files Limit.
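
As a rough illustration of the sizing rule, the following sketch computes the nofile value for a host and prints the matching limits.conf line. The function names are hypothetical, and the memory size is passed in KB as reported by the MemTotal field of /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper: minimum nofile value for a host with the given MemTotal (KB).
# Result is the greater of 65536 and the amount of RAM in MB.
required_nofile() {
    mem_mb=$(( $1 / 1024 ))
    if [ "$mem_mb" -gt 65536 ]; then
        echo "$mem_mb"
    else
        echo 65536
    fi
}

# Prints the limits.conf line for the given user and MemTotal (KB).
nofile_limits_line() {
    printf '%s - nofile %s\n' "$1" "$(required_nofile "$2")"
}

# Usage:
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   nofile_limits_line dbadmin "$mem_kb"
```

Only hosts with more than 64 GB of RAM exceed the 65536 floor; for example, a 128 GB host (134217728 KB) needs a limit of 131072.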

System Max Open Files Limit


This topic details how to modify the limit for the number of open files on your system so that it
meets HP Vertica requirements. The installer reports this issue with the identifier: S0120.
HP Vertica opens many files. Some platforms have global limits on the number of open files. The
open file limit must be set sufficiently high so as not to interfere with database operations.
The recommended value is at least the amount of memory in MB, but not less than 65536.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.

All Systems
To manually set the open file limit:
1. Run /sbin/sysctl fs.file-max to determine the current limit.
2. If the limit is not 65536 or the amount of system memory in MB (whichever is higher), then edit
or add fs.file-max=max number of files to /etc/sysctl.conf.
# Controls the maximum number of open files
fs.file-max=65536

3. Run sysctl -p to apply the changes in sysctl.conf immediately.

Note: These steps will need to be replicated for each node in the cluster.

Pam Limits
This topic details how to enable the "su" pam_limits.so module required by HP Vertica. The installer
reports issues with the setting with the identifier: S0070.
On some systems the pam module called pam_limits.so is not set in the file /etc/pam.d/su.
When it is not set, it prevents the conveying of limits (such as open file descriptors) to any
command started with su -.
In particular, the HP Vertica init script would fail to start HP Vertica because it calls the
Administration Tools to start a database with the su - command. This problem was first noticed on
Debian systems, but the configuration could be missing on other Linux distributions. See the
pam_limits man page for more details.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.

All Systems
To manually configure this setting, append the following line to the /etc/pam.d/su file:
session required pam_limits.so

See the pam_limits man page for more details: man pam_limits.

User Address Space Limits


This topic details how to modify the Linux address space limit for the dbadmin user so that it meets
HP Vertica requirements. The address space setting limits the amount of virtual memory available to
each process run by the user. If this setting does not meet the requirements then the installer
reports this issue with the identifier: S0090.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.
The address space available to the dbadmin user must not be reduced via user limits and must be
set to unlimited.

All Systems
To manually set the address space limit:

1. Run ulimit -v as the dbadmin user to determine the current limit.


2. If the limit is not unlimited, then add the following line to /etc/security/limits.conf.
Replace dbadmin with your database admin user.
dbadmin - as unlimited

User File Size Limit


This topic details how to modify the file size limit for files on your system so that it meets HP
Vertica requirements. The installer reports this issue with the identifier: S0100.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.
The file size limit for the dbadmin user must not be reduced via user limits and must be set to
unlimited.

All Systems
To manually set the file size limit:
1. Run ulimit -f as the dbadmin user to determine the current limit.
2. If the limit is not unlimited, then edit/add the following line to /etc/security/limits.conf.
Replace dbadmin with your database admin user.
dbadmin - fsize unlimited

User Process Limit


This topic details how to change the user process limit so that it meets HP Vertica
requirements. The installer reports this issue with the identifier: S0110.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.
The user process limit must be high enough to allow for the many threads opened by HP Vertica.
The recommended limit is the amount of RAM in MB and must be at least 1024.

All Systems
To manually set the user process limit:
1. Run ulimit -u as the dbadmin user to determine the current limit.
2. If the limit is not the amount of memory in MB on the server, then edit/add the following line to
/etc/security/limits.conf. Replace 4096 with the amount of system memory, in MB, on
the server.
dbadmin - nproc 4096
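
Step 2 can be expressed as a check that compares the output of ulimit -u with the requirement. This is a sketch only: nproc_ok is a hypothetical name, and treating a limit of "unlimited" as acceptable is an assumption made here, not something the installer documentation states.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the current nproc limit meets the requirement
# (the greater of 1024 and the amount of RAM in MB).
# Arguments: current limit (output of `ulimit -u`), MemTotal in KB.
nproc_ok() {
    need=$(( $2 / 1024 ))
    if [ "$need" -lt 1024 ]; then need=1024; fi
    # Assumption: an unlimited process limit satisfies any numeric requirement.
    if [ "$1" = "unlimited" ]; then return 0; fi
    [ "$1" -ge "$need" ]
}

# Usage (as the dbadmin user):
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   nproc_ok "$(ulimit -u)" "$mem_kb" || echo "nproc limit is too low"
```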

Maximum Memory Maps Configuration


This topic details how to modify the limit for the number of memory maps a process can have on your
system so that it meets HP Vertica requirements. The installer reports this issue with the identifier:
S0130.
The installer automatically configures the correct setting if the default value does not meet system
requirements. If there is an issue setting this value, or you have used the
--no-system-configuration argument to the installer and the current setting is incorrect, then the installer
reports this as an issue.
HP Vertica uses a lot of memory while processing and can approach the default limit for memory
maps per process.
The recommended value is at least the amount of memory on the system in KB / 16, but not less
than 65536.

All Systems
To manually set the memory map limit:
1. Run /sbin/sysctl vm.max_map_count to determine the current limit.
2. If the limit is not 65536 or the amount of system memory in KB / 16 (whichever is higher), then
edit/add the following line to /etc/sysctl.conf. Replace 65536 with the value for your
system.
# The following 1 line added by Vertica tools. 2014-03-07 13:20:31
vm.max_map_count=65536

3. Run sysctl -p to apply the changes in sysctl.conf immediately.

Note: These steps will need to be replicated for each node in the cluster.
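
The value to use in step 2 follows directly from the amount of system memory. A minimal sketch, using hypothetical function names and taking the memory size in KB (for example, the MemTotal field of /proc/meminfo):

```shell
#!/bin/sh
# Hypothetical helper: the greater of 65536 and (memory in KB / 16).
required_max_map_count() {
    calc=$(( $1 / 16 ))
    if [ "$calc" -gt 65536 ]; then
        echo "$calc"
    else
        echo 65536
    fi
}

# Prints the line to place in /etc/sysctl.conf for the given MemTotal (KB).
max_map_count_line() {
    printf 'vm.max_map_count=%s\n' "$(required_max_map_count "$1")"
}

# Usage:
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   max_map_count_line "$mem_kb"
```

Any host with more than 1 GB of RAM exceeds the 65536 floor; an 8 GB host (8388608 KB), for instance, needs vm.max_map_count=524288.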

General Operating System Configuration - Manual Configuration


The following general Operating System settings must be done manually.

Disk Readahead
This topic details how to change Disk Readahead to a supported value. HP Vertica requires that
Disk Readahead be set to at least 2048. The installer reports this issue with the identifier: S0020.
Note:
These commands must be executed with root privileges and assume that the blockdev
program is in /sbin.

The blockdev program operates on whole devices, and not individual partitions. You cannot
set the readahead value to different settings on the same device. If you run blockdev
against a partition, for example: /dev/sda1, then the setting is still applied to the entire
/dev/sda device. For instance, running /sbin/blockdev --setra 2048 /dev/sda1 also
causes /dev/sda2 through /dev/sdaN to use a readahead value of 2048.

RedHat and SuSE Based Systems


For each drive in the HP Vertica system, HP Vertica recommends that you set the readahead value
to 2048 for most deployments. The first command below immediately changes the readahead value for
the specified disk. The second command appends the same command to /etc/rc.local so that the
setting is applied each time the system is booted. Note that some deployments may require a higher
value; the setting can be set as high as 8192, under guidance of support.
/sbin/blockdev --setra 2048 /dev/sda
echo '/sbin/blockdev --setra 2048 /dev/sda' >> /etc/rc.local

Ubuntu and Debian Systems


For each drive in the HP Vertica system, set the readahead value to 2048. Run the command once
in your shell, then add the command to /etc/rc.local so that the setting is applied each time the
system is booted. Note that on Ubuntu systems, the last line in rc.local must be "exit 0", so you
must manually add the following line to /etc/rc.local before the last line with exit 0.
/sbin/blockdev --setra 2048 /dev/sda

Note: For systems that do not support /etc/rc.local, use the equivalent startup script that
is run after the destination runlevel has been reached. For example SuSE uses
/etc/init.d/after.local.
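
Before changing anything, it can help to see which drives are below the required value. The sketch below uses the --getra option of blockdev, which reports the current readahead value; the function name readahead_status and the device names in the usage comment are hypothetical and must be replaced with the drives used by your system.

```shell
#!/bin/sh
# Minimum readahead value required by HP Vertica.
MIN_RA=2048

# Hypothetical helper: reports whether a readahead value meets the minimum.
# Arguments: device name, current readahead value.
readahead_status() {
    if [ "$2" -ge "$MIN_RA" ]; then
        echo "$1 ok ($2)"
    else
        echo "$1 too low ($2 < $MIN_RA)"
    fi
}

# Usage (as root; /dev/sda and /dev/sdb are example device names):
#   for dev in /dev/sda /dev/sdb; do
#       readahead_status "$dev" "$(/sbin/blockdev --getra "$dev")"
#   done
```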

Enabling Network Time Protocol (NTP)


Before you can install HP Vertica, you must enable Network Time Protocol (NTP) on your system
for clock synchronization. NTP must be both enabled and active at the time of installation. If NTP is
not enabled and active at the time of installation, the installer reports this issue with the identifier
S0030.

Verify That NTP Is Running


The network time protocol (NTP) daemon must be running on all of the hosts in the cluster so that
their clocks are synchronized. The spread daemon relies on all of the nodes to have their clocks
synchronized for timing purposes. If your nodes do not have NTP running, the installation can fail
with a spread configuration error or other errors.
Note: Different Linux distributions refer to the NTP daemon in different ways. For example,
SUSE and Debian/Ubuntu refer to it as ntp, while CentOS and Red Hat refer to it as ntpd. If
the following commands produce errors, try using ntp in place of ntpd.
To verify that your hosts are configured to run the NTP daemon on startup, enter the following
command:
$ chkconfig --list ntpd

Debian and Ubuntu do not support chkconfig, but they do offer an optional package. You can
install this package with the command sudo apt-get install sysv-rc-conf. To verify that your
hosts are configured to run the NTP daemon on startup with the sysv-rc-conf utility, enter the
following command:
$ sysv-rc-conf --list ntpd

The chkconfig command can produce an error similar to ntpd: unknown service. If you get this
error, check whether your Linux distribution refers to the NTP daemon as ntp rather than ntpd, and
retry the command with that name. If the error persists under both names, the NTP daemon is not
installed, and you need to install the NTP daemon package before you can configure it. Consult your
Linux documentation for instructions on how to locate and install packages.
If the NTP daemon is installed, your output should resemble the following:


ntp 0:off 1:off 2:on 3:on 4:off 5:on 6:off

The output indicates the runlevels where the daemon runs. Verify that the current runlevel of the
system (usually 3 or 5) has the NTP daemon set to on. If you do not know the current runlevel, you
can find it using the runlevel command:
$ runlevel
N 3
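
As a minimal sketch of the check described above, the following shell function tests whether a line of chkconfig --list output marks the daemon as on for a given runlevel. The function name service_on_at_runlevel is hypothetical, and the sketch assumes the output format shown above. Substitute ntp for ntpd where your distribution uses that name.

```shell
# Hypothetical helper: succeeds when a `chkconfig --list` line contains
# "<runlevel>:on" for the requested runlevel.
service_on_at_runlevel() {
    line=$1
    level=$2
    case " $line " in
        *[[:space:]]"$level:on"[[:space:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use, only on hosts that provide both utilities.
if command -v chkconfig >/dev/null 2>&1 && command -v runlevel >/dev/null 2>&1; then
    current=$(runlevel | awk '{print $2}')
    if service_on_at_runlevel "$(chkconfig --list ntpd 2>/dev/null)" "$current"; then
        echo "ntpd is enabled at runlevel $current"
    else
        echo "ntpd is not enabled at runlevel $current"
    fi
fi
```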

Configure NTP for Red Hat and SUSE


If your system is based on Red Hat or SUSE, use the service and chkconfig utilities to start NTP
and have it start at startup.
/sbin/service ntpd restart
/sbin/chkconfig ntpd on

Red Hat: NTP uses the default time servers at ntp.org. You can change the default NTP
servers by editing /etc/ntp.conf.

SUSE: By default, no time servers are configured. You must edit /etc/ntp.conf after the
install completes and add time servers.

Configure NTP for Ubuntu and Debian


By default, the NTP daemon is not installed on some Ubuntu and Debian systems. First, install
NTP, and then start the NTP process, as shown below. You can change the default NTP servers by
editing /etc/ntp.conf.
sudo apt-get install ntp
sudo /etc/init.d/ntp reload

Verify That NTP Is Operating Correctly


To verify that the Network Time Protocol Daemon (NTPD) is operating correctly, issue the
following command on all nodes in the cluster.
For Red Hat and SUSE:
/usr/sbin/ntpq -c rv | grep stratum

For Ubuntu and Debian:


ntpq -c rv | grep stratum

A stratum level of 16 indicates that NTP is not synchronizing correctly.


If a stratum level of 16 is detected, wait 15 minutes and issue the command again. It may take this
long for the NTP server to stabilize.
If NTP continues to detect a stratum level of 16, verify that the NTP port (UDP Port 123) is open on
all firewalls between the cluster and the remote machine you are attempting to synchronize to.
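
The stratum check above can be scripted. The following is a minimal sketch, assuming that ntpq -c rv prints a field of the form stratum=N. The helper names ntp_stratum and ntp_synchronized are hypothetical.

```shell
# Hypothetical helper: reads `ntpq -c rv` output on stdin and prints the stratum value.
ntp_stratum() {
    sed -n 's/.*stratum=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when the stratum is a number below 16 (16 means not synchronized).
ntp_synchronized() {
    stratum=$1
    [ -n "$stratum" ] && [ "$stratum" -lt 16 ]
}

# Example use, only on hosts where ntpq is installed.
if command -v ntpq >/dev/null 2>&1; then
    s=$(ntpq -c rv 2>/dev/null | ntp_stratum)
    if ntp_synchronized "$s"; then
        echo "NTP synchronized at stratum $s"
    else
        echo "NTP not synchronized (stratum: ${s:-unknown})"
    fi
fi
```

Run the check on every node in the cluster, and repeat it after the 15-minute wait if it reports stratum 16.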

Red Hat Documentation Related to NTP


The following links were current as of the last publication of the HP Vertica documentation and
could change between releases.
• http://kbase.redhat.com/faq/docs/DOC-6731
• http://kbase.redhat.com/faq/docs/DOC-6902
• http://kbase.redhat.com/faq/docs/DOC-6991

SELinux Configuration
HP Vertica does not support SELinux except when SELinux is running in permissive mode. If the
installer detects that SELinux is installed but cannot determine the mode, it reports this issue
with the identifier S0080. If the mode can be determined and is not permissive, the installer
reports the issue with the identifier S0081.

RedHat and SuSE Based Systems


You can either disable SELinux or change it to use permissive mode.
To disable SELinux:
1. Edit /etc/selinux/config and change the setting for SELINUX to disabled (SELINUX=disabled).
This disables SELinux at boot time.
2. As root/sudo, type setenforce 0 to stop SELinux enforcement immediately. This command switches
the running system to permissive mode; SELinux is fully disabled after the next reboot.
To change SELinux to use permissive mode:
1. Edit /etc/selinux/config and change the setting for SELINUX to permissive
(SELINUX=permissive).


2. As root/sudo, type setenforce Permissive to switch to permissive mode immediately.

Ubuntu and Debian Systems


You can either disable SELinux or change it to use permissive mode.
To disable SELinux:
1. Edit /etc/selinux/config and change the setting for SELINUX to disabled (SELINUX=disabled).
This disables SELinux at boot time.
2. As root/sudo, type setenforce 0 to stop SELinux enforcement immediately. This command switches
the running system to permissive mode; SELinux is fully disabled after the next reboot.
To change SELinux to use permissive mode:
1. Edit /etc/selinux/config and change the setting for SELINUX to permissive (SELINUX=permissive).
2. As root/sudo, type setenforce Permissive to switch to permissive mode immediately.
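
To confirm the result on either family of systems, you can inspect the running mode with getenforce, which prints Enforcing, Permissive, or Disabled. The following is a minimal sketch; the helper name selinux_mode_ok is hypothetical and simply encodes the two modes that this topic describes as acceptable.

```shell
# Hypothetical helper: succeeds for the SELinux modes this topic treats as acceptable.
selinux_mode_ok() {
    case "$1" in
        Permissive|Disabled) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use, only on hosts where the SELinux utilities are installed.
if command -v getenforce >/dev/null 2>&1; then
    mode=$(getenforce)
    if selinux_mode_ok "$mode"; then
        echo "SELinux mode is $mode"
    else
        echo "SELinux mode is $mode; switch to permissive mode or disable SELinux"
    fi
fi
```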

CPU Frequency Scaling


This topic details the various CPU frequency scaling methods supported by HP Vertica. In general,
if you do not require CPU frequency scaling, then disable it so as not to impact system
performance.
Important: Your systems may use significantly more energy when frequency scaling is
disabled.
The installer allows CPU frequency scaling to be enabled when the cpufreq scaling governor is set
to performance. If the cpu scaling governor is set to ondemand, and ignore_nice_load is 1 (true),
then the installer fails with the error S0140. If the cpu scaling governor is set to ondemand and
ignore_nice_load is 0 (false), then the installer warns with the identifier S0141.
CPU frequency scaling is a hardware and software feature that helps computers conserve energy
by slowing the processor when the system load is low, and speeding it up again when the system
load increases. This feature can impact system performance, since raising the CPU frequency in
response to higher system load does not occur instantly. Always disable this feature on the HP
Vertica database hosts to prevent it from interfering with performance.
You disable CPU scaling in your host's system BIOS. There may be multiple settings in your host's
BIOS that you need to adjust in order to completely disable CPU frequency scaling. Consult your
host hardware's documentation for details on entering the system BIOS and disabling CPU
frequency scaling.


If you cannot disable CPU scaling through the system BIOS, you can limit the impact of CPU
scaling by disabling the scaling through the Linux kernel or setting the CPU frequency governor to
always run the CPU at full speed.
Caution: This method is not reliable, as some hardware platforms may ignore the kernel
settings. The only reliable method is to disable CPU scaling in BIOS.
The method you use to disable frequency scaling depends on the CPU scaling method being used in the
Linux kernel. See your Linux distribution's documentation for instructions on disabling scaling in the
kernel or changing the CPU governor.
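
Before changing anything, it can help to see which governor each CPU currently uses. The following sketch maps a governor and an ignore_nice_load value to the installer outcomes described earlier in this topic. The function name classify_governor is hypothetical, and the sysfs paths are assumptions that vary by kernel version and cpufreq driver. Governors that this topic does not mention are reported as unspecified.

```shell
# Hypothetical helper: maps a governor and ignore_nice_load value to the
# installer outcome described in this topic.
classify_governor() {
    governor=$1
    ignore_nice=$2
    case "$governor" in
        performance) echo "ok" ;;
        ondemand)
            if [ "$ignore_nice" = "1" ]; then echo "S0140"; else echo "S0141"; fi ;;
        *) echo "unspecified" ;;
    esac
}

# Assumed location of the ondemand tunable; newer kernels may keep it elsewhere.
nice_file=/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load
nice=0
if [ -r "$nice_file" ]; then nice=$(cat "$nice_file"); fi

for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue
    gov=$(cat "$f")
    echo "$f: $gov ($(classify_governor "$gov" "$nice"))"
done
```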

Transparent Hugepages
This topic details how to modify transparent hugepages so that the configuration meets HP Vertica
requirements. HP Vertica requires that transparent hugepages be disabled or set to madvise. The
installer reports this issue with the identifier: S0310.
If you are not using madvise as your transparent hugepage setting, then you can disable it with the
following steps:

RedHat Systems
To determine if transparent hugepages is enabled, run the following command. The setting returned
in brackets is your current setting.
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
[always] madvise never

You can disable transparent hugepages one of two ways:


1. Edit your boot loader configuration (for example, /etc/grub.conf). Typically, you add the
following to the end of the kernel line. However, consult the documentation for your system
before editing your boot loader configuration.
transparent_hugepage=never

2. Or, edit /etc/rc.local and add the following script.

Note: For systems that do not support /etc/rc.local, use the equivalent startup script
that is run after the destination runlevel has been reached. For example SuSE uses
/etc/init.d/after.local.


if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi

You must reboot your system for the setting to take effect, or run the following echo command to
proceed with the install without rebooting:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

Other Systems
Note: SuSE did not offer transparent hugepage support in the initial 11.0 release. Subsequent
SuSE service packs do include support for transparent hugepages.
To determine if transparent hugepages is enabled, run the following command. The setting returned
in brackets is your current setting. Depending on your platform OS, the madvise setting may not be
displayed.
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

You can disable transparent hugepages one of two ways:


1. Edit your boot loader configuration (for example, /etc/grub.conf). Typically, you add the
following to the end of the kernel line. However, consult the documentation for your system
before editing your boot loader configuration.
transparent_hugepage=never

2. Or, edit /etc/rc.local (on systems that support rc.local) and add the following script.

Note: For systems that do not support /etc/rc.local, use the equivalent startup script
that is run after the destination runlevel has been reached. For example SuSE uses
/etc/init.d/after.local.

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi


You must reboot your system for the setting to take effect, or run the following echo command to
proceed with the install without rebooting:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
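
To verify the change on either kind of system, you can extract the bracketed value from the enabled file. The following is a minimal sketch; the helper names thp_setting and thp_setting_ok are hypothetical, and the two paths are the ones used in this topic.

```shell
# Hypothetical helper: prints the value shown in square brackets,
# for example "always" for the line "[always] madvise never".
thp_setting() {
    line=$1
    rest=${line#*\[}
    # If nothing was stripped, the line contains no opening bracket.
    [ "$rest" = "$line" ] && return 1
    printf '%s\n' "${rest%%\]*}"
}

# Succeeds for the two settings this topic says HP Vertica accepts.
thp_setting_ok() {
    case "$1" in
        never|madvise) return 0 ;;
        *) return 1 ;;
    esac
}

for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    [ -r "$f" ] || continue
    current=$(thp_setting "$(cat "$f")")
    if thp_setting_ok "$current"; then
        echo "$f: $current"
    else
        echo "$f: $current (change to never or madvise)"
    fi
done
```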

I/O Scheduling
This topic details how to change I/O Scheduling to a supported scheduler. HP Vertica requires that
I/O Scheduling be set to deadline or noop. If the installer detects that the system is using an
unsupported scheduler, then it reports this issue with the identifier: S0150. If the installer cannot
detect the type of scheduler that the system uses (typically if your system is using a RAID array)
then it reports the issue with identifier: S0151.
If your system is not using a RAID array, then complete the steps below to change your I/O
scheduler to a supported scheduler. If you are using a RAID array, then consult the documentation
from your RAID vendor for the best performing scheduler for your hardware.

Configure the I/O Scheduler


The Linux kernel can use several different I/O schedulers to prioritize disk input and output. Most
Linux distributions use the Completely Fair Queuing (CFQ) scheme by default, which gives input
and output requests equal priority. This scheduler is efficient on systems running multiple tasks that
need equal access to I/O resources. However, it can create a bottleneck when used on HP
Vertica drives containing the catalog and data directories, since it gives write requests equal priority
to read requests, and its per-process I/O queues can penalize processes making more requests
than other processes.
Instead of the CFQ scheduler, configure your hosts to use either the Deadline or NOOP I/O
scheduler for the drives containing the catalog and data directories:
• The Deadline scheduler gives priority to read requests over write requests. It also imposes a
deadline on all requests. After reaching the deadline, such requests gain priority over all other
requests. This scheduling method helps prevent processes from becoming starved for I/O
access. The Deadline scheduler is best used on physical media drives (disks using spinning
platters), since it attempts to group requests for adjacent sectors on a disk, lowering the time the
drive spends seeking.
• The NOOP scheduler uses a simple FIFO approach, placing all input and output requests into a
single queue. This scheduler is best used on solid state drives (SSDs). Since SSDs do not have
a physical read head, no performance penalty exists when accessing non-adjacent sectors.

Failure to use one of these schedulers for the HP Vertica drives containing the catalog and data
directories can result in slower database performance. Other drives on the system (such as the
drive containing swap space, log files, or the Linux system files) can still use the default CFQ
scheduler (although you should always use the NOOP scheduler for SSDs).
There are two ways for you to set the scheduler used by your disk devices:
1. Write the name of the scheduler to a file in the /sys directory.
--or--
2. Use a kernel boot parameter.

Configure the I/O Scheduler - Changing the Scheduler Through the /sys Directory
You can view and change the scheduler Linux uses for I/O requests to a single drive using a virtual
file under the /sys directory. The name of the file that controls the scheduler a block device uses is:
/sys/block/deviceName/queue/scheduler

Where deviceName is the name of the disk device, such as sda or cciss\!c0d1 (the first disk on an
HP RAID array). Viewing the contents of this file shows you all of the possible settings for the
scheduler, with the currently-selected scheduler surrounded by square brackets:
# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

To change the scheduler, write the name of the scheduler you want the device to use to its
scheduler file. You must have root privileges to write to this file. For example, to set the sda drive to
use the deadline scheduler, run the following command as root:
# echo deadline > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

Changing the scheduler immediately affects the I/O requests for the device. The Linux kernel starts
using the new scheduler for all of the drive's input and output requests.
Note: While tests have shown no problems are caused by changing the scheduler settings
while HP Vertica is running, you should strongly consider shutting down any running HP
Vertica database before changing the I/O scheduler or making any other changes to the
system configuration.
Changes to the I/O scheduler made through the /sys directory only last until the system is
rebooted, so you need to add commands that change the I/O scheduler to a startup script (such as
those stored in /etc/init.d, or through a command in /etc/rc.local). You also need to use a
separate command for each drive on the system whose scheduler you want to change.
For example, the following commands make the configuration take effect immediately and add it to
/etc/rc.local so that it is used on subsequent reboots:
Note: For systems that do not support /etc/rc.local, use the equivalent startup script that
is run after the destination runlevel has been reached. For example SuSE uses
/etc/init.d/after.local.

echo deadline > /sys/block/sda/queue/scheduler
echo 'echo deadline > /sys/block/sda/queue/scheduler' >> /etc/rc.local

Note: On some Ubuntu/Debian systems, the last line in rc.local must be "exit 0", so you
must manually add the echo line shown above to /etc/rc.local before the final exit 0 line.
You may prefer to use this method of setting the I/O scheduler over using a boot parameter if your
system has a mix of solid-state and physical media drives, or has many drives that do not store HP
Vertica catalog and data directories.
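
On hosts with many drives, a short loop can report which devices still use an unsupported scheduler. The following is a minimal sketch; the helper names active_scheduler and scheduler_supported are hypothetical, and the loop assumes every block device exposes a queue/scheduler file in the bracketed format shown above.

```shell
# Hypothetical helper: prints the bracketed (active) scheduler from a scheduler file.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Succeeds for the two schedulers this topic lists as supported.
scheduler_supported() {
    case "$1" in
        deadline|noop) return 0 ;;
        *) return 1 ;;
    esac
}

for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    current=$(active_scheduler "$f")
    if scheduler_supported "$current"; then
        echo "$f: $current"
    else
        echo "$f: ${current:-unknown} (not deadline or noop)"
    fi
done
```

The loop only reports. Changing a device still requires writing the scheduler name to its file as root, one command per drive.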

Configure the I/O Scheduler - Changing the Scheduler with a Boot Parameter
Use the elevator kernel boot parameter to change the default scheduler used by all disks on your
system. This is the best method to use if most or all of the drives on your hosts are of the same type
(physical media or SSD) and will contain catalog or data files. You can also use the boot parameter
to change the default to the scheduler the majority of the drives on the system need, then use the
/sys files to change individual drives to another I/O scheduler. The format of the elevator boot
parameter is:
elevator=schedulerName

Where schedulerName is deadline, noop, or cfq. You set the boot parameter using your
bootloader (grub or grub2 on most recent Linux distributions). See your distribution's documentation
for details on how to add a kernel boot parameter.

Support Tools
HP Vertica recommends that you install the following tools so that support can assist in
troubleshooting your system if any issues arise:


pstack (or gstack) package. Identified by issue S0040 when not installed.

mcelog package. Identified by issue S0041 when not installed.

sysstat package. Identified by issue S0045 when not installed.

RedHat Based Systems


To install the required tools on RedHat based systems, run the following commands as sudo or
root:
yum install pstack
yum install mcelog
yum install sysstat

Ubuntu and Debian Systems


To install the required tools on Ubuntu and Debian systems, run the following commands as sudo or
root:
apt-get install pstack
apt-get install mcelog
apt-get install sysstat

SuSE Systems
To install the required tools on SuSE systems, run the following commands as sudo or root:
zypper install sysstat
zypper install mcelog

There is no individual SuSE package for pstack/gstack. However, the gdb package contains
gstack, so you could optionally install gdb instead, or build pstack/gstack from source. To install
the gdb package:
zypper install gdb
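
After installing the packages, you can confirm that the commands they provide are on the PATH. The following is a minimal sketch; the helper name missing_cmds is hypothetical, and it assumes that the sysstat package provides the sar command and that either pstack or gstack satisfies the first requirement.

```shell
# Hypothetical helper: prints each named command that is not found on the PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# pstack and gstack are alternatives, so check them as a pair.
if [ "$(missing_cmds pstack gstack | wc -l)" -eq 2 ]; then
    echo "neither pstack nor gstack is installed"
fi

# mcelog comes from the mcelog package; sar is assumed to come from sysstat.
for cmd in $(missing_cmds mcelog sar); do
    echo "$cmd is not installed"
done
```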

System User Configuration


The following tasks pertain to the configuration of the system user required by HP Vertica.


System User Requirements


HP Vertica has specific requirements for the system user that runs and manages HP Vertica. If you
specify a user during install, but the user does not exist, then the installer reports this issue with the
identifier: S0200.

System User Requirement Details


HP Vertica requires a system user to own database files and run database processes and
administration scripts. By default, the install script automatically configures and creates this user
for you with the username dbadmin. See About HP Vertica-created Linux Users and Their
Privileges for details on the default user created by the install script. If you decide to manually
create your own system user, then you must create the user before you run the install script. If you
manually create the user:
Note: Instances of dbadmin and verticadba are placeholders for the names you choose if you
do not use the default values.

the user must have the same username and password on all nodes

the user must use the BASH shell as the user's default shell. If not, then the installer reports this
issue with identifier [S0240].

the user must be in the verticadba group (for example: usermod -a -G verticadba
userNameHere). If not, the installer reports this issue with identifier [S0220].

Note: You must create a verticadba group on all nodes. If you do not, then the installer
reports the issue with identifier [S0210].

the user's login group must be either verticadba or a group with the same name as the user (for
example, the home group for dbadmin is dbadmin). You can check the groups for a user with the
id command. For example: id dbadmin. The "gid" group is the user's primary group. If this is not
configured correctly then the installer reports this issue with the identifier [S0230]. HP Vertica
recommends that you use verticadba as the user's primary login group. For example: usermod -g
verticadba userNameHere. If the user's primary group is not verticadba as suggested, then
the installer reports this with HINT [S0231].


the user must have a home directory. If not, then the installer reports this issue with identifier
[S0260].

the user's home directory must be owned by the user. If not, then the installer reports the issue
with identifier [S0270].

the system must be aware of the user's home directory (you can set it with the usermod
command: usermod -m -d /path/to/new/home/dir userNameHere). If this is not configured
correctly then the installer reports the issue with [S0250].

the user's home directory must be owned by the user (use the chown and chgrp commands if
necessary). If this is not configured correctly, then the installer reports the issue with identifier
[S0280].

the user's home directory should have secure permissions. Specifically, it should not be writable
by anyone or by the group. Ideally the permissions should be, when viewing with ls, "---"
(nothing), or "r-x" (read and execute). If this is not configured as suggested then the installer
reports this with HINT [S0290].
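
Some of these requirements can be checked from the shell before you run the installer. The following is a minimal sketch of two such checks: the primary group, and the home directory permissions behind hint S0290. The helper name home_perms_ok is hypothetical, dbadmin is the placeholder user name used above, and the sketch assumes the ten-character mode string printed by ls -ld.

```shell
# Hypothetical helper: succeeds when an `ls -ld` mode string (such as "drwxr-x---")
# grants no write permission to the group or to others.
home_perms_ok() {
    mode=$1
    group_w=$(printf '%s' "$mode" | cut -c6)
    other_w=$(printf '%s' "$mode" | cut -c9)
    [ "$group_w" != "w" ] && [ "$other_w" != "w" ]
}

DBA_USER=dbadmin   # placeholder user name, as in the text above
if id "$DBA_USER" >/dev/null 2>&1; then
    echo "primary group of $DBA_USER: $(id -gn "$DBA_USER")"
    echo "all groups of $DBA_USER: $(id -Gn "$DBA_USER")"
fi
```

To apply the permission check, pass the first field of the ls -ld output for the user's home directory to home_perms_ok.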

TZ Environment Variable
This topic details how to set or change the TZ environment variable and update your tzdata
package. If this variable is not set, then the installer reports this issue with the identifier: S0305.
Before installing HP Vertica, update the tzdata package for your system and set the default time
zone for your database administrator account by specifying the TZ environment variable. If your
database administrator is being created by the install_vertica script, then set the TZ variable
after you have installed HP Vertica.

Update Tzdata Package


The tzdata package is a public-domain time zone database that is pre-installed on most Linux
systems. The tzdata package is updated periodically for time-zone changes across the world. HP
recommends that you update to the latest tzdata package before installing or updating HP Vertica.
Update your tzdata package with the following command:
• For RedHat based systems: yum update tzdata
• For Debian and Ubuntu systems: apt-get install tzdata


Setting the Default Time Zone


When a client receives the result set of a SQL query, all rows contain data adjusted, if necessary,
to the same time zone. That time zone is the default time zone of the initiator node unless the client
explicitly overrides it using the SQL SET TIME ZONE command described in the SQL Reference
Manual. The default time zone of any node is controlled by the TZ environment variable. If TZ is
undefined, the operating system time zone is used.
Important: The TZ variable must be set to the same value on all nodes in the cluster.
If your operating system timezone is not set to the desired timezone of the database then make
sure that the Linux environment variable TZ is set to the desired value on all cluster hosts.
The installer returns a warning if the TZ variable is not set. If your operating system timezone is
appropriate for your database, then the operating system timezone is used and the warning can be
safely ignored.

Setting the Time Zone on a Host


Important: If you explicitly set the TZ environment variable at a command line before you start
the Administration Tools, the current setting will not take effect. The Administration Tools
uses SSH to start copies on the other nodes, so each time SSH is used, the TZ variable for the
startup command is reset. TZ must be set in the .profile or .bashrc files on all nodes in the
cluster to take effect properly.
You can set the time zone several different ways, depending on the Linux distribution or the system
administrator's preferences.
• To set the system time zone on Red Hat and SUSE Linux systems, edit:
/etc/sysconfig/clock
• To set the TZ variable, edit /etc/profile, /dbadmin/.bashrc, or /home/dbadmin/.bash_profile
and add the following line (for example, for the US Eastern Time Zone):
export TZ="America/New_York"

For details on which timezone names are recognized by HP Vertica, see the appendix: Using
Time Zones With HP Vertica.
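
Because the same export line must appear on every node, it can be useful to add it in a way that is safe to repeat. The following is a minimal sketch; the function name set_profile_export is hypothetical. It replaces any earlier export of the same variable rather than appending a duplicate, and it rewrites the file in place so that the file's ownership and permissions are kept.

```shell
# Hypothetical helper: sets or replaces an "export NAME=..." line in a profile file.
set_profile_export() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Keep every existing line except an earlier export of the same variable.
    if [ -f "$file" ]; then
        grep -v "^export ${name}=" "$file" > "$tmp"
    fi
    printf 'export %s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back through the original file so its owner and mode are unchanged.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as the database administrator on each node):
#   set_profile_export "$HOME/.bashrc" TZ "America/New_York"
```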


LANG Environment Variable Settings


This topic details how to set or change the LANG environment variable. The LANG environment
variable controls the locale of the host. If this variable is not set, then the installer reports this issue
with the identifier: S0300. If this variable is not set to a valid value, then the installer reports this
issue with the identifier: S0301.

Set the Host Locale


Each host has a system setting for the Linux environment variable LANG. LANG determines the
locale category for native language, local customs, and coded character set in the absence of the
LC_ALL and other LC_ environment variables. LANG can be used by applications to determine
which language to use for error messages and instructions, collating sequences, date formats, and
so forth.
To change the LANG setting for the database administrator, edit /etc/profile,
/dbadmin/.bashrc, or /home/dbadmin/.bash_profile on all cluster hosts and set the
environment variable; for example:
export LANG=en_US.UTF-8

The LANG setting controls the following in HP Vertica:
• OS-level errors and warnings, for example, "file not found" during COPY operations.
• Some formatting functions, such as TO_CHAR and TO_NUMBER. See also Template Patterns
for Numeric Formatting.

The LANG setting does not control the following:
• HP Vertica specific error and warning messages. These are always in English at this time.
• Collation of results returned by SQL issued to HP Vertica. This must be done using a database
parameter instead. See Implement Locales for International Data Sets section in the
Administrator's Guide for details.

Note: If the LC_ALL environment variable is set, it supersedes the setting of LANG.
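
The precedence described in the note can be sketched in the shell. The helper names effective_locale and locale_listed are hypothetical. The sketch considers only LC_ALL and LANG and ignores the other LC_ variables, and comparing names against the output of locale -a is an assumption about what counts as a valid value, not a description of the installer's test.

```shell
# Hypothetical helper: prints LC_ALL when it is set and non-empty, otherwise LANG.
effective_locale() {
    printf '%s\n' "${LC_ALL:-${LANG:-}}"
}

# Lowercases a locale name and drops hyphens, so that "en_US.UTF-8"
# and "en_US.utf8" compare as equal.
normalize_locale() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-'
}

# Hypothetical helper: reads locale names on stdin (for example, from `locale -a`)
# and succeeds when the requested name is among them.
locale_listed() {
    want=$(normalize_locale "$1")
    while IFS= read -r name; do
        [ "$(normalize_locale "$name")" = "$want" ] && return 0
    done
    return 1
}

# Example use on a host that provides the locale utility.
if command -v locale >/dev/null 2>&1; then
    current=$(effective_locale)
    if locale -a 2>/dev/null | locale_listed "$current"; then
        echo "locale in effect: $current"
    else
        echo "locale '${current}' is not listed by locale -a"
    fi
fi
```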


Installing HP Vertica
There are different paths you can take when installing HP Vertica. You can:
• Install HP Vertica on one or more hosts using the command line, and not use the Management
Console.
• Install the Management Console, and from the Management Console install HP Vertica on one
or more hosts by using the Management Console cluster creation wizard.
• Install HP Vertica on one or more hosts using the command line, then install the Management
Console and import the cluster to be managed.


Installing Using the Command Line


Although HP supports installation on one node, two nodes, and multiple nodes, this section
describes how to install the HP Vertica software on a cluster of nodes. It assumes that you have
already performed the tasks in Before You Install HP Vertica, and that you have an HP Vertica
license key.
To install HP Vertica, complete the following tasks:
1. Download and install the HP Vertica server package
2. Installing HP Vertica with the install_vertica Script
Special notes
• Downgrade installations are not supported.
• Be sure that you download the RPM for the correct operating system and architecture.
• HP Vertica supports two-node clusters with zero fault tolerance (K=0 safety). This means that
you can add a node to a single-node cluster, as long as the installation node (the node upon
which you build) is not the loopback node (localhost/127.0.0.1).
• The Version 7.0 installer introduces new platform verification tests that prevent the install
from continuing if the platform requirements are not met by your system. Manually verify that
your system meets the requirements in Before You Install HP Vertica on your systems. These
tests ensure that your platform meets the hardware and software requirements for HP Vertica.
Previous versions documented these requirements, but the installer did not verify all of the
settings. If this is a fresh install, then you can simply run the installer and view a list of the
failures and warnings to determine which configuration changes you must make.

Back Up Existing Databases


If you are doing an upgrade installation, back up the following for all existing databases:
• The Catalog and Data directories, using the HP Vertica backup utility. See Backing Up and
Restoring the Database in the Administrator's Guide.
• /opt/vertica/, using manual methods. For example:


a. Enter the command:


tar -czvf /tmp/vertica.tgz /opt/vertica

b. Copy the tar file to a backup location.

Backing up MC
Before you upgrade MC, HP recommends that you back up your MC metadata (configuration and
user settings) on a storage location external to the server on which you installed MC.
1. On the target server (where you want to store MC metadata), log on as root or a user with sudo
privileges.
2. Create a backup directory; for example:
# mkdir /backups/mc/mc-backup-20130425

3. Copy the /opt/vconsole directory to the new backup folder:


# cp -r /opt/vconsole /backups/mc/mc-backup-20130425

After you have completed the backup tasks, proceed to Upgrading HP Vertica to a New Version.
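
The MC backup steps above can be combined into one repeatable command. The following is a minimal sketch; the function name backup_dir is hypothetical. It follows the mc-backup-YYYYMMDD naming used in the example and defaults to the current date.

```shell
# Hypothetical helper: copies a directory into <dest_root>/mc-backup-<stamp>
# and prints the backup location. The stamp defaults to today's date.
backup_dir() {
    src=$1
    dest_root=$2
    stamp=${3:-$(date +%Y%m%d)}
    target="$dest_root/mc-backup-$stamp"
    mkdir -p "$target" && cp -r "$src" "$target"/ && printf '%s\n' "$target"
}

# Example (run as root or with sudo on the server that stores the backup):
#   backup_dir /opt/vconsole /backups/mc
```

Using mkdir -p means that the parent directory /backups/mc does not have to exist beforehand.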

Download and Install the HP Vertica Server Package


To Download and Install the HP Vertica Server Package:
1. Use a Web browser to log in to myVertica portal.
2. Click the Download tab and download the HP Vertica server package to the Administration
Host.
Be sure the package you download matches the operating system and the machine
architecture on which you intend to install it. In the event of a node failure, you can use any
other node to run the Administration Tools later.
3. If you installed a previous version of HP Vertica on any of the hosts in the cluster, use the
Administration Tools to shut down any running database.


The database must stop normally; you cannot upgrade a database that requires recovery.
4. If you are using sudo, skip to the next step. If you are root, log in to the Administration Host as
root (or log in as another user and switch to root).
$ su - root
password: root-password
#

Caution: When installing HP Vertica using an existing user as the dba, you must exit all
UNIX terminal sessions for that user after setup completes and log in again to ensure that
group privileges are applied correctly.

After HP Vertica is installed, you no longer need root privileges. To verify sudo, see General
Hardware and OS Requirements and Recommendations.
5. Use one of the following commands to run the RPM package installer:
  • If you are root and installing an RPM:
    # rpm -Uvh pathname
  • If you are using sudo and installing an RPM:
    $ sudo rpm -Uvh pathname
  • If you are using Debian, replace rpm -Uvh with dpkg -i.
where pathname is the HP Vertica package file you downloaded.

Note: If the package installer reports multiple dependency problems, or you receive the
error "ERROR: You're attempting to install the wrong RPM for this operating system",
then you are trying to install the wrong HP Vertica server package. Make sure that the
machine architecture (32-bit or 64-bit) of the package you downloaded matches the
operating system.
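
The choice between rpm -Uvh and dpkg -i in step 5 depends only on the package type. The following is a minimal sketch that selects the command from the file extension. The function name pkg_install_cmd is hypothetical, and it only prints the command so that you can review it before running it as root or with sudo.

```shell
# Hypothetical helper: prints the install command that matches the package type.
pkg_install_cmd() {
    case "$1" in
        *.rpm) printf 'rpm -Uvh %s\n' "$1" ;;
        *.deb) printf 'dpkg -i %s\n' "$1" ;;
        *) echo "unrecognized package type: $1" >&2; return 1 ;;
    esac
}

# Example: prints "rpm -Uvh /tmp/vertica_7.1.x.x86_64.RHEL5.rpm"
pkg_install_cmd /tmp/vertica_7.1.x.x86_64.RHEL5.rpm
```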


Installing HP Vertica with the install_vertica Script


About the Installation Script
You run the install script after you have installed the HP Vertica package. The install script is run on
a single node, using a Bash shell, and it copies the HP Vertica package to all other hosts (identified
by the --hosts argument) in your planned cluster.
The install script runs several tests on each of the target hosts to verify that the hosts meet the
system and performance requirements for an HP Vertica node. The install script modifies some
operating system configuration settings to meet these requirements. Other settings cannot be
modified by the install script and must be manually re-configured.
The installation script takes the following basic parameters:
• A list of hosts on which to install.
• Optionally, the HP Vertica RPM/DEB path and package file name if you have not pre-installed
the server package on other potential hosts in the cluster.
• Optionally, a system user name. If you do not provide a user name, then the install script creates
a new system user named dbadmin. If you do provide a username and the username does not
exist on the system, then the install script creates that user.

For example:
# /opt/vertica/sbin/install_vertica --hosts node01,node02,node03 \
    --rpm /tmp/vertica_7.1.x.x86_64.RHEL5.rpm --dba-user mydba

Note: The install script sets up passwordless ssh for the administrator user across all the
hosts. If passwordless ssh is already set up, the install script verifies that it is functioning
correctly.
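The --hosts value is a comma-separated list with no spaces, and each host should appear only once. A small check like the following can catch a malformed list before you start the installer. It is a sketch only: valid_host_list is a hypothetical helper, and it cannot detect the same host listed once by name and once by IP address.

```shell
# Hypothetical helper (not part of HP Vertica): returns 0 when the list is
# non-empty, contains no spaces, and has no repeated entries.
valid_host_list() {
    list="$1"
    [ -n "$list" ] || return 1
    case "$list" in
        *" "*) return 1 ;;   # spaces are not allowed in the list
    esac
    # Split on commas and look for entries that occur more than once.
    dupes="$(printf '%s\n' "$list" | tr ',' '\n' | sort | uniq -d)"
    [ -z "$dupes" ]
}

hosts="node01,node02,node03"
if valid_host_list "$hosts"; then
    echo "Host list looks usable: $hosts"
else
    echo "Fix the host list before running install_vertica: $hosts" >&2
fi
```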

To Perform a Basic Install of HP Vertica:


1. As root (or sudo) run the install script. The script must be run in a BASH shell as root or as a
user with sudo privileges. There are many options you can configure when running the install
script. See install_vertica Options below for the complete list of options.


If the installer fails due to any requirements not being met, you can correct the issue and then
re-run the installer with the same command line options.
To perform a basic install:
• As root:
# /opt/vertica/sbin/install_vertica --hosts host_list --rpm package_name --dba-user dba_username
• Using sudo:
$ sudo /opt/vertica/sbin/install_vertica --hosts host_list --rpm package_name --dba-user dba_username

Basic Installation Parameters


Option
Description

--hosts host_list
A comma-separated list of IP addresses to include in the cluster; do not include space
characters in the list. Examples:
--hosts 127.0.0.1
--hosts 192.168.233.101,192.168.233.102,192.168.233.103
Note: HP Vertica stores only IP addresses in its configuration files. You can provide a
hostname to the --hosts parameter, but it is immediately converted to an IP address when the
script is run.

--rpm package_name
--deb package_name
The path and name of the HP Vertica RPM package. Example:
--rpm /tmp/vertica_7.1.x.x86_64.RHEL5.rpm
For Debian and Ubuntu installs, provide the name of the Debian package, for example:
--deb /tmp/vertica_7.1.x86.deb

--dba-user dba_username
The name of the Database Administrator system account to create. Only this account can run the
Administration Tools. If you omit the --dba-user parameter, then the default database
administrator account name is dbadmin.
This parameter is optional for new installations done as root but must be specified when
upgrading or when installing using sudo. If upgrading, use the -u parameter to specify the same
DBA account name that you used previously. If installing using sudo, the user must already exist.
Note: If you manually create the user, modify the user's .bashrc file to include the line:
PATH=/opt/vertica/bin:$PATH so that the HP Vertica tools such as vsql and admintools can be
easily started by the dbadmin user.

2. When prompted for a password to log into the other nodes, provide the requested password.
This allows the installation of the package and system configuration on the other cluster nodes.
If you are root, this is the root password. If you are using sudo, this is the sudo user password.
The password does not echo on the command line. For example:
HP Vertica Database 7.0 Installation Tool
Please enter password for root@host01:password

3. If the dbadmin user, or the user specified in the argument --dba-user, does not exist, then the
install script prompts for the password for the user. Provide the password. For example:
Enter password for new UNIX user dbadmin:password
Retype new UNIX password for user dbadmin:password

4. Carefully examine any warnings or failures returned by install_vertica and correct the
problems.
For example, insufficient RAM, insufficient network throughput, and readahead settings that
are too high on the filesystem could cause performance problems later on. Additionally, LANG
warnings, if not resolved, can cause database startup to fail and cause problems with vsql. The
system LANG attributes must be UTF-8 compatible. Once you fix the problems, re-run the
install script.
5. Once installation is successful, disconnect from the Administration Host, as instructed by
the script; then complete the required post-installation steps.
At this point, root privileges are no longer needed and the database administrator can perform
any remaining steps.
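Because LANG warnings can prevent the database from starting, it can be worth checking the locale on each host before you run the installer. The following is a minimal sketch; is_utf8_locale is a hypothetical helper, and it only inspects the string it is given rather than reproducing the installer's own checks.

```shell
# Hypothetical helper (not part of HP Vertica): returns 0 when the given
# locale string names a UTF-8 encoding, such as en_US.UTF-8 or en_US.utf8.
is_utf8_locale() {
    # Lower-case the value and drop dashes so "UTF-8" and "utf8" compare equal.
    normalized="$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')"
    case "$normalized" in
        *utf8*) return 0 ;;
        *)      return 1 ;;
    esac
}

if is_utf8_locale "${LANG:-}"; then
    echo "LANG is UTF-8 compatible: $LANG"
else
    echo "LANG is not UTF-8 compatible: '${LANG:-}'" >&2
fi
```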

To Complete Required Post-install Steps:


1. Log in to the Database Administrator account on the administration host.
2. Install the License Key
3. Accept the EULA.
4. If you have not already done so, proceed to the Getting Started Guide. Otherwise, proceed to
Configuring the Database in the Administrator's Guide.

install_vertica Options
The table below details all of the options available to the install_vertica script. Most options have a
long and short form. For example, --hosts is interchangeable with -s. The only required options are
--hosts/-s and --rpm/--deb/-r.
Option
(long form, short form)

Description

--help

Display help for this script.

--hosts host_list,
-s host_list
A comma-separated list of host names or IP addresses to include in the cluster. Do not include
spaces in the list. The IP addresses or hostnames must be for unique hosts. You cannot list the
same host using multiple IP addresses/hostnames.
Examples:
--hosts host01,host02,host03
-s 192.168.233.101,192.168.233.102,192.168.233.103
Note: If you are upgrading an existing installation of HP Vertica, be sure to use the same host
names that you used previously.

--rpm package_name,
--deb package_name,
-r package_name
The name of the RPM or Debian package. The install package must be provided if you are installing
or upgrading multiple nodes and the nodes do not have the latest server package installed, or if
you are adding a new node. The install_vertica and update_vertica scripts serially copy the server
package to the other nodes and install the package. If you are installing or upgrading a large
number of nodes, then consider manually installing the package on all nodes before running the
upgrade script, as the script runs faster if it does not need to serially upload and install the
package on each node.
Example:
--rpm vertica_7.1.x.x86_64.RHEL5.rpm

--data-dir data_directory,
-d data_directory
The default directory for database data and catalog files. The default is /home/dbadmin.
Note: Do not use a shared directory over more than one host for this setting. Data and catalog
directories must be distinct for each node. Multiple nodes must not be allowed to write to the
same data or catalog directory.

--temp-dir directory
The temporary directory used for administrative purposes. If it is a directory within
/opt/vertica, then it will be created by the installer. Otherwise, the directory should already
exist on all nodes in the cluster. The location should allow dbadmin write privileges.
The default is /tmp.
Note: This is not a temporary data location for the database.

--dba-user dba_username
The name of the Database Administrator system account to create. Only this account can run the
Administration Tools. If you omit the --dba-user parameter, then the default database
administrator account name is dbadmin.
This parameter is optional for new installations done as root but must be specified when
upgrading or when installing using sudo. If upgrading, use the -u parameter to specify the same
DBA account name that you used previously. If installing using sudo, the user must already exist.
Note: If you manually create the user, modify the user's .bashrc file to include the line:
PATH=/opt/vertica/bin:$PATH so that the HP Vertica tools such as vsql and admintools can be
easily started by the dbadmin user.

--dba-group GROUP,
-g GROUP
The UNIX group for DBA users. The default is verticadba.

--dba-user-home dba_home_directory,
-l dba_home_directory
The home directory for the database administrator. The default is /home/dbadmin.

--dba-user-password dba_password,
-p dba_password
The password for the database administrator account. If not supplied, the script prompts for a
password and does not echo the input.

--dba-user-password-disabled
Disable the password for the --dba-user. This argument stops the installer from prompting for a
password for the --dba-user. You can assign a password later using standard user management
tools such as passwd.

--spread-logging,
-w
Configures spread to write logging output to /opt/vertica/log/spread_<hostname>.log.
Does not apply to upgrades.
Note: Do not enable this logging unless directed to by Vertica Analytics Platform Technical
Support.

--ssh-password password,
-P password
The password to use by default for each cluster host. If not supplied, and the -i option is not
used, then the script prompts for the password if and when necessary and does not echo the
input. Do not use with the -i option.
Special note about password:
If you run the install_vertica script as root, specify the root password with the -P parameter:
# /opt/vertica/sbin/install_vertica -P <root_passwd>
If, however, you run the install_vertica script with the sudo command, the password for the -P
parameter should be the password of the user who runs install_vertica, not the root password.
For example, if user dbadmin runs install_vertica with sudo and has a password with the value
dbapasswd, then the value for -P should be dbapasswd:
$ sudo /opt/vertica/sbin/install_vertica -P dbapasswd

--ssh-identity file,
-i file
The root private-key file to use if passwordless ssh has already been configured between the
hosts. Verify that normal SSH works without a password before using this option. The file can be
a private key file (for example, id_rsa) or a PEM file. Do not use with the --ssh-password/-P
option.
HP Vertica accepts the following:
• An SSH private key that is not password protected. You cannot run the install_vertica script
with the sudo command when using this method.
• A password-protected private key used with an SSH agent. Note that sudo typically resets
environment variables when it is invoked. Specifically, the SSH_AUTH_SOCK variable required by
the SSH agent may be reset. Therefore, configure your system to maintain SSH_AUTH_SOCK or
invoke the install_vertica command using a method similar to the following:
sudo SSH_AUTH_SOCK=$SSH_AUTH_SOCK /opt/vertica/sbin/install_vertica ...

--dba-user dba_username,
-u dba_username
The name of the Database Administrator account to create. Only this account can run the
Administration Tools. If you omit the --dba-user parameter, then the install script creates a new
system account named dbadmin. If dbadmin exists, it verifies the user can be used as the database
administrator.
Note: This parameter is optional for new installations done as root but must be specified when
using sudo. If upgrading Vertica Analytics Platform, use the --dba-user parameter to specify the
same DBA account name that you used previously. If installing using sudo, the user must already
exist.

--config-file file,
-z file
Accepts an existing properties file created by --record-config file_name. This properties file
contains key/value parameters that map to values in the install_vertica script, many with Boolean
arguments that default to false.

--add-hosts host_list,
-A host_list
A comma-separated list of hosts to add to an existing HP Vertica cluster.
--add-hosts modifies an existing installation of HP Vertica by adding a host to the database
cluster and then reconfiguring the spread. This is useful for increasing system performance or
setting K-safety to one (1) or two (2).
Notes:
• If you have used the -T parameter to configure spread to use direct point-to-point
communication within the existing cluster, you must use the -T parameter when you add a new
host; otherwise, the new host automatically uses UDP broadcast traffic, resulting in cluster
communication problems that prevent HP Vertica from running properly.
• The update_vertica script described in Adding Nodes calls the install_vertica script to update
the installation. You can use either the install_vertica or update_vertica script with the
--add-hosts parameter.
Examples:
--add-hosts host01
--add-hosts 192.168.233.101

--record-config file_name,
-B file_name
Accepts a file name, which when used in conjunction with command line options, creates a
properties file that can be used with the --config-file parameter. This parameter creates the
properties file and exits; it has no impact on installation.

--clean
Forcibly cleans previously stored configuration files. Use this parameter if you need to change
the hosts that are included in your cluster. Only use this parameter when no database is defined.
Cannot be used with update_vertica.

--license { license_file | CE },
-L { license_file | CE }
Silently and automatically deploys the license key to /opt/vertica/config/share. On multi-node
installations, the --license option also applies the license to all nodes declared in the
--hosts host_list.
If specified with CE, automatically deploys the Community Edition license key, which is included
in your download. You do not need to specify a license file.
Examples:
--license CE
--license /tmp/vlicense.dat

--remove-hosts host_list,
-R host_list
A comma-separated list of hosts to remove from an existing HP Vertica cluster.
--remove-hosts modifies an existing installation of HP Vertica by removing a host from the
database cluster and then reconfiguring the spread. This is useful for removing an obsolete or
over-provisioned system. For example:
--remove-hosts host01
-R 192.168.233.101
Notes:
• If you used the -T parameter to configure spread to use direct point-to-point communication
within the existing cluster, you must use -T when you remove a host; otherwise, the hosts
automatically use UDP broadcast traffic, resulting in cluster communication problems that
prevent HP Vertica from running properly.
• The update_vertica script described in Removing Nodes in the Administrator's Guide calls the
install_vertica script to perform the update to the installation. You can use either the
install_vertica or update_vertica script with the -R parameter.

--control-network { BCAST_ADDR | default },
-S { BCAST_ADDR | default }
Takes either the value 'default' or a broadcast network IP address (BCAST_ADDR) to allow spread
communications to be configured on a subnet that is different from other HP Vertica data
communications. --control-network is also used to force a cluster-wide spread reconfiguration
when changing spread-related options.
Note: The --control-network value must match the subnet for at least some of the nodes in the
database. If the provided address does not match the subnet of any node in the database, then
the installer displays an error and stops. If the provided address matches some, but not all, of
the nodes' subnets, then a warning is displayed, but the install continues. Ideally, the value
for --control-network should match all node subnets.
Examples:
--control-network default
--control-network 10.20.100.255

--point-to-point,
-T
Configures spread to use direct point-to-point communication between all HP Vertica nodes. You
should use this option if your nodes aren't located on the same subnet. You should also use this
option for all virtual environment installations, regardless of whether the virtual servers are
on the same subnet or not. The maximum number of spread daemons supported in point-to-point
communication in HP Vertica 7.1 is 80. It is possible to have more than 80 nodes by using large
cluster mode, which does not install a spread daemon on each node.
Cannot be used with --broadcast, as the setting must be either --broadcast or --point-to-point.
Important: When changing the configuration from --broadcast (the default) to --point-to-point or
from --point-to-point to --broadcast, the --control-network parameter must also be used.
Note: Spread always runs on UDP. -T does not denote TCP.

--broadcast,
-U
Specifies that spread use UDP broadcast traffic between HP Vertica nodes on the subnet. This
parameter is automatically used by default. No more than 80 spread daemons are supported by
broadcast traffic. It is possible to have more than 80 nodes by using large cluster mode, which
does not install a spread daemon on each node.
Cannot be used with --point-to-point, as the setting must be either --broadcast or
--point-to-point.
Important: When changing the configuration from --broadcast (the default) to --point-to-point or
from --point-to-point to --broadcast, the --control-network parameter must also be used.
Note: Spread always runs on UDP. -U does not mean use UDP instead of TCP.

--accept-eula,
-Y
Silently accepts the EULA agreement. On multi-node installations, the --accept-eula value is
propagated throughout the cluster at the end of the installation, at the same time as the
Administration Tools metadata.

--no-system-configuration
By default, the installer makes system configuration changes to meet server requirements. If you
do not want the installer to change any system properties, then use the
--no-system-configuration option. The installer then presents warnings or failures for
configuration settings that do not meet requirements and that it normally would have configured
automatically.
Note: The system user account is still created/updated when using this parameter.

--failure-threshold
Stops the installation when the specified failure threshold is encountered.
Options can be one of:
• HINT - Stop the install if a HINT or greater issue is encountered during the installation
tests. HINT configurations are settings you should make, but the database runs with no
significant negative consequences if you omit the setting.
• WARN (default) - Stop the installation if a WARN or greater issue is encountered. WARN issues
may affect the performance of the database. However, for basic testing purposes or Community
Edition users, WARN issues can be ignored if extreme performance is not required.
• FAIL - Stop the installation if a FAIL or greater issue is encountered. FAIL issues can have
severely negative performance consequences and possible later processing issues if not
addressed. However, HP Vertica can start even if FAIL issues are ignored.
• HALT - Stop the installation if a HALT or greater issue is encountered. The database may not
be able to be started if you choose this option. Not supported in production environments.
• NONE - Do not stop the installation. The database may not start. Not supported in production
environments.

--large-cluster [ <integer> | DEFAULT ],
-2 [ <integer> | DEFAULT ]
Enables a large cluster layout, in which control message responsibilities are delegated to a
subset of Vertica Analytics Platform nodes (called control nodes) to improve control message
performance in large clusters. Consider using this parameter with more than 50 nodes.
Options can be one of:
• <integer> - The number of control nodes you want in the cluster. Valid values are 1 to 120 for
all new databases.
• DEFAULT - Vertica Analytics Platform chooses the number of control nodes using calculations
based on the total number of cluster nodes in the --hosts argument.
For more information, see Large Cluster in the Administrator's Guide.
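The --failure-threshold levels form an ordered scale (HINT, WARN, FAIL, HALT), and the installer stops when it meets an issue at or above the chosen level. The sketch below models only that ordering to make the semantics concrete. The function names are hypothetical and this is not the installer's own logic.

```shell
# Hypothetical model of the --failure-threshold ordering described above.
severity_rank() {
    case "$1" in
        HINT) echo 1 ;;
        WARN) echo 2 ;;
        FAIL) echo 3 ;;
        HALT) echo 4 ;;
        NONE) echo 5 ;;   # ranks above every issue level, so nothing stops the install
        *)    echo 0 ;;   # unknown value
    esac
}

# would_stop THRESHOLD ISSUE_LEVEL
# Returns 0 when an issue of ISSUE_LEVEL is at or above THRESHOLD.
# Unknown values never report a stop in this sketch.
would_stop() {
    t="$(severity_rank "$1")"
    i="$(severity_rank "$2")"
    [ "$t" -gt 0 ] && [ "$i" -gt 0 ] && [ "$i" -ge "$t" ]
}

# Show what the default threshold (WARN) does with each issue level.
for level in HINT WARN FAIL HALT; do
    if would_stop WARN "$level"; then
        echo "$level issue: installation stops"
    else
        echo "$level issue: installation continues"
    fi
done
```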

Installing HP Vertica Silently


This section describes how to create a properties file that lets you install and deploy HP Vertica-based applications quickly and without much manual intervention.
Note: The procedure assumes that you have already performed the tasks in Before you Install
HP Vertica.
Install the properties file:
1. Download and install the HP Vertica install package, as described in Installing HP Vertica.
2. Create the properties file that enables non-interactive setup by supplying the parameters you
want HP Vertica to use. For example:
The following command assumes a multi-node setup:
# /opt/vertica/sbin/install_vertica --record-config file_name --license /tmp/license.txt --accept-eula --dba-user-password password --ssh-password password --hosts host_list --rpm package_name

The following command assumes a single-node setup:
# /opt/vertica/sbin/install_vertica --record-config file_name --license /tmp/license.txt --accept-eula --dba-user-password password

Option
Description

--record-config file_name
[Required] Accepts a file name, which when used in conjunction with command line options,
creates a properties file that can be used with the --config-file option during setup. This flag
creates the properties file and exits; it has no impact on installation.

--license { license_file | CE }
Silently and automatically deploys the license key to /opt/vertica/config/share. On multi-node
installations, the --license option also applies the license to all nodes declared in the
--hosts host_list.
If specified with CE, automatically deploys the Community Edition license key, which is included
in your download. You do not need to specify a license file.

--accept-eula
Silently accepts the EULA agreement during setup.

--dba-user-password password
The password for the Database Administrator account; if not supplied, the script prompts for the
password and does not echo the input.

--ssh-password password
The root password to use by default for each cluster host; if not supplied, the script prompts
for the password if and when necessary and does not echo the input.

--hosts host_list
A comma-separated list of hostnames or IP addresses to include in the cluster; do not include
space characters in the list.
Examples:
--hosts host01,host02,host03
--hosts 192.168.233.101,192.168.233.102,192.168.233.103

--rpm package_name
--deb package_name
The name of the RPM or Debian package that contained this script.
Example:
--rpm vertica_7.1.x.x86_64.RHEL5.rpm
This parameter is required on multi-node installations if the RPM or DEB package is not already
installed on the other hosts.

See Installing HP Vertica with the install_vertica Script for the complete set of installation
parameters.

Tip: Supply the parameters to the properties file once only. You can then install HP
Vertica using just the --config-file parameter, as described below.

3. Use one of the following commands to run the installation script.
• If you are root:
# /opt/vertica/sbin/install_vertica --config-file file_name
• If you are using sudo:
$ sudo /opt/vertica/sbin/install_vertica --config-file file_name

--config-file file_name accepts an existing properties file created by --record-config
file_name. This properties file contains key/value parameters that map to values
in the install_vertica script, many with Boolean arguments that default to false.
The command for a single-node install might look like this:
# /opt/vertica/sbin/install_vertica --config-file /tmp/vertica-inst.prp

4. If you did not supply a --ssh-password password parameter to the properties file, you are
prompted to provide the requested password to allow installation of the RPM/DEB and system
configuration of the other cluster nodes. If you are root, this is the root password. If you are
using sudo, this is the sudo user password. The password does not echo on the command line.

Note: If you are root on a single-node installation, you are not prompted for a password.

5. If you did not supply a --dba-user-password password parameter to the properties file, you
are prompted to provide the database administrator account password.
The installation script creates a new Linux user account (dbadmin by default) with the
password that you provide.
6. Carefully examine any warnings produced by install_vertica and correct the problems if
possible. For example, insufficient RAM, insufficient network throughput, and readahead
settings that are too high on the filesystem could cause performance problems later on.

Note: You can redirect any warning output to a separate file, instead of having it
display on the system. Use your platform's standard redirection mechanisms. For example:
install_vertica [options] > /tmp/file 2>&1

7. Optionally perform the following steps:
• Install the ODBC and JDBC driver.
• Install the vsql client application on non-cluster hosts.

8. Disconnect from the Administration Host as instructed by the script. This is required to:
• Set certain system parameters correctly.
• Function as the HP Vertica database administrator.
At this point, Linux root privileges are no longer needed. The database administrator can
perform the remaining steps.

Note: When creating a new database, the database administrator might want to use
different data or catalog locations than those created by the installation script. In that
case, a Linux administrator might need to create those directories and change their
ownership to the database administrator.

If you supplied the --license and --accept-eula parameters to the properties file, then
proceed to the Getting Started Guide and then see Configuring the Database in the
Administrator's Guide. Otherwise:

1. Log in to the Database Administrator account on the administration host.


2. Accept the End User License Agreement and install the license key you downloaded
previously as described in Install the License Key.
3. Proceed to the Getting Started Guide and then see Configuring the Database in the
Administrator's Guide.

Notes
• Downgrade installations are not supported.
• The following is an example of the contents of the configuration properties file:

accept_eula = True
license_file = /tmp/license.txt
record_to = file_name
root_password = password
vertica_dba_group = verticadba
vertica_dba_user = dbadmin
vertica_dba_user_password = password
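If you keep several recorded properties files, it can help to confirm which values a file holds before passing it to --config-file. The sketch below reads a single key from a file laid out like the example above. The helper get_prop is hypothetical, and the "key = value" layout is assumed from that example rather than from a format specification.

```shell
# Hypothetical helper (not part of HP Vertica): print the value stored for
# KEY in a "key = value" properties FILE.
# Usage: get_prop FILE KEY
get_prop() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Build a throwaway file that mirrors part of the example above.
props="$(mktemp)"
cat > "$props" <<'EOF'
accept_eula = True
license_file = /tmp/license.txt
vertica_dba_group = verticadba
vertica_dba_user = dbadmin
EOF

echo "DBA user:  $(get_prop "$props" vertica_dba_user)"
echo "DBA group: $(get_prop "$props" vertica_dba_group)"
rm -f "$props"
```

Note that the example properties file stores passwords in plain text, so treat recorded files accordingly.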

Installing HP Vertica on Amazon Web Services (AWS)


Beginning with Vertica 6.1.x, you can use Vertica on AWS by utilizing a pre-configured Amazon
Machine Image (AMI). For details on installing and configuring a cluster on AWS, refer to About
Using HP Vertica on Amazon Web Services (AWS).


Creating a Cluster Using MC


You can use Management Console to install an HP Vertica cluster on hosts where HP Vertica
software has not been installed. The Cluster Installation wizard lets you specify the hosts you want
to include in your HP Vertica cluster, loads the HP Vertica software onto the hosts, validates the
hosts, and assembles the nodes into a cluster.
Management Console must be installed and configured before you can create a cluster on targeted
hosts. See Installing and Configuring the MC for details.

Steps Required to Install an HP Vertica Cluster Using MC:


• Install and configure MC
• Prepare the Hosts
• Create the private key file and copy it to your local machine
• Run the Cluster Installation Wizard
• Validate the hosts and create the cluster
• Create a new database on the cluster

Prepare the Hosts


Before you can install an HP Vertica cluster using the MC, you must prepare each host that will
become a node in the cluster. The cluster creation process runs validation tests against each host
before it attempts to install the HP Vertica software. These tests ensure that the host is correctly
configured to run HP Vertica.

Install Perl
The MC cluster installer uses Perl to perform the installation. Install Perl 5 on the target hosts
before performing the cluster installation. Perl is available for download from www.perl.org.
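To confirm that a target host already has Perl 5, you could run a check such as the following on that host. It is a sketch only: has_perl5 is a hypothetical helper, and it tests nothing beyond the interpreter's reported version.

```shell
# Hypothetical helper (not part of HP Vertica or MC): returns 0 when a perl
# interpreter is on the PATH and reports version 5 or later.
has_perl5() {
    command -v perl >/dev/null 2>&1 || return 1
    perl -e 'exit($] >= 5 ? 0 : 1)'
}

if has_perl5; then
    echo "Perl found on $(uname -n): version $(perl -e 'print $]')"
else
    echo "Perl 5 not found on $(uname -n); install it before running the MC cluster installer" >&2
fi
```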

Validate the Hosts


The validation tests provide:
• Warnings and error messages when they detect a configuration setting that conflicts with the
HP Vertica requirements or any performance issue
• Suggestions for configuration changes when they detect an issue

Note: The validation tests do not automatically fix all problems they encounter.
All hosts must pass validation before the cluster can be created.
If you accepted the default configuration options when installing the OS on your host, then the
validation tests will likely return errors, since some of the default options used on Linux systems
conflict with HP Vertica requirements. See the Installation Guide for details on OS settings. To
speed up the validation process you can perform the following steps on the prospective hosts
before you attempt to validate the hosts. These steps are based on Red Hat Enterprise Linux and
CentOS systems, but other supported platforms have similar settings.
On each host you want to include in the HP Vertica cluster, you must stage the host according to
Before You Install HP Vertica.

Create a Private Key File


Before you can install a cluster, Management Console must be able to access the hosts on which
you plan to install HP Vertica. MC uses password-less SSH to connect to the hosts and install HP
Vertica software using a private key file.
If you already have a private key file that allows access to all hosts in the potential cluster, you can
use it in the cluster creation wizard.
Note: The private key file is required to complete the MC cluster installation wizard.

Create a Private Key File


1. Log in on the server as root or as a user with sudo privileges.
2. Change to your home directory.
$ cd ~

3. If an .ssh directory does not exist, create one.
$ mkdir .ssh

4. Generate a passwordless private key/public key pair.


$ ssh-keygen -q -t rsa -f ~/.ssh/vid_rsa -N ''

This command creates two files: vid_rsa and vid_rsa.pub. The vid_rsa file is the private key file
that you upload to the MC so that it can access nodes on the cluster and install HP Vertica. The
vid_rsa.pub file is copied to all other hosts so that they can be accessed by clients using the
vid_rsa file.
5. Make your .ssh directory readable and writable only by yourself.
$ chmod 700 /root/.ssh

6. Change to the .ssh directory.


$ cd ~/.ssh

7. Concatenate the public key into the file vauthorized_keys2.


$ cat vid_rsa.pub >> vauthorized_keys2

8. If the host from which you are creating the public key will also be in the cluster, then copy the
public key into the local host's authorized keys file:
$ cat vid_rsa.pub >> authorized_keys2

9. Make the files in your .ssh directory readable and writable only by yourself.
$ chmod 600 ~/.ssh/*

10. Create the .ssh directory on the other nodes.


$ ssh <host> "mkdir /root/.ssh"

11. Copy the vauthorized key file to the other nodes.


$ scp -r /root/.ssh/vauthorized_keys2 <host>:/root/.ssh/.

12. On each node, concatenate the vauthorized_keys2 public key to the authorized_keys2 file and
make the file readable and writable only by the owner.
$ ssh <host> "cd /root/.ssh/;cat vauthorized_keys2 >> authorized_keys2; chmod 600 /root/.ssh/authorized_keys2"

13. On each node, remove the vauthorized_keys2 file.


$ ssh -i /root/.ssh/vid_rsa <host> "rm /root/.ssh/vauthorized_keys2"

14. Copy the vid_rsa file to the workstation from which you will access the MC cluster installation
wizard. This file is required to install a cluster from the MC.
A complete example of the commands for creating the public key and allowing access to three
hosts from the key is below. The commands are being initiated from the docg01 host, and all hosts
will be included in the cluster (docg01 - docg03):
ssh docg01
cd ~/.ssh
ssh-keygen -q -t rsa -f ~/.ssh/vid_rsa -N ''
cat vid_rsa.pub > vauthorized_keys2
cat vid_rsa.pub >> authorized_keys2
chmod 600 ~/.ssh/*
scp -r /root/.ssh/vauthorized_keys2 docg02:/root/.ssh/.
scp -r /root/.ssh/vauthorized_keys2 docg03:/root/.ssh/.
ssh docg02 "cd /root/.ssh/;cat vauthorized_keys2 >> authorized_keys2; chmod 600 /root/.ssh/authorized_keys2"
ssh docg03 "cd /root/.ssh/;cat vauthorized_keys2 >> authorized_keys2; chmod 600 /root/.ssh/authorized_keys2"
ssh -i /root/.ssh/vid_rsa docg02 "rm /root/.ssh/vauthorized_keys2"
ssh -i /root/.ssh/vid_rsa docg03 "rm /root/.ssh/vauthorized_keys2"
rm ~/.ssh/vauthorized_keys2
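
Because steps 10 through 13 repeat for every remote host, the commands can be generated in a loop and reviewed before they are run. The following is a minimal sketch, not part of the product: print_key_cmds is a hypothetical helper name, the host names are the ones from the example above, and mkdir -p is used instead of mkdir so that an existing .ssh directory is not treated as an error. The function only prints the commands; it does not execute them.

```shell
#!/bin/sh
# Dry-run sketch: print the per-host key distribution commands
# (steps 10-13 above) for each host name given as an argument.
# print_key_cmds is a hypothetical helper, not an HP Vertica tool.
print_key_cmds() {
    for host in "$@"; do
        # -p is an assumption: it avoids an error if .ssh already exists.
        printf '%s\n' "ssh $host \"mkdir -p /root/.ssh\""
        printf '%s\n' "scp -r /root/.ssh/vauthorized_keys2 $host:/root/.ssh/."
        printf '%s\n' "ssh $host \"cd /root/.ssh/;cat vauthorized_keys2 >> authorized_keys2; chmod 600 /root/.ssh/authorized_keys2\""
        printf '%s\n' "ssh -i /root/.ssh/vid_rsa $host \"rm /root/.ssh/vauthorized_keys2\""
    done
}

# Example: list the commands for the two remote hosts.
print_key_cmds docg02 docg03
```

Printing first and running second makes it easier to spot a mistyped host name before any files are copied.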

Use MC's Cluster Installation Wizard


MC's Cluster Installation Wizard guides you through the steps required to install an HP Vertica
cluster on hosts that do not already have HP Vertica software installed.
Note: If you are using MC with the HP Vertica AMI on Amazon Web Services, note that the
Create Cluster and Import Cluster options are not supported.


Prerequisites
Before you proceed, make sure you:

Installed and configured MC.

Prepared the hosts that you will include in the HP Vertica database cluster.

Created the private key (pem) file and copied it to your local machine.

Obtained a copy of your HP Vertica license if you are installing the Enterprise Edition. If you are
using the Community Edition, a license key is not required.

Downloaded the HP Vertica server RPM (or DEB file).

Have read/copy permissions on files stored on the local browser host that you will transfer to the
host on which MC is installed.

Permissions on Files you'll Transfer to MC


On your local workstation, you must have at least read/write privileges on files you'll upload to MC
through the Cluster Installation Wizard. These files include the HP Vertica server package, the
license key (if needed), the private key file, and an optional CSV file of IP addresses.

Create a New HP Vertica Cluster Using MC


1. Connect to Management Console and log in as an MC administrator.
2. On MC's Home page, click the Databases and Clusters task.
3. Click the plus sign and select Create Cluster.
4. The Create Cluster wizard opens. Provide the following information:
a. Cluster name: A label for the cluster.
b. Vertica Admin User: The user that is created on each of the nodes when they are installed,
typically 'dbadmin'. This user has access to HP Vertica and is also an OS user on the host.
c. Password for the HP Vertica Admin User: The password you enter (required) is set for each
node when MC installs HP Vertica.

Note: MC does not support an empty password for the administrative user.

d. HP Vertica Admin Path: Storage location for catalog files, which defaults to /home/dbadmin
unless you specified a different path during MC configuration (or later on MC's Settings page).

Important: The Vertica Admin Path must be the same as the Linux database
administrator's home directory. If you specify a path that is not the Linux dbadmin's home
directory, MC returns an error.
5. Click Next and specify the private key file and host information:
a. Click Browse and navigate to the private key file (vid_rsa) that you created earlier.

Note: You can change the private key file at the beginning of the validation stage by
clicking the name of the private key file in the bottom-left corner of the page. However,
you cannot change the private key file after validation has begun unless the first host
fails validation due to an SSH login error.

b. Include the host IP addresses. Here you have three options:

Specify later (but include number of nodes). This option allows you to specify the number of
nodes, but not the specific IPs. You can specify the specific IPs before you validate hosts.

Import IP addresses from local file. You can specify the hosts in a CSV file using either IP
addresses or host names.

Enter a range of IP addresses. You can specify a range of IPs to use for new nodes. For
example, 192.168.1.10 to 192.168.1.30. The range of IPs must be on the same or
contiguous subnets.
6. Click Next and select the software and license:
a. Vertica Software. If one or more HP Vertica packages have been uploaded, you can select
one from the list; otherwise select Upload a new local vertica binary file and browse to
an HP Vertica server file on your local system.
b. Vertica License. Click Browse and navigate to a local copy of your HP Vertica license if
you are installing the Enterprise Edition. Community Edition versions require no license key.
7. Click Next. The Create cluster page opens. If you did not specify the IP addresses, select
each host icon and provide an IP address by entering the IP in the box and clicking Apply for
each host you add.
The hosts are now ready for Host Validation and Cluster Creation.
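
If you choose to import host addresses from a file, the list can be generated rather than typed. The following is a minimal sketch under stated assumptions: ip_range is a hypothetical helper, it only covers a range within a single /24 prefix, and the one-address-per-line layout is an assumption, because the exact CSV layout that MC expects is not described here.

```shell
#!/bin/sh
# ip_range PREFIX FIRST LAST  (hypothetical helper)
# Prints PREFIX.FIRST through PREFIX.LAST, one address per line.
ip_range() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s.%s\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Example: the 21 addresses from 192.168.1.10 to 192.168.1.30.
# Redirect the output to a file (for example, > hosts.csv) to save it.
ip_range 192.168.1 10 30
```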

Validate Hosts and Create the Cluster


Host validation is the process where the MC runs tests against each host in a proposed cluster.
You can validate hosts only after you have completed the cluster installation wizard. You must
validate hosts before the MC can install HP Vertica on each host.
At any time during the validation process, but before you create the cluster, you can add and
remove hosts by clicking the appropriate button in the upper left corner of the page on MC. A Create
Cluster button appears when all hosts that appear in the node list are validated.

How to Validate Hosts


To validate one or more hosts:
1. Connect to Management Console and log in as an MC administrator.
2. On the MC Home page, click the Databases and Clusters task.
3. In the list of databases and clusters, select the cluster on which you have recently run the
cluster installation wizard (Creating... appears under the cluster) and click View.
4. Validate one or several hosts:

To validate a single host, click the host icon, then click Validate Host.

To validate all hosts at the same time, click All in the Node List, then click Validate Host.

To validate more than one host, but not all of them, Ctrl+click the host numbers in the node
list, then click Validate Host.

5. Wait while validation proceeds.


The validation step takes several minutes to complete. The tests run in parallel for each host,
so the number of hosts does not necessarily increase the amount of time it takes to validate all
the hosts if you validate them at the same time. Host validation results in one of three
possible states:


Green check mark: The host is valid and can be included in the cluster.

Orange triangle: The host can be added to the cluster, but warnings were generated. Click
the tests in the host validation window to see details about the warnings.

Red X: The host is not valid. Click the tests in the host validation window that have red X's
to see details about the errors. You must either correct the errors and re-validate the host,
or remove the host, before MC can create the cluster.
To remove an invalid host: Highlight the host icon or the IP address in the Node List and
click Remove Host.

All hosts must be valid before you can create the cluster. Once all hosts are valid, a Create Cluster
button appears near the top right corner of the page.

How to Create the Cluster


1. Click Create Cluster to install HP Vertica on each host and assemble the nodes into a cluster.
The process, done in parallel, takes a few minutes as the software is copied to each host and
installed.
2. Wait for the process to complete. When the Success dialog opens, you can do one of the
following:

Optionally create a database on the new cluster at this time by clicking Create Database

Click Done to create the database at a later time

See Creating a Database on a Cluster for details on creating a database on the new cluster.

Create a Database on a Cluster


After you use the MC Cluster Installation Wizard to create an HP Vertica cluster, you can create a
database on that cluster through the MC interface. You can create the database on all cluster nodes
or on a subset of nodes.
If a database has been created using the Administration Tools on any of the nodes, MC detects
(autodiscovers) that database and displays it on the Manage (Cluster Administration) page so you
can import it into the MC interface and begin monitoring it.
MC allows only one database running on a cluster at a time, so you might need to stop a running
database before you can create a new one.


The following procedure describes how to create a database on a cluster that you created using the
MC Cluster Installation Wizard. To create a database on a cluster that you created by running the
install_vertica script, see Creating an Empty Database.

Create a Database on a Cluster


To create a new empty database on a new cluster:
1. If you are already on the Databases and Clusters page, skip to the next step; otherwise:
a. Connect to MC and sign in as an MC administrator.
b. On the Home page, click the Databases and Clusters task.
2. If no databases exist on the cluster, continue to the next step; otherwise:
a. If a database is running on the cluster on which you want to add a new database, select the
database and click Stop.
b. Wait for the running database to have a status of Stopped.
3. Click the cluster on which you want to create the new database and click Create Database.
4. The Create Database wizard opens. Provide the following information:

Database name and password. See Creating a Database Name and Password for rules.

Optionally click Advanced to open the advanced settings and change the port and catalog,
data, and temporary data paths. By default the MC application/web server port is 5450 and
paths are /home/dbadmin, or whatever you defined for the paths when you ran the cluster
creation wizard. Do not use the default agent port 5444 as a new setting for the MC
application/web server port. See MC Settings > Configuration for port values.

5. Click Continue.
6. Select nodes to include in the database.
The Database Configuration window opens with the options you provided and a graphical
representation of the nodes appears on the page. By default, all nodes are selected to be part of
this database (denoted by a green check mark). You can optionally click each node and clear
Include host in new database to exclude that node from the database. Excluded nodes are
gray. If you change your mind, click the node and select the Include check box.


7. Click Create in the Database Configuration window to create the database on the nodes.
The creation process takes a few moments and then the database is started and a Success
message appears.
8. Click OK to close the success message.
The Database Manager page opens and displays the database nodes. Nodes not included in
the database are gray.


Installing and Configuring Management Console


This section describes how to install, configure, and upgrade Management Console (MC). If you
need to back up your instance of MC, see Backing Up MC in the Administrator's Guide.
You can install MC before or after you install HP Vertica; however, consider installing HP Vertica
and creating a database before you install MC. After you finish configuring MC, it automatically
discovers your running database cluster, saving you the task of importing it manually.

Before You Install MC


Each version of HP Vertica Management Console (MC) is compatible only with the matching
version of the HP Vertica server. For example, HP Vertica 7.1.0 server is supported with HP
Vertica 7.1.0 MC only. Read the following documents for more information:
Supported Platforms document, at http://www.vertica.com/documentation. The Supported
Platforms document also lists supported browsers for MC.

Installation Overview and Checklist. Make sure you have everything ready for your HP Vertica
configuration.

Before You Install HP Vertica. Lists the prerequisites for all HP Vertica configurations,
including Management Console.

Driver Requirements for Linux SUSE Distributions


The MC (vertica-console) package contains the Oracle implementation of the Java 6 JRE and
requires that you install the unixODBC driver manager on SUSE Linux platforms. unixODBC
provides the required libraries libodbc and libodbcinst.

Port Requirements
When you use MC to create an HP Vertica cluster, the Create Cluster Wizard uses SSH on its
default port (22).
Port 5444 is the default agent port and must be available for MC-to-node and node-to-node
communications.
Port 5450 is the default MC port and must be available for node-to-MC communications.
See Ensure Ports Are Available for more information about port and firewall considerations.
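
One part of "available" is that no other process is already listening on the port. The following is a minimal sketch for checking that on a host, assuming the ss utility (from iproute2) is installed; port_listed is a hypothetical helper name. It reads ss -ltn style output on standard input, so it does not test firewall rules or reachability from other hosts.

```shell
#!/bin/sh
# port_listed PORT  (hypothetical helper)
# Reads `ss -ltn` style output on stdin and succeeds if a socket is
# already listening on PORT. The local address is the 4th column;
# the first line is the column header and is skipped.
port_listed() {
    awk -v port="$1" '
        NR > 1 && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Example (run on the host you want to check):
#   for p in 5444 5450; do
#       ss -ltn | port_listed "$p" && echo "port $p is already in use"
#   done
```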


Firewall Considerations
Make sure that a firewall or iptables is not blocking communications between the cluster's
database, Management Console, and MC's agents on each cluster node.

IP Address Requirements
If you install MC on a server outside the HP Vertica cluster it will be monitoring, that server must be
accessible to at least the public network interfaces on the cluster.

Disk Space Requirements


You can install MC on any node in the cluster, so there are no special disk requirements for MC
other than disk space you would normally allocate for your database cluster. See Disk Space
Requirements for HP Vertica.

Time Synchronization and MC's Self-Signed Certificate


When you connect to MC through a client browser, HP Vertica assigns each HTTPS request a
self-signed certificate, which includes a timestamp. To increase security and protect against password
replay attacks, the timestamp is valid for several seconds only, after which it expires.
To avoid being blocked out of MC, synchronize time on the hosts in your HP Vertica cluster, and on
the MC host if it resides on a dedicated server. To recover from loss or lack of synchronization,
resync system time and the Network Time Protocol. See Set Up Time Synchronization in the
Installation Guide. If you want to generate your own certificates and keys for MC, see Generating
Certificates and Keys for MC.
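
Because the timestamp is valid for only several seconds, it can help to measure how far apart two clocks are before you troubleshoot MC access. The following is a minimal sketch: skew_seconds is a hypothetical helper, host01 in the usage comment is a placeholder host name, and the result also includes the time the ssh call itself takes.

```shell
#!/bin/sh
# skew_seconds A B  (hypothetical helper)
# Prints the absolute difference, in seconds, between two
# epoch timestamps such as those printed by `date +%s`.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    printf '%s\n' "$d"
}

# Example, comparing the local clock with a cluster host:
#   skew_seconds "$(date +%s)" "$(ssh host01 date +%s)"
```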

SSL Requirements
The openssl package must be installed on your Linux environment so SSL can be set up during the
MC configuration process. See SSL Prerequisites in the Administrator's Guide.

File Permission Requirements


On your local workstation, you must have at least read/write privileges on any files you plan to
upload to MC through the Cluster Installation Wizard. These files include the HP Vertica server
package, the license key (if needed), the private key file, and an optional CSV file of IP addresses.
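
The privileges can be checked on the workstation before you start the wizard. The following is a minimal sketch: check_uploadable is a hypothetical helper, and the file names in the usage comment are placeholders for your own package, license, key, and CSV files.

```shell
#!/bin/sh
# check_uploadable FILE...  (hypothetical helper)
# Reports every file that the current user cannot both read and write,
# and returns a non-zero status if any such file was found.
check_uploadable() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -w "$f" ]; then
            echo "missing read/write privileges: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (placeholder file names):
#   check_uploadable vertica-server.rpm license.key vid_rsa hosts.csv
```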


Monitor Resolution
Management Console requires a minimum resolution of 1024 x 768, but HP recommends higher
resolutions for optimal viewing.

Installing Management Console


You can install Management Console on any node you plan to include in the HP Vertica database
cluster, as well as on its own, dedicated server outside the cluster.

Install Management Console on the MC Server


1. Download the MC package (vertica-console-<current-version>.<Linux-distro>) from
myVertica portal and save it to a location on the target server, such as /tmp.
2. On the target server, log in as root or a user with sudo privileges.
3. Change directory to the location where you saved the MC package.
4. Install MC using your local Linux distribution package management system (for example, rpm,
yum, zypper, apt, dpkg).
The following command is a generic example for Red Hat 5:
# rpm -Uvh vertica-console-<current-version>.x86_64.RHEL5.rpm

The following command is a generic example for Debian 5 and Debian 6:


# dpkg -i vertica-console-<current-version>.deb

For Ubuntu systems, use sudo:


$ sudo dpkg -i vertica-console-<current-version>.deb

5. Open a browser and enter the IP address or host name of the server on which you installed
MC, as well as the default MC port 5450.
For example, you'll enter one of:


https://xx.xx.xx.xx:5450/
https://hostname:5450/

6. When the Configuration Wizard dialog box appears, proceed to Configuring MC.
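
The choice between the rpm and dpkg commands in step 4 depends only on the package type, so it can be sketched as a small helper. This is an illustration, not part of the product: mc_install_cmd is a hypothetical name, and the function prints the command from step 4 rather than running it.

```shell
#!/bin/sh
# mc_install_cmd PACKAGE  (hypothetical helper)
# Prints the install command from step 4 that matches the package type.
mc_install_cmd() {
    case "$1" in
        *.rpm) printf 'rpm -Uvh %s\n' "$1" ;;
        *.deb) printf 'dpkg -i %s\n' "$1" ;;
        *)     echo "unrecognized package type: $1" >&2; return 1 ;;
    esac
}

# Example with a placeholder file name:
mc_install_cmd vertica-console.deb
```

Run the printed command as root, or prefix it with sudo on Ubuntu systems as shown in step 4.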

See Also

Upgrading MC

Configuring MC
After you install MC, you need to configure it through a client browser connection. An MC
configuration wizard walks you through creating the Linux MC super administrator account,
storage locations, and other settings that MC needs to run. Information you provide during the
configuration process is stored in the /opt/vconsole/config/console.properties file.
If you need to change settings after the configuration wizard ends, such as port assignments, you
can do so later through the Home > MC Settings page.

How to Configure MC
1. Open a browser session.
2. Enter the IP address or host name of the server on which you installed MC (or any cluster
node's IP/host name if you already installed HP Vertica), and include the default MC port 5450.
For example, you'll enter one of:
https://xx.xx.xx.xx:5450/
https://hostname:5450/

3. Follow the configuration wizard.

About Authentication for the MC Super Administrator


In the final step of the configuration process, you choose an authentication method for the MC
super administrator. You can decide to have MC authenticate the MC super administrator (in which case the
process is complete), or you can choose LDAP.
If you choose LDAP, provide the following information for the newly-created MC super
administrator:


Corporate LDAP service host (IP address or host name)

LDAP server running port (default 389)

LDAP DN (distinguished name) for base search/lookup/authentication criteria


At a minimum, specify the dc (domain component) field. For example: dc=vertica, dc=com
generates a unique identifier of the organization, like the corporate Web URL vertica.com

Default search path for the organization unit (ou)


For example: ou=sales, ou=engineering

Search attribute for the user name (uid), common name (cn), and so on
For example, uid=jdoe, cn=Jane Doe

Binding DN and password for the MC super administrator.


In most cases, you provide the "Bind as administrator" fields, information used to establish the
LDAP service connection for all LDAP operations, like search. Instead of using the administrator
user name and password, the MC administrator could use his or her own LDAP credentials, as
long as that user has search privileges.
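
Before you enter these values in MC, you may want to confirm them against the LDAP server from a command line. The following is a minimal sketch, assuming the OpenLDAP client tools (ldapsearch) are installed; ldap_check_cmd is a hypothetical helper, and the host name and bind DN in the example are placeholders. The function only prints the command so that you can review it first.

```shell
#!/bin/sh
# ldap_check_cmd HOST PORT BIND_DN BASE_DN FILTER  (hypothetical helper)
# Prints an ldapsearch command that binds with simple authentication (-x)
# as BIND_DN, prompts for the password (-W), and searches under BASE_DN.
ldap_check_cmd() {
    printf 'ldapsearch -x -H ldap://%s:%s -D "%s" -W -b "%s" "(%s)"\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Example with placeholder values:
ldap_check_cmd ldap.example.com 389 "cn=admin,dc=vertica,dc=com" "dc=vertica,dc=com" "uid=jdoe"
```

If the printed command returns the expected user entry when you run it, the same host, port, base DN, and bind DN should be reasonable values to try in the configuration wizard.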

If You Choose Bind Anonymously


Unless you specifically configure the LDAP server to deny anonymous binds, the underlying LDAP
protocol will not cause MC's Configure Authentication process to fail if you choose "Bind
anonymously" for the MC administrator. Before you use anonymous bindings for LDAP
authentication on MC, be sure that your LDAP server is configured to explicitly disable/enable this
option. For more information, see the article on Infusion Technology Solutions and the OpenLDAP
documentation on access control.

What Happens Next


Shortly after you click Finish, you should see a status in the browser; however, for several seconds
you might see only an empty page. During this brief period, MC runs as the local user 'root' long
enough to bind to port number 5450. Then MC switches to the MC super administrator account that
you just created, restarts MC, and displays the MC login page.


Where to Go Next
If you are a new MC user and this is your first MC installation, you might want to familiarize yourself
with MC design. See Management Console in the Concepts Guide.
If you'd rather use MC now, the following topics in the Administrator's Guide should help
get you started:
If you want to ... | See ...
Use the MC interface to install HP Vertica on a cluster of hosts | Creating a Cluster Using MC
Create a new, empty HP Vertica database or import an existing HP Vertica database cluster into the MC interface | Managing Database Clusters on MC
Create new MC users and map them to one or more HP Vertica databases that you manage through the MC interface | Managing Users and Privileges (About MC Users and About MC Privileges and Roles)
Monitor MC and one or more MC-managed HP Vertica databases | Monitoring HP Vertica Using Management Console
Change default port assignments or upload a new HP Vertica license or SSL certificate | Managing MC Settings
Compare MC functionality to functionality that the Administration Tools provides | Administration Tools and Management Console


After You Install HP Vertica


The tasks described in this section are optional and are provided for your convenience. When you
have completed this section, proceed to one of the following:

Using the Getting Started Guide in the Getting Started Guide

Configuring the Database in the Administrator's Guide

Install the License Key


If you did not supply the -L parameter during setup, or if you did not bypass the -L parameter for a
silent install, the first time you log in as the Database Administrator and run the HP Vertica
Administration Tools or Management Console, HP Vertica requires you to install a license key.
Follow the instructions in Managing Licenses in the Administrator's Guide.

Optionally Install vsql Client Application on Non-Cluster Hosts
You can use the HP Vertica vsql executable image on a non-cluster Linux host to connect to an HP
Vertica database.

On Red Hat 5.0 64-bit and SUSE 10/11 64-bit, you can install the client driver RPM, which
includes the vsql executable. See Installing the Client RPM on Red Hat and SUSE for details.

If the non-cluster host is running the same version of Linux as the cluster, copy the image file to
the remote system. For example:
$ scp host01:/opt/vertica/bin/vsql .
$ ./vsql

If the non-cluster host is running a different version of Linux than your cluster hosts, and that
operating system is not Red Hat version 5 64-bit or SUSE 10/11 64-bit, you must install the HP
Vertica server RPM in order to get vsql. Download the appropriate rpm package from the
Download tab of the myVertica portal then log into the non-cluster host as root and install the
rpm package using the command:
# rpm -Uvh filename

In the above command, filename is the package you downloaded. Note that you do not have to
run the install_vertica script on the non-cluster host in order to use vsql.


Notes

Use the same Command-Line Options that you would on a cluster host.

You cannot run vsql on a Cygwin bash shell (Windows). Use ssh to connect to a cluster host,
then run vsql.

In release 5.1.5 vsql is also available for additional platforms. See Installing the vsql Client.


Install HP Vertica Documentation


The latest documentation for your HP Vertica release is available at
http://www.vertica.com/documentation. After you install HP Vertica, you can optionally install the
documentation on your database server and client systems.

Installing the HP Vertica Documentation


To install a local copy of the documentation:
1. Open a Web browser and go to http://www.vertica.com/documentation.
2. Scroll down to Install documentation locally and save the HP Vertica documentation
package (.tar.gz or .zip) to your system; for example, to /tmp.
3. Extract the contents using your preferred unzipping application.
4. The home page for the HTML documentation is located at /HTML/index.htm in the extracted
folder.

Get Started!
HP Vertica lets you choose between instant gratification and a more detailed path in setting up your
example database. Both processes, described in the Getting Started Guide, are simple, and both let
you start using your database immediately, literally within minutes.

If you can't wait to get started, read about the one-step installation script in Installing the
Example Database.

If you prefer a more thorough, but equally useful example, see Advanced Installation in the
Getting Started Guide.


Installing Client Drivers


After you install HP Vertica, install drivers on the client systems from which you plan to access
your databases. HP supplies drivers for ADO.NET, JDBC, ODBC, Perl, and Python. For
instructions on installing these drivers, see Client driver install procedures in the Connecting to HP
Vertica Guide.


Upgrading HP Vertica
Follow the steps in this section to:

Upgrade HP Vertica to a new version.

Upgrade your HP Vertica license.

Upgrade Management Console.

Upgrade the client authentication records to the new format.

Upgrading HP Vertica to a New Version


Requirement Testing
The Version 7.0 installer introduces platform verification tests that prevent the install from
continuing if the platform requirements are not met by your system. Manually verify that your
system meets the requirements in Before You Install HP Vertica before you update the server
package on your systems. These tests ensure that your platform meets the hardware and software
requirements for HP Vertica. Previous versions documented these requirements, but the installer
did not verify all of the settings.
Version 7.0 introduces the installation parameter --failure-threshold that allows you to change
the level at which the installer stops the installation process based on the severity of the failed test.
By default, the installer stops on all warnings. You can change the failure threshold to FAIL to
bypass all warnings and only stop on failures. However, your platform is unsupported until you
correct all warnings generated by the installer. By changing the failure threshold you are able to
immediately upgrade and bring up your HP Vertica database, but performance cannot be
guaranteed until you correct the warnings.

Transaction Catalog Storage


When you upgrade from 5.x to a later version of HP Vertica, the amount of space that the
transaction catalog takes up can increase significantly during and after the upgrade, because of a
change in how transaction catalog storage works in HP Vertica 6.0 and later. Before you upgrade,
verify that each node has free space in the catalog directory equal to at least 4 times the size of
the Catalog folder (in addition to normal free space requirements).
To determine the amount of space the Catalog folder is using, run du -h on the Catalog folder. Do
not run du -h on the entire catalog. Run it specifically on the Catalog folder in the catalog.


For example:
[dbadmin@localhost ~]$ du -h /home/dbadmin/db/v_db_node0001_catalog/Catalog/
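
The 4-times rule can be checked with a short script instead of by eye. The following is a minimal sketch: catalog_space_ok is a hypothetical helper, both values are in kilobytes, and the paths in the usage comment are the ones from the example above. It covers only the extra space required for the upgrade, not the normal free space requirements.

```shell
#!/bin/sh
# catalog_space_ok CATALOG_KB FREE_KB  (hypothetical helper)
# Succeeds if the free space is at least 4 times the size of the
# Catalog folder. Both arguments are sizes in kilobytes.
catalog_space_ok() {
    [ "$2" -ge $(( $1 * 4 )) ]
}

# Example, using du -sk for the folder size and df -Pk for the free space
# (the 4th column of the second df line is the available space):
#   used=$(du -sk /home/dbadmin/db/v_db_node0001_catalog/Catalog/ | awk '{print $1}')
#   free=$(df -Pk /home/dbadmin/db/v_db_node0001_catalog | awk 'NR == 2 {print $4}')
#   catalog_space_ok "$used" "$free" || echo "not enough free space for the upgrade"
```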

Configuration Parameter Storage


As of version 7.1.x, parameter configurations are now stored in the catalog, rather than in individual
vertica.conf files at the node level. If you want to view node-specific settings prior to upgrading,
you can query the CONFIGURATION_PARAMETERS system table on each node to view parameter
values.
When you upgrade to 7.1, HP Vertica performs the following steps:
1. Backs up current vertica.conf files to vertica-orig.conf files.
2. Chooses the most up-to-date node's configuration parameter settings as the database-level
settings.
3. Stores new database-level values in the catalog.
4. Checks whether the values in all the nodes' vertica.conf files match the database-level
values. If not, Vertica rewrites that node's vertica.conf file to match database level settings.
The previous settings can still be referenced in each node's vertica-orig.conf files.
If you previously made manual changes to individual vertica.conf files, you can re-set those
node-specific settings using ALTER NODE after you upgrade. You will still be able to reference the
previous values in the vertica-orig.conf files.
Important: Once you upgrade to 7.1, do not hand edit any vertica.conf files. Additionally, do
not use any workarounds for syncing vertica.conf files.
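
To see which node-specific settings the upgrade replaced, you can compare a node's vertica-orig.conf with the rewritten vertica.conf. The following is a minimal sketch that reads the files without editing them: conf_changes is a hypothetical helper, and the directory that holds the two files is passed as an argument because its location on your nodes is not stated here.

```shell
#!/bin/sh
# conf_changes DIR  (hypothetical helper)
# Shows the differences between DIR/vertica-orig.conf (settings before
# the upgrade) and DIR/vertica.conf (settings after the upgrade).
# Lines starting with "<" come from the original file; lines starting
# with ">" come from the rewritten file.
conf_changes() {
    dir=$1
    if diff "$dir/vertica-orig.conf" "$dir/vertica.conf"; then
        echo "no node-specific settings were overwritten"
    fi
}

# Example (replace the argument with the directory on your node):
#   conf_changes /path/to/node/config/directory
```

Any setting that appears only on a "<" line is a candidate for re-applying with ALTER NODE.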

Upgrading HP Vertica
Follow these steps to upgrade your database. Note that upgrades are incremental and must follow
one of the following upgrade paths:

HP Vertica 3.5 to 4.0

HP Vertica 4.0 to 4.1

HP Vertica 4.1 to 5.0


HP Vertica 4.1 to 5.1

HP Vertica 5.0 to 5.1

HP Vertica 5.0 to 6.0

HP Vertica 5.1 to 6.0

HP Vertica 6.0 to 6.1

HP Vertica 6.1 to 7.0. If you have enabled LDAP over SSL/TLS, read Configuring LDAP Over
SSL/TLS When Upgrading HP Vertica before upgrading.

HP Vertica 7.0 to 7.1

Important: Hewlett-Packard strongly recommends that you follow the upgrade paths. Be sure
to read the Release Notes and New Features for each version you skip. The HP Vertica
documentation is available in the rpm, as well as at http://www.vertica.com/documentation
(which also provides access to previous versions of the documentation).
1. Back up your existing database. This is a precautionary measure so that you can restore from
the backup if the upgrade is unsuccessful.
2. Stop the database using admintools if it is not already stopped. See Stopping a Database.
3. On each host on which an additional package, such as the R Language Pack, is installed,
uninstall the package. For example: rpm -e vertica-R-lang.

Important: If you fail to uninstall HP Vertica packages prior to upgrading the server
package, then the server package fails to install due to dependencies on the earlier version
of the package.
4. On any host in the cluster, install the new HP Vertica Server RPM or DEB. See Download and
Install the HP Vertica Server Package.
For example:
rpm syntax:
# rpm -Uvh /home/dbadmin/vertica-x86_64.RHEL5.rpm


deb syntax:
# dpkg -i /home/dbadmin/vertica-x86_64.RHEL5.deb

Important: If you fail to install the rpm or deb prior to running the next step, then
update_vertica fails with an error due to the conflict between the version of the update_vertica
script and the version of the rpm argument.
5. As root or sudo, run update_vertica. Use the same options that you used when you last
installed or upgraded the database, except for the --hosts/-s host_list parameter, as
the upgrade script automatically determines the hosts in the cluster.
If you do not remember the options that were last used, open /opt/vertica/config/admintools.conf
in a text editor and find the line that starts with install_opts. This line lists each option.
Use the same options that were used previously. Any option that you omit reverts to its default
setting when the upgrade script runs. If you use different options than originally used, the
update script reconfigures the cluster to use the new options, which can cause issues with your
existing database.
Installing HP Vertica with the install_vertica Script provides details on all options available to
the update_vertica script. update_vertica uses the same options as install_vertica.
For example:
# /opt/vertica/sbin/update_vertica --rpm /home/dbadmin/vertica-x86_64.RHEL5.rpm

Important: The rpm/deb file must be readable by the dbadmin user when upgrading. Some
upgrade scripts are run as the dbadmin user, and that user must be able to read the
rpm/deb file.
6. Start the database. The start-up scripts analyze the database and perform any necessary data
and catalog updates for the new version.
7. Perform another backup. When moving from Version 5.0 and earlier to Version 5.1 and later,
the backup process changes from using backup.sh to using vbr.py. You cannot use an
incremental backup between these different versions of backup scripts. Create a full backup
the first time you move to using vbr.py, and optionally use incremental backups as you
continue to upgrade. However, HP Vertica recommends doing full backups each time if disk
space and time allow.

8. Continue along the upgrade path and perform these same steps for each version in your
upgrade path.
9. After you have upgraded to the latest version of the server, install any additional packs you
previously removed. See the pack install/upgrade instructions for details on upgrading the
packs. For R, see Installing/Upgrading the R Language Pack for HP Vertica.
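For step 5, the previously used options can be read without opening an editor. The following is a minimal sketch: show_install_opts is a hypothetical helper name, and it only assumes what the step states, namely that the relevant line in admintools.conf starts with install_opts.

```shell
#!/bin/sh
# Hypothetical helper: print the install_opts line recorded in an
# admintools.conf-style file. The default path is the one named in step 5;
# pass a different path as the first argument to override it.
show_install_opts() {
    conf="${1:-/opt/vertica/config/admintools.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    grep '^install_opts' "$conf"
}

# Example (run on a cluster host):
#   show_install_opts
```

Review the printed options and pass them to update_vertica, leaving out the --hosts/-s parameter as described in step 5.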

Notes
• Release 5.1 introduced a new backup utility, vbr.py. This utility replaced both the backup.sh
  and restore.sh scripts, making both obsolete. Any backups created with backup.sh are
  incompatible with backups created with vbr.py. HP Vertica recommends that you use the
  current utility vbr.py as soon as possible after successfully upgrading from a version prior to
  Release 5.1 to Release 5.1 or later. Documentation for the 5.0 scripts remained in the 5.1
  documentation. However, the topics were marked Obsolete in that version and were removed
  from later versions of the documentation.

• Downgrade installations are not supported.

Configuring LDAP Over SSL/TLS When Upgrading HP Vertica


In HP Vertica 7.0, certificate authentication for LDAP over SSL/TLS is more secure than in
previous releases. If you have LDAP enabled over SSL/TLS, you must perform several tasks before
you upgrade to HP Vertica 7.0 so that you can connect to the LDAP server after the upgrade.
When using SSL/TLS and upgrading to 7.1, note that the SSLCertificate and SSLPrivateKey
parameters are automatically set by Admintools if you set EnableSSL=1 in the previous version.
This section describes the steps to follow when setting up secure LDAP authentication on
a new installation of HP Vertica 7.0. It also includes the procedure to follow if you choose
to revert to the more permissive behavior used in HP Vertica 6.1.
• Using HP Vertica 7.0 Secure LDAP Authentication
• Using HP Vertica 6.1 Secure LDAP Authentication

Using HP Vertica 7.0 Secure LDAP Authentication


If you are a new customer installing HP Vertica 7.0 and you want to use LDAP over SSL/TLS, take
the following steps on all cluster nodes. You must perform these steps to configure LDAP
authentication:
1. If necessary, modify the LDAP authentication record in your vertica.conf file to point to the
correct server.
2. As the root user, if necessary, create an ldap.conf file and add the following settings. The
TLS_REQCERT option is required. You must include either the TLS_CACERT or TLS_CADIR option.
TLS_REQCERT hard
TLS_CACERT = /<certificate_path>/CA-cert-bundle.crt
or
TLS_CADIR = <certificate_path>

The options for TLS_REQCERT are:

• hard: If the client does not provide a certificate or provides an invalid certificate, they cannot
  connect. This is the default behavior.
• never: The client does not request or check a certificate.
• allow: If the client does not provide a certificate or provides an invalid certificate, they can
  connect anyway.
• try: If the client does not provide a certificate, the client can connect. If the client provides
  an invalid certificate, they cannot connect.

TLS_CACERT specifies the path to the file that contains the certificates.
TLS_CADIR specifies the path to the directory that contains the certificates.

3. Store the ldap.conf file in a location that is readable by DBADMIN. DBADMIN must be
able to access the ldap.conf file and all path names specified in the ldap.conf file on all
cluster nodes.
4. Set the Linux LDAPCONF environment variable to point to this ldap.conf file.
Make sure this environment variable is set before you start the HP Vertica software or you
create a database. To ensure that this happens, add a command to the DBADMIN's profile to
set LDAPCONF to point to the ldap.conf file every time you start the database.
If you start the database using a script like a startup or init file, add steps to the script that set
the LDAPCONF variable to point to the ldap.conf file.

5. Test that LDAP authentication works with and without SSL/TLS. You can use the ldapsearch
tool for this.
6. Repeat steps 1-5 for all cluster nodes.
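Steps 4 and 5 above can be sketched as follows. This is an illustration under stated assumptions: /home/dbadmin/ldap.conf is a placeholder location, ldap_smoke_test is a hypothetical helper name, and the server URI and search base in the comments are examples only. The ldapsearch options used (-x, -H, -b, -s) are standard OpenLDAP client options.

```shell
#!/bin/sh
# Lines like these could go in the dbadmin user's shell profile so that
# LDAPCONF is set before the database starts. The path is a placeholder.
LDAPCONF=/home/dbadmin/ldap.conf
export LDAPCONF

# Hypothetical helper: run a base-scope search against an LDAP URI to see
# whether the connection (and, for ldaps://, the certificate check) succeeds.
#   -x  simple authentication      -H  server URI
#   -b  search base                -s  search scope
ldap_smoke_test() {
    uri="$1"    # for example ldaps://ldap.example.com or ldap://ldap.example.com
    base="$2"   # for example dc=example,dc=com
    ldapsearch -x -H "$uri" -b "$base" -s base '(objectClass=*)'
}

# Example (requires the OpenLDAP client tools and a reachable server):
#   ldap_smoke_test ldaps://ldap.example.com dc=example,dc=com   # with SSL/TLS
#   ldap_smoke_test ldap://ldap.example.com  dc=example,dc=com   # without SSL/TLS
```

Because ldapsearch reads the file named by LDAPCONF, a failure in the ldaps:// case with TLS_REQCERT hard usually points at the certificate settings rather than at HP Vertica.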

Using HP Vertica 6.1 Secure LDAP Authentication


If you have LDAP enabled over SSL/TLS and you want to use the more permissive LDAP settings
used in HP Vertica 6.1, perform the following tasks on all cluster nodes. These settings allow HP
Vertica to connect to the LDAP server, even if authentication fails. You must perform these tasks
before you upgrade to HP Vertica 7.0:
1. If necessary, modify the LDAP authentication record in your vertica.conf file to point to the
correct server.
2. As the root user, create or modify the ldap.conf file and make the following change:
TLS_REQCERT allow

The options for TLS_REQCERT are:

• hard: If you do not provide a certificate or you provide an invalid certificate, you cannot
  connect. This is the default.
• never: The client does not request or check a certificate.
• allow: If you do not provide a certificate, or you provide an invalid certificate, you can
  connect anyway. This is consistent with the behavior in HP Vertica 6.1.
• try: If you do not provide a certificate, you can connect. If you provide an invalid certificate,
  you cannot connect.

3. Store the ldap.conf file in a location that is readable by DBADMIN. DBADMIN must be
able to access the ldap.conf file and all path names specified in the ldap.conf file on all
cluster nodes.
4. Set the Linux LDAPCONF environment variable to point to this ldap.conf file.
Make sure this environment variable is set before you start the HP Vertica software or you
create a database. To ensure that this happens, add a command to the DBADMIN's Linux
profile to set LDAPCONF to point to the ldap.conf file every time you log in.

5. If you start the database using a script like a startup or init file, add steps to the script that set
the LDAPCONF variable to point to the ldap.conf file.
6. Test that LDAP authentication works with and without SSL/TLS. You can use the ldapsearch
tool for this.
7. Repeat steps 1-6 for all cluster nodes.

Upgrading Your HP Vertica License


To upgrade from the Community Edition license, obtain an evaluation or Enterprise Edition license
from HP. For information on applying your new license, see Installing or Upgrading a License Key in
the Administrator's Guide.

Upgrading MC
If you are moving from MC 6.1.1 to MC 6.1.2, you can install MC on any HP Vertica cluster node.
This scenario requires a fresh install because HP does not provide scripts to migrate metadata (MC
users and settings) established in earlier releases from your existing server to the cluster node. See
Installing and Configuring Management Console.
After you install and configure MC, you will need to recreate MC users you'd created for your 6.1.1
MC instance, if any, and apply previous MC settings to the new MC version.
Tip: You can export MC-managed database messages and user activity to a location on the
existing server. While you can't import this data, using the exported files as a reference could
help make metadata recreation easier. See Exporting MC-managed Database Messages and
Logs and Exporting the User Audit Log.

If You Want to Keep MC on the Existing Server


If you want to keep MC on the same server (such as on the dedicated server that had been required
in previous MC releases), your MC metadata is retained when you run the vertica-console
installation script.

Before You Upgrade MC on the Same Server


1. Log in as root or a user with sudo privileges on the server where MC is already installed.
2. Open a terminal window and shut down the MC process using the following command:
# /etc/init.d/vertica-consoled stop

3. Back up MC to preserve configuration metadata. See Backing Up MC.

Upgrade MC on the Same Server


1. Download the MC package (vertica-console-<current-version>.<Linux-distro>) from
myVertica portal and save it to a location on the target server, such as /tmp.
2. On the target server, log in as root or a user with sudo privileges.
3. Change directory to the location where you saved the MC package.
4. Install MC using your local Linux distribution package management system (for example, rpm,
yum, zypper, apt, dpkg).
The following command is a generic example for Red Hat 5:
# rpm -Uvh vertica-console-<current-version>.x86_64.RHEL5.rpm

The following command is a generic example for Debian 5 and Debian 6:


# dpkg -i vertica-console-<current-version>.deb

For Ubuntu systems, use sudo:


$ sudo dpkg -i vertica-console-<current-version>.deb

5. Open a browser and enter the IP address or host name of the server on which you installed
MC, as well as the default MC port 5450.
For example, enter one of:
https://xx.xx.xx.xx:5450/
https://hostname:5450/

6. When the Configuration Wizard dialog box appears, proceed to Configuring MC.

Upgrading Client Authentication in HP Vertica


HP Vertica 7.1.0 changed the storage location for the client authentication records from the
vertica.conf file to the database catalog. When you upgrade to HP Vertica 7.1.1, the client
authentication records in the vertica.conf file are converted and inserted into the database
catalog. HP Vertica updates the catalog information on all nodes in the cluster.

Authentication is not enabled after upgrading. As a result, all users can connect to the database.
However, if they have a password, they must enter it.
Take the following steps to make sure that client authentication is configured correctly and enabled
for use with a running database:
1. Review the client authentication methods that HP Vertica created during the upgrade. The
following system tables contain information about those methods:
• CLIENT_AUTH: Contains information about the client authentication methods that HP
  Vertica created for your database during the upgrade.
• CLIENT_AUTH_PARAMS: Contains information about the parameters that HP Vertica defined
  for the GSS, Ident, and LDAP authentication methods.
• USER_CLIENT_AUTH: Contains information about which authentication methods are
  associated with which database users. You associate an authentication method with a user
  with the GRANT (Authentication) statement.

2. Review the vertica.log file to see which authentication records HP Vertica was not able to
create during the upgrade.
3. Create any required new records using CREATE AUTHENTICATION.
4. After the upgrade, enable all the defined authentication methods. You need to enter an
ALTER AUTHENTICATION statement for each method as follows:
=> ALTER AUTHENTICATION auth_method_name ENABLE;

5. If you are using LDAP over SSL/TLS, you must define the new parameters:

• tls_reqcert
• tls_cacert

To do so, use ALTER AUTHENTICATION as follows:

=> ALTER AUTHENTICATION Ldap1 SET host='ldaps://abc.dc.com', binddn_prefix='CN=',
binddn_suffix=',OU=Unit2,DC=dc,DC=com', basedn='dc=DC,dc=com',
tls_cacert='/home/dc.com.ca.cer', starttls='hard', tls_reqcert='never';

6. Create an authentication method (LOCAL TRUST or LOCAL PASSWORD) with a very high
priority, such as 10,000. Grant this method to the DBADMIN user, and set the priority using
ALTER AUTHENTICATION. For example:
=> CREATE AUTHENTICATION dbadmin_default TRUST LOCAL;
=> ALTER AUTHENTICATION dbadmin_default PRIORITY 10000;

With the high priority, this new authentication method supersedes any authentication methods
you create for PUBLIC. Even if you make changes to PUBLIC authentication methods, the
DBADMIN user can connect to the database at any time.
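Because step 4 requires one ALTER AUTHENTICATION statement per method, the statements can be generated from a list of method names. The following is a minimal sketch: enable_statements is a hypothetical helper name, and the method names passed to it are examples that you would replace with the names you find in the CLIENT_AUTH system table.

```shell
#!/bin/sh
# Hypothetical helper: print one ALTER AUTHENTICATION ... ENABLE statement
# for each authentication method name given as an argument.
enable_statements() {
    for name in "$@"; do
        printf 'ALTER AUTHENTICATION %s ENABLE;\n' "$name"
    done
}

# Example method names; substitute the names from CLIENT_AUTH.
enable_statements Ldap1 dbadmin_default
```

Review the generated statements before running them in your SQL client.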

Uninstalling HP Vertica
You can uninstall HP Vertica and Management Console by running commands at the command
line.

Uninstalling HP Vertica
To uninstall HP Vertica:
1. For each host in the cluster, do the following:
a. Choose a host machine and log in as root (or log in as another user and switch to root).
$ su - root
password: root-password

b. Find the name of the package that is installed:


# rpm -qa | grep vertica

For deb packages:


# dpkg -l | grep vertica

c. Remove the package:


# rpm -e package

For deb packages:


# dpkg -r package

Note: If you want to delete the configuration file used with your installation, you can
choose to delete the /opt/vertica/ directory and all subdirectories using this command:
# rm -rf /opt/vertica/
2. For each client system, do the following:

a. Delete the JDBC driver jar file.


b. Delete ODBC driver data source names.
c. Delete the ODBC driver software by doing the following:
i. In Windows, go to Start > Control Panel > Add or Remove Programs.
ii. Locate HP Vertica.
iii. Click Remove.

Uninstalling MC
The uninstall command shuts down Management Console and removes most of the files that the MC
installation script installed.

Uninstall MC
1. Log in to the target server as root.
2. Stop Management Console:
# /etc/init.d/vertica-consoled stop

3. Look for previously-installed versions of MC and note the version:


# rpm -qa | grep vertica

4. Remove the package:


# rpm -e <vertica-console>

Note: If you want to delete the MC directory and all subdirectories, use the following
command: # rm -rf /opt/vconsole

If You Want to Reinstall MC


To re-install MC, see Installing and Configuring Management Console.

Troubleshooting the HP Vertica Install


The topics described in this section are performed automatically by the install_vertica script
and are described in Installing HP Vertica. If you did not encounter any installation problems,
proceed to the Administrator's Guide for instructions on how to configure and operate a database.

Validation Scripts
HP Vertica provides several validation utilities that can be used prior to deploying HP Vertica to
help determine if your hosts and network can properly handle the processing and network traffic
required by HP Vertica. These utilities can also be used if you are encountering performance issues
and need to troubleshoot the issue.
After you install the HP Vertica RPM, you have access to the following scripts in
/opt/vertica/bin:
• Vcpuperf - a CPU performance test used to verify your CPU performance.
• Vioperf - an Input/Output test used to verify the speed and consistency of your hard drives.
• Vnetperf - a Network test used to test the latency and throughput of your network between
  hosts.

These utilities can be run at any time, but are well suited to use before running the install_vertica
script.

Vcpuperf
The vcpuperf utility measures your server's CPU processing speed and compares it against
benchmarks for common server CPUs. The utility performs a CPU test and measures the time it
takes to complete the test. The lower the number scored on the test, the better the performance of
the CPU.
The vcpuperf utility also checks the high and low load times to determine if CPU throttling is
enabled. If a server's low-load computation time is significantly longer than the high-load
computation time, CPU throttling may be enabled. CPU throttling is a power-saving feature.
However, CPU throttling can reduce the performance of your server. HP Vertica recommends
disabling CPU throttling to enhance server performance.

Syntax
vcpuperf [-q]

Option
Option   Description
-q       Run in quiet mode. Quiet mode displays only the CPU Time, Real Time, and high and
         low load times.

Returns
• CPU Time: The amount of time it took the CPU to run the test.
• Real Time: The total time for the test to execute.
• High load time: The amount of time to run the load test while simulating a high CPU load.
• Low load time: The amount of time to run the load test while simulating a low CPU load.

Example
The following example shows a CPU that is running slightly slower than the expected time on a
Xeon 5670 CPU that has CPU throttling enabled.
[root@docb04 bin]# /opt/vertica/bin/vcpuperf
Compiled with: 4.1.2 20080704 (Red Hat 4.1.2-52)
Expected time on Core 2, 2.53GHz: ~9.5s
Expected time on Nehalem, 2.67GHz: ~9.0s
Expected time on Xeon 5670, 2.93GHz: ~8.0s
This machine's time:
CPU Time: 8.540000s
Real Time:8.710000s
Some machines automatically throttle the CPU to save power.
This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
Low load times much larger than 100-200us or much larger than the corresponding
high load time indicate low-load throttling, which can adversely affect small query /
concurrent performance.
This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
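The throttling check described in the output above can be expressed as a simple comparison. The following is a minimal sketch, not the logic vcpuperf itself uses: throttling_suspected is a hypothetical helper name, and the thresholds (a low-load time above 200 microseconds, or more than twice the high-load time) are assumptions that put numbers on the phrase "much larger".

```shell
#!/bin/sh
# Hypothetical helper: flag possible low-load CPU throttling from the two
# load times reported by vcpuperf, both given in whole microseconds.
# The thresholds below are assumptions, not values defined by HP Vertica.
throttling_suspected() {
    high_us="$1"
    low_us="$2"
    if [ "$low_us" -gt 200 ] || [ "$low_us" -gt $((high_us * 2)) ]; then
        echo "throttling suspected"
    else
        echo "no throttling indicated"
    fi
}

# Values from the example output above: high load 67, low load 208.
throttling_suspected 67 208   # prints: throttling suspected
```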

Vioperf
The vioperf utility quickly tests the performance of your host's input and output subsystem. The
utility performs the following tests:
• sequential write
• sequential rewrite
• sequential read
• skip read (read non-contiguous data blocks)

The utility verifies that the host reads the same bytes that it wrote and prints its output to STDOUT.
The utility also logs the output to a JSON formatted file.

Syntax
vioperf [--help] [--duration=<INTERVAL>] [--log-interval=<INTERVAL>] [--log-file=<FILE>] [--condense-log] [<DIR>*]

Minimum and Recommended IO Performance


• The minimum required I/O is 20 MB/s read/write per physical processor core on each node, in
  full duplex (reading and writing simultaneously), concurrently on all nodes of the cluster.
• The recommended I/O is 40 MB/s per physical core on each node.

For example, the I/O rate for a node with 2 hyper-threaded six-core CPUs (12 physical cores) is 240
MB/s required minimum, 480 MB/s recommended.
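The arithmetic in this example is the physical core count multiplied by 20 MB/s (minimum) and 40 MB/s (recommended). The following is a minimal sketch; io_requirements is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compute the minimum and recommended per-node I/O rate
# from the number of physical cores (hyper-threads are not counted).
io_requirements() {
    cores="$1"
    echo "minimum=$((cores * 20)) MB/s recommended=$((cores * 40)) MB/s"
}

# Two six-core CPUs give 12 physical cores.
io_requirements 12   # prints: minimum=240 MB/s recommended=480 MB/s
```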

Options
Option           Description
--help           Prints a help message and exits.
--duration       The length of time vioperf runs performance tests. The default is 5 minutes.
                 Specify the interval in seconds, minutes, or hours with any of these suffixes:
                 • Seconds: s, sec, secs, second, seconds. Example: --duration=60sec
                 • Minutes: m, min, mins, minute, minutes. Example: --duration=10min
                 • Hours: h, hr, hrs, hour, hours. Example: --duration=1hrs
--log-interval   The interval at which the log file reports summary information. The default
                 interval is 10 seconds. This option uses the same interval notation as
                 --duration.
--log-file       The path and name where log file contents are written, in JSON. If not specified,
                 then vioperf creates a file named resultsdate-time.JSON in the current
                 directory.
--condense-log   Directs vioperf to write the log file contents in condensed format, one JSON
                 entry per line, rather than as indented JSON syntax.
<DIR>            Zero or more directories to test. If you do not specify a directory, vioperf tests
                 the current directory. To test the performance of each disk, specify different
                 directories mounted on different disks.

Returns
The utility returns the following information:

Heading                          Description
test                             The test being run (Write, ReWrite, Read, or Skip Read).
directory                        The directory in which the test is being run.
counter name                     The counter type of the test being run. Can be either MB/s or
                                 Seeks per second.
counter value                    The value of the counter in MB/s or Seeks per second across
                                 all threads. This measurement represents the bandwidth at
                                 the exact time of measurement. Contrast with counter value
                                 (avg).
counter value (10 sec avg)       The average amount of data in MB/s, or the average number
                                 of Seeks per second, for the test being run in the duration
                                 specified with --log-interval. The default interval is 10
                                 seconds. The counter value (avg) is the average
                                 bandwidth since the last log message, across all threads.
counter value/core               The counter value divided by the number of cores.
counter value/core (10 sec avg)  The counter value (10 sec avg) divided by the number of
                                 cores.
thread count                     The number of threads used to run the test.
%CPU                             The available CPU percentage used during this test.
%IO Wait                         The CPU percentage in I/O Wait state during this test. I/O
                                 wait state is the time working processes are blocked while
                                 waiting for I/O operations to complete.
elapsed time                     The amount of time taken for a particular test. If you run the
                                 test multiple times, elapsed time increases the next time the
                                 test is run.
remaining time                   The time remaining until the next test. Based on the
                                 --duration option, each of the tests is run at least once. If the
                                 test set is run multiple times, then remaining time is how
                                 much longer the test will run. The remaining time value is
                                 cumulative. Its total is added to elapsed time each time the
                                 same test is run again.

Example
Invoking vioperf from a terminal outputs the following message and sample results:
[dbadmin@node01 ~]$ /opt/vertica/bin/vioperf --duration=60s
The minimum required I/O is 20 MB/s read and write per physical processor core on each
node, in full duplex i.e. reading and writing at this rate simultaneously, concurrently
on all nodes of the cluster. The recommended I/O is 40 MB/s per physical core on each
node. For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is
240 MB/s required minimum, 480 MB/s recommended.
Using direct io (buffer size=1048576, alignment=512) for directory "/home/dbadmin"
test  | directory                                    | counter name | counter value | counter value (10 sec avg) | counter value/core | counter value/core (10 sec avg) | thread count | %CPU | %IO Wait | elapsed time (s) | remaining time (s)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Write | /home/dbadmin/VMart/v_vmart_node0001_catalog | MB/s         | 71            | 71                         | 71                 | 71                              | 1            | 92   | 1        | 10               | 65
Write | /home/dbadmin/VMart/v_vmart_node0001_catalog | MB/s         | 67            | 63                         | 67                 | 63                              | 1            | 94   | 0        | 20               | 55
Write | /home/dbadmin/VMart/v_vmart_node0001_catalog | MB/s         | 66            | 63                         | 66                 | 63                              | 1            | 93   | 0        | 30               | 45
Write | /home/dbadmin/VMart/v_vmart_node0001_catalog | MB/s         | 65            | 64                         | 65                 | 64                              | 1            | 94   | 0        | 40               | 35

Vnetperf
The vnetperf utility allows you to measure the network performance of your hosts. It can measure
network latency and the throughput for both the TCP and UDP protocols.
Important: This utility introduces a high network load. Do not run it on a running HP
Vertica cluster, because database performance is degraded while it runs.
Using this utility you can detect:

• if throughput is low for all hosts or a particular host,
• if latency is high for all hosts or a particular host,
• bottlenecks between one or more hosts or subnets,
• too low a limit in the number of TCP connections that can be established simultaneously,
• and if there is a high rate of packet loss on the network.

The latency test measures the latency from the host running the script to the other hosts. Any host
that has a particularly high latency should be investigated further.
The throughput tests measure both UDP and TCP throughput. You can specify a rate limit in MB/s
to use for these tests, or allow the utility to cycle through a range of rates.

Syntax
vnetperf [options] [tests]

Recommended Network Performance


• The maximum recommended RTT (round-trip time) latency is 1000 microseconds. The ideal
  RTT latency is 200 microseconds or less. HP Vertica recommends that clock skew be kept to
  under 1 second.
• The minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more.

Note: UDP numbers may be lower, and multiple network switches may reduce performance
results.
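The recommended values above can be applied to a measured result with a simple comparison. The following is a minimal sketch: rate_network is a hypothetical helper name, the labels it prints are illustrative, and it expects the RTT in whole microseconds and the throughput in whole MB/s.

```shell
#!/bin/sh
# Hypothetical helper: compare a measured round-trip time (microseconds) and
# throughput (MB/s) against the recommended values listed above.
rate_network() {
    rtt_us="$1"
    mbps="$2"

    if [ "$rtt_us" -le 200 ]; then
        latency="ideal"
    elif [ "$rtt_us" -le 1000 ]; then
        latency="acceptable"
    else
        latency="too high"
    fi

    if [ "$mbps" -ge 800 ]; then
        throughput="ideal"
    elif [ "$mbps" -ge 100 ]; then
        throughput="acceptable"
    else
        throughput="too low"
    fi

    echo "latency: $latency, throughput: $throughput"
}

rate_network 450 120   # prints: latency: acceptable, throughput: acceptable
```

If your latency results are reported in milliseconds, convert them to microseconds before comparing.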

Options
Option                        Description
--condense                    Condense the log into one JSON entry per line, instead of
                              indented JSON syntax.
--collect-logs                Collect the test log files from each host.
--datarate rate               Limit the throughput to this rate in MB/s. A rate of 0 loops the tests
                              through several different rates. The default is 0.
--duration seconds            The time limit for each test to run in seconds. The default is 1.
--hosts host1,host2,...       A comma-separated list of hosts on which to run the tests. Do not
                              use spaces between the commas and the host names.
--hosts file                  A hosts file that specifies the hosts on which to run the tests. If
                              the --hosts argument is not used, then the utility attempts to
                              access admintools and determine the hosts in the cluster.
--identity-file file          If using passwordless SSH/SCP access between the hosts, then
                              specify the key file used to gain access to the hosts.
--ignore-bad-hosts            If set, run the tests on the reachable hosts even if some hosts are
                              not reachable. If not set, and a host is unreachable, then no tests
                              are run on any hosts.
--log-dir directory           If --collect-logs is set, the directory in which to place the collected
                              logs. The default directory is named logs.netperf.<timestamp>
--log-level LEVEL             The log level to use. Possible values are: INFO, ERROR,
                              DEBUG, and WARN. The default is WARN.
--list-tests                  Lists the tests that can be run by this utility.
--output-file file            The file that JSON results are written to. The default is
                              results.<timestamp>.json.
--ports port1,port2,port3     The port numbers to use. If only one is specified then the next two
                              numbers in sequence are also used. The default ports are
                              14159, 14160, 14161.
--scp-options 'options'       Using this argument, you can specify one or more standard SCP
                              command line arguments enclosed in single quotes. SCP is used
                              to copy test binaries over to the target hosts.
--ssh-options 'options'       Using this argument, you can specify one or more standard SSH
                              command line arguments enclosed in single quotes. SSH is used
                              to issue test commands on the target hosts.
--vertica-install directory   If specified, then the utility assumes HP Vertica is installed on
                              each of the hosts and uses the test binaries on the target system
                              rather than copying them over using SCP.

Tests
Note: If the tests argument is omitted then all tests are run.

Test             Description
latency          Test the latency to each of the hosts.
tcp-throughput   Test the TCP throughput amongst the hosts.
udp-throughput   Test the UDP throughput amongst the hosts.

Returns
For each host it returns the following:
Latency test returns:
• The Round Trip Time (rtt) latency for each host in milliseconds.
• Clock Skew: the difference in time shown by the clock on the target host relative to the host
  running the utility.

UDP and TCP throughput tests return:

• The date/time and test name.
• The rate limit in MB/s.
• The node being tested.
• Sent and Received data in MB/s and bytes.
• The duration of the test in seconds.

Example
/opt/vertica/bin/vnetperf --condense --hosts 10.20.100.66,10.20.100.67 --identity-file '/root/.ssh/vid_rsa'

Enable Secure Shell (SSH) Logins


The administrative account must be able to use Secure Shell (SSH) to log in (ssh) to all hosts
without specifying a password. The shell script install_vertica does this automatically. This section
describes how to do it manually if necessary.

1. If you do not already have SSH installed on all hosts, log in as root on each host and install it
now. You can download a free version of the SSH connectivity tools from OpenSSH.
2. Log in to the HP Vertica administrator account (dbadmin in this example).
3. Make your home directory (~) writable only by yourself. Choose one of:
$ chmod 700 ~

or
$ chmod 755 ~

where:
700 includes            755 includes
400 read by owner       400 read by owner
200 write by owner      200 write by owner
100 execute by owner    100 execute by owner
                        040 read by group
                        010 execute by group
                        004 read by anybody (other)
                        001 execute by anybody
4. Change to your home directory:
$ cd ~

5. Generate a private key/public key pair:

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/dbadmin/.ssh/id_rsa):
Created directory '/home/dbadmin/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/dbadmin/.ssh/id_rsa.
Your public key has been saved in /home/dbadmin/.ssh/id_rsa.pub.
6. Make your .ssh directory readable and writable only by yourself:


$ chmod 700 ~/.ssh

7. Change to the .ssh directory:


$ cd ~/.ssh

8. Copy the file id_rsa.pub onto the file authorized_keys2.


$ cp id_rsa.pub authorized_keys2

9. Make the files in your .ssh directory readable and writable only by yourself:
$ chmod 600 ~/.ssh/*

10. For each cluster host:


$ scp -r ~/.ssh <host>:.

11. Connect to each cluster host. The first time you ssh to a new remote machine, you could get a
message similar to the following:
$ ssh dev0
Warning: Permanently added 'dev0,192.168.1.92' (RSA) to the list of known hosts.

This message appears only the first time you ssh to a particular remote host.
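After completing the steps above, you may want to confirm that every host accepts a login without a password prompt. The following is a minimal sketch: check_passwordless_ssh is a hypothetical helper name and SSH_CMD is a hypothetical override (useful for dry runs). It relies on the standard OpenSSH options BatchMode, which makes ssh fail instead of prompting, and ConnectTimeout.

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive login to each host and report
# whether it succeeded. Returns non-zero if any host failed.
check_passwordless_ssh() {
    ssh_cmd="${SSH_CMD:-ssh}"   # SSH_CMD is an assumed override for dry runs
    failed=0
    for host in "$@"; do
        # BatchMode=yes disables password and passphrase prompts.
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example (host names are placeholders):
#   check_passwordless_ssh host01 host02 host03
```

A host that you have never connected to can also fail this check, because BatchMode prevents ssh from asking you to accept its host key. Connect to it once interactively, as in step 11, and run the check again.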

See Also
• OpenSSH

Appendix: Time Zones


Using Time Zones With HP Vertica
HP Vertica uses the TZ environment variable on each node, if it has been set, for the default current
time zone. Otherwise, HP Vertica uses the operating system time zone.
The TZ variable can be set by the operating system during login (see /etc/profile,
/etc/profile.d, or /etc/bashrc) or by the user in .profile, .bashrc, or .bash_profile.
TZ must be set to the same value on each node when you start HP Vertica.
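
As a rough illustration of this rule, the Python sketch below resolves the time zone each node would use (TZ if set, otherwise the operating system time zone) and checks that all nodes agree. The node names, the per-node settings, and both helper functions are hypothetical; the sketch does not query a real cluster.

```python
def effective_time_zone(environment, os_time_zone):
    """Return TZ when it is set, otherwise the operating system time zone."""
    return environment.get("TZ", os_time_zone)

def nodes_agree(zones_by_node):
    """True when every node resolves to the same time zone."""
    return len(set(zones_by_node.values())) <= 1

# Hypothetical per-node settings: (environment variables, OS time zone).
nodes = {
    "node01": ({"TZ": "America/New_York"}, "UTC"),
    "node02": ({"TZ": "America/New_York"}, "UTC"),
    "node03": ({}, "UTC"),  # TZ not set, so the OS time zone applies
}

resolved = {name: effective_time_zone(env, os_zone)
            for name, (env, os_zone) in nodes.items()}
print(resolved)
print("Consistent:", nodes_agree(resolved))
```

In this made-up cluster the third node falls back to the operating system time zone, so the check reports an inconsistency.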
The following command returns the current time zone for your database:
=> SHOW TIMEZONE;
   name   |     setting
----------+-----------------
 timezone | America/New_York
(1 row)

You can also use the SET TIMEZONE TO {value | 'value' } command to set the time zone for a
single session.
There is no database default time zone; instead, TIMESTAMP WITH TIMEZONE
(TIMESTAMPTZ) data is stored in GMT (UTC) by converting data from the current local time zone
to GMT.
When TIMESTAMPTZ data is used, data is converted back to use the current local time zone,
which might be different from the local time zone where the data was stored. This conversion takes
into account Daylight Saving Time (Summer Time), if applicable, depending on the year and date, to
know when the Daylight Saving Time change occurred.
TIMESTAMP WITHOUT TIMEZONE data stores the timestamp, as given, and retrieves it exactly
as given. The current time zone is ignored. The same is true for TIME WITHOUT TIMEZONE. For
TIME WITH TIMEZONE (TIMETZ), however, the current time zone setting is stored along with the
given time, and that time zone is used on retrieval.
Note: HP recommends that you use TIMESTAMPTZ, not TIMETZ.
TIMESTAMPTZ uses the current time zone on both input and output, such as in the following
example:
=> CREATE TEMP TABLE s (tstz TIMESTAMPTZ);
=> SET TIMEZONE TO 'America/New_York';
=> INSERT INTO s VALUES ('2009-02-01 00:00:00');
=> INSERT INTO s VALUES ('2009-05-12 12:00:00');
=> SELECT tstz AS 'Local timezone', tstz AT TIMEZONE 'America/New_York' AS 'America/New_York',
   tstz AT TIMEZONE 'GMT' AS 'GMT' FROM s;
     Local timezone     |  America/New_York   |         GMT
------------------------+---------------------+--------------------
 2009-02-01 00:00:00-05 | 2009-02-01 00:00:00 | 2009-02-01 05:00:00
 2009-05-12 12:00:00-04 | 2009-05-12 12:00:00 | 2009-05-12 16:00:00
(2 rows)

The -05 in the Local time zone column above shows that the data is displayed in EST, while -04
indicates EDT. The other two columns show the TIMESTAMP WITHOUT TIMEZONE at the
specified time zone.
The next example illustrates what occurs if the current time zone is changed to, for example,
Greenwich Mean Time:
=> SET TIMEZONE TO 'GMT';
=> SELECT tstz AS 'Local timezone', tstz AT TIMEZONE 'America/New_York' AS 'America/New_York',
   tstz AT TIMEZONE 'GMT' AS 'GMT' FROM s;
     Local timezone     |  America/New_York   |         GMT
------------------------+---------------------+--------------------
 2009-02-01 05:00:00+00 | 2009-02-01 00:00:00 | 2009-02-01 05:00:00
 2009-05-12 16:00:00+00 | 2009-05-12 12:00:00 | 2009-05-12 16:00:00
(2 rows)

The +00 in the Local time zone column above indicates that TIMESTAMPTZ is displayed in 'GMT'.
The approach of using TIMESTAMPTZ fields to record events captures the GMT of the event, as
expressed in terms of the local time zone. Later, it allows for easy conversion to any other time
zone, either by setting the local time zone or by specifying an explicit AT TIMEZONE clause.
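
The store-in-GMT, display-in-local behavior of TIMESTAMPTZ can be imitated outside the database. The Python sketch below reproduces the two rows from the examples above. It is a simplified model, not how HP Vertica is implemented: fixed offsets (-05 for EST, -04 for EDT) stand in for a full time zone database, and the `store` and `display` helpers are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are a simplifying assumption; a real time zone database
# would pick the winter or summer offset from the date automatically.
EST = timezone(timedelta(hours=-5))  # America/New_York in winter
EDT = timezone(timedelta(hours=-4))  # America/New_York in summer
GMT = timezone.utc

def store(text, session_zone):
    """Read a literal in the session time zone and return the GMT instant to store."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=session_zone)
    return local.astimezone(GMT)

def display(stored, session_zone):
    """Convert a stored GMT instant to the session time zone for output."""
    return stored.astimezone(session_zone).isoformat(sep=" ")

winter = store("2009-02-01 00:00:00", EST)
summer = store("2009-05-12 12:00:00", EDT)

# Session time zone America/New_York, then session time zone GMT.
print(display(winter, EST), "|", display(winter, GMT))
print(display(summer, EDT), "|", display(summer, GMT))
```

The stored instant never changes; only the offset applied on output does, which is why switching the session to GMT changes the displayed clock time but not the event it refers to.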
The following example shows how TIMESTAMP WITHOUT TIMEZONE fields work in HP Vertica.
=> CREATE TEMP TABLE tnoz (ts TIMESTAMP);
=> INSERT INTO tnoz VALUES('2009-02-01 00:00:00');
=> INSERT INTO tnoz VALUES('2009-05-12 12:00:00');
=> SET TIMEZONE TO 'GMT';
=> SELECT ts AS 'No timezone', ts AT TIMEZONE 'America/New_York' AS
   'America/New_York', ts AT TIMEZONE 'GMT' AS 'GMT' FROM tnoz;
     No timezone     |    America/New_York    |          GMT
---------------------+------------------------+-----------------------
 2009-02-01 00:00:00 | 2009-02-01 05:00:00+00 | 2009-02-01 00:00:00+00
 2009-05-12 12:00:00 | 2009-05-12 16:00:00+00 | 2009-05-12 12:00:00+00
(2 rows)

The +00 at the end of a timestamp indicates that the value is displayed as TIMESTAMP WITH
TIMEZONE in GMT (the current time zone). The 'America/New_York' column shows the GMT time that
corresponds to each stored value, assuming the value was read from a normal clock in the
'America/New_York' time zone. In other words, if it is midnight in the 'America/New_York' time
zone, then it is 5 am GMT.
Note: 00:00:00 Sunday February 1, 2009 in America/New_York converts to 05:00:00 Sunday
February 1, 2009 in GMT.
The 'GMT' column displays the GMT time, assuming the input data was captured in GMT.
If you don't set the time zone to GMT, and you use another time zone, for example
'America/New_York', then the results display in 'America/New_York' with a -05 and -04, showing
the difference between that time zone and GMT.
=> SET TIMEZONE TO 'America/New_York';
=> SHOW TIMEZONE;
   name   |     setting
----------+-----------------
 timezone | America/New_York
(1 row)
=> SELECT ts AS 'No timezone', ts AT TIMEZONE 'America/New_York' AS
   'America/New_York', ts AT TIMEZONE 'GMT' AS 'GMT' FROM tnoz;
     No timezone     |    America/New_York    |          GMT
---------------------+------------------------+-----------------------
 2009-02-01 00:00:00 | 2009-02-01 00:00:00-05 | 2009-01-31 19:00:00-05
 2009-05-12 12:00:00 | 2009-05-12 12:00:00-04 | 2009-05-12 08:00:00-04
(2 rows)

In this case, the last column is interesting in that it returns the time in New York, given that the data
was captured in 'GMT'.
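
The effect of AT TIMEZONE on a TIMESTAMP WITHOUT TIMEZONE value can be modeled as two steps: treat the zone-less value as local time in the named zone, then display the result in the session time zone. The Python sketch below reproduces the values from the two result sets above under that model. The `at_time_zone` helper is a hypothetical name, and fixed offsets again replace a full time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are a simplifying assumption (EST in winter, EDT in summer).
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))
GMT = timezone.utc

def at_time_zone(naive, named_zone, session_zone):
    """Treat a zone-less timestamp as local time in named_zone, then show it in session_zone."""
    return naive.replace(tzinfo=named_zone).astimezone(session_zone).isoformat(sep=" ")

midnight = datetime(2009, 2, 1, 0, 0, 0)
noon = datetime(2009, 5, 12, 12, 0, 0)

# Session time zone GMT: the 'America/New_York' column of the first result set.
print(at_time_zone(midnight, EST, GMT))
print(at_time_zone(noon, EDT, GMT))

# Session time zone America/New_York: the 'GMT' column of the second result set.
print(at_time_zone(midnight, GMT, EST))
print(at_time_zone(noon, GMT, EDT))
```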

See Also
• TZ Environment Variable
• SET TIME ZONE
• Date/Time Data Types

Africa
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers

Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau
Africa/Blantyre
Africa/Brazzaville
Africa/Bujumbura
Africa/Cairo Egypt
Africa/Casablanca
Africa/Ceuta
Africa/Conakry
Africa/Dakar
Africa/Dar_es_Salaam
Africa/Djibouti
Africa/Douala
Africa/El_Aaiun
Africa/Freetown
Africa/Gaborone
Africa/Harare
Africa/Johannesburg
Africa/Kampala
Africa/Khartoum
Africa/Kigali
Africa/Kinshasa

Africa/Lagos
Africa/Libreville
Africa/Lome
Africa/Luanda
Africa/Lubumbashi
Africa/Lusaka
Africa/Malabo
Africa/Maputo
Africa/Maseru
Africa/Mbabane
Africa/Mogadishu
Africa/Monrovia
Africa/Nairobi
Africa/Ndjamena
Africa/Niamey
Africa/Nouakchott
Africa/Ouagadougou
Africa/Porto-Novo
Africa/Sao_Tome
Africa/Timbuktu
Africa/Tripoli Libya
Africa/Tunis
Africa/Windhoek

America
America/Adak America/Atka US/Aleutian
America/Anchorage SystemV/YST9YDT US/Alaska
America/Anguilla
America/Antigua
America/Araguaina
America/Aruba
America/Asuncion
America/Bahia
America/Barbados
America/Belem
America/Belize
America/Boa_Vista
America/Bogota
America/Boise
America/Buenos_Aires
America/Cambridge_Bay
America/Campo_Grande
America/Cancun
America/Caracas
America/Catamarca
America/Cayenne
America/Cayman
America/Chicago CST6CDT SystemV/CST6CDT US/Central

America/Chihuahua
America/Cordoba America/Rosario
America/Costa_Rica
America/Cuiaba
America/Curacao
America/Danmarkshavn
America/Dawson
America/Dawson_Creek
America/Denver MST7MDT SystemV/MST7MDT US/Mountain America/Shiprock Navajo
America/Detroit US/Michigan
America/Dominica
America/Edmonton Canada/Mountain
America/Eirunepe
America/El_Salvador
America/Ensenada America/Tijuana Mexico/BajaNorte
America/Fortaleza
America/Glace_Bay
America/Godthab
America/Goose_Bay
America/Grand_Turk
America/Grenada
America/Guadeloupe
America/Guatemala
America/Guayaquil
America/Guyana

America/Halifax Canada/Atlantic SystemV/AST4ADT


America/Havana Cuba
America/Hermosillo
America/Indiana/Indianapolis
America/Indianapolis
America/Fort_Wayne EST SystemV/EST5 US/East-Indiana
America/Indiana/Knox America/Knox_IN US/Indiana-Starke
America/Indiana/Marengo
America/Indiana/Vevay
America/Inuvik
America/Iqaluit
America/Jamaica Jamaica
America/Jujuy
America/Juneau
America/Kentucky/Louisville America/Louisville
America/Kentucky/Monticello
America/La_Paz
America/Lima
America/Los_Angeles PST8PDT SystemV/PST8PDT US/Pacific US/Pacific-New
America/Maceio
America/Managua
America/Manaus Brazil/West
America/Martinique
America/Mazatlan Mexico/BajaSur
America/Mendoza

America/Menominee
America/Merida
America/Mexico_City Mexico/General
America/Miquelon
America/Monterrey
America/Montevideo
America/Montreal
America/Montserrat
America/Nassau
America/New_York EST5EDT SystemV/EST5EDT US/Eastern
America/Nipigon
America/Nome
America/Noronha Brazil/DeNoronha
America/North_Dakota/Center
America/Panama
America/Pangnirtung
America/Paramaribo
America/Phoenix MST SystemV/MST7 US/Arizona
America/Port-au-Prince
America/Port_of_Spain
America/Porto_Acre America/Rio_Branco Brazil/Acre
America/Porto_Velho
America/Puerto_Rico SystemV/AST4
America/Rainy_River
America/Rankin_Inlet

America/Recife
America/Regina Canada/East-Saskatchewan Canada/Saskatchewan SystemV/CST6
America/Santiago Chile/Continental
America/Santo_Domingo
America/Sao_Paulo Brazil/East
America/Scoresbysund
America/St_Johns Canada/Newfoundland
America/St_Kitts
America/St_Lucia
America/St_Thomas America/Virgin
America/St_Vincent
America/Swift_Current
America/Tegucigalpa
America/Thule
America/Thunder_Bay
America/Toronto Canada/Eastern
America/Tortola
America/Vancouver Canada/Pacific
America/Whitehorse Canada/Yukon
America/Winnipeg Canada/Central
America/Yakutat
America/Yellowknife

Antarctica
Antarctica/Casey

Antarctica/Davis
Antarctica/DumontDUrville
Antarctica/Mawson
Antarctica/McMurdo
Antarctica/South_Pole
Antarctica/Palmer
Antarctica/Rothera
Antarctica/Syowa
Antarctica/Vostok

Asia
Asia/Aden
Asia/Almaty
Asia/Amman
Asia/Anadyr
Asia/Aqtau
Asia/Aqtobe
Asia/Ashgabat Asia/Ashkhabad
Asia/Baghdad
Asia/Bahrain
Asia/Baku
Asia/Bangkok
Asia/Beirut
Asia/Bishkek
Asia/Brunei

Asia/Calcutta
Asia/Choibalsan
Asia/Chongqing Asia/Chungking
Asia/Colombo
Asia/Dacca Asia/Dhaka
Asia/Damascus
Asia/Dili
Asia/Dubai
Asia/Dushanbe
Asia/Gaza
Asia/Harbin
Asia/Hong_Kong Hongkong
Asia/Hovd
Asia/Irkutsk
Asia/Jakarta
Asia/Jayapura
Asia/Jerusalem Asia/Tel_Aviv Israel
Asia/Kabul
Asia/Kamchatka
Asia/Karachi
Asia/Kashgar
Asia/Katmandu
Asia/Krasnoyarsk
Asia/Kuala_Lumpur
Asia/Kuching

Asia/Kuwait
Asia/Macao Asia/Macau
Asia/Magadan
Asia/Makassar Asia/Ujung_Pandang
Asia/Manila
Asia/Muscat
Asia/Nicosia Europe/Nicosia
Asia/Novosibirsk
Asia/Omsk
Asia/Oral
Asia/Phnom_Penh
Asia/Pontianak
Asia/Pyongyang
Asia/Qatar
Asia/Qyzylorda
Asia/Rangoon
Asia/Riyadh
Asia/Riyadh87 Mideast/Riyadh87
Asia/Riyadh88 Mideast/Riyadh88
Asia/Riyadh89 Mideast/Riyadh89
Asia/Saigon
Asia/Sakhalin
Asia/Samarkand
Asia/Seoul ROK
Asia/Shanghai PRC

Asia/Singapore Singapore
Asia/Taipei ROC
Asia/Tashkent
Asia/Tbilisi
Asia/Tehran Iran
Asia/Thimbu Asia/Thimphu
Asia/Tokyo Japan
Asia/Ulaanbaatar Asia/Ulan_Bator
Asia/Urumqi
Asia/Vientiane
Asia/Vladivostok
Asia/Yakutsk
Asia/Yekaterinburg
Asia/Yerevan

Atlantic
Atlantic/Azores
Atlantic/Bermuda
Atlantic/Canary
Atlantic/Cape_Verde
Atlantic/Faeroe
Atlantic/Madeira
Atlantic/Reykjavik Iceland
Atlantic/South_Georgia
Atlantic/St_Helena
Atlantic/Stanley

Australia
Australia/ACT
Australia/Canberra
Australia/NSW
Australia/Sydney
Australia/Adelaide
Australia/South
Australia/Brisbane
Australia/Queensland
Australia/Broken_Hill
Australia/Yancowinna
Australia/Darwin
Australia/North
Australia/Hobart
Australia/Tasmania
Australia/LHI
Australia/Lord_Howe
Australia/Lindeman
Australia/Melbourne
Australia/Victoria
Australia/Perth Australia/West

Etc/GMT
Etc/GMT+1
Etc/GMT+2

Etc/GMT+3
Etc/GMT+4
Etc/GMT+5
Etc/GMT+6
Etc/GMT+7
Etc/GMT+8
Etc/GMT+9
Etc/GMT+10
Etc/GMT+11
Etc/GMT+12
Etc/GMT-1
Etc/GMT-2
Etc/GMT-3
Etc/GMT-4
Etc/GMT-5
Etc/GMT-6
Etc/GMT-7
Etc/GMT-8
Etc/GMT-9
Etc/GMT-10
Etc/GMT-11
Etc/GMT-12
Etc/GMT-13
Etc/GMT-14

Europe
Europe/Amsterdam
Europe/Andorra
Europe/Athens
Europe/Belfast
Europe/Belgrade
Europe/Ljubljana
Europe/Sarajevo
Europe/Skopje
Europe/Zagreb
Europe/Berlin
Europe/Brussels
Europe/Bucharest
Europe/Budapest
Europe/Chisinau Europe/Tiraspol
Europe/Copenhagen
Europe/Dublin Eire
Europe/Gibraltar
Europe/Helsinki
Europe/Istanbul Asia/Istanbul Turkey
Europe/Kaliningrad
Europe/Kiev
Europe/Lisbon Portugal
Europe/London GB GB-Eire

Europe/Luxembourg
Europe/Madrid
Europe/Malta
Europe/Minsk
Europe/Monaco
Europe/Moscow W-SU
Europe/Oslo
Arctic/Longyearbyen
Atlantic/Jan_Mayen
Europe/Paris
Europe/Prague Europe/Bratislava
Europe/Riga
Europe/Rome Europe/San_Marino Europe/Vatican
Europe/Samara
Europe/Simferopol
Europe/Sofia
Europe/Stockholm
Europe/Tallinn
Europe/Tirane
Europe/Uzhgorod
Europe/Vaduz
Europe/Vienna
Europe/Vilnius
Europe/Warsaw Poland
Europe/Zaporozhye
Europe/Zurich

Indian
Indian/Antananarivo
Indian/Chagos
Indian/Christmas
Indian/Cocos
Indian/Comoro
Indian/Kerguelen
Indian/Mahe
Indian/Maldives
Indian/Mauritius
Indian/Mayotte
Indian/Reunion

Pacific
Pacific/Apia
Pacific/Auckland NZ
Pacific/Chatham NZ-CHAT
Pacific/Easter
Chile/EasterIsland
Pacific/Efate
Pacific/Enderbury
Pacific/Fakaofo
Pacific/Fiji
Pacific/Funafuti

Pacific/Galapagos
Pacific/Gambier SystemV/YST9
Pacific/Guadalcanal
Pacific/Guam
Pacific/Honolulu HST SystemV/HST10 US/Hawaii
Pacific/Johnston
Pacific/Kiritimati
Pacific/Kosrae
Pacific/Kwajalein Kwajalein
Pacific/Majuro
Pacific/Marquesas
Pacific/Midway
Pacific/Nauru
Pacific/Niue
Pacific/Norfolk
Pacific/Noumea
Pacific/Pago_Pago
Pacific/Samoa US/Samoa
Pacific/Palau
Pacific/Pitcairn SystemV/PST8
Pacific/Ponape
Pacific/Port_Moresby
Pacific/Rarotonga
Pacific/Saipan
Pacific/Tahiti

Pacific/Tarawa
Pacific/Tongatapu
Pacific/Truk
Pacific/Wake
Pacific/Wallis
Pacific/Yap

Administrator's Guide

Administration Overview
This document describes the functions performed by an HP Vertica database administrator (DBA).
Perform these tasks using only the dedicated database administrator account that was created
when you installed HP Vertica. The examples in this documentation set assume that the
administrative account name is dbadmin.
• To perform certain cluster configuration and administration tasks, the DBA (users of the
  administrative account) must be able to supply the root password for those hosts. If this
  requirement conflicts with your organization's security policies, these functions must be
  performed by your IT staff.
• If you perform administrative functions using a different account from the account provided
  during installation, HP Vertica encounters file ownership problems.
• If you share the administrative account password, make sure that only one user runs the
  Administration Tools at any time. Otherwise, automatic configuration propagation does not
  work correctly.
• The Administration Tools require that the calling user's shell be /bin/bash. Other shells give
  unexpected results and are not supported.

Managing Licenses
You must license HP Vertica in order to use it. Hewlett-Packard supplies your license information
to you in the form of one or more license files, which encode the terms of your license. Several
licenses are available:
• vlicense.dat, for columnar tables.
• flextables.key, for Flex Zone flexible tables.
• vlicense_565_bytes.dat, for data stored in a Hadoop environment with HP Vertica for SQL on
  Hadoop.

To prevent introducing special characters (such as line endings or file terminators) into the license
key file, do not open the file in an editor or e-mail client. Though special characters are not always
visible in an editor, their presence invalidates the license.

Copying Enterprise, Evaluation, Flex Zone, and SQL on Hadoop License Files
For ease of HP Vertica Enterprise Edition and SQL on Hadoop installation, HP recommends that
you copy the license file to /tmp/vlicense.dat on the Administration host.
If you have a license for Flex Zone, HP recommends that you copy the license file to
/opt/vertica/config/share/license.com.vertica.flextables.key.
Be careful not to change the license key file in any way when copying the file between Windows
and Linux, or to any other location. To help prevent applications from trying to alter the file, enclose
the license file in an archive file (such as a .zip or .tar file).
After copying the license file from one location to another, check that the copied file size is identical
to that of the one you received from HP Vertica.
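
The size comparison described above can be scripted. The Python sketch below compares both the size and a SHA-256 digest of two files; the digest check is an extra precaution that goes beyond the size check the text asks for. The file names and contents are placeholders created in a temporary directory, not real license data.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path):
    """Return the SHA-256 digest of a file, read in binary mode."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def copied_intact(original, copy):
    """True when the copy has the same size and the same digest as the original."""
    same_size = os.path.getsize(original) == os.path.getsize(copy)
    return same_size and sha256_of(original) == sha256_of(copy)

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "vlicense.dat")
    good_copy = os.path.join(workdir, "copy_ok.dat")
    bad_copy = os.path.join(workdir, "copy_crlf.dat")
    with open(original, "wb") as handle:
        handle.write(b"placeholder license bytes\n")
    shutil.copyfile(original, good_copy)
    with open(bad_copy, "wb") as handle:
        # Simulates a transfer that rewrote the line ending.
        handle.write(b"placeholder license bytes\r\n")
    good_result = copied_intact(original, good_copy)
    bad_result = copied_intact(original, bad_copy)

print("byte-for-byte copy intact:", good_result)
print("copy with altered line ending intact:", bad_result)
```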

Obtaining a License Key File


To obtain an Enterprise Edition, Evaluation, Flex Zone, or SQL on Hadoop license key, contact HP
Vertica at: http://www.vertica.com/about/contact-us/
Your HP Vertica Community Edition download package includes the Community Edition license,
which allows three nodes and 1TB of data. The HP Vertica Community Edition license does not
expire.

Understanding HP Vertica Licenses


HP Vertica has flexible licensing terms. It can be licensed on the following bases:
• Term-based (valid until a specific date)
• Raw data size based (valid to store up to some amount of raw data)
• Both term-based and data-size-based
• Unlimited duration and data storage
• Raw data size based and a limit of 3 nodes (HP Vertica Community Edition)
• Node-based and an unlimited number of CPUs and users (one node is a server acting as a single
  computer system, whether physical or virtual)

Your license key has your licensing bases encoded into it. If you are unsure of your current license,
you can view your license information from within HP Vertica.
HP Vertica for SQL on Hadoop is a separate product with its own license. This documentation
covers both products.

HP Vertica Analytics Platform License Types


HP Vertica Analytics Platform is a full-featured offering with all analytical functions described in this
documentation. It is best used for advanced analytics and enterprise data warehousing. There are
two editions, Community Edition and Enterprise Edition.
HP Vertica Community Edition. You can download and start using Community Edition for free.
The Community Edition license allows customers the following:
• 3 node limit
• 1 terabyte columnar table data limit
• 1 terabyte Flex table data limit

Community Edition licenses cannot be installed co-located in a Hadoop infrastructure and used to
query data stored in Hadoop formats.
HP Vertica Enterprise Edition. You can purchase the Enterprise Edition license. The Enterprise
Edition license entitles customers to:

• No node limit
• Columnar data, amount specified by the license
• 1 terabyte Flex table data
• Query data stored in HDFS using the HCatalog Connector or HDFS Connector, and back up HP
  Vertica data to HDFS

Enterprise Edition licenses cannot be installed co-located in a Hadoop infrastructure and used to
query data stored in Hadoop formats, and exclude the ability to read ORC Files directly.
Flex Zone. Flex Zone is a license for the flex tables technology, available in version 7.0.
Customers can separately purchase and apply a Flex Zone license to their installation. The Flex
Zone license entitles customers to the licensed amount of Flex table data and removes the 3 node
restriction imposed by the Community Edition.
Customers whose primary goal is to work with flex tables can purchase a Flex Zone license. When
they purchase Flex Zone, customers receive a complimentary Enterprise License, which entitles
them to one terabyte of columnar data and imposes no node limit.
Note: Customers who purchase a Flex Zone license must apply two licenses: their Enterprise
Edition license and their Flex Zone license.

Allowances    | Community Edition | Enterprise Edition | Enterprise Edition + Flex Zone | Flex Zone
Node Limit    | 3 nodes           | Unlimited          | Unlimited                      | Unlimited
Columnar Data | 1 terabyte        | Per license        | Per license                    | 1 terabyte
Flex Data     | 1 terabyte        | 1 terabyte         | Per license                    | Per license
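
As an illustration of how the Community Edition allowances combine, the Python sketch below checks a planned deployment against the three limits stated in this section (3 nodes, 1 terabyte of columnar data, 1 terabyte of Flex data). The dictionary keys and the `community_edition_violations` helper are hypothetical; HP Vertica performs its own checks and this sketch does not reproduce them.

```python
# Community Edition allowances as stated in this section.
COMMUNITY_EDITION = {"nodes": 3, "columnar_tb": 1.0, "flex_tb": 1.0}

def community_edition_violations(nodes, columnar_tb, flex_tb):
    """Return the names of the allowances that the planned deployment exceeds."""
    usage = {"nodes": nodes, "columnar_tb": columnar_tb, "flex_tb": flex_tb}
    return [name for name, limit in COMMUNITY_EDITION.items() if usage[name] > limit]

print(community_edition_violations(nodes=3, columnar_tb=0.5, flex_tb=0.2))
print(community_edition_violations(nodes=4, columnar_tb=1.5, flex_tb=0.2))
```

An empty list means the deployment fits within Community Edition; any entry suggests that an Enterprise Edition or Flex Zone license is needed for that allowance.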

HP Vertica for SQL on Hadoop License


Available in version 7.1, HP Vertica for SQL on Hadoop is a license for running HP Vertica on a
Hadoop environment. This allows users to run Vertica on data that is in a shared storage
environment. It is best used for exploring data in a Hadoop data lake. It can be used only in
co-located Hadoop environments to query data stored in Hadoop (Hortonworks, MapR, or Cloudera).
Customers can purchase this term-based SQL on Hadoop license per the number of nodes they
plan to use in their Hadoop environment. The license then audits the number of nodes being used
for compliance.
Users have access to a subset of the Enterprise Edition features with this license.
Features                                                | Enterprise Edition              | SQL on Hadoop
HP Vertica Distributed R                                | Yes                             | No
HP Vertica Pulse                                        | Yes                             | No
HP Vertica Place                                        | Yes                             | No
Key Value                                               | Yes                             | Yes
Live Aggregate Projections                              | Yes                             | No
Time Series Analytics                                   | Yes                             | Yes
Text Search                                             | Yes                             | No
Workload Management                                     | Yes                             | Yes
Advanced Workload Management (Cascading Resource Pools) | Yes                             | No
Database Designer                                       | Yes                             | Yes
Management Console                                      | Yes                             | Yes
Backup and Restore                                      | Yes                             | No
Flex Zone                                               | Yes                             | Yes
External Tables                                         | Yes                             | Yes
File Formats                                            | Text, Flex, ROS, Hadoop Formats | Text, Flex ROS, Hadoop Formats
Primary File System for Storage                         | EXT4, HDFS, MapR NFS            | HDFS, MapR NFS
Hadoop Technologies (Hcat, HDFS, Map Reduce Conn)       | Yes                             | Yes
ORC Reader                                              | No                              | Yes
MPP SQL Engine                                          | Yes                             | Yes
Hadoop Distribution Compatibility                       | Hortonworks, Cloudera, MapR     | Hortonworks, Cloudera, MapR
Java User-Defined Extensions                            | Yes                             | Yes
C++ User-Defined Extensions                             | Yes                             | No
R User-Defined Extensions                               | Yes                             | No

Installing or Upgrading a License Key


The steps you follow to apply your HP Vertica license key vary, depending on the type of license
you are applying and whether you are upgrading your license. This section describes the following:
• New HP Vertica License Installations
• HP Vertica License Renewals or Upgrades
• Flex Zone License Installations
• SQL on Hadoop License Installations

New HP Vertica License Installations


1. Copy the license key file to your Administration Host.
2. Ensure the license key's file permissions are set to at least 666 (read and write permissions for
all users).
3. Install HP Vertica as described in the Installation Guide if you have not already done so. The
interface prompts you for the license key file.
4. To install Community Edition, leave the default path blank and press OK. To apply your
evaluation or Enterprise Edition license, enter the absolute path of the license key file you
downloaded to your Administration Host and press OK. The first time you log in as the
Database Administrator and run the Administration Tools, the interface prompts you to
accept the End-User License Agreement (EULA).

Note: If you installed Management Console, the MC administrator can point to the
location of the license key during Management Console configuration.

5. Choose View EULA to review the EULA.


6. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing
the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.
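
Step 2 asks for file permissions of at least 666. One way to read "at least" is that every bit in 666 must be present in the file's mode, so 666 and 777 qualify but 644 does not. The Python sketch below checks that on a temporary placeholder file; the helper names are hypothetical and the interpretation is an assumption, not a statement of how the installer validates the file.

```python
import os
import stat
import tempfile

REQUIRED = 0o666  # read and write for owner, group, and other

def mode_allows_read_write_for_all(mode):
    """True when every permission bit in 666 is present in mode."""
    return (mode & REQUIRED) == REQUIRED

def file_allows_read_write_for_all(path):
    """Apply the same check to the permission bits of an existing file."""
    return mode_allows_read_write_for_all(stat.S_IMODE(os.stat(path).st_mode))

with tempfile.NamedTemporaryFile() as placeholder:
    os.chmod(placeholder.name, 0o600)   # owner-only access
    before = file_allows_read_write_for_all(placeholder.name)
    os.chmod(placeholder.name, 0o666)   # equivalent to: chmod 666 <file>
    after = file_allows_read_write_for_all(placeholder.name)

print("mode 600 sufficient:", before)
print("mode 666 sufficient:", after)
```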

HP Vertica License Renewals or Upgrades


If your license is expiring or you want your database to grow beyond your licensed data size, you
must renew or upgrade your license. Once you have obtained your renewal or upgraded license key
file, you can install it using Administration Tools or Management Console.

Uploading or Upgrading a License Key Using Administration Tools


1. Copy the license key file to your Administration Host.
2. Ensure the license key's file permissions are set to at least 666 (read and write permissions for
all users).
3. Start your database, if it is not already running.
4. In the Administration Tools, select Advanced > Upgrade License Key and click OK.
5. Enter the path to your new license key file and click OK. The interface prompts you to accept
the End-User License Agreement (EULA).
6. Choose View EULA to review the EULA.
7. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing
the license, or choose Reject EULA to reject the EULA and return to the Advanced Tools
menu.

Uploading or Upgrading a License Key Using Management Console


1. From your database's Overview page in Management Console, click the License tab. The
License page displays. You can view your installed licenses on this page.
2. Click the Install New License button at the top of the License page.

3. Browse to the location of the license key from your local computer (where the web browser is
installed) and upload the file.
4. Click the Apply button at the top of the page. The interface prompts you to accept the End-User
License Agreement (EULA).
5. Select the check box to officially accept the EULA and continue installing the license, or click
Cancel to exit.

Note: As soon as you renew or upgrade your license key from either your Administration
Host or Management Console, HP Vertica applies the license update. No further warnings
appear.

Flex Table License Installations


Installing a Flex Table license using vsql
1. Install HP Vertica as described in the Installation Guide if you have not already done so.
2. Copy the Flex Zone flex tables license key file to your Administration Host. HP recommends
that you copy the license file to
/opt/vertica/config/share/license.com.vertica.flextables.key.
3. Start your database, if it is not already running.
4. In the Administration Tools, connect to your database.
5. At the vsql prompt, select INSTALL_LICENSE as described in the SQL Reference Manual.
=> SELECT INSTALL_LICENSE
('/opt/vertica/config/share/license.com.vertica.flextables.key');

Installing a Flex Table license using Management Console


1. Start Management Console.
2. From your database's Overview page in Management Console, click the License tab. The
License page displays. You can view your installed licenses on this page.
3. Click the Install New License button at the top of the License page.

4. Browse to the location of the license key from your local computer (where the web browser is
installed) and upload the file.
5. Click the Apply button at the top of the page. The interface prompts you to accept the End-User
License Agreement (EULA).
6. Select the check box to officially accept the EULA and continue installing the license, or click
Cancel to exit.

Viewing Your License Status


HP Vertica has several functions to display your license terms and current status.

Examining Your License Key


Use the DISPLAY_LICENSE SQL function described in the SQL Reference Manual to display the
license information. This function displays the dates for which your license is valid (or Perpetual if
your license does not expire) and any raw data allowance. For example:
=> SELECT DISPLAY_LICENSE();
                  DISPLAY_LICENSE
----------------------------------------------------
 HP Vertica Systems, Inc.
 1/1/2011
 12/31/2011
 30
 50TB
(1 row)

Or, use the LICENSES table described in the SQL Reference Manual to view information about all
your installed licenses. This table displays your license types, the dates for which your licenses are
valid, and the size and node limits your licenses impose. In the example below, the licenses table
displays the Community Edition license and the default license that controls HP Vertica's flex data
capacity.
=> \x
=> SELECT * FROM licenses;
-[ RECORD 1 ]--------+----------------------------------------
license_id           | 45035996273704986
name                 | vertica
licensee             | Vertica Community Edition
start_date           | 2011-11-22
end_date             | Perpetual
size                 | 1TB
is_community_edition | t
node_restriction     | 3
-[ RECORD 2 ]--------+----------------------------------------
license_id           | 45035996274085644
name                 | com.vertica.flextable
licensee             | Vertica Community Edition, Flextable
start_date           | 2013-10-29
end_date             | Perpetual
size                 | 1TB
is_community_edition | t
node_restriction     |

You can also view the LICENSES table in Management Console. On your database's Overview
page in Management Console, click the License tab. The License page displays information about
your installed licenses.

Viewing Your License Status


If your license includes a raw data size allowance, HP Vertica periodically audits your database's
size to ensure it remains compliant with the license agreement. If your license has an end date, HP
Vertica also periodically checks to see if the license has expired. You can see the result of the
latest audits using the GET_COMPLIANCE_STATUS function.
=> SELECT GET_COMPLIANCE_STATUS();
                              GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
 Raw Data Size: 2.00GB +/- 0.003GB
 License Size : 4.000GB
 Utilization  : 50%
 Audit Time   : 2011-03-09 09:54:09.538704+00
 Compliance Status : The database is in compliance with respect to raw data size.
 License End Date: 04/06/2011
 Days Remaining: 28.59
(1 row)
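
The Utilization figure in this output is the estimated raw data size as a percentage of the licensed size. The Python sketch below recomputes it from the sample values. The helper names are hypothetical, and the rule that treats the database as within its license when the estimate plus its margin of error does not exceed the license size is a conservative assumption for illustration, not HP Vertica's documented compliance rule.

```python
def utilization_percent(raw_gb, license_gb):
    """Estimated raw data size as a percentage of the licensed size."""
    return 100.0 * raw_gb / license_gb

def within_license(raw_gb, license_gb):
    """True when the given raw data size does not exceed the licensed size."""
    return raw_gb <= license_gb

# Values taken from the sample output above.
raw_gb, margin_gb, license_gb = 2.00, 0.003, 4.000

print(f"Utilization: {utilization_percent(raw_gb, license_gb):.0f}%")
# Assumption: include the margin of error to stay on the safe side.
print("Within license:", within_license(raw_gb + margin_gb, license_gb))
```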

Viewing Your License Status Through MC


Information about license usage is on the Settings page. See Monitoring Database Size for License
Compliance.

Calculating the Database Size


You can use your HP Vertica software until your columnar data reaches the maximum raw data
size that the license agreement provides. This section describes when data is monitored, what data
is included in the estimate, and the general methodology used to produce an estimate. For more
information about monitoring for data size, see Monitoring Database Size for License Compliance.

How HP Vertica Estimates Raw Data Size


HP Vertica uses statistical sampling to calculate an accurate estimate of the raw data size of the
database. In this context, raw data means the uncompressed data stored in a single HP Vertica
database. For the purpose of license size audit and enforcement, HP Vertica evaluates the raw
data size as if the data had been exported from the database in text format, rather than as
compressed data.
HP Vertica conducts your database size audit using statistical sampling. This method allows HP
Vertica to estimate the size of the database without significantly impacting database performance.
The trade-off between accuracy and impact on performance is a small margin of error, inherent in
statistical sampling. Reports on your database size include the margin of error, so you can assess
the accuracy of the estimate. To learn more about simple random sampling, see Simple Random
Sampling.

Excluding Data From Raw Data Size Estimate


Not all data in the HP Vertica database is evaluated as part of the raw data size. Specifically, HP
Vertica excludes the following data:

Multiple projections (underlying physical copies) of data from a logical database entity (table).
Data appearing in multiple projections of the same table is counted only once.

Data stored in temporary tables.

Data accessible through external table definitions.

Data that has been deleted, but which remains in the database. To understand more about
deleting and purging data, see Purging Deleted Data.

Data stored in the WOS.

Data stored in system and work tables such as monitoring tables, Data Collector tables, and
Database Designer tables.

Delimiter characters.

Evaluating Data Type Footprint Size


The data sampled for the estimate is treated as if it had been exported from the database in text
format (such as printed from vsql). This means that HP Vertica evaluates the data type footprint
sizes as follows:

Strings and binary types (CHAR, VARCHAR, BINARY, VARBINARY) are counted as their
actual size in bytes using UTF-8 encoding.

Numeric data types are counted as if they had been printed. Each digit counts as a byte, as does
any decimal point, sign, or scientific notation. For example, -123.456 counts as eight bytes (six
digits plus the decimal point and minus sign).

Date/time data types are counted as if they had been converted to text, including any hyphens or
other separators. For example, a timestamp column containing the value for noon on July 4th,
2011 would be 19 bytes. As text, vsql would print the value as 2011-07-04 12:00:00, which is 19
characters, including the space between the date and the time.
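The footprint rules above can be sketched in a few lines of Python. This is only an illustration of the counting rules as stated in this section, not the code HP Vertica runs during an audit. The function names text_footprint and row_footprint are hypothetical, and the sketch assumes timestamps print without fractional seconds, as in the example above.

```python
from datetime import datetime
from decimal import Decimal


def text_footprint(value):
    """Return the number of bytes a value would occupy if exported as text."""
    if isinstance(value, bytes):
        return len(value)                    # binary types: actual size in bytes
    if isinstance(value, str):
        return len(value.encode("utf-8"))    # strings: size in UTF-8 encoding
    if isinstance(value, datetime):
        # Assumption: printed as YYYY-MM-DD HH:MM:SS, without fractional seconds.
        return len(value.strftime("%Y-%m-%d %H:%M:%S"))
    if isinstance(value, (int, Decimal)):
        # Each digit, sign, and decimal point counts as one byte.
        return len(str(value))
    raise TypeError("unsupported type: %s" % type(value).__name__)


def row_footprint(values):
    """Sum the footprints of one row; delimiter characters are not counted."""
    return sum(text_footprint(v) for v in values)


print(text_footprint(Decimal("-123.456")))        # eight bytes, as in the text
print(text_footprint(datetime(2011, 7, 4, 12)))   # 19 bytes, as in the text
```

Decimal is used instead of float so that the printed form of a number matches the literal that was entered.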

Using AUDIT to Estimate Database Size


To supply a more accurate database size estimate than statistical sampling can provide, use the
AUDIT function to perform a full audit. This function has parameters to set both the
error_tolerance and confidence_level. Adjusting one or both of these parameters increases or
decreases the function's performance impact.
For instance, lowering the error_tolerance to zero (0) and raising the confidence_level to 100
provides the most accurate size estimate, but also increases the performance impact of calling the
AUDIT function. During a detailed, low error-tolerant audit, all of the data in the database is dumped
to a raw format to calculate its size. Because a stringent audit can significantly impact
database performance, never perform a full audit of a production database. See AUDIT for details.
Note: Unlike estimating raw data size using statistical sampling, a full audit performs SQL
queries on the full database contents, including the contents of the WOS.

Monitoring Database Size for License Compliance


Your HP Vertica license can include a data storage allowance. The allowance can be for columnar
data, for flex table data, or for both types of data (two separate licenses). The audit() function
estimates the columnar table data size, while the audit_flex() function calculates the amount of
flex table data storage. Monitoring data sizes for columnar and flex tables lets you plan either to
schedule deleting old data to keep your database in compliance with your license, or to budget for a
license upgrade for additional data storage.
Note: An audit of columnar data includes any materialized columns in flex tables.

Viewing Your License Compliance Status


HP Vertica periodically runs an audit of the columnar data size to verify that your database remains
compliant with your license. You can view the results of the most recent audit by calling the
GET_COMPLIANCE_STATUS function.
                         GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
 Raw Data Size: 2.00GB +/- 0.003GB
 License Size : 4.000GB
 Utilization  : 50%
 Audit Time   : 2011-03-09 09:54:09.538704+00
 Compliance Status : The database is in compliance with respect to raw data size.
 License End Date: 04/06/2011
 Days Remaining: 28.59
(1 row)

Periodically running GET_COMPLIANCE_STATUS to monitor your database's license status is
usually enough to ensure that your database remains compliant with your license. If your database
begins to near its columnar data allowance, you can use the other auditing functions described
below to determine where your database is growing and how recent deletes have affected the size
of your database.

Manually Auditing Columnar Data Usage


You can manually check license compliance for all columnar data in your database using the
AUDIT_LICENSE_SIZE SQL function. This function performs the same audit that HP Vertica
periodically performs automatically. The AUDIT_LICENSE_SIZE SQL check runs in the
background, so the function returns immediately. You can then view the audit results using
GET_COMPLIANCE_STATUS.
Note: When you audit columnar data, the results include any flexible table virtual columns that
you have materialized. Materialized columns include columns that you specify when creating a
flex table, and any that you promote from virtual columns to real columns.
An alternative to AUDIT_LICENSE_SIZE is to use the AUDIT SQL function to audit the size of the
columnar tables in your entire database by passing an empty string to the function. Unlike
AUDIT_LICENSE_SIZE, this function operates synchronously, returning when it has estimated the size of
the database.
=> SELECT AUDIT('');
  AUDIT
----------
 76376696
(1 row)

The size of the database is reported in bytes. The AUDIT function also allows you to control the
accuracy of the estimated database size using additional parameters. See the entry for the AUDIT
function in the SQL Reference Manual for full details. HP Vertica does not count the AUDIT
function results as an official audit. It takes no license compliance actions based on the results.
Note: The results of the AUDIT function do not include flexible table data. Use the
AUDIT_FLEX function to monitor data usage for your Flex Tables license.

Manually Auditing Flex Table Data Usage


You can use the AUDIT_FLEX function to manually audit data usage for one or more flexible tables.
The function measures encoded, compressed data stored in ROS containers for the __raw__
column of one or more flexible tables. The audit results include only virtual columns in flex tables,
not data included in materialized columns. Temporary flex tables are not included in the audit.

Targeted Auditing
If audits determine that the columnar table estimates are unexpectedly large, consider which schemas,
tables, or partitions are using the most storage. You can use the AUDIT function to perform
targeted audits of schemas, tables, or partitions by supplying the name of the entity whose size you
want to find. For example, to find the size of the online_sales schema in the VMart example
database, run the following command:
VMart=> SELECT AUDIT('online_sales');
  AUDIT
----------
 35716504
(1 row)

You can also change the granularity of an audit to report the size of each entity in a larger entity (for
example, each table in a schema) by using the granularity argument of the AUDIT function. See the
AUDIT function's entry in the SQL Reference Manual.

Using Management Console to Monitor License Compliance


You can also get information about data storage of columnar data (for columnar tables and for
materialized columns in flex tables) through the Management Console. This information is available
in the database Overview page, which displays a grid view of the database's overall health.

The needle in the license meter adjusts to reflect the amount used in megabytes.

The grace period represents the term portion of the license.

The Audit button returns the same information as the AUDIT() function in a graphical
representation.

The Details link within the License grid (next to the Audit button) provides historical information
about license usage. This page also shows a progress meter of percent used toward your
license limit.

Managing License Warnings and Limits


Term License Warnings and Expiration
The term portion of an HP Vertica license is easy to manage: you are licensed to use HP Vertica
until a specific date. If the term of your license expires, HP Vertica alerts you with messages
appearing in the Administration Tools and vsql. For example:
=> CREATE TABLE T (A INT);
NOTICE: Vertica license is in its grace period
HINT: Renew at http://www.vertica.com/
CREATE TABLE

Contact HP Vertica at http://www.vertica.com/about/contact-us/ as soon as possible to renew
your license, and then install the new license. After the grace period expires, HP Vertica stops
processing queries.

Data Size License Warnings and Remedies


If your HP Vertica columnar license includes a raw data size allowance, HP Vertica periodically
audits the size of your database to ensure it remains compliant with the license agreement. For
details of this audit, see Calculating the Database Size. You should also monitor your database
size to know when it will approach licensed usage. Monitoring the database size helps you plan to
either upgrade your license to allow for continued database growth or delete data from the database
so you remain compliant with your license. See Monitoring Database Size for License Compliance
for details.
If your database's size approaches your licensed usage allowance, you will see warnings in the
Administration Tools and vsql. You have two options to eliminate these warnings:

Upgrade your license to a larger data size allowance.

Delete data from your database to remain under your licensed raw data size allowance. The
warnings disappear after HP Vertica's next audit of the database size shows that it is no longer
close to or over the licensed amount. You can also manually run a database audit (see
Monitoring Database Size for License Compliance for details).

If your database continues to grow after you receive warnings that its size is approaching your
licensed size allowance, HP Vertica displays additional warnings in more parts of the system after
a grace period passes.

If Your HP Vertica Enterprise Edition Database Size Exceeds Your Licensed Limits
If your Enterprise Edition database size exceeds your licensed data allowance, all successful
queries from ODBC and JDBC clients return with a status of SUCCESS_WITH_INFO instead of
the usual SUCCESS. The message sent with the results contains a warning about the database
size. Your ODBC and JDBC clients should be prepared to handle these messages instead of
assuming that successful requests always return SUCCESS.

If Your HP Vertica Community Edition Database Size Exceeds Your Licensed Limits
If your Community Edition database size exceeds your licensed data allowance, you will no longer
be able to load or modify data in your database. In addition, you will not be able to delete data from
your database.
To bring your database under compliance, you can choose to:

Drop database tables

Upgrade to HP Vertica Enterprise Edition (or an evaluation license)

Exporting License Audit Results to CSV


You can use admintools to audit a database for license compliance and export the results in CSV
format, as follows:
admintools -t license_audit [--password=password] --database=database [--file=csv-file]
[--quiet]

where:

database must be a running database. If the database is password protected, you must also
supply the password.

--file csv-file directs output to the specified file. If csv-file already exists, the tool returns
with an error message. If this option is unspecified, output is directed to stdout.

--quiet specifies to run the tool in quiet mode; if unspecified, status messages are sent to
stdout.

Running the license_audit tool is equivalent to invoking the following SQL statements:
select audit('');
select audit_flex('');
select * from dc_features_used;
select * from vcatalog.license_audits;
select * from vcatalog.user_audits;

Audit results include the following information:

Log of used HP Vertica features

Estimated database size

Raw data size allowed by your HP Vertica license

Percentage of licensed allowance that the database currently uses

Audit timestamps

The following truncated example shows the raw CSV output that license_audit generates:
FEATURES_USED
features_used,feature,date,sum
features_used,metafunction::get_compliance_status,2014-08-04,1
features_used,metafunction::bootstrap_license,2014-08-04,1
...
LICENSE_AUDITS
license_audits,database_size_bytes,license_size_bytes,usage_percent,audit_start_timestamp,audit_end_timestamp,confidence_level_percent,error_tolerance_percent,used_sampling,confidence_interval_lower_bound_bytes,confidence_interval_upper_bound_bytes,sample_count,cell_count,license_name
license_audits,808117909,536870912000,0.00150523690320551,2014-08-04 23:59:00.024874-04,2014-08-04 23:59:00.578419-04,99,5,t,785472097,830763721,10000,174754646,vertica
...
USER_AUDITS
user_audits,size_bytes,user_id,user_name,object_id,object_type,object_schema,object_name,audit_start_timestamp,audit_end_timestamp,confidence_level_percent,error_tolerance_percent,used_sampling,confidence_interval_lower_bound_bytes,confidence_interval_upper_bound_bytes,sample_count,cell_count
user_audits,812489249,45035996273704962,dbadmin,45035996273704974,DATABASE,,VMart,2014-10-14 11:50:13.230669-04,2014-10-14 11:50:14.069057-04,99,5,t,789022736,835955762,10000,174755178
AUDIT_SIZE_BYTES
audit_size_bytes,now,audit
audit_size_bytes,2014-10-14 11:52:14.015231-04,810584417
FLEX_SIZE_BYTES
flex_size_bytes,now,audit_flex
flex_size_bytes,2014-10-14 11:52:15.117036-04,11850
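If you want to post-process the exported file, the following Python sketch groups the rows by section. It assumes the layout seen in the example output: a section title on its own line, then a header row and data rows that each begin with the lowercase section name. The function name parse_license_audit is hypothetical, and the layout is inferred from this example rather than from a format specification.

```python
import csv
import io


def parse_license_audit(text):
    """Group license_audit CSV rows by section, keyed by column name."""
    headers = {}
    sections = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # section title (e.g. LICENSE_AUDITS), blank line, or "..."
        name, fields = row[0], row[1:]
        if name not in headers:
            headers[name] = fields      # first row of a section names its columns
            sections[name] = []
        else:
            sections[name].append(dict(zip(headers[name], fields)))
    return sections


sample = """AUDIT_SIZE_BYTES
audit_size_bytes,now,audit
audit_size_bytes,2014-10-14 11:52:14.015231-04,810584417
FLEX_SIZE_BYTES
flex_size_bytes,now,audit_flex
flex_size_bytes,2014-10-14 11:52:15.117036-04,11850
"""

result = parse_license_audit(sample)
print(int(result["audit_size_bytes"][0]["audit"]))
print(int(result["flex_size_bytes"][0]["audit_flex"]))
```

All values are returned as strings; convert sizes to integers as needed.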

Configuring the Database


This section provides information about:

The Configuration Procedure

Configuration Parameters

Designing a logical schema

Creating the physical schema

You'll also want to set up a security scheme. See Implementing Security.
See also Implementing Locales for International Data Sets.
Note: Before you begin this section, HP strongly recommends that you follow the Tutorial in
the Getting Started Guide to quickly familiarize yourself with creating and configuring a
fully-functioning example database.

Configuration Procedure
This section describes the tasks required to set up an HP Vertica database. It assumes that you
have obtained a valid license key file, installed the HP Vertica rpm package, and run the installation
script as described in the Installation Guide.
You'll complete the configuration procedure using the:

Administration Tools
If you are unfamiliar with Dialog-based user interfaces, read Using the Administration Tools
Interface before you begin. See also the Administration Tools Reference for details.

vsql interactive interface

The Database Designer, described fully in Creating a Database Design

Note: Users can also perform certain tasks using the Management Console. Those tasks will
point to the appropriate topic.

IMPORTANT NOTES

Follow the configuration procedure in the order presented in this book.

HP strongly recommends that you first use the Tutorial in the Getting Started Guide to
experiment with creating and configuring a database.

Although you may create more than one database (for example, one for production and one for
testing), you may create only one active database for each installation of Vertica Analytics
Platform.

The generic configuration procedure described here can be used several times during the
development process and modified each time to fit changing goals. You can omit steps such as
preparing actual data files and sample queries, and run the Database Designer without
optimizing for queries. For example, you can create, load, and query a database several times
for development and testing purposes, then one final time to create and load the production
database.

Prepare Disk Storage Locations


You must create and specify directories in which to store your catalog and data files (physical
schema). You can specify these locations when you install or configure the database, or later
during database operations.
The directory you specify for your catalog files (the catalog path) is used across all nodes. That is, if
you specify /home/catalog for your catalog files, HP Vertica will use /home/catalog as the catalog
path on all nodes. The catalog directory should always be separate from any data files.
Note: Do not use a shared directory for more than one node. Data and catalog directories must
be distinct for each node. Multiple nodes must not be allowed to write to the same data or
catalog directory.
The same is true for your data path. If you specify that your data should be stored in /home/data,
HP Vertica ensures this is the data path used on all database nodes.
Do not use a single directory to contain both catalog and data files. You can store the catalog and
data directories on different drives, which can be either on drives local to the host (recommended for
the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.
Both the catalog and data directories must be owned by the database administrator.
Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your
database. The database creation process in admintools creates the actual directories, but the
parent directory must exist on all nodes.
You do not need to specify a disk storage location during installation. However, you can do so by
using the --data-dir parameter to the install_vertica script. See Specifying Disk Storage Location
During Installation.

See Also

Specifying Disk Storage Location on MC

Specifying Disk Storage Location During Database Creation

Configuring Disk Usage to Optimize Performance

Using Shared Storage With HP Vertica

Specifying Disk Storage Location During Installation


There are three ways to specify the disk storage location. You can specify the location when you:

Install HP Vertica

Create a database using the Administration Tools

Install and configure Management Console

To Specify the Disk Storage Location When You Install:


When you install HP Vertica, the --data-dir parameter in the install_vertica script (see
Installing HP Vertica with the install_vertica Script) lets you specify a directory to contain database
data and catalog files. The script defaults to the Database Administrator's default home directory:
/home/dbadmin.
You should replace this default with a directory that has adequate space to hold your data and
catalog files.
Before you create a database, verify that the data and catalog directory exists on each node in the
cluster. Also verify that the directory on each node is owned by the database administrator.

Notes

Catalog and data path names must contain only alphanumeric characters and cannot have
leading space characters. Failure to comply with these restrictions will result in database
creation failure.

HP Vertica refuses to overwrite a directory if it appears to be in use by another database.
Therefore, if you created a database for evaluation purposes, dropped the database, and want to
reuse the database name, make sure that the disk storage location previously used has been
completely cleaned up. See Managing Storage Locations for details.

Specifying Disk Storage Location During Database Creation


When you invoke the Create Database command in the Administration Tools, a dialog box allows
you to specify the catalog and data locations. These locations must exist on each host in the
cluster and must be owned by the database administrator.

When you click OK, HP Vertica automatically creates the following subdirectories:
catalog-pathname/database-name/node-name_catalog/
data-pathname/database-name/node-name_data/

For example, if you use the default value (the database administrator's home directory) of
/home/dbadmin for the Stock Exchange example database, the catalog and data directories are
created on each node in the cluster as follows:
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_catalog
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_data
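To predict which directories to expect on a node, for example when checking ownership or cleaning up after a dropped database, the naming pattern above can be sketched in Python. The function name storage_directories is hypothetical. The sketch only assembles path strings following the pattern shown in this section and does not create or inspect anything on disk.

```python
from pathlib import PurePosixPath


def storage_directories(catalog_path, data_path, database_name, node_name):
    """Return the (catalog, data) directory paths expected for one node."""
    catalog = PurePosixPath(catalog_path) / database_name / (node_name + "_catalog")
    data = PurePosixPath(data_path) / database_name / (node_name + "_data")
    return str(catalog), str(data)


catalog_dir, data_dir = storage_directories(
    "/home/dbadmin", "/home/dbadmin", "Stock_Schema", "stock_schema_node1_host01"
)
print(catalog_dir)
print(data_dir)
```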

Notes

Catalog and data path names must contain only alphanumeric characters and cannot have
leading space characters. Failure to comply with these restrictions will result in database
creation failure.

HP Vertica refuses to overwrite a directory if it appears to be in use by another database.
Therefore, if you created a database for evaluation purposes, dropped the database, and want to
reuse the database name, make sure that the disk storage location previously used has been
completely cleaned up. See Managing Storage Locations for details.

Specifying Disk Storage Location on MC


You can use the MC interface to specify where you want to store database metadata on the cluster
in the following ways:

When you configure MC the first time

When you create new databases using MC

See Configuring Management Console.

Configuring Disk Usage to Optimize Performance


Once you have created your initial storage location, you can add additional storage locations to the
database later. Not only does this provide additional space, it lets you control disk usage and
increase I/O performance by isolating files that have different I/O or access patterns. For example,
consider:

Isolating execution engine temporary files from data files by creating a separate storage location
for temp space.

Creating labeled storage locations and storage policies, in which selected database objects are
stored on different storage locations based on measured performance statistics or predicted
access patterns.

See Managing Storage Locations for details.

Using Shared Storage With HP Vertica


If using shared SAN storage, ensure there is no contention among the nodes for disk space or
bandwidth.

Each host must have its own catalog and data locations. Hosts cannot share catalog or data
locations.

Configure the storage so that there is enough I/O bandwidth for each node to access the storage
independently.

Viewing Database Storage Information


You can view node-specific information on your HP Vertica cluster through the Management
Console. See Monitoring HP Vertica Using MC for details.

Disk Space Requirements for HP Vertica


In addition to actual data stored in the database, HP Vertica requires disk space for several data
reorganization operations, such as mergeout and managing nodes in the cluster. For best results,
HP recommends that disk utilization per node be no more than sixty percent (60%) for a K-Safe=1
database to allow such operations to proceed.
In addition, disk space is temporarily required by certain query execution operators, such as hash
joins and sorts, in the case when they cannot be completed in memory (RAM). Such operators
might be encountered during queries, recovery, refreshing projections, and so on. The amount of
disk space needed (known as temp space) depends on the nature of the queries, amount of data on
the node and number of concurrent users on the system. By default, any unused disk space on the
data disk can be used as temp space. However, HP recommends provisioning temp space
separate from data disk space. See Configuring Disk Usage to Optimize Performance.
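As a rough way to check a node against the sixty percent recommendation above, the following Python sketch compares used space with total space for the file system that holds a given directory. The function names are hypothetical, and the check is only an approximation: it measures the whole file system, not only HP Vertica files.

```python
import shutil

# From the text: no more than 60% disk utilization per node for a K-Safe=1 database.
RECOMMENDED_MAX_UTILIZATION = 0.60


def utilization(used_bytes, total_bytes):
    """Return the fraction of the disk that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def exceeds_recommendation(used_bytes, total_bytes,
                           limit=RECOMMENDED_MAX_UTILIZATION):
    """Return True if utilization is above the recommended limit."""
    return utilization(used_bytes, total_bytes) > limit


def path_exceeds_recommendation(path):
    """Check the file system that contains path (for example, a data directory)."""
    usage = shutil.disk_usage(path)
    return exceeds_recommendation(usage.used, usage.total)


print(exceeds_recommendation(700, 1000))   # 70% used: above the recommendation
print(path_exceeds_recommendation("."))    # result depends on the local disk
```

Run the check on each node, because catalog and data directories are local to each node.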

Disk Space Requirements for Management Console


You can install MC on any node in the cluster, so there are no special disk requirements for MC
other than disk space you would normally allocate for your database cluster. See Disk Space
Requirements for HP Vertica.

Prepare the Logical Schema Script


Designing a logical schema for an HP Vertica database is no different from designing one for any
other SQL database. Details are described more fully in Designing a Logical Schema.
To create your logical schema, prepare a SQL script (plain text file, typically with an extension of
.sql) that:
1. Creates additional schemas (as necessary). See Using Multiple Schemas.
2. Creates the tables and column constraints in your database using the CREATE TABLE
command.
3. Defines the necessary table constraints using the ALTER TABLE command.
4. Defines any views on the table using the CREATE VIEW command.
You can generate a script file using:

A schema designer application.

A schema extracted from an existing database.

A text editor.

One of the example database example-name_define_schema.sql scripts as a template. (See the
example database directories in /opt/vertica/examples.)

In your script file, make sure that:

Each statement ends with a semicolon.

You use data types supported by HP Vertica, as described in the SQL Reference Manual.

Once you have created a database, you can test your schema script by executing it as described in
Create the Logical Schema. If you encounter errors, drop all tables, correct the errors, and run the
script again.

Prepare Data Files


Prepare two sets of data files:

Test data files. Use test files to test the database after the partial data load. If possible, use part
of the actual data files to prepare the test data files.

Actual data files. Once the database has been tested and optimized, use your data files for your
initial Bulk Loading Data.

How to Name Data Files


Name each data file to match the corresponding table in the logical schema. Case does not matter.
Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension,
name the corresponding data file stock_dimension.tbl. When using multiple data files, append
_nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example,
stock_dimension.tbl_001, stock_dimension.tbl_002, and so on.
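The naming convention can be sketched as a small Python helper. The function name data_file_names is hypothetical. Lowercasing the table name is a choice made here to match the example, since case does not matter, and .tbl is only the default extension.

```python
def data_file_names(table_name, file_count=1, extension=".tbl"):
    """Return data file names for a table, following the convention above."""
    if not 1 <= file_count <= 999:
        raise ValueError("file_count must be in the range 1 to 999")
    base = table_name.lower() + extension
    if file_count == 1:
        return [base]
    # Multiple files: append _nnn, zero-padded to three digits.
    return ["%s_%03d" % (base, n) for n in range(1, file_count + 1)]


print(data_file_names("Stock_Dimension"))
print(data_file_names("Stock_Dimension", file_count=2))
```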

Prepare Load Scripts


Note: You can postpone this step if your goal is to test a logical schema design for validity.
Prepare SQL scripts that load data directly into physical storage with the COPY...DIRECT
statement, run either from vsql or through ODBC as described in the Connecting to HP Vertica Guide.
You need scripts that load the:

Large tables

Small tables

HP recommends that you load large tables using multiple files. To test the load process, use files of
10GB to 50GB in size. This size provides several advantages:

You can use one of the data files as a sample data file for the Database Designer.

You can load just enough data to Perform a Partial Data Load before you load the remainder.

If a single load fails and rolls back, you do not lose an excessive amount of time.

Once the load process is tested, for multi-terabyte tables, break up the full load in file sizes of
250GB to 500GB.

See Bulk Loading Data and the following additional topics for details:

Bulk Loading Data

Using Load Scripts

Using Parallel Load Streams

Loading Data into Pre-Join Projections

Enforcing Constraints

About Load Errors

Tip: You can use the load scripts included in the example databases in the Getting Started
Guide as templates.

Create an Optional Sample Query Script


The purpose of a sample query script is to test your schema and load scripts for errors.
Include a sample of queries your users are likely to run against the database. If you don't have any
real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you
can skip this step.

Create an Empty Database


Two options are available for creating an empty database:

Using the Management Console

Using Administration Tools

Creating a Database Name and Password


Database names must conform to the following rules:

Be between 1 and 30 characters long.

Begin with a letter.

Follow with any combination of letters (upper and lowercase), numbers, and/or underscores.

Database names are case sensitive; however, HP strongly recommends that you do not create
databases whose names differ only in case; for example, do not create a database
called mydatabase and another database called MyDataBase.
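The naming rules above can be checked before you attempt to create a database. The following Python sketch is an illustration only. It assumes that "letters" means the ASCII letters A to Z and a to z, and the function names are hypothetical.

```python
import re

# One leading letter, then up to 29 letters, digits, or underscores (1-30 total).
# Assumption: "letters" means ASCII letters only.
_DATABASE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,29}")


def is_valid_database_name(name):
    """Return True if name follows the rules listed above."""
    return _DATABASE_NAME.fullmatch(name) is not None


def case_collisions(names):
    """Return pairs of names that differ only in case."""
    seen = {}
    collisions = []
    for name in names:
        key = name.lower()
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions


print(is_valid_database_name("VMart"))
print(is_valid_database_name("1database"))
print(case_collisions(["mydatabase", "MyDataBase"]))
```

The second helper flags the situation the recommendation warns about: names that are technically distinct but easy to confuse.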

Database Passwords
Database passwords may contain letters, digits, and certain special characters; however, no
non-ASCII Unicode characters may be used. The following table lists special (ASCII) characters that
HP Vertica permits in database passwords. Special characters can appear anywhere within a
password string; for example, mypas$word or $mypassword or mypassword$ are all permitted.
Caution: Using special characters in database passwords that are not listed in the following
table could cause database instability.

Character  Description
#          pound sign
!          exclamation point
+          plus sign
*          asterisk
?          question mark
,          comma
.          period
/          forward slash
=          equals sign
~          tilde
-          minus sign
$          dollar sign
_          underscore
:          colon
           space
"          double quote
'          single quote
%          percent sign
&          ampersand
(          parenthesis
)          parenthesis
;          semicolon
<          less than sign
>          greater than sign
@          at sign
`          back quote
[          square bracket
]          square bracket
\          backslash
^          caret
|          vertical bar
{          curly bracket
}          curly bracket

See Also

Password Guidelines

Create an Empty Database Using MC


You can create a new database on an existing HP Vertica cluster through the Management
Console interface.
Database creation can be a long-running process, lasting from minutes to hours, depending on the
size of the target database. You can close the web browser during the process and sign back in to
MC later; the creation process continues unless an unexpected error occurs. See the Notes
section below the procedure on this page.
You currently need to use command line scripts to define the database schema and load data.
Refer to the topics in Configuration Procedure. You should also run the Database Designer, which
you access through the Administration Tools, to create either a comprehensive or incremental
design. Consider using the Tutorial in the Getting Started Guide to create a sample database you
can start monitoring immediately.

How to Create an Empty Database on an MC-managed Cluster


1. If you are already on the Databases and Clusters page, skip to the next step; otherwise:
a. Connect to MC and sign in as an MC administrator.
b. On the Home page, click the Databases and Clusters task.
2. If no databases exist on the cluster, continue to the next step; otherwise:
a. If a database is running on the cluster on which you want to add a new database, select the
database and click Stop.

b. Wait for the running database to have a status of Stopped.


3. Click the cluster on which you want to create the new database and click Create Database.
4. The Create Database wizard opens. Provide the following information:
   • Database name and password. See Creating a Database Name and Password for rules.
   • Optionally click Advanced to open the advanced settings and change the port and catalog,
     data, and temporary data paths. By default the MC application/web server port is 5450 and
     paths are /home/dbadmin, or whatever you defined for the paths when you ran the Cluster
     Creation Wizard or the install_vertica script. Do not use the default agent port 5444 as a
     new setting for the MC port. See MC Settings > Configuration for port values.

5. Click Continue.
6. Select nodes to include in the database.
The Database Configuration window opens with the options you provided and a graphical
representation of the nodes appears on the page. By default, all nodes are selected to be part of
this database (denoted by a green check mark). You can optionally click each node and clear
Include host in new database to exclude that node from the database. Excluded nodes are
gray. If you change your mind, click the node and select the Include check box.
7. Click Create in the Database Configuration window to create the database on the nodes.
The creation process takes a few moments, after which the database starts and a Success
message appears on the interface.
8. Click OK to close the success message.
MC's Manage page opens and displays the database nodes. Nodes not included in the database
are colored gray, which means they are standby nodes you can include later. To add nodes that
are not shown in standby mode to your HP Vertica cluster, or to remove nodes from the cluster,
you must run the install_vertica script.

Notes
• If warnings occur during database creation, nodes will be marked on the UI with an Alert icon
  and a message.
• Warnings do not prevent the database from being created, but you should address warnings
  after the database creation process completes by viewing the database Message Center
  from the MC Home page.
• Failure messages display on the database Manage page with a link to more detailed
  information and a hint with an actionable task that you must complete before you can
  continue. Problem nodes are colored red for quick identification.
• To view more detailed information about a node in the cluster, double-click the node from the
  Manage page, which opens the Node Details page.
• To create MC users and grant them access to an MC-managed database, see About MC Users
  and Creating an MC User.

See Also
• Creating a Cluster Using MC
• Troubleshooting Management Console
• Restarting MC

Create a Database Using Administration Tools


1. Run the Administration Tools from your Administration Host as follows:
$ /opt/vertica/bin/admintools

If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, see
Notes for Remote Terminal Users.
2. Accept the license agreement and specify the location of your license file. For more information,
see Managing Licenses.
This step is necessary only if it is the first time you have run the Administration Tools.
3. On the Main Menu, click Configuration Menu, and click OK.
4. On the Configuration Menu, click Create Database, and click OK.
5. Enter the name of the database and an optional comment, and click OK.

6. Establish the superuser password for your database.


   • To provide a password, enter the password and click OK. Confirm the password by entering
     it again, and then click OK.
   • If you don't want to provide the password, leave it blank and click OK. If you don't set a
     password, HP Vertica prompts you to verify that you truly do not want to establish a
     superuser password for this database. Click Yes to create the database without a password
     or No to establish the password.

Caution: If you do not enter a password at this point, the superuser password is set to
empty. Unless the database is for evaluation or academic purposes, HP strongly
recommends that you enter a superuser password. See Creating a Database Name and
Password for guidelines.

7. Select the hosts to include in the database from the list of hosts specified when HP Vertica
was installed (install_vertica -s), and click OK.
8. Specify the directories in which to store the data and catalog files, and click OK.

Note: Do not use a shared directory for more than one node. Data and catalog directories
must be distinct for each node. Multiple nodes must not be allowed to write to the same
data or catalog directory.

9. Catalog and data path names must contain only alphanumeric characters and cannot have
leading spaces. Failure to comply with these restrictions results in database creation failure.
For example:
Catalog pathname: /home/dbadmin
Data Pathname: /home/dbadmin
10. Review the Current Database Definition screen to verify that it represents the database you
want to create, and then click Yes to proceed or No to modify the database definition.
11. If you click Yes, HP Vertica creates the database you defined and then displays a message to
indicate that the database was successfully created.

Note: For databases created with 3 or more nodes, HP Vertica automatically sets K-safety to 1 to
ensure that the database is fault tolerant in case a node fails. For more information, see Failure
Recovery in the Administrator's Guide and MARK_DESIGN_KSAFE in the SQL Reference Manual.

12. Click OK to acknowledge the message.

Create the Logical Schema


1. Connect to the database.
In the Administration Tools Main Menu, click Connect to Database and click OK.
See Connecting to the Database for details.

The vsql welcome script appears:


Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>

2. Run the logical schema script.
Use the \i meta-command in vsql to run the SQL logical schema script that you prepared
earlier.

3. Disconnect from the database.
Use the \q meta-command in vsql to return to the Administration Tools.

Perform a Partial Data Load


HP recommends that for large tables, you perform a partial data load and then test your database
before completing a full data load. This load should load a representative amount of data.
1. Load the small tables.
Load the small table data files using the SQL load scripts and data files you prepared earlier.
2. Partially load the large tables.
Load 10GB to 50GB of table data for each table using the SQL load scripts and data files that
you prepared earlier.
For more information about projections, see Physical Schema in the Concepts Guide.

Test the Database


Test the database to verify that it is running as expected.
Check queries for syntax errors and execution times.
1. Use the vsql \timing meta-command to enable the display of query execution time in
milliseconds.
2. Execute the SQL sample query script that you prepared earlier.
3. Execute several ad hoc queries.

Optimize Query Performance


Optimizing the database consists of optimizing for compression and tuning for queries. (See
Creating a Database Design.)
To optimize the database, use the Database Designer to create and deploy a design for optimizing
the database. See the Tutorial in the Getting Started Guide for an example of using the Database
Designer to create a Comprehensive Design.
After you have run the Database Designer, use the techniques described in Optimizing Query
Performance in the Analyzing Data Guide to improve the performance of certain types of queries.

Note: The database response time depends on factors such as type and size of the application
query, database design, data size and data types stored, available computational power, and
network bandwidth. Adding nodes to a database cluster does not necessarily improve the
system response time for every query, especially if the response time is already short, e.g.,
less than 10 seconds, or the response time is not hardware bound.

Complete the Data Load


To complete the load:
1. Monitor system resource usage.
Continue to run the top, free, and df utilities and watch them while your load scripts are
running (as described in Monitoring Linux Resource Usage). You can do this on any or all
nodes in the cluster. Make sure that the system is not swapping excessively (watch kswapd in
top) or running out of swap space (watch for a large amount of used swap space in free).

Note: HP Vertica requires a dedicated server. If your loader or other processes take up
significant amounts of RAM, it can result in swapping.

2. Complete the large table loads.
Run the remainder of the large table load scripts.

Test the Optimized Database


Check query execution times to test your optimized design:
1. Use the vsql \timing meta-command to enable the display of query execution time in
milliseconds.
Execute a SQL sample query script to test your schema and load scripts for errors.

Note: Include a sample of queries your users are likely to run against the database. If you
don't have any real queries, just write simple SQL that collects counts on each of your
tables. Alternatively, you can skip this step.

2. Execute several ad hoc queries.
a. Run Administration Tools and select Connect to Database.
b. Use the \i meta-command to execute the query script; for example:
vmartdb=> \i vmart_query_03.sql
  customer_name   | annual_income
------------------+---------------
 James M. McNulty |        999979
 Emily G. Vogel   |        999998
(2 rows)
Time: First fetch (2 rows): 58.411 ms. All rows formatted: 58.448 ms
vmartdb=> \i vmart_query_06.sql
 store_key | order_number | date_ordered
-----------+--------------+--------------
        45 |       202416 | 2004-01-04
       113 |        66017 | 2004-01-04
       121 |       251417 | 2004-01-04
        24 |       250295 | 2004-01-04
         9 |       188567 | 2004-01-04
       166 |        36008 | 2004-01-04
        27 |       150241 | 2004-01-04
       148 |       182207 | 2004-01-04
       198 |        75716 | 2004-01-04
(9 rows)
Time: First fetch (9 rows): 25.342 ms. All rows formatted: 25.383 ms

Once the database is optimized, it should run queries efficiently. If you discover queries that you
want to optimize, you can modify and update the design. See Incremental Design in the
Administrator's Guide.

Set Up Incremental (Trickle) Loads


Once you have a working database, you can use trickle loading to load new data while concurrent
queries are running.
Trickle load is accomplished by using the COPY command (without the DIRECT keyword) to load
10,000 to 100,000 rows per transaction into the WOS. This allows HP Vertica to batch multiple
loads when it writes data to disk. While the COPY command defaults to loading into the WOS, it
writes to the ROS if the WOS is full.
See Trickle Loading Data for details.
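
The batching side of a trickle load can be sketched in Python as follows. This is only an illustration of splitting incoming rows into per-transaction batches in the 10,000 to 100,000 row range described above. The helper name and the 50,000-row default are assumptions, and the step that sends each batch to COPY and commits it is deliberately left out, because it depends on the client driver you use.

```python
from itertools import islice


def batches(rows, batch_size=50_000):
    """Yield lists of at most `batch_size` rows from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Example: 120,000 synthetic rows split into per-transaction batches.
sizes = [len(batch) for batch in batches(range(120_000), batch_size=50_000)]
print(sizes)
```

Each yielded batch would correspond to one COPY statement (without the DIRECT keyword) followed by a commit.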

See Also
• COPY
• Loading Data Through ODBC

Implement Locales for International Data Sets


The locale is a parameter that defines the user's language, country, and any special variant
preferences, such as collation. HP Vertica uses the locale to determine the behavior of certain
string functions. The locale also determines the collation for various SQL commands that require
ordering and comparison, such as GROUP BY, ORDER BY, joins, and the analytic ORDER BY
clause.
By default, the locale for your HP Vertica database is en_US@collation=binary (English US). You
can define a new default locale that is used for all sessions on the database. You can also override
the locale for individual sessions. However, projections are always collated using the default
en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is
applied at query time.
You can set the locale through ODBC, JDBC, and ADO.NET.

ICU Locale Support


HP Vertica uses the ICU library for locale support; you must specify locale using the ICU locale
syntax. The locale used by the database session is not derived from the operating system (through
the LANG variable), so Hewlett-Packard recommends that you set the LANG for each node running
vsql, as described in the next section.
While ICU library services can specify collation, currency, and calendar preferences, HP Vertica
supports only the collation component. Any keywords not relating to collation are rejected.
Projections are always collated using the en_US@collation=binary collation regardless of the
session collation. Any locale-specific collation is applied at query time.
The SET DATESTYLE TO ... command provides some aspects of the calendar, but HP Vertica
supports only dollars as currency.

Changing DB Locale for a Session


This example sets the session locale to Thai.
1. At the operating-system level for each node running vsql, set the LANG variable to the locale
language as follows:
export LANG=th_TH.UTF-8

Note: If setting the LANG= as shown does not work, the operating system support for
locales may not be installed.

2. For each HP Vertica session (from ODBC/JDBC or vsql) set the language locale.
From vsql:
\locale th_TH

3. From ODBC/JDBC:
"SET LOCALE TO th_TH;"

4. In PuTTY (or ssh terminal), change the settings as follows:
settings > window > translation > UTF-8

5. Click Apply and then click Save.


All data loaded must be in UTF-8 format, not an ISO format, as described in Loading UTF-8 Format
Data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not
supported, so functions like SUBSTRING do not work correctly for multibyte characters. Thus,
locale settings are not expected to work correctly with such data. If the translation setting
ISO-8859-11:2001 (Latin/Thai) appears to work, the data was loaded incorrectly. To convert data
correctly, use a utility program such as Linux iconv.

Note: The maximum length parameter for VARCHAR and CHAR data type refers to the
number of octets (bytes) that can be stored in that field, not the number of characters. When
using multi-byte UTF-8 characters, make sure to size fields to accommodate from 1 to 4 bytes
per character, depending on the data.
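
To make the octet-versus-character distinction in the note above concrete, the following Python sketch counts the UTF-8 bytes in a few sample strings. The helper names are hypothetical, and the sketch only illustrates how to size a field. It does not query HP Vertica.

```python
def utf8_octets(text):
    """Number of bytes needed to store `text` encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits(text, declared_length):
    """True if `text` fits in a CHAR/VARCHAR declared with `declared_length` octets."""
    return utf8_octets(text) <= declared_length


# ASCII text, a word with the German eszett, and a three-character Thai word.
samples = ["strasse", "stra\u00dfe", "\u0e44\u0e17\u0e22"]
for sample in samples:
    print(len(sample), "characters,", utf8_octets(sample), "bytes")
```

The Thai sample has three characters but needs nine bytes, so a field declared as VARCHAR(3) would be too small for it.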

See Also
• Supported Locales
• About Locales
• SET LOCALE
• ICU User Guide

Specify the Default Locale for the Database


After you start the database, the default locale configuration parameter, DefaultSessionLocale,
sets the initial locale. You can override this value for individual sessions.
To set the locale for the database, use the configuration parameter as follows:
=> ALTER DATABASE mydb SET DefaultSessionLocale = 'ICU-locale-identifier';

For example:
=> ALTER DATABASE mydb SET DefaultSessionLocale = 'en_GB';

Override the Default Locale for a Session


To override the default locale for a specific session, use one of the following commands:
• The vsql command \locale <ICU-locale-identifier>;
  For example:
  => \locale en_GB
  INFO 2567: Canonical locale: 'en_GB'
  Standard collation: 'LEN'
  English (United Kingdom)

• The statement SET LOCALE TO <ICU-locale-identifier>.
  => SET LOCALE TO en_GB;
  INFO 2567: Canonical locale: 'en_GB'
  Standard collation: 'LEN'
  English (United Kingdom)

You can also use the Short Form of a locale in either of these commands:
=> SET LOCALE TO LEN;
INFO 2567: Canonical locale: 'en'
Standard collation: 'LEN'
English

=> \locale LEN
INFO 2567: Canonical locale: 'en'
Standard collation: 'LEN'
English

You can use these commands to override the locale as many times as needed during a database
session. The session locale setting applies to any subsequent commands issued in the session.

See Also
• SET LOCALE

Best Practices for Working with Locales


It is important to understand the distinction between the locale settings on the database server and
locale settings at the client application level. The server locale settings impact only the collation
behavior for server-side query processing. The client application is responsible for verifying that the
correct locale is set in order to display the characters correctly. Hewlett-Packard recommends the
following best practices to ensure predictable results:

Server Locale
The server session locale should be set as described in Specify the Default Locale for the
Database. If you are using different locales in different sessions, set the server locale at the start of
each session from your client.

vsql Client
• If the database does not have a default session locale, set the server locale for the session to the
  desired locale, as described in Override the Default Locale for a Session.
• The locale setting in the terminal emulator where the vsql client runs should be set to be
  equivalent to the session locale setting on the server side (ICU locale). By doing so, the data is
  collated correctly on the server and displayed correctly on the client.
• All input data for vsql should be in UTF-8, and all output data is encoded in UTF-8.
• HP Vertica does not support non-UTF-8 encodings and associated locale values.
• For instructions on setting locale and encoding, refer to your terminal emulator documentation.

ODBC Clients
• ODBC applications can be either in ANSI or Unicode mode. If the user application is Unicode,
  the encoding used by ODBC is UCS-2. If the user application is ANSI, the data must be in
  single-byte ASCII, which is compatible with UTF-8 used on the database server. The ODBC
  driver converts UCS-2 to UTF-8 when passing to the HP Vertica server and converts data sent
  by the HP Vertica server from UTF-8 to UCS-2.
• If the user application is not already in UCS-2, the application must convert the input data to
  UCS-2, or unexpected results could occur. For example:
    • For non-UCS-2 data passed to ODBC APIs, when it is interpreted as UCS-2, it could result in
      an invalid UCS-2 symbol being passed to the APIs, resulting in errors.
    • The symbol provided in the alternate encoding could be a valid UCS-2 symbol. If this occurs,
      incorrect data is inserted into the database.
• If the database does not have a default session locale, ODBC applications should set the
  desired server session locale using SQLSetConnectAttr (if different from database wide
  setting). By doing so, you get the expected collation and string functions behavior on the server.

JDBC and ADO.NET Clients


• JDBC and ADO.NET applications use a UTF-16 character set encoding and are responsible for
  converting any non-UTF-16 encoded data to UTF-16. The same cautions apply as for ODBC if
  this encoding is violated.
• The JDBC and ADO.NET drivers convert UTF-16 data to UTF-8 when passing to the HP
  Vertica server and convert data sent by HP Vertica server from UTF-8 to UTF-16.
• If there is no default session locale at the database level, JDBC and ADO.NET applications
  should set the correct server session locale by executing the SET LOCALE TO command in
  order to get the expected collation and string functions behavior on the server. For more
  information, see SET LOCALE.

Usage Considerations
Session related:
• The locale setting is session scoped and applies only to queries (no DML/DDL) executed in that
  session. You cannot specify a locale for an individual query.
• You can set the default locale for new sessions using the DefaultSessionLocale configuration
  parameter.

Query related:
The following restrictions apply when queries are run with a locale other than the default
en_US@collation=binary:
• When one or more of the left-side NOT IN columns is CHAR or VARCHAR, multicolumn NOT
  IN subqueries are not supported. For example:
  => CREATE TABLE test (x VARCHAR(10), y INT);
  => SELECT ... FROM test WHERE (x,y) NOT IN (SELECT ...);
  ERROR: Multi-expression NOT IN subquery is not supported because a left hand
  expression could be NULL

  Note: Even if columns test.x and test.y have a NOT NULL constraint, an error occurs.

• If the outer query contains a GROUP BY on a CHAR or a VARCHAR column, correlated
  HAVING clause subqueries are not supported. In the following example, the GROUP BY x in the
  outer query causes the error:
  => DROP TABLE test CASCADE;
  => CREATE TABLE test (x VARCHAR(10));
  => SELECT COUNT(*) FROM test t GROUP BY x HAVING x
     IN (SELECT x FROM test WHERE t.x||'a' = test.x||'a' );
  ERROR: subquery uses ungrouped column "t.x" from outer query

• Subqueries that use analytic functions in the HAVING clause are not supported. For example:
  => DROP TABLE test CASCADE;
  => CREATE TABLE test (x VARCHAR(10));
  => SELECT MAX(x)OVER(PARTITION BY 1 ORDER BY 1)
     FROM test GROUP BY x HAVING x IN (SELECT MAX(x) FROM test);
  ERROR: Analytics query with having clause expression that involves aggregates and
  subquery is not supported

DML/DDL related:
• SQL identifiers (such as table names and column names) can use UTF-8 Unicode characters.
  For example, the following CREATE TABLE statement uses the ß (German eszett) in the table
  name:
  => CREATE TABLE straße(x int, y int);
  CREATE TABLE

• Projection sort orders are made according to the default en_US@collation=binary collation.
  Thus, regardless of the session setting, issuing the following command creates a projection
  sorted by col1 according to the binary collation:
  => CREATE PROJECTION p1 AS SELECT * FROM table1 ORDER BY col1;

  In such cases, straße and strasse are not stored near each other on disk.
  Sorting by binary collation also means that sort optimizations do not work in locales other than
  binary. HP Vertica returns the following warning if you create tables or projections in a
  non-binary locale:
  WARNING: Projections are always created and persisted in the default HP Vertica
  locale. The current locale is de_DE

• When creating pre-join projections, the projection definition query does not respect the locale or
  collation setting. When you insert data into the fact table of a pre-join projection, referential
  integrity checks are not locale or collation aware.
  For example:
  \locale LDE_S1
  -- German
  => CREATE TABLE dim (col1 varchar(20) primary key);
  => CREATE TABLE fact (col1 varchar(20) references dim(col1));
  => CREATE PROJECTION pj AS SELECT * FROM fact JOIN dim
     ON fact.col1 = dim.col1 UNSEGMENTED ALL NODES;
  => INSERT INTO dim VALUES('ß');
  => COMMIT;

  The following INSERT statement fails with a "nonexistent FK" error even though 'ß' is in the dim
  table, and in the German locale 'SS' and 'ß' refer to the same character.
  => INSERT INTO fact VALUES('SS');
  ERROR: Nonexistent foreign key value detected in FK-PK join (fact x dim)
  using subquery and dim_node0001; value SS
  => ROLLBACK;
  => DROP TABLE dim, fact CASCADE;

• When the locale is non-binary, HP Vertica uses the COLLATION function to transform the input
  to a binary string that sorts in the proper order.
  This transformation increases the number of bytes required for the input according to this
  formula:
  result_column_width = input_octet_width * CollationExpansion + 4

  The default value of the CollationExpansion configuration parameter is 5.


• CHAR fields are displayed as fixed length, including any trailing spaces. When CHAR fields are
  processed internally, they are first stripped of trailing spaces. For VARCHAR fields, trailing
  spaces are usually treated as significant characters; however, trailing spaces are ignored when
  sorting or comparing either type of character string field using a non-BINARY locale.
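
The collation width formula given earlier in this list (result_column_width = input_octet_width * CollationExpansion + 4) can be worked through with a short Python sketch. The function name is hypothetical. The default of 5 is the CollationExpansion default stated above.

```python
DEFAULT_COLLATION_EXPANSION = 5  # default value of CollationExpansion stated above


def collated_width(input_octet_width, collation_expansion=DEFAULT_COLLATION_EXPANSION):
    """Bytes needed for the collation-transformed form of a string column."""
    return input_octet_width * collation_expansion + 4


for width in (10, 20, 255):
    print(width, "->", collated_width(width))
```

With the default setting, a VARCHAR(10) input therefore expands to 10 * 5 + 4 = 54 bytes when it is transformed for a non-binary locale.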

Change Transaction Isolation Levels


By default, HP Vertica uses the READ COMMITTED isolation level for every session. If you prefer,
you can change the default isolation level for the database or for a specific session.
To change the isolation level for a specific session, use the SET SESSION CHARACTERISTICS
command.
To change the isolation level for the database, use the TransactionIsolationLevel configuration
parameter. Once modified, HP Vertica uses the new transaction level for every new session.
The isolation level for the database can be SERIALIZABLE or READ COMMITTED:
=> ALTER DATABASE mydb SET TransactionIsolationLevel = 'SERIALIZABLE';
=> ALTER DATABASE mydb SET TransactionIsolationLevel = 'READ COMMITTED';

To view the transaction characteristics:


=> SHOW TRANSACTION_ISOLATION;

A change to isolation level only applies to future sessions. Existing sessions and their transactions
continue to use the original isolation level.
A transaction retains its isolation level until it completes, even if the session's transaction isolation
level changes mid-transaction. HP Vertica internal processes (such as the Tuple Mover and
refresh operations) and DDL operations are always run at SERIALIZABLE isolation level to ensure
consistency.

See Also
• Transactions
• Configuration Parameters
• SET SESSION CHARACTERISTICS
• SHOW

Configuration Parameters
Configuration parameters are settings that affect database behavior. You can use configuration
parameters to enable, disable, or tune features related to different database aspects like Tuple
Mover, security, Database Designer, or projections. Configuration parameters have default values,
stored in the HP Vertica database. However, you can modify certain parameters to configure your
HP Vertica database using one of the following options:
• Dynamically through the Management Console browser-based interface
• At the command line directly
• From vsql

You can also configure certain parameters at the node level. See Setting and Clearing Configuration
Parameters for more information.
Before you modify a database parameter, review all documentation about the parameter to
determine the context under which you can change it. Some parameter changes do not take effect
until after you restart the database. See the CHANGE_REQUIRES_RESTART column in the
CONFIGURATION_PARAMETERS system table to determine whether a parameter requires a
restart to take effect.
Important: If using 7.1, do not hand edit any vertica.conf files. Additionally, do not use any
workarounds for syncing vertica.conf files.

Configuring HP Vertica Settings Using MC


To change database settings for any MC-managed database, click the Settings tab at the bottom
of the Overview, Activity, or Manage pages. The database must be running.
The Settings page defaults to parameters in the General category. To change other parameters,
click an option from the tab panel on the left.

Some settings require that you restart the database, and MC will prompt you to do so. You can
ignore the prompt, but those changes will not take effect until after you restart the database.
If you want to change settings that are specific to Management Console, such as MC or
agent port assignments, see Managing MC Settings for more information.

See Also
• Configuration Parameters

Configuring HP Vertica at the Command Line


The tables in this section list parameters for configuring HP Vertica at the command line.
Setting and Clearing Configuration Parameters
General Parameters
Tuple Mover Parameters
Projection Parameters
Epoch Management Parameters
Monitoring Parameters
Profiling Parameters
Security Parameters
Database Designer Parameters
Internationalization Parameters
Data Collector Parameters
Kerberos Authentication Parameters
HCatalog Connector Parameters

Setting and Clearing Configuration Parameters


While all parameters are configurable at the database level, some can be set or cleared at the node
and session levels as well. HP Vertica is designed to operate with minimal configuration changes.
Set and change configuration parameters carefully following any documented guidelines for that
parameter.
Important: If using 7.1, do not manually edit any vertica.conf files. Additionally, do not use
any workarounds to sync vertica.conf files.

Viewing Configuration Parameter Values


There are two simple ways to view active configuration parameter values:
• Use the SHOW statements
• Query from related system tables

If a configuration parameter requires a restart to take effect, the values in a SHOW CURRENT
statement could differ from values in other SHOW statements. To see which parameters require
restart, query the CONFIGURATION_PARAMETERS system table.
• SHOW CURRENT: Displays active configuration parameter values set at all levels. HP Vertica
  checks values set at the session level first. If a value is not set at the session level, HP Vertica
  checks whether the value is set for the node where you are logged in. If there is no node-level
  setting, HP Vertica checks for a setting at the database level. If no values are set,
  SHOW CURRENT shows the default value for the configuration parameter.
• SHOW DATABASE: Displays configuration parameter values set for the database.
• SHOW NODE: Displays configuration parameter values set for a node.
• SHOW SESSION: Displays configuration parameter values set for the current session.
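
The lookup order described for SHOW CURRENT (session, then node, then database, then default) can be modeled with a small Python sketch. This is only an illustration of the precedence rule, not how HP Vertica implements it. The function name and the dictionaries standing in for each level are hypothetical. The sample values reuse parameters and values that appear elsewhere in this section.

```python
def effective_value(name, session, node, database, defaults):
    """Resolve a parameter in the order described for SHOW CURRENT.

    Each argument after `name` is a dict of parameter name -> value for one
    level. Returns a (value, level_name) tuple.
    """
    for level_name, settings in (("session", session), ("node", node), ("database", database)):
        if name in settings:
            return settings[name], level_name
    return defaults[name], "default"


# Hypothetical state: one database-level setting and one node-level setting.
defaults = {"MaxClientSessions": 50, "AnalyzeRowCountInterval": 60}
database = {"AnalyzeRowCountInterval": 3600}
node = {"MaxClientSessions": 0}
session = {}

print(effective_value("MaxClientSessions", session, node, database, defaults))
print(effective_value("AnalyzeRowCountInterval", session, node, database, defaults))
```

Removing an entry from one of the dictionaries mirrors clearing the parameter at that level: the lookup falls through to the next level that has a value.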

To use system tables to see configuration parameters:


• Query SESSION_PARAMETERS for parameters configurable at the session level.
• Query CONFIGURATION_PARAMETERS for parameters configurable at the database or node
  level.

Parameters appear in both system tables if they are configurable at the database, node, and
session levels.

Setting Configuration Parameters


To set a parameter value at the database level, use the ALTER DATABASE statement with the
SET parameter:
ALTER DATABASE dbname SET parameter_name = parameter_value;
For example:
=> ALTER DATABASE mydb SET AnalyzeRowCountInterval = 3600;

You can set some parameters at the node level. To do so, use the ALTER NODE statement with
the SET parameter:
ALTER NODE node_name SET parameter_name = parameter_value;
For example, to prevent clients from connecting to node01, set the MaxClientSessions
configuration parameter to 0:

=> ALTER NODE node01 SET MaxClientSessions = 0;

You can set some parameters at the session level. To do so, use the ALTER SESSION statement
with the SET parameter:
ALTER SESSION SET parameter_name = parameter_value;
For example:
=> ALTER SESSION SET ForceUDxFencedMode = 1;

You can set multiple configuration parameter values at the database, node, and session levels
using comma-separated arguments.
The following example shows how to set multiple parameters at the database level:
ALTER DATABASE mydb SET AnalyzeRowCountInterval = 3600,
    FailoverToStandbyAfter = '5 minutes';
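
If you generate such statements from a script, the comma-separated form can be composed as in the following Python sketch. The helper is hypothetical and only builds the statement text. It assumes that numeric values are written bare and that other values are single-quoted, as in the example above, and it does not validate database or parameter names, so use it only with trusted input.

```python
def alter_database_set(dbname, params):
    """Compose one ALTER DATABASE ... SET statement from a dict of parameters."""
    if not params:
        raise ValueError("at least one parameter is required")

    def literal(value):
        # Numbers are written bare; everything else is single-quoted, with
        # embedded single quotes doubled (standard SQL string escaping).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    assignments = ", ".join(f"{name} = {literal(value)}" for name, value in params.items())
    return f"ALTER DATABASE {dbname} SET {assignments};"


statement = alter_database_set(
    "mydb",
    {"AnalyzeRowCountInterval": 3600, "FailoverToStandbyAfter": "5 minutes"},
)
print(statement)
```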

Clearing Configuration Parameters


To clear a database-level parameter, use the ALTER DATABASE statement with the CLEAR
parameter. The resulting value reflects the default value.
ALTER DATABASE dbname CLEAR parameter_name;
For example:
=> ALTER DATABASE mydb CLEAR AnalyzeRowCountInterval;

To clear a node-level parameter value, use the ALTER NODE statement with the CLEAR
parameter. The resulting value reflects the value set at the database level, or the default value if no
database-level value is set.
ALTER NODE node_name CLEAR parameter_name;
In this example, MaxClientSessions is cleared on node01. If no database-level value is set, the
parameter reverts to its default value (50):
ALTER NODE node01 CLEAR MaxClientSessions;

To clear a session-level parameter value, use the ALTER SESSION statement with the CLEAR
parameter. If the parameter is set at the node level, the resulting value reflects that setting. If no
node-level setting exists, the resulting value reflects the value set at the database level, if
applicable. If the parameter is not set at the node or database level, the resulting value reflects the
default value.
ALTER SESSION CLEAR parameter_name;

For example:
=> ALTER SESSION CLEAR ForceUDxFencedMode;

You can clear multiple configuration parameter values at the database, node, and session levels
using comma-separated arguments.
The following example shows how to clear multiple parameters at the database level:
ALTER DATABASE mydb CLEAR AnalyzeRowCountInterval, FailoverToStandbyAfter;
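
The fallback behavior described in this section can be walked through with a short sketch. It assumes that MaxClientSessions can be set at both the database and node levels in your release; the value 100 is purely illustrative.

```sql
-- Database-level value: applies to every node that has no node-level setting.
ALTER DATABASE mydb SET MaxClientSessions = 100;

-- Node-level value: overrides the database-level value on node01 only.
ALTER NODE node01 SET MaxClientSessions = 0;

-- Clearing the node-level value makes node01 fall back to the
-- database-level value (100 in this sketch).
ALTER NODE node01 CLEAR MaxClientSessions;

-- Clearing the database-level value restores the default value.
ALTER DATABASE mydb CLEAR MaxClientSessions;

-- Check the value now set for the database.
SHOW DATABASE mydb MaxClientSessions;
```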

General Parameters
You use these general parameters to configure HP Vertica.
Parameters

Description

AnalyzeRowCountInterval

The interval at which HP Vertica calculates the


number of rows in the projection and aggregates
row counts calculated during loads. See
Collecting Statistics.
Default Value: 60 seconds

CompressCatalogOnDisk

Compresses the size of the catalog on disk when enabled (value set to 1 or 2).
Default Value: 0
Valid values:
• 1: Compress checkpoints, but not logs
• 2: Compress checkpoints and logs
Consider enabling this parameter if the catalog disk partition is small (<50 GB) and the
metadata is large (hundreds of tables, partitions, or nodes).

CompressNetworkData

Compresses all data sent over the internal


network when enabled (value set to 1). This
compression speeds up network traffic at the
expense of added CPU load. If the network is
throttling your database performance, enable
compression to correct the issue.
Default Value: 0

EnableCooperativeParse

Implements multi-threaded cooperative parsing capabilities on a node. You can use this
parameter for both delimited and fixed-width loads. Enabled by default.
Default Value: 1

EnableDataTargetParallelism

Allows multiple threads for sorting and writing


data to ROS, improving data loading
performance. Enabled by default.
Default Value: 1

EnableResourcePoolCPUAffinity

Aligns queries to the resource pool of the


processing CPU. When disabled (value is set to
0), queries run on any CPU, regardless of the
CPU_AFFINITY_SET of the resource pool.
Enabled by default.
Default Value: 1

ExternalTablesExceptionsLimit

Determines the maximum number of COPY


exceptions and rejections allowed when a
SELECT statement references an external table.
Set to -1 to remove any exceptions limit. See
Validating External Tables.
Default Value: 100

FailoverToStandbyAfter

Specifies the length of time that an active


standby node waits before taking the place of a
failed node.
This parameter takes Interval Values.

FencedUDxMemoryLimitMB

Sets the maximum amount of memory, in megabytes (MB), that a fenced-mode UDF can
use. If a UDF attempts to allocate more memory than this limit, that attempt triggers an
exception. For more information, see Fenced Mode in the Extending HP Vertica Guide.
Default Value: -1 (no limit)

FlexTableDataTypeGuessMultiplier

Specifies the multiplier to use for a key value


when creating a view for a flex keys table.
Default Value: 2.0
See Setting Flex Table Parameters.

FlexTableRawSize

The default value (in bytes) of the __raw__ column size of a flex table. The maximum value
is 32000000. See Setting Flex Table Parameters.
Default Value: 130000

JavaBinaryForUDx

The full path to the Java executable that HP Vertica uses to run Java UDxs. See Installing
Java on Hosts in the Extending HP Vertica Guide.

JavaClassPathForUDx

Specifies the Java classpath for the JVM that executes Java UDxs.
Default Value: ${vertica_home}/packages/hcat/lib/*
Required values: Must list all directories containing JAR files that Java UDxs import.
See Handling Java UDx Dependencies in the Extending HP Vertica Guide.

MaxAutoSegColumns

Specifies the number of columns (0-1024) to segment automatically when creating
auto-projections from COPY and INSERT INTO statements. Set this parameter to zero (0)
to use all columns in the hash segmentation expression.
Default Value: 32

MaxClientSessions

Determines the maximum number of client


sessions that can run on a single node of the
database. The default value allows five additional
administrative logins. These logins prevent
DBAs from being locked out of the system if the
limit is reached by non-dbadmin users.
Default Value: 50 user logins and 5 additional
administrative logins
Tip: By setting this parameter to 0, you can
prevent new client sessions from being opened
while you are shutting down the database.
Restore the parameter to its original setting after
you restart the database. See the section
"Interrupting and Closing Sessions" in Managing
Sessions.

PatternMatchAllocator

Setting this parameter to 1 overrides the heap


memory allocator for the pattern-match library.
The Perl Compatible Regular Expressions
(PCRE) pattern-match library evaluates regular
expressions. Restart the database for this
parameter to take effect. For more information,
see Regular Expression Functions.
Default Value: 0

PatternMatchStackAllocator

Overrides the stack memory allocator for the


pattern-match library. The Perl Compatible
Regular Expressions (PCRE) pattern-match
library evaluates regular expressions. Restart the
database for this parameter to take effect. For
more information, see Regular Expression
Functions.
Default Value: 1

SegmentAutoProjection

Determines whether auto-projections are


segmented by default. Set to 0 to disable.
Default Value: 1

TransactionIsolationLevel

Changes the isolation level for the database.


After modification, HP Vertica uses the new
transaction level for every new session. Existing
sessions and their transactions continue to use
the original isolation level. See Change
Transaction Isolation Levels.
Default Value: READ COMMITTED

TransactionMode

Specifies whether transactions are in read/write or read-only mode. Read/write is the
default. Existing sessions and their transactions continue to use the original
transaction mode.
Default Value: READ WRITE

Tuple Mover Parameters


These parameters control how the Tuple Mover operates.

Parameters

Description

ActivePartitionCount

Sets the number of partitions, called active partitions, that are


currently being loaded. For information about how the Tuple
Mover treats active (and inactive) partitions during a
mergeout operation, see Understanding the Tuple Mover.
Default Value: 1
Example:
ALTER DATABASE mydb SET ActivePartitionCount = 2;

MergeOutInterval

The number of seconds the Tuple Mover waits between


checks for new ROS files to merge out. If ROS containers
are added frequently, you may need to decrease this value.
Default Value: 600
Example:
ALTER DATABASE mydb SET MergeOutInterval = 1200;

MoveOutInterval

The number of seconds the Tuple Mover waits between


checks for new data in the WOS to move to ROS.
Default Value: 300
Example:
ALTER DATABASE mydb SET MoveOutInterval = 600;

MoveOutMaxAgeTime

The specified interval (in seconds) after which the Tuple


Mover is forced to write the WOS to disk. The default interval
is 30 minutes.
Tip: If you had been running the force_moveout.sh script in
previous releases, you no longer need to run it.
Default Value: 1800
Example:
ALTER DATABASE mydb SET MoveOutMaxAgeTime = 1200;

MoveOutSizePct

The percentage of the WOS that can be filled with data before
the Tuple Mover performs a moveout operation.
Default Value: 0
Example:
ALTER DATABASE mydb SET MoveOutSizePct = 50;

Projection Parameters
The following table describes the configuration parameters that help you manage projections.
Parameters

Description

AnalyzeRowCountInterval

The interval at which HP Vertica calculates the


number of rows in the projection and aggregates
row counts calculated during loads. See
Collecting Statistics.
Default Value: 60 seconds

EnableGroupByProjections

When you set EnableGroupByProjections to '1',


you can create live aggregate projections. For
more information, see Live Aggregate
Projections.
Default Value: 1

EnableTopKProjections

When you set EnableTopKProjections to '1',


you can create Top-K projections that allow you
to retrieve Top-K data quickly. For more
information, see Top-K Projections.
Default Value: 1

EnableExprsInProjections

When you set EnableExprsInProjections to '1',


you can create projections that use expressions
to calculate column values. For more
information, see Projections with Expressions.
Default Value: 1

MaxAutoSegColumns

Specifies the number of columns (0-1024) to segment automatically when creating
auto-projections from COPY and INSERT INTO statements.
Set to 0 to use all columns in the hash segmentation expression.
Default Value: 32

SegmentAutoProjection

Determines whether auto-projections are


segmented by default. Set to 0 to disable.
Default Value: 1

Epoch Management Parameters


The following table describes the epoch management parameters for configuring HP Vertica.
Parameters

Description

AdvanceAHMInterval

Determines how frequently (in seconds) HP Vertica checks the


history retention status.
Note: AdvanceAHMInterval cannot be set to a value that is less
than the EpochMapInterval.
Default Value: 180 (3 minutes)
Example:
ALTER DATABASE mydb SET AdvanceAHMInterval = '3600';

EpochMapInterval

Determines the granularity of mapping between epochs and time available to historical
queries. When a historical query with an AT TIME T clause is issued, HP Vertica maps it to
an epoch within a granularity of EpochMapInterval seconds. It similarly affects the time
reported for Last Good Epoch during Failure Recovery. Note that it does not affect the
internal precision of epochs themselves.
Tip: Decreasing this interval increases the number of epochs
saved on disk. Therefore, consider reducing the
HistoryRetentionTime parameter to limit the number of history
epochs that HP Vertica retains.
Default Value: 180 (3 minutes)
Example:
ALTER DATABASE mydb SET EpochMapInterval = '300';

HistoryRetentionTime

Determines how long deleted data is saved (in seconds) as a
historical reference. When the specified time since the deletion has
passed, you can purge the data. Use the -1 setting if you prefer to
use HistoryRetentionEpochs to determine which deleted data
can be purged.
Note: The default setting of 0 effectively prevents the use of the
Administration Tools 'Roll Back Database to Last Good Epoch'
option because the AHM remains close to the current epoch and a
rollback is not permitted to an epoch prior to the AHM.
Tip: If you rely on the Roll Back option to remove recently loaded
data, consider setting a day-wide window to remove loaded data.
For example:
ALTER DATABASE mydb SET HistoryRetentionTime = 86400;

Default Value: 0 (Data saved when nodes are down.)


Example:
ALTER DATABASE mydb SET HistoryRetentionTime = '240';

HistoryRetentionEpochs

Specifies the number of historical epochs to save, and therefore,


the amount of deleted data.
Unless you have a reason to limit the number of epochs, HP
recommends that you specify the time over which deleted data is
saved.
If you specify both History parameters, HistoryRetentionTime
takes precedence. Setting both parameters to -1 preserves all
historical data.
See Setting a Purge Policy.
Default Value: -1 (Disabled)
Example:
ALTER DATABASE mydb SET HistoryRetentionEpochs = '40';
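
As a sketch of the interaction between the two History parameters described above, the following statements switch retention from elapsed time to an epoch count and then restore the defaults. The value 40 is illustrative only.

```sql
-- Retain history by epoch count instead of elapsed time.
-- Setting HistoryRetentionTime to -1 lets HistoryRetentionEpochs determine
-- which deleted data can be purged.
ALTER DATABASE mydb SET HistoryRetentionTime = -1, HistoryRetentionEpochs = 40;

-- Return both parameters to their default values (0 and -1, respectively).
ALTER DATABASE mydb CLEAR HistoryRetentionTime, HistoryRetentionEpochs;
```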

Monitoring Parameters
The following table describes the monitoring parameters for configuring HP Vertica.

Parameters

Description

SnmpTrapDestinationsList

Defines where HP Vertica sends traps for SNMP. See Configuring Reporting for SNMP.
Default Value: none
Example:
ALTER DATABASE mydb SET SnmpTrapDestinationsList = 'localhost 162 public';

SnmpTrapsEnabled

Enables event trapping for SNMP. See Configuring Reporting for


SNMP.
Default Value: 0
Example:
ALTER DATABASE mydb SET SnmpTrapsEnabled = 1;

SnmpTrapEvents

Defines which events HP Vertica traps through SNMP. See Configuring Reporting for SNMP.
Default Value: Low Disk Space, Read Only File System, Loss of K
Safety, Current Fault Tolerance at Critical Level, Too Many ROS
Containers, WOS Over Flow, Node State Change, Recovery Failure,
and Stale Checkpoint
Example:
ALTER DATABASE mydb SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';

SyslogEnabled

Enables event trapping for syslog. See Configuring Reporting for


Syslog.
Default Value: 0
Example:
ALTER DATABASE mydb SET SyslogEnabled = 1;

SyslogEvents

Defines events that generate a syslog entry. See Configuring


Reporting for Syslog.
Default Value: none
Example:
ALTER DATABASE mydb SET SyslogEvents = 'Low Disk Space, Recovery Failure';

SyslogFacility

Defines which syslog facility HP Vertica uses. See Configuring Reporting for Syslog.
Default Value: user
Example:
ALTER DATABASE mydb SET SyslogFacility = 'ftp';
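
The syslog parameters above are typically set together. The following is a minimal sketch that uses the comma-separated form shown earlier in this chapter; the event list is only an example, and 'user' is the documented default facility.

```sql
-- Enable syslog reporting, choose the events to report, and name the facility.
ALTER DATABASE mydb SET SyslogEnabled = 1,
                        SyslogEvents = 'Low Disk Space, Recovery Failure',
                        SyslogFacility = 'user';

-- Turn syslog reporting off again by restoring the default values.
ALTER DATABASE mydb CLEAR SyslogEnabled, SyslogEvents, SyslogFacility;
```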

Profiling Parameters
The following table describes the profiling parameters for configuring HP Vertica. See Profiling
Database Performance for more information on profiling queries.
Parameters

Description

GlobalEEProfiling

Enables profiling for query execution runs in all sessions on


all nodes.
Default Value: 0
Example:
ALTER DATABASE mydb SET GlobalEEProfiling = 1;

GlobalQueryProfiling

Enables query profiling for all sessions on all nodes.


Default Value: 0
Example:
ALTER DATABASE mydb SET GlobalQueryProfiling = 1;

GlobalSessionProfiling

Enables session profiling for all sessions on all nodes.


Default Value: 0
Example:
ALTER DATABASE mydb SET GlobalSessionProfiling = 1;

Security Parameters
The following table describes configuration parameters for configuring client authentication. These
parameters define the hash algorithm, enable SSL, and set the SSL private key, certificate, and
certificate authority for HP Vertica.

Parameters

Description

EnableSSL

Enables SSL for the server. See Implementing SSL.


Default Value: 0
Example:
ALTER DATABASE mydb SET EnableSSL = '1';

SSLPrivateKey

Specifies the server's private key. The value of this parameter is visible only to
dbadmin users.
Default Value: No default value
Example:
ALTER DATABASE mydb SET SSLPrivateKey = '<contents of server.key file>';

Include the contents of the server.key file, but do not include the file name.
Note: This parameter is set automatically during upgrade to 7.1 if you set EnableSSL=1
prior to the upgrade.

SSLCertificate

Sets the SSL certificate. If your SSL certificate is a certificate chain, cut and
paste only the top-most certificate of the certificate chain to set this value.
Default Value: No default value
Example:
ALTER DATABASE mydb SET SSLCertificate = '<contents of server.crt file>';

Include the contents of the server.crt file, but do not include the file name.
Note: This parameter is set automatically during upgrade to 7.1 if you set EnableSSL=1
prior to the upgrade.

SSLCA

Sets the SSL certificate authority.


Default Value: No default value
Example:
ALTER DATABASE mydb SET SSLCA = '<contents of certificate authority root.crt file>';

Include the contents of the certificate authority root.crt file, but do not
include the file name.

SecurityAlgorithm

Sets the algorithm for the function that hash authentication uses: MD5 or SHA-512.
Default Value: 'NONE'
Example:
ALTER DATABASE mydb SET SecurityAlgorithm = 'MD5';
ALTER DATABASE mydb SET SecurityAlgorithm = 'SHA512';

View parameter values with the SHOW DATABASE statement. You must be a database superuser to
view the value:
SHOW DATABASE mydb SSLCertificate;

See Also
Kerberos Authentication Parameters
Configuring SSL

Database Designer Parameters


The following table describes the parameters for configuring the HP Vertica Database Designer.

Parameter

Description

DBDCorrelationSampleRowCount

Minimum number of table rows at which Database Designer discovers and records correlated
columns.
Default Value: 4000
Example:
ALTER DATABASE mydb SET DBDCorrelationSampleRowCount = 3000;

DBDLogInternalDesignProcess

Enables or disables Database Designer logging.


Default value: False
Examples:
ALTER DATABASE mydb SET DBDLogInternalDesignProcess = '1';
ALTER DATABASE mydb SET DBDLogInternalDesignProcess = '0';

Internationalization Parameters
The following table describes the internationalization parameters for configuring HP Vertica.
Parameters

Description

DefaultIntervalStyle

Sets the default interval style to use. If set to 0 (default), the


interval is in PLAIN style (the SQL standard), no interval units on
output. If set to 1, the interval is in UNITS on output. This
parameter does not take effect until the database is restarted.
Default Value: 0
Example:
ALTER DATABASE mydb SET DefaultIntervalStyle = 1;

DefaultSessionLocale

Sets the default session startup locale for the database. This
parameter does not take effect until the database is restarted.
Default Value: en_US@collation=binary
Example:
ALTER DATABASE mydb SET DefaultSessionLocale = 'en_GB';

EscapeStringWarning

Issues a warning when backslashes are used in a string literal. This warning helps you
locate backslashes that are being treated as escape characters so that you can change them
to follow the standard-conforming string syntax instead.
Default Value: 1
Example:
ALTER DATABASE mydb SET EscapeStringWarning = '1';

StandardConformingStrings

In HP Vertica 4.0, determines whether ordinary string literals ('...') treat backslashes (\)
as literal characters or as escape characters. When set to '1', backslashes are treated as
literal characters; when set to '0', backslashes are treated as escape characters.
Tip: To treat backslashes as escape characters, use the
extended string syntax:
(E'...');
See String Literals (Character) in the SQL Reference Manual.
Default Value: 1
Example:
ALTER DATABASE mydb SET StandardConformingStrings = '0';
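
The difference between the two string syntaxes can be illustrated with a pair of queries. This sketch assumes StandardConformingStrings is left at its default of 1; the literal text is arbitrary.

```sql
-- Ordinary string literal: with StandardConformingStrings = 1,
-- the backslash is kept as a literal character.
SELECT 'tab\there';

-- Extended string syntax: \t is interpreted as an escape sequence (a tab).
SELECT E'tab\there';
```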

Data Collector Parameters


The following table lists the Data Collector parameter for configuring HP Vertica.
Parameter

Description

EnableDataCollector

Enables and disables the Data Collector, which is the Workload Analyzer's internal
diagnostics utility. Affects all sessions on all nodes. Use 0 to turn off data collection.
Default value: 1 (Enabled)
Example:
ALTER DATABASE mydb SET EnableDataCollector = 0;

For more information, see the following topics in the SQL Reference Manual:

• Data Collector Functions
• ANALYZE_WORKLOAD
• V_MONITOR.DATA_COLLECTOR
• V_MONITOR.TUNING_RECOMMENDATIONS

See also the following topics in the Administrator's Guide:

• Retaining Monitoring Information
• Analyzing Workloads
• Tuning Recommendations
• Analyzing Workloads Through Management Console and Through an API

Kerberos Authentication Parameters


The following parameters let you configure the HP Vertica principal for Kerberos authentication and
specify the location of the Kerberos keytab file.
Parameter

Description

KerberosServiceName

Provides the service name portion of the HP Vertica Kerberos principal.
By default, this parameter is 'vertica'. For example:
vertica/vcluster@EXAMPLE.COM.

KerberosRealm

Provides the realm portion of the HP Vertica Kerberos principal. A realm


is the authentication administrative domain and is usually formed in
uppercase letters; for example: vertica/vcluster@EXAMPLE.COM.

KerberosKeytabFile

Provides the location of the keytab file that contains credentials for the
HP Vertica Kerberos principals. By default, this file is located in /etc.
For example: KerberosKeytabFile=/etc/krb5.keytab.
Notes:
• Each principal must take the form KerberosServiceName/hostname@KerberosRealm. The host
  name is the first entry for the node in /etc/hosts.
• The keytab file must be readable by the file owner who is running the process (typically
  the Linux dbadmin user assigned file permissions 0600).

KerberosHostname

If you use individual principals for each node in your cluster (recommended), do not set
this parameter. If it is not set, HP Vertica automatically obtains the correct value from
the operating system. However, if you use a single principal for all nodes in your cluster
(not recommended), set this to the host element of that principal. For
example, vertica/vcluster@EXAMPLE.COM.

HCatalog Connector Parameters


The following table describes the parameters for configuring the HCatalog Connector. See Using
the HCatalog Connector in the Hadoop Integration Guide for more information.
Parameter

Description

HCatConnectionTimeout

The number of seconds the HCatalog Connector waits for a


successful connection to the WebHCat server before returning a
timeout error.
Default Value: 0 (Wait indefinitely)
Requires Restart: No
Example:
ALTER DATABASE mydb SET HCatConnectionTimeout = 30;

HCatSlowTransferLimit

The lowest transfer speed (in bytes per second) that the HCatalog
Connector allows when retrieving data from the WebHCat server. In
some cases, the data transfer rate from the WebHCat server to HP
Vertica is below this threshold. In such cases, after the number of
seconds specified in the HCatSlowTransferTime parameter pass,
the HCatalog Connector cancels the query and closes the
connection.
Default Value: 65536
Requires Restart: No
Example:
ALTER DATABASE mydb SET HCatSlowTransferLimit = 32000;

HCatSlowTransferTime

The number of seconds the HCatalog Connector waits before testing


whether the data transfer from the WebHCat server is too slow. See
the HCatSlowTransferLimit parameter.
Default Value: 60
Requires Restart: No
Example:
ALTER DATABASE mydb SET HCatSlowTransferTime = 90;

Note: You can override these configuration parameters when creating an HCatalog schema.
See CREATE HCATALOG SCHEMA in the SQL Reference Manual for an explanation.

Designing a Logical Schema


Designing a logical schema for an HP Vertica database is no different from designing for any other
SQL database. A logical schema consists of objects such as schemas, tables, views, and
referential integrity constraints that are visible to SQL users. HP Vertica supports any relational
schema design of your choice.

Using Multiple Schemas


Using a single schema is effective if there is only one database user or if a few users cooperate in
sharing the database. In many cases, however, it makes sense to use additional schemas to allow
users and their applications to create and access tables in separate namespaces. For example,
using additional schemas allows:
• Many users to access the database without interfering with one another.
  Individual schemas can be configured to grant specific users access to the schema and its
  tables while restricting others.
• Third-party applications to create tables that have the same name in different schemas,
  preventing table collisions.

Unlike other RDBMS, a schema in an HP Vertica database is not a collection of objects bound to
one user.

Multiple Schema Examples


This section provides examples of when and how you might want to use multiple schemas to
separate database users. These examples fall into two categories: using multiple private schemas
and using a combination of private schemas (i.e. schemas limited to a single user) and shared
schemas (i.e. schemas shared across multiple users).

Using Multiple Private Schemas


Using multiple private schemas is an effective way of separating database users from one another
when sensitive information is involved. Typically a user is granted access to only one schema and
its contents, thus providing database security at the schema level. Database users can be running
different applications, multiple copies of the same application, or even multiple instances of the
same application. This enables you to consolidate applications on one database to reduce
management overhead and use resources more effectively. The following examples highlight using
multiple private schemas.
• Using Multiple Schemas to Separate Users and Their Unique Applications


In this example, both database users work for the same company. One user (HRUser) uses a
Human Resource (HR) application with access to sensitive personal data, such as salaries,
while another user (MedUser) accesses information regarding company healthcare costs
through a healthcare management application. HRUser should not be able to access company
healthcare cost information and MedUser should not be able to view personal employee data.

To grant these users access to data they need while restricting them from data they should not
see, two schemas are created with appropriate user access, as follows:
  ▪ HRSchema: A schema owned by HRUser that is accessed by the HR application.
  ▪ HealthSchema: A schema owned by MedUser that is accessed by the healthcare management
    application.

• Using Multiple Schemas to Support Multitenancy


This example is similar to the last example in that access to sensitive data is limited by
separating users into different schemas. In this case, however, each user is using a virtual
instance of the same application.
An example of this is a retail marketing analytics company that provides data and software as a
service (SaaS) to large retailers to help them determine which promotional methods they use are
most effective at driving customer sales.
In this example, each database user equates to a retailer, and each user only has access to its
own schema. The retail marketing analytics company provides a virtual instance of the same
application to each retail customer, and each instance points to the user's specific schema in
which to create and update tables. The tables in these schemas use the same names because
they are created by instances of the same application, but they do not conflict because they are
in separate schemas.
Examples of schemas in this database could be:
  ▪ MartSchema: A schema owned by MartUser, a large department store chain.
  ▪ PharmSchema: A schema owned by PharmUser, a large drug store chain.

• Using Multiple Schemas to Migrate to a Newer Version of an Application


Using multiple schemas is an effective way of migrating to a new version of a software
application. In this case, a new schema is created to support the new version of the software,
and the old schema is kept as long as necessary to support the original version of the software.
This is called a rolling application upgrade.

For example, a company might use an HR application to store employee data. The following
schemas could be used for the original and updated versions of the software:
  ▪ HRSchema: A schema owned by HRUser, the schema user for the original HR application.
  ▪ V2HRSchema: A schema owned by V2HRUser, the schema user for the new version of the
    HR application.
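
The private-schema examples above could be set up along the following lines. This is a sketch under stated assumptions: it uses the user and schema names from the multitenancy example, it is run by a superuser, and the Sales table and its columns are hypothetical.

```sql
-- One user and one private schema per retailer.
CREATE USER MartUser;
CREATE USER PharmUser;
CREATE SCHEMA MartSchema AUTHORIZATION MartUser;
CREATE SCHEMA PharmSchema AUTHORIZATION PharmUser;

-- Both application instances can use the same table name without conflict
-- because the tables live in separate schemas.
CREATE TABLE MartSchema.Sales (store_id INT, sale_date DATE, amount NUMERIC(12,2));
CREATE TABLE PharmSchema.Sales (store_id INT, sale_date DATE, amount NUMERIC(12,2));
```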

Using Combinations of Private and Shared Schemas


The previous examples illustrate cases in which all schemas in the database are private and no
information is shared between users. However, users might want to share common data. In the
retail case, for example, MartUser and PharmUser might want to compare their per-store sales of a
particular product against the industry per-store sales average. Since this information is an industry
average and is not specific to any retail chain, it can be placed in a schema on which both users are
granted USAGE privileges. (For more information about schema privileges, see Schema
Privileges.)
Examples of schemas in this database could be:
• MartSchema: A schema owned by MartUser, a large department store chain.
• PharmSchema: A schema owned by PharmUser, a large drug store chain.
• IndustrySchema: A schema owned by DBUser (from the retail marketing analytics company)
  on which both MartUser and PharmUser have USAGE privileges. It is unlikely that retailers
  would be given any privileges beyond USAGE on the schema and SELECT on one or more of its
  tables.
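
A shared schema of this kind might be created as sketched below. The sketch assumes that DBUser, MartUser, and PharmUser already exist and that a superuser runs the statements; the StoreSalesAvg table and its columns are hypothetical.

```sql
-- Shared schema owned by the analytics company's user.
CREATE SCHEMA IndustrySchema AUTHORIZATION DBUser;
CREATE TABLE IndustrySchema.StoreSalesAvg (product_id INT, avg_sales NUMERIC(12,2));

-- Retailers may look inside the schema ...
GRANT USAGE ON SCHEMA IndustrySchema TO MartUser, PharmUser;

-- ... and read, but not modify, the industry averages.
GRANT SELECT ON IndustrySchema.StoreSalesAvg TO MartUser, PharmUser;
```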

Creating Schemas
You can create as many schemas as necessary for your database. For example, you could create a
schema for each database user. However, schemas and users are not synonymous as they are in
Oracle.
By default, only a superuser can create a schema or give a user the right to create a schema. (See
GRANT (Database) in the SQL Reference Manual.)
To create a schema, use the CREATE SCHEMA statement, as described in the SQL Reference
Manual.
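
For illustration only, a superuser could either create a schema directly or give a user the right to create schemas. The names Schema1, mydb, and HRUser are placeholders taken from examples elsewhere in this guide.

```sql
-- As a superuser, create a schema directly.
CREATE SCHEMA Schema1;

-- Or give an existing user the right to create schemas in the database.
GRANT CREATE ON DATABASE mydb TO HRUser;
```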

Specifying Objects in Multiple Schemas


Once you create two or more schemas, each SQL statement or function must identify the schema
associated with the object you are referencing. You can specify an object within multiple schemas
by:
• Qualifying the object name by using the schema name and object name separated by a dot. For
  example, to specify MyTable, located in Schema1, qualify the name as Schema1.MyTable.
• Using a search path that includes the desired schemas when a referenced object is unqualified.
  When you set a search path (see Setting Search Paths), HP Vertica automatically searches the
  specified schemas to find the object.
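
The two approaches listed above look roughly like this in practice. The sketch assumes that Schema1 and Schema1.MyTable exist and that the user has access to both.

```sql
-- Option 1: qualify the object name with its schema.
SELECT * FROM Schema1.MyTable;

-- Option 2: put the schema on the search path, then use the unqualified name.
SET SEARCH_PATH TO Schema1, public;
SELECT * FROM MyTable;
```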

Setting Search Paths


The search path is a list of schemas where HP Vertica looks for tables and User Defined Functions
(UDFs) that are referenced without a schema name. For example, if a statement references a table
named Customers without naming the schema that contains the table, and the search path is
public, Schema1, and Schema2, HP Vertica first searches the public schema for a table named
Customers. If it does not find a table named Customers in public, it searches Schema1 and then
Schema2.
HP Vertica uses the first table or UDF it finds that matches the unqualified reference. If the table or
UDF is not found in any schema in the search path, HP Vertica reports an error.
Note: HP Vertica only searches for tables and UDFs in schemas to which the user has access
privileges. If the user does not have access to a schema in the search path, HP Vertica silently
skips the schema. It does not report an error or warning if the user's search path contains one
or more schemas to which the user does not have access privileges. Any schemas in the
search path that do not exist (for example, schemas that have been deleted since being added
to the search path) are also silently ignored.
The first schema in the search path to which the user has access is called the current schema. This
is the schema where HP Vertica creates tables if a CREATE TABLE statement does not specify a
schema name.
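A short sketch of how the current schema determines where an unqualified CREATE TABLE lands. It assumes a schema named Schema1 exists and that you have CREATE privileges on it; the Customers column list is invented for illustration.

```sql
-- Make Schema1 the first accessible schema in the search path.
SET SEARCH_PATH TO Schema1, public;

-- Reports the current schema (Schema1, under the assumptions above).
SELECT CURRENT_SCHEMA();

-- No schema is named, so the table is created in the current schema.
CREATE TABLE Customers (customer_key INTEGER, customer_name VARCHAR(64));

-- Check which schema the new table was created in.
SELECT table_schema, table_name
FROM v_catalog.tables
WHERE table_name = 'Customers';
```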
The default schema search path is "$user", public, v_catalog, v_monitor, v_internal.
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)

The $user entry in the search path is a placeholder that resolves to the current user name, and
public references the public schema. The v_catalog and v_monitor schemas contain HP
Vertica system tables, and the v_internal schema is for HP Vertica's internal use.
Note: HP Vertica always ensures that the v_catalog, v_monitor, and v_internal schemas are
part of the schema search path.
The default search path has HP Vertica search for unqualified tables first in the user's schema,
assuming that a separate schema exists for each user and that the schema uses their user name. If
such a user schema does not exist, or if HP Vertica cannot find the table there, HP Vertica next
searches the public schema, and then the v_catalog and v_monitor built-in schemas.
A database administrator can set a user's default search path when creating the user by using
the SEARCH_PATH parameter of the CREATE USER statement. An administrator or the user can
change the user's default search path using the ALTER USER statement's SEARCH_PATH
parameter. Changes made to the default search path using ALTER USER affect new user
sessions; they do not affect any current sessions.
A user can use the SET SEARCH_PATH statement to override the schema search path for the
current session.
Tip: The SET SEARCH_PATH statement is equivalent in function to the CURRENT_SCHEMA
statement found in some other databases.
To see the current search path, use the SHOW SEARCH_PATH statement. To view the current
schema, use SELECT CURRENT_SCHEMA(). The function SELECT CURRENT_SCHEMA()
also shows the resolved name of $user.
The following example demonstrates displaying and altering the schema search path for the current
user session:
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", PUBLIC, v_catalog, v_monitor, v_internal
(1 row)
=> SET SEARCH_PATH TO SchemaA, "$user", public;
SET
=> SHOW SEARCH_PATH;
    name     |                          setting
-------------+------------------------------------------------------------
 search_path | SchemaA, "$user", public, v_catalog, v_monitor, v_internal
(1 row)

You can use the DEFAULT keyword to reset the schema search path to the default.
=> SET SEARCH_PATH TO DEFAULT;
SET
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)

To view the default schema search path for a user, query the search_path column of the
V_CATALOG.USERS system table:
=> SELECT search_path FROM USERS WHERE user_name = 'ExampleUser';
                    search_path
---------------------------------------------------
 "$user", public, v_catalog, v_monitor, v_internal
(1 row)
=> ALTER USER ExampleUser SEARCH_PATH SchemaA,"$user",public;
ALTER USER
=> SELECT search_path FROM USERS WHERE user_name = 'ExampleUser';
                        search_path
------------------------------------------------------------
 SchemaA, "$user", public, v_catalog, v_monitor, v_internal
(1 row)
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)

Note that changing the default search path has no effect on the user's current session. Even using
the SET SEARCH_PATH TO DEFAULT statement does not set the search path to the newly defined
default value. The new default takes effect only in new sessions.

See Also
- HP Vertica System Tables

Creating Objects That Span Multiple Schemas


HP Vertica supports views or pre-join projections that reference tables across multiple schemas.
For example, a user might need to compare employee salaries to industry averages. In this case,
the application would query a shared schema (IndustrySchema) for salary averages in addition to
its own private schema (HRSchema) for company-specific salary information.

Best Practice: When creating objects that span schemas, use qualified table names. This
naming convention avoids confusion if the search path or table structure within the schemas
changes at a later date.
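A sketch of a view that spans the two schemas from the salary example, written with fully qualified table names. The table names (employee_salaries, salary_averages) and their columns are hypothetical; only the schema names come from the text.

```sql
-- Hypothetical tables and columns; only HRSchema and IndustrySchema come from the text.
CREATE VIEW HRSchema.salary_comparison AS
    SELECT e.job_title,
           e.salary,
           i.average_salary
    FROM HRSchema.employee_salaries e
    JOIN IndustrySchema.salary_averages i
      ON e.job_title = i.job_title;
```

Because every table is qualified, the view does not depend on the search path of the session that created it.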

Tables in Schemas
In HP Vertica you can create both base tables and temporary tables, depending on what you are
trying to accomplish. For example, base tables are created in the HP Vertica logical schema while
temporary tables are useful for dividing complex query processing into multiple steps.
For more information, see Creating Tables and Creating Temporary Tables.

About Base Tables


The CREATE TABLE statement creates a table in the HP Vertica logical schema. The example
databases described in the Getting Started Guide include sample SQL scripts that demonstrate
this procedure. For example:
CREATE TABLE vendor_dimension (
   vendor_key        INTEGER      NOT NULL PRIMARY KEY,
   vendor_name       VARCHAR(64),
   vendor_address    VARCHAR(64),
   vendor_city       VARCHAR(64),
   vendor_state      CHAR(2),
   vendor_region     VARCHAR(32),
   deal_size         INTEGER,
   last_deal_update  DATE
);

Automatic Projection Creation


To get your database up and running quickly, HP Vertica automatically creates a default projection
for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE
statements. Each projection created automatically (or manually) includes a base projection name
prefix. You must use the projection prefix when altering or dropping a projection (ALTER
PROJECTION RENAME, DROP PROJECTION).
How you use the CREATE TABLE statement determines when the projection is created:
- If you create a table without providing the projection-related clauses, HP Vertica automatically
  creates a superprojection for the table when you use an INSERT INTO or COPY statement to
  load data into the table for the first time. The projection is created in the same schema as the
  table. Once HP Vertica has created the projection, it loads the data.
- If you use CREATE TABLE AS SELECT to create a table from the results of a query, the table
  is created first and a projection is created immediately after, using some of the properties of the
  underlying SELECT query.
- (Advanced users only) If you use any of the following parameters, the default projection is
  created immediately upon table creation using the specified properties:
  - column-definition (ENCODING encoding-type and ACCESSRANK integer)
  - ORDER BY table-column
  - hash-segmentation-clause
  - UNSEGMENTED {NODE node | ALL NODES }
  - KSAFE

Note: Before you define a superprojection in the above manner, read Creating Custom
Designs in the Administrator's Guide.
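As an illustration of the advanced case, the following sketch supplies projection-related clauses so that the default projection is defined when the table is created. The table and column names are invented. KSAFE is left out so that the statement is not tied to a particular cluster size.

```sql
-- vendor_sales and its columns are placeholder names.
CREATE TABLE public.vendor_sales (
    vendor_key   INTEGER NOT NULL ENCODING RLE,  -- column-definition with ENCODING
    sale_date    DATE,
    sale_amount  INTEGER
)
ORDER BY vendor_key, sale_date                   -- sort order of the projection
SEGMENTED BY HASH(vendor_key) ALL NODES;         -- hash-segmentation-clause
```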

See Also
- Creating Base Tables
- Projection Concepts
- CREATE TABLE

About Temporary Tables


A common use case for a temporary table is to divide complex query processing into multiple steps.
Typically, a reporting tool holds intermediate results while reports are generated (for example, first
get a result set, then query the result set, and so on). You can also write Subqueries.
Note: The default retention when creating temporary tables is ON COMMIT DELETE ROWS,
which discards data at transaction completion. The non-default value is ON COMMIT PRESERVE
ROWS, which discards data when the current session ends.
You create temporary tables using the CREATE TEMPORARY TABLE statement.

Global Temporary Tables


HP Vertica creates global temporary tables in the public schema, with the data contents private to
the transaction or session through which data is inserted.
Global temporary table definitions are accessible to all users and sessions, so that two (or more)
users can access the same global table concurrently. However, whenever a user commits or rolls
back a transaction, or ends the session, HP Vertica removes the global temporary table data
automatically, so users see only data specific to their own transactions or session.
Global temporary table definitions persist in the database catalogs until they are removed explicitly
through a DROP TABLE statement.

Local Temporary Tables


Local temporary tables are created in the V_TEMP_SCHEMA namespace and inserted into the user's
search path transparently. Each local temporary table is visible only to the user who creates it, and
only for the duration of the session in which the table is created.
When the session ends, HP Vertica automatically drops the table definition from the database
catalogs. You cannot preserve non-empty, session-scoped temporary tables using the ON
COMMIT PRESERVE ROWS clause.
Creating local temporary tables is significantly faster than creating regular tables, so you should
make use of them whenever possible.
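The following sketch contrasts the two kinds of temporary table and the two ON COMMIT options. The table and column names are placeholders.

```sql
-- Global temporary table: the definition is visible to all sessions and persists
-- until dropped; the rows are private to the session that inserts them.
-- PRESERVE ROWS keeps the rows until the session ends.
CREATE GLOBAL TEMPORARY TABLE report_stage (
    customer_state CHAR(2),
    total_income   INTEGER
) ON COMMIT PRESERVE ROWS;

-- Local temporary table: visible only in this session and dropped when it ends.
-- DELETE ROWS (the default) discards the rows at the end of each transaction.
CREATE LOCAL TEMPORARY TABLE session_scratch (
    customer_key INTEGER
) ON COMMIT DELETE ROWS;

-- A global temporary table definition must be removed explicitly.
DROP TABLE report_stage;
```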

Automatic Projection Creation and Characteristics


Once a local or global temporary table exists, HP Vertica creates auto-projections for it
whenever you load or insert data.
The default auto-projection for a temporary table has the following characteristics:
- It is a superprojection.
- It uses the default encoding-type AUTO.
- It is automatically unsegmented on the initiator node, if you do not specify a
  Hash-Segmentation-Clause.
- The projection is not pinned.
- Temp tables are not recoverable, so the superprojection is not K-Safe (K-SAFE=0), and you
  cannot make it so.

Auto-projections are defined by the table properties and creation methods, as follows:

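If table...                  | Sort order is...                 | Segmentation is...
-----------------------------+----------------------------------+------------------------------------------------
Is created from input stream | Same as input stream, if sorted. | On PK column (if any), on all FK columns (if
(COPY or INSERT INTO)        |                                  | any), on the first 31 configurable columns of
                             |                                  | the table
-----------------------------+----------------------------------+------------------------------------------------
Is created from CREATE TABLE | Same as input stream, if sorted. | Same segmentation columns if query output is
AS SELECT query              | If not sorted, sorted using      | segmented. The same as the load, if output of
                             | following rules.                 | query is unsegmented or unknown
-----------------------------+----------------------------------+------------------------------------------------
Has FK and PK constraints    | FK first, then PK columns        | PK columns
-----------------------------+----------------------------------+------------------------------------------------
Has FK constraints only      | FK first, then remaining columns | Small data type (< 8 byte) columns first, then
(no PK)                      |                                  | large data type columns
-----------------------------+----------------------------------+------------------------------------------------
Has PK constraints only      | PK columns                       | PK columns
(no FK)                      |                                  |
-----------------------------+----------------------------------+------------------------------------------------
Has no FK or PK constraints  | On all columns                   | Small data type (< 8 byte) columns first, then
                             |                                  | large data type columns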

Advanced users can modify the default projection created through the CREATE TEMPORARY TABLE
statement by defining one or more of the following parameters:
- Column-Definition (temp table) (ENCODING encoding-type and ACCESSRANK integer)
- ORDER BY table-column
- Hash-Segmentation-Clause
- UNSEGMENTED { NODE node | ALL NODES }
- NO PROJECTION

Note: Before you define the superprojection in this manner, read Creating Custom Designs in
the Administrator's Guide.

See Also
- Creating Temporary Tables
- Projection Concepts
- CREATE TEMPORARY TABLE

Implementing Views
A view is a stored query that dynamically accesses and computes data from the database at
execution time. It differs from a projection in that it is not materialized: it does not store data on
disk. This means that it doesn't need to be refreshed whenever the data in the underlying tables
changes, but it does require additional time to access and compute data.
Views are read-only and they support references to tables, temp tables, and other views. They do
not support inserts, deletes, or updates. You can use a view as an abstraction mechanism to:
- Hide the complexity of SELECT statements from users for support or security purposes. For
  example, you could create a view that selects specific columns from specific tables to ensure
  that users have easy access to the information they need while restricting them from
  confidential information.
- Encapsulate the details of the structure of your tables, which could change as your application
  evolves, behind a consistent user interface.

Creating Views
A view contains one or more SELECT statements that reference any combination of one or more
tables, temp tables, or views. Additionally, views can specify the column names used to display
results.
The user who creates the view must be a superuser or have the following privileges:
- CREATE on the schema in which the view is created.
- SELECT on all the tables and views referenced within the view's defining query.
- USAGE on all the schemas that contain the tables and views referenced within the view's
  defining query.

To create a view:
1. Use the CREATE VIEW statement to create the view.
2. Use the GRANT (View) statement to grant users the privilege to use the view.

Note: Once created, a view cannot be actively altered. It can only be deleted and recreated.
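The two steps, plus the drop-and-recreate cycle described in the note, might look like the following sketch. The view name customer_contacts and the user ExampleUser are placeholders; the columns are those of public.customer_dimension used elsewhere in this section.

```sql
-- Step 1: create the view. customer_contacts is a placeholder name.
CREATE VIEW public.customer_contacts AS
    SELECT customer_key, customer_state, annual_income
    FROM public.customer_dimension;

-- Step 2: let ExampleUser (placeholder) query the view.
GRANT SELECT ON public.customer_contacts TO ExampleUser;

-- A view cannot be altered in place. To change its definition, for example
-- to hide annual_income, drop the view, recreate it, and grant access again.
DROP VIEW public.customer_contacts;
CREATE VIEW public.customer_contacts AS
    SELECT customer_key, customer_state
    FROM public.customer_dimension;
GRANT SELECT ON public.customer_contacts TO ExampleUser;
```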


Using Views
Views can be used in the FROM clause of any SQL query or subquery. At execution, HP Vertica
internally substitutes the name of the view used in the query with the actual query used in the view
definition. The following example defines a view (ship) and illustrates how a query that refers to the
view is transformed internally at execution time.
- New view
  => CREATE VIEW ship AS SELECT * FROM public.shipping_dimension;
- Original query
  => SELECT * FROM ship;
- Transformed query
  => SELECT * FROM (SELECT * FROM public.shipping_dimension) AS ship;

Tip: To use a view, a user must be granted SELECT permissions on the view. See GRANT
(View).
The following example creates a view named myview that sums, by state, the individual incomes
of all customers listed in the store.store_sales_fact table. The results are grouped by state
and sorted in ascending order by state.
=> CREATE VIEW myview AS
SELECT SUM(annual_income), customer_state
FROM public.customer_dimension
WHERE customer_key IN
(SELECT customer_key
FROM store.store_sales_fact)
GROUP BY customer_state
ORDER BY customer_state ASC;

The following example uses the myview view with a WHERE clause that limits the results to
combined incomes of greater than 2,000,000,000.
=> SELECT * FROM myview WHERE sum > 2000000000;
     SUM     | customer_state
-------------+----------------
  2723441590 | AZ
 29253817091 | CA
  4907216137 | CO
  3769455689 | CT
  3330524215 | FL
  4581840709 | IL
  3310667307 | IN
  2793284639 | MA
  5225333668 | MI
  2128169759 | NV
  2806150503 | PA
  2832710696 | TN
 14215397659 | TX
  2642551509 | UT
(14 rows)

Views and Run-Time Errors


If HP Vertica does not have to evaluate an expression that would generate a run-time error in order
to answer a query, the run-time error might not occur. See the following sequence of commands for
an example of this scenario.
If you run a query like the following, HP Vertica returns an error, because TO_DATE cannot convert
the string 'F' to the specified date format:
=> SELECT TO_DATE('F','dd mm yyyy') FROM customer_dimension;
ERROR: Invalid input for DD: "F"

Now create a view using the same query:


=> CREATE VIEW temp AS SELECT TO_DATE('F','dd mm yyyy')
FROM customer_dimension;
CREATE VIEW

The view, however, cannot be used in all queries without generating the same error message. For
example, the following query returns the same error:
=> SELECT * FROM temp;
ERROR: Invalid input for DD: "F"

When you then issue a COUNT command, the returned row count is correct:
=> SELECT COUNT(*) FROM temp;
 COUNT
-------
   100
(1 row)

This behavior works as intended. You can create views that contain subqueries, where not every
row is intended to pass the predicate.


Creating a Database Design


Data in HP Vertica is physically stored in projections. When you initially load data into a table using
INSERT, COPY (or COPY LOCAL), HP Vertica creates a default superprojection for the table. This
superprojection ensures that all of the data is available for queries. However, these
superprojections might not optimize database performance, resulting in slow query performance
and low data compression.
To improve performance, create a physical design for your database that optimizes both query
performance and data compression. You can use the Database Designer or create this design by
hand.
Database Designer is a tool that recommends a design (a set of projections) that provides the
best query performance. Using Database Designer minimizes the time you spend on manual
database tuning and provides the ability to re-design the database incrementally to optimize for
changing workloads over time.
Database Designer runs as a background process. If non-superusers run Database Designer on,
or deploy designs for, the same tables at the same time, Database Designer may not be
able to complete.
Tip: HP recommends that you first globally optimize your database using the Comprehensive
setting in Database Designer. If the performance of the comprehensive design is not adequate,
you can create custom projections, either with an incremental design or manually, as described in
Creating Custom Designs.

What Is a Design?
A design is a physical storage plan that optimizes query performance. Database Designer uses
sophisticated strategies to create a design that provides excellent performance for ad-hoc queries
and specific queries while using disk space efficiently. Database Designer bases the design on the
following information that you provide:
- Design type (comprehensive or incremental)
- Optimization objective (query, load, or balanced)
- K-safety
- Design queries: Typical queries that you run during normal database operations. Each query can
  be assigned a weight that indicates its relative importance so that Database Designer can
  prioritize it when creating the design. Database Designer groups queries that affect the design
  in the same way and considers each group as one weighted query when creating the design.
- Design tables that contain sample data.
- Setting that specifies that Database Designer be guided to create only unsegmented
  projections.
- Setting that specifies that Database Designer analyze statistics before creating the design.

The result of a Database Designer run is:
- A design script that creates the projections for the design in a way that meets the optimization
  objectives and distributes data uniformly across the cluster.
- A deployment script that creates and refreshes the projections for your design. For
  comprehensive designs, the deployment script contains commands that remove non-optimized
  projections. The deployment script includes the full design script.
- A backup script that contains SQL statements to deploy the design that existed on the system
  before deployment. This file is useful in case you need to revert to the pre-deployment design.

While running Database Designer, you can choose to deploy your design automatically after the
deployment script is created, or to deploy it manually, after you have reviewed and tested the
design. HP Vertica recommends that you test the design on a non-production server before
deploying the design to your production server.
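To make the design inputs concrete, here is a rough sketch of a programmatic run. The function names and argument order are assumptions based on the DESIGNER_ function family (the same prefix as DESIGNER_SET_ANALYZE_CORRELATIONS_MODE); confirm each signature in the SQL Reference Manual before use. The design name, schema, and file paths are placeholders.

```sql
-- Assumed function signatures; verify against the SQL Reference Manual.
-- example_design, public.*, and the /tmp/dbd paths are placeholders.
SELECT DESIGNER_CREATE_DESIGN('example_design');

-- Design type and optimization objective.
SELECT DESIGNER_SET_DESIGN_TYPE('example_design', 'COMPREHENSIVE');
SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('example_design', 'BALANCED');

-- Design tables (with sample data) and a file of design queries.
SELECT DESIGNER_ADD_DESIGN_TABLES('example_design', 'public.*');
SELECT DESIGNER_ADD_DESIGN_QUERIES('example_design', '/tmp/dbd/queries.sql', 'true');

-- Produce the design and deployment scripts.
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
    'example_design',
    '/tmp/dbd/example_design.sql',
    '/tmp/dbd/example_deploy.sql'
);
```

Whether the last call also deploys the design depends on optional arguments not shown here, so check its defaults before running it against a production database.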

How Database Designer Creates a Design


During the design process, Database Designer analyzes the logical schema definition, sample
data, and sample queries, and creates a physical schema (projections) in the form of a SQL script
that you deploy automatically or manually. This script creates a minimal set of superprojections to
ensure K-safety.
In most cases, the projections that Database Designer creates provide excellent query
performance within physical constraints while using disk space efficiently.
Database Designer:
- Recommends buddy projections with the same sort order, which can significantly improve
  load, recovery, and site node performance. All buddy projections have the same base name so
  that they can be identified as a group.
  Note: If you manually create projections, Database Designer recommends a buddy with the
  same sort order, if one does not already exist. By default, Database Designer recommends
  both super and non-super segmented projections with a buddy of the same sort order and
  segmentation.
- Automatically rebalances data after you add or remove nodes.
- Accepts queries as design input.
- Runs the design and deployment processes in the background.
  Running in background is useful if you have a large design that you want to run overnight. An
  active SSH session is not required, so design and deploy operations continue to run
  uninterrupted, even if the session is terminated.
- Accepts a file of sample queries to consider when creating a design. Providing this file is
  optional for comprehensive designs.
  If you do not provide this file, Database Designer recommends a generic design that does not
  consider specific queries. For incremental designs, you must provide sample queries. The query
  file can contain up to 100 queries.
- Accepts unlimited queries for a comprehensive design.
- Allows you to analyze column correlations. Correlation analysis typically only needs to be
  performed once and only if the table has more than DBDCorrelationSampleRowCount (default:
  4000) rows.
  By default, Database Designer does not analyze column correlations. To set the correlation
  analysis mode, use DESIGNER_SET_ANALYZE_CORRELATIONS_MODE.
- Identifies similar design queries and assigns them a signature.
  For queries with the same signature, Database Designer weights the queries, depending on how
  many queries have that signature. It then considers the weighted query when creating a design.
- Recommends and creates projections in a way that minimizes data skew by distributing data
  uniformly across the cluster.
  Note: Hewlett-Packard does not recommend live aggregate projections and Top-K projections.
- Produces higher quality designs by considering UPDATE, DELETE, and SELECT statements.
- Does not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR
  columns.

Who Can Run Database Designer


To use Administration Tools to run Database Designer and create an optimal database design, you
must be a DBADMIN user.
To run Database Designer programmatically or using Management Console, you must be one of
two types of users:
- A DBADMIN user
- A user who has been assigned the DBDUSER role and who owns the database tables for which
  you are creating a design

Granting and Enabling the DBDUSER Role


For a non-DBADMIN user to be able to run Database Designer using Management Console, follow
the steps described in Allowing the DBDUSER to Run Database Designer Using Management
Console.
For a non-DBADMIN user to be able to run Database Designer programmatically, follow the
steps described in Allowing the DBDUSER to Run Database Designer Programmatically.
Important: When you grant the DBDUSER role, make sure to associate a resource pool with
that user to manage resources during Database Designer runs. (For instructions about how to
associate a resource pool with a user, see User Profiles.)
Multiple users can run Database Designer concurrently without interfering with each other or
using up all the cluster resources. When a user runs Database Designer, either using the
Management Console or programmatically, its execution is mostly contained by the user's
resource pool, but may spill over into system resource pools for less-intensive tasks.
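Associating a resource pool with a DBDUSER might look like the following sketch. The pool name dbd_pool and the memory sizes are illustrative assumptions, not recommendations; new_user stands for the user who holds the DBDUSER role.

```sql
-- Run as DBADMIN. dbd_pool and the sizes below are placeholder values.
CREATE RESOURCE POOL dbd_pool MEMORYSIZE '1G' MAXMEMORYSIZE '4G';

-- Allow the user to use the pool, then make it the user's default pool.
GRANT USAGE ON RESOURCE POOL dbd_pool TO new_user;
ALTER USER new_user RESOURCE POOL dbd_pool;
```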


Allowing the DBDUSER to Run Database Designer Using Management Console


To allow a user with the DBDUSER role to run Database Designer using Management Console,
you first need to create the user on the HP Vertica server.
As DBADMIN, take these steps on the server:
1. Add a temporary folder to all cluster nodes.
=> CREATE LOCATION '/tmp/dbd' ALL NODES;

2. Create the user who needs access to Database Designer.


=> CREATE USER new_user;

3. Grant the user the privilege to create schemas on the database for which they want to create a
design.
=> GRANT CREATE ON DATABASE new_database TO new_user;

4. Grant the DBDUSER role to the new user.


=> GRANT DBDUSER TO new_user;

5. On all nodes in the cluster, grant the user access to the temporary folder.
=> GRANT ALL ON LOCATION '/tmp/dbd' TO new_user;

6. Grant the new user access to the database schema and its tables.
=> GRANT ALL ON SCHEMA user_schema TO new_user;
=> GRANT ALL ON ALL TABLES IN SCHEMA user_schema TO new_user;

After you have completed this task, you need to do the following to map the MC user to the
new_user you created in the previous steps:
1. Log in to Management Console as an MC Super user.
2. Click MC Settings.
3. Click User Management.
4. To create a new MC user, click Add. To use an existing MC user, select the user and click
Edit.
5. Next to the DB access level window, click Add.
6. In the Add Permissions window, do the following:
a. From the Choose a database drop-down list, select the database for which you want the
user to be able to create a design.
b. In the Database username field, enter the user name you created on the HP Vertica
server, new_user in this example.
c. In the Database password field, enter the password for the database you selected in step
a.
d. In the Restrict access drop-down list, select the level of MC user you want for this user.
7. Click OK to save your changes.
8. Log out of the MC Super user account.
The MC user is now mapped to the user that you created on the HP Vertica server. Log in as the
MC user and use Database Designer to create an optimized design for your database.
For more information about mapping MC users, see Mapping an MC User to a Database User's
Privileges.

Allowing the DBDUSER to Run Database Designer Programmatically


To allow a user with the DBDUSER role to run Database Designer programmatically, take these
steps:
1. The DBADMIN user must grant the DBDUSER role:
=> GRANT DBDUSER TO <username>;

This role persists until the DBADMIN user revokes it.


2. For a non-DBADMIN user to run the Database Designer programmatically or using
Management Console, one of the following two steps must happen first:
- If the user's default role is already DBDUSER, skip this step. Otherwise, the user must
  enable the DBDUSER role:
  => SET ROLE DBDUSER;
- The DBADMIN must add DBDUSER as the default role for that user:
  => ALTER USER <username> DEFAULT ROLE DBDUSER;
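To check that the role is in place before running Database Designer, the user could inspect the session's roles. This is only a sketch; it assumes the GRANT in step 1 has already been issued.

```sql
-- Roles that have been granted to the current user and can be enabled.
SHOW AVAILABLE_ROLES;

-- Enable the role for this session (not needed if it is the user's default role).
SET ROLE DBDUSER;

-- Roles currently active in the session; DBDUSER should now be listed.
SHOW ENABLED_ROLES;
```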

DBDUSER Capabilities and Limitations


The DBDUSER role has the following capabilities and limitations:
- A DBDUSER cannot create a design with a K-safety less than the system K-safety. If the
  designs violate the current K-safety by not having enough buddy projections for the tables, the
  design does not complete.
- A DBDUSER cannot explicitly change the ancient history mark (AHM), even during deployment
  of their design.

When you create a design, you automatically have privileges to manipulate the design. Other tasks
may require that the DBDUSER have additional privileges:
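To...                                    | DBDUSER must have...
-----------------------------------------+--------------------------------------------------------------
Add design tables                        | - USAGE privilege on the design table schema
                                         | - OWNER privilege on the design table
-----------------------------------------+--------------------------------------------------------------
Add a single design query                | - Privilege to execute the design query
-----------------------------------------+--------------------------------------------------------------
Add a file of design queries             | - Read privilege on the storage location that contains the
                                         |   query file
                                         | - Privilege to execute all the queries in the file
-----------------------------------------+--------------------------------------------------------------
Add design queries from the result       | - Privilege to execute the user query
of a user query                          | - Privilege to execute each design query retrieved from
                                         |   the results of the user query
-----------------------------------------+--------------------------------------------------------------
Create the design and deployment scripts | - WRITE privilege on the storage location of the design
                                         |   script
                                         | - WRITE privilege on the storage location of the
                                         |   deployment script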

Workflow for Running Database Designer


HP Vertica provides three ways to run Database Designer:
- Using Management Console to Create a Design
- Using Administration Tools to Create a Design
- About Running Database Designer Programmatically

The following workflow is common to all these ways to run Database Designer:

[Figure: workflow for running Database Designer]

Logging Projection Data for Database Designer


When you run Database Designer, the Optimizer proposes a set of ideal projections based on the
options that you specify. When you deploy the design, Database Designer creates the design
based on these projections. However, space or budget constraints may prevent Database Designer
from creating all the proposed projections. In addition, Database Designer may not be able to
implement the projections using ideal criteria.
To get information about the projections, first enable the Database Designer logging capability.
When enabled, Database Designer stores information about the proposed projections in two Data
Collector tables. After Database Designer deploys the design, these logs contain information about
which proposed projections were actually created. After deployment, the logs contain information
about:
- Projections that the Optimizer proposed
- Projections that Database Designer actually created when the design was deployed
- Projections that Database Designer created, but not with the ideal criteria that the Optimizer
  identified
- The DDL used to create all the projections
- Column optimizations

If you do not deploy the design immediately, review the log to determine if you want to make any
changes. If the design has been deployed, you can still manually create some of the projections
that Database Designer did not create.
To enable the Database Designer logging capability, see Enabling Logging for Database Designer.
To view the logged information, see Viewing Database Designer Logs.

Enabling Logging for Database Designer


By default, Database Designer does not log information about the projections that theOptimizer
proposed and the Database Designer deploys.
To enable Database Designer logging, enter the following command:
=> ALTER DATABASE mydb SET DBDLogInternalDesginProcess = 1;

To disable Database Designer logging, enter the following command:

HP Vertica Analytics Platform (7.1.x)

Page 422 of 5055

HP Vertica Documentation

=> ALTER DATABASE mydb SET DBDLogInternalDesignProcess = 0;
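
To confirm the current setting, one option is to query the CONFIGURATION_PARAMETERS system table. The following is a minimal sketch. It assumes that the logging parameter is exposed in that table, and it matches on a name prefix rather than on the full parameter name:

```sql
-- Assumption: the Database Designer logging parameter is listed in
-- V_MONITOR.CONFIGURATION_PARAMETERS. Internal parameters may not appear there.
SELECT parameter_name, current_value, default_value
FROM v_monitor.configuration_parameters
WHERE parameter_name ILIKE 'DBDLogInternal%';
```

If the query returns no rows, the parameter is probably not visible through this table in your version.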

For more information about logging, see:



Logging Projection Data for Database Designer

Viewing Database Designer Logs

Viewing Database Designer Logs


You can find data about the projections that Database Designer considered and deployed in two
Data Collector tables:

DC_DESIGN_PROJECTION_CANDIDATES

DC_DESIGN_QUERY_PROJECTION_CANDIDATES

DC_DESIGN_PROJECTION_CANDIDATES
The DC_DESIGN_PROJECTION_CANDIDATES table contains information about all the
projections that the Optimizer proposed. This table also includes the DDL that creates them. The
is_a_winner field indicates if that projection was part of the actual deployed design. To view the
DC_DESIGN_PROJECTION_CANDIDATES table, enter:
=> SELECT * FROM DC_DESIGN_PROJECTION_CANDIDATES;
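
Because the table holds one row per proposed projection, a narrower query can isolate the projections that the Optimizer proposed but Database Designer did not create. This is a sketch only. The column names come from the example data in this chapter, and it assumes that is_a_winner is a BOOLEAN column:

```sql
-- Proposed projections that were not part of the deployed design.
-- Assumption: is_a_winner is BOOLEAN (it displays as t or f in vsql).
SELECT projection_name, projection_statement
FROM dc_design_projection_candidates
WHERE NOT is_a_winner
ORDER BY projection_name;
```

The projection_statement column holds the DDL you would run to create one of these projections manually.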

DC_DESIGN_QUERY_PROJECTION_CANDIDATES
The DC_DESIGN_QUERY_PROJECTION_CANDIDATES table lists plan features for all design
queries.
Possible features are:

FULLY DISTRIBUTED JOIN

MERGE JOIN

GROUP BY PIPE

FULLY DISTRIBUTED GROUPBY

RLE PREDICATE


VALUE INDEX PREDICATE

LATE MATERIALIZATION

For all design queries, the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table includes
the following plan feature information:

Optimizer path cost.

Database Designer benefits.

Ideal plan feature and its description, which identifies how the referenced projection should be
optimized.

If the design was deployed, the table also includes the actual plan feature and its description.
This information identifies how the referenced projection was actually optimized.

Because most projections have multiple optimizations, each projection usually has multiple
rows. To view the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table, enter:
=> SELECT * FROM DC_DESIGN_QUERY_PROJECTION_CANDIDATES;
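
Because each projection usually has several rows, it can help to summarize the table rather than read it row by row. The following sketch counts the ideal plan features recorded for each projection. The column names are taken from the example data in this chapter:

```sql
-- One output row per projection and ideal plan feature, with the number
-- of design-query rows that recommended that feature.
SELECT projection_id, ideal_plan_feature, COUNT(*) AS feature_rows
FROM dc_design_query_projection_candidates
GROUP BY projection_id, ideal_plan_feature
ORDER BY projection_id, ideal_plan_feature;
```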

To see example data from these tables, see Database Designer Logs: Example Data.

Database Designer Logs: Example Data


In the following example, Database Designer created the logs after creating a comprehensive
design for the VMart sample database. The output shows two records from the
DC_DESIGN_PROJECTION_CANDIDATES table.
The first record contains information about the
customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$ projection. The record
includes the CREATE PROJECTION statement that Database Designer used to create the projection.
The is_a_winner column is t, indicating that Database Designer created this projection when it
deployed the design.
The second record contains information about the
product_dimension_dbd_2_sort_$product_version$__$product_key$ projection. For this projection,
the is_a_winner column is f. The Optimizer recommended that Database Designer create this
projection as part of the design. However, Database Designer did not create the projection when
it deployed the design. The log includes the DDL for the CREATE PROJECTION statement. If you
want to add the projection manually, you can use that DDL. For more information, see Creating a
Design Manually.


=> SELECT * FROM dc_design_projection_candidates;
-[ RECORD 1 ]--------+---------------------------------------------------------------
time                 | 2014-04-11 06:30:17.918764-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720620
projection_id        | 45035996273726626
iteration_number     | 1
projection_name      | customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$"
(
customer_key ENCODING AUTO,
customer_type ENCODING AUTO,
customer_name ENCODING AUTO,
customer_gender ENCODING RLE,
title ENCODING AUTO,
household_id ENCODING AUTO,
customer_address ENCODING AUTO,
customer_city ENCODING AUTO,
customer_state ENCODING AUTO,
customer_region ENCODING AUTO,
marital_status ENCODING AUTO,
customer_age ENCODING AUTO,
number_of_children ENCODING AUTO,
annual_income ENCODING AUTO,
occupation ENCODING AUTO,
largest_bill_amount ENCODING AUTO,
store_membership_card ENCODING AUTO,
customer_since ENCODING AUTO,
deal_stage ENCODING AUTO,
deal_size ENCODING AUTO,
last_deal_update ENCODING AUTO
)
AS
SELECT customer_key,
customer_type,
customer_name,
customer_gender,
title,
household_id,
customer_address,
customer_city,
customer_state,
customer_region,
marital_status,
customer_age,
number_of_children,
annual_income,
occupation,
largest_bill_amount,
store_membership_card,
customer_since,
deal_stage,


deal_size,
last_deal_update
FROM public.customer_dimension
ORDER BY customer_gender,
annual_income
UNSEGMENTED ALL NODES;
is_a_winner          | t
-[ RECORD 2 ]--------+---------------------------------------------------------------
time                 | 2014-04-11 06:30:17.961324-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720624
projection_id        | 45035996273726714
iteration_number     | 1
projection_name      | product_dimension_dbd_2_sort_$product_version$__$product_key$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."product_dimension_dbd_2_sort_$product_version$__$product_key$"
(
product_key ENCODING AUTO,
product_version ENCODING RLE,
product_description ENCODING AUTO,
sku_number ENCODING AUTO,
category_description ENCODING AUTO,
department_description ENCODING AUTO,
package_type_description ENCODING AUTO,
package_size ENCODING AUTO,
fat_content ENCODING AUTO,
diet_type ENCODING AUTO,
weight ENCODING AUTO,
weight_units_of_measure ENCODING AUTO,
shelf_width ENCODING AUTO,
shelf_height ENCODING AUTO,
shelf_depth ENCODING AUTO,
product_price ENCODING AUTO,
product_cost ENCODING AUTO,
lowest_competitor_price ENCODING AUTO,
highest_competitor_price ENCODING AUTO,
average_competitor_price ENCODING AUTO,
discontinued_flag ENCODING AUTO
)
AS
SELECT product_key,
product_version,
product_description,
sku_number,
category_description,
department_description,
package_type_description,
package_size,
fat_content,
diet_type,
weight,
weight_units_of_measure,
shelf_width,


shelf_height,
shelf_depth,
product_price,
product_cost,
lowest_competitor_price,
highest_competitor_price,
average_competitor_price,
discontinued_flag
FROM public.product_dimension
ORDER BY product_version,
product_key
UNSEGMENTED ALL NODES;
is_a_winner          | f
.
.
.

The next example shows the contents of two records in the
DC_DESIGN_QUERY_PROJECTION_CANDIDATES table. Both of these rows apply to projection id
45035996273726626.
In the first record, the Optimizer recommends that Database Designer optimize the
customer_gender column for the GROUP BY PIPE algorithm.
In the second record, the Optimizer recommends that Database Designer optimize the
public.customer_dimension table for late materialization. Late materialization can improve the
performance of joins that might spill to disk.
=> SELECT * FROM dc_design_query_projection_candidates;
-[ RECORD 1 ]------------------+-----------------------------------------------------------
time                           | 2014-04-11 06:30:17.482377-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | GROUP BY PIPE
ideal_plan_feature_description | Group-by pipelined on column(s) customer_gender
dbd_benefits                   | 5
opt_path_cost                  | 211
-[ RECORD 2 ]------------------+-----------------------------------------------------------
time                           | 2014-04-11 06:30:17.48276-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3


iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | LATE MATERIALIZATION
ideal_plan_feature_description | Late materialization on table public.customer_dimension
dbd_benefits                   | 4
opt_path_cost                  | 669
.
.
.

You can view the actual plan features that Database Designer implemented for the projections it
created. To do so, query the V_INTERNAL.DC_DESIGN_QUERY_PROJECTIONS table:
=> select * from v_internal.dc_design_query_projections;
-[ RECORD 1 ]-------------------+------------------------------------------------------------
time                            | 2014-04-11 06:31:41.19199-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | RLE PREDICATE
actual_plan_feature_description | RLE on predicate column(s) department_description
dbd_benefits                    | 2
opt_path_cost                   | 141
-[ RECORD 2 ]-------------------+------------------------------------------------------------
time                            | 2014-04-11 06:31:41.192292-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | GROUP BY PIPE
actual_plan_feature_description | Group-by pipelined on column(s) fat_content
dbd_benefits                    | 5
opt_path_cost                   | 155

Specifying Parameters for Database Designer


Before you run Database Designer to create a design, provide information that allows Database
Designer to create the optimal physical schema:


Design Name

Design Types

Optimization Objectives

Design Tables with Sample Data

Design Queries

K-Safety for Design

Replicated and Unsegmented Projections

Statistics Analysis

Design Name
All designs that Database Designer creates must have a name that you specify. The design name
can contain only alphanumeric and underscore (_) characters, and can be no more than 32
characters long. (Administration Tools and Management Console limit the design name to 16
characters.)
The design name becomes part of the files that Database Designer generates, including the
deployment script, allowing the files to be easily associated with a particular Database Designer
run.
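
When you run Database Designer programmatically, the name is supplied when the design is created. The following sketch first checks a candidate name against the rule above and then creates the design. The name VMART_DESIGN is a hypothetical example:

```sql
-- Check the candidate name: letters, digits, and underscores only,
-- at most 32 characters. Use {1,16} for the 16-character tool limit.
SELECT REGEXP_LIKE('VMART_DESIGN', '^[A-Za-z0-9_]{1,32}$') AS name_is_valid;

-- Create the design under that name (hypothetical design name).
SELECT DESIGNER_CREATE_DESIGN('VMART_DESIGN');
```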

Design Types
The Database Designer can create two distinct design types. The design you choose depends on
what you are trying to accomplish:

Comprehensive Design

Incremental Design

Comprehensive Design
A comprehensive design creates an initial or replacement design for all the tables in the specified
schemas. Create a comprehensive design when you are creating a new database.
To help Database Designer create an efficient design, load representative data into the tables
before you begin the design process. When you load data into a table, HP Vertica creates an
unoptimized superprojection so that Database Designer has projections to optimize. If a table has
no data, Database Designer cannot optimize it.


Optionally, supply Database Designer with representative queries that you plan to use so Database
Designer can optimize the design for them. If you do not supply any queries, Database Designer
creates a generic optimization of the superprojections that minimizes storage, with no
query-specific projections.
During a comprehensive design, Database Designer creates deployment scripts that:

Create new projections to optimize query performance, only when they do not already exist.

Create replacement buddy projections when Database Designer changes the encoding of pre-existing projections that it has decided to keep.

Incremental Design
An incremental design creates an enhanced design with additional projections, if required, that are
optimized specifically for the queries that you provide. Create an incremental design when you have
one or more queries that you want to optimize.
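
Programmatically, the design type is set on an existing design. This is a minimal sketch that assumes a design named VMART_DESIGN (a hypothetical name) has already been created:

```sql
-- Initial or replacement design for all tables in the specified schemas.
SELECT DESIGNER_SET_DESIGN_TYPE('VMART_DESIGN', 'COMPREHENSIVE');

-- Alternative: add projections for specific queries only.
-- SELECT DESIGNER_SET_DESIGN_TYPE('VMART_DESIGN', 'INCREMENTAL');
```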

Optimization Objectives
When creating a design, Database Designer can optimize the design for one of three objectives:
Load: Database Designer creates a design that is optimized for loads, minimizing database size,
potentially at the expense of query performance.

Performance: Database Designer creates a design that is optimized for fast query performance.
Because it optimizes for query performance, this design might recommend more projections than
the Load or Balanced objectives, potentially resulting in a larger database storage size.

Balanced: Database Designer creates a design whose objectives are balanced between
database size and query performance.
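
When running programmatically, the objective is set with DESIGNER_SET_OPTIMIZATION_OBJECTIVE. The sketch below assumes a hypothetical design named VMART_DESIGN. It also assumes that the function's QUERY policy corresponds to the Performance objective described above:

```sql
-- Accepted policies: 'QUERY', 'LOAD', or 'BALANCED'.
-- Assumption: 'QUERY' is the programmatic name for the Performance objective.
SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('VMART_DESIGN', 'BALANCED');
```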

Design Tables with Sample Data


You must specify one or more design tables for Database Designer to deploy a design. If your
schema is empty, it does not appear as a design table option.
When you specify design tables, consider the following:

To create the most efficient projections for your database, load a moderate amount of
representative data into tables before running Database Designer. Database Designer considers
the data in this table when creating the design.


If your design tables have a large amount of data, the Database Designer run takes a long time; if
your tables have too little data, the design is not optimized. HP Vertica recommends 10 GB of
sample data as sufficient for creating an optimal design.

If you submit a design table with no data, Database Designer ignores it.

If one of your design tables has been dropped, you will not be able to build or deploy your design.
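
The considerations above can be checked before you add tables programmatically. The following sketch assumes the VMart sample schema and a hypothetical design named VMART_DESIGN:

```sql
-- Confirm that a design table holds sample data;
-- Database Designer ignores tables that have no data.
SELECT COUNT(*) AS row_count FROM public.customer_dimension;

-- Add every table in the public schema to the design.
-- The third argument requests statistics analysis as the tables are added.
SELECT DESIGNER_ADD_DESIGN_TABLES('VMART_DESIGN', 'public.*', 'true');
```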

Design Queries
If you supply Database Designer with representative queries that you run on your database, it
optimizes the performance of those queries.
If you are creating an incremental design, you must supply design queries; if you are creating a
comprehensive design, HP Vertica recommends you supply design queries to create an optimal
design.
Database Designer checks the validity of all queries when you add them to your design and again
when it builds the design. If a query is invalid, Database Designer ignores it.

Query Repository
Using Management Console, you can submit design queries from the QUERY_REQUESTS
system table. This is called the query repository.
The QUERY_REQUESTS table contains queries that users have run recently. For a
comprehensive design, you can submit up to 200 queries from the QUERY_REQUESTS table to
Database Designer to be considered when creating the design. For an incremental design, you can
submit up to 100 queries from the QUERY_REQUESTS table.
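
Outside Management Console, you can load design queries from a file and inspect the query repository directly. This is a sketch under assumptions: the design name and file path are hypothetical, and the ordering by duration is only one possible way to choose candidate queries:

```sql
-- Add all valid queries from a file to the design (hypothetical path).
SELECT DESIGNER_ADD_DESIGN_QUERIES('VMART_DESIGN', '/tmp/examples/vmart_queries.sql', 'true');

-- Inspect recent successful queries in the query repository.
-- 200 matches the limit for a comprehensive design; use 100 for incremental.
SELECT request, request_duration_ms
FROM v_monitor.query_requests
WHERE request_type = 'QUERY' AND success
ORDER BY request_duration_ms DESC
LIMIT 200;
```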

K-Safety for Design


When you create a comprehensive design, you can set a K-safety value for your design. Valid
values are 0, 1, and 2. The value you specify is limited by the maximum K-safety allowed by the
number of nodes in your cluster.
Note: If you are not a DBADMIN user, you cannot set the design K-safety to a value less than
the system K-safety.
The default K-safety is as follows:


If your cluster has one or two nodes, the default K-safety is 0.

If your cluster has three or more nodes, the default K-safety is 1.

For a comprehensive design, you can make the following changes to the design K-safety before
deploying the design:
If your cluster has one or two nodes, you cannot change the K-safety.

If your cluster has three or four nodes, you can change the K-safety to 1 or 0.

If your cluster has five or more nodes, you can change the K-safety to 2, 1, or 0.

You cannot change the K-safety value of an incremental design. Incremental designs assume the
K-safety value of your cluster.
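
Before choosing a value, it can help to check the cluster's node count and current K-safety. The sketch below reads them from the SYSTEM table and then sets the design K-safety for a hypothetical comprehensive design named VMART_DESIGN:

```sql
-- Node count and K-safety of the running cluster.
SELECT node_count, designed_fault_tolerance, current_fault_tolerance
FROM v_monitor.system;

-- Set the design K-safety. The value must be one that the node count
-- allows, as listed above.
SELECT DESIGNER_SET_DESIGN_KSAFETY('VMART_DESIGN', 1);
```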

K-Safety Requirements
When creating projections with the Database Designer, projection definitions that meet K-Safe
design requirements are recommended and marked with the K-safety level. Note the output from
running the optimized design script generated by the Database Designer in the following example:
=> \i VMart_Schema_design_opt_1.sql
CREATE PROJECTION
CREATE PROJECTION
  mark_design_ksafe
----------------------
 Marked design 1-safe
(1 row)

For more information about K-safety, see K-Safety in the Concepts Guide.

Replicated and Unsegmented Projections


When creating a comprehensive design, Database Designer creates projections based on data
statistics and queries. It also reviews the submitted design tables to decide whether projections
should be segmented (distributed across the cluster nodes) or replicated (duplicated on all cluster
nodes).
For detailed information, see the following sections:

Replicated Projections

Unsegmented Projections


Replicated Projections
Replication occurs when HP Vertica stores identical copies of data across all nodes in a cluster.
If you are running on a single-node database, all projections are replicated because segmentation is
not possible in a single-node database.
Assuming that largest-row-count equals the number of rows in the design table with the largest
number of rows, Database Designer recommends that a projection be replicated if any one of the
following is true:
Condition 1: largest-row-count < 1,000,000 and number of rows in the table <= 10% of largest-row-count.

Condition 2: largest-row-count >= 10,000,000 and number of rows in the table <= 1% of largest-row-count.

Condition 3: The number of rows in the table <= 100,000.

For more information about replication, see High Availability With Projections in the Concepts
Guide.
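
The three conditions can be expressed as a single CASE expression. The following sketch applies them to inline sample values. The table names and row counts are made up for illustration, and the query only mirrors the documented rule; it is not how Database Designer is implemented:

```sql
-- Hypothetical design tables and row counts.
WITH design_tables AS (
    SELECT 'store_sales_fact' AS table_name, 50000000 AS row_cnt
    UNION ALL SELECT 'customer_dimension', 400000
    UNION ALL SELECT 'date_dimension', 1826
),
sized AS (
    -- largest_row_count = row count of the largest design table
    SELECT table_name, row_cnt, MAX(row_cnt) OVER () AS largest_row_count
    FROM design_tables
)
SELECT table_name, row_cnt,
       CASE
           WHEN largest_row_count < 1000000
                AND row_cnt <= 0.10 * largest_row_count THEN 'replicate (condition 1)'
           WHEN largest_row_count >= 10000000
                AND row_cnt <= 0.01 * largest_row_count THEN 'replicate (condition 2)'
           WHEN row_cnt <= 100000                       THEN 'replicate (condition 3)'
           ELSE 'segment'
       END AS recommendation
FROM sized
ORDER BY row_cnt DESC;
```

When more than one condition holds for a table, the CASE expression reports only the first match. The recommendation is the same either way.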

Unsegmented Projections
Segmentation occurs when HP Vertica distributes data evenly across multiple database nodes so
that all nodes participate in query execution. Projection segmentation provides high availability and
recovery, and optimizes query execution.
When running Database Designer programmatically or using Management Console, you can
allow Database Designer to recommend unsegmented projections in the design. If you
do not, Database Designer recommends only segmented projections.
Database Designer recommends segmented superprojections for large tables when deploying to
multiple node clusters, and recommends replicated superprojections for smaller tables.
Database Designer does not segment projections on:

Single-node clusters

LONG VARCHAR and LONG VARBINARY columns

For more information about segmentation, see High Availability With Projections in the Concepts
Guide.
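
When running programmatically, this choice is made with DESIGNER_SET_PROPOSE_UNSEGMENTED_PROJECTIONS. This is a minimal sketch that assumes a hypothetical design named VMART_DESIGN:

```sql
-- Allow Database Designer to recommend unsegmented projections.
-- Pass false to restrict the design to segmented projections.
SELECT DESIGNER_SET_PROPOSE_UNSEGMENTED_PROJECTIONS('VMART_DESIGN', true);
```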


Statistics Analysis
By default, Database Designer analyzes statistics for the design tables when adding them to the
design. Analyzing statistics is optional, but HP Vertica recommends it because
accurate statistics help Database Designer optimize compression and query performance.
Analyzing statistics takes time and resources. If the current statistics for the design tables are up to
date, you do not need to analyze them again. When in doubt, analyze the statistics to make sure they
are current.
For more information, see Collecting Statistics.
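
If you skip statistics analysis in Database Designer, you can collect statistics yourself beforehand. The sketch below uses a table from the VMart sample schema as an example:

```sql
-- Collect statistics for one design table before running Database Designer.
SELECT ANALYZE_STATISTICS('public.customer_dimension');
```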

Building a Design
After you have created design tables and loaded data into them, and then specified the parameters
you want Database Designer to use when creating the physical schema, direct Database Designer
to create the scripts necessary to build the design.
Note: You cannot stop a running database if Database Designer is building a database design.
When you build a database design, HP Vertica generates two scripts:
Deployment script: <design_name>_deploy.sql. Contains the SQL statements that create
projections for the design you are deploying, deploy the design, and drop unused projections.
When the deployment script runs, it creates the optimized design. For details about how to run
this script and deploy the design, see Deploying a Design.

Design script: <design_name>_design.sql. Contains the
CREATE PROJECTION statements that Database Designer uses to create the design. Review
this script to make sure you are happy with the design.
The design script is a subset of the deployment script. It serves as a backup of the DDL for the
projections that the deployment script creates.

If you run Database Designer from Administration Tools, HP Vertica also creates a backup
script named <design_name>_projection_backup_<unique id #>.sql. This script contains
SQL statements to deploy the design that existed on the system before deployment. This file is
useful in case you need to revert to the pre-deployment design.
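
When running programmatically, you can build the two scripts without deploying them. The following sketch assumes a hypothetical design named VMART_DESIGN and hypothetical output paths. The argument order follows the function's documented signature as best recalled here, so check it against the SQL Reference Manual for your version:

```sql
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
    'VMART_DESIGN',
    '/tmp/examples/vmart_design_files/vmart_design.sql',  -- design script (hypothetical path)
    '/tmp/examples/vmart_design_files/vmart_deploy.sql',  -- deployment script (hypothetical path)
    'true',   -- analyze statistics
    'false',  -- deploy: build the scripts only
    'false',  -- drop the design workspace afterward
    'false'   -- continue after an error
);
```

Leaving the deploy argument at 'false' gives you the chance to review the design script before anything changes in the database.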
When you create a design using Management Console:


If you submit a large number of queries to your design and build it immediately, a timing
issue could cause the queries not to load before deployment starts. If this occurs, you may see
one of the following errors:

No queries to optimize for

No tables to design projections for

To accommodate this timing issue, you may need to reset the design, check the Queries tab to
make sure the queries have been loaded, and then rebuild the design. Detailed instructions are
in:

Using the Wizard to Create a Design

Creating a Design Manually

The scripts are deleted when deployment completes. To save a copy of the deployment script
after the design is built but before the deployment completes, go to the Output window and copy
and paste the SQL statements to a file.

Resetting a Design
You must reset a design when:

You build a design and the output scripts described in Building a Design are not created.

You build a design but Database Designer cannot complete the design because the queries it
expects are not loaded.

Resetting a design discards all the run-specific information of the previous Database Designer
build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables
and queries.
After you reset a design, review the design to see what changes you need to make. For example,
you can fix errors, change parameters, or check for and add additional tables or queries. Then you
can rebuild the design.
You can reset a design only in Management Console or by using the DESIGNER_RESET_DESIGN
function.
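
A programmatic reset is a single call. This is a minimal sketch that assumes a hypothetical design named VMART_DESIGN:

```sql
-- Discard the run-specific information of the previous build.
-- The design keeps its configuration, tables, and queries.
SELECT DESIGNER_RESET_DESIGN('VMART_DESIGN');
```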


Deploying a Design
After running Database Designer to generate a deployment script, HP Vertica recommends that
you test your design on a non-production server before you deploy it to your production server.
Both the design and deployment processes run in the background. This is useful if you have a large
design that you want to run overnight. Because an active SSH session is not required, the
design/deploy operations continue to run uninterrupted, even if the session is terminated.
Note: You cannot stop a running database if Database Designer is building or deploying a
database design.
Database Designer runs as a background process. Multiple users can run Database Designer
concurrently without interfering with each other or using up all the cluster resources. However, if
multiple users are deploying a design on the same tables at the same time, Database Designer may
not be able to complete the deployment. To avoid problems, consider the following:
Schedule potentially conflicting Database Designer processes to run sequentially overnight so
that there are no concurrency problems.

Avoid scheduling Database Designer runs on the same set of tables at the same time.

There are two ways to deploy your design:

Deploying Designs Using Database Designer

Deploying Designs Manually

Deploying Designs Using Database Designer


HP recommends that you run Database Designer and deploy optimized projections right after
loading your tables with sample data because Database Designer provides projections optimized
for the current state of your database.
If you choose to allow Database Designer to automatically deploy your script during a
comprehensive design and are running Administrative Tools, Database Designer creates a backup
script of your database's current design. This script helps you re-create the design of projections
that may have been dropped by the new design. The backup script is located in the output directory
you specified during the design process.


If you choose not to have Database Designer automatically run the deployment script (for example,
if you want to maintain projections from a pre-existing deployment), you can manually run the
deployment script later. See Deploying Designs Manually.
To deploy a design while running Database Designer, do one of the following:
In Management Console, select the design and click Deploy Design.

In the Administration Tools, select Deploy design in the Design Options window.

If you are running Database Designer programmatically, use
DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy parameter to 'true'.
Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps
that the deployment took:
vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;

Deploying Designs Manually


If you chose not to have Database Designer deploy your design at design time, you can deploy the
design later using the deployment script:
1. Make sure that you have a database that contains the same tables and projections as the
database on which you ran Database Designer. The database should also contain sample
data.
2. To deploy the projections to a test or production environment, use the following vsql command
to execute the deployment script, where <design_name> is the name of the database design:
=> \i <design_name>_deploy.sql
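
After the script finishes, you may want to confirm that the projections for a design table exist and are up to date. The following sketch queries the PROJECTIONS system table. The table name customer_dimension comes from the VMart sample schema and stands in for one of your own design tables:

```sql
-- List projections anchored on one design table and their refresh state.
SELECT projection_name, is_up_to_date
FROM v_catalog.projections
WHERE anchor_table_name = 'customer_dimension'
ORDER BY projection_name;
```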

How to Create a Design


There are three ways to create a design using Database Designer:

From Management Console, open a database and select the Design page at the bottom of the
window.
For details about using Management Console to create a design, see Using Management
Console to Create a Design.


Programmatically, using the techniques described in About Running Database Designer
Programmatically in the Analyzing Data Guide. To run Database Designer programmatically,
you must be a DBADMIN or have been granted the DBDUSER role and enabled that role.

From the Administration Tools menu, by selecting Configuration Menu > Run Database
Designer. You must be a DBADMIN user to run Database Designer from the Administration
Tools.
For details about using Administration Tools to create a design, see Using Administration Tools
to Create a Design.
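
For the programmatic route, granting and enabling the DBDUSER role can be sketched as follows. The user name dbd_analyst is hypothetical, and a non-DBADMIN user may need additional privileges that are not shown here:

```sql
-- Run as a DBADMIN user: grant the role to a hypothetical user.
GRANT DBDUSER TO dbd_analyst;

-- Run as that user: enable the role for the current session.
SET ROLE DBDUSER;
```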

The following table shows what Database Designer capabilities are available in each tool:

                                                                  Running Database
                                                     Management   Designer           Administration
Database Designer Capability                         Console      Programmatically   Tools
---------------------------------------------------  -----------  -----------------  --------------
Create design                                        Yes          Yes                Yes
Design name length (# of characters)                 16           32                 16
Build design (create design and deployment scripts)  Yes          Yes                Yes
Create backup script                                                                 Yes
Set design type (comprehensive or incremental)       Yes          Yes                Yes
Set optimization objective                           Yes          Yes                Yes
Add design tables                                    Yes          Yes                Yes
Add design queries file                              Yes          Yes                Yes
Add single design query                                           Yes
Use query repository                                 Yes          Yes
Set K-safety                                         Yes          Yes                Yes
Analyze statistics                                   Yes          Yes                Yes
Require all unsegmented projections                  Yes          Yes
View event history                                   Yes          Yes
Set correlation analysis mode (Default = 0)                       Yes

Using Management Console to Create a Design


To use Management Console to create an optimized design for your database, you must be a
DBADMIN user or have been assigned the DBDUSER role.
Management Console provides two ways to create a design:

Wizard: This option walks you through the process of configuring a new design. Click Back
and Next to navigate through the Wizard steps, or Cancel to cancel creating a new design.
To learn how to use the Wizard to create a design, see Using the Wizard to Create a Design.

Manual: This option creates and saves a design with the default parameters.
To learn how to create a design manually, see Creating a Design Manually.

Tip: If you have many design tables that you want Database Designer to consider, it might be
easier to use the Wizard to create your design. In the Wizard, you can submit all the tables in a
schema at once; creating a design manually requires that you submit the design tables one at a
time.

Using the Wizard to Create a Design


Take these steps to create a design using the Management Console's Wizard:
1. Log in to Management Console, select and start your database, and click Design at the bottom
of the window. The Database Designer window appears. If there are no existing designs, the
New Design window appears.


The left side of the Database Designer window lists the database designs you own, with the
most recent design you worked on selected. That pane also lists the current status of the
design.
The main pane contains details about the selected design.

2. To create a new design, click New Design.


3. Enter a name for your design, and click Wizard.
For more information, see Design Name.
4. Navigate through the Wizard using the Back and Next buttons.


5. To build the design immediately after exiting the Wizard, on the Execution Options window,
select Auto-build.

Important: Hewlett-Packard does not recommend that you auto-deploy the design from
the Wizard. There may be a delay in adding the queries to the design, so if the design is
deployed but the queries have not yet loaded, deployment may fail. If this happens, reset
the design, check the Queries tab to make sure the queries have been loaded, and deploy
the design.
6. When you have entered all the information, the Wizard displays a summary of your choices.
Click Submit Design to build your design.

Creating a Design Manually


To create a design using Management Console and specify the configuration, take these steps.


1. Log in to Management Console, select and start your database, and click Design at the bottom
of the window. The Database Designer window appears.
The left side of the Database Designer window lists the database designs you own, with the
most recent design you worked on highlighted. That pane also lists the current status of the
design.
The main pane contains details about the selected design.

2. To create a new design, click New Design.


3. Enter a name for your design and select Manual.


After a few seconds, the main Database Design window opens, displaying the default design
parameters. HP Vertica has created and saved a design with the name you specified, and
assigned it the default parameters.
For more information, see Design Name.
4. On the General window, modify the design type, optimization objectives, K-safety, Analyze
Correlations Mode, and the setting that allows Database Designer to create unsegmented
projections.
If you choose Incremental, the design automatically optimizes for the desired queries, and the
K-safety defaults to the value of the cluster K-safety; you cannot change these values for an
incremental design.
Analyze Correlations Mode determines if Database Designer analyzes and considers column
correlations when creating the design. For more information, see DESIGNER_SET_
ANALYZE_CORRELATIONS_MODE.
5. Click the Tables tab. You must submit tables to your design.
6. To add tables of sample data to your design, click Add Tables. A list of available tables
appears; select the tables you want and click Save. To remove tables from your design, select
the tables you want to remove and click Remove Selected.
If a design table has been dropped from the database, a red circle with a white exclamation
point appears next to the table name. You cannot build or deploy a design while it contains
dropped tables, so select the dropped tables and click Remove Selected before you continue.
7. Click the Queries tab. To add queries to your design, do one of the following:
• To add queries from the QUERY_REQUESTS system table, click Query Repository, select the
desired queries, and click Save. All valid queries that you selected appear in the Queries
window.
• To add queries from a file, select Choose File. All valid queries in the file that you select are
added to the design and appear in the Queries window.

Database Designer checks the validity of the queries when you add the queries to the design
and again when you build the design. If it finds invalid queries, it ignores them.


If you have a large number of queries, it may take time to load them. Make sure that all the
queries you want Database Designer to consider when creating the design are listed in the
Queries window.
8. Once you have specified all the parameters for your design, you should build the design. To do
this, select your design and click Build Design.
9. Select Analyze Statistics if you want Database Designer to analyze the statistics before
building the design.
For more information, see Statistics Analysis.
10. If you do not need to review the design before deploying it, select Deploy Immediately.
Otherwise, leave that option unselected.
11. Click Start. On the left-hand pane, the status of your design displays as Building until it is
complete.
12. To follow the progress of a build, click Event History. Status messages appear in this window
and you can see the current phase of the build operation. The information in the Event History
tab contains data from the OUTPUT_EVENT_HISTORY system table.
13. When the build completes, the left-hand pane displays Built. To view the deployment script,
select your design and click Output.
14. After you deploy the design using Management Console, the deployment script is deleted. To
keep a permanent copy of the deployment script, copy and paste the SQL commands from the
Output window to a file.
15. Once you have reviewed your design and are ready to deploy it, select the design and click
Deploy Design.
16. To follow the progress of the deployment, click Event History. Status messages appear in this
window and you can see the current phase of the deployment operation.
In the Event History window, while the design is running, you can do one of the following:
• Click the blue button next to the design name to refresh the event history listing.
• Click Cancel Design Run to cancel the design in progress.
• Click Force Delete Design to cancel and delete the design in progress.


17. When the deployment completes, the left-hand pane displays Deployment Completed. To
view the deployment script, select your design and click Output.
Your database is now optimized according to the parameters you set.

Using Administration Tools to Create a Design


To use the Administration Tools interface to create an optimized design for your database, you
must be a DBADMIN user. Follow these steps:
1. Log in as the dbadmin user and start Administration Tools.
2. From the main menu, start the database for which you want to create a design. The database
must be running before you can create a design for it.
3. On the main menu, select Configuration Menu and click OK.
4. On the Configuration Menu, select Run Database Designer and click OK.
5. On the Select a database to design window, enter the name of the database for which you
are creating a design and click OK.
6. On the Enter the directory for Database Designer output window, enter the full path to the
directory to contain the design script, deployment script, backup script, and log files, and click
OK.
For information about the scripts, see Building a Design.
7. On the Database Designer window, enter a name for the design and click OK.
For more information about design names, see Design Name.
8. On the Design Type window, choose which type of design to create and click OK.
For a description of the design types, see Design Types.
9. The Select schema(s) to add to query search path window lists all the schemas in the
database that you selected. Select the schemas that contain representative data that you want
Database Designer to consider when creating the design and click OK.
For more information about choosing schema and tables to submit to Database Designer, see
Design Tables with Sample Data.


10. On the Optimization Objectives window, select the objective you want for the database
optimization:
• Optimize with Queries
  For more information, see Design Queries.
• Update statistics
  For more information, see Statistics Analysis.
• Deploy design
  For more information, see Deploying a Design.
For details about these objectives, see Optimization Objectives.


11. The final window summarizes the choices you have made and offers you two choices:
• Proceed with building the design, and deploying it if you specified to deploy it immediately.
If you did not specify to deploy, you can review the design and deployment scripts and
deploy them manually, as described in Deploying Designs Manually.
• Cancel the design and go back to change some of the parameters as needed.

12. Creating a design can take a long time. To cancel a running design from the Administration
Tools window, press Ctrl+C.
To create a design for the VMart example database, see Using Database Designer to Create a
Comprehensive Design in the Getting Started Guide.


About Running Database Designer Programmatically


If you have been granted the DBDUSER role and have enabled the role, you can access Database
Designer functionality programmatically. In previous releases, Database Designer was available
only via the Administration Tools. Using the DESIGNER_* command-line functions, you can
perform the following Database Designer tasks:
• Create a comprehensive or incremental design.
• Add tables and queries to the design.
• Set the optimization objective to prioritize for query performance or storage footprint.
• Assign a weight to each query.
• Assign the K-safety value to a design.
• Analyze statistics on the design tables.
• Create the script that contains the DDL statements that create the design projections.
• Deploy the database design.
• Specify that all projections in the design be segmented.
• Populate the design.
• Cancel a running design.
• Wait for a running design to complete.
• Deploy a design automatically.
• Drop database objects from one or more completed or terminated designs.

Important: When you grant the DBDUSER role, you must associate a resource pool with that
user to manage resources during Database Designer runs. Multiple users can run Database
Designer concurrently without interfering with each other or using up all the cluster resources.
When a user runs Database Designer, either using the Administration Tools or
programmatically, its execution is mostly contained by the user's resource pool, but may spill
over into some system resource pools for less-intensive tasks.


For detailed information about each function, see Database Designer Functions in the SQL
Reference Manual.

When to Run Database Designer Programmatically


Run Database Designer programmatically when you want to:
• Optimize performance on tables you own.
• Create or update a design without the involvement of the superuser.
• Add individual queries and tables, or add data to your design and then rerun Database Designer
to update the design based on this new information.
• Customize the design.
• Use recently executed queries to set up your database to run Database Designer automatically
on a regular basis.
• Assign each design query a query weight that indicates the importance of that query in creating
the design. Assign a higher weight to queries that you run frequently so that Database Designer
prioritizes those queries in creating the design.
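As a minimal sketch of the last two items, the statements below add one weighted query and then add recently executed queries taken from the QUERY_REQUESTS system table. The design name my_design, the table public.t, the weight 0.5, and the request_type filter are illustrative assumptions, not values from this guide; confirm the parameter details in Database Designer Functions in the SQL Reference Manual before relying on them.

```sql
-- Assumes a design named my_design already exists and that you own public.t.
-- Add a single query with an explicit weight (0.5 is an arbitrary example value).
SELECT DESIGNER_ADD_DESIGN_QUERY(
    'my_design',
    'SELECT DISTINCT w FROM public.t;',
    0.5
);

-- Add every query returned by a user query.
-- The filter on request_type is illustrative; adjust it to select the workload you care about.
SELECT DESIGNER_ADD_DESIGN_QUERIES_FROM_RESULTS(
    'my_design',
    'SELECT request FROM v_monitor.query_requests WHERE request_type = ''QUERY'''
);
```

Scheduling statements like these from an external job is one way to rerun Database Designer regularly against the current workload.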

Categories of Database Designer Functions


You can run Database Designer functions in vsql:

Setup Functions
This function directs Database Designer to create a new design:
• DESIGNER_CREATE_DESIGN

Configuration Functions
The following functions allow you to specify properties of a particular design:
• DESIGNER_DESIGN_PROJECTION_ENCODINGS
• DESIGNER_SET_DESIGN_KSAFETY
• DESIGNER_SET_OPTIMIZATION_OBJECTIVE
• DESIGNER_SET_DESIGN_TYPE
• DESIGNER_SET_PROPOSED_UNSEGMENTED_PROJECTIONS
• DESIGNER_SET_ANALYZE_CORRELATIONS_MODE
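The following sketch shows how several configuration functions might be combined for one design. It assumes that a design named my_design already exists; the chosen values (a comprehensive design, a balanced objective, K-safety of 1, and no unsegmented projections) are examples only.

```sql
-- Assumes DESIGNER_CREATE_DESIGN('my_design') has already been run.
SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'balanced');

-- Must not exceed the system K-safety value.
SELECT DESIGNER_SET_DESIGN_KSAFETY('my_design', 1);

-- false asks Database Designer to propose only segmented projections.
SELECT DESIGNER_SET_PROPOSED_UNSEGMENTED_PROJECTIONS('my_design', false);
```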

Input Functions
The following functions allow you to add tables and queries to your Database Designer design:
• DESIGNER_ADD_DESIGN_QUERIES
• DESIGNER_ADD_DESIGN_QUERIES_FROM_RESULTS
• DESIGNER_ADD_DESIGN_QUERY
• DESIGNER_ADD_DESIGN_TABLES

Invocation Functions
These functions populate the Database Designer workspace and create design and deployment
scripts. You can also analyze statistics, deploy the design automatically, and drop the workspace
after the deployment:
• DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
• DESIGNER_WAIT_FOR_DESIGN

Output Functions
The following functions display information about projections and scripts that the Database
Designer created:
• DESIGNER_OUTPUT_ALL_DESIGN_PROJECTIONS
• DESIGNER_OUTPUT_DEPLOYMENT_SCRIPT
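A short sketch of how the invocation and output functions might be used together: wait for a running design to finish, then display what it produced. The design name my_design is an assumption, and the design must not have been dropped yet.

```sql
-- Block until the running design named my_design completes.
SELECT DESIGNER_WAIT_FOR_DESIGN('my_design');

-- Display the projection definitions that Database Designer proposed.
SELECT DESIGNER_OUTPUT_ALL_DESIGN_PROJECTIONS('my_design');

-- Display the deployment script for the design.
SELECT DESIGNER_OUTPUT_DEPLOYMENT_SCRIPT('my_design');
```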

Cleanup Functions
The following functions cancel any running Database Designer operation or drop a Database
Designer design and all its contents:

• DESIGNER_CANCEL_POPULATE_DESIGN
• DESIGNER_DROP_DESIGN
• DESIGNER_DROP_ALL_DESIGNS

Privileges for Running Database Designer Functions


If they have been granted the DBDUSER role, non-DBADMIN users can run Database Designer
using the functions described in Categories of Database Designer Functions. Non-DBADMIN
users cannot run Database Designer using Administration Tools, even if they have been assigned
the DBDUSER role.
To grant the DBDUSER role:
1. The DBADMIN user must grant the DBDUSER role:
=> GRANT DBDUSER TO <username>;

This role persists until the DBADMIN revokes it.


IMPORTANT: When you grant the DBDUSER role, make sure to associate a resource pool
with that user to manage resources during Database Designer runs. Multiple users can run
Database Designer concurrently without interfering with each other or using up all the cluster
resources. When a user runs Database Designer, either using the Administration Tools or
programmatically, its execution is mostly contained by the user's resource pool, but may spill
over into some system resource pools for less-intensive tasks.
2. For a user to run the Database Designer functions, one of the following must happen first:
• The user must enable the DBDUSER role:
=> SET ROLE DBDUSER;
• The superuser must add DBDUSER as the default role:
=> ALTER USER <username> DEFAULT ROLE DBDUSER;
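Putting the grant and the resource pool together, a DBADMIN session might look like the sketch below. The user name dbd_user, the pool name design_pool, and the 1G memory size are placeholders chosen for illustration; size the pool for your own cluster.

```sql
-- Run as DBADMIN. dbd_user and design_pool are placeholder names.
CREATE RESOURCE POOL design_pool MEMORYSIZE '1G';
GRANT USAGE ON RESOURCE POOL design_pool TO dbd_user;
ALTER USER dbd_user RESOURCE POOL design_pool;

-- Grant the role and make it active by default for this user.
GRANT DBDUSER TO dbd_user;
ALTER USER dbd_user DEFAULT ROLE DBDUSER;
```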

DBDUSER Capabilities and Limitations


The DBDUSER role has the following capabilities and limitations:

• A DBDUSER can change K-safety for their own designs, but they cannot change the system
K-safety value. The DBDUSER can set the K-safety to a value less than or equal to the system
K-safety value, but is limited to a value of 0, 1, or 2.
• A DBDUSER cannot explicitly change the ancient history mark (AHM), even during deployment
of their design.

DBDUSER Privileges
When you create a design, you automatically have privileges to manipulate the design. Other tasks
may require that the DBDUSER have additional privileges:
To...                                       DBDUSER must have...

Add tables to a design                      • USAGE privilege on the design table schema
                                            • OWNER privilege on the design table

Add a single design query to the design     • Privilege to execute the design query

Add a query file to the design              • Read privilege on the storage location that contains
                                              the query file
                                            • Privilege to execute all the queries in the file

Add queries from the result of a user       • Privilege to execute the user query
query to the design                         • Privilege to execute each design query retrieved from
                                              the results of the user query

Create the design and deployment scripts    • WRITE privilege on the storage location of the design
                                              script
                                            • WRITE privilege on the storage location of the
                                              deployment script

Workflow for Running Database Designer Programmatically


The following example shows the steps you take to create a design by running Database Designer
programmatically.
Note: Be sure to back up the existing design using the EXPORT_CATALOG function before
running the Database Designer functions on an existing schema. You must explicitly back up
the current design when using Database Designer to create a new comprehensive design.
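A minimal sketch of that backup, assuming you can write to /tmp/examples; the file name current_design_backup.sql is a placeholder. The DESIGN scope exports the catalog objects needed to recreate the current physical design.

```sql
-- Write the current design to a SQL script before running Database Designer.
-- The destination path is an example; use a directory you can write to.
SELECT EXPORT_CATALOG('/tmp/examples/current_design_backup.sql', 'DESIGN');
```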
Before you run this example, you should have the DBDUSER role, and you should have enabled
that role using the SET ROLE DBDUSER command:
1. Create a table in the public schema:
=> CREATE TABLE T(
x INT,
y INT,
z INT,
u INT,
v INT,
w INT PRIMARY KEY
);

2. Add data to the table:


\! perl -e 'for ($i=0; $i<100000; ++$i)
{printf("%d, %d, %d, %d, %d, %d\n",
$i/10000, $i/100, $i/10, $i/2, $i, $i);}'
| vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"

3. Create a second table in the public schema:


=> CREATE TABLE T2(
x INT,
y INT,
z INT,
u INT,
v INT,
w INT PRIMARY KEY
);

4. Copy the data from table T to table T2 and commit the changes:
=> INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
=> COMMIT;

5. Create a new design:


=> SELECT DESIGNER_CREATE_DESIGN('my_design');

This command adds information to the DESIGNS system table in the V_MONITOR schema.


6. Add tables from the public schema to the design:


=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');

These commands add information to the DESIGN_TABLES system table.


7. Create a file named queries.txt in /tmp/examples, or another directory where you have
READ and WRITE privileges. Add the following two queries in that file and save it. Database
Designer uses these queries to create the design:
SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
SELECT DISTINCT w FROM T;

8. Add the queries file to the design and display the results: the numbers of accepted queries,
non-design queries, and unoptimizable queries:
=> SELECT DESIGNER_ADD_DESIGN_QUERIES
('my_design',
'/tmp/examples/queries.txt',
'true'
);

The results show that both queries were accepted:


Number of accepted queries                      =2
Number of queries referencing non-design tables =0
Number of unsupported queries                   =0
Number of illegal queries                       =0

The DESIGNER_ADD_DESIGN_QUERIES function populates the DESIGN_QUERIES system table.
9. Set the design type to comprehensive. (This is the default.) A comprehensive design creates
an initial or replacement design for all the design tables:
=> SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');

10. Set the optimization objective to query. This setting creates a design that focuses on faster
query performance, which might recommend additional projections. These projections could
result in a larger database storage footprint:


=> SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');

11. Create the design and save the design and deployment scripts in /tmp/examples, or another
directory where you have READ and WRITE privileges. The following command:
• Analyzes statistics.
• Doesn't deploy the design.
• Doesn't drop the design after deployment.
• Stops if it encounters an error.


=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
('my_design',
'/tmp/examples/my_design_projections.sql',
'/tmp/examples/my_design_deploy.sql',
'True',
'False',
'False',
'False'
);

This command adds information to the following system tables:
• DEPLOYMENT_PROJECTION_STATEMENTS
• DEPLOYMENT_PROJECTIONS
• OUTPUT_DEPLOYMENT_STATUS

12. Examine the status of the Database Designer run to see what projections Database Designer
recommends. In the deployment_projection_name column:
• rep indicates a replicated projection
• super indicates a superprojection
The deployment_status column is pending because the design has not yet been deployed.
For this example, Database Designer recommends four projections:

=> \x
Expanded display is on.
=> SELECT * FROM OUTPUT_DEPLOYMENT_STATUS;
-[ RECORD 1 ]--------------+----------------------------
deployment_id              | 45035996273795970
deployment_projection_id   | 1
deployment_projection_name | T_DBD_1_rep_my_design
deployment_status          | pending
error_message              | N/A
-[ RECORD 2 ]--------------+----------------------------
deployment_id              | 45035996273795970
deployment_projection_id   | 2
deployment_projection_name | T2_DBD_2_rep_my_design
deployment_status          | pending
error_message              | N/A
-[ RECORD 3 ]--------------+----------------------------
deployment_id              | 45035996273795970
deployment_projection_id   | 3
deployment_projection_name | T_super
deployment_status          | pending
error_message              | N/A
-[ RECORD 4 ]--------------+----------------------------
deployment_id              | 45035996273795970
deployment_projection_id   | 4
deployment_projection_name | T2_super
deployment_status          | pending
error_message              | N/A

13. View the script /tmp/examples/my_design_deploy.sql to see how these projections are
created when you run the deployment script. In this example, the script also assigns the
encoding schemes RLE and COMMONDELTA_COMP to columns where appropriate.
14. Deploy the design from the directory where you saved it:
=> \i /tmp/examples/my_design_deploy.sql

15. Now that the design is deployed, delete the design:


=> SELECT DESIGNER_DROP_DESIGN('my_design');
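If you do not need to review the scripts first, the build, deployment, and cleanup steps can be collapsed into one call. The sketch below reuses the same four trailing arguments as the command in the workflow (analyze statistics, deploy, drop the design after deployment, continue after error) but turns deployment and cleanup on; treat it as an illustration rather than a recommended default.

```sql
-- One-step variant of the workflow: build, deploy, and then drop the design.
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
    'my_design',
    '/tmp/examples/my_design_projections.sql',
    '/tmp/examples/my_design_deploy.sql',
    'True',   -- analyze statistics
    'True',   -- deploy the design
    'True',   -- drop the design after deployment
    'False'   -- stop if an error occurs
);
```

With these settings there is no separate deployment step, and a later call to DESIGNER_DROP_DESIGN is unnecessary because the design is already dropped.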


Creating Custom Designs


HP strongly recommends that you use the physical schema design produced by Database
Designer, which provides K-safety, excellent query performance, and efficient use of storage
space. If you find that any of your queries are not running as efficiently as you would like, you can
use the Database Designer incremental design process to optimize the database design for the
query.
If the projections created by Database Designer still do not meet your needs, you can write custom
projections, from scratch or based on projection designs created by Database Designer.
If you are unfamiliar with writing custom projections, start by modifying an existing design
generated by Database Designer.

The Design Process


To customize an existing design or create a new one, take these steps:
1. Plan the design or design modification.
As with most successful projects, a good design requires some up-front planning. See
Planning Your Design.
2. Create or modify projections.
For an overview of the CREATE PROJECTION statement and guidelines for creating
common projections, see Design Fundamentals. The CREATE PROJECTION section in the
SQL Reference Manual also provides more detail.
3. Deploy the projections to a test environment. See Writing and Deploying Custom Projections.
4. Test the projections.
5. Modify the projections as necessary.
6. Once you have finalized the design, deploy the projections to the production environment.


Planning Your Design


The syntax for creating a design is easy for anyone who is familiar with SQL. As with any
successful project, however, a successful design requires some initial planning. Before you create
your first design:
• Become familiar with standard design requirements and plan your design to include them. See
Design Requirements.
• Determine how many projections you need to include in the design. See Determining the
Number of Projections to Use.
• Determine the type of compression and encoding to use for columns. See Data Encoding and
Compression.
• Determine whether or not you want the database to be K-safe. HP Vertica recommends that all
production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and
2. See Designing for K-Safety.

Design Requirements
A physical schema design is a script that contains CREATE PROJECTION statements. These
statements determine which columns are included in projections and how they are optimized.
If you use Database Designer as a starting point, it automatically creates designs that meet all
fundamental design requirements. If you intend to create or modify designs manually, be aware that
all designs must meet the following requirements:
• Every design must create at least one superprojection for every table in the database that is
used by the client application. These projections provide complete coverage that enables users
to perform ad-hoc queries as needed. They can contain joins and they are usually configured to
maximize performance through sort order, compression, and encoding.
• Query-specific projections are optional. If you are satisfied with the performance provided
through superprojections, you do not need to create additional projections. However, you can
maximize performance by tuning for specific query workloads.
• HP recommends that all production databases have a minimum K-safety of one (K=1) to support
high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High Availability With
Projections in the Concepts Guide and Designing for K-Safety.
• If your cluster has more than 20 nodes but only small tables, HP Vertica recommends that you
do not create replicated projections. If you create replicated projections, the catalog becomes
very large and performance may degrade. Instead, consider segmenting those projections.

Determining the Number of Projections to Use


In many cases, a design that consists of a set of superprojections (and their buddies) provides
satisfactory performance through compression and encoding. This is especially true if the sort
orders for the projections have been used to maximize performance for one or more query
predicates (WHERE clauses).
However, you might want to add additional query-specific projections to increase the performance
of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The
number of additional projections (and their buddies) that you create should be determined by:
• Your organization's needs
• The amount of disk space you have available on each node in the cluster
• The amount of time available for loading data into the database

As the number of projections that are tuned for specific queries increases, the performance of these
queries improves. However, the amount of disk space used and the amount of time required to load
data increases as well. Therefore, you should create and test designs to determine the optimum
number of projections for your database configuration. On average, organizations that choose to
implement query-specific projections achieve optimal performance through the addition of a few
query-specific projections.
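To weigh the disk-space cost of each additional projection, you can compare the storage that existing projections consume. The query below is a sketch that assumes the PROJECTION_STORAGE system table in the V_MONITOR schema and its used_bytes column; it sums storage across all nodes for each projection.

```sql
-- Total storage per projection, summed across nodes, largest first.
SELECT anchor_table_name,
       projection_name,
       SUM(used_bytes) AS total_used_bytes
FROM v_monitor.projection_storage
GROUP BY anchor_table_name, projection_name
ORDER BY total_used_bytes DESC;
```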

Designing for K-Safety


Before creating custom physical schema designs, determine whether you want the database to be
K-safe and adhere to the appropriate design requirements for K-safe databases or databases with
no K-safety. HP recommends that all production databases have a minimum K-safety of one (K=1).
Valid K-safety values for production databases are 1 and 2. Non-production databases do not have
to be K-safe and can be set to 0. You can start by creating a physical schema design with no
K-safety, and then modify it to be K-safe at a later point in time. See High Availability and Recovery
and High Availability With Projections in the Concepts Guide for an explanation of how HP Vertica
implements high availability and recovery through replication and segmentation.
Your database must have a minimum number of nodes to be able to have a K-safety level greater
than zero, as shown in the following table:

K-level    Number of Nodes Required
0          1+
1          3+
2          5+
K          2K+1

Note: HP Vertica does not support values of K higher than 2.
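The 2K+1 rule can be inverted to find the highest K-safety level that a cluster of a given size can support: K is at most (nodes - 1) / 2, rounded down, and never more than 2. The query below is a sketch of that arithmetic; it assumes the node_count column of the SYSTEM table in the V_MONITOR schema.

```sql
-- Highest K-safety level the current node count allows, capped at 2.
-- Examples: 1 or 2 nodes give 0, 3 or 4 nodes give 1, 5 or more nodes give 2.
SELECT node_count,
       LEAST(2, FLOOR((node_count - 1) / 2)) AS max_supported_k
FROM v_monitor.system;
```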


The value of K can be 1 or 2 only when the physical schema design meets certain redundancy
requirements. See Requirements for a K-Safe Physical Schema Design. To create designs that are
K-safe, HP recommends that you use the Database Designer.
By default, HP Vertica creates K-safe superprojections when the database has a K-safety greater
than 0 (K>0). When you create projections with Database Designer, it recommends projection
definitions that meet K-safe design requirements and marks them with the K-safety level. Database
Designer creates a script that uses the MARK_DESIGN_KSAFE function to set the K-safety of the
physical schema to 1:
=> SELECT MARK_DESIGN_KSAFE (1);
  MARK_DESIGN_KSAFE
----------------------
 Marked design 1-safe
(1 row)

Monitoring K-Safety
Monitoring tables can be accessed programmatically to enable external actions, such as alerts.
You monitor the K-safety level by polling the fault-tolerance columns of the SYSTEM table and
checking their values. See SYSTEM in the SQL Reference Manual.
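For example, an external monitoring job could run a query like the sketch below and raise an alert when the current value falls below the designed value. The column names are taken from the SYSTEM table in the V_MONITOR schema; how you schedule the query and deliver the alert is up to you.

```sql
-- Compare the designed K-safety with the level the cluster currently provides.
SELECT designed_fault_tolerance,
       current_fault_tolerance,
       node_count,
       node_down_count
FROM v_monitor.system;
```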

Loss of K-Safety
When K nodes in your cluster fail, your database continues to run, although performance is
affected. Further node failures could potentially cause the database to shut down if the failed node's
data is not available from another functioning node in the cluster.

Requirements for a K-Safe Physical Schema Design


Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at
least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0.)
If you modify a design that Database Designer created for a cluster of three or more nodes, the
K-safe requirements are already in place.


If you create custom projections, your physical schema design must meet the following
requirements to be able to successfully recover the database in the event of a failure:
• Segmented projections must be segmented across all nodes. Refer to Designing for
Segmentation and Designing Segmented Projections for K-Safety.
• Replicated projections must be replicated on all nodes. See Designing Replicated Projections
for K-Safety.
• Segmented projections must have K buddy projections (projections that have identical columns
and segmentation criteria, except that corresponding segments are placed on different nodes).

You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets
requirements for K-safety.
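A sketch of how you might review custom projections before marking the design: list each projection with its segmentation flag, then try to mark the design. The column names assume the PROJECTIONS system table in the V_CATALOG schema; if the design does not meet the requirements for the requested level, expect MARK_DESIGN_KSAFE to report an error instead of marking the design.

```sql
-- Review which projections are segmented and the fault tolerance recorded for each.
SELECT anchor_table_name,
       projection_name,
       is_segmented,
       verified_fault_tolerance
FROM v_catalog.projections
ORDER BY anchor_table_name, projection_name;

-- Check whether the physical schema design qualifies as K=1.
SELECT MARK_DESIGN_KSAFE(1);
```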

Requirements for a Physical Schema Design with No K-Safety


If you use Database Designer to generate a comprehensive design that you can modify and you
do not want the design to be K-safe, set the K-safety level to 0 (zero).
If you want to start from scratch, do the following to establish minimal projection requirements for a
functioning database with no K-safety (K=0):
1. Define at least one superprojection for each table in the logical schema.
2. Replicate (define an exact copy of) each dimension table superprojection on each node.

Designing Segmented Projections for K-Safety


If you are creating or modifying a design for a K-safe database, you need to create K-safe
projections for fact tables and large dimension tables. (A dimension table is considered to be large if
it is similar in size to a fact table.) To accomplish this, you must:
• Create a segmented projection for each fact and large dimension table.
• Create segmented buddy projections for each of these projections. The total number of
projections in a buddy set must be two for a K=1 database or three for a K=2 database.

For an overview of segmented projections and their buddies, see Projection Segmentation in the
Concepts Guide. For information about designing for K-safety, see Designing for K-Safety and
Designing for Segmentation.

Segmenting Projections
To segment a projection, use the segmentation clause to specify the:

• Segmentation method to use.
• Column to use to segment the projection.
• Nodes on which to segment the projection. You can segment projections across all the nodes, or
just the number of nodes necessary to maintain K-safety, either three for a K=1 database or five
for a K=2 database.

See the CREATE PROJECTION statement in the SQL Reference Manual.


The following segmentation clause uses hash segmentation to segment the projection across all
nodes based on the T_retail_sales_fact.pos_transaction_number column:
CREATE PROJECTION retail_sales_fact_P1... SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;

Creating Buddy Projections


To create a buddy projection, copy the original projection and modify it as follows:
• Rename it to something similar to the name of the original projection. For example, a projection
named retail_sales_fact_P1 could have buddies named retail_sales_fact_P1_B1 and
retail_sales_fact_P1_B2.
• Modify the sort order as needed.
• Create an offset to store the segments for the buddy on different nodes. For example, the first
buddy in a projection set would have an offset of one (OFFSET 1), the second buddy in a
projection set would have an offset of two (OFFSET 2), and so on.

To create a buddy for the projection created in the previous example:


CREATE PROJECTION retail_sales_fact_P1_B1... SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;
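Because the statements above are abbreviated, here is a complete sketch of a segmented projection and its buddy for a K=1 database. The table sales_fact and its columns are hypothetical and exist only for this illustration; they are not part of the retail_sales_fact example.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE sales_fact (
    sale_id   INT,
    store_key INT,
    amount    INT
);

-- Original segmented projection.
CREATE PROJECTION sales_fact_P1 (sale_id, store_key, amount)
AS SELECT sale_id, store_key, amount
   FROM sales_fact
   ORDER BY sale_id
SEGMENTED BY HASH(sale_id) ALL NODES;

-- Buddy: same columns and segmentation, with segments shifted by one node.
CREATE PROJECTION sales_fact_P1_B1 (sale_id, store_key, amount)
AS SELECT sale_id, store_key, amount
   FROM sales_fact
   ORDER BY sale_id
SEGMENTED BY HASH(sale_id) ALL NODES OFFSET 1;
```

For a K=2 database, a third projection with OFFSET 2 would complete the buddy set.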

Designing Replicated Projections for K-Safety


If you are creating or modifying a design for a K-safe database, make sure that projections for
dimension tables are replicated on each node in the database.
You can accomplish this using a single CREATE PROJECTION command for each dimension
table. The UNSEGMENTED ALL NODES syntax within the segmentation clause automatically
creates an unsegmented projection on each node in the database.


When you run your design script, HP Vertica generates a list of nodes based on the number of
nodes in the database and replicates the projection accordingly. Replicated projections have the
name:
projection-name_node-name

If, for example, the nodes are named NODE01, NODE02, and NODE03, the projections are named
ABC_NODE01, ABC_NODE02, and ABC_NODE03.
Note: This naming convention can affect functions that provide information about projections,
for example, GET_PROJECTIONS or GET_PROJECTION_STATUS, where you must
provide the name ABC_NODE01 instead of just ABC. To view a list of the nodes in a database,
use the View Database command in the Administration Tools.
The following script uses the UNSEGMENTED ALL NODES syntax to create one unsegmented
superprojection for the store_dimension table on each node.
CREATE PROJECTION store_dimension( C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;

Note: Large dimension tables can be segmented. A dimension table is considered to be large
when it is approximately the same size as a fact table.


Designing for Segmentation


You segment projections using hash segmentation. Hash segmentation allows you to segment a
projection based on a built-in hash function that provides even distribution of data across multiple
nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or
more column values, each having a large number of unique values and an acceptable amount of
skew in the value distribution. Primary key columns that meet the criteria could be an excellent
choice for hash segmentation.
Note: For detailed information about using hash segmentation in a projection, see CREATE
PROJECTION in the SQL Reference Manual.
When segmenting projections, determine which columns to use to segment the projection. Apply the
criteria above: choose one or more columns that have a large number of unique data values and
acceptable skew in their data distribution, such as primary key columns. The columns must be
unique across all the tables being used in a query.
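The effect of cardinality on distribution can be checked with a small simulation. The following Python sketch is only an illustration: it uses SHA-256 as a stand-in hash (it is not HP Vertica's HASH function), and the function names, the three-node cluster, and the sample data are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def node_for(value, node_count):
    """Map a value to a node index with a stand-in hash (not Vertica's HASH)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(values, node_count):
    """Count how many rows land on each node."""
    counts = Counter(node_for(value, node_count) for value in values)
    return [counts.get(node, 0) for node in range(node_count)]


def skew_ratio(values, node_count):
    """Ratio of the busiest node's row count to a perfectly even share."""
    even_share = len(values) / node_count
    return max(rows_per_node(values, node_count)) / even_share


# Hypothetical data: a unique key versus a two-valued column, 30,000 rows each.
transaction_numbers = list(range(30000))
genders = ["M", "F"] * 15000

print(rows_per_node(transaction_numbers, 3))  # roughly even across nodes
print(rows_per_node(genders, 3))              # at most two nodes receive rows
```

A ratio close to 1 suggests an even distribution. The two-valued column can never use more than two of the three nodes, which is why low-cardinality columns make poor segmentation columns.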


Design Fundamentals
Although you can write custom projections from scratch, HP Vertica recommends that you use
Database Designer to create a design to use as a starting point. This ensures that you have
projections that meet basic requirements.

Writing and Deploying Custom Projections


Before you write custom projections, be sure to review the topics in Planning Your Design carefully.
Failure to follow these considerations can result in non-functional projections.
To manually modify or create a projection:
1. Write a script to create the projection, using the CREATE PROJECTION statement.
2. Use the \i meta-command in vsql to run the script.
   Note: You must have a database loaded with a logical schema.
3. For a K-safe database, use the function SELECT get_projections('table_name') to verify
   that the projections were properly created. Good projections are noted as being "safe." This
   means that the projection has enough buddies to be K-safe.
4. If you added the new projection to a database that already has projections that contain data,
   you need to refresh the newly created projection so that it works with the existing
   projections. By default, the new projection is out-of-date (not available for query
   processing) until you refresh it.
5. Use the MAKE_AHM_NOW function to set the Ancient History Mark (AHM) to the greatest
   allowable epoch (now).
6. Use the DROP PROJECTION statement to drop any previous projections that are no longer
   needed. These projections can waste disk space and reduce load speed if they remain in the
   database.
7. Run the ANALYZE_STATISTICS function on all projections in the database. This function
   collects and aggregates data samples and storage information from all nodes on which a
   projection is stored, and then writes statistics into the catalog. For example:
   => SELECT ANALYZE_STATISTICS('');


Designing Superprojections
Superprojections have the following requirements:
• They must contain every column within the table.
• For a K-safe design, superprojections must either be replicated on all nodes within the database
  cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very
  large tables and medium large tables). See Physical Schema and High Availability With
  Projections in the Concepts Guide for an overview of projections and how they are stored. See
  Designing for K-Safety for design specifics.
• To provide maximum usability, superprojections need to minimize storage requirements while
  maximizing query performance. To achieve this, the sort order for columns in superprojections is
  based on storage requirements and commonly used queries.

Minimizing Storage Requirements
Minimizing storage not only saves on physical resources, it increases performance by requiring the
database to perform less disk I/O. To minimize storage space for a projection:
• Analyze the type of data stored in each projection column and choose the most effective
  encoding method. See the CREATE PROJECTION statement and Encoding-Type in the SQL
  Reference Manual.
  The HP Vertica optimizer gives Run-Length Encoding (RLE) preference, so be sure to use it
  whenever appropriate. RLE replaces sequences (runs) of identical values with a single pair
  that contains the value and number of occurrences. Therefore, use it only when the run length
  is large, such as when sorting low-cardinality columns.
• Prioritize low-cardinality columns in the column sort order. This minimizes the number of rows
  that HP Vertica stores and accesses to retrieve query results.

For more information about minimizing storage requirements, see Choosing Sort Order: Best
Practices.

Maximizing Query Performance


In addition to minimizing storage requirements, the column sort order facilitates the most commonly
used queries for the table. This means that the column sort order prioritizes the lowest cardinality
columns that are actually used in queries. For examples that take into account both storage and
query requirements, see Choosing Sort Order: Best Practices.
Note: For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR
columns.
Projections within a buddy set can all have different sort orders. This enables you to maximize
query performance for groups of queries with common WHERE clauses, but different sort orders.
If, for example, you have a three-node cluster, your buddy set contains three interrelated
projections, each having its own sort order.
In a database with a K-safety of 1 or 2, buddy projections are used for data recovery. When a
failed node recovers, it queries the other nodes to restore its data through buddy projections.
If a projection's buddies use different sort orders, it takes longer to recover the projection
because the data has to be re-sorted during recovery to match the sort order of the projection.
To address this, consider using identical sort orders for tables that are rarely queried or that are
repeatedly accessed by the same query, and use multiple sort orders for tables that are accessed
by queries with common WHERE clauses, but different sort orders.
If you have queries that access multiple tables or you want to maintain the same sort order for
projections within buddy sets, create query-specific projections. Designs that contain projections
for specific queries are called optimized designs.

Projection Design for Merge Operations


The HP Vertica query optimizer automatically picks the best projections to use for queries, but you
can help improve the performance of MERGE operations by ensuring projections are designed for
optimal use.
Good projection design lets HP Vertica choose the faster merge join between the target and source
tables without having to perform additional sort and data transfer operations.
HP recommends that you first use Database Designer to generate a comprehensive design and
then customize projections, as needed. Be sure to first review the topics in Planning Your Design.
Failure to follow those considerations could result in non-functioning projections.
In the following MERGE statement, HP Vertica inserts and/or updates records from the source
table's column b into the target table's column a:
=> MERGE INTO target t USING source s ON t.a = s.b WHEN ....

HP Vertica can use a local merge join if tables target and source use one of the following
projection designs, where their inputs are pre-sorted through the CREATE PROJECTION ORDER BY
clause:
• Replicated projections that are sorted on:
  ▪ Column a for target
  ▪ Column b for source
• Segmented projections that are identically segmented on:
  ▪ Column a for target
  ▪ Column b for source
  ▪ Corresponding segmented columns

Tip: For best merge performance, the source table should be smaller than the target table.

See Also
• Optimized Versus Non-Optimized MERGE


Maximizing Projection Performance


This section explains how to design your projections in order to optimize their performance.

Choosing Sort Order: Best Practices


HP Vertica has several recommendations for choosing projection sort orders that can help you
achieve maximum query performance, as illustrated in the following examples.

Combine RLE and Sort Order


When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to
minimize storage requirements and maximize query performance.
Suppose you have a students table containing the following values and encoding types:
Column       # of Distinct Values                          Encoded With
gender       2 (M or F)                                    RLE
pass_fail    2 (P or F)                                    RLE
class        4 (freshman, sophomore, junior, or senior)    RLE
name         10000 (too many to list)                      Auto

You might have queries similar to this one:


SELECT name FROM students WHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';

The fastest way to access the data is to work through the low-cardinality columns with the smallest
number of distinct values before the high-cardinality columns. The following sort order minimizes
storage and maximizes query performance for queries that have equality restrictions on gender,
class, pass_fail, and name. Specify the ORDER BY clause of the projection as follows:
ORDER BY students.gender, students.pass_fail, students.class, students.name

In this example, the gender column is represented by two RLE entries, the pass_fail column is
represented by four entries, and the class column is represented by 16 entries, regardless of the
cardinality of the students table. HP Vertica efficiently finds the set of rows that satisfy all the
predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early
in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the
previous example, put those columns early in the projection sort order, in increasing order of distinct
cardinality (that is, in increasing order of the number of distinct values in each column).
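The entry counts quoted above (2, 4, and 16) can be reproduced by counting runs of identical values in each column after sorting. The following Python sketch is a simplified model, not HP Vertica's storage engine; the helper names and the synthetic table of 25 rows per combination are assumptions made for illustration.

```python
from itertools import groupby, product


def run_count(values):
    """Number of RLE entries: one per run of consecutive identical values."""
    return sum(1 for _ in groupby(values))


def runs_per_column(rows, sort_order):
    """Sort rows by the given column positions, then count runs in each one."""
    ordered = sorted(rows, key=lambda row: tuple(row[i] for i in sort_order))
    return [run_count([row[i] for row in ordered]) for i in sort_order]


GENDER, PASS_FAIL, CLASS = 0, 1, 2

# Hypothetical students table: 25 rows for every combination of values.
rows = [
    (gender, pass_fail, class_name)
    for gender, pass_fail, class_name in product(
        ["M", "F"], ["P", "F"], ["freshman", "sophomore", "junior", "senior"]
    )
    for _ in range(25)
]

# Runs for gender, pass_fail, class when sorted in that order.
print(runs_per_column(rows, [GENDER, PASS_FAIL, CLASS]))  # [2, 4, 16]
# Runs for class, gender, pass_fail when class is sorted first.
print(runs_per_column(rows, [CLASS, GENDER, PASS_FAIL]))  # [4, 8, 16]
```

The counts stay the same no matter how many rows each combination holds. The second call shows the trade-off of sorting on class first: the class column needs fewer entries, but the gender column needs more.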


If you sort this table with students.class first, you improve the performance of queries that
restrict only on the students.class column, and you improve the compression of the students.class
column (which contains the largest number of distinct values among the RLE-encoded columns), but
the other columns do not compress as well. Determining which projection is better depends on the
specific queries in your workload, and their relative importance.
Storage savings with compression decrease as the cardinality of the column increases; however,
storage savings with compression increase as the number of bytes required to store values in that
column increases.

Maximize the Advantages of RLE


To maximize the advantages of RLE encoding, use it only when the average run length of a column
is greater than 10 when sorted. For example, suppose you have a table with the following columns,
sorted in order of cardinality from low to high:
address.country, address.region, address.state, address.city, address.zipcode

The zipcode column might not have 10 sorted entries in a row with the same zip code, so there is
probably no advantage to run-length encoding that column, and it could make compression worse.
But a sorted run of the same country is likely to be longer than 10 rows, so applying RLE to the
country column can improve performance.
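The rule of thumb above can be tried on sample data before choosing an encoding. The following Python sketch is an illustration under simplifying assumptions: it sorts each column on its own, whereas a real projection sorts a column within the projection's overall sort order. The function names and the sample columns are hypothetical.

```python
from itertools import groupby


def average_run_length(values):
    """Average length of runs of identical values once the column is sorted."""
    ordered = sorted(values)
    if not ordered:
        return 0.0
    runs = sum(1 for _ in groupby(ordered))
    return len(ordered) / runs


def rle_worthwhile(values, threshold=10):
    """Apply the rule of thumb: RLE only if the average run exceeds threshold."""
    return average_run_length(values) > threshold


# Hypothetical address columns for 1,000 rows.
countries = ["US"] * 500 + ["CA"] * 300 + ["MX"] * 200
zipcodes = [f"{n:05d}" for n in range(500)] * 2

print(average_run_length(countries), rle_worthwhile(countries))
print(average_run_length(zipcodes), rle_worthwhile(zipcodes))
```

With this data the country column averages more than 300 rows per run, while each zip code appears only twice, so only the country column passes the threshold.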

Put Lower Cardinality Column First for Functional Dependencies


In general, put columns that you use for local predicates (as in the previous example) earlier in
the sort order to make predicate evaluation more efficient. In addition, if a lower cardinality
column is uniquely determined by a higher cardinality column (like city_id uniquely determining a
state_id), it is always better to put the lower cardinality, functionally determined column
earlier in the sort order than the higher cardinality column.


For example, in the following sort order, the Area_Code column is sorted before the Number column
in the customer_info table:
ORDER BY customer_info.Area_Code, customer_info.Number, customer_info.Address

In the query, put the Area_Code column first, so that only the values in the Number column that start
with 978 are scanned.
=> SELECT Address FROM customer_info WHERE Area_Code='978' AND Number='9780123457';

Sort for Merge Joins


When processing a join, the HP Vertica optimizer chooses from two algorithms:
• Merge join: If both inputs are pre-sorted on the join column, the optimizer chooses a merge
  join, which is faster and uses less memory.
• Hash join: Using the hash join algorithm, HP Vertica uses the smaller (inner) joined table to
  build an in-memory hash table on the join column. A hash join has no sort requirement, but it
  consumes more memory because HP Vertica builds a hash table with the values in the inner table.
  The optimizer chooses a hash join when projections are not sorted on the join columns.

If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join
perform faster. HP Vertica uses the term sort-merge join to refer to the case when at least one of
the inputs must be sorted prior to the merge join. HP Vertica sorts the inner input side but only if the
outer input side is already sorted on the join columns.
To give the HP Vertica query optimizer the option to use an efficient merge join for a particular
join, create projections on both sides of the join that put the join column first in their
respective projections. This is primarily important to do if both tables are so large that neither
table fits into memory. If all tables that a table will be joined to can be expected to fit into
memory simultaneously, the benefits of merge join over hash join are sufficiently small that it
probably isn't worth creating a projection for any one join column.
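The difference between the two algorithms can be seen in a textbook-style sketch. The following Python code is a generic illustration of merge and hash joins, not HP Vertica's implementation; the function names and sample rows are assumptions. The merge join walks both pre-sorted inputs once and holds no extra structure, while the hash join first builds an in-memory table from the inner input.

```python
from collections import defaultdict


def merge_join(outer, inner, key):
    """Join two inputs that are already sorted on the join key."""
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        outer_key, inner_key = key(outer[i]), key(inner[j])
        if outer_key < inner_key:
            i += 1
        elif outer_key > inner_key:
            j += 1
        else:
            # Find the run of inner rows that share this key.
            j_end = j
            while j_end < len(inner) and key(inner[j_end]) == outer_key:
                j_end += 1
            # Pair every outer row in the matching run with that inner run.
            while i < len(outer) and key(outer[i]) == outer_key:
                result.extend((outer[i], row) for row in inner[j:j_end])
                i += 1
            j = j_end
    return result


def hash_join(outer, inner, key):
    """Join unsorted inputs by building a hash table on the inner input."""
    table = defaultdict(list)
    for row in inner:
        table[key(row)].append(row)
    return [(o, row) for o in outer for row in table.get(key(o), [])]


def first_field(row):
    return row[0]


# Hypothetical rows, both lists sorted on the first field (the join column).
outer_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
inner_rows = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

print(merge_join(outer_rows, inner_rows, first_field))
print(hash_join(outer_rows, inner_rows, first_field))
```

Both functions return the same pairs here. The merge join is only correct when both inputs are sorted on the join column, which is what a projection sorted on that column provides.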

Sort on Columns in Important Queries


If you have an important query, one that you run on a regular basis, you can save time by putting the
columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort
order.
If that query uses a high-cardinality column such as Social Security number, you may sacrifice
storage by placing this column early in the sort order of a projection, but your most important query
will be optimized.

Sort Columns of Equal Cardinality By Size


If you have two columns of equal cardinality, put the column that is larger first in the sort order. For
example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By
putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.

Sort Foreign Key Columns First, From Low to High Distinct Cardinality
Suppose you have a fact table where the first four columns in the sort order make up a foreign key
to another table. For best compression, choose a sort order for the fact table such that the foreign
keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the
design of projections for fact tables, such as partitioning by a time dimension, if any.
In the following example, the table inventory stores inventory data, and product_key and
warehouse_key are foreign keys to the product_dimension and warehouse_dimension tables:
=> CREATE TABLE inventory (
date_key INTEGER NOT NULL,
product_key INTEGER NOT NULL,
warehouse_key INTEGER NOT NULL,
...
);
=> ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
REFERENCES warehouse_dimension(warehouse_key);
=> ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
REFERENCES product_dimension(product_key);

The inventory table should be sorted by warehouse_key and then product_key, since the cardinality
of the warehouse_key column is probably lower than the cardinality of the product_key column.


Prioritizing Column Access Speed


If you measure and set the performance of storage locations within your cluster, HP Vertica uses
this information to determine where to store columns based on their rank. For more information, see
Setting Storage Performance.
How Columns are Ranked
HP Vertica stores columns included in the projection sort order on the fastest available storage
locations. Columns not included in the projection sort order are stored on slower disks. Columns for
each projection are ranked as follows:
• Columns in the sort order are given the highest priority (numbers > 1000).
• The last column in the sort order is given the rank number 1001.
• The next-to-last column in the sort order is given the rank number 1002, and so on until the
  first column in the sort order is given 1000 + # of sort columns.
• The remaining columns are given numbers from 1000 down to 1, starting with 1000 and
  decrementing by one per column.

HP Vertica then stores columns on disk from the highest ranking to the lowest ranking. It places
highest-ranking columns on the fastest disks and the lowest-ranking columns on the slowest disks.
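The default ranking rules can be worked through for a small projection. The following Python sketch is an illustration of the rules as described in this section, not HP Vertica code. The function name is hypothetical, and it assumes that the remaining columns are numbered in the order in which they are listed.

```python
def default_access_ranks(sort_columns, other_columns):
    """Assign default ranks: sort columns above 1000, the rest from 1000 down."""
    ranks = {}
    sort_count = len(sort_columns)
    for position, column in enumerate(sort_columns):
        # First sort column gets 1000 + number of sort columns; last gets 1001.
        ranks[column] = 1000 + sort_count - position
    for position, column in enumerate(other_columns):
        # Remaining columns start at 1000 and decrement by one per column.
        ranks[column] = 1000 - position
    return ranks


# Hypothetical projection with two sort columns and two other columns.
ranks = default_access_ranks(
    ["store_key", "pos_transaction_number"],
    ["sales_dollar_amount", "cost_dollar_amount"],
)
for column, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(rank, column)
```

In this example the first sort column receives rank 1002 and the last receives 1001, so any manually chosen rank well above 1002 would place a column ahead of all default ranks.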
Overriding Default Column Ranking
You can modify which columns are stored on fast disks by manually overriding the default ranks for
these columns. To accomplish this, set the ACCESSRANK keyword in the column list. Make sure to
use an integer that is not already being used for another column. For example, if you want to give a
column the fastest access rank, use a number that is significantly higher than 1000 + the number of
sort columns. This allows you to enter more columns over time without bumping into the access
rank you set.
The following example sets the access rank for the C1_retail_sales_fact_store_key column to
1500.
CREATE PROJECTION retail_sales_fact_P1 ( C1_retail_sales_fact_store_key ENCODING RLE
ACCESSRANK 1500,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )


Projection Examples
This section provides examples that show you how to create projections.

New K-Safe=2 Database


In this example, projections are created for a new five-node database with a K-safety of 2. To
simplify the example, this database contains only two tables: retail_sales_fact and
store_dimension. Creating projections for this database consists of creating the following
segmented and unsegmented (replicated) superprojections:
• Segmented projections
  To support K-safety=2, the database requires three segmented projections (one projection and
  two buddy projections) for each fact table. In this case, it requires three segmented projections
  for the retail_sales_fact table:
  Projection   Description
  P1           The primary projection for the retail_sales_fact table.
  P1_B1        The first buddy projection for P1. This buddy is required to provide K-safety=1.
  P1_B2        The second buddy projection for P1. This buddy is required to provide K-safety=2.
• Unsegmented projections
  To support the database, one unsegmented superprojection must be created for each dimension
  table on each node. In this case, one unsegmented superprojection must be created on each
  node for the store_dimension table:
  Node     Unsegmented Projection
  Node01   store_dimension_Node01
  Node02   store_dimension_Node02
  Node03   store_dimension_Node03
  Node04   store_dimension_Node04
  Node05   store_dimension_Node05


Creating Segmented Projections Example


The following SQL script creates the P1 projection and its buddies, P1_B1 and P1_B2, for the
retail_sales_fact table. The following syntax is significant:
• CREATE PROJECTION creates the named projection (retail_sales_fact_P1,
  retail_sales_fact_P1_B1, or retail_sales_fact_P1_B2).
• ALL NODES automatically segments the projections across all five nodes in the cluster without
  specifically referring to each node.
• HASH evenly distributes the data across these nodes.
• OFFSET ensures that the same data is not stored on the same nodes for each of the buddies.
  The first buddy uses OFFSET 1 to shift the storage locations by 1 and the second buddy uses
  OFFSET 2 to shift the storage locations by 2. This is critical to ensure K-safety.
CREATE PROJECTION retail_sales_fact_P1 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B1 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B2 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 2;
----------------------------------------------------------
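Why the offsets matter for K-safety can be checked with a simplified model. The following Python sketch assumes that a projection segmented across N nodes has N segments and that a projection with OFFSET k stores segment i on node (i + k) mod N. That placement rule and the function names are assumptions made for illustration, not a description of HP Vertica internals.

```python
from itertools import combinations


def segment_placement(node_count, offset):
    """Map each segment to a node index, shifted by the projection's offset."""
    return {
        segment: (segment + offset) % node_count
        for segment in range(node_count)
    }


def survives_failures(node_count, offsets, failures):
    """True if every segment stays available for any set of failed nodes."""
    placements = [segment_placement(node_count, o) for o in offsets]
    for down in combinations(range(node_count), failures):
        down_nodes = set(down)
        for segment in range(node_count):
            holders = {placement[segment] for placement in placements}
            if holders <= down_nodes:
                return False  # every copy of this segment is on a failed node
    return True


# Five nodes: P1 (offset 0), P1_B1 (offset 1), and P1_B2 (offset 2).
print(survives_failures(5, [0, 1, 2], failures=2))  # True
# Without the second buddy, some pair of failures loses a segment.
print(survives_failures(5, [0, 1], failures=2))     # False
```

Under this model, the three projections keep each segment on three different nodes, so any two nodes can fail without losing data, which matches the K-safety=2 requirement described above.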

Creating Unsegmented Projections Example


The following script uses the UNSEGMENTED ALL NODES syntax to create one unsegmented
superprojection for the store_dimension table on each node.
CREATE PROJECTION store_dimension ( C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;

Adding a Node to a Database


In this example, a fourth node (Node04) is being added to a three-node database cluster. The
database contains two tables: retail_sales_fact and store_dimension. It also contains the
following segmented and unsegmented (replicated) superprojections:
• Segmented projections
  P1 and its buddy, B1, are projections for the retail_sales_fact table. They were created using
  the ALL NODES syntax, so HP Vertica automatically segments the projections across all three
  nodes.
• Unsegmented projections
  Currently three unsegmented superprojections exist for the store_dimension table, one for
  each node, as follows:
  Node     Unsegmented Projection
  Node01   store_dimension_Node01
  Node02   store_dimension_Node02
  Node03   store_dimension_Node03
To support an additional node, replacement projections need to be created for the segmented
projections, P1 and B1. The new projections could be called P2 and B2, respectively. Additionally, an
unsegmented superprojection (store_dimension_Node04) needs to be created for the dimension
table on the new node (Node04).

Creating Segmented Projections Example


The following SQL script creates the original P1 projection and its buddy, B1, for the
retail_sales_fact table. Since the script uses the ALL NODES syntax, creating a new projection
that includes the fourth node is as easy as copying the script and changing the names of the
projection and its buddy to unique names (for example, P2 for the projection and P2_B2 for its
buddy). The names that need to be changed are the projection names in the two CREATE PROJECTION
statements (retail_sales_fact_P1 and retail_sales_fact_P1_B1).
CREATE PROJECTION retail_sales_fact_P1 ( C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B1 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES
OFFSET 1;
----------------------------------------------------------

Creating Unsegmented Projections Example


The following script used the ALL NODES syntax to create the original three unsegmented
superprojections for the store_dimension table, one per node.
In the following syntax:
• CREATE PROJECTION creates a superprojection called store_dimension.
• ALL NODES automatically places a complete copy of the superprojection on each of the three
  original nodes.
CREATE PROJECTION store_dimension (
C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;


Implementing Security
In your HP Vertica database, there are three primary security operations:
• Client authentication prevents unauthorized access to the database.
• Connection encryption prevents the interception of data and authenticates the identity of the
  server and the client.
• Client authorization (managing users and privileges) controls what users can access and
  change in the database.

Client Authentication
To gain access to HP Vertica, a user or client application must supply the name of a valid user
account. You can configure HP Vertica to require just a user name. However, you are likely to
require an additional means of authentication, such as a password.
You can use different authentication methods based on:
• Connection type
• Client IP address range
• User name for the client that is attempting to access the server

For details, see Client Authentication.

Connection Encryption
To secure the connection between the client and the server, you can configure HP Vertica and
database clients to use Secure Sockets Layer (SSL) to communicate. HP Vertica uses SSL to:
• Authenticate the server so the client can confirm the server's identity. HP Vertica supports
  mutual authentication in which the server can also confirm the identity of the client. This
  authentication helps prevent "man-in-the-middle" attacks.
• Encrypt data sent between the client and database server to significantly reduce the likelihood
  that the data can be read if the connection between the client and server is compromised.
• Verify that data sent between the client and server has not been altered during transmission.

For details, see Implementing SSL.


Client Authorization
Database users should have access to just the database resources they need to perform their
required tasks. For example, some users need to query only specific sets of data. To prevent
unauthorized access to additional data, you can limit their access to just the data that they need to
run their queries. Other users should be able to read the data but not be able to modify or insert new
data. Still other users might need more permissive access, including the right to create and modify
schemas, tables, and views, or grant other users access to database resources.
A collection of SQL statements controls authorization for the resources users can access. For
details, see Managing Users and Privileges, in particular About Database Privileges.
You can also use roles to grant users access to a set of privileges, rather than directly grant the
privileges for each user. See About Database Roles.
Use the GRANT Statements to assign privileges to users and the REVOKE Statements to repeal
privileges.
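
As a minimal sketch of the statements named above, the following grants read-only access through a role and later withdraws it. The role analyst, the user jsmith, and the table store.store_dimension are placeholder names chosen for illustration; substitute objects that exist in your database.

```sql
-- Hypothetical names: role "analyst", user "jsmith", table store.store_dimension.
CREATE ROLE analyst;
-- Let members of the role resolve objects in the schema and read one table.
GRANT USAGE ON SCHEMA store TO analyst;
GRANT SELECT ON TABLE store.store_dimension TO analyst;
-- Give the user the role instead of granting each privilege directly.
GRANT analyst TO jsmith;
-- Later, withdraw the table privilege from every member of the role.
REVOKE SELECT ON TABLE store.store_dimension FROM analyst;
```

Granting through a role keeps the privilege list in one place, so a later REVOKE affects all members at once. A user typically has to enable a granted role before its privileges apply; see About Database Roles.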


Client Authentication
When a client (the user who runs a client application or the client application itself) connects to the
HP Vertica database server, it supplies the HP Vertica database user name to gain access. HP
Vertica restricts which database users can connect through client authentication, a process
where the database server establishes the identity of the requesting client and determines whether
that client is authorized to connect to the HP Vertica server using the supplied credentials.
HP Vertica offers several client authentication methods. You can configure HP Vertica to require
just a user name for connections, but you probably require more secure authentication, such as a
password at a minimum.

How Client Authentication Works


When connecting to an HP Vertica database, a user or client application must supply the name of a
valid user account. In addition, the application usually includes a means of authentication, such as
a password or security certificate.
There are two types of client authentication:
• LOCAL: Authenticating users or applications that are trying to connect from the same node on
  which the database is running.
• HOST: Authenticating users or applications that are trying to connect from a node that has a
  different IPv4 or IPv6 address than the database.
  For more information, see IPv4 and IPv6 for Client Authentication.

The DBADMIN user manages the client authentication information that the database uses to
authenticate users.
HP Vertica takes the following steps to authenticate users:
1. When a user or application attempts to connect to an HP Vertica database, the system checks
   to see if the user is a DBADMIN user.
   DBADMIN users can access the database at all times, unless their access is specifically
   blocked by an authentication method, such as reject.

   Note: For the DBADMIN user to be able to perform all Admintools functions, the
   DBADMIN must always be able to authenticate by LOCAL TRUST or LOCAL
   PASSWORD (the default for DBADMIN user). If you have changed DBADMIN user
   authentication from LOCAL TRUST or LOCAL PASSWORD, use the
   ALTER AUTHENTICATION statement to once again give the DBADMIN user LOCAL
   TRUST or LOCAL PASSWORD authentication.

2. For non-DBADMIN users, the database checks to see if the user is associated with an
   authentication method through a GRANT statement. If a user has been associated with more
   than one authentication method, HP Vertica tries to authenticate the user with the
   highest-priority authentication method.

   Note: For detailed information about how authentication priorities work, see Priorities for
   Client Authentication Methods.

   If the user presents the correct credentials, the database allows the user to log in.
   The DBADMIN user can grant an authentication method to users or user roles. The
   DBADMIN user can also create a default authentication method that HP Vertica uses when no
   authentication has been associated with a user or role.
3. If the user is not associated with an authentication method, the database checks to see if the
   DBADMIN has established a default authentication method.
4. If the DBADMIN has specified a default authentication method, the database authenticates the
   user using that default method.
5. If the DBADMIN has not specified a default authentication method, the database checks to see
   if the DBADMIN user has defined any authentication methods. If the DBADMIN has not, no
   authentication information exists in the database. However, if a password exists, the
   DBADMIN user can log in.
6. If authentication information exists, HP Vertica rejects the user's request to connect to the
   database. The DBADMIN has not granted an authentication method for that user, nor has the
   DBADMIN defined a default authentication method for all users ('public').
7. If no authentication records exist in the database, HP Vertica uses implicit trust/implicit
   password to authenticate the user.


IPv4 and IPv6 for Client Authentication


HP Vertica 7.1 supports clients using either the IPv4 or the IPv6 protocol to connect to the database
server. Internal communication between database servers must consistently use one address
family (IPv4 or IPv6). The client, however, can connect to the database from either type of IP
address.
If clients can connect over both IPv4 and IPv6, you must create two authentication
methods, one for each address family. Any authentication method that uses HOST authentication
requires an IP address.
For example, the first statement allows users to connect from any IPv4 address. The second
statement allows users to connect from any IPv6 address:
=> CREATE AUTHENTICATION <name> METHOD 'gss' HOST '0.0.0.0/0'; --IPv4
=> CREATE AUTHENTICATION <name> METHOD 'gss' HOST '::/0';      --IPv6

If you are using a literal IPv6 address in a URL, you must enclose the IPv6 address in square
brackets as shown in the following examples:
=> ALTER AUTHENTICATION Ldap SET host='ldap://[1dfa:2bfa:3:45:5:6:7:877]';
=> ALTER AUTHENTICATION Ldap SET host='ldap://[fdfb:dbfa:0:65::177]';
=> ALTER AUTHENTICATION Ldap SET host='ldap://[fdfb::177]';
=> ALTER AUTHENTICATION Ldap SET host='ldap://[::1]';
=> ALTER AUTHENTICATION Ldap SET host='ldap://[1dfa:2bfa:3:45:5:6:7:877]:5678';

If you are working with a multi-node cluster, any IP/netmask settings in (HOST, HOST TLS,
HOST NO TLS) must match all nodes in the cluster. This setup allows the database owner to
authenticate with and administer every node in the cluster. For example, specifying 10.10.0.8/30
allows a CIDR address range of 10.10.0.8 through 10.10.0.11.
For detailed information about IPv6 addresses, see RFC 1924 and RFC 2732.
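
To make the range arithmetic concrete: a /30 prefix fixes the first 30 of 32 address bits, which leaves 2 bits and therefore 4 addresses, starting at 10.10.0.8. The sketch below applies that range to a HOST record. The record name cluster_hash and the choice of the 'hash' method are illustrative assumptions, not values from this guide.

```sql
-- /30 leaves 2 free bits: 10.10.0.8, 10.10.0.9, 10.10.0.10, and 10.10.0.11.
-- Hypothetical record name; choose a range that covers every node in the cluster.
CREATE AUTHENTICATION cluster_hash METHOD 'hash' HOST '10.10.0.8/30';
```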

Supported Client Authentication Methods


HP Vertica supports the following types of authentication to prove a client's identity.
• Trust authentication: Authorizes any user that connects to the server using a valid user name.
• Reject authentication: Blocks the connection and prevents the requesting client from
  accessing the database.
• GSS authentication: Authorizes connecting to HP Vertica using a secure, mutual
  authentication service with single sign-on and trusted third-party certificate authority. GSS
  authentication uses the GSS-API standard and provides compatibility with non-MIT Kerberos
  implementations, such as those for Java and Windows clients.

  Note: Client authentication using Kerberos 5 was deprecated in HP Vertica 7.0.
  GSS authentication replaces Kerberos authentication.

• Hash authentication: Sends encrypted passwords hashed by the MD5 algorithm or the more
  secure SHA-512 method over the network. The server provides the client with a salt.

  Important: Hewlett-Packard recommends that you use hash authentication instead of
  password authentication. When a password is sent using password authentication, HP
  Vertica transmits it in clear text. Hash authentication transmits passwords securely.

• LDAP authentication: Works like password authentication except the LDAP method
  authenticates the client against a Lightweight Directory Access Protocol or Active Directory
  server.
• Ident authentication: Authenticates the client against the user name in an Ident server.
• TLS authentication: Authenticates the client using digital certificates that contain a public key.
  Transport Layer Security (TLS) is the successor to Secure Sockets Layer (SSL) authentication.

Local and Host Authentication


You can define a client authentication method as:
• Local: Local connection to the database.
• Host: Remote connection to the database from different hosts, each with their own IPv4 or IPv6
  address and host parameters. For more information, see IPv4 and IPv6 for Client
  Authentication.
  You can designate host authentication with or without TLS.

Some authentication methods cannot be designated as local, as listed in this table:

Authentication Method    Local?    Host?
GSS                      No        Yes
Ident                    Yes       No
LDAP                     Yes       Yes
Hash                     Yes       Yes
Reject                   Yes       Yes
TLS                      No        Yes
Trust                    Yes       Yes

Managing Client Authentication


The DBADMIN user manages the client authentication records that are stored in your HP Vertica
database.
Important: Configure client authentication so that the DBADMIN user can always access the
database locally. If a problem occurs with the authentication that blocks all users from logging
in, the DBADMIN user needs access to correct the problem.
The DBADMIN user manages the following tasks:

Action: Create an authentication record.
How To: Use the CREATE AUTHENTICATION statement to define authentication records for your
database. HP Vertica supports the following authentication methods:
• Password
• Trust
• LDAP
• GSS
• Ident
• Hash
• Reject
• TLS

Action: Enable an authentication method.
How To: When you create an authentication method, by default, HP Vertica enables it. If an
authentication method has been disabled, use ALTER AUTHENTICATION to re-enable it.

Action: Disable an authentication method.
How To: If you want to temporarily or permanently disable an authentication method, use
ALTER AUTHENTICATION.

Action: Set parameters for a given authentication method.
How To: Use ALTER AUTHENTICATION to define the parameters that a specific authentication
method requires. The following methods require parameters:
• LDAP
• Ident
• GSS

Action: Define an authentication method as the default method.
How To: Use the ALTER AUTHENTICATION statement to define which authentication method should
be the default. If a user is not associated with an authentication method, the database tries
to authenticate using the default method.

Action: Associate an authentication method with a user.
How To: Use the GRANT (Authentication) statement to associate a specific authentication method
with a given user. If you do not associate an authentication method with a user, HP Vertica
uses the default method.
You can associate multiple authentication methods with a user. Use priorities to identify which
authentication method should be tried first. For more information, see Priorities for Client
Authentication Methods.

Action: Revoke an authentication method associated with a user.
How To: Use the REVOKE (Authentication) statement to remove an authentication method from a
user. After the method has been revoked, HP Vertica uses the default authentication method to
authenticate the user.

Action: Drop an authentication method from the database.
How To: Use the DROP AUTHENTICATION statement to remove an authentication method from your
database. After the method has been dropped, HP Vertica uses the default authentication method
to authenticate any user who was associated with the dropped method.

For detailed information about managing authentication records, see:


• Creating Authentication Records
• Deleting Authentication Records
• Enabling and Disabling Authentication Methods
• Granting and Revoking Authentication Methods
• Modifying Authentication Records

Creating Authentication Records


As of HP Vertica 7.1, you can manage client authentication records using the new vsql commands.
To use these statements, you must be connected to the database.
You can no longer modify client authentication records using the Administration Tools. The
Administration Tools interface allows you to modify the contents of the vertica.conf file.
However, HP Vertica ignores any client authentication information stored in that file.


When you create authentication records using CREATE AUTHENTICATION, specify the following
information.

What you need to specify: Authentication method name
Description: A name that you define for HP Vertica use.

What you need to specify: Authentication type
Description: The type of authentication HP Vertica should use to validate the user or client
attempting to connect:
• 'gss'
• 'ident'
• 'ldap'
• 'hash'
• 'reject'
• 'trust'
• 'tls'

What you need to specify: Access method
Description:
• LOCAL
• HOST
• HOST NO TLS
• HOST TLS

What you need to specify: Host IP address
Description: IP address or range of IP addresses from which the user or application tries to
connect. This can be an IPv4 address or an IPv6 address. For more information,
see IPv4 and IPv6 for Client Authentication.

The following examples show how to create authentication records that are stored in the catalog.
When you create an authentication record using CREATE AUTHENTICATION, HP Vertica
automatically enables it.
This example creates an authentication method named localpwd to authenticate users who are
trying to log in from a local host using a password:
=> CREATE AUTHENTICATION localpwd METHOD 'hash' LOCAL;

This example creates an authentication method named v_ldap that uses LDAP over TLS to
authenticate users logging in from hosts in the IPv4 address range 10.0.0.0/23:
=> CREATE AUTHENTICATION v_ldap METHOD 'ldap' HOST TLS '10.0.0.0/23';

This example creates an authentication method named v_kerberos to authenticate users that are
trying to connect from any host in the networks 2001:0db8:0001:12xx:
=> CREATE AUTHENTICATION v_kerberos METHOD 'gss' HOST '2001:db8:1::1200/56';

This example creates two authentication methods, RejectNoSSL_IPv4 and RejectNoSSL_IPv6, that
reject users from any IP address who are trying to authenticate without SSL/TLS. Each record
needs its own name, and the IPv6 record uses '::/0' to match every IPv6 address:
=> CREATE AUTHENTICATION RejectNoSSL_IPv4 METHOD 'reject' HOST NO TLS '0.0.0.0/0'; --IPv4
=> CREATE AUTHENTICATION RejectNoSSL_IPv6 METHOD 'reject' HOST NO TLS '::/0';      --IPv6

See Also
• Deleting Authentication Records
• Enabling and Disabling Authentication Methods
• Granting and Revoking Authentication Methods
• Modifying Authentication Records
• IPv4 and IPv6 for Client Authentication

Priorities for Client Authentication Methods


You can associate one or more authentication methods with a connection or user. For a user who has
multiple authentication methods, specify the order in which HP Vertica should try them. To do so,
assign a priority to each authentication method using ALTER AUTHENTICATION. Each priority
value must be a non-negative INTEGER.
Higher values indicate higher priorities. HP Vertica tries to authenticate a user with an
authentication method in order of priority from highest to lowest. For example:
• A priority of 10 is higher than a priority of 5.
• A priority of 0 is the lowest possible value.

Priority Order for Authentication Methods


When you associate multiple authentication methods with a connection, HP Vertica uses the
following order to determine how to authenticate the client:
• Administrator-assigned priority for an individual method
• The most specific IP addresses have priority over the least specific IP addresses.
  For example, the IPv4 address 10.3.4.128/25 has priority over 10.3.0.0/24, which in turn has
  priority over 10.3.0.0/16. The IPv6 address 2001:db8:ab::123/128 has priority over
  2001:db8:1::1200/56.
• Reject
• GSS | LDAP | TLS | Ident, with TLS having a higher priority than NO TLS
• Hash
• Trust

Authentication Attempts Using Multiple Methods


If there is only one authentication method associated with a user, HP Vertica uses that method to
authenticate the login attempt.
If the administrator has associated multiple authentication methods with a given user or IP
address, HP Vertica tries to authenticate as follows:
• If the highest priority authentication method is Ident and authentication fails, HP Vertica tries the
  next highest priority authentication method, regardless of what method it uses.
  If the next attempt does not use Ident authentication and fails, the authentication process ends.
  However, if the next attempt uses Ident and fails, HP Vertica continues to the next highest
  priority method. This process continues until authentication is successful or a non-Ident
  authentication attempt fails.
• If the highest priority method is LDAP and authentication fails, HP Vertica searches for the next
  highest priority LDAP method. Authentication attempts continue until the authentication is
  successful, or there are no additional LDAP authentication methods that satisfy the connection
  criteria.
  Note that if a "user not found" error occurs during LDAP authentication, the retry connection
  attempt initiates only if you set the ldap_continue parameter to yes.
• For all other authentication types, HP Vertica tries the highest priority authentication method
  associated with that user. If that authentication fails, the authentication process stops.
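
The LDAP case above can be sketched as follows, assuming two LDAP records with the hypothetical names ldap_primary and ldap_backup. The server addresses, the priorities, and the user jsmith are placeholders, and the bind parameters an LDAP record normally needs are omitted for brevity. The ldap_continue parameter name comes from the passage above; quoting its value as 'yes' follows the style of the other SET examples in this guide and is an assumption.

```sql
-- Hypothetical records that point at two different LDAP servers.
CREATE AUTHENTICATION ldap_primary METHOD 'ldap' HOST '0.0.0.0/0';
CREATE AUTHENTICATION ldap_backup METHOD 'ldap' HOST '0.0.0.0/0';
ALTER AUTHENTICATION ldap_primary SET host='ldap://192.0.2.10';
ALTER AUTHENTICATION ldap_backup SET host='ldap://192.0.2.11';
-- ldap_primary is tried first because 10 is higher than 5.
ALTER AUTHENTICATION ldap_primary PRIORITY 10;
ALTER AUTHENTICATION ldap_backup PRIORITY 5;
-- Allow the retry when ldap_primary reports that the user was not found.
ALTER AUTHENTICATION ldap_primary SET ldap_continue='yes';
GRANT AUTHENTICATION ldap_primary TO jsmith;
GRANT AUTHENTICATION ldap_backup TO jsmith;
```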

For example, suppose there are two client authentication methods associated with a user, as
follows:
=> CREATE AUTHENTICATION auth_name1 METHOD 'hash' LOCAL;
=> GRANT AUTHENTICATION auth_name1 TO user;
=> ALTER AUTHENTICATION auth_name1 PRIORITY 5;
=> CREATE AUTHENTICATION auth_name2 METHOD 'ident' LOCAL;
=> GRANT AUTHENTICATION auth_name2 TO user;
=> ALTER AUTHENTICATION auth_name2 PRIORITY 10;

When user tries to connect to the database, HP Vertica first tries auth_name2 to authenticate
because it has a higher priority. If that fails, HP Vertica tries auth_name1. If that fails,
authentication fails.

Specifying Authentication Method Priority


To specify priorities for client authentication methods, use ALTER AUTHENTICATION. The
priority value must be a non-negative INTEGER. Higher numbers indicate a higher priority. The
default value, 0, is the lowest possible priority.
The syntax is:
ALTER AUTHENTICATION <name> ... PRIORITY <priority_value>;

If you do not specify a priority, or omit the <priority_value> when using ALTER
AUTHENTICATION, HP Vertica sets the priority to 0.

DBADMIN and Authentication Priority


To allow the DBADMIN user to connect to the database at any time, Hewlett-Packard recommends
that you create an authentication method (LOCAL TRUST or LOCAL PASSWORD) with a very
high priority, such as 10,000. Grant this method to the DBADMIN user, and set the priority using
ALTER AUTHENTICATION.
With the high priority, this new authentication method supersedes any authentication methods you
create for PUBLIC (which includes the DBADMIN user). Even if you make changes to
PUBLIC authentication methods, the DBADMIN still has access.
Note: For the DBADMIN user to be able to perform all Admintools functions, the DBADMIN
must always be able to authenticate by LOCAL TRUST or LOCAL PASSWORD (the default
for DBADMIN user). If you have changed DBADMIN user authentication from LOCAL TRUST
or LOCAL PASSWORD, use the ALTER AUTHENTICATION statement to once again give
the DBADMIN user LOCAL TRUST or LOCAL PASSWORD authentication.
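
A minimal sketch of this recommendation, assuming the DBADMIN account is named dbadmin and using a hypothetical record name, dbadmin_local. The 'hash' method is used here for local password authentication, as in the localpwd example elsewhere in this guide.

```sql
-- Hypothetical record name; 'hash' authenticates local connections by password.
CREATE AUTHENTICATION dbadmin_local METHOD 'hash' LOCAL;
-- Assumes the DBADMIN account is named dbadmin.
GRANT AUTHENTICATION dbadmin_local TO dbadmin;
-- A priority far above anything you are likely to grant to PUBLIC.
ALTER AUTHENTICATION dbadmin_local PRIORITY 10000;
```

Because the record is granted to the DBADMIN user directly and outranks PUBLIC methods, later changes to PUBLIC authentication should not lock the administrator out of local connections.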

Enabling and Disabling Authentication Methods


When you create an authentication method, HP Vertica stores it in the catalog and enables it
automatically. To enable or disable an authentication method, use the ALTER AUTHENTICATION
statement. Before you can use this approach, you must be connected to your database.
If an authentication method is not enabled, HP Vertica cannot use it to authenticate users and
clients trying to connect to the database.
If no authentication methods are enabled, any user can connect to the database. If the user has a
password, they will have to enter it to connect.
To disable an authentication method:
ALTER AUTHENTICATION v_kerberos DISABLE;

To re-enable this authentication method:


ALTER AUTHENTICATION v_kerberos ENABLE;

See Also
• Creating Authentication Records
• Deleting Authentication Records
• Granting and Revoking Authentication Methods
• Modifying Authentication Records


Granting and Revoking Authentication Methods


Before HP Vertica can validate a user or client through an authentication method, you must first
associate that authentication method with the user or role that requires it. To do this, use
GRANT AUTHENTICATION. When that user or role no longer needs to connect to HP Vertica
using that method, you can disassociate that authentication method from the user or role with
REVOKE AUTHENTICATION.

Granting Authentication Methods


You can grant an authentication method to a specific user or role. You can also specify the default
authentication method by granting an authentication method to Public. Use the GRANT
(Authentication) statement as shown in the following examples.
This example uses a GRANT AUTHENTICATION statement to associate v_ldap authentication
with user jsmith:
=> GRANT AUTHENTICATION v_ldap TO jsmith;

This example uses a GRANT AUTHENTICATION statement to associate v_gss authentication with
the role DBprogrammer:
=> CREATE ROLE DBprogrammer;
=> GRANT AUTHENTICATION v_gss to DBprogrammer;

This example sets the default client authentication method to v_localpwd:


=> GRANT AUTHENTICATION v_localpwd TO Public;

Revoking Authentication Methods


If you no longer want to authenticate a user or client with a given authentication method, use the
REVOKE (Authentication) statement as follows:
This example revokes v_ldap authentication from user jsmith:
=> REVOKE AUTHENTICATION v_ldap FROM jsmith;

This example revokes v_gss authentication from the role DBprogrammer:


=> REVOKE AUTHENTICATION v_gss FROM DBprogrammer;

This example removes localpwd as the default client authentication method:


=> REVOKE AUTHENTICATION localpwd from Public;

Modifying Authentication Records


To modify existing authentication records, you must first be connected to your database. The
following examples show how to make changes to your authentication records.

Enable or Disable an Authentication Method


Use ALTER AUTHENTICATION to disable the v_ldap authentication method. Then, re-enable it:
=> ALTER AUTHENTICATION v_ldap DISABLE;
=> ALTER AUTHENTICATION v_ldap ENABLE;

Rename an Authentication Method


Rename the v_kerberos authentication method to K5, and enable it. All users who have been
associated with the v_kerberos authentication method are now associated with the K5 method
instead.
=> ALTER AUTHENTICATION v_kerberos RENAME TO K5 ENABLE;

Specify a Priority for an Authentication Method


Specify a priority of 10 for K5 authentication:
=> ALTER AUTHENTICATION K5 PRIORITY 10;

Change a Parameter
Set the system_users parameter for ident1 authentication to root:
=> CREATE AUTHENTICATION ident1 METHOD 'ident' LOCAL;
=> ALTER AUTHENTICATION ident1 SET system_users='root';


Change the IP address and specify the parameters for an LDAP authentication method named
Ldap1.
In this example, you specify the bind parameters for the LDAP server. HP Vertica connects to the
LDAP server, which authenticates the HP Vertica client. If the authentication succeeds, HP
Vertica authenticates any users who have been granted the Ldap1 authentication method on the
designated LDAP server:
=> CREATE AUTHENTICATION Ldap1 METHOD 'ldap' HOST '172.16.65.196';
=> ALTER AUTHENTICATION Ldap1 SET host='ldap://172.16.65.177',
binddn_prefix='cn=', binddn_suffix=',dc=qa_domain,dc=com';

Change the IP address, and specify the parameters for an LDAP authentication method named
Ldap1. Assume that HP Vertica does not have enough information to create the distinguished name
(DN) for a user attempting to authenticate. Therefore, in this case, you must specify to use LDAP
bind and search:
=> CREATE AUTHENTICATION Ldap1 METHOD 'ldap' HOST '172.16.65.196';
=> ALTER AUTHENTICATION Ldap1 SET host='ldap://172.16.65.177',
   basedn='dc=qa_domain,dc=com', binddn='cn=Manager,dc=qa_domain,dc=com',
   search_attribute='cn', bind_password='secret';

Change the Associated Method


Change the localpwd authentication from trust to hash:
=> CREATE AUTHENTICATION localpwd METHOD 'trust' LOCAL;
=> ALTER AUTHENTICATION localpwd METHOD 'hash';

ALTER AUTHENTICATION validates the parameters you enter. If there are errors, it disables the
authentication method that you are trying to modify.

See Also
• Creating Authentication Records
• Deleting Authentication Records
• Enabling and Disabling Authentication Methods
• Granting and Revoking Authentication Methods
• IPv4 and IPv6 for Client Authentication


Deleting Authentication Records


To delete a client authentication record, use DROP AUTHENTICATION. Before you can use this
approach, you must be connected to your database.
To delete an authentication record for md5_auth, use the following command:
=> DROP AUTHENTICATION md5_auth;

To delete an authentication record for a method that has been granted to a user, use the
CASCADE keyword:
=> CREATE AUTHENTICATION localpwd METHOD 'password' LOCAL;
=> GRANT AUTHENTICATION localpwd TO jsmith;
=> DROP AUTHENTICATION localpwd CASCADE;

See Also
• Creating Authentication Records
• Enabling and Disabling Authentication Methods
• Granting and Revoking Authentication Methods
• Modifying Authentication Records

Viewing Information About Client Authentication Records


For information about client authentication records that you have configured for your database,
query the following system tables in the V_CATALOG schema:
• CLIENT_AUTH
• CLIENT_AUTH_PARAMS
• PASSWORD_AUDITOR
• USER_CLIENT_AUTH

To determine the details behind the client authentication used for a particular user session, query
the following tables in the V_MONITOR schema:
• SESSIONS
• USER_SESSIONS
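
As a sketch, the tables listed above can be inspected with ordinary queries. SELECT * is used so that the example does not depend on particular column names; the user_name filter on SESSIONS and the user jsmith are illustrative assumptions.

```sql
-- Authentication records, their parameters, and the users they are granted to.
SELECT * FROM v_catalog.client_auth;
SELECT * FROM v_catalog.client_auth_params;
SELECT * FROM v_catalog.user_client_auth;
-- Current sessions for one (hypothetical) user.
SELECT * FROM v_monitor.sessions WHERE user_name = 'jsmith';
```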


Password Authentication
One of the simplest ways to authenticate a client connection to the database is to assign the user
account an HP Vertica password. If a user account has a password, the user or client for that
account must supply the correct password to connect to the database. However, if the user
account does not have a password, and HP Vertica is not configured to use another form of client
authentication, the user account is always allowed to log in.
HP Vertica stores passwords in an encrypted format to prevent potential theft. However, the
transmission of the password to HP Vertica is in plain text. Thus, it is possible for a
"man-in-the-middle" attack to intercept the password.
To secure the login, Hewlett-Packard recommends implementing SSL security or hash
authentication.

About Password Creation and Modification


You must be a superuser to create passwords for user accounts using the CREATE USER
statement. A superuser can set any user account's password.
• To add a password, use the ALTER USER statement.
• To change a password, use ALTER USER or the vsql \password command.

Users can also change their own passwords.


To make password authentication more effective, HP Vertica recommends that you enforce
password policies that control how often users are forced to change passwords and the required
content of a password. You set these policies using Profiles.
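
A short sketch of the statements mentioned in this section. The user names and passwords are placeholders, and the last statement assumes it is run by the user whose password is being changed.

```sql
-- Superuser: create one account with a password and one without.
CREATE USER jsmith IDENTIFIED BY 'Initial_Passw0rd';
CREATE USER reports;
-- Superuser: add a password to the account that had none.
ALTER USER reports IDENTIFIED BY 'Reports_Passw0rd';
-- Run as jsmith: change your own password, supplying the current one.
ALTER USER jsmith IDENTIFIED BY 'New_Passw0rd' REPLACE 'Initial_Passw0rd';
```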

Default Password Authentication


When you have not specified any authentication methods, HP Vertica defaults to using password
authentication for user accounts that have passwords.
If you create authentication methods, even for remote hosts, password authentication is disabled.
In such cases, you must explicitly enable password authentication. The following commands
create the local_pwd authentication method and make it the default for all users. When you create
an authentication method, HP Vertica enables it automatically:
=> CREATE AUTHENTICATION local_pwd METHOD 'hash' LOCAL;
=> GRANT AUTHENTICATION local_pwd TO Public;


Profiles
You set password policies using profiles. A profile is a group of parameters that includes
requirements for user passwords.
A profile controls:
• How often users must change their passwords.
• How many times users must change their passwords before they can reuse an old password.
• How many times a user can fail to log in before the account is locked.
• The required length and content of the password:
  - Maximum and minimum number of characters
  - Minimum number of capital letters, lowercase letters, digits, and symbols required in a
    password.

To set a user's password policy, assign the user to a profile. To enforce different password policies
for different users, create multiple profiles. For example, you might create one profile for interactive
users, requiring them to frequently change their passwords. You might create another profile for
user accounts that are not required to change passwords.

Create and Modify Profiles


You create profiles using the CREATE PROFILE statement and change profiles using ALTER
PROFILE. You can assign a user to a profile when you create the user (CREATE USER), or after,
using the ALTER USER statement. A user can be assigned to only one profile at a time.
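
The following sketch shows one way these statements fit together. The profile name interactive_users, the user names, and every limit value are illustrative assumptions rather than recommended settings.

```sql
-- Hypothetical profile for interactive users.
CREATE PROFILE interactive_users LIMIT
    PASSWORD_LIFE_TIME 90
    PASSWORD_REUSE_MAX 5
    FAILED_LOGIN_ATTEMPTS 5
    PASSWORD_MIN_LENGTH 10
    PASSWORD_MIN_DIGITS 1;
-- Assign the profile when creating a user ...
CREATE USER asmith IDENTIFIED BY 'Initial_Passw0rd' PROFILE interactive_users;
-- ... or move an existing user to it afterwards.
ALTER USER jsmith PROFILE interactive_users;
-- Tighten one parameter later without recreating the profile.
ALTER PROFILE interactive_users LIMIT PASSWORD_LIFE_TIME 60;
```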
All newly created databases contain an initial profile named DEFAULT. HP Vertica assigns all
users to the DEFAULT profile if:
• You do not explicitly assign users a profile when you create them.
• You drop the profile to which a user is currently assigned.

You can change the policy parameters in the DEFAULT profile, but you cannot delete it.

Important: During upgrades from versions of HP Vertica earlier than version 5.0, each
database receives a DEFAULT profile. All users are then automatically assigned to that
profile.


The profiles that you create can inherit some or all of their policy parameters from the DEFAULT
profile. When you create a profile using CREATE PROFILE, a parameter inherits its value from the
DEFAULT profile if:
• You set it to the DEFAULT value.
• You do not assign a value.

If you change a parameter in the DEFAULT profile, you also change that parameter's value in every
profile that inherits the parameter from DEFAULT.
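The inheritance rules can be sketched as follows. The profile name analysts and the numeric values are assumptions chosen for illustration:

```sql
-- Hypothetical profile: sets its own minimum length and explicitly inherits the
-- lockout threshold. Parameters that are not listed are also inherited from DEFAULT.
CREATE PROFILE analysts LIMIT
    PASSWORD_MIN_LENGTH 10
    FAILED_LOGIN_ATTEMPTS DEFAULT;

-- Changing the DEFAULT profile also changes the inherited value in analysts.
ALTER PROFILE DEFAULT LIMIT FAILED_LOGIN_ATTEMPTS 5;
```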
Changes to a profile's policies for password content do not have an immediate effect on the users.
HP Vertica does not test users' passwords to verify that they comply with the new password
criteria. Instead, the changed settings affect users only the next time they change their
passwords. To make sure that users comply with the new password policy, use the ALTER USER
statement to expire user passwords. HP Vertica prompts users with expired passwords to change
their passwords when they next log in.
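A sketch of that sequence, assuming a hypothetical profile interactive_users and a placeholder user alice assigned to it:

```sql
-- Tighten the content rules in the profile (values are illustrative).
ALTER PROFILE interactive_users LIMIT
    PASSWORD_MIN_LENGTH 12
    PASSWORD_MIN_DIGITS 2;

-- Existing passwords are not rechecked, so expire them to force a change
-- at the next login.
ALTER USER alice PASSWORD EXPIRE;
```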
Note: Only the profile settings for how many failed login attempts trigger Account Locking and
how long accounts are locked have an effect on password authentication methods such as
LDAP or GSS. All password complexity, reuse, and lifetime settings affect only passwords
that HP Vertica manages.

See Also
• PROFILES

Password Expiration
User profiles control how often users must change their passwords. Initially, the DEFAULT profile
is set so that passwords never expire.
Important: Password expiration has no effect on any of the user's current sessions.

Setting Password Expiration and Grace Period


You can change the default value to set a password expiration. Alternatively, you can create
additional profiles that set time limits for passwords and assign users to them.
When a password expires, the user must change the password on the next login. However, you can
set a PASSWORD_GRACE_TIME in any individual user's profile, allowing that user to log in after
the expiration. After the password expires, HP Vertica issues a warning about the password
expiration but continues to recognize the password.
After the grace period ends, users must change their passwords to log in, unless they have
changed them already in response to the warning.
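For example, the following sketch sets an expiration and a grace period in the DEFAULT profile. The 90-day and 7-day values are assumptions for illustration:

```sql
-- Passwords expire after 90 days. For 7 more days, users are warned at login
-- but can still log in with the expired password.
ALTER PROFILE DEFAULT LIMIT
    PASSWORD_LIFE_TIME 90
    PASSWORD_GRACE_TIME 7;
```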

Expiring a Password
You can expire a user's password immediately using the ALTER USER statement's PASSWORD
EXPIRE parameter. By expiring a password, you can:
• Force users to comply with a change to password policy.
• Set a new password when a user forgets the old password.
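A minimal sketch of the forgotten-password case, run as a superuser. The user name bob and the temporary password are placeholders:

```sql
-- Give the user a temporary password...
ALTER USER bob IDENTIFIED BY 'Temp-Passw0rd';

-- ...then expire it so the user must choose a new password at the next login.
ALTER USER bob PASSWORD EXPIRE;
```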

Account Locking
In a profile, you can set a password policy for how many consecutive failed login attempts a user
account is allowed before locking. This locking mechanism helps prevent dictionary-style brute-force
attempts to guess users' passwords.
Set this value with the FAILED_LOGIN_ATTEMPTS parameter of the CREATE PROFILE or
ALTER PROFILE statement.
HP Vertica locks any user account that has more consecutive failed login attempts than the value
to which you set FAILED_LOGIN_ATTEMPTS. The user cannot log in to a locked account, even by
supplying the correct password.
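For example, assuming a hypothetical profile named interactive_users, the threshold might be set like this (the value 3 is illustrative):

```sql
-- Set the lockout threshold for every user assigned to this profile.
ALTER PROFILE interactive_users LIMIT FAILED_LOGIN_ATTEMPTS 3;
```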

Unlock a Locked Account


You can unlock accounts in one of two ways, depending on your privileges.
• Manually: If you are a superuser, you can manually unlock the account using the ALTER
  USER command.

  Note: A superuser account cannot be locked, because it is the only user that can unlock
  accounts. For this reason, choose a very secure password for a superuser account. See
  Password Guidelines for suggestions.

• Password Lock Time Setting: Specify the number of days until an account unlocks in the
  PASSWORD_LOCK_TIME parameter of the user's profile. HP Vertica automatically unlocks the
  account after the specified number of days has passed. If you set this parameter to UNLIMITED,
  the user's account is never automatically unlocked, and a superuser must manually unlock it.
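Both unlocking paths can be sketched as follows. The user name alice, the profile name interactive_users, and the one-day lock time are assumptions:

```sql
-- Manual unlock, run by a superuser.
ALTER USER alice ACCOUNT UNLOCK;

-- Automatic unlock: locked accounts in this profile unlock after 1 day.
ALTER PROFILE interactive_users LIMIT PASSWORD_LOCK_TIME 1;
```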

Password Guidelines
For passwords to be effective, they must be hard to guess. You need to protect passwords from:
• Dictionary-style, brute-force attacks
• Users who have knowledge of the password holder (family names, dates of birth, etc.)

Use Profiles to enforce good password practices (password length and required content). Make
sure database users know the password guidelines, and encourage them not to use personal
information in their passwords.

Strong Passwords
Use the following password guidelines, published by the Internet Engineering Task Force (IETF),
when you create passwords:
• Use mixed-case characters.
• Use passphrases, a sequence of words or other text that is hard to guess.
• Use non-alphabetic characters, for example, numeric digits and punctuation.
• Use a password that is easy to remember, so you do not need to write it down. For example, use
  i3atSandw1ches! instead of !a#^*!$&D)z.
• Use a password that you can type quickly without having to look at the keyboard.

Weak Passwords
Avoid using the following practices to create a password:
• Do not use your login or user name in any form (as-is, reversed, capitalized, doubled, and so on).
• Do not use your first, middle, or last name in any form.
• Do not use your spouse's, partner's, child's, parent's, friend's, or pet's name in any form.
• Do not use other information easily obtained about you, including your date of birth, license plate
  number, telephone number, Social Security number, make of your automobile, house address,
  and so on.
• Do not use a password of all digits or all the same letter.
• Do not use a word contained in English or foreign language dictionaries, spelling lists, acronym
  or abbreviation lists, or other lists of words.
• Do not use a password that contains fewer than six characters.
• Do not give your password to another person for any reason.

See Also
• Creating a Database Name and Password

Configuring LDAP Authentication
Lightweight Directory Access Protocol (LDAP) is an authentication method that works like
password authentication. The main difference is that the LDAP method authenticates clients trying
to access your HP Vertica database against an LDAP or Active Directory server. Use
LDAP authentication when your database needs to authenticate a user with an LDAP or Active
Directory server.
For details about configuring LDAP authentication, see:
• What You Need to Know to Configure LDAP Authentication
• General LDAP Parameters
• Workflow for Configuring LDAP Bind
• Workflow for Configuring LDAP Bind and Search
• Configuring Multiple LDAP Servers

What You Need to Know to Configure LDAP Authentication

Before you configure LDAP authentication for your HP Vertica database, review the following
topics:
• Prerequisites for LDAP Authentication
• LDAP Authentication Definitions
• DBADMIN Authentication Access
• General LDAP Parameters
• Bind vs. Bind and Search
• LDAP Anonymous Binding
• Using LDAP Over SSL/TLS

Prerequisites for LDAP Authentication

Before you configure LDAP authentication for your HP Vertica database, you must have:
• The IP address and host name for the LDAP server. HP Vertica supports IPv4 and IPv6 addresses.
• Your organization's Active Directory information.
• A service account for bind and search.
• Administrative access to your HP Vertica database.
• The open-ldap-tools package installed on at least one node. This package includes ldapsearch.

LDAP Authentication Definitions
The following definitions are important to remember for LDAP authentication:

Parameter name            Description
Host                      IP address or host name of the LDAP server. HP Vertica supports IPv4 and
                          IPv6 addresses. For more information, see IPv4 and IPv6 for Client
                          Authentication.
Common name (CN)          Depending on your LDAP environment, this value can be either the
                          username or the first and last name of the user.
Domain component (DC)     Comma-separated list that contains your organization's domain component
                          broken up into separate values, for example:
                          dc=vertica, dc=com
Distinguished name (DN)   domain.com. A DN consists of two DC components, as in "DC=example,
                          DC=com".
Organizational unit (OU)  Unit in the organization with which the user is associated, for example,
                          Vertica Users.
sAMAccountName            An Active Directory user account field. This value is usually the attribute to
                          be searched when you use bind and search against the Microsoft Active
                          Directory server.
UID                       A commonly used LDAP account attribute used to store a username.
Bind                      LDAP authentication method that allows basic binding using the DN.
Bind and search           LDAP authentication method that must log in to the LDAP server to search
                          on the specified attribute.
Service account           An LDAP user account that can be used to log in to the LDAP server during
                          bind and search. This account's password is usually shared.
Anonymous binding         Allows a client to connect and search the directory (bind and search)
                          without needing to log in.
ldapsearch                A command-line utility to search the LDAP directory. It returns information
                          that you use to configure LDAP bind and search.
basedn                    Distinguished name where the directory search should begin.
binddn                    Domain name to find in the directory search.
search_attribute          Text to search for to locate the user record. The default is UID.

DBADMIN Authentication Access
The DBADMIN user must have access to the database at all times.
The DBADMIN account must authenticate against the database using local trust or local hash
authentication.
Hewlett-Packard recommends that you create an authentication method (LOCAL TRUST or
LOCAL PASSWORD) with a very high priority, say, 10,000. Grant this method to the
DBADMIN user and set the priority using ALTER AUTHENTICATION.
With the high priority, this new authentication method supersedes any authentication methods you
create for PUBLIC (which includes the DBADMIN user). Even if you make changes to
PUBLIC authentication methods, the DBADMIN user can now connect to the database at any
time.
This example shows how you configure local trust authentication for the DBADMIN user. As a
result, the user can use vsql with the -h option and does not need to enter a password:

=> CREATE AUTHENTICATION v_dbadmin_trust METHOD 'trust' LOCAL;


=> GRANT AUTHENTICATION v_dbadmin_trust TO dbadmin;
=> ALTER AUTHENTICATION v_dbadmin_trust PRIORITY 10000;

The next example shows how you configure local hash authentication for DBADMIN. It allows
the user to access the HP Vertica database using the assigned password from any IPv4 address.
The DBADMIN user can access the database using vsql -h, the Administration Tools, or any
other tool that connects to HP Vertica:
=> CREATE AUTHENTICATION v_dbadmin_hash METHOD 'hash' HOST '0.0.0.0/0';
=> GRANT AUTHENTICATION v_dbadmin_hash TO dbadmin;
=> ALTER AUTHENTICATION v_dbadmin_hash PRIORITY 10000;
=> SELECT SET_CONFIG_PARAMETER('SecurityAlgorithm', 'SHA512');

Note: HP Vertica supports IPv4 and IPv6 addresses. For more information, see IPv4 and IPv6
for Client Authentication.

LDAP Parameters
There are several parameters that you need to configure for LDAP authentication.

General LDAP Parameters
There are four parameters you can use when configuring for either LDAP bind or LDAP bind and
search:

Parameter name    Description
host              LDAP server URI in the following format:
                  schema://host:optional_port
                  schema is either ldap (for LDAP/Active Directory) or ldaps (for
                  secure LDAP/Active Directory).
starttls          Optional parameter that defines StartTLS behavior:
                  • soft: If the server does not support TLS, continue
                    authenticating the user in plain text. This value is equivalent to
                    the -Z option in ldapsearch.
                  • hard: If the server does not support TLS, authentication fails.
                    This value is equivalent to the -ZZ option in ldapsearch.
                  Using ldaps is equivalent to starttls='hard'. However, if you use
                  them together in the same connection string, authentication fails.
basedn            Base DN for search.
ldap_continue     When set to yes, this parameter allows a connection retry when a
                  user not found error occurs during the previous connection attempt.
                  For any other failure error, the system automatically retries the
                  connection.

LDAP Bind Parameters
Use the following parameters when authenticating with LDAP bind to create the bind name string:

Parameter name    Description
binddn_prefix     First half of the bind string.
binddn_suffix     Second half of the bind string.
                  You must use the binddn_prefix and binddn_suffix together.
                  In the following example, the bind name becomes
                  cn=<user_login_name>;ou=vertica users;dc=verticacorp;dc=com.
                  => ALTER AUTHENTICATION auth_method_name SET binddn_prefix='cn=',
                     binddn_suffix=';ou=vertica users;dc=verticacorp;dc=com';
domain_prefix     The domain in which to find the user name.
                  In the following example, the bind name is verticacorp/<user_login_name>.
                  => ALTER AUTHENTICATION auth_method_name SET domain_prefix='VerticaCorp';
email_suffix      The part of an email address that comes after the @ sign.
                  In the following example, the bind name becomes
                  <user_login_name>@verticacorp.com.
                  => ALTER AUTHENTICATION auth_method_name SET email_suffix='VerticaCorp.com';

To create the bind name string, you must provide one of the following:
• Both binddn_prefix and binddn_suffix
• domain_prefix
• email_suffix
Otherwise, HP Vertica performs a bind and search operation instead of a bind operation.

LDAP Bind and Search Parameters

Use the following parameters when authenticating with LDAP bind and search:

Parameter name      Description
binddn              Bind DN. Domain name to find in the directory search.
bind_password       Bind password. Required if you specify a binddn.
search_attribute    Optional attribute to search for on the LDAP server.

The following example shows how to set these three attributes. The example sets:
• binddn to cn=Manager,dc=qa_domain,dc=com
• bind_password to secret
• search_attribute to cn
=> ALTER AUTHENTICATION auth_method_name SET host='ldap://engvmqa13',
   basedn='dc=qa_domain,dc=com',binddn='cn=Manager,dc=qa_domain,dc=com',
   bind_password='secret',search_attribute='cn';

The binddn and bind_password parameters are optional. If you omit them, HP Vertica performs an
anonymous search.
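As a sketch of the anonymous case, the statement below reuses the placeholder server and base DN from the preceding example but sets neither binddn nor bind_password. It assumes that neither parameter was previously set on this authentication record and that the LDAP server permits anonymous binding:

```sql
-- No binddn or bind_password: the directory search is performed anonymously.
ALTER AUTHENTICATION auth_method_name SET
    host='ldap://engvmqa13',
    basedn='dc=qa_domain,dc=com',
    search_attribute='cn';
```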

Bind vs. Bind and Search


There are two LDAP methods that you use to authenticate your HP Vertica database against an
LDAP server.
• Bind: Use LDAP bind when HP Vertica connects to the LDAP server and binds using the CN
  and password. (These values are the username and password of the user logging into the
  database.) Use the bind method when your LDAP account's CN field matches that of the
  username defined in your database.
• Bind and search: Use LDAP bind and search when your LDAP account's CN field is a user's
  full name or does not match the username defined in your database. For bind and search, the
  username is usually in another field such as UID or sAMAccountName in a standard Active
  Directory environment. Bind and search requires your organization's Active Directory
  information. This information allows HP Vertica to log into the LDAP server and search for the
  specified field.
  If you are using bind and search, having a service account simplifies your server side
  configuration. In addition, you do not need to store your Active Directory password.

For details, see:
• Workflow for Configuring LDAP Bind
• Workflow for Configuring LDAP Bind and Search

LDAP Anonymous Binding
Anonymous binding is an LDAP server function. Anonymous binding allows a client to connect and
search the directory (bind and search) without logging in. You do not need to include binddn and
bind_password.
You also do not need to log in when you configure LDAP authentication using Management
Console.

Using LDAP Over SSL/TLS

HP Vertica supports Transport Layer Security (TLS) for client authentication. TLS uses OpenSSL
0.9.8za.
You use ALTER AUTHENTICATION to specify LDAP and SSL/TLS parameters. If you specify a
host URL that starts with ldaps, the HP Vertica server authenticates using SSL/TLS on the
specified port or on the secure LDAPS port (636).
ldaps://abc.dc.com

If the LDAP server does not support SSL on that port, authentication fails.
If you specify a host URL that starts with ldap and set the LDAP starttls parameter, the HP
Vertica server sends a StartTLS request. This request determines if the LDAP server supports TLS
on the specified port or on the default LDAP port (389).
=> ALTER AUTHENTICATION Ldap1 SET host='ldap://abc.dc.com', binddn_prefix='CN=',
binddn_suffix=',OU=Unit2,DC=dc,DC=com', basedn='dc=DC,dc=com',
tls_cacert='/home/dc.com.ca.cer', starttls='hard', tls_reqcert='never';

If the LDAP server does not support TLS on that port, the result depends on the value of the
starttls parameter:
• starttls = hard: The HP Vertica server terminates the authentication process.
• starttls = soft: The HP Vertica server proceeds with the authentication but does not use
  TLS.
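The two starttls settings can be sketched as follows. The record name Ldap1 and the host are the placeholders used elsewhere in this section, and the statements are alternatives, not a sequence:

```sql
-- Request TLS, but continue without it if the server does not support it.
ALTER AUTHENTICATION Ldap1 SET host='ldap://abc.dc.com', starttls='soft';

-- Require TLS; authentication fails if the server does not support it.
ALTER AUTHENTICATION Ldap1 SET host='ldap://abc.dc.com', starttls='hard';
```

Both statements use an ldap:// URL, because combining an ldaps:// URL with starttls causes authentication to fail.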

To configure LDAP over SSL/TLS, use the following configuration parameters:

Parameter Name    Description
TLS_REQCERT       hard: If the client does not provide a certificate, or provides an
                  invalid certificate, it cannot connect. This is the default behavior.
                  never: The client does not request or verify a certificate.
                  allow: If the client does not provide a certificate or provides an
                  invalid certificate, it can connect anyway.
                  try: If the client does not provide a certificate, it can
                  connect. If the client provides an invalid certificate, it cannot
                  connect.
TLS_CADIR         Path to the folder with the CA certificates. For example:
                  ALTER AUTHENTICATION Ldap1 SET TLS_CADIR
                  ='/scratch_b/qa/vertica/QA/VT_Scenario/V_SEC/';
TLS_CACERT        Path to the CA certificate. For example:
                  ALTER AUTHENTICATION Ldap1 SET TLS_CACERT
                  ='/scratch_b/qa/vertica/QA/VT_Scenario/V_SEC/dc.com.ca.cer';

If you do not provide one or more of these parameters, the LDAP server checks to see if the
LDAPNOINIT environment variable points to the ldap.conf file. If it does, the server uses the
parameters specified in the ldap.conf file. If the LDAP server cannot find the ldap.conf file,
authentication fails.
The following example shows how to specify the TLS parameters and the LDAP parameters when
configuring LDAP over SSL/TLS:
=> CREATE AUTHENTICATION LDAP1 METHOD 'ldap' HOST '172.16.65.177';
GRANT AUTHENTICATION ldap1 TO user1;
ALTER AUTHENTICATION Ldap1 SET host='ldap://abc.dc.com', binddn_prefix='CN=',
binddn_suffix=',OU=Unit2,DC=dc,DC=com', basedn='dc=DC,dc=com',
tls_cacert='/home/dc.com.ca.cer', starttls='hard', tls_reqcert='never';

Service Accounts and Organizational Units


Before you configure LDAP authentication for your HP Vertica database, consider the following
steps. These recommendations can improve the effectiveness of LDAP-based security on your
system:
• Create a service account with your LDAP server. A service account is a single account that
  is specifically set up so that users in a given organization can share an account that configures
  LDAP access. Create a service account and use that in your LDAP URL to avoid use of account
  names and passwords, which change often. If you add, remove, or change users, you do not
  have to modify the LDAP URL. Having a service account allows you to restrict individual users
  from searching the LDAP server, but it allows applications like HP Vertica to search the server.
• Set up an organizational unit (OU). Create an Active Directory OU, which is a group of users
  in a given organization. Add all the HP Vertica users to the OU, and specify the OU in the
  LDAP URL. Doing so allows the LDAP server to search just the HP Vertica OU for the user,
  minimizing search time. In addition, using OUs prevents changes to the users' OUs for other
  applications.

Workflow for Configuring LDAP Bind

To configure your HP Vertica database to authenticate clients using LDAP bind, follow these steps:
1. Obtain a service account, as described in Service Accounts and Organizational Units. You
cannot use the service account in the connection parameters for LDAP bind.
2. Compare the user's LDAP account name to their HP Vertica username. For example, if John
Smith's Active Directory (AD) sAMAccountName = jsmith, his HP Vertica username must
also be jsmith.
However, the LDAP account does not have to match the database user name, as shown in the
following example:
=> CREATE USER r1 IDENTIFIED BY '$vertica$';
=> CREATE AUTHENTICATION ldap1 METHOD 'ldap' HOST '172.16.65.177';
=> ALTER AUTHENTICATION ldap1 SET HOST='ldap://172.16.65.10',basedn='dc=dc,dc=com',
   binddn_suffix=',ou=unit2,dc=dc,dc=com',binddn_prefix='cn=use';
=> GRANT AUTHENTICATION ldap1 TO r1;
\! ${TARGET}/bin/vsql -p $PGPORT -U r1 -w $LDAP_USER_PASSWD -h ${HOSTNAME} -c "select user_name, client_authentication_name from sessions;"
 user_name | client_authentication_name
-----------+----------------------------
 r1        | ldap
(1 row)

3. Run ldapsearch from an HP Vertica node against your LDAP or AD server. Verify the
connection to the server and identify the values of relevant fields. Running ldapsearch helps
you build the client authentication string needed to configure LDAP authentication.
In the following example, ldapsearch returns the CN, DN, and sAMAccountName fields (if
they exist) for any user whose CN contains the username jsmith. This search succeeds only
for LDAP servers that allow anonymous binding:
ldapsearch -x -h 10.10.10.10 -b "ou=Vertica Users,dc=CompanyCorp,dc=com"
'(cn=jsmith*)' cn dn uid sAMAccountName

ldapsearch returns the following results. The relevant information for LDAP bind is in bold:
# extended LDIF
#
# LDAPv3
# base <ou=Vertica Users,dc=CompanyCorp,dc=com> with scope subtree
# filter: (cn=jsmith*)
# requesting: cn dn uid sAMAccountName
#
# jsmith, Users, CompanyCorp.com
dn:cn=jsmith,ou=Vertica Users,dc=CompanyCorp,dc=com
cn: jsmith
uid: jsmith
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1

4. Create a new authentication record based on the information from ldapsearch. In the
ldapsearch entry, the CN is username jsmith, so you do not need to set it. HP Vertica
automatically sets the CN to the username of the user who is trying to connect. HP Vertica
uses that CN to bind against the LDAP server.
=> CREATE AUTHENTICATION v_ldap_bind METHOD 'ldap' HOST '0.0.0.0/0';
GRANT AUTHENTICATION v_ldap_bind TO public;
ALTER AUTHENTICATION v_ldap_bind SET
host='ldap://10.10.10.10/',
basedn='DC=CompanyCorp,DC=com',
binddn_prefix='cn=',
binddn_suffix=',OU=Vertica Users,DC=CompanyCorp,DC=com';

Workflow for Configuring LDAP Bind and Search

To configure your HP Vertica database to authenticate clients using LDAP bind and search, follow
these steps:
1. Obtain a service account, as described in Service Accounts and Organizational Units.
2. From an HP Vertica node, run ldapsearch against your LDAP or AD server. Verify the
connection to the server, and identify the values of relevant fields. Running ldapsearch helps
you build the client authentication string needed to configure LDAP authentication.
In the following example, ldapsearch returns the CN, DN, and sAMAccountName fields (if
they exist) for any user whose CN contains the username, John. This search succeeds only
for LDAP servers that allow anonymous binding:
ldapsearch -x -h 10.10.10.10 -b 'OU=Vertica Users,DC=CompanyCorp,DC=com' -s sub -D
'CompanyCorp\jsmith' -W '(cn=John*)' cn dn uid sAMAccountName

3. Review the results that ldapsearch returns. The relevant information for bind and search is in
bold:
# extended LDIF
#
# LDAPv3
# base <OU=Vertica Users,DC=CompanyCorp,DC=com> with scope subtree
# filter: (cn=John*)
# requesting: cn dn sAMAccountName
#
# John Smith, Vertica Users, CompanyCorp.com
dn: CN=John Smith,OU=Vertica Users,DC=CompanyCorp,DC=com
cn: John Smith
sAMAccountName: jsmith
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1

4. Create the client authentication record. The sAMAccountName attribute contains the
username you want: jsmith. Set your search attribute to that field so that the search finds the
appropriate account.
=> CREATE AUTHENTICATION v_ldap_bind_search METHOD 'ldap' HOST '0.0.0.0/0';
GRANT AUTHENTICATION v_ldap_bind_search TO public;
ALTER AUTHENTICATION v_ldap_bind_search SET
host='ldap://10.10.10.10',
basedn='OU=Vertica Users,DC=CompanyCorp,DC=com',
binddn='CN=John Smith,OU=Vertica Users,DC=CompanyCorp,DC=com',
bind_password='password',
search_attribute='sAMAccountName';

Configuring Multiple LDAP Servers
If you need to configure multiple LDAP servers that have different URLs, create a separate
authentication record for each server. Use the PRIORITY keyword to indicate which search the
LDAP server performs first.
The following statements create two authentication methods, vldap1 and vldap2. They specify
that the LDAP server first search the entire directory (basedn=dc=example,dc=com) for a DN with
an OU attribute Sales. If the first search returns no results, or otherwise fails, the LDAP server next
searches for a DN with the OU attribute Marketing:
=> CREATE AUTHENTICATION vldap1 METHOD 'ldap' HOST '10.0.0.0/8';
ALTER AUTHENTICATION vldap1 SET
host='ldap://ldap.example.com/search',
basedn='dc=example,dc=com',
search_attribute='Sales'
PRIORITY 1;
GRANT AUTHENTICATION vldap1 to public;
=> CREATE AUTHENTICATION vldap2 METHOD 'ldap' HOST '10.0.0.0/8';
ALTER AUTHENTICATION vldap2 SET
host='ldap://ldap.example.com/search',
basedn='dc=example,dc=com',
search_attribute='Marketing'
PRIORITY 0;
GRANT AUTHENTICATION vldap2 to public;

Configuring Kerberos Authentication


Kerberos authentication differs from user name/password authentication. Instead of authenticating
each user to each network service, Kerberos uses symmetric encryption through a trusted third
party, called the Key Distribution Center (KDC). In this environment, clients and servers validate
their authenticity by obtaining a shared secret (ticket) from the KDC, after which clients and servers
can talk to each other directly.
HP Vertica uses the GSS-API (Generic Security Services Application Programming Interface) to
communicate with the Kerberos client. When you create an authentication method, specify that HP
Vertica use the 'gss' method to authenticate with Kerberos, as in the following syntax:
=> CREATE AUTHENTICATION <method_name> METHOD 'gss' HOST <ip_address>;
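
A concrete instance of that syntax might look like the following sketch. The record name v_kerberos and the choice to accept any IPv4 client address are assumptions for illustration:

```sql
-- Hypothetical record name; 0.0.0.0/0 matches any IPv4 client address.
CREATE AUTHENTICATION v_kerberos METHOD 'gss' HOST '0.0.0.0/0';

-- Make the method available to all users.
GRANT AUTHENTICATION v_kerberos TO public;
```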

Topics in this section describe how to configure the HP Vertica server and clients for Kerberos
authentication. This section does not describe how to install, configure, or administer a Key
Distribution Center.

To install the Kerberos 5 GSS-API distribution for your operating system, see the MIT Kerberos
Distribution Page.

Kerberos Requirements
You must meet the following minimum requirements to use Kerberos authentication with the HP
Vertica server and client drivers.

Kerberos Server
Your network administrator should have already installed and configured one or more Kerberos Key
Distribution Centers (KDC). The KDC must be accessible from every node in your Vertica
Analytics Platform cluster.
The KDC must support Kerberos 5 using GSS-API. For details, see the MIT Kerberos Distribution
Page.

Client Package
The Kerberos 5 client package contains software that communicates with the KDC server. This
package is not included as part of the HP Vertica Analytics Platform installation. Kerberos software
is built into Microsoft Windows. If you are using another operating system, you must obtain and
install the client package.
If you do not already have the Kerberos 5 client package on your system, download it. Then, install
the package on each HP Vertica server and each HP Vertica client used in Kerberos authentication,
except the KDC itself.
Refer to the Kerberos documentation for installation instructions, found at:
• The MIT website
• The MIT Kerberos Distribution page

Client/Server Identity
Configure as Kerberos principals:
• Each client (users or applications that connect to HP Vertica)
• The HP Vertica server

These principals authenticate using the KDC.

Each client platform has a different security framework. The steps required to configure and
authenticate against Kerberos differ among clients. See the following topics for more information:
• Configure HP Vertica for Kerberos Authentication
• Configure Clients for Kerberos Authentication

Configure HP Vertica for Kerberos Authentication


To set up HP Vertica for Kerberos authentication, you must perform a series of short procedures
that are described in the following sections. Perform the steps in this order:
• Install the Kerberos 5 Client Package
• Create the HP Vertica Principals and Keytabs
• Specify KDC Information and Configure Realms
• Inform HP Vertica About the Kerberos Principals and Keytab
• Configure the Authentication Method for All Clients
• Restart the Database
• Get the Ticket and Authenticate HP Vertica with the KDC

Install the Kerberos 5 Client Package


See Kerberos Prerequisites.

Create the HP Vertica Principals and Keytabs

HP Vertica uses special principals for system-level operations. These principals identify the HP
Vertica service and are used in two ways:
• Kerberized HP Vertica clients request access to this service when they authenticate to the
  database.
• System processes like the Tuple Mover use this identity when they authenticate to external
  services such as Hadoop.

Principals
A principal has three elements: a name, a host name, and a realm. You choose the name and realm,
but the host name must match the value that is supplied by the operating system. Typically this is
the fully-qualified host name. If the host name part of your principal doesn't match the value
supplied by the operating system, Kerberos authentication fails.
Some systems use a hosts file (/etc/hosts) to define host names. A hosts file can define more than
one name for a host. The operating system supplies the first entry, so use that in your principal. For
example, if your hosts file contains:
192.168.1.101 node1.example.com node1

then use node1.example.com as the hostname value.

Keytab Files
Principals are stored in encrypted keytab files. The keytab file contains the credentials for the HP
Vertica principal. The keytab allows the HP Vertica server to authenticate itself to the KDC. You
need the keytab so that Vertica Analytics Platform does not have to prompt for a password.
Create one principal for each node in your cluster. You can then either create individual keytab files
(one for each node containing only that node's principal) or create one keytab file containing all the
principals.
• Create one keytab file with all principals to simplify setup: all nodes have the same file,
  making initial setup easier. If you add nodes later you either update (and redistribute) the global
  keytab file or make separate keytabs for the new nodes. If a principal is compromised it is
  compromised on all nodes where it is present in a keytab file.
• Create separate keytab files on each node to simplify maintenance. Initial setup is more
  involved as you must create a different file on each node, but no principals are shared across
  nodes. If you add nodes later you create keytabs on the new nodes. Each node's keytab
  contains only one principal, the one to use for that node.

Creating the Principals and Keytab on a Linux KDC


1. Start the Kerberos 5 database administration utility (kadmin or kadmin.local) to create HP
   Vertica principals on a Linux KDC.
   • Use kadmin if you are accessing the KDC on a remote server. If you have access to the
     Kerberos administrator password, you can use kadmin on any machine where the Kerberos
     5 client package is installed. When you start kadmin, the utility prompts you for the Kerberos
     administrator's password. You might need root privileges on the client to run kadmin.
   • Use kadmin.local if:
     ◦ The KDC is on the machine that you are logging in to.
     ◦ You have root privileges on that server.
     kadmin.local does not require the administrator's login credentials.
   For more information about the kadmin and kadmin.local commands, see the kadmin
   documentation.
2. Create one principal for each HP Vertica node. HP recommends using the name vertica. The
host name must match the value supplied by the operating system. The following example
creates the principal vertica for the node named node1.example.com:
$ sudo /usr/kerberos/sbin/kadmin.local
kadmin.local: add_principal vertica/node1.example.com

3. Add the principal to the keytab. For example:


$ sudo /usr/kerberos/sbin/kadmin.local
kadmin.local: ktadd -k ./node1.keytab vertica/node1.example.com@EXAMPLE.COM
Authenticating as principal vertica/node1.example.com with password.

Repeat the ktadd command once per principal, either creating separate keytabs for each one
or adding them all to a single keytab file (such as krb5.keytab). If you are using a single file, see
the documentation for the -glob option in the MIT Kerberos documentation.
4. Copy each keytab file to the /etc folder on the corresponding cluster node. Use the same path
and file name on all nodes.
5. On each node, make the keytab file readable by the file owner who is running the database
process (typically, the Linux dbadmin user). For example, you can change ownership of the
files to dbadmin as follows:
$ sudo chown dbadmin *.keytab

Important: In a production environment, you must control who can access the keytab file
to prevent unauthorized users from delegating your server. For more information about
delegation (also known as impersonation), see Technet.Microsoft.com.
After you create a keytab file, you can use the klist command to view keys stored in the file:

$ sudo /usr/kerberos/bin/klist -ke -t
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- -------------------------------------------------
   4 vertica/node1.example.com@EXAMPLE.COM
   4 vertica/node1.example.com@EXAMPLE.COM
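For a cluster with many nodes, it can help to generate the add_principal and ktadd lines rather than type them by hand. The following Python sketch only builds the command text to enter at the kadmin prompt; it does not run kadmin. The function name kadmin_commands, the keytab naming scheme (short host name plus .keytab), and the single_keytab argument are illustrative assumptions, not part of HP Vertica or Kerberos.

```python
def kadmin_commands(nodes, realm, service="vertica", single_keytab=None):
    """Build the kadmin commands for one principal per node.

    If single_keytab is given, every principal is added to that one file;
    otherwise each node gets its own keytab (hypothetical naming scheme).
    """
    commands = []
    for node in nodes:
        principal = f"{service}/{node}@{realm}"
        keytab = single_keytab or f"./{node.split('.')[0]}.keytab"
        commands.append(f"add_principal {principal}")
        commands.append(f"ktadd -k {keytab} {principal}")
    return commands


for command in kadmin_commands(["node1.example.com", "node2.example.com"],
                               "EXAMPLE.COM"):
    print(command)
```

Passing single_keytab="./krb5.keytab" produces the one-file layout instead of one keytab per node.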

Creating the Principals and Keytab on Active Directory KDC


Active Directory is a database of user authentication information. To configure HP Vertica for
Kerberos authentication on Active Directory, add the HP Vertica server and clients to an existing
Active Directory domain. Modify the Kerberos configuration file (krb5.conf) on the HP Vertica
server so that all parties support encryption types that the Active Directory KDC uses.
To configure encryption on the KDC:
1. Open gpedit.msc.
2. Expand Computer Configuration > Windows Settings > Security Settings > Local
Policies > Security Options.
3. Double-click Network security: Configure encryption types allowed for Kerberos.
4. On the Local Settings tab, select an encryption algorithm that meets your security
requirements. All parties must agree on the algorithm.
5. Run the command gpupdate /force to refresh the local and Active Directory-based group
policy settings, including security settings.
6. Use the Active Directory Users and Computers snap-in on the Windows server to create users for
each Vertica host and Vertica server. Specify an encryption type that is compatible with your
Kerberos server.
7. Use the ktpass command on the Windows server to configure the server principal name for the
host or service in Active Directory. Then, generate a .keytab file, as in the following example:
ktpass -out ./host.node1.example.com.keytab
-princ host/node1.example.com@EXAMPLE.COM
-mapuser node1
-mapop set -pass <password>
-crypto <encryption_type> -ptype <principal_type>
ktpass -out ./vertica.node1.example.com.keytab
-princ vertica/node1.example.com@EXAMPLE.COM
-mapuser node1
-mapop set -pass <password>
-crypto <encryption_type> -ptype <principal_type>

For more information about keytab files, see Technet.Microsoft.com.


To view a list of the service principal names that a computer has registered with Active Directory,
run the setspn -l hostname command, where hostname is the host name of the computer object
that you want to query. For example:
setspn -L vertica
Registered ServicePrincipalNames for CN=vertica,CN=Users,
DC=example,DC=com
vertica/node1.example.com
setspn -L node1
Registered ServicePrincipalNames for CN=vertica,CN=Users,
DC=example,DC=com
host/node1.example.com

Specify KDC Information and Configure Realms


Each client and Vertica Analytics Platform server in the Kerberos realm must have a valid,
identically configured Kerberos configuration (krb5.conf) file. Without this file, the client does not
know how to reach the KDC.
If you use Microsoft Active Directory, you do not need to perform this step. Refer to the Kerberos
documentation for your platform for more information about the Kerberos configuration file on Active
Directory.
At a minimum, you must configure the following sections in the krb5.conf file.
• [libdefaults]: Settings used by the Kerberos 5 library
• [realms]: Realm-specific contact information and settings
• [domain_realm]: Maps server hostnames to Kerberos realms

See the Kerberos documentation for information about other sections in this configuration file.
You must update the /etc/krb5.conf file to reflect your site's Kerberos configuration. The
simplest way to enforce consistency among all clients and servers in the Kerberos realm is to copy
the /etc/krb5.conf file from the KDC. Then, place this file in the /etc directory on each HP
Vertica cluster node.
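To show how the three required sections fit together, the following Python sketch renders a minimal krb5.conf as text. It is an illustration under stated assumptions: the function name render_krb5_conf and the host names are hypothetical, and a real site file normally contains more settings. Copying the file from the KDC, as described above, remains the simpler route.

```python
def render_krb5_conf(realm, kdc_host, admin_host, dns_domain):
    """Return the text of a minimal krb5.conf with the three required sections."""
    return "\n".join([
        "[libdefaults]",
        f"    default_realm = {realm}",
        "",
        "[realms]",
        f"    {realm} = {{",
        f"        kdc = {kdc_host}",
        f"        admin_server = {admin_host}",
        "    }",
        "",
        "[domain_realm]",
        # Map both the domain and its subdomains to the realm.
        f"    .{dns_domain} = {realm}",
        f"    {dns_domain} = {realm}",
        "",
    ])


conf = render_krb5_conf("EXAMPLE.COM", "kdc.example.com",
                        "kdc.example.com", "example.com")
print(conf)
```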

Inform HP Vertica About the Kerberos Principals and Keytab


Follow these steps to inform HP Vertica of the principal name and keytab location.

For information about the parameters that you are setting in this procedure, see Kerberos
Authentication Parameters.
1. Log in to the database as an administrator (typically dbadmin).
2. Set the KerberosKeytabFile configuration parameter to point to the location of the keytab file:
=> ALTER DATABASE mydb SET KerberosKeytabFile = '/etc/krb5.keytab';

The keytab file must be in the same location (/etc/krb5.keytab in this example) on all nodes.
3. Set the service name for the HP Vertica principal; for example, vertica:
=> ALTER DATABASE mydb SET KerberosServiceName = 'vertica';

4. Provide the realm portion of the principal, for example, EXAMPLE.COM:


=> ALTER DATABASE mydb SET KerberosRealm = 'EXAMPLE.COM';

Configure the Authentication Method for All Clients


To make sure that all clients use the gss authentication method, run the following statements:
=> CREATE AUTHENTICATION <method_name> METHOD 'gss' HOST '0.0.0.0/0';
=> GRANT AUTHENTICATION <method_name> TO PUBLIC;

For more information, see Implementing Client Authentication.

Restart the Database
For all settings to take effect, you must restart the database.

Get the Ticket and Authenticate HP Vertica with the KDC


The following example shows how to get the ticket and authenticate Vertica Analytics Platform with
the KDC using the kinit command. You commonly perform this final step from the vsql client.
/etc/krb5.conf
EXAMPLE.COM = {
kdc = myserver.example.com:11
admin_server = myadminserver.example.com:000
kpasswd_protocol = SET_CHANGE
default_domain = example /etc/krb5.conf
}

To request a ticket from the KDC server, call the kinit utility.

$ kinit kuser@EXAMPLE.COM
Password for kuser@EXAMPLE.COM:

Configure Clients for Kerberos Authentication


Each supported platform has a different security framework. Thus, the steps required to configure
and authenticate against Kerberos differ among clients.
On the server side, you construct the HP Vertica Kerberos service name principal using this format:
Kerberos_Service_Name/Kerberos_Host_Name@Kerberos_Realm

For each client, the GSS libraries require the following format for the HP Vertica service principal:
Kerberos_Service_Name@Kerberos_Host_Name

You can omit the realm portion of the principal because GSS libraries use the realm name of the
configured default realm (Kerberos_Realm).
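The two formats differ only in the separator and in whether the realm appears. The following Python sketch builds both strings from the same parts; the function names are hypothetical and exist only to make the difference concrete.

```python
def server_principal(service, host, realm):
    """Server-side form: Kerberos_Service_Name/Kerberos_Host_Name@Kerberos_Realm."""
    return f"{service}/{host}@{realm}"


def gss_service_name(service, host):
    """Client-side GSS form: Kerberos_Service_Name@Kerberos_Host_Name.

    The realm is omitted; the GSS library supplies the default realm.
    """
    return f"{service}@{host}"


print(server_principal("vertica", "node1.example.com", "EXAMPLE.COM"))
print(gss_service_name("vertica", "node1.example.com"))
```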
For information about client connection strings, see the following topics in the Connecting to HP
Vertica Guide:
• ODBC DSN Parameters
• JDBC Connection Properties
• ADO.NET Connection Properties
• (vsql) Command-Line Options

Note: A few scenarios exist in which the HP Vertica server principal name might not match the
host name in the connection string. See Troubleshooting Kerberos Authentication for more
information.

In This Section
• Configure ODBC and vsql Clients on Linux, HP-UX, AIX, Mac OS X, and Solaris
• Configure ADO.NET, ODBC, and vsql Clients on Windows
• Configure JDBC Clients on All Platforms

Configure ODBC and vsql Clients on Linux, HP-UX, AIX, Mac OS X, and Solaris
This topic describes the requirements for configuring an ODBC or vsql client on Linux, HP-UX, AIX,
Mac OS X, or Solaris.

Install the Kerberos 5 Client Package


See Kerberos Prerequisites.

Provide Clients with a Valid Kerberos Configuration File

The Kerberos configuration (krb5.conf) file contains Kerberos-specific information, including:
• How to reach the KDC
• Default realm name
• Domain
• Path to log files
• DNS lookup
• Encryption types to use
• Ticket lifetime

The default location for the Kerberos configuration file is /etc/krb5.conf.


To communicate with the KDC, each client participating in Kerberos authentication must have a
valid, identically configured krb5.conf file. When configured properly, the client can authenticate
with Kerberos and retrieve a ticket through the kinit utility. Likewise, the server can then use
ktutil to store its credentials in a keytab file.
Tip: One of the simplest ways to enforce consistency among clients, Vertica Analytics
Platform, and the KDC is to copy the /etc/krb5.conf file from the KDC to the client's /etc
directory.

Authenticate and Connect Clients


ODBC and vsql use the client's ticket established by kinit to perform Kerberos authentication.
These clients rely on the security library's default mechanisms to find the ticket file and the
Kerberos configuration file.
To authenticate against Kerberos, call the kinit utility to obtain a ticket from the Kerberos KDC
server. The following two examples show how to send the ticket request using ODBC and vsql
clients.
ODBC Authentication Request and Connection
1. On an ODBC client, acquire a ticket for the kuser user by calling the kinit utility.
$ kinit kuser@EXAMPLE.COM
Password for kuser@EXAMPLE.COM:

2. Connect to HP Vertica, and provide the principals in the connection string:


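SQLCHAR outStr[1024];
SQLSMALLINT len;
/* handle is a connection handle allocated with SQLAllocHandle(SQL_HANDLE_DBC, ...) */
SQLDriverConnect(handle, NULL,
    (SQLCHAR *)"Database=VMart;User=kuser;"
    "Server=myserver.example.com;Port=5433;"
    "KerberosHostname=vcluster.example.com",
    SQL_NTS, outStr, sizeof(outStr), &len, SQL_DRIVER_NOPROMPT);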

vsql Authentication Request and Connection


If the vsql client is on the same machine you are connecting to, vsql connects through a UNIX
domain socket. This connection bypasses Kerberos authentication. When you authenticate with
Kerberos, especially if the client authentication method is configured as 'local', you must include
the -h hostname option. See Command Line Options in the Connecting to HP Vertica Guide.
1. On the vsql client, call the kinit utility:
$ kinit kuser@EXAMPLE.COM
Password for kuser@EXAMPLE.COM:

2. Connect to HP Vertica, and provide the host and user principals in the connection string:
$ ./vsql -K vcluster.example.com -h myserver.example.com -U kuser
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type:  \h or \? for help with vsql commands
       \g or terminate with semicolon to execute query
       \q to quit

In the future, when you log in to vsql as kuser, vsql uses your cached ticket without prompting you
for a password.
You can verify the authentication method by querying the SESSIONS system table:
kuser=> SELECT authentication_method FROM sessions;
 authentication_method
-----------------------
 GSS-Kerberos
(1 row)

See Also
• Kerberos Client/Server Requirements
• ODBC DSN Parameters in the Connecting to HP Vertica Guide
• (vsql) Command-Line Options in the Connecting to HP Vertica Guide

Configure ADO.NET, ODBC, and vsql Clients on Windows


The HP Vertica client drivers support the Windows SSPI library for Kerberos authentication.
Windows Kerberos configuration is stored in the registry.
You can choose between two different setup scenarios for Kerberos authentication on ADO.NET,
ODBC, and vsql clients on Windows:
• Windows KDC on Active Directory with Windows Built-in Kerberos Client and HP Vertica
• Linux KDC with Windows Built-in Kerberos Client and HP Vertica

Windows KDC on Active Directory with Windows Built-in Kerberos Client and HP Vertica
Kerberos authentication on Windows is commonly used with Active Directory, Microsoft's
enterprise directory service/Kerberos implementation. Typically your organization's network or IT
administrator performs the setup.
Windows clients have Kerberos authentication built into the authentication process. You do not
need any additional software.
Your login credentials authenticate you to the Kerberos server (KDC) when you:
• Log in to Windows from a client machine
• Use a Windows instance that has been configured to use Kerberos through Active Directory

To use Kerberos authentication on Windows clients, log in as REALM\user.


ADO.NET IntegratedSecurity
When you use the ADO.NET driver to connect to HP Vertica, you can optionally specify
IntegratedSecurity=true in the connection string. This Boolean setting tells the driver to
authenticate the calling user against that user's Windows credentials. As a result, you do not need to
include a user name or password in the connection string. If you add a user=<username> entry to
the connection string, the ADO.NET driver ignores it.

Linux KDC with Windows Built-in Kerberos Client and HP Vertica


A simple, but less common scenario is to configure Windows to authenticate against a
non-Windows KDC. In this implementation, you use the ksetup utility to point the Windows operating
system native Kerberos capabilities at a non-Active Directory KDC. By logging in to Windows, you
obtain a ticket-granting ticket, similar to the Active Directory implementation. However, in this
case, Windows is internally communicating with a Linux KDC. See the Microsoft Windows Server
Ksetup page for more information.

Configuring Windows Clients for Kerberos Authentication


Depending on which implementation you want to configure, refer to one of the following pages on
the Microsoft Server website:
• To set up Windows clients with Active Directory, refer to Step-by-Step Guide to Kerberos 5
  (krb5 1.0) Interoperability.
• To set up Windows clients with the ksetup utility, refer to the Ksetup page.

Authenticate and Connect Clients


The KDC can authenticate both an ADO.NET and a vsql client.
Note: Use the fully-qualified domain name as the server in your connection string; for example,
use host.example.com instead of just host. That way, if the server moves location, you do
not have to change your connection string.
ADO.NET Authentication Request and Connection

This example shows how to use the IntegratedSecurity=true setting to specify that the
ADO.NET driver authenticate the calling user's Windows credentials:
VerticaConnection conn = new VerticaConnection(
    "Database=VMart;Server=host.example.com;" +
    "Port=5433;IntegratedSecurity=true;" +
    "KerberosServiceName=vertica;KerberosHostname=vcluster.example.com");
conn.Open();

vsql Authentication Request and Connection


1. Log in to your Windows client, for example, as EXAMPLE\kuser.
2. Run the vsql client and supply the connection string to HP Vertica:
C:\Users\kuser\Desktop>vsql.exe -h host.example.com -K vcluster -U kuser
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit

See Also
• Kerberos Client/Server Requirements
• vsql Command Line Options in the Connecting to HP Vertica Guide
• ADO.NET Connection Properties in the Connecting to HP Vertica Guide

Configure JDBC Clients on All Platforms


Kerberos authentication on JDBC clients uses Java Authentication and Authorization Service
(JAAS) to acquire the initial Kerberos credentials. JAAS is an API framework that hides
platform-specific authentication details and provides a consistent interface for other applications.
You specify the client login process through the JAAS Login Configuration File. This file contains
options that specify the authentication method and other settings to use for Kerberos. A class
called the LoginModule defines valid options in the configuration file.
The JDBC client principal is crafted as jdbc-username@server-from-connection-string.

About the LoginModule


Many vendors can provide a LoginModule implementation that you can use for Kerberos
authentication. However, HP recommends that you use the JAAS public class
com.sun.security.auth.module.Krb5LoginModule provided in the Java Runtime Environment
(JRE).
The Krb5LoginModule authenticates users using Kerberos protocols and is implemented differently
on non-Windows and Windows platforms:
• On non-Windows platforms: The Krb5LoginModule defers to a native Kerberos client
  implementation. Thus, you can use the same /etc/krb5.conf setup as you use to configure
  ODBC and vsql clients on Linux, HP-UX, AIX, Mac OS X, and Solaris platforms.
• On Windows platforms: The Krb5LoginModule uses a custom Kerberos client implementation
  bundled with the Java Runtime Environment (JRE). Windows settings are stored in a
  %WINDIR%\krb5.ini file, which has similar syntax and conventions to the non-Windows
  krb5.conf file. You can copy a krb5.conf from a non-Windows client to %WINDIR%\krb5.ini.

You can find documentation for the LoginModules in the com.sun.security.auth package, and
on the Krb5LoginModule web page.

Create the JAAS login configuration


The JAASConfigName connection property identifies a specific configuration within a JAAS
configuration that contains the Krb5LoginModule and its settings. The JAASConfigName setting
lets multiple JDBC applications with different Kerberos settings coexist on a single host. The
default configuration name is verticajdbc.

Important: Carefully construct the JAAS login configuration file. If syntax is incorrect,
authentication fails.
You can configure JAAS-related settings in the java.security master security properties file. This
file resides in the lib/security directory of the JRE. For more information, see Appendix A in the
Java™ Authentication and Authorization Service (JAAS) Reference Guide.

Create a JDBC Login Context
The following example shows how to create a login context for Kerberos authentication on a
JDBC client. The client uses the default JAASConfigName of verticajdbc and specifies that:
• The ticket-granting ticket will be obtained from the ticket cache
• The user will not be prompted for a password if credentials cannot be obtained from the cache,
  keytab file, or through a shared state.

verticajdbc {
    com.sun.security.auth.module.Krb5LoginModule
    required
    useTicketCache=true
    doNotPrompt=true;
};

JDBC Authentication Request and Connection


You can configure the Krb5LoginModule to use a cached ticket or keytab. The driver can also
acquire a ticket or keytab automatically if the calling user provides a password.
In the preceding example, the login process uses a cached ticket and does not prompt for a
password because both useTicketCache and doNotPrompt are set to true. If doNotPrompt=false
and you provide a user name and password during the login process, the driver provides that
information to the LoginModule. The driver then calls the kinit utility on your behalf.
1. On a JDBC client, call the kinit utility to acquire a ticket:
$ kinit kuser@EXAMPLE.COM

If you prefer to use a password instead of calling the kinit utility, see the next section.
2. Connect to HP Vertica:
Properties props = new Properties();
props.setProperty("user", "kuser");
props.setProperty("KerberosServiceName", "vertica");
props.setProperty("KerberosHostName", "vcluster.example.com");
props.setProperty("JAASConfigName", "verticajdbc");
Connection conn = DriverManager.getConnection(
    "jdbc:vertica://myserver.example.com:5433/VMart", props);

Have the Driver Acquire a Ticket


Sometimes, you may want to bypass calling the kinit utility yourself but still use encrypted,
mutual authentication. In such cases, you can optionally pass the driver a clear text password to
acquire the ticket from the KDC. The password is encrypted when sent across the network. For
example, useTicketCache and doNotPrompt are both false in the following example. Thus, the
calling user's credentials are not obtained through the ticket cache or keytab.
verticajdbc {
    com.sun.security.auth.module.Krb5LoginModule
    required
    useTicketCache=false
    doNotPrompt=false;
};

The preceding example demonstrates the flexibility of JAAS. The driver no longer looks for a
cached ticket, and you do not have to call kinit. Instead, the driver takes the password and user
name and calls kinit on your behalf.

See Also
• Kerberos Client/Server Requirements
• JDBC Connection Properties in the Connecting to HP Vertica Guide
• Java™ Authentication and Authorization Service (JAAS) Reference Guide (external website)

Troubleshooting Kerberos Authentication


This topic provides tips that help you avoid and troubleshoot issues related to Kerberos
authentication with Vertica Analytics Platform.

JDBC Client Authentication Fails


If Kerberos authentication fails on a JDBC client, check the JAAS login configuration file for syntax
issues. If syntax is incorrect, authentication fails.

Working Domain Name Service (DNS) Not Configured


Verify that the DNS entries and hosts on the network are all properly configured for your
environment. Refer to the Kerberos documentation for your platform for details.

System Clocks Out of Sync


System clocks in your network must remain in sync for Kerberos authentication to work properly.
To do so:
• Install NTP on the Kerberos server (KDC).
• Install NTP on each server in your network.
• Synchronize system clocks on all machines that participate in the Kerberos realm within a few
  minutes of the KDC and each other.
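As a rough way to reason about "within a few minutes", the following Python sketch compares two timestamps against a tolerance. The five-minute limit is an assumption based on the default clock skew of 300 seconds in MIT Kerberos; your realm may be configured differently. The function name within_skew is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes, the MIT Kerberos default clock skew (300 seconds).
MAX_SKEW = timedelta(minutes=5)


def within_skew(local_time, kdc_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(local_time - kdc_time) <= max_skew


kdc_clock = datetime(2016, 7, 21, 12, 0, tzinfo=timezone.utc)
node_clock = kdc_clock + timedelta(minutes=2)
print(within_skew(node_clock, kdc_clock))
```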

Clock skew can be problematic on Linux virtual machines that need to sync with the Windows Time
Service. Try the following to keep time in sync:
1. Using any text editor, open /etc/ntp.conf.
2. Under the Undisciplined Local Clock section, add the IP address for the Vertica Analytics
Platform server. Then, remove existing server entries.
3. Log in to the server as root, and set up a cron job to sync time with the added IP address every
half hour, or as often as needed. For example:
*/30 * * * * /etc/init.d/ntpd restart

4. Alternatively, run the following command to force clock sync immediately:


$ sudo /etc/init.d/ntpd restart

For more information, see Set Up Time Synchronization in the Installation Guide and the Network
Time Protocol website.

Kerberos Ticket Is Valid but Hadoop Access Fails


HP Vertica uses Kerberos tickets to obtain Hadoop tokens. It then uses the Hadoop tokens to
access the Hadoop data. Hadoop tokens expire after a period of time, so HP Vertica periodically
refreshes them. However, if your Hadoop cluster is set to expire tokens frequently, it is possible
that tokens might not be refreshed in time. If the token expires, you cannot access data.
Setting the HadoopFSTokenRefreshFrequency configuration parameter allows you to specify how
often HP Vertica should refresh the token. Specify this value, in seconds, to be smaller than the
expiration period set for Hadoop. For example:
=> ALTER DATABASE exampledb SET HadoopFSTokenRefreshFrequency = '86400';
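One way to pick a value is to derive it from the Hadoop token lifetime. The following Python sketch builds the ALTER DATABASE statement with a refresh frequency set to a fraction of the lifetime. The function name refresh_statement and the safety factor of 0.5 are illustrative assumptions, not HP recommendations; any value smaller than the Hadoop expiration period satisfies the rule above.

```python
def refresh_statement(db_name, token_lifetime_seconds, safety_factor=0.5):
    """Build an ALTER DATABASE statement that refreshes before tokens expire.

    safety_factor is an arbitrary illustrative choice between 0 and 1.
    """
    frequency = max(1, int(token_lifetime_seconds * safety_factor))
    return (f"ALTER DATABASE {db_name} SET "
            f"HadoopFSTokenRefreshFrequency = '{frequency}';")


# A two-day token lifetime (172800 seconds) with the default factor.
print(refresh_statement("exampledb", 172800))
```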

Encryption Algorithm Choices


Kerberos is based on symmetric encryption. Be sure that all Kerberos parties used in the Kerberos
realm agree on the encryption algorithm to use. If they do not agree, authentication fails. You can
review the exceptions in the vertica.log.
On a Windows client, be sure the encryption types match the types set on Active Directory. See
Configure HP Vertica for Kerberos Authentication.

Be aware that Kerberos is used only for securing the login process. After the login process
completes, by default, information travels between client and server without encryption. If you want
to encrypt traffic, use SSL. For details, see Implementing SSL.

Kerberos Passwords Not Recognized


If you change your Kerberos password, you must re-create all of your keytab files.

Using the ODBC Data Source Configuration Utility


On Windows vsql clients, you may choose to use the ODBC Data Source Configuration utility and
supply a client Data Source. If so, be sure you enter a Kerberos host name in the Client Settings tab
to avoid client connection failures with the Vertica Analytics Platform server.

Authentication Failure in Backup, Restore, or Admin Tools


This problem can arise in configurations where each HP Vertica node uses its own Kerberos
principal. (This is the recommended configuration.) When using vbr.py or admintools you might see
an error such as the following:
vsql: GSSAPI continuation error: Miscellaneous failure
GSSAPI continuation error: Server not found in Kerberos database

Backup/restore and the admin tools use the value of KerberosHostname, if it is set, in the Kerberos
principal used to authenticate. The same value is used on all nodes. If you have defined one
Kerberos principal per node, as recommended, this value does not match. To correct this, unset the
KerberosHostname parameter:
=> ALTER DATABASE mydb CLEAR KerberosHostname;

Server's Principal Name Does Not Match Host Name


This problem can arise in configurations where a single Kerberos principal is used for all nodes. HP
recommends against this; use one principal per node and do not set the KerberosHostname
parameter.
In some cases during client connection, the HP Vertica server's principal name might not match the
host name in the connection string. (See also Using the ODBC Data Source Configuration Utility in
this topic.)
On ODBC, JDBC, and ADO.NET clients, set the host name portion of the server's principal using
the KerberosHostName connection string property.

Tip: On vsql clients, you set the host name portion of the server's principal name using the
-K KRB HOST command-line option. The default value is specified by the -h switch, which
is the host name of the machine on which the HP Vertica server is running. -K is
equivalent to the drivers' KerberosHostName connection string value.
For details, see Command Line Options in the Connecting to HP Vertica Guide.

Principal/Host Mismatch Issues and Resolutions


• The KerberosHostName configuration parameter has been overridden.
  For example, consider the following connection string:
  jdbc:vertica://node01.example.com/vmart?user=kuser

  Because this connection string includes no explicit KerberosHostName parameter, the driver
  defaults to the host in the URL (node01.example.com). If you overwrite the server-side
  KerberosHostName parameter as abc, the client generates an incorrect principal.
  To resolve this issue, explicitly set the client's KerberosHostName in the connection string, as in
  this example:
  jdbc:vertica://node01.example.com/vmart?user=kuser&kerberoshostname=abc

• Connection load balancing is enabled, but the node against which the client authenticates might
  not be the node in the connection string.
  In this situation, consider changing all nodes to use the same KerberosHostName setting. When
  you use the default (the host that was originally specified in the connection string), load
  balancing cannot interfere with Kerberos authentication.

• You have a DNS name that does not match the Kerberos host name.
  For example, imagine a cluster of six servers, where you want hr-servers and finance-servers
  to connect to different nodes on the Vertica Analytics Platform cluster. Kerberos
  authentication, however, occurs on a single (the same) KDC. In the following example, the
  Kerberos service host name of the servers is server.example.com.
  Suppose you have the following list of example servers:

  server1.example.com    192.168.10.11
  server2.example.com    192.168.10.12
  server3.example.com    192.168.10.13
  server4.example.com    192.168.10.14
  server5.example.com    192.168.10.15
  server6.example.com    192.168.10.16

  Now, assume you have the following DNS entries:

  finance-servers.example.com 192.168.10.11, 192.168.10.12, 192.168.10.13
  hr-servers.example.com 192.168.10.14, 192.168.10.15, 192.168.10.16

  When you connect to finance-servers.example.com, specify:
  ◦ The -h host name option as finance-servers.example.com
  ◦ The -K Kerberos host name option as server.example.com
  For example:
  $ vsql -h finance-servers.example.com -K server.example.com

• You do not have DNS set up on the client machine, so you must connect by IP only.
  To resolve this issue, specify:
  ◦ The -h host name option as the IP address
  ◦ The -K Kerberos host name option as server.example.com
  For example:
  $ vsql -h 192.168.1.12 -K server.example.com

• There is a load balancer involved (Virtual IP), but there is no DNS name for the VIP.
  Specify:
  ◦ The -h host name option as the Virtual IP address
  ◦ The -K Kerberos host name option as server.example.com
  For example:
  $ vsql -h <virtual IP> -K server.example.com

• You connect to HP Vertica using an IP address, but there is no host name to construct the
  Kerberos principal name.
  Provide the instance or host name for HP Vertica as described in Inform HP Vertica About
  the Kerberos Principals and Keytab.

• You set the server-side KerberosHostName configuration parameter to a name other than the HP
  Vertica node's host name, but the client cannot determine the host name based on the host
  name in the connection string alone.
  Reset KerberosHostName to match the HP Vertica node's host name. For
  more information, see the following topics in the Connecting to HP Vertica Guide:
  ◦ ODBC DSN Parameters
  ◦ JDBC Connection Properties
  ◦ ADO.NET Connection Properties

Configuring Hash Authentication


HP Vertica 7.1 provides a new hash authentication method. Hash authentication allows you to use
the MD5 algorithm or the more secure algorithm, SHA-512, to store user passwords. SHA-512 is
one of the industry-standard SHA-2 family of hash algorithms that address weaknesses in SHA-1
and MD5.
Important: Hewlett-Packard strongly recommends that you use SHA-512 for hash
authentication because it is more secure than MD5.
Before HP Vertica 7.1, the database used only MD5 to store passwords. The MD5 algorithm
continues to be the default algorithm for storing passwords after you update your HP Vertica pre-7.1
server. All current users can still authenticate as in earlier releases until you reconfigure hash
authentication.
Before you configure hash authentication, review the following topics:
• Hash Authentication Parameters: Describes the two hash authentication parameters that
specify which hashing algorithm to use.
• Upgrade Considerations for Hash Authentication: Before you implement the SHA-512
algorithm for one or more users, you must be aware of several issues. For details, review this
topic before proceeding.
• How to Configure Hash Authentication: After you decide to implement hash authentication in
your database, follow the steps described in this topic.

Hash Authentication Parameters


Two parameters control which algorithm hash authentication uses for hashing and storing user
passwords:
• A user-level parameter, Security_Algorithm:

=> ALTER USER username SECURITY_ALGORITHM 'MD5' IDENTIFIED BY 'newpassword';
=> ALTER USER username SECURITY_ALGORITHM 'SHA512' IDENTIFIED BY 'newpassword';

• A system-level configuration parameter, SecurityAlgorithm:

=> SELECT SET_CONFIG_PARAMETER('SecurityAlgorithm', 'MD5');
=> SELECT SET_CONFIG_PARAMETER('SecurityAlgorithm', 'SHA512');

Both parameters can have the following values:
• 'NONE'
• 'MD5'
• 'SHA512'

The user-level parameter usually has precedence over the system-level parameter. However, if the
user-level parameter is 'NONE', HP Vertica hashes passwords with the algorithm assigned to the
system-level parameter value. If both parameters are 'NONE', HP Vertica uses the MD5 algorithm.
These values, which are stored in the PASSWORD_AUDITOR system table, determine the security
algorithm that is actually used for hash authentication.
User-Level Parameter Value    System-Level Parameter Value    Algorithm Used for Hash Authentication
'NONE'                        'NONE'                          MD5
'NONE'                        'MD5'                           MD5
'NONE'                        'SHA512'                        SHA-512
'MD5'                         'NONE'                          MD5
'MD5'                         'MD5'                           MD5
'MD5'                         'SHA512'                        MD5
'SHA512'                      'NONE'                          SHA-512
'SHA512'                      'MD5'                           SHA-512
'SHA512'                      'SHA512'                        SHA-512
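The precedence rule in this table can be sketched as a small function. This is an illustration of the rule only; the function name effective_hash_algorithm and the use of plain strings are assumptions and are not part of any HP Vertica API.

```python
# Sketch of the precedence rule for the two hash authentication parameters.
# The function name and string values are illustrative assumptions.

VALID_VALUES = {"NONE", "MD5", "SHA512"}

def effective_hash_algorithm(user_level: str, system_level: str) -> str:
    """Return the algorithm used to hash a password for the given settings."""
    for value in (user_level, system_level):
        if value not in VALID_VALUES:
            raise ValueError(f"unsupported value: {value!r}")
    # The user-level setting wins unless it is 'NONE'.
    chosen = user_level if user_level != "NONE" else system_level
    # If both settings are 'NONE', MD5 is used.
    if chosen == "NONE":
        chosen = "MD5"
    return "SHA-512" if chosen == "SHA512" else "MD5"
```

For example, effective_hash_algorithm("NONE", "SHA512") falls through to the system-level value, while effective_hash_algorithm("MD5", "SHA512") keeps the user-level value.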

Upgrade Considerations for Hash Authentication


For HP Vertica releases before 7.1, MD5 is the only algorithm used for hashing passwords. In HP
Vertica 7.1, you can use either the MD5 algorithm or the more secure SHA-512 algorithm. Before
you upgrade, you must consider the following behaviors to avoid problems.

Upgrade the Client and Server


To implement the more secure SHA-512 algorithm for hashing passwords, you must upgrade both
the client and the server to HP Vertica 7.1 or higher. Suppose you upgrade the server but not the
client, and then specify that one or more users have their passwords stored using SHA-512.
Because the client does not understand hashing with SHA-512, the server returns an error when
the client sends a message to it.

User-Level Parameter Has Priority Over System-Level Parameter


When you initially upgrade from a pre-7.1 database, the user-level parameter, Security_Algorithm,
is set to 'NONE'. This setting allows all existing users to continue connecting to the HP Vertica
server, and their passwords are hashed using MD5.
If you want one or more users to use the SHA-512 algorithm, first set the system-level parameter
SecurityAlgorithm to 'SHA512'.
Important: Changing only the system-level parameter to SHA-512 does not change the hash
algorithm to SHA-512 for existing users. The hash algorithm does not change until the user
password changes.
You can use any of three approaches to change the user password:
• Manually set the user's user-level security algorithm to 'SHA512'. Then, change the user's
password, as in the following statement:
=> ALTER USER username SECURITY_ALGORITHM 'SHA512' IDENTIFIED BY 'newpassword';

• Set the user's password to expire immediately, as in the following statement. After the password
expires, the user responds by changing it.
=> ALTER USER username PASSWORD EXPIRE;

• Ask the user to change the password.

All new passwords inherit the system-level security algorithm, which is SHA-512.

Password Considerations for Hash Authentication


If you want a user to use the SHA-512 hash algorithm, you must change the user's password after
setting the user-level algorithm to SHA-512. Doing so allows the new password to be recognized by
the SHA-512 algorithm on the system trying to connect to HP Vertica.
If the security algorithm does not change and you try to change a user's password to the password
that is currently in use, you see an error that says you cannot reuse the current password.
However, if you try to reuse the current password after changing the security algorithm, you do not
see that error. HP Vertica allows you to reuse the current password because the password hashed
using MD5 does not match the same password hashed using SHA-512.
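The following sketch shows why the two stored values cannot match: the same password text produces digests of different lengths under the two algorithms. It uses Python's standard hashlib module and plain, unsalted digests purely for illustration. How HP Vertica salts and stores password hashes is not described here, so the helper name digest and the digest format are assumptions.

```python
import hashlib

# Illustration only: plain, unsalted digests of the same password text.
# How HP Vertica actually salts and stores password hashes is not modeled here.
def digest(password: str, algorithm: str) -> str:
    """Return the hex digest of the password under 'md5' or 'sha512'."""
    return hashlib.new(algorithm, password.encode("utf-8")).hexdigest()

md5_value = digest("newpassword", "md5")
sha512_value = digest("newpassword", "sha512")

# A 128-bit MD5 digest is 32 hex characters; a 512-bit SHA-512 digest is 128.
print(len(md5_value), len(sha512_value))
print(md5_value == sha512_value)
```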

How to Configure Hash Authentication


Follow these steps to configure hash authentication:
1. Create an authentication method that is based on hash encryption. When you create an
authentication method, it is automatically enabled for use.
The following example shows how to create an authentication method v_hash for users logging
in from the IP address 10.0.0.0/0.
=> CREATE AUTHENTICATION v_hash METHOD 'hash' HOST '10.0.0.0/0';

If users are trying to connect from an IPv6 address, the statement might look like this example:
=> CREATE AUTHENTICATION v_hash METHOD 'hash' HOST '2001:db8:ab::123/128';

2. Decide which password-hashing algorithm you want to use: MD5 or the more secure SHA-512.
3. Specify the security algorithm as follows:
• At the system level, set the SecurityAlgorithm configuration parameter. This setting applies
to all users, unless their user-level security is set to another value:
=> ALTER DATABASE mydb SecurityAlgorithm = 'MD5';
=> ALTER DATABASE mydb SecurityAlgorithm = 'SHA512';

If you want users to inherit the system-level security, set their passwords to expire
immediately. Users must change their passwords before they log in again. Alternatively, you
can ask users to change their passwords. HP Vertica hashes all new passwords using the
system-level security algorithm.
• At the user level, use ALTER USER to set the Security_Algorithm user parameter. Changing
this parameter at the user level overrides the system-level value:
=> ALTER USER username SECURITY_ALGORITHM 'MD5' IDENTIFIED BY 'newpassword';
=> ALTER USER username SECURITY_ALGORITHM 'SHA512' IDENTIFIED BY 'newpassword';

4. Associate the v_hash authentication method with the desired users or user roles, using a
GRANT statement:
=> GRANT AUTHENTICATION v_hash to user1, user2, ...;

For more information about how these parameters work, see Hash Authentication Parameters.

Configuring TLS Authentication
HP Vertica supports Transport Layer Security (TLS) for client authentication. For an overview of
client authentication, refer to Client Authentication.
The terms SSL and TLS are often used interchangeably. This document uses both terms. TLS is
the successor to SSL and offers greater security. The original SSL standard was renamed TLS at
the time it became open source. The introduction of TLS began with version 1, which is essentially
equal to SSL 3. You use openssl commands to create certificates and keys and TLS syntax to
create an authentication method.


Authentication methods define how clients connect to an HP Vertica server. Before you define a
TLS authentication method, you should understand what type of authentication methods your HP
Vertica server supports. You should also perform any prerequisite tasks.
With regard to SSL, your server can operate with:
• No SSL
• SSL Server Mode: The client does not need certificate or key files.
• SSL Mutual Mode: The client needs certificate, key, and certificate authority files.

SSL modes are independent of authentication, except that the SSL client self-authentication
method requires that your server be set up in SSL Mutual Mode. If you are not implementing the
client self-authentication method, you can use TLS authentication with either SSL Server Mode or
SSL Mutual Mode.
Before you create a TLS authentication method, perform the prerequisite tasks necessary for your
specific environment (for example, certificate creation). Refer to Implementing SSL and all
subsections applicable to your environment.
To create a TLS authentication method, use the command CREATE AUTHENTICATION as
documented in the SQL Reference Manual.


SSL Server Modes and Client Authentication Summary

Implementing Client Self-Authentication


To use a client self-authentication method, your server must be in SSL Mutual Mode.

To create an authentication method for client self-authentication, use the CREATE
AUTHENTICATION statement. Specify the auth_type 'tls' together with the HOST TLS syntax. For
further information, see Creating an Authentication Method with Client Self-Authentication Method.
Important: You use the 'tls' auth_type only when you want to create an authentication
method for client self-authentication. You must use the 'tls' auth_type with the HOST TLS
syntax.

Creating an Authentication Method with Client Self-Authentication Method


This section provides sample chronological steps for setting up a client for self-authentication,
creating an authentication method, and associating the method with a user through a grant
statement.
1. Follow all applicable procedures for implementing SSL and distributing certificates and keys.
Refer to Implementing SSL as it applies to your environment.
When you create a client key, make sure to include a Common Name (CN) that is the
database user name you want to use with the target database.

Common Name (e.g., your name or your server's hostname) []:mydatabaseusername
2. Create the authentication method. Authentication methods are automatically enabled when
you create them.

CREATE AUTHENTICATION myssltest METHOD 'tls' HOST TLS '10.0.0.0/23';


3. Associate the method with the user through a grant statement.

GRANT AUTHENTICATION myssltest TO mydatabaseusername;


Your client can now log on and be recognized.
For information on creating authentication methods, refer to the SQL Reference Manual topic,
CREATE AUTHENTICATION.


Requiring TLS for Client Connections


You can require clients to use TLS when connecting to HP Vertica. To do so, create a client
authentication method for them that uses the HOST TLS syntax with the CREATE
AUTHENTICATION statement.
Specific clients might connect through a network connection known to be insecure. In such cases,
you can choose to limit specific users to connecting through TLS. You can also require all clients to
use TLS.
See Creating Authentication Records for more information about creating client authentication
methods.

Configuring Ident Authentication


The Ident protocol, defined in RFC 1413, authenticates a database user with a system user
name. To see if that system user can log in without specifying a password, you configure HP
Vertica client authentication to query an Ident server. With this feature, the DBADMIN user can run
automated scripts to execute tasks on the HP Vertica server.
Caution: Ident responses can be easily spoofed by untrusted servers. Use Ident
authentication only on local connections, where the Ident server is installed on the same
computer as the HP Vertica database server.
Follow the instructions in these topics to install, set up, and configure Ident authentication for
your database:
• Installing and Setting Up an Ident Server
• Configuring Ident Authentication for Database Users

Installing and Setting Up an Ident Server


To use Ident authentication, you must install the oidentd server and enable it on your HP Vertica
server. oidentd is an Ident daemon that is compatible with HP Vertica and compliant with RFC
1413.
To install and configure oidentd on Red Hat Linux for use with your HP Vertica database, follow
these steps:


1. Install oidentd on Red Hat Linux by running this command:


$ yum install oidentd

Note: The source code and installation instructions for oidentd are available at the oidentd
website.

2. Make sure that the Ident server accepts IPv6 connections to prevent authentication failure. To
do so, you must enable this capability. In the script /etc/init.d/oidentd, change the line
from:
exec="/usr/sbin/oidentd"

to
exec="/usr/sbin/oidentd -a ::"

Start oidentd with -a :: at the Linux prompt.


3. Restart the server with the following command:
/etc/init.d/oidentd restart

Configuring Ident Authentication for Database Users


To configure Ident authentication, take the following steps:
1. Create an authentication method that uses Ident.
The Ident server must be installed on the same computer as your database, so specify the
keyword LOCAL. HP Vertica requires that the Ident server and the database always be on the
same computer.
=> CREATE AUTHENTICATION v_ident METHOD 'ident' LOCAL;

2. Set the Ident authentication parameters, specifying the system users who should be allowed to
connect to your database.


=> ALTER AUTHENTICATION v_ident SET system_users='user1:user2:user3';

3. Associate the authentication method with the HP Vertica user. Use a GRANT statement that
allows the system user user1 to log in using Ident authentication:
=> GRANT AUTHENTICATION v_ident TO user1;

Ident Configuration Examples


The following examples show several ways to configure Ident authentication.
Allow system_user1 to connect to the database as HP Vertica vuser1:
=> CREATE AUTHENTICATION v_ident METHOD 'ident' LOCAL;
=> ALTER AUTHENTICATION v_ident SET system_users='system_user1';
=> GRANT AUTHENTICATION v_ident to vuser1;
=> ENABLE AUTHENTICATION v_ident;

Allow system_user1, system_user2, and system_user3 to connect to the database as vuser1.
Use colons (:) to separate the user names:
=> CREATE AUTHENTICATION v_ident METHOD 'ident' LOCAL;
=> ALTER AUTHENTICATION v_ident SET system_users='system_user1:system_user2:system_user3';
=> GRANT AUTHENTICATION v_ident TO vuser1;
=> ENABLE AUTHENTICATION v_ident;

Associate the authentication with Public using a GRANT AUTHENTICATION statement. The
users system_user1, system_user2, and system_user3 can now connect to the database as any
database user:
=> CREATE AUTHENTICATION v_ident METHOD 'ident' LOCAL;
=> ALTER AUTHENTICATION v_ident SET system_users='system_user1:system_user2:system_user3';
=> GRANT AUTHENTICATION v_ident to Public;
=> ENABLE AUTHENTICATION v_ident;

Set the system_users parameter to * to allow any system user to connect to the database as
vuser1:
=> CREATE AUTHENTICATION v_ident METHOD 'ident' LOCAL;
=> ALTER AUTHENTICATION v_ident SET system_users='*';
=> GRANT AUTHENTICATION v_ident TO vuser1;
=> ENABLE AUTHENTICATION v_ident;

Using a GRANT statement, associate the v_ident authentication with Public to allow
system_user1 to log into the database as any database user:
=> CREATE AUTHENTICATION v_ident METHOD 'ident' LOCAL;
=> ALTER AUTHENTICATION v_ident SET system_users='system_user1';
=> GRANT AUTHENTICATION v_ident to Public;
=> ENABLE AUTHENTICATION v_ident;
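The matching behavior that these examples imply for the system_users value can be sketched as follows. This is a reading of the examples, not HP Vertica's implementation; the helper name is_system_user_allowed is hypothetical.

```python
# Sketch of how a system_users value could be interpreted, based on the
# examples in this topic. is_system_user_allowed is a hypothetical helper,
# not part of HP Vertica.

def is_system_user_allowed(system_users: str, system_user: str) -> bool:
    """Return True if system_user matches the colon-separated system_users value."""
    if system_users == "*":
        # A single asterisk allows any system user.
        return True
    allowed = {name for name in system_users.split(":") if name}
    return system_user in allowed
```

Under these assumptions, is_system_user_allowed("system_user1:system_user2:system_user3", "system_user2") is true, and a value of "*" accepts every system user.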


Implementing SSL
To protect privacy and verify data integrity, you can configure HP Vertica and database clients to
use Secure Socket Layer (SSL). SSL allows secure connection and communication between the
client and the server. The SSL protocol uses a trusted third party called a Certificate Authority (CA).
Both the owner of a certificate and the party that relies on the certificate trust the CA.
HP Vertica supports the following authentication methods under SSL v3/Transport Layer Security
(TLS) 1.0 protocol:
• SSL server authentication: Lets the client confirm the server's identity. The client verifies
that the server's certificate and public key are valid and were issued by a certificate authority
(CA) listed in the client's list of trusted CAs. This authentication helps prevent
man-in-the-middle attacks. See "Prerequisites for SSL Server Authentication and SSL Encryption"
in SSL Prerequisites and Configuring SSL.
• SSL client authentication: (Optional) Lets the server confirm the client's identity. The server
verifies that the client's certificate and public key are valid and were issued by a certificate
authority (CA) listed in the server's list of trusted CAs. Client authentication is optional because
HP Vertica can authenticate the client at the application protocol level through user name and
password credentials. See "Optional Prerequisites for SSL Server and Client Mutual
Authentication" in SSL Prerequisites.
• Encryption: Encrypts data sent between the client and database server. This method
significantly reduces the likelihood that the data can be read if the connection between the client
and server is compromised. Encryption works at both ends of a transaction, regardless of
whether SSL Client Authentication is enabled. See "Prerequisites for SSL Server Authentication
and SSL encryption" in SSL Prerequisites and Configuring SSL.
• Data integrity: Verifies that data sent between the client and server has not been altered
during transmission.

Note: For server authentication, HP Vertica supports using RSA encryption with ephemeral
Diffie-Hellman (DH). DH is the key agreement protocol.

Certificate Authority
The CA issues electronic certificates to identify one or both ends of a transaction. These
certificates verify ownership of a public key by the name on the certificate.


Public and Private Keys


A CA issues digital certificates that contain a public key and the identity of the owner.
The public key is available to all users through a publicly accessible directory. However, private
keys are confidential to their respective owners. When you use a private/public key pair, the data is
encrypted by one key and decrypted by its corresponding key.
• If encrypted with a public key, data can be decrypted by its corresponding private key only.
• If encrypted with a private key, data can be decrypted by its corresponding public key only.

For example, suppose Alice wants to send confidential data to Bob. Because she wants only Bob
to read it, she encrypts the data with Bob's public key. Even if someone else gains access to the
encrypted data, it remains protected. Because only Bob has access to his corresponding private
key, he is the only person who can decrypt Alice's encrypted data back into its original form.
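The relationship between the two keys can be illustrated with a toy RSA key pair. The sketch below uses tiny primes and no padding, so it shows only the arithmetic and is not secure. The values p, q, and e and the helper apply_key are chosen for illustration and do not come from HP Vertica or OpenSSL.

```python
# Toy RSA key pair built from tiny primes. This is for illustration only;
# real keys (for example, those created with openssl genrsa) are far larger
# and real systems add padding.
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def apply_key(value: int, exponent: int) -> int:
    """Encrypt or decrypt one integer block with the given key exponent."""
    return pow(value, exponent, n)

message = 65

# Alice encrypts with Bob's public key; only Bob's private key recovers it.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# The reverse direction also round-trips: private key first, then public key.
signed = apply_key(message, d)
verified = apply_key(signed, e)

print(recovered == message, verified == message)
```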

SSL Prerequisites
Before you implement SSL security, obtain the appropriate certificate signed by a certificate
authority (CA) and private key files. Then, copy the certificate to your system. (See the OpenSSL
documentation.) These files must use the Privacy-Enhanced Mail (PEM) format.

SSL Server Authentication and SSL Encryption


Follow these steps to set up SSL server or mutual mode:
1. Copy the server certificate file (server.crt) and private key (server.key) to one of your
server hosts in the cluster. It is not necessary to copy these files to all server hosts. (After you
generate your certificates and keys, refer to Distributing Certificates and Keys.)
The public key contained within the certificate and the corresponding private key allow the SSL
connection to encrypt the data. This encryption helps protect data integrity.
2. If you are planning to use SSL mutual mode, the client needs to verify the server's certificate:
• If you are planning to use SSL mutual mode with JDBC, copy the root.crt file to one of the
clients; the root.crt is later incorporated into the truststore. The truststore can then be copied
to other clients. For more information, see JDBC Certificates in the section Generating SSL
Certificates and Keys.
• If you are using the vsql program:
  ◦ If the VSQL_HOME environment variable is not set, copy the root.crt file to the .vsql
    subdirectory of the login user's home directory (for example, ~/.vsql/root.crt).
  ◦ If the VSQL_HOME environment variable is set, copy the root.crt file to the .vsql
    subdirectory of the target directory (for example, $VSQL_HOME/.vsql/root.crt).
  The root.crt file contains either the server's certificate or the certificate of the Certificate
  Authority that issued the server certificate.

Important: If you do not perform this step, database operation may be compromised.
If the client cannot authenticate the server, the database does not start. This
situation can occur if a counterfeit server with the valid certificate file tries to
connect and the client detects that root.crt does not match the CA used to sign the
certificate.

Optional Prerequisites for SSL Server and SSL Mutual Mode


Follow these steps only if you want to have both server and client mutually authenticate
themselves with SSL keys.
Setting up SSL Mutual Mode is optional. The server can use alternative techniques, such as
database-level password authentication, to verify the client's identity.
1. Copy the root.crt file to one server host in the cluster. (See Distributing Certificates and
Keys.)
The root.crt file has the same name on the client and server. However, the file contents can
differ. The contents would be identical only if the client and server certificates were issued by the
same root certificate authority (CA).
2. Copy the client certificate file (client.crt) and private key (client.key) to each client. For
vsql:
• If the VSQL_HOME environment variable is set, copy the files to the .vsql subdirectory of the
target directory (for example, $VSQL_HOME/.vsql/client.crt).
• If the VSQL_HOME environment variable is not set, copy the files to the .vsql subdirectory of the
login user's home directory (for example, ~/.vsql/client.crt).

If you are using either ODBC or JDBC, you can place the files anywhere on your system.
Then, provide the location in the connection string (ODBC/JDBC) or ODBCINI (ODBC only).
See Configuring SSL for ODBC Clients and Configuring SSL for JDBC Clients.

Important: If you're using ODBC, the private key file (client.key) must have read and
write permissions only for the dbadmin user (such as 0600 for Linux). Do not provide any
additional permissions or extend them to any other users.
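A quick way to confirm that a key file carries only owner read and write permissions is to inspect its mode bits. The sketch below uses Python's standard os and stat modules; the helper names are hypothetical, and the path you pass in is whatever location you chose for client.key.

```python
import os
import stat

# Hypothetical helper: reports whether a key file is readable and writable by
# its owner only (mode 0600), with no permissions for group or others.
def is_owner_only_read_write(path: str) -> bool:
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600

def restrict_to_owner(path: str) -> None:
    """Set the file mode to 0600, the equivalent of chmod 600 on Linux."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

Calling restrict_to_owner on the key file and then checking it with is_owner_only_read_write mirrors running chmod 600 and confirming the result with ls -l.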

Generating SSL Certificates and Keys


This section describes the following:
• Generating Certificate Authority (CA) certificates and keys that can then be used to sign server
and client keys.

Important: In a production environment, always use certificates signed by a CA.

• Creating a server private key and requesting a new server certificate that includes a public key.
• Creating a client private key and requesting a new client certificate that includes a public key.

For more detailed information on creating signed certificates, refer to the OpenSSL documentation.
Sample procedures show you how to create certificates and keys. All examples are hypothetical;
the commands shown allow many other possible options not used in these examples. Create your
commands based upon your specific environment.

Create a CA Private Key and Public Certificate


Create a CA private key and public certificate:
1. Use the openssl genrsa command to create a CA private key.

openssl genrsa -out new_servercakey.pem 1024

2. Use the openssl req command to create a CA public certificate.

openssl req -config openssl_req_ca.conf -newkey rsa:1024 -x509 -days 3650 -key new_servercakey.pem -out new_serverca.crt


You enter the following sample CA certificate values in response to openssl command line
prompts. Rather than enter these values from command line prompts, you can provide the
same information within .conf files (as shown in the command examples in this section).
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_CA
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com
When you create a certificate, you must include one unique name for each certificate that you
create. This unique name is referred to as a Distinguished Name (DN). The examples in this
section use the Organizational Unit Name for the DN.
Result: You now have a CA private key, new_servercakey.pem. You also have a CA public
certificate, new_serverca.crt. Use both the private key and the public certificate in the procedures
that follow for creating server and client certificates.

Creating the Server Private Key and Certificate


Create the server's private key file and certificate request, and sign the server certificate using the
CA private key file:
1. Use the openssl genrsa command to create the server's private key file.

openssl genrsa -out new_server.key 1024

HP Vertica supports unencrypted key files only; do not use a -des3 argument.
2. Use the openssl req command to create the server certificate request.

openssl req -config openssl_req_server.conf -new -key new_server.key -out new_server_reqout.txt

The configuration file (openssl_req_server.conf) includes information that is incorporated
into your certificate request. If it does not, enter the information in response to command-line
prompts. In this example, the Organizational Unit Name contains the unique DN, Support_server.

Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_server
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com
3. Use the openssl command x509 to sign the server's certificate using the CA private key file
and public certificate.

openssl x509 -req -in new_server_reqout.txt -days 3650 -sha1 -CAcreateserial -CA new_serverca.crt -CAkey new_servercakey.pem -out new_server.crt

Result: You created the server private key file, new_server.key, and then signed the server
certificate using the CA private key (new_servercakey.pem) and CA public certificate
(new_serverca.crt). The result outputs to a new server certificate, new_server.crt.

Create the Client Private Key and Certificate


Create the client's private key file and certificate request, and sign the client certificate using the
CA private key file:
1. Use the openssl genrsa command to create the client's private key file.

openssl genrsa -out new_client.key 1024

2. Use the openssl req command to create the client certificate request.

openssl req -config openssl_req_client.conf -new -key new_client.key -out new_client_reqout.txt

The configuration file (openssl_req_client.conf) includes information that is incorporated
into your certificate request. If it does not, enter the information in response to command-line
prompts. In this example, the Organizational Unit Name contains the unique DN, Support_client.

Important: If you plan to use mutual authentication, you can create the client certificate in
a way that allows self-authentication. To do so, set the Common Name (CN) field to the
value of the database user name you want to use with the target database.

Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_client
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com
3. Use the openssl command x509 to sign the client's certificate using the CA private key file and
public certificate.

openssl x509 -req -in new_client_reqout.txt -days 3650 -sha1 -CAcreateserial -CA new_serverca.crt -CAkey new_servercakey.pem -out new_client.crt

Result: You created the client private key file, new_client.key, and then signed the client
certificate using the CA private key (new_servercakey.pem) and CA public certificate
(new_serverca.crt). The result outputs to a new client certificate, new_client.crt.


Generating Certificates and Keys Summary

Set Client Key and Certificate Permissions


Set permissions for client certificates and keys:
chmod 700 new_client.crt new_client.key


You do not need to set permissions for server certificates and keys because you add the contents
of those files to parameters on the server.

JDBC Certificates
The Java Runtime Environment (JRE) manages all keys and certificates with a keystore and a
truststore.
• A keystore is a repository that includes private keys and trusted certificates. The private keys
have public key certificates associated with them. The keystore provides credentials for
authentication.
• A truststore contains the certificates of trusted third-party certificate authorities with which you
communicate. The truststore verifies credentials.

After you generate the SSL certificates and keys, verify that the JRE detects the certificate
authority's certificate.
This sample procedure adds a client certificate to a keystore/truststore.
1. Use the openssl x509 command to convert the CA certificate to a form that Java can detect.

openssl x509 -in new_serverca.crt -out new_serverca.crt.der -outform der


2. Use the keytool utility with the keystore option to create and add credentials to a truststore
(truststore).
n

The -noprompt option allows you to proceed without prompts. You can add the commands
given in this procedure to a script.

The alias and storepass values shown are arbitrary examples only. Specify values
appropriate for use in your environment.

keytool -noprompt -keystore truststore -alias verticasql -storepass


vertica -importcert -file new_serverca.crt.der
3. Because you cannot directly import both the certificate and key into your keystore, add them
into a pkcs12 file using the openssl pkcs12 command.

openssl pkcs12 -export -in new_client.crt -inkey new_client.key -password pass:vertica -certfile new_client.crt -out keystore.p12


4. Use the keytool utility to import your certificate and key into your keystore.

keytool -noprompt -importkeystore -srckeystore keystore.p12 -srcstoretype pkcs12 -destkeystore verticastore -deststoretype JKS -deststorepass vertica -srcstorepass vertica

To complete the setup for mutual authentication, you must perform a similar procedure to add your
server certificate to a keystore/truststore.
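
Before you run the keytool steps, it can help to confirm that the files from steps 1 and 3 are well formed. The following is a minimal sketch, not part of the documented procedure: it creates a throwaway CA and client certificate in a hypothetical scratch directory (jdbc_sketch), repeats steps 1 and 3 in simplified form against those files, and reads the results back with openssl. The directory name and the subject names are assumptions chosen for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; everything in it is throwaway test material.
dir=jdbc_sketch
mkdir -p "$dir"

# Stand-in CA (self-signed) and client key/certificate, named as in the procedure.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=Sketch CA" \
    -keyout "$dir/new_servercakey.pem" -out "$dir/new_serverca.crt" 2>/dev/null
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=sketch-client" \
    -keyout "$dir/new_client.key" -out "$dir/new_client.csr" 2>/dev/null
openssl x509 -req -days 1 -in "$dir/new_client.csr" \
    -CA "$dir/new_serverca.crt" -CAkey "$dir/new_servercakey.pem" \
    -CAcreateserial -out "$dir/new_client.crt" 2>/dev/null

# Step 1: convert the CA certificate to DER, then confirm the DER file parses.
openssl x509 -in "$dir/new_serverca.crt" -out "$dir/new_serverca.crt.der" -outform der
openssl x509 -in "$dir/new_serverca.crt.der" -inform der -noout -subject

# Step 3 (simplified): bundle the client certificate and key, then confirm
# the bundle opens with the example password. The command fails if it does not.
openssl pkcs12 -export -in "$dir/new_client.crt" -inkey "$dir/new_client.key" \
    -password pass:vertica -out "$dir/keystore.p12"
openssl pkcs12 -in "$dir/keystore.p12" -passin pass:vertica -noout
```

After steps 2 and 4, running keytool -list -keystore truststore -storepass vertica (and the same for verticastore) lists the imported entries, assuming you used the example store names and password.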

JDBC Certificates Summary

Generating Certificates and Keys for MC


A certificate signing request (CSR) is a block of encoded text. You generate the CSR on the
server on which the certificate will be used. You send the CSR to a certificate authority (CA) to
apply for a digital identity certificate. The CA uses the CSR to create your SSL certificate from
information in your request; for example, organization name, common (domain) name, city,
country, and so on.
Management Console (MC) uses a combination of OAuth (Open Authorization), Secure Socket
Layer (SSL), and locally-encrypted passwords to secure HTTPS requests between a user's
browser and MC, and between MC and the agents. Authentication occurs through MC and between
agents within the cluster. Agents also authenticate and authorize jobs.
The MC configuration process sets up SSL automatically, but you must have the openssl package
installed on your Linux environment first.
When you connect to MC through a client browser, HP Vertica assigns each HTTPS request a
self-signed certificate, which includes a timestamp. To increase security and protect against password
replay attacks, the timestamp is valid for several seconds only, after which it expires.
To avoid being locked out of MC, synchronize time on the hosts in your HP Vertica cluster, and on
the MC host if it resides on a dedicated server. To recover from loss or lack of synchronization,
resync system time and the Network Time Protocol. See Set Up Time Synchronization in the
Installation Guide. If you want to generate your own certificates and keys for MC, see Generating
Certificates and Keys for MC.

Signed Certificates
For production, you must use certificates signed by a certificate authority. You can create and
submit a certificate signing request (CSR). Then, when the signed certificate returns from the CA,
import the certificate into MC.
To generate a new CSR, use the openssl command:
openssl req -new -key /opt/vertica/config/keystore.key -out server.csr

When you press Enter, you are prompted to enter information that will be incorporated into your
certificate request. Some fields contain a default value, which, for security reasons, you should
change. Other fields you can leave blank, such as password and optional company name. To leave
a field blank, type a period (.).
Important: The keystore.key value for the -key option supplies the private key for the keystore. If
you generate a new key and import it using the Management Console interface, the MC
process does not restart properly. You must restore the original keystore.jks file and restart
Management Console.
This information is contained in the CSR and shows both the default and replacement values:
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (eg, city) [Newbury]:Cambridge
Organization Name (eg, company) [My Company Ltd]:HP
Organizational Unit Name (eg, section) []:Information Management
Common Name (eg, your name or your server's hostname) []:console.vertica.com
Email Address []:mcadmin@vertica.com

The Common Name field is the fully qualified domain name of your server. Your entry must
exactly match what you type in your web browser, or you receive a name mismatch error.
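
To script the request or to check it before submission, you can supply the prompted fields on the command line and then read the CSR back. The following is a minimal sketch under stated assumptions: it uses a throwaway key (sketch.key) in a hypothetical scratch directory (csr_sketch) so that it does not touch the MC keystore.key, and it reuses the example subject values shown above. For a real request, keep the -key argument from the documented command.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and throwaway key. The documented procedure
# uses the existing MC keystore.key rather than a newly generated key.
dir=csr_sketch
mkdir -p "$dir"
openssl genrsa -out "$dir/sketch.key" 2048 2>/dev/null

# -subj supplies the fields that the interactive prompts would otherwise ask for.
openssl req -new -key "$dir/sketch.key" -out "$dir/server.csr" \
    -subj "/C=US/ST=Massachusetts/L=Cambridge/O=HP/OU=Information Management/CN=console.vertica.com"

# Check the request's self-signature and print the subject it carries,
# so you can confirm the Common Name before sending the CSR to the CA.
openssl req -in "$dir/server.csr" -noout -verify -subject
```

Checking the Common Name at this stage is cheaper than discovering a name mismatch error in the browser after the CA has issued the certificate.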

Self-Signed Certificates
To test your new SSL implementation, you can self-sign a CSR using either a temporary certificate
or your own internal CA, if one is available.
Note: A self-signed certificate generates a browser-based error notifying you that the signing
certificate authority is unknown and not trusted. For testing purposes, accept the risks and
continue.
The following command generates a temporary certificate, which expires after 365 days:
openssl x509 -req -days 365 -in server.csr -signkey /opt/vertica/config/keystore.key -out server.crt

This example shows the command's output to the terminal window:


Signature ok
subject=/C=US/ST=Massachusetts/L=Billerica/O=HP/OU=IT/CN=console.vertica.com/emailAddress=mcadmin@vertica.com
Getting Private key

You can now import the self-signed certificate, server.crt, into Management Console.
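
Because MC restarts correctly only when the certificate pairs with the expected private key, you may want to confirm that pairing before you import. The following is a minimal sketch, assuming a throwaway key (sketch.key) in a hypothetical scratch directory (selfsign_sketch) instead of the real keystore.key. It self-signs a CSR as described above, prints the subject and validity dates, and compares the public key embedded in the certificate with the one derived from the private key.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and throwaway key, used here in place of
# the MC keystore.key so the sketch has no side effects on a real installation.
dir=selfsign_sketch
mkdir -p "$dir"
openssl genrsa -out "$dir/sketch.key" 2048 2>/dev/null
openssl req -new -key "$dir/sketch.key" -out "$dir/server.csr" \
    -subj "/CN=console.vertica.com"

# Self-sign the CSR for 365 days, as in the documented command.
openssl x509 -req -days 365 -in "$dir/server.csr" \
    -signkey "$dir/sketch.key" -out "$dir/server.crt" 2>/dev/null

# Show who the certificate is for and when it expires.
openssl x509 -in "$dir/server.crt" -noout -subject -dates

# The certificate pairs with the key only if both yield the same public key.
cert_pub=$(openssl x509 -in "$dir/server.crt" -noout -pubkey)
key_pub=$(openssl pkey -in "$dir/sketch.key" -pubout)
if [ "$cert_pub" = "$key_pub" ]; then
    echo "certificate matches key"
else
    echo "certificate does NOT match key" >&2
    exit 1
fi
```

To run the same comparison against a real installation, point the two openssl commands at server.crt and at the keystore.key file used to create the CSR.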

See Also
How to Configure SSL

Key and Certificate Management Tool

Importing a New Certificate to MC


Use the procedure that follows to import a new certificate into Management Console.
Note: To generate a new certificate for Management Console, you must use the
keystore.key file, which is located in /opt/vconsole/config on the server on which you
installed MC. Any other generated key/certificate pair causes MC to restart incorrectly. You
will then have to restore the original keystore.jks file and restart Management Console. See
Generating Certificates and Keys for MC.

To Import a New Certificate


1. Connect to Management Console, and log in as an administrator.
2. On the Home page, click MC Settings.
3. In the button panel on the left, click SSL certificates.
4. To the right of "Upload a new SSL certificate," click Browse to import the new certificate.
5. Click Apply.
6. Restart Management Console.

Distributing Certificates and Keys


After you create the prerequisite certificates and keys for one host, you can distribute them
throughout the cluster using the Administration Tools. Client files, however, cannot be distributed
through Administration Tools.
You do not have to start the database to use Admintools to distribute the certificate and key files.
To distribute certificates and keys to all hosts in a cluster:
1. Log on to a host that contains the certificates and keys you want to distribute, and start the
Administration Tools.
See Using the Administration Tools for information about accessing the Administration Tools.
2. On the Main Menu in the Administration Tools, select Configuration Menu, and click OK.
3. On the Configuration Menu, select Distribute Config Files, and click OK.

4. Select SSL Keys, and click OK.


5. Select the database on which you want to distribute the files, and