H19048
Reference Architecture
Abstract
This reference architecture describes how to deploy VMware
Greenplum on Dell PowerFlex in a two-layer architecture. This
document also describes best practices for deploying Greenplum in a
PowerFlex environment to meet performance, resiliency, and scale
requirements.
PowerFlex Engineering Validated
Copyright
The information in this publication is provided as is. Dell Inc. makes no representations or warranties of any kind with respect
to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular
purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright © 2022 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other
trademarks are trademarks of Dell Inc. or its subsidiaries. Intel, the Intel logo, the Intel Inside logo and Xeon are trademarks
of Intel Corporation in the U.S. and/or other countries. Other trademarks may be trademarks of their respective owners.
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright © 2022 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and
intellectual property laws.
VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered
trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned
herein may be trademarks of their respective companies.
Dell Inc. believes the information in this document is accurate as of its publication date. The information is subject to change
without notice.
Published in the USA 08/22 Reference Architecture H19048.
Executive summary
Problem statement
Present-day businesses require modern product delivery that caters to their growing
business needs and meets their end-user requirements. When building a big data system,
users must choose among competing platforms, and they prefer the platform that is the
most capable, user-friendly, and durable. Many organizations face challenges in big data
analytics when trying to achieve dynamic growth and flexibility while maintaining high
performance and availability; in practice, one of these qualities is often sacrificed for the
sake of another. VMware Greenplum on PowerFlex combines all of them: Greenplum
provides a specialized big data analytics database, VMware provides self-management
and automation, and PowerFlex provides flexibility, resiliency, and high performance.
Solution overview
PowerFlex is a software-defined storage platform designed to significantly reduce
operational and infrastructure complexity, empowering organizations to move faster by
delivering flexibility, elasticity, and simplicity with predictable performance and resiliency at
scale. The PowerFlex family provides a foundation that combines compute and high-
performance storage resources in a managed, unified fabric. PowerFlex comes in flexible
deployment options – integrated rack, appliance, or Ready Nodes – that enable two-layer
(compute and server SAN), single-layer (HCI), and storage-only architectures. PowerFlex
is ideal for high-performance applications and databases, building an agile private cloud,
or consolidating resources in heterogeneous environments.
Building a big data system demands attention to detail, as every part and component
must be engineered. Building such a system is expensive and, when the vendor's code is
proprietary and closed source, bears the risk of ending up with an under-featured platform.
VMware Greenplum is open source and based on the PostgreSQL open-source core.
Greenplum benefits from being open source and builds on the trust earned by two decades
of open-source PostgreSQL development on the core database engine.
Terminology
The following table provides definitions for some of the terms that are used in this
document.
Table 1. Terminology
Term Definition
CO Compute Only
OS Operating System
SO Storage Only
VM Virtual Machine
Audience
This reference architecture is intended for database administrators, system engineers,
and partners who want to deploy the Greenplum database on a PowerFlex platform.
Readers of this document should have a working knowledge of the following
technologies to get the most from this reference architecture:
• PowerFlex
• VMware Greenplum
Revisions
Date Description
PowerFlex software components
Software is the key factor of success in the PowerFlex offering. PowerFlex software
components provide software-defined storage services. The software components help to
simplify infrastructure management and orchestration with comprehensive ITOM and
LCM capabilities that span compute and storage infrastructure, from BIOS and firmware to
nodes, software, and networking.
PowerFlex
PowerFlex is the software foundation of the PowerFlex platform. It delivers a high-
performance, highly resilient block storage service that can scale to thousands of nodes.
PowerFlex Manager
PowerFlex Manager is the software component in the PowerFlex family that enables ITOM
automation and LCM capabilities while providing flexible APIs and extensive automation.
PowerFlex consumption options
PowerFlex is available in multiple consumption options to help customers meet their
project and data center requirements. The PowerFlex appliance and PowerFlex rack
provide customers the flexibility to choose a deployment option that meets their exact
requirements.
PowerFlex rack
The PowerFlex rack is a fully engineered system with integrated networking that enables
customers to simplify deployments and accelerate time to value.
PowerFlex appliance
The PowerFlex appliance allows customers the flexibility and savings to choose their own
compatible networking. The PowerFlex appliance offers customers a smaller starting point
of four nodes, while enabling them to use their existing network infrastructure.
With PowerFlex, customers deploy to match their initial needs and easily expand with
massive scale potential, without having to compromise on performance and resiliency.
Flexible consumption-based billing options
PowerFlex is available through APEX custom solutions, APEX Flex on Demand and
APEX Datacenter Utility, for customers looking to adopt consumption-based OpEx models.
APEX Flex on Demand
APEX Flex on Demand allows you to pay for technology as you use it and provides
immediate access to buffer capacity. Your payment adjusts to match your usage.
Chapter 2 Product overview
Deploy anywhere
Run analytics on public and private clouds, Kubernetes, or on-premises.
VMware Greenplum provides your enterprise with flexibility and choice because it can be
deployed on all major public and private cloud platforms, on-premises, and with container
orchestration systems like Kubernetes. Deploy and manage hundreds of Greenplum
instances easily.
Chapter 3 Solution architecture
Overview
This section provides an overview of the components that are involved in this solution
from a physical and logical perspective. For this document, the Greenplum solution is
deployed on a Dell PowerFlex rack. Initially, PowerFlex is deployed as a disaggregated
(two-layer) system, with Compute Only (CO) nodes running the ESXi hypervisor for
compute and network, and Storage Only (SO) nodes running Red Hat Enterprise Linux
7.9. After PowerFlex is installed and validated, Greenplum is installed on top of
PowerFlex.
Logical architecture
The following figure illustrates the logical view of Greenplum on PowerFlex with 10 SO
nodes and 12 CO nodes. Greenplum is deployed on the CO nodes with one master
instance and 10 segments.
Each SO node is fully populated with ten 7.68 TB SAS SSD drives. From a PowerFlex
storage layout perspective, a single PowerFlex cluster with two protection domains is
used. Since each PowerFlex node is fully populated with ten disks, these 100 disks are
used to create four storage pools from which the various volumes are created. The 12 CO
nodes, which are the ESXi hosts, run the SDC component that presents the PowerFlex
volumes as ESXi datastores. Once the datastores become available, the Greenplum VMs
are created in vCenter with a single master and ten segments.
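As a rough illustration of how such volumes are carved out of a storage pool and exposed to the SDCs, the following PowerFlex CLI sketch uses hypothetical protection domain, pool, volume, and SDC names; the exact scli syntax can vary by PowerFlex version:

# Hypothetical sketch: create a volume in a storage pool within a protection
# domain, then map it to an SDC (ESXi host) so it can back a datastore.
scli --add_volume --protection_domain_name pd01 --storage_pool_name sp01 \
     --size_gb 4096 --volume_name gp_data01
scli --map_volume_to_sdc --volume_name gp_data01 --sdc_ip 192.168.10.21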
For more information about the detailed configuration of the SO nodes, CO nodes, and
Greenplum master and segment VMs, see Greenplum configuration.
Network architecture
The following figure shows the two-layer network architecture that is based on PowerFlex
best practices:
• Two Z9100 switches are configured with VLT to provide fault tolerance and enable
connectivity with other switches.
• Three dual-port 25 Gb Mellanox NICs on each server provide 6 x 25 Gb ports.
• On compute nodes, 2 x 25 Gb ports are NIC-teamed to provide high availability.
Another 2 x 25 Gb ports are used for the Greenplum interconnect.
• A dedicated VLAN is configured to provide connectivity with the customer network,
a similar VLAN is dedicated to vMotion, and VLAN 105 is dedicated to hypervisor
(ESXi) management.
Greenplum architecture
After PowerFlex is installed, the Greenplum database is configured by creating VMs on
the compute nodes. As shown in the following figure, each ESXi host holds a single VM
with five primary and five mirror segments. The Greenplum master is configured on one of
the ESXi hosts, with no master mirror configured. The configuration details of the master
and segments are provided in the node configuration section of the Appendix. The
Greenplum interconnect uses a standard Ethernet switching fabric, in this case 2 x 25 Gb
Ethernet. Half of the primary segments use protection domain 01 and the other half use
protection domain 02. Their corresponding mirrors use the opposite protection domains.
These mirrors act as an additional level of data protection in case an entire protection
domain goes down in PowerFlex.
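This segment layout would be expressed in a gpinitsystem cluster configuration file along the following lines; the hostnames and directory paths below are illustrative, not taken from this document:

# Hypothetical gpinitsystem_config excerpt for five primaries and five
# mirrors per segment host, with a single master and no standby master.
SEG_PREFIX=gpseg
PORT_BASE=6000
# Five data directories per host yield five primary segments per VM.
declare -a DATA_DIRECTORY=(/data/primary /data/primary /data/primary /data/primary /data/primary)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
MIRROR_PORT_BASE=7000
# Five mirror directories per host yield five mirror segments per VM.
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror /data/mirror /data/mirror /data/mirror /data/mirror)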
Chapter 4 Testing and validation
Basic validation
After the Greenplum cluster is installed, a basic validation is carried out to ensure that the
PowerFlex compute, storage, and networking components are performing as expected.
The following tools are used to carry out the basic validation of PowerFlex and
Greenplum: FIO and gpcheckperf.
FIO tool: This tool is used to carry out storage IO characterization, measuring the IOPS
and bandwidth of a cluster. FIO was originally written as a tool to test the Linux I/O
subsystem. It has the flexibility to select different IO sizes and sequential or random reads
and writes, and it spawns worker threads or processes to perform the specified I/O
operations.
Gpcheckperf tool: Greenplum provides a utility called gpcheckperf, which is used to
identify hardware and system-level performance issues on the hosts in the Greenplum
cluster. gpcheckperf starts a session on one of the hosts in the cluster and runs the
following performance tests:
• Disk I/O test
• Memory bandwidth test
• Network performance test
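A representative invocation of the utility might look like the following sketch; the host file name and data directories are hypothetical:

# Hypothetical example: run the disk (d) and stream (s) tests on the hosts
# listed in hostfile_segments, against two data directories, with verbose
# per-host output (-D).
gpcheckperf -f hostfile_segments -r ds -D -d /data1/primary -d /data2/primary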
FIO storage IO characterization
Read and write tests are carried out using the FIO tool. The following commands are
issued on each file system mountpoint, which is mapped to every disk for each SDS.
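The exact job definitions are not reproduced in this document; a representative sequential-read job, with a hypothetical mountpoint and illustrative sizes, is sketched below (for the write test, --rw=write is used instead):

# Hypothetical FIO job: 1 MB sequential reads with direct I/O against a
# file on one SDS mountpoint; four workers, queue depth 16, 120 s run.
fio --name=seq_read --rw=read --bs=1M --direct=1 --ioengine=libaio \
    --iodepth=16 --numjobs=4 --size=20G --runtime=120 --time_based \
    --group_reporting --filename=/mnt/sds_disk01/fio_test.dat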
The FIO read test was run using a PowerFlex cluster that had five Storage Only nodes in
a single protection domain. Each Storage Only node was populated with ten SSD disks;
in total, there were 50 disks.
Note: The FIO tests showed a read bandwidth of 28 GB/s. The HBA limit on the storage nodes
was reached in this test. There was no bottleneck observed on the PowerFlex storage side;
additional network bandwidth would allow increased read bandwidth.
The FIO write test was run using the same PowerFlex cluster of five Storage Only nodes
in a single protection domain, with ten SSD disks per node and 50 disks in total.
Note: The FIO tests showed a write bandwidth of 14 GB/s. The HBA limit on the storage nodes
was reached in this test. NVMe nodes would eliminate this bottleneck, and moving to 100 GbE
networking would increase the performance up to a theoretical 20 GB/s per node. There was no
bottleneck observed on the PowerFlex storage side.
The bandwidth in the write test is half of that observed during the read test. This loss
happens because writes to PowerFlex volumes are mirrored at the storage layer for data
resiliency: every host write results in two writes in the storage layer, so the roughly
28 GB/s of backend throughput sustained in the read test corresponds to about 14 GB/s
of host-visible write bandwidth.
Greenplum IO characterization
Gpcheckperf is used to test the performance of the hardware across the Greenplum
cluster. I/O, memory, and network tests are carried out from the Greenplum VMs by the
gpcheckperf utility, provided by VMware.
• Data and queries are all static, which allows measuring and comparing the performance
of the cluster with previous runs.
• The static data covers a range of small, medium, large, and even extra-large tables.
• Queries were designed to simulate a mix of workloads on the cluster that stress
CPU, memory, and IO resources.
• The query test suite has two types of workloads: one with 19 queries (smaller suite) and
the other with 91 queries (complex suite), both allowing for sequential or parallel runs.
• The query test suite can be scheduled through cron (see the sketch after this list), can
be run multiple times, and can be scaled up to support additional stress testing (such as
running 2 x 91 queries simultaneously, for a total of 182 parallel queries).
• Once the query test suite has been performed, query runtimes can be analyzed against
different configurations and different clusters to assess performance gains or losses.
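As an illustration of cron scheduling, the entry below runs a nightly suite; the wrapper script name is hypothetical:

# Hypothetical crontab entry: launch the 91-query complex suite every
# night at 01:00 and append its output to a log file.
0 1 * * * /home/gpadmin/run_query_suite.sh complex >> /home/gpadmin/query_suite.log 2>&1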
Test results
There are two use cases considered for running tests on the Greenplum cluster.
The first use case is to run the data load along with the 182 parallel complex queries test
simultaneously to stress the system. These are actual production queries extracted
from the Dell Digital team's workload.
Table 2. Parameters
Parameter Value
Time taken for test 11.58 min
Read bandwidth 40 GB/s
Write bandwidth 10 GB/s
No bottleneck was observed on the PowerFlex storage side when this test was run. The
network bandwidth peaked close to 50 GB/s, which is the maximum limit that can be
achieved with the 2 x 25 Gb ports available on each node.
The second use case is to create a storage-consistent snapshot while the above test of
data load along with 182 parallel queries is being run.
Table 3. Parameters
Parameter Value
Time taken for test 12.33 min
Read bandwidth 32 GB/s
Write bandwidth 9 GB/s
When the snapshot is taken, the maximum bandwidth at PowerFlex is seen at 41 GB/s.
The read bandwidth is at 32 GB/s whereas the write bandwidth is at 9 GB/s. This
bandwidth drop is temporary due to the heavy load on the system (relative to its size) at
the time of the snapshot and was not seen when the workload was run again.
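A storage-consistent snapshot can be taken from the PowerFlex CLI; the sketch below uses hypothetical volume and snapshot names, and the exact scli syntax, including the multi-volume form, may vary by PowerFlex version:

# Hypothetical sketch: snapshot the Greenplum data volumes in a single
# scli call so the resulting snapshots form one write-consistent group.
scli --snapshot_volume --volume_name gp_data01,gp_data02 \
     --snapshot_name gp_consistent_snap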
Note: The GPDSB-like queries used in the tests described in this document are not part of an
audited benchmark and are provided for educational purposes only.
As mentioned in the logical architecture section for Greenplum, each ESXi host holds a
single VM with five primary and five mirror segments. There are ten ESXi hosts and 100
segments in total. Of the 100 segments, 50 primary segments carry out the GPDSB tests.
The configuration details of the master and segments are provided in the Appendix.
Initially, a trial GPDSB test was carried out to verify that the queries can run successfully.
This smoke test was run with a scale factor of 1, which means a 1 GB data load across
the data segments of Greenplum. A score of 0 is expected, as shown in the following
result.
Load                  35
Analyze               38
1 User Queries        183
Concurrent Queries    365
Q                     594
TPT                   366
TTT                   365
TLD                   0
Score                 0
The following configuration, from the tpcds_variables.sh file, was used to run the
GPDSB test with five concurrent users and a 3 TB data load.
# benchmark options
GEN_DATA_SCALE="3000"        # scale factor in GB: 3000 = 3 TB data load
MULTI_USER_COUNT="5"         # number of concurrent users
# step options
RUN_COMPILE_TPCDS="true"     # compile the data and query generators
RUN_GEN_DATA="true"          # generate the source data
RUN_INIT="true"              # initialize the target database
RUN_DDL="true"               # create schemas and tables
RUN_LOAD="true"              # load the generated data
RUN_SQL="true"               # run the single-user query stream
RUN_SINGLE_USER_REPORT="true"
RUN_MULTI_USER="true"        # run the concurrent multi-user streams
RUN_MULTI_USER_REPORT="true"
RUN_SCORE="true"             # compute the overall benchmark score
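With these variables set, the benchmark is started from the kit's top-level run script; the script name below is hypothetical and stands in for whatever entry point the kit in use provides:

# Hypothetical launcher: run the suite in the background so long runs
# survive the SSH session, capturing output to a log.
nohup ./run_gpdsb.sh > gpdsb_run.log 2>&1 &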
After running the GPDSB test, an overall score of 40 was achieved.
The test was also carried out with the number of concurrent users varying from 1 to 5 for
the 3 TB load. The following figure shows how performance scales across these runs:
[Figure: GPDSB score and query counts for 1 to 5 virtual users at 3 TB scale]
Virtual Users        1      2      3      4      5
Score                44     43     42     41     40
Weighted Queries     297    594    891    1188   1485
Concurrent Queries   6635   13819  21387  29512  38549
The following Global User Configuration (GUC) settings were applied during the 3 TB
test runs.
# Deepen interconnect receive and send queues for heavy concurrency.
gpconfig -c gp_interconnect_queue_depth -v 16
gpconfig -c gp_interconnect_snd_queue_depth -v 16
# Tell the resource queue prioritizer how many CPU cores each segment owns.
gpconfig -c gp_resqueue_priority_cpucores_per_segment -v 16
# Allow the optimizer to exploit associativity when reordering joins.
gpconfig -c optimizer_enable_associativity -v on
# Reload the configuration afterwards, for example with: gpstop -u
During the 3 TB GPDSB test run, the host CPU and VM CPU utilization were close to
75-80 percent, as shown in the following figures.
The following figure shows the PowerFlex GUI during the 3 TB GPDSB run. An average of
4 GB/s was achieved with close to 14K IOPS.
Chapter 5 Best practices
Overview
This section describes the best practices that are derived from the performance tests,
along with standard practices for deploying the PowerFlex cluster. Most of the PowerFlex
and VMware practices are standard, proven best practices, whereas the Greenplum
practices were derived by running multiple tests with various parameters and fine-tuning.
• Use statement_mem to allocate the memory used for a query per segment
database.
• Use resource queues to set the number of active queries
(ACTIVE_STATEMENTS) and the amount of memory (MEMORY_LIMIT) that can
be used by queries in the queue.
• Associate all users with a resource queue. Do not use the default queue.
• Set the priority to match the real needs of the queue for the workload and time of
day. Avoid using MAX priority.
• Ensure that the resource queue memory allocations do not exceed the setting
for gp_vmem_protect_limit.
• Dynamically update the resource queue settings to match the daily operations flow
(see the sketch after this list).
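To make these settings concrete, the following sketch creates a resource queue, assigns a role to it, and sets statement_mem; the queue name, role, and limits are hypothetical:

# Hypothetical example: cap a reporting workload at five active queries
# and 8 GB of queue memory, at medium priority.
psql -c "CREATE RESOURCE QUEUE reporting WITH (ACTIVE_STATEMENTS=5, MEMORY_LIMIT='8GB', PRIORITY=MEDIUM);"
# Associate a role with the queue instead of leaving it on the default queue.
psql -c "ALTER ROLE report_user RESOURCE QUEUE reporting;"
# Set per-query memory across the cluster; keep it within the queue's
# MEMORY_LIMIT and gp_vmem_protect_limit.
gpconfig -c statement_mem -v 1600MB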
See VMware Greenplum documentation to get the latest recommendations and best
practices for Greenplum.
Chapter 6 Conclusion
Summary
This reference architecture describes the process to deploy VMware Greenplum on
PowerFlex using a two-layer architecture, illustrating the flexibility and performance of the
solution and how it can fit complex use cases.
These use cases are not just theoretical: the architecture was tested using realistic data.
The testing included performing 182 parallel queries while the PowerFlex storage was
placed under additional load, and it showed excellent performance without any storage
bottlenecks.
This document also provides best practices for deploying Greenplum in a PowerFlex
environment. When implemented, these best practices are designed to drive
performance, resiliency, and scale from the environment.
If you are interested in discovering other reference architectures for your PowerFlex
environment, go to the Dell Technologies Info Hub. If you want to learn more about how
Dell can help your organization reach its IT goals, contact your Dell representative.
Chapter 7 References
Greenplum documentation
The following VMware documentation provides additional and relevant information:
• VMware Greenplum Documentation
• VMware Greenplum Features
Chapter 8 Appendix
Appendix A Configuration
Overview
This section describes the node configuration details for Greenplum and PowerFlex.
PowerFlex configuration
Node configuration
The following table shows the node configuration:
Table 4. PowerFlex node details
Component Details
Greenplum configuration
The following table shows the configuration of the Greenplum components:
Component Details
Memory/node 1 TB/node
VM cores/node 60 vCores
Network 2 x 25 Gb
Appendix B Results
Gpcheckperf results
== RESULT 2021-08-25T12:27:20.823828
Writes:
Note: Read results are removed because gpcheckperf returned unrealistic measurements. The
reason is that gpcheckperf uses the dd command to prepare a file that contains zeros for the read
test. Due to optimizations for zero data at the storage layer, the reads never reach the SSD drives,
and the bandwidth that is returned is unrealistically high, not reflecting the read bandwidth of
actual data stored in PowerFlex.
Network:
stream min bandwidth (MB/s): 9816.50
[sdw5]