RACK ADMINISTRATION
- CLASSROOM
Version [2.0]
PARTICIPANT GUIDE
Dell Confidential and Proprietary
Copyright © 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies,
Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners.
Introduction
Course Objectives
Prerequisite Skills
Course Agenda
Introductions
This module revisits the VxFlex integrated rack architecture and configurations. It
also introduces management tools that are used by an administrator to configure,
monitor, and troubleshoot VxFlex integrated rack.
This lesson reviews the VxFlex integrated rack architecture and operations. It
presents hardware and software components, networking, and the logical
configuration of the system. It also covers a list of resources and documentation
available to support managing and maintaining VxFlex integrated rack in your
environment.
VxFlex OS distributes data evenly among servers and media devices. The symmetry
that results from this leads to high and predictable system performance.
VxFlex integrated rack comes with a rich set of tools to manage the
environment. VxFlex Manager enables administrators to deploy and manage
the infrastructure and workloads. VxFlex OS interfaces and hypervisor native
tools also provide options to configure and manage system resources. The
Release Certification Matrix (RCM) enables system life cycle assurance through
compliance.
Here are common deployment use cases for the VxFlex integrated rack. The
applications/workloads can vary from high-performance business database
applications such as Oracle, SAP, and Microsoft, to modernized analytic
applications such as Hadoop and Splunk.
The image on the slide represents workloads that run on VxFlex integrated rack.
These workloads are sorted by industry verticals.
VxFlex nodes: VxFlex integrated rack nodes use Dell PowerEdge servers to
provide computing power and storage. The available server options for VxFlex
nodes are PowerEdge R640, R740xd, and R840. The older systems use
PowerEdge models R630, and R730xd. VxFlex nodes are available as either
hyper-converged, storage-only, or compute-only nodes. These nodes come with
various combinations of processor, storage, and memory to provide choice and
flexibility that is needed in a customer environment.
The storage-only nodes run the Red Hat Enterprise Linux operating system and
provide storage capacity independent of processing power. The compute-only
nodes provide computing power independent of storage. The hyper-converged
nodes provide both processing power and storage capacity. All nodes have four
10/25-GbE SFP28 ports along with a management port for iDRAC.
Tip: All nodes use a storage controller (RAID) card that contains a
pair of M.2 SSDs as the boot device; this card is called BOSS (Boot
Optimized Storage Solution). BOSS is a plug-in PCIe device that
provides protection and high-speed connectivity. Refer to the VxFlex
integrated rack Datasheet for details on node types and specifications.
The SDS enables a node in the VxFlex OS cluster to contribute its local storage to
the aggregated pool. Any server that contributes storage to the VxFlex OS cluster
needs a running instance of the SDS.
Protection Domain
The figure shows a Protection Domain divided into two Fault Sets, Fault Set A and Fault Set B.
Storage Pools enable the creation of different storage tiers in the VxFlex OS
cluster. A Storage Pool is a set of physical storage devices in a Protection Domain.
When a volume is configured from a Storage Pool, it is distributed over all devices
and servers contributing to that pool.
Depending on the allocation unit size, a storage pool layout can either be of
"Medium Granularity (MG)" or "Fine Granularity (FG)" type. In MG storage pools,
volumes are divided into 1 MB allocation units, distributed and replicated across all
disks contributing to a pool. FG storage pools are more space efficient, with an
allocation unit of just 4 KB and a physical data placement scheme based on Log
Structure Array (LSA) architecture.
A Fault Set is a logical entity that creates groups of SDSs in a Protection Domain
that are likely to fail as a group, for example, SDSs that are all powered from within the
same rack. By design, VxFlex OS never maintains both copies of a block of data in
the same Fault Set.
Each SDC node has an in-memory map that holds information about which SDS
owns which chunk of data. The SDC cache is extremely space-efficient, consuming
a minimum of memory. For example, for the 1 MB allocation unit, VxFlex OS can
store all needed metadata for 10 PB of data in 3 MB of RAM. VxFlex OS has
hundreds of thousands of metadata entries within this 3 MB which makes the
system highly scalable. The SDS controls the allocation of chunks to specific data
locations within the SDS. The SDS also maintains its own metadata for this
information.
The one rack unit Cisco Nexus 93180YC-EX access switch provides 48 1/10/25-Gbps
SFP+ ports and six 40/100-Gbps QSFP+ uplink ports. It provides
10 GbE and 25 GbE IP connectivity between the FLEX nodes and Controller
nodes. It also provides a 40 Gb uplink connection to the external network or
aggregation layer.
The Cisco Nexus 3172TQ management switch provides 1 Gb connection for out-of-
band management. It uplinks directly to the customer management network to
provide access to the component management ports.
Spine-leaf Architecture
For greater scalability without network oversubscription, VxFlex integrated rack can
be configured with spine-leaf architecture. In this architecture, access switches
(Cisco Nexus 93180YC-EX) are replaced by the leaf switches (Cisco Nexus
93240YC-FX2), and the aggregation switches (Cisco Nexus 9236C) are replaced
with spine switches (Cisco Nexus 9336C-FX2). The leaf switches connect to the
spine switches in a full-mesh topology. This environment can scale out by adding
leaf switches and reduce oversubscription by adding spine switches. The
spine-leaf architecture ensures predictable latency because server traffic always travels
the same number of hops to reach another server (except for servers on the same leaf).
Uplink to the customer core network is provided through two or four border leaf
switches. With the spine-leaf architecture, Layer 3 gateways are available at the leaf
switches. This distributed gateway enables seamless VM migration between the
racks. VxFlex integrated rack currently supports three to six spine switches.
Customer Network
Each VxFlex node contributes four physical adapters or vmnics to the DVswitch
uplinks. Two of these connections are aggregated using Virtual Port Channel
(vPC). vPC is used only for nondata traffic on VxFlex nodes. The figure shows a
range of typical vPC identifiers that might be used for a series of servers. For
example, the first FLEX node is vPC 111, the second is vPC 112. The other two
connections are dedicated to VxFlex OS data traffic. DVSwitch 0 is used for all the
management and customer production VLANs. DVSwitch 1 and 2 are used for
VxFlex OS data 1 and data 2 VLANs respectively. Each VxFlex node also provides
an Ethernet port for out-of-band iDRAC connection.
Note that both compute-only and storage-only nodes have iDRAC connections for out-
of-band node management (not shown in the graphic).
Caution: PowerEdge Rx40 (R640, R740, ...) servers with any NVMe
devices cannot be added to a DirectPath I/O-based system. Instead,
you can use an RDM-based system.
Dell EMC cabinets are configured with Panduit Intelligent Physical Infrastructure
(IPI) appliance. The IPI Appliance provides an intelligent gateway to gather
information about power, thermals, security, alerts, and all components in the
physical infrastructure for each cabinet.
The IPI Appliance incorporates door thermal sensors, door handle sensors, HID
security door handles, and intelligent PDUs. The IPI Appliance is the central point
of information for all intelligent operations in the cabinet. The PDUs enable remote
monitoring capabilities in each IPI cabinet and outlet-level control for each PDU.
Each cabinet has its own appliance with standard, redundant power.
The IPI Appliance is configured with default settings in the factory. Within the
solution, there are environmental, security, and power requirements, in addition to
asset and thermal management considerations.
Reference Documentation
VxFlex integrated rack systems are delivered prebuilt, preconfigured, and fully
tested at the factory for optimal system operation. This eliminates the need for
most system configuration, monitoring, and performance tuning tasks. Many of the
availability, load balancing, and capacity management tasks are highly automated,
which makes it a low-touch, easy-to-manage solution. However, VxFlex integrated
rack administrators are still responsible for managing its day-to-day operation.
The Getting Started page provides a guided flow through the common
configurations that are required to prepare a new VxFlex Manager environment. A
green check mark on a step indicates that you have completed the step. As an
administrator, you should already have the VxFM configured during implementation;
however, for any future expansions or infrastructure changes, you can revisit this
page. The Getting Started page provides the following information:
The Dashboard provides a utilization overview of the services and resources being
managed by VxFlex Manager. It displays at-a-glance information about service
history, resource overview, appliance overview, and activity logs. The dashboard
contains the following sections:
A resource is a physical or virtual data center object that VxFlex Manager
interacts with, including, but not limited to, servers/nodes, network switches, VM
managers, and element managers. The Resources page displays detailed
information about all the resources and server pools that VxFlex Manager has
discovered and inventoried.
VxFM online help is built into the tool and serves as the user guide.
The Integrated Dell Remote Access Controller (iDRAC) is designed to make server
administrators more productive and improve the overall availability of the Dell
servers. iDRAC alerts administrators to server issues, helps them perform remote
server management, and reduces the need for physical access to the server.
As a part of iDRAC, the Dell Lifecycle Controller simplifies server life cycle
management tasks like provisioning, deployment, servicing, user customization,
patching and updating. It is a collection of out-of-band automation services,
embedded pre-OS applications, and remote interfaces that give you deployment,
update, and maintenance capabilities through managed, persistent storage.
Lifecycle Controller reduces your time spent on management tasks, reduces
potential for error, improves security, and increases overall efficiency in your
VxFlex integrated rack environment.
When you log in to the iDRAC web interface, the system Dashboard page provides
the summary of the managed server. You can view system health, information, and
the virtual console. The tabs on the top provide information about system and
storage components, configuration options, and server maintenance.
VxFlex integrated rack has two separate vSphere environments - one for the
VxFlex node cluster (running production applications), and the other one for the
VxFlex Management Controller cluster. Both the environments have distinct
elements such as ESXi hosts, networks, VMs, and datastores.
Cloning of VMs
Template creation
VMware vSphere vMotion, and VMware Storage vMotion
Initial configuration of VMware Distributed Resource Scheduler (DRS) and
VMware vSphere high-availability (HA) clusters
For more information about VMware vSphere and vCenter Server, see
www.vmware.com.
Red Hat Virtualization Manager (RHV-M) provides a user interface and a RESTful
API to manage the resources in Red Hat Virtualization environment. RHV-M is an
appliance style management package that is installed as a virtual machine in the
VxFlex Management Controller cluster (similar to VMware vCSA). It provides a rich
set of capabilities to monitor and manage virtual resources. Besides handling
standard virtual machine tasks, such as VM creation, managing virtual networking,
and VM storage, it provides policy-based VM scheduling, user access
management, and automation through Ansible.
The engine-backup tool provides capability to back up and restore the RHV-M
database and configuration. Backup and restore APIs used by RHV-M enable an
administrator to perform full or file-level backup and restore of a virtual machine
and its data.
The RHV-M Dashboard provides an overview of the health and status of the RHV
environment. It displays the summary of system resources and their utilization. The
top section of the Dashboard provides a global inventory of resources and their
status. The Global Utilization section shows the overall utilization of the system
resources, displayed as percentage and line graph of utilization in the last 24
hours. The Cluster Utilization section shows the cluster utilization for the CPU and
memory in a heatmap.
You can further drill down to each type of resource by navigating through the
resource tabs in the right-side panel of the Dashboard. The Compute tab provides
access and configuration to compute resources including VMs, Templates, and
Hosts. The Network tab provides access to various network resources and vNIC
Profiles. The Storage tab provides access to Domains, Volumes, and Disks.
NX-OS is an operating system for the Cisco Nexus series Ethernet switches. NX-
OS provides switch management functionality through a CLI. It provides:
This lesson presents the VxFlex OS management interfaces that are extensively
used to configure and manage storage resources in the VxFlex integrated rack
environment.
The Dashboard tiles provide a visual overview of the storage system status. The
tiles are dynamic, and contents are refreshed at the interval set in the system
preferences (default: 10-second intervals). System preferences can also be used to
set the display to basic or advanced reporting. Below is an overview of the various
tiles:
Physical Capacity: Displays a breakdown of the physical capacity (Protected,
Degraded, Unused, Spare, and so on). The number at the center displays the total
amount of available raw storage.
The toggle at the bottom-left corner of the physical capacity tile shows
Capacity Utilization. This tile displays the capacity used in the VxFlex OS
system. Besides displaying the physical, allocated, and provisioned
capacity, it indicates the ratio of compression and the capacity savings due
to thin provisioning.
I/O Workload: Displays the performance statistics of the system—IOPS,
bandwidth, and I/O size.
Rebalance and Rebuild: This tile displays the system's internal IOPS. This
internal I/O may be generated due to rebuilding or rebalancing the data within
the system.
SDCs: Displays the number of SDCs in the system.
Volumes: Displays the number of volumes created, available capacity, and the
used capacity in the system. The amount of free capacity shown on this tile is
the maximum amount that can be used for creating a volume. This capacity
considers how much raw data is needed for maintaining data mirroring and
system spares.
Protection Domains: Displays the number and status of all Protection Domains
that are defined in the system. This tile also displays the number and status of
all Storage Pools defined in the Protection Domains.
SDSs: Displays the number and status of all SDSs in the system. If any SDSs
are currently in Maintenance Mode, the orange maintenance icon is displayed
on this tile. This tile also displays the number and status of all storage devices
that are defined in the Storage Pools.
Management: Displays the status of the MDM cluster. The status is displayed
graphically as a combination of the MDM cluster elements, and an alert icon if
active alerts exist. When you hover your mouse pointer over this tile, a tooltip
displays the MDM IP information of the cluster.
Capacity Utilization
You can change the filter option at the top left part of the screen. You can choose
the view for Protection Domain and pools.
The Frontend view provides detailed information about frontend objects in the
system, including volumes, SDCs and snapshots, and lets you perform various
configuration operations. The Backend view provides detailed information about
VxFlex OS backend objects such as Protection Domain, Storage Pools, and
storage devices in the system. You can configure these objects from the backend
view. The device view shows all the devices in the system. Expand all the devices
by clicking + signs. You can also perform a few media device operations from this
view, such as device LED On/off.
The Monitor > Alerts view provides a list of the alert messages currently active in
the system, in table format. You can filter the table rows according to alert severity,
and according to object types in the system.
An example of an alert message is shown on the slide. Regarding the license alert
message, it shows that a trial license is in use, thus the user must purchase a
license and install it.
The VxFlex OS CLI enables you to perform all provisioning, maintenance, and
monitoring activities in VxFlex OS. Use SSH or RDM to log in to the shell running
on the MDM servers to use CLI.
The VxFlex OS Gateway includes the REST Gateway and the SNMP trap sender
functionality. In a VxFlex integrated rack, the Gateway is installed on the VxFlex
Management Controller. As part of the installation process, the VxFlex OS
Lightweight Installation Agent (LIA) is also installed. LIA works seamlessly with the
Gateway enabling the future upgrades and collecting system logs. You can enable
and disable the Gateway components.
The VxFlex OS Installer is part of the VxFlex OS Gateway. The Installer is used to
install and configure VxFlex OS components. The Installer also provides capability
to upgrade, analyze, and collect system logs. System Administrators are not
expected to use this interface for any day-to-day system administration and
management.
This lesson presents the VxFlex OS capacity components and their properties.
In a VxFlex integrated rack system, each SDS node that contributes storage to the
VxFlex OS is a part of a VxFlex OS protection domain. A protection domain
contains a group of SDSs that provide backups and protection for each other.
Typically, a VxFlex integrated rack system has only one protection domain, but it is
possible to distribute SDSs into multiple protection domains (typically in a multi-
tenant environment).
Configuring more than one Protection Domain may help to contain the impact of
server downtime on storage availability. With multiple Protection Domains, one
server or device can fail in every Protection Domain, and production I/O is
unaffected. Multiple Protection Domains improve system resilience.
You can get all the information about a Protection Domain (PD) from the VxFlex
OS CLI or GUI. The scli --query_protection_domain command provides
information about PD ID, number of Storage Pools, Fault Sets, SDS nodes in the
PD, number of volumes and available capacity. Details on each Storage Pool,
SDS, and volume are also displayed. In the GUI, you can view the property sheet for
the protection domain.
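As an illustrative sketch of that query (PD1 is a placeholder Protection Domain name; you must first log in to the CLI, and the exact option names should be verified in the VxFlex OS CLI Reference Guide for your release):

    scli --login --username admin        # prompts for the password
    scli --query_protection_domain --protection_domain_name PD1

The output lists the Storage Pools, Fault Sets, SDSs, and volumes that belong to the Protection Domain.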
The most common use of storage pools is to establish performance tiering. For
example, within a protection domain, you can combine all the flash devices into one
pool and all the hard disk drives into another pool. By assigning volumes, you can
guarantee that frequently accessed data resides on low-latency flash devices while
the less frequently accessed data resides on high-capacity HDDs. Thus, you can
establish a performance tier and a capacity tier. You can divide the device
population as you see fit to create any number of storage pools.
VxFlex OS might not perform optimally if there are large differences between the
sizes of the devices in the same Storage Pool. For example, if one device has a
much larger capacity than the rest of the devices, performance may be affected.
With version 3.x, VxFlex OS introduces a new storage efficient Fine Granularity
layout. The space allocation in the FG layout is based on a 4 KB allocation unit,
which can significantly reduce the amount of unused allocation units as data is
written, especially for small write I/Os. FG storage pools can live alongside MG
pools in a given SDS. Volumes can be migrated between the two layouts.
Caution: Storage Pools are set at the time of deployment and should
not be changed.
The zero padding policy cannot be changed after the addition of the first device to a
specific Storage Pool. You can add Storage Pools during installation. Although
storage pools are configured for optimal performance during installation, you can
modify the Storage Pools post-installation.
Note: An NVDIMM failure in an SDS results in the failure of all the SSDs
associated with that NVDIMM in the node.
FG enables data compression which allows for faster reads and writes. Data
compression is not supported for Medium Granularity (MG). FG pools support thin
provisioned, zero padded volumes. FG pools use log structure array (LSA) to store
data in fixed size (256 KB) containers called logs. This architecture mitigates
fragmentation issues and minimizes empty regions. It also enables inline
defragmentation and performs garbage collection of full logs during rewrites.
Another consideration for the FG pools is the fact that 256x more metadata is
written due to the 4 KB allocation unit compared to 1 MB allocation unit. Byte
alignment further increases the amount of metadata. Compression results in more
data to be stored, which further adds to more metadata. The metadata of FG pools
cannot be saved in memory like in MG pools. So, FG reserves some space on
each disk to save the metadata.
The Storage Pool is always added to a Protection Domain. Each time that you add
devices to the system, you must map them to Storage Pools. Create Storage Pools
before you start adding SDSs and devices to the system. Storage pools can be
added from the CLI, the GUI, and the vSphere plug-in; an illustrative CLI sketch follows
the GUI steps below. Adding an FG pool first requires an acceleration pool.
The acceleration pool is added in a similar way as other pool types. Choose
NVDIMM under the Pool Type, and add NVDIMM devices (DAX devices) for each
SDS node in the cluster.
1. In the VxFlex OS GUI, open the Backend view
2. Right-click the desired Protection Domain and select Add > Add Storage Pool
3. Specify the name for the new storage pool
4. Select the desired storage pool configuration options
5. Once a pool is created, right click the desired SDSs that are contributing
storage to the pool, and choose Add devices.
6. Choose the Path, Name, and Storage Pool to add the device.
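An equivalent CLI flow is sketched below. The Protection Domain name, pool name, SDS IP address, and device path are placeholders, and the option names should be verified against the CLI Reference Guide for your VxFlex OS release:

    # create the new Storage Pool in an existing Protection Domain
    scli --add_storage_pool --protection_domain_name PD1 --storage_pool_name SP2
    # add a device from one SDS to the new pool; repeat for each contributing SDS
    scli --add_sds_device --sds_ip 192.168.100.11 --storage_pool_name SP2 --device_path /dev/sdb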
You can get all the information about a Storage Pool from the VxFlex OS CLI or
GUI. The CLI command provides detailed information about volumes that are
created from the Storage Pools and available storage capacity. It also provides
information about all other attributes of a Storage Pool. You can also view
information about storage pools in the GUI by opening the properties sheet.
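The guide does not reproduce the command itself; an illustrative example, using placeholder names and subject to the same verification against the CLI Reference Guide, is:

    scli --query_storage_pool --protection_domain_name PD1 --storage_pool_name SP1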
Starting with version 3.0, VxFlex OS provides the capability for inline data
compression. The compression reduces the volume of data that is stored on the
disk and improves space utilization. This storage efficiency feature is enabled via
NVDIMM devices that are used for fine-grained storage pools.
FG compressed pools can be chosen for situations where space efficiency is more
valuable than I/O performance, and the data is compressible. If VxFlex OS
determines that the user data is not compressible, it will override the compression
attribute. FG non-compressed pools are used where it does not make sense to
enable compression, but you still need read-intensive performance. If compression is
enabled on a volume and non-compressible data is being written, the system can
determine that it should not compress the data, avoiding the additional CPU
utilization that compression requires.
VxFlex OS protects data during server failures by reserving spare capacity, which
cannot be used for volume allocation. If there is a failure, VxFlex OS must have
enough capacity available to rebuild the data that was on the failed component.
Having enough spare capacity ensures full system protection in the event of a
node or disk failure. Ensure that the spare capacity is at least equal to the capacity
of the largest node or the largest Fault Set.
Spare capacity is configured as a percentage of the total capacity. Therefore, if all
nodes contain equal capacity, set the capacity value to at least 1/N of the total
capacity—where N is the number of SDS nodes.
For example, you have an 11-node cluster of 3 TB each where the system must
protect against single node failure. The spare capacity should be at least 3 TB or
1/11th of the capacity—about 9.1%. Keep in mind that although 30 TB is available
for production data out of the 33 TB total, data is mirrored, so the protected
capacity available is only 15 TB.
There usually is no reason to modify the spare capacity from default values.
However, it is possible. Having a larger spare capacity allows the system to tolerate
some cascaded failures since there is more capacity to store rebuilt data. However,
less capacity is then available for storing user data.
Decreasing the spare capacity must be done with extreme caution. Although the
system gains more usable capacity, there may not be enough space to rebuild the
data protection after a failure.
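If the spare percentage does need to be adjusted, it is set per Storage Pool. The following is a sketch only, with placeholder names and an option spelling that should be confirmed for your release; it applies the 10 percent value from the 11-node example above:

    # reserve 10% of the pool capacity as spare (roughly 1/11 of the example cluster)
    scli --modify_spare_policy --protection_domain_name PD1 --storage_pool_name SP1 --spare_percentage 10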
This lesson presents activities that a VxFlex integrated rack administrator would
perform for storage provisioning. It presents creating volumes, expanding volume
capacity, mapping and unmapping volumes, and removing a volume.
VxFlex OS Volumes
Volumes are created from a Storage Pool and can be exposed to the applications
as a local storage device using the SDCs. When a volume is configured from a
Storage Pool, it is distributed over all devices residing in that pool. Each volume
block has two copies on two different SDSs. This allows the system to maintain
data availability following a single-point failure. The data is available following
multiple failures, as long as each failure took place in a different storage pool.
It is important to understand that the VxFlex OS volume chunks are not the same
as data blocks. The I/O operations are performed at the block level. If an
application writes out 4 KB of data, only 4 KB are written, not 1 MB. The same goes
for read operations—only the required data is read.
If thin provisioning must be used for the VxFlex OS volumes, ensure that you
monitor their consumption in the VxFlex OS interfaces. Also, ensure that you have
the appropriate capacity threshold alert set in the VxFlex OS system. A best
practice is to keep extra free space to avoid any issues with oversubscribed
storage pools.
Add Volumes
You can create and map volumes using various management tools. To start
allocating volumes, the system requires that there be at least four SDS nodes. The
created volume cannot be used until it is mapped to at least one SDC.
Volumes can be added using the GUI. After selecting the Volumes submenu from
the Frontend tab, the system administrator can right-click the desired storage pool
to create a volume. If you want to create more than one volume, type the number of
volumes to add in the Copies box. When you add multiple volumes, they are created
with the same name, with a number appended. The number in the Size box
represents the volume size in GB.
The basic allocation granularity is 8 GB. You can select either Thick or the Thin
provisioning options. Thin provisioning is the default for volumes created from an
FG pool. Leave the RAM Read Cache cleared for MG pools provisioned by SSD
devices.
When a name has not been defined, the system displays default system-defined
names, using the volume’s ID. In place of the Protection Domain and Storage Pool
names, you can also use Protection Domain ID and Storage Pool ID respectively. It
is highly recommended to use thick provisioning for creating VxFlex OS volumes.
Also, do not enable RAM Read Cache since it is disabled on the pools.
You can alternatively use a CLI command when logged in to the master MDM:
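The command itself is not reproduced in this guide. The following is an illustrative sketch with placeholder names; confirm the option names in the CLI Reference Guide for your release:

    # create a 16 GB volume in Storage Pool SP1 of Protection Domain PD1
    scli --add_volume --protection_domain_name PD1 --storage_pool_name SP1 --volume_name vol01 --size_gb 16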
In the Frontend > Volumes view, go to the volumes, and select the desired
volumes.
Right-click the selection and select Map Volumes. The Map Volumes window is
displayed, showing a list of the volumes to be mapped.
In the Select Nodes panel, select one or more SDCs to which you want to map
the volumes.
The progress of the operation is displayed at the bottom of the window. Keep the
window open until the operation is completed and until you can see the result of the
operation.
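Mapping can also be scripted from the CLI. A minimal sketch, assuming a placeholder volume name and SDC IP address (option names may vary by release):

    scli --map_volume_to_sdc --volume_name vol01 --sdc_ip 192.168.100.21
    # repeat for each SDC that needs access to the volume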
You can also create and add volumes directly from VxFlex Manager. In the Volume
Name field, select Create New Volume, or select an existing volume. For a
compute-only service, you can select only an existing volume that has not yet been
mapped. For a hyper-converged service, VxFlex Manager shows both options.
Select the volume name, Storage Pool, size, and volume type - thick or thin.
The new volume icon will appear on the Service Details page. The volume is
grayed out, because the service is still in progress. After the deployment completes
successfully, the volume shows the check mark and appears in the Storage list on
the Service Details page. For a storage-only service, the volume is created, but not
mapped. For a compute-only or hyper-converged service, the volume is mapped to
all the SDCs in the cluster.
The procedure to create and map volumes using the VxFlex OS plug-in is as
follows:
Click the VxFlex OS plug-in from the vSphere Web Client home tab.
From the Storage Pools screen, select the storage pool and click Create
volume.
In the Create Volume dialog box, enter the volume information.
To map the volume to ESXs, select Map volume to ESXs.
In the Select ESXs area, select the clusters or ESXs to which this volume
should be mapped.
After a volume is created and mapped to the desired SDC, you can use the volume
to provide storage to your virtual machines. The volume can be used to expand an
existing datastore or create a VMFS datastore on it. To identify the unique ID,
select the volume in the VxFlex OS GUI, and look at its ID in the properties sheet.
You can also identify the unique ID using VMware. The VMware management
interface shows each device named as EMC Fibre Channel Disk followed by an ID
number starting with the prefix eui.
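From an ESXi host shell, you can also list the storage devices and match the eui identifier against the VxFlex OS volume ID. A minimal example:

    esxcli storage core device list | grep -i eui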
Set Volume Limits: Volume limits are set on a per SDC basis and ensure that
the Quality of Service is being maintained.
Remove: Before removing a volume from a system, you must ensure that it is
not mapped to any SDCs. If it is mapped, unmap it before removing it. Removal
of a volume erases all the data on the corresponding volume.
You can increase, but not decrease, a volume capacity at any time, as long as
there is enough capacity for the volume size to grow. The size of an existing
volume can be increased while it is still mapped to the SDCs. However, the
operating system must also recognize that volume capacity increase. In the case of
an ESXi, the capacity of the datastore on that volume must be increased.
Also, the new size will be rounded up to the next multiple of 8 GB.
If the volume is being used for a VMware datastore, the datastore does not
increase its size automatically. You have to increase the datastore afterwards in
order for it to use the additional capacity of the volume.
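A hedged CLI sketch of the capacity increase (placeholder names; the new size must be a multiple of 8 GB, and option names should be verified for your release):

    # grow vol01 from 16 GB to 24 GB
    scli --modify_volume_capacity --volume_name vol01 --size_gb 24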
Unmap Volumes
If you no longer need to use a volume, you may unmap it. You can unmap a VxFlex
OS volume from the VxFlex OS GUI. Remember to first remove them in an orderly
manner from the application and host environment. From VxFlex OS GUI,
Frontend > Volumes, right-click the target volume that you want to unmap and
select Unmap Volumes from the drop-down list.
The Unmap Volumes window is displayed, showing a list of the volumes that are
to be unmapped.
If you want to exclude some SDCs from the unmap operation, clear the
checkbox for those nodes—these cleared SDCs retain mapping to the volume.
Then click Unmap Volumes.
The progress of the operation is displayed at the bottom of the window. Keep the
window open until the operation is completed, and until you can see the result of
the operation. You can verify the number of SDCs mapped to a volume from the
Mapped SDCs column in the Frontend window.
Snapshots are unmapped in the same way as volumes are unmapped. Unmapping
a volume does not delete the volume or return the volume capacity to the storage
pool.
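The same unmap operation can be performed from the CLI. An illustrative sketch with placeholder names (some releases require an additional confirmation flag):

    scli --unmap_volume_from_sdc --volume_name vol01 --sdc_ip 192.168.100.21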
Volume Limits
With the VxFlex OS Quality of Service (QoS) feature, the Administrator can control
or throttle the IOPS and/or bandwidth of any volume. It ensures that a volume
does not monopolize all the potential IOPS from the Storage Pool.
The bandwidth and IOPS limits for volumes can be monitored and set for each
SDC that a volume is mapped to, enabling administrators to adjust the amount of
bandwidth and IOPS that any given SDC can use. You can configure this QoS
feature with the CLI and GUI on a client or volume basis. For the QoS feature to
work, the volumes must be mapped before setting these limits. The defaults are
unlimited. Here are the steps:
In the Frontend > Volumes view, right-click the target volume. From the drop-
down list, select Set Volume Limits.
The Set Volume Limits window is displayed.
In the Bandwidth Limits and IOPS Limits boxes, type the required values, or
select the corresponding Unlimited option.
The number of IOPS must be larger than 10.
The volume network bandwidth is in MB/sec.
In the Select Nodes panel, select the SDCs to which you want to apply the
changes.
Click Set Limits.
The example here shows how to retrieve volume bandwidth limits through the GUI.
To view the limits through the CLI, use the following command:
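The command is not reproduced here. As an illustrative sketch with placeholder names and values (verify the option names for your release), limits are set and then reviewed per volume:

    # throttle vol01 to 5000 IOPS and about 100 MB/s (expressed in KB/s) for one SDC
    scli --set_sdc_volume_limits --volume_name vol01 --sdc_ip 192.168.100.21 --limit_iops 5000 --limit_bandwidth_in_kbps 102400
    # review the volume, including its SDC mappings and limits
    scli --query_volume --volume_name vol01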
Remove Volumes
Before removing a volume from a system, you must ensure that it is not mapped to
any SDC. If it is, unmap it before removing it.
Removal of a volume erases all the data on the corresponding volume. To remove
one or multiple volumes, perform these steps:
In the Frontend > Volumes view, right-click the target volume that you want to
remove.
From the drop-down list, select Remove.
The Remove Volumes window is displayed, showing a list of the volumes that
will be removed.
Click OK.
You may be asked to validate the VxFlex OS credential to complete the operation.
You can follow the same procedure if you want to remove a volume’s related
snapshots or remove snapshots only. Before removing a volume or snapshots, you
must ensure that they are not mapped to any SDCs. If they are, unmap them
before removing them.
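An equivalent CLI sketch, using a placeholder volume name (the confirmation flag avoids an interactive prompt; verify option names for your release):

    scli --remove_volume --volume_name vol01 --i_am_sure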
Volume Migration
Volume migration between Storage Pools or Protection Domains may be needed for many
reasons. Performance tiering may be the primary driver as application workloads
are dynamic and may change over time. You may also want to migrate volumes
during the development cycle, from testing and development to operations.
With VxFlex OS 3.0, volumes and their snapshots can be migrated across storage
pools within the same Protection Domain and across the Protection Domains. It is
done using V-Tree Migration. Snapshots can only be migrated within the same
storage pool type. If you need to migrate a source volume between the different
storage types, you need to delete all the snapshots attached to the volume before
performing the migration.
To perform the migration, select the V-Tree Migration from the volume toolbar
menu. From displayed volumes, right-click on the volume that needs to be
migrated. Select V-Tree Migration > Migrate. From the migration wizard, select
the destination pool, and click OK.
The direction of the flow of data and the status of the migration are visible in the V-
Tree Migration view. The migration can also be performed using the CLI.
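A hedged example of the same migration from the CLI (placeholder names; the option names may differ slightly between 3.x releases):

    scli --migrate_vtree --volume_name vol01 --dest_storage_pool_name SP2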
Immediately after initiating the migration, a new volume is created in the target
storage pool, so the volume resides in both pools for a time. New writes are sent to
the new target volume as the migration proceeds. Once the migration is complete,
the volume in the source pool disappears.
Note that migration workload can be throttled to manage its impact to the
production workload. To do so, right-click the storage pool, choose Settings > Set
I/O Priority, and select the Migration Policy.
The table in the slide displays all the possible migration paths currently supported
between storage pool configurations. If the source volume is in a Medium Granularity
pool, that pool must be zero padded for the volume to be migrated to a Fine Granularity pool.
This lesson presents techniques to manage rebuild and rebalance workloads and
resource consumption.
VxFlex OS mirrors all user data. Each piece of data is stored on two different
servers within a Protection Domain. The copies are randomly distributed across the
storage devices in the Storage Pool.
When a failure occurs, VxFlex OS rebuilds only the data that became inaccessible.
This process minimizes the amount of data that is transferred over
the network during recovery.
Minimizing rebuild time is important, and VxFlex OS automatically selects the type of
rebuild to perform. Sometimes, more data is transferred to minimize the time that
the user data is not fully protected.
Rebalance is the process of moving data copies between the SDSs to balance the
workloads evenly across the nodes. It distributes data evenly across servers and
storage media. It occurs when VxFlex OS detects that the user data is not evenly
balanced across the devices in a Storage Pool. Rebalance can occur as a result of
several conditions such as SDS addition or removal, device addition or removal, or
following a recovery and rebuild operations.
VxFlex OS moves copies of the data from the most used devices to the least used
ones.
Both rebuild and rebalance compete with the application I/O for the system
resources including network, CPU, and storage media. VxFlex OS provides a rich
set of parameters that can control this resource consumption. The system is
factory-tuned for balancing between speedy rebuild/rebalance and minimization of
the effect on the application I/O. The user has fine-grained control over the rebuild
and rebalance behavior.
Various settings affect the resources that are used to perform these actions;
tuning them can improve system performance but also affects recovery times after
a failure.
Network throttling affects network limits and is used to control the flow of traffic over
the network. It is configured per Protection Domain. The SDS nodes transfer data
between themselves. This data consists of user data being replicated as part of the
VxFlex OS data protection, and data copied for internal rebalancing and recovery
from failures. You can modify the balance between these types of workloads by
throttling the data copy bandwidth. This change affects all SDSs in the specified
Protection Domain.
When both rebuild and rebalance occur simultaneously, the aggregate bandwidth
that is consumed by both does not exceed the individual maximum for each type.
The Rebuild throttling policy determines the priority between the rebuild I/O and the
application I/O when accessing SDS devices. Application I/Os are continuously
served. Rebuild throttling increases the time the system is exposed with a single
copy of data but also reduces the impact on the application. If modifying the rebuild
throttling, choose the right balance between the two.
Rebalance throttling sets the rebalance priority policy for a Storage Pool. The policy
determines the priority between the rebalance I/O and the application I/O when
accessing SDS devices. Application I/Os are continuously served. Unlike rebuild,
rebalance does not impact the reliability of the system so reducing its impact is not
risky.
By default, the Rebuild and Rebalance features are enabled in the system,
because they are essential for system health, optimal performance, and data
protection. These features are only disabled temporarily in specific circumstances,
and should not be left disabled for long periods of time. Rebuild and Rebalance
features are enabled and disabled per Storage Pool.
For example, new servers are added to the cluster during the application peak
workload hours. To avoid network congestion from rebuild and rebalance, defer
these operations to off-peak hours.
Enabling or disabling the rebuild or rebalance features can be done through the
GUI and CLI and should be done with extreme caution.
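As an illustrative sketch (placeholder names; confirm option names for your release), rebalance can be paused for a Storage Pool during the peak window and re-enabled afterward:

    scli --disable_rebalance --protection_domain_name PD1 --storage_pool_name SP1
    # re-enable as soon as the peak window has passed
    scli --enable_rebalance --protection_domain_name PD1 --storage_pool_name SP1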
Introduction
The checksum feature addresses errors that change the payload during the transit
through the VxFlex OS system. VxFlex OS protects data in-flight by calculating and
validating the checksum value for the payload at both ends.
During write operations, the checksum is calculated when the SDC receives the
write request from the application. This checksum is validated just before each
SDS writes the data on the storage device. During read operations, the checksum
is calculated when the data is read from the SDS device. It is validated by the SDC
before the data returns to the application.
Fine Granularity pools, with or without compression, have persistent checksum
enabled by default; this cannot be changed. For each I/O, the data goes through
compression and the checksum is calculated before it is written to the disk. There
are two types of checksum: in-flight checksum and persistent checksum.
The checksum feature may have a major impact on performance and availability
during periods of high sustained I/Os and is usually disabled. To modify this setting,
perform the following steps:
In the Backend view, navigate to, and select the desired Storage Pools
Right-click and select Configure Inflight Checksum from the drop-down list
The Configure Inflight Checksum window is displayed
To enable the Checksum feature, select the Enable Inflight Checksum
option
To disable the Checksum feature, clear the Enable Inflight Checksum
option
Click OK
The Background Device Scanner enhances the resilience of VxFlex integrated rack
by constantly searching for, and fixing device errors before they can affect the
system. It provides increased data reliability compared to what the media
checksum scheme provides. The scanner seeks out corrupted sectors on the
devices in the pool, provides SNMP reporting about errors that are found, and
keeps statistics about its operation. When a scan is completed, the process
repeats, thus adding constant protection to the system.
You can set the scan rate (default: 1 MB/second per device), which limits the
bandwidth that is allowed for scanning. The following scan modes are available:
Device only mode: The scanner uses the device's internal checksum
mechanism to validate the primary and secondary data. If a read succeeds in
both devices, no action is taken. If a faulty area is read, an error is generated. If
a read fails on one device, the scanner attempts to correct the faulty device with
the data from the good device. If the fix succeeds, the error-fixes counter is
increased. If the fix fails, a device error is issued. If the read fails on both
devices, the scanner skips to the next storage block.
A similar algorithm is performed every time an application read fails on the primary
device.
Data comparison mode: This is only available if zero padding is enabled. The
scanner performs the same algorithm as above. In addition, after successful
reads of primary and secondary, the scanner calculates and compares their
checksums. If this comparison fails, the compare errors counter is increased,
and the scanner attempts to overwrite the secondary device with the data from
the primary device. If it fails, a device error is issued.
The scanning function is enabled and disabled (default) at the Storage Pool level,
and this setting affects all devices in the Storage Pool. You can make these
changes at any time, and you can add/remove volumes and devices while the
scanner is enabled.
Scanning starts about 30 seconds after a device is added to a Storage Pool in which
the scanner is enabled.
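A hedged CLI sketch of enabling the scanner in device-only mode for one Storage Pool (placeholder names; check the exact option names, including any bandwidth-limit option, in the CLI Reference Guide):

    scli --enable_background_device_scanner --protection_domain_name PD1 --storage_pool_name SP1 --scanner_mode device_only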
You can use the GUI to apply performance profiles to system components. The
high-performance profile configures a predefined set of parameters for high-
performance use cases. The main difference between the high and default profiles
is the amount of server resources (CPU and memory) that are consumed. The
high-performance profile always consumes more resources.
VxFlex OS licenses are purchased by physical device capacity in TB. You can
activate your licensed capacity over multiple VxFlex OS systems, each system with
its unique installation ID. The license is installed on the MDM cluster, using the
set_license command. Since VxFlex OS licenses are purchased by physical device
capacity in TB, any upgrade or addition may require extra licensing. You can view
current license information using the CLI or the GUI.
In the VxFlex OS GUI, in the upper right corner, open the drop-down list that is
displayed next to the username and select About. To update the license
information choose System Settings > License. Now copy and paste the license
key information.
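A sketch of viewing and installing the license from the CLI. The key value is a placeholder, and the exact name of the option that supplies the key varies by release, so verify it in the CLI Reference Guide:

    scli --query_license
    scli --set_license --key <license-activation-key>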
Window: The sliding time window for each interval—Short, Medium, and Long
Threshold: The number of errors that may occur before error reporting
commences
Period: The time interval of each window, in seconds
The primary accounts are the VxFlex OS user accounts. These accounts are used
to authenticate against the VxFlex OS cluster itself. They are sometimes referred to
as the MDM accounts, since the MDM handles the authentication. This is the
account that you use to log in to the VxFlex OS GUI. Also, you often have to log in
using a VxFlex OS account before running a scli command. The default
administrator account is admin, but you can create more accounts as needed.
VxFlex OS runs on nodes and virtual machines that use either the CentOS or
RHEL operating system. These have their own individual OS user accounts. You
would use one of these user accounts when opening an SSH session to an SVM,
storage-only node, or KVM host to perform maintenance or to run a scli command.
You may have to log in to the VxFlex OS Gateway virtual machine's operating
system for maintenance as well. The default user name for these VMs is root.
The VxFlex OS Gateway hosts a web-based interface. It has its own admin
account that is separate from the others.
VxFlex OS User accounts grant users role-based access to the GUI and specific
SCLI commands.
User authentication can be done with local authentication or with Active
Directory (AD) over LDAP or LDAPS (Secure LDAP). VxFlex OS can support both
AD users that are fully controlled through the customer’s existing centralized
authentication server, and local users concurrently. You can associate groups from
the AD with the existing VxFlex OS roles to ensure the Role-Based Access (RBAC)
model. When a user logs in to the VxFlex OS system, the MDM identifies that the
user belongs to the AD domain. The MDM then authenticates the user against the
AD server over secured communications. After the user is authenticated, VxFlex
OS accepts the group to which the user belongs, and associates the appropriate
role and permissions to the user.
Access to the VxFlex OS Gateway requires defining a dedicated named user. This
user may either be a local user or an LDAP user. Access to the Installation
Manager (IM) requires a user name and password which should be the VxFlex OS
Gateway user.
The authorization permissions of each user role are defined differently for local
authentication and for LDAP authentication. Although the role names are similar,
the permissions that are granted to them are not.
User roles that are defined locally are defined in a nested manner. Higher-level
roles automatically include the abilities of lower roles. For example, a user with a
configurator role also has the abilities of the monitor role. A user with the
administrator role has the abilities of both a configurator and monitor.
User roles that are defined in the LDAP domain are mutually exclusive, with no
overlap, apart from the Configurator role. For example, if you want to give an LDAP
user permission to perform both monitoring and configuration roles, assign that
user to both Backend/Frontend Configurator and Monitor LDAP groups.
Before you begin, ensure that the OpenLDAP package is installed and configured
on each server running the MDM. LDAP configuration steps are operating system
dependent and not presented here. Steps for preparing a server may be different
for secure and nonsecure LDAP authentication. Once the servers are ready, follow
these steps to configure LDAP/LDAPS authentication:
Keep in mind that LDAP user roles are not nested in the way that local VxFlex OS
user roles are. For example, granting an LDAP group an administrator role does
not give it the monitor role. LDAP users cannot log in to the VxFlex OS GUI or the
VMware plug-in unless they have the monitor role.
Once authentication is set, a user can log in to the system according to the defined
method. When logging in locally, the command expects a user name and
password. The LDAP command should also include the LDAP domain that it is
using, and the LDAP authentication parameter.
When logging into the GUI, use a local username, such as admin, or an LDAP
username with the domain name. An example of an LDAP username is
sunder@corp.local.
You can only create local users with the CLI interface, and they are only effective
within the VxFlex OS CLI environment. This command is only available to
administrator roles. The --add_user command is used to set the username and
role for the new user. When a new user is created, the administrator that created
the user receives an automatically generated password that is required for first-time
authentication. When the new user logs in the first time, they are required to
change this password. When the system authenticates a user, all commands that
are performed are tracked to their credentials until a logout is performed, or until
the session expires.
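A minimal sketch of creating a local user with the Monitor role (the username is a placeholder; role names and option spelling should be verified for your release):

    scli --add_user --username ops_monitor --user_role Monitor
    # the returned auto-generated password must be changed at first login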
You can modify existing VxFlex OS user's roles. You can also disable the default
Super User to ensure that all users are associated with specific people. If you need
to re-enable the SuperUser, use the reset_admin command.
For detailed information about this topic, see training and documentation at
www.vmware.com.
The VxFlex integrated rack compute only and hyperconverged nodes provide the
required computing resources. These resources are pooled together and
configured to host virtual machines. On these VMs you can run your application
workloads and other services. You have a choice of hypervisors to use on these
nodes: VMware vSphere or Red Hat Virtualization (RHV).
Virtual machines play a few different roles in a VxFlex integrated rack system.
Production VMs: You deploy your production virtual machines onto the VxFlex
integrated rack system to perform your computing needs. These VMs could be
database servers, application servers, or any other kind of virtual machine.
VxFlex integrated rack services VMs: Many internal functions and VxFlex
integrated rack management tools run on virtual machines. This includes
vCenter, Red Hat Virtualization Manager, VxFlex Manager, and support
services.
Storage Services VMs: Storage virtual machines are needed to use the
physical storage on a hyperconverged node. These VMs give VxFlex OS
access to storage.
VxFlex integrated rack has two separate vSphere environments - one for the
VxFlex Management Controller cluster, and the other for the VxFlex node cluster.
The Controller cluster hosts VMs that provide services for the VxFlex integrated
rack system itself. It is a VMware vSphere cluster where all the nodes run the ESXi
hypervisor, and they are managed by VMware vCenter. For storage, the Controller
cluster uses VMware vSAN. Similar to VxFlex OS, vSAN aggregates the locally
attached disks of the VxFlex Controller nodes to create a pool of distributed shared
storage.
The VxFlex node clusters primarily host the production virtual machines. Nodes in
the VxFlex cluster run either ESXi or RHV hypervisors and are managed by
VMware vCenter or by Red Hat Virtualization Manager. Unlike the Controller
cluster, the VxFlex node cluster uses VxFlex OS for all the customer production
data. VxFlex OS provides massive scalability and flexibility in terms of
hypervisor/OS and bare-metal deployments. If using ESXi nodes, Storage Virtual
Machines are needed to provide storage to VxFlex OS.
As in any vSphere deployment, you can use the VMware vCenter as the
centralized tool to manage ESXi hosts and VMs that are running in the VxFlex
cluster.
The VxFlex Controller cluster maintains the environment for the overall VxFlex
integrated rack management. The virtual machines running on the Controller
Cluster include vCenter Server Virtual Appliance (vCSA), VxFlex OS Gateway VM,
VxFlex Manager, and OpenManage Enterprise VMs, Secure Remote Services
VMs, and Windows jump servers for support access. This cluster uses vSAN for
storage, so you do not need VxFlex OS in the controller cluster.
vSphere provides a high availability solution for vCenter Server, which is known as
vCenter Server High Availability (VCHA). The vCenter High Availability architecture
uses a three-node cluster to provide availability against multiple types of hardware
and software failures. A vCenter HA cluster consists of one active node that serves
client requests, one passive node to assume the role of the active node in case of
failure, and one quorum node called the witness node.
From vSphere 6.5, VUM and SQL are integrated into the vCSA, so the SQL license
requirement is removed along with their separate virtual machines.
The vSAN datastore is created during initial setup, which uses the storage local to
Controller nodes. The vSAN datastore size is an aggregate of all the Capacity
drives in the Controller cluster. All VMs created in the VxFlex Controller cluster are
stored on the vSAN datastore.
To view the vSAN Datastore backing in vSphere Web Client, go to Storage, select
the vsanDatastore, and click Configure > Device Backing. This view shows the
physical disks that make up the vSAN. These are disks from the VxFlex
Management Controller nodes running ESXi.
The top portion of this screen shows each node and the disk group. Notice that
the disk group for each node has the same number of disks.
The bottom screen shows the physical disks that each node has contributed to
the VxFlex Controller vSAN.
The VxFlex node clusters (or production clusters) provide compute and storage to
customer applications. The node's local storage is pooled together by VxFlex OS.
To provide storage to VxFlex OS, ESXi hosts need a Storage VM (SVM). The SVM
has the node's storage controller mapped to it using DirectPath I/O. This gives the
SVM direct access to the storage controller. Rebooting or powering off the SVM
causes VxFlex OS to believe that the node failed.
The Storage VMs cannot be migrated to other hosts because they need direct
access to the local storage of the node. However, other VMs consuming the VxFlex
OS storage can be migrated from one ESXi host to another or from one RHV host
to another.
Nodes running RHV do not need storage VMs. Instead, VxFlex OS SDC and SDS
software run directly on the RHEL OS of the node.
The Storage-only node runs RHEL and contributes storage to the VxFlex OS
cluster. No customer applications run on storage-only nodes. Compute-only nodes
provide computing power, but do not contribute any storage to the VxFlex OS
storage pool.
The ESXi nodes in the VxFlex cluster are managed by a VCSA that is hosted on
the controller cluster. Similarly, RHV nodes are managed by an RHV-M virtual
machine on the controller cluster.
Each ESXi node requires a Storage VM, to access VxFlex OS storage. Because
the SVM provides access to VxFlex OS storage, it cannot be stored on VxFlex OS
storage. Instead, its files are stored on a small datastore that uses the node's
internal BOSS (PowerEdge 14G) or SATADOM (13G) storage. These datastores
are labeled DASXX or something similar.
DASXX datastores should only store the SVMs and system files. Production virtual
machines should be stored on datastores that are backed by VxFlex OS storage.
Shown is the storage view in the vSphere Web Client for the FLEX cluster. Notice
the five DASXX datastores in this example. There is one datastore for each node.
The device backing of these datastores is labeled as a local SATADOM device.
Along with the datastore, these devices also host the ESXi operating system.
Any virtual machine that is used for production must be stored on VxFlex OS
storage. The first step is to create a volume in VxFlex OS. Thick provisioning is
recommended for the VxFlex OS volume because the hypervisor is not aware of
whether the volume is overprovisioned. Thin provisioning can still be used when
creating the virtual machine disks (if needed).
You must also map the volume to all the SDCs so that they have access to the
volume. Then make a note of the ID number of the new volume. This number is
needed to locate the volume in vSphere or RHV-M.
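For reference, a minimal sketch of the equivalent VxFlex OS CLI (scli) commands follows. The Protection Domain, Storage Pool, volume name, size, and SDC IP address are all hypothetical examples; substitute the values for your environment.

scli --add_volume --protection_domain_name pd1 --storage_pool_name pool1 --volume_name prod_vol01 --size_gb 512 --thick_provisioned
scli --map_volume_to_sdc --volume_name prod_vol01 --sdc_ip 192.168.151.21 --allow_multi_map

Repeat the map command for each SDC that needs access. The scli --query_all_volumes command lists the volume IDs, which you can note down for locating the volume in vSphere or RHV-M.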
In vSphere, you can see the details of the Storage Devices available to a VxFlex
node cluster. The EMC Fibre Channel Disks that you see here are the VxFlex OS
volumes that are mapped, or available, to a specific host. The ends of their
identifiers match the ones that are shown in the VxFlex OS interface.
If you have recently created and mapped a volume, you may have to rescan for
storage devices.
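If the new volume still does not appear in the Web Client, a rescan can also be triggered from the ESXi host command line; a sketch:

esxcli storage core adapter rescan --all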
After you have created the VxFlex OS volumes, you can create a datastore on the
VxFlex OS EMC Fibre Channel Disk. When creating a datastore, be sure to select
a device that uses a VxFlex OS volume. Here, you see the wizard screen to select
the device. The selected device has an Identification number that matches the one
that was previously created in VxFlex OS. After completing the wizard, select
Finish, and the datastore is created.
VxFlex integrated rack provides the same methods for building virtual machines in
vSphere as in any vSphere environment. Some of these methods, such as cloning
or deploying VMs from a template, require vCenter. Others are universal to
whatever management platform is being used. Using the New Virtual Machine
wizard makes it easy. One key difference when allocating storage for a VM is that
you should choose a datastore that uses VxFlex OS volumes.
Create VM Example
When creating VMs in the VxFlex integrated rack environment, ensure that you
select the VxFlex OS storage and not the individual datastore on each host.
Allocating storage is part of the process when using the New Virtual Machine
wizard in the vSphere Web Client.
Add Storage to VM
You can expand the storage capacity of a virtual machine by adding a virtual disk.
To add a disk to a virtual machine, select the New Hard disk under New Device in
the Edit Settings screen. Specify the size of the new disk, expand the New Hard
disk and then expand Location. Select either Store with the virtual machine or
Browse. Browse shows you a list of devices (as shown in the image). If you set up
your VMFS datastores with meaningful names, it can help you to choose the
correct device.
There are many options to control a VM and its environment. Some of these
options include monitoring, creating a snapshot, cloning, creating a template, and
adding/removing devices. To see all the options, select the VM in the left navigation
pane and right-click.
You can modify virtual machine settings with the Edit Settings option. With
supported guest operating systems, you can also add CPU and memory while the
virtual machine is powered on.
Types of migrations:
Cold: Migrate a virtual machine that is powered off
Suspended: Migrate a virtual machine that is suspended
vSphere vMotion: Migrate a virtual machine that is powered on
Concurrent migrations are possible. Refer to the VMware website for the latest
information about the maximum number of concurrent migrations to a single
vSphere VMFS datastore.
Migration Wizard
Note: You cannot migrate the Storage VMs because they use local storage, so
they are not candidates for migration.
If using Red Hat Virtualization, you use Red Hat Virtualization Manager (RHV-M)
to manage the environment. RHV-M runs on a virtual machine that is hosted on the
controller cluster.
The access switches provide networking for both the controller nodes and VxFlex
nodes. The traffic coming from these nodes uses various VLANs to allow the traffic
to remain separated, even if it is traveling over the same physical cable. Also, the
switches are configured with virtual port channels (vPC) to allow multiple physical
connections to act as one, even if they are on separate switches.
Each VxFlex node has four connections to the access switches: two connections to
each switch. Two of these four connections (one from each switch) are used to
provide networking for management and production VM traffic. A virtual port
channel (vPC) is created for these connections to enable them to act as one.
Because different types of traffic from different VLANs are traveling on this port, it is
configured in switchport trunk mode. This allows multiple VLANs to use that port.
The other two connections (one from each switch) are dedicated to the VxFlex OS
data traffic. These ports tag all incoming traffic with a VLAN ID for that data
network. The two VxFlex OS data ports on the two switches use different VLAN
IDs, and they are not part of a vPC. Instead VxFlex OS handles the load balancing
of traffic on these two connections.
Each access switch has two ports that connect to each controller node. Since each
port carries traffic that is segregated on different VLANs, they are configured in
switchport trunk mode. This allows them to accept traffic tagged with multiple
VLANs. Also, to provide higher bandwidth, they are configured as virtual port
channels.
Virtual Port-Channels
VxFlex integrated rack uses virtual port channels for all the management and
production traffic. With virtual port channels, links that are connected to different
network devices act as a single port channel to a third device. vPCs are already set
up from the management switches to the uplink switches, providing high
availability. Other benefits of vPC include fast convergence and higher bandwidth.
Do not change these settings.
To see the configuration of the physical switches, log in to the switch and use Cisco
NX-OS commands. Here are a few useful commands.
The show running-config command displays the full configuration file for the
switch. With this command, you can see every configuration setting on a switch.
Although the large amount of output can be difficult to read, if you know what to
look for, you can find the full details of any component.
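If you know which port you are interested in, you can also scope the command to a single interface instead of reading the full output; for example (the interface number is illustrative):

show running-config interface Ethernet1/1/1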
Shown are two portions of the show running-config command output. They
show the configuration of some of the ports that connect to VxFlex nodes and to
Controller nodes. The description for the port usually helps identify its purpose.
The first port for the VxFlex node is used for management traffic. It is set for
switchport mode trunk, and the VLAN IDs that are allowed are shown. The
channel-group specifies that it is part of a virtual port channel (vPC). The second
port for the VxFlex node is for VxFlex OS data. The switchport access vlan
command shows which VLAN ID the switch tags all data from this port with.
The two ports shown that connect to controller nodes are also trunked and the
allowed VLANs are shown. They are both part of a vPC, because they have a
channel-group assigned.
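As a simplified illustration of what these portions of the output can look like, consider the excerpt below. The interface numbers, descriptions, allowed VLANs, access VLAN, and channel-group number are examples only and vary by deployment:

interface Ethernet1/1/1
  description VxFlex node 01 - management/production
  switchport mode trunk
  switchport trunk allowed vlan 105,110,151
  channel-group 111 mode active

interface Ethernet1/2/1
  description VxFlex node 01 - VxFlex OS data 1
  switchport access vlan 231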
The show interface brief command displays a list of all interfaces on the
switch, their status, which VLAN they are tagging packets with, and which virtual
port channel they are a part of.
In the command output shown, Eth1/1/1 is a port that connects to a VxFlex node for
management traffic. The Mode column shows that it is a trunk that accepts multiple
VLAN IDs. The accepted VLANs are not shown with this command. You can also
see that it is part of vPC 111 in the Port Ch # column.
Eth1/2/1 is a port that connects to a VxFlex node for VxFlex OS data traffic. The
VLAN column shows that it is tagging packets with a VLAN ID of 231.
The show vpc command displays a list of all virtual port channels. The Active
VLANs column shows which VLANs are allowed on that port channel. A VLAN only
appears in that column if it is configured for that port channel on both switches.
The show vlan command displays a list of all VLANs, their name, and which ports
and vPCs are using them. This command is helpful when determining which VLAN
IDs are already in use and what they are used for.
In the VxFlex integrated rack environment, each ESXi node uses distributed virtual
switches that may contain multiple port groups depending on the network and the
cluster. The distributed virtual switches span all nodes in the cluster. Each
distributed switch has port groups and uplinks.
Because port groups span multiple ESXi hosts, they need access to physical
networking to allow communication between hosts. Each distributed switch has
uplinks to provide connections between hosts and to other components outside the
vSphere environment.
At the access switch side, the switch port that is connected to the ESXi node must
be configured to accept the tagged traffic. This occurs if the port is configured in
trunk mode and allows the VLAN traffic to pass through. Once configured, the
switch accepts the tagged packets.
The screenshot displays two uplinks each with six vmnics (one for each node) in
this specific VxFlex integrated rack deployment.
In the VxFlex node cluster, DVswitch0 carries the traffic for management processes
and for any production data. To keep the different traffic segregated, the distributed
virtual switch uses multiple port groups. Each port group is given its own VLAN, so
that the data remains separate even when traveling across the physical switches.
Upon installation, there are three port groups.
DVswitch0 has two uplinks on each ESXi host. Each uplink connects to a separate
access switch. The ports on these switches for these connections are configured
as a VLAN trunk. This allows the traffic from all port groups, which have been
tagged with different VLAN IDs, to travel over those ports.
Note: The screenshot has had some VMkernel ports and virtual machines removed
to save space.
DVswitch1 and DvSwitch2 in the VxFlex node cluster are for the VxFlex OS data
traffic. This traffic is sent over separate uplinks than the management and
production traffic on DVswitch0 to provide high performance for the VxFlex OS
data.
Both of these distributed switches have one port group each, vcesys-sio-data1
and vcesys-sio-data2. Each of these port groups has a VMkernel port for each
ESXi host. These VMkernel ports allow the SDC software, which runs on the kernel
as a driver, to access VxFlex OS volumes over the network. Each SVM also has its
network interfaces for data on each port group. This allows the VxFlex OS SDS
software running on the SVM to send data over the network.
Each distributed switch has only one uplink. Instead of applying a VLAN tag at the
port group, VLAN tagging is performed at the physical switch. Each data network
has its own VLAN ID.
To review, let's look at how an SDC running on an ESXi server would access data
on a VxFlex OS volume. The SDC runs as a driver on the ESXi kernel, so it must
communicate through a VMkernel port, vmk2 or vmk3. The VMkernel port is part of
one of the VxFlex OS port groups on a distributed switch. It contacts an SVM,
either through an uplink that is configured for that distributed switch, or directly if
the SVM is on the same host.
Let's take a closer look at the network portion of the solution and how it can affect
the VxFlex OS data I/O path.
The diagram shows three distributed virtual switches, DVswitch 0, 1, and 2, each
with one or more port groups. The lines show how traffic from those port groups
travels to the physical switches.
In DVswitch0 at the top, there are three port groups, each used for management
purposes. These port groups have two vmnics to use as uplinks for their traffic. The
ports are set up for link aggregation. This requires setting the proper teaming and
failover settings for the port groups and enabling vPC on the physical switch.
Since each port group on DVswitch0 uses a different VLAN, the switch port that is
connected to the ESXi node is configured to accept VLAN tagged traffic. This
occurs if the port is configured to be in trunk mode and configured to allow the
VLAN traffic to pass through.
DvSwitch1 and DvSwitch2 both carry VxFlex OS data traffic. Two separate
distributed switches are used for the data traffic to provide traffic isolation. Notice
that they each use separate vmnics that are connected to separate physical
switches. This spreads the I/O workload and makes data access available
across two different 10/25-GbE connections for high throughput.
The two VxFlex OS data networks do not have a VLAN assigned at the port group.
However, a VLAN tag is assigned at the physical switch.
Finally, in the physical area at the top, you can see the iDRAC 1-GbE switch which
is separate from the other switches. There are two reasons for this. First, you do
not need high-speed data ports for management traffic. And second, it separates
management control traffic from the production traffic. All these come together to
comprise different I/O paths in the VxFlex integrated rack. It is good to know these
elements and their relationships, especially if you need to troubleshoot an issue.
You can view the physical adapters on an ESXi host by selecting it and going to the
Configure > Physical adapters section. This shows a list of network interface
cards and their speeds, MAC addresses, and so on. In this image, we are looking
at physical network interface cards, which show us four 10Gb ports that are used
for VxFlex integrated rack networking. It also shows which distributed switch is
using each interface as an uplink.
The VxFlex Management Controller cluster also has three DVswitches; however, the
layout is different from the VxFlex node cluster. Using the Topology view gives a
good high-level view of the networking.
In this image, we see VLAN ID 110 is used for the vcesys-esx-mgmt port group and
VLAN ID 151 is used for the vcesys-sio-mgmt, VxFlex OS management port group.
These VLANs are the same as the VxFlex node cluster networking. VLANs should
be consistent across clusters and laid out globally. For DVswitch0, we see that
there are two uplinks each with three vmnics, for the three nodes in this VxFlex
Controller cluster.
Production virtual machines need their own port groups and VLANs to have
networking. They should not use the existing port groups or VLANs, which are for
VxFlex integrated rack system use only.
Since production traffic should be separated from VxFlex OS data traffic, it should
use the uplinks that are configured for DVswitch0. Also, because production traffic
should be logically separated from the other management traffic on DVswitch0, you
will create a separate VLAN and port group for production.
There are two things that you must do to add networking for production virtual
machines. You must configure the access switches to accept traffic tagged with the
new VLAN ID. Because the uplinks of DVswitch0 use virtual port channels, you
must configure those vPCs. You also must create a port group that will use the new
VLAN ID. That way, traffic from any virtual machine using that port group is
tagged with the correct VLAN ID and is allowed to travel across the access
switches.
Depending on the need, you may create multiple port groups and VLANs for
production. Multiple VLANs and port groups will allow you to create separate
networks for different applications in your production environment.
To configure the physical switches, you will create a VLAN and add it to the
relevant port channels. Log in to the access switches and use the show vlan
command to list the currently configured VLANs. Find a VLAN number that is not
currently used. This will be your new VLAN.
Also, display the port channel interfaces with the show interface
description command. Make a note of the interface names, starting with Po,
that are used for the uplink, peer-link, and connections to all ESXi hosts. These are
the port channels that will need the new VLAN added.
Once you have gathered the necessary information, you can configure both access
switches. First, create the new VLAN using the available VLAN ID that you
identified. Give it a name to make it easier to know its purpose.
Next, add that VLAN to virtual port channels for the peer link, uplink, and ESXi
hosts. Use the vPC numbers identified earlier.
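As a sketch, the commands on each access switch might look like the following. The VLAN ID (200), VLAN name, and port-channel number are hypothetical; use the unused VLAN ID and the Po interfaces that you identified earlier:

configure terminal
vlan 200
  name production-vms
exit
interface port-channel 111
  switchport trunk allowed vlan add 200
exit

Repeat the interface step for the peer-link, uplink, and each ESXi host port channel, then save the configuration with copy running-config startup-config.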
Confirm that the VLANs have been added with the show vpc command. This
shows your new VLAN listed under each of the vPCs. This command only shows
VLANs that are configured across the virtual port channel. Therefore, VLANs are
not listed here until they are configured.
The switches are now configured to accept traffic tagged with the new VLAN ID.
To create a network for production virtual machines, create a new Distributed Port
Group. Begin this process using the wizard:
In the New Distributed Port Group wizard, you can configure most settings to
meet your requirements. However, you must set the VLAN ID to match the VLAN
you had configured on the access switches. Also, under Teaming and failover,
select Route Based on IP hash for the Load balancing. This setting is required for
any uplinks that use vPC, like the ones on DVswitch0. Route Based on IP Hash
works by taking the source and destination IP addresses and performing a
calculation on each packet to determine which uplink to use. Because the
load balancing is based on the source/destination IP addresses, a VM
communicating with multiple IP addresses can balance its load across all the
network adapters. This makes better use of the available bandwidth.
First, the access switches must be configured to accept traffic tagged with your
new VLAN ID. Then, in Red Hat Virtualization Manager (RHV-M) create a network.
Enable VLAN tagging, and provide the VLAN ID. Also, set the network to use an
MTU of 9000.
For each Red Hat host, allow it to use the new network, and set it to use bond0 on
the host. bond0 is used for management and production traffic. The other
interfaces are for VxFlex OS data traffic only.
This lab presents activities that are related to managing your virtual compute and
network resources.
This lesson presents VxFlex OS snapshots and how they can be used to protect
VxFlex integrated rack data.
Daily backups provide minimal required data insurance by protecting against data
corruption, accidental data deletion, storage component failure, and site disaster.
The daily backup process creates fully recoverable, point-in-time copies of
application data. Successful daily backups ensure that, in a disaster, a business
can recover with not more than 24 hours of lost data. The best practice is to
replicate the backup data to a second site to protect against a total loss of data if
there is a full site disaster. Most daily backups are saved for 30 days to 60 days.
For datasets that are more valuable, data replication achieves a higher level of
data insurance. Typically, data replication is done in addition to daily backup.
Replication cannot always protect against data corruption, because a corrupted file
replicates as a corrupted file.
VxFlex integrated rack supports a wide range of data protection options for both
operational recovery and business continuity. Besides VxFlex OS and hypervisor-
based solutions, VxFlex integrated rack can integrate with Dell EMC backup
recovery and business continuity solutions.
VxFlex OS Snapshots
Snapshots are point in time copies of a volume. VxFlex OS provides the capability
to take snapshots of a volume. Once a snapshot is created, it exists as a separate
unmapped volume that can be used in the same manner as any other VxFlex OS
volume. Snapshots are thin provisioned 1 and are generated instantaneously. This
means that they can be created quickly and do not use much space. An
administrator can create up to 126 snapshots per volume for MG pools and 126 for
an FG pool. Of these, 64 can be policy managed. Compared to MG storage
pools, snapshots in FG storage pools provide significant capacity savings.
In the context of VxFlex integrated rack, you could perform a snapshot of a volume
that is being used for a VMware datastore. This would create a point-in-time copy
of the contents of that datastore which includes all the virtual machines that are
stored there.
1 Maximum thin capacity provisioning = 5 * (gross capacity - used capacity)
If you select multiple volumes and create a snapshot of them, they are placed into
a consistency group. A consistency group enables manipulation of the snapshots
as one set. For example, selecting one snapshot of a consistency group and then
removing the consistency group, deletes all snapshots that are a part of that
consistency group. However, a user can also remove individual snapshots if
needed.
Consistency groups are used when multiple volumes have a contextual relationship
and must have snapshots performed simultaneously. They can be especially useful
for creating crash-consistent backups for database applications.
The structure related to all the snapshots resulting from one volume is referred to
as a volume tree, or VTree. It is a tree spanning from the source volume as the root,
whose descendants are either snapshots of the volume itself or snapshots of those
snapshots. Thus, some snapshot operations are related to the VTree and may
affect parts of it.
You can also capture a snapshot of a snapshot, which creates a new branch in the
VTree. One limitation concerning VTrees is that they cannot be moved from a
traditional medium-grained storage pool to a new fine-grained pool.
The graphic on the slide represents an example of a VTree structure. S11 and S12
are snapshots of V1. S111 is a snapshot of snapshot S11. Together, V1, S1x, and
S1xx form the VTree of V1. When you migrate a volume, the volume tree and all its
snapshots are migrated together.
Create Snapshots
Creating snapshots is done through the VxFlex OS GUI. Select the volumes that
you want to create a snapshot of, right-click, and select Snapshot Volume. Next,
set the name of the snapshot and confirm the settings.
You can also create snapshots using VxFlex OS CLI commands. The following
example shows the commands to create a snapshot, and map it to an SDC.
Create a snapshot:
scli --snapshot_volume --volume_name vol_1 --snapshot_name
snap_1
Map a snapshot:
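The map command takes the same form as mapping any other volume; a sketch, with a placeholder SDC IP address:

scli --map_volume_to_sdc --volume_name snap_1 --sdc_ip 192.168.151.21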
Snapshot Policy
Snapshot policies contain a few attributes and elements, offering the ability to
automate snapshots for specified volumes based on specified retention schedules.
Multiple source volumes can be managed per policy, but a source volume cannot
span policies; it can be the source volume for only a single policy.
Removing Snapshots
You can remove a volume together with its snapshots, or remove individual
snapshots. Snapshots can also persist after the base volume is removed, so they
are independent.
You can also remove a consistency group and its snapshots. Before removing a
volume or snapshot, you must ensure that they are not mapped to any SDCs. If
they are, unmap them before removing them. Removal of a volume or snapshot
erases all the data on the corresponding volume or snapshot.
In Frontend > Volumes > V-Trees view, select the volume from which you
want to remove the snapshots and right-click. You can either remove the
volume or consistency group.
The Remove Volumes window is displayed, showing a list of the objects that
will be removed.
Restore Snapshot
In case of data corruption on the source volume, you can revert the data from a
snapshot using the Overwrite Content feature available on the volume. You can
choose the desired snapshot as the source to revert to that point in time.
You can also recover data from a snapshot manually by mounting it to the SDC
from which the data originated, and copying data back to the production volume.
This lesson presents VMware vSphere data protection features. It also provides an
overview of RHV protection features.
VMware Snapshots
Although a snapshot acts as a copy of the entire virtual machine, only the changes
to the virtual machine are stored. This means that the initial size of a snapshot is
small. The longer a snapshot is retained, the more capacity it uses, since the
number of changes to a VM grows. For this reason, snapshots are good for short
periods of time. For longer retention, a backup solution is needed.
To create a snapshot, right-click the virtual machine and select Snapshots > Take
Snapshot. You can then give the snapshot an identifying name and description. If
the VM is powered on, you can also snapshot the virtual machine’s memory. This
takes longer since it needs to copy the memory to disk, but it enables the VM to be
rolled back without requiring a reboot.
Keep in mind that VMware High Availability does not provide seamless recovery.
During normal operation, the virtual machine only uses one ESXi host as its
compute source. When there is a problem, the virtual machines go down and then
restart on another host. When virtual machines are restarted on another host, they
reboot. High Availability requires that the virtual machines use shared storage and
that the hosts are placed in a cluster with a shared management network. VxFlex
integrated rack already has all hosts in a cluster with a shared management
network, and VxFlex OS provides shared storage between all hosts. The VxFlex
integrated rack environment, therefore, can use the VMware vSphere High
Availability feature.
VMware Fault Tolerance can be enabled for individual virtual machines to provide
zero downtime on a host failure. It works by creating a secondary copy of the virtual
machine on another ESXi host of the cluster. This secondary copy has its own set
of virtual machine files and memory which are kept synchronized with the primary
virtual machine. The synchronization happens every 10 to a few hundred
milliseconds using a method called Fast Checkpoints. This option is ideal for
uninterrupted availability of critical virtual machines.
To configure, right-click the virtual machine, and select Fault Tolerance > Turn on
Fault Tolerance. Select datastores for the secondary VM files and other fault
tolerance files, and select the ESXi host that runs the secondary VM.
RHV Manager enables you to take snapshots of a virtual machine for operational
recovery. A snapshot causes a new copy-on-write (COW) layer to be created. All
writes performed after a snapshot is created are written to the new COW layer. Any
virtual machine that is not being cloned or migrated can have a snapshot taken
when running, paused, or stopped. Snapshots of VMs that are based on Direct
LUN connections are not supported, live or otherwise.
The backup and restore API is used to perform full or file-level backup and
restoration of virtual machines. The API combines several components of RHV,
such as live snapshots and the REST API, to create and work with temporary
volumes. These volumes can be attached to a VM containing backup software
provided by an independent software provider.
The engine-backup tool can be used to back up the RHV Manager. The tool backs
up the engine database and configuration files into a single file, and can be run
without interrupting the ovirt-engine service.
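A minimal sketch of such a backup run, with hypothetical file names:

engine-backup --scope=all --mode=backup --file=rhvm-backup.bz2 --log=rhvm-backup.log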
RHV supports two types of disaster recovery solutions to ensure that environments
can recover when a site outage occurs. Both solutions support two sites, and both
require replicated storage.
The Active-Active Disaster Recovery solution requires a stretched cluster
configuration, where hosts capable of running the required virtual machines are
located in both the primary and secondary site. Virtual machines automatically
migrate to hosts in the secondary site if an outage occurs. However, the
environment must meet latency and networking requirements.
The Active-Passive Disaster Recovery solution is implemented by configuring
two separate RHV environments: the active primary environment and the
passive secondary (backup) environment. Failover and failback between sites
must be manually executed and is managed by Ansible.
For more information about Red Hat Virtualization data protection, see
www.redhat.com/rhv
This lesson presents data protection solutions for VxFlex integrated rack systems.
Introduction to Avamar
Dell EMC Avamar is a complete backup solution. It performs scheduled and on-
demand backups and provides the backup storage.
Data Domain is a deduplicated storage system that can be integrated with Avamar.
In this type of configuration, Avamar is used to manage backup clients, schedules,
datasets, and other policies, while Data Domain is used as a storage device.
Backup data is sent directly from the client to the Data Domain system using Data
Domain’s DD Boost technology. Backup metadata used to identify files and
backups is stored on Avamar. The backup process uses Data Domain
deduplication methods rather than Avamar’s method, which can provide faster
backup and recovery, especially for large active databases. Data Domain provides
flexibility since storage can be shared with other Avamar servers or other
applications.
Storing Avamar backup data on Data Domain adds a few capabilities. One
capability, Instant Access, enables a backup of a failed VMware virtual machine to
be powered on and available almost instantly. Rather than waiting for all the virtual
machine data to transfer back to the original datastore, the VM data is presented to
the hypervisor through an NFS share so that it can be instantly powered on. Later,
the data can be transferred back to the original datastore using VMware vMotion
for Storage.
Another capability, Cloud Tier, is a Data Domain feature that sends older data to
cloud storage. Avamar is integrated with the Cloud Tier feature so that you can
manage tiering policies from the Avamar GUI. Data that has been sent to the cloud
tier is recalled automatically if a restore from that data is needed.
Avamar image backups also take advantage of VMware’s changed block tracking.
This means that only the changed blocks on a virtual drive are scanned by the
Avamar image proxy. This further increases backup performance. Avamar can
back up an entire vApp as a single entity. When a virtual machine needs to be
restored, you may either restore the entire virtual machine, individual virtual drives,
or individual files from the image backup.
Guest Backup
Guest backups involve installing the backup agent directly onto the guest and
performing backups. Guest backups provide more granularity than image backups.
Administrators can choose to back up only certain files or directories, and can use
plug-ins for databases and applications. However, guest level backups use CPU
and memory resources of the machine that is being backed up. Some features that
require image backups, such as Avamar’s Instant Access, are not available.
Avamar Replication
Avamar can replicate its backup data to another Avamar server for disaster
recovery. With replication, you can be sure that your backups are not lost, even if
the primary Avamar server becomes unavailable or is lost. Replication takes
advantage of deduplication technology, so that only changed data is sent over the
network. Replication is available with an Avamar integrated Data Domain as well.
However, the target replication site must also have both an Avamar and Data
Domain system. Many replication topologies are supported which enables flexibility
in deploying Avamar servers.
Avamar can be used to back up a VxFlex integrated rack system. Only Avamar with
an integrated Data Domain is supported. The Avamar server can be a single node
or a virtual edition server. Multi-node Avamar servers, or grids, are not supported.
The Avamar and Data Domain systems can either be racked separately from the
VxFlex system or, in smaller environments, with it. In a small environment, the
Avamar and Data Domain systems are connected directly into the VxFlex
integrated rack access switches. Because ports on the access switch are used for
the backup components, the VxFlex integrated rack supports four fewer nodes.
In larger environments, the Avamar and Data Domain can be racked separately.
Since the Avamar node and Data Domain systems require extra network
connections, a pair of Cisco Nexus 9K switches is used. These switches provide
communication between the VxFlex integrated rack and the Avamar/Data Domain.
Also, image proxies need to be deployed in the vSphere environment. For the best
performance, these proxies should have access to the datastores of the VMs that
you want to back up.
RecoverPoint for VMs also provides protection against natural disasters, accidents,
utility outages, and technical malfunctions. It also helps administrators recover from
daily operational mishaps like data corruptions, virus attacks, and operational
errors.
RecoverPoint for VMs also helps during system upgrades and data migrations for a
data center migration or expansion.
The RecoverPoint for VMs virtual RecoverPoint Appliances (vRPA) are installed in
the VMware vSphere environment. For high availability and performance, vRPAs
are deployed in clusters of two to eight nodes. The vRPAs are delivered in an Open
Virtualization (OVA) format.
Each VMware ESXi host that participates in protecting virtual machines requires
the RecoverPoint for VMs splitter to be installed. The splitter is a vSphere
Installation Bundle (VIB) file. Splitters are aggregated within a VMware cluster. As
the ESXi splitter operates from within the virtual layer, it can replicate any storage.
ESXi Splitters can be shared by multiple vRPA clusters.
Management of the solution is done through the RecoverPoint for VMs plug-in for
VMware, which interacts with the vSphere API on the vCenter Server and the REST
API on the vRPAs. If remote protection is required, a WAN link can be used to copy
data to a vRPA cluster at another location.
Repository Volume
The repository is a unique system volume that is dedicated to each vRPA cluster.
The repository volume is used for storing configuration and consistency group
information, which is required for transparent failover between RPAs. The
converged systems standard size for the repository volume is 5.72 GB.
Journal Volumes
Journal volumes are required on local and remote sites. Each copy of data in a
consistency group must contain one or more volumes that are dedicated to hold
point in time history of the data. The type and amount of information that is
contained in the journal differs according to the journal type. The maximum size of
a journal volume should be 250 GB per copy for a consistency group. There are
two types of journal volumes:
Copy journals
Production journals
Protect VM Wizard
To protect a Virtual Machine, you can launch the Protect VM Wizard from the
vSphere Web Client.
To start the Wizard, right-click a VM, and from the drop-down menu select
All RecoverPoint for Virtual Machine Actions and then the Protect button. This
starts the Protect Wizard. Alternatively, you can open this Wizard from Configure >
RecoverPoint for virtual machines (under More at the bottom). Click the link
Protect this VM. Make sure that the VM is selected.
The first step in the Protect VM Wizard is the Select VM protection method. To
protect the virtual machine, choose one of the following options: Create a new
consistency group creates a consistency group for the virtual machine. In the
Create new consistency group screen, enter a descriptive name for the
consistency group. Then select the production vRPA cluster for the VM. The other
protection option is to Add VM to an existing consistency group which enables
you to add the VM to an existing consistency group. From Select consistency
group, select the consistency group to which the virtual machine is added.
Next, Configure production settings. Enter a name for the production or source
copy. Choose the size for the Production Journal. See recommendations for the
Journal size in RPVM documentation. Select the Datastores displayed in the table.
If a datastore in a different location is desired, it can be manually registered.
Next, Add a copy. Enter a name for the remote copy, for example, the remote vRPA
cluster in Round Rock. In this example, if Boston is selected, the copy would be
local.
Next, Configure copy settings. The Protection Policy section is also on this
page.
If Synchronous mode is chosen, no data is lost between the production VM
and the Replica if there is a disaster.
If Asynchronous mode is chosen, you must choose the RPO (Recover Point
Objective) which determines how much data is acceptable to be lost. From the
drop-down menu in the RPO section, a user can choose the size (such as bytes,
KB, MB, GB, and TB), the number of writes, or the passage of time from seconds to
hours.
Select copy resources (a remote cluster), and storage. Define failover networks
for the copy, and on Ready to complete page, review the settings and click
Protect.
You can manage the replication environment from the plug-in. From the plug-in,
select Protection, and then Consistency Groups. Highlight the Consistency Group.
Select Topology and review the details.
For more information about RecoverPoint for virtual machines, see product
documentation at www.dellemc.com.
This lab presents activities that are related to protecting virtual machines using
available snapshot technologies.
This lesson presents key monitoring activities for virtual compute and network
environments. For detailed information, see VMware vSphere monitoring
documentation.
Most of the virtual compute and network monitoring is done with the VxFlex node
cluster vCenter. Here you can monitor the health of the ESXi hosts and the clusters
to which they belong. All production VMs run in this vCenter and must be monitored
for resource usage and performance. You can also monitor and manage virtual
networks, such as distributed virtual switches, port groups, and VLAN settings. The
VxFlex node cluster uses datastores, which are also monitored for capacity usage
and performance. These datastores are created on the VxFlex OS storage.
vCenter provides some VxFlex OS monitoring capability through the VxFlex OS
plug-in.
Resource Monitoring
Resource usage is reported on the Summary tab. Select the cluster, host, or VM in
the Navigator pane, and review the USED vs CAPACITY values of the CPU,
memory, and storage.
The vSphere statistics subsystem collects data on the resource usage of inventory
objects. Data on a wide range of metrics is collected at frequent intervals. The data
is processed and archived in the vCenter Server database. You can access
statistical information through command line monitoring utilities or by viewing
performance charts in the vSphere Web Client.
Monitoring VMs
The vSphere Web Client lets you look at a virtual machine at a high level with the
Summary tab. It also enables you to monitor a specific aspect of a VM. The
Monitor tab gives you options to look at Issues, Performance, Tasks and Events,
Policies, and Utilization. The screenshot on this slide shows the recent events that
have occurred on the VM.
VM Performance Monitoring
You can review information about CPU utilization on virtual machines that are
available in vCenter Server.
Temporary spikes in CPU usage indicate that you are making the best use of CPU
resources. Consistently high CPU usage might indicate a problem. You can use the
vSphere Web Client CPU performance charts to monitor CPU usage for hosts,
clusters, resource pools, virtual machines, and vApps.
Host machine memory is the hardware backing for the guest virtual memory and
guest physical memory. Host machine memory must be at least slightly larger than
the combined active memory of the virtual machines on the host. A virtual
machine's memory size must be slightly larger than the average guest memory
usage. Increasing the virtual machine memory size results in more overhead
memory usage.
You can view the status of each host, or node, and its VDS from the vSphere Web
Client. To view VDS health, go to Network, select the VDS in the left pane, select
the Monitor tab, and then Health.
The green color of the port plugs shows that the port is active. One of the ports is
down in the example shown.
Monitoring vSAN health is critical for the proper functioning of the VxFlex Controller
cluster. To validate the health of the VxFlex Controller cluster, perform the vSAN
health test periodically. Select the cluster, and under the Monitor tab, select vSAN
> Health and then click the Retest button. In a healthy system, all tests should
pass successfully.
Events are records of user actions or system actions that occur on objects in
vCenter Server or on a host. Examples of events include license key expiry, VM
power on, or lost host connection. Event data includes details about the event such
as who generated it, when it occurred, and what type of event it is.
RHV generates events when errors occur. These events can be forwarded to an
email or SNMP server.
VxFlex OS Monitoring
Administrators can view and monitor various VxFlex OS components using the
user interface. The Dashboard tiles provide a visual overview of storage system
status. The tiles are dynamic, and contents are refreshed at the interval set in the
system preferences (default: 10s). The Dashboard’s navigation button switches the
display of the navigation tree. You can change the Dashboard display by double-
clicking the desired navigation tree node.
You can also view MDM cluster information and the master MDM IP address by
hovering over the Management pane.
The VxFlex OS GUI provides capabilities to view detailed information about various
objects and their status.
Navigate to the desired object. Click the expandable Property Sheet on the
right side of the window.
The Property Sheets display detailed read-only information about the element.
Users can even work with multiple Property Sheets simultaneously, one for
each of several related elements, such as a device, an SDS, a Storage Pool, and
a Protection Domain.
The Alerts indicators show the overall error state of the system. When lit, indicators
show the number of active alerts of each severity. Similar indicators are displayed
in some views of the Backend table, and also on Property Sheets. You can view
details about the alerts active in the system in the Alerts view.
The Alerts view provides a list of the alert messages currently active in the system,
in table format. You can filter the table rows according to alert severity, and
according to object types in the system. RED indicates a critical severity alert,
whereas ORANGE indicates medium, and YELLOW indicates a low severity alert.
A quick way to see alerts is to use the State Summary view in the Backend view.
This view shows alerts next to the items that the alert pertains to. This can make it
easy to find which components need attention.
Events may be viewed when logged in to the Master MDM by using the
showevents.py script that is provided as part of the VxFlex OS installation. The
output of this command may be in color, corresponding to the severity of the logs:
green entries for normal operations, and yellow and orange for warnings and
more critical events. The MDM stores the events in a persistent, private
database file and periodically archives them.
Shown here is the structure of a VxFlex OS event as recorded in the system. Every
VxFlex OS event has six distinct fields: ID, Date, Name, Severity, Message, and
Extended. These fields are populated for each event and are displayed by the
showevents.py command.
As shown here, the showevents.py output produces one line of output per event.
Each line uses a color-coded font, based on the severity level for the particular
event.
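A sketch of running the script from the Master MDM follows. The script location can vary by installation, and the second line is only a template showing the field order, not real event data:

./showevents.py
<ID>  <Date>  <Name>  <Severity>  <Message>  <Extended>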
The VxFlex OS Manage and Monitor Guide documents every possible event by a
uniquely identifying Name field. For each type of event, it indicates the
recommended action.
For a detailed list of events and recommended actions, see the latest VxFlex OS
Manage and Monitor Guide.
VxFlex OS events may be forwarded to a remote syslog server. Syslog allows for
convenient monitoring in data centers that have standardized syslog as a
mechanism to aggregate logs from all applications.
Hardware Monitoring
The iDRAC home page displays the Dashboard, which provides high-level health
status of the various system components. The GREEN check indicates that there
are no health issues with the server. You can click each component to find more
details.
The Dashboard also provides basic information about the system including model,
service tag, iDRAC MAC address, BIOS, and firmware version.
You can also launch a console session from the Dashboard. Power controls are
available here in the blue button just under the Dashboard title. The tabs on the top
of the home page take you to specific details based on which action you would like
to perform.
From the System view, iDRAC allows you to monitor different hardware
components such as Batteries, CPU, and Power Supplies. You can drill down
various components to find more information. For example, information about fans
and system temperature can be found under Cooling. If a fan has an issue, its
status goes to a warning or critical state. There are similar details for the memory,
network devices, and other components.
Storage is examined under its own tab where you can see the high-level status of
the physical disks and drill down into detail on each device. If a device has an
issue, the status will change color.
You can configure how iDRAC handles different alerts on the Configuration,
System Settings page. For example, you may want some alerts to generate an
email or SNMP trap. Some events can even be configured to perform an action
when they occur, such as an automatic reboot of the server.
To use email and SNMP, you must configure their settings, such as the SMTP
server and email address information, in the SMTP (Email) Configuration selection.
Once VxFlex Manager discovers OpenManage Enterprise, you can open the
application and manage critical and error alerts for the device. You can configure
the email (SMTP) address that receives system alerts, SNMP destinations, and
Syslog properties in OME. To manage these settings, you must have the
OpenManage Enterprise administrator-level credentials.
Here is an example from a Cisco 3172 showing brief information about the
interfaces on this switch. This is a good first command to run and see attributes,
such as Status and Speed. You can see the assigned VLAN, the port mode
(access or trunk), and the port speed. Notice that the 40G trunk is
used to connect to other switches. You can also see the reason that a port is down.
When you see Administratively down, it means that the admin set the port to down or
shutdown. The only way this port can be active again is if the admin purposely
activates it with the no shutdown command on the port.
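For example, to bring such a port back up, the admin would run commands similar to the following (the interface number is illustrative):

configure terminal
interface Ethernet1/5
  no shutdown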
Showing the VLAN can provide a high-level view of VLAN definition. This example
is from an access switch. Notice there are port channels in use here, indicated as
Po in the Ports column. To get more details on the port channels, you can run a
show port-channel command.
In this example, you can see the virtual port channels that are set up on the Cisco
switch and their status. All port channels are tied to physical ports, or interfaces.
Notice on Port-channel 40, one Ethernet interface is down and one is up. The next thing
that you may want to do is to show the details of the down interface, if you think it
should be up. Use the show interface command to gather further detailed
information about the port.
To see details of a port, use the show interface Ethernet x/x, where x/x is the port
or interface number. Notice that you can see the MTU size here along with the
status. This example is from the Cisco 3172 switch.
This lesson presents monitoring VxFlex integrated rack components using VxFlex
Manager. It also covers configuring SNMP for various components.
Shown is the dashboard for a VxFlex integrated rack system with one service. A
service is a collection of resources in a VxFlex integrated rack. In this example, the
service is a vSphere Cluster and VxFlex OS storage running on six nodes. This
service shows a warning.
The example also shows that there is a total of 14 nodes in the system. One of
them also shows a warning.
By clicking the blue warning links, you can quickly see which nodes or systems are
showing the warning.
Further down in the Dashboard, you can see the utilization of nodes. In the
example, 42% of the nodes, or 6 out of 14, are in use by a service. The remaining
nodes can be added later if more capacity is needed. They can be configured to
provide compute and storage to the existing service, or to a new service.
The Dashboard also shows the VxFlex OS storage usage. In the example shown,
there is only one VxFlex OS cluster that only has 512 GB of storage provisioned.
From the Services section of VxFlex Manager, you can view your services.
Selecting a service shows a diagram of all the resources in the service. You can
quickly see each resource along with their statuses. You can see details on each
resource by clicking it. You can then view logs from that resource. Some
maintenance tasks are also available. For example, you can place a node into
Service Mode which places the VxFlex OS and VMware services into maintenance
mode.
In this view, you can quickly see information about each resource. This includes
whether they are healthy and whether they are compliant with the RCM level. It also
provides links to the management interfaces of each component, or in the case of a
switch, its IP address that you can connect to.
VxFlex OS Details
You can view details of a resource by selecting it in the Resources view and
clicking View Details. Here, you can see detailed information about the resource
including performance statistics. Shown is the details page for a VxFlex OS
system. It shows its capacity and historical IOPS data.
Node Details
The Resources page displays detailed information about all the resources and
node pools that VxFlex Manager has discovered and inventoried. You can perform
various operations from the All Resources and Node Pools tabs. Here, you can see
that the Resource Details page displays detailed information about the resource
and associated components. Performance details, including system usage, CPU
usage, memory usage, and I/O usage are displayed. Performance usage values
are updated every five minutes.
If there is a drive failure, VxFlex Manager provides wizards to guide you through
the process of selecting a disk to remove and completing the disk replacement.
VxFlex Manager supports drive replacement for storage-only and hyperconverged
SSD drives for Rx40 (R640, R740xd...) models only. It enables drive replacement
for NVMe disks only on storage-only nodes.
VxFlex Manager monitors current firmware and software levels and compares them
to the active RCM definition, which contains the baseline firmware and software
versions. It shows any deviation from the baseline in the compliance status of the
resources. You can use VxFlex Manager to update the servers to a compliant
state. Using VxFlex Manager, you can choose a default RCM for compliance, or
add new RCMs.
You can view RCM compliance by clicking a service in the Services window and
clicking the View Compliance Report button.
VxFlex Manager and VxFlex OS connect with Secure Remote Services to transmit
encrypted RCM compliance assessments. They also transmit server hardware and
VxFlex OS alerts to Dell EMC Customer Support.
VxFlex Manager can be configured to send alerts to support staff using Secure
Remote Services. Secure Remote Services routes alerts to the Dell EMC support
queue for diagnosis and dispatch.
For information about how to configure Secure Remote Services, see the Dell EMC
VxRack FLEX Administration Guide.
VxFlex Manager enables users to forward SNMP traps or syslog messages to local
servers of their selection. It acts as an aggregator for all devices in the VxFlex
integrated rack. Authentication is provided by VxFlex Manager through the
configuration settings provided. VxFlex Manager can be configured to forward
syslogs to up to five destination remote servers.
To configure SNMP, specify the access credentials for the SNMP version you are
using and then add the remote server as a trap destination. VxFlex Manager and
the network management system use access credentials with different security
levels to establish two-way communication. For SNMPv2 traps to be sent from a
device to VxFlex Manager, you need to provide VxFlex Manager with the
community strings on which the devices are sending the traps.
VxFlex Manager receives SNMPv2 traps from devices, and forwards SNMPv2 or
v3 traps to the network management system.
The image shows commands to configure SNMP trap on the Cisco Nexus
switches. For more information, see www.Cisco.com.
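As a sketch, the configuration typically involves commands similar to the following; the community string and trap receiver address are hypothetical placeholders:

configure terminal
snmp-server community vxflex-traps ro
snmp-server host 192.168.110.50 traps version 2c vxflex-traps
snmp-server enable traps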
This module focuses on the VxFlex integrated rack system life cycle management
and basic maintenance tasks.
This lesson presents the overview of system life cycle management and upgrades.
Managing the system life cycle is key to having a stable, secure, and compliant
system. Dell EMC converged and hyperconverged systems use the Release
Certification Matrix (RCM) for system life cycle management. Each RCM
version document defines the specific hardware components and related software
version combinations that are tested and certified for integrity and compatibility.
The main purpose of the RCM is to provide a reference of known, approved, and
supported configurations of these systems. These documents are regularly
updated as new software and firmware are released. Using the RCM to update and
maintain a system results in a consistent, secure, up-to-date, and validated
platform over its entire life cycle.
You can download the Release Certification Matrix documents from the RCM portal
available at the Technical Resource Center.
Not adhering to the RCM can put the integrity of the VxFlex integrated rack system
at risk. VxFlex Manager provides the capability to check system compliance with
the designated RCM version. VxFlex Manager identifies noncompliant components
and recommends remediation.
The following steps are performed to view the RCM compliance report:
Select the resource for which you want to view the compliance report.
Under the RCM Compliance column, click the link corresponding to the RCM
Compliance option.
The Release Certification Matrix Compliance Report page is displayed.
Select the Firmware Components option to view the details of the firmware
available on the selected resource. Select the Software Components option to
view the software components in the compliance report.
The compliance report can also be exported for the available resources.
VxFlex Manager enables you to load newer compliance versions and specify a
default version for compliance checking. You can load multiple compliance
versions into VxFlex Manager. VxFlex Manager enables you to load the
compliance files either using Secure Remote Services or from a local repository.
If you selected to load RCM from configured Secure Remote Services, click the
Available RCMs drop-down list, and select the RCM.
Tip: If you want to be able to add the RCM with Secure Remote
Services, you must first configure the alert connector.
If the desired compliance version exists on the support VM, you can load it to
VxFlex Manager. From VxFlex Manager, click Settings, and then choose
Compliance and OS Repositories. On the Compliance and OS Repositories
page, select the Compliance Versions tab. You can use the Compliance
Versions tab to load RCM versions and specify a default version for compliance
checking. To load a new compliance version, click the Add button. The default
compliance version is always used for shared resources, such as switches. You
can specify that each service uses the default compliance version or a different
compliance version. The operating system images for ESXi and VxFlex OS are
included with the RCM.
It is recommended that the individual performing the upgrade be well versed
in the system operation and interdependencies. Be certain to review the RCM
release notes and the upgrade procedure in detail before performing the upgrade.
Dell EMC VxFlex integrated rack Upgrade Guide provides a detailed procedure to
upgrade between current and targeted RCM versions. Some upgrades may require
multiple hops from one RCM to another. Multiple-step upgrades may involve
significant extra effort and complexity. These upgrades occur in a specific order as
noted in the Upgrade Guide. Also, identify if there are other interrelated
components or applications that should be upgraded at the same time. System
health assessment and remediation are required prior to performing an upgrade.
The primary objective of the assessment and remediation activity is to get the
system into a good known state to perform the upgrade successfully.
The VxFlex integrated rack Upgrade service is also available from Dell EMC
Professional Services. This service reduces risk by providing the necessary
activities and tasks by highly skilled professionals to perform a Release
Certification Matrix (RCM) upgrade. It also includes an assessment and
remediation to ensure system stability.
Dell EMC Professional Services and Partner teams minimize risk with pre-
engineered, predefined, and pretested upgrade paths. Their expertise helps to
reduce the costs and time that is typically associated with “do-it-yourself” upgrades.
Best practices, unique depth, and breadth of knowledge are brought to the delivery
of this service. Also, if an issue arises, Dell Technologies technical support can
more quickly and easily diagnose and fix the problem.
As part of the preparation, check the health of the system a few days before the
upgrade time. If you find something unusual, you have time to correct it before the
targeted upgrade starts. Also, check the health again just before starting the actual
upgrade to ensure that the environment is still healthy and stable. See the Dell
EMC VxFlex integrated rack Health Assessment and Remediation Guide for more
information.
Make sure that you review this entire upgrade document and each referenced
component upgrade document before you begin the upgrade. If you are uncertain
or have questions, call Dell EMC Support. The upgrade guide also provides the
approximate duration for each component upgrade. Estimate the total upgrade time
based on the components in your environment. Schedule some additional time in
case there are other unforeseen issues. Create a configuration backup file for each
system component before you begin the upgrade.
When upgrading VxFlex integrated rack, observe the following best practices:
button is disabled for all other nodes. This option can be used to eliminate the
manual procedure of taking the node offline.
You need to upgrade the VxFlex Manager virtual appliance before proceeding with
the RCM upgrade within VxFlex Manager. Refer to the upgrade guide for the
procedure to upgrade the VxFM virtual appliance.
VxFlex Manager performs rolling updates of the VxFlex OS nodes within a single
service, updating one node at a time. This approach is necessary because each
service is associated with a single Protection Domain, and a Protection Domain
might not span services.
During execution, the workflow performs the following steps for each node and its
associated SVM:
Checks to see if the node is part of the VxFlex OS cluster. This check ensures
that the workflow does not attempt to update nodes that are not a part of the
cluster.
Finds the SVM for each node that is part of the VxFlex OS cluster by performing
a lookup within the VxFlex OS Gateway.
The workflow then puts the SDS into maintenance mode, turns off the SVM,
and finally puts the ESXi node into maintenance mode.
Runs the update task for the firmware that is specified in the RCM. To perform
the update, it installs the required update and reboots the ESXi node. To ensure
that the ESXi node restarts successfully, it performs a verification check after
the reboot.
Takes the node out of maintenance mode, powers on the SVM and calls the
VxFlex OS Gateway.
This process repeats for every node that requires an update. The updates for each
node take approximately 45 minutes to complete. The RCM update process
handles node BIOS, firmware, and ESXi driver updates automatically. However,
updates to the VxFlex OS software and to the top-of-rack switches must be
performed manually.
VxFlex integrated rack systems are engineered for high availability and reliability,
however, problems can occur. To help identify the problem, it is best to know the
normal operating environment. This can be achieved by saving logs and
configuration files after the initial installation, or when things are running normally.
This information can be used as a baseline for comparison.
The most common cause of many issues is configuration changes. Since VxFlex
integrated rack has many integrated components, it is important to validate the
impact on the overall system before making any changes. Even a small change or
fix in one area may introduce a problem in some other system areas.
Understanding the overall I/O path and component integration can help identify the
root cause of a problem.
See the Dell EMC VxFlex Integrated Rack Administration Guide for procedures and
best practices for administering the system.
Since nearly all services in a VxFlex integrated rack are distributed, networking
could be a key trouble point. Validating communication between all physical and
virtual components is a key troubleshooting task. When troubleshooting network
connectivity, you should check related server NICs, the DVswitches, and the
VLANs. You may also need to verify the Virtual Port Channels configured for the
management networks. If new production VLANs were added, check the location of
these VLANs. In VMware environments, all the production VLANs should be
configured on DVswitch 0. The ping command is the most useful tool to verify the
connectivity. Refer to the port map and LCS to verify the network information.
Depending on which components in the I/O Path fail, you will see different results.
If a VMDK file for a virtual machine is corrupted, then only that VM is affected. If a
datastore becomes corrupted, all VMs on that datastore are affected.
If an SDC fails, then that ESXi server loses access to all VxFlex OS volumes. Other
ESXi servers are not affected.
Network failures can also cause one or more ESXi servers to lose access to
VxFlex OS storage. In addition, the SDSs might lose connectivity to each other,
resulting in various VxFlex OS errors.
If an SVM or Red Hat SDS fails, then VxFlex OS shows errors for that SDS.
However, VxFlex OS storage is still available, due to its resilient nature. You will
also see a rebuild operation just after the failure.
A failure of DirectPath I/O or other device failures will also show errors for the
VxFlex OS devices. If all devices on an SDS have failed, then DirectPath I/O or a
storage controller has likely been disabled or failed. An individual device failure
indicates a problem with that device.
Troubleshoot VM Access
If you have created a VLAN for an application and that new VM/application cannot
access the new VLAN, it could be a problem with the VLAN. Check to see if it is
defined correctly in all places. For example, check the settings on the VM, on the
DVswitch, and on the physical switch. Determine whether the network traffic
needs to leave the VxFlex integrated rack environment, which dictates whether
the VLAN must be enabled on the customer uplink ports. Using switch commands,
display the status of the interface ports, virtual Port Channels, and VLANs.
When looking at the ESXi elements for network troubleshooting, examine each
Distributed Virtual Switch and validate that its uplinks are active. Active uplinks
are shown in green; an uplink shown in red is inactive and should be investigated.
If you find an uplink down, check the physical elements such as the switch ports
and cables.
You can query the status of each VxFlex OS SDS by using an scli command, as
illustrated below. The command pings the SDSs, validating traffic to and from each
SDS on its specific IP addresses. You could also use it to validate the IPs if a
change to the IP address of an SDS was required.
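As a representative sketch, the following scli sequence, run from the primary MDM with an illustrative SDS IP address, reports the state and data IP addresses of the SDSs. Verify the precise syntax against the VxFlex OS CLI Reference Guide for your release.
scli --login --username admin
scli --query_all_sds
scli --query_sds --sds_ip 192.168.152.21
The first command authenticates against the MDM, the second lists every SDS with its state and IP addresses, and the third queries a single SDS by one of its data IP addresses.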
There are multiple commands that you can run on the switch to show the status of
different components. The show interface brief command, for example, lets you
see the assigned VLAN, its access mode, and the port speed.
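For reference, the following are representative Cisco NX-OS commands, assuming Nexus top-of-rack switches such as the 3172; the interface number is illustrative.
show interface brief (per-port VLAN assignment, mode, speed, and status)
show vlan brief (VLAN-to-port membership)
show vpc brief (virtual Port Channel state)
show interface ethernet 1/49 (detailed status for a single port)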
If you are seeing performance issues, you may see symptoms such as reduced
reported IOPS, SDS errors, or "packet too big" messages. If you see these errors,
check whether auto-negotiation is enabled on a switch port. Switch ports should
not use auto-negotiation; they should be set to the proper speed. Remember
that there are specific policy settings that are required on different networks.
Sometimes intermittent network hardware failures can also affect latency.
You may have MTU size mismatch error messages or suspect a mismatch. MTU
mismatch errors can occur when the components along a network path are
configured with different MTU sizes. MTU mismatch is undesirable because it can
increase CPU utilization on the switches due to the overhead of disassembly and
reassembly of the packets. If there is an MTU mismatch, you may see ICMP
“packet too big” messages sent back to the source. However, if the traffic passes
through a firewall, the ICMP messages may be blocked. IPv4 routers fragment
oversized packets on behalf of the source node. IPv6 routers, however, do not
fragment packets on behalf of the source; they drop the packet and send an
ICMPv6 Packet Too Big (Type 2) message back to the source indicating the
supported MTU size. You can validate the MTU size by using the ping command
and instructing it not to fragment the packet, as shown in the examples that follow;
then check the DVswitches in the VxFlex integrated rack environment.
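As an illustration, the following commands send a jumbo-sized packet with fragmentation disabled. The 8972-byte payload assumes a 9000-byte MTU path (9000 bytes minus 28 bytes of IP and ICMP headers); the interface name and target address are placeholders.
From an ESXi host: vmkping -I vmk1 -d -s 8972 192.168.152.21
From a Linux node: ping -M do -s 8972 192.168.152.21
From a Windows host: ping -f -l 8972 192.168.152.21
If the reply succeeds only with a smaller payload, a device along the path is configured with a smaller MTU.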
Check each DVSwitch to ensure that its MTU size is properly set. Verify that the
MTU size for DVswitch0 is set to 1500, and DVswitches 1 and 2 are set to 9000.
Next, validate the MTU size on each VMkernel adapter on all ESXi hosts. In the
vSphere web client, select the host in the left navigation pane, select Configure >
Networking > VMkernel adapters, and then select a VMkernel adapter. Adapter
vmk0 is shown in the graphic. In the bottom window, under Properties, validate
that the MTU size is set to 1500. Similarly, verify that the VMkernel interfaces on
DVswitches 1 and 2 for the VxFlex OS data paths are set to 9000.
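The same information can also be verified from the ESXi shell. These are standard ESXi commands, not specific to VxFlex, and are shown only as a convenience.
esxcli network ip interface list (shows the MTU of each VMkernel adapter)
esxcfg-vswitch -l (shows the MTU and uplinks of each virtual switch)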
Check for counter errors on a physical switch to see if packets are getting dropped
as this may be an indication of an MTU size mismatch or other error.
For vSphere port groups, validate the load-balancing policy by selecting Configure
> Settings > Policies > Teaming and failover. The flexmgr-install and Sio-data
port group policies should be set to Route based on originating virtual port. All
other port group policies should be set to Route based on IP hash.
To see details of a switch port, or interface, use show interface ethernet x/x,
where x/x is the port number. Notice that you can see the MTU size here in addition
to the status, which is also where you check for CRC errors. CRC errors can point to
an MTU size mismatch along the I/O path, a faulty cable, or a port speed that is not
set properly. This example is from the 3172 switch.
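For reference, these are representative commands on a Nexus switch; the port number is illustrative.
show interface ethernet 1/10 (interface status, speed, MTU, and error counters)
show interface ethernet 1/10 counters errors (CRC and other error counters for the port)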
vSAN is running in the controller cluster. Although minimal changes may occur in
this environment, issues can still arise. To do some preliminary troubleshooting,
first run the Retest check of vSAN from the vSphere Web client. You can also look
at some of the documentation listed below or call Dell Technologies support.
For troubleshooting information, see the VMware® Virtual SAN Diagnostics and
Troubleshooting Reference Manual at
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vsan-troubleshooting-reference-manual.pdf
A VxFlex integrated rack node may need to be taken down for various reasons. It is
critical to follow an orderly shutdown sequence, which enables you to bring the node
back online. The first task is to gather information about the node. If the targeted
node is the primary MDM, the recommendation is to switch the MDM ownership
manually before shutting it down. Refer to the Administration Guide or the VxFlex
OS documentation for the detailed procedure.
The best approach to take the node offline is to use the VxFlex Manager Service
Mode feature as shown in the graphic. It automates an orderly shutdown sequence
to bring the node into maintenance mode safely. Click the node in the Services
view and, under Node Actions, select Enter Service Mode. A node in Service Mode
is displayed with a wrench icon, and the activities that can be performed on that
particular service are context-limited. After the node maintenance is complete,
click Exit Service Mode to bring the node back online.
Alternatively, you can use the manual process to take the node offline.
For hyperconverged nodes running ESXi: Put the SDS for the node into
Maintenance Mode from the VxFlex OS GUI. Then, from VMware vCenter,
migrate any running VMs off the node (except the Storage VM), shut down the
Storage VM, and place the ESXi node into maintenance mode (see the example
commands after this list).
For storage-only nodes: Put the node into Maintenance Mode from the
VxFlex OS GUI. No steps are required in VMware vCenter because storage-only
nodes run Linux and do not host an SVM.
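If needed, the ESXi maintenance mode step for a hyperconverged node can also be performed from the ESXi shell. These are standard ESXi commands and are shown only as a sketch; the SDS and SVM steps are still performed from the VxFlex OS GUI and vCenter as described above.
esxcli system maintenanceMode set --enable true (place the ESXi host in maintenance mode)
esxcli system maintenanceMode set --enable false (take the host out of maintenance mode after the work is complete)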
Sometimes when hardware fails, you may experience different logical failures in
other areas of system operations. Error logs help determine what component is
failing, whether the same component is always failing, or if there is a component in
common across multiple nodes. Log in to iDRAC and examine the server to see if
something is occurring at the hardware layer. If you are unsure, after collecting
logs, you can reset error counters or logs to ensure that you are capturing and
examining the latest information.
Log Collection
This lesson presents the key system logs and the log collection process.
VxFlex Manager provides server logs, VxFlex OS logs, and an activity log of user
and system-generated actions. Other log categories include deployment,
infrastructure or hardware configuration, infrastructure or hardware monitoring,
licensing, network configuration, and template configuration. By default, log entries
display in order of occurrence. These logs contain information about VxFlex
Manager activities.
To troubleshoot the virtual environment, you may need to access the VMware
support log bundle. You can view vCenter Server logs by selecting the vCenter
Server and going to Monitor > System Logs. Additionally, you can right-click the
ESXi host or a VM and select Export System Logs to start the Export Logs
wizard. Generally, you do not need the performance data. There are more options
under some of the selection choices. Under the Storage selection, you have
different elements that are related to vSAN. If you are working in the VxFlex OS
cluster, you do not need to select vSAN.
On the VCSA virtual machine, the vc-support.sh script can be run to collect the
vCenter log bundle. It captures all logs and information from the VCSA up to the
time of the collection and creates a vcsupport.zip file in the /root directory of the
VCSA. Either use SCP to export the generated support bundle to another location,
or download it from https://<VCSAIP>:443/appliance/<support-bundle>.tgz using
root credentials. You can reference VMware knowledge base article 2110014 for
details about the types of files that are included in the bundle.
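As a minimal sketch, assuming shell access to the VCSA and a placeholder destination host, the collection and export steps are:
vc-support.sh
scp /root/vcsupport.zip admin@<destination-host>:/tmp/
The bundle file name may differ between vCenter releases; confirm the generated name in /root before copying it.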
The vm-support command can be run on the ESXi nodes to generate the
vSphere log bundle. The bundle is written to /var/tmp, /var/log, or the current
working directory. The vSphere bundle is the standard vm-support bundle that is
collected in typical ESXi troubleshooting and contains all logfiles for a specific
node. The command creates a .tgz file that contains many logfiles related to the
ESXi node. You can also download the logs remotely with the URL:
https://<esxihost>/cgi-bin/vm-support.cgi. Another important script for debugging or
re-creating the system with the collected data is the reconstruct.sh script, which
is created in the root directory of the support bundle. Certain commands in
vm-support generate a large file that consumes more resources and can result in a
timeout error or take a considerable amount of time to produce. To control the
creation of such files, vm-support breaks the large file into fragments when adding
it to the bundle. After the support tool completes, users can re-create the large file
by running reconstruct.sh from the top of the extracted bundle directory.
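A minimal sketch of the collection and reassembly steps on an ESXi host, assuming the bundle is written to /var/tmp and using an illustrative bundle name, is shown here:
vm-support
cd /var/tmp
tar -xzf esx-<hostname>-<timestamp>.tgz
cd esx-<hostname>-<timestamp>
./reconstruct.sh
Running reconstruct.sh from the top of the extracted bundle directory reassembles any fragmented files.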
Linux logfiles can be collected using the emcgrab procedure, which is a
comprehensive collection of key elements from the Linux storage-only node. Log
on to the Linux console with the root user ID and retrieve logs using the scp
command. Collect the Linux operating system logs from the /var/log directory.
Alternatively, EMC Grab Utility can also be used to collect the logfiles. For more
information, search for EMC grab at https://www.dell.com/support/home.
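As a minimal example, assuming root SSH access and a placeholder node address, the operating system logs can be pulled from a storage-only node with:
scp -r root@<storage-node-ip>:/var/log ./storage-node-logs/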
For VxFlex OS-related issues, you can retrieve logs through the VxFlex OS
Gateway or using the VxFlex OS CLI.
To retrieve logs using the VxFlex OS Gateway, log in to the VxFlex OS Gateway,
and from the top menu bar, select Maintain. Provide the login credentials for the
MDM and the LIA password, and click Retrieve system topology. In the Maintenance
operation screen, click Collect Logs, enter the MDM admin password, and click
Collect Logs to confirm. You can monitor the progress of the operation on the
Monitor tab.
Refer to the VxFlex OS User Guide for information about the VxFlex OS CLI
command to retrieve system logs.
You can get logfiles for each VxFlex OS component directly by using SSH. The
logfile is a zipped .tar file. If there is more than one component on a VxFlex OS
virtual machine, the script gathers that information as well. Run the script
/opt/emc/scaleio/mdm/diag/get_info.sh -u <MDMuser> -p <MDMpassword> -r,
providing the MDM user and its password, to retrieve the logs. The files are located
in the /tmp/scaleio-getinfo directory.
You can collect VxFlex OS installation log by using the Show server log option in
the vSphere web client. Use copy and paste to save the information to a file.
Troubleshoot any VxFlex OS installation issues with the server log in the VxFlex
OS plug-in.
System Events can be downloaded from the Maintenance section after logging into
iDRAC. Lifecycle Log files can be viewed online.
This lab presents activities that are related to maintaining and troubleshooting
VxFlex integrated rack.
Course Summary
This course presented tasks and activities that are required to manage the VxFlex
integrated rack in your environment.