VMware NSX-v
Hands-on Guide
Version 1.0
This document can only continue to be successful if we have people contributing with their experience
and knowledge. I would also like to take this opportunity to thank the people that have already helped in
the past year with the creation of the current content:
Kevin Barrass, Nimish Desai, Ray Budavari, Francis Guillier, Dimitri Desmidt, Brad Hedlund, Michael
Haines, Shachar Bobrovskye, Michael Moor, Tiran Efrat, Marcos Hernandez .
Contents
12. Create Firewall Rules that Blocked Your Own VC .................... 139
12.1 How is This Related to NSX? .................................................................................139
12.2 How Can We protect Ourselves from this Situation? ...............................................141
12.3 What if we made a mistake and do not yet have access to the VC? ........................143
This is a mandatory configuration. Registering the NSX Manager with vCenter injects a plugin
into the vSphere Web Client for consumption of NSX functionalities within the Web
management platform.
While trying to register to vCenter or to configure the Lookup Service you might see this error:
The most common causes of failures to register the NSX Manager with vCenter or to configure the SSO Lookup Service are:
Verify connectivity from the NSX Manager to vCenter. Ping from the NSX Manager to vCenter using
both the IP address and the Fully Qualified Domain Name (FQDN). Check for static routes or the
presence of a default route on the NSX Manager:
Verify NSX Manager can successfully resolve the vCenter DNS name. Ping from NSX Manager to
vCenter with FQDN:
If this does not work verify the DNS configuration on the NSX Manager.
Go to Manage -> Network -> DNS Servers:
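As a quick sanity check of the name resolution itself, a short script can be run from any machine that uses the same DNS servers as the NSX Manager. This is only an illustrative sketch; the vCenter FQDN shown is a placeholder for your own.

```python
import socket

def can_resolve(fqdn: str) -> bool:
    """True if this machine's configured DNS can resolve the given name."""
    try:
        socket.gethostbyname(fqdn)
        return True
    except socket.gaierror:
        return False

# "vcenter.corp.local" is a placeholder; substitute your vCenter FQDN.
print(can_resolve("vcenter.corp.local"))
print(can_resolve("localhost"))  # sanity check: resolves on any normal host
```

If the FQDN resolves here but not from the NSX Manager CLI, the DNS server list configured on the NSX Manager is the likely culprit.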
If you have a firewall between the NSX Manager and vCenter, verify it allows SSL communication on
TCP/443 (also allow ping for connectivity checks).
A complete list of the communication ports and protocols used for VMware NSX for vSphere is
available at the links below:
kb.vmware.com/kb/2079386
or
https://communities.vmware.com/docs/DOC-28142
Verify that the time is synchronized between vCenter and the NSX Manager.
Any of the following issues encountered during the deployment of the NSX-v Controller cluster may
cause the deployment to fail and the instantiated Controller nodes to be deleted after a few minutes.
The first area to investigate is the “Task Console” on vCenter. From an analysis of the entries
displayed on the console, it is clear that first the Controller virtual machine is “powered on”,
but then it gets powered off and deleted. But why?
2.1 Troubleshooting
The tech support file can be a very large text file, so finding an issue is as challenging as looking
for a needle in a haystack. What should we look for?
My best advice is to start with something we know, the name of the Controller node that was
first instantiated and then deleted. This name was assigned to the Controller node after the
completion of the deployment wizard.
When you find the name, use the down-arrow key and start reading:
This error tells us we have a connectivity issue; it appears that if the Controller node can’t
connect to the NSX Manager during the deployment process, it is automatically deleted.
The next question is: why do I have connectivity issues? In my case the NSX Controller and the
NSX Manager run in the same IP subnet.
The answer is found in the manual Static IP pool object that was created for the Controller
cluster.
In this lab I work with a /16 subnet (mask 255.255.0.0), but in the IP pool object I mistakenly
assigned a prefix length of 24.
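The effect of the wrong prefix length can be reproduced with Python's ipaddress module (the addresses below are hypothetical lab values): with the mistaken /24, the NSX Manager appears to be outside the Controller's subnet, so the Controller cannot reach it directly.

```python
import ipaddress

manager = ipaddress.ip_address("172.16.10.5")            # hypothetical NSX Manager IP
controller = ipaddress.ip_interface("172.16.20.10/24")   # prefix mistakenly set to 24

print(manager in controller.network)   # False: the Manager looks off-subnet

corrected = ipaddress.ip_interface("172.16.20.10/16")    # the intended /16
print(manager in corrected.network)    # True: both hosts share the same subnet
```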
This was just an example of how to troubleshoot an NSX-v Controller node deployment; there may be
other reasons that can cause a similar problem.
Host preparation is the process in which the NSX Manager triggers the installation of the NSX
kernel modules (also known as NSX VIBs) on a vSphere cluster and builds the NSX control
plane fabric.
Before the host preparation process we need to complete the following steps (discussed in the
previous sections):
Registering the NSX Manager with vCenter.
Deploying the NSX Controllers.
Three components are involved during the NSX host preparation: vCenter, NSX Manager, EAM
(ESX Agent Manager).
vCenter Server:
Management of vSphere compute infrastructure.
NSX Manager:
Provides the single point of configuration and REST API entry-points in a vSphere environment
for NSX.
© 2013 VMware, Inc. All rights reserved.
Page 15 of 208
NSX-v Hands-on Guide
The message clearly indicates that “Agent VIB module not installed” on one or more hosts.
We can check the vSphere ESX Agent Manager for errors:
“vCenter home > vCenter Solutions Manager > vSphere ESX Agent Manager”
On “vSphere ESX Agent Manager”, check the status of “Agencies” prefixed with “_VCNS_153”.
If any of the agencies has a bad status, select the agency and view its issues:
We need to check the associated log /var/log/esxupdate.log (on the ESXi host) for more details
on host preparation issues.
Log into the ESXi host where you have the issue and run “tail /var/log/esxupdate.log” to view the
log.
From the log it becomes clear that the issue may be related to DNS name resolution.
Solution:
Configure the DNS settings in the ESXi host for the NSX host preparation to succeed.
Solution:
NSX-v requires a list of ports to be open in order for the host preparation to succeed.
The complete list can be found in:
https://communities.vmware.com/docs/DOC-28142
Solution:
Use a different port for EAM:
Change the port to 80 in eam.properties under
\Program Files\VMware\Infrastructure\tomcat\webapps\eam\WEB-INF\
Run this command on the ESXi host to check for an active message bus connection:
esxcli network ip connection list | grep 5671 (message bus TCP connection)
3.7 The NSX Manager has a direct link to download the VIBs as a zip
file
https://$nsxmgr/bin/vdn/vibs/5.5/vxlan.zip
3. Increasing the local storage setting of the Flash Player will also speed up the Web Client.
Adobe has an online tool to view and change the local storage setting:
http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager07.html
Note: if the PC you are using to connect to vCenter does not have access to the Internet, this Adobe
link will not work. Thanks to Micha Novak and Yaniv Yaakov (my team colleagues) for this tip.
When the Web Client loads the blue screen, quickly right-click with your mouse, then
click the Settings button.
The network port requirements for VMware NSX for vSphere can be found in this KB:
kb.vmware.com/kb/2079386
https://www.rfc-editor.org/rfc/rfc7348.txt
This command will capture all the traffic sent from the local VTEP toward the physical switch
and save it in a file named cap2 with pcap format. While running this command, ping from one
guest 192.168.1.1 to another guest 192.168.1.2 (hosted in a different ESXi host) to generate
some traffic.
With WinSCP we can copy the pcap file from the ESXi host to a Windows PC and open it with
Wireshark.
We can see UDP traffic from VTEP 192.168.64.130 to VTEP 192.168.64.131, destined to port
8472 (VXLAN), but where is the VXLAN header?
Wireshark can display VXLAN traffic, but we need to tell it to decode this UDP payload as
VXLAN.
Right-click the frame and choose “Decode As…”
5.3 Conclusions
When configuring VXLAN on a Distributed Switch, keeping the default MTU of 1600 will keep you on
the safe side.
6. Teaming Policy
Teaming policies allow the NSX vSwitch to load balance the traffic across different physical
NICs (pNICs). The NSX Reference Design Guide (available at
https://communities.vmware.com/docs/DOC-27683) contains a table with the different teaming
policy configuration options.
At first glance of the table, we can see that only some of the supported teaming options imply
the creation of Multiple VTEPs (on the same ESXi host).
Multiple VTEPs – two or more VTEP kernel interfaces that can be created in an NSX vSwitch.
In a Multiple VTEPs deployment we will have 1:1 mapping with the physical uplinks of the
vSwitch. That means each VTEP will send/receive traffic on a specific pNIC interface.
In our example VTEP1 will map to pNIC1 and VTEP2 will map to pNIC2.
It is important to note that all VXLAN traffic originated from VTEP1 goes out through pNIC1, and
all encapsulated traffic destined to VTEP1 is received on pNIC1 (the same holds for VTEP2 and
pNIC2).
Multiple VTEPs are useful when we have more than one physical link that we would like to use for
VXLAN traffic and the upstream switches do not support LACP (or it is not configured): in that
case the use of multiple VTEPs allows the traffic to be balanced across the physical links.
Configuration of the multiple VTEPs is done on the Network & Security > Installation >
Configure VXLAN tab.
Note: for the creation of multiple VTEPs it is required to select SRCID or SRCMAC as VMKNic
teaming policy during the VXLAN configuration of an ESXi cluster.
In this example we can see how 4 VTEPs are going to be created. This number comes from
the number of physical uplinks configured in the vDS.
VTEP1 will then send this traffic to pNIC1 (since VTEP1 is pinned to this uplink in our specific
example).
When VM2, with portID2, connects and generates green traffic, the NSX vSwitch will pick a
different VTEP to send out this traffic.
We will use another VTEP since the NSX vSwitch will see a different portID as the source, and
VTEP1 already has traffic. VTEP2 will hence forward this traffic to pNIC2.
Now VM3, from portID3, connects and sends yellow traffic. The NSX vSwitch will randomly
pick one of the VTEPs to handle this traffic.
Both VTEP1 and VTEP2 already have the same number of VM connections (one on each), so
there is no preference for who will be selected in terms of port-group balancing.
In this example, VTEP1 was chosen for this and forwards traffic to pNIC1.
Positive aspects: Very simple and there is no need to configure any LACP on the upstream
switch.
Negative aspects: If VM1 doesn’t generate heavy traffic, and VM2 is generating heavy VM
traffic, the usage of the physical links will not be balanced.
When VM2 with MAC2 connects and generates Green traffic, the NSX vSwitch will pick a
different VTEP to send this traffic out.
We will use the other VTEP since the NSX vSwitch sees a different MAC address as the source and
VTEP1 already has traffic. VTEP2 will forward this traffic to pNIC2.
At this point we are using both of the physical uplinks.
When VM3 with MAC3 connects and sends yellow traffic, the NSX vSwitch will randomly pick
one of the VTEPs to handle this traffic.
Both VTEP1 and VTEP2 already have the same number of VM connections, so there is no
preference for which one will be selected in the context of MAC address balancing.
Positive points: very simple, no need to configure any LACP on the upstream switch.
Negative points: if VM1 doesn’t generate heavy traffic and VM2 sources very heavy VM traffic,
the utilization of the physical uplinks will not be balanced.
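The pinning behavior of both SRCID and SRCMAC can be sketched in a few lines: each source identifier (virtual port ID or source MAC) is deterministically mapped to one VTEP, which is in turn mapped 1:1 to a pNIC. The modulo mapping below is an illustrative assumption, not the actual ESXi implementation.

```python
def pick_vtep(source_id: int, num_vteps: int) -> int:
    """Deterministically pin a source (port ID or MAC, as an int) to a VTEP index."""
    return source_id % num_vteps

NUM_VTEPS = 2  # one VTEP per pNIC, as in the examples above

# Three VMs identified by port IDs 1, 2, 3:
print([pick_vtep(pid, NUM_VTEPS) for pid in (1, 2, 3)])  # [1, 0, 1]
```

Notice that per-source pinning balances the number of sources, not the amount of traffic, which is exactly the negative point called out above.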
Starting from the ESXi 5.5 release, VMware improved the hashing method for LACP to support up to
20 different hash algorithms. vSphere 5.5 supports these load balancing types:
1. Destination IP address
2. Destination IP address and TCP/UDP port
3. Destination IP address and VLAN
4. Destination IP address, TCP/UDP port and VLAN
5. Destination MAC address
6. Destination TCP/UDP port
7. Source IP address
8. Source IP address and TCP/UDP port
9. Source IP address and VLAN
10. Source IP address, TCP/UDP port and VLAN
11. Source MAC address
12. Source TCP/UDP port
13. Source and destination IP address
14. Source and destination IP address and TCP/UDP port
15. Source and destination IP address and VLAN
16. Source and destination IP address, TCP/UDP port and VLAN
17. Source and destination MAC address
18. Source and destination TCP/UDP port
19. Source port ID
20. VLAN
The Source or Destination IP hash is derived from the VTEP IP address located in the outer IP
header of the VXLAN frame.
Every time the hash is calculated with the Source or Destination IP method (option 1 or 7), the
VTEP IP address is used.
Selecting LACPv2 (also referred to as “Enhanced LACP”) as teaming policy between an ESXi host
and the ToR switch leads to the creation of one VTEP only.
In this example we have 2 physical uplinks connected to one physical upstream switch. Those
uplinks are bundled together in a single “logical uplink”, which explains why a single VTEP is
created.
In this scenario we are selecting the IP Hash algorithm for LACPv2. We have two ESXi hosts,
esx1 and esx2. When VM1 connects to NSX vSwitch on host1 and generates Red traffic toward
VM2, the traffic is sent to VTEP1 (the only VTEP we have in the source ESXi host).
Then the NSX vSwitch calculates the Hash value based on Source VTEP IP1 or Destination VTEP
IP2 and as a result of this Hash value it selects pNIC1.
When the physical switch connected to esx2 receives the frame, it performs a similar hash
calculation (assuming the same IP Hash algorithm is also locally configured on the physical
switch) and selects one of the physical links (in this example pNIC1).
Now VM3, connected to the NSX vSwitch on esx1, tries to send green traffic to VM4, also connected
to esx2; VTEP1 will again handle this traffic. The NSX vSwitch will calculate the hash based on
the source IP (VTEP1), the destination IP (VTEP2), or both.
In any case, the result will be the election of the same pNIC1, since this is the same hash that
was calculated when VM1 sent traffic to VM2.
In this scenario we can see that both traffic flows originated from VM1 and VM3 are using the
same pNIC1 uplink.
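This behavior can be illustrated with a toy hash (CRC32 here is only a stand-in for the real ESXi algorithm): because every VXLAN frame between these two hosts carries the same outer VTEP IP pair, the IP-hash always selects the same uplink.

```python
import zlib

def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Toy stand-in for an IP-hash teaming policy over the outer IP headers."""
    return zlib.crc32(f"{src_ip}->{dst_ip}".encode()) % num_uplinks

vtep1, vtep2 = "192.168.64.130", "192.168.64.131"
flow_vm1 = ip_hash_uplink(vtep1, vtep2, 2)  # VM1 -> VM2 (red traffic)
flow_vm3 = ip_hash_uplink(vtep1, vtep2, 2)  # VM3 -> VM4 (green traffic)
print(flow_vm1 == flow_vm3)  # True: same outer IP pair, same pNIC
```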
When using L4 information, the hash is calculated based on the source port or
destination port (options 2, 4, 6, 8). In VXLAN that means the hash is derived from the
values in the outer UDP header.
VMware creates a random UDP source port value based on the L2/L3/L4 headers present in the
original frame.
As a result of this method, every time a different flow (identified by the original L2, L3 and L4
values) is established between VMs, a different random UDP source port will be generated.
Now when VM1 and VM3 send traffic, the load-balancing algorithm may select different pNICs
(the more flows are originated, the more evenly the uplinks are utilized).
Note: both uplinks can be utilized also for flows originated from the same VM, as long as they
are associated with different types of communications (for example, HTTP and FTP flows).
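The idea can be sketched as follows (a simplified assumption, not the exact VMkernel code): the outer UDP source port is derived from a hash of the inner frame's headers, so different inner flows generally get different outer source ports and can therefore hash to different uplinks.

```python
import zlib

def outer_udp_src_port(inner_flow: tuple) -> int:
    """Derive a stable outer UDP source port from the inner L2/L3/L4 headers."""
    # VXLAN implementations commonly pick the source port from the ephemeral range.
    return 49152 + zlib.crc32(repr(inner_flow).encode()) % 16384

http_flow = ("mac1", "mac2", "172.16.10.1", "172.16.10.2", "tcp", 51000, 80)
ftp_flow  = ("mac1", "mac2", "172.16.10.1", "172.16.10.2", "tcp", 51001, 21)

# The same flow always yields the same port, keeping its packets in order on one uplink:
print(outer_udp_src_port(http_flow) == outer_udp_src_port(http_flow))  # True
# Distinct flows (here HTTP vs. FTP from the same VM) will in general yield different
# ports, and can therefore be balanced onto different uplinks.
```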
To know exactly what ESXi uplink is going to be used for traffic sourced by a given VM, it is
possible to use the following command after connecting SSH to the ESXi host where that VM is
located:
Type esxtop and then press ‘n’ (shortcut of network).
The VM named “web-sv-01a” is pinned to vmnic0. vmk3 is the VMkernel interface used for
VXLAN traffic and is pinned to vmnic0.
Note: in vSphere, vmnicX represents a physical uplink of the ESXi host (also previously called a
pNIC).
6.10 Conclusion
The Controller cluster in the NSX platform is the control plane component that is responsible
for managing the switching and routing modules in the hypervisors.
The use of the Controller cluster in managing VXLAN based logical switches eliminates the need
for IP multicast in the underlay network.
Each Controller Node is assigned a set of roles that define the type of tasks the node can
implement. By default, each Controller Node is assigned all roles.
API provider: Handles HTTP web service requests from external clients (NSX Manager) and
initiates processing by other Controller Node tasks.
Persistence Server: Stores data from the NSX Manager APIs and vDS devices that must be
persisted across all Controller Nodes in case of node failures or shutdowns.
Logical manager: Monitors when endhosts arrive or leave the vDS devices and configures the
vDS forwarding states to implement logical connectivity and policies.
Switch manager: Maintains management connections for one or more vDS devices.
Directory server: Manages VXLAN and the distributed logical routing directory of information.
Any multi-node HA mechanism has the potential for a “split brain” scenario in which a cluster is
partitioned into two or more groups, and those groups are not able to communicate. In this
scenario, each group might assume control of all tasks under the assumption that the other
nodes have failed. NSX uses leader election to solve this split-brain problem. One of the
Controller Nodes is elected as a leader for each role, which requires a majority vote of all active
and inactive nodes in the cluster.
The leader for each role is responsible for allocating tasks to individual Controller Nodes and
determining when a node has failed. Since election requires a majority of all nodes, it is not
possible for two leaders to exist simultaneously within a cluster, preventing a split brain
scenario. The leader election mechanism requires a majority of all cluster nodes to be
functional at all times.
Below is an example of a 3-node NSX Controller cluster and the role election across the node members.
The different majority number scenarios depend on the number of deployed Controller Cluster
nodes. It is evident how deploying 2 nodes (traditionally considered an example of a redundant
system) would increase the scalability of the Controller Cluster (since at steady state two nodes
would work in parallel) without providing any additional resiliency. This is because with 2
nodes, the majority number is 2 and that means that if one of the two nodes were to fail, or
they lost communication with each other (dual-active scenario), neither of them would be able
to keep functioning (accepting API calls, etc.). The same considerations apply to a deployment
with 4 nodes that cannot provide more resiliency than a cluster with 3 elements (even if
providing better performance).
Note: currently NSX-v 6.1 supports only a 3-node Controller cluster for production deployments.
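The majority arithmetic behind these considerations is simple enough to verify in a few lines:

```python
def majority(n: int) -> int:
    """Quorum size: strictly more than half of the n cluster nodes."""
    return n // 2 + 1

for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes: quorum {majority(n)}, tolerates {n - majority(n)} failure(s)")
```

The output shows why 2 nodes tolerate 0 failures (no better than 1 node) and 4 nodes tolerate only 1 failure (no better than 3), matching the resiliency discussion above.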
The next part of troubleshooting the NSX Controllers is based on the VMware NSX MH 4.1 User Guide:
https://my.vmware.com/web/vmware/details?productId=418&downloadGroup=NSX-MH-412-
DOC
Ensure that the Controllers are installed on systems that meet the minimum requirements.
On each Controller:
The CLI command “request system compatibility-report” provides informational details that
determine whether a Controller system is compatible with the Controller requirements.
The NSX Manager continually checks whether all Controller Clusters are accessible. If a
Controller Cluster is currently in disconnected status, your diagnostic efforts and log review
should be focused on the time immediately after the Controller Cluster was last seen as
connected.
This NSX “Controller nodes status” screenshot shows the status between the NSX Manager and
each Controller, not the overall Controller cluster status.
So even if all Controllers are in “Normal” state, as in the figure below, that doesn’t mean
the overall cluster status is OK.
Join status: verify that this node has completed the process of joining the cluster.
Majority status: check whether this node is part of the cluster majority.
Cluster ID: all cluster members must share the same cluster ID.
The current status of the Controller node’s intra-cluster communication connections can be
determined by running:
If a Controller node is a Controller Cluster majority leader, it will be listening on port 2878 (as
indicated by the Y in the “listening” column).
The other Controller nodes will have a dash (-) in the “listening” column.
The next step is to check whether the Controller Cluster majority leader has any open
connections as indicated by the number in the “open conns” column. On a properly functioning
Controller, the open connections should be the same as the number of other Controller nodes
in the Controller Cluster (e.g. in a three-node Controller Cluster, the Controller Cluster majority
leader should show two open connections).
The command show control-cluster history will allow you to see a history of Controller Cluster-
related events on this node including restarts, upgrades, Controller Cluster errors and loss of
majority.
This section covers issues that may be encountered when attempting to join a new Controller
Node to an existing Controller Cluster. An explanation of why the issue occurs and instructions
on how to resolve the issue are also provided.
Symptom: joining a new Controller node to a Controller Cluster may fail when all of the
existing Controllers are disconnected.
As we can see, controller-1 and controller-2 are disconnected from the NSX Manager.
When we try to add a new Controller node we get this error message:
Explanation:
If n nodes have joined the NSX Controller Cluster, then a majority (strictly greater than 50%) of
those n nodes must be alive and connected to each other, before any new data can be written
to the system. This means that if you have a Controller Cluster of 3 nodes, 2 of them must be
alive and connected in order for new data to be written in NSX.
In our case, to add a new Controller node to the cluster we need at least one member of the
cluster to be in “Normal” state.
Symptom: the join control-cluster CLI command hangs without ever completing the join
operation.
Explanation:
The IP address passed into the join control-cluster command was incorrect, and/or does not
refer to a currently live Controller node.
Make sure that the 192.168.110.201 node is part of the existing controller cluster.
Resolution:
Use the IP address of a properly configured Controller that is reachable across the network.
Symptom:
The join control-cluster CLI command fails.
Explanation:
If you have a Controller configured as part of a Controller Cluster, and that Controller has been
disconnected from the Controller Cluster for a long period of time (perhaps it was taken offline
or shut down), and during that time, the other Controllers in that Controller Cluster were
removed from the Controller Cluster and formed a new Controller Cluster, then the long-
disconnected Controller will not be allowed to rejoin the Controller Cluster that it left, because
that original Controller Cluster is gone.
The following event log message in the new Controller Cluster indicates that something like this
has happened:
Resolution:
You must issue the join control-cluster command with the force option on the old Controller to
force it to clear its state and join the new Controller Cluster with a fresh start.
Note: the forced join command deletes the previously joined node with the same IP.
When a Controller cluster majority issue arises, it is very difficult to spot from the NSX
Manager GUI. For example, the current state of the Controllers from the NSX Manager point of
view is that all the members are in “Normal” state.
Node 1 and Node 2 are part of the cluster and share the roles between them; for some reason
Node 3 is disconnected from the majority of the cluster.
From Node 1’s perspective, it is the leader (it has the Y in the “listening” column) and has one
open connection from Node 2, as shown below:
To recover from this scenario, Node 3 needs to rejoin the majority of the cluster; the IP address
used for the join must be Node 1’s, because it is the leader of the majority.
In this scenario all NSX Controller nodes failed or had been deleted. Do we need to start from
scratch?
The assumption is that in our environment we have already deployed NSX Edge, DLR and we
have VMs actively connected to logical switches. The desire would be to preserve all those
configurations.
Step 1:
Step 2:
Step 3:
Sync the newly deployed NSX Controllers (in unicast mode) with the current state of our NSX deployment.
The show network connection output shown in the preceding block is an example from a
healthy Controller. If you find some of these missing, it’s likely that NSX didn’t get past its install
phase. Here are some misconfigurations that can cause this:
Bad management address or listen IP
You’ve set an incorrect IP as the management-address, or as the listen-ip for one of the roles
(like switch_manager or api_provider).
NSX attempts to bind to the specified address, and fails early if it cannot do so. You’ll see log
messages in cloudnet_cpp.log.ERROR like:
E0506 01:20:17.099596 7188 dso-deployer.cc:516] Controller component installation of rpc-broker
failed: Unable to bind a RPC port $tags:tracing:3ef7d1f519ffb7fb^
E0506 01:20:17.100162 7188 main.cc:271] RPC deployment subsystem not installed; exiting.
$tags:tracing:3ef7d1f519ffb7fb^
Or in cloudnet_cpp.log.WARNING:
W0506 01:22:27.721777 7694 ssl-socket.cc:530] SSLSocket failed to bind to 172.1.1.1:6632: Cannot
assign requested address
Note that if you are using DHCP for the IP addresses of your controller nodes (not
recommended or supported), the IP address could have changed since the last time you
configured it.
Verify that the IP addresses for switch_manager and api_provider are what they are supposed
to be by performing the CLI command:
<switch_manager|api_provider> listen-ip
to determine whether the IPs listed correspond to the IPs of the Controllers in the Controller
Cluster.
Out of disk space
The Controller may be out of disk space; use the ‘‘show status’’ command to check.
Use show system statistics graph <datasource> for output in graphical format.
As an example, the following output shows the RRD statistics for the datasource disk_ops:write
associated with the disk sda1 on the Controller in a tabular form:
# show system statistics disk-sda1/disk_ops:write
Time Write
12:29 0.74
12:28 0.731429
12:27 0.617143
8. Edge ECMP
This post was written by Roie Ben Haim and Max Ardica, with a special thanks to Jerome Catrouillet,
Michael Haines, Tiran Efrat and Ofir Nissim for their valuable input
In this section we will describe the Equal Cost Multi-Path functionality (ECMP) introduced in
VMware NSX release 6.1 and discuss how it addresses the requirements of scalability,
redundancy and high bandwidth. ECMP has the potential to offer substantial increases in
bandwidth by load-balancing traffic over multiple paths as well as providing fault tolerance for
failed paths. This is a feature that is available on physical networks and that has also been
introduced for virtual networking as well. ECMP uses a dynamic routing protocol to learn the
next-hop towards a final destination and to converge in case of failures. For a great demo of
how this works, you can start by watching this video, which walks you through these
capabilities in VMware NSX.
https://www.youtube.com/watch?v=Tz7SQL3VA6c
To keep pace with the growing demand for bandwidth, the data center must meet scale out
requirements, which provide the capability for a business or technology to accept increased
volume without redesign of the overall infrastructure. The ultimate goal is avoiding the “rip and
replace” of the existing physical infrastructure in order to keep up with the growing demands of
the applications. Data centers running business critical applications need to achieve near 100
percent uptime. In order to achieve this goal, we need the ability to quickly recover from
failures affecting the main core components. Recovery from catastrophic events needs to be
transparent to end user experiences.
ECMP with VMware NSX 6.1 allows you to use up to a maximum of 8 ECMP Paths
simultaneously. In a specific VMware NSX deployment, those scalability and resilience
improvements are applied to the “on-ramp/off-ramp” routing function offered by the Edge
Services Gateway (ESG) functional component, which allows communication between the
logical networks and the external physical infrastructure.
External users’ traffic arriving from the physical core routers can use up to 8 different paths (E1-
E8) to reach the virtual servers (Web, App, DB).
In the same way, traffic returning from the virtual servers hits the Distributed Logical Router
(DLR), which can choose up to 8 different paths to get to the core network.
When a traffic flow needs to be routed, the round robin algorithm picks one of the links as the
path for all traffic of this flow. Sending all the packets of a flow through the same path keeps
them in order. Once the next-hop is selected for a particular source IP and destination IP pair,
it is stored in the route cache, and all subsequent packets of this flow follow the same path.
There is a default IPv4 route cache timeout of 300 seconds; if an entry is inactive for this
period of time, it becomes eligible for removal from the route cache. Note that these settings can
be tuned for your environment.
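A per-flow route cache with an inactivity timeout can be sketched as follows (the structure and values are illustrative, not NSX internals):

```python
import time

CACHE_TIMEOUT = 300.0  # default IPv4 route cache timeout, in seconds

class RouteCache:
    def __init__(self, paths):
        self.paths = paths
        self.cache = {}  # (src, dst) -> (path index, last-used timestamp)

    def next_hop(self, src, dst, now=None):
        now = time.monotonic() if now is None else now
        entry = self.cache.get((src, dst))
        if entry and now - entry[1] < CACHE_TIMEOUT:
            path = entry[0]                            # flow stays on its cached path
        else:
            path = hash((src, dst)) % len(self.paths)  # pick a path for a new flow
        self.cache[(src, dst)] = (path, now)
        return self.paths[path]

rc = RouteCache(["E1", "E2"])
first = rc.next_hop("192.168.100.86", "172.16.10.10", now=0.0)
again = rc.next_hop("192.168.100.86", "172.16.10.10", now=100.0)
print(first == again)  # True: packets of the same flow follow the same path
```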
The DLR will choose a path based on a Hashing algorithm of Source IP and Destination IP.
In order to work with ECMP the requirement is to use a dynamic routing protocol: OSPF or BGP.
If we take OSPF for example, the main factor influencing the traffic outage experience is the
tuning of the OSPF timers.
OSPF neighbors exchange Hello messages; the Hello interval determines how often a Hello is sent.
A second timer, the Dead interval, determines how long to wait before declaring an OSPF neighbor
down, and it is the main factor that influences the convergence time. The Dead interval is usually
4 times the Hello interval, but the OSPF (and BGP) timers can be set as low as 1 second (Hello
interval) and 3 seconds (Dead interval) to speed up the traffic recovery.
In the example above, the E1 NSX Edge has a failure; the physical routers and the DLR detect E1 as
dead at the expiration of the Dead timer and remove their OSPF neighborship with it. As a
consequence, the DLR and the physical router remove the routing table entries that originally
pointed to the specific next-hop IP address of the failed ESG.
As a result, all corresponding flows on the affected path are re-hashed through the remaining
active units. It’s important to emphasize that network traffic that was forwarded across the
non-affected paths remains unaffected.
With ECMP it’s important to have introspection and visibility tools in order to troubleshoot
potential points of failure. Let’s look at the following topology.
A user outside our Data Center would like to access the Web Server service inside the Data
Center. The user IP address is 192.168.100.86 and the web server IP address is 172.16.10.10.
This User traffic will hit the Physical Router (R1), which has established OSPF adjacencies with
E1 and E2 (the Edge devices). As a result R1 will learn how to get to the Web server from both
E1 and E2 and will get two different active paths towards 172.16.10.10. R1 will pick one of the
paths to forward the traffic to reach the Web server and will advertise the user network subnet
192.168.100.0/24 to both E1 and E2 with OSPF.
E1 and E2 are NSX for vSphere Edge devices that also establish OSPF adjacencies with the DLR.
E1 and E2 will learn how to get to the Web server via OSPF control plane communication with
the DLR.
From the DLR perspective, it acts as a default gateway for the Web server. This DLR will form an
OSPF adjacency with E1 and E2 and have 2 different OSPF routes to reach the user network.
From the DLR we can verify OSPF adjacency with E1, E2.
From this output we can see that the DLR has two Edge neighbors: 192.168.100.3 and
192.168.100.10. The next step will be to verify that ECMP is actually working.
show ip route
The output from this command shows that the DLR learned the user network 192.168.100.0/24
via two different paths, one via E1 = 192.168.10.1 and the other via E2 = 192.168.10.10.
Now we want to display all the packets that were captured by an NSX for vSphere Edge
interface.
In the example below, in order to display the traffic passing through interface vNic_1, excluding
OSPF protocol control packets, we need to type this command:
“debug packet display interface vNic_1 not_ip_proto_ospf”
We can see an example with a ping running from host 192.168.100.86 to host 172.16.10.10
Capture traffic
If we would like to display only the captured traffic destined to the specific IP address
172.16.10.10, the capture command would look like: “debug packet display interface vNic_1 dst_172.16.10.10”
ECMP currently implies stateless behavior. This means that there is no support for stateful
services such as the Firewall, Load Balancing or NAT on the NSX Edge Services Gateway.
Up to NSX 6.1.1 release, the Edge Firewall and ECMP could not be turned on at the same time
on the NSX edge device. Note however, that the Distributed Firewall (DFW) is unaffected by
this.
Starting from the NSX release 6.1.2 the Edge Firewall is not disabled automatically on ESG
when ECMP is enabled. It is hence recommended to turn off Firewall when deploying the ESG
in ECMP mode, in order to avoid traffic drops caused by asymmetric routing.
In Asymmetric routing, a packet between a source and a destination traverses one path and
takes a different path when it returns to the source.
Starting from version 6.1, the NSX Edge can work with ECMP – Equal Cost Multipath. The
deployment of ECMP makes it very likely that traffic flows in an asymmetric fashion between the
Edges and the DLR or between the Edges and the physical routers.
ECMP with Asymmetric routing is not a problem by itself, but will cause problems when more
than one NSX Edge is in place and stateful services are inserted in the path of the traffic.
Stateful services such as firewall, load balancing and Network Address Translation (NAT) can’t
work with asymmetric routing.
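Why that is can be shown with a minimal sketch (plain Python, not NSX code): two independent stateful firewalls, where only the edge that saw the TCP SYN will accept the rest of the flow:

```python
# Two independent stateful edges: each one only knows the sessions it saw.
class StatefulEdge:
    def __init__(self, name):
        self.name = name
        self.sessions = set()  # flows for which this edge saw the TCP SYN

    def forward(self, flow, syn=False):
        if syn:
            self.sessions.add(flow)   # new session recorded on this edge only
            return "forwarded"
        # A non-SYN packet that matches no known session is treated as invalid.
        return "forwarded" if flow in self.sessions else "dropped"

e1, e2 = StatefulEdge("E1"), StatefulEdge("E2")
flow = ("192.168.100.86", "172.16.10.10", 50000, 80)  # one flow, both directions

outbound = e1.forward(flow, syn=True)   # client SYN routed out via E1
reply = e2.forward(flow)                # the DLR hashes the reply to E2
print(outbound, reply)                  # forwarded dropped
```

The reply is dropped on E2 for the exact reason discussed in this section: E2 never saw the SYN, so the packet matches no session.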
A user from the external network tries to access a Web VM inside the Data Center. The firewall
is enabled on both E1 and E2 which, being deployed independently, do not synchronize firewall
state information. The traffic passes through the E1 Edge, goes from E1 to the DLR, traverses
the NSX distributed firewall and reaches the Web VM.
When the Web VM replies back, the traffic will hit the DLR default gateway. DLR has two ECMP
paths to route the traffic (via E1 or E2).
If the DLR chooses the path via E2, the traffic will reach E2 and will be dropped!
The reason for this is that E2 is not aware of the state of the session started at E1; the
reply packet from the Web VM arrives at E2, where it cannot be matched to any existing session
for that flow.
From E2’s perspective this is a new session, and any new TCP session should start with a SYN;
since this packet is not the beginning of a session, E2 will drop it!
Note: the NSX Distributed Firewall is not part of this problem. The NSX Distributed Firewall is
implemented at the vNIC level, so all traffic enters and leaves through the same vNIC and there
is no asymmetric routing at that level. Incidentally, this is also the reason why, when we
vMotion a VM, the firewall rules and connection state move with the VM itself.
Starting from version 6.1 when we enable ECMP on NSX Edge we get the following message:
But the BIG difference is that the Firewall service is NOT disabled by default (the message is
displayed as a consequence of a bug that is fixed in release 6.1.3).
Even if you have an “Any, Any” default firewall rule with an “Accept” action, traffic flows may
still be dropped because of the previously explained asymmetric routing problem!
You will not see these dropped packets even in Syslog or Log Insight!
From the point of view of the end user, some of the sessions work just fine (the sessions that
do not follow an asymmetric path), while other sessions are dropped (the asymmetric ones).
The place I found where we can see that packets are dropped because of session state is the
output of the command:
show tech-support
8.9 Conclusions
When you enable ECMP and you have more than one NSX Edge in your topology, go to the Firewall
service and disable it yourself; otherwise you will spend many hours troubleshooting
inconsistent connectivity.
The NSX Edge Cluster Connects the Logical and Physical worlds and usually hosts the NSX Edge
Services Gateways and the DLR Control VMs.
There are deployments where the Edge Cluster may contain the NSX Controllers as well.
In this section we discuss how to design an Edge Cluster to survive the failure of an ESXi host
or of an entire physical chassis and to lower the outage time.
In the figure below we deploy the NSX Edges E1 and E2 in ECMP mode, where they are
active/active from both the control plane and data plane perspectives. The DLR Control VMs run
active/passive, while both E1 and E2 run a dynamic routing protocol with the active DLR
Control VM.
When the DLR learns a new route from E1 or E2, it will push this information to the NSX
Controller cluster. The NSX Controller will update the routing tables in the kernel of each ESXi
host running this DLR instance.
In the scenario where the ESXi host that contains the Edge E1 fails:
The active DLR will update the NSX Controller to remove E1 as next hop, the NSX Controller
will update the ESXi hosts and, as a result, the “Web” VM traffic will be routed to Edge E2.
The time it takes to re-route the traffic depends on the dynamic protocol convergence time.
In the specific scenario where the failed ESXi or Chassis contained both the Edge E1 and the
active DLR, we would instead face a longer outage in the forwarded traffic.
The reason for this is that the active DLR is down and cannot detect the failure of the Edge E1
and accordingly update the Controller. The ESXi will continue to forward traffic to Edge E1 until
the passive DLR becomes active, learns that the Edge E1 is down and updates the NSX
Controller.
By default, when we deploy an NSX Edge or DLR in active/passive mode, the system takes care of
creating a DRS anti-affinity rule that prevents the active and passive VMs from running on the
same ESXi host.
We need to build new DRS rules, as these default rules will not prevent the dual-failure
scenario described above.
The figure below describes the network logical view for our specific example. This topology is
built from two different tenants where each tenant is being represented with a different color
and has its own Edge and DLR.
Note connectivity to the physical world is not displayed in the figure below in order to simplify
the diagram.
The physical Edge Cluster has four ESXi hosts that are distributed over two physical chassis:
We start by creating a container for all the ESXi hosts in Chassis A; this container is
configured as a DRS Host Group.
Click the Add button and call this group “Chassis A”.
The container type needs to be “Host DRS Group”; add the ESXi hosts running on Chassis A
(esxcomp-01a and esxcomp-02a).
Create another DRS group called Chassis B that contains esxcomp-01b and esxcomp-02b:
We need to create a container for the VMs that will run in Chassis A. At this point we are just
naming the group; we are not actually placing the VMs in Chassis A.
DRS Rules:
Now we need to take the DRS objects we created before, “Chassis A” and “VM to Chassis A”, and
tie them together. The next step is to do the same for “Chassis B” and “VM to Chassis B”.
Click the Add button in DRS Rules and enter a name such as “VMs Should Run on
Chassis A”.
In the Type field select “Virtual Machines to Hosts”, because we want to bind the VM group to
the host group.
Below the VM group selection we need to select the group & hosts binding enforcement type.
If we choose the “Must” option, then in the event of the failure of all the ESXi hosts in this
group (for example if Chassis A had a critical power outage), the other ESXi hosts in the
cluster (Chassis B) would not be considered by vSphere HA as a viable option for the recovery of
the VMs. The “Should” option allows other ESXi hosts to be used as a recovery option.
Now the problem with the current DRS rules and the VM placement in this Edge cluster is that
the Edge and DLR Control VM are actually running in the same ESXi host. We need to create
anti-affinity DRS rules.
An Edge and DLR that belong to the same tenant should not run in the same ESXi host.
In the case of a failure of one of the ESXi hosts we don’t face the problem where Edge and DLR
are on the same ESXi host, even if we have a catastrophic event of a chassis A or B failure.
Note:
The Control VM can be moved to the compute cluster, which avoids this design consideration.
Thanks to Francis Guillier and Tiran Efrat for the overview and feedback.
One of the most important NSX Edge features is NAT.
With NAT (Network Address Translation) we can change the source or destination IP address and
TCP/UDP port. Combining NAT and firewall rules can lead to confusion when we try to determine
the correct IP address to which to apply a firewall rule.
To create the correct rule we need to understand the packet flow inside the NSX Edge in detail.
In NSX Edge we have two different types of NAT: Source NAT (SNAT) and Destination NAT (DNAT).
9.1 SNAT
SNAT allows translating an internal IP address (for example a private IP address as described
in RFC 1918) to a public external IP address.
In figure below, the IP address for any VM in VXLAN 5001 that needs outside connectivity to the
WAN can be translated to an external IP address (this mapping is configured on the Edge). For
example, VM1 with IP address 172.16.10.11 needs to communicate with the Internet over the WAN, so the
NSX Edge can translate it to a 192.168.100.50 IP address configured on the Edge external
interface.
Users in the external network are not aware of the internal Private IP address.
9.2 DNAT
Below is the outline of the Packet flow process inside the Edge. The important parts are where
the SNAT/DNAT Action and firewall decision action are being taken.
We can see from this process that the ingress packet will evaluate against FW rules before
SNAT/DNAT translation.
Note: the actual packet flow details are more complicated with more action/decisions in Edge
flow, but the emphasis here is on the NAT and FW functionalities only.
Because of this packet flow, the firewall rule for SNAT needs to be applied to the internal IP
address object and not to the IP address translated by the SNAT function. For example, when
VM1 172.16.10.11 needs to communicate with the WAN, the firewall rule needs to be:
Because of this packet flow, the firewall rules for DNAT need to be applied to the public IP
address object and not to the private IP address used after the DNAT translation. When a user
from the WAN sends traffic to 192.168.100.51, the packet is first checked against this firewall
rule, and then NAT changes the destination IP address to 172.16.10.11.
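The ordering just described (firewall decision before DNAT on ingress) can be sketched as follows. This is an illustrative model of the pipeline, not Edge code, using the addresses from the example:

```python
# Illustrative ingress pipeline: the firewall decision is taken before DNAT,
# so rules must match the pre-NAT (public) destination address.
DNAT = {"192.168.100.51": "172.16.10.11"}   # public -> private, from the example

# Firewall rule written against the PUBLIC address: this is the correct form.
FW_RULES = [{"dst": "192.168.100.51", "action": "accept"}]

def ingress(packet):
    # Step 1: firewall lookup on the packet as received (pre-NAT destination).
    action = next((r["action"] for r in FW_RULES if r["dst"] == packet["dst"]), "drop")
    if action != "accept":
        return "dropped"
    # Step 2: DNAT rewrites the destination afterwards.
    return DNAT.get(packet["dst"], packet["dst"])

print(ingress({"src": "10.0.0.1", "dst": "192.168.100.51"}))   # 172.16.10.11

# A rule written against the PRIVATE address never matches the pre-NAT packet:
FW_RULES = [{"dst": "172.16.10.11", "action": "accept"}]
print(ingress({"src": "10.0.0.1", "dst": "192.168.100.51"}))   # dropped
```

The second run shows the common mistake: a rule on the private address silently drops all inbound traffic, because the firewall never sees the post-NAT destination.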
Users from outside need to access an internal web server connecting to its public IP address.
The server internal IP address is 172.16.100.11, the NAT IP address is 192.168.100.6.
The first step is creating the external IP address on the Edge. This IP is a secondary address,
because the Edge already has a primary IP address configured in the 192.168.100.0/24 subnet.
Now pay attention to the firewall rules on the Edge: a user coming from the outside will try to
access the internal server by connecting to the public IP address 192.168.100.6. This implies
that the firewall rule needs to allow this access.
There are several ways to verify NAT is functioning as originally planned. In our example, users
from any source address access the public IP address 192.168.100.6, and after the NAT
translation the packet destination IP address is changed to 172.16.10.11.
show nat
We can see that the packet received by the Edge is destined to the 192.168.100.6 address, while
the return traffic is originated from a different IP address, 172.16.10.11 (the private IP
address). That means DNAT translation is happening here.
Capturing on the Edge internal interface vNic_1, we can see that the destination IP address has
been changed to 172.16.10.11 because of the DNAT translation:
All the servers in VXLAN segment 5001 (associated with the IP subnet 172.16.10.0/24) need to
leverage SNAT translation (in this example to IP address 192.168.100.3) on the outside interface
of the Edge to be able to communicate with the external network.
SNAT Configuration:
Show nat
The user originates a connection to the Web Server on destination port TCP/222 but the NSX
Edge will change it to TCP/22.
In this specific scenario, we want to create the two following SNAT rules.
SNAT Rule 1:
The IP addresses for the devices part of VXLAN 5001 (associated to the IP subnet
172.16.10.0/24) need to be translated to the Edge outside interface address 192.168.100.3.
SNAT Rule 2:
Web-SRV-01a on VXLAN 5001 needs its IP address 172.16.10.4 to be translated to the Edge
outside address 192.168.100.4.
In the configuration example above, traffic will never hit rule number 4 because 172.16.10.4 is
part of subnet 172.16.10.0/24, so its IP address will be translated to 192.168.100.3 (and not the
desired 192.168.100.4).
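The shadowing effect can be reproduced with a small first-match sketch (illustrative only, using Python's standard `ipaddress` module):

```python
import ipaddress

# First-match SNAT table, ordered as in the misconfigured example:
# the broad /24 rule sits above the host-specific rule and shadows it.
snat_rules = [
    ("172.16.10.0/24", "192.168.100.3"),   # whole subnet
    ("172.16.10.4/32", "192.168.100.4"),   # host rule, never reached
]

def translate(src_ip):
    for prefix, public_ip in snat_rules:
        if ipaddress.ip_address(src_ip) in ipaddress.ip_network(prefix):
            return public_ip   # first match wins
    return src_ip              # no rule: address unchanged

print(translate("172.16.10.4"))   # 192.168.100.3, not the desired .4

# After re-ordering (most specific rule first) the host rule matches:
snat_rules.reverse()
print(translate("172.16.10.4"))   # 192.168.100.4
```

This is the same fix the re-ordering below applies: place the more specific NAT rule above the broader one.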
After re-order:
The NSX load balancing service is specially designed for cloud with the following characteristics:
The load balancing services natively offered by the NSX Edge satisfy the needs of the majority
of application deployments. This is because the NSX Edge provides a large set of
functionalities:
Support for any TCP application, including, but not limited to, LDAP, FTP, HTTP, HTTPS
Support for UDP applications starting from NSX SW release 6.1.
Multiple load balancing distribution algorithms available: round-robin, least
connections, source IP hash, URI
Multiple health checks: TCP, HTTP, HTTPS including content inspection
Persistence: Source IP, MSRDP, cookie, ssl session-id
Connection throttling: max connections and connections/sec
L7 manipulation, including, but not limited to, URL block, URL rewrite, content
rewrite
Optimization through support of SSL offload
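As an illustration of two of the distribution algorithms listed above, here is a minimal sketch (not the Edge implementation) of round-robin and source-IP-hash member selection, using two illustrative pool members:

```python
import hashlib
from itertools import cycle

members = ["172.16.10.11", "172.16.10.12"]   # two illustrative pool members

# Round-robin: hand out members in turn.
rr = cycle(members)
def round_robin():
    return next(rr)

# Source IP hash: the same client is always sent to the same member,
# which also gives a simple form of persistence.
def source_ip_hash(client_ip):
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return members[h % len(members)]

picks = [round_robin() for _ in range(4)]
print(picks)   # ['172.16.10.11', '172.16.10.12', '172.16.10.11', '172.16.10.12']
```

Round-robin spreads consecutive requests evenly, while source-IP hash trades even distribution for client stickiness; the Edge lets you pick the algorithm per pool.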
Note: the NSX platform can also integrate load-balancing services offered by 3rd party vendors.
This integration is out of the scope for this paper.
In terms of deployment, the NSX Edge offers support for two types of models:
One-arm mode (called proxy mode): this scenario is highlighted in the figure below and
consists of deploying an NSX Edge directly connected to the logical network for which it
provides load-balancing services.
1. The external client sends traffic to the Virtual IP address (VIP) exposed by the load
balancer.
2. The load balancer performs two address translations on the original packets
received from the client: Destination NAT (D-NAT) to replace the VIP with the IP address of
one of the servers deployed in the server farm and Source NAT (S-NAT) to replace the client
IP address with the IP address identifying the load-balancer itself. S-NAT is required to force
through the LB the return traffic from the server farm to the client.
3. The server in the server farm replies by sending the traffic to the LB (because of the
S-NAT function previously discussed).
4. The LB performs again a Source and Destination NAT service to send the traffic to the
external client, leveraging its VIP as the source IP address.
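The four steps above can be sketched as two address rewrites (an illustrative model, not Edge code; the LB and member addresses are hypothetical):

```python
# One-arm LB sketch: the LB rewrites both destination (VIP -> member) and
# source (client -> LB), so the member's reply naturally returns to the LB.
VIP, LB_IP = "172.16.10.10", "172.16.10.9"   # LB_IP is a hypothetical address
MEMBER = "172.16.10.11"                      # hypothetical selected member

def lb_to_server(packet):
    # D-NAT: VIP -> selected member; S-NAT: client -> the LB itself.
    return {"src": LB_IP, "dst": MEMBER, "orig_client": packet["src"]}

def server_reply(request):
    # The member replies to the LB, which restores the original addresses
    # before sending the traffic back to the client.
    return {"src": VIP, "dst": request["orig_client"]}

fwd = lb_to_server({"src": "192.168.100.86", "dst": VIP})
back = server_reply(fwd)
print(back)   # {'src': '172.16.10.10', 'dst': '192.168.100.86'}
```

Note how the member only ever sees the LB address as the client: this is exactly the loss of client visibility discussed next, and why the X-Forwarded-For insertion mentioned below exists.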
The advantage of this model is that it is simpler to deploy and flexible as it allows deploying LB
services (NSX Edge appliances) directly on the logical segments where they are needed without
requiring any modification on the centralized NSX Edge providing routing communication to the
physical network. On the downside, this option requires provisioning more NSX Edge instances
and mandates the deployment of Source NAT that does not allow the servers in the DC to have
visibility into the original client IP address.
Note: the LB can insert the original IP address of the client into the HTTP header before
performing S-NAT (a function named “Insert X-Forwarded-For HTTP header”). This provides the
servers visibility into the client IP address but it is obviously limited to HTTP traffic.
Inline mode (called transparent mode) requires instead deploying the NSX Edge inline to the
traffic destined to the server farm. The way this works is shown in Figure below.
1. The external client sends traffic to the Virtual IP address (VIP) exposed by the
load balancer.
2. The load balancer (centralized NSX Edge) performs only Destination NAT (D-
NAT) to replace the VIP with the IP address of one of the servers deployed in the server
farm.
3. The server in the server farm replies to the original client IP address and the
traffic is received again by the LB since it is deployed inline (and usually as the default
gateway for the server farm).
4. The LB performs Source NAT to send traffic to the external client leveraging
its VIP as source IP address.
This deployment model is also quite simple and allows the servers to have full visibility into
the original client IP address. At the same time, it is less flexible from a design perspective,
as it usually forces using the LB as the default gateway for the logical segments where the
server farms are deployed; this implies that only centralized (and not distributed)
routing can be adopted for those segments. It is also important to notice that in this case
LB is another logical service added to the NSX Edge already providing routing services
between the logical and the physical networks. As a consequence, it is recommended to
increase the form factor of the NSX Edge to X-Large before enabling load-balancing services.
In terms of scalability and throughput figures, the NSX load balancing services offered by
each single NSX Edge can scale up to (best case scenario):
Throughput: 9 Gbps
Concurrent connections: 1 million
New connections per sec: 131k
Below are some deployment examples of tenants with different applications and different
load balancing needs. Notice how each of these applications is hosted on the same Cloud
with the network services offered by NSX.
The load balancing service can be fully distributed across tenants. This brings
multiple benefits:
Each tenant has its own load balancer.
Each tenant configuration change does not impact other tenants.
Load increase on one tenant’s load balancer does not impact the scale of other tenants’
load balancers.
Each tenant load balancing service can scale up to the limits mentioned above.
The same tenant can mix its load balancing service with other network services such
as routing, firewalling, VPN.
We will add to this lab an NSX Edge Services Gateway (ESG) for the load balancer function.
The ESG (highlighted with the red line) is deployed in one-arm mode and exposes the VIP
172.16.10.10 to load-balance traffic to the Web-Tier-01 segment.
In our lab the appliance size is Compact, but we should choose the right size according to the
amount of traffic expected:
Configure the Edge interface and IP address; since this is one-arm mode we have only one
interface:
Enable Load Balance in the ESG, go to Load Balance and click Edit:
Add a name, in the Type select HTTPS and Enable SSL Passthrough:
In the Algorithm field select ROUND-ROBIN, keep the default https monitor, and add the two
server members to monitor:
In this step we glue all the configuration parts together: we tie the application profile to
the pool and assign the Virtual IP address:
Now we can check that the load balancer is actually working by connecting to the VIP address
with a client web browser.
When we refresh our web browser client we see that we hit 172.16.10.12, web-sv-02a:
5. Pool is in transparent mode but the Edge doesn’t sit in the return path
# show log
#########################################################
#########################################################
##########################################################
####################################################################
One-Arm-LB-0> show service loadbalancer pool
———————————————————————–
Loadbalancer Pool Statistics:
POOL Web-Servers-Pool-01
| LB METHOD round-robin
| LB PROTOCOL L7
| Transparent disabled
| SESSION (cur, max, total) = (0, 3, 35)
| BYTES in = (17483), out = (73029)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-01a_172.16.10.11, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 2, 8)
| | BYTES in = (8882), out = (43709)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-02a_172.16.10.12, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 1, 7)
| | BYTES in = (7233), out = (29320)
##########################################################################
rq[f=808202h,i=0,an=00h,rx=4m53s,wx=,ax=] rp[f=008202h,i=0,an=00h,rx=4m53s,wx=,ax=]
s0=[7,8h,fd=13,ex=] s1=[7,8h,fd=14,ex=] exp=4m52s
0x5fe50a22960: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=09
age=0s calls=2 rq[f=c08200h,i=0,an=00h,rx=20s,wx=,ax=]
rp[f=008002h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=1,ex=] s1=[7,0h,fd=-1,ex=] exp=20s
———————————————————————–
From the GUI we can see the effect in the members pool status:
To fulfill the specific requirements listed above, it is possible to deploy devices performing a
“bridging” functionality that enables communication between the “virtual world” (logical
switches) and the “physical world” (non virtualized workloads and network devices connected
to traditional VLANs).
NSX offers this functionality in software through the deployment of NSX L2 Bridging allowing
VMs to be connected at layer 2 to a physical network (VXLAN to VLAN ID mapping), even if the
hypervisor running the VM is not physically connected to that L2 physical network.
Figure above shows an example of L2 bridging, where a VM connected in logical space to the
VXLAN segment 5001 needs to communicate with a physical device deployed in the same IP
subnet but connected to a physical network infrastructure (in VLAN 100). In the current NSX-v
implementation, the VXLAN-VLAN bridging configuration is part of the distributed router
configuration; the specific ESXi host performing the L2 bridging functionality is hence the one
where the Control VM for that distributed router is running. In case of failure of that ESXi
host, the ESXi host running the standby Control VM (which becomes active once it detects the
failure of the active one) takes over the L2 bridging function.
Independently from the specific implementation details, below are some important
deployment considerations for the NSX L2 bridging functionality:
The VXLAN-VLAN mapping is always performed in 1:1 fashion. This means traffic for
a given VXLAN can only be bridged to a specific VLAN, and vice versa.
A given bridge instance (for a specific VXLAN-VLAN pair) is always active only on a
specific ESXi host.
However, through configuration it is possible to create multiple bridge instances
(for different VXLAN-VLAN pairs) and ensure they are spread across separate ESXi hosts.
This improves the overall scalability of the L2 bridging function.
The NSX Layer 2 bridging data path is entirely performed in the ESXi kernel, and not
in user space. Once again, the Control VM is only used to determine the ESXi host where
a given bridging instance is active, and not to perform the bridging function.
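The 1:1 constraint from the list above can be expressed as a simple validation check; this is an illustrative sketch of the rule, not NSX code:

```python
# Sketch: a VXLAN maps to exactly one VLAN and vice versa, so a configuration
# tool could validate new bridge instances like this before accepting them.
bridges = {}        # vxlan_id -> vlan_id
used_vlans = set()

def add_bridge(vxlan_id, vlan_id):
    if vxlan_id in bridges or vlan_id in used_vlans:
        raise ValueError("a VXLAN or VLAN may appear in only one bridge instance")
    bridges[vxlan_id] = vlan_id
    used_vlans.add(vlan_id)

add_bridge(5002, 100)   # the VXLAN 5002 <-> VLAN 100 pair used in this section
```

Attempting to bridge the same VLAN (or the same VXLAN) a second time would be rejected, mirroring the platform's 1:1 mapping rule.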
In this scenario we would like to bridge between an App VM connected to VXLAN 5002 and a
virtual machine connected to VLAN 100.
Bridging configuration is done at the DLR level. In this specific example, the DLR name is
Distributed-Router:
Now a VM on the Logical Switch App-Tier-01 can communicate with a physical or virtual machine
on VLAN 100.
Currently in NSX-V 6.1 we can’t enable routing on the VXLAN logical switch that is bridged to a
VLAN.
In other words, the default gateway for devices connected to the VLAN can’t be configured on
the distributed logical router:
The big difference is VXLAN 5002 is no longer connected to the DLR LIF, but it is connected
instead to the NSX Edge.
The DLR Control VM can work in high availability mode: if the active DLR Control VM fails, the
standby Control VM takes over, which means the bridge instance will move to a new ESXi host
location.
Most of the issues I ran into were caused by the bridged VLAN missing from the trunk interface
configured on the physical switch.
The physical server is connected to VLAN 100; the App VM is connected to VXLAN 5002 on esx-01b.
The active DLR Control VM is located on esx-02a, so the bridging function will be active on this
ESXi host.
Both ESXi hosts have two physical NICs: vmnic2 and vmnic3.
The transport VLAN carries all VNI (VXLAN) traffic and is forwarded on the physical switch in
VLAN 20.
On physical switch-2 port E1/1 we must configure a trunk port and allow both VLAN 100 and
VLAN 20.
Note: Port E1/1 will carry both VXLAN and VLAN traffic.
We need to know where the active DLR Control VM is located (if we have HA). Inside this ESXi
host the bridging happens in kernel space. The easiest way to find it is to look at the
“Configuration” section in the “Manage” tab.
Note: when the DLR Control VM is powered off (if HA is not enabled), the bridging function on
that ESXi host stops, to prevent loops.
We can see that the Control VM is located on esx-02a.corp.local.
SSH to this ESXi host, find the Vdr Name of the DLR Control VM:
11.5.1 ~ # xxx-xxx -I -l
##############################################################################
##############################################################################
From this output we can see that there is no MAC address learning yet.
After connecting a VM to Logical Switch App-Tier-01 and pinging a VM in VLAN 100, we can see
MAC addresses from both VXLAN 5002 and VLAN100 segments:
General info:
One VDS for the Edge and Compute workloads.
We have two physical links on each ESXi host.
Two VTEPs with Source_Port as the teaming mode.
All VLANs are trunked to all physical links.
Topology:
VM1 is located on the Compute cluster.
VM2 and the active Control VM are on the MNG/Edge cluster.
VM3 is on the MNG/Edge cluster.
What does not work:
Bridging is not working if the VM located on VLAN 100 is not on the same host as the active
Control VM.
From the net-vdr command, in all tests the bridge can see all the MAC addresses of VM1, VM2 and
VM3.
When it is not working, with pktcap-uw I can see the ARP sent out from VM1 to VM2 on:
1. ESXi1
2. ESXi3 on vmnic0, vmnic1 (VLAN traffic)
3. ESXi2 on vmnic0, vmnic1 (VLAN traffic), but the ARP is not received on the VM2 vDS port-ID or
inside Windows with Wireshark.
Solution:
HP Network loop protection interacts with NSX Bridging, blocking the traffic.
Depending on the role of the VC (HP Virtual Connect) Ethernet port, VC can use several loop avoidance mechanisms. A
VC Ethernet port can be an uplink, a downlink, or a stacking link. VC Ethernet uplink ports connect to
external LAN switches. VC Ethernet downlink ports connect to server NIC ports. VC Ethernet stacking
Link ports connect to other VC Ethernet modules.
If you disable HP loop protection, the NSX bridge starts working.
Working on daily tasks with firewalls can sometimes lead to a situation where you end up
blocking access to the management of your firewall.
This situation is very challenging, regardless of the vendor you are working with.
The end result of this scenario is that you are unable to access the firewall management to
remove the rules that are blocking you from reaching the firewall management!
Think of a situation where you deploy a distributed firewall into each of your ESX hosts in a
cluster, including the management cluster where you have your vCenter server located.
And then you change the Default Rule from the default “Allow” value to “Block” (as shown
below):
Let me show you an example of what you’ve done by implementing this rule:
Like the poor guy above blocking himself from his tree, by implementing this rule, you have
blocked yourself from managing your vCenter.
Put your vCenter (and other critical virtual machines) in an exclusion list.
Any VM on that list will not receive any distributed firewall rules.
Go to the Networking & Security tab and click on NSX Manager.
Click on Manage:
Go to the “Exclusion List” tab and click on the green plus button.
That’s it! Now your VC is excluded from any enforced firewall rules.
12.3 What if we made a mistake and do not yet have access to the
VC?
We can use the NSX Manager REST API to revert to the default firewall ruleset.
By default the NSX Manager is automatically excluded from DFW, so it is always possible to
send API calls to it.
Using a REST Client or cURL:
https://addons.mozilla.org/en-US/firefox/addon/restclient
https://$nsxmgr/api/4.0/firewall/globalroot-0/config
After receiving status code 204, we have reverted to the default DFW policy with the default
rule set to allow.
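A sketch of that call with Python's standard library, assuming the revert is issued as an HTTP DELETE against the URL above and that basic authentication is used (the manager address and credentials are placeholders):

```python
import base64
import urllib.request

# Placeholder manager address and credentials: substitute your own.
NSX_MGR = "nsxmgr.corp.local"
token = base64.b64encode(b"admin:password").decode()

# An HTTP DELETE on the firewall config URL reverts the DFW to its default policy.
req = urllib.request.Request(
    url=f"https://{NSX_MGR}/api/4.0/firewall/globalroot-0/config",
    method="DELETE",
    headers={"Authorization": "Basic " + token},
)
# urllib.request.urlopen(req) would actually send the call; a 204 status
# code confirms that the configuration was reverted.
```

A browser REST client such as the one linked above achieves the same thing by setting the method, URL and Authorization header manually.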
Now we can access our VC! As we can see, we have reverted to the default policy, but don’t
panic: we saved the policy earlier.
We will need to change the last rule from Block to Allow to fix the problem.
Client PC -> NSX Manager | 443/TCP | NSX Manager Admin Interface | HTTPS
REST Client -> NSX Manager | 443/TCP | NSX Manager REST API | HTTPS
REST Client -> NSX Controller | 443/TCP | NSX Controller REST API | HTTPS
NSX Manager -> vCenter Server | 443/TCP | vSphere Web Access | HTTPS
NSX Manager -> vCenter Server | 902/TCP | vSphere Web Access | VMware Internal
NSX Manager -> ESXi Host | 443/TCP | Management and provisioning connection | HTTPS
NSX Manager -> ESXi Host | 902/TCP | Management and provisioning connection | VMware Internal
NSX Manager -> Distributed Firewall | 443/TCP | Management and provisioning connection | HTTPS
NSX Manager -> Distributed Firewall | 902/TCP | Management and provisioning connection | VMware Internal
NSX Controller -> ESXi Host | 8672/TCP | User World Agent connection | VMware Internal
VXLAN Tunnel End Point (VTEP) -> VXLAN Tunnel End Point (VTEP) | 8472/UDP | Transport network encapsulation between VTEP end points | VXLAN
NSX Manager & NSX Controller -> NTP Time Server | 123/TCP+UDP | NTP client connection | NTP
NSX Controller -> NSX Controller | 2878, 2888, 3888/TCP | State sync between controllers | Zookeeper
Some of the steps here can and should be done via the NSX GUI, vRealize Operations Manager
6.0 and vRealize Log Insight, so treat this as an educational post.
There are lots of CLI commands in this post :-). To view the output of a CLI command you can scroll right.
5. Verify the NSX control plane from the ESXi hosts and NSX Controllers.
VMs web-sv-01a and web-sv-02a reside on different compute hosts, esxcomp-01a and esxcomp-02a
respectively.
From esxcomp-01a run the command esxtop then press "n" (Network):
esxcomp-01a # esxtop
PORT-ID USED-BY TEAM-PNIC DNAME PKTTX/s MbTX/s PKTRX/s MbRX/s %DRPTX %DRPRX
33554433 Management n/a vSwitch0 0.00 0.00 0.00 0.00 0.00 0.00
50331649 Management n/a DvsPortset-0 0.00 0.00 0.00 0.00 0.00 0.00
50331651 Shadow of vmnic0 n/a DvsPortset-0 0.00 0.00 0.00 0.00 0.00 0.00
50331652 vmk0 vmnic0 DvsPortset-0 5.87 0.01 1.76 0.00 0.00 0.00
50331653 vmk1 vmnic0 DvsPortset-0 0.59 0.01 0.98 0.00 0.00 0.00
50331654 vmk2 vmnic0 DvsPortset-0 0.00 0.00 0.39 0.00 0.00 0.00
50331655 vmk3 vmnic0 DvsPortset-0 0.20 0.00 0.39 0.00 0.00 0.00
50331656 35669:db-sv-01a.eth0 vmnic0 DvsPortset-0 0.00 0.00 0.00 0.00 0.00 0.00
50331657 35888:web-sv-01a.eth vmnic0 DvsPortset-0 4.89 0.01 3.72 0.01 0.00 0.00
50331658 vdr-vdrPort vmnic0 DvsPortset-0 2.15 0.00 0.00 0.00 0.00 0.00
In line 12 we can see that "web-sv-01a.eth0" is shown; another important piece of information
is its "Port-ID". The "Port-ID" is the unique identifier for each virtual switch port; in our
example web-sv-01a.eth0 has Port-ID "50331657".
Find the vDS name:
esxcomp-01a # esxcli network vswitch dvs vmware vxlan list
VDS ID VDS Name MTU Segment ID Gateway IP Gateway MAC Network Count Vmknic Count
3b bf 0e 50 73 dc 49 d8-2e b0 df 20 91 e4 0b bd Compute_VDS 1600 192.168.250.0 192.168.250.2 00:50:56:09:46:07 4 1
50331657 68 0
50331658 vdrPort 0
From line 4 we see that a VM is connected to VXLAN 5001 on port ID 50331657, which is the same port ID as that of VM web-sv-01a.eth0.
Verification in esxcomp-01b:
PORT-ID   USED-BY               TEAM-PNIC  DNAME         PKTTX/s  MbTX/s  PKTRX/s  MbRX/s  %DRPTX  %DRPRX
33554433  Management            n/a        vSwitch0         0.00    0.00     0.00    0.00    0.00    0.00
50331649  Management            n/a        DvsPortset-0     0.00    0.00     0.00    0.00    0.00    0.00
50331651  Shadow of vmnic0      n/a        DvsPortset-0     0.00    0.00     0.00    0.00    0.00    0.00
50331652  vmk0                  vmnic0     DvsPortset-0     2.77    0.00     1.19    0.00    0.00    0.00
50331653  vmk1                  vmnic0     DvsPortset-0     0.59    0.00     0.40    0.00    0.00    0.00
50331654  vmk2                  vmnic0     DvsPortset-0     0.00    0.00     0.00    0.00    0.00    0.00
50331655  vmk3                  vmnic0     DvsPortset-0     0.00    0.00     0.00    0.00    0.00    0.00
50331656  35663:web-sv-02a.eth  vmnic0     DvsPortset-0     3.96    0.01     3.57    0.01    0.00    0.00
50331657  vdr-vdrPort           vmnic0     DvsPortset-0     2.18    0.00     0.00    0.00    0.00    0.00

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VLAN ID
--------------  -----------  -------
50331656        69           0
50331657        vdrPort      0
At this point we have verified that the VMs are located on the right ESXi hosts as shown in the topology diagram. It is now time to start the actual troubleshooting steps.
The easy way to find out is by pinging from the VTEP IP address in esxcomp-01a to the VTEP in esxcomp-01b. Before doing that, let's find out the VTEP IP addresses.
esxcomp-01a # esxcfg-vmknic -l
Interface  Port Group/DVPort  IP Family  IP Address      Netmask        Broadcast        MAC Address        MTU   TSO MSS  Enabled  Type
vmk0       16                 IPv4       192.168.210.51  255.255.255.0  192.168.210.255  00:50:56:09:08:3e  1500  65535    true     STATIC
vmk1       26                 IPv4       10.20.20.51     255.255.255.0  10.20.20.255     00:50:56:69:80:0f  1500  65535    true     STATIC
vmk2       35                 IPv4       10.20.30.51     255.255.255.0  10.20.30.255     00:50:56:64:70:9f  1500  65535    true     STATIC
vmk3       44                 IPv4       192.168.250.51  255.255.255.0  192.168.250.255  00:50:56:66:e2:ef  1600  65535    true     STATIC
From line 6 we can tell that the VTEP IP address for vmk3 (MTU 1600) is 192.168.250.51.
Another command to find out the VTEP IP address is:
esxcomp-01a # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3         50331655        44           0            0        192.168.250.51  255.255.255.0  0                   0                      192.168.250.0

esxcomp-01b # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3         50331655        46           0            0        192.168.250.53  255.255.255.0  0                   0                      192.168.250.0
The VTEP IP for esxcomp-01b is 192.168.250.53. Now let's add this info to our topology.
From esxcomp-01b:
esxcomp-01b # esxcli network ip route ipv4 list -N vxlan
The two ESXi hosts in this example have VTEP IP addresses in the same L2 segment, so they both have the same default gateway; but with this command we can also verify routing for VTEPs in different subnets.
The ping is sourced from the VXLAN IP stack with a packet size of 1570 bytes and the don't-fragment bit set to 1:
esxcomp-01a # vmkping ++netstack=vxlan -d -s 1570 192.168.250.53
The ping is successful.
If the ping with "-d" doesn't work but it works without "-d", that is a clue of an MTU problem. Check the MTU configuration on the physical switches.
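The MTU arithmetic behind these numbers can be sketched in plain shell; the 50-byte overhead figure assumes the standard VXLAN encapsulation without an outer VLAN tag:

```shell
# VXLAN encapsulation overhead: outer Ethernet (14) + outer IP (20) + UDP (8) + VXLAN header (8)
OVERHEAD=$((14 + 20 + 8 + 8))
echo "VXLAN overhead: $OVERHEAD bytes"

# A standard 1500-byte guest packet therefore needs at least this much underlay MTU
echo "Required underlay MTU: $((1500 + OVERHEAD)) bytes"

# The test ping (-s 1570) builds an IP packet of 1570 + 8 (ICMP) + 20 (IP) bytes,
# deliberately close to the 1600-byte VTEP MTU
echo "Test packet size: $((1570 + 8 + 20)) bytes"
```

If the 1598-byte don't-fragment ping fails while a default-size ping succeeds, the underlay MTU is smaller than 1600 somewhere along the path.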
Because the VTEPs on the different ESXi hosts in this example are part of the same L2 domain, we can view the ARP entries for the other VTEPs:
From esxcomp-01a:
esxcomp-01a # esxcli network ip neighbor list -N vxlan
The process responsible for communicating with the NSX Controller is called netcpad.
The ESXi host uses the VMkernel management interface to create this secure channel over TCP/1234; the traffic is encrypted with SSL.
Routing: routes learned from the DLR Control VM (explained in the next post).
Based on this information the Controller learns the network state and builds its directory services.
To learn how the Controller cluster works and how to fix problems in the cluster itself, please refer to the NSX Controller Cluster Troubleshooting section.
For two VMs to be able to talk to each other we need to ensure that the NSX control plane is working fine. In this lab we have 3 NSX Controller nodes.
Verification commands need to be issued from both the ESXi and the Controller sides.
1. On the ESXi hosts, list the established connections to the Controller nodes on TCP/1234:
# esxcli network ip connection list | grep 1234
tcp  0  0  192.168.210.51:54153  192.168.110.202:1234  ESTABLISHED  35185  newreno  netcpa-worker
tcp  0  0  192.168.210.51:34656  192.168.110.203:1234  ESTABLISHED  34519  newreno  netcpa-worker
tcp  0  0  192.168.210.51:41342  192.168.110.201:1234  ESTABLISHED  34519  newreno  netcpa-worker
tcp  0  0  192.168.210.56:16580  192.168.110.202:1234  ESTABLISHED  34517  newreno  netcpa-worker
tcp  0  0  192.168.210.56:49434  192.168.110.203:1234  ESTABLISHED  34678  newreno  netcpa-worker
tcp  0  0  192.168.210.56:12358  192.168.110.201:1234  ESTABLISHED  34516  newreno  netcpa-worker
2. If you have a firewall between the ESXi hosts and the NSX Controller nodes, TCP/1234 needs to be open.
Verify again:
esxcomp-01a # /etc/init.d/netcpad status
Verify that the control plane is enabled on esxcomp-01a and the connection is up for VXLAN 5001:
esxcomp-01a # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                         Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
5003      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.202 (up)   2           0                0
5001      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.201 (up)   2           3                0
5000      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.202 (up)   1           3                0
5002      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.203 (up)   1           2                0
Verify that the control plane is enabled on esxcomp-01b and the connection is up for VXLAN 5001:
esxcomp-01b # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                         Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
5001      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.201 (up)   2           3                0
5000      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.202 (up)   1           0                0
5002      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.203 (up)   1           2                0
5003      N/A (headend replication)  Enabled (multicast proxy, ARP proxy)  192.168.110.202 (up)   1           0                0
Check that esxcomp-01a has learned the ARP information of remote VMs in VXLAN 5001:
esxcomp-01a # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP MAC Flags
From this output we can see that esxcomp-01a has learned the ARP info of web-sv-02a.
Check that esxcomp-01b has learned the ARP information of remote VMs in VXLAN 5001:
esxcomp-01b # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP MAC Flags
From this output we can see that esxcomp-01b has learned the ARP info of web-sv-01a.
esxcomp-01a:
Knows web-sv-01a VM is running in VXLAN 5001, its IP address 172.16.10.11 and its MAC address
00:50:56:a6:7a:a2.
esxcomp-01b:
Knows web-sv-02a VM is running in VXLAN 5001, its IP address 172.16.10.12 and its MAC address
00:50:56:a6:a1:e3.
To answer this question we need to find out first the answer to another question: what does the NSX
controller know?
To find out who is managing VXLAN 5001, SSH to one of the NSX Controller nodes, for example 192.168.110.202:
nsx-controller # show control-cluster logical-switches vni 5001
Line 3 says that 192.168.110.201 is managing VXLAN 5001, so the next commands will be run from 192.168.110.201:
nsx-controller # show control-cluster logical-switches vni 5001
From this output we learn that VXLAN 5001 has 4 VTEPs connected to it and a total of 6 active connections.
At this point I would like to point you to an excellent blog with lots of information on what is happening under the hood in NSX.
The author's name is Dmitri Kalintsev, and the relevant post on his blog is: NSX for vSphere: Controller "Connections" and "VTEPs"
From Dmitri’s post:
"1. When a VM running on that host connects to VNI's dvPg and its vNIC transitions into "Link Up" state; and
2. When DLR kernel module on that host needs to route traffic to a VM on that VNI that’s running
on a different host."
We are not routing traffic between VMs, so the DLR is not part of the game here.
From this output we can see that both VTEPs, esxcomp-01a (line 5) and esxcomp-01b (line 3), are seen by the NSX Controller on VXLAN 5001.
The MAC address output in this command refers to the VTEPs’ MAC.
Find out the MAC addresses associated with the VMs, as learned by the NSX Controller:
nsx-controller # show control-cluster logical-switches mac-table 5001
Find out the ARP entries associated with the VMs, as learned by the NSX Controller:
nsx-controller # show control-cluster logical-switches arp-table 5001
To understand how the Controller has learned this info, read my post NSX-V IP Discovery. In cases where ARP entries are not learned, restarting the netcpad process can fix the issue:
esxcomp-01a # /etc/init.d/netcpad restart
The NSX Controller knows where the VMs are located, along with their IP and MAC addresses. It seems like the control plane is working just fine.
Before starting to capture all over the place, let's try to narrow down where we think the problem could be.
When a VM connects to a Logical Switch, there are a few security services that packets originating from the VM must traverse; each service is represented by a different slot ID.
SLOT 1: the Switch Security module (swsec) captures DHCP ACK and ARP messages to learn the VM IP address. This info is then forwarded to the NSX Controller cluster.
We need to check whether VM traffic is successfully transmitted through the NSX Distributed Firewall, which means through slot 2.
The capture command will need the SLOT 2 filter name for web-sv-01a.
From esxcomp-01a:
esxcomp-01a # summarize-dvfilter
~~~snip~~~~
vNic slot 2
name: nic-35888-eth0-vmware-sfw.2
agentName: vmware-sfw
vmState: Detached
failurePolicy: failClosed
slowPathID: none
vNic slot 1
name: nic-35888-eth0-dvfilter-generic-vmware-swsec.1
agentName: dvfilter-generic-vmware-swsec
vmState: Detached
failurePolicy: failClosed
slowPathID: none
We can see in line 4 that the VM name is web-sv-01a, in line 5 that the filter is applied at slot 2, and in line 6 we have the filter name: nic-35888-eth0-vmware-sfw.2
Local CID 2
Destroying session 25
From the output of this command (line 12) we can tell that ICMP packets are not passing this filter, because we have 0 dumped packets.
We found our smoking gun :-)
Local CID 2
Now we can see at line 6 that we have 6 dumped packets. We can open the web-sv-01a_before.pcap capture file:
esxcomp-01a # tcpdump-uw -r web-sv-01a_before.pcap
20:15:31.389158 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18628, length 64
20:15:32.397225 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18629, length 64
20:15:33.405253 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18630, length 64
20:15:34.413356 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18631, length 64
20:15:35.421284 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18632, length 64
20:15:36.429219 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18633, length 64
Looking back at this section, we intentionally skipped step 3, "Configuration issue". Had we checked the configuration settings upfront, we would probably have noticed this problem immediately.
esxcfg-vmknic -l
esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
/etc/init.d/netcpad (status|start|restart)
One of the most challenging problems in managing large networks is the complexity of security
administration. “Role-based access control (RBAC) is a method of regulating access to computer
or network resources based on the roles of individual users within an enterprise. In this context,
access is the ability of an individual user to perform a specific task, such as view, create, or
modify a file. Roles are defined according to job competency, authority, and responsibility
within the enterprise”
Within NSX we have four built-in roles, and we can map users or groups to one of those NSX roles. Instead of assigning roles to individual users, the preferred way is to assign roles to groups.
Organizations create user groups for proper user management. After integration with SSO, NSX
Manager can get the details of groups to which a user belongs.
Within NSX Manager we have four pre-built RBAC roles covering different NSX permissions and areas of the NSX environment.
The four NSX built-in roles are: Auditor, Security Administrator, NSX Administrator and Enterprise Administrator:
Whenever we want to assign a role in NSX, we can assign it to an SSO user or group. When the Lookup Service is not configured, group-based role assignment will not work (i.e. users from that group will not be able to log in to NSX).
The reason is that we cannot fetch any group information from the SSO server. The group-based authentication provider is only available when the Lookup Service is configured. Logins by users that are explicitly assigned a role on NSX are not affected. This means that the customer has to assign roles to users individually and cannot take advantage of SSO groups.
For NSX, the vCenter SSO server is one of the identity providers for authentication. A prerequisite for authentication on NSX is that the user or group has been assigned a role on NSX.
Note: NTP/DNS must be configured on the NSX Manager for lookup service to work.
Note: The domain account must have AD read permission for all objects in the domain tree.
The event log reader account must have read permissions for security event logs.
In this example, I will use Microsoft Active Directory as the user identity source. In "Active Directory Users and Computers" I created four different groups. The groups have the same names as the NSX roles to make life easier: Auditor, Security Administrator, NSX Administrator, Enterprise Administrator.
We created four AD users and added each user to a different AD group. For example, the
nsxadmin user is associated with the group NSX Administrator. This association is done by
clicking on the Add button:
In the same way, it is possible to associate other users to the defined AD groups:
username: AD groups:
Go to “Network & Security” tab and double click on the “NSX Manager”
Note: Configure Domain is not needed for RBAC; it is only needed if we want to use identity firewall rules based on user or group information.
Fill Name and NetBIOS name fields with appropriate information of your Domain Name and
NetBIOS name:
Enter the LDAP (i.e. AD) server IP address or hostname and a domain account (username and password):
Click Next. NSX Manager will try to connect to the LDAP (i.e. AD) server using the above info. If the result is successful, the screenshot on the next page will appear.
This configuration allows the NSX Manager to read Active Directory “Security Event Log”; this
log contains information about the users that logon/logoff from the domain. NSX can use this
information to improve user identity firewall rules.
Now we can map the Active Directory groups to the pre-built NSX Manager roles.
Here we can select whether we want to map a specific AD user to an NSX role, or an AD group to a role.
In this example we use an AD group, so we created an AD group called Auditor. The input format here is:
"group_name"@domain.name. Let's start with the Auditor group; this group has "Read Only" permission:
Select one of the NSX roles; for the Auditor AD group we chose Auditor.
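The "group_name"@domain.name input format can be sanity-checked with a simple pattern; the group and domain below are hypothetical examples, not taken from the lab:

```shell
# Role assignment entries take the form group_name@domain.name
entry="Auditor@corp.local"

# One "@" separating a non-empty group from a dotted domain name
if echo "$entry" | grep -Eq '^[^@]+@[^@]+\.[^@]+$'; then
    echo "valid: $entry"
else
    echo "invalid: $entry"
fi
```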
We can limit the scope of this group to a specific object (port group, datacenter, NSX Edge), in
this example no restrictions are applied:
The login was successful, but where has the "Network & Security" tab gone?
So far we have configured everything on the NSX Manager side, but we didn't take care of the vCenter permission configuration for that group. Confusing?
vCenter has its own role for each group. We need to configure a role for each AD group we configured. These settings determine what the user can do in the vCenter environment.
Let’s start by configuring the Auditor Role for Auditor AD group. We know this group is for
“Read Only” in the NSX Manager, so it will make sense to give this group “Read Only”
permission also for the vCenter environment.
Go to vCenter -> Manage -> Permissions and click the green button:
We need to choose a role from the Assigned Role list; if we select No-Access we will not be able to log in to vCenter, so we need to choose something from "Read-Only" to "Administrator".
Select “Read Only” from the Assigned Role drop down list and click on the “Add” button from
“User and Group”:
From the Domain drop-down select your domain name (in our lab the domain is "CORP"), choose your Active Directory group from the list (Auditor in this example) and click the "Add" button:
We can verify that auditor1 can’t change any other vCenter configuration:
Now test the secadmin user, mapped to the "NSX Security" role; this user cannot make any NSX infrastructure-related change, such as adding a new NSX Controller node:
When logging in as the nsxadmin user, mapped to the NSX Administrator role, we can see that the user can add new Controller nodes:
But the nsxadmin user cannot change or even see any configured firewall rules:
A user that is a member of more than one group gains the combined permissions of all those groups.
For example: if a user is member of the “Auditor” and “NSX Security” groups, the result will be
that the user has read only permission on all NSX infrastructure tasks but also gains access to all
security related areas in NSX.
Summary
In this section we demonstrated the different NSX Manager roles. We configured Microsoft Active Directory as the external database source for user identity.
During November I had the opportunity to take the NSX Advanced bootcamp with one of the brilliant PSO Architects in the NSX field, Kevin Barrass. This section is based on Kevin's lecture; I added screenshots and my own experience.
Upgrading NSX can be very easy if planned right, or very frustrating if we try to take shortcuts in the process. In this section I will try to document all the steps needed for completing an NSX-v upgrade.
Before starting the upgrade procedure, the following pre-upgrade steps must be taken into consideration:
How many times have you faced an issue during the upgrade process, wasted hours of troubleshooting, certain you worked exactly as guided, opened a support ticket and got the answer: "you are hitting a known upgrade issue and the workaround is written in the release notes". RTFM, feeling dumb...?
Download any of your favorite MD5 tools; I'm using the free winMd5Sum.
Compare the MD5 sum you calculate against the official MD5 value on the VMware download site.
http://www.nullriver.com/
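As a sketch of that check using the stock md5sum utility instead of winMd5Sum (the file and expected value below are stand-ins, not a real VMware checksum):

```shell
# Stand-in for the downloaded upgrade bundle
echo "test" > /tmp/bundle.tar.gz

# Compute the local MD5 and compare it against the value published on the download page
SUM=$(md5sum /tmp/bundle.tar.gz | awk '{print $1}')
EXPECTED="d8e8fca2dc0f896fd7cb4cb0031ba249"   # md5 of "test\n", standing in for VMware's value

if [ "$SUM" = "$EXPECTED" ]; then
    echo "MD5 OK - safe to upload"
else
    echo "MD5 MISMATCH - re-download the bundle"
fi
```

Only upload the bundle to NSX Manager once the two values match.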
Again, this one comes from the field. The scenario: you complete the upgrade process and now face an issue. How do you know the issue wasn't there before you started the upgrade?
Do not assume everything is working before you start to touch the infrastructure. Check it!!!
Note the current versions of NSX Manager, vCenter, ESXi and Edges. Verify you can log into:
NSX Manager Web UI
vCenter and see NSX Manager in Plugin
ESG, DLR control VM’s
Check that DRS is enabled on the clusters, validate that vMotion functions correctly, and check the host connection state with vCenter.
During an NSX upgrade, in some situations a cluster with 2 hosts or less can cause issues with DRS/Admission Control/Anti-Affinity rules. My recommendation for a successful upgrade process is to work with 3 hosts in each cluster you plan to upgrade.
Starting from 6.0.4 we have a special API call to take a snapshot of a controller:
https://NSXManagerIPAddress/api/2.0/vdn/controller/controllerID/snapshot
For example, with curl (credentials and output filename are placeholders):
curl -k -u admin:PASSWORD https://NSXManagerIPAddress/api/2.0/vdn/controller/controllerID/snapshot -o controller-snapshot
Some browsers may remove the .gz extension. If the file looks like:
VMware-NSX-Manager-upgrade-bundle-6.1.0-X.X.gz
change it to:
VMware-NSX-Manager-upgrade-bundle-6.1.0-2107742.tar.gz
Otherwise you will get an error after completing the upload of the upgrade bundle to NSX Manager:
To begin the NSX Manager upgrade, open the NSX Manager web interface and select the
Upgrade section:
Note: NSX Manager will reboot during the upgrade process, but the forwarding path of VM workloads will not be affected during this step, unless we are using user identity with the Distributed Firewall and a new user logs in while the NSX Manager is down.
The upgrade process consists of two steps: validating the tar.gz image and starting the actual upgrade.
When NSX Manager finishes the validation process, the upgrade can start:
After completing the upgrade, confirm the version from the Summary Tab of the NSX Manager
Web UI:
During the upgrade of the NSX Controller cluster, the upgrade file is downloaded to each node;
the process will then start to upgrade node1, then node2 and at the end node3.
During the upgrade of the NSX Controller cluster we will face this state:
Node1: completed the upgrade (to release 6.1 in this specific example)
Node2: Is rebooting
Node3: In Normal state but still running the old version 6.0.0.
Result: we have only one active node running 6.1, and this node has lost cluster majority because of the version mismatch with the other running controller node. As a consequence, the Controller cluster is down in this situation.
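The majority rule behind this is simple arithmetic: a Zookeeper-style cluster of N nodes needs floor(N/2)+1 members in agreement to stay up. A quick sketch:

```shell
# Majority (quorum) size for an N-node cluster
majority() { echo $(( $1 / 2 + 1 )); }

echo "3-node cluster majority: $(majority 3)"
# Mid-upgrade only one node (node1, on 6.1) is usable: 1 < 2, so the cluster is down
```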
While the Controller cluster is down, it is possible to have issues when performing live migration of VMs between ESXi hosts (vMotion). This is the case if a VM is migrated to a specific ESXi host and it is the first VM on that host in a given VXLAN segment; the other VTEPs won't be able to receive this information from the Controller (since it is down), which implies that multi-destination traffic (such as ARP requests) originated by VMs in that VXLAN segment will never be sent to the ESXi host the VM moved to. This issue may be exacerbated in virtual environments leveraging DRS, as vMotion can happen dynamically and without user intervention.
To limit the impact of this issue, my recommendation is to change the DRS setting to manual to
control the occurrences of vMotion events during the NSX Controller update process!!
Note: After completing the controller upgrade, change it back to the previous configuration.
Another issue may occur if the DLR Control VM gets a dynamic routing update because of a
topology change (for example a new route is added or removed); in this case, this information
cannot be communicated to the NSX Controller cluster, which implies that the kernel
forwarding tables on the ESXi hosts cannot be updated. The update will then be performed as
soon as the Controller cluster is activated.
Back to our example: when controller node-2 completes its reboot, we have two controllers upgraded and running the same 6.1 version. At that point we regain cluster majority (and the Controller cluster is reactivated), even though controller node-3 still needs to finish its upgrade and reboot.
When all three controller nodes have completed rebooting, the cluster is fully upgraded and functional.
During the upgrade of the NSX clusters, each ESXi host requires a reboot, but there will be no impact on the data plane for VMs because they will be automatically evacuated to other hosts by DRS.
If DRS is disabled, the vSphere admin will need to manually move the VMs and then reboot the ESXi host.
This is the main reason why admission control with only 2 hosts in the cluster may prevent the automatic host upgrade. My recommendation is to avoid 2-host clusters, or to manually evacuate a host and put it into maintenance mode.
If you have manually created an anti-affinity rule for the Controller nodes (in the current NSX release this is not done automatically, as it is for example for the DLR Control VMs), a cluster with 3 hosts will prevent the upgrade.
To solve the problem, disable this anti-affinity rule by unchecking "Enable rule" so that the automatic host upgrade can proceed, and enable it again after the upgrade is completed.
With the default anti-affinity rules for Edges/DLR, 2 hosts will prevent the upgrade. Uncheck "Enable rule" on the Edge anti-affinity rules to allow the automatic host upgrade, and enable it again after the upgrade is completed.
If an upgrade is available for a Cluster, an “Update” link is available in the NSX UI. When the
upgrade is initiated, NSX Manager updates the NSX VIBs on each ESXi host.
Task view will reveal what happens while the upgrade process is running:
This process can affect the forwarding plane; we can minimize the traffic outage by deploying multiple Edges working in ECMP mode.
If an upgrade is available for Guest Introspection / Data Security, an upgrade link is available in the NSX UI.
Follow NSX installation guide for specific details on upgrading Guest Introspection / Data
Security.
The previous NSX Manager backup is only valid for the previous release.
Microsoft NLB can work in two different modes: unicast or multicast.
With unicast mode, NSX and Microsoft NLB work together, and a VM in the same or a different VXLAN was able to ping the NLB IP address.
With multicast mode, NSX-v will not work without a statically set ARP entry.
I found that the VDR will not learn the NLB multicast MAC address (03:xx:xx:xx:xx:xx).
This is because of RFC 1812: RFC 1812 - Requirements for IP Version 4 Routers
"A router MUST not believe any ARP reply that claims that the Link Layer address of another host or
router is a broadcast or multicast address"
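The RFC's restriction concerns multicast Link Layer addresses, and whether a MAC is multicast is determined by the least significant bit of its first octet. A quick sketch (the 03 prefix used by NLB has that bit set; the full addresses below are made-up examples):

```shell
# A MAC address is multicast when the low-order bit of the first octet is 1
is_multicast() {
    first=$(( 0x$(echo "$1" | cut -d: -f1) ))
    echo $(( first & 1 ))
}

echo "03:bf:01:01:01:01 multicast bit: $(is_multicast 03:bf:01:01:01:01)"
echo "00:50:56:a6:7a:a2 multicast bit: $(is_multicast 00:50:56:a6:7a:a2)"
```

This is why the DLR refuses to learn the NLB address from ARP and a static entry is needed.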
Also read this VMware KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10
06558
Solution:
Configure a static ARP entry in the DLR with the command:
net-vdr -nbr -a -i dstIp -m destMac -n lifname vdrName
where dstIp is the NLB VIP address and destMac is the NLB multicast MAC address.