
Choosing an NGINX Plus Load-Balancing Technique

Looking to load-balance in NGINX Plus? Here are several techniques to choose from,
including Round Robin, IP Hash, and Hash.

We've written a lot about how you can use NGINX Plus and the open source NGINX
software to load balance your websites and apps for optimal availability and
reliability. Load balancing is a fundamental tool for boosting app performance,
delivering apps at scale, and deploying containers and microservices.

We've previously explained how you can deploy NGINX Plus in the data center
(perhaps alongside legacy application delivery controllers), in containers, and in
cloud environments, including Amazon Web Services, the Google Cloud Platform, and
Microsoft Azure.

In this post, we'll focus on the load-balancing techniques (also called load-
balancing methods or algorithms) in NGINX Plus and NGINX, offering some advice on
how to choose the right method for different use cases. NGINX provides four load-
balancing techniques (Round Robin, Hash, IP Hash, and Least Connections), and NGINX
Plus adds one more (Least Time). All methods except IP Hash are available for TCP
as well as HTTP traffic.

Reviewing the Load-Balancing Techniques


We'll assume you know the basics of how to configure load balancing, but you can
check out these resources if you'd like a refresher:

Load Balancing in the NGINX Plus Admin Guide provides a complete overview.
Application Load Balancing with NGINX Plus links to detailed discussions of
enhanced features in NGINX Plus that can improve the efficiency of a load-balancing
method even more.
Load Balancing with NGINX and NGINX Plus, Part 1 and Part 2 are a two-part
walk-through that builds a simple reverse proxy into a comprehensive load-balancing
solution with NGINX Plus's enhanced features.
The Load Balancing Solutions page links to blogs, webinars, and whitepapers.
For simplicity's sake, we'll focus on HTTP load balancing, which you configure in
the http context. TCP load balancing is configured in the stream context instead.
Although the HTTP and TCP load balancers have feature parity, the available
directives and parameters differ somewhat because of inherent differences between
the protocols; for details, see the documentation about the Upstream modules for
HTTP and TCP.
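To make the contrast concrete, here's a minimal sketch of TCP load balancing in the
stream context; the server names and port are hypothetical placeholders:

stream {
    upstream tcp_backend {
        # Round Robin by default, as for HTTP; Least Connections and
        # the Hash method are available here too
        server db1:3306;
        server db2:3306;
    }
    server {
        listen 3306;
        # In the stream context, proxy_pass takes the upstream name
        # without a scheme
        proxy_pass tcp_backend;
    }
}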

You enable load balancing with two configuration blocks, which we'll show in their
basic form, without optional parameters or any auxiliary features:

The server block defines a virtual server that listens for traffic with the
characteristics you define, and proxies it to a named group of upstream servers. In
our examples, the virtual server listens on the default port (80) for the HTTP
traffic sent to www.example.com, and proxies it to the upstream server group called
backend. This block is the same in all our examples.
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
(NGINX Plus and NGINX can also load balance FastCGI, memcached, SCGI, and uwsgi
backend servers. Replace proxy_pass with the appropriate directive: fastcgi_pass,
memcached_pass, scgi_pass, or uwsgi_pass.)
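For instance, a FastCGI setup might look like the following sketch; the PHP-FPM
server names (php1, php2) are hypothetical, listening on the conventional FastCGI
port 9000:

upstream fastcgi_backend {
    server php1:9000;
    server php2:9000;
}
server {
    server_name www.example.com;
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        # Note: no http:// scheme with fastcgi_pass
        fastcgi_pass fastcgi_backend;
    }
}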
The upstream block names an upstream group and lists the servers that belong to it,
identified by hostname, IP address, or UNIX-domain socket path. In our examples,
the upstream group called backend includes three servers: web1, web2, and web3. The
upstream block is where you specify the load-balancing technique, so we'll be
highlighting that in the sections that follow. As an example, here's the block for
the default method, Round Robin:
upstream backend {
    server web1;
    server web2;
    server web3;
}
Round Robin
Round Robin is the default load-balancing technique for both NGINX Plus and NGINX.
The load balancer runs through the list of upstream servers in sequence, assigning
the next connection request to each one in turn.

Given the following sample configuration of the backend upstream group, the load
balancer sends the first three connection requests to web1, web2, and web3 in
order, the fourth to web1, the fifth to web2, and so on.

upstream backend {
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
Hash
With the Hash method, for each request the load balancer calculates a hash that is
based on the combination of text and NGINX variables you specify, and associates
the hash with one of the servers. It sends all requests with that hash to that
server, so this method establishes a basic kind of session persistence.

In the following example, the hash directive uses the scheme (http or https) and
the full request URI as the basis for the hash:

upstream backend {
    hash $scheme$request_uri;
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
IP Hash
IP Hash (available for HTTP only) is a predefined variant of the Hash method, in
which the hash is based on the client's IP address. You set it with the ip_hash
directive.

upstream backend {
    ip_hash;
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
If the client has an IPv6 address, the hash is based on the entire address. If it
has an IPv4 address, the hash is based on just the first three octets of the
address. This is designed to optimize for ISP clients that are assigned IP
addresses dynamically from a subnetwork (/24) range. In case of reboot or
reconnection, the client's address often changes to a different one in the /24
network range, but the connection still represents the same client, so there�s no
reason to change the mapping to the server.

If, however, the majority of the traffic to your site is coming from clients on the
same /24 network, IP Hash doesn't make sense because it maps all of them to the
same server. In that case (or if you want to hash on all four octets for another
reason), use the Hash method with the $remote_addr variable instead of IP Hash.

hash $remote_addr;
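In the context of our sample upstream group, that looks like this:

upstream backend {
    # Hash on the full client address, all four octets for IPv4
    hash $remote_addr;
    server web1;
    server web2;
    server web3;
}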
Least Connections
With the Least Connections method, the load balancer compares the current number of
active connections it has to each server, and sends the request to the server with
the fewest connections. You configure it with the least_conn directive.

upstream backend {
    least_conn;
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
Least Time
For the Least Time method (available in NGINX Plus only), the load balancer
mathematically combines two metrics for each server (the current number of active
connections and a weighted average response time for past requests) and sends the
request to the server with the lowest value.

Your choice of parameter on the least_time directive controls which of two response
times is tracked: either the time to receive the response header (header) or the
time to receive the full response (last_byte).

upstream backend {
    least_time header;    # or: least_time last_byte;
    server web1;
    server web2;
    server web3;
}
server {
    server_name www.example.com;
    location / {
        proxy_pass http://backend;
    }
}
Choosing a Load-Balancing Technique
So how do you know which of the load-balancing techniques is best for your website
or app?

Traffic patterns vary so much from site to site, and even within a single site at
different times of day, that it doesn't make sense to base the choice of load-
balancing technique on a single characteristic (such as bursty traffic vs. steady,
short-lived connections vs. long-lived, and so on). That said, we'll consider the
pros and cons of each method to help you narrow the range of choices to consider.

Running Tests to Compare Methods


Whichever subset of load-balancing methods you consider, we encourage you to test
them to see which works best for your traffic. "Best" usually means shortest time
to deliver responses to clients, but you might have different criteria.

Application performance management tools are very handy for this kind of testing:
you can create custom screens with graphs for each of the servers in the upstream
group, making it possible to compare them in real time as values change during the
test. Several APMs offer custom plug-ins for NGINX Plus and NGINX, including
AppDynamics, Datadog, Dynatrace, and New Relic.

Testing is most straightforward if all servers have the same capacity. If not, you
need to set server weights so that machines with more capacity receive more
requests. See Setting Weights When Servers Aren't Identical below.

Some metrics to check during testing are:

CPU and memory load: Look at the percentage of total capacity used, for both CPU
and memory. If all servers aren't equally loaded, traffic is not being distributed
efficiently.
Server response time: If the time is consistently higher for some servers than
others, "heavier" requests (requiring more computation or calls to a database or
other services) are somehow being directed to them in an unbalanced way. Try
adjusting the weights; the imbalance might be caused by missing weights rather than
by a problem with the load-balancing technique.
Total time to respond to the client: Again, consistently higher times for some
servers suggest they're getting a disproportionate share of time-consuming
requests. And again, you can try adjusting weights to see if that eliminates the
issue.
Errors and failed requests: You need to make sure that the number of failed
requests and other errors is not larger than usual for your site; otherwise
you're testing error conditions instead of realistic traffic. For some errors (such
as HTTP response code 404 Not Found), response time can be much faster than if the
server had to retrieve the requested file; with the Least Connections and Least
Time load-balancing algorithms, this can lead the load balancer to favor a server
that is actually not working well.
Pros, Cons, and Use Cases
So now let's look at the benefits and drawbacks of each load-balancing technique,
and describe some use cases for which they're particularly suited. We'll discuss
them in order of increasing suitability for the majority of use cases. As a quick
preview: we consider Least Connections (and, for NGINX Plus, Least Time) to be the
best choices for the widest range of use cases.

Hash and IP Hash


The Hash and IP Hash load-balancing techniques create a fixed association between a
given type of client request (captured in the hash value) and a certain server. You
might recognize this as session persistence: all requests with a given hash value
always go to the same server.

The biggest drawback of these methods is that they are not guaranteed to distribute
requests in equal numbers across servers, let alone balance load evenly. The
hashing algorithm evenly divides the set of all possible hash values into
"buckets," one per server in the upstream group, but there's no way to predict
whether the requests that actually occur will have hashes that are evenly
distributed. Suppose, for example, that ten clients are accessing a site, and the
IP Hash algorithm happens to associate the hash for seven of the IP addresses with
web1, one with web2, and two with web3. The web1 server ends up receiving more than
twice as many requests as the other servers combined.

The Hash and IP Hash load-balancing techniques can result in uneven distribution of
load.

So it makes sense to use Hash or IP Hash when the benefit of maintaining sessions
outweighs the possibly bad effects of unbalanced load. They are the only form of
session persistence available in NGINX. NGINX Plus provides three other session
persistence mechanisms that are more sophisticated and work in combination with
actual load balancing (you configure them with the sticky directive). But you might
choose Hash or IP Hash even with NGINX Plus, because the three mechanisms don't
work in the following cases:

The browser or client app doesn't accept cookies, and the application does not have
a way to work with the session-persistence mechanisms without cookies. Use the IP
Hash method to associate each client (specifically its IP address) with a
particular server.
You want to send requests for a given URL to the same server every time, to take
advantage of caching on the server itself. Use the Hash method with the
$request_uri variable to fetch the file from the same server every time, as shown
in the sketch after this list. For example, suppose you know that serving a certain
.php file requires several time-consuming database calls, but the fetched data
doesn't change often and so is cacheable. If you direct all requests for the file
to the same server, only the first client experiences a long delay because of the
database calls. For all subsequent clients, the data is quickly retrieved from the
cache. Another advantage is that only the one server has to cache that particular
set of data. Because you don't end up duplicate-caching the same data on every
server, you can use smaller caches.
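Here's a sketch of that second use case with our sample upstream group, hashing on
the request URI so that all requests for a given file land on the same server and
its cache:

upstream backend {
    # Requests for the same URI always map to the same server
    hash $request_uri;
    server web1;
    server web2;
    server web3;
}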
There are a couple of cases where IP Hash (and Hash when the client IP address is
in the key) doesn't work:

When the client's IP address can change during the session, for example when a
mobile client switches from a Wi-Fi network to a cellular one.
When the requests from a large number of clients are passing through a forward
proxy, because the proxy's IP address is used for all of them.
Hashes are deterministic (the hashing algorithm yields the same results every
time). This has a couple of positive side effects: all NGINX Plus or NGINX
instances in a deployment load balance requests in exactly the same way, and the
hash-to-server mapping persists across restarts of the load balancer. (It's
actually recalculated after the restart, but because the result is always the same,
it effectively persists.)

On the other hand, changing the set of upstream servers usually forces
recalculation of at least some of the mappings, breaking session persistence. You
can reduce the number of recalculated mappings somewhat:

For the Hash method, include the consistent parameter on the hash directive; NGINX
Plus then uses the ketama hashing algorithm, which results in less remapping (a
sketch follows the example below).
For the IP Hash method, when you remove a server from the upstream group
temporarily, add the down parameter to its server directive, as for web2 in the
following example. The mappings are not recalculated, on the assumption that the
server will return to service soon.
upstream backend {
    ip_hash;
    server web1;
    server web2 down;
    server web3;
}
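And here's the consistent parameter applied to the hash directive from our earlier
example:

upstream backend {
    # ketama consistent hashing: adding or removing a server
    # remaps only a fraction of the keys
    hash $scheme$request_uri consistent;
    server web1;
    server web2;
    server web3;
}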
Round Robin
As noted previously, Round Robin is the default load-balancing method in NGINX Plus
and NGINX. That certainly makes it the easiest method to choose: you don't have to
configure anything beyond the upstream group itself.

The general consensus is that Round Robin works best when the characteristics of
the servers and requests are unlikely to cause some servers to become overloaded
relative to others. Some of the conditions are:

All the servers have about the same capacity. This requirement is less important if
differences between servers are accurately represented by server weights.
All the servers host the same content.
Requests are pretty similar in the amount of time or processing power they require.
If there's a wide variation in request weight, a server can become overloaded
because the load balancer happens to send it a lot of heavyweight requests in quick
succession.
Traffic volume is not heavy enough to push servers to near full capacity very
often. If servers are already heavily loaded, it's more likely that Round Robin's
rote distribution of requests will push some servers "over the edge" into
overload, as described in the previous bullet.
Testing scenarios are particularly good use cases for Round Robin, because this
method ensures that requests are distributed across all servers and in equal
numbers (or the appropriately weighted proportion). Some other methods don't always
distribute traffic evenly when volume is low, which can skew test results.

The even nature of the distribution can also reveal whether caches are working at
full capacity: because there's no mechanism for sending repeated requests for a
popular file to the same server, every server is likely to end up serving and
caching a wide range of files (and, in many cases, many of the same files), which
makes the cache more likely to fill.
Finally, the even initial distribution helps uncover issues with session
persistence in NGINX Plus (as configured with the sticky directive).

Least Connections and Least Time


As we mentioned above, Least Connections is the most suitable load-balancing
technique for the widest range of use cases, and particularly for production
traffic. This is supported by anecdotal evidence from our customers. Its
performance is stable and predictable.

Least Connections also effectively distributes workload across servers according to
their capacity. A more powerful server fulfills requests more quickly, so at any
given moment it's likely to have a smaller number of connections still being
processed (or even waiting for processing to start) than a server with less
capacity. Least Connections sends each request to the server with the smallest
number of current connections, and so is more likely to send requests to powerful
servers. (Setting weights results in even more efficient distribution of requests,
however, as described in Setting Weights When Servers Aren't Identical below.)

You can consider Least Time (NGINX Plus only) a more sensitive version of Least
Connections. By including the average response time, it takes into account the
server's recent performance history (it's actually an exponentially weighted moving
average, so older response times influence the average less than more recent
response times).

Least Time is particularly suitable when the upstream servers have very different
average response times. If, for example, you have servers in different data centers
for purposes of disaster recovery, Least Time tends to send more requests to the
local servers because they respond faster. Another use case is cloud environments,
where server performance is often very unpredictable.

Least Time is one of the load-balancing techniques in NGINX Plus

Setting Weights When Servers Aren�t Identical


We've mentioned several times the importance of setting server weights when the
servers in the upstream group have different capacities. It's particularly
important for the Round Robin load balancer, which otherwise sends the same number
of requests to each server. That's likely to result in a less powerful server being
overloaded while a more powerful one sits partly idle.

To set weights, include the weight parameter on one or more server directives in
the upstream block. The default value is 1.

You can think about the effect of setting weights for the different load-balancing
techniques in the following way. Keep in mind that the descriptions are
conceptually correct, but the implementation in the NGINX Plus code doesn't
necessarily use the indicated mathematical operations. Here's the upstream group
for our examples:

upstream backend {
    server web1 weight=6;
    server web2 weight=3;
    server web3;
}
Round Robin: Each server gets a percentage of the incoming requests that's equal
to its weight divided by the sum of the weights. In our example, out of every ten
requests web1 gets six (60%), web2 gets three (30%), and web3 gets one (10%).
Hash and IP Hash: Recall that without weights the hashing algorithm evenly divides
the set of all possible hash values into "buckets," one per server in the upstream
group. With weights, it instead sums the weights, divides the set of possible
hashes among that number of buckets, and associates each server with the number of
buckets equivalent to its weight. In our example, there are ten buckets, each with
10% of the possible hashes in it. Six buckets (60% of the possible hashes) are
associated with web1, three buckets (30%) with web2, and one bucket (10%) with
web3.
Least Connections and Least Time: We mentioned previously that even without
weights the algorithms are pretty effective at distributing workload across servers
according to their capacity. Setting weights improves their performance in this
regard even more. Recall that Least Connections and Least Time send each request to
the server with the lowest "score" (number of connections, or a mathematical
combination of connections and time, respectively). When you assign weights, the
load balancer divides each server's score by its weight, and again sends the
request to the server with the lowest value. Here's an example for Least
Connections with our sample weights and the indicated number of active connections:
web1: 600 active connections ÷ 6 = 100
web2: 400 active connections ÷ 3 = 133
web3: 125 active connections ÷ 1 = 125
web1's score of 100 is lowest, so it gets the request even though its connection
count of 600 is 1.5 times web2's and more than 4 times web3's.
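Expressed as configuration, that scenario is simply our weighted upstream group
with Least Connections enabled:

upstream backend {
    least_conn;
    server web1 weight=6;
    server web2 weight=3;
    server web3;
}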
Summary
After reviewing the pros and cons of the load-balancing techniques available in
NGINX Plus and NGINX, we consider Least Connections (and, for NGINX Plus, Least
Time) to be most suitable for the widest range of use cases. But it's important for
you to test several methods in your deployment, because its unique combination of
traffic and server characteristics might make another method better for you.

We'd love to hear about your experiences with load balancing in different use
cases. Please add them to the comments section below.
