ESA Endeca Load Balancing Best Practices
Background
Hardware vs. Software
Routing
Health Checks
Query Failover
Actively Managing the Load Balancing Services through Scripting
Port Balancing
Redundancy
References
Trademarks
Endeca, Endeca Information Access Platform, Endeca MDEX Engine, Guided Navigation, and
Find, Analyze, Understand are registered trademarks, and Endeca Data Foundry is a trademark of Endeca Technologies,
Inc. All other trademarks or registered trademarks contained herein are the property of their respective owners.
Endeca Solution Article Endeca Load Balancing Best Practices
Background
An Endeca-based application relies upon the availability of the MDEX Engine to service
user requests. If that MDEX Engine should be unavailable, then the Endeca portion of the
application will be unable to respond to queries. The MDEX Engine might be unavailable
or appear to be unavailable for any number of reasons, including hardware failure, an in-
process update of the MDEX Engine's indices, or, in extreme cases, very high load on a
given MDEX Engine. In addition, for high traffic sites, it may be necessary to have more
than one MDEX Engine to serve traffic. For these reasons, it is generally desirable to
implement multiple MDEX Engines for a given deployment, to ensure the highest levels
of availability and performance.
The MDEX Engine functions very similarly to a web server in terms of network traffic: It
simply accepts HTTP requests on a specified port, and returns results to the caller. This
behavior allows standard web load balancing techniques to be applied. In particular,
all of these techniques introduce a virtual IP address (VIP), which accepts requests from
the application server and routes each request to the MDEX Engine best suited to
handle it.
[Figure: the application server, through the Endeca API, sends an HTTP request to the
load balancer's VIP; the load balancer forwards the request to a specific MDEX Engine
IP and port.]
It is important to realize that the load balancing scheme described above is no different
than the solution most sites implement for balancing external traffic to application servers.
The configuration process should therefore be familiar. In many cases, if enough ports
are available, the same physical hardware can even be used, provided any firewalls do
not restrict this loop-back.
[Figure: browsers send HTTP requests to a VIP on one load balancer, which distributes
them across the application servers; the application servers in turn send Endeca
requests to a second VIP on a load balancer in front of the MDEX Engines.]
Endeca Confidential 2 of 7
Hardware vs. Software
Although software load balancers may present a less expensive option from an initial cost
perspective, this potential savings should be weighed against the cost of production
failures. Particularly for a commerce site, the cost of down-time resulting from load-
balancing failures can rapidly exceed the cost of hardware switches.
Endeca recommends using a hardware load balancer for managing traffic to the MDEX
Engines.
Routing
After selecting a load balancer, the next consideration is choosing an appropriate scheduling strategy for routing requests.
Most switches and software packages support three popular modes: random, round
robin, and least connections. When using “random”, individual requests are routed to
servers randomly. When using "round robin", requests are simply routed sequentially to
successive servers. And when using "least connections", requests are routed to the
server with the fewest open connections.
Generally the best routing algorithm for balancing traffic to MDEX Engines is to use the
"least connections" model. This takes into account the variability in the time it takes a
MDEX Engine to respond to a query, and thus balances requests in a slightly smarter
way. This algorithm, coupled with MDEX Engine multithreading, effectively combats
disruptive “rogue queries”. A rogue query is a query that takes far longer than the norm to
execute. Rogue queries, by consuming a MDEX Engine processing thread, cause any
queries in the queue to wait until the rogue query finishes. Thus, every query in the
queue behind the rogue query takes longer than the norm. Please note that different
load balancer vendors may name this algorithm differently.
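The idea behind "least connections" can be sketched in a few lines. This is an illustration only, not a product feature; the engine addresses are hypothetical, and a real load balancer also tracks health, timeouts, and weights:

```python
# Minimal least-connections scheduler sketch. Engine addresses are
# hypothetical placeholders.
open_conns = {"10.0.0.1:8000": 0, "10.0.0.2:8000": 0}

def pick_engine():
    # Route to the engine with the fewest in-flight requests.
    return min(open_conns, key=open_conns.get)

def dispatch(send_request):
    engine = pick_engine()
    open_conns[engine] += 1          # connection opened
    try:
        return send_request(engine)
    finally:
        open_conns[engine] -= 1      # connection closed
```

Note how a rogue query leaves one engine's count elevated for its duration, so new requests naturally drain to the other engines.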
Also consider enabling session affinity on the load balancer that directs application
server requests to the load balanced MDEX Engines. Session affinity, also known as
"sticky sessions," is a load balancer function that directs subsequent requests from each
unique session to the same MDEX Engine in the pool. Specifying session
affinity makes the utilization of the MDEX Engine cache more effective, which improves
performance of MDEX Engine access and the application server. This tends to be most
useful for applications that have complex queries to process, such as Analytics
applications.
Session affinity can increase the latency overhead of the load balancer as well as cause
uneven load across the MDEX Engines. Therefore, Endeca recommends testing the load
balanced environment for performance optimization. This will help to determine whether
the increased leverage from the MDEX cache is truly beneficial for the application.
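The effect of session affinity can be sketched as a stable hash from session ID to pool member. The pool addresses here are hypothetical, and real load balancers typically implement affinity with cookies or source-IP tables rather than a bare hash:

```python
import hashlib

ENGINES = ["10.0.0.1:8000", "10.0.0.1:9000"]  # hypothetical pool

def sticky_engine(session_id):
    # The same session ID always hashes to the same engine, so that
    # session's queries keep hitting a warm MDEX Engine cache.
    digest = hashlib.sha1(session_id.encode("utf-8")).hexdigest()
    return ENGINES[int(digest, 16) % len(ENGINES)]
```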
Finally, ensure that return traffic from the MDEX Engine to the client tier is transmitted
directly, and does not pass back through the load balancer hardware.
Health Checks
Load balancers are also typically able to perform "health" status checks on the various
MDEX Engine processes, by either making an HTTP request to the MDEX Engine or by
opening a connection to the TCP port to ensure that the MDEX Engine is listening and
alive. The MDEX Engine provides a health check URL to use for this purpose:
http://[host]:[port]/admin?op=ping
where [host] is the hostname or IP address of the MDEX Engine server, and [port] is the
port on which the MDEX Engine is running.
This URL will respond with a lightweight HTML page of the form:
dgraph [host]:[port] responding at [date/time]
where [date/time] is the time the request was served. The load balancer should be
configured to check for a response within an appropriate duration (such as two seconds),
a 200 OK HTTP status code, and optionally for the word "responding" in the resultant
page.
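A software probe equivalent to what the load balancer performs might look like the following sketch; the host and port are whatever the MDEX Engine in question runs on:

```python
import urllib.request

def mdex_is_healthy(host, port, timeout=2.0):
    """Probe /admin?op=ping: healthy means a 200 response arrives
    within the timeout and contains the word 'responding'."""
    url = "http://%s:%s/admin?op=ping" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and b"responding" in resp.read()
    except OSError:
        # Connection refused, timeout, or HTTP error: treat as down.
        return False
```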
Because these health check URLs execute within the MDEX Engine just like any other
request, they are subject to the MDEX Engine’s processing queue. This is beneficial - it
allows the health check to succeed or fail based on how well the MDEX Engine is
responding. For instance, the MDEX Engine may be running and accepting requests, but
subject to a large processing queue. In such a case, the health check may fail, because
the health check took too long to respond; it is appropriate for the MDEX Engine to be
removed from rotation in such a situation.
The MDEX Engine also supports TCP Layer 4 probes directed at the IP address/port of
the engine. These probes test that the dgraph is listening on the appropriate port, without
issuing a command to the dgraph (in turn causing the dgraph to do more work). The
configuration of Layer 4 probes is highly dependent on the brand and model of the load
balancer in use; furthermore, not all load balancers support Layer 4 probe capabilities.
Layer 4 probes may be used in conjunction with the URL-based health checks, if so
desired, but they are not necessary. Please note also that a Layer 4 probe will result in a
benign error being written to the MDEX Engine's request and error logs, leading to
unnecessarily verbose log files.
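For reference, a Layer 4 probe amounts to a plain TCP connect with a timeout, along these lines:

```python
import socket

def tcp_probe(host, port, timeout=2.0):
    # Succeeds if something is listening on host:port. No dgraph
    # command is issued, but the dropped connection is what the MDEX
    # Engine logs as a benign error.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```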
Query Failover
Upon failure of a health status check, the load balancer should be able to mark the port
(process) as unavailable and remove it from the set of ports that it is directing load to
(and preferably be able to alert on error - via a mechanism such as an SNMP trap). Most
load balancers will also support the ability to periodically attempt to re-check a port
marked as unavailable to see if it is once again available.
Of course, no matter how frequently the health check is performed (a good starting
interval is every 5 seconds), there is always a chance that the load balancer will send a
request to a MDEX Engine that is offline. For example, when an MDEX Engine has been
taken offline for a baseline update, a query could still be forwarded to that MDEX Engine
before a health check is able to mark the MDEX Engine offline.
In this case, most load balancers provide fail-over functionality that will resend this query
to a different MDEX Engine. For example, assume a load balancer is configured to send
queries to MDEX Engines A and B. The application layer submits a query to the load
balancer, which tries to forward the query to MDEX Engine A. In forwarding the query,
the load balancer attempts an HTTP connection to MDEX Engine A, and if that
connection fails (no acknowledgement from the MDEX Engine) the load balancer tries to
forward the query to MDEX Engine B. From the application (and end-user) perspective,
there is no noticeable difference: query processing continues uninterrupted.
The load balancer should be configured to re-route the request no more than twice,
resulting in a total maximum of three MDEX Engine requests per end user request. This
upper limit prevents truly malformed requests from generating excessive traffic to multiple
cluster members.
Actively Managing the Load Balancing Services through Scripting
To integrate load balancer management scripting into a baseline update, the Deployment
Template's Dgraph components can specify the name of a script to invoke prior to
shutdown and the name of a script to invoke after the component is started. These
optional attributes must specify the ID of a Script defined in the XML file(s). These
BeanShell scripts are executed just before the Dgraph is stopped or just after it is started.
</properties>
<log-dir>./logs/dgraphs/Dgraph1</log-dir>
<input-dir>./data/dgraphs/Dgraph1/dgraph_input</input-dir>
<update-dir>./data/dgraphs/Dgraph1/dgraph_input/updates</update-dir>
</dgraph>
See the Deployment Template Usage Guide for more information on integrating scripts
into the Deployment Template.
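Pieced together, a Dgraph component carrying such script hooks might look roughly like the following. The attribute names (pre-shutdown-script, post-startup-script) and the script IDs shown here are assumptions for illustration; consult the Deployment Template Usage Guide for the exact names supported by your version:

```xml
<dgraph id="Dgraph1" host-id="MDEXHost" port="8000"
        pre-shutdown-script="LoadBalancerRemoveScript"
        post-startup-script="LoadBalancerAddScript">
  <properties>
  </properties>
  <log-dir>./logs/dgraphs/Dgraph1</log-dir>
  <input-dir>./data/dgraphs/Dgraph1/dgraph_input</input-dir>
  <update-dir>./data/dgraphs/Dgraph1/dgraph_input/updates</update-dir>
</dgraph>
```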
Port Balancing
In some cases, multiple MDEX Engine processes are run on a single server. In this
case, it is important that the load balancer can be configured to use multiple ports from
the same IP address. For example, if a server (10.0.0.1) is running two MDEX Engine
processes (one on port 8000, and one on port 9000) the load balancer must be
configured to balance requests between 10.0.0.1:8000 and 10.0.0.1:9000. While most
hardware load balancers support this configuration, some are not able to support two
separate entries for the same IP address.
If your load balancer is unable to support this configuration, another option is to assign
multiple interfaces (IP addresses) to the same server. The MDEX Engine process will
then bind itself to each of these interfaces. (There is currently no way to override this
behavior.) For example, if Server A has two interfaces (10.0.0.1 and 10.0.0.2), then an
MDEX Engine process started on port 8000 listens on both 10.0.0.1:8000 and
10.0.0.2:8000. A second process started on port 9000 will listen on both
10.0.0.1:9000 and 10.0.0.2:9000. However, the load balancer can now be configured to
forward requests to 10.0.0.1:8000 and 10.0.0.2:9000, giving each process a unique IP entry.
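As a concrete software illustration of such a pool, an HAProxy backend balancing two MDEX Engine processes on one server might look like the sketch below; the backend and server names are hypothetical, and the same shape applies to hardware pool definitions:

```
backend mdex_pool
    balance leastconn
    option httpchk GET /admin?op=ping
    server dgraph1 10.0.0.1:8000 check inter 5000
    server dgraph2 10.0.0.1:9000 check inter 5000
```

This combines the earlier recommendations: least-connections routing, the ping-URL health check, and a 5-second check interval.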
Redundancy
Endeca typically recommends implementing two hardware switches configured
redundantly (either by network design or some other method). This ensures that the
load balancer itself is not a single point of failure.
[Figure: two redundant load balancers sit between the application server and the MDEX
Engines; the application server, through the Endeca API, sends HTTP requests to the
VIP, and the active load balancer forwards each request to a specific MDEX Engine IP
and port.]
References
For more information on general HTTP load balancing, see Chapter 20 of HTTP:
The Definitive Guide by David Gourley and Brian Totty, published by O’Reilly.