
Endeca Solution Article

Endeca Load Balancing Best Practices


By Mark Watkins, Dave Gourley, Jack Walter, Carrie Beaulieu
Last Updated: December 2010
Endeca Product Versions: All

Load balancers are the preferred solution for providing scalability, redundancy, and fail-over for MDEX Engine queries. This document introduces the topic and provides best practices for setting up a load balancer for MDEX Engines. It does not, however, provide specific load balancer configuration details. This document includes the following sections:

 Background
 Hardware vs. Software
 Routing
 Health Checks
 Query Failover
 Actively Managing the Load Balancing Services through Scripting
 Port Balancing
 Redundancy
 References

Copyright and Disclaimer


Product specifications are subject to change without notice and do not represent a commitment on the part of Endeca Technologies, Inc. (“Endeca”). Any software referenced in this document is furnished under a license agreement with Endeca. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Endeca.

Copyright © 2005-2011 Endeca Technologies Inc. All rights reserved.

Trademarks
Endeca, Endeca Information Access Platform, Endeca MDEX Engine, Guided Navigation, and Find, Analyze, Understand are registered trademarks, and Endeca Data Foundry is a trademark of Endeca Technologies, Inc. All other trademarks or registered trademarks contained herein are the property of their respective owners.

Background
An Endeca-based application relies upon the availability of the MDEX Engine to service
user requests. If that MDEX Engine should be unavailable, then the Endeca portion of the
application will be unable to respond to queries. The MDEX Engine might be unavailable
or appear to be unavailable for any number of reasons, including hardware failure, an in-
process update of the MDEX Engine's indices, or, in extreme cases, very high load on a
given MDEX Engine. In addition, for high traffic sites, it may be necessary to have more
than one MDEX Engine to serve traffic. For these reasons, it is generally desirable to
implement multiple MDEX Engines for a given deployment, to ensure the highest levels
of availability and performance.

The MDEX Engine functions very similarly to a web server in terms of network traffic: It
simply accepts HTTP requests on a specified port, and returns results to the caller. This
behavior allows for standard web load balancing techniques to be applied. In particular,
all of these techniques will introduce a Virtual IP address, which will accept requests from
the application server, and route the requests to the MDEX Engine it determines best
suited to handling the request.

[Figure: The application server's Endeca API issues an HTTP request to the virtual IP (VIP) on the load balancer, which forwards it to a specific Endeca MDEX Engine IP address and port.]

It is important to realize that the load balancing scheme described above is no different
than the solution most sites implement for balancing external traffic to application servers.
The configuration process should therefore be familiar. In many cases, if enough ports
are available, the same physical hardware can even be used, provided any firewalls do
not restrict this loop-back.

[Figure: An Endeca application with two tiers of load balancing. Browsers send HTTP requests to a VIP on an external load balancer, which routes them to the application servers; the application servers in turn send HTTP requests to a VIP on an internal load balancer, which routes them to the Endeca MDEX Engines.]
Endeca Confidential 2 of 7

Hardware vs. Software


The first option in designing a load balancing strategy is deciding whether to implement a
hardware or software solution. There are a variety of hardware load balancing switches
on the market today, including those made by F5, Cisco, and other manufacturers.

Although software load balancers may present a less expensive option from an initial cost
perspective, this potential savings should be weighed against the cost of production
failures. Particularly for a commerce site, the cost of down-time resulting from load-
balancing failures can rapidly exceed the cost of hardware switches.

Endeca recommends using a hardware load balancer for managing traffic to the MDEX
Engines.

Routing
The second option is selecting an appropriate scheduling strategy for routing requests.
Most switches and software packages support three popular modes: random, round
robin, and least connections. When using “random”, individual requests are routed to
servers randomly. When using "round robin", requests are simply routed sequentially to
successive servers. And when using "least connections", requests are routed to the
server with the least open connection requests.

Generally, the best routing algorithm for balancing traffic to MDEX Engines is the "least connections" model. This takes into account the variability in the time it takes an MDEX Engine to respond to a query, and thus balances requests in a slightly smarter way. This algorithm, coupled with MDEX Engine multithreading, effectively combats disruptive “rogue queries”. A rogue query is a query that takes far longer than the norm to execute. Rogue queries, by consuming an MDEX Engine processing thread, cause any queries in the queue to wait until the rogue query finishes. Thus, every query in the queue behind the rogue query takes longer than the norm. Please note that different load balancer vendors may name this algorithm differently.
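As a sketch, the "least connections" decision amounts to tracking in-flight requests per backend and always picking the minimum. The backend addresses and the counter structure below are purely illustrative, not any vendor's API:

```python
# Illustrative sketch of "least connections" routing: each incoming request
# goes to the backend MDEX Engine with the fewest requests in flight.

class LeastConnectionsBalancer:
    def __init__(self, backends):
        # Map each backend (a host:port string) to its open-connection count.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        """Pick the backend with the fewest in-flight requests."""
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Call when a request completes, freeing the connection slot."""
        self.active[backend] -= 1

balancer = LeastConnectionsBalancer(["10.0.0.1:8000", "10.0.0.2:8000"])
first = balancer.acquire()   # both idle, so either backend is chosen
second = balancer.acquire()  # the other backend, which now has fewer connections
```

A slow "rogue query" keeps its backend's count elevated, so subsequent requests naturally drain to the other engines until it completes.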

Also consider enabling session affinity on the load balancer that directs application server requests to the load-balanced MDEX Engines. Session affinity, also known as “sticky sessions,” is the function of the load balancer that directs subsequent requests from each unique session to the same MDEX Engine in the load balancer pool. Specifying session affinity makes the utilization of the MDEX Engine cache more effective, which improves performance of MDEX Engine access and the application server. This tends to be most useful for applications that have complex queries to process, such as Analytics applications.

Session affinity can increase the latency overhead of the load balancer as well as cause
uneven load across the MDEX Engines. Therefore, Endeca recommends testing the load
balanced environment for performance optimization. This will help to determine whether
the increased leverage from the MDEX cache is truly beneficial for the application.
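The effect of session affinity can be sketched as a stable mapping from session ID to engine. Real load balancers typically track sessions via cookies or source IP; the hashing approach and the engine addresses below are illustrative only:

```python
import hashlib

# Hypothetical sketch of "sticky session" routing: a stable hash of the
# session ID always maps the same session to the same MDEX Engine, so
# repeated queries from one user hit the same engine's warm cache.
ENGINES = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]  # example pool

def engine_for_session(session_id):
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    return ENGINES[int(digest, 16) % len(ENGINES)]
```

Note the trade-off described above: a hash-based mapping ignores current load, which is exactly why a sticky configuration should be load-tested before adoption.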

Finally, ensure that return traffic from MDEX to the client tier is directly transmitted, and
does not pass back through the load balancer hardware.

Health Checks


Load balancers are also typically able to perform "health" status checks on the various
MDEX Engine processes, by either making an HTTP request to the MDEX Engine or by
opening a connection to the TCP port to ensure that the MDEX Engine is listening and
alive. The MDEX Engine provides a health check URL to use for this purpose:

http://[host]:[port]/admin?op=ping

where [host] is the hostname or IP address of the MDEX Engine server, and [port] is the
port on which the MDEX Engine is running.

This URL will respond with a lightweight HTML page of the form:

dgraph [host]:[port] responding at [date/time]

where [date/time] is the time the request was served. The load balancer should be
configured to check for a response within an appropriate duration (such as two seconds),
a 200 OK HTTP status code, and optionally for the word “responding” in the resultant
page.
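A minimal sketch of such a health check, using only the Python standard library; the host, port, and two-second timeout are placeholders to be adapted to the deployment:

```python
import urllib.request

# Sketch of the health check a load balancer performs against the MDEX
# Engine's ping URL: require a timely 200 response and, optionally, the
# word "responding" in the body.
def mdex_is_healthy(host, port, timeout=2.0):
    url = "http://%s:%s/admin?op=ping" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and "responding" in body
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as unhealthy.
        return False
```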

Because these health check URLs execute within the MDEX Engine just like any other
request, they are subject to the MDEX Engine’s processing queue. This is beneficial - it
allows the health check to succeed or fail based on how well the MDEX Engine is
responding. For instance, the MDEX Engine may be running and accepting requests, but
subject to a large processing queue. In such a case, the health check may fail, because
the health check took too long to respond; it is appropriate for the MDEX Engine to be
removed from rotation in such a situation.

The MDEX Engine also supports TCP Layer 4 probes directed at the IP address/port of
the engine. These probes test that the dgraph is listening on the appropriate port, without
issuing a command to the dgraph (in turn causing the dgraph to do more work). The
configuration of Layer 4 probes is highly dependent on the brand and model of the load
balancer in use; furthermore, not all load balancers support Layer 4 probe capabilities.

Layer 4 probes may be used in conjunction with the URL-based health checks, if so
desired, but they are not necessary. Please note also that a layer 4 probe will result in a
benign error being written to the MDEX Engine’s request and error logs, leading to
unnecessarily verbose log files. The error should look similar to:

ERROR 09/14/10 17:58:13.404 UTC DGRAPH {dgraph}: Error in Transaction::Read: Remote client closed connection before read completed.
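A Layer 4 probe amounts to a bare TCP connect that is closed immediately; a minimal sketch (addresses are placeholders):

```python
import socket

# Sketch of a Layer 4 probe: open, then immediately close, a TCP
# connection to the dgraph's port. This verifies the process is listening
# without issuing an HTTP command, though as noted above the abrupt close
# logs a benign read error on the dgraph side.
def tcp_probe(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```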

Query Failover
Upon failure of a health status check, the load balancer should be able to mark the port
(process) as unavailable and remove it from the set of ports that it is directing load to
(and preferably be able to alert on error - via a mechanism such as an SNMP trap). Most
load balancers will also support the ability to periodically attempt to re-check a port
marked as unavailable to see if it is once again available.

Of course, no matter how frequently the health check is performed (a good starting interval is every 5 seconds), there is always a chance that the load balancer will send a request to an MDEX Engine that is offline. For example, when an MDEX Engine has been taken offline for a baseline update, a query could still be forwarded to that MDEX Engine before a health check is able to mark the MDEX Engine offline.

In this case, most load balancers provide fail-over functionality that will resend the query to a different MDEX Engine. For example, assume a load balancer is configured to send queries to MDEX Engines A and B. The application layer submits a query to the load balancer, which tries to forward the query to MDEX Engine A. In forwarding the query, the load balancer attempts an HTTP connection to MDEX Engine A; if that connection fails (no acknowledgement from the MDEX Engine), the load balancer forwards the query to MDEX Engine B instead. From the application (and end-user) perspective, there is no noticeable difference, and query processing continues uninterrupted.

The load balancer should be configured to re-route the request no more than twice,
resulting in a total maximum of three MDEX Engine requests per end user request. This
upper limit prevents truly malformed requests from generating excessive traffic to multiple
cluster members.

Re-routing of requests should be reserved for connection errors and timeouts. Other errors, including 404 errors, should not cause re-routing or count as a failed health check. The MDEX Engine responds with a 404 error when the user’s query references a dimension value ID that is unknown to the MDEX Engine. In such a situation, re-routing the query will offer no benefit to the end user, and marking the MDEX Engine as unavailable will be detrimental, since it is responding correctly.
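The policy above (retry only on connection errors and timeouts, never on HTTP errors such as 404, with at most three total attempts) can be sketched as follows; the engine addresses are illustrative:

```python
import socket
import urllib.error
import urllib.request

# Sketch of the re-routing policy: retry a query on connection failures
# and timeouts only, across at most three engines. HTTP errors such as
# 404 are returned to the caller, because the engine is responding
# correctly and re-routing offers no benefit.
ENGINES = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]

def query_with_failover(path, timeout=5.0, max_attempts=3):
    last_error = None
    for engine in ENGINES[:max_attempts]:
        url = "http://%s%s" % (engine, path)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The engine answered with an HTTP error: do not re-route.
            raise
        except (urllib.error.URLError, socket.timeout) as exc:
            last_error = exc  # connection failure: try the next engine
    raise last_error
```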

Another race condition to be aware of is when restarting multiple MDEX Engines in series. In a configuration with two mirrored MDEX Engines, assume the first MDEX Engine is stopped, marked as unavailable by the load balancer, and then restarted. If the second MDEX Engine is stopped before the load balancer re-checks the first MDEX Engine and marks it available, this can lead to a broken response to the web server: the first MDEX Engine is online, but still marked offline at the load balancer, and the second MDEX Engine is now offline as well. The common solution for this scenario is to either actively manage the load balancer's services (explicitly marking each MDEX Engine offline or online from the Deployment Template script) or to introduce a wait period between the two updates. See the next section for more detail on actively managing the services.

Actively Managing the Load Balancing Services through Scripting


Many load balancers (Cisco for example) support a scripting interface to let you
programmatically disable and enable services. This allows you the option of proactively
disabling an MDEX Engine in the load balancer before shutting the MDEX Engine down
for a baseline update refresh.

In order to easily integrate this scripting into a baseline update, the Deployment
Template’s Dgraph components can specify the name of a script to invoke prior to
shutdown and the name of a script to invoke after the component is started. These
optional attributes must specify the ID of a Script defined in the XML file(s). These
BeanShell scripts are executed just before the Dgraph is stopped or just after it is started.

This functionality is typically used to implement calls to a load balancer, adding or removing a Dgraph from the cluster as it is updated:

<dgraph id="Dgraph1" host-id="MDEXHost" port="15000"
        pre-shutdown-script="DgraphPreShutdownScript"
        post-startup-script="DgraphPostStartupScript">
  <properties>
    <property name="restartGroup" value="A" />
  </properties>
  <log-dir>./logs/dgraphs/Dgraph1</log-dir>
  <input-dir>./data/dgraphs/Dgraph1/dgraph_input</input-dir>
  <update-dir>./data/dgraphs/Dgraph1/dgraph_input/updates</update-dir>
</dgraph>

See the Deployment Template Usage Guide for more information on integrating scripts
into the Deployment Template.

Port Balancing
In some cases, multiple MDEX Engine processes are run on a single server. In this
case, it is important that the load balancer can be configured to use multiple ports from
the same IP address. For example, if a server (10.0.0.1) is running two MDEX Engine
processes (one on port 8000, and one on port 9000) the load balancer must be
configured to balance requests between 10.0.0.1:8000 and 10.0.0.1:9000. While most
hardware load balancers support this configuration, some are not able to support two
separate entries for the same IP address.

If your load balancer is unable to support this configuration, another option is to assign multiple interfaces (IP addresses) to the same server. The MDEX Engine process will then bind itself to each of these interfaces. (There is currently no way to override this behavior.) For example, if Server A has two interfaces (10.0.0.1 and 10.0.0.2), then an MDEX Engine process started on port 8000 is listening on both 10.0.0.1:8000 and 10.0.0.2:8000. A second process started on port 9000 will be listening on both 10.0.0.1:9000 and 10.0.0.2:9000. The load balancer can then be configured to forward requests to 10.0.0.1:8000 and 10.0.0.2:9000, treating them as two distinct servers.
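Either way, from the load balancer's point of view each pool member is simply an (IP, port) pair; a minimal sketch over the two example processes, shown with plain round-robin rotation for brevity:

```python
import itertools

# Illustrative sketch: a pool member is an (IP, port) pair, so two MDEX
# Engine processes on one server, or one process seen through two
# interfaces, are just distinct pool entries. Addresses are the examples
# from the text.
pool = [("10.0.0.1", 8000), ("10.0.0.2", 9000)]
rotation = itertools.cycle(pool)

def next_backend():
    """Return the next pool member in round-robin order."""
    return next(rotation)
```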

Redundancy
Endeca typically recommends implementing two hardware switches configured redundantly (either by network design or some other method). This ensures that the load balancer itself is not a single point of failure.

[Figure: The application server's Endeca API sends HTTP requests to a VIP served by a redundant pair of load balancers; either load balancer can forward each request to a specific Endeca MDEX Engine IP address and port.]

References

 “Deployment Template Usage Guide” (EDeN)

 “Disabling and Enabling a F5 Load Balancer Node” (EDeN)


 “Managing Services for a Cisco CSS Load Balancer” (EDeN)

 For more information on general HTTP load balancing, see Chapter 20 of HTTP:
The Definitive Guide by David Gourley and Brian Totty, published by O’Reilly.

