
Linux High Availability Cluster Selection

Tim Burke

Which cluster product is right for me?

There is no one-size-fits-all winner

Rapidly evolving marketplace

The good news: There is a lot to choose from

The bad news: There is a lot to choose from

Strategy - be an informed consumer

Selection Process / Presentation Outline

Identify target applications - usage model

Identify required cluster feature set

Open source vs proprietary, product vs project

Cost factors

Vendor evaluation

OEM & ISV endorsements

Identify Target Applications

Clustering Categories

High Availability Clusters



Off the shelf applications

Load Balancing Clusters

Dispatching web traffic

High Performance Computing

Large computational problems

High Performance Computing

HPC, HPTC cluster attributes

1. Large number of systems working together to solve a common problem - scalability
2. Performance, not reliability, is of utmost importance
3. Requires custom parallelized applications
4. Tends to be bleeding edge, early adopters
5. Example deployments: genetics, pharmaceutical, weather, seismic analysis

Load Balancing Clusters

Front end dispatching node (or 2 for


Pool of inexpensive back end servers

Redirect transactions so no 1 system is


Balancing algorithms: round robin, weighted, load based

Typically used for web server traffic (Apache front end)

Useful for static content

Not applicable for dynamic content
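
The three balancing algorithms named on this slide can be sketched roughly as follows. This is a minimal illustration, not any product's dispatcher: the `Backend` class, its fields, and the function names are assumptions made for the example.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical back end server record used only for this sketch."""
    name: str
    weight: int = 1           # relative capacity, used by the weighted policy
    active_requests: int = 0  # current load, used by the load based policy


def round_robin(backends):
    """Return an endless iterator that hands out backends in fixed rotation."""
    return itertools.cycle(backends)


def weighted_choice(backends, rng=random):
    """Pick a backend at random, with probability proportional to its weight."""
    return rng.choices(backends, weights=[b.weight for b in backends], k=1)[0]


def least_loaded(backends):
    """Pick the backend currently handling the fewest requests."""
    return min(backends, key=lambda b: b.active_requests)
```

A dispatcher would call `next()` on the round robin iterator, or one of the two selection functions, once per incoming request.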

High Availability Clusters

The need for high availability (HA)

Overview of high availability features

Reliability, Availability, Serviceability

Users & businesses have high expectations

1. Reliability - high degree of protection for corporate data. Information is a crucial business asset.
2. Availability - near continuous data access
3. Serviceability - procedures to correct problems with minimal business impact

Sources of Downtime
The Standish Group - 2001

[Chart: sources of downtime. Legend fragments as extracted, some truncated: application bug or ..., hardware failure, database error, main-server system ..., operator error, other server's hardware failure, other server's system bug, environmental conditions, planned outage. Percentages not recoverable.]

Downtime Costs - The Standish Group

[Chart: cost per minute of downtime (dollars). Values not recoverable.]

No Single Point of Failure (NSPF)

Hardware Redundancy - increased overall reliability and availability

1. Multiple paths between systems
2. Storage - mirrored, RAID5
3. Multiple power sources
4. Multiple external networks

High Availability Clusters

Redundancy for fault


Failover - if 1 node shuts down or fails, another node takes over application load

Facilitates planned

Involves selecting a target node & moving resources - failover policies

Example resource types

1. Physical disk ownership
2. Filesystems
3. Applications
4. Databases
5. IP addresses
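
As a rough sketch of what "selecting a target node & moving resources" can look like, the example below applies one possible failover policy: use a preferred node if it is up, otherwise the survivor that owns the fewest resources. The `Node` class, the policy, and the resource names are assumptions for illustration and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member used only for this sketch."""
    name: str
    up: bool = True
    resources: list = field(default_factory=list)


def pick_target(nodes, failed, preferred=None):
    """Failover policy: the preferred node if it is up, otherwise the
    surviving node that currently owns the fewest resources."""
    survivors = [n for n in nodes if n.up and n is not failed]
    if not survivors:
        raise RuntimeError("no surviving node available for takeover")
    for n in survivors:
        if n.name == preferred:
            return n
    return min(survivors, key=lambda n: len(n.resources))


def fail_over(nodes, failed, preferred=None):
    """Mark `failed` as down and move all of its resources to the target."""
    failed.up = False
    target = pick_target(nodes, failed, preferred)
    # Resources are moved in list order here. A real cluster would bring
    # them up in dependency order, e.g. disk before filesystem before
    # application.
    target.resources.extend(failed.resources)
    failed.resources = []
    return target
```

For example, with nodes `a` (three resources), `b` (one resource) and `c` (none), `fail_over([a, b, c], a)` moves everything from `a` to `c`.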
Failover Configurations

Active / Passive

1 node runs application(s)

Other node on standby for takeover

Idle node can take over with no performance degradation

Active / Active

All nodes actively running application(s)

Workload moves to survivor on failure

Effectively utilizes capacity (TCO)

Data Integrity Provisions

Crucial for safe failover of data centric services (filesystem /


In failure scenarios (e.g. a hung node), ensure the failed node cannot access storage - I/O Barriers, I/O Fencing

Lack of I/O Fencing can result in

Loss of data (backups?)

System crashes

Common mechanisms

Power switches

SCSI reservations

Watchdog timers
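
The ordering that I/O fencing implies can be sketched as "fence first, take over second". The example below is a minimal illustration under that assumption; `fence` and `start_services` are hypothetical callables standing in for whatever mechanism (power switch, SCSI reservation, watchdog) and service manager a real cluster uses.

```python
class FencingError(Exception):
    """Raised when the failed node could not be cut off from storage."""


def take_over(failed_node, fence, start_services):
    """Fence `failed_node` first; start its services only if that succeeded.

    `fence(node)` is assumed to return True once the node can no longer
    issue I/O (for example after a power switch has cycled it).
    `start_services(node)` is assumed to restart that node's services on
    the local node and return its result.
    """
    if not fence(failed_node):
        # Without a confirmed fence the hung node might still write to
        # shared storage, so refusing the takeover is the safe choice.
        raise FencingError(f"could not fence {failed_node}; takeover aborted")
    return start_services(failed_node)
```

The point of the sketch is the ordering: services are never started on the survivor until the failed node is known to be off the storage.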
Application Monitoring

All HA clusters monitor node state

Most monitor key cluster resources - network, disk

Many monitor application health

Process existence

Application check scripts

HTTP get on web server

Record retrieval on database

Filesystem directory listing
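
A few of the checks listed above can be sketched with the Python standard library alone. This is an illustrative sketch, not any vendor's check script: the function names and the timeout value are assumptions, and the database record retrieval check is omitted because it depends on the specific database driver.

```python
import os
import urllib.error
import urllib.request


def process_exists(pid):
    """Process existence check: signal 0 probes for the PID without
    delivering a signal (POSIX only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def http_get_ok(url, timeout=5.0):
    """HTTP GET check: healthy if the server answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def directory_listable(path):
    """Filesystem check: healthy if the directory can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False
```

A monitoring loop would run such checks periodically and treat repeated failures as grounds for restarting or failing over the application.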

Failover Times

Don't get too hung up on this

Remember that data integrity is paramount

Quoted failover times only include cluster overhead; they don't include application recovery

Application startup time

Filesystem consistency checks

Database recovery - transaction replay


Product literature cites 5 second failover time

Can be several minutes for database recovery (size & activity
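
The gap between a quoted failover time and the time users actually wait can be shown with simple addition. In the sketch below only the 5 second cluster overhead comes from the slide; the other durations are made-up placeholders chosen purely for illustration.

```python
def total_failover_seconds(cluster_overhead, app_startup, fs_check, db_recovery):
    """Time until service is restored: cluster overhead plus the
    application recovery steps that quoted figures leave out."""
    return cluster_overhead + app_startup + fs_check + db_recovery


# Hypothetical durations in seconds, except the quoted 5 second overhead.
example_total = total_failover_seconds(
    cluster_overhead=5,  # figure cited in product literature
    app_startup=20,      # assumed
    fs_check=30,         # assumed
    db_recovery=180,     # assumed transaction replay time
)
```

With these assumed numbers the total is 235 seconds, of which the quoted cluster overhead is only a small fraction.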

Open Source vs Proprietary
Project vs Product

Open source facilitates self-support &


Support is a key determinant

Products are generally well tested

Some products are also open source

If you care enough about high availability & solution stacks, you're likely to go the product route

Heterogeneous HA Products

Proprietary offerings that run on Linux, W2K,


Unifies user training

May compromise flexibility, adaptability or data integrity (ouch!)

Some are Linux products with GUIs that run on other platforms

Virtually none allow heterogeneous platforms within the same cluster

Cost Factors

Beware of hidden charges

Product base fee

Application specific charges (Oracle, DB2, NFS, etc)


Some only come with bundled service offerings

Hardware requirements

Proprietary UNIX offerings typically cost several times more

Vendor Evaluation

Company vision - do their cluster offerings complement or distract? Future roadmap.

Financial Stability

Ability to impact the marketplace

Responsiveness - ability to provide ongoing feature enhancements

Proprietary vs open source

Product integration - fit with distribution, kernel patches, compatibility & support implications

New Linux technology vs large monolithic legacy ports

How long it's been on the market

Open Source Projects

FailSafe - from SGI & SuSE

Optional data integrity provisions (power switch)

Supports 16 nodes

Good set of application kits

Red Hat Cluster Manager

Also offered as a product

Described later in presentation

HA Cluster Product Comparisons

The ground rules

Trying to remain objective

Highlight product strengths

Listed in alphabetical order

Based on web site content as of 10/2002

HP - MC/Serviceguard

Proprietary - Ported from HP/UX

Only supported on HP hardware

Dynamic online addition/removal of members

Worldwide support services

Quorum voting membership

Up to 8 nodes using FibreChannel storage, 2 nodes using SCSI

Compaq Alpha line targeted at HPC clusters
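
"Quorum voting membership" is commonly built on a majority rule: a group of nodes may run services only while it holds more than half of the configured votes. The sketch below shows that generic rule only; it is an assumption for illustration and not a description of how MC/Serviceguard implements membership.

```python
def has_quorum(votes_present, total_votes):
    """Majority rule: True only if this partition holds more than half
    of all configured votes."""
    return 2 * votes_present > total_votes
```

Under this rule an even split (for example 1 of 2 votes) leaves neither side with quorum, which is why two-node clusters generally need some additional tie-breaking mechanism.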

Legato - Availability Manager


Heterogeneous (Linux, W2K, Solaris, HP-UX)

Strong data centric services

Well integrated with SAN environments


Storage management, volume management, backup

Application monitoring

Extensive set of application specific modules

PolyServe - Application Manager


Application monitoring

Up to 16 nodes

Multiple platforms - Linux, W2K, Solaris

Doesn't require shared storage

Dynamic member addition/removal

Centralized management

PolyServe - Matrix Server

Tailored for Oracle 9i Real Application Clusters

Concurrent read + write access to data on shared storage SAN

Cluster filesystem with lock manager + distributed cache

Allows incremental growth by adding servers +


Red Hat - Cluster Manager

Bundled with RHL Advanced Server 2.1

Both open source & product

Data integrity provisions

Power switches (optional)

Watchdog timer software

Application monitoring

Heterogeneous fileserving via NFS + Samba

Web monitoring GUI

Also integrated Piranha load balancing cluster

Steeleye - LifeKeeper

Proprietary - UNIX port

Multi-platform - Linux, W2K

Wide set of application kits (separately


Established OEM relationships

Data integrity provisions - via SCSI reservations, requiring kernel patches

Application monitoring

Focusing on HPC

Rackmounted Intel servers

Custom solutions

(older) xCAT software for management, parallel operations, and installation

(newer) Cluster Systems Management (CSM) for Linux

Remote monitoring, resets, BIOS console

Parallel shell

Requires IBM hardware for embedded service processor

High Availability via partnering

Veritas Cluster Server

Recent Linux port

16 nodes, wide range of supported apps

Also runs on Windows, AIX, UNIX, Solaris

Integrates with their storage offerings (volume management, backup, data replication)

Other Vendors


Strategic partnering for HA software

Penguin Computing

HPC offering via partnership with Scyld Beowulf

Consolidated Solutions


BladeFrame hardware, backplane eliminates cabling

Management software, HA, provisioning

Linux NetworX

Turnkey solution, preintegrated hardware + management tools

Custom hardware, dense racks


Know what category of cluster is right for you

Be knowledgeable of required cluster features

Weigh your cost criteria

Choose a vendor you can trust to safeguard your corporate assets

Be wary of marketing collateral