

Linux High Availability Cluster Selection


Tim Burke

tburke@redhat.com
1 1 1
Which cluster product is right for me?
There is no one-size-fits-all winner

Rapidly evolving marketplace

The good news: There is a lot to choose from

The bad news: There is a lot to choose from
Strategy - be an informed consumer
1 1 1
Selection Process / Presentation Outline
Identify target applications - usage model
Identify required cluster feature set
Open source vs proprietary, product vs project
Cost factors
Vendor evaluation
OEM & ISV endorsements
1 1 1
Identify Target Applications
Clustering Categories
High Availability Clusters
Database
Fileservers
Off the shelf applications
Load Balancing Clusters
Dispatching web traffic
High Performance Computing
Large computational problems
1 1 1
High Performance Computing
HPC, HPTC cluster attributes
1. Large # of systems working together to
solve a common problem - scalability
2. Performance, not reliability, is of utmost
importance
3. Requires custom parallelized applications
4. Tends to be bleeding edge, early adopters
5. Example deployments: genetics,
pharmaceutical, weather, seismic analysis,
modeling
1 1 1
Load Balancing Clusters
Front end dispatching node (or 2 for
redundancy)
Pool of inexpensive back end servers
Redirect transactions so no 1 system is
overloaded
Balancing algorithms: round robin,
weighted, load based
Typically used for web server traffic
(Apache front end)
Useful for static content
Not applicable for dynamic content
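A minimal sketch of the three balancing algorithms named on this slide. The backend names, weights, and load figures are made up for illustration; a real dispatcher would obtain load from the back end servers themselves.

```python
import itertools

def round_robin(backends):
    """Hand out backends in a fixed rotating order."""
    return itertools.cycle(backends)

def weighted_round_robin(weights):
    """Hand out backends in proportion to integer weights, e.g. {"web1": 2, "web2": 1}."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)

def least_loaded(current_load):
    """Load based: pick the backend reporting the smallest load."""
    return min(current_load, key=current_load.get)

# Hypothetical pool of back end servers.
rr = round_robin(["web1", "web2", "web3"])
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"web1": 2, "web2": 1})
first_six = [next(wrr) for _ in range(6)]

choice = least_loaded({"web1": 0.9, "web2": 0.2, "web3": 0.5})
```

Round robin ignores server capacity, weighted round robin encodes it statically, and the load based choice reacts to current conditions at the cost of collecting load data.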
1 1 1
High Availability Clusters
The need for high availability (HA)
Overview of high availability features
1 1 1
Reliability, Availability, Serviceability
(RAS)
Users & businesses have high expectations
1. Reliability - high degree of protection for corporate
data. Information is a crucial business asset.
2. Availability - near continuous data access
3. Serviceability - procedures to correct problems with
minimal business impact
1 1 1
Sources of Downtime
The Standish Group - 2001
Application bug or error
Main-system hardware failure
Database error
Main-server system bug
Network
Operator error
Other server's hardware failure
Other server's system bug
Environmental conditions
Planned outage
Other
1 1 1
Downtime Costs - The Standish Group
[Bar chart: cost per minute of downtime in dollars (axis 0 to 13,000) for Enterprise resource planning (ERP), supply chain management, e-commerce, Internet banking, customer service center, messaging]
1 1 1
No Single Point of Failure (NSPF)
Hardware Redundancy - increased overall
reliability and availability
1. Multiple paths between systems
2. Storage - mirrored, RAID5
3. Multiple power sources
4. Multiple external networks
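As a rough illustration of why redundancy raises availability: if component failures are assumed to be independent (a simplification, since shared power or firmware can break that assumption), the chance that every redundant copy is down at once shrinks geometrically. The 0.99 figure is a made-up example value.

```python
def parallel_availability(component_availability, copies):
    """Availability of `copies` redundant components, assuming independent failures."""
    if copies < 1:
        raise ValueError("need at least one component")
    return 1.0 - (1.0 - component_availability) ** copies

# Hypothetical component that is up 99% of the time.
single = parallel_availability(0.99, 1)
mirrored = parallel_availability(0.99, 2)
```

Under these assumptions a mirrored pair of 99% components is unavailable only when both fail together, about 0.01% of the time.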
1 1 1
High Availability Clusters
Redundancy for fault
tolerance
Failover - if 1 node shuts
down or fails, another node
takes over application load
Facilitates planned
maintenance
1 1 1
Failover
Involves selecting a target node & moving
resources - failover policies
Example resource types
1. Physical disk ownership
2. Filesystems
3. Applications
4. Databases
5. IP addresses
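A sketch of what a failover policy might look like, assuming a simple "lowest priority value wins" rule and a fixed start order for dependent resources. The `Node` class, the priority field, and `START_ORDER` are hypothetical and not taken from any product discussed here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    up: bool = True
    priority: int = 0  # lower value = preferred failover target (assumed policy)
    resources: list = field(default_factory=list)

# Assumed dependency order: storage first, client-facing IP address last.
START_ORDER = ["disk", "filesystem", "database", "application", "ip_address"]

def select_target(nodes, failed):
    """Pick the preferred surviving node; break ties by fewest resources."""
    candidates = [n for n in nodes if n.up and n is not failed]
    if not candidates:
        raise RuntimeError("no surviving node available")
    return min(candidates, key=lambda n: (n.priority, len(n.resources)))

def fail_over(nodes, failed):
    """Mark the node down and move its resources to the target in start order."""
    failed.up = False
    target = select_target(nodes, failed)
    moved = sorted(failed.resources, key=START_ORDER.index)
    failed.resources = []
    target.resources.extend(moved)
    return target, moved

node1 = Node("node1", priority=0,
             resources=["ip_address", "database", "disk", "filesystem"])
node2 = Node("node2", priority=1)
node3 = Node("node3", priority=2)
target, moved = fail_over([node1, node2, node3], node1)
```

Bringing up the IP address last means clients cannot reach the service until the disk, filesystem, and database underneath it are ready.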
1 1 1
Failover Configurations
Active / Passive
1 node runs application(s)
Other node on standby for takeover
Idle node can take over with no performance degradation
Active / Active
All nodes actively running application(s)
Workload moves to survivor on failure
Effectively utilizes capacity (TCO)
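The capacity trade-off can be put in numbers. This sketch assumes identical nodes and an evenly spread workload, which real clusters rarely have exactly.

```python
def max_safe_utilization(nodes, failures_tolerated=1):
    """Highest per-node utilization at which survivors can absorb the full load."""
    survivors = nodes - failures_tolerated
    if survivors < 1:
        raise ValueError("at least one node must survive")
    return survivors / nodes

two_node = max_safe_utilization(2)
four_node = max_safe_utilization(4)
```

A two-node active/active pair running each node above 50% will degrade after a failure, so the capacity gain over active/passive depends on how much degradation is acceptable.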
1 1 1
Data Integrity Provisions
Crucial for safe failover of data centric services (filesystem /
database)
In failure scenarios (e.g. a hung node), ensure the failed node cannot
access storage - I/O Barriers, I/O Fencing
Lack of I/O Fencing can result in
Loss of data (backups?)
System crashes
Common mechanisms
Power switches
SCSI reservations
Watchdog timers
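The ordering that matters here is "fence first, then take over". A minimal sketch, where `fence` stands in for whatever mechanism is used (power switch, SCSI reservation, watchdog) and both callables are hypothetical placeholders:

```python
class FencingError(Exception):
    """Raised when the failed node could not be isolated from shared storage."""

def take_over(failed_node, fence, start_services):
    """Start services only after the failed node is confirmed cut off."""
    if not fence(failed_node):
        # Without a confirmed fence, a hung node could wake up and write
        # to storage the survivor has already mounted.
        raise FencingError(f"could not fence {failed_node}; refusing takeover")
    return start_services()

started = []

def start_services():
    # Placeholder for mounting filesystems and starting the database.
    started.append("database")
    return started

take_over("node1", fence=lambda node: True, start_services=start_services)
```

If fencing fails, the safe choice in this sketch is to leave the service down rather than risk two nodes writing to the same storage.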
1 1 1
Application Monitoring
All HA clusters monitor node state
Most monitor key cluster resources - network, disk
Many monitor application health
Process existence
Application check scripts
HTTP get on web server
Record retrieval on database
Filesystem directory listing
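Application check scripts of the kinds listed above might look like the following sketch. It uses only the Python standard library; the function names are illustrative, the process probe is POSIX only, and a database record retrieval check is omitted because it depends on the database driver in use.

```python
import os
import urllib.error
import urllib.request

def process_exists(pid):
    """True if a process with this PID exists (POSIX signal 0 probe)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

def http_get_ok(url, timeout=5.0):
    """True if an HTTP GET on the web server succeeds within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False

def directory_listable(path):
    """True if the filesystem answers a directory listing."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False
```

Process existence is the weakest check, since a hung application still has a process; the HTTP and directory checks exercise the service itself.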
1 1 1
Failover Times
Don't get too hung up on this
Remember that data integrity is paramount
Quoted failover times include only cluster overhead, not
application recovery
Application startup time
Filesystem consistency checks
Database recovery - transaction replay
Example
Product literature cites 5 second failover time
Can be several minutes for database recovery (size & activity
dependent)
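To make the gap concrete, here is the arithmetic as a sketch. Only the 5 second cluster overhead comes from the example above; the other durations are invented placeholders that vary widely by site.

```python
def total_failover_seconds(cluster_overhead, app_startup, fsck, db_recovery):
    """Time until the service is usable again, not just until failover is initiated."""
    return cluster_overhead + app_startup + fsck + db_recovery

quoted = 5  # product literature figure from the example
actual = total_failover_seconds(
    cluster_overhead=5,   # from the example
    app_startup=20,       # assumed
    fsck=30,              # assumed filesystem consistency check
    db_recovery=180,      # assumed transaction replay
)
```

With these assumed values the service is back after 235 seconds, almost all of it application recovery rather than cluster overhead.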
1 1 1
Open Source vs Proprietary
Project vs Product
Open source facilitates self-support &
customization
Support is a key determinant
Products are generally well tested
Some products are also open source
If you care enough about high availability &
solution stacks, you're likely to go the product
route
1 1 1
Heterogeneous HA Products
Proprietary offerings that run on Linux, W2K,
UNIX
Unifies user training
May compromise flexibility, adaptability or data
integrity (ouch!)
Some are Linux products with GUIs that run on
other platforms
Virtually none allow heterogeneous platforms
within the same cluster
1 1 1
Cost Factors
Beware of hidden charges
Product base fee
Application specific charges (Oracle, DB2, NFS, etc)
Support
Some only come with bundled service offerings
Hardware requirements
Proprietary UNIX offerings typically cost several
times more
1 1 1
Vendor Evaluation
Company vision - do their cluster offerings complement it or
distract from it? Future roadmap.
Financial Stability
Ability to impact the marketplace
Responsiveness - ability to provide ongoing feature enhancements
Proprietary vs open source
Product integration - fit with distribution, kernel patches,
compatibility & support implications
New Linux technology vs large monolithic legacy ports
How long it's been on the market
1 1 1
Open Source Projects
FailSafe - from SGI & SuSE
Optional data integrity provisions (power switch)
Supports 16 nodes
Good set of application kits
Red Hat Cluster Manager
Also offered as a product
Described later in presentation
1 1 1
HA Cluster Product Comparisons
The ground rules
Trying to remain objective
Highlight product strengths
Listed in alphabetical order
Based on web site content as of 10/2002
1 1 1
HP - MC/Serviceguard
Proprietary - Ported from HP-UX
Only supported on HP hardware
Dynamic online addition/removal of members
Worldwide support services
Quorum voting membership
Up to 8 nodes using Fibre Channel storage, 2
nodes using SCSI
Compaq Alpha line targeted at HPC clusters
1 1 1
Legato - Availability Manager
Proprietary
Heterogeneous (Linux, W2K, Solaris, HP-UX)
Strong data centric services
Well integrated with SAN environments
Replication
Storage management, volume management, backup
Application monitoring
Extensive set of application specific modules
1 1 1
PolyServe - Application Manager
Proprietary
Application monitoring
Up to 16 nodes
Multiple platforms - Linux, W2K, Solaris
Doesn't require shared storage
Dynamic member addition/removal
Centralized management
1 1 1
PolyServe - Matrix Server
Tailored for Oracle 9i Real Application Clusters
Concurrent read + write access to data on shared
storage SAN
Cluster filesystem with lock manager +
distributed cache
Allows incremental growth by adding servers +
storage
Proprietary
1 1 1
Red Hat - Cluster Manager
Bundled with RHL Advanced Server 2.1
Both open source & product
Data integrity provisions
Power switches (optional)
Watchdog timer software
Application monitoring
Heterogeneous fileserving via NFS + Samba
Web monitoring GUI
Also integrated Piranha load balancing cluster
1 1 1
SteelEye - LifeKeeper
Proprietary - UNIX port
Multi-platform - Linux, W2K
Wide set of application kits (separately
purchased)
Established OEM relationships
Data integrity provisions - via SCSI reservations,
requiring kernel patches
Application monitoring
1 1 1
IBM
Focusing on HPC
Rackmounted Intel servers
Custom solutions
(older) xCAT software for management, parallel
operations, and installation
(newer) Cluster Systems Management (CSM) for Linux
Remote monitoring, resets, BIOS console
Parallel shell
Requires IBM hardware for embedded service processor
High Availability via partnering
1 1 1
Veritas Cluster Server
Recent Linux port
16 nodes, wide range of supported apps
Also runs on Windows, AIX, UNIX, Solaris
Integrates with their storage offerings (volume
management, backup, data replication)
Proprietary
1 1 1
Other Vendors
Dell
Strategic partnering for HA software
Penguin Computing
HPC offering via partnership with Scyld Beowulf
1 1 1
Consolidated Solutions
Egenera
BladeFrame hardware, backplane eliminates cabling
Management software, HA, provisioning
Linux NetworX
Turnkey solution, preintegrated hardware + management tools
Custom hardware, dense racks
1 1 1
Summary
Know what category of cluster is right for you
Be knowledgeable of required cluster features
Weigh your cost criteria
Choose a vendor you can trust to safeguard your
corporate assets
Be wary of marketing collateral
