tburke@redhat.com

Which cluster product is right for me?
There is no one-size-fits-all winner
Rapidly evolving marketplace
The good news: There is a lot to choose from
The bad news: There is a lot to choose from
Strategy - be an informed consumer

Selection Process / Presentation Outline
Identify target applications - usage model
Identify required cluster feature set
Open source vs proprietary, product vs project
Cost factors
Vendor evaluation
OEM & ISV endorsements

Identify Target Applications
Clustering categories
High Availability Clusters
  Database
  Fileservers
  Off the shelf applications
Load Balancing Clusters
  Dispatching web traffic
High Performance Computing
  Large computational problems

High Performance Computing
HPC, HPTC cluster attributes
1. Large # of systems working together to solve a common problem - scalability
2. Performance, not reliability, is of utmost importance
3. Requires custom parallelized applications
4. Tends to be bleeding edge, early adopters
5. Example deployments: genetics, pharmaceutical, weather, seismic analysis, modeling

Load Balancing Clusters
Front end dispatching node (or 2 for redundancy)
Pool of inexpensive back end servers
Redirect transactions so no 1 system is overloaded
Balancing algorithms: round robin, weighted, load based
Typically used for web server traffic (Apache front end)
Useful for static content
Not applicable for dynamic content

High Availability Clusters
The need for high availability (HA)
Overview of high availability features

Reliability, Availability, Serviceability (RAS)
Users & businesses have high expectations
1. Reliability - high degree of protection for corporate data. Information is a crucial business asset.
2. Availability - near continuous data access
3. Serviceability - procedures to correct problems with minimal business impact

Sources of Downtime - The Standish Group, 2001
[Chart: causes of downtime by category]
Application bug or error
Main-system hardware failure
Database error
Main-server system bug
Network
Operator error
Other server's hardware failure
Other server's system bug
Environmental conditions
Planned outage
Other

Downtime Costs - The Standish Group
[Chart: cost per minute of downtime (dollars), by application type]
Enterprise resource planning (ERP)
Supply chain management
E-commerce
Internet banking
Customer service center
Messaging

No Single Point of Failure (NSPF)
Hardware redundancy - increased overall reliability and availability
1. Multiple paths between systems
2. Storage - mirrored, RAID5
3. Multiple power sources
4. Multiple external networks

High Availability Clusters
Redundancy for fault tolerance
Failover - if 1 node shuts down or fails, another node takes over application load
Facilitates planned maintenance

Failover
Involves selecting a target node & moving resources - failover policies
Example resource types
1. Physical disk ownership
2. Filesystems
3. Applications
4. Databases
5. IP addresses

Failover Configurations
Active / Passive
  1 node runs application(s)
  Other node on standby for takeover
  Idle node can take over with no performance degradation
Active / Active
  All nodes actively running application(s)
  Workload moves to survivor on failure
  Effectively utilizes capacity (TCO)

Data Integrity Provisions
Crucial for safe failover of data centric services (filesystem / database)
In failure scenarios (e.g. hung node), ensure failed node cannot access storage - I/O Barriers, I/O Fencing
Lack of I/O Fencing can result in
  Loss of data (backups ?)
  System crashes
Common mechanisms
  Power switches
  SCSI reservations
  Watchdog timers

Application Monitoring
All HA clusters monitor node state
Most monitor key cluster resources - network, disk
Many monitor application health
  Process existence
  Application check scripts
    HTTP get on web server
    Record retrieval on database
    Filesystem directory listing

Failover Times
Don't get too hung up on this
Remember that data integrity is paramount
Quoted failover times only include cluster overhead, don't include application recovery
  Application startup time
  Filesystem consistency checks
  Database recovery - transaction replay
Example
  Product literature cites 5 second failover time
  Can be several minutes for database recovery (size & activity dependent)

Open Source vs Proprietary, Project vs Product
Open source facilitates self-support & customization
Support is a key determinant
Products are generally well tested
Some products are also open source
If you care enough about high availability & solution stacks, you're likely to go the product route

Heterogeneous HA Products
Proprietary offerings that run on Linux, W2K, UNIX
Unifies user training
May compromise flexibility, adaptability or data integrity (ouch!)
Some are Linux products with GUIs that run on other platforms
Virtually none allow heterogeneous platforms within the same cluster

Cost Factors
Beware of hidden charges
Product base fee
Application specific charges (Oracle, DB2, NFS, etc)
Support
  Some only come with bundled service offerings
Hardware requirements
Proprietary UNIX offerings typically cost several times more

Vendor Evaluation
Company vision - do their cluster offerings complement or distract. Futures roadmap.
Financial stability
Ability to impact the marketplace
Responsiveness - ability to provide ongoing feature enhancements
Proprietary vs open source
Product integration - fit with distribution, kernel patches, compatibility & support implications
New Linux technology vs large monolithic legacy ports
How long it's been on the market

Open Source Projects
FailSafe - from SGI & SuSE
  Optional data integrity provisions (power switch)
  Supports 16 nodes
  Good set of application kits
Red Hat Cluster Manager
  Also offered as a product
  Described later in presentation

HA Cluster Product Comparisons
The ground rules
  Trying to remain objective
  Highlight product strengths
  Listed in alphabetical order
  Based on web site content as of 10/2002

HP - MC/Serviceguard
Proprietary - ported from HP-UX
Only supported on HP hardware
Dynamic online addition/removal of members
Worldwide support services
Quorum voting membership
Up to 8 nodes using FibreChannel storage, 2 nodes using SCSI
Compaq Alpha line targeted at HPC clusters

Legato - Availability Manager
Proprietary
Heterogeneous (Linux, W2K, Solaris, HP-UX)
Strong data centric services
Well integrated with SAN environments
Replication
Storage management, volume management, backup
Application monitoring
Extensive set of application specific modules

PolyServe - Application Manager
Proprietary
Application monitoring
Up to 16 nodes
Multiple platforms - Linux, W2K, Solaris
Doesn't require shared storage
Dynamic member addition/removal
Centralized management

PolyServe - Matrix Server
Tailored for Oracle 9i Real Application Clusters
Concurrent read + write access to data on shared storage SAN
Cluster filesystem with lock manager + distributed cache
Allows incremental growth by adding servers + storage
Proprietary

Red Hat - Cluster Manager
Bundled with RHL Advanced Server 2.1
Both open source & product
Data integrity provisions
  Power switches (optional)
  Watchdog timer software
Application monitoring
Heterogeneous fileserving via NFS + Samba
Web monitoring GUI
Also integrated Piranha load balancing cluster

Steeleye - LifeKeeper
Proprietary - UNIX port
Multi-platform - Linux, W2K
Wide set of application kits (separately purchased)
Established OEM relationships
Data integrity provisions - via SCSI reservations, requiring kernel patches
Application monitoring

IBM
Focusing on HPC
Rackmounted Intel servers
Custom solutions
(older) XCAT software for management, parallel operations, and installation
(newer) Cluster Systems Mgt (CSM) for Linux
  Remote monitoring, resets, BIOS console
  Parallel shell
  Requires IBM hardware for embedded service processor
High Availability via partnering

Veritas Cluster Server
Recent Linux port
16 nodes, wide range of supported apps
Also runs on Windows, AIX, UNIX, Solaris
Integrates with their storage offerings (volume management, backup, data replication)
Proprietary

Other Vendors
Dell
  Strategic partnering for HA software
Penguin Computing
  HPC offering via partnership with Scyld Beowulf

Consolidated Solutions
Egenera
  BladeFrame hardware, backplane eliminates cabling
  Management software, HA, provisioning
Linux NetworX
  Turnkey solution, preintegrated hardware + management tools
  Custom hardware, dense racks

Summary
Know what category of cluster is right for you
Be knowledgeable of required cluster features
Weigh your cost criteria
Choose a vendor you can trust to safeguard your corporate assets
Be wary of marketing collateral
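
Appendix sketch: balancing algorithms
The Load Balancing Clusters slide names three balancing algorithms: round robin, weighted, and load based. The following is a minimal Python sketch of how a front end dispatching node might choose a back end server under each policy. It is an illustration only, not the implementation used by Piranha or any other product in this deck; the server names, weights, and connection counts are hypothetical.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in strict rotation."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Return an endless iterator in which each server appears in
    proportion to its integer weight (simple expanded rotation)."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def least_loaded(active_connections):
    """Load based choice: pick the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical pool of back end web servers.
rr = round_robin(["web1", "web2", "web3"])
rr_picks = [next(rr) for _ in range(4)]

# web1 is assumed to have twice the capacity of web2.
wrr = weighted_round_robin({"web1": 2, "web2": 1})
wrr_picks = [next(wrr) for _ in range(6)]

# Assumed snapshot of current connection counts per server.
ll_pick = least_loaded({"web1": 12, "web2": 3, "web3": 7})
```

Round robin and weighted round robin need no feedback from the back end servers, which keeps the dispatcher simple. The load based policy assumes the dispatcher can observe per-server load, here represented by a connection count.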