Asterisk / Linux Contingency

What will you do when things go wrong?
Options for disaster recovery
► Highly redundant server system
► Disk level backups
► High Availability Cluster

Each has its ups and downs:
► A redundant server system with RAID and redundant power = $$$$,
but still presents a single point of failure (SPOF).
► Backups are mandatory for any enterprise-class system, but still don’t
guarantee uptime.
► High Availability across 2 or more low-cost servers is much more
cost-effective and significantly reduces single points of failure.
HA Alternatives
► Redundant server systems should have at least RAID1
disk mirroring and dual redundant power supplies. Such
systems start in the $2500 range.

► Disk level backups: simple file backups are OK, but
require that you reinstall the OS and patches before
restoring files. An imaging server for a complete disk
image is preferred. MondoArchive is the only disk level
imaging system that does not require a dedicated
imaging server and can run without shutting down the
PBX (mondo is hardware specific).
Backup Design
Optimally, backups should be stored offsite and/or on multiple media to
avoid location-related disasters.
On to HA
► High availability has 2 major components:
  ▪ The heartbeat system (notifies servers of an outage)
  ▪ The data synchronization system (syncs data between 2 servers)
► This presentation uses the open source Linux-HA project and
DRBD (Distributed Replicated Block Device). DRBD is like RAID1 mirroring, except that one
hard drive is in one server and the other is across the network in another server.

► HA starts and stops services in the event of a failure or a manual shutdown.
► DRBD mirrors all data across the network in real time. In this model we assume that only
one copy of this mirror is live at any given time; in the event of a failure the other
copy comes live.
► Simple rsync+cron could be used instead of DRBD, but is not as fast or efficient.
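As a rough illustration of the rsync+cron alternative, the shell sketch below builds the rsync commands a cron job might run. The peer hostname (pbx-standby), the directory list, and the script path in the crontab line are assumptions for illustration only and do not come from the original setup.

```shell
# Sketch only: mirror selected directories to a standby server with rsync.
# PEER and SYNC_DIRS are hypothetical values; adjust them to your systems.
PEER="${PEER:-pbx-standby}"
SYNC_DIRS="${SYNC_DIRS:-/etc/asterisk /var/lib/asterisk /var/spool/asterisk}"

# By default the commands are only printed so the sketch can be reviewed
# safely; set DRY_RUN=0 to actually execute them.
sync_to_peer() {
    for dir in $SYNC_DIRS; do
        # -a preserves permissions and times, -z compresses in transit,
        # --delete removes files on the standby that are gone on the live node.
        cmd="rsync -az --delete $dir/ root@$PEER:$dir/"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

sync_to_peer

# Example crontab entry (every 5 minutes), assuming this script is saved
# as /usr/local/sbin/sync_to_peer.sh:
# */5 * * * * /usr/local/sbin/sync_to_peer.sh
```

Unlike DRBD, this copies data only when cron fires, so anything written between two runs would be missing on the standby after a failover.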
DRBD HA Diagram
HA can be set up in several different ways. This document
uses the following layout because it is simple and effective:
Recommended HA physical layout
► The previous slide describes the physical layout quite well.
► A 2 node cluster is best for simple failover needs.
► The nodes should be connected to each other on a separate subnet,
using gigabit NICs and a crossover cable (no switch), for
heartbeat and data sync.
► Each node should have its own dedicated UPS and, if
possible, its own dedicated circuit breaker.
► It would be even better if the nodes were in 2 separate
rooms, or even separate buildings. It is best to keep them both
on the same LAN, but this could be done over a WAN if
speeds permit.
What Can HA Clustering Do For You
HA Details
► HA sends heartbeat signals back and forth
between 2 (or more) servers.
► If a failure occurs, it can be detected in milliseconds
and the standby machine can take over in 5
seconds or less (as long as your network can
provide the communication speed).
► Each server has its own IP address, plus a floating
IP that is controlled by the HA service. The floating
IP is active only on the current live system.
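Since the floating IP lives only on the live system, one simple way to tell which node is live is to look for that address in the interface list. The sketch below assumes a hypothetical floating address of 192.168.1.50 and parses text in the format printed by "ip -o -4 addr show"; the sample text is illustrative, not captured from a real cluster.

```shell
# Sketch only: report whether this node currently holds the floating IP.
# FLOATING_IP is a hypothetical address; use the one your HA service manages.
FLOATING_IP="${FLOATING_IP:-192.168.1.50}"

# Reads "ip -o -4 addr show" style text on stdin and succeeds if the given
# address is configured on any interface. -F matches the string literally.
holds_ip() {
    grep -qF "inet $1/"
}

# Illustrative sample: eth0 carries the node's own address plus the
# floating address added as a secondary address by the HA service.
sample='2: eth0    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
2: eth0    inet 192.168.1.50/24 brd 192.168.1.255 scope global secondary eth0:0'

if printf '%s\n' "$sample" | holds_ip "$FLOATING_IP"; then
    echo "live node"
else
    echo "standby node"
fi

# On a real node: ip -o -4 addr show | holds_ip "$FLOATING_IP"
```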
HA Installation
► It is recommended that you install all the HA+DRBD packages, then
use the sample config files here:
► On CentOS (Red Hat) based distros (e.g. trixbox) you can use yum to install
the HA package:
  yum install -y heartbeat
DRBD Config
► Unfortunately DRBD is not quite as simple to install, due to package availability*
and partitioning.
► You will need to either:
  a: repartition your drive with an area for the DRBD partition, as shown on pages 3-6
  in the appendix**
  b: install a 2nd drive dedicated to the DRBD partition (easiest)
► If your Linux distro maintains up-to-date packages*, you can use yum to
install. Unfortunately this is usually not the case.
► 3 components of a DRBD install:
  ▪ DRBD binary (uses /etc/drbd.conf)
  ▪ DRBD kernel module (version specific to your kernel)
  ▪ DRBD links (builds links from your file system to the relocated files on the DRBD
  partition)
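To give a feel for what /etc/drbd.conf contains, the sketch below writes a minimal two-node resource definition in DRBD 8.x syntax to a scratch file for review. The resource name "share" matches the one used in the commands later in this document; the hostnames (node1, node2), the disk (/dev/sdb1), and the crossover-link addresses are assumptions and must be replaced with your own values.

```shell
# Sketch only: write a sample two-node DRBD resource definition to a
# scratch file. Nothing is written to /etc. Hostnames, disk, and
# addresses below are hypothetical.
OUT="${OUT:-./drbd.conf.sample}"

cat > "$OUT" <<'EOF'
resource share {
  # Protocol C = fully synchronous replication
  protocol C;

  # Each "on" name must match the output of "uname -n" on that server
  on node1 {
    device    /dev/drbd0;
    # Partition on the dedicated 2nd drive
    disk      /dev/sdb1;
    # Address on the crossover link
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
EOF

echo "wrote $OUT"
```

The same file is normally used unchanged on both nodes, which is why each node's settings appear in their own "on" section.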
The following works on CentOS 5.1 with kernel 2.6.18-53.1.4.el5:
yum install drbd
rpm -ihv
rpm -ihv
► Recommendation: install your OS/Distro, then install a 2nd hard drive for DRBD
► Recommended: download my ha/drbd config files when you get all the components installed:
► *Find a complete package or compile from source from
DRBD Config Continued
► # On both nodes (servers):
drbdadm create-md share
► # "share" is the name of your resource in /etc/drbd.conf
► # Now on the primary node:
drbdadm -- --overwrite-data-of-peer primary all
► # This may take a LONG time to run (in the background)
► # Check progress by typing: watch cat /proc/drbd
► # Finally, on the primary node, format the drbd0 device with a file system:
mkfs -t ext3 /dev/drbd0
► # Now go to the secondary node and run the following to sync up with the primary node:
drbdadm attach share
► # This may take a LONG time to run (in the background)
cat /proc/drbd   # should show "ds:UpToDate/UpToDate" when the sync is complete
► # On the primary, mount the new file system under the new "share" folder to test:
mkdir /share
mount /dev/drbd0 /share
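Rather than reading /proc/drbd by eye, the disk state can be pulled out with a small helper. The sketch below assumes the DRBD 8.x /proc/drbd layout, where the line for device 0 starts with "0:" and carries a "ds:" field. The sample line is illustrative and not taken from a real system.

```shell
# Sketch only: extract the disk state (local/peer) of DRBD device 0 from
# /proc/drbd style text on stdin. Prints nothing if no such line exists.
drbd_disk_state() {
    sed -n 's/^ *0: .* ds:\([A-Za-z]*\/[A-Za-z]*\).*/\1/p'
}

# Illustrative sample of a device that is still synchronizing:
sample=' 0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---'

state="$(printf '%s\n' "$sample" | drbd_disk_state)"
if [ "$state" = "UpToDate/UpToDate" ]; then
    echo "in sync"
else
    echo "not in sync yet: $state"
fi

# On a real node: drbd_disk_state < /proc/drbd
```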
Final Heartbeat config
► On both servers do the following:
► Stop all services that need failover and set them to manual start,
so that Heartbeat (not the init system) starts them
► Set the Heartbeat service to start automatically
► Use tar to copy all service-specific config files to the DRBD partition
(this only needs to be done on the current master server)
► Add those files and folders to /etc/drbdlinks.conf to automatically build
links to the DRBD partition
► Remove amportal from /etc/rc.local, and build a new amportal script
that is HA compliant
► Edit /etc/ha.d/ , haresources , and authkeys, as well as
/etc/drbd.conf, to meet your needs
► Finally edit /etc/my.cnf:
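For the Heartbeat files named in the list above, the sketch below shows roughly what a Heartbeat v1 setup might contain. It writes samples to a scratch directory rather than to /etc/ha.d. The main settings file is conventionally called ha.cf in Heartbeat v1. The node names, interface, floating IP, service names, and shared secret are all assumptions, and these are not the author's own sample files.

```shell
# Sketch only: write sample Heartbeat v1 files to a scratch directory for
# review. All names, addresses, and the secret below are hypothetical.
HA_DIR="${HA_DIR:-./ha.d-sample}"
mkdir -p "$HA_DIR"

# Main cluster settings (conventionally ha.cf).
cat > "$HA_DIR/ha.cf" <<'EOF'
logfacility local0
# Seconds between heartbeats
keepalive 1
# Declare the peer dead after this many seconds of silence
deadtime 5
# Send heartbeats over the crossover link
bcast eth1
# Do not move services back automatically when the old primary returns
auto_failback off
# Node names must match "uname -n" on each server
node node1
node node2
EOF

# Resources owned by the preferred node, started left to right:
# DRBD primary role, the file system, the floating IP, then services.
cat > "$HA_DIR/haresources" <<'EOF'
node1 drbddisk::share Filesystem::/dev/drbd0::/share::ext3 IPaddr::192.168.1.50/24/eth0 drbdlinks asterisk
EOF

# Shared secret used to authenticate heartbeats between the nodes.
cat > "$HA_DIR/authkeys" <<'EOF'
auth 1
1 sha1 ReplaceWithYourOwnSecret
EOF
# Heartbeat refuses to start unless authkeys is readable by root only.
chmod 600 "$HA_DIR/authkeys"

echo "sample files written to $HA_DIR"
```

The haresources line must be identical on both nodes. The first word names the node that normally owns the resources, not the node the file is stored on.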
Caveats of HA
► You want to avoid having 2 primary nodes. If both nodes are still up
but fail to see each other, they both become primaries and their
data goes out of sync (known as “Split Brain”).
  ▪ Use ipfail and pingd/ping_group in your Heartbeat configuration to minimize this possibility
► If you mess something up (delete a file, change a user), the mistake
is instantly synchronized to both servers, so regular backups with
offsite or removable media should still be used.
► Normal RAID1 in each server is still recommended to increase uptime,
but is not required. Without it, a failed disk forces a failover, and a failover disconnects all calls.
► Supports auto failover of SIP, T1/PRI trunks (using Redfone TDMoE
hardware) or analog lines wired in parallel.
► So far IAX won’t use a floating IP; it must use the real IP of the primary
server (I haven’t really investigated this yet).
Resolving Failover Problems
► Resolving "unclean" failovers in which your data becomes out of sync between 2 primary nodes (aka
"Split Brain"):
► # Check that both nodes are in the StandAlone connection state; on each node run:
► cat /proc/drbd
► # Stop Heartbeat services on both nodes:
► service heartbeat stop
► # One of the nodes must discard its data and allow the other to overwrite it. On the node that will discard, run:
► drbdadm secondary share
► # (assuming "share" is your DRBD resource name)
► drbdadm -- --discard-my-data connect share

► # On the other node (the split brain survivor, i.e. the data you wish to keep),
► # if its connection state is also StandAlone, run:
► drbdadm connect share
► # (assuming "share" is your DRBD resource name)

► # Allow the 2 file systems to sync up; check progress with:
► watch cat /proc/drbd
► # Then it is recommended to reboot the 2 systems and let the HA services once again manage the cluster
and file systems. If HA seems to be having further problems, restart the HA services and run:
► tail -f /var
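The first step of the procedure above (confirming the StandAlone connection state) can be sketched as a small helper that reads the "cs:" field for device 0. It assumes the DRBD 8.x /proc/drbd layout, and the sample line is illustrative only. The helper only reports the state and deliberately runs no recovery commands, since choosing which node discards its data is a human decision.

```shell
# Sketch only: extract the connection state of DRBD device 0 from
# /proc/drbd style text on stdin. Prints nothing if no such line exists.
drbd_conn_state() {
    sed -n 's/^ *0: cs:\([A-Za-z]*\).*/\1/p'
}

# Illustrative sample of a node that has dropped its peer connection:
sample=' 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---'

state="$(printf '%s\n' "$sample" | drbd_conn_state)"
case "$state" in
    StandAlone)
        echo "StandAlone: possible split brain, manual recovery needed" ;;
    Connected)
        echo "Connected: nodes are talking to each other" ;;
    *)
        echo "connection state: $state" ;;
esac

# On a real node: drbd_conn_state < /proc/drbd
```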
HA Advanced config (HA v2)
► HA v2 is beyond the scope of this document, which
uses HA v1 because it is far simpler.
► HAv1: uses 2 simple files (/etc/ha.d/ + haresources)
to configure and manage the cluster
► HAv2: uses a complex XML config database
(/var/lib/heartbeat/crm/cib.xml) that offers many advanced
options, most importantly resource monitoring rather than
simple server heartbeat monitoring
  ▪ If your service (e.g. asterisk) provides proper “status” information,
  HA can monitor that status, and can do so for several different services.
  However, if the service cannot report failures through
  monitoring/status queries, you must build that capability yourself
  using scripting (an OCF Resource Agent)
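As a minimal sketch of the kind of status check such a script provides, the function below implements only the "monitor" idea for a daemon that writes a pid file. OCF defines exit code 0 for "running" and 7 for "not running". The default pid file path is an assumption, and a complete resource agent would also need start, stop, and meta-data actions, which are not shown here.

```shell
# Sketch only: a pid-file based "monitor" check using OCF exit codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

monitor_pidfile() {
    pidfile="$1"
    # A missing or empty pid file means the service is not running.
    [ -s "$pidfile" ] || return $OCF_NOT_RUNNING
    pid="$(cat "$pidfile")"
    # kill -0 sends no signal; it only tests whether the process exists.
    # Run as root (or as the service's user), otherwise a permission
    # error is reported as "not running".
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Hypothetical default path; check where your distro writes the pid file.
rc=0
monitor_pidfile "${PIDFILE:-/var/run/asterisk/asterisk.pid}" || rc=$?
echo "monitor exit code: $rc"
```

A pid check only proves that the process exists, not that it is answering calls, so a production agent would usually also query the service itself.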
Some Linux-HA Terminology
Key Linux-HA Processes (r2)
Linux-HA Architecture (r2)
Appendix & Notes
► Good references:
discussion/ha-cluster <<decent guide
► <<notes on partitions**
► http://www.linux-
► http://www.linux-
► http://www.voip-

► http://www.linux- <<VERY detailed
►;O=A CentOS 5.1 kernel modules

► trixbox + Redfone
► Major credit goes to Alan Robertson for material taken from his
extensive HA guide

► Asterisk, Digium and the Asterisk logo are registered trademarks of
Digium Corporation
► DRBD is a registered trademark of Linbit
► Linux is a registered trademark of Linus Torvalds
► Redfone and Fonebridge are registered trademarks of Redfone
Communications LLC.
► trixbox is a registered trademark of Fonality Inc.
► All other trademarks are property of their respective owners.
► Finally, me: this presentation was organized by John Hyde. This
document is copyright Simple Technologies under GPLv3. If you
modify it, please contribute your modifications back.

► Please check back at for future additions to
this document.