
What do I need to know before setting up High Availability on a J-Series services router running JUNOS with enhanced services?

PROBLEM OR GOAL: This article gives a basic understanding of what you need to know before setting up JSRP (High Availability Cluster Configuration). This includes:

What is each of the onboard GE ports used for?
How are the interfaces organized?
What happens to the configuration on both cluster nodes?

SOLUTION: JSRP is a High Availability option for J-series services routers running JUNOS with enhanced services. JSRP allows two physical services routers to be clustered together into one virtual chassis, which gives the services routers redundancy. One Routing Engine (RE) becomes primary and manages both cluster members. The cluster can run in an active/passive mode, or both nodes can assume active roles in an active/active setup.

JSRP only works if the clustered routers are the same model and run the same version of JUNOS. The two routers do not need identical PIM cards, but the member interfaces of a redundant interface must be of the same type. At this time only Ethernet interfaces support redundancy.

Some basic terminology for JSRP

Cluster - Two routers forming a pair
Node - A single router in the cluster
Reth - A redundant interface made up of two Ethernet links, one from each device, that form a single logical interface
Redundancy Group - A group of objects that defines failover properties; it can contain reths, nodes, or both
RTO - Real-time objects. These are dynamic objects synced between the two nodes of the cluster, including session state. Without session state sync, a failover could break existing TCP sessions.

Interface Naming and Numbering

When the services routers are put into cluster mode, JUNOS reserves some of the onboard GE interfaces so that the routers can sync session state and exchange redundancy messages between the two nodes. Once the nodes are in cluster mode you will have a Control link (fxp1), a Management link (fxp0), and a Fabric link (fab0/fab1). The following describes which interfaces are reserved and what each is used for.

Control link - ge-0/0/3 is converted to fxp1, which the routers use as the control link between them. So ge-0/0/3 on node0 is plugged into ge-0/0/3 on node1. The control link carries heartbeat messages between the two routers to determine which RE will have mastership of all redundancy groups.

Management link - ge-0/0/2 is converted to fxp0, which provides an individual management link for each router. It gives access to each node independently for management purposes.

Fabric link - You will need to designate one more interface on each node as a fab link, also known as a data link. This link between the nodes can be an onboard interface or an interface on a uPIM. The Fabric link is used for data flow between the two nodes (active/active) as well as for RTO sync information between nodes.

Note: the numbering of the interfaces will change. On node0 the FPC slot numbering remains the same. The virtual chassis design treats the FPC slots on node1 as a continuation of the FPC slots of node0. For instance, on a J6350, node0 will have ge-0/0/0 through ge-6/0/0 and node1 will have ge-7/0/0 through ge-13/0/0.

Unified Configurations

The configuration is identical on node0 and node1 of the JSRP cluster. You only have to make configuration changes on one of the nodes; they are replicated to the other node. For configuration details that are specific to only one of the nodes, such as the management IP address or hostname, configure these settings under the [edit groups node(#)] hierarchy.

Redundant Interfaces (Reth)

These are virtual interfaces that contain two member physical interfaces, one from each node. They can be either FE or GE type interfaces. The physical interfaces are children of the reth interface they are bound to and inherit the configuration of the parent reth interface. The Reth interface inherits its failover properties from the Redundancy Group it belongs to. Each child in the reth can be in an active or a passive state, but not both. Only one child member of the Reth interface can accept and send data at a time.

Additional Note

If you are not using any PIMs and only have the onboard interfaces in the device, you will be using either ge-0/0/0 or ge-0/0/1 for your fab link. This leaves only one physical interface available for traffic. If your environment requires more than one interface, we recommend installing a PIM or uPIM to gain the additional ports needed.
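The node1 slot renumbering described above can be sketched in a few lines of Python. This is only an illustration: the function name and the offset table are hypothetical helpers, and the offsets are the ones implied by the examples in this document (J6350, J2320, SRX650). Other models should be looked up in the interface naming table of the Security Configuration Guide.

```python
import re

# Slot offsets implied by the examples in this document (assumption: no
# other models are covered here).
NODE1_SLOT_OFFSET = {"J2320": 4, "J6350": 7, "SRX650": 9}

_IFACE = re.compile(r"^([a-z]+)-(\d+)/(\d+)/(\d+)$")


def node1_interface_name(standalone_name: str, model: str) -> str:
    """Return the cluster-mode name of a port once its device becomes node1."""
    match = _IFACE.match(standalone_name)
    if match is None:
        raise ValueError(f"not a type-fpc/pic/port name: {standalone_name!r}")
    media, fpc, pic, port = match.groups()
    # node1 FPC slots continue where the node0 slots end.
    return f"{media}-{int(fpc) + NODE1_SLOT_OFFSET[model]}/{pic}/{port}"


print(node1_interface_name("ge-0/0/0", "J6350"))
print(node1_interface_name("ge-0/0/1", "J2320"))
```

The lookup table keeps the per-model offset in one place, so a fab or reth member name for node1 can be derived instead of typed by hand.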

How to: Setup a Chassis Cluster (High Availability) on a J Series device


This article describes the basic setup of a Chassis Cluster (High Availability), also known as JSRP, on J Series devices. It covers setting up the JSRP cluster nodes and setting up the redundant interfaces.

PROBLEM OR GOAL:
Topology used for this example:

Note: This is similar to the example Active/Passive Chassis Cluster Scenario in Chapter 10 of the Security Configuration Guide. In this example, only one Redundancy Group, RG 1, is used for the failover properties of the interfaces that are defined in each Reth group.

SOLUTION:
The following are the basic steps required for configuring a Chassis Cluster on J Series devices. More details can be found in Chapter 10 of the Security Configuration Guide, located in the J-Series JUNOS Software Documentation. This article applies to:

J Series devices running:
o JUNOS 9.4 and above
o JUNOS with Enhanced Services 8.5 through 9.3

1. Physically connect the two devices, making sure they are the same model. When connecting the devices, it is helpful to know that after step 2 the following interface assignments will occur:

ge-0/0/2 will be used as fxp0 for individual management of each of the devices
ge-0/0/3 will become fxp1 and be used as the control link between the two devices
The other interfaces are also renamed on the secondary device. For example, on a J2320 router, the ge-0/0/0 interface of the device that becomes node 1 is renamed to ge-4/0/0. Refer to the complete mapping for each J Series device in Table 69: J-series Chassis Cluster Interface Naming Scheme of the Security Configuration Guide.

Notes: The interfaces used for the control link, in this example ge-0/0/3, must be connected with a cable. A switch cannot be used for the control link connection. Also, you will need to decide on a third link to connect the devices, which will be used for the fabric link between the devices. This can be ge-0/0/1 or any other open port.

2. Set the devices into cluster mode with the following command and reboot the devices. Note that this is an operational mode command, not a configuration mode command.

> set chassis cluster cluster-id <0-15> node <0-1> reboot


For example:

On device A:
> set chassis cluster cluster-id 1 node 0 reboot

On device B:
> set chassis cluster cluster-id 1 node 1 reboot

The cluster ID must be the same on both devices, but the node ID must be different: one device is node0 and the other is node1.
This command must be run on both devices.
The range for the cluster-id is 0-15. Setting it to 0 is the equivalent of disabling cluster mode.
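The rules above (same cluster ID on both devices, node 0 on one and node 1 on the other, cluster ID 0 meaning "disabled") can be captured in a small helper that builds the two command strings. This is a hedged sketch: the function names are hypothetical, and the only CLI text used is the command shown in this article.

```python
def cluster_mode_command(cluster_id: int, node: int) -> str:
    """Build the operational-mode command that puts one device into cluster mode."""
    if not 0 <= cluster_id <= 15:
        raise ValueError("cluster-id must be in the range 0-15")
    if node not in (0, 1):
        raise ValueError("node must be 0 or 1")
    return f"set chassis cluster cluster-id {cluster_id} node {node} reboot"


def commands_for_pair(cluster_id: int) -> dict:
    """Same cluster-id on both devices, different node id on each."""
    if cluster_id == 0:
        # Per the article, 0 is the equivalent of disabling cluster mode.
        raise ValueError("cluster-id 0 disables cluster mode")
    return {
        "device A": cluster_mode_command(cluster_id, 0),
        "device B": cluster_mode_command(cluster_id, 1),
    }


for device, command in commands_for_pair(1).items():
    print(f"{device}: > {command}")
```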

After the reboot, note how the ge-0/0/2 and ge-0/0/3 interfaces are re-purposed to fxp0 and fxp1 respectively.

NOTE: The following steps 3-8 can all be performed on the primary device (Device A). They are automatically copied over to the secondary device (Device B) when a commit is done.

3. Set up the device-specific configuration, such as host names and management IP addresses. This is the only part of the configuration that is unique to each node. Enter the following commands (all on the primary node):

On device A:

{primary:node0}
# set groups node0 system host-name <name-node0>
-Device A's host name
# set groups node0 interfaces fxp0 unit 0 family inet address <ip address/mask>
-Device A's management IP address on fxp0 interface
# set groups node1 system host-name <name-node1>
-Device B's host name
# set groups node1 interfaces fxp0 unit 0 family inet address <ip address/mask>
-Device B's management IP address on fxp0 interface

The following command ensures that the node-specific settings above are applied only to the node they belong to.

# set apply-groups "${node}"
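To make the effect of the per-node groups more concrete, the following Python sketch models the idea: each node inherits only the group that carries its own name, and everything else is shared. It is a simplified model, not how JUNOS is implemented. The function name, the flat key names, and the host names and addresses are hypothetical illustration values.

```python
def effective_config(node_name: str, groups: dict, shared: dict) -> dict:
    """Model of per-node groups: only the group named after the local node applies."""
    inherited = dict(groups.get(node_name, {}))
    # Statements configured explicitly outside the groups override inherited ones.
    inherited.update(shared)
    return inherited


# Hypothetical values for illustration only.
groups = {
    "node0": {"system host-name": "router-a", "fxp0 address": "192.0.2.1/24"},
    "node1": {"system host-name": "router-b", "fxp0 address": "192.0.2.2/24"},
}
shared = {"chassis cluster reth-count": "2"}

print(effective_config("node0", groups, shared)["system host-name"])
print(effective_config("node1", groups, shared)["system host-name"])
```

Both nodes hold the same full configuration, including both groups; only the selection step differs per node, which is why the host name and management address can differ while everything else stays identical.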


4. Create FAB links (data plane links for RTO sync, etc).

On device A:

{primary:node0}
# set interfaces fab0 fabric-options member-interfaces ge-0/0/1
-fab0 is node0 (Device A) interface for the data link
# set interfaces fab1 fabric-options member-interfaces ge-4/0/1
-fab1 is node1 (Device B) interface for the data link
5. Set up Redundancy Group 0 for the Routing Engine failover properties. Also set up Redundancy Group 1 (all the interfaces will be in one Redundancy Group in this example) to define the failover properties for the Reth interfaces. Note: If you want to use multiple Redundancy Groups for the interfaces, refer to the Security Configuration Guide.

{primary:node0}
# set chassis cluster redundancy-group 0 node 0 priority 100
# set chassis cluster redundancy-group 0 node 1 priority 1
# set chassis cluster redundancy-group 1 node 0 priority 100
# set chassis cluster redundancy-group 1 node 1 priority 1

6. Set up interface monitoring. Monitoring the health of the interfaces is what triggers Redundancy Group failover.

On device A:

{primary:node0}
# set chassis cluster redundancy-group 1 interface-monitor fe-1/0/0 weight 255
# set chassis cluster redundancy-group 1 interface-monitor fe-5/0/0 weight 255
# set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255
# set chassis cluster redundancy-group 1 interface-monitor ge-4/0/0 weight 255
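The role of the weight value can be illustrated with a short Python sketch. It assumes the behaviour described in the JUNOS chassis cluster documentation as I understand it: each redundancy group starts with a threshold of 255, the weight of every failed monitored interface is subtracted, and the group fails over once the threshold reaches 0. That threshold rule is not stated in this article, so treat it as an assumption. The function names are hypothetical.

```python
INITIAL_THRESHOLD = 255  # assumed starting threshold of a redundancy group


def remaining_threshold(weights: dict, down: set) -> int:
    """Subtract the weight of each failed monitored interface from the threshold."""
    lost = sum(weight for name, weight in weights.items() if name in down)
    return INITIAL_THRESHOLD - lost


def triggers_failover(weights: dict, down: set) -> bool:
    """The group fails over once its threshold has dropped to 0 or below."""
    return remaining_threshold(weights, down) <= 0


# node0's monitored interfaces from the example above, each with weight 255.
node0_monitored = {"fe-1/0/0": 255, "ge-0/0/0": 255}
print(triggers_failover(node0_monitored, {"ge-0/0/0"}))
```

With a weight of 255 on every interface, as in this example, the loss of any single monitored link is enough to move the redundancy group. Under the same assumption, smaller weights would require several links to fail before a failover is triggered.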

7. Set up the Redundant Ethernet interfaces (Reth interfaces) and assign each Redundant interface to a zone. Make sure that you set the maximum number of redundant interfaces, as follows:

On device A:

{primary:node0}
# set chassis cluster reth-count <max-number>
# set interfaces <node0-interface-name> fastether-options redundant-parent reth0
-for first interface in the group (on Device A)
# set interfaces <node1-interface-name> fastether-options redundant-parent reth0
-for second interface in the group (on Device B)
# set interfaces reth0 redundant-ether-options redundancy-group <group-number>
-set up redundancy group for interfaces
# set interfaces reth0 unit 0 family inet address <ip address/mask>
# set security zones security-zone <zone> interfaces reth0.0

Use fastether-options for FE member interfaces and gigether-options for GE member interfaces.
For example:

On device A:

{primary:node0}
# set chassis cluster reth-count 2
# set interfaces ge-0/0/0 gigether-options redundant-parent reth1
-for first interface in the group (on Device A)
# set interfaces ge-4/0/0 gigether-options redundant-parent reth1
-for second interface in the group (on Device B)
# set interfaces reth1 redundant-ether-options redundancy-group 1
-set up redundancy group for interfaces
# set interfaces reth1 unit 0 family inet address 1.2.0.233/24
# set interfaces fe-1/0/0 fastether-options redundant-parent reth0
-for first interface in the group (on Device A)
# set interfaces fe-5/0/0 fastether-options redundant-parent reth0
-for second interface in the group (on Device B)
# set interfaces reth0 redundant-ether-options redundancy-group 1
-set up redundancy group for interfaces
# set interfaces reth0 unit 0 family inet address 10.16.8.1/24
# set security zones security-zone Untrust interfaces reth1.0
# set security zones security-zone Trust interfaces reth0.0
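Because every reth needs the same handful of statements, the command list for one reth can be generated rather than typed. The sketch below is a hypothetical helper, not a Juniper tool. It assumes the usual JUNOS convention that fe- ports take fastether-options and ge- ports take gigether-options, and that the address is configured under unit 0.

```python
def options_keyword(interface: str) -> str:
    """Pick the options keyword from the interface type (assumed convention)."""
    if interface.startswith("fe-"):
        return "fastether-options"
    if interface.startswith("ge-"):
        return "gigether-options"
    raise ValueError(f"unsupported child interface type: {interface}")


def reth_commands(reth: str, children: list, group: int, address: str, zone: str) -> list:
    """Return the set commands that bind two child links to one reth."""
    commands = [
        f"set interfaces {child} {options_keyword(child)} redundant-parent {reth}"
        for child in children
    ]
    commands.append(
        f"set interfaces {reth} redundant-ether-options redundancy-group {group}"
    )
    commands.append(f"set interfaces {reth} unit 0 family inet address {address}")
    commands.append(f"set security zones security-zone {zone} interfaces {reth}.0")
    return commands


for line in reth_commands("reth1", ["ge-0/0/0", "ge-4/0/0"], 1, "1.2.0.233/24", "Untrust"):
    print("#", line)
```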

8. Commit; the changes will be copied over to the Secondary Node, Device B.

On device A:

{primary:node0}
# commit

This prepares the basic clustering settings for both routers. You can check the cluster status with the following commands.

> show chassis cluster status
> show chassis cluster interfaces
> show chassis cluster statistics

SRX Getting Started - Configure Chassis Cluster (High Availability) on a SRX650 device
PROBLEM OR GOAL:

Configure SRX650 devices as a Chassis Cluster. The following topology will be used for the configuration:

SOLUTION:
This section contains the following:

Configuration
Technical Documentation
Verification
Troubleshooting

Configuration
The following are the basic steps required for configuring a Chassis Cluster on SRX650 devices. It is best to use a console connection to the SRX devices when following these steps.

1. Physically connect the two devices, making sure they are the same model. On the SRX650 device, connect ge-0/0/1 on device A to ge-0/0/1 on device B. The ge-0/0/1 interface on device B will change to ge-9/0/1 after clustering happens. When connecting the devices, it is helpful to know that after step 2 the following interface assignments will occur:

ge-0/0/0 will be used as fxp0 for individual management of each of the devices
ge-0/0/1 will become fxp1 and be used as the control link between the two devices (This is also documented in KB15356.)
The other interfaces are also renamed on the secondary device. For example, on an SRX650 device, the ge-0/0/0 interface is renamed to ge-9/0/0 on the secondary node 1. Refer to the complete mapping for each SRX Series device in 'Table 160: SRX Series Chassis Cluster Slot Numbering and Interface Naming Example' of the Security Configuration Guide.

Important: The interfaces used for the control link, in this example ge-0/0/1, must be connected with a cable. A switch cannot be used for the control link connection. Also, you will need to decide on a third link to connect the devices, which will be used for the fabric link between the devices. This can be ge-0/0/2 or any other open port, either onboard or on a gPIM.

2. Set the devices into cluster mode with the following command and reboot the devices. Note that this is an operational mode command, not a configuration mode command.

> set chassis cluster cluster-id <0-15> node <0-1> reboot


For example:

On device A:
> set chassis cluster cluster-id 1 node 0 reboot

On device B:
> set chassis cluster cluster-id 1 node 1 reboot

The cluster ID must be the same on both devices, but the node ID must be different: one device is node0 and the other is node1.
This command must be run on both devices.
The range for the cluster-id is 0-15. Setting it to 0 is the equivalent of disabling cluster mode.

After the reboot, note how the ge-0/0/0 and ge-0/0/1 interfaces are re-purposed to fxp0 and fxp1 respectively.

NOTE: The following steps 3-8 can all be performed on the primary device (Device A). They are automatically copied over to the secondary device (Device B) when a commit is done.

3. Set up the device-specific configuration, such as host names and management IP addresses. This is the only part of the configuration that is unique to each node. Enter the following commands (all on the primary node):

On device A:

{primary:node0}
# set groups node0 system host-name <name-node0>
-Device A's host name
# set groups node0 interfaces fxp0 unit 0 family inet address <ip address/mask>
-Device A's management IP address on fxp0 interface
# set groups node1 system host-name <name-node1>
-Device B's host name
# set groups node1 interfaces fxp0 unit 0 family inet address <ip address/mask>
-Device B's management IP address on fxp0 interface

The following command ensures that the node-specific settings above are applied only to the node they belong to.

# set apply-groups "${node}"


4. Create FAB links (data plane links for RTO sync, etc).

On device A:

{primary:node0}
# set interfaces fab0 fabric-options member-interfaces ge-0/0/2
-fab0 is node0 (Device A) interface for the data link
# set interfaces fab1 fabric-options member-interfaces ge-9/0/2
-fab1 is node1 (Device B) interface for the data link
5. Set up Redundancy Group 0 for the Routing Engine failover properties. Also set up Redundancy Group 1 (all the interfaces will be in one Redundancy Group in this example) to define the failover properties for the Reth interfaces. Note: If you want to use multiple Redundancy Groups for the interfaces, refer to the Security Configuration Guide.

{primary:node0}
# set chassis cluster redundancy-group 0 node 0 priority 100
# set chassis cluster redundancy-group 0 node 1 priority 1
# set chassis cluster redundancy-group 1 node 0 priority 100
# set chassis cluster redundancy-group 1 node 1 priority 1

6. Set up interface monitoring. Monitoring the health of the interfaces is one way to trigger Redundancy Group failover. Note: interface monitoring is not recommended for redundancy-group 0.

On device A:

{primary:node0}
# set chassis cluster redundancy-group 1 interface-monitor ge-1/0/0 weight 255
# set chassis cluster redundancy-group 1 interface-monitor ge-10/0/0 weight 255
# set chassis cluster redundancy-group 1 interface-monitor ge-1/0/1 weight 255
# set chassis cluster redundancy-group 1 interface-monitor ge-10/0/1 weight 255

7. Set up the Redundant Ethernet interfaces (Reth interfaces) and assign each Redundant interface to a zone. Make sure that you set the maximum number of redundant interfaces, as follows:

On device A:

{primary:node0}
# set chassis cluster reth-count <max-number>
# set interfaces <node0-interface-name> gigether-options redundant-parent reth0
-for first interface in the group (on Device A)
# set interfaces <node1-interface-name> gigether-options redundant-parent reth0
-for second interface in the group (on Device B)
# set interfaces reth0 redundant-ether-options redundancy-group <group-number>
-set up redundancy group for interfaces
# set interfaces reth0 unit 0 family inet address <ip address/mask>
# set security zones security-zone <zone> interfaces reth0.0
For example:

On device A:

{primary:node0}
# set chassis cluster reth-count 2
# set interfaces ge-1/0/0 gigether-options redundant-parent reth1
-for first interface in the group (on Device A)
# set interfaces ge-10/0/0 gigether-options redundant-parent reth1
-for second interface in the group (on Device B)
# set interfaces reth1 redundant-ether-options redundancy-group 1
-set up redundancy group for interfaces
# set interfaces reth1 unit 0 family inet address 1.2.0.233/24
# set interfaces ge-1/0/1 gigether-options redundant-parent reth0
-for first interface in the group (on Device A)
# set interfaces ge-10/0/1 gigether-options redundant-parent reth0
-for second interface in the group (on Device B)
# set interfaces reth0 redundant-ether-options redundancy-group 1
-set up redundancy group for interfaces
# set interfaces reth0 unit 0 family inet address 10.16.8.1/24
# set security zones security-zone Untrust interfaces reth1.0
# set security zones security-zone Trust interfaces reth0.0
8. Commit; the changes will be copied over to the Secondary Node, Device B.

On device A:

{primary:node0}
# commit

This prepares the basic clustering settings for both devices.

Technical Documentation
JUNOS Security Configuration Guide

PDF - See Chapter 28, Chassis Cluster (page 865)
HTML - Chassis Cluster

Verification
You can check the cluster status with the following commands.

show chassis cluster status
show chassis cluster interfaces
show chassis cluster statistics
show chassis cluster control-plane statistics
show chassis cluster data-plane statistics
show chassis cluster status redundancy-group 2

Refer to the JUNOS Security Configuration Guide for what these commands mean: HTML - Verifying the Chassis Cluster Configuration
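When the status check is repeated often, the text output of show chassis cluster status can be screened with a small script. The following Python sketch is only an illustration: it assumes the column layout of the sample outputs in the troubleshooting article later in this document, the sample text is an edited copy with one priority set to 0 for demonstration, and the function names are hypothetical. It flags rows with priority 0, which the troubleshooting tips say should not be present before a failover is attempted.

```python
import re

GROUP_LINE = re.compile(r"^\s*Redundancy group:\s*(\d+)")
NODE_LINE = re.compile(r"^\s*(node\d+)\s+(\d+)\s+(\S+)")


def parse_cluster_status(text: str) -> list:
    """Return one dict per node row: redundancy group, node, priority, status."""
    rows = []
    group = None
    for line in text.splitlines():
        group_match = GROUP_LINE.match(line)
        if group_match:
            group = int(group_match.group(1))
            continue
        node_match = NODE_LINE.match(line)
        if node_match and group is not None:
            node, priority, status = node_match.groups()
            rows.append(
                {"group": group, "node": node,
                 "priority": int(priority), "status": status}
            )
    return rows


def zero_priority_rows(rows: list) -> list:
    """Rows whose priority is 0."""
    return [row for row in rows if row["priority"] == 0]


# Edited sample: node1 in redundancy group 1 is set to priority 0 on purpose.
SAMPLE = """\
Cluster ID: 1
Node name     Priority   Status      Preempt   Manual failover

Redundancy group: 0 , Failover count: 1
    node0     200        primary     no        no
    node1     100        secondary   no        no

Redundancy group: 1 , Failover count: 1
    node0     200        primary     no        no
    node1     0          secondary   no        no
"""

for row in zero_priority_rows(parse_cluster_status(SAMPLE)):
    print(f"redundancy group {row['group']}: {row['node']} has priority 0")
```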


SRX Getting Started -- Troubleshoot High Availability (HA)

PROBLEM OR GOAL: SOLUTION:


When working with chassis cluster configurations, the most common SRX high availability issues are caused by basic configuration or architectural problems. This article therefore examines common clustering issues first, then covers the commands that can be used to check the HA state, and finally the debugging facilities.

Establishing Chassis Cluster Tips


1.
Is chassis clustering enabled? Check the output of the show chassis cluster status command to determine the status of a chassis cluster. If chassis clustering is not enabled, the following will be displayed:

root@SRX210> show chassis cluster status
error: Chassis cluster is not enabled.

If chassis clustering is enabled, the output will look something like the following:

root@SRX5800-1> show chassis cluster status
Cluster ID: 1
Node name     Priority   Status      Preempt   Manual failover

Redundancy group: 0 , Failover count: 1
    node0     1          primary     no        no
    node1     1          secondary   no        no

Redundancy group: 1 , Failover count: 1
    node0     254        primary     no        no
    node1     1          secondary   no        no

2.

Is there 'like' hardware in both nodes (chassis members)? A hardware mismatch could result in a coldsync failure, or one of the nodes could be in the disabled state. In a chassis cluster environment, each node must have the same hardware, with the following qualifications:
o On the SRX5600 and SRX5800, it does not strictly matter which slots are used for the different cards (matching slots are recommended for simplicity's sake), as long as the same number and type of cards exist in both chassis nodes.
o On the SRX3400, SRX3600, and SRX Branch products (SRX100, SRX210, SRX240, and SRX650), the same hardware is required in both cluster nodes, in the same slots.

3.

Is the JUNOS version the same on both nodes? A software mismatch could result in the nodes not seeing each other (split-brain) or in other unpredictable behavior. Each node of an SRX chassis cluster must be running the same version of JUNOS. In JUNOS 9.6 and above, with Low Impact Cluster Upgrades, the SRX can maintain RTO sync while the second platform is upgraded to the new version, to minimize the impact of failover. However, the SRX does not currently support running the two members on different JUNOS versions indefinitely.

4.

Have both nodes in the Chassis Cluster been rebooted? In order to set up a chassis cluster, each chassis cluster node must be rebooted before it can join the cluster (this is the case today and may change in the future). If you do not reboot the nodes, the member will not be activated. Use the show chassis cluster status command to check the status of the cluster nodes. Also, if you RMA a device (or routing engine), you will need to issue the cluster mode command (set chassis cluster cluster-id ... node ... reboot) again on the replaced unit: the setting is stored in NVRAM and not in the configuration itself, so the replacement device will not have it.

5.

Is the control link in the appropriate port on the SRX?
o On the SRX3400 and SRX3600, the control port is fixed to HA port 0.
o In JUNOS 10.0, dual control links are supported (this requires 2 REs), with the second control link on control port 1. However, this is only supported on the SRX5600 and SRX5800, as an issue on the SRX3400 and SRX3600 has delayed its implementation for a few releases.
o On the SRX3400 and SRX3600, you can use copper or fiber SFPs.
o On the SRX5600 and SRX5800, you must also configure which ports are used for the control ports (this is not required on the SRX3400 and SRX3600).
o On the SRX210, the control port must be fe-0/0/7, due to how the architecture is configured.

6.

If using dual control links are you running the correct version, and do you have dual routing engines? On the SRX high-end devices (SRX3400, SRX3600, SRX5600 and SRX5800), the SRX supports dual control links in JUNOS version 10.0 and beyond; however you must have two routing engines in each cluster member. The second RE is not used for backup routing today, but is used to activate the control link port in the internal switch.

7.

Is the data link properly configured on the SRX? Unlike ScreenOS, the SRX requires separate links for the control link and the data link, with separate connections to the control plane and the dataplane. The data fabric link uses a dataplane port (revenue port), and you must manually configure which interface will be the fabric port. In Active/Passive mode, 1Gbps is more than enough for this function and 10Gbps has no advantage. In Active/Active mode, if data will arrive on an interface on one chassis member and cross the data link to exit an egress interface on the other member, a 10Gbps link should be used to maximize throughput. The data link must be established for HA communication to be fully supported, since it is responsible for synchronizing the real-time objects to the other member.

8.

If multiple SRX clusters exist on the same L2 broadcast domain, is the same cluster ID used? If you have multiple SRX clusters on the same L2 broadcast domain, you must use different cluster ID numbers, because the cluster ID is used to form the virtual MAC address that is used for the RETH interface. Therefore if you use the same cluster ID then you will have a MAC address overlap and forwarding problems will occur. The figure below shows how the MAC address is calculated for a RETH interface.

Failover Behavior Tips


1.
Have the appropriate redundancy groups been configured on the chassis with the appropriate priorities? Redundancy Group 0 must be configured for the control plane, and the node with the higher priority is preferred over the node with the lower priority. Redundancy Group 1 and higher are used for the dataplane. For proper failover to occur, you must make sure that the appropriate priorities are configured for the appropriate node. Item 2 includes example output of "show chassis cluster status" from a working SRX chassis cluster, with the appropriate (non-zero) priority values for full operation.

2.

Has the SRX had enough time to boot and complete the cold sync process? Keep in mind that it takes about 5 minutes to boot up and 5 minutes to complete cold sync on a chassis, so allow enough time for this process to complete before performing a failover. Check the output of the "show chassis cluster status" command to ensure that there are no redundancy groups with a priority of 0, as shown below.

root@SRX3400-1> show chassis cluster status
Cluster ID: 1
Node name        Priority    Status       Preempt    Manual failover

Redundancy group: 0 , Failover count: 1
    node0        200         primary      no         no
    node1        100         secondary    no         no

Redundancy group: 1 , Failover count: 1
    node0        200         primary      no         no
    node1        100         secondary    no         no
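As a rough sketch, priorities like those in the sample output (200 on node0, 100 on node1) could be configured as follows. The values mirror the output above; choose whatever values suit your design.

```
# Redundancy group 0: control plane
set chassis cluster redundancy-group 0 node 0 priority 200
set chassis cluster redundancy-group 0 node 1 priority 100

# Redundancy group 1: dataplane
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
```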

3.

Is preempt configured for redundancy groups? If you enable preempt (which can be done on a redundancy group by redundancy group basis), the node with the higher priority takes over as the active member of the redundancy group when it becomes available, even if the lower priority node is currently active. If preempt is not enabled, then when a higher priority member becomes active (after being disabled) it will not seize control of the redundancy group.
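Preempt is set per redundancy group. A minimal example, assuming you want it on redundancy group 1 only:

```
# Enable preempt for redundancy group 1 (the group number is an assumption for illustration)
set chassis cluster redundancy-group 1 preempt
```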

4.

Is the redundant Ethernet configuration configured within the proper redundancy group, and is that group configured with the correct priority on the correct node? When using redundant Ethernet, you must make sure that the redundant Ethernet is applied to the correct redundancy group (must be RG1 or higher) and that the redundancy group has the appropriate priority values to be active on the correct node. When using Active/Passive, the control plane is RG0, while the dataplane is RG1. In Active/Active, you can have multiple redundancy groups (RG1, RG2, RG3, etc.) which can be active on different members, controlled by the priority setting for the nodes.
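The binding between a reth interface and its redundancy group can be sketched as below. The child interface names and the IP address are hypothetical placeholders; only the statement structure is the point.

```
# Number of reth interfaces the cluster should create (assumed value)
set chassis cluster reth-count 2

# One child link from each node (hypothetical port names)
set interfaces ge-11/0/0 gigether-options redundant-parent reth0
set interfaces ge-23/0/0 gigether-options redundant-parent reth0

# Place reth0 in a dataplane redundancy group (RG1 or higher)
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 192.0.2.1/24
```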

5.

After a control link failure, was a reboot performed on the disabled node to reactivate it in the cluster? Unless control-link-recovery is enabled, you will need to manually reboot the disabled node for it to become the secondary node in the cluster. If you do not reboot the disabled member, then it will remain in the disabled state.
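The two options described here correspond roughly to the following commands. This is a sketch; confirm the behavior against the documentation for your release.

```
# Configuration mode: have the system reboot the disabled node automatically
# once the control link is healthy again
set chassis cluster control-link-recovery

# Operational mode, run on the disabled node when control-link-recovery is not configured
request system reboot
```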

6.

After a data link failure, was a reboot performed on the disabled node to reactivate it in the cluster? If a data link failure occurs, then a reboot must be performed on the disabled member in order for it to become active again. If you do not do a reboot, then the disabled member will not be able to become active. There is no command at this point to automatically reboot when a datalink failure occurs.

7.

After a manual failover of a redundancy group, was the manual failover flag cleared? When you use the command request chassis cluster failover redundancy-group <redundancy-group> node <new master node> to fail over a redundancy group, you must clear the failover with the command request chassis cluster failover reset redundancy-group <redundancy-group>.
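For example, a manual failover of redundancy group 1 to node 1, followed by clearing the flag, might look like this (the group and node numbers are assumptions for illustration):

```
# Move redundancy group 1 to node 1
request chassis cluster failover redundancy-group 1 node 1

# Confirm the new primary; the Manual failover column shows yes until it is reset
show chassis cluster status

# Clear the manual failover flag
request chassis cluster failover reset redundancy-group 1
```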

8.

Is the feature you are trying to use supported in High Availability clusters? Today there are many features that are only supported in standalone mode and not in HA. Be cognizant that some features may not be enabled in HA, so check with the documentation, JTAC, or PLM if you run into an issue.
Security Configuration Guide: http://www.juniper.net/techpubs/software/junos-srx/junossrx96/index.html
http://kb.juniper.net/KB14371

Additional Troubleshooting
Below are additional basic troubleshooting steps that you can perform:

1.

Run show chassis cluster status to display the current status of the chassis cluster:

root@SRX5800-1> show chassis cluster status
Cluster ID: 1
Node name        Priority    Status       Preempt    Manual failover

Redundancy group: 0 , Failover count: 1
    node0        1           primary      no         no
    node1        1           secondary    no         no

Redundancy group: 1 , Failover count: 1
    node0        254         primary      no         no
    node1        1           secondary    no         no

2.

Check the status of the participating physical and logical interfaces. You can do this by using the commands show interfaces <interface> terse and show chassis cluster interfaces:

root@SRX5800-1> show interfaces terse
Interface        Admin    Link    Proto    Local               Remote
gr-0/0/0         up       down
ip-0/0/0         up       down
mt-0/0/0         up       down
pd-0/0/0         up       down
pe-0/0/0         up       down
ge-11/0/0        up       up
ge-11/0/0.0      up       up      inet     200.200.200.1/24
                                  multiservice
ge-11/0/1        up       up

root@SRX5800-1> show chassis cluster interfaces
Control link name: em0

Redundant-ethernet Information:
    Name     Status    Redundancy-group
    reth0    Down      1
    reth1    Down      1
    reth2    Down      1

3. Check the status of the control plane to see if you are receiving heartbeats and messages (for both control and data links) using the command show chassis cluster control-plane statistics:

root@SRX5800-1> show chassis cluster control-plane statistics
Control link statistics:
    Heartbeat packets sent: 692386
    Heartbeat packets received: 692352
Fabric link statistics:
    Probes sent: 692381
    Probes received: 692100
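In the sample above the sent and received counters are close (34 heartbeats and 281 probes apart). To tell whether the counters are still advancing on a live system, one approach is to reset them and sample again after a short wait:

```
# Reset the counters, wait a few seconds, then sample again.
# Both the sent and received values should increase on a healthy link.
clear chassis cluster control-plane statistics
show chassis cluster control-plane statistics
```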

4.

Check logs on both nodes. The following logs typically will help you identify any HA issues:

FOR BOTH NODES:

show log jsrpd
show log messages
show log chassisd (will report hardware chassis failures)
show log dcd

show chassis cluster status
show chassis cluster statistics
show chassis cluster information
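These logs can be long, so it may help to narrow them with CLI pipe filters. The line count and match strings below are only illustrative guesses at what is worth looking for, not a list of known messages.

```
# Most recent jsrpd entries (the count of 50 is arbitrary)
show log jsrpd | last 50

# jsrpd-related entries in the general system log
show log messages | match jsrpd

# Case-insensitive search for failures in the chassis daemon log (pattern is an assumption)
show log chassisd | match "(?i)fail"
```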