APPLICATION NOTE

Branch SRX Series and J Series Chassis Clustering
Configuring Chassis Clusters on Branch SRX Series Services Gateways and J Series Services Routers

Table of Contents
Introduction
Scope
Design Considerations
Hardware Requirements
Software Requirements
Description and Deployment Scenario
Feature Description
Redundant Ethernet Interfaces
Link Aggregation Interfaces and LACP
Remote Performance Monitoring
IP Monitoring
Feature Support and Comparison Matrix
Clustering Configuration
Disabling a Chassis Cluster
Cluster Monitoring
Viewing the Chassis Cluster Status
Viewing the Cluster Statistics
Viewing the Control Link Status
Viewing the Session
Deployment Scenarios
Active/Passive Cluster
Asymmetric Routing Scenario
Case I: Failures in the Trust Zone RETH
Case II: Failures in the Untrust Zone Interfaces
Active/Active Full Mesh
Special Consideration
Cluster Upgrade
In-Band Management of Chassis Clusters
Problem Statement
Description and Deployment Scenario
Connecting to a Cluster Using SSH/Telnet
In-Band Management Through Network and Security Manager
Updating the IDP Signatures
Using SNMP
Software Upgrades
Summary
About Juniper Networks

List of Figures
Figure 1: Junos OS redundancy model
Figure 2: Device clustering
Figure 3: Active/passive cluster
Figure 4: Asymmetric routing scenario
Figure 5: Active/active full mesh scenario
Figure 6: SRX Series clustering model
Figure 7: Common branch deployment scenarios for SRX Series clustering
Figure 8: Adding a cluster as a Virtual Chassis in NSM

Introduction
Modern networks require high availability.
In order to accommodate this requirement, Juniper Networks® SRX Series Services Gateways and J Series Services Routers can be configured to operate in cluster mode, where a pair of devices is connected together and operates as a single node, providing device, interface, and service-level redundancy. Starting with the 9.0 release of the Juniper Networks Junos® operating system, J Series Services Routers and SRX Series Services Gateways may be deployed using the chassis cluster feature to provide high availability (HA). For the J Series, this feature is only available with the flow-enabled version of Junos OS. With the introduction of the branch SRX Series Services Gateways in Junos OS Release 9.5, HA is supported on all branch SRX Series devices.

Scope
The purpose of this application note is to review the HA chassis clustering feature together with its limitations and design considerations. We will also discuss some common use cases and how they relate to their Juniper Networks ScreenOS® Software NetScreen Redundancy Protocol (NSRP) counterparts.

Design Considerations
High availability between devices is easily incorporated into enterprise designs and is particularly relevant when architecting branch and remote site links to larger corporate offices. By leveraging the HA feature, enterprises can ensure connectivity in the event of device or link failure.
Hardware Requirements
- Two identical J Series routers per cluster (Juniper Networks J2320 Services Router, J2350 Services Router, J4350 Services Router, or J6350 Services Router), or
- Two identical SRX Series gateways per cluster (Juniper Networks SRX100 Services Gateway, SRX110 Services Gateway, SRX210 Services Gateway, SRX220 Services Gateway, SRX240 Services Gateway, or SRX650 Services Gateway)

Software Requirements
- Flow-enabled Junos OS 9.0 or later for J Series routers
- Junos OS Release 9.5 or later for SRX Series Services Gateways

Description and Deployment Scenario
Chassis clustering between devices may be deployed in either active/passive or active/active scenarios. Junos OS additionally allows an HA cluster to be used in asymmetric routing scenarios. Code examples are provided throughout this document, and deployment scenarios are discussed towards the end of the paper.

Feature Description
The HA feature is modeled after redundancy features first introduced in Juniper Networks M Series Multiservice Edge Routers and T Series Core Routers. We will first give a brief overview of the way Junos OS redundancy works, so that we can better understand how this model is applied when clustering devices. As Junos OS is designed with separate control and data planes, redundancy must operate in both. The control plane in Junos OS is managed by Routing Engines (REs), which perform all the routing and forwarding computations (among many other functions). Once the control plane converges, forwarding entries are pushed to all Packet Forwarding Engines (PFEs), which are virtualized on J Series routers. PFEs then perform route-based lookups to determine the appropriate destination for each packet, independent of the REs. This simplistic view of the Junos OS forwarding paradigm is represented in Figure 1.
Figure 1: Junos OS redundancy model

Control plane failover is provided in Junos OS by using graceful restart or nonstop active routing (NSR). In the former, the router signals a control plane failure to the rest of the network while continuing to forward traffic on the data plane (since a control plane failure does not affect the forwarding plane). The rest of the network will continue to use the restarting router (for a grace period) while the restarting router forms new adjacencies. The backup RE in this scenario holds the entire configuration, but not the runtime state of the control plane. After a failure, the backup RE has to recalculate all routing/forwarding tables. Nonstop routing leverages state replication between Routing Engines. In this case, a restarting router handles control plane failures transparently, as the backup RE takes control of the router without any assistance from the rest of the network. Routing protocols handle data plane failures, while interface, PFE, or FPC failovers are handled by diverting traffic through other interfaces, which can be achieved by using conventional routing protocols, Virtual Router Redundancy Protocol (VRRP), or aggregate interfaces. When enabling a chassis cluster on J Series routers, Junos OS uses a similar model (without the nonstop routing state replication) to provide control plane redundancy, as shown in Figure 2.

Figure 2: Device clustering

The chassis clustering feature supports clustering of two devices and requires two connections between the devices, as previously illustrated. The chassis cluster is seen as a single device by both external devices and administrators of the cluster.
When clustering is enabled, node 1 in the cluster will renumber its interfaces to avoid collisions with node 0. Depending on the model used (only two devices of the same model can be clustered), node 1 will renumber its interfaces by adding the total number of system FPCs to the original FPC number of the interface. (On a J Series router, the onboard ports and each Physical Interface Module (PIM) slot correspond to an FPC.) Accordingly, when clustering two J2320 routers, node 1 will renumber its interfaces as ge-4/0/0 to ge-7/0/0, because a J2320 has three PIM slots and four standard GbE ports on the system board acting as FPC 0. The following table summarizes the renumbering schema.

Table 1: Interface Renumbering

Device          FPCs per node   Node 0 first interface   Node 1 first interface
J2320           4               ge-0/0/0                 ge-4/0/0
J2350           6               ge-0/0/0                 ge-6/0/0
J4350           7               ge-0/0/0                 ge-7/0/0
J6350           7               ge-0/0/0                 ge-7/0/0
SRX100/SRX110   1               fe-0/0/0                 fe-1/0/0
SRX210          2               ge-0/0/0                 ge-2/0/0
SRX220          3               ge-0/0/0                 ge-3/0/0
SRX240          5               ge-0/0/0                 ge-5/0/0
SRX650          9               ge-0/0/0                 ge-9/0/0

After clustering is enabled, the system creates the fxp0, fxp1, and fab interfaces. Depending on the platform, fxp0 and fxp1 are mapped to a physical interface; this is not user configurable. The fab interface is user configurable. (However, this is limited to onboard interfaces on the SRX200 line of services gateways.) The following table summarizes the fxp0 and fxp1 mappings.

Table 2: Mapping of Interfaces fxp0 and fxp1

Device          fxp0 interface   fxp1 interface   fab interface
J2320           ge-0/0/2         ge-0/0/3         User defined
J2350           ge-0/0/2         ge-0/0/3         User defined
J4350           ge-0/0/2         ge-0/0/3         User defined
J6350           ge-0/0/2         ge-0/0/3         User defined
SRX100/SRX110   fe-0/0/6         fe-0/0/7         User defined
SRX210          fe-0/0/6         fe-0/0/7         User defined
SRX220          ge-0/0/6         ge-0/0/7         User defined
SRX240          ge-0/0/0         ge-0/0/1         User defined
SRX650          ge-0/0/0         ge-0/0/1         User defined

As seen in Figure 2, fxp1 (the HA link) provides control plane communication between the nodes in the cluster, and fxp0 provides management access and is limited to host traffic only. The fab interface must be an Ethernet interface (WAN interfaces are not supported).
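Once both nodes come up in cluster mode, the renumbering and the automatically created fxp0, fxp1, and fab interfaces can be confirmed from operational mode. A quick sketch (the `left` hostname follows the examples later in this document; output is omitted here):

```
{primary:node0}
root@left> show chassis cluster status
root@left> show interfaces terse | match "fxp|fab"
```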
Traffic received through the fxp0 interface will not be forwarded to any other interface in the system. Fab interfaces are used to exchange data plane information and traffic between devices. As opposed to the fxp0 and fxp1 interfaces, the fab interface can be mapped to any Ethernet interface in the system.

The control plane redundancy of the cluster is similar to that used within single M Series and T Series routers. Each device acts as a Routing Engine in a system with redundant REs. Graceful restart is used to provide control plane failover with minimal traffic impact on the network. The control plane redundancy model is active/passive, where a node in the cluster is designated as the active device and performs all cluster routing calculations. Except for a few key processes required for managing clustering, most of the processes run only on the master RE. When the primary node fails, the routing process and other processes on the backup device become active and assume control plane operations.

Data plane redundancy is somewhat more involved. Juniper's M Series and T Series routers perform traffic forwarding on a packet-by-packet basis. There is no concept of flow, and each PFE maintains a copy of the forwarding table that was distributed by the active RE. The forwarding table allows each PFE to perform traffic forwarding independent of the other system PFEs. If a PFE fails, the rest of the PFEs in the system are unaffected, allowing the control plane to reroute the traffic to a working PFE. In contrast, J Series secure routers and SRX Series gateways inspect all traffic and keep a table of all active sessions.
Whenever a new connection is allowed through the system, the device makes note of the 5-tuple that identifies a particular connection (source and destination IP addresses, source and destination ports as applicable, and protocol) and updates the table with session details such as next hop, session timeouts, sequence numbers (if the protocol is TCP), and other session-specific information required to guarantee that no packets are forwarded from unknown or undesired protocols (or users). Session information is updated as traffic traverses the device and is required on both devices in a cluster to guarantee that established sessions are not dropped when a failover occurs.

As shown in Figure 2, the control plane REs function in active/backup mode while the data plane (PFEs) functions in active/active mode. With active/active PFEs, it is possible for traffic to ingress the cluster on one node and egress from the other node, which means that both nodes need to be able to create and synchronize sessions. For example, when return traffic arrives asymmetrically at the node that did not record the initial session, the chassis cluster feature gracefully forwards the traffic to the original node for processing, which prevents security features from being compromised. Please be aware that the previous discussion applies only to routed traffic. Junos OS with enhanced services does not support the forwarding of Layer 2 traffic (transparent mode). Chassis clustering supports unicast IPv4 traffic only.

Redundant Ethernet Interfaces
As previously discussed, control plane failures are detected by member nodes, causing the backup node to take control of the cluster. Conversely, data plane failures rely on routing protocols to reroute traffic, or on redundant Ethernet interfaces to overcome interface failures.
The concept of redundant Ethernet is fairly simple: two Ethernet interfaces (one from each node in a cluster) are configured as part of the same redundant Ethernet interface (a RETH interface in Junos OS terminology). The RETH interface is then configured as part of a redundancy group. A redundancy group is active only on one of the nodes in the cluster, and the redundant Ethernet interfaces that are members of that group will send (and normally receive) traffic only through the physical interfaces on the active node.

A redundancy group can be configured to monitor one or more physical interfaces. Each monitored interface is given a weight, which is subtracted from the redundancy group threshold if the interface fails. If the threshold, due to interface failures, reaches zero, the redundancy group transitions state, causing the other node in the cluster to become active for the group. Consequently, all the redundant Ethernet interfaces that are part of this redundancy group will use the interfaces on the new node to send (and normally receive) traffic, thus routing traffic around the failure.

Readers familiar with NSRP will note that RETH interfaces are analogous to virtual security interfaces (VSIs) on Juniper Networks ScreenOS Software-based devices. RETH interfaces, just like VSIs, share the same IP and media access control (MAC) addresses between the different physical interfaces that are members of the VSI/RETH. The redundant interfaces send gratuitous Address Resolution Protocol (ARP) messages when failing over and appear as a single interface to the rest of the network. There are, however, a few significant differences between RETHs and VSIs:

- RETH interfaces always contain the same type of physical Ethernet interfaces; for example, all fe- or all ge- members.
- VSIs will always force a failover when the physical interface of the active VSI goes down. The state of the redundant Ethernet interface is purely a function of the state of the redundancy group with which the RETH is associated.
- A RETH interface will go down if its active physical interface is down. RETH interfaces will only fail over based on the monitoring of physical interfaces; IP tracking and zone monitoring are currently not supported in Junos OS.

To be clear, RETH interfaces are not required to provide HA. Session information will be synchronized regardless of the ingress or egress interface type. Traditional routing protocols can be used to route around failures, but when connecting to simple devices that do not support routing protocols, redundant Ethernet interfaces can be useful to overcome this limitation.

Link Aggregation Interfaces and LACP
As of Junos OS 11.2, RETH interfaces may contain LAG interface groups as members. Additionally, the physical interfaces contained in the LAG group can span members of the SRX Series chassis cluster. This allows multiple active physical interfaces between cluster members to participate in the redundant Ethernet (RETH) interface and redundancy protocol (JSRP).

Remote Performance Monitoring
All Junos OS-based devices have the ability to perform remote performance monitoring (RPM), a task running on the router that monitors hosts using ICMP, TCP, or HTTP, periodically checks the remote hosts, and keeps a log history of the packet loss and latency results. This information can be used to monitor upstream routers in an HA cluster.

Clustering Configuration
Before enabling clustering, delete any configuration bound to the interfaces that will become the fxp0, fxp1, and fab interfaces. In this example, the previous standalone configuration on node 0 (left) is removed first:

root@left# delete interfaces interface-range interfaces-trust member ge-0/0/1
root@left# delete interfaces interface-range interfaces-trust member ge-0/0/2
root@left# delete interfaces ge-0/0/0
root@left# delete security zones security-zone trust interfaces ge-0/0/0

1. Log into each device and enable clustering by setting the appropriate cluster ID in the EEPROM. A reboot is required for this setting to take effect. Only node 0 and node 1 can be configured, as the current implementation is limited to two nodes in a cluster.
In this example, node 0 (left) and node 1 (right) will be renumbered as illustrated in Table 1.

set chassis cluster cluster-id <id> node <node> reboot

On node left:
root@left> set chassis cluster cluster-id 1 node 0 reboot

On node right:
root@right> set chassis cluster cluster-id 1 node 1 reboot

Note: Step #1 must be performed in operational mode, not in configuration mode.

After the nodes reboot, they will form a cluster. From this point forward, the configuration of the cluster is going to be synchronized between the node members. The following commands are entered from configuration mode on either of the devices. After the reboot, note how the prompts change when you enter the CLI.

2. Define the interfaces used for the fab connection. These interfaces must be connected back to back, or through a Layer 2 infrastructure, as shown in Figure 2. As expected, fab0 is the fabric interface of node 0, while fab1 is the fabric interface of node 1.

set interfaces fab0 fabric-options member-interfaces <node0-interface>
set interfaces fab1 fabric-options member-interfaces <node1-interface>

3. Define the node-specific parameters under configuration groups named after each node.

set groups node0 system host-name <node0-name>
set groups node0 interfaces fxp0 unit 0 family inet address <node0-mgt-ip>
set groups node1 system host-name <node1-name>
set groups node1 interfaces fxp0 unit 0 family inet address <node1-mgt-ip>

4. (Optional) Configure device-specific options.

set groups node0 snmp description <node0-description>
set groups node1 snmp description <node1-description>

5. Apply the group configuration.

set apply-groups "${node}"

6. (Optional) Define the redundancy groups and RETH interfaces if using redundant Ethernet interfaces.

set chassis cluster reth-count <number-of-reth-interfaces>
set chassis cluster redundancy-group 1 node 0 priority <priority>
set interfaces <interface-name> gigether-options redundant-parent <reth-interface>

The resulting sample configuration is shown below:

# The following declares int ge-0/0/1 in node 0 as the fab interface for the node
set interfaces fab0 fabric-options member-interfaces ge-0/0/1
# The following declares int ge-4/0/1 in node 1 as the fab interface for the node
set interfaces fab1 fabric-options member-interfaces ge-4/0/1
# Groups configuration.
# Configuration parameters specific to each node are set here.
set groups node0 system host-name left
set groups node0 interfaces fxp0 unit 0 family inet address 192.168.3.10/24
set groups node1 system host-name right
set groups node1 interfaces fxp0 unit 0 family inet address 192.168.3.11/24
set apply-groups "${node}"
# Define a single RETH interface for the cluster
set chassis cluster reth-count 1
# Define node 0 as the primary node for reth0
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
# Add interfaces ge-0/0/0 (in node 0) and ge-4/0/0 (ge-0/0/0 in node 1) to the reth
set interfaces ge-0/0/0 gigether-options redundant-parent reth0
set interfaces ge-4/0/0 gigether-options redundant-parent reth0
set interfaces reth0 unit 0 family inet address <address>
set interfaces reth0 redundant-ether-options redundancy-group 1
# Define node 0 as the primary node for the control path
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1

Disabling a Chassis Cluster
Disabling clustering is a very simple process: set the cluster ID of each node to 0 and then reboot the nodes.

set chassis cluster cluster-id 0 node 0 reboot

Cluster Monitoring
The following commands can be used to verify the status of a cluster and present a view of the cluster from a node's perspective. Statistics are not synchronized between the nodes in the cluster. When debugging clusters, it is useful to log into each member node and analyze the output from each.

Viewing the Chassis Cluster Status
The command below shows the different redundancy groups configured in the cluster, together with their specified priorities and the status of each node. This command is useful when trying to determine which RETH interfaces are active on each node. The special redundancy group 0 refers to the status of the control plane.
In this example, node 0 is the primary node for this group and is therefore in charge of all control plane calculations (it acts as the master RE and runs control plane processes such as rpd, dhcpd, pppd, and others).

show chassis cluster status
Cluster: 1, Redundancy-Group: 0
    Device name    Priority    Status       Preempt    Manual failover
    node0          100         Primary      No         No
    node1          1           Secondary    No         No
Cluster: 1, Redundancy-Group: 1
    Device name    Priority    Status       Preempt    Manual failover
    node0          100         Primary      Yes        No
    node1          1           Secondary    Yes        No

Viewing the Cluster Statistics
The command below displays the statistics of the different objects being synchronized, the fabric and control interface hellos, and the status of the monitored interfaces in the cluster.

show chassis cluster statistics
initial hold: 5
Reth Information:
    reth      status    redundancy-group
    reth0     up        1
Services Synchronized:
    service-name                       rtos-sent    rtos-received
    Translation context                0            0
    Incoming NAT                       0            0
    Resource Manager                   10           0
    Session-create                     225          10592
    Session-close                      222          10390
    Session-change                     0            0
    Gate-create                        0            0
    Session-Ageout-refresh-request     149          1
    Session-Ageout-refresh-reply       0            0
    VPN                                0            0
    Firewall user authentication       0            0
    MGCP ALG                           0            0
    H323 ALG                           0            0
    SIP ALG                            0            0
    SCCP ALG                           0            0
    PPTP ALG                           0            0
    RTSP ALG                           0            0
Interface Monitoring:
    interface     status    redundancy-group
    ge-4/0/0      up        1
    ge-0/0/0      up        1
    fe-5/0/0      up        1
    fe-1/0/0      up        1
Chassis cluster interfaces:
    Control link: 244800 heartbeats sent, 244764 heartbeats received
    1000 ms interval, 3 threshold
    Fabric link: up
    244796 heartbeat packets sent on fabric-link interface
    244764 heartbeat packets received on fabric-link interface

Viewing the Control Link Status
This command displays the status of the control interface (fxp1) of this particular node.

show chassis cluster interface
physical interface: fxp1.0, Enabled, control interface, physical link is up

Viewing the Session
The command shown below displays the sessions in the session table of each node by specifying the node number. Synchronized sessions will be seen in both nodes, where they will appear as active in one node and backup in the other. A detailed view of a session can be obtained by specifying the session ID.

show security flow session node0
Session ID: 2, Policy name: self-traffic-policy/1, State: Active, Timeout: 1800
  In: 172.24.241.53/50045 --> 172.19.101.34/22;tcp, If: ge-0/0/0.0
  Out: 172.19.101.34/22 --> 172.24.241.53/50045;tcp, If: .local..0
1 sessions displayed

show security flow session session-identifier 2
Session ID: 2, Status: Normal, State: Active
Flag: 0x40
Virtual system: root, Policy name: self-traffic-policy/1
Maximum timeout: 1800, Current timeout: 1800
Start time: 1900, Duration: 256
  In: 172.24.241.53/50045 --> 172.19.101.34/22;tcp,
    Interface: ge-0/0/0.0, Session token: 0xa, Flag: 0x4097
    Route: 0x20010, Gateway: 172.19.101.1, Tunnel: 0
    Port sequence: 0, FIN sequence: 0, FIN state: 0
  Out: 172.19.101.34/22 --> 172.24.241.53/50045;tcp,
    Interface: .local..0, Session token: 0x4, Flag: 0x4112
    Route: 0xffb0006, Gateway: 172.19.101.34, Tunnel: 0
    Port sequence: 0, FIN sequence: 0, FIN state: 0
1 sessions displayed

TCP sequence numbers are not synchronized. However, the active node for a given session will keep track of the sequence numbers. When a session is migrated due to a failure (for example, failures that cause the egress interface of a session or group of sessions to be on a different node than prior to the failure), the sequence number counting will resume on the new node based on the sequence numbers of the packets going through the new active node for the session(s).

Deployment Scenarios
NSRP has been used in multiple networks with several topologies. This section provides the equivalent SRX Series Services Gateway or J Series router configuration for these typical scenarios.

Active/Passive Cluster
In this case, a single device in the cluster is used to route all traffic, while the other device is used only in the event of a failure.
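In addition to automatic failovers, mastership of a redundancy group can be moved by hand, which is handy when validating the deployment scenarios that follow. A sketch (redundancy group 1 and the hostname match the active/passive example in this section; the reset form clears the manual failover flag so that automatic failovers can resume):

```
{primary:node0}
root@J2320-A> request chassis cluster failover redundancy-group 1 node 1
root@J2320-A> request chassis cluster failover reset redundancy-group 1
```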
When a failure occurs, the backup device becomes master and takes over all forwarding tasks.

Figure 3: Active/passive cluster (both RETHs belong to redundancy group 1)

Active/passive operation can be achieved using RETH interfaces just as one would do using VSIs. The redundancy group determines the RETH state by monitoring the state of the physical interfaces in reth0 and reth1. If any of these interfaces fails, the group is declared inactive by the system that hosts the failing interface. On a failure, both RETH interfaces will fail over simultaneously, as they belong to the same redundancy group. This configuration minimizes the traffic around the fabric link, as only one node in the cluster will be forwarding traffic at any given time.

# Groups definitions
set groups node0 system host-name J2320-A
set groups node0 interfaces fxp0 unit 0 family inet address 192.168.3.110/24
set groups node1 system host-name J2320-B
set groups node1 interfaces fxp0 unit 0 family inet address 192.168.3.111/24
set apply-groups "${node}"
# Cluster configuration. redundancy-group 0 determines the status of the RE
# mastership, while redundancy-group 1 is used to control the reth interfaces
set chassis cluster reth-count 2
set chassis cluster heartbeat-threshold 3
set chassis cluster node 0
set chassis cluster node 1
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1
# The ge-0/0/1 interface on each node is used as the fabric interface between the nodes
set interfaces fab0 fabric-options member-interfaces ge-0/0/1
set interfaces fab1 fabric-options member-interfaces ge-4/0/1
# Note how redundancy-group 1 is configured to monitor all the physical interfaces
# forwarding traffic. The preempt keyword causes the mastership to be reverted back
# to the primary node for the group (node 0, which has a higher priority) when the
# failing interface causing the switchover comes back up
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 1 preempt
set chassis cluster redundancy-group 1 interface-monitor fe-1/0/0 weight 255
set chassis cluster redundancy-group 1 interface-monitor fe-5/0/0 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-4/0/0 weight 255
# (Optional) If both data processing and control plane functions are to be performed
# on the same node, then redundancy-group 0 must also monitor the physical
# interfaces. If control and data planes are allowed to fail over independently,
# the following four commands should not be set.
set chassis cluster redundancy-group 0 interface-monitor fe-1/0/0 weight 255
set chassis cluster redundancy-group 0 interface-monitor fe-5/0/0 weight 255
set chassis cluster redundancy-group 0 interface-monitor ge-0/0/0 weight 255
set chassis cluster redundancy-group 0 interface-monitor ge-4/0/0 weight 255
set interfaces ge-0/0/0 gigether-options redundant-parent reth1
set interfaces fe-1/0/0 fastether-options redundant-parent reth0
set interfaces ge-4/0/0 gigether-options redundant-parent reth1
set interfaces fe-5/0/0 fastether-options redundant-parent reth0
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth1 redundant-ether-options redundancy-group 1
# Just as regular interfaces, reth interfaces must be part of a security zone
set security zones security-zone untrust interfaces reth1.0
set security zones security-zone trust interfaces reth0.0

Asymmetric Routing Scenario
This scenario makes use of the asymmetric routing capability of Junos OS with enhanced services.
Traffic received by a node is matched against that node's session table. The result of this lookup indicates whether that node processes the session or forwards it to the other node through the fabric link. Sessions can then be anchored to any device in the cluster, and, as long as the session tables are replicated, the traffic will be correctly processed. To minimize fabric traffic, sessions are always anchored to the node hosting the egress interface for that particular connection.

Figure 4: Asymmetric routing scenario

Figure 4 shows an example of how asymmetric routing is supported. In this scenario, two Internet connections are used, with one being preferred. The connection to the trust zone is made using a RETH interface to provide LAN redundancy for the devices in the trust zone. For illustrative purposes, we will describe two failover cases in which sessions originate in the trust zone with a destination of the Internet (untrust zone).

Case I: Failures in the Trust Zone RETH
Under normal operating conditions, traffic will flow from the trust zone to the interface fe-1/0/0 (belonging to reth0.0) in node 0. Since the primary Internet connection resides in node 0, the sessions will be created in both node 0 and node 1 but will only be active in node 0 (since the egress interface for all of these sessions is ge-0/0/0, belonging to node 0).

A failure of the fe-1/0/0 interface will trigger a failover of the redundancy group, causing the interface fe-5/0/0 (fe-1/0/0 in node 1) to become active. After the failover, traffic will arrive at node 1. After session lookup, the traffic will be sent to node 0, as the session will be active in that node (since the egress interface, ge-0/0/0, is hosted in node 0). Node 0 will then process the traffic and forward it to the Internet. The return traffic will follow a similar process.
Return traffic will arrive at node 0, be processed at node 0 (since the session is anchored to this node), and be sent to node 1 through the fabric interface, where node 1 will forward it through the fe-5/0/0 interface.

Case II: Failures in the Untrust Zone Interfaces
This case differs from the previous one in that sessions will be migrated from node to node. As in the previous case, traffic will be processed only by node 0 under normal operating conditions. A failure of interface ge-0/0/0, connected to the Internet, will cause a change in the routing table, which after the failure will have a default route pointing to interface ge-4/0/0 in node 1. After the failure, the sessions in node 0 will become inactive (since the egress interface now resides in node 1), and the backup sessions in node 1 will become active. Traffic arriving from the trust zone will still be received on interface fe-1/0/0, but will be forwarded to node 1 for processing. After the traffic is processed in node 1, it will be forwarded to the Internet through the ge-4/0/0 interface.

Note that if this scenario were used with source NAT to accommodate different address spaces assigned by different providers, the above would not work, as the egress sessions would be NATed differently after the failover. (This is not a limitation of the HA implementation, but a consequence of the fact that if two Internet service providers (ISPs) are used, the customer doesn't own a public address space, and a failure in one of the ISPs will result in the loss of connectivity from all IPs belonging to the failed service provider.)

# Cluster configuration. redundancy-group 1 is used to control the RETH interface
# connected to the trust zone. Note how the redundancy group (and therefore reth0)
# will only fail over if either fe-1/0/0 or fe-5/0/0 fails, but not if any of the
# interfaces connected to the Internet fails.
set chassis cluster reth-count 1 fet chassis cluster node 0 eet chassis cluster node 1 act chassis cluster redundancy-group set chaseia cluster redundancy-group set chassis cluster radundancy-group eet chassis cluster redundancy-group sct chassis cluster redundancy-group node 0 priority 100 node 1 priority 1 presngt, interface-nonitor fe-1/0/0 weight 255 interface-noniter fe-5/0/0 weight 255 #interface Definitions set interfaces ge-0/0/0 unit 0 family inet address 1.4.0.202/24 set interfaces fo-1/0/0 fastether-options redundant-parent retho set interfaces fe-1/0/1 disable set interfaces ge-1/0/0 unit 0 family inet addrese 1,2.1.233/24 eet interfaces fe-5/0/0 fastather-options redundant-parent retho set interfaces reth0 unit 0 family inot address 10.16.8.1/24 #ye-0/0/1 one each node will be used for the fab interfaces eet interfaces fab0 fabric-options member-interfaces ge-0/0/1 et interfaces fabl fabric-options momber-interfaces ge-4/0/1 ‘fe have two static routes, one to each ISP, but the preferred one is through ge- 0/o/o eet routing-optiona atatic route 0.0.0.0/0 qualified-next-hop 1.4.0.1 metric 19 set routing-options static route 0.0.0.0/0 qualified-next-hop 1.2.1.1 metric 100 #Zones Definitions set security zones security-zone untrust interfaces ge-0/0/0.0 host-inbound- trafic eystem-services dhep sct security zones security-zone Untrust interfaces go-4/0/0.0 host~inkound= traffic system-services dhcp seb security zones security-zone Trust interfaces reth0.0 #Pinally a permit all security policy from Trust to Untrust zone set security policies fron-zone rust to~zone untrust policy AKY match source- address any get security policice from-zone Teust to-zone untruet policy ANY match deetination-address any set security policies fron-zone Trust to-zone untrust policy ANY match application any set security policies fyon-zone Teust to-zone untrust policy ANY then permit, 6 (epyreht 0 200 per NWR Fe ‘Active/Active Full Mesh This scenario is found in medium to large 
deployments where secure routers are placed between two pairs of routers. OSPF is used to control the traffic flow through the nodes in the cluster, and JSRP is used to synchronize the sessions between the two nodes. Since asymmetric routing is supported, it is not required to force the traffic in both directions to a particular node. If a failure occurs and return traffic for a session arrives at a node different from the session-creating node, the fab link will be used to send the traffic back to the node where the session is active (this will be the node hosting the egress interface for that particular session).

This scenario benefits from the use of full mesh connectivity between the devices (thus improving the resiliency of the network), while eliminating the need to add extra switches between the firewalls and routers, which reduces the points of failure in the network.

Figure 5: Active/active full mesh scenario

Special Considerations

The following design considerations should be taken into account when using the chassis cluster feature in Junos OS:

- Errors in either the fab or control links (but not both) will cause the backup node to become disabled (single failure point). If a backup node detects errors in both the fab and control links, it will become master (dual failure point).
- In the event of a control link failure, the system tries to avoid a dual mastership scenario by monitoring the fabric link. If hellos are received through this link, the secondary becomes disabled, while the primary remains active. If neither control link nor fabric link hellos are received, the backup node transitions to active.
- When a fabric link failure is detected, the nodes perform the split-brain avoidance procedure just like in the case of a control link failure. If the fabric link fails but the control link is still operational, the backup node will become disabled, thus avoiding a two-master conflict.
- Failover times are on the order of a few seconds. A failure will be detected in three seconds or more (as the minimum hello time is 1000 ms, and the smallest threshold is three consecutive lost hellos).
- Unified in-service software upgrade (ISSU) is not supported (please refer to the next section for a description of the upgrade procedure when using the HA feature).
- Chassis clustering does not support packet mode-based protocols (e.g., MPLS, Connectionless Network Service, and IPv6 are not supported).
- Pseudo interfaces are not supported when using the chassis cluster feature. The following services that require pseudo interfaces will not work in a cluster configuration:
  - Link services such as Multilink Point-to-Point Protocol (MLPPP), Multilink Frame Relay (MLFR), and compressed RTP (CRTP)
  - Generic routing encapsulation (GRE) tunnels
  - IP-IP tunnels
  - PIM multicast
- WAN interfaces are supported, with the following exceptions:
  - ChE1/T1, ISDN, and xDSL
  - ISM200 modules are not supported in HA mode

Note: ISM modules are only supported on the J Series.

Cluster Upgrade

Cluster upgrade is a simple procedure, but please note that a service disruption of about 3 to 5 minutes will occur during this process:

1. Load the new image file in node 0.
2. Perform the image upgrade, without rebooting the node, by entering "request system software add <image name>" from the Junos OS CLI.
3. Load the new image file in node 1.
4. Perform the image upgrade in node 1, as explained in step 2.
5. Reboot both nodes simultaneously.

In-Band Management of Chassis Clusters

Traditionally, SRX Series clusters could only be managed through an out-of-band management network, requiring dedicated access to the management ports, which could not be used to forward revenue traffic.
This section explores recommended ways to manage and deploy SRX Series clusters using in-band management connections.

SRX Series Services Gateways for the branch can be managed in-band or out-of-band (through the use of the fxp0 interface) when deployed in a cluster configuration. This assumes that the cluster can be reached from the management stations through revenue ports only.

Problem Statement

The high availability (HA) feature available in Junos OS for SRX Series gateways is modeled after the redundancy features found in Junos OS-based routers. Designed with separate control and data planes, Junos OS-based routers provide redundancy in both planes. The control plane in Junos OS is managed by the Routing Engines, which perform all routing and forwarding computations (among many other things). Once the control plane converges, forwarding entries are pushed to all Packet Forwarding Engines (PFEs) in the system. PFEs then perform route-based lookups to determine the appropriate destination for each packet without any Routing Engine intervention.

When enabling a chassis cluster in SRX Series gateways, the same model is used to provide control plane redundancy, as shown in Figure 6.

Figure 6: SRX Series clustering model

Just like in a router with two Routing Engines, the control plane of SRX Series clusters operates in an active/passive mode, with only one node actively managing the control plane at any given time. Because of this, the forwarding plane always directs all traffic sent to the control plane (also referred to as host-inbound traffic) to the cluster's primary node. This traffic includes (but is not limited to):

- Traffic for the routing daemon, such as BGP, OSPF, IS-IS, RIP, PIM, etc.
- Internet Key Exchange (IKE) negotiation messages
- Traffic directed to management daemons like SSH, Telnet, SNMP, NETCONF (used for NSM), and so on
- Monitoring protocols like Bidirectional Forwarding Detection (BFD) or real-time performance monitoring (RPM)

Please note that this behavior applies only to host-inbound traffic. Through traffic (that is, traffic forwarded by the cluster but not destined to any of the cluster's interfaces) can be processed by either node, based on the cluster's configuration.

Because the forwarding plane always directs host-inbound traffic to the primary node, a new type of interface, the fxp0 interface, was added in an effort to provide an independent connection to each node, regardless of the status of the control plane. Traffic sent to the fxp0 interface is not processed by the forwarding plane, but is sent to the Junos OS kernel, thus providing a way to connect to the control plane of a node, even on the secondary node.

Until Junos OS 10.1R2, the management of a chassis cluster using NSM (and other management interfaces) required connectivity to the control plane of both members of a cluster, therefore requiring access to the fxp0 interface of each node.

This application note explains how to manage a chassis cluster through the primary node without requiring the use of the fxp0 interfaces.

Description and Deployment Scenario

Connecting to a Cluster Using SSH/Telnet

Accessing the primary node of a cluster is as easy as establishing a connection to any of the node's interfaces (other than the fxp0, that is). Either Layer 3 or redundant Ethernet (RETH) interfaces will always direct the traffic to the primary node, whichever node that is.
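For in-band SSH access to work, the SSH service must be enabled and the inbound interface's zone must allow SSH as host-inbound traffic. A minimal configuration sketch, reusing the Trust zone and reth0.0 naming from the earlier examples (zone and interface names are assumptions, not required values):

```
set system services ssh
set security zones security-zone Trust interfaces reth0.0 host-inbound-traffic system-services ssh
```

The same host-inbound-traffic statement can be applied at the zone level instead of per interface if all interfaces in the zone should accept management connections.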
Both deployment scenarios are common and are depicted in the following diagrams:

Figure 7: Common branch deployment scenarios for SRX Series clustering

In both cases, establishing a connection to any of the local addresses will connect to the primary node (to be precise, it will connect to the primary node of redundancy group 0). For example, we can connect to the primary node even when the RETH interface, a member of redundancy group 1, is active on a different node (the same applies to Layer 3 interfaces, even if they physically reside in the backup node):

$ ssh 10.1.1.34
labuser@10.1.1.34's password:
--- JUNOS 10.2R1.3 built 2010-05-14 15:13:40 UTC

{primary:node1}
labuser@BranchGW> show chassis cluster status
Cluster ID: 3
Node        Priority    Status      Preempt    Manual failover

Redundancy group: 0 , Failover count: 3
    node0   200         secondary   no         yes
    node1   255         primary     no         yes

Redundancy group: 1 , Failover count: 4
    node0   254         primary     yes        no
    node1   1           secondary   yes        no

Logging into the secondary node from the primary

Most monitoring commands will show the status of both nodes. When needed, it is still possible to connect to the secondary node from the primary, as shown below:

labuser@BranchGW> request routing-engine login node 0
--- JUNOS 10.2R1.3 built 2010-05-14 15:13:40 UTC

Exiting the session will bring us back to the primary node:

{secondary:node0}
labuser@BranchGW> exit
rlogin: connection closed

{primary:node1}
labuser@BranchGW>

SSH management of a cluster is a good example of how all management protocols behave. It is simple to connect to the primary node, and connecting to the secondary node must be done through the primary. NSM management of a cluster is not any different.
NSM versions prior to 2010.2 require NETCONF connections to both nodes, which is why in-band management of a cluster in older versions is problematic. The solution to this problem is the subject of the next section.

In-Band Management Through Network and Security Manager

NSM management of SRX Series gateways in cluster configurations was modeled after the management of ScreenOS devices connected using the NetScreen Redundancy Protocol (NSRP), where NSM connects independently to each member forming an HA pair. However, other Junos OS-based devices running in HA mode can be managed through NSM using a single connection. In particular, NSM can manage Juniper Networks EX Series Ethernet Switches with Virtual Chassis technology by connecting to the master node only. In this case, configuration and monitoring of the chassis is done through this single connection.

NSM version 2010.2 has added the ability to manage a branch SRX Series cluster just like an EX Series with Virtual Chassis, thus requiring only a single connection to the primary node. This change requires modifications both to the devices, so that they identify to NSM as a Virtual Chassis, and to NSM. For backwards compatibility purposes, clusters identify to NSM as a chassis cluster by default, and it is expected that they will be managed through the fxp0 interfaces.

The default behavior can be changed in the device by adding the following configuration to the cluster:

labuser@BranchGW# set chassis cluster network-management cluster-master

Adding the device to NSM is similar to adding an EX Series Virtual Chassis. Simply mark the "virtual-chassis" check box when adding the cluster. Note how the cluster must be added as a single node, and not as a chassis cluster.

Figure 8: Adding a cluster as a Virtual Chassis in NSM
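As a quick sanity check, the committed stanza can be displayed from configuration mode. This is a sketch; the prompt and hostname are illustrative:

```
{primary:node1}[edit]
labuser@BranchGW# show chassis cluster network-management
cluster-master;
```

If the stanza is absent, the cluster will continue to identify to NSM as a chassis cluster and will expect to be managed through the fxp0 interfaces.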
The hardware inventory will display the chassis serial number of the primary node, and a failover will result in an update reflecting the serial number change. Most configuration and monitoring options are supported, with the following exceptions (which will be addressed in a subsequent release):

- Chassis inventory displays "sub-component" instead of "FPC".
- The "chassis serial number," as obtained from a cached copy in NSM from "get-system-information," contains old information and is not correct.
- Software update of both devices through NSM is not supported.
- The Virtual Chassis status view shows no valid information.
- License inventory shows information only about the primary node.
- Hardware inventory gets out of sync when the primary node is rebooted.
- Reboot commands sent through NSM are only applied on the primary node.
- Only control plane logs from the primary node are sent to NSM.
- Data plane logs (like session logs, IDP attacks, etc.) can be sent from both nodes directly to NSM in structured-syslog format. Support for structured-syslog messages on NSM requires version 2010.4R2 or later.

When updating IDP signatures, NSM pushes the security package to the primary node, after which it sends a remote procedure call (RPC) to the cluster to trigger an upgrade. Under normal circumstances, only the primary node will get updated. To overcome this limitation, a Junos OS script has been developed that takes care of updating the secondary node automatically after the primary has been updated.

Updating the IDP Signatures

When a chassis cluster is managed through an in-band connection, only the control plane of the primary node will have connectivity to other devices. In particular, only the primary node is able to download new security packages from the update servers.
The "request security idp security-package download node primary" and "request security idp security-package install node primary" commands can still be used to download and install the security package on the primary node (using these commands without specifying the node will still work on the primary, but fail on the secondary node).

A cluster can automatically copy and install a newly installed security package on the secondary by loading and enabling the "idp-update.xslt" event script. The script (which can be downloaded from the following location: https://matrix.juniper.net/community/products/security/srxseries/blog/2010/06/01/updating-the-idp-security-package-in-a-cluster-with-no-fxp0-access-to-the-internet) must be copied to the "/var/db/scripts/event" directory on both nodes, after which it must be enabled using the following configuration:

set event-options policy idp-update events IDP_SECURITY_INSTALL_RESULT
set event-options policy idp-update attributes-match idp_security_install_result.status matches successful
set event-options policy idp-update then event-script idp-update.xslt

With the script enabled, all IDP signature update methods are supported, including NSM, the command-line interface (CLI), and auto-update.

It is possible to manually synchronize the signature packages between the nodes by copying the contents of the /var/db/idpd/sec-download directory on the primary node to the secondary. Files can be copied between nodes by using the "file copy" command and specifying the backup node as the target (file copy /var/db/idpd/sec-download nodeX:/var/db/idpd/sec-download), where nodeX is either node0 or node1, depending on which node is the backup.

Similarly, the IDP policy templates can be synchronized by simply copying the templates stored in the /var/db/scripts/commit directory to the secondary node.

Using SNMP

Just like in the SSH/Telnet case, the primary device can answer SNMP queries and generate SNMP traps for both nodes.
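A minimal in-band SNMP configuration might look like the following sketch; the community string and trap target address are illustrative assumptions, not values from this document:

```
set snmp community public authorization read-only
set snmp trap-group cluster-traps targets 10.1.1.100
```

Queries sent to any reachable revenue-port address of the cluster will be answered by the primary node, consistent with the host-inbound traffic behavior described earlier.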
At the time of this writing, not all MIBs supported by branch SRX Series devices work across a cluster, but most MIBs do.
