To perform an ISSU (In-Service Software Upgrade), complete the following steps:

1. Load the Junos Software package on the device - KB20955
2. Verify the Health of the Cluster (Important step) - KB20956
3. Create backup of the current configuration and set the rescue config - KB20957
4. Start the In-Service Software Upgrade - KB20958
5. Process to follow, in the event of the ISSU process stalling in the middle of the upgrade - KB19500

Load the Junos Software package on the device - KB20955
a. Junos software installation requires the package to be on the SRX device.
For help on getting the Junos software package, refer to Downloading Software Packages from
Juniper Networks.
There are multiple methods for transferring the software package to the device. Copy/transfer the
software package on to the device by using FTP or USB. (You can determine the amount of
temporary storage space left on the device by following KB17367.)
b. Once the package is loaded, verify that it is available under the directory (/cf/var/tmp). You can do
this by following any of the methods shown below:
From Shell:
{primary:node0}
root@test-node0> exit
root@test-node0% ls /var/tmp
juniper.conf.spu.gz
juniper.data
junos-srx5000-10.1R1.8-domestic.tgz
From CLI:
{primary:node0}
root@test-node0> file list /cf/var/tmp/junos*
junos-srx5000-10.1R1.8-domestic.tgz
c. To ensure that the package transferred to the device is not truncated or corrupted, run an MD5
checksum on it and compare the result with the checksum of the original package.
> file checksum md5 /var/tmp/junos-srx5000-10.1R1.8-domestic.tgz
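The comparison value for step c has to come from somewhere. The following is a minimal sketch, assuming the package is also available on a workstation with Python; the helper names `md5_of_file` and `package_is_intact` are hypothetical and not part of Junos. It computes the MD5 of the local copy so that it can be compared with the value reported by `file checksum md5` on the device.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large packages are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_is_intact(path, expected_md5):
    """Compare the local file's MD5 with an expected value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If the two values differ, transfer the package again before continuing.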

Verify the Health of the Cluster (Important step) - KB20956
Review the output of the following commands and verify that the cluster is fully healthy. It is
highly recommended to perform an ISSU only when the Chassis Cluster is in a healthy failover state.
1. Confirm the Chassis Cluster is in the Primary/Secondary state with a proper priority.
Follow the steps here: KB20673 - How to verify that Chassis Cluster in
Primary/Secondary State has proper priority. KB20673 is the common method for
verifying the Chassis Cluster health. However, for an ISSU upgrade, also perform the following steps.

KB20673
Run the command show chassis cluster status on either node to verify the Chassis Cluster status:
{primary:node0}
root@J-SRX> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 100 secondary no no
node1 150 primary no no
Redundancy group: 1 , Failover count: 1
node0 100 secondary no no
node1 150 primary no no

Do you see one node with the status of primary and one node with the status of secondary?
Yes - Proceed with Step 2
No - Go to KB20641 - Troubleshooting steps when the Chassis Cluster does not come up

What is the priority of each node?
If the priority is 0, then proceed to KB16869 - What does priority 0 mean in a JSRP chassis cluster?
Priority 0 means that the node is in the ineligible state. There are several reasons why you could see the ineligible state:
- Cold sync failure (see J-Series/SRX Security Configuration Guide for more details)
- Monitored interface down
- IP Tracking is failing (SRX3000 and SRX5000)
- Possible hardware issue
Perform the following to correct the priority 0 state:
- Check chassis cluster statistics. Are there any missing heartbeats or probes?
- Check chassis cluster interfaces. Are any of the monitored interfaces down or is a tracked IP missed?
- Check jsrpd logs. Are there any errors?
- Check chassisd logs. Is any hardware down?
- Check messages log. Are there any events leading up to the problem?

If the priority is 255, then proceed to KB16870 - What does priority 255 mean in a JSRP chassis cluster?
Priority 255 means that a manual failover was initiated. Manual failover will show 'yes' in that scenario. After a manual failover, it is always recommended to reset the manual flag in the cluster status. Otherwise, no additional failovers may occur for that redundancy group. To remove the manual failover state and restore the proper priority state, use the CLI command below.
request chassis cluster failover reset redundancy-group <0-128>
If the priority is between 1 and 254, proceed to Step 3.
If the priority for both nodes is between 1 and 254, it means that the Chassis Cluster is in a healthy state.

Each Redundancy Group (other than Redundancy Group 0) contains one or more redundant Ethernet interfaces. A redundant Ethernet interface is a pseudo interface that contains a pair of physical Gigabit Ethernet interfaces or a pair of Fast Ethernet interfaces. If a Redundancy Group is active on node 0, then the child links of all the associated redundant Ethernet interfaces on node 0 are active. If the redundancy group fails over to node 1, then the child links of all redundant Ethernet interfaces on node 1 become active.

2. Verify that all the FPCs and the PICs are showing online. If any of the FPCs are showing as 'Present' or 'Offline', it is important to determine the cause for this and make sure that on both nodes they come up as online before proceeding further.
(If the FPC in non-online mode does not take part in the Chassis Cluster failover, then it should be OK to proceed. Contact your technical support representative if in doubt at this stage.)
{primary:node0}
root@test-node0> show chassis fpc pic-status
node0:
--------------------------------------------------------------------------
Slot 0   Online       SRX5k DPC 40x 1GE
  PIC 0  Online       10x 1GE RichQ
  PIC 1  Online       10x 1GE RichQ
  PIC 2  Online       10x 1GE RichQ
  PIC 3  Online       10x 1GE RichQ
Slot 3   Online       SRX5k SPC
  PIC 0  Online       SPU Cp
  PIC 1  Online       SPU Flow
Slot 4   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 5   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 6   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 7   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 8   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
node1:
--------------------------------------------------------------------------
Slot 0   Online       SRX5k DPC 40x 1GE
  PIC 0  Online       10x 1GE RichQ
  PIC 1  Online       10x 1GE RichQ
  PIC 2  Online       10x 1GE RichQ
  PIC 3  Online       10x 1GE RichQ
Slot 3   Online       SRX5k SPC
  PIC 0  Online       SPU Cp
  PIC 1  Online       SPU Flow
Slot 4   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 5   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 6   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 7   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 8   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow

3. Check the Chassis control link and verify that the sent and received packet counts are closely uniform. Also confirm that the error count is not increasing. It is suggested to run this command twice.
srx> show chassis cluster control-plane statistics

4. It is best to have all the Redundancy Groups primary on one node, e.g. node 0. If not, fail over the Redundancy Groups before you proceed further.
srx> request chassis cluster failover redundancy-group 0 node 0
srx> request chassis cluster failover reset redundancy-group 0

srx> request chassis cluster failover redundancy-group 1 node 0
srx> request chassis cluster failover reset redundancy-group 1
For the failover of RG0, there might be a slight lag, and you may need to wait up to about 2 to 3 minutes while the RE fails over. The rest of the Redundancy Groups will fail over faster.

5. Verify there are no alarms. Run the following command:
{primary:node0}
root@test-node0> show chassis cluster information
The cluster information and statistics should not show any alarms that could cause a disruption during the upgrade. Check for errors that occurred during the troubleshooting period prior to the upgrade. Check if the events are showing any irregular problems. If so, solve that first before proceeding further. Check if the packet counts and SPU counts match each other. The SPU count should match the number of SPUs you have on the device. If you do find discrepancies, contact your technical support representative for consultation before proceeding further.
root@test-node0> show chassis cluster information | no-more

Create backup of the current configuration and set the rescue config - KB20957
If the current configuration is a good one, save it as RESCUE:
{primary:node0}
root@test-node0> request system configuration rescue save
Reason: Precautionary step for ISSU, to make sure that even if the active configuration gets wiped out for some reason, the device will always have a rescue config to load from once it boots up.
You can verify the rescue configuration on the device by doing a 'file list /config/' to confirm that the file exists. Verify that the date and time match with the system date and time. Then do a 'file show /config/rescue.conf.gz' and review the contents.
Further, it is recommended that you have a latest copy of the running configuration stored on a different storage device/server for easy retrieval if required.
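Before starting the upgrade, the priority rules from the cluster health check (0 means ineligible, 255 means a manual failover is set, 1 to 254 means healthy) can be applied mechanically to saved `show chassis cluster status` output. The following is a minimal sketch, assuming the four-column output format shown in this document (newer Junos releases may print extra columns); the function names are hypothetical helpers, not a Juniper API.

```python
import re

# Assumed line formats, taken from the sample output in this document.
RG_LINE = re.compile(r"Redundancy group:\s*(\d+)")
NODE_LINE = re.compile(r"^\s*(node\d+)\s+(\d+)\s+(\S+)\s+(yes|no)\s+(yes|no)\b")


def classify_priority(priority):
    """Map a node priority to the state described in KB16869/KB16870."""
    if priority == 0:
        return "ineligible"
    if priority == 255:
        return "manual-failover"
    if 1 <= priority <= 254:
        return "healthy"
    return "unknown"


def parse_cluster_status(text):
    """Return {redundancy_group: [node records]} from the CLI output text."""
    groups = {}
    current = None
    for line in text.splitlines():
        rg = RG_LINE.search(line)
        if rg:
            current = int(rg.group(1))
            groups[current] = []
            continue
        node = NODE_LINE.match(line)
        if node and current is not None:
            name, priority, status, preempt, manual = node.groups()
            groups[current].append({
                "node": name,
                "status": status,
                "state": classify_priority(int(priority)),
                "manual_failover": manual == "yes",
            })
    return groups


def ready_for_issu(groups):
    """True if every group has one primary, one secondary, and healthy priorities."""
    for nodes in groups.values():
        if sorted(n["status"] for n in nodes) != ["primary", "secondary"]:
            return False
        if any(n["state"] != "healthy" or n["manual_failover"] for n in nodes):
            return False
    return bool(groups)
```

A `False` result only indicates that one of the documented conditions is not met; the FPC, control link, and alarm checks still need to be reviewed separately.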

Start the In-Service Software Upgrade - KB20958
1. First, verify that you have console connectivity to both the primary and secondary nodes, and verify that 'logging' is enabled on both terminal sessions. This is necessary to verify and monitor the ISSU process as it upgrades the Junos image.
2. Perform the upgrade with the following command:
{primary:node0}
root@test-node0> request system software in-service-upgrade /var/tmp/junos-srx5000-10.[...].tgz reboot <---be sure to include the 'reboot' option
Important: Make sure that you have the reboot option specified in the command. If it is not specified, it will upgrade node 1 but not reboot, and the physical reboot of node 1 is needed before the automatic failover happens. This could also lead the ISSU process to stall. If it stalls, follow the ISSU abort process in KB19500.
The messages reported on node 0 and node 1 will be along the following lines. (Messages that are not important have been omitted.)
NODE 0:
{primary:node0}
root@test-node0> request system software in-service-upgrade /var.....(complete the package information as shown above)
Chassis ISSU Started
node1:
--------------------------------------------------------------------------
Chassis ISSU Started
ISSU: Validating Image
Inititating in-service-upgrade
node1:
--------------------------------------------------------------------------
Inititating in-service-upgrade
Checking compatibility with configuration
Initializing...
Verified manifest signed by PackageProduction_10_1_0
Verified junos-10.1-domestic signed by PackageProduction_10_1_0
Using /var/tmp/junos-srx5000-domestic.tgz
Verified manifest signed by PackageProduction_10_1_0
Hardware Database regeneration succeeded
Validating against /config/juniper.conf.gz
mgd: commit complete
Validation succeeded
Validating against /config/rescue.conf.gz
mgd: commit complete
Validation succeeded
ISSU: Preparing Backup RE
Pushing bundle to node1
JUNOS 10.4R3.[...]
Checking junos requirements on /
Saving boot file package in /var/sw/pkg/junos-boot-srx5000-10.[...].tgz
[...] become active at next reboot
WARNING: A reboot is required to load this software correctly
WARNING: Use the 'request system reboot' command
WARNING: when software installation is complete
Saving package file in /var/sw/pkg/junos-10.[...].tgz
Saving state for rollback ...
Finished upgrading secondary node node1
Rebooting Secondary Node
node1:
--------------------------------------------------------------------------
Shutdown NOW!
[pid 21958]
ISSU: Backup RE Prepare Done
Waiting for node1 to reboot.
node1 booted up.
Waiting for node1 to become secondary
node1 became secondary.
Waiting for node1 to be ready for failover
ISSU: Preparing Daemons

Once this is done, the following will be reported:
{secondary:node1}
root@test-node1> show chassis cluster status
Cluster ID: 2
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 2
node0 254 primary no no
node1 2 secondary no no
Redundancy group: 1 , Failover count: 2
node0 254 primary no no
node1 0 secondary no no

At this stage, Node 1 has rebooted successfully and is on the Junos version that you upgraded to. Check the command "show version" to verify this. Also, on NODE 1, check the following commands.
srx> show chassis cluster status
srx> show chassis fpc pic-status (all the PICs in NODE 1 should be online; keep monitoring it for 2 mins or so to make sure all are online)
srx> show chassis alarms
srx> show system alarms
srx> show log messages | grep issu

Now the automatic failover will happen and once that is done, the upgrade of Node 0 will happen. The messages reported are similar to above, but still monitor it to see if there are any problems or warnings that the boot messages are throwing.
Node 0 should come back up in the healthy state. Verify everything as mentioned in KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper priority.
Also see that the Redundancy Groups are now primary on Node 1. To bring them back to node 0, follow the process as shown below:
srx> request chassis cluster failover redundancy-group 0 node 0
srx> request chassis cluster failover redundancy-group 1 node 0
srx> request chassis cluster failover redundancy-group X node 0
srx> request chassis cluster failover reset redundancy-group X

As mentioned earlier, the failover of RG0 might take some time. The rest of the Redundancy Groups should fail over quickly.
The ISSU process is now complete, and you can check the health of the Cluster as mentioned in KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper priority.

Process to follow, in the event of the ISSU process stalling in the middle of the upgrade - KB19500
If the system does not complete the ISSU process, perform the following steps to completely stop the ISSU process and roll back to the previous state.
- If both nodes completed the upgrade (verify with 'show version'), run the following commands on both nodes simultaneously to roll back to the previous Junos version:
request chassis cluster in-service-upgrade abort
request system software rollback
request system reboot
- If only one node completed the upgrade (verify with 'show version'), run the following commands:
1) On the upgraded node:
request chassis cluster in-service-upgrade abort
request system software rollback
2) On the node that did not complete the upgrade:
request chassis cluster in-service-upgrade abort
3) On both nodes, after completing the above steps:
request system reboot
- If neither node completed the upgrade successfully (verify with 'show version'), run the following commands on both nodes simultaneously to roll back to the previous Junos version:
request chassis cluster in-service-upgrade abort
request system reboot
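
The three abort cases in KB19500 differ only in which nodes need `request system software rollback`. The following is a minimal sketch that encodes that decision, assuming the upgrade state of each node has already been confirmed with 'show version'; `abort_plan` is a hypothetical helper that only returns the commands to type and does not connect to any device.

```python
ABORT = "request chassis cluster in-service-upgrade abort"
ROLLBACK = "request system software rollback"
REBOOT = "request system reboot"


def abort_plan(node0_upgraded, node1_upgraded):
    """Return ordered (target, command) steps for the KB19500 abort process."""
    upgraded = {"node0": node0_upgraded, "node1": node1_upgraded}
    if all(upgraded.values()):
        # Both nodes run the new image: abort, roll back, and reboot both.
        return [("both nodes", ABORT), ("both nodes", ROLLBACK), ("both nodes", REBOOT)]
    if not any(upgraded.values()):
        # Neither node was upgraded: no software rollback is needed.
        return [("both nodes", ABORT), ("both nodes", REBOOT)]
    done = next(name for name, ok in upgraded.items() if ok)
    pending = next(name for name, ok in upgraded.items() if not ok)
    # Only one node was upgraded: roll back that node only, then reboot both.
    return [(done, ABORT), (done, ROLLBACK), (pending, ABORT), ("both nodes", REBOOT)]
```

For example, `abort_plan(False, True)` lists the abort and rollback commands for node1, the abort command for node0, and a final reboot of both nodes.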