SRX Chassis Cluster Upgrade with Minimal Downtime (v0.7)

Assume that node0 is the primary for the control plane (RG0) and the data plane (RG1+), and that it is configured with a higher priority than the secondary node.

On node0

1. Upload the new Junos OS image
    /var/tmp/junos-srx5000-12.1x46-d35-domestic.tgz

On node1

1. Upload the new Junos OS image
    /var/tmp/junos-srx5000-12.1x46-d35-domestic.tgz

2. Disable all physical interfaces for transit traffic on node1 (secondary node)
    set interfaces xe-12/0/0 disable
    set interfaces xe-12/3/0 disable

3. Disable TCP SYN check and sequence check
    set security flow tcp-session no-syn-check
    set security flow tcp-session no-sequence-check

4. Disable preempt for all RG1+
    delete chassis cluster redundancy-group 1 preempt

5. Delete all interface-monitor and ip-monitoring
    delete chassis cluster redundancy-group 1 interface-monitor
    delete chassis cluster redundancy-group 1 ip-monitoring

6. Commit the configuration

7. Adjust control-ports (SRX5K) or physically disconnect the control link (Branch SRX and SRX1K/3K), and adjust fab interfaces

7a. For SRX5400/5600/5800, change the control and fabric ports to non-existing ports.
    - Control ports need to be set to any SPC port on the device that does not have a physical connection.
    - Fabric ports can be set in any IOC slot (existing or not) on the device. A simple way is to change the fabric ports to undefined port numbers (port 40) on the same slot.

    delete chassis cluster control-ports
    set chassis cluster control-ports fpc 10 port 0    (SPC port)
    set chassis cluster control-ports fpc 22 port 0    (SPC port)
    set interfaces fab0 fabric-options member-interfaces xe-1/3/40
    set interfaces fab1 fabric-options member-interfaces xe-13/3/40

NOTE: Assume that port 40 on FPC1 and FPC13 are non-existing ports for the fabric link. If dual control links are configured, you also need to include the configuration change for the second control link.

7b. For Branch SRX and SRX1400/3400/3600, the control link(s) need to be physically disconnected.
    set interfaces fab0 fabric-options member-interfaces xe-1/0/40
    set interfaces fab1 fabric-options member-interfaces xe-14/0/40
8. Commit the configuration

NOTE: For Branch SRX and SRX1400/3400/3600, the commit needs to be applied to both nodes independently, because the nodes lose communication once the control link is removed in step 7b.

NOTE: For SRX5400/5600/5800, the following errors are generated on completion because the control link goes down. These are expected error messages. Technically speaking, the candidate configuration is not converted to the active configuration on node0 (you can either perform an additional commit as shown below, or commit in step 12), but the candidate configuration is now the active configuration on node1, so you do not need to commit the change on node1. You can check this with the show configuration | display set | match "control-ports|fab[01]" command on node1.

e.g.,
    root@srx5k# commit
    configuration check succeeds
    error: error communicating with
    error: remote commit-configuration failed on node1
    error: commit failed
    error: Connection to node1 has been broken
    error: remote unlock-configuration failed on node1

NOTE: If you want to exit configuration mode, you can execute commit again on node0.

    root@srx5k# exit
    The configuration has been changed but not committed
    Discard uncommitted changes? [yes,no] (yes) no    <<< SHOULD be "no"
    Exit aborted

    root@srx5k# commit and-quit
    commit complete
    Exiting configuration mode

    root@srx5k> show configuration | display set | match "control-ports|fab[01]"
    set chassis cluster control-ports fpc 10 port 0
    set chassis cluster control-ports fpc 22 port 0
    set interfaces fab0 fabric-options member-interfaces xe-1/3/40
    set interfaces fab1 fabric-options member-interfaces xe-13/3/40

    {disabled:node1}
    root@srx5k> show configuration | display set | match "control-ports|fab[01]"
    set chassis cluster control-ports fpc 10 port 0
    set chassis cluster control-ports fpc 22 port 0
    set interfaces fab0 fabric-options member-interfaces xe-1/3/40
    set interfaces fab1 fabric-options member-interfaces xe-13/3/40

NOTE: Before starting the node1 upgrade, make sure the active configuration includes the changes from step 7 on both nodes.
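Before step 9, it may also be worth confirming that the image uploaded in step 1 is present and intact on node1. The commands below are a sketch and not part of the original procedure: the path is the one from step 1, and the checksum still has to be compared by hand against the value published for the image.

```
file list /var/tmp/junos-srx5000-12.1x46-d35-domestic.tgz detail
file checksum md5 /var/tmp/junos-srx5000-12.1x46-d35-domestic.tgz
show system storage    (check that free space is sufficient for the install)
```

The parenthetical is an annotation in the style used elsewhere in this procedure, not part of the command.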
### Start node1 upgrade ###

9. Upgrade Junos OS on node1
    request system software add no-copy no-validate <install-package>

10. Reboot
    request system reboot

11. After node1 boots with the updated Junos OS, all FPCs and PICs should be online before proceeding (this takes 10-15 minutes depending on the number of FPCs), and node1 should be in primary state for all RGs
    show version
    show chassis fpc pic-status
    show chassis cluster status    (node0 should be in lost status)

NOTE: The priorities of RG1+ are reported as 0. This is normal behavior.

12. Before failing over to node1, it is best to verify that a configuration change commits successfully, then
    - disable all physical interfaces for transit traffic on node0
    - enable all physical interfaces for transit traffic on node1

    root@srx5k# set interfaces reth0 description TEST
    root@srx5k# commit
    commit complete
    root@srx5k# rollback 1
    load complete
    root@srx5k# commit
    commit complete

    On node1, the commit output is prefixed with the node name:

    root@srx5k# set interfaces reth0 description TEST
    root@srx5k# commit
    node1:
    commit complete
    root@srx5k# rollback 1
    load complete
    root@srx5k# commit
    node1:
    commit complete

    set interfaces xe-0/0/0 disable
    set interfaces xe-0/3/0 disable
    delete interfaces xe-12/0/0 disable
    delete interfaces xe-12/3/0 disable

NOTE: Enable all physical interfaces of node1 that were disabled in step 2.

NOTE: If there are any conflicts, they need to be resolved before moving to the next step.

13. Commit the configuration simultaneously on both nodes. This causes all traffic to fail over to node1.

NOTE: The minimum total downtime will vary depending on the switching/routing environment (dynamic routing, STP, MSTP, RSTP, VSTP, edge, PortFast, etc.).

14. Verify traffic is passing through node1
    show security flow session summary
    monitor interface traffic
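monitor interface traffic is an interactive display. Where a one-shot check is preferred, for example to capture output for a change record, the following may serve as a rough alternative. The interface names are the node1 transit interfaces from step 2 and are assumptions about your environment; substitute your own.

```
show interfaces xe-12/0/0 | match rate
show interfaces xe-12/3/0 | match rate
show chassis cluster status    (node1 should be primary for all RGs)
```

Non-zero input and output rates on the node1 interfaces, together with a growing session count in show security flow session summary, suggest that traffic has moved to node1.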
### Start node0 upgrade ###

15. Upgrade Junos OS on node0
    request system software add no-validate no-copy <install-package>

16. Reboot
    request system reboot

17. After node0 boots with the updated Junos OS, all FPCs and PICs should be online before proceeding (this takes 10-15 minutes depending on the number of FPCs), and node0 should be in primary state for all RGs
    show version
    show chassis fpc pic-status
    show chassis cluster status    (node1 should be in lost status)

NOTE: The priorities of RG1+ are reported as 0. This is normal behavior.

18. Before re-configuring control-ports (SRX5K) or connecting the control link (Branch SRX and SRX1K/3K) and re-configuring fab interfaces, re-enable the interface-monitor settings that were deleted in step 5
    set chassis cluster redundancy-group 1 interface-monitor xe-0/0/0 weight 255
    set chassis cluster redundancy-group 1 interface-monitor xe-0/3/0 weight 255
    set chassis cluster redundancy-group 1 interface-monitor xe-12/0/0 weight 255
    set chassis cluster redundancy-group 1 interface-monitor xe-12/3/0 weight 255

NOTE: You can still configure the other node's interfaces even though they are not shown on the local node (node1's interfaces on node0, and node0's interfaces on node1).

19. Commit the configuration on both nodes

20. Re-configure control-ports (SRX5K) or connect the control link (Branch SRX and SRX1K/3K), and re-configure fab interfaces on node0 only (you will apply the same configuration on node1 in step 22)

20a. For SRX5400/5600/5800, re-configure the correct control and fabric interfaces on node0
    delete chassis cluster control-ports
    set chassis cluster control-ports fpc 11 port 0
    set chassis cluster control-ports fpc 23 port 0
    set interfaces fab0 fabric-options member-interfaces xe-1/3/0
    set interfaces fab1 fabric-options member-interfaces xe-13/3/0
    commit and-quit

20b. For Branch SRX and SRX1400/3400/3600, physically re-connect the control link(s) and re-configure fabric interfaces on node0
    set interfaces fab0 fabric-options member-interfaces xe-1/0/0
    set interfaces fab1 fabric-options member-interfaces xe-14/0/0
    commit and-quit
21. Halt node0 with request system halt

    root@srx5k> request system halt
    warning: This command will not halt the other routing-engine.
    If planning to switch off power, use the both-routing-engines option.
    Halt the system? [yes,no] (no) yes

    *** FINAL System shutdown message from root@srx5k ***
    System going down IMMEDIATELY

    Shutdown NOW!
    [pid 2193]

    root@srx5k> failed to set the server tnp address
    Waiting (max 60 seconds) for system process `vnlru_mem' to stop...done
    Waiting (max 60 seconds) for system process `vnlru' to stop...done
    Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
    Waiting (max 60 seconds) for system process `syncer' to stop...
    Syncing disks, vnodes remaining...3 3 1 1 1 1 1 1 0 0 0 0 0 0 done

    syncing disks... All buffers synced.
    Uptime: 1h25m0s
    recorded reboot as normal shutdown

    The operating system has halted.
    Please press any key to reboot.

NOTE: DO NOT press any key before step 22 is completed.

22. When the node0 console prints "The operating system has halted.", re-configure control-ports (SRX5K) or connect the control link (Branch SRX and SRX1K/3K), and re-configure fab interfaces on node1

22a. For SRX5400/5600/5800, re-configure the correct control and fabric interfaces on node1
    delete chassis cluster control-ports
    set chassis cluster control-ports fpc 11 port 0
    set chassis cluster control-ports fpc 23 port 0
    set interfaces fab0 fabric-options member-interfaces xe-1/3/0
    set interfaces fab1 fabric-options member-interfaces xe-13/3/0
    commit and-quit

22b. For Branch SRX and SRX1400/3400/3600, re-connect the control link(s) and re-configure fabric interfaces on node1
    set interfaces fab0 fabric-options member-interfaces xe-1/0/0
    set interfaces fab1 fabric-options member-interfaces xe-14/0/0
    commit and-quit

NOTE: Make sure you DO NOT commit until node0 is in halt status (step 21).

NOTE: Make sure node1 is primary for all RGs (show chassis cluster status).

23. Press any key to reboot node0

24. When node0 returns to Up state, verify that it has synchronized with node1. Then enable all physical interfaces for transit traffic on node0, which were disabled in step 12, and re-enable TCP SYN check and sequence check, which were disabled in step 3.
    show chassis fpc pic-status    (verify all slots and PICs are Online)
    show security flow session summary    (verify both nodes report similar session counts)
    delete interfaces xe-0/0/0 disable
    delete interfaces xe-0/3/0 disable
    delete security flow tcp-session no-syn-check
    delete security flow tcp-session no-sequence-check

25. Verify that the RG states are back online with the correct priority
    show chassis cluster status

26. Enable preempt and ip-monitoring if they were configured before for RG1+
    set chassis cluster redundancy-group 1 preempt
    commit and-quit

27. Optional: Fail over the RGs to node0 (in case preempt is not configured, or is used with a higher priority on node1)
    request chassis cluster failover redundancy-group 0 node 0
    request chassis cluster failover redundancy-group 1 node 0
    request chassis cluster failover reset redundancy-group 0
    request chassis cluster failover reset redundancy-group 1
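Once the procedure is complete, a final health check along the following lines may be useful. This is a sketch of commonly used operational commands rather than a step from the original procedure, and the expected states in parentheses are what one would normally hope to see, not captured output.

```
show version    (both nodes should report the new Junos OS release)
show chassis cluster status    (one primary and one secondary per RG, with the configured priorities)
show chassis cluster interfaces    (control and fabric links should be Up)
show chassis cluster statistics    (heartbeat and probe counters should be incrementing)
```

If the control or fabric link is reported Down, recheck the control-ports and fab member-interfaces configured in steps 20 and 22 before re-enabling preempt.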