First model year cars

It seems obvious that first-model-year cars are not always a good thing to pursue. The same goes for major OS releases, and PAN-OS does not fall far from that tree.

Having been convinced by the vendor that firewall version 8.1.x was stable, bug-free, and full of new features, I decided to upgrade my HA pair of firewalls from 7.x to 8.1.x.

The upgrade went as smooth as silk: I went from 7.x to 8.0, then to 8.0.8, and finally to 8.1.0. The process is a bit time-consuming, but for a pair of firewalls in HA I lost only one ping to 8.8.8.8 during the whole upgrade. Very impressive. My browser sessions were not lost either. The software ran fine for about a month and a half. Then I noticed slowness and unresponsiveness when committing changes. At one point I could not log in to the GUI at all; it timed out and management sessions were not established. Eventually I managed to get through, logged in to the GUI, and noticed that the management plane was using 85-90% of CPU time. The CLI was a bit quicker, but not significantly. I deleted some log files from the firewall, but that did not help much.

I opened a case with PAN support and discovered that 8.1.0 is not a stable, recommended software version because it has too many bugs. Interesting. At the beginning of my upgrade journey, everything looked good and promising from the vendor's standpoint. Now it is not recommended?? Oh well, let's start downgrading to 8.0.8 or 8.0.9, which appears to be stable according to the vendor. Fingers crossed.

Log in to the passive unit (I was still able to log in to the passive unit via the GUI) and export the current config for backup purposes.

Due to the high management CPU utilization, I was not able to downgrade via the GUI, so the CLI had to be used. First I tried to restart the management plane processes so that I would have a chance to log in to the GUI and run the “Suspend local device” command.

(active)>debug software restart process management-server
Firewall responded: Server error : Timed out while getting config lock. Please try again.

(active)>debug software restart process web-server
Server error : Timed out while getting config lock. Please try again.

(active)>show high-availability state
HA not available

(active)>show high-availability state

Group 1:
Mode: Active-Passive
Local Information:
Version: 1
Mode: Active-Passive
State: active (last 39 days)
Device Information:
Management IPv4 Address: 10.2.2.1/28
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
HA1 Control Links Joint Configuration:
Encryption Enabled: no
Election Option Information:
Priority: 100
Preemptive: yes
Version Compatibility:
Software Version: Match
Application Content Compatibility: Mismatch
Anti-Virus Compatibility: Match
Threat Content Compatibility: Mismatch
VPN Client Software Compatibility: Match
Global Protect Client Software Compatibility: Match
State Synchronization: Complete; type: ethernet
Peer Information:
Connection status: up
Version: 1
Mode: Active-Passive
State: passive (last 39 days)
Device Information:
Management IPv4 Address: 10.2.2.2/28
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
Connection up; Primary HA1 link
Election Option Information:
Priority: 120
Preemptive: yes
Configuration Synchronization:
Enabled: yes
Running Configuration: synchronized

Entered configuration mode..

(active)#set deviceconfig high-availability enabled yes group election-option preemptive no
(active)>show high-availability state
(active)>request high-availability state suspend
Successfully changed HA state to suspended

(suspended)>show high-availability state

Group 1:
Mode: Active-Passive
Local Information:
Version: 1
Mode: Active-Passive
State: suspended (last 2 minutes)
State Reason: User requested
Device Information:
Management IPv4 Address: 10.2.2.1/28
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
HA1 Control Links Joint Configuration:
Encryption Enabled: no
Election Option Information:
Priority: 100
Preemptive: yes
Version Compatibility:
Software Version: Match
Application Content Compatibility: Mismatch
Anti-Virus Compatibility: Match
Threat Content Compatibility: Mismatch
VPN Client Software Compatibility: Match
Global Protect Client Software Compatibility: Match
State Synchronization: Complete; type: ethernet
Peer Information:
Connection status: up
Version: 1
Mode: Active-Passive
State: active (last 2 minutes)
Device Information:
Management IPv4 Address: 10.2.2.2/28
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
Connection up; Primary HA1 link
Election Option Information:
Priority: 120
Preemptive: no
Configuration Synchronization:
Enabled: yes
Running Configuration: synchronized
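
Reading this output by eye works, but the same check can be scripted. Below is a minimal sketch in Python, assuming the output of "show high-availability state" has already been captured as plain text in the flattened form shown above (how you capture it, SSH or otherwise, is up to you). The function name parse_ha_state and the sample text are my own illustration, not part of PAN-OS.

```python
def parse_ha_state(text):
    """Return state and preemptive flag for the local and peer units."""
    units = {"local": {}, "peer": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Local Information:":
            section = "local"
        elif line == "Peer Information:":
            section = "peer"
        elif section is None:
            continue  # still in the group header
        elif line.startswith("State:"):
            # "State: suspended (last 2 minutes)" -> "suspended"
            value = line.split(":", 1)[1].split()
            if value:
                units[section]["state"] = value[0]
        elif line.startswith("Preemptive:"):
            flag = line.split(":", 1)[1].strip()
            units[section]["preemptive"] = flag == "yes"
    return units


# Abbreviated sample, modelled on the output shown in this post.
SAMPLE_HA = """\
Group 1:
Mode: Active-Passive
Local Information:
Version: 1
State: suspended (last 2 minutes)
State Reason: User requested
Election Option Information:
Priority: 100
Preemptive: yes
State Synchronization: Complete; type: ethernet
Peer Information:
Connection status: up
State: active (last 2 minutes)
Election Option Information:
Priority: 120
Preemptive: no
"""

if __name__ == "__main__":
    ha = parse_ha_state(SAMPLE_HA)
    print("local:", ha["local"])
    print("peer: ", ha["peer"])
```

Lines such as "State Reason:" and "State Synchronization:" are deliberately ignored, because only lines starting with exactly "State:" are treated as the HA state.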

I was able to log in to the passive unit and change “Preemptive” to no there as well.

Check which OS version is available to revert to:

(suspended)>debug swm status

Partition         State             Version
--------------------------------------------------------------------------------
sysroot0          REVERTABLE        8.0.8    = Cool, I can use this one.
sysroot1          RUNNING-ACTIVE    8.1.0
maint             READY             8.1.0
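
Picking the revert target can also be done programmatically. Here is a small sketch in Python, assuming the "debug swm status" output is available as text with three whitespace-separated columns (partition, state, version) as shown above. The helper names parse_swm_status and revert_target are hypothetical, made up for this example.

```python
def parse_swm_status(text):
    """Map partition name -> {"state": ..., "version": ...}."""
    partitions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name = fields[0]
        # Only the partition rows are of interest; header and rule are skipped.
        if name.startswith("sysroot") or name == "maint":
            partitions[name] = {"state": fields[1], "version": fields[2]}
    return partitions


def revert_target(partitions):
    """Return (partition, version) of the first REVERTABLE partition, or None."""
    for name, info in partitions.items():
        if info["state"] == "REVERTABLE":
            return name, info["version"]
    return None


SAMPLE_SWM = """\
Partition         State             Version
--------------------------------------------------------------------------------
sysroot0          REVERTABLE        8.0.8
sysroot1          RUNNING-ACTIVE    8.1.0
maint             READY             8.1.0
"""

if __name__ == "__main__":
    print(revert_target(parse_swm_status(SAMPLE_SWM)))
```

If no partition is in the REVERTABLE state, revert_target returns None, which would be the signal to stop and install the older version the regular way instead.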

(suspended)>debug swm revert
Reverting from 8.1.0 (sysroot1) to 8.0.8 (sysroot0)

(suspended)>debug swm status

Partition         State             Version
--------------------------------------------------------------------------------
sysroot0          PENDING-REVERT    8.0.8
sysroot1          RUNNING-ACTIVE    8.1.0
maint             READY             8.1.0

(suspended)>show jobs pending

Enqueued             Dequeued  ID   PositionInQ  Type        Status  Result  Completed
--------------------------------------------------------------------------------------
2018/04/26 15:33:47            264  1            WildFire    QUEUED  PEND    0%
2018/04/26 17:00:52            275  2            EDLRefresh  QUEUED  PEND    0%

(suspended)>show jobs all

Enqueued             Dequeued  ID   PositionInQ  Type                    Status  Result  Completed
--------------------------------------------------------------------------------------------------
2018/04/26 17:00:52            275  1            EDLRefresh              QUEUED  PEND    0%
2018/04/26 17:30:01  17:30:01  278               _SystemWildfireUpdate_  ACT     PEND    0%
2018/04/26 15:33:47  17:28:56  264               WildFire                ACT     PEND    49%
2018/04/26 17:02:08  17:02:08  276               Downld                  FIN     OK      17:03:30

After a while, 10-15 minutes,

(suspended)>show jobs all

Enqueued             Dequeued  ID   PositionInQ  Type        Status  Result  Completed
--------------------------------------------------------------------------------------
2018/04/26 17:00:52  17:33:02  275               EDLRefresh  ACT     PEND    98%
2018/04/26 17:02:08  17:02:08  276               Downld      FIN     OK      17:03:30

(suspended)> show system info

hostname: FEI-NGF-001
ip-address: 10.2.2.1
public-ip-address: unknown
netmask: 255.255.255.240
default-gateway: 10.2.2.14
ip-assignment: static
ipv6-address: unknown
ipv6-link-local-address: fe80::290:bff:fe28:349e/64
ipv6-default-gateway:
mac-address: 00:90:0b:28:34:9e
time: Thu Apr 26 17:34:56 2018
uptime: 39 days, 8:21:57
family: 5000
model: PA-5020
serial: 000000000
cloud-mode: non-cloud
sw-version: 8.1.0
global-protect-client-package-version: 3.1.3
app-version: 8010-4662
app-release-date: 2018/04/24 14:54:12 PDT
av-version: 2592-3088
av-release-date: 2018/04/26 04:03:42 PDT
threat-version: 8010-4662
threat-release-date: 2018/04/24 14:54:12 PDT
wf-private-version: 0
wf-private-release-date: unknown
url-db: brightcloud
wildfire-version: 238788-241269
wildfire-release-date: 2018/04/26 17:00:51 PDT
url-filtering-version: 5536
global-protect-datafile-version: 0
global-protect-datafile-release-date: unknown
global-protect-clientless-vpn-version: 68-108
global-protect-clientless-vpn-release-date: 2018/03/15 13:16:24 PDT
logdb-version: 8.1.8
platform-family: 5000
vpn-disable-mode: off
multi-vsys: on
operational-mode: normal
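
Since the whole point of the exercise is the software version, it can be handy to pull sw-version out of this output before and after the revert. A minimal sketch in Python follows, assuming the "show system info" output is captured as text with one "key: value" pair per line as above. The function name parse_system_info is my own, not a PAN-OS or library API.

```python
def parse_system_info(text):
    """Turn 'key: value' lines into a dict; values may be empty."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as the time,
        # MAC address and IPv6 addresses contain colons themselves.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


# Abbreviated sample taken from the output in this post.
SAMPLE_INFO = """\
hostname: FEI-NGF-001
ip-address: 10.2.2.1
ipv6-default-gateway:
mac-address: 00:90:0b:28:34:9e
time: Thu Apr 26 17:34:56 2018
model: PA-5020
sw-version: 8.1.0
"""

if __name__ == "__main__":
    info = parse_system_info(SAMPLE_INFO)
    print(info["hostname"], info["model"], info["sw-version"])
```

Running the same check after the reboot and comparing sw-version against the expected 8.0.8 would confirm that the revert actually took effect.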

(suspended)>show jobs all

Enqueued             Dequeued  ID   PositionInQ  Type        Status  Result  Completed
--------------------------------------------------------------------------------------
2018/04/26 17:00:52  17:33:02  275               EDLRefresh  ACT     PEND    0%
2018/04/26 17:02:08  17:02:08  276               Downld      FIN     OK      17:03:30

(suspended)>debug swm status

Partition         State             Version
--------------------------------------------------------------------------------
sysroot0          PENDING-REVERT    8.0.8
sysroot1          RUNNING-ACTIVE    8.1.0
maint             READY             8.1.0

(suspended)>show jobs all

Enqueued             Dequeued  ID   PositionInQ  Type        Status  Result  Completed
--------------------------------------------------------------------------------------
2018/04/26 17:00:52  17:33:02  275               EDLRefresh  ACT     PEND    0%
2018/04/26 17:02:08  17:02:08  276               Downld      FIN     OK      17:03:30

Now we can restart the system, since the EDLRefresh job has completed.

(suspended)>request restart system
Executing this command will disconnect the current session. Do you want to continue? (y or n)
Broadcast message from root (pts/0) (Thu Apr 26 17:41:12 2018):
The system is going down for reboot NOW!
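
Waiting for jobs to finish before the reboot is something I did by rerunning "show jobs all" by hand, but the check itself is easy to script. Here is a rough sketch in Python, assuming the job table is captured as text and that each data row starts with a date and ends with the Type, Status, Result and Completed columns, as in the output above. Treating every status other than FIN as "not finished" is my assumption based on the values seen here (QUEUED, ACT, FIN); the names parse_jobs and unfinished_jobs are made up for the example.

```python
import re

DATE = re.compile(r"^\d{4}/\d{2}/\d{2}$")
TIME = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def parse_jobs(text):
    """Parse 'show jobs' rows into dicts, reading columns from both ends."""
    jobs = []
    for line in text.splitlines():
        tokens = line.split()
        # Minimum row: date, enqueue time, ID, type, status, result, completed.
        if len(tokens) < 7 or not DATE.match(tokens[0]):
            continue
        # Between the date and the last four columns are the enqueue and
        # dequeue times, the job ID and, optionally, the queue position.
        middle = [t for t in tokens[1:-4] if not TIME.match(t)]
        if not middle or not middle[0].isdigit():
            continue
        jobs.append({
            "id": int(middle[0]),
            "type": tokens[-4],
            "status": tokens[-3],
            "result": tokens[-2],
            "completed": tokens[-1],
        })
    return jobs


def unfinished_jobs(jobs):
    """Assumption: any status other than FIN means the job is still pending."""
    return [job for job in jobs if job["status"] != "FIN"]


SAMPLE_JOBS = """\
Enqueued             Dequeued  ID   PositionInQ  Type        Status  Result  Completed
2018/04/26 15:33:47            264  1            WildFire    QUEUED  PEND    0%
2018/04/26 17:00:52  17:33:02  275               EDLRefresh  ACT     PEND    98%
2018/04/26 17:02:08  17:02:08  276               Downld      FIN     OK      17:03:30
"""

if __name__ == "__main__":
    for job in unfinished_jobs(parse_jobs(SAMPLE_JOBS)):
        print(job["id"], job["type"], job["status"], job["completed"])
```

An empty list from unfinished_jobs would be the cue that it is reasonable to go ahead with the restart.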

 

After the reboot, wait until the GUI is reachable again, log in to the firewall, and check the HA status. This unit, now running 8.0.8, should be active and passing traffic. Again, I lost only one ping during the whole process. The formerly passive unit (which had taken over as active) kept passing traffic and had information about all previously opened sessions. No issues whatsoever.

Now go to the active unit's GUI and check the HA status. Once everything shows green, log in to the passive unit and install software version 8.0.8 via the GUI: under Device/Software, run the install the same way you would for any other software version.

Reboot the passive unit, log in to its GUI, and check the status: HA, software versions, config sync, and so on. Everything should be good and green. The only issue I encountered was that I lost the login banner message: I had to re-enter it on the firewalls and re-enable the “Administrators must acknowledge login banner” check box.

***************************************************************
* UNAUTHORIZED ACCESS TO THIS NETWORK DEVICE IS PROHIBITED. *

You must have explicit permissions to access or configure this device.
All activities performed on this device may be logged.
There is no right to privacy on this device.
Access violations may be reported to law enforcement and may be
subject to civil and /or criminal penalties.

**************************************************************

Overall, a successful process. It took me about 1.5 hours, the HA part worked perfectly, the systems stayed in production the whole time, and with the exception of two lost pings there were no issues reported.

Now I can do this for a living… 🙂