This week we hit an issue while patching ExaCC Grid Infrastructure (GI) from 19.10 to 19.11. The patching died while running the post-patch step (rootcrs.sh -postpatch) on node 1, which left the cluster in an awkward “ROLLING PATCH” state: node 1 had not finished the 19.11 post-patch, and node 2 was still running 19.10.
Below is the command that was originally executed to patch the ExaCC GI, together with the last entries in the log, which is where it died.
[root@hostname1 ~]# dbaascli patch db apply --patchid 32545008-GI --dbnames grid
...
2021-09-23 10:15:32.233368 - INFO: Running /u01/app/19.0.0.0/grid/crs/install/rootcrs.sh -postpatch
2021-09-23 10:15:32.233539 - Output from cmd /u01/app/19.0.0.0/grid/crs/install/rootcrs.sh -postpatch run on localhost is:
Looking at the GUI console, it showed the cluster as already patched to 19.11. However, checking from the command line on both nodes told a different story: the patching had not finished.
[grid@hostname1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32847378;OCW Interim patch for 32847378
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)
OPatch succeeded.
[grid@hostname2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32240590;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32240590)
32222571;OCW Interim patch for 32222571
32218663;ACFS RELEASE UPDATE 19.10.0.0.0 (32218663)
32218454;Database Release Update : 19.10.0.0.210119 (32218454)
29340594;DBWLM RELEASE UPDATE 19.0.0.0.0 (29340594)
OPatch succeeded.
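Another quick way to confirm the mismatch at the Clusterware layer is to compare the patch level recorded for each node. As a minimal check (assuming the same hostnames as above), crsctl query crs softwarepatch can be run against each node from either host; until both nodes have been patched, the two reported patch levels will not match.
[grid@hostname1 ~]$ crsctl query crs softwarepatch hostname1
[grid@hostname1 ~]$ crsctl query crs softwarepatch hostname2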
The next thing to try was to relaunch dbaascli, but it failed because it detected that a patching operation was already in progress.
[root@hostname1 ~]# dbaascli patch db apply --patchid 32545008-GI --dbnames grid
...
The current operation apply_async is blocked on node hostname1 due the following error: The current operation cannot proceed due a previous ongoing patching operation was detected
The fix for this issue is to run the post-patch manually as root on node 1.
[root@hostname1 ~]# /u01/app/19.0.0.0/grid/crs/install/rootcrs.sh -postpatch
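If you need to dig into why the first postpatch attempt died, the rootcrs/crsconfig logs on node 1 are the place to look. As a sketch, assuming the usual Grid Infrastructure base of /u01/app/grid (adjust to your environment), the most recent log files can be listed with:
[root@hostname1 ~]# ls -ltr /u01/app/grid/crsdata/hostname1/crsconfig/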
Now that the post-patch has finished on node 1, the next step is to verify the stack, the patch level, and the cluster state. At this point the cluster upgrade state still shows ROLLING PATCH and the active patch level still reflects 19.10, because node 2 has not been patched yet; releasepatch, on the other hand, already reports the 19.11 level on the local node.
[grid@hostname1 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [1944883066].
[grid@hostname1 ~]$ crsctl query crs releasepatch
Oracle Clusterware release patch level is [1988519045] and the complete list of patches [32545013 32576499 32584670 32585572 32847378 ] have been applied on the local node. The release patch string is [19.11.0.0.0].
[grid@hostname1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
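Beyond the daemon checks above, it can also be worth confirming that all cluster resources came back online on node 1 after the post-patch restarted the stack; the standard resource status view does the job (output omitted here for brevity).
[grid@hostname1 ~]$ crsctl stat res -t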
Now, on node 2, we ran the following command to finish the patching. Note that it is slightly different from the command we ran initially: it adds the --instance1 hostname2:/u01/app/19.0.0.0/grid argument, pointing at node 2 and its Grid home.
[root@hostname2 ~]# dbaascli patch db apply --patchid 32545008-GI --instance1 hostname2:/u01/app/19.0.0.0/grid --dbnames grid
Once the dbaascli command finished, both nodes showed the same patch level and the cluster upgrade state was back to NORMAL.
[grid@hostname1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32847378;OCW Interim patch for 32847378
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)
OPatch succeeded.
[grid@hostname2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32847378;OCW Interim patch for 32847378
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)
OPatch succeeded.
[grid@hostname2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [19.0.0.0.0]
[grid@hostname2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@hostname2 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [1988519045].
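In our case dbaascli moved the cluster upgrade state back to NORMAL on its own. As a fallback only, if both nodes already show the 19.11 patches but the state stays stuck in ROLLING PATCH, Clusterware can be taken out of rolling patch mode manually as root; it was not needed here, so treat it as a last resort.
[root@hostname1 ~]# crsctl stop rollingpatch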
Hope this blog helps you out should you ever find yourself in a patching situation where the run dies on node 1 and leaves the other node unpatched.