Exadata Cloud@Customer (ExaCC) brings the best of both worlds: on-premises data sovereignty meets the innovation and capabilities of the cloud. Thanks to the control plane network that links the ExaCC servers to OCI, users can create and manage resources through the Console or any API-based cloud tooling (Terraform, OCI CLI, SDKs, etc.). Everything you do on ExaCC is synchronized into OCI through that layer.
I’ll describe a small glitch that sometimes affects a database resource. It has no impact on the database itself, which works just fine on ExaCC. However, as the output below shows, the database is marked as FAILED in OCI while it is actually up, running, and accessible.
+-------------+-----------+------------------------------------+-----------+
| Unique-Name | charset   | id                                 | state     |
+-------------+-----------+------------------------------------+-----------+
| MYCDB1_DOM  | AL32UTF8  | ocid1.database.oc1.ca-toronto-1.xxa| FAILED    |
+-------------+-----------+------------------------------------+-----------+
State
We need to be mindful of what the state column really means. It’s quite self-explanatory after a deployment attempt, but for an existing DB, the state usually indicates whether the database resource is up or down. In our case, however, OCI could no longer detect the resource, hence the state shows “FAILED”.
But before delving into it, let’s review how ExaCC database resources are seen & registered on the OCI side.
DB registration allows performing admin tasks on the ExaCC database through the OCI console and cloud tooling.
Each database created in Exadata Cloud@Customer using the API/Console is automatically registered in OCI.
There are a few exceptions where OCI allows manual registration:
– A database that you manually created on Exadata Cloud@Customer using DBCA
– An existing database that you migrated from another platform to Exadata Cloud@Customer
This is done through the dbaascli registerdb function; read more in Registering a Database.
Files created after registration
Each registered database will generate a cloud registration file (DBname.ini) located under the directory below.
$ ll /var/opt/oracle/creg/*ini
… MYCDB1.ini
I first checked the workaround described in the note below
Doc ID 2764524.1 EXACS DBs Show Wrong State (Failed) on OCI Webconsole
Cause: DBs registered in crs with dbname in lowercase (dborcl) instead of uppercase (DBORCL).
Suggested solution: Create a symbolic link to creg db ini file to match the case for the DB name registered in CRS.
Outcome: This didn’t fix my problem so I opened an SR to get to the bottom of this.
This required help from support, as they have a better view of the control plane resource metadata. Looking at the content of the cloud registration file, we can see that it contains DB information usually present in CRS plus a few parameters present in the spfile.
$ more /var/opt/oracle/creg/MYCDB1.ini
#################################################################
# This file is automatically generated by database as a service #
#################################################################
acfs_vol_dir=/var/opt/oracle/dbaas_acfs
acfs_vol_sizegb=10
agentdbid=83112625-52d2-4b39-b987-1b0d7d2d70cb
aloc=/var/opt/oracle/ocde/assistants
archlog=yes
bkup_asm_spfile=+DATA1/MYCDB1_DOM/spfilemycdb1.ora
…
Agent resource id
Notice the agentdbid in the .ini registration file. The agent resource id is the id that the control plane layer uses to identify and interact with the DB.
“agentdbid=83112625-52d2-4b39-b987-1b0d7d2d70cb”
On top of the registration file, the agent id is also written in a rec file under /var/opt/oracle/dbaas_acfs/<DBNAME>
$ more /var/opt/oracle/dbaas_acfs/MYCDB1/83112625-52d2-4b39-b98xx.rec
{
"agentdbid" : "83112625-52d2-4b39-b987-1b0d7d2d70cb"
}
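A quick local consistency check between the two files can be sketched as follows. This is only an illustration: `ini_agent_id`, `rec_agent_id` and `agent_ids_match` are hypothetical helper names, and the parsing assumes the simple `key=value` and one-line JSON layouts shown above.

```shell
# Hypothetical helpers (not part of dbaascli): read the agentdbid stored in
# a creg .ini file and in a .rec file, assuming the layouts shown above.

ini_agent_id() {
  # Expects a line like: agentdbid=<uuid>
  sed -n 's/^agentdbid=//p' "$1" | head -n 1
}

rec_agent_id() {
  # Expects a line like: "agentdbid" : "<uuid>"
  sed -n 's/.*"agentdbid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

agent_ids_match() {
  # Returns 0 only when both files carry the same non-empty id.
  local ini_id rec_id
  ini_id=$(ini_agent_id "$1")
  rec_id=$(rec_agent_id "$2")
  [ -n "$ini_id" ] && [ "$ini_id" = "$rec_id" ]
}
```

For example, `agent_ids_match /var/opt/oracle/creg/MYCDB1.ini <path to the .rec file> && echo "ids agree"`. Keep in mind that this only compares the two local files; it says nothing about the id the control plane holds, which only support can see.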
According to OCI support, the Agent Resource ID seen in the control plane UI console was somehow different from the agent id in the corresponding *.ini file.
Take note of the agent id communicated by the support engineer and replace the id in both the .ini and the .rec files.
Take a backup of the {DBNAME}.ini file of each affected DB on all DB nodes
$ sudo su - oracle
$ cd /var/opt/oracle/creg
$ cp /var/opt/oracle/creg/MYCDB1.ini /var/opt/oracle/creg/MYCDB1.ini.old
Modify the agentdbid in the {DBNAME}.ini file of the DB with the Agent Resource ID value seen in the support console.
-- Replace the agentdbid= value with 47098321-43d1-4b44-b997-1b0d5d1d90cb
$ vi /var/opt/oracle/creg/MYCDB1.ini
Remove the old rec file carrying the wrong resource id and create a new rec file named after the right id
$ rm /var/opt/oracle/dbaas_acfs/MYCDB1/83112625-52d2-4b39-b987-1b0d7d2d70cb.rec
$ vi /var/opt/oracle/dbaas_acfs/MYCDB1/47098321-43d1-4b44-b997-1b0d5d1d90cb.rec
{
"agentdbid" : "47098321-43d1-4b44-b997-1b0d5d1d90cb" << new value
}
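The manual steps above could be wrapped in a small function so they are repeated identically on every DB node. This is a sketch under assumptions, not a supported tool: `replace_agent_id` is a hypothetical name, the .rec file is assumed to contain only the `agentdbid` entry shown above, and the new id is assumed to be a plain UUID (no characters that are special to sed).

```shell
# Hypothetical helper: swap the agentdbid in a creg .ini file and recreate
# the matching .rec file. Run it as the oracle user on each DB node.
replace_agent_id() {
  local ini_file rec_dir new_id old_id
  ini_file=$1   # e.g. /var/opt/oracle/creg/MYCDB1.ini
  rec_dir=$2    # e.g. /var/opt/oracle/dbaas_acfs/MYCDB1
  new_id=$3     # agent resource id provided by support

  old_id=$(sed -n 's/^agentdbid=//p' "$ini_file" | head -n 1)
  if [ -z "$old_id" ]; then
    echo "no agentdbid entry found in $ini_file" >&2
    return 1
  fi

  # Keep a backup, then rewrite the agentdbid line from that backup.
  cp -p "$ini_file" "$ini_file.old" || return 1
  sed "s/^agentdbid=.*/agentdbid=$new_id/" "$ini_file.old" > "$ini_file" || return 1

  # Replace the old .rec file with one named after the new id.
  rm -f "$rec_dir/$old_id.rec"
  printf '{\n"agentdbid" : "%s"\n}\n' "$new_id" > "$rec_dir/$new_id.rec"
}
```

A call would look like `replace_agent_id /var/opt/oracle/creg/MYCDB1.ini /var/opt/oracle/dbaas_acfs/MYCDB1 47098321-43d1-4b44-b997-1b0d5d1d90cb`. The backup keeps the `.ini.old` name used in the manual steps, so rolling back is a single `cp`.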
After the change, wait an hour or so for the control plane to get in sync, then verify the DB status
+-------------+-----------+------------------------------------+-----------+
| Unique-Name | charset   | id                                 | state     |
+-------------+-----------+------------------------------------+-----------+
| MYCDB1_DOM  | AL32UTF8  | ocid1.database.oc1.ca-toronto-1.xxa| AVAILABLE |
+-------------+-----------+------------------------------------+-----------+
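Rather than refreshing the console, the state can be polled with OCI CLI until it flips to AVAILABLE. The sketch below assumes OCI CLI is installed and configured for the tenancy; `get_db_state` and `wait_for_available` are hypothetical wrapper names, and the retry count and pause are arbitrary defaults.

```shell
# Hypothetical wrapper around OCI CLI: print the lifecycle state of a database.
get_db_state() {
  oci db database get --database-id "$1" \
    --query 'data."lifecycle-state"' --raw-output
}

# Poll until the database is reported AVAILABLE.
# $1 = database OCID, $2 = max attempts (default 12), $3 = pause in seconds (default 600)
wait_for_available() {
  local ocid attempts pause i state
  ocid=$1
  attempts=${2:-12}
  pause=${3:-600}
  i=1
  while [ "$i" -le "$attempts" ]; do
    state=$(get_db_state "$ocid") || state="UNKNOWN"
    echo "attempt $i: $state"
    if [ "$state" = "AVAILABLE" ]; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Calling `wait_for_available <database OCID>` with the defaults checks every ten minutes for up to two hours, which leaves some margin over the hour or so the sync took here.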
You can’t see the agent resource id in your console as an end user. It is unfortunately internal metadata for the control plane. This means you will have to open an SR each time an issue like this happens. However, I have opened an enhancement request to allow users to see the control plane agentid.
We can say that a failed database state in the OCI console doesn’t always mean the resource is down.
Databases migrated from another platform may be prone to this behavior.
As of now, there is no way for you to see the agent resource id that the control plane is using.
Hopefully, control plane metadata such as the agent resource id will become visible in a future release.
Until then, this workaround can still help those who spot such behavior.