We recently experienced an issue on a client's virtualized ODA (Oracle Database Appliance) with the following details.
VM-ODA_BASE ODA X5-2
Version
-------
12.1.2.12.0
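For reference, the version above can be confirmed from ODA_BASE; the command below is standard oakcli, but check the syntax against your release:
[root@eng-oda-base0 ~]# oakcli show version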
Our client reported that one of the VMs (Virtual Machines) hosted on the ODA was inaccessible, so we started by gathering more information on the two-node ODA X5-2.
Checking from ODA_BASE
[root@eng-oda-base0 ~]# oakcli show vm
NAME NODENUM MEMORY VCPU STATE REPOSITORY
eng-dns01 1 2048M 2 ONLINE vmrepo
eng-mgmt-web-01 1 8192M 3 ONLINE vmrepo
eng-prod-recovery-01 0 29696M 12 ONLINE vmrepo
eng-prod-recovery-02 1 29696M 12 ONLINE vmrepo
eng-prod-recovery-03 1 29696M 12 ONLINE vmrepo
eng-prod-mgmt-01 1 16384M 2 ONLINE vmrepo
eng-prod-mgmt-02 1 16384M 2 ONLINE vmrepo
eng-prod-oam-01 1 8192M 2 ONLINE vmrepo
eng-prod-oam-02 1 8192M 2 ONLINE vmrepo
eng-prod-ssa-01 1 7500M 3 ONLINE vmrepo
eng-prod-web-01 1 8192M 3 ONLINE vmrepo
eng-prod-web-02 1 8192M 3 ONLINE vmrepo
eng-oau-mgmt-01 0 16384M 2 ONLINE devvms
eng-oau-mgmt-01_clone 0 16384M 2 OFFLINE vmrepo
eng-oau-oam-01 1 8192M 2 ONLINE devvms
eng-oau-web-01 0 8192M 2 ONLINE devvms
The VM eng-prod-mgmt-01 appeared to be ONLINE at the ODA_BASE level even though it had been reported as inaccessible.
We then tried pinging the problematic VM, which failed with constant timeouts.
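For completeness, this is the kind of connectivity check we ran, assuming the VM's hostname resolves (substitute the VM's IP address otherwise):
# ping -c 4 eng-prod-mgmt-01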
A further check on the VM still showed it as online, as shown below:
[root@eng-oda-base0 ~]# oakcli show vm eng-prod-mgmt-01
The Resource is : eng-prod-mgmt-01
AutoStart : restore
CPUPriority : 100
Disks : |file:/OVS/Repositories/vmrepo/.ACFS/snaps/eng-prod-mgmt-01/VirtualMachines/eng-prod-mgmt-01/OEL7.4.img,xvda,w|
Domain : XEN_PVM
DriverDomain : False
ExpectedState : online
FailOver : true
IsSharedRepo : true
Keyboard : en-us
MaxMemory : 16384M
MaxVcpu : 2
Memory : 16384M
Mouse : OS_DEFAULT
Name : eng-prod-mgmt-01
Networks : |bridge=vsok1||bridge=prodv|
NodeNumStart : 1
OS : OL_5
PrefNodeNum : 0
PrivateIP : None
ProcessorCap : 0
RepoName : vmrepo
State : Online
TemplateName : otml_OEL_7_5
VDisks : |0|
Vcpu : 2
cpupool : default-unpinned-pool
vncport : 5907
One interesting observation was that the preferred node is node 0, yet the VM appeared to have been started on, or migrated to, node 1:
NodeNumStart : 1
OS : OL_5
PrefNodeNum : 0
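A quick way to pull out just these placement fields without scanning the full listing is a simple grep over the oakcli output:
[root@eng-oda-base0 ~]# oakcli show vm eng-prod-mgmt-01 | grep -E 'NodeNumStart|PrefNodeNum|State'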
We attempted an online migration to the preferred node (node 0) as follows:
[root@eng-oda-base0 ~]# oakcli migrate vm eng-prod-mgmt-01
OAKERR : 9002 : Repo: vmrepo is not online on node: 0
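The error points at the repository's state as seen from node 0. If your oakcli release supports it, a repo can also be queried per node; the -node option below is from memory, so treat the exact syntax as an assumption:
[root@eng-oda-base0 ~]# oakcli show repo vmrepo -node 0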
The action failed, so we took a closer look at the repository concerned:
[root@eng-oda-base0 ~]# oakcli show repo
NAME TYPE NODENUM FREE SPACE STATE SIZE
odarepo1 local 0 N/A N/A N/A
odarepo2 local 1 N/A N/A N/A
vmrepo shared 0 N/A UNKNOWN N/A
vmrepo shared 1 N/A UNKNOWN N/A
devvms shared 0 N/A UNKNOWN N/A
devvms shared 1 N/A UNKNOWN N/A
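Since the shared repositories sit on ACFS, one way to cross-check whether the underlying clusterware resources are up, assuming the Grid Infrastructure binaries are on the PATH, is to filter the clusterware resource listing:
[root@eng-oda-base0 ~]# crsctl stat res -t | grep -i acfs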
[root@eng-oda-base0 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/grid
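The same environment can be set non-interactively, which is convenient in scripts; ORAENV_ASK and ORACLE_SID are standard oraenv variables:
[root@eng-oda-base0 ~]# export ORACLE_SID=+ASM1
[root@eng-oda-base0 ~]# export ORAENV_ASK=NO
[root@eng-oda-base0 ~]# . oraenv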
[root@eng-oda-base0 ~]# oakcli show repo
NAME TYPE NODENUM FREE SPACE STATE SIZE
odarepo1 local 0 N/A N/A N/A
odarepo2 local 1 N/A N/A N/A
vmrepo shared 0 7.71% ONLINE 4072960.0M
vmrepo shared 1 7.71% ONLINE 4072960.0M
devvms shared 0 52.76% ONLINE 1512448.0M
devvms shared 1 52.76% ONLINE 1512448.0M
The repositories showed as ONLINE after setting the ASM environment.
We then attempted the virtual machine migration again:
[root@eng-oda-base0 ~]# oakcli migrate vm eng-prod-mgmt-01
OAKERR:7079 Error encountered while migrating VM eng-prod-mgmt-01 - OAKERR:7079 Error encountered while migrating VM eng-prod-mgmt-01 - Error: /usr/lib64/xen/bin/xc_restore 4 23 3 5 1 1 1 0 1 failed
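The xc_restore failure is a Xen-level error, so the hypervisor logs on Dom0 are a reasonable next place to look; the path below assumes the standard xend log location on the appliance's Dom0:
[root@eng-oak2-dom0 ~]# tail -n 50 /var/log/xen/xend.log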
Checking on Dom0 of node 1:
[root@eng-oak2-dom0 ~]# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 3863 20 r----- 19335893.9
eng-dns01 5 2051 2 -b---- 73429.3
eng-mgmt-web-01 10 8195 3 -b---- 354394.6
eng-prod-recovery-02 134 29699 12 -b---- 12829439.8
eng-prod-recovery-03 121 29699 12 r----- 23506726.9
eng-prod-mgmt-01 146 16387 2 -b---- 1473.6
eng-prod-mgmt-02 12 16387 2 -b---- 254483.2
eng-prod-oam-01 145 8195 2 -b---- 1324.9
eng-prod-oam-02 3 8195 2 -b---- 863358.7
eng-prod-ssa-01 148 7503 3 -b---- 9644.3
eng-prod-web-01 136 8195 3 -b---- 26821.4
eng-prod-web-02 11 8195 3 -b---- 1203707.9
eng-oau-oam-01 13 8195 2 -b---- 874714.3
oakDom1 1 81923 24 r----- 50625431.6
The VM was indeed running on node 1 instead of its preferred node 0.
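If the listing is long, the same check can be narrowed to the one domain:
[root@eng-oak2-dom0 ~]# xm list | grep eng-prod-mgmt-01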
Log in to the VM using VNC (Virtual Network Computing)
Run the following on the Dom0 where the problematic VM is currently running, using the VM's domain ID (146) from the xm list output above:
xm list -l 146 | grep 59
59
59
(uuid 018e3193-36c9-9159-02df-d0930275427b)
(location 0.0.0.0:5906)
This indicates that we can use VNC port 5906 to log in and see what is going on.
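Any VNC viewer pointed at the Dom0 IP and that port will do; with TigerVNC-style clients the double-colon form passes a raw TCP port (the address below is a placeholder):
$ vncviewer <dom0-node-ip>::5906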
After logging in with a VNC viewer to the Dom0 node's IP on port 5906, we realized that the VM was sitting at run level 1 (safe mode), so we took a look at journalctl, searching for any recent errors.
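From within the VM, a couple of standard journalctl invocations surface the failure quickly (these are generic systemd options, nothing ODA-specific):
# journalctl -xb | tail -n 100
# journalctl -p err -b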
The following was discovered:
…
Found device root=/dev/mapper/ol-root
started filesystem check on /dev/mapper/ol-root
started dracut initqueue hook
Reached target Remote File Systems (Pre)
Reached target Remote File Systems
Mounting /sysroot…
[ ***] A start job is running for /sysroot (3min 59s / 4min 31s)
[  240.527013] INFO: task mount:406 blocked for more than 120 seconds.
[  240.527056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[FAILED] Failed to mount /sysroot.
See ‘systemctl status sysroot.mount’ for more details.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
[ OK ] Stopped dracut pre-pivot and cleanup hook.
[ OK ] Stopped target Initrd Default Target.
[ OK ] Reached target Initrd File System.
[ OK ] Stopped dracut mount hook.
[ OK ] Stopped target Basic System.
[ OK ] Stopped System Initialization.
Starting Emergency Shell…
Generating “/run/initramfs/rdsosreport.txt”
Entering emergency mode. Exit the shell to continue.
Type “journalctl” to view system logs.
You might want to save “/run/initramfs/rdsosreport.txt” to a USB stick or /boot
after mounting them and attaching them to a bug report.
We ran the following and issued a reboot afterwards:
# xfs_repair /dev/mapper/ol-root
If that does not work (typically because the XFS log cannot be replayed), try the following and reboot. Note that -L zeroes the log and may discard the most recent metadata changes:
# xfs_repair -L /dev/mapper/ol-root
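If you want to see what would be changed before committing to a repair, xfs_repair also has a no-modify mode; a dry run from the same shell looks like this:
# xfs_repair -n /dev/mapper/ol-root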
The VM is now running on the correct node and is accessible:
Last login: Fri Apr 17 09:16:28 2020 from 192.168.199.151
[root@eng-oak1-dom0 ~]# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 3852 20 r----- 20699947.9
eng-mgmt-web-01 15 8195 3 -b---- 31474.3
eng-prod-recovery-01 21 29699 12 r----- 7436775.4
eng-prod-mgmt-01 24 16387 2 -b---- 9.1
eng-prod-mgmt-02 13 16387 2 -b---- 25035.3
eng-prod-oam-01 8 8195 2 -b---- 1577418.4
eng-prod-oam-02 10 8195 2 r----- 3543915.9
eng-prod-ssa-01 9 7503 3 -b---- 1532317.8
eng-prod-web-01 5 8195 3 -b---- 1505332.2
eng-prod-web-02 22 8195 3 -b---- 4841.3
eng-oau-mgmt-01 4 16387 2 -b---- 1325935.3
eng-oau-web-01 3 8195 2 -b---- 361317.7
oakDom1 1 81923 24 r----- 79419898.1
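Back on ODA_BASE, the VM's state and node placement can be confirmed one last time from the summary listing:
[root@eng-oda-base0 ~]# oakcli show vm | grep eng-prod-mgmt-01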