VM Inaccessible - or is it?

I originally posted this on LinkedIn. I'm still figuring out syntax highlighting on Blogger so please bear with the basic formatting for the moment.

This morning I got a strange IM from one of my colleagues. A virtual machine he was working on had become inaccessible in vSphere, but seemed fine otherwise. I decided to have a look around. Hostnames have been changed and sequences have been shortened to protect my institution.

The Summary page for the server, which we'll call dummy-vm-3, indicated that the host it was registered on was vm-host-13. However, the last migration in the Tasks log before the machine disappeared was a vMotion from vm-host-7 to vm-host-6. Odd.

From the ESXi shell, we can tell which VMs are running, whether or not they are listed in the client. I SSHed into vm-host-13 and ran the following:
~ # vim-cmd vmsvc/getallvms
Skipping invalid VM '1394'
Vmid   Name         File                                      Guest OS           Version   Annotation
1      dummy-vm-1   [datastore-1] dummy-vm-1/dummy-vm-1.vmx   winLonghornGuest   vmx-08
1290   dummy-vm-2   [datastore-2] dummy-vm-2/dummy-vm-2.vmx   rhel6_64Guest      vmx-10

~ # esxcli vm process list
dummy-vm-1
   World ID: 15566863
   Process ID: 0
   VMX Cartel ID: 15566860
   UUID: 42 3b c7 ff 3f b2 c2 92-6f 3e 21 24 75 17 c5 d6
   Display Name: dummy-vm-1
   Config File: /vmfs/volumes//dummy-vm-1/dummy-vm-1.vmx

dummy-vm-2
   World ID: 38217841
   Process ID: 0
   VMX Cartel ID: 38217840
   UUID: 42 3b 17 c0 1a b9 31 a7-f6 70 ad ab 1f 07 51 35
   Display Name: dummy-vm-2
   Config File: /vmfs/volumes//dummy-vm-2/dummy-vm-2.vmx

Interestingly, dummy-vm-3 was not running on vm-host-13, but the hypervisor helpfully informs us that an invalid VM (i.e. one whose vmx file cannot be read) with ID 1394 is being hidden from the getallvms output. Could it still be on vm-host-6? I ran the following on vm-host-6:
~ # esxcli vm process list

dummy-vm-3
   World ID: 38659305
   Process ID: 0
   VMX Cartel ID: 38656655
   UUID: 42 3b 41 fa 78 c5 48 61-44 bc 42 e6 68 46 f2 4d
   Display Name: dummy-vm-3
   Config File: /vmfs/volumes//dummy-vm-3/dummy-vm-3.vmx

dummy-vm-4
   World ID: 34269567
   Process ID: 0
   VMX Cartel ID: 34269564
   UUID: 42 3b 4f c0 38 dc 40 1b-e9 5a 85 40 ac b0 b4 40
   Display Name: dummy-vm-4
   Config File: /vmfs/volumes//dummy-vm-4/dummy-vm-4.vmx

Bingo. The registration had somehow moved to vm-host-13, but not the VM process itself. We would later find out that this was a fumbled power-on DRS placement after the VM was shut down to remove an extra disk.
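
If you have to repeat this check on several hosts, scanning the process list by eye gets tedious. Here is a minimal sketch that reads `esxcli vm process list` output on stdin and reports, via its exit status, whether a VM with a given display name has a running process. The function name `vm_is_running` is my own invention for illustration, not something ESXi provides, and I'm assuming the "Display Name:" field looks the way it does in the output above.

```shell
#!/bin/sh
# Sketch only: vm_is_running is a hypothetical helper, not an ESXi command.
# Reads `esxcli vm process list` output on stdin.
# Exits 0 if a "Display Name:" line matches $1 exactly, 1 otherwise.
vm_is_running() {
    awk -v name="$1" '
        # Only look at lines of the form "   Display Name: <name>"
        $1 == "Display" && $2 == "Name:" {
            sub(/^[ \t]*Display Name: /, "")   # strip the field label
            if ($0 == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# On a host you would pipe the live output in, for example:
#   esxcli vm process list | vm_is_running dummy-vm-3 && echo "running here"
```

The match is exact, so looking for dummy-vm will not accidentally match dummy-vm-3.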

At this point, all we needed to do was to unregister dummy-vm-3 from vm-host-13 and register it on vm-host-6. Back on vm-host-13:
~ # vim-cmd vmsvc/unregister 1394
The command doesn't return anything, but the "Unknown" VM shown when connecting directly to vm-host-13 disappears. Let's register it on vm-host-6 and hope for the best:
~ # vim-cmd /solo/register /vmfs/volumes//dummy-vm-3/dummy-vm-3.vmx
1724
After that, the VM re-appeared in the vSphere client and we could access the console again. It was immediately picked up by DRS and thrown to another host, this time successfully.
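
The ID I passed to vmsvc/unregister came from the "Skipping invalid VM" line in the getallvms output. If a host has several of these, a small sketch like the one below can pull the IDs out and print the matching unregister commands without running them. The helper names `invalid_vmids` and `print_unregister_cmds` are hypothetical, and I'm assuming the message has exactly the form shown earlier and arrives on the stream you pipe in (if it turns out to be written to stderr, you would need to add `2>&1`).

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not ESXi commands.

# Reads `vim-cmd vmsvc/getallvms` output on stdin and prints the numeric ID
# from every "Skipping invalid VM '<id>'" line, one per line.
invalid_vmids() {
    sed -n "s/^Skipping invalid VM '\([0-9][0-9]*\)'.*$/\1/p"
}

# Dry run: prints the unregister command for each invalid ID
# instead of executing it, so you can review before acting.
print_unregister_cmds() {
    invalid_vmids | while read -r id; do
        echo "vim-cmd vmsvc/unregister $id"
    done
}

# On a host, for example:
#   vim-cmd vmsvc/getallvms | print_unregister_cmds
```

Printing the commands rather than running them is deliberate: an invalid entry is not always a stray registration, so it is worth confirming where the VM process actually lives before unregistering anything.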

References:
https://kb.vmware.com/s/article/1005051
http://www.yellow-bricks.com/2011/11/16/esxi-commandline-work/
http://www.everythingvm.com/content/fixing-invalid-virtual-machines-vmware
