VM Inaccessible - or is it?
I originally posted this on LinkedIn. I'm still figuring out syntax highlighting on Blogger so please bear with the basic formatting for the moment.
This morning I got a strange IM from one of my colleagues. A virtual machine he was working on had become inaccessible in vSphere, but seemed fine otherwise. I decided to have a look around. Hostnames have been changed and sequences have been shortened to protect my institution.
The Summary page for the server, which we'll call dummy-vm-3, showed it registered on vm-host-13. However, the last migration in the Tasks log before the machine disappeared was a vMotion from vm-host-7 to vm-host-6. Odd.
From the ESXi shell, we can tell which VMs are running, whether they are listed in the client or not. I sshed into "vm-host-13" and ran the following:
~ # vim-cmd vmsvc/getallvms
Skipping invalid VM '1394'
Vmid   Name         File                                      Guest OS           Version   Annotation
1      dummy-vm-1   [datastore-1] dummy-vm-1/dummy-vm-1.vmx   winLonghornGuest   vmx-08
1290   dummy-vm-2   [datastore-2] dummy-vm-2/dummy-vm-2.vmx   rhel6_64Guest      vmx-10
~ # esxcli vm process list
dummy-vm-1
   World ID: 15566863
   Process ID: 0
   VMX Cartel ID: 15566860
   UUID: 42 3b c7 ff 3f b2 c2 92-6f 3e 21 24 75 17 c5 d6
   Display Name: dummy-vm-1
   Config File: /vmfs/volumes//dummy-vm-1/dummy-vm-1.vmx

dummy-vm-2
   World ID: 38217841
   Process ID: 0
   VMX Cartel ID: 38217840
   UUID: 42 3b 17 c0 1a b9 31 a7-f6 70 ad ab 1f 07 51 35
   Display Name: dummy-vm-2
   Config File: /vmfs/volumes/ /dummy-vm-2/dummy-vm-2.vmx
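As an aside, the "Skipping invalid VM" line at the top of the getallvms output is easy to pick out with a one-line filter, which helps on hosts with a long VM list. This is only a minimal sketch: the function name invalid_vm_ids is my own, and it assumes the message keeps the exact wording shown above.

```shell
# Print the ids of VMs that vim-cmd reports as invalid, one per line.
# Reads the text of `vim-cmd vmsvc/getallvms` on stdin.
# Assumption: the warning has the form: Skipping invalid VM '<id>'
invalid_vm_ids() {
    sed -n "s/^Skipping invalid VM '\([0-9][0-9]*\)'.*/\1/p"
}
```

On a host it could be used as `vim-cmd vmsvc/getallvms 2>&1 | invalid_vm_ids`. The `2>&1` is there because I have not checked whether the warning goes to stdout or stderr.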
Interestingly, dummy-vm-3 was not running, but the hypervisor helpfully informed us that an invalid VM (i.e. one whose .vmx file cannot be read) with id 1394 was being hidden from the getallvms output. Could it still be on vm-host-6? I ran the following on vm-host-6:
~ # esxcli vm process list
dummy-vm-3
   World ID: 38659305
   Process ID: 0
   VMX Cartel ID: 38656655
   UUID: 42 3b 41 fa 78 c5 48 61-44 bc 42 e6 68 46 f2 4d
   Display Name: dummy-vm-3
   Config File: /vmfs/volumes//dummy-vm-3/dummy-vm-3.vmx

dummy-vm-4
   World ID: 34269567
   Process ID: 0
   VMX Cartel ID: 34269564
   UUID: 42 3b 4f c0 38 dc 40 1b-e9 5a 85 40 ac b0 b4 40
   Display Name: dummy-vm-4
   Config File: /vmfs/volumes/ /dummy-vm-4/dummy-vm-4.vmx
Bingo. The registration had somehow moved to vm-host-13, but not the VM process itself. We would later find out that this was a fumbled power-on DRS placement after the VM was shut down to remove an extra disk.
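The check I did by eye, "is there a VMX process with this display name on this host?", can be scripted if you have many hosts to look through. A rough sketch follows; vm_is_running is a name I made up, and it only assumes that each entry in the esxcli output contains a "Display Name: ..." line as in the listings above.

```shell
# Succeed (exit 0) if a VMX process with the given display name appears in
# `esxcli vm process list` output supplied on stdin; fail (exit 1) otherwise.
vm_is_running() {
    awk -v name="$1" '
        # Strip leading whitespace so indented and flush-left lines both match.
        { sub(/^[ \t]+/, "") }
        # Require an exact match so "dummy-vm" does not match "dummy-vm-3".
        $0 == ("Display Name: " name) { found = 1 }
        END { exit !found }
    '
}
```

Run on the host itself it would look like `esxcli vm process list | vm_is_running dummy-vm-3 && echo running`. From a workstation, the same filter could sit at the end of an ssh call per host, assuming SSH is enabled on the hosts and you are allowed to log in.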
At this point, all we needed to do was to unregister dummy-vm-3 from vm-host-13 and register it on vm-host-6. Back on vm-host-13:
~ # vim-cmd vmsvc/unregister 1394
The command doesn't return anything, but the "Unknown" VM shown when connecting directly to vm-host-13 disappears. Let's register it on vm-host-6 and hope for the best:
~ # vim-cmd /solo/register /vmfs/volumes//dummy-vm-3/dummy-vm-3.vmx
1724
After that, the VM re-appeared in the vSphere client and we could access the console again. It was immediately picked up by DRS and thrown to another host, this time successfully.
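The number printed by the register command (1724 here) appears to be the VM id assigned on the new host. To double-check a registration from the shell rather than the client, the getallvms listing can be searched by name. This is a small sketch under two assumptions of mine: the helper name vmid_for_name is invented, and VM names contain no spaces, so the name is always the second whitespace-separated column.

```shell
# Print the Vmid registered under the given VM name, reading
# `vim-cmd vmsvc/getallvms` output on stdin. Prints nothing if the
# name is not registered on this host.
# Assumption: VM names contain no whitespace.
vmid_for_name() {
    awk -v name="$1" '$1 ~ /^[0-9]+$/ && $2 == name { print $1 }'
}
```

On vm-host-6, `vim-cmd vmsvc/getallvms | vmid_for_name dummy-vm-3` would then be expected to print the same id that the register command returned.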
References:
https://kb.vmware.com/s/article/1005051
http://www.yellow-bricks.com/2011/11/16/esxi-commandline-work/
http://www.everythingvm.com/content/fixing-invalid-virtual-machines-vmware