We use HP Data Protector 6.21 to back up our virtual machines running on vSphere 4.1. HP DP's VMware-integrated backups use the VEPA agent, which takes a snapshot of the VM so that the base disks can be backed up. The snapshot is then committed after the backup has completed.
It would appear that DP VEPA and the VMware backup API don’t play all that well together, and you can see issues with snapshots being left behind, something any good VM admin knows can be messy and dangerous if left unchecked.
What goes wrong?
- The DP backup manages to create the snapshots
- The snapshots are not deleted and the backup shows an error
- The VM can show that it’s running on the snapshot, but this is not always the case
- The snapshot manager does not show any snapshots
- The snapshots can only be viewed in the datastore browser and cannot be deleted
- The next backup occurs, seems to fail again, and adds another snapshot to the chain
- Eventually the snapshots fill the datastore
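Because the snapshot manager shows nothing in this state, one way to spot the problem early is to compare a datastore file listing against the snapshot count. The following is only a rough Python sketch, not anything DP or vCenter provides. It assumes the usual snapshot disk naming of "<name>-000001.vmdk" and "<name>-000001-delta.vmdk"; the function names and the sample file list are made up for illustration, and a VM whose own name ends in six digits would be flagged falsely.

```python
import re

# Assumption: snapshot child disks follow the "<name>-00000N.vmdk" /
# "<name>-00000N-delta.vmdk" naming convention.
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$", re.IGNORECASE)


def is_snapshot_disk(path):
    """True if the file name looks like a snapshot child disk."""
    return SNAPSHOT_DISK.search(path) is not None


def leftover_snapshot_files(datastore_files):
    """Return the files in a listing that look like snapshot disks."""
    return [f for f in datastore_files if is_snapshot_disk(f)]


def looks_orphaned(datastore_files, snapshots_in_manager):
    """Snapshot disks exist on the datastore, yet the snapshot manager lists none."""
    return snapshots_in_manager == 0 and bool(leftover_snapshot_files(datastore_files))


if __name__ == "__main__":
    # Hypothetical listing of a VM folder, as seen in the datastore browser.
    files = [
        "web01.vmx",
        "web01.vmdk",
        "web01-flat.vmdk",
        "web01-000001.vmdk",
        "web01-000001-delta.vmdk",
    ]
    print(leftover_snapshot_files(files))
    print(looks_orphaned(files, snapshots_in_manager=0))
```

How the file listing and the snapshot count are collected is left open here; they could just as well be typed in from the datastore browser and the snapshot manager.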
What have we done?
Beyond trying to manage the situation using vCenter alarms, we have attempted to fix the issue by working with HP and VMware. The net result is that HP blame VMware, and VMware say that it’s a known fault that will be rectified in the next big release of vSphere. Well, that’s not helpful, but what can we do?
So, on to how we have attempted to fix these snapshots that are left behind.
Doing a V2V conversion does not work, as it doesn’t actually see the disks.
Doing a vMotion or Storage vMotion does not release the locks, although in some cases the svMotion does change the name of the file and shows that the VM is actually on the base disk again.
The snapshot disks cannot be downloaded either: the downloads start, but they fail about halfway through.
How to remove the snapshots
- Create a manual snapshot against the machine
- Commit the snapshot by doing a delete all
I noticed that on the occasions when this has worked, the snapshot manager actually lists one of the old VEPA snapshots, and they then all disappear. I think I have had this work once out of five times so far.
- Clone the VM to another datastore. This can be done whilst the VM is online, but in that case ensure you don’t power on the clone with the NIC connected!
  - This has had a 99% success rate so far, but it does incur some additional steps
  - Once the new server has been tested you can delete the old server. For the most part, the only thing that changes with the new clone is the VM’s MAC address
  - The VM in the inventory must be renamed
  - After the rename, perform a Storage vMotion so that the underlying folder structure and disk names in the datastore are renamed correctly
- Should the clone fail, as it has once, you can commit the snapshots manually using the command line
  - Please follow this guide step by step to do it: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007849
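For the clone route, the last step relies on the Storage vMotion renaming the folder and disk files to match the renamed VM. A quick way to confirm that it did is to check the disk paths against the VM name. This is just a small Python sketch under stated assumptions: disk paths are given in the "[datastore] folder/file.vmdk" form, and the function name and sample paths are hypothetical.

```python
import re

# Assumption: disk backing paths look like "[datastore] folder/file.vmdk".
DISK_PATH = re.compile(
    r"^\[(?P<datastore>[^\]]+)\]\s+(?P<folder>[^/]+)/(?P<file>[^/]+\.vmdk)$"
)


def misnamed_disks(vm_name, disk_paths):
    """Return the disk paths whose folder or file name does not follow vm_name."""
    bad = []
    for path in disk_paths:
        match = DISK_PATH.match(path)
        if (
            match is None
            or match.group("folder") != vm_name
            or not match.group("file").startswith(vm_name)
        ):
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Hypothetical example: the first disk was renamed, the second was not.
    paths = [
        "[ds2] web01/web01.vmdk",
        "[ds2] web01-clone/web01-clone_1.vmdk",
    ]
    print(misnamed_disks("web01", paths))
```

An empty result means every disk sits in a folder named after the VM and carries the VM name as its prefix; anything listed still has the clone's old naming.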