How to consolidate a snapshot in VMware ESX when there are virtual machine errors


Uploaded by VMwareKB on 08.12.2010

Transcript:
This video will follow VMware Knowledge Base article 1007849
on resolving issues when consolidating snapshots on an ESX server.
This is an example of one of the symptoms seen when trying to power on the virtual machine with snapshots.
So from the Virtual Infrastructure client we select the VM and try to power it on.
We see the following error message
which indicates that the parent virtual disk has been modified since the child was created.
From the client, we can edit the virtual machine's settings
and look at the hard disk associated with this virtual machine.
So with this example, we see there is one hard disk attached and it is running on a snapshot.
To resolve this issue
we need to connect to the ESX host directly via command line.
We can access this via the console, or in this example, via an SSH session.
We need to navigate to the home directory of the problem virtual machine.
I can use the following command to list any registered virtual machines on this ESX host.
In this example, I just have one virtual machine
so I am just going to change into the home directory of this virtual machine.
So here I am listing out all of the files associated with this virtual machine.
Let me explain what some of the files in the virtual machine's directory are.
The .vmx file is the virtual machine's own configuration file.
Here we see a -flat.vmdk file, which is the base disk.
This example is a 5 gig disk, and each flat file has a smaller .vmdk descriptor file.
We also see a number of delta files, and these are the snapshots.
And each delta file has an associated smaller .vmdk descriptor file.
Before proceeding, we need to know all the virtual disks associated with the virtual machine.
Note some disks may be located on different datastores.
For this example, we are just going to use disks located on the same datastore.
We can use the following command to see what is referenced in the virtual machine's .vmx config file.
So here we see there is one disk in use, and it's a snapshot file.
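As a rough illustration of what that lookup does, here is a minimal Python sketch that pulls the disk references out of the text of a .vmx file. The sample contents, the name "examplevm" and the function disks_in_vmx are made up for illustration; the video itself runs a console command on the ESX host.

```python
import re

def disks_in_vmx(vmx_text):
    """Return {device: file name} for every fileName entry that points at a .vmdk."""
    pattern = re.compile(
        r'^\s*([\w:.]+)\.fileName\s*=\s*"([^"]+\.vmdk)"\s*$',
        re.IGNORECASE | re.MULTILINE,
    )
    return {m.group(1): m.group(2) for m in pattern.finditer(vmx_text)}

# Hypothetical .vmx excerpt; a real file has many more settings.
sample_vmx = '''\
scsi0:0.present = "true"
scsi0:0.fileName = "examplevm-000003.vmdk"
ide0:0.fileName = "/dev/cdrom"
'''

disks = disks_in_vmx(sample_vmx)
print(disks)
```

A file name containing a six-digit suffix, as in this sample, suggests the virtual machine is running on a snapshot rather than on the base disk.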
We also need to know how much space the base disk and any associated snapshots are taking up.
The following command will prove useful.
So here we see there is a 5 gig base disk
and a number of delta or snapshot files.
If any virtual disks are located on a different datastore
you need to run the same command on the associated .vmdk files on that datastore.
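The same size check can be sketched in Python, assuming an interpreter that can read the virtual machine's directory. The function vmdk_usage and the file names are hypothetical, and the demo uses tiny temporary files rather than real disks. Note that os.path.getsize reports the apparent file size, which may differ from the blocks actually used on the datastore.

```python
import os
import tempfile

def vmdk_usage(directory):
    """Return ({file name: size in bytes}, total bytes) for the .vmdk files in a directory."""
    sizes = {
        name: os.path.getsize(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
        if name.endswith(".vmdk")
    }
    return sizes, sum(sizes.values())

# Demo with small stand-in files; real base and delta files are far larger.
with tempfile.TemporaryDirectory() as vm_dir:
    stand_ins = {
        "examplevm-flat.vmdk": 5000,
        "examplevm-000001-delta.vmdk": 1200,
        "examplevm.vmdk": 300,
        "examplevm.vmx": 50,  # not a disk, so it is ignored
    }
    for name, size in stand_ins.items():
        with open(os.path.join(vm_dir, name), "wb") as handle:
            handle.write(b"\0" * size)
    sizes, total = vmdk_usage(vm_dir)

print(sizes, total)
```

For disks on another datastore, the function would simply be called again with that datastore's directory.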
We also need to ensure that the virtual machine is powered off.
We can confirm this using the following command.
So here we see that the virtual machine is powered off.
We also need to ensure that there is enough space to commit the snapshots on the datastore.
So we can use this command.
So in this example, the virtual machine is located on the following datastore
which has ample space available.
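A minimal sketch of the free-space comparison, assuming a Python 3 interpreter that can see the datastore's mount point. The 10 percent headroom and the name has_room are our own assumptions, not values from the Knowledge Base article, and the demo queries the current directory only so that it runs anywhere.

```python
import shutil

def has_room(required_bytes, free_bytes, headroom=0.10):
    """True if free space covers the requirement plus a safety margin (assumed 10%)."""
    return free_bytes >= required_bytes * (1 + headroom)

GIB = 1024 ** 3
base_disk_bytes = 5 * GIB  # the 5 gig base disk from this example

# On a real host the path would be the datastore mount point (an assumption here).
free_here = shutil.disk_usage(".").free
print("Enough room for a full-size clone:", has_room(base_disk_bytes, free_here))
```

The requirement is taken as the full size of the base disk because the consolidated disk can grow to that size.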
But in this example, I'm going to clone the disks to a new datastore
set up for this purpose.
We will now verify that the snapshot chain is intact.
So we need to examine the CIDs and parent CIDs for each of the snapshots and disks in the chain.
The CID is an eight-digit hexadecimal number identifying the disk.
So as before, the following command tells us what snapshot is being referenced by the virtual machine configuration file.
So here we see this snapshot has been referenced.
We will now examine its descriptor file and look at the CIDs.
We can do this using the following useful command.
So here we see snapshot 3 is pointing to its parent snapshot, which is the following one.
So we're going to follow the snapshot chain back and just make sure that the parent CIDs and CIDs match as follows.
So here we see that the parent CID matches the CID of snapshot 2.
I'm going to repeat this for the next snapshot in the chain.
So here again we see that the parent CID for snapshot 2 matches the CID of snapshot 1.
If there were any discrepancy here, we would need to edit the descriptor .vmdk file so that both numbers coincide.
And the parent disk for snapshot 1 is actually the base disk.
So I'm just going to examine the CID on that disk as well.
So in this example I see that the parent CID is actually different to the CID of the base disk
which is why we are not able to power on this virtual machine.
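The chain walk just described can be sketched in Python. This assumes each descriptor carries CID, parentCID and, for snapshots, parentFileNameHint lines, and that all descriptors sit in one directory. The CID values, file names and helper names (read_ids, find_mismatches, make) are invented for illustration.

```python
import re

def read_ids(descriptor_text):
    """Extract CID, parentCID and parentFileNameHint from descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        match = re.search(r'^%s\s*=\s*"?([^"\n]+)"?\s*$' % key,
                          descriptor_text, re.MULTILINE)
        fields[key] = match.group(1).strip() if match else None
    return fields

def find_mismatches(descriptors, start):
    """Walk from `start` down to the base disk; return (child, parent) pairs whose IDs differ."""
    mismatches, seen, current = [], set(), start
    while current not in seen:  # guard against a looping chain
        seen.add(current)
        ids = read_ids(descriptors[current])
        parent = ids["parentFileNameHint"]
        if parent is None:  # the base disk has no parent
            break
        parent_cid = read_ids(descriptors[parent])["CID"]
        if (ids["parentCID"] or "").lower() != (parent_cid or "").lower():
            mismatches.append((current, parent))
        current = parent
    return mismatches

def make(cid, parent_cid, parent=None):
    """Build a minimal, made-up descriptor for the demo."""
    lines = ["# Disk DescriptorFile", "version=1",
             "CID=" + cid, "parentCID=" + parent_cid]
    if parent:
        lines.append('parentFileNameHint="%s"' % parent)
    return "\n".join(lines) + "\n"

descriptors = {
    "examplevm.vmdk": make("5f2b8a10", "ffffffff"),
    "examplevm-000001.vmdk": make("aaaa0001", "11111111", "examplevm.vmdk"),
    "examplevm-000002.vmdk": make("aaaa0002", "aaaa0001", "examplevm-000001.vmdk"),
    "examplevm-000003.vmdk": make("aaaa0003", "aaaa0002", "examplevm-000002.vmdk"),
}

broken = find_mismatches(descriptors, "examplevm-000003.vmdk")
print(broken)
```

In this made-up data, only the link between snapshot 1 and the base disk is reported, which mirrors the situation in the video.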
So I need to update the parent CID in the descriptor file
to match what is in the descriptor file of the base disk.
So we are now going to edit the descriptor file for snapshot 001
so that its parent CID matches the CID of the base disk.
It is a good idea to make a backup copy of any descriptor files before you change them.
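A small Python sketch of that backup-then-edit step, assuming the descriptor is a plain text file containing a single parentCID line. The helper names, the ".bak" suffix and the CID values are hypothetical; in the video the edit is made by hand in a text editor.

```python
import re
import shutil

def set_parent_cid(descriptor_text, new_cid):
    """Return descriptor text with its parentCID line set to new_cid."""
    if not re.fullmatch(r"[0-9a-fA-F]{8}", new_cid):
        raise ValueError("expected an eight-digit hexadecimal CID")
    updated, count = re.subn(r"^parentCID\s*=.*$", "parentCID=" + new_cid,
                             descriptor_text, count=1, flags=re.MULTILINE)
    if count != 1:
        raise ValueError("no parentCID line found")
    return updated

def patch_descriptor(path, new_cid):
    """Copy the descriptor to <path>.bak, then rewrite its parentCID in place."""
    shutil.copy2(path, path + ".bak")
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(set_parent_cid(text, new_cid))

# Made-up descriptor for snapshot 001 with a stale parent CID.
before = ('CID=aaaa0001\n'
          'parentCID=11111111\n'
          'parentFileNameHint="examplevm.vmdk"\n')
after = set_parent_cid(before, "5f2b8a10")
print(after)
```

Only the parentCID line changes; the snapshot's own CID and its parent file name hint are left untouched.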
So we are now going to verify that our changes are successful.
Ok, so now we see that the parent CID of the snapshot matches the CID of the base disk.
Now that we have verified the snapshot chain
we will use the "vmkfstools -i" command to copy out and consolidate the snapshots all in one step.
Please be aware that committing snapshots can take an extended amount of time
depending on the number of and size of snapshots in use.
My next step is to create a directory on a target datastore into which I'm going to clone the snapshots and base disk of this virtual machine.
To run the "vmkfstools -i" command, I use the following syntax.
So as you can see the clone process has started
and will continue until it reaches 100 percent.
So this example's clone operation completed.
However, if this command doesn't work
it may be that the snapshot you have chosen has been corrupted
especially if the datastore LUN ran out of space.
So in that case, repeat the "vmkfstools -i" command, but this time use the next snapshot up the tree.
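The retry logic can be sketched as follows, assuming the general form "vmkfstools -i <source descriptor> <destination descriptor>". The paths, file names and the clone_with_fallback helper are hypothetical, and the demo substitutes a fake runner so that nothing is actually cloned.

```python
import subprocess

def clone_with_fallback(chain, destination, run=subprocess.call):
    """Try 'vmkfstools -i' from each descriptor in turn; return the source that worked.

    `chain` lists descriptor files starting with the snapshot the VM references,
    followed by each parent in order.
    """
    for source in chain:
        command = ["vmkfstools", "-i", source, destination]
        if run(command) == 0:  # exit status 0 means the clone succeeded
            return source
    return None

chain = ["examplevm-000003.vmdk", "examplevm-000002.vmdk",
         "examplevm-000001.vmdk", "examplevm.vmdk"]

def fake_run(command):
    """Stand-in for the real command: pretend snapshot 3 is corrupted."""
    return 1 if command[2] == "examplevm-000003.vmdk" else 0

used = clone_with_fallback(
    chain, "/vmfs/volumes/recovery-datastore/recover/examplevm.vmdk", run=fake_run)
print("Cloned from:", used)
```

On a real host, a failed attempt may leave a partial destination file that has to be removed before retrying, and cloning from an earlier snapshot loses whatever changes the skipped snapshot held.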
So now that we have cloned the original virtual machine's disk and snapshots into a new disk and a new datastore
we are going to edit the virtual machine's settings
and point the virtual machine to the new disk.
We can do this from the Virtual Infrastructure client.
We are going to use an existing virtual disk.
We are going to browse to the datastore path.
I created a new disk on the new datastore
in a folder called "recover".
Here I have my new disk which is 5 gigs in size
and it contains all the data from the snapshots from the original virtual machine disk.
What remains now is to power on the virtual machine.
Once we power up the virtual machine
we need to ensure that we have all the data we need
and if not we need to go back through the process
and go back to the previous snapshot in the chain
in case any information was missed from the snapshot that we chose to commit from.
Once we have verified that the powered-on virtual machine's data is ok using the recover disk
we can go back and clean up the original virtual machine's base disk and snapshots
to save space on the datastore LUN.
Here we see from the SSH session that the existing disks and snapshots still exist.
Currently the virtual machine is running off the recover disk on the new datastore.
To remove the original base disk and snapshot files
we can use the "rm" command, but first we list them out.
So we can simply use the "rm" command
and we will also remove the associated descriptor files.
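Listing before deleting can be sketched in Python. This assumes the usual naming pattern of a base descriptor, a -flat file, and numbered snapshot descriptors with matching -delta files; the function name and sample listing are made up. The sketch only selects names and deliberately deletes nothing.

```python
import re

def original_disk_files(names, vm_name):
    """Pick out the base disk, snapshot and descriptor files that belong to vm_name."""
    pattern = re.compile(
        r"^%s(-flat|-\d{6}(-delta)?)?\.vmdk$" % re.escape(vm_name))
    return sorted(name for name in names if pattern.match(name))

# Hypothetical directory listing.
listing = [
    "examplevm.vmx", "examplevm.vmsd", "examplevm.vmdk", "examplevm-flat.vmdk",
    "examplevm-000001.vmdk", "examplevm-000001-delta.vmdk", "other-flat.vmdk",
]

candidates = original_disk_files(listing, "examplevm")
print(candidates)
```

Reviewing such a list before running "rm" helps avoid removing the configuration file or another virtual machine's disks.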
We can copy the recover disk back to the original datastore location using the "vmkfstools -i" command.
First we need to ensure that we have the virtual machine powered off.
Then we follow with the "vmkfstools -i" command.
Again, this will take a number of minutes to complete.
Once the clone has completed, you can edit the virtual machine settings
and detach the recover disk
and instead replace it with the new disk which has been cloned back to the original datastore location.
We can do that as follows by editing the virtual machine settings.
Now we are going to add in the hard disk which we have cloned back to the original datastore.
Now we can power on the virtual machine just to make sure that everything is ok.
Here we see the virtual machine starting again.
The last step then is to clean up the snapshot database.
We can do this by renaming the .vmsd file.
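As an illustration, the rename could look like this in Python. The ".old" suffix and the retire_vmsd helper are our own choices rather than anything shown in the video, and the demo works on a temporary directory so no real snapshot database is touched.

```python
import os
import tempfile

def retire_vmsd(vm_dir, suffix=".old"):
    """Rename every .vmsd file in vm_dir by appending a suffix; return the new names."""
    renamed = []
    for name in sorted(os.listdir(vm_dir)):
        if name.endswith(".vmsd"):
            os.rename(os.path.join(vm_dir, name),
                      os.path.join(vm_dir, name + suffix))
            renamed.append(name + suffix)
    return renamed

# Demo against an empty stand-in file.
with tempfile.TemporaryDirectory() as vm_dir:
    open(os.path.join(vm_dir, "examplevm.vmsd"), "w").close()
    result = retire_vmsd(vm_dir)
    remaining = sorted(os.listdir(vm_dir))

print(result, remaining)
```

Renaming rather than deleting keeps the old snapshot database available in case it needs to be inspected later.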
And finally, to complete the cleanup of the snapshot database
we need to power off the virtual machine
and remove it from the ESX host's inventory, and re-add it. We can do that as follows.
Now we are free to power on the virtual machine again.