F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues #3163

Closed
wants to merge 4 commits

Conversation

kvaps (Contributor) commented Apr 1, 2019

Note for reviewer:

Signed-off-by: kvaps [email protected]

@vholer requested a review from xorel on April 2, 2019 08:24
@vholer added this to the Release 5.8.2 milestone on Apr 2, 2019
kvaps (Contributor, Author) commented Apr 2, 2019

I've added explanatory notes to the PR, please review.

@kvaps changed the title to "F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues [Do not merge]" on Apr 4, 2019
kvaps (Contributor, Author) commented Apr 4, 2019

Please do not merge yet.

Some enhancements are still needed:

  • A stricter check that the datastore and volumes belong to the VM during mass removal.
  • Run lvchange -an during the EPILOG_UNDEPLOY operation.

kvaps (Contributor, Author) commented Apr 4, 2019

OK, now ready for review

@kvaps changed the title back to "F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues" on Apr 4, 2019
xorel (Member) commented Apr 8, 2019

Let's discuss the delete first:

  • I don't think it's a good idea to get DST_HOST from the history (the host might be down or unavailable already). Why can't we use the BRIDGE_LIST here?
  • About the rest of the changes, I don't get the point. There are always two actions (if the VM has one disk): the first deletes the disk (e.g. disk.0) and the second deletes the directory. For the disk deletion, DST_HOST is replaced by BRIDGE_LIST (if available) to prevent running on the frontend for an UNDEPLOYED host. For the directory removal there should be no problem: for a previously running VM the action runs on the hypervisor, and for an UNDEPLOYED VM it runs on the frontend (after the previous mv). What problem are the other changes addressing?
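
For reference, a minimal sketch of the host-selection pattern described above, assuming helper names (xpath.rb, ssh_exec_and_log, DS_SYS_ID, DST_HOST, DST_PATH) as used elsewhere in the TM scripts; this is an illustration, not the actual fs_lvm/delete code:

    # Illustration only (assumed variable/helper names): prefer a BRIDGE_LIST node
    # over DST_HOST, so the delete action is not run on the frontend when the VM
    # is UNDEPLOYED.
    XPATH="${DRIVER_PATH}/../../datastore/xpath.rb --stdin"

    unset i XPATH_ELEMENTS
    while IFS= read -r -d '' element; do
        XPATH_ELEMENTS[i++]="$element"
    done < <(onedatastore show -x "$DS_SYS_ID" | $XPATH /DATASTORE/TEMPLATE/BRIDGE_LIST)

    BRIDGE_LIST="${XPATH_ELEMENTS[0]}"

    if [ -n "$BRIDGE_LIST" ]; then
        # simplified selection: take the first node (real code may hash on the VM ID)
        DST_HOST=$(echo "$BRIDGE_LIST" | cut -d' ' -f1)
    fi

    ssh_exec_and_log "$DST_HOST" "rm -rf \"$DST_PATH\"" "Error deleting $DST_PATH"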

kvaps (Contributor, Author) commented Apr 8, 2019

Hi,

> I don't think it's a good idea to get DST_HOST from the history (the host might be down or unavailable already). Why can't we use the BRIDGE_LIST here?

There are a few potential problems:

  1. If the datastore has no BRIDGE_LIST, the first delete action will always be called on the controller.
    But if it fails, the second one (after recover --retry) will always be called directly on the compute node; see "LVM cleanup executes on master instead storage node" #2352 (comment).
    This fix unifies the behavior so that it always runs on the compute node.

  2. Removing the directory will never be called on the BRIDGE_LIST node, because this check will never return DS_SYS_ID for a DST_PATH without disk\.[[:digit:]]+$ at the end (see the sketch after this list):

    DS_SYS_ID=$(echo $DST_PATH | grep -E '\/disk\.[[:digit:]]+$' | $AWK -F '/' '{print $(NF-2)}')

  3. OpenNebula 5.8 always calls the delete operation for the folder during VM termination, even if the VM has disks; see "Destroy VM isn't working with BRIDGE_LIST and UNDEPLOYED VMs" #2981 (comment). My bad, I was wrong here!

  4. The BRIDGE_LIST node has no activated LVM device, so zeroing does not work.

  5. We still have a node with the activated LVM device; it will continue to exist there even after deletion.
    However, I solved it by adding this to mv:

    one/src/tm_mad/fs_lvm/mv, lines 90 to 108 in 0feb9d8:

    # undeploy operation
    if [ "${LCM_STATE}" = "30" ]; then
        # deactivate
        CMD=$(cat <<EOF
            set -ex -o pipefail
            ${SUDO} ${SYNC}
            ${SUDO} ${LVSCAN}
            ${SUDO} ${LVCHANGE} -an "${SRC_DEV}"
            rm -f "${SRC_DIR}/.host" || :
    EOF
    )
        ssh_exec_and_log "$SRC_HOST" "$CMD" \
            "Error deactivating disk $SRC_PATH"
        exit 0
    fi
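
As noted in point 2 above, here is a standalone illustration (the example paths are hypothetical) of why the DS_SYS_ID extraction only succeeds for disk paths:

    #!/usr/bin/env bash
    # Standalone illustration; the example paths below are hypothetical.
    # The grep only matches paths ending in disk.<N>, so for a directory
    # path DS_SYS_ID stays empty and the BRIDGE_LIST lookup is skipped.
    AWK=awk

    for DST_PATH in /var/lib/one/datastores/104/35/disk.0 \
                    /var/lib/one/datastores/104/35; do
        DS_SYS_ID=$(echo $DST_PATH | grep -E '\/disk\.[[:digit:]]+$' | $AWK -F '/' '{print $(NF-2)}')
        echo "DST_PATH=$DST_PATH -> DS_SYS_ID='$DS_SYS_ID'"
    done

    # Prints:
    #   DST_PATH=/var/lib/one/datastores/104/35/disk.0 -> DS_SYS_ID='104'
    #   DST_PATH=/var/lib/one/datastores/104/35 -> DS_SYS_ID=''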

> What problem are the other changes addressing?

  1. Removing the check for /VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/TYPE solves the problem with offline and datastore migration; fixes "fs_lvm offline migration does not work" (#2359) and "F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues" (#3163).

  2. Added a check of /VM/LCM_STATE to deactivate the LVM device during undeploy (a minimal sketch follows this list).
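
A minimal sketch of what such a check can look like, assuming the XPATH helper and variables (DRIVER_PATH, VM_ID, SUDO, LVCHANGE, SRC_HOST, SRC_DEV, SRC_PATH) are set up as in the other TM scripts; LCM_STATE 30 corresponds to the EPILOG_UNDEPLOY case mentioned earlier:

    # Illustration only: read the VM's LCM_STATE via the xpath helper and
    # deactivate the logical volume when the VM is being undeployed.
    XPATH="${DRIVER_PATH}/../../datastore/xpath.rb --stdin"

    unset i XPATH_ELEMENTS
    while IFS= read -r -d '' element; do
        XPATH_ELEMENTS[i++]="$element"
    done < <(onevm show -x "$VM_ID" | $XPATH /VM/LCM_STATE)

    LCM_STATE="${XPATH_ELEMENTS[0]}"

    # 30 = EPILOG_UNDEPLOY: deactivate the LV on the source host
    # (as in the mv snippet above)
    if [ "$LCM_STATE" = "30" ]; then
        ssh_exec_and_log "$SRC_HOST" "$SUDO $LVCHANGE -an \"$SRC_DEV\"" \
            "Error deactivating disk $SRC_PATH"
    fi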

Anyway, since I was wrong about the 3rd point and we have a solution for the 1st point, I will prepare another PR that solves only the remaining problems.

Thanks for the review.

kvaps (Contributor, Author) commented Apr 8, 2019

Closing this PR in favor of the new one: #3201.

@kvaps closed this on Apr 8, 2019
rsmontero pushed a commit that referenced this pull request Jul 25, 2024