F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues #3163

Closed
wants to merge 4 commits

Conversation

kvaps (Contributor) commented Apr 1, 2019

Note for reviewer:

Signed-off-by: kvaps [email protected]

@vholer requested a review from xorel on April 2, 2019 08:24
@vholer added this to the Release 5.8.2 milestone on Apr 2, 2019
kvaps (Contributor, Author) commented Apr 2, 2019

I've added explanatory notes to the PR, please review.

@kvaps changed the title to "F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues [Do not merge]" on Apr 4, 2019
kvaps (Contributor, Author) commented Apr 4, 2019

Please do not merge yet.

Some enhancements are still needed:

  • A stricter check that the datastore and volumes belong to the VM during mass removal.
  • Run lvchange -an during the EPILOG_UNDEPLOY operation.

kvaps (Contributor, Author) commented Apr 4, 2019

OK, now ready for review

@kvaps changed the title back to "F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues" on Apr 4, 2019
xorel (Member) commented Apr 8, 2019

Let's discuss the delete first:

  • I don't think it's a good idea to get DST_HOST from the history (the host might be down or unavailable already). Why can't we use the BRIDGE_LIST here?
  • About the rest of the changes, I don't get the point. There are always two actions (if the VM has one disk): the first deletes the disk (e.g. disk.0) and the second deletes the directory. For the disk deletion, DST_HOST is replaced by BRIDGE_LIST (if available) to prevent running on the frontend for an UNDEPLOYED host. For the directory removal there should be no problem: for a previously running VM the action runs on the hypervisor, and for an UNDEPLOYED VM it runs on the frontend (after the previous mv). What problem are the other changes addressing?
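
For reference, a minimal sketch of the host-selection pattern described above, assuming helper names (xpath.rb, ssh_exec_and_log, DS_SYS_ID, DST_HOST, DST_PATH) as used elsewhere in the TM scripts; this is an illustration, not the actual fs_lvm/delete code:

    # Illustration only (assumed variable/helper names): prefer a BRIDGE_LIST node
    # over DST_HOST, so the delete action is not run on the frontend when the VM
    # is UNDEPLOYED.
    XPATH="${DRIVER_PATH}/../../datastore/xpath.rb --stdin"

    unset i XPATH_ELEMENTS
    while IFS= read -r -d '' element; do
        XPATH_ELEMENTS[i++]="$element"
    done < <(onedatastore show -x "$DS_SYS_ID" | $XPATH /DATASTORE/TEMPLATE/BRIDGE_LIST)

    BRIDGE_LIST="${XPATH_ELEMENTS[0]}"

    if [ -n "$BRIDGE_LIST" ]; then
        # simplified selection: take the first node (real code may hash on the VM ID)
        DST_HOST=$(echo "$BRIDGE_LIST" | cut -d' ' -f1)
    fi

    ssh_exec_and_log "$DST_HOST" "rm -rf \"$DST_PATH\"" "Error deleting $DST_PATH"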

kvaps (Contributor, Author) commented Apr 8, 2019

Hi,

> I don't think it's a good idea to get DST_HOST from the history (the host might be down or unavailable already). Why can't we use the BRIDGE_LIST here?

There are a few potential problems:

  1. If the datastore has no BRIDGE_LIST, the first delete action will always be called on the controller.
    But if it fails, the second one (after recover --retry) will always be called directly on the compute node; see "LVM cleanup executes on master instead storage node" #2352 (comment).
    This fix unifies the behavior so that it always runs on the compute node.

  2. Removing the directory will never be called on the BRIDGE_LIST node, because this check will never return DS_SYS_ID for a DST_PATH without disk\.[[:digit:]]+$ at the end (see the sketch after this list):

    DS_SYS_ID=$(echo $DST_PATH | grep -E '\/disk\.[[:digit:]]+$' | $AWK -F '/' '{print $(NF-2)}')

  3. OpenNebula 5.8 always calls the delete operation for the folder during VM termination, even if the VM has disks; see "Destroy VM isn't working with BRIDGE_LIST and UNDEPLOYED VMs" #2981 (comment). My bad, I was wrong here!

  4. The BRIDGE_LIST node has no activated LVM device, so zeroing does not work.

  5. We still have a node with the activated LVM device; it will continue to exist there even after deletion.
    However, I solved it by adding this to mv:

    one/src/tm_mad/fs_lvm/mv, lines 90 to 108 in 0feb9d8:

    # undeploy operation
    if [ "${LCM_STATE}" = "30" ]; then
        # deactivate
        CMD=$(cat <<EOF
            set -ex -o pipefail
            ${SUDO} ${SYNC}
            ${SUDO} ${LVSCAN}
            ${SUDO} ${LVCHANGE} -an "${SRC_DEV}"
            rm -f "${SRC_DIR}/.host" || :
    EOF
    )
        ssh_exec_and_log "$SRC_HOST" "$CMD" \
            "Error deactivating disk $SRC_PATH"
        exit 0
    fi
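
As noted in point 2 above, here is a standalone illustration (the example paths are hypothetical) of why the DS_SYS_ID extraction only succeeds for disk paths:

    #!/usr/bin/env bash
    # Standalone illustration; the example paths below are hypothetical.
    # The grep only matches paths ending in disk.<N>, so for a directory
    # path DS_SYS_ID stays empty and the BRIDGE_LIST lookup is skipped.
    AWK=awk

    for DST_PATH in /var/lib/one/datastores/104/35/disk.0 \
                    /var/lib/one/datastores/104/35; do
        DS_SYS_ID=$(echo $DST_PATH | grep -E '\/disk\.[[:digit:]]+$' | $AWK -F '/' '{print $(NF-2)}')
        echo "DST_PATH=$DST_PATH -> DS_SYS_ID='$DS_SYS_ID'"
    done

    # Prints:
    #   DST_PATH=/var/lib/one/datastores/104/35/disk.0 -> DS_SYS_ID='104'
    #   DST_PATH=/var/lib/one/datastores/104/35 -> DS_SYS_ID=''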

> What problem are the other changes addressing?

  1. Removing the check for /VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/TYPE solves the problem with offline and datastore migration; fixes "fs_lvm offline migration does not work" (#2359) and "F #2352 #2359 #2981 #3130: Fix fs_lvm cleanup and offline migration issues" (#3163).

  2. Added a check of /VM/LCM_STATE to deactivate the LVM device during undeploy (a minimal sketch follows this list).
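
A minimal sketch of what such a check can look like, assuming the XPATH helper and variables (DRIVER_PATH, VM_ID, SUDO, LVCHANGE, SRC_HOST, SRC_DEV, SRC_PATH) are set up as in the other TM scripts; LCM_STATE 30 corresponds to the EPILOG_UNDEPLOY case mentioned earlier:

    # Illustration only: read the VM's LCM_STATE via the xpath helper and
    # deactivate the logical volume when the VM is being undeployed.
    XPATH="${DRIVER_PATH}/../../datastore/xpath.rb --stdin"

    unset i XPATH_ELEMENTS
    while IFS= read -r -d '' element; do
        XPATH_ELEMENTS[i++]="$element"
    done < <(onevm show -x "$VM_ID" | $XPATH /VM/LCM_STATE)

    LCM_STATE="${XPATH_ELEMENTS[0]}"

    # 30 = EPILOG_UNDEPLOY: deactivate the LV on the source host
    # (as in the mv snippet above)
    if [ "$LCM_STATE" = "30" ]; then
        ssh_exec_and_log "$SRC_HOST" "$SUDO $LVCHANGE -an \"$SRC_DEV\"" \
            "Error deactivating disk $SRC_PATH"
    fi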

Anyway, since I was wrong about the 3rd point and we have a solution for the 1st point, I will prepare another PR that solves only the remaining problems.

Thanks for the review.

kvaps (Contributor, Author) commented Apr 8, 2019

Closing this PR in favor of the new one: #3201.

@kvaps closed this on Apr 8, 2019
rsmontero pushed a commit that referenced this pull request Jul 25, 2024