Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add toolbox script for RHCOS #44

Merged
merged 3 commits into from
Sep 28, 2018
Merged

Conversation

yuqi-zhang
Copy link
Contributor

Add rhcos-toolbox and corresponding specfile for building wtih rdgo.
rhcos-toolbox is a script that makes use of podman to create a
privileged debug/admin container, with ability to handle user
defined commands and variables. Modify README to reflect the change.

@yuqi-zhang
Copy link
Contributor Author

Added WIP for a some considerations:

  1. machine-type detection can be removed later once https://pagure.io/fedora-infrastructure/issue/7223, etc. is resolved. Still needs more testing on other archs (eg. arm64v8) once that is removed.

  2. Bind mounts are still in discussion. Currently it is defined similar to the old toolbox (rootfs/run is mounted), but also consider allowing user-defined mounts, and /var/srv/containers as the alternative mount: make /var/srv/containers labeled container_file_t:s0 by default fedora-coreos-tracker#42

  3. The container is effectively a pet container (stopped after the user disconnects). Considering adding extra functionality (removing the container, more flags, etc.)

PTAL @ashcrow @dm0- @dustymabe @lucab , thanks!

@ashcrow
Copy link
Member

ashcrow commented Sep 7, 2018

So far this is looking good. Thank you for incorporating my requests.

I like @lucab's idea for providing other mounts for the toolbox through an abstraction so it's not bound to a specific runtime. @lucab do you think it's still worth doing that?

@lucab
Copy link

lucab commented Sep 7, 2018

I think that came initially from @dm0-, and he may have a better view on the desired UX. From my side, I think we have a few possible approaches:

  1. be prescriptive, and directly bind-mount all of / rslave + RW + no-relabeling
  2. allow user flexibility, and expose all of the options above through some abstractions
  3. make this runtime specific, call it toolbox-podman, don't bind-mount anything and just consume a ${PODMAN_VOLUMES}

I don't have a strong opinion on this, but I think I'd currently rank them 1 - 3 - 2.

rhcos-toolbox Outdated
container_exec() {
sudo podman exec \
--env LANG=$LANG \
--env TERM=$TERM \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that here (and in many other places in this script), you are missing quotes around variables.

@ashcrow
Copy link
Member

ashcrow commented Sep 10, 2018

I think that came initially from @dm0-, and he may have a better view on the desired UX. From my side, I think we have a few possible approaches:

  1. be prescriptive, and directly bind-mount all of / rslave + RW + no-relabeling

@yuqi-zhang if you're cool with it let's run with this addition.

@yuqi-zhang
Copy link
Contributor Author

yuqi-zhang commented Sep 10, 2018

Thanks for the feedback! I'm fixing it up right now. There's also an issue I'm investigating where the container works fine for the first couple of runs, then becomes VERY slow when invoking the script after awhile (podman/systemd etc. takes up 100% of CPU for ~10 seconds before spawning the container).

Edit: its a mount issue, fixing.

@yuqi-zhang yuqi-zhang force-pushed the rhcos-toolbox branch 3 times, most recently from d59e3d4 to deb4860 Compare September 10, 2018 15:20
@yuqi-zhang yuqi-zhang changed the title <WIP> Add toolbox script for RHCOS Add toolbox script for RHCOS Sep 17, 2018
@yuqi-zhang
Copy link
Contributor Author

Removed WIP as the comments have been addressed.

@mrguitar
Copy link

@yuqi-zhang great work on this. My only gripe from my first pass using this is there's no feedback during the pull. rkt does a great job of showing the pull progress in the original version of this script.

One option could be to remove ">/dev/null 2>&1;" in the container create function. If you remove the existing container & fedora image on your system w/o piping this to dev/null I think it's a better experience. Thoughts?

@ashcrow
Copy link
Member

ashcrow commented Sep 19, 2018

Note that this work resolves openshift/os#78

@lucab
Copy link

lucab commented Sep 19, 2018

podman-related question: this currently does two steps (create + exec) which means there a whole story of ID tracking and GC. Would it be possible to instead have a single volatile interactive container, which is cleaned up on exit?

@yuqi-zhang
Copy link
Contributor Author

@mrguitar Good point, I'll remove some of the piping.

@lucab In terms of ID tracking, generally there should be only 1 toolbox at any given time, so it can just be referred to by name as toolbox. But I do agree that it may be easier to track if it uses the same workflow.

The creation/exec can be bundled together, BUT this would mean that every time a user exits the container it gets stop'ed and then remove'd? In the current script exiting the container just stops it, so the container acts more like a pet container. I could change it to a clean run every time if you think that makes more sense?

@dm0-
Copy link
Contributor

dm0- commented Sep 19, 2018

In CL, changes in the container filesystem "needed" to be persistent across toolbox runs. That was a requirement that prevented us from changing off nspawn.

The existing toolbox script keeps a different container for each user by default (although it was a configurable path so one user could have several), but I am not sure how hard of a requirement that was. Maybe the default TOOLBOX_NAME value could include a sanitized $USER value to preserve that.

Add rhcos-toolbox and corresponding specfile for building wtih rdgo.
rhcos-toolbox is a script that makes use of podman to create a
privileged debug/admin container, with ability to handle user
defined commands and variables. Modify README to reflect the change.

Signed-off-by: Yu Qi Zhang <[email protected]>
@yuqi-zhang
Copy link
Contributor Author

Then I think it makes sense to keep the filesystem persistent for the RHCOS toolbox, unless there is a reason not to do so.

I could also add in a /bin/toolbox rm cmd to clean up the container instead?

@mrguitar
Copy link

@dm0- was there a desire to move away from nspawn for this? I saw some chatter about switching to rkt, but it seemed like nspawn ultimately worked fine for this use case.

@lucab I can see pros and cons with either volatile or persistent containers. The way it's written in the PR seems to mirror the existing behavior w/ nspawn. Besides the GC, are there others reasons you'd like it to not persist? If there are enough reasons, it might be worth making it configurable.

@dm0-
Copy link
Contributor

dm0- commented Sep 19, 2018

@mrguitar There was an attempt to move to rkt at one point (coreos/bugs#1610), but yes, it looks like everyone felt that nspawn was good enough.

@lucab
Copy link

lucab commented Sep 20, 2018

@mrguitar I'd prefer not to make it configurable, we are already three levels of wrappers apart from the real execve and I'd like not to grow this into a full-blown container runtime.

I naively assumed it was fine and cleaner to purge resources once a debugging session was done, but I missed that some usecase may indeed benefit from retaining those across session. So I'm fine with sticking to the current approach, sorry for the diversion.

@mrguitar
Copy link

+100 on keeping the scope small and not making another full blown runtime here. :)
Thanks

Add a registry login option upon failing a pull. Add example
registry.redhat.io toolboxrc format.

Signed-off-by: Yu Qi Zhang <[email protected]>
@yuqi-zhang
Copy link
Contributor Author

Testing shows no conflict with kubelet on a RHCOS cluster, so we're good with podman.

Added extra login prompt for UX considerations for registry.redhat.io

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@dm0- dm0- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor suggestions, but LGTM without them.

rhcos-toolbox Outdated
cleanup
}

if [ -z "$1" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition seems strange, like if someone calls the script with multiple arguments but an empty first argument, this will make /bin/sh try to run a file with an empty filename and fail. Maybe a condition like [ -n "$*" ] || set /bin/sh is clearer, even if it still fails overall when given an empty first argument.

rhcos-toolbox Outdated
read -r -n 1 -p "Would you like to authenticate to registry: '${REGISTRY}' and try again? [y/N] "
echo

if [[ $REPLY =~ ^[Yy]$ ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe drop the $ anchor in case users reflexively type yes to a confirmation prompt, or also accept [Yy][Ee][Ss].

Accept y/yes for login prompt, and use cleaner detection for
checking user args for command.

Signed-off-by: Yu Qi Zhang <[email protected]>
@yuqi-zhang
Copy link
Contributor Author

@dm0- thanks for the suggestions! Pushed a fix. Ready to merge if you think we're good to go.

@dm0-
Copy link
Contributor

dm0- commented Sep 27, 2018

Oh, sorry, I missed the -n 1 on read the first time around; that should have been fine. I'll wait to merge until tomorrow morning so @lucab has a chance to double-check it, too.

@dm0- dm0- merged commit b61ed8c into coreos:master Sep 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants