libvirt: Resize specifically to 16G #2652

Closed
wants to merge 1 commit into from

cgwalters
Member

Today for IaaS clouds, we default to an instance type, which
then in turn usually provides a default size. RHCOS resizes on
boot to that size, distinct from its "default" 16G size.

However, libvirt installs were inheriting our default size. We'd
like to shrink it because we plan to land encryption:
openshift/enhancements#15
And the less data we need to encrypt, the better.

(In the future I'd like to make this configurable with a variable,
but let's just prepare for the encryption work now)

@openshift-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Nov 11, 2019
@cgwalters
Member Author

/label libvirt

@openshift-ci-robot
Contributor

@cgwalters: The label(s) /label libvirt cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga

In response to this:

/label libvirt

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Member Author

/label platform/libvirt

@openshift-ci-robot
Contributor

@cgwalters: The label(s) /label platform/libvirt cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga

In response to this:

/label platform/libvirt

@abhinavdahiya
Contributor

However, libvirt installs were inheriting our default size.

I don't think we are inheriting that; we explicitly required the RHCOS to be the size by default. Need to find the previous conversation with @crawford

And the less data we need to encrypt, the better.

hmm, how does that affect this, maybe some context?

@hardys

hardys commented Nov 14, 2019

It seems we need the same fix for platform/baremetal - in recent IPI baremetal deploys we're seeing errors like this:

Nov 14 12:55:37 localhost startironic.sh[25249]: Trying to pull registry.svc.ci.openshift.org/ocp/4.3-2019-11-14-095446@sha256:3083ddf86e3132209d5237ac47ca410f4ecaedc0510c699032047cc4b1c2312a...
Nov 14 12:55:38 localhost startironic.sh[25249]: Getting image source signatures
Nov 14 12:55:40 localhost startironic.sh[25249]: Copying blob sha256:641d7cc5cbc48a13c68806cf25d5bcf76ea2157c3181e1db4f5d0edae34954ac
Nov 14 12:55:40 localhost startironic.sh[25249]: Copying blob sha256:c3c32150a70ab874599ceb62913bb97155a1356f487ede252d78870dd84008d6
Nov 14 12:55:40 localhost startironic.sh[25249]: Copying blob sha256:c65691897a4d140d441e2024ce086de996b1c4620832b90c973db81329577274
Nov 14 12:55:41 localhost startironic.sh[25249]: Copying blob sha256:18803438f00ccb3072bb493ce69deb377f62a020454db0b806f7bff8d4f45e33
Nov 14 12:55:42 localhost startironic.sh[25249]:   write /var/tmp/storage989090957/3: no space left on device
Nov 14 12:55:42 localhost startironic.sh[25249]: Error: unable to pull registry.svc.ci.openshift.org/ocp/4.3-2019-11-14-095446@sha256:3083ddf86e3132209d5237ac47ca410f4ecaedc0510c699032047cc4b1c2312a: unable to pull image: Error writing blob: error storing blob to file "/var/tmp/storage989090957/3": write /var/tmp/storage989090957/3: no space left on device

We can see that /sysroot is only 2.5G and it's full:

[core@localhost ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        2.9G     0  2.9G   0% /dev
tmpfs           3.0G  168K  3.0G   1% /dev/shm
tmpfs           3.0G  1.3M  3.0G   1% /run
tmpfs           3.0G     0  3.0G   0% /sys/fs/cgroup
/dev/dm-0       2.5G  2.5G   15M 100% /sysroot
/dev/vda1       364M   83M  259M  25% /boot
/dev/vda2       127M  3.0M  124M   3% /boot/efi
tmpfs           596M  4.0K  596M   1% /run/user/1000
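
As a rough illustration of the check being done by eye on the `df -h` output above, here is a minimal Python sketch that scans such output for filesystems at or above a usage threshold. The function name `full_filesystems`, the 95% default threshold, and the trimmed `DF_SAMPLE` are assumptions made for this example; none of them come from the installer or RHCOS.

```python
def full_filesystems(df_output, threshold_percent=95):
    """Return (mount_point, use_percent) for filesystems at or above the threshold.

    Hypothetical helper for illustration only; it expects the column layout
    of `df -h`: Filesystem, Size, Used, Avail, Use%, Mounted on.
    """
    full = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        percent = fields[4][:-1]
        if not percent.isdigit():
            continue  # skips the header row ("Use%") and rows without a number
        if int(percent) >= threshold_percent:
            # Join the remaining fields in case the mount point contains spaces.
            full.append((" ".join(fields[5:]), int(percent)))
    return full


# A trimmed copy of the output pasted in this comment.
DF_SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        2.9G     0  2.9G   0% /dev
/dev/dm-0       2.5G  2.5G   15M 100% /sysroot
/dev/vda1       364M   83M  259M  25% /boot
"""

print(full_filesystems(DF_SAMPLE))  # prints [('/sysroot', 100)]
```

Run against the sample, only `/sysroot` is reported, which matches the "no space left on device" error in the log.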

@cgwalters would you like to update this to align https://github.com/openshift/installer/blob/master/data/data/baremetal/bootstrap/main.tf with the changes here, or should we push a separate PR that copies the same fixes?

@stbenjam
Member

I opened #2673 for baremetal but happy to close if you want to incorporate those changes here.

@cgwalters
Member Author

Hmm. Perhaps it'll be simpler for RHCOS to switch to having a smaller partition internally that's always resized rather than requiring this change.

@cgwalters
Member Author

Took a different tack on this in coreos/coreos-assembler#917

@darkmuggle darkmuggle left a comment

LGTM

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters, darkmuggle
To complete the pull request process, please assign praveenkumar
You can assign the PR to them by writing /assign @praveenkumar in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hardys

hardys commented Nov 15, 2019

This is a blocker for IPI baremetal, so it'd be good to get either coreos/coreos-assembler#917 in, or this PR and #2673.

@jlebon
Member

jlebon commented Nov 15, 2019

I don't think we are inheriting that; we explicitly required the RHCOS to be the size by default. Need to find the previous conversation with @crawford

Hmm, wouldn't it make more sense for the disk size requirement to live in the installer (where I presume the code to select the right disk size on various clouds also lives)?

@cgwalters
Member Author

Hmm, wouldn't it make more sense for the disk size requirement to live in the installer (where I presume the code to select the right disk size on various clouds also lives)?

Right, I agree with this consistency argument for merging this. That said I think we should also change the RHCOS builds with coreos/coreos-assembler#924 since that makes things more consistent on that side too.

@abhinavdahiya
Contributor

I don't think we are inheriting that; we explicitly required the RHCOS to be the size by default. Need to find the previous conversation with @crawford

Hmm, wouldn't it make more sense for the disk size requirement to live in the installer (where I presume the code to select the right disk size on various clouds also lives)?

if the default RHCOS disk size needs to be configured by anybody trying to use it.. means it's a wrong default. So if the installer needs to resize for libvirt to what is supposed to be the bare minimum we can do, I think the resize shouldn't live here.

@cgwalters
Member Author

if the default RHCOS disk size needs to be configured by anybody trying to use it.. means it's a wrong default.

We definitely support customers picking instance and disk sizes on IaaS (GCP/AWS/OpenStack etc.). Ultimately, I think we'll want this flexibility for libvirt too (exposed via env variables or install configs maybe).
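
To make the "exposed via env variables" idea concrete, here is a minimal Python sketch of how a deployment tool might resolve a root disk size, falling back to the 16G default discussed in this thread. The variable name `EXAMPLE_LIBVIRT_ROOT_DISK_GIB` and the function `resolve_root_disk_bytes` are hypothetical; neither exists in the installer, and the installer itself is not written in Python.

```python
import os

# The "default" RHCOS size mentioned in this PR, in GiB.
DEFAULT_ROOT_DISK_GIB = 16

# Hypothetical environment variable, invented for this sketch.
ENV_VAR = "EXAMPLE_LIBVIRT_ROOT_DISK_GIB"


def resolve_root_disk_bytes(environ=None):
    """Return the root disk size in bytes, honouring an optional override.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        gib = DEFAULT_ROOT_DISK_GIB
    else:
        gib = int(raw)  # raises ValueError for non-numeric input
        if gib <= 0:
            raise ValueError(f"{ENV_VAR} must be a positive number of GiB, got {raw!r}")
    return gib * 1024**3


print(resolve_root_disk_bytes({}))               # default: 16 GiB in bytes
print(resolve_root_disk_bytes({ENV_VAR: "32"}))  # override: 32 GiB in bytes
```

An install-config field could feed the same function in place of the environment lookup; which of the two is preferable is exactly what this thread leaves open.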

@cgwalters
Member Author

coreos/coreos-assembler#924 merged which lessens the importance of this at least.

@cgwalters
Member Author

OK, we have another use case for this, which is allowing CodeReady Containers to expand the RHCOS root disk before doing an install. That one needs an explicit configuration option for it in some form, either an environment variable or an installconfig option.

@praveenkumar
Contributor

/test e2e-libvirt

@abhinavdahiya
Contributor

OK, we have another use case for this, which is allowing CodeReady Containers to expand the RHCOS root disk before doing an install. That one needs an explicit configuration option for it in some form, either an environment variable or an installconfig option.

Not sure why? Can somebody provide context/detail.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 27, 2019

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-openstack | 5e46b88 | link | /test e2e-openstack |
| ci/prow/e2e-libvirt | 5e46b88 | link | /test e2e-libvirt |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@jlebon
Member

jlebon commented Nov 27, 2019

Not sure why? Can somebody provide context/detail.

Likely because in CRC everything is kept on a single node.

if the default RHCOS disk size needs to be configured by anybody trying to use it.. means it's a wrong default.

We can pick a number that satisfies the most common use cases when building the image. But IMO it's completely reasonable to want more control than that at deployment time depending on your circumstances. How many nodes will you have? How big are the images you expect to be using? Do you need to carve out more partitions for compliance? etc...

@cfergeau
Contributor

Not sure why? Can somebody provide context/detail.

Likely because in CRC everything is kept on a single node.

Yes, CRC is using a single node, so the users' workloads are going to be installed on that disk, and the default 16GB size has been too small in some cases.

@abhinavdahiya
Contributor

Closing since this has been open for a long time. Please feel free to reopen.

/close

@openshift-ci-robot
Contributor

@abhinavdahiya: Closed this PR.

In response to this:

Closing since this has been open for a long time. Please feel free to reopen.

/close

Labels
size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
9 participants