Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flytekit] Polish - Resources #6142

Open
2 tasks done
wild-endeavor opened this issue Jan 5, 2025 · 5 comments
Open
2 tasks done

[flytekit] Polish - Resources #6142

wild-endeavor opened this issue Jan 5, 2025 · 5 comments
Assignees
Labels
backlogged For internal use. Reserved for contributor team workflow.

Comments

@wild-endeavor
Copy link
Contributor

wild-endeavor commented Jan 5, 2025

Resources

This is a series of tickets to improve the flytekit authoring experience. If any changes are not possible to make in a backwards-compatible way, split it out into a separate ticket.

Rename argument

The argument ephemeral_storage should be renamed to disk. disk is succinct and gets the point across. Make it super clear if not already in user facing notes that the disk is ephemeral.

SHM

Should we add shm as an option to Resources? (i think the same issue that kept us from putting accelerator - the fact that resources are specified as both requests and limits - may be relevant here. Even if k8s doesn't allow it today, does it make sense to for an shm to have two different numbers here? Discuss more and probably cut a separate ticket for this (cc @granthamtaylor)

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@wild-endeavor wild-endeavor added documentation Improvements or additions to documentation untriaged This issues has not yet been looked at by the Maintainers labels Jan 5, 2025
@wild-endeavor wild-endeavor added backlogged For internal use. Reserved for contributor team workflow. and removed documentation Improvements or additions to documentation labels Jan 5, 2025
@thomasjpfan
Copy link
Member

I agree with changing ephemeral_storage to disk.

When adding shared memory with a pod template, it think it just sets the limit and it is not a request. I think most users just leaves it as default, which sets the limit as the total number of ram. Reference

A size limit can be specified for the default medium, which limits the capacity of the emptyDir volume. The storage is allocated from node ephemeral storage. If that is filled up from another source (for example, log files or image overlays), the emptyDir may run out of capacity before this limit. If no size is specified, memory backed volumes are sized to node allocatable memory.

This is what we do with the pytorch plugin: https://github.com/flyteorg/flytekit/blob/4e93e36843b8f13f06eb088e1a46232ad1fb225d/plugins/flytekit-kf-pytorch/flytekitplugins/kfpytorch/pod_template.py#L6. So we can put shm into Resource, but it can only be used with limit.

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Jan 16, 2025
@wild-endeavor wild-endeavor changed the title [Draft] [flytekit] Polish - Resources [flytekit] Polish - Resources Jan 17, 2025
@JiangJiaWei1103
Copy link
Contributor

#take

@JiangJiaWei1103
Copy link
Contributor

JiangJiaWei1103 commented Jan 20, 2025

Hi @wild-endeavor,

Would it be a good idea to support both ephemeral_storage and disk for now, prioritize disk as the preferred argument, and warn users about the future deprecation of ephemeral_storage? I think this approach could help maintain backward compatibility while encouraging users to a smooth transition.
The question above is clarified by Thomas' response here. Thanks!

Regarding shm, just to confirm, we’re not planning to address it in this ticket—correct?

cc @thomasjpfan @granthamtaylor

Thanks so much!

@thomasjpfan
Copy link
Member

For a first PR, I would just add disk to Resources for now.


For shm, I think we should only allow it as a limit and error if shm is "requested".

limits=Resources(..., shm=Resources.SharedMemory.Default)

Where Resources.SharedMemory.Default means "memory backed volumes are sized to node allocatable memory." BTW, we likely need a backend change here to create the proper pod spec that sets shared memory.

@JiangJiaWei1103
Copy link
Contributor

JiangJiaWei1103 commented Jan 25, 2025

Sounds good! I'll create a PR focusing on adding a new option disk now.

For shm-related features, I'm glad to help bring it to life in the future. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlogged For internal use. Reserved for contributor team workflow.
Projects
Status: Backlog
Development

No branches or pull requests

4 participants