[autoscaler][aws] Fix example minimal #27075

Merged: 1 commit, Jul 27, 2022


wuisawesome
Contributor

Why are these changes needed?

The DLAMI (AWS Deep Learning AMI) changed underneath us and broke the example for two reasons:

  1. The AMI's snapshot size increased to 140 GB, which is larger than our hardcoded max EBS volume size of 100 GB.
  2. The AMI dropped support for Python 3.7 and now ships only Python 3.8.

The short-term solutions are simple:

  1. Allocate a bigger EBS volume.
  2. Use the TensorFlow Python 3.8 environment.
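
The first fix comes down to a sizing constraint: EC2 will not create a root EBS volume smaller than the snapshot it is built from. Below is a minimal Python sketch, not the code from this PR, that builds an EC2-style `BlockDeviceMappings` entry and checks the requested size against the snapshot size before launch. The helper names `ebs_block_device_mapping` and `check_volume_fits_snapshot` are hypothetical, and the default device name `/dev/sda1` is an assumption (the correct root device name depends on the AMI).

```python
# Hypothetical sketch: helper names are illustrative, not Ray or boto3 APIs.
# The dict layout mirrors the EC2 RunInstances "BlockDeviceMappings" entries.


def ebs_block_device_mapping(volume_size_gb: int, device_name: str = "/dev/sda1") -> dict:
    """Build one block device mapping with an EBS volume of the given size.

    The default device name is an assumption; check the AMI's root device name.
    """
    return {
        "DeviceName": device_name,
        "Ebs": {"VolumeSize": volume_size_gb},
    }


def check_volume_fits_snapshot(volume_size_gb: int, snapshot_size_gb: int) -> None:
    """Raise ValueError if the EBS volume is smaller than the AMI's snapshot.

    This is the failure described above: a 140 GB snapshot cannot be
    restored onto a 100 GB volume.
    """
    if volume_size_gb < snapshot_size_gb:
        raise ValueError(
            f"EBS volume ({volume_size_gb} GB) is smaller than the AMI snapshot "
            f"({snapshot_size_gb} GB)"
        )


if __name__ == "__main__":
    # The old hardcoded 100 GB volume is rejected for a 140 GB snapshot.
    try:
        check_volume_fits_snapshot(100, 140)
    except ValueError as err:
        print(f"old size rejected: {err}")

    # Any volume at least as large as the snapshot passes the check.
    check_volume_fits_snapshot(140, 140)
    print(ebs_block_device_mapping(140))
```

Failing fast on the size comparison surfaces the problem before an instance launch is attempted, instead of as a launch error from EC2.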

Related issue number

Closes #26368

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@wuisawesome
Contributor Author

Manual testing shows that it works (screenshot attached).

@DmitriGekhtman
Contributor

:( :( :( :(
cc @cadedaniel

@wuisawesome
Contributor Author

The macOS agent and library test failures look unrelated; merging.

wuisawesome merged commit 3322558 into ray-project:master on Jul 27, 2022
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022

Co-authored-by: Alex
Signed-off-by: Rohan138
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022

Co-authored-by: Alex
Signed-off-by: Stefan van der Kleij
Linked issue: [Autoscaler][Docs] AWS quickstart example fails to start
2 participants