Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve default launch device for train #3523

Merged
merged 9 commits into from
Dec 5, 2024

Conversation

Noezor
Copy link
Contributor

@Noezor Noezor commented Nov 12, 2024

👋 folks,

Thanks for the great work on the project.

As reported on #2438, I struggled to start the machine on my m1 but found that it's because the documentation of the getting started didn't specify the device type, although I imagine a few people will try to run a gsplat on their M1.

I first thought I would change the documentation but we have the tools to spot if CUDA is appropriate or not anyways so am instead doing a code contrib.

I had two implementation choices:

  • keeping the default as cuda and then verifying if cuda is available.
  • adding a new None default and then chosing the most appropriate default

The first choice had the advantage of avoiding potential other breaking changes that the second choice would create. However, I checked and the class I'm modifying is only used once in the TrainerConfig so that issue wasn't significant.

Using None has an advantage of disambiguating between the case where the user sets the parameter explicitely or not. The major issue I wanted to avoid using None is if someone has a cuda device, runs with device_type=cuda and goes afk only to find that his training indeed launched but took 20h instead of 5mn because we didn't find the cuda device for some sheneingan.

Let me know what you think and if you have ideas of where I should use this. I sometimes found the default to be cpu or cuda and am not sure of the logic. I feel like anyways the code would be improved by doing "cuda" if torch.cuda.is_available() else "cpu" for the cuda case.

@jb-ye
Copy link
Collaborator

jb-ye commented Nov 21, 2024

I could still prefer option 1: keeping the default as cuda and then verifying if cuda is available (throwing an error and halt if necessary).

@jb-ye jb-ye enabled auto-merge (squash) December 5, 2024 18:26
@jb-ye jb-ye merged commit 555d554 into nerfstudio-project:main Dec 5, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants