Improve default launch device for train #3523
Merged
👋 folks,
Thanks for the great work on the project.
As reported in #2438, I struggled to get training started on my M1, and found that it was because the getting started documentation didn't specify the device type. I imagine quite a few people will try to run gsplat on their M1 as well.
I first thought about changing the documentation, but since we already have the tools to detect whether CUDA is available, I'm contributing a code change instead.
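As a rough illustration of the detection tools mentioned above, here is a minimal sketch that extends the `"cuda" if torch.cuda.is_available() else "cpu"` pattern to Apple's MPS backend. The names `detect_device` and `_torch_probe` are hypothetical, and preferring MPS over CPU is an assumption (not every op supports MPS). It relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()` (recent PyTorch builds), and treats a missing `torch` install as "CPU only" so the sketch runs anywhere.

```python
from typing import Callable, Tuple


def _torch_probe() -> Tuple[bool, bool]:
    """Return (cuda_available, mps_available), or (False, False) without torch."""
    try:
        import torch
    except ImportError:
        return False, False
    cuda = torch.cuda.is_available()
    # torch.backends.mps does not exist in older PyTorch builds, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return cuda, mps


def detect_device(probe: Callable[[], Tuple[bool, bool]] = _torch_probe) -> str:
    """Return "cuda", "mps" or "cpu", preferring CUDA, then MPS (assumed order)."""
    cuda, mps = probe()
    if cuda:
        return "cuda"
    if mps:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(detect_device())
```

Injecting `probe` keeps the selection logic testable on machines without a GPU.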
I had two implementation choices:

1. Keep `cuda` as the default and then verify whether `cuda` is available.
2. Use `None` as the default and then choose the most appropriate device.

The first choice had the advantage of avoiding other potential breaking changes that the second choice would create. However, I checked, and the class I'm modifying is only used once, in the `TrainerConfig`, so that issue wasn't significant.

Using `None` has the advantage of disambiguating between the case where the user sets the parameter explicitly and the case where they don't. The main issue I wanted to avoid by using `None` is someone with a CUDA device running with `device_type=cuda`, going AFK, and coming back to find that training did launch but silently fell back to the CPU and took 20h instead of 5 min, because the CUDA device wasn't detected due to some shenanigans.

Let me know what you think, and whether you have ideas of where else I should use this. I sometimes found the default to be `cpu` or `cuda` and am not sure of the logic. Either way, I feel the code would be improved by using `"cuda" if torch.cuda.is_available() else "cpu"` for the `cuda` case.
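A minimal sketch of the second option (a `None` default), assuming a simplified, hypothetical `MachineConfig` dataclass that only carries the `device_type` field mentioned above; `resolve_device` and `_torch_probe` are illustrative names, not nerfstudio's actual API. An unset device is auto-detected, while an explicit request that cannot be honoured raises an error instead of silently falling back to CPU.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional, Tuple

DeviceType = Literal["cpu", "cuda", "mps"]


def _torch_probe() -> Tuple[bool, bool]:
    """Return (cuda_available, mps_available); (False, False) if torch is missing."""
    try:
        import torch
    except ImportError:
        return False, False
    mps_backend = getattr(torch.backends, "mps", None)
    mps = mps_backend is not None and mps_backend.is_available()
    return torch.cuda.is_available(), bool(mps)


@dataclass
class MachineConfig:
    """Hypothetical, simplified stand-in for the real machine config."""

    # None means "the user did not choose"; any string is an explicit request.
    device_type: Optional[DeviceType] = None


def resolve_device(
    config: MachineConfig,
    probe: Callable[[], Tuple[bool, bool]] = _torch_probe,
) -> str:
    cuda, mps = probe()
    available = {"cpu": True, "cuda": cuda, "mps": mps}

    if config.device_type is None:
        # Implicit default: pick the best available backend; CPU always works.
        return next(d for d in ("cuda", "mps", "cpu") if available[d])

    if not available.get(config.device_type, False):
        # Explicit request we cannot honour: fail loudly rather than
        # quietly training on CPU for hours.
        raise RuntimeError(
            f"device_type={config.device_type!r} was requested but is not available"
        )
    return config.device_type


if __name__ == "__main__":
    print(resolve_device(MachineConfig()))
```

Raising in the explicit case is what addresses the 20h-on-CPU scenario: only the implicit default is ever allowed to fall back.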