Codecov Report
@@            Coverage Diff             @@
##           master    #1182      +/-   ##
==========================================
+ Coverage   90.92%   90.94%   +0.01%
==========================================
  Files         283      284       +1
  Lines       12701    12687      -14
==========================================
- Hits        11549    11538      -11
+ Misses       1152     1149       -3
Flags with carried forward coverage won't be shown.
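As a quick sanity check on the Codecov numbers, coverage here is hits divided by tracked lines. The sketch below assumes there are no partially covered lines (none are listed) and that Codecov applies its default rounding, assumed here to be round-down to two decimal places; the delta is assumed to be taken from the exact ratios before rounding. Under those assumptions it reproduces the reported figures.

```python
from fractions import Fraction
import math

# Figures copied from the Codecov report (base = master, head = #1182).
BASE = {"hits": 11549, "misses": 1152, "lines": 12701}
HEAD = {"hits": 11538, "misses": 1149, "lines": 12687}


def coverage(stats):
    """Exact coverage ratio, assuming no partially covered lines."""
    # Every tracked line is either a hit or a miss in this report.
    assert stats["hits"] + stats["misses"] == stats["lines"]
    return Fraction(stats["hits"], stats["lines"])


def as_percent_round_down(ratio):
    """Percentage truncated to two decimals (assumed Codecov default)."""
    return math.floor(ratio * 10000) / 100


base_pct = as_percent_round_down(coverage(BASE))
head_pct = as_percent_round_down(coverage(HEAD))
# Assumption: the delta comes from the exact ratios, then is rounded down.
delta_pct = as_percent_round_down(coverage(HEAD) - coverage(BASE))

print(f"{base_pct:.2f}% -> {head_pct:.2f}% ({delta_pct:+.2f}%)")
# prints "90.92% -> 90.94% (+0.01%)"
```

Note that rounding each side first and then subtracting would give +0.02%, which is why the delta is computed from the unrounded ratios.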
Have we already removed the hardcoded restriction to run on only one GPU?
Thanks, @ethanwharris, for working on this! LGTM. I have a couple of questions, but they shouldn't block merging this PR.
For my own understanding, what actually fixed the issue?
predictions = predict_step(*args, **kwargs)
if predictions is not None:
    predictions = self.output_transform(predictions)
    predictions = [self.output(prediction) for prediction in predictions]
return predictions
Just curious, when do you think predictions would be None? Should that be counted as a failure? Or should a warning be raised that the OutputTransform and Output instances passed were not used?
I think there are some cases where it can be None, but I'm not sure; it may only be None within our tests. But yes, it could be better to raise an error there.
I'll also check whether predictions can be None, but for now I think we can merge this PR and open a small follow-up PR for the error if needed.
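One way the suggested follow-up could look is an explicit None check before the transforms run. This is a hypothetical sketch: PredictWrapper, the strict flag, and the choice of RuntimeError versus UserWarning are illustrative assumptions, not Flash API. Only the transform-then-output order is taken from the predict_step snippet in this PR.

```python
import warnings
from typing import Any, Callable, List, Optional


class PredictWrapper:
    """Hypothetical stand-in for the wrapper in the diff; the attribute names
    mirror that snippet, but this is not Flash's actual class."""

    def __init__(self, predict_step: Callable[..., Any],
                 output_transform: Callable[[Any], List[Any]],
                 output: Callable[[Any], Any],
                 strict: bool = True):
        self.predict_step = predict_step
        self.output_transform = output_transform
        self.output = output
        # Assumed policy: strict=True raises on None, strict=False only warns.
        self.strict = strict

    def __call__(self, *args: Any, **kwargs: Any) -> Optional[List[Any]]:
        predictions = self.predict_step(*args, **kwargs)
        if predictions is None:
            message = ("predict_step returned None, so the OutputTransform and "
                       "Output instances were not applied.")
            if self.strict:
                raise RuntimeError(message)
            warnings.warn(message, UserWarning)
            return None
        # Same order as the snippet: transform the batch, then each prediction.
        predictions = self.output_transform(predictions)
        return [self.output(prediction) for prediction in predictions]


# Usage: an identity predict_step, list() as the transform, doubling as output.
wrapper = PredictWrapper(lambda batch: batch, output_transform=list,
                         output=lambda p: p * 2)
print(wrapper([1, 2, 3]))  # [2, 4, 6]
```

Making the behaviour a flag keeps the choice between "failure" and "warning" raised in the review explicit rather than baking one in.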
from copy import deepcopy  # needed by __deepcopy__ below

def __getstate__(self):
    """Temporarily override pickle behaviour.

    TODO: New DataPipeline should avoid this being pickled.
    """
    state = self.__dict__.copy()
    state.pop("data")
    if "data_iter" in state:
        state.pop("data_iter")
    return state

def __setstate__(self, newstate):
    """Temporarily override pickle behaviour.

    TODO: New DataPipeline should avoid this being pickled.
    """
    newstate["data"] = None
    self.__dict__.update(newstate)

def __copy__(self):
    """The default copy implementation seems to use ``__getstate__`` and ``__setstate__`` so we override it
    here with a custom implementation to ensure that it includes the data list."""
    cls = self.__class__
    result = cls.__new__(cls)
    result.__dict__.update(self.__dict__)
    return result

def __deepcopy__(self, memo):
    """The default deepcopy implementation seems to use ``__getstate__`` and ``__setstate__`` so we override it
    here with a custom implementation to ensure that it includes the data list."""
    cls = self.__class__
    result = cls.__new__(cls)
    memo[id(self)] = result
    for k, v in self.__dict__.items():
        setattr(result, k, deepcopy(v, memo))
    return result
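The docstrings say the default copy "seems to" go through __getstate__ and __setstate__. For a class without __copy__, Python's copy.copy falls back to the pickle reduction protocol (__reduce_ex__), which captures state via __getstate__ and restores it via __setstate__. The toy classes below are hypothetical and only mirror the overrides being removed; they show a plain copy losing the data and a __copy__ override keeping it.

```python
import copy


class DropsDataOnPickle:
    """Hypothetical class mimicking the removed overrides: __getstate__
    drops ``data`` and __setstate__ restores it as None."""

    def __init__(self, data):
        self.data = data
        self.name = "dataset"

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("data")
        return state

    def __setstate__(self, newstate):
        newstate["data"] = None
        self.__dict__.update(newstate)


class KeepsDataOnCopy(DropsDataOnPickle):
    """Adds a __copy__ override, as the diff did, so copy.copy no longer
    routes through __getstate__/__setstate__ and keeps the data."""

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result


plain = copy.copy(DropsDataOnPickle([1, 2, 3]))
patched = copy.copy(KeepsDataOnCopy([1, 2, 3]))
print(plain.data, patched.data)  # None [1, 2, 3]
```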
@krshrimali This is the main fix. We used to have a bug where the data was accidentally included in the checkpoint, and we patched it by adding these overrides. But DDP spawn needs to pickle the data to send it to each process, so the overrides caused problems there. We have since refactored away the code that got the data included in the checkpoint, so the overrides can now be safely removed 😃
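To illustrate why the overrides clash with DDP spawn: objects handed to spawned processes are serialised with pickle, so anything __getstate__ drops never reaches the workers. The sketch below uses a plain pickle.dumps/pickle.loads round trip as a stand-in for that hand-off (it does not start any processes); the class names and send_to_worker are hypothetical.

```python
import pickle


class WithOverrides:
    """Hypothetical class mimicking the old behaviour: data is stripped on pickle."""

    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("data")
        return state

    def __setstate__(self, newstate):
        newstate["data"] = None
        self.__dict__.update(newstate)


class WithoutOverrides:
    """The same class after the fix: default pickling keeps every attribute."""

    def __init__(self, data):
        self.data = data


def send_to_worker(obj):
    """Stand-in for spawn: serialise in the parent, deserialise in the child."""
    return pickle.loads(pickle.dumps(obj))


print(send_to_worker(WithOverrides([1, 2, 3])).data)     # None
print(send_to_worker(WithoutOverrides([1, 2, 3])).data)  # [1, 2, 3]
```

With the overrides, each worker would receive an object whose data is None, which is the failure mode the fix removes.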
Awesome, thank you so much for the explanation, @ethanwharris!
Co-authored-by: Jirka Borovec <[email protected]>
What does this PR do?
Fixes #1153
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃