-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataLoader2] Deep copy DataPipe during initialization #786
Conversation
[ghstack-poisoned]
ghstack-source-id: 43fd0384f875c9c81a994de831951b114f8499c2 Pull Request resolved: #786
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after #746 (the other PR will have to rebase). Differential Revision: [D39741743](https://our.internmc.facebook.com/intern/diff/D39741743) [ghstack-poisoned]
ghstack-source-id: 2b8ef5829badd33c82870d4e1753021d4cbb55a6 Pull Request resolved: #786
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after #746 (the other PR will have to rebase). Differential Revision: [D39741743](https://our.internmc.facebook.com/intern/diff/D39741743) [ghstack-poisoned]
ghstack-source-id: 62bcadc0aae4e352f39839566b6a3a8bf7f9490e Pull Request resolved: #786
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Pull Request resolved: pytorch#786 Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after pytorch#746 (the other PR will have to rebase). Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39741743 Pulled By: ejguan fbshipit-source-id: f1cdaca054e10bcb3672857b933732178b370535
Summary: Pull Request resolved: #786 Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after #746 (the other PR will have to rebase). Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39741743 Pulled By: ejguan fbshipit-source-id: f1cdaca054e10bcb3672857b933732178b370535
Summary: Pull Request resolved: pytorch#786 Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after pytorch#746 (the other PR will have to rebase). Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39741743 Pulled By: ejguan fbshipit-source-id: f1cdaca054e10bcb3672857b933732178b370535
@NivekT Do you think we need to add BC breaking note to this PR? |
Actually yes, I think we should, even though I don't think users would intentionally rely on the state of DataPipes being shared among DataLoaders. |
Can you add a brief example in the bc breaking note that the same DataPipe used by multiple DLv2? |
Will do by the afternoon. |
Thank you so much! |
@ejguan Updated this PR's description. |
Stack from ghstack:
Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages.
This can be merged before or after #746 (the other PR will have to rebase).
Differential Revision: D39741743
BC Breaking Note:
Previously, if a DataPipe is being passed to multiple DataLoaders, the DataPipe's state can be altered by any of those DataLoaders. In some cases, that may raise exception due to the single iterator constraint; in other cases, some behaviors can be changed due to the adapters (e.g. shuffling) of another DataLoader.
Consider the code snippet below:
Prior to this change, since the DataPipe objects are identical for each DataLoader2, it would've violated the single iterator constraint and raised an exception.
After this change, since the DataPipe objects are deep copied within the
DataLoader2
s, they are no longer the same object within each. You can iterate through them simultaneously without issue: