Handle queue errors in streaming dataset reader #19167

Merged: awaelchli merged 4 commits into master from data/queue-error on Jan 15, 2024

@awaelchli awaelchli commented Dec 15, 2023

What does this PR do?

Suppresses sporadic errors when querying the queue:

Error 1:

Exception in thread Thread-5:
 Traceback (most recent call last):
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
     self.run()
 Exception in thread Thread-5:
 Traceback (most recent call last):
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
     self.run()
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 136, in run
     self._maybe_delete_chunks()
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 99, in _maybe_delete_chunks
     chunk_index = _get_from_queue(self._to_delete_queue)
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 309, in _get_from_queue
     raise e
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 301, in _get_from_queue
     return queue.get(timeout=0.1)
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/queues.py", line 117, in get
     res = self._recv_bytes()
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
     buf = self._recv_bytes(maxlength)
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
     buf = self._recv(4)
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
     chunk = read(handle, remaining)
 OSError: [Errno 9] Bad file descriptor

Error 2:

Traceback (most recent call last):
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
     self.run()
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 134, in run
     self._maybe_delete_chunks()
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 89, in _maybe_delete_chunks
     chunk_index = _get_from_queue(self._to_delete_queue, timeout=None if reached_pre_download else _DEFAULT_TIMEOUT)
   File "/teamspace/studios/this_studio/lightning/src/lightning/data/streaming/reader.py", line 298, in _get_from_queue
     return queue.get(timeout=timeout)
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/queues.py", line 103, in get
     res = self._recv_bytes()
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
     buf = self._recv_bytes(maxlength)
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
     buf = self._recv(4)
   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
     raise EOFError
 EOFError

Any clue why we get this error in the first place?


📚 Documentation preview 📚: https://pytorch-lightning--19167.org.readthedocs.build/en/19167/

cc @Borda

@github-actions github-actions bot added the data (external) litdata package label Dec 15, 2023
@awaelchli awaelchli changed the title from "Handle OSError: Bad file descriptor in streaming dataset reader" to "Handle queue errors in streaming dataset reader" Dec 18, 2023
@awaelchli awaelchli marked this pull request as ready for review December 18, 2023 15:46
@awaelchli awaelchli requested a review from tchaton as a code owner December 18, 2023 15:46
@awaelchli awaelchli added the bug Something isn't working label Dec 18, 2023
@awaelchli awaelchli added this to the 2.1.x milestone Dec 18, 2023

github-actions bot commented Dec 18, 2023

⚡ Required checks status: All passing 🟢

Groups summary

🟢 lightning_data: CPU workflow
| Check ID | Status |
| --- | --- |
| data-cpu (macOS-11, lightning, 3.10, 2.1) | success |
| data-cpu (ubuntu-20.04, lightning, 3.10, 2.1) | success |
| data-cpu (windows-2022, lightning, 3.10, 2.1) | success |

These checks are required after the changes to src/lightning/data/streaming/reader.py.

🟢 mypy
| Check ID | Status |
| --- | --- |
| mypy | success |

These checks are required after the changes to src/lightning/data/streaming/reader.py.

🟢 install
| Check ID | Status |
| --- | --- |
| install-pkg (ubuntu-22.04, app, 3.8) | success |
| install-pkg (ubuntu-22.04, app, 3.11) | success |
| install-pkg (ubuntu-22.04, fabric, 3.8) | success |
| install-pkg (ubuntu-22.04, fabric, 3.11) | success |
| install-pkg (ubuntu-22.04, pytorch, 3.8) | success |
| install-pkg (ubuntu-22.04, pytorch, 3.11) | success |
| install-pkg (ubuntu-22.04, lightning, 3.8) | success |
| install-pkg (ubuntu-22.04, lightning, 3.11) | success |
| install-pkg (ubuntu-22.04, notset, 3.8) | success |
| install-pkg (ubuntu-22.04, notset, 3.11) | success |
| install-pkg (macOS-12, app, 3.8) | success |
| install-pkg (macOS-12, app, 3.11) | success |
| install-pkg (macOS-12, fabric, 3.8) | success |
| install-pkg (macOS-12, fabric, 3.11) | success |
| install-pkg (macOS-12, pytorch, 3.8) | success |
| install-pkg (macOS-12, pytorch, 3.11) | success |
| install-pkg (macOS-12, lightning, 3.8) | success |
| install-pkg (macOS-12, lightning, 3.11) | success |
| install-pkg (macOS-12, notset, 3.8) | success |
| install-pkg (macOS-12, notset, 3.11) | success |
| install-pkg (windows-2022, app, 3.8) | success |
| install-pkg (windows-2022, app, 3.11) | success |
| install-pkg (windows-2022, fabric, 3.8) | success |
| install-pkg (windows-2022, fabric, 3.11) | success |
| install-pkg (windows-2022, pytorch, 3.8) | success |
| install-pkg (windows-2022, pytorch, 3.11) | success |
| install-pkg (windows-2022, lightning, 3.8) | success |
| install-pkg (windows-2022, lightning, 3.11) | success |
| install-pkg (windows-2022, notset, 3.8) | success |
| install-pkg (windows-2022, notset, 3.11) | success |

These checks are required after the changes to src/lightning/data/streaming/reader.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

@tchaton tchaton left a comment

LGTM!

Review thread on src/lightning/data/streaming/reader.py (outdated, resolved)
@mergify mergify bot added the ready PRs ready to be merged label Jan 15, 2024
@awaelchli awaelchli merged commit 628ee0c into master Jan 15, 2024
55 checks passed
@awaelchli awaelchli deleted the data/queue-error branch January 15, 2024 17:04
@tchaton tchaton mentioned this pull request Jan 16, 2024
7 tasks