Intermittent KeyError when handling a finished future in the task worker #11350
Comments
I would like to be assigned to this issue.
I am also interested.
Hi @laynestephens and @Nyu10, thanks for volunteering. I will assign this to @laynestephens since they were first. @Nyu10, in the meantime I have assigned you another issue you asked for.
Hi again! I am working with a partner, @adviti-mishra, on this issue, and I was wondering if they could be assigned as well.
Hi @laynestephens, yes, sure.
@laynestephens We will need @adviti-mishra to comment on this issue so that I can assign them.
Hello @MisRob, thank you so much! I'm with @laynestephens and would love to be assigned as well.
The KeyError comes from deleting a key called future from self.job_future_mapping and a key called job.job_id from self.future_job_mapping. The first fix that came to mind was checking whether the keys exist in the dictionaries before deleting them. However, we suspect the root cause is a race condition: this function deletes from the dictionaries in one thread while another function modifies them in another thread (see the sketch below).
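A minimal sketch of a lock-guarded cleanup along those lines. The mapping attribute names are taken from the comment above, but the lock and method names are hypothetical, not the project's actual code:

```python
import threading

class Worker:
    def __init__(self):
        self.job_future_mapping = {}   # future -> job
        self.future_job_mapping = {}   # job_id -> future
        self._mapping_lock = threading.Lock()

    def handle_finished_future(self, future):
        # Guard the paired deletions with a lock so no other thread can
        # mutate the mappings mid-cleanup, and use pop() with a default
        # so an already-removed entry cannot raise KeyError.
        with self._mapping_lock:
            job = self.job_future_mapping.pop(future, None)
            if job is not None:
                self.future_job_mapping.pop(job.job_id, None)
```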
@adviti-mishra It seems you've already discussed this on Slack. Let us know there if you need anything else. Thank you.
Fixed intermittent KeyError when handling a finished future in the task worker #11350
Fixed in #11591
Observed behavior
Occasionally, when a task is completed, deleting its future from the futures map raises an unhandled KeyError.
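A contrived reproduction of the pattern, using hypothetical names rather than the project's actual code: two threads race to delete the same mapping entry, and the loser hits the KeyError.

```python
import threading

future_job_mapping = {"job-1": "future-1"}

def cleanup(job_id):
    # An unguarded `del` raises KeyError when another thread has already
    # removed the entry.
    del future_job_mapping[job_id]

threads = [threading.Thread(target=cleanup, args=("job-1",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one thread succeeds; the other prints a KeyError traceback.
```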
Errors and logs
Expected behavior
It would be good to understand why these KeyErrors are happening, but ultimately it is enough to either prevent them or handle them, since the deletion is unneeded once the entry is already gone (one way to handle them is sketched below).
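One way to realize the "handle them" option, a sketch with a hypothetical helper rather than the fix that actually landed:

```python
def discard_entry(mapping, key):
    """Delete key from mapping, tolerating a concurrent removal."""
    try:
        del mapping[key]
    except KeyError:
        # The entry was already removed elsewhere; since the deletion is
        # unneeded in that case, it is safe to ignore.
        pass

jobs = {"job-1": "future-1"}
discard_entry(jobs, "job-1")
discard_entry(jobs, "job-1")  # second call is a no-op instead of an error
```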
User-facing consequences
Erroneous error logs that suggest something has gone wrong when it hasn't
Steps to reproduce
Running the task runner tests in multiprocessing mode should be sufficient.
Context
Observed in the macOS tests on GitHub Actions (but also occasionally observed locally with threaded task runners).