fix: Second attempt at fixing trace propagation in Celery 4.2+ #831
Conversation
-    kwarg_headers = kwargs.setdefault("headers", {})
+    # Note: kwargs can contain headers=None, so no setdefault!
+    # Unsure which backend though.
+    kwarg_headers = kwargs.get("headers") or {}
     kwarg_headers.update(headers)
This line is currently crashing all the time (inside capture_internal_exceptions) because the old code could not deal with kwargs == {"headers": None}, only with kwargs == {}.
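A minimal sketch of the difference (placeholder header values, not the PR's exact code): with setdefault, a pre-existing headers=None is returned as-is and the following update crashes, while the get-or-{} form tolerates both shapes of kwargs.

kwargs = {"headers": None}

# old approach: the key already exists, so setdefault hands back the stored None
kwarg_headers = kwargs.setdefault("headers", {})
# kwarg_headers.update({"sentry-trace": "..."})  # AttributeError: 'NoneType' object has no attribute 'update'

# new approach: falls back to a fresh dict for both {} and {"headers": None}
kwarg_headers = kwargs.get("headers") or {}
kwarg_headers.update({"sentry-trace": "..."})  # placeholder header value
kwargs["headers"] = kwarg_headers  # written back explicitly in this sketch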
    kwarg_headers.update(headers)

    # https://github.com/celery/celery/issues/4875
    #
    # Need to setdefault the inner headers too since other
    # tracing tools (dd-trace-py) also employ this exact
    # workaround and we don't want to break them.
    #
It turns out this is perfectly reproducible. The bug we're working around lives in celery.app.amqp, but it seems that module is also used with the Redis backend; to be honest, I no longer understand how Celery separates its concerns.
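For context, a hedged illustration of the double-"headers" workaround referenced in the diff above (illustrative function and variable names, not the SDK's exact code):

def add_trace_headers(kwargs, trace_headers):
    # outer headers on the apply_async call; may be missing or None
    kwarg_headers = kwargs.get("headers") or {}
    kwarg_headers.update(trace_headers)

    # Celery's amqp layer can end up exposing a nested "headers" dict to the
    # worker (celery/celery#4875); other tracing tools such as dd-trace-py
    # write their keys there as well, hence setdefault instead of overwriting
    kwarg_headers.setdefault("headers", {}).update(trace_headers)

    kwargs["headers"] = kwarg_headers
    return kwargs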
@@ -42,6 +42,7 @@ def inner(propagate_traces=True, backend="always_eager", **kwargs):

     # this backend requires capture_events_forksafe
     celery.conf.worker_max_tasks_per_child = 1
+    celery.conf.worker_concurrency = 1
This speeds up the tests; otherwise Celery forks 20 times.
Speeding up the tests sounds good, but won't this affect which bugs and code paths the tests surface?
We haven't had any bugs related to this so far, and my gut feeling says no. There is no direct communication between the forked processes, so I'd say it's unlikely, and the tests never send high event volumes anyway.
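For reference (a hedged gloss, not taken from the PR itself), what the two settings from the hunk above do:

# each prefork child is replaced after executing a single task
celery.conf.worker_max_tasks_per_child = 1

# the prefork pool spawns a single child; the default is one per CPU core,
# which is presumably where the "forks 20 times" came from
celery.conf.worker_concurrency = 1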
@request.addfinalizer
def _():
    assert not in_process_events

capture_events()
This assertion needs to be removed because we now actually do get events in the same process.
We need to call capture_events, otherwise our transport will raise an error (this fixture setup is overdue for a refactor...).
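A rough sketch of the fixture tail after this change (assumed shape; only the capture_events() call and the dropped assertion come from the diff and comment above):

@request.addfinalizer
def _():
    # the old `assert not in_process_events` is intentionally gone:
    # with this setup, events can legitimately arrive in the test process
    pass

# capture_events() wires up the test transport; without it the transport
# raises as soon as an event is sent
capture_events()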
It turns out the bug is perfectly reproducible with the Redis backend; I just messed up the test.
apply_async was basically patched at the wrong point in time, and as such the trace propagation didn't work out (a rough sketch of the patch shape follows below).
Sorry, y'all, for dropping this on you with so little context; we can go through it tomorrow.
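A hedged sketch of the general patching idea being described, with illustrative names; the real integration derives the headers from the current scope and handles more edge cases, so treat this as a shape, not the SDK's code:

from celery.app.task import Task

old_apply_async = Task.apply_async

def sentry_apply_async(self, args=None, kwargs=None, **options):
    # placeholder for whatever trace headers the SDK would propagate
    trace_headers = {"sentry-trace": "<trace-id>-<span-id>-<sampled>"}

    # tolerate both a missing "headers" option and headers=None
    kwarg_headers = options.get("headers") or {}
    kwarg_headers.update(trace_headers)

    # same nested-"headers" workaround as in the diff above
    kwarg_headers.setdefault("headers", {}).update(trace_headers)
    options["headers"] = kwarg_headers

    return old_apply_async(self, args, kwargs, **options)

# the "patched at the wrong point in time" remark above refers to this
# assignment happening too late relative to the test setup
Task.apply_async = sentry_apply_async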
Follow-up to #824 #825