Long-running tasks are duplicated multiple times in multi-cluster environment when no timeout is set #307
Comments
I'm also seeing long-running tasks being picked up more than once. It's one cluster of 6 workers using the ORM, and it seems to pick up the same job multiple times. I've worked around it by adding a flag when the job is picked up (sketch below). Not ideal, obviously. Is anyone else seeing this? @jonathan-golorry what backend are you using?
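A minimal sketch of that kind of guard flag, assuming a cache backend shared by all workers (e.g. Redis or Memcached) so every worker sees the same key; the key name and task body are made up for illustration, not from this thread:

```python
# Hypothetical "picked up" guard, assuming a shared Django cache backend.
from django.core.cache import cache

def send_report():
    # cache.add() stores the key only if it is absent, so only the first
    # worker to reach this line claims the run; the rest bail out.
    if not cache.add("send_report_lock", "running", timeout=6 * 3600):
        return  # another worker already picked this run up
    try:
        ...  # the actual long-running work goes here
    finally:
        cache.delete("send_report_lock")
```

Note this only helps if the cache is shared across clusters; a per-process cache like LocMemCache would let every worker acquire its own "lock".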
I'm using the Django ORM.
I am also seeing this issue. Using the Django ORM. django_q version = 1.0.1. settings.py:
I did a bit of poking on this one. Looks like the second worker starts the duplicate task at exactly 120 sec for me, which happens to be my retry value. Is it possible that the retry mechanism is duplicating the running task?
I reduced the retry value and the duplicate appeared correspondingly sooner. I tried fixing the issue by increasing the retry setting instead.
This behavior is documented at https://django-q.readthedocs.io/en/latest/brokers.html
Tried this again, and it seems to magically be working. @edmenendez @jonathan-golorry Have you guys checked to see if the retry setting could explain what you're seeing?
Thanks for checking this. Looks like the default retry is only 60 seconds. Can retry safely be set higher than the timeout?
I'm setting timeout to 900, but not setting retry. And I think retry defaults to 60 seconds. So that's probably the issue. A warning to the console about that would be nice :-)
This issue has been reported many times in Django-Q's issue tracker: Koed00#183, Koed00#180, Koed00#307. All of these issues have been closed, and the responses note that retry should be set higher than the timeout and the duration of any task. A concrete sketch of that relation follows below.
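To make that rule concrete, here is an illustrative Q_CLUSTER configuration satisfying "longest task < timeout < retry"; the project name and specific numbers are assumptions, not from this thread:

```python
# settings.py -- illustrative values only; the rule being shown is
# longest task duration < timeout < retry.
Q_CLUSTER = {
    "name": "myproject",  # hypothetical project name
    "workers": 4,
    "orm": "default",     # Django ORM broker, as used in this thread
    "timeout": 900,       # kill a task after 15 minutes
    "retry": 1200,        # re-queue an unacknowledged task only after 20 minutes
}
```

With the default retry of 60 and a timeout of 900 (or no timeout at all), the broker re-queues a task that is still legitimately running, which matches the duplication described above.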
Running version 0.9.4 in a 2-cluster environment.
I wrote a task to email myself and set it to run once a day at 10am. The task contained a very poorly-scaling query, so it would take a few hours to run. I didn't have a timeout set, and I'd get anywhere from 5 to 20 emails a day. Here's a typical set of emails:
5:40am
5:57am
9:34am
9:51am
7:38pm
7:40pm
12:05am (next day)
12:05am (next day)
The emails arriving between 12am and 10am are presumably from the previous day's tasks, but they are date-stamped with when the task started (the timestamp is generated inside the task).
The admin page does not show duplicate tasks, only one task per day. This day's task claims to have stopped at 7:38pm.
I found it interesting that I was running 2 clusters and the emails seemed to come in pairs. I saw some old posts about tasks being duplicated in multi-cluster environments, but those seemed to describe bugs that were fixed sometime before 0.9.4.
I've since fixed the slow query, and so far that seems to have stopped the duplicates.
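For reference, a daily 10am schedule like the one described might be created along these lines; the dotted task path and schedule name are assumptions, not from the issue, and the timeout/retry themselves still come from Q_CLUSTER as sketched above:

```python
# Hypothetical reconstruction of the daily email schedule.
from django.utils import timezone
from django_q.tasks import schedule
from django_q.models import Schedule

schedule(
    "myapp.tasks.send_daily_email",  # hypothetical dotted task path
    name="daily-email-report",
    schedule_type=Schedule.DAILY,
    next_run=timezone.now().replace(hour=10, minute=0, second=0, microsecond=0),
)
```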