OrmQ broker MySQL connection errors on ORM.delete(task_id) #124
Hey. I just moved to Berlin two days ago and I've got internet again, so you caught me at a good time.
Learned a bit more. My naive approach above (close the connection before doing anything in the […] The problem is that […] So instead I am now limiting my […] Still have more testing to do to see if this really does resolve the problem with stale/shared connections.
Learning more and I have now partially verified my solution.
Putting this all together: If […] So we either need to set a […] I'd prefer to let […] There are some undocumented ways to test to see if you're in an atomic block, but the recommended way that I found was: […] So now I'm doing: […]
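A minimal sketch of a guard along the lines described above, not necessarily the code used here. It assumes the atomic-block check is Django's `transaction.get_autocommit()` (normally False inside `atomic()`, though it is also False if autocommit was turned off by hand) and that stale connections are dropped with `django.db.close_old_connections()`. The helper name `refresh_connection_if_safe` and the injected callables are hypothetical, used so the sketch runs without a Django project:

```python
import logging

logger = logging.getLogger("broker_sketch")  # hypothetical logger name


def refresh_connection_if_safe(get_autocommit, close_old_connections):
    """Close stale DB connections only when no atomic block appears active.

    Both arguments are plain callables so this runs without Django. In a
    Django process they would presumably be
    django.db.transaction.get_autocommit and django.db.close_old_connections.
    Returns True if close_old_connections was called, False otherwise.
    """
    if get_autocommit():
        # Autocommit is on, so we are not inside transaction.atomic();
        # discarding a stale connection cannot interrupt an open transaction.
        logger.debug("Safe to call close_old_connections")
        close_old_connections()
        return True
    # Autocommit is off (e.g. inside atomic()); closing the connection here
    # could break the surrounding transaction, so leave it alone.
    logger.debug("In an atomic transaction")
    return False


# Stand-ins for the Django calls, exercising both branches.
calls = []
refreshed_outside = refresh_connection_if_safe(lambda: True, lambda: calls.append("closed"))
refreshed_inside = refresh_connection_if_safe(lambda: False, lambda: calls.append("closed"))
```

In a real project the call would look something like `refresh_connection_if_safe(transaction.get_autocommit, db.close_old_connections)` at the start of each broker method, so the webserver (inside a request transaction) skips the close while the long-running qcluster process refreshes its connection.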
It's pretty cool to see the debugging statements separate out cleanly: the webserver logs only report "In an atomic transaction" while the qcluster logs are all "Safe to call close_old_connections". So now every Broker method goes through […]
I listed these two specifically because I think my local tests have now replicated all the problems I've seen in production. And this approach solves all of them in local dev. Going to stage these changes to ElasticBeanstalk and see how it does on real-world infrastructure. If the solution holds, I'll submit a pull request.
Just started seeing this now that my qcluster has been running for a few days. Running Django 1.8.7, django-q 0.7.11, pylibmc 1.5.0
The weird thing is that the broker is able to dequeue tasks, save the lock time, and pass them off to workers (the tasks themselves require DB access and do execute successfully). It's only when ORM.delete() is called that this error is triggered. So I don't think it's actually a problem accessing the DB, despite what the error says. I'd guess it has more to do with a conflicting transaction or autocommit state on the connection that results in dropping the connection. Just not sure why this behavior seems to only appear on older qclusters that have, presumably, passed some timeout threshold.
And because ORM.delete() failed, when the broker re-spawns it re-executes the completed task, then dies again when it tries to delete the task. I end up in an endless execute-die-respawn loop until I manually delete the task.
If it is related to stale DB connections, see: https://code.djangoproject.com/ticket/21597#comment:29
So as an experiment I'm now explicitly closing the connection in the broker before trying to use it, and I've refactored access to the connection object:
Time will tell if this helps. My last pull request ended up being totally misguided so I'm holding off on this one for now!
Any thoughts?