Replace io-REDIS #178
Comments
Flyby thought: Perhaps there are some options in the redis library which cause/can help us avoid these issues? https://github.com/luin/ioredis/blob/0b001e0f3313ad53e9f62a8ab3a70c478205698b/lib/redis/index.ts#L51-L90 It does sound like some sort of an 'infinite retry' bug, e.g. with commands or connections? Perhaps even maxRetriesPerRequest, which is currently set to -1? Line 37 in bb8dfba
Appreciate the outside perspective, but I'm not sure if that would help. The PR to try to fix this writes "maxRetriesPerRequest does not help with this" :). It seems to be getting stuck behind some socket connection issue and does not even try to retry. I think the first step in resolving this would be to abstract all the redis calls into |
Still pending final analysis, but it seems like introducing connection pooling was the real answer.
Connection pooling seems to have worked. The rate of errors dropped drastically, and the ones still remaining seem to be related to shutdowns: #210
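For readers wondering what "connection pooling" means here, below is a minimal sketch of the idea. It is not the code that was merged: the thread does not show that implementation, and `Pool`, `acquire`, `release` and `use` are hypothetical names. The assumption is that each caller borrows its own client instead of sharing one socket, so a single stuck connection cannot block everything queued behind it.

```typescript
// Hypothetical minimal pool, for illustration only.
// In practice `create` might open a redis client; here it is a generic factory.
class Pool<T> {
    private idle: T[] = []
    private waiters: Array<(client: T) => void> = []
    private created = 0
    private create: () => Promise<T>
    private max: number

    constructor(create: () => Promise<T>, max: number) {
        this.create = create
        this.max = max
    }

    async acquire(): Promise<T> {
        // Reuse an idle client if one is available.
        const client = this.idle.pop()
        if (client !== undefined) {
            return client
        }
        // Otherwise create a new one, up to the configured maximum.
        if (this.created < this.max) {
            this.created++
            try {
                return await this.create()
            } catch (error) {
                this.created--
                throw error
            }
        }
        // Pool exhausted: wait until another caller releases a client.
        return new Promise<T>((resolve) => this.waiters.push(resolve))
    }

    release(client: T): void {
        // Hand the client straight to the next waiter, or park it as idle.
        const next = this.waiters.shift()
        if (next) {
            next(client)
        } else {
            this.idle.push(client)
        }
    }

    // Borrow a client for the duration of `fn` and always give it back.
    async use<R>(fn: (client: T) => Promise<R>): Promise<R> {
        const client = await this.acquire()
        try {
            return await fn(client)
        } finally {
            this.release(client)
        }
    }
}
```

A real pool would also need to destroy broken clients and drain on shutdown, which this sketch leaves out.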
Last night I spent way too long trying to track down an impossible bug in the plugin server, which turns out to be caused by a bug in ioredis.
Basically, the following line would get stuck forever one time out of a thousand:
Removing any mention of redis from the ingestion pipeline solved the issue.
However, various plugins still use redis, and we can't have them stalling randomly, as they still sometimes do:
Luckily, plugins now have timeouts, so this does no more than stall the ingestion for 30 seconds... and leave some open handles hanging.
Still, crashing for no reason is not acceptable, even if it's one time in 100k.
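To illustrate the kind of timeout mentioned above, here is a rough sketch, not the plugin server's actual code. `withTimeout` is a hypothetical helper that races a promise against a timer. Note that it only stops *waiting*: the stuck call itself is not cancelled, which is consistent with the open handles left hanging.

```typescript
// Hypothetical helper, for illustration only.
async function withTimeout<T>(task: Promise<T>, ms: number, label = 'task'): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
    })
    try {
        // Whichever settles first wins; a stuck `task` loses to the timer.
        return await Promise.race([task, timeout])
    } finally {
        // Clear the timer so it does not itself linger as an open handle.
        if (timer !== undefined) {
            clearTimeout(timer)
        }
    }
}
```

Usage would look something like `await withTimeout(somePluginCall(), 30_000, 'plugin')`, where `somePluginCall` is a placeholder for whatever redis-backed call might stall.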
What to do? Fix the bug in ioredis? Replace ioredis with redis?
I think celery.node required ioredis for some reason, but since we have integrated that code into our codebase, perhaps we can work around it... or even use both libraries interchangeably (one for celery, one for other uses)... And perhaps switching the redis library won't help at all? But I'm not sure what to do then.