Changing queue-mode to 'lazy" sometimes locks the queue up #850
Comments
Please post full log files.
Also, have you considered that sending up to 0.5 GB of messages to disk takes some time, and that while that happens queues do not perform any other operations? Do queues go out of this state eventually?
The messages are published as persistent, so they should already be stored on disk. All the other queues switch to 'lazy' mode pretty fast (seconds at most); the 'bad' ones never seem to finish (I think I waited half an hour).

I tried it again today, but was only able to reproduce the behavior with the publishers still running. Find attached an archive containing logs, configuration, `rabbitmqctl status` and `report` output (`report` hung, so it is incomplete), `/api/queues/` output and a 10 s strace of the 'beam' process. I tried to simplify the configuration, so I removed the node from the cluster and turned off all the monitoring agents we have in place. It has now been 20 minutes since I applied the policy. 14/20 queues are in the 'bad' state and the server is using 800% CPU (all the CPUs available).
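A minimal sketch of how the per-queue state can be polled to produce output like the attached `/api/queues/` dump, assuming the management plugin on `localhost:15672` and the default `guest`/`guest` credentials (both are assumptions, not taken from the attached configuration):

```python
# Sketch: poll the management API to watch the lazy conversion.
# Assumes the management plugin listens on localhost:15672 with guest/guest credentials.
import time

import requests

URL = "http://localhost:15672/api/queues"

while True:
    queues = requests.get(URL, auth=("guest", "guest")).json()
    for q in queues:
        # messages_ram should drop towards zero once a queue has converted to lazy;
        # a 'bad' queue keeps a large number of messages in memory.
        print(q["name"], q.get("messages_ram"), q.get("messages"))
    time.sleep(10)
```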
There are no errors in the logs. We'd have to investigate after the 3.6.3 release.
Hi @cenekzach I can reproduce the issue in this way:
What happens: the queues enter `wait_for_msg_store_credit`. `credit_flow:blocked()` should return, but it enters a loop.
I'm testing version 3.6.2. I use a 3-node cluster of identical virtual servers: 8 CPUs, 30 GB RAM, 500 GB storage. There is no HA policy; clients connect to the first node.
I have 20 clients publishing a mix of 1 kB/10 kB persistent messages to 20 durable queues (single vhost, single exchange). When the queues are around 50k messages long I stop the publishers. Then I add a policy saying that all the queues in the vhost are now lazy (everything else is default). Most of the queues behave fine, but sometimes some of them do not.
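A minimal sketch of one of the publishers, assuming the `pika` client mentioned below; the host, exchange and queue names are placeholders rather than the real ones from the test:

```python
# Sketch of one of the 20 publishers: durable queue, persistent messages.
# Host, exchange and queue names are placeholders.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("node1"))
channel = connection.channel()

channel.exchange_declare(exchange="test-exchange", exchange_type="direct", durable=True)
channel.queue_declare(queue="test-queue-01", durable=True)
channel.queue_bind(queue="test-queue-01", exchange="test-exchange",
                   routing_key="test-queue-01")

body = b"x" * 1024  # 1 kB payload; the real test mixes 1 kB and 10 kB messages
for _ in range(50000):
    channel.basic_publish(
        exchange="test-exchange",
        routing_key="test-queue-01",
        body=body,
        properties=pika.BasicProperties(delivery_mode=2),  # persistent
    )

connection.close()
```

The policy itself is applied with something like `rabbitmqctl set_policy lazy-queues ".*" '{"queue-mode":"lazy"}' --apply-to queues` (the policy name and pattern here are illustrative).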
For example, in one test one of the queues had 17555 messages in memory (54920 total) and could not be purged nor deleted: both the UI and a pika client hang. The node consumes 100% of one CPU (1/8), but there is nothing else going on. The only way I found to fix this is to restart the cluster; the node hosting the queue does not stop cleanly and needs to be killed. After the node starts again, the 'bad' queue is marked as lazy and works fine.
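For completeness, the purge attempt that hangs is nothing more exotic than the following (again with placeholder host and queue names):

```python
# Sketch of the purge attempt that hangs against a 'bad' queue.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("node1"))
channel = connection.channel()
channel.queue_purge(queue="test-queue-01")  # never returns for a stuck queue
connection.close()
```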
Tried it again just now: 6/20 queues are in this bad state and the node is using 600% CPU (6/8). Nothing suspicious in the logs.
OS: CentOS 6.7
Erlang: 17.4