The file handle cache ignores the memory high watermark #134
To reproduce:
It appears the memory is consumed by the file handle cache: the master queue's process dictionary is full of file handle cache entries. During my test, the queue contains 1000 messages of 10 MiB each. With the memory high watermark set at around 2 GiB, nearly all (if not all) messages are paged to disk. However, the queue process is happily consuming 5 GiB of memory. Neither the file handle cache nor the message store asks the memory monitor how much memory it is allowed to consume; they only pay attention to the number of open file handles.
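For context, a load along these lines can be generated with the Erlang AMQP client. This is a minimal sketch only, assuming a local node with default credentials; the queue name and module name are illustrative, not taken from the original test:

```erlang
%% Sketch: publish 1000 persistent messages of 10 MiB each to a durable
%% queue, mimicking the load described above. Names are illustrative.
-module(big_publish).
-export([run/0]).

-include_lib("amqp_client/include/amqp_client.hrl").

run() ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{}),
    {ok, Ch} = amqp_connection:open_channel(Conn),
    #'queue.declare_ok'{} =
        amqp_channel:call(Ch, #'queue.declare'{queue = <<"big">>,
                                               durable = true}),
    Payload = binary:copy(<<0>>, 10 * 1024 * 1024),   %% 10 MiB body
    [amqp_channel:cast(Ch,
                       #'basic.publish'{routing_key = <<"big">>},
                       #amqp_msg{props = #'P_basic'{delivery_mode = 2},
                                 payload = Payload})
     || _ <- lists:seq(1, 1000)],
    amqp_channel:close(Ch),
    amqp_connection:close(Conn),
    ok.
```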
@dumbbell worth delaying 3.5.2 for if you think we can fix this in under a week.
I agree, let's delay 3.5.2. If I don't have a good solution in a week, we can do the release. |
... and clear read cache if necessary. This solves an issue where sync'ing a mirrored queue could take all available memory (way above the high watermark) and possibly crash the node. Fixes #134.
This is a preliminary commit. There is a debug log message left in on purpose, for instance; I will remove it after further testing.
Here are the results of today's testing and benchmark: I measured the time taken to sync one mirrored queue containing 1000 messages of 10 MiB each (10 GiB worth of data). The two nodes run on the same host; the disk is an SSD formatted with ZFS (which does heavy caching). Both nodes had the high watermark set to 2.2 GiB. I did four runs for each of the following versions:
Results:
The explicit garbage collection gives very unstable performance. However, the proposed fix without this explicit garbage collection gives stable timings, even more stable than the unpatched code. During the tests, memory consumption with both versions of the fix could reach up to 3 GiB (800 MiB above the high watermark) for a few seconds. I believe it is the garbage collector that can temporarily allocate a lot of memory. The commit I force-pushed removes the call to the garbage collector and the debug log message.
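For clarity, the "explicit garbage collection" variant amounts to forcing a collection on the queue process after dropping buffers. A minimal illustration of that call is below; the module and function names are hypothetical and the pid lookup is not shown:

```erlang
-module(gc_sketch).
-export([force_gc/1]).

%% Force a full garbage collection on a given process, e.g. a queue
%% master pid obtained beforehand. Illustration only.
force_gc(QueuePid) ->
    erlang:garbage_collect(QueuePid).
```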
Because this change could modify the behaviour in a stable branch, what do you think about hiding it behind a configuration knob?
It makes little sense to me to have a config value for this. This is something users probably expect to be "always on". What kind of breaking changes are you worried about?
As a side note, this file handle cache, which reimplements what the underlying OS already does, can lead to unexpectedly lower performance. During the test, the high watermark was set to 2.2 GiB. I then tried with the default high watermark (which gave 6.2 GiB on this machine): the average sync time went from 30 to 40 seconds (a 33% increase in sync time)! The problem was that with that much memory, the file handle cache was competing with the ZFS ARC (ZFS' cache). As a result, part of RabbitMQ was paged out to disk... Admittedly, ZFS may not be the best filesystem for RabbitMQ, but we may want to keep that in mind and, one day, revisit how we do file caching.
The current fix arbitrarily drops "2 * memory above limit" worth of data (taken from the least recently used read buffers) when memory use goes above the limit; see the sketch below. It seems to work for the tested scenario. But this cache is also used by queue paging, for instance, and I have no idea whether it could be counter-productive in other production workloads.
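To make the heuristic concrete, here is a rough sketch of the idea. It is not the actual file_handle_cache code: the module, function names, and the cache representation are all hypothetical, and the real fix consults the memory monitor rather than erlang:memory/1:

```erlang
-module(fhc_trim_sketch).
-export([maybe_trim_read_buffers/2]).

%% Cache is assumed to be [{Ref, BufferSizeBytes, LastUsedTimestamp}].
%% When memory use exceeds the limit, drop roughly twice the overage,
%% starting with the least recently used read buffers.
maybe_trim_read_buffers(Cache, MemoryLimitBytes) ->
    case erlang:memory(total) of
        Used when Used > MemoryLimitBytes ->
            ToFree = 2 * (Used - MemoryLimitBytes),
            Oldest = lists:keysort(3, Cache),   %% oldest entries first
            drop(Oldest, ToFree);
        _ ->
            Cache
    end.

drop(Cache, ToFree) when ToFree =< 0 -> Cache;
drop([], _ToFree)                    -> [];
drop([{_Ref, Bytes, _Last} | Rest], ToFree) ->
    drop(Rest, ToFree - Bytes).
```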
Since RabbitMQ 2.0.0. |
Since the FHC now uses the memory monitor, we need to make sure it is running early on. Nothing in rabbit_alarm really depends on the steps that used to run earlier. The `full` test suite passes. It may be worth extracting the memory and disk monitors into separate boot steps down the road. While here, remove a comment that's misleading: originally introduced as part of bug24998, it doesn't seem to have any significant reasoning behind it other than "complete separation of FHC". References #134. Fixes #157.
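For reference, RabbitMQ expresses boot ordering with `-rabbit_boot_step` module attributes; the sketch below shows the general shape of such a declaration. The exact attributes of the real rabbit_alarm step may differ from this:

```erlang
%% General shape of a boot step declaration; details are illustrative.
%% `requires` and `enables` are what determine where the step runs in
%% the boot sequence.
-rabbit_boot_step({rabbit_alarm,
                   [{description, "alarm handler"},
                    {mfa,         {rabbit_alarm, start, []}},
                    {requires,    pre_boot},
                    {enables,     external_infrastructure}]}).
```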
From an escalation (VESC-420). Manual GC triggering does not change anything, so the issue most likely is an unbounded buffer, e.g. the message store overwhelming the queue master with messages it cannot relay to the mirror fast enough.