OOM / Panic on files remove #16037
Comments
@osleg can you post …
@robn sorry it took me a bit of time to get those; here are 3 logs: first from before …
Got this issue as well... :-(
I've also hit this recently, and it looks like it is similar to, if not the same as, #6783
@osleg I'm so sorry I missed this. Your slab output confirms it: see … "Intuitively, freeing objects shouldn't require large data allocations; however, freeing an object naturally involves metadata updates." #6783 involves dedup, which isn't in play here, but I notice all those indirect vdevs, which I expect would need to be updated as frees come through, so maybe that's producing a similar effect. I'm not very familiar with the file deletion codepaths, and even less so with indirect vdevs, so I'll need to read a bunch of code to get an idea of how this stuff works before I can go any further.
I'm closing in on this. If you're still able to do the test, I could use one more bit of information. Create the files, but don't delete anything. Just run: … This will scan the entire pool from userspace and produce a summary of all the blocks on the pool: …
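(A hedged guess at a command fitting that description, assumed rather than quoted from the comment: zdb's block-statistics mode, e.g. `zdb -bb <poolname>`, walks the whole pool from userspace and prints per-object-type block counts and sizes; the pool name is a placeholder.)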
Basically what's happening is that if we can't process a "free block" operation on the spot, we put it on the IO pipeline. The overhead of that many ops suddenly landing on the pipeline is consuming a ton of memory (that's the …). The thing is, we only do that if freeing the block may also require at least a read (and maybe an update) of some other metadata. Specifically: …
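As a rough illustration of that decision, here is a minimal C sketch of the shape described above. It is not the actual OpenZFS free path: the types and helpers (free_needs_metadata_io, queue_free_on_pipeline, free_block_now) are hypothetical stand-ins, and the gang/dedup conditions are the ones the linked dedup issues point at.

```c
/*
 * Illustrative sketch only: an assumed shape of the free path described above,
 * not the real OpenZFS code. All types and helpers are hypothetical stand-ins.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct blkptr {
	bool	 is_gang;	/* gang block: header must be read to find children */
	bool	 is_dedup;	/* dedup'd block: DDT entry must be read and updated */
	uint64_t psize;
} blkptr_t;

/*
 * Hypothetical: does freeing this block require reading (and maybe updating)
 * some other metadata first?
 */
static bool
free_needs_metadata_io(const blkptr_t *bp)
{
	return (bp->is_gang || bp->is_dedup);
}

/* Hypothetical helpers standing in for the two free paths. */
void queue_free_on_pipeline(const blkptr_t *bp);	/* allocates a per-op context */
void free_block_now(const blkptr_t *bp);		/* returns the space immediately */

void
free_block(const blkptr_t *bp)
{
	if (free_needs_metadata_io(bp)) {
		/*
		 * Can't process the free on the spot: queue it on the I/O
		 * pipeline. Each queued op carries its own allocation, so
		 * millions of frees landing at once consume a lot of memory.
		 */
		queue_free_on_pipeline(bp);
	} else {
		/* Cheap case: the space can be returned right away. */
		free_block_now(bp);
	}
}
```

Gang and dedup are the cases the linked issues explain; whatever condition routes this pool's frees onto the pipeline (possibly the indirect/removed vdevs speculated about above) would be an analogous branch.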
#6783 and #16697 both involve dedup, which explains them. Yours is less clear. The output from …
Problem
While testing OpenZFS versions 2.1.13-2.1.15 and 2.2.2-2.2.3 on CentOS 8 Stream
with various kernel versions ranging from 4.18.0-408 to 4.18.0-547, on an
Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz with 8GB of ECC RAM, we encountered
a memory consumption issue that leads to a kernel panic during disk usage stress testing.
Test setup
We tested zpools with multiple configurations:
Pools consist of non-mirrored configurations with block devices of varying
sizes, ranging from 147GB to 6.7TB, but with consistent speed and throughput.
The test involves running multiple writers to fill the disk with random-sized
files ranging from 1KB to 2GB. Once the disks are filled, all files are
removed, and the process is repeated.
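For context, a minimal sketch of a reproduction along those lines, assuming a workload like the one described (this is not the reporter's actual harness; the mountpoint and size limits are placeholders, and the real test used multiple concurrent writers rather than this single loop):

```c
/*
 * Hypothetical reproduction sketch: fill a ZFS dataset with random-sized
 * files until the pool is full, then delete them all, and repeat.
 * The mountpoint and size limits are placeholders, not values from the report.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MOUNTPOINT "/testpool/fs"                /* placeholder dataset mountpoint */
#define MIN_BYTES  (1024ULL)                     /* 1 KB */
#define MAX_BYTES  (2ULL * 1024 * 1024 * 1024)   /* 2 GB */

/* Write `size` bytes of junk to `path`; returns nonzero on failure. */
static int
write_file(const char *path, uint64_t size)
{
	static char buf[1 << 20];                /* 1 MiB chunk; contents don't matter */
	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;
	while (size > 0) {
		size_t chunk = size < sizeof(buf) ? (size_t)size : sizeof(buf);
		if (fwrite(buf, 1, chunk, f) != chunk) {
			fclose(f);
			return -1;               /* typically ENOSPC once the pool fills */
		}
		size -= chunk;
	}
	return fclose(f);
}

int
main(void)
{
	char path[4096];

	for (;;) {
		/* Fill phase: create random-sized files until the pool is full. */
		uint64_t n = 0;
		for (;;) {
			uint64_t size = MIN_BYTES +
			    (((uint64_t)rand() << 16 | (uint64_t)rand()) %
			    (MAX_BYTES - MIN_BYTES));
			snprintf(path, sizeof(path), MOUNTPOINT "/file-%llu",
			    (unsigned long long)n);
			if (write_file(path, size) != 0)
				break;           /* pool is full (ENOSPC) */
			n++;
		}
		/*
		 * Delete phase: remove everything at once. The memory spike and
		 * OOM described in this report show up here, while the frees
		 * are being processed.
		 */
		for (uint64_t i = 0; i <= n; i++) {
			snprintf(path, sizeof(path), MOUNTPOINT "/file-%llu",
			    (unsigned long long)i);
			unlink(path);
		}
	}
	return 0;
}
```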
Observed issue
Across all tested versions, significant memory consumption occurs when files
are removed; it is particularly pronounced in versions prior to 2.2.3.
Memory usage spikes, consuming all available memory.
The OOM killer activates in an attempt to free memory, resulting in kernel
panics when no further resources are available for the OOM killer to release.
With 8GB RAM, the issue occurs consistently in every test instance on versions
before 2.2.3, and with decreased frequency on 2.2.3 (5 out of 20 CentOS test
instances experienced kernel panics).
Logs
Machine info
The current instance is the only one I have left for testing right now:
issue demo
After reconnecting via SSH, the directory still has all the files
zpool status
zpool list
zpool config
zfs config
vmcore dmesg
dmesg.txt
maybe related:
#14732
#15776
#14914