Slow initial read from encrypted volume #10259
Comments
Try `sync` / `zpool sync` after the `dd`s to test the true write time - the write may continue to be processed, along with relatively slow encryption, after the `dd` has returned. (`drop_caches` doesn't implicitly sync.) Regardless of the caching pattern, this is probably a slow encryption/decryption issue - a known problem in ZFS releases up until now. A large optimization is in hand and is likely to be included in 0.8.4.
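A minimal sketch of that timing check (the pool and file path are placeholders, not taken from this report):

```sh
# Time the write including the flush, so that encryption work still
# in flight when dd returns is counted in the total:
time sh -c 'dd if=/dev/zero of=/tank/enc/testfile bs=1M count=10240 && sync'

# Alternatively, force the pool's pending transactions out:
zpool sync tank
```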
That occurred to me, so I did try it. It seems unlikely, to me, that this is the slow-crypto issue fixed/improved in #9749 (assuming that's what you're referring to), as this problem occurs only on the initial read from ARC after the initial write, and is not a problem generally (though faster crypto will be nice, of course). This issue is also present when using aes-256-ccm (which seems to be a bit slower overall, and gives a ~50 MB/s initial read). Maybe (hopefully) #9749 will fix it, though. I'll try the latest in a VM shortly. I'm also planning on looking at the code at some point.
I can reproduce this 100% with 0.8.3, and it looks like it was fixed in 0.8.4. However, after some testing, I am still getting slow reads with some old pools (i.e., pools created under older versions). I reproduced the issue with dd write/read commands along the lines of the reproduction sketch further down in this issue.
This appears to be fixed w/0.8.4 for me, too. Thanks for the comment; I had upgraded to 0.8.4 this morning but hadn't yet checked whether the issue was still present (I had assumed it was). I'll go ahead and close this soon, unless anyone would rather it stay open (perhaps to sort out issues for encrypted datasets created with older versions)?
Doesn't sound like there's any interest in keeping this issue open. Thanks to whoever fixed it!
System information
First, thanks very much for your work on OpenZFS / ZoL. It's really incredible.
I've looked but don't see any other issues that cover the below.
Describe the problem you're observing
I'm relatively new to ZFS and am doing performance testing of new pools using both multiple HDDs in raidz1 and a single striped NVMe drive (separate pools).
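For reference, pools/datasets like those described can be created along these lines (a sketch; device paths and pool/dataset names are placeholders, not from this report):

```sh
# raidz1 pool over several HDDs (device paths are placeholders)
zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc

# encrypted dataset using the cipher discussed here; aes-256-ccm
# (mentioned in the comments above) can be substituted to compare
zfs create -o encryption=aes-256-gcm \
    -o keyformat=passphrase -o keylocation=prompt \
    tank/enc
```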
Using unencrypted volumes, everything seems to work very well, as expected. With an encrypted volume (aes-256-gcm), I've noticed that the first/initial read after writing a large new file (I'm using 10 GB, but any size seems to be affected) is quite 'slow' - ca. 64 MB/s - and that there's no significant use of the CPU (one core maxed by `dd`, system load average never exceeds 1). After the first read, the file is held in the ARC (as expected) and subsequent reads are very fast (~6 GB/s on my system). However, emptying the ARC immediately after the initial write results in a much faster, from-storage initial read (~350 MB/s) and more CPU usage (typically a system load avg. of 4 or more). The file is again held in the ARC as expected, and successive reads are very fast. During the 'slow' initial read, the drives are not read from; after clearing the ARC, the drives are read from, as expected.

This affects encrypted vols on both raidz1 and the NVMe 'stripe'. The numbers above are for the raidz1; the NVMe pool has a fast initial write (1.2 GB/s), a slow initial read after the initial write that's also ~64 MB/s, and then relatively fast subsequent reads from NVMe (empty ARC, 1.2 GB/s) and from ARC (6 GB/s).
`cryptsetup benchmark` and `openssl speed` give output that's consistent with my expectations for this CPU, and `perf top` suggests hardware acceleration is working normally ('aes_aesni_encrypt' and 'gcm_pclmulqdq_mul' are at the top of the list). Given this, and that encrypt/decrypt is relatively fast when reading/writing directly to storage (i.e., with an empty ARC), it doesn't seem like crypto is (or should be) the bottleneck. It also doesn't seem to be related to the storage 'scheme' - striping is affected just the same as raidz1.

This does seem to have to do with how new, encrypted data is being handled/managed in the ARC. I don't yet know much about ZFS internals (sorry I can't be more descriptive or helpful), but here's what I've been able to gather, if you'll pardon a little redundancy (the commands behind these observations are sketched after the list below):
- On an initial write into an encrypted pool: `zpool iostat` output (https://pastebin.com/raw/nLF35LgS) is as I would expect; `perf top` during an initial 'slow' read (immediately below) shows heavy crypto (e.g. dominant aes_aesni_encrypt). Interestingly, the ARC grows during this initial 'slow' read to double the size, so it seems likely we're caching both encrypted and unencrypted copies? No problem, of course, but worth mentioning, I thought.
- On initial read without first emptying the ARC ('slow'): `perf top`; `perf stat` counters suggest inefficiency - there's significantly more branching, instructions, etc. as compared to a read after emptying the ARC (below) (https://pastebin.com/raw/G4XC9vPZ).
- On initial read after first emptying the ARC ('fast'): `perf top`; `perf stat` counters suggest a significant improvement in efficiency as compared to the above (https://pastebin.com/raw/asbdwmGa).

This has been discussed in #zfs and #zfsonlinux and has been reproduced by a friendly/helpful user.
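As noted above, these observations can be gathered with commands along these lines (a sketch; the pool name and file path are placeholders, not from this report):

```sh
# sanity-check raw crypto throughput outside ZFS
cryptsetup benchmark
openssl speed -evp aes-256-gcm

# per-vdev I/O while writing/reading, to see whether the disks are touched
zpool iostat -v tank 1

# ARC size over time (it roughly doubles during the 'slow' read)
watch -n1 "awk '/^size/ {print \$3}' /proc/spl/kstat/zfs/arcstats"

# hot functions during a read (e.g. aes_aesni_encrypt dominating)
perf top

# branch/instruction counters for a single read
perf stat -e instructions,branches,branch-misses \
    dd if=/tank/enc/testfile of=/dev/null bs=1M
```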
I'm not sure where to go from here, but it seems like there's a bug or at least some unexpected behavior? Ideally the initial read after a write could either take place at the speed you would expect when reading/decrypting from memory, or the ARC version could be discarded (or storing it there disabled) and we could just do the initial read from storage?
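For what it's worth, caching of file data in the ARC can already be disabled per dataset, which is close to the second option above, though it affects all reads rather than just the first (dataset name is a placeholder):

```sh
# cache only metadata, not file data, for this dataset
zfs set primarycache=metadata tank/enc
```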
Describe how to reproduce the problem
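In outline (a sketch; the pool/dataset names are placeholders, and the ARC is emptied via drop_caches, as referenced in the comments above):

```sh
# 1. Write a large file to an encrypted dataset, then flush:
dd if=/dev/zero of=/tank/enc/testfile bs=1M count=10240
sync

# 2a. Read it straight back - the 'slow' (~64 MB/s) case, which is
#     served from the ARC rather than from the disks:
dd if=/tank/enc/testfile of=/dev/null bs=1M

# 2b. Or empty the ARC first - the 'fast' (~350 MB/s) from-storage case:
echo 3 > /proc/sys/vm/drop_caches
dd if=/tank/enc/testfile of=/dev/null bs=1M
```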
Include any warning/errors/backtraces from the system logs
There have been no warnings or errors that I've seen.