
Slow initial read from encrypted volume #10259

Closed
bgort opened this issue Apr 27, 2020 · 6 comments

Labels
Component: Encryption ("native encryption" feature) · Type: Performance (performance improvement or performance problem)

Comments

@bgort
bgort commented Apr 27, 2020

System information

Type Version/Name
Distribution Name Proxmox
Distribution Version 6.1
Linux Kernel 5.4.27
Architecture x86_64 / Ryzen 9 3900x
ZFS Version 0.8.3
SPL Version 0.8.3

First, thanks very much for your work on OpenZFS / ZoL. It's really incredible.

I've looked but don't see any other issues that cover the below.

Describe the problem you're observing

I'm relatively new to ZFS and am doing performance testing of new pools using both multiple HDDs in raidz1 and a single striped NVMe drive (separate pools).

Using unencrypted volumes, everything seems to work very well, as expected. With an encrypted volume (aes-256-gcm), I've noticed that the first/initial read after writing a large new file (I'm using 10 GB, but any size seems to be affected) is quite 'slow' - ca. 64 MB/s - and that there's no significant use of the CPU (one core maxed by dd, system load average never exceeds 1). After the first read, the file is held in the ARC (as expected) and subsequent reads are very fast (~6 GB/s on my system). However, emptying the ARC immediately after the initial write results in a much faster, from-storage initial read (~350 MB/s) and more CPU usage (typically a system load avg. of 4 or more). The file is again held in the ARC as expected, and successive reads are very fast. During the 'slow' initial read, the drives are not read from; after clearing the ARC the drives are read from, as expected.

This affects encrypted vols on both raidz1 and the NVMe 'stripe'. The numbers above are for the raidz1; NVMe has a fast initial write (1.2 GB/s), slow initial read after initial write that's also 64 MB/s, and then relatively fast subsequent reads from NVMe (empty ARC, 1.2 GB/s) and from ARC (6 GB/s).

cryptsetup benchmark and openssl speed give output that's consistent with my expectations for this CPU, and perf top suggests hardware acceleration is working normally ('aes_aesni_encrypt' and 'gcm_pclmulqdq_mul' are at the top of the list). Given this, and that encrypt/decrypt is relatively fast when reading/writing directly to storage (i.e., with an empty ARC), it doesn't seem like crypto is (or should be) the bottleneck/problem. It also doesn't seem to be related to the storage 'scheme' - striping is affected just the same as raidz1.
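For reference, the checks I mean are roughly these (a sketch of what I ran; exact openssl flags may differ by version):

cryptsetup benchmark                # aes-xts lines should show multi-GB/s with AES-NI
openssl speed -evp aes-256-gcm      # userspace AES-256-GCM throughput
perf top                            # while a read is running; aes_aesni_encrypt / gcm_pclmulqdq_mul near the top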

This does seem like it has to do with how new, encrypted data is being handled/managed in the ARC. I don't yet know much about ZFS internals (sorry I can't be more descriptive or helpful), but here's what I've been able to gather (if you'll pardon a little redundancy):

  • On an initial write into an encrypted pool:

    • It seems data is being written to disk and also held in the ARC, encrypted;
    • the writing to disk is confirmed by watching zpool iostat (https://pastebin.com/raw/nLF35LgS), and is as I would expect;
    • that it's being held in the ARC is confirmed by watching arcsz grow as the file is created/written, ending at 10GB for my 10GB test file (see the monitoring sketch after this list); and,
    • that it's being held in the ARC encrypted is confirmed by watching perf top during an initial 'slow' read (immediately below), which shows heavy crypto (e.g. dominant aes_aesni_encrypt); interestingly, the ARC grows to roughly double the file size during this initial 'slow' read, so it seems likely we're caching both encrypted and unencrypted copies? No problem, of course, but worth mentioning, I thought.
  • On initial read without first emptying the ARC ('slow'):

    • There is no reading from disk (https://pastebin.com/raw/9rBVhV6c);
    • as mentioned above, decryption takes place, confirmed by perf top;
    • this is very slow - ~64 MB/s;
    • perf stat counters suggest inefficiency - there's significantly more branching, instructions, etc. as compared to a read after emptying the ARC (below) (https://pastebin.com/raw/G4XC9vPZ);
    • after this initial read, the file is stored in the ARC unencrypted, as subsequent reads are very fast (~6 GB/s) and there's no indication crypto is involved (just memcpy when copying to /dev/null)
  • On initial read after first emptying the ARC ('fast'):

    • As mentioned above, we read from storage (https://pastebin.com/raw/qzAwi8xX);
    • decryption takes place here, per perf top;
    • this is much faster - ~350 MB/s +/-;
    • perf stat counters suggest a significant improvement in efficiency as compared to the above (https://pastebin.com/raw/asbdwmGa)
    • after this initial read, the file is stored in the ARC unencrypted, as subsequent reads are very fast (~6 GB/s) and there's no indication crypto is involved (just memcpy when copying to /dev/null)
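The monitoring sketch referenced above is roughly what I had running while writing and then reading the file (the arcstats path assumes ZoL's /proc interface):

# terminal 1: watch ARC size while the file is written and then read
watch -n1 "grep '^size' /proc/spl/kstat/zfs/arcstats"

# terminal 2: see what the CPU is doing during the 'slow' initial read
perf top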

This has been discussed in #zfs and #zfsonlinux and has been reproduced by a friendly/helpful user, with:

  • A Debian system (kernel 4.19, E5-2680v3, 0.8.3) - 85 MB/s 'slow' initial read, but fast if ARC cleared prior to read
  • A Gentoo system (kernels 5.4 and 4.19, Ryzen 3700x, 0.8.3) - 64 MB/s 'slow' initial read, but fast if ARC cleared

I'm not sure where to go from here, but it seems like there's a bug or at least some unexpected behavior? Ideally the initial read after a write could either take place at the speed you would expect when reading/decrypting from memory, or the ARC version could be discarded (or storing it there disabled) and we could just do the initial read from storage?

Describe how to reproduce the problem

zpool create -f -o ashift=12 tub99 raidz1 <4 512e hdd, ~240 MB/s ea. raw r/w>
zpool set feature@encryption=enabled tub99

zfs create tub99/unenc
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tub99/enc

mkdir /mnt/tmp
mount -t tmpfs tmpfs /mnt/tmp
dd bs=1M count=10000 if=/dev/urandom of=/mnt/tmp/testjunk status=progress

####################### slow
echo 3 > /proc/sys/vm/drop_caches  #for good measure

#write files
dd bs=1M if=/mnt/tmp/testjunk of=/tub99/unenc/testjunk  #unencrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 17.879 s, 586 MB/s

dd bs=1M if=/mnt/tmp/testjunk of=/tub99/enc/testjunk  #encrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 18.079 s, 580 MB/s

#read files
dd bs=1M if=/tub99/unenc/testjunk of=/dev/null  #unencrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 1.78527 s, 5.9 GB/s  (from ARC, of course)

dd bs=1M if=/tub99/enc/testjunk of=/dev/null  #encrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 162.909 s, **** 64.4 MB/s **** << PROBLEM 

  ## successive reads for both enc & unenc come from the ARC, at ~6 GB/s ##

####################### fast

rm /tub99/enc/testjunk
rm /tub99/unenc/testjunk

echo 3 > /proc/sys/vm/drop_caches

#write files
dd bs=1M if=/mnt/tmp/testjunk of=/tub99/unenc/testjunk  #unencrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 18.4432 s, 569 MB/s

dd bs=1M if=/mnt/tmp/testjunk of=/tub99/enc/testjunk  #encrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 18.0011 s, 583 MB/s

echo 3 > /proc/sys/vm/drop_caches

#read files
dd bs=1M if=/tub99/unenc/testjunk of=/dev/null  #unencrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 23.5944 s, 444 MB/s

dd bs=1M if=/tub99/enc/testjunk of=/dev/null  #encrypted
  10485760000 bytes (10 GB, 9.8 GiB) copied, 31.3279 s, **** 335 MB/s **** << MUCH BETTER
  
  ## successive reads for both enc & unenc come from the ARC, at ~6 GB/s ##
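To confirm where each read was served from, I had something like the following running in a second terminal while the dd above ran - the 'slow' case shows essentially no disk reads, while the post-drop_caches case shows the expected activity:

zpool iostat -v tub99 1    # per-vdev read bandwidth at 1s intervals
iostat -xm 1               # cross-check against the block devices themselves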

Include any warning/errors/backtraces from the system logs

There have been no warnings or errors that I've seen.

@adamdmoss
Contributor

Try sync / zpool sync after the dd's to test the true write time - the write may continue to be processed along with relatively slow encryption after the dd has returned. (drop_caches doesn't implicitly sync.)
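Something along these lines, reusing the pool/paths from the report above (a sketch, untested here):

dd bs=1M if=/mnt/tmp/testjunk of=/tub99/enc/testjunk
sync                # wait for outstanding writes to be flushed
zpool sync tub99    # force the pool's pending transaction groups to disk
dd bs=1M if=/tub99/enc/testjunk of=/dev/null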

Regardless of the caching pattern, this is probably a slow encryption/decryption issue - a known problem in ZFS releases up until now. A large optimization is in hand and is likely to be included in 0.8.4.

@bgort
Author

bgort commented Apr 29, 2020

That occurred to me, so I did try sync and zpool sync early on, actually, but forgot to mention. It makes no difference if the initial read follows a sync, or is 10 minutes later (as long as the new file is still cached), as that initial read is coming from the ARC. There is no reading from storage at all during the 'slow' read, as confirmed by iostat and zpool iostat. This issue affects both the HDD raidz1 and SSD stripe.

It seems unlikely, to me, that this is the slow crypto issue fixed/improved in #9749 (assuming that's what you're referring to), as this problem occurs only on the initial read from the ARC after the initial write, and isn't a problem generally (though faster crypto will be nice, of course). The issue is also present when using aes-256-ccm (which seems to be a bit slower overall, giving a ~50 MB/s initial read). Maybe (hopefully) #9749 will fix it, though.
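For the aes-256-ccm comparison I just created a sibling dataset, roughly like this (the dataset name is only for illustration):

zfs create -o encryption=aes-256-ccm -o keyformat=passphrase tub99/enc_ccm
dd bs=1M if=/mnt/tmp/testjunk of=/tub99/enc_ccm/testjunk
dd bs=1M if=/tub99/enc_ccm/testjunk of=/dev/null    # same 'slow' first read, ~50 MB/s here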

I'll try the latest in a VM shortly. I'm also planning on looking at the code at some point.

@bgort
Author

bgort commented Apr 29, 2020

I have tested this against 89a6610 (HEAD -1; HEAD [c14ca14] is broken) and the problem is still present.

@behlendorf added the Component: Encryption ("native encryption" feature) and Type: Performance (performance improvement or performance problem) labels Apr 29, 2020
@vvrbanc

vvrbanc commented May 18, 2020

I can reproduce this 100% with 0.8.3. Looks like it was fixed in 0.8.4. However, I am getting slow reads with some old pools.

After some testing, my conclusions are:

  • The performance bug is fixed in 0.8.4 if the zfs encryptionroot was created with 0.8.3 or 0.8.4.
  • Encryptionroots created with 0.8.1 are slow even after upgrading zfs to 0.8.4, even if you create a new child under the old encryptionroot.
  • The bug is not zpool related; it only matters whether the encryptionroot was created with an older or newer zfs.
  • I did not test zfs 0.8.2, only 0.8.1, 0.8.3 and 0.8.4.

I used these commands to reproduce the issue:

# backing file, raw key, and test data in tmpfs
dd if=/dev/zero of=/dev/shm/zfs.img bs=1M count=1300
dd if=/dev/urandom of=/dev/shm/zfs.key bs=32 count=1
dd if=/dev/urandom of=/dev/shm/testfile bs=1M count=1000

# encrypted pool backed by the tmpfs file
zpool create -f -o cachefile=/tmp/zpool.cache -O encryption=on -O keyformat=raw -O keylocation=file:///dev/shm/zfs.key zfstest /dev/shm/zfs.img

# write, sync, then read back - the read is the slow step
dd if=/dev/shm/testfile of=/zfstest/testfile bs=1M
sync
dd if=/zfstest/testfile of=/dev/null bs=1M
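Since the conclusions above come down to which dataset is the encryptionroot and when it was created, the relevant properties can be inspected with something like:

zfs get -r encryptionroot,encryption,keyformat,creation zfstest
zpool get feature@encryption zfstest    # enabled/active state of the encryption feature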

@bgort
Author

bgort commented May 18, 2020

This appears to be fixed w/0.8.4 for me, too. Thanks for the comment; I had upgraded to 0.8.4 this morning but hadn't yet checked to see if the issue was still present (I had assumed it was).

I'll go ahead and close this soon, unless anyone would rather it stay open (perhaps to sort out/deal with issues for encrypted datasets created with older versions)?

@bgort
Author

bgort commented May 21, 2020

Doesn't sound like there's any interest in keeping this issue open. Thanks to whoever fixed it!

@bgort bgort closed this as completed May 21, 2020