Locking does not work on PEFS backed by a ZFS volume #51

Closed · DanielO opened this issue Apr 5, 2022 · 24 comments

DanielO commented Apr 5, 2022

With PEFS mounted on /testtank/test/pefs, locking does not work:

doconnor@freebsd14:~ $ sudo lockf -k -t 0 /testtank/test/pefs/lock sleep 5 &
doconnor@freebsd14:~ $ sudo lockf -k -t 0 /testtank/test/pefs/lock echo foo
foo

If it is not mounted, or if it is backed by UFS, it does work:

doconnor@freebsd14:~ $ sudo lockf -k -t 0 /testtank/test/pefs/lock sleep 5 &
doconnor@freebsd14:~ $ sudo lockf -k -t 0 /testtank/test/pefs/lock echo foo
lockf: /testtank/test/pefs/lock: already locked

DanielO commented Apr 5, 2022

monwarez, can you please try the advlock branch and see if it fixes it?
I did some quick tests and it seems to work, although I still don't understand why it works when backed by UFS and not ZFS.

monwarez commented Apr 5, 2022

I did a quick test and it does not. Note that this time it is not a plain 13.1-RC1: I cherry-picked a commit related to locking, but it did not help. I need to retest in a cleaner environment to be sure.

DanielO commented Apr 5, 2022

OK, please do. I also tested it on a FreeBSD 13.1-RC1 system (arm64) and it worked.

Actually, I tested it wrong and it doesn't work; I'll have to debug further.

DanielO commented Apr 5, 2022

Works on:
FreeBSD freebsd14.gsoft.com.au 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-3468cd95c: Sat Mar 26 06:16:19 UTC 2022 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

Broken on:
FreeBSD rpi13.gsoft.com.au 13.1-RC1 FreeBSD 13.1-RC1 releng/13.1-n250053-6fe29001573 GENERIC arm64

monwarez commented Apr 5, 2022

If I recall correctly, it was working on 13.1-BETA1, just before the merge of OpenZFS 2.1.
Just before this commit: https://cgit.freebsd.org/src/commit/?h=releng/13.1&id=408fc21842a843efc6599e90fc88f06cd3b8dd9c

monwarez commented Apr 9, 2022

I found a workaround: if I nullfs-mount the backing directory, it works.
If I add this after creating mnt in the lock test suite:

mkdir mnt.zfs
mount_nullfs mnt.zfs mnt
# rest of the test script

Then the test passes.

vansante commented:

I just upgraded to FreeBSD 13.1-RELEASE-p2, and I am using pefs-kmod version g20220816,1, and I am also getting this bug. Is this being worked on? This is kind of a blocker for upgrading to 13.1.

DanielO commented Sep 22, 2022

I don't really have the time to look at it right now, sorry.

RobinGeuze commented:

So we did some digging. The problem, or one of the problems, appears to be that ZFS does not properly increase the "gen" attribute of a directory when a file is created in it. This causes the dircache to malfunction: based on an older directory version, it decides the file does not exist. This patch fixes the issues we ran into:

diff --git a/sys/fs/pefs/pefs_vnops.c b/sys/fs/pefs/pefs_vnops.c
index 42f0f11..3801484 100644
--- a/sys/fs/pefs/pefs_vnops.c
+++ b/sys/fs/pefs/pefs_vnops.c
@@ -546,8 +546,8 @@ pefs_lookup_dircache(struct pefs_enccn *enccn, u_long gen, struct vnode *dvp,
                cache = pefs_dircache_lookup(pd, cnp->cn_nameptr,
                    cnp->cn_namelen);
                if (cache == NULL) {
-                       if (pefs_dircache_valid(pd, gen))
-                               return (ENOENT);
+                       //if (pefs_dircache_valid(pd, gen))
+                       //      return (ENOENT);
                        return (EINVAL);
                }

DanielO commented Sep 26, 2022

Thanks for the debugging work, Robin!
I am a bit reluctant to disable the cache wholesale, but perhaps it could be done as a workaround when mounted on top of ZFS.

As a workaround, I think people can set the sysctl vfs.pefs.dircache.enable=0 to avoid the problem without needing a recompile.

RobinGeuze commented:

This patch doesn't disable the cache entirely, only negative caching, i.e. the case where the cache says a file does not exist. Positive cache hits are still served.

RobinGeuze commented Sep 27, 2022

Did some more debugging; it seems ZFS switched the value returned for va_gen from "z_seq" to "z_gen", and the latter is not increased for a directory when a file is created in it.

I've reported this as a bug at openzfs here: openzfs/zfs#13962

EDIT: For people who build their own FreeBSD or OpenZFS for FreeBSD, I can confirm that reverting openzfs/zfs#12851 fixes the issue.
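
To observe this from userland, a minimal check (a sketch, not something from this thread; the hard-coded paths are placeholders for a directory on the dataset under test) is to stat a directory, create a file in it, and stat it again. Run it as root, since st_gen is reported as 0 to unprivileged callers:

#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Placeholder paths: point these at the dataset under test. */
	const char *dir = "/testtank/test";
	const char *file = "/testtank/test/genprobe";
	struct stat before, after;
	int fd;

	if (stat(dir, &before) == -1)
		err(1, "stat %s", dir);
	if ((fd = open(file, O_CREAT | O_EXCL | O_WRONLY, 0600)) == -1)
		err(1, "open %s", file);
	close(fd);
	if (stat(dir, &after) == -1)
		err(1, "stat %s", dir);
	unlink(file);

	/* With the behaviour described above, ZFS reports the same st_gen. */
	printf("st_gen before=%ju after=%ju (%s)\n",
	    (uintmax_t)before.st_gen, (uintmax_t)after.st_gen,
	    before.st_gen == after.st_gen ? "unchanged" : "changed");
	return (0);
}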

DanielO commented Sep 27, 2022

Thanks. I did some digging on va_gen but then fell down a rabbit hole trying to work out where it was created and updated, without any luck.

I just noticed that struct vattr has a va_filerev member, which is set from z_seq in ZFS. Not sure how it behaves elsewhere, though.

Does changing pefs_getgen() to return va_filerev also work? (I assume it would.)

RobinGeuze commented:

I'll check whether it works for ZFS tomorrow, but I already did a quick check and, at least for tmpfs, va_filerev is always 0 (a hardcoded value), so it is not a universal solution. Assuming the ZFS developers see this not as a bug but as intended behaviour, we might need to detect which filesystem we are mounted on to decide which cache invalidation method to use.

ghost commented Sep 27, 2022

va_gen (z_gen) is a "file generation number" assigned at creation to distinguish reuses of an inode number, va_fileid (z_id). In short, va_fileid is not unique over time since an id may be reused, but the pair (va_fileid, va_gen) is intended to be unique.

Changing the contents of a directory does not make it a new directory, so the pair (id, gen) should not change; it was a bug that it did so in ZFS.

The way to check for modification is va_filerev (z_seq). It does seem tmpfs hardcodes 0 here for no good reason. Several other filesystems do as well, but they're mostly filesystems where you wouldn't expect files/directories to be modified anyway, like udf and cd9660 and various pseudo-filesystems. Some of those are potentially bugs.

FWIW, I don't see tmpfs updating va_gen on a directory when a file is created inside either.
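
In code, the distinction could be sketched like this (illustrative only, not kernel source; the field names mirror the vattr members discussed above):

#include <stdbool.h>
#include <stdint.h>

/* (fileid, gen) identifies an object for its whole lifetime. */
struct file_identity {
	uint64_t fileid;	/* inode number, may be reused after unlink (va_fileid / z_id) */
	uint64_t gen;		/* disambiguates reuses of the same fileid (va_gen / z_gen) */
};

/* Same object over time? Compare the (fileid, gen) pair. */
static bool
same_object(struct file_identity a, struct file_identity b)
{
	return (a.fileid == b.fileid && a.gen == b.gen);
}

/* Modified since we last looked? Compare va_filerev (z_seq) snapshots. */
static bool
modified_since(uint64_t filerev_then, uint64_t filerev_now)
{
	return (filerev_now != filerev_then);
}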

RobinGeuze commented Sep 27, 2022

Ah, thank you very much for that explanation. I tried to find documentation on these values but couldn't find anything that explains them.

PEFS already ignores the dircache when the va_gen value is 0, so I guess that will work just as well for va_filerev. I tested it on ZFS, and it works fine when we replace va_gen with va_filerev. I've created a PR with the necessary changes (properly renaming and retyping everything).
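
Roughly, the shape of that change (a sketch modeled on the existing pefs_getgen(); pefs_getfilerev() is a made-up name here, and this is not the exact PR diff):

/*
 * Derive the dircache validity token from va_filerev rather than va_gen.
 * Returning 0 keeps the existing "no usable token, skip the dircache"
 * behaviour, which is what filesystems that hardcode va_filerev = 0
 * (e.g. tmpfs) will get.
 */
static uint64_t
pefs_getfilerev(struct vnode *vp, struct ucred *cred)
{
	struct vattr va;

	if (VOP_GETATTR(vp, &va, cred) != 0)
		return (0);
	return (va.va_filerev);
}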

DanielO commented Sep 29, 2022

Another possibility would be to use va_gen and va_filerev together; I think that would work for all backing file systems.
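
A sketch of what using them together could look like (illustrative only, not the change that was eventually committed; the names are made up):

#include <stdbool.h>
#include <stdint.h>

/*
 * Treat the pair (va_gen, va_filerev) as the dircache token, so that
 * either a reused inode or a directory modification invalidates the
 * cache. If both fields are 0, the backing filesystem gave us nothing
 * to compare, so the cached entries cannot be trusted.
 */
struct dircache_token {
	uint64_t gen;		/* identity: changes if the inode is reused */
	uint64_t filerev;	/* revision: bumped on each modification, where supported */
};

static bool
token_valid(struct dircache_token cached, struct dircache_token current)
{
	if (current.gen == 0 && current.filerev == 0)
		return (false);
	return (cached.gen == current.gen && cached.filerev == current.filerev);
}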

RobinGeuze commented:

A week of holiday gave me some time to think about it, and I agree that might be the best approach; I will update the PR.

DanielO commented Oct 10, 2022

Thanks, that would be very helpful!

RobinGeuze commented:

Just pushed it. I haven't properly tested it yet; I will try to do that tomorrow.

RobinGeuze commented:

Just built and tested it; it seems to work just fine :)

DanielO commented Oct 12, 2022

@monwarez, would you be able to test the change? It is in the master branch now (aa7f151).

monwarez commented:

I just tested it and it works: it now correctly locks the file, and also removes the lock file when it is no longer needed.

DanielO commented Oct 12, 2022

Excellent news, three cheers for @RobinGeuze
Now to file a PR to get the port updated.

DanielO closed this as completed on Oct 12, 2022.