Unison (fastcheck=true) doesn't notice new files (written in exFAT by another computer running Windows) #1122

Open
Artur2048 opened this issue Feb 14, 2025 · 24 comments
Labels
discuss: way forward is unclear; needs discussion of approach to take and why
windows: not applicable to systems other than Microsoft Windows

Comments

@Artur2048

Instructions


I have thoroughly read the Reporting Bugs and Feature Requests wiki page.

Meta

Running Unison to sync local drive with removable USB SSD.

Environment

  • unison version 2.53.7
  • Linux Mint 21.3
  • ocaml 4.14.2
  • issue affects both GUI and CLI
  • both - local and SSH
  • use of fsmonitor: no

Reproduction Recipe

After performing initial sync with Unison, I've unmounted and removed my USB SSD, plugged it to another PC and added some files and directories.
I have then plugged it back to my computer and started Unison again. After rescanning both folders it wrote that "Everything is in sync" (which is not true - there are new files there)

Interesting fact - if I create additional file in exactly the same directory as the "undetected new files" on my local computer and rescan with Unison - it will detect all new files - including files created on another PC which were previously undetected

Additional information - my SSD is formatted as ExFAT, and the profile PRF is:

root = /media/artur/Extreme Pro/Test1/
root = /home/artur/Test1/
fat = true
times = true
log = true
logfile = sync_test1.log

Steps to reproduce:

  1. both roots (home and media) are in sync
  2. unmount and remove "media"
  3. add some files to some subdirectory on ANOTHER PC - Windows PC:
    for example add file "test1" in already existing directory "Archive"
    and also add "test2" in the already existing directory "Documents"
  4. bring back the USB SSD and mount it on local PC, start Unison again and rescan (either via GUI or CLI)
  5. Unison will report that everything is in sync and no changes have been made to either replica
  6. now on LOCAL PC - create additional (empty) file on "media" in the Archive directory (let's say test3)
  7. hit RESCAN on Unison GUI (or start it again via CLI):
  8. Unison will now detect the "test3" file correctly as well as "test1" file ...but "test2" remains undetected
  9. after clicking GO - it will correctly copy test1 and test3 to the replica in "home".

Expected behavior vs actual behavior

Expected behavior - Unison should detect files that were added to a removable drive while it was attached to another PC

Additionally:

  • if i test using above steps on very small replica set (just 2 directories with few empty files) - SOMETIMES Unison detects them and sometimes it doesn't. (but if it doesn't - then running additional rescans will not change anything - files remain undetected)
  • adding on local pc additional file to the directory containing new (undetected) files will always cause Unison to detect those files in that particular directory (just type "touch /media/....../Test1/Archive/xxx")
  • on my large volume (containing about 1TB of data and almost 100,000 files) it basically happens on every run. (so i need to delete Unison cache - the "archive files ar*" and "fingerprint files fp*" to make Unison do the "first time scan" in order to guarantee my files are really in sync)
@Artur2048 Artur2048 added the defect unison fails to meet its specification (but doesn't crash; see also "crash") label Feb 14, 2025
@gdt
Collaborator

gdt commented Feb 14, 2025

Are you using the CLI without -repeat?

Could you use ktrace (ktruss, strace, not sure what Linux calls it) to see what system calls unison executes? Basically the key question is if the operating system is returning information about these files. Or perhaps -- guessing wildly -- it has cached information and does not.

Have you looked at the filesystem back on the Linux machine to see that the files you think are there are there?

@Artur2048
Author

yes. i'm using CLI without -repeat,
...but exactly the same issue happens on GUI version of Unison
System is correctly returning information about those files. if i run ls -lR /media/.../Test1 - the files are there 100%
i can also use any other tool to copy or archive them (cp, tar, ...)

One thing that puzzles me is - if i create additional file on my local Linux machine and run Unison the second time - it WILL actually detect the new file i just created as well as the file that was added on another PC.

I think it is possible Unison gets cached information... but it's not cached by OS... it's maybe cached in those ar* and fp* files in my .unison directory
(if i delete ar* and fp* from .unison - it will do "first time rescan" and do proper resync)

Also (not sure if it matters) - time on both machines is synchronized via NTP.

Sure.. I'll try to run strace and post the results.

@tleedjarv
Contributor

I have a suspicion of what is going on here... but first, could you check a few things:

  • Does the problem persist when you try with fastcheck = false?
  • Is the problem only present when you add new files (or possibly delete a file); or also when you modify an existing file?

@tleedjarv
Contributor

And then, please post the output of stat Archive (any dir where the issue exists) before and after you have added a new file in it on another PC.
So:

  1. stat Dirname
  2. unmount
  3. add a file in Dirname on another PC
  4. bring back the SSD and mount again
  5. stat Dirname
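
For anyone repeating this check, the before/after comparison can also be scripted. The following is a minimal sketch in Python; `dir_stamp` and `changed_fields` are hypothetical helper names, not part of Unison. It records the directory fields that matter here and reports which of them differ between two snapshots.

```python
import os

def dir_stamp(path):
    """Collect the directory fields relevant to this issue."""
    st = os.lstat(path)
    return {
        "mtime_ns": st.st_mtime_ns,        # what the directory optimization relies on
        "ctime_ns": st.st_ctime_ns,
        "inode": st.st_ino,
        "nlink": st.st_nlink,
        "entries": len(os.listdir(path)),  # actual number of directory entries
    }

def changed_fields(before, after):
    """Return the names of the fields that differ between two snapshots."""
    return sorted(key for key in before if before[key] != after[key])
```

A snapshot taken before unmounting can be saved (for example with `json.dumps`) and compared after the drive comes back. If `entries` appears in the result but `mtime_ns` does not, the directory gained or lost an entry without its modification time being updated.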

@Artur2048
Author

OK. i will retest with fastcheck=false and let you know.
I will also try to reproduce the problem without adding new files (by only changing existing files).
...as for now i don't know if the problem would happen in such case.

(the problem is - the issue does not happen every single time....)

But I have reproduced it again and gathered the information you were asking for
this time it's my real set of replicas containing some FLAC files i purchased on PrestoMusic

content of the profile file:
artur@legion:~/Unison-test$ cat ~/.unison/data.prf

label = Sync Music and Archive to Extreme Pro
root = /media/artur/Extreme Pro/
root = /media/artur/Data/

path = Music
path = Archive

fat = true
times = true
log = true
logfile = sync_data.log

Replicas are now in sync.
I have unmounted (properly) the USB drive and mounted on my Windows machine where i have added one new folder and one new text file.
Then after (safely) unmounting USB from Windows laptop i plugged it back to my Linux machine.

New folder and new file is on USB:

artur@legion:~/Unison-test$ ls -l /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res/test
total 0
-rw-r--r-- 1 artur artur 0 Feb 14 15:33 'New Text Document.txt'

But as for now - it's not on the local disk:

artur@legion:~/Unison-test$ ls -l /media/artur/Data/Music/PrestoMusic/Hi-Res/test
ls: cannot access '/media/artur/Data/Music/PrestoMusic/Hi-Res/test': No such file or directory

let's compare amount of files (and subdirectories) in the Hi-Res folder:

artur@legion:~/Unison-test$ ls -l /media/artur/Data/Music/PrestoMusic/Hi-Res/ | wc -l
48

artur@legion:~/Unison-test$ ls -l /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res/ | wc -l
49

Starting Unison to resync

artur@legion:~/Unison-test$ ~/Unison/bin/unison data
Unison 2.53.7 (ocaml 4.14.2): Contacting server...
Looking for changes
Reconciling changes
Nothing to do: replicas have not changed since last sync.

running the strace shows that Unison is calling lstat:
lstat("/media/artur/Data/Music/PrestoMusic/Hi-Res", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0

Stat of the directory (and the subdirectory) before adding new file

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res
  File: /media/artur/Extreme Pro/Music/PrestoMusic/Hi-Res
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 217835      Links: 50
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2024-12-09 17:25:24.000000000 +0700
Modify: 2024-12-09 17:40:22.000000000 +0700
Change: 2024-12-09 17:40:22.000000000 +0700
 Birth: 2025-01-30 23:53:54.600000000 +0700


artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res/test
  File: /media/artur/Extreme Pro/Music/PrestoMusic/Hi-Res/test
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 219772      Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2025-02-14 15:33:38.000000000 +0700
Modify: 2025-02-14 15:33:38.590000000 +0700
Change: 2025-02-14 15:33:38.590000000 +0700
 Birth: 2025-02-14 15:33:38.590000000 +0700

now i'll add NEW file on my local Linux machine:

artur@legion:~/Unison-test$ touch /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res/NEW_FILE

Stat of the directory after adding file:

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res
  File: /media/artur/Extreme Pro/Music/PrestoMusic/Hi-Res
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 217835      Links: 50
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2024-12-09 17:25:24.000000000 +0700
Modify: 2025-02-14 15:44:29.340000000 +0700
Change: 2025-02-14 15:44:29.340000000 +0700
 Birth: 2025-01-30 23:53:54.600000000 +0700

And let's start Unison again

artur@legion:~/Unison-test$ ~/Unison/bin/unison data
Unison 2.53.7 (ocaml 4.14.2): Contacting server...
Looking for changes
Reconciling changes

Extreme Pro    Data               
new file ---->            Music/PrestoMusic/Hi-Res/NEW_FILE  [f] 
new dir  ---->            Music/PrestoMusic/Hi-Res/test  [f] 

2 items will be synced, 0 skipped
0 B to be synced from Extreme Pro to Data
0 B to be synced from Data to Extreme Pro

Proceed with propagating updates? [] 

As we can see - after adding another file on the LOCAL machine - Unison can now see the changes including other - previously undetected file. (but only in the Hi-Res folder)

@tleedjarv
Contributor

Thank you for all the detailed data. You didn't produce quite exactly what I asked for (just stat the parent dir before and after adding a new file on the other machine, don't add anything on the local machine) but I believe I got my suspicion confirmed based on the existing data anyway.

Now, this is not 100% confirmed yet but I believe what is happening here is:

  • Unison has an optimization where it checks if a directory has changed by doing lstat (as you saw) and does not read directory contents if it thinks nothing has changed. Directory contents being the list of files and subdirs.
    • If an existing file or sub-dir has changed, this should still be detected because each file is separately checked for changes, regardless if the parent directory is assumed unchanged.
  • For some reason (yet to be determined), when adding a new file/subdir on the other machine the modification time for the parent directory is not updated. This causes Unison to skip reading the list of files/subdirs in that directory, thus missing the new files.
  • Adding a file on the local machine does update the directory modification time, which is why all the updates are detected on the next sync.

fastcheck=false turns off this optimization (and others).
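
The failure mode described above can be illustrated with a small model. This is a sketch in Python based only on the description in this comment; it is not Unison's actual code (which is OCaml), and `scan_dir`, `archive` and `dir_fastcheck` are names made up for the illustration.

```python
import os

def scan_dir(path, archive, dir_fastcheck=True):
    """Return the entry names to examine in `path`.

    `archive` maps a directory path to (mtime_ns, names) as recorded
    during the previous scan.
    """
    st = os.lstat(path)
    cached = archive.get(path)
    if dir_fastcheck and cached is not None and cached[0] == st.st_mtime_ns:
        # Directory mtime is unchanged: trust the cached listing and
        # skip reading the directory. New entries are missed if the
        # mtime was not updated when they were added.
        return list(cached[1])
    # Otherwise read the directory and refresh the cache.
    names = sorted(os.listdir(path))
    archive[path] = (st.st_mtime_ns, names)
    return names
```

In this model, a file added without a directory mtime update stays invisible until something else bumps the mtime (such as a local `touch` of a new file in that directory) or until the check is bypassed with `dir_fastcheck=False`.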

We should now confirm that this is indeed what is happening and figure out why the directory modification time is not being updated.

@Artur2048
Author

I have retested with fastcheck=false

here's 2 replicas with different content:

artur@legion:~/Unison-test$ find /media/artur/Extreme\ Pro/Music/PrestoMusic/ | wc -l
1757
artur@legion:~/Unison-test$ find /media/artur/Data/Music/PrestoMusic/ | wc -l
1755

(the Extreme Pro has New Folder and New Text Document.txt added on another machine)

let's try first WITHOUT adding fastcheck=false:

artur@legion:~/Unison-test$ ~/Unison/bin/unison data
Unison 2.53.7 (ocaml 4.14.2): Contacting server...
Looking for changes
Reconciling changes
Nothing to do: replicas have not changed since last sync.

Then i ran it again with added fastcheck false:

artur@legion:~/Unison-test$ ~/Unison/bin/unison -fastcheck false data
Unison 2.53.7 (ocaml 4.14.2): Contacting server...
Looking for changes
Reconciling changes

Extreme Pro    Data               
new dir  ---->            Music/PrestoMusic/CD Quality/New folder  [f] 

now i chose "Q" to terminate reconciliation process and get the stats of directories before and after adding file...

This time i'll provide stats on all 3 levels
/media/artur/Extreme Pro/Music/PrestoMusic/CD Quality/New folder/ <-- this is the newly added folder
/media/artur/Extreme Pro/Music/PrestoMusic/CD Quality/
/media/artur/Extreme Pro/Music/PrestoMusic/

Stats below

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/CD\ Quality/New\ folder/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/CD Quality/New folder/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 229964      Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2025-02-14 17:15:42.000000000 +0700
Modify: 2025-02-14 17:15:43.870000000 +0700
Change: 2025-02-14 17:15:43.870000000 +0700
 Birth: 2025-02-14 17:15:43.870000000 +0700

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/CD\ Quality/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/CD Quality/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 227075      Links: 12
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2024-12-09 17:40:22.000000000 +0700
Modify: 2024-12-09 17:40:22.000000000 +0700
Change: 2024-12-09 17:40:22.000000000 +0700
 Birth: 2025-01-30 23:53:46.520000000 +0700

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 226638      Links: 5
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2024-12-09 17:31:24.000000000 +0700
Modify: 2024-12-09 17:31:25.300000000 +0700
Change: 2024-12-09 17:31:25.300000000 +0700
 Birth: 2025-01-30 23:53:46.520000000 +0700

Now i'll add another file on local machine to the "CD Quality" folder:
touch /media/artur/Extreme\ Pro/Music/PrestoMusic/CD\ Quality/NEW-FILE

and let's see stats again:

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/CD\ Quality/New\ folder/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/CD Quality/New folder/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 229964      Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2025-02-14 17:15:42.000000000 +0700
Modify: 2025-02-14 17:15:43.870000000 +0700
Change: 2025-02-14 17:15:43.870000000 +0700
 Birth: 2025-02-14 17:15:43.870000000 +0700

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/CD\ Quality/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/CD Quality/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 227075      Links: 12
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2024-12-09 17:40:22.000000000 +0700
Modify: 2025-02-14 17:50:55.770000000 +0700
Change: 2025-02-14 17:50:55.770000000 +0700
 Birth: 2025-01-30 23:53:46.520000000 +0700

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 226638      Links: 5
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2024-12-09 17:31:24.000000000 +0700
Modify: 2024-12-09 17:31:25.300000000 +0700
Change: 2024-12-09 17:31:25.300000000 +0700
 Birth: 2025-01-30 23:53:46.520000000 +0700

let's rerun unison (without adding fastcheck=false)

artur@legion:~/Unison-test$ ~/Unison/bin/unison data
Unison 2.53.7 (ocaml 4.14.2): Contacting server...
Looking for changes
Reconciling changes

Extreme Pro    Data               
new file ---->            Music/PrestoMusic/CD Quality/NEW-FILE  [f] 
new dir  ---->            Music/PrestoMusic/CD Quality/New folder  [f] 

2 items will be synced, 0 skipped
0 B to be synced from Extreme Pro to Data
0 B to be synced from Data to Extreme Pro

Proceed with propagating updates? [] 

Hmm... it's quite unusual. I've read the description of fastcheck... i would assume it affects the way files are analyzed (not directories). For example if a file has unchanged modification date and size - Unison will assume that the file was not changed.
...but i thought that the directories are still "crawled normally" like doing find ./ and then files are compared either by fast-checking or slow-checking...

Please let me know if you would like to get any more information or retests.

@Artur2048
Author

Ah. my bad. I guess you wanted to see stats of the "CD Quality" directory BEFORE i added file on the other computer...
let me try to reproduce it again.

@Artur2048
Author

OK. reproduced it again, and you were right!

  1. i check stats of directory "Hi-Res"
  2. unmount drive, take it to another PC (Windows)
  3. add subdirectory inside "Hi-Res"
  4. bring the drive back to local laptop and check stats again

ok - before unmounting (at this point replicas are really in sync):

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/Hi-Res/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 226639      Links: 49
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2025-02-14 16:27:02.000000000 +0700
Modify: 2025-02-14 16:27:02.160000000 +0700
Change: 2025-02-14 16:27:02.160000000 +0700
 Birth: 2025-01-30 23:53:54.600000000 +0700

now disk was unmounted and file was added on Windows PC
...and back to Linux:

artur@legion:~/Unison-test$ ~/Unison/bin/unison data
Unison 2.53.7 (ocaml 4.14.2): Contacting server...
Looking for changes
Reconciling changes
Nothing to do: replicas have not changed since last sync.

and the stats indeed are not changed - modification times are the same as before:

artur@legion:~/Unison-test$ stat /media/artur/Extreme\ Pro/Music/PrestoMusic/Hi-Res/
  File: /media/artur/Extreme Pro/Music/PrestoMusic/Hi-Res/
  Size: 131072    	Blocks: 256        IO Block: 131072 directory
Device: 811h/2065d	Inode: 230131      Links: 50
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   artur)   Gid: ( 1000/   artur)
Access: 2025-02-14 16:27:02.000000000 +0700
Modify: 2025-02-14 16:27:02.160000000 +0700
Change: 2025-02-14 16:27:02.160000000 +0700
 Birth: 2025-01-30 23:53:54.600000000 +0700

And after either - adding some file in "Hi-Res" folder or running Unison with -fastcheck false - it synchronizes directories properly.

So.. the other machine where i was adding files (that didn't cause modification times to update) is Windows 11 PC.

File system on my USB-SSD drive is ExFat

@Artur2048
Author

Maybe we could have Unison crawl directories in the regular way ("find ./") and don't rely on their modification times?
and still use "fastcheck" to determine if files were modified?

crawling directory does not take that much time... it would add a fraction of second maybe to the total time
..but disabling fastcheck for both - files and directories - changes total processing time from 1 second to several minutes..

Not sure how Windows treats directories on ExFat removable drives and why the modification times are not updated properly... (I confirm i always unmount my drives safely on Linux as well as on Windows)

@tleedjarv
Contributor

tleedjarv commented Feb 14, 2025

Very good, at least we now know what the underlying problem is.

> I've read the description of fastcheck... i would assume it affects the way files are analyzed (not directories). For example if a file has unchanged modification date and size - Unison will assume that the file was not changed.
> ...but i thought that the directories are still "crawled normally" like doing find ./ and then files are compared either by fast-checking or slow-checking...

Right. The directories part was bolted on to it later, as an "experimental" feature 15 years ago (unison/NEWS.md, lines 538 to 539 in 77cd088: "Experimental update detection optimization: do not read the contents of unchanged directories"). I guess it worked okay enough that it was never polished after that.

> Maybe we could have Unison crawl directories in the regular way ("find ./") and don't rely on their modification times? and still use "fastcheck" to determine if files were modified?
>
> crawling directory does not take that much time... it would add a fraction of second maybe to the total time ..but disabling fastcheck for both - files and directories - changes total processing time from 1 second to several minutes..

Yes, disabling fastcheck is definitely not an option. As to whether always crawling directories is a performance penalty or not, I have no clue. It apparently could have been in 2009 when this optimization was added. Before deciding whether to disable or remove this optimization, it would be great to have some real world testing results on various filesystems and media (I'm specifically thinking of high-latency media like rotating drives and USB sticks). I will first prepare binaries suitable for testing.

Regardless of the corrective action we decide to take, it would still be interesting to find out why the modification time is not updated.

exFAT specification seems to require that modification time be updated https://learn.microsoft.com/en-us/windows/win32/fileio/exfat-specification#746-lastmodifiedtimestamp-lastmodified10msincrement-and-lastmodifiedutcoffset-fields

This random page I found seems to suggest that the modification time should be updated when adding a file: https://answers.microsoft.com/en-us/windows/forum/all/exfat-format-external-drive-with-all-folders/39d33729-2148-4718-84f5-419336a49f81

Edit: found an interesting commit 05faade
So this appears to be a known problem since the beginning but the "fix" was to disable this optimization when Unison is running in Windows (even if NTFS, ReFS or other filesystem), and the "fix" was never applied when syncing to a FAT filesystem in some other OS. I guess at minimum the proper fix is to disable this optimization for a FAT filesystem (incl. exFAT), under any OS.

@gdt
Collaborator

gdt commented Feb 14, 2025

My quick reactions are

  • Adding a file and not changing the directory mod time is buggy.
  • I am not inclined to carve out directories from fastcheck because of small numbers of buggy implementations. Disabling fastcheck on dirs is going to cause reading all directories, which will take time and impact caching.
  • Perhaps we need a way to turn off fastcheck on dirs separately. Could be added to -fat, but maybe FAT32 is ok.
  • Philosophically, I do not like to impose costs on systems that behave properly because other systems have bugs.

And:

  • It would be interesting to see if Linux updates dir modtimes correctly (on exFAT).
  • It would be interesting to see if Windows updates dir modtimes correctly on FAT32

If this situation is really that Windows doesn't update the mod time, please file a bug with MS, and then provide a link to the public bugtracker entry. (I have no idea how that goes, but in theory paid-for proprietary software comes with support.) But first make sure you are running a version of Windows that is currently receiving support and fully up to date.

@gdt gdt changed the title Unison fails to detect new files on removable media in case changes were made on another PC Unison (fastcheck=true) doesn't notice new files (written in exFAT by another computer running Windows) Feb 14, 2025
@gdt gdt added the windows not applicable to systems other than Microsoft Windows label Feb 14, 2025
@gdt
Collaborator

gdt commented Feb 14, 2025

I've retitled now that we understand more, and marked this as a Windows issue (happy to remove that if we find that the not-updating-mtime behavior is more widespread).

@gdt gdt added discuss way forward is unclear; needs discussion of approach to take and why and removed defect unison fails to meet its specification (but doesn't crash; see also "crash") labels Feb 14, 2025
@tleedjarv
Contributor

> • I am not inclined to carve out directories from fastcheck because of small numbers of buggy implementations. Disabling fastcheck on dirs is going to cause reading all directories, which will take time and impact caching.

This needs to be measured. Do we know if there really is any meaningful cost and impact on caching? The previously synced individual files need to be checked anyway. This could be tested separately with a tiny C program perhaps. Alt A: read the dir contents and stat all files in the dir. Alt B: don't read the dir contents and stat all the files in the dir. Of course, to execute alt B, the list of files needs to be provided externally, without reading the dir contents.
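
The two alternatives can be sketched as follows. The comment proposes a tiny C program; this Python version is only an illustration of the comparison, and interpreter overhead will mask part of the difference. The function names `alt_a`, `alt_b` and `timed` are hypothetical.

```python
import os
import time

def alt_a(dirpath):
    """Alt A: read the directory contents, then lstat every entry."""
    names = os.listdir(dirpath)
    for name in names:
        os.lstat(os.path.join(dirpath, name))
    return len(names)

def alt_b(dirpath, known_names):
    """Alt B: skip the directory read; lstat only externally supplied names."""
    for name in known_names:
        os.lstat(os.path.join(dirpath, name))
    return len(known_names)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

For a fair comparison the name list passed to `alt_b` has to be captured beforehand (for example in an earlier run and stored in a file), so that the timed run itself never reads the directory.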

> • It would be interesting to see if Linux updates dir modtimes correctly (on exFAT).

Yes it does, this was demonstrated above.

> • It would be interesting to see if Windows updates dir modtimes correctly on FAT32

According to the commit from 2009 that I referenced, unlikely.

@tleedjarv
Contributor

tleedjarv commented Feb 14, 2025

> Do we know if there really is any meaningful cost and impact on caching?

Interestingly enough, there appears to be a significant difference. I added some code to enable/disable the optimization in question by an env var and ran some primitive tests.

A local sync between two replicas of 400 000 files, no subdirs, no updates ("Nothing to do: replicas have not changed since last sync.").

Cache warm, not purged at any time during testing.
With dirFastCheck=true the entire execution (time unison ...) takes around 2.2 - 2.3 seconds (~0.6 sec sys).
With dirFastCheck=false the entire execution takes around 2.9 - 3.1 seconds (~0.7-0.8 sec sys).

While the runtime increase is 30-40%, it is less than one second for 800 000 files. It would be interesting to see the effect of a cold cache and high-latency media.

Edit: testing with 10 000 files per replica (which should be more representative of an average user) also shows overall time increased by 0.01-0.02 seconds (this increase also correlates well with the one above). This is more significant than I'd imagined.

Edit2: trying to clear caches, with both replicas above the overall execution time increases (hopefully indicating that caches indeed were purged) but the delta between dirFastCheck=true vs false remained the same.
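
As a rough cross-check of the figures quoted in this comment, the arithmetic can be written out. This sketch uses the extremes of the reported timing ranges for the relative increase and their midpoints for the per-file cost; the choice of midpoints is an assumption made for illustration.

```python
def relative_increase(before, after):
    """Fractional runtime increase going from `before` to `after` seconds."""
    return (after - before) / before

# Extremes of the reported ranges (2.2-2.3 s vs 2.9-3.1 s).
low = relative_increase(2.3, 2.9)    # roughly 26%
high = relative_increase(2.2, 3.1)   # roughly 41%

# Midpoint delta spread over 800 000 files (400 000 per replica).
per_file = (3.0 - 2.25) / 800_000

# Predicted extra time for 10 000 files per replica (20 000 in total).
small_replicas = per_file * 20_000
```

The predicted extra time for the smaller test comes out just under 0.02 seconds, which agrees with the observed 0.01-0.02 second increase and supports the remark that the two measurements correlate.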

@tleedjarv
Contributor

For those interested in doing some testing of their own, download a binary from https://github.com/tleedjarv/unison/actions/runs/13329824560

With that binary you can use an environment variable UNISONDEBUG_DIRFASTCHECK to enable/disable the directory read optimization.

Set the env var to true to enable the optimization, and to false to disable it. Any other value will result in an error.
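
A small harness for such a comparison might look like the sketch below. Only the variable name `UNISONDEBUG_DIRFASTCHECK` and its `true`/`false` values come from the comment above; the helper names and the command line in the usage note are assumptions.

```python
import os
import subprocess
import time

ENV_VAR = "UNISONDEBUG_DIRFASTCHECK"

def env_with_dirfastcheck(enabled, base=None):
    """Copy the environment and set the debug variable to 'true' or 'false'."""
    env = dict(os.environ if base is None else base)
    env[ENV_VAR] = "true" if enabled else "false"
    return env

def time_command(cmd, enabled):
    """Run cmd with the variable set; return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd,
        env=env_with_dirfastcheck(enabled),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode, time.perf_counter() - start
```

For example, `time_command(["/path/to/test/unison", "data", "-batch"], enabled=False)` would time one run of the `data` profile with the optimization disabled; the binary path is a placeholder. Running each setting several times and comparing the results gives a more reliable picture than a single run.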

@gdt
Collaborator

gdt commented Feb 14, 2025

And harder to test, but the effect of cache loading on other workloads is also important. If somebody is running this once, that's one thing, but if it's -repeat, cache-changing behavior is more serious.

Reading the 2nd link, it sounds like someone was finding that changing a file didn't update the directory mod time, and it was explained that this is correct, but if you add a file, it's ok. (The answer is a little hard to understand, because it talks about how things are designed vs the specification.)

Regarding the 2009 commit, good find! But that seems to be not really the right conditional. With the benefit of lots of hindsight, it seems the right condition is "filesystem is FAT and was written by Windows". Which you can't tell. Having fastcheckdir and people turning it off in their profile when that's true seems reasonable, in terms of letting people work around Windows bugs while not negatively impacting others. Plus we could default it off with -fat on Windows. That would leave syncing removable media written on Windows on a non-Windows system. Given that unison does remote sync, I suspect this is not super common.

@tleedjarv
Contributor

> And harder to test, but the effect of cache loading on other workloads is also important. If somebody is running this once, that's one thing, but if it's -repeat, cache-changing behavior is more serious.

Is there a meaningful difference (actual cache changing) between the two alternatives I outlined in #1122 (comment) ? I'm not challenging your statement, I'm just trying to wrap my head around it.

So far, I'm thinking that the observed performance difference can be attributed in part to OCaml's Unix library and Unison's own code doing more work (which could have optimization potential) and in part to a modern Linux kernel having an inefficient implementation of readdir(3). I have not yet run any test on any other OS (and possibly will not either).

But this is more a point of curiosity now. As for the actual fix to the issue at hand my current thinking is this:

  • To not introduce a new user preference. This feature has existed for 15 years without a user preference; there is clearly no need to introduce one now.
  • Implement code for detecting a FAT/exFAT filesystem and disable dirFastCheck regardless of OS.
    • I planned to implement FAT filesystem detection anyway due to our discussions around the times preference.
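
One possible shape of such detection on Linux is to look up the mount that contains a path in the mount table (`/proc/self/mounts`). The following is a Python sketch of that idea only; it is not the planned Unison implementation, the helper names are hypothetical, and it would not recognize FAT behind FUSE drivers (reported as `fuseblk`) or network shares.

```python
import os

# Type names used by the Linux kernel drivers for FAT-family filesystems.
FAT_TYPES = {"vfat", "msdos", "exfat"}

def _unescape(field):
    # The mount table encodes space, tab, newline and backslash as octal escapes.
    for esc, ch in (("\\040", " "), ("\\011", "\t"), ("\\012", "\n"), ("\\134", "\\")):
        field = field.replace(esc, ch)
    return field

def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing `path`."""
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mount_point = os.path.normpath(_unescape(parts[1]))
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(mount_point) >= best_len:
                best_len, best_type = len(mount_point), parts[2]
    return best_type

def is_fat(path, mounts_text):
    """True if `path` lives on a filesystem with a FAT-family type name."""
    return fs_type_for(path, mounts_text) in FAT_TYPES
```

On a live system the second argument would be the contents of `/proc/self/mounts`, e.g. `is_fat("/media/artur/Extreme Pro", open("/proc/self/mounts").read())`.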

-fat is irrelevant on Windows. It is only meant to be used on non-Windows OS, and with a successful FAT detection becomes as good as irrelevant (will only have some meaning in case the fs type can't be detected properly, such as an SMB share).

@gdt
Collaborator

gdt commented Feb 14, 2025

What I meant is that reading all the dirs is going to load data into some kind of fs caches in the OS, and displace other data. That is going to cause other programs to see different timing behavior. That's really hard to figure out, but not dismissable. I do see your point that it could be that on all systems the stat calls on the files force the same loading.

I don't think it's clear there is no need for a preference because it's been 15 years. (Really windows is buggy and people running windows should file a bug and MS should fix it; there's no need to change unison, which is not wrong. And therefore nothing to do at all.) Another view is that this usage is really unusual, if it hasn't been reported yet, and affected people disabling fastcheck is an ok workaround.

But seriously, we are talking about accommodating buggy proprietary software, to what extent, and how. Let's wait to see how the bug report gets made and handled at least for a bit. People may need to disable fastcheck on dirs but not files for FAT written by Windows, if "support" of proprietary software is as illusory as seems reasonably likely.

As I understand it, the problem is not FAT; it's Windows mis-handling FAT, and so the disabling should be for FAT which has been written by Windows. I don't see how that can be automatic.

I also don't see how we can detect the underlying fs in any reasonable or reliable way. There could be remote filesystems, union or null mounts, whatever.

I'm also not inclined to disable something which is sound according to standards, in a way that's hidden or harms non-broken usage.

@tleedjarv
Contributor

Disabling fastcheck is not an option. That would mean reading the contents of all files every time.

Disabling the dir optimization on the other hand is very much an option. Disabling it has no known downsides. (Also note that it was never enabled on Windows; we could now enable it for NTFS and ReFS, for example)

We can of course do one better and not disable it universally. If we disable the dir optimization when having detected FAT then we've covered a vast majority (and I'm thinking 99.9+% here) of potential issue cases. It doesn't matter that we don't detect FAT behind layers. It doesn't matter that we don't know if Windows has or has not touched it. Not only because these would be a very tiny minority of the cases. More importantly because, again, disabling the optimization has no downsides.
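To make the trade-off concrete, here is a toy model of the directory optimization as it is described in this thread. It is a simplified sketch, not Unison's actual code: `scan_dir`, its parameters and the `fat_detected` flag are hypothetical, and the real fast path still checks the known children individually.

```python
def scan_dir(archived_mtime, archived_names, current_mtime, list_dir,
             fat_detected=False):
    """Return the set of names the scan believes are in the directory.

    archived_mtime / archived_names: what was recorded at the last sync.
    current_mtime: the directory's modification time right now.
    list_dir: callable that actually reads the directory (the slow path).
    fat_detected: hypothetical flag set when the root is on FAT/exFAT.
    """
    if not fat_detected and current_mtime == archived_mtime:
        # Fast path: directory mtime unchanged, so trust the archive and
        # skip reading the directory. New entries are invisible here.
        return set(archived_names)
    # Slow path: always read the directory.
    return set(list_dir())


# A directory where another system added a file without updating the
# directory's mtime, which is the situation reported in this issue.
on_disk = lambda: ["a.txt", "new.jpg"]

missed = scan_dir(100.0, ["a.txt"], 100.0, on_disk)
found = scan_dir(100.0, ["a.txt"], 100.0, on_disk, fat_detected=True)
```

In this model `missed` lacks `new.jpg` while `found` includes it. The only cost of the second call is the extra directory read.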

I dislike the idea of exposing another obscure preference that nobody knows how or when to use. Even when having the preference, the heavy lifting should still be done by auto-detection, leaving manual tuning for expert users and debugging purposes.

The fact that this issue has seemingly not been reported before is very strange to me. I would have thought that people syncing with USB sticks or drives with (ex)FAT filesystems was a slightly common sight. Since this hasn't come up before, I can only assume that people syncing to FAT drives are either setting fastcheck=false (doubtful) or it shows that this new preference is not needed in practice.

@gdt
Collaborator

gdt commented Feb 17, 2025

This issue seems to be 100% about working around a Windows bug. Assuming that's true (and contrary evidence of course welcome), ethically it's on MS and proprietary software users to fix that. It is fundamentally unreasonable to expect volunteer Free Software developers to accommodate buggy proprietary software because people somehow are ok with that software not being fixed.

I would like to defer everything about this until the OP files a bug with MS and we see the resolution.

@Artur2048
Author

Artur2048 commented Feb 17, 2025

Unfortunately Microsoft does not care about things like that... and would probably not bother to fix it.

As I understand it, the simple and effective "workaround" is already present in the Windows version of Unison.
The Windows version of Unison does not rely on directory modification times... I think the Linux version's behavior should just be consistent.

With option fat=true the directory fast scan could automatically be disabled,
or the dirfastcheck option could be added to allow users to workaround this issue.

Please keep in mind that even if you magically make Microsoft fix their Win-11 system, there are other systems which will remain unpatched (Win-10, Win-7, etc..)

Last, but not least - ExFAT system is used in MANY devices and their SD cards:

  • photo cameras
  • mobile phones
  • tablets

Unison could be used to synchronize contents of SD card which was written by Nikon camera, or Samsung tablet, or any other device...

Also, if someone is using ExFAT on a Linux machine - there's probably a reason.
I have 2 operating systems on my laptop (it's very common practice). My "data partition" is formatted in ExFAT so I can access it from my Linux OS as well as from Windows. I also use Unison to synchronize contents of that partition to a remote NAS. Now any file that was added there while running Windows is missed.
So even if Linux is handling ExFAT correctly... if we see an ExFAT drive in Linux - chances are that this drive is also shared with another OS as well. (otherwise it would probably be formatted with EXT4).

@gdt
Collaborator

gdt commented Feb 17, 2025

Sounds like you are declining to file a bug report.

@acolomb

acolomb commented Feb 17, 2025

If I may chime in... Expecting any change from Microsoft or any other proprietary software vendor here is purely wishful thinking. It just won't happen and we all know that. Even if some vendor fixed their bug, a large portion of users would still be stuck with a version that just doesn't get that update, because the vendor dictates who gets the update, and simple business economics tells me that they will limit the target audience to the absolutely necessary minimum.

Personal experience shows this pattern even with smaller companies, in my case Synology. After several attempts and even some friendly discussions with real persons calling themselves "product management", it all just didn't lead anywhere. Some FOSS-head requesting access to an existing, half-documented API is just economically uninteresting.

So insisting on ethical responsibility doesn't help anybody. Free Software exists because people had a motivation to improve their situation and let others benefit from that work for free. What would the FOSS landscape look like if we had just considered Microsoft Windows' lacking POSIX compliance a "bug on their end"? Time spent on such a discussion is mostly wasted IMHO. It's quite reasonable to add a workaround for a specific system's bugs if that system happens to be used by a large portion of the user base. Especially if Unison's resulting behavior may silently cause troubles that go unnoticed, depending on some very specific file access pattern (added only, no other changes in the same dir, using a sensible OS on the file system).
