Make IgnoreFile more efficient #2086
[EDIT: ignore, I found a Windows machine!]
Hi Patrick, thank you for this PR!
I'm ok with improving the ignore file, as this isn't one of the strongest areas of the tool.
That being said, it is questionable whether this should be fixed at all.
Fantomas has support for multiple paths, so any filtering of files could be solved outside of the tool. The thought has crossed my mind to remove the .fantomasignore file in the next version.
One reason to do this is just to get rid of another dependency (our current solution isn't without hiccups, for example, markashleybell/MAB.DotIgnore#9).
I haven't made up my mind though; there is of course an ease-of-use factor in having our current fantomasignore setup. So, what I'm trying to get at here is the question: 'Is your actual problem solvable outside of Fantomas?'
Lastly, I would ask you to use your own fork of Fantomas when raising PRs. No real reason, other than having the same experience as other contributors. This isn't listed in the contribution guidelines as, well, you are one of the few people who have permission to do this.
Let me know what you think.
src/Fantomas.Extras/IgnoreFile.fs
Outdated
/// Store of the IgnoreFiles present in each folder discovered so far.
/// This is to save repeatedly hitting the disk for each file, and to save
/// loading the IgnoreLists from the disk repeatedly (which is nontrivially expensive!).
let private ignoreFiles: ConcurrentDictionary<string, IgnoreList option> =
I'm wondering if we could not work with a mailbox processor instead?
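For readers unfamiliar with the idea, here is a minimal sketch of what a mailbox-processor-backed cache could look like. It is not the code from this PR: the `CacheMessage` type, `startCache`, and the `load` parameter are hypothetical names, and a `string list option` stands in for `IgnoreList option`. The point is only that a single agent owns the dictionary, so no concurrent collection or locking is needed.

```fsharp
open System.Collections.Generic

/// Hypothetical message protocol for the cache agent.
type CacheMessage =
    | Lookup of folder: string * reply: AsyncReplyChannel<string list option>
    | Invalidate

/// Starts an agent that memoises `load` per folder.
/// `load` is a stand-in (an assumption) for reading and parsing an ignore file.
let startCache (load: string -> string list option) =
    MailboxProcessor<CacheMessage>.Start(fun inbox ->
        // Only the agent's own loop touches this dictionary,
        // so a plain Dictionary is safe here.
        let cache = Dictionary<string, string list option>()

        let rec loop () =
            async {
                let! message = inbox.Receive()

                match message with
                | Lookup (folder, reply) ->
                    match cache.TryGetValue folder with
                    | true, cached -> reply.Reply cached
                    | _ ->
                        let loaded = load folder
                        cache.[folder] <- loaded
                        reply.Reply loaded
                | Invalidate -> cache.Clear()

                return! loop ()
            }

        loop ())
```

A caller would ask with `agent.PostAndReply(fun ch -> Lookup("some/folder", ch))` and reset the cache with `agent.Post Invalidate`. Messages are handled one at a time, in order, which is what removes the race conditions a shared dictionary would have.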
src/Fantomas.Extras/IgnoreFile.fs
Outdated
/// loading the IgnoreLists from the disk repeatedly (which is nontrivially expensive!).
let private ignoreFiles: ConcurrentDictionary<string, IgnoreList option> =
    ConcurrentDictionary()

let isIgnoredFile (file: string) =
The daemon is using this file function so I guess any changes to the content of the ignore will not be picked up.
Perhaps we could add a timestamp of when this was parsed to the cache and re-parse when the file changed.
Thinking out loud here.
Noted, thanks - I don't think GitHub lets me change the source branch of a PR, but future ones will come from my fork.

The problem is solvable outside Fantomas, but it's not all that easy. Specifically, GR's internal CI pipeline definitions make it a mighty faff to work out what files have changed in a given pull request or even a given commit, so I gave up on the idea "let's have Fantomas run on precisely the set of changed files". (This also makes a pre-push Git hook much less easy, since now you need to know somehow what merge base to use for comparison; most of the time it'll just be master and you can I think my intended uses of

Note that the converse strategy is very easy for rolling our own: keeping a list of files and directories that Fantomas should format. However, that approach means new files would be unformatted by default, which I really don't want to encourage!
Hey, thanks for elaborating. Sorry if I came across a bit too strong in my initial response. We can pursue this PR if the daemon usage is not impacted by it. That is my biggest concern at the moment.
It would be easy enough to adjust so that e.g. the cache was invalidated on a 5sec or 10sec timer or something, although I don't know how I'd go about making it automatically pick up changes to the

By the way, I can't reproduce the CI failure. On the assumption that I've got my concurrent code wrong, I'll take your advice and switch to a mailbox processor.
I guess the big question, if we go ahead with this PR, is what the cache invalidation policy should be. I'm inclined to say something simple like "invalidate the entire cache every five seconds", though the structure I've got here would be easily adapted to invalidating only specific files.
I put up an example cache invalidation policy in bb01625.
let ignoreFileInvalidation =
    let rec go () =
        async {
            do! Async.Sleep(TimeSpan.FromSeconds 5.0)
Do we really need a time-based mechanism to clear the cache? This rubs me the wrong way a bit.
Could we not parse the ignore file on demand and just keep track of the location of the ignore file?
Or we parse the ignore file and take the file's last modified timestamp into account.
We re-evaluate this timestamp and, if the file is newer, we invalidate the cache.
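To make the timestamp idea concrete, here is a rough sketch, not code from this PR. The `CachedIgnoreFile` record, `loadPatterns`, and the line-based `parse` function are assumptions for illustration; the real tool parses ignore files into an `IgnoreList`. Only `File.GetLastWriteTimeUtc` and the other `System.IO` calls are real APIs.

```fsharp
open System
open System.Collections.Concurrent
open System.IO

/// Hypothetical cache entry: the parsed patterns plus the
/// last-write timestamp the file had when it was parsed.
type CachedIgnoreFile =
    { LastWriteUtc: DateTime
      Patterns: string list }

let cache = ConcurrentDictionary<string, CachedIgnoreFile>()

/// Stand-in parser (an assumption): one pattern per non-empty line.
let parse (path: string) : string list =
    File.ReadAllLines path
    |> Array.map (fun line -> line.Trim())
    |> Array.filter (fun line -> line <> "")
    |> List.ofArray

/// Returns the patterns of the ignore file at `path`, re-parsing only
/// when the file's last-write timestamp differs from the cached one.
let loadPatterns (path: string) : string list =
    let stamp = File.GetLastWriteTimeUtc path

    match cache.TryGetValue path with
    | true, entry when entry.LastWriteUtc = stamp -> entry.Patterns
    | _ ->
        let entry =
            { LastWriteUtc = stamp
              Patterns = parse path }

        cache.[path] <- entry
        entry.Patterns
```

This still costs one metadata read per lookup, and it only covers a known ignore-file location; it does not notice a new ignore file appearing in a closer directory.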
It's not in general possible to keep track of the location of the ignore-file, because someone might put another ignore-file nearer to the file under consideration. If we even might invalidate the cache on every request, then we must traverse the entire filesystem up to the cached ignore-file on every request (unless the OS can call us when a relevant file has changed).
Which is not per se a problem, but filesystem access is really quite slow.
That scenario of someone adding a new .fantomasignore file somewhere closer feels like quite a theoretical one.
It is highly unlikely to begin with that there is a .fantomasignore file anywhere other than the repository root.
This is an assumption, but I would really like to go with it. When we first introduced the concept of the ignore file, we only checked the working directory of the cli tool.
Because that didn't work for the daemon, we started traversing parent directories until we potentially found something.
Maybe that was a bad idea and we should split up the mechanism: only traverse for the daemon (with no cache at all) and look in the pwd for the regular cli tool usage.
Ah, if traversal isn't really supported for the command line tool then that would be a lot simpler - it would be a tiny bit more inconvenient but really not a big problem.
I would like to preserve the caching, though (now with only one element of the cache). Repeatedly hitting the disk and parsing an ignore file is still expensive.
-------- Original Message --------
…On 14 Feb 2022, 09:35, Florian Verdonck wrote:
@nojaf commented on this pull request.
---------------------------------------------------------------
In [src/Fantomas.CoreGlobalTool/Daemon.fs](#2086 (comment)):
> @@ -30,10 +33,44 @@ type FantomasDaemon(sender: Stream, reader: Stream) as this =
let exit () = disconnectEvent.Set() |> ignore
+ let ignoreFileStore =
+ new IgnoreFileStore<_>(FileSystem(), IgnoreList, (fun ignoreList path -> ignoreList.IsIgnored(path, false)))
+
+ let ignoreFileCancellation = new CancellationTokenSource()
+
+ let ignoreFileInvalidation =
+ let rec go () =
+ async {
+ do! Async.Sleep(TimeSpan.FromSeconds 5.0)
(I'm locally developing on aarch64-darwin net6.0, which means my local environment is considerably different from the net5.0 that Paket etc are set up for; and there is at least one bug in Fake on my architecture. So there may be some iterating here while I satisfy the CI checks.)
The purpose of this PR is to dramatically improve the perf of the IgnoreFile checks.
Before: on every file, we traversed the filesystem, potentially up to the filesystem root, then read in the first .fantomasignore file we found.

After: we touch the filesystem only if we can't already know the answer to the question "is there a .fantomasignore file here?"; and we only ever parse a given .fantomasignore file once (modulo multithreading race conditions).

Even more efficient for the -r mode of Fantomas (which I know is deprecated, but I really like it :( ) would be if we could somehow parse the .fantomasignore files up front, discovering all the skipped files, and then simply not touch them. But I've tried to keep this PR very tightly contained.

I've also fixed a potential bug where the alternative directory separator character was being ignored in isIgnoredFile; an alternative fix would be to use fullPath instead of file, which I think would normalise slashes, but this way is more obviously correct.

Question: is this problematic for the daemon? A long-running Fantomas process might need to invalidate this currently-permanent cache at some point; otherwise the user could edit the .fantomasignore files but we'll never reread them.
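As a rough illustration of the "after" behaviour, here is a simplified sketch under stated assumptions. It is not the PR's implementation: `nearestIgnoreFile` and `findIgnoreFile` are hypothetical names, and the sketch caches only the location of the nearest ignore file rather than a parsed `IgnoreList`. It expects an absolute directory path.

```fsharp
open System
open System.Collections.Concurrent
open System.IO

/// Per-directory memo: Some path when an ignore file applies to that
/// directory (found there or in an ancestor), None when there is none
/// all the way up to the filesystem root.
let nearestIgnoreFile = ConcurrentDictionary<string, string option>()

/// Walks upwards from `dir` (assumed absolute), consulting the memo first
/// so that each directory is probed on disk at most once, barring races.
let rec findIgnoreFile (dir: string) : string option =
    match nearestIgnoreFile.TryGetValue dir with
    | true, known -> known
    | _ ->
        let candidate = Path.Combine(dir, ".fantomasignore")

        let result =
            if File.Exists candidate then
                Some candidate
            else
                // GetDirectoryName returns null once we reach the root.
                let parent = Path.GetDirectoryName dir

                if String.IsNullOrEmpty parent then
                    None
                else
                    findIgnoreFile parent

        nearestIgnoreFile.[dir] <- result
        result
```

Because every directory visited on the way up is recorded, formatting many files in sibling folders only probes the shared ancestors once. The flip side is the daemon concern raised in the question above: nothing in this sketch ever evicts an entry.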