Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

symlinks can cause duplicate entries #138

Open
43081j opened this issue Mar 24, 2025 · 2 comments
Open

symlinks can cause duplicate entries #138

43081j opened this issue Mar 24, 2025 · 2 comments

Comments

@43081j
Copy link
Contributor

43081j commented Mar 24, 2025

Given the following:

import mock from "mock-fs";
import { fdir } from "./dist/index.js";

mock({
  "/sym/linked": {
    "file-1": "file contents",
  },
  "/some/dir": {
    dirSymlink: mock.symlink({
      path: "/sym/linked",
    }),
    fileSymlink: mock.symlink({
      path: "/sym/linked/file-1",
    }),
  },
});
const api = new fdir().withSymlinks().crawl("/");
const files = await api.sync();
console.log(files);

files will contain the same entry 3 times (/sym/linked/file-1).

if you change this to withPromise (async), it will contain the same entry 2 times.

so there are two problems here:

  1. inconsistency between async and sync
  2. duplicating symlink resolutions

inconsistent results (async vs sync)

This is because async roughly visits the FS breadth-first (i.e. it visits all the top-level files, then 2nd level, and so on).

because of that, it visits /sym/linked before visiting /some/dir/dirSymlink. that means this code is hit and /sym/linked isn't queued up again (because the callback is never called thanks to this early return).

when we switch to sync mode, we visit the FS depth-first.

this means we visit /some/dir/dirSymlink before we visit /sym/linked. of course, that means we haven't yet visited it, so we queue it up to be visited.

meanwhile, when we do eventually visit /sym/linked normally (i.e. not because of the symlink), we don't check visited at all and push the result.

because of all of this, we get 2 copies of /sym/linked

duplicating symlink resolutions

the race condition explained above causes one of the duplicates. but that code isn't even really meant to be dealing with dupes, it exists to avoid infinite loops.

so this points at a deeper problem we should fix in the walker most likely, rather than trying to fix it in the symlink resolution code

basically, if we visit symlinks before the things they link to, we push them twice right now.

somewhere, we should probably be checking visited before traversing directories in the walker (visited is only directories afaict)

@thecodrr
Copy link
Owner

Thank you for the detailed bug report. Will take a look at this soon. Feel free to open a PR if you have a working fix for this.

@43081j
Copy link
Contributor Author

43081j commented Mar 25, 2025

i don't quite have a fix for it yet but will try take a look this week unless you beat me to it

if anyone else wants to take a look too (or yourself), i think my suggestion is that when we encounter a directory, we always check if it is already in visited. this solves one half of the problem

the other half of the problem is that you may have a symlink to a file, and those don't get tracked in visited. so this can happen:

  • visit /foo/symlink-1 (which resolves to /bar/file-1)
  • visit /bar/file-1

we now have ['/bar/file-1', '/bar/file-1']. but in async, and if we visited them in reverse, we'd have one entry.

this suggests we need to de-dupe in pushFile, but maybe that will affect performance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants