Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include nested gitignores with pants_ignore to ignore for file watching #18654

Closed

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Apr 2, 2023

Closes #5682.

We need a single Gitignore for the whole Pants run to set up here:

let ignorer =
GitignoreStyleExcludes::create_with_gitignore_file(ignore_patterns, gitignore_file)
.map_err(|e| format!("Could not parse build ignore patterns: {e:?}"))?;
let watcher = if watch_filesystem {
let w = InvalidationWatcher::new(executor.clone(), build_root.clone(), ignorer.clone())?;
w.start(&graph)?;
Some(w)
} else {
None
};

We can do that by merging gitignores from subdirectories as if they were in the top-level gitignore. We only need to relativize all their globs and ensure they take higher precedence.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

Note that the downside of using walkdir is that our filesystem watching will not be invalidated when .gitignore files are changed. For the root .gitignore we work around that by adding .gitignore as an invalidation glob which restarts pantsd on change, but I don't know if we should do that in this case, because it would mean walking the entire repository on startup.

I think that I would be ok with noting that as a caveat in the help for the option, but it would be worth opening a new issue about.

@Eric-Arellano
Copy link
Contributor Author

Note that the downside of using walkdir is that our filesystem watching will not be invalidated when .gitignore files are changed. For the root .gitignore we work around that by adding .gitignore as an invalidation glob which restarts pantsd on change, but I don't know if we should do that in this case, because it would mean walking the entire repository on startup.

This PR already now walks the entire repository on startup to create the ignores.

This would be by changing pantsd_invalidation_globs, right? Possibly setting a default arg to **/.gitignore?

I think that I would be ok with noting that as a caveat in the help for the option, but it would be worth opening a new issue about.

I'm okay with a warning, but strongly prefer fixing it if feasible.

@Eric-Arellano Eric-Arellano changed the title [blocked by #18649] Include nested gitignores with pants_ignore to ignore for file watching Include nested gitignores with pants_ignore to ignore for file watching Apr 5, 2023
@Eric-Arellano Eric-Arellano marked this pull request as ready for review April 5, 2023 12:49
@stuhood
Copy link
Member

stuhood commented Apr 6, 2023

This would be by changing pantsd_invalidation_globs, right? Possibly setting a default arg to **/.gitignore?

Yea. And my concern would be that that is the first "entire repo" glob in there (rather than being anchored somewhere at the root), meaning that the glob that Pants needs to poll to decide whether to restart is invalidated by any change anywhere in the repo.

I'm not sure how much of a concern that is though. #10705 will make it cheaper to clean nodes which haven't actually been impacted by a change.

@Eric-Arellano
Copy link
Contributor Author

And my concern would be that that is the first "entire repo" glob in there (rather than being anchored somewhere at the root), meaning that the glob that Pants needs to poll to decide whether to restart is invalidated by any change anywhere in the repo.

Ah, that's the difference. This current implementation only walks the entire repo once, whereas if we change invalidation globs we'd now be polling the whole repo.

But question, is that any different than the status quo? For example, this is our current config:

pants/pants.toml

Lines 48 to 54 in c49e56a

pantsd_invalidation_globs.add = [
"!*_test.py",
"!BUILD",
# NB: The `target` directory is ignored via `pants_ignore` below.
"src/rust/engine/**/*.rs",
"src/rust/engine/**/*.toml",
]

Unless the pattern is prefixed by / or has a / in the middle of the pattern, it is equivalent to **/pattern, per Git's ignore semantics. So would this change be any worse than before?

@stuhood
Copy link
Member

stuhood commented Apr 6, 2023

Unless the pattern is prefixed by / or has a / in the middle of the pattern, it is equivalent to **/pattern, per Git's ignore semantics. So would this change be any worse than before?

I think that because of how we implement our globs, those existing globs aren't equivalent: we don't have recursive semantics by default in our globs (without a leading **), so globs are implicitly anchored.

@cognifloyd
Copy link
Member

Unless the pattern is prefixed by / or has a / in the middle of the pattern, it is equivalent to **/pattern, per Git's ignore semantics. So would this change be any worse than before?

I think that because of how we implement our globs, those existing globs aren't equivalent: we don't have recursive semantics by default in our globs (without a leading **), so globs are implicitly anchored.

@stuhood I don't know about that. These globs look like they follow the gitignore rules. Are you saying they will only match files in the repo root?

pants/pants.toml

Lines 48 to 50 in 2486ba8

pantsd_invalidation_globs.add = [
"!*_test.py",
"!BUILD",

# Explicitly specified globs are already relative, and are added verbatim.
invalidation_globs.update(
("!*.pyc", "!__pycache__/", ".gitignore", *bootstrap_options.pantsd_invalidation_globs)
)

@stuhood
Copy link
Member

stuhood commented May 25, 2023

@stuhood I don't know about that. These globs look like they follow the gitignore rules. Are you saying they will only match files in the repo root?

Confusingly, exclude patterns do match gitignore rules fairly precisely (...because they are implemented using a gitignore crate). But include rules do not, and are a separate implementation in order to provide laziness and filesystem watching.

The difference in this case is that our include rules are not implicitly recursive. To make them recursive, you need the **.

@benjyw
Copy link
Contributor

benjyw commented Sep 16, 2023

What's the status on this?

@stuhood
Copy link
Member

stuhood commented Sep 18, 2023

I think that this approach isn't feasible from a performance perspective. I've closed #5682 and opened #19860, specific to the missing piece here.

@stuhood stuhood closed this Sep 18, 2023
@Eric-Arellano Eric-Arellano deleted the nested-gitignores branch September 18, 2023 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support nested .gitignore files in --pants-ignore-use-gitignore
4 participants