Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dirty the dependents of uncacheable nodes #9015

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Jan 26, 2020

Problem

The rust level Node::cacheable flag is currently only used to mark @goal_rules as uncacheable (because they are allowed to operate on @sideeffecting types, such as the Console and the Workspace). But since the implementation of cacheable did not allow it to operate deeply in the Graph, we additionally needed to mark their parent Select nodes uncacheable, and could not use the flag in more positions.

Via #7350, #8495, #8347, and #8974, it has become clear that we would like to safely allow nodes deeper in the graph to be uncacheable, as this allows for the re-execution of non-deterministic processes, or re-consumption of un-trackable state, such as:

  1. a process receiving stdin from a user
  2. an intrinsic rule that pokes an un-watched file on the filesystem
  3. reading from a stateful process like git

Note that these would all be intrinsic Nodes: it's not clear that we want to expose this facility to @rules directly.

Solution

Finish adding support for uncacheable nodes. Fixes #6598.

When an uncacheable node completes, it will now keep the value it completed with (in order to correctly compute a Generation value), but it will re-compute the value once per Session. The accurate Generation value for the uncacheable node allows its dependents to "clean" themselves and not re-run unless the uncacheable node produced a different value than it had before.

Result

The Node::cacheable flag may be safely used deeper in the graph, with the semantics that requests for any of an uncacheable node's dependents will cause it to re-run once per Session. The dependents will not re-run unless the value of the uncacheable node changes (regardless of the Session).

@stuhood
Copy link
Member Author

stuhood commented Jan 26, 2020

The first commit removes some unnecessary generic parameters that I encountered while adding the Context parameter to cacheable, but does not change any logic: the second contains the described change.

@stuhood

This comment has been minimized.

@stuhood stuhood force-pushed the stuhood/dirty-the-dependents-of-uncacheable-nodes branch from 2aead47 to e6c3662 Compare January 26, 2020 20:55
@stuhood
Copy link
Member Author

stuhood commented Jan 26, 2020

Ok, reviewable!

@stuhood stuhood force-pushed the stuhood/dirty-the-dependents-of-uncacheable-nodes branch from e6c3662 to b3dfed9 Compare January 26, 2020 21:05
Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! One open question about uncacheable nodes having dependencies we should resolve, then I'm happy to approve :)

}
}

/// Iff the value is Clean, mark it Dirty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like either this should happen also if Uncacheable, or Uncacheable nodes shouldn't be allowed to have dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soooo, possibly? dirty means: "might run again in any session", uncacheable means: "will only run again in a new session". So the gap between them is that if something uncacheable is dirtied during a session, it's unclear whether to re-run it.

I think that because we're asserting that these are not side-effecting operations, re-running them (in or out of a single session) is safe. Not clear... will take a look at what it might do to the data model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suspect if we're not going to dirty them, we shouldn't allow deps; I'm imagining something like (I'm aware we don't support non-cacheable @rules now, but we should make the framework force us to think about it if we add them):

@rule(cacheable=False)
def foo():
  executable = yield Get[Snapshot](some_path_globs)
  do_sideeffecting_thing_with(executable)

where the side-effecting rule has a dep, someone edits the snapshotted file, we re-snapshot because that was invalidated, but we don't re-run the rule because it's side-effecting. Erroring out (ideally on the first get from the side-effecting rule, rather than on invalidation), rather than silently not running the rule, feels safer/less surprising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ftr: this is not supposed to represent side-effecting things. Those are still allowed only at the root.

Copy link
Member Author

@stuhood stuhood Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, apologies. I just realized that your example is still relevant in a @goal_rule (marked uncacheable), meaning that sideeffects are in fact relevant here.

But I'm not sure I understand why your example motivates 1) erroring, or 2) re-running the uncacheable rule. If anything, it feels like that motivates not dirtying it, and keeping the "exactly once per session" semantics, which would have the effect that the rule ran once to completion per session, even if things below it and above it in the graph re-ran.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A specific (admittedly, somewhat contrived) example I can think of is:

  • Some checked-in file contains a "git branch name which should be used to fetch golden test files"
  • Some uncacheable rule will run git to get the contents of a file at that sha. It depends on reading that file, to construct the command line to run. Something like:
git_branch_name = read(some_path_in_git)
golden_value = execute("git", "show", f"{git_branch_name}:golden_file")
use_golden_value()

As I believe the current PR stands, if this rule is marked as once-per-session, and the contents of some_path_in_git changes, we won't re-run this rule.

It's unclear whether we should.
I can argue that we shouldn't re-run (because in intent, "once-per-session" isn't actually an "exactly-once" guarantee, it's an "at-least once" guarantee - the point of the feature is to say "Yeah, this otherwise would be a cache hit, but we're going to pretend it wasn't because we know it reads external state").
I can argue that it should (because something regular which it depends on was invalidated, and that's how invalidation generally works).

Not allowing deps feels like a safe thing to do, because forcing people to decompose their rules to be one of "uncacehable, but no deps" or "if any of its deps change, it will rerun" seems reasonable. (Except ugh then goal_rules still need to be special).
Re-running feels like a safe thing to do, because we're saying these things shouldn't be side-effecting, so it's safe for us to run these things as often as we want.
Allowing deps, but not re-running, feels like it opens up a chance for stale results and unsurprising behaviour.

Copy link
Member Author

@stuhood stuhood Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except ugh then goal_rules still need to be special

Right... this was my realization above. And it's less about whether we want side-effecting things deeper in the graph (we definitely do not), and more about whether we want multiple types of classifications of nodes (IMO, we probably do not any time soon).

I think the way I see this is that because cacheable rules are deterministic-ish and not side-effecting, automatically retrying them to attempt to improve the observed transactionality of rules is good and healthy, and that's what we do when we dirty nodes while they run. But, it's not strictly required... it's best effort. Given that, not making an effort for uncacheable nodes (and not attempting to differentiate them from side-effecting nodes) seems reasonable.

// independent of matching Generation values. This is to allow for the behaviour that an
// uncacheable Node should always have dirty dependents, transitively.
if !entry.node().cacheable(context) || !entry.is_clean(context) {
complete_as_dirty = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complete_as_dirty is ignored if !cacheable - should this maybe be a tri-state enum we can match on, rather than a function call which we'll repeat and a bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than a function call which we'll repeat and a bool?

We don't repeat the function call... the calls to cacheable(..) here are for the dependencies, while the call inside Entry::complete is for the node itself. Given that, I don't think the enum makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On re-read, I think I see what you are suggesting... can try that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this isn't straightforward: the complete_as_dirty boolean comes into play for most of the states of the Entry::complete(.., result) parameter, which means that replacing those two parameters with an enum would require a five state enum I think. I tried it and ran into a roadblock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... Do you know what the names of those 5 enum variants would be? I'm curious as to whether that enum would be more clear (even if it's more verbose), given that my read of this code was that there would be 3 enum variants, it seems like I mis-understood the behaviour as it's currently written...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relevant properties are:

  • node was/wasn't cleaned - It's possible for a node that depends on an uncacheable node to be "cleaned", which means that our result value is None and skips cloning its value to complete it.
  • complete as dirty/clean - This applies to both "node was/wasn't cleaned" states: a node that was cleaned without re-running uses it's previous value, but stays dirty. It's not possible for the caller of Entry::complete to know this though, because the previous value is stored in the Entry (without shenangians).
  • (un)cacheable - When a node is uncacheable, the properties above aren't relevant; otherwise they are.

It's possible that the closure that "cleans" a Node (here) could clone the previous value, and then Graph::complete would be able to compute the final result value to pass to Entry::complete... but I think that would require cloning the previous result value in order to clean things, which is not cheap.

In short: I don't see a good way to do this right now.

Copy link
Member Author

@stuhood stuhood Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would clarify things to rename "complete_as_dirty" to "has_dirty_dependencies", which makes it less verby, and clarifies that not using that value in some codepaths is "a-ok"?

@stuhood stuhood force-pushed the stuhood/dirty-the-dependents-of-uncacheable-nodes branch from b3dfed9 to 550b650 Compare January 27, 2020 19:04
Copy link
Contributor

@gshuflin gshuflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me - I don't have any comments other than the ones @illicitonion already mentioned.

@stuhood stuhood force-pushed the stuhood/dirty-the-dependents-of-uncacheable-nodes branch from 550b650 to 4c02bc1 Compare February 10, 2020 06:28
@stuhood
Copy link
Member Author

stuhood commented Feb 10, 2020

Rebased and added a commit that uses the SessionId in Node::peek, which required propagating it quite a few places, but fixed tests which were failing because they could no longer view errors rendered in traces.

@stuhood
Copy link
Member Author

stuhood commented Feb 13, 2020

@illicitonion , @cosmicexplorer : Aside from figuring out what order to land this in relative to #8858, I think it is ready to go.

Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!! As mentioned in #8858 this PR is cleared to land first!

fn reachable_digest_count(&self, roots: &[N]) -> usize {
fn reachable_digest_count(&self, roots: &[N], context: &N::Context) -> usize {
// TODO: This is a surprisingly expensive method, because it will clone all reachable values by
// calling `peek` on them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment!

/// have Session-specific semantics. More than one context object might be associated with a
/// single caller "session".
///
type SessionId: Clone + Debug + Eq;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

@@ -267,6 +277,14 @@ impl Scheduler {
m
}

///
/// Return all Digests currently in memory in this Scheduler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly add a note on what this is useful for?

@stuhood stuhood force-pushed the stuhood/dirty-the-dependents-of-uncacheable-nodes branch from 4c02bc1 to aa82b43 Compare February 15, 2020 22:10
@stuhood stuhood merged commit da5992d into pantsbuild:master Feb 15, 2020
@stuhood stuhood deleted the stuhood/dirty-the-dependents-of-uncacheable-nodes branch February 15, 2020 22:11
@stuhood
Copy link
Member Author

stuhood commented Feb 15, 2020

Renamed complete_as_dirty to has_dirty_dependencies, and gambled that that wouldn't break CI.

Thanks for the reviews!

stuhood added a commit that referenced this pull request Mar 12, 2020
#9271)

### Problem

`StoreGCService` was expecting a `SchedulerSession`, but getting a `Scheduler`. This meant that if `pantsd` ran long enough (the default value for "long enough" was too long to ever be caught in tests), it would crash with a failure to call `lease_files_in_graph` due to the method signature change in #9015.

### Solution

Although just adding enough type hints here would hypothetically be enough for mypy to catch this (?), additionally added a coverage test for `StoreGCService`.

### Result

`pantsd` stays up as long as it should.
stuhood added a commit that referenced this pull request Mar 12, 2020
#9271)

### Problem

`StoreGCService` was expecting a `SchedulerSession`, but getting a `Scheduler`. This meant that if `pantsd` ran long enough (the default value for "long enough" was too long to ever be caught in tests), it would crash with a failure to call `lease_files_in_graph` due to the method signature change in #9015.

### Solution

Although just adding enough type hints here would hypothetically be enough for mypy to catch this (?), additionally added a coverage test for `StoreGCService`.

### Result

`pantsd` stays up as long as it should.
wisechengyi pushed a commit that referenced this pull request May 4, 2020
#9271)

### Problem

`StoreGCService` was expecting a `SchedulerSession`, but getting a `Scheduler`. This meant that if `pantsd` ran long enough (the default value for "long enough" was too long to ever be caught in tests), it would crash with a failure to call `lease_files_in_graph` due to the method signature change in #9015.

### Solution

Although just adding enough type hints here would hypothetically be enough for mypy to catch this (?), additionally added a coverage test for `StoreGCService`.

### Result

`pantsd` stays up as long as it should.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dirty the dependents of an uncacheable Node
4 participants