Two targets can swap positions with pantsd #7583

illicitonion · 2019-04-17T15:05:38Z

Before this PR, nothing would remove the edges of a dirty node, so if
two nodes swapped positions in the graph (e.g. if a dependency between
two targets inverted), a cycle would be detected.

With this PR, if we detect a cycle but there may be dirty edges in
play, we fully clear that node (including removing its edges), which
causes it to be re-triggered from scratch.

This is specifically in place to handle the cycle scenario - the dirty
bit and dependency Generations are still the primary mechanisms for
handling re-use of old versions.
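As a rough illustration of the behaviour described above, here is a toy sketch. It is not the engine's `Graph<N>`: `ToyGraph`, `invalidate`, `clear` and `add_dependency` are all hypothetical names, and it only clears the dirty destination node rather than reasoning about every dirty node on the path.

```rust
use std::collections::{HashMap, HashSet};

/// A toy dependency graph keyed by node name (hypothetical, for illustration only).
#[derive(Default)]
struct ToyGraph {
    /// src -> set of nodes that src depends on.
    edges: HashMap<&'static str, HashSet<&'static str>>,
    /// Nodes that were invalidated but whose old edges are still in the graph.
    dirty: HashSet<&'static str>,
}

impl ToyGraph {
    /// True if `dst` can already reach `src`, so adding src -> dst would close a cycle.
    fn would_cycle(&self, src: &'static str, dst: &'static str) -> bool {
        let mut stack = vec![dst];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == src {
                return true;
            }
            if seen.insert(node) {
                if let Some(deps) = self.edges.get(node) {
                    stack.extend(deps.iter().copied());
                }
            }
        }
        false
    }

    /// Marks a node dirty without touching its edges.
    fn invalidate(&mut self, node: &'static str) {
        self.dirty.insert(node);
    }

    /// Fully clears a node: drops its outbound edges and its dirty bit.
    fn clear(&mut self, node: &'static str) {
        self.edges.remove(node);
        self.dirty.remove(node);
    }

    /// Adds src -> dst. If that looks like a cycle and `dst` is dirty, the cycle may be
    /// an artifact of stale edges, so clear `dst` and check once more before failing.
    fn add_dependency(&mut self, src: &'static str, dst: &'static str) -> Result<(), String> {
        if self.would_cycle(src, dst) {
            if self.dirty.contains(dst) {
                self.clear(dst);
            }
            if self.would_cycle(src, dst) {
                return Err(format!("cycle: {} -> {}", src, dst));
            }
        }
        self.edges.entry(src).or_default().insert(dst);
        Ok(())
    }
}
```

With `a -> b` recorded, invalidating both nodes and then adding `b -> a` succeeds, because the stale `a -> b` edge is dropped when `a` is cleared. The same inversion between two clean nodes is still reported as a cycle.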

There's an ugliness here that we still don't remove obsolete edges, so
if Generation 2 of a node has differing dependencies from Generation 1,
the dependency from Generation 1 will still dirty Generation 2. We may
want to consider solving that separately as/when it becomes a
significant issue, or we may want to re-work this PR to do something
like that... This PR happens to cover a part of that problem, but only
where it causes definitive problems (a fake cycle) rather than also
where it causes performance problems.

There's probably a slightly more principled solution here along the
lines of:

* Rather than using () as an edge weight in the graph, use the
  Generation of the dependee Node as an edge weight.
* When doing cycle detection, compare the edge weight against the
  generation of the node, and ignore obsolete edges.

but I would want to think about that a lot more before doing it...
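To make the idea concrete, a minimal sketch under some assumptions: I'm reading "dependee" as the node that owns the outbound edge, generations are plain counters, and `GenGraph`, `rerun` and `would_cycle` are made-up names rather than anything in the engine.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = usize;
type Generation = u32;

/// Hypothetical graph where each edge is weighted with the generation its
/// source node had when the edge was recorded.
#[derive(Default)]
struct GenGraph {
    generation: HashMap<NodeId, Generation>,
    /// src -> [(dst, generation of src when the edge was added)]
    edges: HashMap<NodeId, Vec<(NodeId, Generation)>>,
}

impl GenGraph {
    /// Current generation of a node (0 if it has never re-run).
    fn current(&self, node: NodeId) -> Generation {
        *self.generation.get(&node).unwrap_or(&0)
    }

    /// Simulates a node re-running: its generation advances, old edges stay in place.
    fn rerun(&mut self, node: NodeId) {
        *self.generation.entry(node).or_insert(0) += 1;
    }

    /// Records src -> dst, weighted with src's current generation.
    fn add_edge(&mut self, src: NodeId, dst: NodeId) {
        let weight = self.current(src);
        self.edges.entry(src).or_default().push((dst, weight));
    }

    /// True if adding src -> dst would close a cycle, following only edges whose
    /// weight matches the current generation of their source (obsolete edges are skipped).
    fn would_cycle(&self, src: NodeId, dst: NodeId) -> bool {
        let mut stack = vec![dst];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == src {
                return true;
            }
            if !seen.insert(node) {
                continue;
            }
            let live = self.current(node);
            for &(next, weight) in self.edges.get(&node).into_iter().flatten() {
                if weight == live {
                    stack.push(next);
                }
            }
        }
        false
    }
}
```

The obsolete edges are never removed here, only ignored during the walk, so this covers the fake-cycle case without addressing the extra dirtying mentioned above.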

stuhood

Thanks a ton for looking at this!

stuhood · 2019-04-17T18:45:51Z

src/rust/engine/graph/src/entry.rs

@@ -360,6 +381,7 @@ impl<N: Node> Entry<N> {
            } else {
              None
            },
+            dirty, // TODO: Should this also cover uncacheable?


It feels like it is definitely related, yea. If this dirty value is associated with the previous_ value(s), then in the case where we've said: "you should definitely not trust/reuse the previous value", we should also not trust its edges.

But see the comments on #6598... it's pretty likely that the implementation of cacheability should switch to an implementation that changes the identity of the node each time (possibly by changing parameter identities)... and that would make this less relevant I think.

I think the node identity-based uncacheability approach seems like a good idea, could link to that issue here.

This now dirties in both cases, but I agree that reworking this in the future would be nice.


cosmicexplorer

This is a decidedly nontrivial issue and I'm very glad we have a handle on why it happens and how to fix it!

I've noted multiple places that I believe would strongly benefit from copious use of one-off enums. We can probably merge this PR first and then follow up with later enum changes to avoid blocking the fix.

> When doing cycle detection, compare the edge weight against the
> generation of the node, and ignore obsolete edges.
> but I would want to think about that a lot more before doing it...

Is there additional complexity to implementing this beyond "we now have to compare generations", or is there a concern this would introduce difficult-to-debug errors?

cosmicexplorer · 2019-04-17T19:16:26Z

src/rust/engine/graph/src/entry.rs

@@ -62,10 +62,25 @@ pub(crate) enum EntryState<N: Node> {
  // The previous_result value is _not_ a valid value for this Entry: rather, it is preserved in
  // order to compute the generation value for this Node by comparing it to the new result the next
  // time the Node runs.
+  //
+  // A note on dirty as was_dirty:


Suggested change

// A note on dirty as was_dirty:

// A note on dirty versus was_dirty:

Replaced with different docs on the new enum


cosmicexplorer · 2019-04-17T19:23:53Z

src/rust/engine/graph/src/entry.rs

@@ -541,7 +565,7 @@ impl<N: Node> Entry<N> {
  ///
  /// Clears the state of this Node, forcing it to be recomputed.
  ///
-  pub(crate) fn clear(&mut self) {
+  pub(crate) fn clear(&mut self, graph_still_contains_edges: bool) {


This could also be converted into its own one-off enum instead of a bool.
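Something like the following is what I have in mind, as a sketch only. `EdgesAfterClear` and `ToyEntry` are hypothetical names, and what `clear` does in the edges-still-present case (marking the previous result dirty) is my guess at the intent, not a copy of the real method body.

```rust
/// Hypothetical one-off enum replacing `graph_still_contains_edges: bool`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgesAfterClear {
    /// The graph still holds this node's outbound edges.
    StillPresent,
    /// The graph has already removed this node's outbound edges.
    Removed,
}

/// Stand-in for the real `Entry<N>`, reduced to the piece that matters here.
#[derive(Debug)]
struct ToyEntry {
    /// The last computed value, if any, plus whether it must be treated as dirty.
    previous: Option<(u64, bool)>,
}

impl ToyEntry {
    /// Clears the entry; call sites now say which edge state they mean.
    fn clear(&mut self, edges: EdgesAfterClear) {
        match edges {
            EdgesAfterClear::StillPresent => {
                // Old edges are still around, so do not trust the previous value as clean.
                if let Some((_, dirty)) = self.previous.as_mut() {
                    *dirty = true;
                }
            }
            EdgesAfterClear::Removed => {}
        }
    }
}
```

A call site then reads `entry.clear(EdgesAfterClear::Removed)` rather than `entry.clear(false)`, which is the main point of the suggestion.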


cosmicexplorer · 2019-04-17T19:27:11Z

src/rust/engine/graph/src/lib.rs

@@ -514,17 +541,21 @@ impl<N: Node> Graph<N> {
          // TODO: doing cycle detection under the lock... unfortunate, but probably unavoidable
          // without a much more complicated algorithm.


Unrelated: I would be interested in any thoughts on how to estimate the speedup we might get from incremental cycle detection (possibly just by using a profiler?) instead of holding the lock.


stuhood

Also, would it be possible to include the unit tests from master...twitter:stuhood/dirty-cycle-detection here? Can also mark this one as fixing #7404.

stuhood · 2019-04-18T00:29:06Z

Also also:

> There's an ugliness here that we still don't remove obsolete edges...

It would be good to incorporate some of the PR description into a TODO somewhere in the code. Definitely fine with leaving "non-problematic" edges in place for now and revisiting it in the (distant) future!

stuhood

Thanks!


stuhood · 2019-04-18T21:37:13Z

src/rust/engine/graph/src/entry.rs

@@ -428,6 +471,9 @@ impl<N: Node> Entry<N> {
            "Not completing node {:?} because it was invalidated before completing.",
            self.node
          );
+          if let Some(previous_result) = previous_result.as_mut() {


Nit: Possible that the EntryResult enum could gain a "NotPresent" variant to incorporate the None case? Possibly not worth it.

Will leave for now, can add in the future if needed.


stuhood · 2019-04-18T21:44:03Z

src/rust/engine/graph/src/entry.rs

@@ -567,6 +616,12 @@ impl<N: Node> Entry<N> {

    trace!("Clearing node {:?}", self.node);

+    if graph_still_contains_edges {
+      if let Some(previous_result) = previous_result.as_mut() {


Yea, a lot of these would be eliminated by a "NotPresent" variant.
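For reference, a sketch of what that variant could look like. The real `EntryResult` variants are not shown in this thread, so `Clean`, `Dirty` and the `dirty()` method below are assumptions made for illustration, and the type is named `ToyEntryResult` to keep that clear.

```rust
/// Hypothetical stand-in for `EntryResult`, with the `None` case folded in.
#[derive(Debug, PartialEq)]
enum ToyEntryResult<T> {
    Clean(T),
    Dirty(T),
    NotPresent,
}

impl<T> ToyEntryResult<T> {
    /// Marks the held value dirty; a no-op when there is no value.
    fn dirty(&mut self) {
        // Temporarily swap in NotPresent so the value can be moved between variants.
        let current = std::mem::replace(self, ToyEntryResult::NotPresent);
        *self = match current {
            ToyEntryResult::Clean(v) | ToyEntryResult::Dirty(v) => ToyEntryResult::Dirty(v),
            ToyEntryResult::NotPresent => ToyEntryResult::NotPresent,
        };
    }
}
```

With this shape, `if let Some(previous_result) = previous_result.as_mut() { ... }` at each call site would collapse to a single `previous_result.dirty()`.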


So that we can report whole paths which would cause the cycle. This will slow down cycle detection a little; if it becomes a problem, we could do a Dijkstra run, and only if we detect a cycle (which is the rare case), do the Bellman-Ford. But we've also been talking about trying to do incremental cycle detection, so I'm not going to worry too much about this unless it starts posing a noticeable problem.
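A small sketch of reporting the whole path rather than a yes/no answer. Note that this uses a plain breadth-first search with parent pointers over a std `HashMap`, not the Bellman-Ford or Dijkstra runs mentioned above; the function name and the `u32` node ids are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns the existing path dst -> ... -> src, if there is one.
/// Adding the edge src -> dst would close that path into a cycle.
fn path_closing_cycle(
    edges: &HashMap<u32, Vec<u32>>,
    src: u32,
    dst: u32,
) -> Option<Vec<u32>> {
    let mut parent: HashMap<u32, u32> = HashMap::new();
    let mut seen: HashSet<u32> = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(dst);
    queue.push_back(dst);
    while let Some(node) = queue.pop_front() {
        if node == src {
            // Walk parent pointers back to dst, then reverse into dst -> ... -> src order.
            let mut path = vec![node];
            let mut cursor = node;
            while let Some(&p) = parent.get(&cursor) {
                path.push(p);
                cursor = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in edges.get(&node).into_iter().flatten() {
            if seen.insert(next) {
                parent.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}
```

Given edges `1 -> 2` and `2 -> 3`, asking about a new edge `3 -> 1` yields the path `[1, 2, 3]`, which is what an error message would want to print.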


illicitonion requested review from stuhood, blorente and ity April 17, 2019 15:05

stuhood reviewed Apr 17, 2019

cosmicexplorer reviewed Apr 17, 2019

stuhood reviewed Apr 17, 2019

stuhood mentioned this pull request Apr 17, 2019

WIP: Add integration test for pantsd dependency swapping #7403

Closed

stuhood approved these changes Apr 18, 2019

illicitonion and others added 11 commits April 23, 2019 11:44

Add integration test for pantsd dependency swapping

96d441c

Add Stu's unit tests

a8126f5

Replace was_dirty bit with EntryResult enum

277e003

Add comment about why test exists

6f87dcf

Add comment describing how we should remove obsolete edges

67f2b09

Document CycleType variants

fb516f7

Add comment explaining repeated clearing strategy

1f9af0f

Add blocking TODO

5bddaa5

Review feedback

283bfe4

illicitonion force-pushed the dwagnerhall/pantsd-cycle2 branch from f40b8fd to 283bfe4 Compare April 23, 2019 10:44

illicitonion added 3 commits April 23, 2019 13:16

Fix clippy

37c97ba

Detect self-cycles

293bdd5

Add some comments

4bf0c74

illicitonion force-pushed the dwagnerhall/pantsd-cycle2 branch from 88206e7 to 4bf0c74 Compare April 24, 2019 12:36

Debug hanging test

477e1eb

illicitonion merged commit 9c121f1 into pantsbuild:master Apr 24, 2019

illicitonion deleted the dwagnerhall/pantsd-cycle2 branch April 24, 2019 15:32

illicitonion added a commit that referenced this pull request Apr 24, 2019

Revert "Two targets can swap positions with pantsd (#7583)"

3fadb2d

This reverts commit 9c121f1.

cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Apr 29, 2019

Revert "Two targets can swap positions with pantsd (pantsbuild#7583) (p…

641b016

…antsbuild#7617)" This reverts commit 5de9012.

cosmicexplorer added a commit that referenced this pull request Apr 29, 2019

Revert "Two targets can swap positions with pantsd (#7583) (#7617)"

04e07bf

This reverts commit 5de9012.
