Two targets can swap positions with pantsd #7583

Merged: illicitonion merged 15 commits into pantsbuild:master from dwagnerhall/pantsd-cycle2 on Apr 24, 2019

illicitonion
Contributor

Before this PR, nothing would remove the edges of a dirty node, so if
two nodes swapped positions in the graph (e.g. if a dependency between
two targets inverted), a spurious cycle would be detected.

With this PR, if we detect a cycle, but also detect that there may be
dirty edges in play, we fully clear that node (including removing its
edges), which will cause it to be re-triggered from scratch.

This is specifically in place to handle the cycle scenario - the dirty
bit and dependency Generations are still the primary mechanism for
handling re-use of old versions.
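
A minimal sketch of the behaviour described above, not the engine's actual code. `ToyGraph`, its fields, and `add_dependency` are hypothetical names, nodes are plain integers, and "dirty" is reduced to a set membership check. When a new edge would close a cycle, any dirty nodes on the offending path are fully cleared (edges included) and the check is retried; only a cycle made entirely of clean nodes is reported.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified stand-in for the engine's graph.
#[derive(Default)]
struct ToyGraph {
    /// src -> the nodes that src depends on.
    edges: HashMap<u32, HashSet<u32>>,
    /// Nodes whose recorded edges may be stale.
    dirty: HashSet<u32>,
}

impl ToyGraph {
    /// Returns some path from `from` to `to` along dependency edges, if one exists.
    fn path(&self, from: u32, to: u32) -> Option<Vec<u32>> {
        let mut stack = vec![vec![from]];
        let mut seen = HashSet::new();
        while let Some(path) = stack.pop() {
            let last = *path.last().unwrap();
            if last == to {
                return Some(path);
            }
            if !seen.insert(last) {
                continue;
            }
            for &next in self.edges.get(&last).into_iter().flatten() {
                let mut longer = path.clone();
                longer.push(next);
                stack.push(longer);
            }
        }
        None
    }

    /// Adds `src -> dst`. Returns the offending path if the edge would close
    /// a cycle that contains no dirty nodes.
    fn add_dependency(&mut self, src: u32, dst: u32) -> Result<(), Vec<u32>> {
        // `src -> dst` closes a cycle exactly when `dst` already reaches `src`.
        while let Some(path) = self.path(dst, src) {
            let dirty_on_path: Vec<u32> = path
                .iter()
                .copied()
                .filter(|n| self.dirty.contains(n))
                .collect();
            if dirty_on_path.is_empty() {
                return Err(path);
            }
            // Fully clear each dirty node, including its edges, then retry.
            for n in dirty_on_path {
                self.edges.remove(&n);
                self.dirty.remove(&n);
            }
        }
        self.edges.entry(src).or_default().insert(dst);
        Ok(())
    }
}
```

With an existing edge 1 -> 2, calling `add_dependency(2, 1)` fails while node 1 is clean, and succeeds once node 1 has been marked dirty, because the stale 1 -> 2 edge is dropped before the retry.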

There's an ugliness here that we still don't remove obsolete edges, so
if Generation 2 of a node has differing dependencies from Generation 1,
the dependency from Generation 1 will still dirty Generation 2. We may
want to consider solving that separately as/when it becomes a
significant issue, or we may want to re-work this PR to do something
like that... This PR happens to cover a part of that problem, but only
where it causes definitive problems (a fake cycle) rather than also
where it causes performance problems.

There's probably a slightly more principled solution here along the
lines of:

  • Rather than using () as an edge weight in the graph, use the
    Generation of the dependee Node as an edge weight.
  • When doing cycle detection, compare the edge weight against the
    generation of the node, and ignore obsolete edges.

but I would want to think about that a lot more before doing it...
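
To make the two bullets above concrete, here is a rough sketch under stated assumptions. It reads "dependee" as the node that declared the dependency, and treats an edge as obsolete once that node's generation has moved past the one stored on the edge. `GenGraph`, `bump`, and `reaches` are hypothetical names, and nothing here is taken from the engine.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = u32;
type Generation = u32;

/// Hypothetical graph whose edges carry a Generation instead of `()`.
#[derive(Default)]
struct GenGraph {
    /// Current generation of each node (absent means generation 0).
    generation: HashMap<NodeId, Generation>,
    /// src -> (dst, generation of src at the time the edge was recorded).
    edges: HashMap<NodeId, Vec<(NodeId, Generation)>>,
}

impl GenGraph {
    fn current(&self, node: NodeId) -> Generation {
        self.generation.get(&node).copied().unwrap_or(0)
    }

    /// Records `src -> dst`, weighted with the current generation of `src`.
    fn add_edge(&mut self, src: NodeId, dst: NodeId) {
        let weight = self.current(src);
        self.edges.entry(src).or_default().push((dst, weight));
    }

    /// Moves `node` to its next generation; its older edges become obsolete.
    fn bump(&mut self, node: NodeId) {
        *self.generation.entry(node).or_insert(0) += 1;
    }

    /// Reachability used for cycle detection: obsolete edges are ignored.
    fn reaches(&self, from: NodeId, to: NodeId) -> bool {
        let mut stack = vec![from];
        let mut seen = HashSet::new();
        while let Some(n) = stack.pop() {
            if n == to {
                return true;
            }
            if !seen.insert(n) {
                continue;
            }
            let live = self.current(n);
            for &(dst, weight) in self.edges.get(&n).into_iter().flatten() {
                if weight == live {
                    stack.push(dst);
                }
            }
        }
        false
    }
}
```

In this model no edges are removed when two nodes swap positions: once node 1 is bumped, its old 1 -> 2 edge simply stops counting, so adding 2 -> 1 no longer looks like a cycle. The obsolete edges still occupy memory, which is one of the things that would need more thought.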

Member

stuhood left a comment

Thanks a ton for looking at this!

@@ -360,6 +381,7 @@ impl<N: Node> Entry<N> {
} else {
None
},
dirty, // TODO: Should this also cover uncacheable?
Member

It feels like it is definitely related, yea. If this dirty value is associated with the previous_ value(s), then in the case where we've said: "you should definitely not trust/reuse the previous value", we should also not trust its edges.

But see the comments on #6598... it's pretty likely that the implementation of cacheability should switch to an implementation that changes the identity of the node each time (possibly by changing parameter identities)... and that would make this less relevant I think.

Contributor

I think the node identity-based uncacheability approach seems like a good idea; we could link to that issue here.

Contributor Author

This now dirties in both cases, but I agree that reworking this in the future would be nice.

Contributor

cosmicexplorer left a comment

This is a decidedly nontrivial issue and I'm very glad we have a handle on why it happens and how to fix it!

I've noted multiple places that I believe would strongly benefit from copious use of one-off enums. We can probably merge this PR first and then follow up with later enum changes to avoid blocking the fix.

> When doing cycle detection, compare the edge weight against the
> generation of the node, and ignore obsolete edges.
> but I would want to think about that a lot more before doing it...

Is there additional complexity to implementing this beyond "we now have to compare generations", or is there a concern this would introduce difficult-to-debug errors?

@@ -62,10 +62,25 @@ pub(crate) enum EntryState<N: Node> {
// The previous_result value is _not_ a valid value for this Entry: rather, it is preserved in
// order to compute the generation value for this Node by comparing it to the new result the next
// time the Node runs.
//
// A note on dirty as was_dirty:
Contributor

Suggested change
- // A note on dirty as was_dirty:
+ // A note on dirty versus was_dirty:

Contributor Author

Replaced with different docs on the new enum

@@ -541,7 +565,7 @@ impl<N: Node> Entry<N> {
///
/// Clears the state of this Node, forcing it to be recomputed.
///
- pub(crate) fn clear(&mut self) {
+ pub(crate) fn clear(&mut self, graph_still_contains_edges: bool) {
Contributor

This could also be converted into its own one-off enum instead of a bool.
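
A small sketch of what such a one-off enum might look like. `GraphEdges`, `ToyEntry`, and `PreviousResult` are hypothetical names, and the body of `clear` is a placeholder: this sketch assumes, without confirmation from the diff, that the flag decides whether the preserved previous result gets marked dirty.

```rust
/// Hypothetical replacement for the `graph_still_contains_edges: bool` parameter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GraphEdges {
    StillPresent,
    Removed,
}

#[derive(Debug, PartialEq, Eq)]
struct PreviousResult {
    value: u32,
    dirty: bool,
}

/// Toy entry, not the engine's `Entry<N>`.
#[derive(Debug, Default)]
struct ToyEntry {
    result: Option<u32>,
    previous_result: Option<PreviousResult>,
}

impl ToyEntry {
    /// Clears the entry, forcing it to be recomputed.
    fn clear(&mut self, edges: GraphEdges) {
        // Preserve the old value so a later run can compare against it.
        if let Some(value) = self.result.take() {
            self.previous_result = Some(PreviousResult { value, dirty: false });
        }
        match edges {
            // Assumption: stale edges mean the old value must not be trusted.
            GraphEdges::StillPresent => {
                if let Some(previous) = self.previous_result.as_mut() {
                    previous.dirty = true;
                }
            }
            GraphEdges::Removed => {}
        }
    }
}
```

The gain is at the call site: `entry.clear(GraphEdges::StillPresent)` says what is being asserted, where `entry.clear(true)` does not.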

@@ -514,17 +541,21 @@ impl<N: Node> Graph<N> {
// TODO: doing cycle detection under the lock... unfortunate, but probably unavoidable
// without a much more complicated algorithm.
Contributor

Unrelated: I would be interested in any thoughts on how to estimate the speedup we might get from incremental cycle detection (possibly just by using a profiler?) instead of holding the lock.

Member

stuhood left a comment

Also, would it be possible to include the unit tests from master...twitter:stuhood/dirty-cycle-detection here? Can also mark this one as fixing #7404.

Member

stuhood commented Apr 18, 2019

Also also:

> There's an ugliness here that we still don't remove obsolete edges...

It would be good to incorporate some of the PR description into a TODO somewhere in the code. Definitely fine with leaving "non-problematic" edges in place for now and revisiting it in the (distant) future!

Member

stuhood left a comment

Thanks!

@@ -428,6 +471,9 @@ impl<N: Node> Entry<N> {
"Not completing node {:?} because it was invalidated before completing.",
self.node
);
if let Some(previous_result) = previous_result.as_mut() {
Member

Nit: Possible that the EntryResult enum could gain a "NotPresent" variant to incorporate the None case? Possibly not worth it.

Contributor Author

Will leave for now, can add in the future if needed.

@@ -567,6 +616,12 @@ impl<N: Node> Entry<N> {

trace!("Clearing node {:?}", self.node);

if graph_still_contains_edges {
if let Some(previous_result) = previous_result.as_mut() {
Member

Yea, a lot of these would be eliminated by a "NotPresent" variant.
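
For illustration only, a sketch of how a "NotPresent" variant could remove the `if let Some(previous_result) = previous_result.as_mut()` pattern. `ToyResult` and its `Clean` and `Dirty` variants are hypothetical stand-ins; the real `EntryResult` variants are not shown in this conversation.

```rust
/// Hypothetical result type in which "no previous value" is a variant
/// rather than the `None` case of an `Option`.
#[derive(Debug, PartialEq, Eq)]
enum ToyResult<T> {
    Clean(T),
    Dirty(T),
    NotPresent,
}

impl<T> ToyResult<T> {
    /// Marks a present value as dirty; does nothing when no value is present.
    fn dirty(&mut self) {
        *self = match std::mem::replace(self, ToyResult::NotPresent) {
            ToyResult::Clean(v) | ToyResult::Dirty(v) => ToyResult::Dirty(v),
            ToyResult::NotPresent => ToyResult::NotPresent,
        };
    }
}
```

Callers can then write `previous_result.dirty()` unconditionally. The trade-off is that every other match on the type gains an extra arm.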

illicitonion and others added 11 commits April 23, 2019 11:44
So that we can report whole paths which would cause the cycle.

This will slow down cycle detection a little; if it becomes a problem,
we could do a Dijkstra run, and only if we detect a cycle (which is the
rare case), do the Bellman-Ford. But we've also been talking about
trying to do incremental cycle detection, so I'm not going to worry too
much about this unless it starts posing a noticeable problem.
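
The two-phase idea in this commit message (a cheap check first, the path-producing search only when a cycle is found) can be sketched as follows. This is not the PR's implementation: it swaps in a plain depth-first reachability check and a breadth-first search on an unweighted adjacency map in place of Dijkstra and Bellman-Ford, and `Graph`, `reaches`, `shortest_path`, and `cycle_path_if_any` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// src -> the nodes that src depends on.
type Graph = HashMap<u32, Vec<u32>>;

/// Cheap phase: plain reachability with no path bookkeeping.
fn reaches(g: &Graph, from: u32, to: u32) -> bool {
    let mut stack = vec![from];
    let mut seen = HashSet::new();
    while let Some(n) = stack.pop() {
        if n == to {
            return true;
        }
        if seen.insert(n) {
            stack.extend(g.get(&n).into_iter().flatten().copied());
        }
    }
    false
}

/// Expensive phase: breadth-first search that records predecessors so the
/// whole path can be reported.
fn shortest_path(g: &Graph, from: u32, to: u32) -> Option<Vec<u32>> {
    let mut pred: HashMap<u32, u32> = HashMap::new();
    let mut seen: HashSet<u32> = HashSet::from([from]);
    let mut queue = VecDeque::from([from]);
    while let Some(n) = queue.pop_front() {
        if n == to {
            // Walk the predecessor links back to `from`, then reverse.
            let mut path = vec![n];
            let mut cur = n;
            while let Some(&p) = pred.get(&cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in g.get(&n).into_iter().flatten() {
            if seen.insert(next) {
                pred.insert(next, n);
                queue.push_back(next);
            }
        }
    }
    None
}

/// If adding `src -> dst` would close a cycle, returns the existing path
/// from `dst` back to `src`; the path search only runs in that rare case.
fn cycle_path_if_any(g: &Graph, src: u32, dst: u32) -> Option<Vec<u32>> {
    if !reaches(g, dst, src) {
        return None;
    }
    shortest_path(g, dst, src)
}
```

Whether the extra pass pays for itself depends on how often cycles occur and on the relative cost of the two searches, which would need profiling on a real graph.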
@illicitonion illicitonion force-pushed the dwagnerhall/pantsd-cycle2 branch from f40b8fd to 283bfe4 Compare April 23, 2019 10:44
@illicitonion illicitonion force-pushed the dwagnerhall/pantsd-cycle2 branch from 88206e7 to 4bf0c74 Compare April 24, 2019 12:36
@illicitonion illicitonion merged commit 9c121f1 into pantsbuild:master Apr 24, 2019
@illicitonion illicitonion deleted the dwagnerhall/pantsd-cycle2 branch April 24, 2019 15:32
illicitonion added a commit that referenced this pull request Apr 24, 2019
illicitonion added a commit to twitter/pants that referenced this pull request Apr 24, 2019
illicitonion added a commit that referenced this pull request Apr 24, 2019
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Apr 29, 2019
cosmicexplorer added a commit that referenced this pull request Apr 29, 2019