Two targets can swap positions with pantsd #7583
Thanks a ton for looking at this!
src/rust/engine/graph/src/entry.rs
@@ -360,6 +381,7 @@ impl<N: Node> Entry<N> {
} else {
None
},
dirty, // TODO: Should this also cover uncacheable?
It feels like it is definitely related, yea. If this dirty value is associated with the previous_ value(s), then in the case where we've said: "you should definitely not trust/reuse the previous value", we should also not trust its edges.
But see the comments on #6598... it's pretty likely that the implementation of cacheability should switch to an implementation that changes the identity of the node each time (possibly by changing parameter identities)... and that would make this less relevant I think.
I think the node identity-based uncacheability approach seems like a good idea, could link to that issue here.
This now dirties in both cases, but I agree that reworking this in the future would be nice.
This is a decidedly nontrivial issue and I'm very glad we have a handle on why it happens and how to fix it!
I've noted multiple places that I believe would strongly benefit from copious use of one-off enums. We can probably merge this PR first and then follow up with later enum changes to avoid blocking the fix.
> When doing cycle detection, compare the edge weight against the generation of the node, and ignore obsolete edges.
> but I would want to think about that a lot more before doing it...
Is there additional complexity to implementing this beyond "we now have to compare generations", or is there a concern this would introduce difficult-to-debug errors?
src/rust/engine/graph/src/entry.rs
@@ -62,10 +62,25 @@ pub(crate) enum EntryState<N: Node> {
// The previous_result value is _not_ a valid value for this Entry: rather, it is preserved in
// order to compute the generation value for this Node by comparing it to the new result the next
// time the Node runs.
//
// A note on dirty as was_dirty:
- // A note on dirty as was_dirty:
+ // A note on dirty versus was_dirty:
Replaced with different docs on the new enum
@@ -541,7 +565,7 @@ impl<N: Node> Entry<N> {
///
/// Clears the state of this Node, forcing it to be recomputed.
///
- pub(crate) fn clear(&mut self) {
+ pub(crate) fn clear(&mut self, graph_still_contains_edges: bool) {
This could also be converted into its own one-off enum instead of a bool.
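A minimal sketch of what that suggestion might look like. The enum name `GraphEdges`, its variants, and the simplified `Entry` stand-in are hypothetical and not taken from the PR; the sketch also assumes that the "edges still present" case is the one that marks the previous result dirty.

```rust
/// Hypothetical one-off enum replacing the `graph_still_contains_edges: bool`
/// parameter. The name and variants are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GraphEdges {
    /// The graph still holds this node's (possibly stale) outbound edges.
    StillPresent,
    /// The caller has already removed this node's edges from the graph.
    Removed,
}

/// Simplified stand-in for the real `Entry`; it tracks only what the sketch needs.
#[derive(Debug, Default)]
pub struct Entry {
    pub previous_result: Option<String>,
    pub previous_result_dirty: bool,
}

impl Entry {
    /// Clears the entry, forcing it to be recomputed.
    /// Assumption: when edges remain in the graph, the previous result
    /// cannot be trusted and is marked dirty.
    pub fn clear(&mut self, edges: GraphEdges) {
        match edges {
            GraphEdges::StillPresent => {
                if self.previous_result.is_some() {
                    self.previous_result_dirty = true;
                }
            }
            GraphEdges::Removed => {
                // Nothing stale can be reached through this node any more.
            }
        }
    }
}
```

Compared with a bare `true`/`false`, a call such as `entry.clear(GraphEdges::Removed)` states at the call site which situation the caller is in.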
@@ -514,17 +541,21 @@ impl<N: Node> Graph<N> {
// TODO: doing cycle detection under the lock... unfortunate, but probably unavoidable
// without a much more complicated algorithm.
Unrelated: I would be interested in any thoughts on how to estimate the speedup we might get from incremental cycle detection (possibly just by using a profiler?) instead of holding the lock.
Also, would it be possible to include the unit tests from master...twitter:stuhood/dirty-cycle-detection here? Can also mark this one as fixing #7404.
Also also:
It would be good to incorporate some of the PR description into a TODO somewhere in the code. Definitely fine with leaving "non-problematic" edges in place for now and revisiting it in the (distant) future!
Thanks!
@@ -428,6 +471,9 @@ impl<N: Node> Entry<N> {
"Not completing node {:?} because it was invalidated before completing.",
self.node
);
if let Some(previous_result) = previous_result.as_mut() {
Nit: Possible that the EntryResult enum could gain a "NotPresent" variant to incorporate the None case? Possibly not worth it.
Will leave for now, can add in the future if needed.
@@ -567,6 +616,12 @@ impl<N: Node> Entry<N> {
trace!("Clearing node {:?}", self.node);
if graph_still_contains_edges {
if let Some(previous_result) = previous_result.as_mut() {
Yea, a lot of these would be eliminated by a "NotPresent" variant.
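To illustrate the "NotPresent" idea, here is a rough sketch with a simplified stand-in for `EntryResult`. The `Clean` and `Dirty` variants and the `dirty()` method are assumptions for illustration; the real enum's shape may differ, and `NotPresent` is the hypothetical addition being discussed.

```rust
/// Simplified stand-in for `EntryResult`. The variant set is assumed,
/// and `NotPresent` is the hypothetical variant replacing `Option::None`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum EntryResult<T> {
    NotPresent,
    Clean(T),
    Dirty(T),
}

impl<T> EntryResult<T> {
    /// Marks a held value as dirty. With `NotPresent` folded into the enum
    /// this is a no-op when there is no value, so callers no longer need
    /// `if let Some(previous_result) = previous_result.as_mut() { ... }`.
    pub fn dirty(&mut self) {
        let current = std::mem::replace(self, EntryResult::NotPresent);
        *self = match current {
            EntryResult::Clean(v) | EntryResult::Dirty(v) => EntryResult::Dirty(v),
            EntryResult::NotPresent => EntryResult::NotPresent,
        };
    }
}
```

The trade-off is that every `match` on the enum gains one more arm, which may be why it was judged "possibly not worth it" for now.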
Before this PR, nothing would remove the edges of a dirty node, so if two nodes swapped positions in the graph (e.g. if a dependency between two targets inverted), a cycle would be detected.
With this PR, if we detect a cycle, but detect that there may be dirty edges in play, we fully clear that node (including removing its edges), which will cause it to be re-triggered from scratch.
This is specifically in place to handle the cycle scenario: the dirty bit and dependency Generations are still the primary mechanism for handling re-use of old versions.
There's an ugliness here that we still don't remove obsolete edges, so if Generation 2 of a node has differing dependencies from Generation 1, the dependency from Generation 1 will still dirty Generation 2. We _may_ want to consider solving that separately as/when it becomes a significant issue, or we may want to re-work this PR to do something like that... This PR happens to cover a part of that problem, but only where it causes definitive problems (a fake cycle) rather than also where it causes performance problems.
There's probably a slightly more principled solution here along the lines of:
* Rather than using () as an edge weight in the graph, use the Generation of the dependee Node as an edge weight.
* When doing cycle detection, compare the edge weight against the generation of the node, and ignore obsolete edges.
but I would want to think about that a lot more before doing it...
So that we can report whole paths which would cause the cycle. This will slow down cycle detection a little; if it becomes a problem, we could do a Dijkstra run, and only if we detect a cycle (which is the rare case), do the Bellman-Ford. But we've also been talking about trying to do incremental cycle detection, so I'm not going to worry too much about this unless it starts posing a noticeable problem.
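As a rough illustration of reporting the whole path that would close a cycle, the sketch below uses a plain breadth-first search with predecessor tracking rather than the Bellman-Ford run mentioned in the commit message. The function name `find_path` and the adjacency-map representation are assumptions for this example only.

```rust
use std::collections::{HashMap, VecDeque};

/// Returns the path from `from` to `to` (both inclusive), if one exists.
/// Before adding an edge `src -> dst`, calling `find_path(edges, dst, src)`
/// yields the existing path that the new edge would turn into a cycle.
pub fn find_path(edges: &HashMap<u32, Vec<u32>>, from: u32, to: u32) -> Option<Vec<u32>> {
    // Maps each discovered node to the node it was reached from.
    let mut predecessor: HashMap<u32, u32> = HashMap::new();
    let mut queue = VecDeque::from(vec![from]);
    while let Some(node) = queue.pop_front() {
        if node == to {
            // Walk the predecessor links back to the start, then reverse.
            let mut path = vec![to];
            let mut current = to;
            while current != from {
                current = predecessor[&current];
                path.push(current);
            }
            path.reverse();
            return Some(path);
        }
        for &next in edges.get(&node).into_iter().flatten() {
            if next != from && !predecessor.contains_key(&next) {
                predecessor.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}
```

Returning the path instead of a boolean is what lets the error message name every node on the would-be cycle.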
f40b8fd to 283bfe4
88206e7 to 4bf0c74
This reverts commit 9c121f1.
…antsbuild#7617)" This reverts commit 5de9012.
Before this PR, nothing would remove the edges of a dirty node, so if
two nodes swapped positions in the graph (e.g. if a dependency between
two targets inverted), a cycle would be detected.
With this PR, if we detect a cycle, but detect that there may be dirty
edges in play, we fully clear that node (including removing its edges),
which will cause it to be re-triggered from scratch.
This is specifically in place to handle the cycle scenario - the dirty
bit, and dependency Generations are still the primary mechanism for
handling re-use of old versions.
There's an ugliness here that we still don't remove obsolete edges, so
if Generation 2 of a node has differing dependencies from Generation 1,
the dependency from Generation 1 will still dirty Generation 2. We may
want to consider solving that separately as/when it becomes a
significant issue, or we may want to re-work this PR to do something
like that... This PR happens to cover a part of that problem, but only
where it causes definitive problems (a fake cycle) rather than also
where it causes performance problems.
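There's probably a slightly more principled solution here along the
lines of:
* Rather than using () as an edge weight in the graph, use the
  Generation of the dependee Node as an edge weight.
* When doing cycle detection, compare the edge weight against the
  generation of the node, and ignore obsolete edges.
but I would want to think about that a lot more before doing it...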
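
The "more principled solution" described above might look roughly like the following sketch. It is not the PR's implementation: `Graph`, `rerun`, `reaches`, and `add_edge` are hypothetical names, nodes are plain integers, and the sketch assumes each edge is stamped with the generation of the node that declared it, so that a re-run makes that node's older edges obsolete.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = u32;
type Generation = u32;

/// Toy graph whose edges carry a generation stamp instead of `()`.
#[derive(Default)]
pub struct Graph {
    /// Current generation of each node; bumped every time the node re-runs.
    generations: HashMap<NodeId, Generation>,
    /// Outbound edges: source -> [(destination, source generation when added)].
    edges: HashMap<NodeId, Vec<(NodeId, Generation)>>,
}

impl Graph {
    /// Starts a new run of `node`, making its previously recorded edges obsolete.
    pub fn rerun(&mut self, node: NodeId) {
        *self.generations.entry(node).or_insert(0) += 1;
    }

    fn generation(&self, node: NodeId) -> Generation {
        self.generations.get(&node).copied().unwrap_or(0)
    }

    /// True if `to` is reachable from `from` using only non-obsolete edges.
    fn reaches(&self, from: NodeId, to: NodeId) -> bool {
        let mut stack = vec![from];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == to {
                return true;
            }
            if !seen.insert(node) {
                continue;
            }
            let current = self.generation(node);
            for &(dst, stamp) in self.edges.get(&node).into_iter().flatten() {
                // Ignore edges left over from an earlier generation of `node`.
                if stamp == current {
                    stack.push(dst);
                }
            }
        }
        false
    }

    /// Adds `src -> dst` unless that would close a cycle of live edges.
    pub fn add_edge(&mut self, src: NodeId, dst: NodeId) -> Result<(), String> {
        if self.reaches(dst, src) {
            return Err(format!("cycle: {} -> {}", src, dst));
        }
        let stamp = self.generation(src);
        self.edges.entry(src).or_default().push((dst, stamp));
        Ok(())
    }
}
```

In the swapped-targets scenario, node 1 first depends on node 2; after `rerun(1)` the old `1 -> 2` edge is ignored, so adding `2 -> 1` no longer reports a fake cycle, while a genuine `1 -> 2` added afterwards is still rejected. Obsolete edges remain stored in this sketch, so it addresses the fake cycle but not cleanup of stale edges.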