Detransitize the differential dataflow computation #23
Conversation
Instead of computing the full transitive closure of the subset relation at each point, we now only compute it for those regions that are "dying".
I checked manually, and the new analysis produces the same results on the clap example as the old one.
src/output/timely_opt.rs (outdated diff):
```rust
};

// subset(R1, R2, Q) :-
// subset(R1, R2, P) :-
```
tiny typo: this should be "," instead of ":-"

we first compute each `R1 -> R2` relation where `R1` is live but `R2` is dead; then we find the distinct `R2` values to use as "seeds" for the reachability analysis
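To make the seeding step concrete, here is a plain-Rust sketch of the idea (not the actual dataflow code; the function name `dying_seeds` and the `u32` region identifiers are hypothetical):

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch: from the `R1 -> R2` subset edges and the set of live
/// regions, keep only edges that go from a live region into a dead one, and
/// collect the distinct dead targets as seeds for the reachability pass.
fn dying_seeds(edges: &[(u32, u32)], live: &BTreeSet<u32>) -> BTreeSet<u32> {
    edges
        .iter()
        .filter(|(r1, r2)| live.contains(r1) && !live.contains(r2))
        .map(|&(_, r2)| r2)
        .collect() // BTreeSet deduplicates, i.e. "distinct R2"
}

fn main() {
    let live: BTreeSet<u32> = [1, 2].into_iter().collect();
    let edges = [(1, 2), (1, 3), (2, 3), (3, 4)];
    let seeds = dying_seeds(&edges, &live);
    let expected: BTreeSet<u32> = [3].into_iter().collect();
    // (3, 4) does not contribute: its source region 3 is already dead.
    assert_eq!(seeds, expected);
}
```

In the real computation the edges and liveness facts are differential collections rather than slices, but the filter-then-distinct shape is the same.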
```rust
move |_timestamp, facts| {
    let mut result = result.lock().unwrap();
    for ((r1, r2, location), _timestamp, multiplicity) in facts {
        assert_eq!(*multiplicity, 1);
```
BTW, this assert triggers with `-v` or `--stats` (apparently `multiplicity` is -1 here), with both input data sets.
I just looked into this, and it seems that the problem goes away if you put a `consolidate()` after the definition of `subset`. This coalesces all updates together, and it seems to mean that the final results of `subset` are non-negative.
I think what happened here (testing now, but 5 min compile loop) is that

```rust
let subset = Variable::from(subset_base.clone());
```

provides an initial value for `subset` and then removes it in the next round. The code concats the collection back in, so these two changes logically cancel, but the removal does produce negative tuples before they get cancelled.

I'm trying out

```rust
let subset = Variable::from(subset_base.clone().filter(|_| false));
```

now, which just says "don't start with any facts". This avoids the removals and shouldn't change the results, since `subset_base` is concatenated into the recursive definition of `subset` anyhow.
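A toy model may make the cancellation easier to see. This is not the differential-dataflow API, just a sketch of its update batches as `(fact, multiplicity)` pairs, with `consolidate` summing multiplicities per fact:

```rust
use std::collections::HashMap;

/// Toy model of differential update batches: each batch is a list of
/// (fact, multiplicity) changes; consolidation sums the multiplicities
/// per fact and drops anything that nets out to zero.
fn consolidate(batches: &[Vec<(&'static str, i64)>]) -> HashMap<&'static str, i64> {
    let mut totals = HashMap::new();
    for batch in batches {
        for &(fact, diff) in batch {
            *totals.entry(fact).or_insert(0) += diff;
        }
    }
    totals.retain(|_, m| *m != 0);
    totals
}

fn main() {
    // `Variable::from(subset_base)` seeds the variable with the base facts
    // and retracts them one round later; concatenating `subset_base` back in
    // re-adds them. Observed batch by batch, the retraction is a -1 update:
    let rounds = vec![
        vec![("subset(a, b)", 1)],                       // initial value
        vec![("subset(a, b)", -1), ("subset(a, b)", 1)], // retraction + concat
    ];
    // After consolidation the -1 and +1 cancel and every surviving fact has
    // positive multiplicity, which is what the assert expects to see.
    let totals = consolidate(&rounds);
    assert_eq!(totals.get("subset(a, b)"), Some(&1));
}
```

An output callback that inspects the per-batch updates (as the asserting code above does) sees the intermediate -1 unless the stream is consolidated, or unless the initial value is suppressed with the `filter` trick.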
Yes, adding in the filter to prevent initial values going in seems to prevent the assert, even after removing the `consolidate()` that I added. Now, technically speaking, I haven't seen it assert, so I'm trying that now to make sure that the code is actually being called (though I am using `--stats` and seeing stats output, so probably?).
Yup, without the filter I do see the assert, and either the filter or the consolidate seems to prevent it.
Would you recommend I do this `filter` trick then? Or is `consolidate` better?
I'm somewhat confused about why we get the negative multiplicities, but I guess it's sort of an "internal accounting" thing?
I think `filter` is probably "better". I forgot that with the `filter` you can drop the `clone`. It only really makes a difference if the input collection is large (the difference is one additional positive and one negative instance of each input fact).
This is all here because `Variable` supports non-monotonic variables; in principle a strictly monotonic variable shouldn't need any `from()` input.
The negative multiplicities are really just about being general enough for non-monotonic computation. If you want to compute f^i(x) for a general f, you need to start with x, compute f(x), but then subtract x to get f(x) - x. That change is what needs to flow back around the loop to keep the incremental computation going correctly.
In the case of Datalog you wouldn't need to subtract off the x, because you know you'll have distincts everywhere, but more general computation needs the -x to correctly determine f^i(x) rather than f(x + f(x + f(x + ...))).
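A stdlib-only illustration of the iteration pattern (hypothetical names; real differential dataflow circulates multiplicity-weighted updates rather than sets):

```rust
use std::collections::BTreeSet;

/// Sketch of iterating `f` by circulating only the *change* `f(x) - x` each
/// round, the way a differential `Variable` does. For a non-monotone `f` the
/// change can include retractions (negative updates); for a monotone
/// Datalog-style `f`, as here, it is purely positive.
fn fixed_point<F>(x: BTreeSet<u32>, f: F) -> BTreeSet<u32>
where
    F: Fn(&BTreeSet<u32>) -> BTreeSet<u32>,
{
    let mut current = x;
    loop {
        let next = f(&current);
        // The delta that would flow around the loop: f(x) minus x.
        let delta: BTreeSet<u32> = next.difference(&current).cloned().collect();
        if delta.is_empty() {
            return current; // nothing left to circulate: f^i(x) reached
        }
        current = next;
    }
}

fn main() {
    // f adds the successor of every element below 5: a monotone step function.
    let f = |s: &BTreeSet<u32>| {
        let mut out = s.clone();
        out.extend(s.iter().filter(|&&n| n < 5).map(|n| n + 1));
        out
    };
    let result = fixed_point([1].into_iter().collect(), f);
    let expected: BTreeSet<u32> = (1..=5).collect();
    assert_eq!(result, expected);
}
```

In the Datalog case the `difference` is redundant (everything is distinct and only grows), which is why the -x / negative updates feel like pure accounting there.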
Also, yes, I should totally have a better way to create an empty collection. Filter should be pretty cheap (the closure should optimize down to no per-record work). Probably the best other way is to use the same `to_stream` stuff you are using for inputs, but with a `None` as the input `IntoIterator`.
Yeah, I figured I would do `None` or `vec![]` with an as-stream or whatever.
This does slow us down quite a bit right now though.
Instead, propagate `R requires B` to superset regions of R only when R goes dead. Timing now down to 15s from around 120s!
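A stdlib sketch of the lazy propagation idea (function and type names are hypothetical; the real version is expressed as dataflow joins, not a mutation):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical sketch: `requires` maps each region to the loans it holds.
/// Instead of maintaining `R requires B` transitively closed under subset at
/// every point, wait until region `r` dies and only then push its loans to
/// its direct superset regions.
fn on_region_death(
    r: u32,
    supersets: &BTreeMap<u32, BTreeSet<u32>>, // r -> regions that r is a subset of
    requires: &mut BTreeMap<u32, BTreeSet<u32>>, // region -> loans it requires
) {
    if let Some(loans) = requires.remove(&r) {
        if let Some(supers) = supersets.get(&r) {
            for &s in supers {
                requires.entry(s).or_default().extend(loans.iter().cloned());
            }
        }
    }
}

fn main() {
    // Region 1 requires loan 10 and is a subset of region 2.
    let mut requires = BTreeMap::from([(1u32, BTreeSet::from([10u32]))]);
    let supersets = BTreeMap::from([(1u32, BTreeSet::from([2u32]))]);
    // When region 1 dies, loan 10 moves to region 2 rather than having been
    // eagerly copied there at every point of the control-flow graph.
    on_region_death(1, &supersets, &mut requires);
    assert_eq!(requires.get(&2), Some(&BTreeSet::from([10u32])));
}
```

The speedup reported above would then come from doing this work once per region death instead of maintaining the full transitive `requires` relation at every program point.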
I haven't had time to read the comments yet, but I pushed some new commits that also remove the transitive loan propagation. This brings total time down to 15s (without altering final output).
I pushed a few minor commits (e.g., using …). At this point I think the best thing to do is to pursue #20 (which is blocked on #24).
Yup, it will save a bunch of copies, mostly. At some point I'll show you how to use …
I am still not using them for `antijoin` because that's a bit more complicated for me to think about. Also, I think we should write some nicer wrapper traits, like `join` and `semijoin`, to make this more readable. Still, this results in a net speed win: down to 14.2s.
OK, I pushed the change to start with empty variables and grow them. To be honest, I find the code cleaner this way, and I manually verified that the asserts don't fire when using …
It also seems to make things mildly faster =)
This introduces a new variant of the differential dataflow computation. This variant attempts to avoid computing the transitive closure of subset relations: instead of computing the full TC at each point, it computes the TC along edges, and only for the dead regions along each edge. We also switch to using `distinct_total` instead of `distinct`, which is faster. The naive code is retained and used for comparison. The new method seems to produce the same results as the old.

UPDATE: I also removed the transitive computation of which loans are in scope; see below.
Current results for clap on my machine: (timing table comparing the `distinct` and `distinct_total` variants; figures not preserved)
Progress!
cc @frankmcsherry