
Detransitize the differential dataflow computation #23

Merged: 14 commits, May 14, 2018

Conversation

@nikomatsakis (Contributor) commented May 11, 2018

This introduces a new variant of the differential dataflow computation. This variant tries to avoid computing the transitive closure of the subset relation -- instead of computing the full TC at each point, it computes the TC only along edges, and only for the dead regions along each edge. We also switch to using distinct_total instead of distinct, which is faster. The naive code is retained and used for comparison. The new method seems to produce the same results as the old.

UPDATE: I also remove the transitive computation of which loans are in scope, see below.
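(For context, a minimal runnable sketch of the distinct vs. distinct_total switch; the collection below is an illustrative stand-in, not the PR's actual code:)

    extern crate timely;
    extern crate differential_dataflow;

    use differential_dataflow::input::Input;
    use differential_dataflow::operators::{Threshold, ThresholdTotal};

    fn main() {
        timely::execute_from_args(std::env::args(), |worker| {
            worker.dataflow::<usize, _, _>(|scope| {
                // stand-in for the (R1, R2) `subset` tuples
                let (mut input, subset) = scope.new_collection::<(u32, u32), isize>();

                // `distinct` handles arbitrary partially ordered timestamps;
                // `distinct_total` is the cheaper variant available when the
                // timestamp is totally ordered, as in this batch computation.
                let _before = subset.distinct();
                let _after = subset.distinct_total();

                input.insert((0, 1));
            });
        })
        .unwrap();
    }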

Current results for clap on my machine:

| Version | Timing |
| --- | --- |
| Before | 131s |
| Without `distinct_total` | 63s |
| With `distinct_total` | 41s |
| Without the transitive loan propagation | 15s |
| Using arrangements and `join_core` | 14.2s |
| Empty variables | 13.5s |

Progress!

cc @frankmcsherry

@nikomatsakis (Author) commented:

I checked manually, and the new analysis also produces the same results as the old one on the clap example.

    };

    // subset(R1, R2, Q) :-
    // subset(R1, R2, P) :-
@lqd (Member) commented:

tiny typo: should be "," instead of ":-"

    // we first compute each `R1 -> R2` relation where R1 is live but R2 is
    // dead; then we find the distinct R2 values to use as "seeds" for the
    // reachability analysis
    move |_timestamp, facts| {
        let mut result = result.lock().unwrap();
        for ((r1, r2, location), _timestamp, multiplicity) in facts {
            assert_eq!(*multiplicity, 1);
@lqd (Member) commented May 12, 2018:

BTW, this assert triggers with -v or --stats — apparently multiplicity is -1 here — with both input data sets.

@frankmcsherry commented May 12, 2018:

I just looked into this, and it seems that the problem goes away if you put a consolidate() after the definition of subset. This coalesces all updates together, and it seems to mean that the final results of subset are non-negative.

I think what happened here (testing now, but it's a 5-minute compile loop) is that

    let subset = Variable::from(subset_base.clone());

provides an initial value for subset and then removes it in the next round. The code concatenates the collection back in, so these two changes logically cancel, but the removal does produce negative tuples before they get cancelled.
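(Concretely — an illustration of the above, not output from an actual run — each base fact's update stream would look something like:)

    ((r1, r2, p), round 0, +1)  // initial value supplied by `from(...)`
    ((r1, r2, p), round 1, -1)  // that initial value retracted
    ((r1, r2, p), round 1, +1)  // the same fact concatenated back in

The round-1 updates cancel once consolidated, but code inspecting raw updates can observe the -1 first.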

I'm trying out

    let subset = Variable::from(subset_base.clone().filter(|_| false));

now, which just says "don't start with any facts". This avoids the removals and shouldn't change the results, as subset_base is concatenated into the recursive definition of subset anyhow.

@lqd commented:

Yes, adding in the filter to prevent initial values going in seems to prevent the assert, even after removing the consolidate() that I added. Now, technically speaking, I haven't seen it assert at all yet, so I'm re-running without the fix to make sure the asserting code is actually being called (though I'm using --stats and seeing stats output, so it probably is).

@lqd commented:

Yup, without the filter I do see the assert, and either the filter or the consolidate seems to prevent it.

@nikomatsakis (Author) commented:

Would you recommend I do this filter trick then? Or is consolidate better?

@nikomatsakis (Author) commented:

I'm sort of confused about why we get the negative multiplicities, but I guess it's sort of an "internal accounting" thing?

@frankmcsherry commented:

I think filter is probably "better". I forgot that with the filter you can drop the clone. It only really makes a difference if the input collection is large (the difference is one additional positive and one negative instance of each input fact).

This is all here because Variable supports non-monotonic variables; in principle a strictly monotonic variable shouldn't need any from() input.
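(For reference, here is how the start-empty pattern looks with today's upstream differential dataflow Variable — which may differ from the Variable vendored in this repo; the relation names and the transitive-closure body are illustrative:)

    extern crate timely;
    extern crate differential_dataflow;

    use timely::order::Product;
    use differential_dataflow::input::Input;
    use differential_dataflow::operators::iterate::Variable;
    use differential_dataflow::operators::{Join, Threshold};

    fn main() {
        timely::execute_from_args(std::env::args(), |worker| {
            worker.dataflow::<usize, _, _>(|scope| {
                let (mut input, edges) = scope.new_collection::<(u32, u32), isize>();

                // transitive closure with a variable that starts empty,
                // so there is no initial value to retract in round 1
                let _closure = scope.iterative::<u64, _, _>(|inner| {
                    let edges = edges.enter(inner);
                    let reach = Variable::new(inner, Product::new(Default::default(), 1));
                    // reach(a, c) :- edge(a, c).
                    // reach(a, c) :- reach(a, b), edge(b, c).
                    let result = reach
                        .map(|(a, b)| (b, a))
                        .join_map(&edges, |_b, &a, &c| (a, c))
                        .concat(&edges)
                        .distinct();
                    reach.set(&result).leave()
                });

                input.insert((0, 1));
                input.insert((1, 2));
            });
        })
        .unwrap();
    }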

@frankmcsherry commented:

The negative multiplicities are really just about being general enough for non-monotonic computation. To compute f^i(x) for a general f, we need to start with x and compute f(x), but then subtract x to get f(x) - x. That difference is what needs to flow back around the loop to keep the incremental computation going correctly.

In the case of Datalog you wouldn't need to subtract off the x, because you know you'll have distincts everywhere, but more general computation needs the -x to correctly determine f^i(x) rather than f(x + f(x + f(x + ...))).
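(Spelling that out — my restatement, not from the thread: the loop circulates differences, and the initial value plus the accumulated differences reconstructs the iterate.)

$$
x_0 = x, \qquad x_{i+1} = f(x_i), \qquad \delta_i = x_{i+1} - x_i,
$$

$$
x_n = x + \sum_{i=0}^{n-1} \delta_i, \qquad \text{with } \delta_0 = f(x) - x.
$$

Without subtracting x, the value flowing around the loop would be f(x) rather than f(x) - x, and the loop would compute the nested f(x + f(x + ...)) above instead of f^i(x).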

@frankmcsherry commented:

Also, yes, I should totally have a better way to create an empty collection. Filter should be pretty cheap (the closure should optimize down to no per-record work). Probably the best other way is to use the same to_stream stuff you are using for inputs, but with a None as the input IntoIterator.
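(A minimal runnable sketch of that suggestion; the element type is an illustrative assumption:)

    extern crate timely;
    extern crate differential_dataflow;

    use timely::dataflow::operators::ToStream;
    use differential_dataflow::AsCollection;

    fn main() {
        timely::execute_from_args(std::env::args(), |worker| {
            worker.dataflow::<usize, _, _>(|scope| {
                // an empty collection built from no input at all: `None` is
                // an `IntoIterator` yielding zero (data, time, diff) triples
                let _empty = None::<((u32, u32), usize, isize)>
                    .to_stream(scope)
                    .as_collection();
            });
        })
        .unwrap();
    }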

@nikomatsakis (Author) commented:

Yeah, I figured I would do None or vec![] with an as-stream or whatever.

Commit message (pushed to this PR):

> This does slow us down quite a bit right now, though. Instead, propagate `R requires B` to superset regions of R only when R goes dead.
>
> Timing now down to 15s from around 120s!
@nikomatsakis (Author) commented:

I haven't had time to read the comments yet, but I pushed some new commits that also remove the transitive loan propagation. This brings total time down to 15s (without altering the final output).

@nikomatsakis (Author) commented:

I pushed a few minor commits (e.g., using join_map instead of join and map, etc.), but they didn't affect overall runtime (not too surprising). Still, I believe @frankmcsherry that using join_map is more efficient in principle? (Perhaps once we make more strides =)

At this point I think the best thing to do is to pursue #20 (which is blocked on #24)

@frankmcsherry commented:

> Still, I believe @frankmcsherry that using join_map is more efficient in principle? (Perhaps once we make more strides =)

Yup, it will save a bunch of copies, mostly. At some point I'll show you how to use join_core, which is what lies underneath join, join_map, and semijoin, and which allows you to put arbitrary iterator logic in the output production. In principle you can fuse all of the maps into their preceding joins, except antijoin, which has that concat there.
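(For reference, a runnable sketch contrasting the three forms on made-up relations; join_core takes arranged inputs and a closure that can return any IntoIterator of outputs:)

    extern crate timely;
    extern crate differential_dataflow;

    use differential_dataflow::input::Input;
    use differential_dataflow::operators::arrange::ArrangeByKey;
    use differential_dataflow::operators::{Join, JoinCore};

    fn main() {
        timely::execute_from_args(std::env::args(), |worker| {
            worker.dataflow::<usize, _, _>(|scope| {
                let (_h1, left) = scope.new_collection::<(u32, u32), isize>();
                let (_h2, right) = scope.new_collection::<(u32, u32), isize>();

                // `join` materializes (key, (val1, val2)) and `map` reshapes it:
                let _a = left.join(&right).map(|(_k, (v1, v2))| (v1, v2));

                // `join_map` fuses the reshaping into the join, saving copies:
                let _b = left.join_map(&right, |_k, &v1, &v2| (v1, v2));

                // `join_core` underlies both: arranged inputs, and a closure
                // that may yield zero or more outputs per matched pair
                let left_arr = left.arrange_by_key();
                let right_arr = right.arrange_by_key();
                let _c = left_arr.join_core(&right_arr, |_k, &v1, &v2| Some((v1, v2)));
            });
        })
        .unwrap();
    }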

Commit message (pushed to this PR):

> I am still not using them for `antijoin` because that's a bit more complicated for me to think about. Also, I think we should write some nicer wrapper traits, like `join` and `semijoin`, to make this more readable.
>
> Still, this results in a net speed win: down to 14.2s.
@nikomatsakis (Author) commented:

OK, I pushed the change to start with empty variables and grow them. To be honest, I find the code cleaner this way, and I manually verified that the asserts don't fire when using -v.

@nikomatsakis (Author) commented:

It also seems to make things mildly faster =)
