[RFC] Dynamic databases #231

Merged
merged 48 commits into from
Jul 7, 2020

Member

@nikomatsakis nikomatsakis commented Jun 29, 2020

This PR introduces an RFC describing a shift to dyn-capable databases. The primary goal is to make it so that query-group code can be compiled in the query-group crate, rather than waiting to be monomorphized in the final database crate.

The main user-visible effect of this proposal is that it requires all salsa query group traits to be dyn-safe.
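
As a minimal sketch of what dyn-safety buys here (the names `SourceDb`, `line_count`, and `TestDb` are hypothetical and not part of salsa): a function that takes `&dyn SourceDb` is compiled once in the crate that defines it, instead of being instantiated again for each concrete database type.

```rust
// Hypothetical stand-in for a query-group trait. It is dyn-safe: it has no
// generic methods and no method that takes or returns `Self` by value.
trait SourceDb {
    fn source_text(&self, file: u32) -> String;
}

// Takes `&dyn SourceDb`, so this body is compiled once, in the crate that
// defines it, rather than once per concrete database type.
fn line_count(db: &dyn SourceDb, file: u32) -> usize {
    db.source_text(file).lines().count()
}

// A concrete database, as it might be defined in the final crate.
struct TestDb;

impl SourceDb for TestDb {
    fn source_text(&self, file: u32) -> String {
        format!("// file {}\nfn main() {{}}\n", file)
    }
}

fn main() {
    let db = TestDb;
    // `&db` coerces from `&TestDb` to `&dyn SourceDb` at the call site.
    println!("{}", line_count(&db, 0));
}
```

A trait with a generic method (for example the old `query<Q>` helper) would not allow this coercion unless that method were excluded from the trait object.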

Rendered form of the RFC

Current status

Largely implemented.

Pending work:

  • Refactor slots and dependencies and try to minimize code generic over Q in the derived impl -- the current branch doesn't match the RFC in that it still creates slots that store the values.
  • Update the salsa book
  • Test effectiveness perhaps?

Pending updates to the RFC text itself:

  • Describe the 'static limitation on databases and how it might be overcome
  • I removed the db.query(Q) methods entirely in favor of Q.in_db(&db). This does not require an extension trait and also works better with dyn Db coercions.
  • Describe how we are adding salsa::Database and HasQueryGroup<G> as an automatic supertrait to each query-group trait.
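
A rough sketch of the `Q.in_db(&db)` shape, assuming hypothetical names throughout (`ParserDb`, `ParseQuery`, `MyDatabase`, and this simplified `QueryTable` are stand-ins for illustration, not salsa's generated code). The point is that `in_db` is an inherent method on the query type, so no extension trait on the database is needed, and the `&db` argument coerces to `&dyn ParserDb` at the call.

```rust
// Hypothetical query-group trait; in salsa the real one would come from
// the query-group macro.
trait ParserDb {
    fn parse(&self, file: u32) -> usize;
}

// Simplified stand-in for a query table: it only holds the `dyn` database.
struct QueryTable<'me> {
    db: &'me dyn ParserDb,
}

impl QueryTable<'_> {
    fn get(&self, key: u32) -> usize {
        self.db.parse(key)
    }
}

// One unit struct per query (hypothetical name).
struct ParseQuery;

impl ParseQuery {
    // Inherent method on the query type. The parameter is already a
    // `&dyn ParserDb`, so callers holding a concrete database just pass `&db`.
    fn in_db(self, db: &dyn ParserDb) -> QueryTable<'_> {
        QueryTable { db }
    }
}

struct MyDatabase;

impl ParserDb for MyDatabase {
    fn parse(&self, file: u32) -> usize {
        file as usize * 2
    }
}

fn main() {
    let db = MyDatabase;
    // No turbofish and no extension trait in scope.
    let table = ParseQuery.in_db(&db);
    println!("{}", table.get(21));
}
```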

@nikomatsakis nikomatsakis changed the title Dynamic databases rfc [RFC] Dynamic databases Jun 29, 2020
@nikomatsakis nikomatsakis requested a review from matklad June 29, 2020 12:56
Member

@matklad matklad left a comment

Did a preliminary review. So far this looks reasonable, but I must dig deeper to really understand the KeyIndex indirection. In general, I must say it was (and, after this rfc, probably still is) hard to follow salsa's trait setup....

book/src/rfcs/RFC0006-Dynamic-Databases.md (outdated)
```rust,ignore
pub trait DatabaseExt {
#[allow(unused_variables)]
fn query<Q>(&self, query: Q) -> QueryTable<'_, Self, Q>
Member

Alternative is to add where Self: Sized bound. I think for rust-analyzer we only invoke query for final DB

Member Author

Ah, interesting, that's true, we could do that, though it seems strictly less good. This is an interesting case in the language; it'd be nice if we had a way to express "final" functions in a trait that could not be overridden by impls (which would then be exempted from object safety requirements).

Member Author

In the latest version, I wound up with TheQuery.in_db(&db) instead. This works better because we need to coerce the &db type to a &dyn DB anyway. It's also nicely ergonomic and avoids turbofish, though it's not as discoverable. As a side note, we could add a method like db.xxx_query() that returns the QueryTable.

@nikomatsakis
Member Author

Regarding following the setup, it is tricky, I was thinking about how to diagram it but I couldn't quite come up with the right diagram to explain everything. I didn't put a lot of effort into it yet, but I think it'd be worth it.

@nikomatsakis
Member Author

I can probably put in some effort into prototyping this but I'd also be happy to work with someone else on that. Also, I forgot that I had planned to add a few notes in the alternatives section about ways we could collect DatabaseKeyIndex values and/or why I chose to make dyn mandatory (the latter is basically "simpler to pick one and not have to maintain multiple options").

@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch from 4ab4093 to 1370469 Compare June 29, 2020 22:48
Member

@matklad matklad left a comment

Did a second pass, everything looks reasonable!

}
```

### The salsa-event mechanism will move to dynamic dispatch
Member

At the moment, rust-analyzer uses this API extensively, as we use it to inject cancellation. We need to call the event handler for every recomputation and revalidation event.

Member Author

We'll just have to measure whether using a dyn interface here makes a perf difference, but I don't really see much of an alternative.

Member

For cancellation specifically, I think we should just build it into salsa.

Member Author

Yeah, that makes sense

book/src/rfcs/RFC0006-Dynamic-Databases.md
immutable, including the list of dependencies. This in turn means that those
fields can be traversed during revalidation without acquiring any read-locks.

There is one complication, though. It sometimes happens that we a query Q has a
Member

Suggested change
- There is one complication, though. It sometimes happens that we a query Q has a
+ There is one complication, though. It sometimes happens that a query Q has a

@matklad
Member

matklad commented Jun 30, 2020

> Regarding following the setup, it is tricky, I was thinking about how to diagram it but I couldn't quite come up with the right diagram to explain everything. I didn't put a lot of effort into it yet, but I think it'd be worth it.

What I usually do is just ask salsa to dump all the code generated for the hello world example. It might make sense to write this as a test (piping the output through rustfmt), so that it's easy to see what exactly is going on.

@nikomatsakis
Member Author

nikomatsakis commented Jun 30, 2020

I started playing around with implementation (you can see the commits in this branch) and I realized I may want to change one aspect of the current proposal. Instead of the DatabaseKeyIndex being an index into a central vector, it may be better to have it be a set of three integers:

struct DatabaseKeyIndex {
    group_index: u16,
    query_index: u16,
    key_index: u32,
}

This was actually my original idea, but I changed it to a centralized index at some point. This "3 integer" approach means that we can create fresh keys without any central locking or coordination. It works just as well for the main job of dispatching revalidation requests -- perhaps better. It also works particularly well with the "interning"-style queries, as the key_index is simply the intern request (trying to fit interning-style queries into the code was what drove me to this approach).

I think that for the other kinds of queries, we would basically use a FxIndexMap<Q::Key, ..>. Already with slots we never remove keys and instead just replace them with a "not computed" placeholder, so the indices won't get invalidated.
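
To make the "triple index" idea concrete, here is a minimal sketch. `KeyTable` and its methods are hypothetical, and a `Vec` plus a std `HashMap` stands in for `FxIndexMap` so that the example has no external dependencies. Each query's table hands out its own `key_index` values, so creating a key touches only that table, and because keys are never removed, an index stays valid once issued.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// The three-integer key described above.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct DatabaseKeyIndex {
    group_index: u16,
    query_index: u16,
    key_index: u32,
}

// Hypothetical per-query key table. Keys are appended and never removed,
// so a `key_index` is never invalidated.
struct KeyTable<K> {
    group_index: u16,
    query_index: u16,
    keys: Vec<K>,
    indices: HashMap<K, u32>,
}

impl<K: Clone + Eq + Hash> KeyTable<K> {
    fn new(group_index: u16, query_index: u16) -> Self {
        KeyTable { group_index, query_index, keys: Vec::new(), indices: HashMap::new() }
    }

    // Returns the index for `key`, allocating the next slot if it is new.
    // Only this table is consulted; there is no database-wide counter.
    fn database_key_index(&mut self, key: &K) -> DatabaseKeyIndex {
        let existing = self.indices.get(key).copied();
        let key_index = match existing {
            Some(index) => index,
            None => {
                let index = self.keys.len() as u32;
                self.keys.push(key.clone());
                self.indices.insert(key.clone(), index);
                index
            }
        };
        DatabaseKeyIndex {
            group_index: self.group_index,
            query_index: self.query_index,
            key_index,
        }
    }

    // Maps an index back to its key, e.g. when dispatching a revalidation request.
    fn key(&self, index: DatabaseKeyIndex) -> Option<&K> {
        self.keys.get(index.key_index as usize)
    }
}

fn main() {
    let mut table: KeyTable<String> = KeyTable::new(0, 3);
    let foo = table.database_key_index(&"foo.rs".to_string());
    let bar = table.database_key_index(&"bar.rs".to_string());
    println!("{:?} {:?} {:?}", foo, bar, table.key(bar));
}
```

A dependency list can then store plain `DatabaseKeyIndex` values: `group_index` and `query_index` select the table to dispatch to, and `key_index` selects the entry within it.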

@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch 2 times, most recently from 383fffd to 0706be7 Compare June 30, 2020 10:33
@nikomatsakis
Member Author

> What I usually do is just ask salsa to dump all the code generated for hello world example.

Yeah this is why I wrote up those "plumbing" chapters in the book, so I didn't have to keep doing that. I think there's a way to get a good diagram, but I didn't want to invest too much energy into it until we had worked out the changes due to this RFC. Note that the latest pushes implement the "triple index" scheme I described and also handle the tracking of dependencies via indices instead of direct references to slots. I think it worked out quite cleanly.

@nikomatsakis
Member Author

Update: I think this RFC implies that the salsa::requires feature must be removed, as described in the latest commit.

@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch from 05ddf29 to ad635dc Compare July 2, 2020 09:41
@nikomatsakis
Member Author

OK, I realized the solution to my dilemma and pushed an implementation of it. You now define databases via storage: Storage<Self> instead of directly embedding the runtime, but the runtime just has the type Runtime and doesn't require the DB type parameter. This allows us to continue using pub(crate) for plumbing which is very convenient.
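
The shape described here can be sketched with mock types. The `Runtime` and `Storage` definitions below are simplified stand-ins written for illustration, not salsa's actual types; `MyDatabase` and `current_revision` are likewise hypothetical. Only the storage wrapper is generic over the database, while the runtime is a plain type that non-generic code can use directly.

```rust
use std::marker::PhantomData;

// Mock stand-in: the runtime has no database type parameter.
#[derive(Default)]
pub struct Runtime {
    revision: u64,
}

// Mock stand-in: only the storage wrapper is generic over the database.
pub struct Storage<DB> {
    runtime: Runtime,
    _database: PhantomData<fn() -> DB>,
}

impl<DB> Default for Storage<DB> {
    fn default() -> Self {
        Storage { runtime: Runtime::default(), _database: PhantomData }
    }
}

impl<DB> Storage<DB> {
    pub fn runtime(&self) -> &Runtime {
        &self.runtime
    }
}

// What a user-defined database would look like under this scheme.
#[derive(Default)]
struct MyDatabase {
    storage: Storage<Self>,
}

// Plumbing that needs only the runtime does not have to be generic over the database.
fn current_revision(runtime: &Runtime) -> u64 {
    runtime.revision
}

fn main() {
    let db = MyDatabase::default();
    println!("{}", current_revision(db.storage.runtime()));
}
```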

@nikomatsakis
Member Author

(I suppose it would have been a less invasive change to split out Runtime into something else, so that users still wrote runtime::Runtime<Self>... but I sort of like the name "storage"...)

@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch 2 times, most recently from eece1c1 to a4bca65 Compare July 2, 2020 10:50
@nikomatsakis
Member Author

OK, I've basically implemented this RFC now, but I hit one final obstacle I hadn't anticipated: in the absence of GATs, the structure that I've set up here basically makes it impossible for databases to carry non-static borrows.

The problem stems from one of the nicest simplifications in this RFC. Instead of having everything generic over DB, things are instead just generic over a Q: Query, and for the database they get a Q::DynDb, which is short for something like dyn MyQueryGroup. This makes a lot of gnarly generic signatures a lot simpler. However, the problem is that type DynDb = dyn MyQueryGroup is really short for type DynDb = dyn MyQueryGroup + 'static.

This is annoying because the parameter is db: &Q::DynDb, and if we were writing things out without generics we'd have &'a dyn MyQueryGroup, which would default to &'a (dyn MyQueryGroup + 'a). Unfortunately, we are not writing things out by hand; we're working with generics, and it doesn't work like that.

If we had GATs, we could make it db: &Q::DynDb<'_>, and things would work out fine. Even without GATs, there's probably a way to get support with a bit of "fancy footwork". We'd have another trait that carries the DynDb associated type which also has a lifetime parameter, so that we wind up with db: <Q as DynDbRef<'_>>::DynDb or something like that.

That said, right now, database traits don't really support generic parameters anyway, so we're not actually losing anything afaik by requiring 'static.
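
The defaulting rule and the "fancy footwork" can be sketched as follows. This is an illustration under assumptions, not the code in the branch: `BorrowingDb`, `text_len`, `execute`, and `execute_static` are hypothetical names, `Query` is cut down to just the associated type, and `DynDbRef` is written the way the comment above suggests.

```rust
trait MyQueryGroup {
    fn text_len(&self) -> usize;
}

// A database that carries a non-'static borrow.
struct BorrowingDb<'s> {
    text: &'s str,
}

impl MyQueryGroup for BorrowingDb<'_> {
    fn text_len(&self) -> usize {
        self.text.len()
    }
}

struct MyQuery;

// Cut-down version of the shape described above: no lifetime is in scope.
trait Query {
    type DynDb: ?Sized;
}

impl Query for MyQuery {
    // Not written behind a reference, so the object bound defaults to
    // 'static: this is `dyn MyQueryGroup + 'static`.
    type DynDb = dyn MyQueryGroup;
}

// Hypothetical workaround trait that threads a lifetime through.
trait DynDbRef<'a> {
    type DynDb: ?Sized + 'a;
}

impl<'a> DynDbRef<'a> for MyQuery {
    type DynDb = dyn MyQueryGroup + 'a;
}

// Accepts only databases whose type is 'static.
fn execute_static(db: &<MyQuery as Query>::DynDb) -> usize {
    db.text_len()
}

// Accepts databases that borrow data for at least 'a.
fn execute<'a>(db: &'a <MyQuery as DynDbRef<'a>>::DynDb) -> usize {
    db.text_len()
}

fn main() {
    let owned = String::from("fn main() {}");
    let db = BorrowingDb { text: &owned };

    // Behind a reference, the object bound defaults to the reference's lifetime.
    let dyn_db: &dyn MyQueryGroup = &db;
    println!("{}", execute(dyn_db));

    // `execute_static(&db)` would not compile: `db` borrows `owned`,
    // which is not 'static.
    let static_db = BorrowingDb { text: "static text" };
    let static_dyn: &(dyn MyQueryGroup + 'static) = &static_db;
    println!("{}", execute_static(static_dyn));
}
```

With GATs the second trait would be unnecessary, since the lifetime could be a parameter of the associated type itself.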

@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch from cf7bfcf to 4b70a10 Compare July 3, 2020 10:56
@nikomatsakis
Member Author

nikomatsakis commented Jul 3, 2020

OK, fully rebased, tests pass, and the hardest part is implemented. I updated the OP with details of work left to do.

Update book/src/rfcs/RFC0006-Dynamic-Databases.md

Co-authored-by: Aleksey Kladov

Update book/src/rfcs/RFC0001-Query-Group-Traits.md

Co-authored-by: bjorn3

Update book/src/rfcs/RFC0006-Dynamic-Databases.md

Co-authored-by: Aleksey Kladov

fix lint warnings on RFC

This will be more compatible once we move to having queries have an
associated `DynDb` type. It also reads nicely.

This had two unexpected consequences, one unfortunate, one "medium":

* All `salsa::Database` must be `'static`. This falls out from
`Q::DynDb` not having access to any lifetimes, but also the defaulting
rules for `dyn QueryGroup` that make it `dyn QueryGroup + 'static`. We
don't really support generic databases anyway yet so this isn't a big
deal, and we can add workarounds later (ideally via GATs).

* It is now statically impossible to invoke `snapshot` from a query,
and so we don't need to test that it panics. This is because the
signature of `snapshot` returns a `Snapshot<Self>` and that is not
accessible to a `dyn QueryGroup` type. Similarly, invoking
`Runtime::snapshot` directly is not possible because it is
crate-private. So I removed the test. This seems ok, but eventually I
would like to expose ways for queries to do parallel
execution (matklad and I had talked about a "speculation" primitive
for enabling that).

* This commit is 99% boilerplate I did with search-and-replace. I also
rolled in a few other changes I might have preferred to factor out,
most notably removing the `GetQueryTable` plumbing trait in favor of
free-methods, but it was awkward to factor them out and get all the
generics right (so much simpler in this version).
@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch from 4b70a10 to fad97ee Compare July 4, 2020 14:17
@nikomatsakis nikomatsakis force-pushed the dynamic-databases-rfc branch from a50273d to 0a8c203 Compare July 5, 2020 11:57
It's simpler to just store a DatabaseKeyIndex. It may be somewhat
slower; we'll have to measure. But we can add back in this other
design later if we want.

This should enable more sharing and less monomorphization. There is
probably room for more radical restructuring in this vein.
@nikomatsakis
Member Author

BTW, I adjusted the docs, so if you get a chance, do a checkout and run mdbook serve. I think the diagram + links is reasonably comprehensible, or at least helps in "refreshing one's memory".

@matklad
Member

matklad commented Jul 7, 2020

Some doctests fail

@matklad
Member

matklad commented Jul 7, 2020

See rust-lang/rust-analyzer#1987 (comment) for visualization of impact on compile times.

@nikomatsakis
Member Author

OK, doc-tests are fixed. I think we're going to merge this once CI is happy.

@nikomatsakis nikomatsakis merged commit 9b9dbcc into salsa-rs:master Jul 7, 2020
matklad added a commit that referenced this pull request Jul 7, 2020
The single (but big) change is Dynamic Database RFC implementation:

#231