feat: optimize indices #1841
Conversation
A few questions to help me understand how this operation works
let mut indices = Vec::with_capacity(old_indices.len());
for idx in old_indices {
    let index = dataset
        .open_generic_index(&column.name, &idx.uuid.to_string())
        .await?;
    indices.push(index);
}
Again, could maybe use Vec::from_iter here?
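The shape of that suggestion, sketched on plain data (placeholder strings stand in for the opened index handles; note the real loop body `.await`s per element, so it would need stream/future combinators rather than a bare `Vec::from_iter`):

```rust
fn main() {
    let old_indices = ["a", "b", "c"];

    // Push-loop shape, as written in the PR.
    let mut v1 = Vec::with_capacity(old_indices.len());
    for idx in &old_indices {
        v1.push(idx.to_uppercase());
    }

    // Iterator shape the review suggests. The actual PR code awaits
    // per element, so it cannot call Vec::from_iter directly and
    // would need future combinators instead.
    let v2 = Vec::from_iter(old_indices.iter().map(|s| s.to_uppercase()));

    assert_eq!(v1, v2); // both produce ["A", "B", "C"]
    println!("{:?}", v2);
}
```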
When I change to
let indices = stream::iter(old_indices.iter())
.zip(repeat((dataset.clone(), col_name.clone())))
.map(|(meta, (ds, col_name))| async move {
ds.open_generic_index(&col_name, &meta.uuid.to_string())
.await
})
.buffered(10)
.try_collect::<Vec<_>>()
.await?;
it complains:
error: higher-ranked lifetime error
--> lance/src/index.rs:300:5
|
300 | #[instrument(skip_all)]
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: could not prove `Pin<Box<{async block@lance/src/index.rs:300:5: 300:28}>>: CoerceUnsized<Pin<Box<(dyn futures::Future<Output = Result<(), lance_core::Error>> + std::marker::Send + 'i)>>>`
= note: this error originates in the attribute macro `instrument` (in Nightly builds, run with -Z macro-backtrace for more info)
error: could not compile `lance` (lib) due to previous error
Ah, the return of the dreaded rust-lang/rust#102211
Don't worry too much about it, but I've had luck with something like this...
let indices = stream::iter(old_indices.iter())
.zip(repeat((dataset.clone(), col_name.clone())))
.map(|(meta, (ds, col_name))| async move {
ds.open_generic_index(&col_name, &meta.uuid.to_string())
.await
})
.collect::<Vec<_>>();
let indices = stream::iter(indices)
.buffered(10)
.try_collect::<Vec<_>>()
.await?;
...or you can just remove the call to buffered (that's the method that usually introduces the bogus error).
/// If `num_indices_to_merge` is 0, a new delta index will be created.
/// If `num_indices_to_merge` is 1, the delta updates will be merged into the latest index.
/// If `num_indices_to_merge` is more than 1, the delta updates and latest N indices
/// will be merged into one single index.
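Read as a selection rule, the three cases could be modeled like this (a toy sketch, not the lance implementation; `indices_to_merge` and the oldest-to-newest ordering are illustrative assumptions):

```rust
/// Toy model of the doc comment above: given existing indices ordered
/// oldest-to-newest, return which ones the new delta gets merged with.
fn indices_to_merge<'a>(existing: &[&'a str], num_indices_to_merge: usize) -> Vec<&'a str> {
    // 0 => keep everything as-is and create a fresh delta index.
    // n => merge the delta updates with the latest n existing indices.
    let n = num_indices_to_merge.min(existing.len());
    existing[existing.len() - n..].to_vec()
}

fn main() {
    let existing = ["idx_old", "idx_mid", "idx_new"];
    // 0: new delta index, nothing existing is touched.
    assert!(indices_to_merge(&existing, 0).is_empty());
    // 1: merge the delta updates into the latest index only.
    assert_eq!(indices_to_merge(&existing, 1), ["idx_new"]);
    // >1: the delta plus the latest N indices become one index.
    assert_eq!(indices_to_merge(&existing, 2), ["idx_mid", "idx_new"]);
    println!("ok");
}
```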
I think we need to explain to the user why they would change this parameter.
Updated the doc
Don't worry too much. We can tweak this if/when the parameter becomes public. I'm guessing that there is some kind of cost-to-compute/accuracy-of-index tradeoff here? Or is it a cost-to-compute/cost-to-search tradeoff?
In other words, "why wouldn't I merge the indices into one big index every time?" or "why wouldn't I make every index a delta index?"
I'm still not sure it is clear from the comment.
Allow users to specify how many delta indices to merge