feat(rust/drivermgr): start rust driver_manager #416
I'm still figuring out the Windows CI, but otherwise this is ready for review. I recommend reading the files in the following order:
Rust CI is successful on my branch: https://github.com/wjones127/arrow-adbc/actions/runs/4131461920
//! [AdbcConnection] should not be used across multiple threads. Driver
//! implementations do not guarantee connection APIs are safe to call from
//! multiple threads, unless calls are carefully sequenced. So instead of using
//! the same connection across multiple threads, create a connection for each
//! thread. [AdbcConnectionBuilder] is [core::marker::Send], so it can be moved
//! to a new thread before being initialized into an [AdbcConnection]. [AdbcConnection]
//! holds its inner data in a [std::rc::Rc], so it is also cheaply cloneable.
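The pattern this doc comment describes can be sketched as follows. This is a minimal illustration, not the code from this PR: `ConnectionBuilder`, `Connection`, `init`, `describe`, and `run_on_worker` are hypothetical stand-ins for the real types and methods. The builder owns plain data and is therefore `Send`; the connection wraps an `Rc`, so it is cheap to clone but must stay on the thread that created it.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for a connection builder. It only owns plain data,
/// so the compiler derives `Send` for it automatically.
struct ConnectionBuilder {
    uri: String,
}

/// Hypothetical stand-in for a connection. The `Rc` makes clones cheap but
/// also makes the type `!Send`, so it cannot be moved to another thread.
#[derive(Clone)]
struct Connection {
    inner: Rc<String>,
}

impl ConnectionBuilder {
    fn new(uri: &str) -> Self {
        ConnectionBuilder { uri: uri.to_string() }
    }

    /// Consume the builder and produce a connection on the current thread.
    fn init(self) -> Connection {
        Connection { inner: Rc::new(self.uri) }
    }
}

impl Connection {
    fn describe(&self) -> String {
        format!("connected to {}", self.inner)
    }
}

/// Move the builder into a worker thread and initialize the connection there.
fn run_on_worker(uri: &str) -> String {
    let builder = ConnectionBuilder::new(uri);
    thread::spawn(move || {
        let conn = builder.init();
        // Cloning only bumps the reference count.
        let copy = conn.clone();
        copy.describe()
    })
    .join()
    .expect("worker thread panicked")
}
```

Trying to move a `Connection` itself into `thread::spawn` would fail to compile, which is the guarantee the doc comment relies on.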
I was thinking just last night that we might want to implement async execute methods, which would involve invoking with spawn_blocking(). That might mean we need to make AdbcConnection Send, but not 100% sure yet. Maybe something for a follow up.
I've wanted to implement async interfaces in general. That'll require some thought between Flight RPC, and Rust/C++/Go. (I want to make the C-level ABI in a way that all three languages can expose their native concurrency models efficiently, and import others' models efficiently. But that means I need to really go study up on how those models work.) I'd appreciate any thoughts you and @zeroshade have.
e.g.: gRPC C++ uses a callback-based model. UCX (in C) uses a polling-based model. There's precedent like kqueue/epoll, and of course the new io_uring (don't know if that's relevant).
There's also Python's model to consider (as a consumer), and probably we'd want to anticipate things like OpenTelemetry (or generally: non-stack-frame, non-thread based 'task' context). And eventually, whenever I get around to JNI bindings, Java's model as well (which, with virtual threading, is also getting more complicated/interesting).
Oh, and consideration of whether we need an async version of C Data Interface...
FWIW the Rust model is described well here: https://rust-lang.github.io/async-book/02_execution/02_future.html.
But for now I was simply thinking that the Rust driver manager could wrap the blocking calls into C drivers with tokio::task::spawn_blocking(), which sends the call to another thread to execute. This is what Rust async libraries do for filesystem interaction on systems that don't provide a native async interface. And then native Rust drivers could always be async. But maybe there is value in having both async and blocking versions of execute() in the Rust API. Something I'll try to get feedback from the Rust community on once I have a proposal for the ADBC Rust API.
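As a rough illustration of "send the blocking call to another thread", here is a sketch that uses only the standard library. It deliberately swaps in `std::thread::spawn` for tokio's `spawn_blocking`, so it returns a `JoinHandle` to wait on rather than a future to `.await`. `blocking_execute`, `execute_offloaded`, and `wait` are hypothetical names, and the fake "driver call" just sleeps and returns the query length.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical blocking call standing in for a C driver's execute function.
/// It sleeps briefly and returns the query length as a fake row count.
fn blocking_execute(query: &str) -> Result<usize, String> {
    thread::sleep(Duration::from_millis(10));
    if query.trim().is_empty() {
        Err("empty query".to_string())
    } else {
        Ok(query.len())
    }
}

/// Run the blocking call on a separate thread so the caller is not blocked.
/// The query is moved into the closure, so the closure is `Send + 'static`.
fn execute_offloaded(query: String) -> JoinHandle<Result<usize, String>> {
    thread::spawn(move || blocking_execute(&query))
}

/// Wait for the offloaded call and flatten a worker panic into an error.
fn wait(handle: JoinHandle<Result<usize, String>>) -> Result<usize, String> {
    handle
        .join()
        .unwrap_or_else(|_| Err("worker thread panicked".to_string()))
}
```

Note that everything the closure captures has to be `Send`, which is why offloading calls this way could require a `Send` connection, as raised earlier in this thread.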
I thought about this in the R package too. In the Arrow R package we always execute "cancellable arrow stuff" on another thread (via an Executor) and wait for that future to complete. That relies on Arrow's async API and cancellation signal handlers, but it would be a huge asset for the driver manager to handle this (nothing worse than accidentally executing a huge query without the ability to cancel).
Cool. If I can find some spare time...
I think async interfaces in general throughout our ecosystem are going to be more important as they get adopted more by developers (C++ is, well, coming along, but Go/mostly Rust/sorta Python have generally mature concurrency stories nowadays)
Generally looks good, thank you!
I had some questions, but my Rust is rather out of date so hopefully it makes sense
  run: cargo fmt -- --check
- name: Clippy
  run: cargo clippy --tests
- name: Build Driver SQLite (Windows)
You may or may not want to integrate with the main pipeline, which will build all these artifacts once so you can just download them from other jobs.
For now it seems clear enough while separate. But if we rely on any more artifacts from the other build, then I think we should merge.
[package]
name = "arrow-adbc"
version = "0.1.0"
nit: version needs to be bumped (also, we'll have to add this to the release scripts)
Good point. I'll need to look at what needs to be done for release.
arrow = { version = "32.0.0", features = ["ffi"], default_features = false}
libloading = "0.7"

# TODO: support arrow2 with a non-default feature.
Given the upcoming merger - should we target arrow2 directly if that is going to be the primary implementation?
///
/// Can be used in combination with [check_err] when implementing ADBC FFI
/// functions.
pub trait AdbcError {
Hmm. Instead of requiring boxing everywhere, what about having a concrete error type and just converting to/from the FFI version as necessary?
Also in general, I think it'd be better to compartmentalize the FFI bits the same way Go does, instead of assuming FFI as a fundamental part of the implementation
I agree with this now. When I was thinking more FFI-centric this made a bit more sense. I have refactored to use a concrete struct in #478.
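For readers following along, the "concrete error type, converted at the FFI boundary" idea might look roughly like this. It is a sketch under stated assumptions and not the refactor in #478: `Error` and `FfiError` are hypothetical, and `FfiError` is a simplified two-field struct rather than the actual ADBC C error struct.

```rust
use std::ffi::CString;
use std::fmt;
use std::os::raw::c_char;

/// Hypothetical concrete error type used throughout the Rust API.
#[derive(Debug, Clone, PartialEq)]
pub struct Error {
    pub message: String,
    pub vendor_code: i32,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (vendor code {})", self.message, self.vendor_code)
    }
}

impl std::error::Error for Error {}

/// Hypothetical, simplified C-compatible error used only at the FFI boundary.
#[repr(C)]
pub struct FfiError {
    pub message: *mut c_char,
    pub vendor_code: i32,
}

impl From<Error> for FfiError {
    fn from(err: Error) -> Self {
        // Interior NUL bytes would make CString::new fail, so replace them.
        let message = CString::new(err.message.replace('\0', " "))
            .expect("NUL bytes were replaced");
        FfiError {
            // Ownership of the allocation passes to the FFI struct.
            message: message.into_raw(),
            vendor_code: err.vendor_code,
        }
    }
}

impl From<FfiError> for Error {
    fn from(ffi: FfiError) -> Self {
        let message = if ffi.message.is_null() {
            String::new()
        } else {
            // SAFETY: in this sketch a non-null pointer always came from
            // CString::into_raw above, so reclaiming it here is sound.
            unsafe { CString::from_raw(ffi.message) }
                .to_string_lossy()
                .into_owned()
        };
        Error { message, vendor_code: ffi.vendor_code }
    }
}
```

With this shape, Rust callers work with `Result<T, Error>` directly and no boxing, and only the FFI shim ever touches `FfiError`.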
/// Databases hold state shared by multiple connections. This typically means
/// configuration and caches. For in-memory databases, it provides a place to
/// hold ownership of the in-memory database.
pub trait DatabaseApi {
nit: is it typical to suffix traits with 'Api'?
This was mostly to differentiate between the AdbcDatabase in the driver_manager module and this trait. But given we want to be less FFI-centric, I renamed the driver_manager AdbcDatabase -> DriverDatabase and renamed this one AdbcDatabase.
Sounds good to me. Thanks Will!
I think this has been superseded?