-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(csharp): redefine C# APIs to prioritize full async support #1865
Conversation
Wow, C#/.NET has thought this all through, pretty neat. |
/// </summary> | ||
/// <param name="codes">The metadata items to fetch.</param> | ||
/// <returns>Metadata about the driver and/or database</returns> | ||
public virtual Task<IArrowArrayStream> GetInfo(ReadOnlySpan<AdbcInfoCode> codes, CancellationToken cancellationToken = default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetInfoAsync?
/// <param name="targetTableName">The table name</param> | ||
/// <param name="mode">The ingest mode</param> | ||
/// <param name="isTemporary">True for a temporary table. Catalog and Schema must be null when true.</param> | ||
public virtual AdbcStatement BulkIngest(string? targetCatalog, string? targetDbSchema, string targetTableName, BulkIngestMode mode, bool isTemporary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this have an equivalent async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this does is setup a statement for later binding and execution. It's not intended to do anything on the server side. We could instead have something a little more complete like
void BulkIngest(string? targetCatalog, string? targetDbSchema, string targetTableName, BulkIngestMode mode, bool isTemporary, IArrowArrayStream data);
Task BulkIngestAsync(string? targetCatalog, string? targetDbSchema, string targetTableName, BulkIngestMode mode, bool isTemporary, IArrowArrayStream data, CancellationToken cancellationToken);
and with that definition we'd obviously need an async version.
There were a few reasons I chose not to do this (or not to do this now) including the need to have both variations (the equivalents of Bind and BindStream) and the longer-term desire to support the progress API which might then require additional overloads. But I'm definitely conflicted on the best thing to do for right now.
That said, if CreateStatement has an async version then this would get one too.
/// <summary> | ||
/// Create a new statement that can be executed. | ||
/// </summary> | ||
public abstract AdbcStatement CreateStatement(IReadOnlyDictionary<string, object>? options = default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't an expensive operation, but does it need an Async equivalent for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind, part of the goal of the API is to signal to implementers what they should be doing. I see a statement as a kind of unit of internal bookkeeping which doesn't use database resources until you ask it to do something else. By not having an async function for creating a statement, this intent is communicated fairly clearly to implementers. This is also consistent with ADO.NET's DbCommand, which doesn't have an async construction function (though there were likely other considerations there that we're not faced with).
If there's a demonstrated need to do something differently, we can later add an async method that returns a ValueTask and that would be consistent with everything else we have.
That said, I'm mostly neutral on adding it now so if you think it's better to have it then I'll change it.
{ | ||
} | ||
|
||
public ValueTask DisposeAsync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need an equivalent #if NET5_0_OR_GREATER directive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not strictly required because you can define whatever function you want; only the interface it implements is missing in older versions of .NET. I guess my design principle here is to #ifdef only when necessary; but does that give a misleading impression in this case?
/// </summary> | ||
/// <param name="key">Option name</param> | ||
/// <returns>Option value.</returns> | ||
public virtual object GetOption(string key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs equivalent Async
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to Driver.Open: if the ADBC "database" is like a connection string then the options should all be local; we haven't connected to anything yet. But I'll take a look at e.g. the Flight implementation to see how well my interpretation holds up.
/// For example, in-memory databases can place ownership of the actual | ||
/// database in this object. | ||
/// </summary> | ||
public abstract class AdbcDatabase11 : IDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include IAsyncDisposable?
/// This provides a common interface for vendor-specific driver | ||
/// initialization routines. | ||
/// </summary> | ||
public abstract class AdbcDriver11 : IDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include IAsyncDisposable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidhcoe - I don't think we need IAsyncDisposable because there are no async methods in the class.
/// Open a database via this driver. | ||
/// </summary> | ||
/// <param name="parameters">Driver-specific parameters.</param> | ||
public abstract AdbcDatabase Open(IReadOnlyDictionary<string, string> parameters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need an equivalent OpenAsync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to CreateStatement. I see an ADBC database as being a bookkeeping thing; more like a connection string than something that's actually connected to anything. Through that interpretation, the name "Open" is misleading here and should probably be replaced.
If we change this for CreateStatement now then we'd change this here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments/suggestions with regards to inheritance.
/// This provides a common interface for vendor-specific driver | ||
/// initialization routines. | ||
/// </summary> | ||
public abstract class AdbcDriver11 : IDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidhcoe - I don't think we need IAsyncDisposable because there are no async methods in the class.
/// <summary> | ||
/// Returns the version of the ADBC spec supported by this driver. | ||
/// </summary> | ||
public virtual int Version => AdbcVersion.Version_1_0_0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this property be available in all classes? Or just use .NET type management?
I see AdbcDriver.DriverVersion
is the functional equivalent of this property. If we inherit from AdbcDriver
, I guess we should stick to DriverVersion
. Personally, I'd prefer this property be named AdbcVersion
to disambiguate between driver, dbms, etc? Especially since it's an int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a consumer that cares would check this when loading the driver and remember the value. Putting it on all the types seems like it would be annoying for the implementer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's all I have for now.
/// | ||
/// This must always be thread-safe. | ||
/// </remarks> | ||
public virtual void Cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought I had was that, in the past, I've had to send a message to the server to stop processing and end the session on the server-side. So it might be useful to have a CancelAsync
(and in Statement) if the implementation requires a call to the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cancel is what you call when you're using the synchronous API. This already has to be on another thread because the synchronous API for e.g. ExecuteQuery will block the thread it's called from. If you're using the asynchronous API, then you should use the cancellation token instead of calling this method.
In the absence of additional feedback I'm going to check these in now and start refactoring the code to reference the newer classes. I would say the only open questions are whether or not there should be more async for the Driver and Database (I argue "no" but was hoping for more discussion) and the "BulkIngest" API which feels incomplete or wrong. |
For #1843, redefines the C# APIs to prioritize full async support. As this is making a number of breaking changes already, it also takes the opportunity to do some general cleanup.
Async methods that are generally expected to run locally (e.g. GetOption/SetOption) are defined to return ValueTask and have their default implementation be synchronous. Async methods that are generally expected to run remotely are defined to return Task and have their default implementations be asynchronous.
My mental model for the four ADBC "object" types is as follows:
The driver is analogous to the ODBC or JDBC driver, or ADO.NET provider. In JDBC, this type is represented by the java.sql.Driver interface. In ADO.NET, it's represented by the DbProviderFactory class. Because this object is strictly about code, it's not expected to do any IO other than that potentially required by any code e.g. to bring in pages of a binary image from disk.
The database is analogous to the JDBC DataSource, the ODBC "DSN" or the ADO.NET connection string. It represents the information and capability required to create a database connection but does not itself do IO until it tries to create a connection. (This would imply that parameter validation which requires network access -- e.g. to validate a host name -- is deferred until the connection is created. Perhaps that's too limiting?)
Because neither the driver nor the database is doing IO, neither of them need to have async methods other than Connect, including async cleanup.
The connection represents an actual session with a database. This matches an ODBC connection, a JDBC java.sql.Connection or an ADO.NET DbConnection. Opening a connection, closing it or using it to fetch information about the data source are all operations likely to require IO so these all require async implementations.
The statement is a unit of bookkeeping related to certain types of database operations. In some cases, a connection can only have a single active statement running against it, but it can be useful to have multiple statements even then if, for instance, each one is a prepared statement that represents both client-side and server-side resources. The statement is analogous to an ODBC statement, a JDBC java.sql.Statement or an ADO.NET DbCommand. Due to the need to clean up an in progress operation or to release server-side resources, the cleanup of a statement might do IO and should therefore support asynchrony. But when the statement is first created, it only represents the potential for future work and so creation is always synchronous.
I'd be curious to hear how well this aligns with others' points of view. lidavidm? davidhcoe? ("ping" removed)