-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(csharp/src/Drivers/Apache/Spark): poc - Support for Apache Spark over HTTP (non-Arrow) #2018
feat(csharp/src/Drivers/Apache/Spark): poc - Support for Apache Spark over HTTP (non-Arrow) #2018
Conversation
…-bq/spark-ds-variants
…-bq/spark-ds-variants
csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TBinaryColumn.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! This is only a partial review as I've exceeded the time I had allocated for a first pass.
| BOOLEAN | Boolean | bool | | ||
| CHAR | String | string | | ||
| DATE* | *String* | *string* | | ||
| DECIMAL* | *String* | *string* | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a fundamental limitation or an implementation issue? Are we getting back enough information to do the conversion locally when the data is returned in a Thrift vs Arrow format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CurtHagenlocher
This is a "Thrift" limitation - if we don't do any further conversion. We have the opportunity to convert the string types to native type - at the expense of memory/performance. Should we do always do the conversion on the client driver? Or possibly allow an option to decide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct thing is to do the conversion locally, yes. I think it would be very surprising for a user to run a query that produces a decimal or date value and to get that back as a string. I understand that we've been avoiding this for structured types where it's perhaps a little more justifiable but this just seems wrong.
One thing we could do to unblock this checkin is to add an option for converting scalar types and to fail if the value is not explicitly passed as false. This would let us add support in a subsequent change without breaking consumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduced option data_type_conv
with value none
to indicate that no conversion should be done. Indicated future support for scalar
to convert date, decimal and timestamp types.
csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TDoubleColumn.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! I do think we can't just return e.g. dates or decimals as strings, but I've made a suggestion for a small change that would unblock this change and let us fix that separately.
var memory = offsetBuffer.AsMemory(); | ||
var typedMemory = Unsafe.As<Memory<byte>, Memory<int>>(ref memory).Slice(0, length + 1); | ||
|
||
for(int _i197 = 0; _i197 < length; ++_i197) | ||
{ | ||
//typedMemory.Span[_i197] = offset; | ||
StreamExtensions.WriteInt32LittleEndian(offset, memory.Span, _i197 * 4); | ||
StreamExtensions.WriteInt32LittleEndian(offset, memory.Span, _i197 * IntSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For .NET 6+ we could use e.g. BinaryPrimitives.WriteInt32LittleEndian for what I suspect is an efficiency gain. This could be done as a followup and/or performance-tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay will create a follow-up to .NET 6+ support for BinaryPrimitives.WriteInt32LittleEndian
csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TBinaryColumn.cs
Outdated
Show resolved
Hide resolved
csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TBinaryColumn.cs
Outdated
Show resolved
Hide resolved
| BOOLEAN | Boolean | bool | | ||
| CHAR | String | string | | ||
| DATE* | *String* | *string* | | ||
| DECIMAL* | *String* | *string* | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct thing is to do the conversion locally, yes. I think it would be very surprising for a user to run a query that produces a decimal or date value and to get that back as a string. I understand that we've been avoiding this for structured types where it's perhaps a little more justifiable but this just seems wrong.
One thing we could do to unblock this checkin is to add an option for converting scalar types and to fail if the value is not explicitly passed as false. This would let us add support in a subsequent change without breaking consumers.
Proof-of-concept
Adds support for standard Apache Spark data source
adbc.spark.type
parameter option to specify the Spark server type.Closes #2015