-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34737: [C#] C Data interface for schemas and types #34133
Conversation
|
fce7987
to
59eb7ea
Compare
59f34ac
to
5198a0f
Compare
132b21c
to
87e734f
Compare
@adamreeve would you be interested in reviewing? |
public string metadata; | ||
public long flags; | ||
public long n_children; | ||
public IntPtr* children; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should children
just be IntPtr
rather than IntPtr*
? That would allow the struct itself to not need to be unsafe, although might make dealing with children
internally a bit more fiddly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looking at this, I don't know how I would do this without unsafe:
Marshal.AllocHGlobal(numFields * sizeof(IntPtr))
And that's called in the constructor. (Compiler says that you can't call sizeof(IntPtr)
outside of an unsafe context; not sure why.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about IntPtr.Size
?
Marshal.AllocHGlobal(numFields * IntPtr.Size);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think IntPtr.Size
works here, for other struct types you'd generally want to use Marshal.SizeOf
in this scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I've narrowed it so unsafe
only appears when constructing and accessing children. Does that seem like the improvement we are looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 yeah I think that is preferable
d4aff85
to
293db1d
Compare
Thanks for the feedback @adamreeve. I've realized I could simplify the API by changing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just barely started looking at this but will have to come back to it tomorrow.
using System.Runtime.InteropServices; | ||
using Apache.Arrow.Types; | ||
|
||
[UnmanagedFunctionPointer(CallingConvention.StdCall)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why StdCall
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'm not sure on this. It was the default and seemed to work with the integration tests fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I read up some more on this. I don't think StdCall
is the default (that would be WinApi
which defaults to StdCall
on Windows and CDecl
on Linux).
That being said, I don't think we need to worry about calling convention anywhere as it is only relevant when working with 32-bit binaries: https://stackoverflow.com/questions/34832679/is-the-callingconvention-ignored-in-64-bit-net-applications
I'm pretty sure Arrow-C++ doesn't even build on 32-bit (#32111) so this is probably a non-issue. With that in mind, perhaps skip the UnamangedFunctionPointer
entirely? It seems its only purpose is to specify the calling convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace Arrow C++ does build on 32-bit Linux (we have nightly crossbow tests for that).
public string metadata; | ||
public long flags; | ||
public long n_children; | ||
public IntPtr* children; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about IntPtr.Size
?
Marshal.AllocHGlobal(numFields * IntPtr.Size);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had a few more comments that can be addressed post merge.
Thanks for the contribution here, @wjones127!
I'll merge this early this week, unless I see more feedback.
} | ||
} | ||
|
||
public class CDataSchemaPythonTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split this into a new file? Having a single class per file makes it easier to navigate the code and find what you are looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
public long n_children; | ||
public CArrowSchema** children; | ||
public CArrowSchema* dictionary; | ||
public delegate* unmanaged[Stdcall]<CArrowSchema*, void> release; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future (when we start targeting net6.0+), I think we will want to drop the [Stdcall]
here and just use the default:
public delegate* unmanaged<CArrowSchema*, void> release;
But that doesn't work in the older TFMs, so what we have now is fine.
{ | ||
ExportType(field.DataType, schema); | ||
schema->name = StringUtil.ToCStringUtf8(field.Name); | ||
// TODO: field metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you log issues for these TODOs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Re-running the macOS 12 Python 3 leg which failed. Once it passes, I'll merge this PR. |
@westonpace - I see you still have changes requested. Is that stale? or are you still requesting changes on the current proposal? The macOS 12 Python 3 leg is still failing. Does anyone know if that is related to these changes? @wjones127, can you merge with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my review was stale. I'm happy with where this is now.
Benchmark runs are scheduled for baseline = 7c307b1 and contender = f02d351. f02d351 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…4133) ### Rationale for this change This starts the C Data Interface implementation for C# with integration for `ArrowSchema`. `ArrowArray` will come in a follow-up PR. ### What changes are included in this PR? * Adds classes `CArrowSchema` and `ImportedArrowSchema` which allow interacting with the `CArrowSchema`. * Adds integration tests with PyArrow, inspired by the similar integration tests in [arrow-rs](https://github.com/apache/arrow-rs/blob/master/arrow/src/pyarrow.rs) ### Are these changes tested? Yes, the PyArrow integration tests validate the functionality. ### Are there any user-facing changes? This only adds new APIs, and doesn't change any existing ones. * Closes: apache#33856 * Closes: apache#34737 Lead-authored-by: Will Jones <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Eric Erhardt <[email protected]>
…4133) ### Rationale for this change This starts the C Data Interface implementation for C# with integration for `ArrowSchema`. `ArrowArray` will come in a follow-up PR. ### What changes are included in this PR? * Adds classes `CArrowSchema` and `ImportedArrowSchema` which allow interacting with the `CArrowSchema`. * Adds integration tests with PyArrow, inspired by the similar integration tests in [arrow-rs](https://github.com/apache/arrow-rs/blob/master/arrow/src/pyarrow.rs) ### Are these changes tested? Yes, the PyArrow integration tests validate the functionality. ### Are there any user-facing changes? This only adds new APIs, and doesn't change any existing ones. * Closes: apache#33856 * Closes: apache#34737 Lead-authored-by: Will Jones <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Eric Erhardt <[email protected]>
Rationale for this change
This starts the C Data Interface implementation for C# with integration for
ArrowSchema
.ArrowArray
will come in a follow-up PR.What changes are included in this PR?
CArrowSchema
andImportedArrowSchema
which allow interacting with theCArrowSchema
.Are these changes tested?
Yes, the PyArrow integration tests validate the functionality.
Are there any user-facing changes?
This only adds new APIs, and doesn't change any existing ones.