-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement 64-bit integer based replacements for .NET memory types #2
base: main
Are you sure you want to change the base?
Conversation
} | ||
} | ||
|
||
public void CopyTo(LongMemory<T> destination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be inefficient with large datasets. What about using Buffer.MemoryCopy or a bulk copy method to optimize performance?
{ | ||
private readonly T[] _array; | ||
private readonly long _length; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought is you could look into using .Net's ArrayPool to mitigate large object allocation. ArrayPool provides reusable arrays, reducing the overhead of frequent allocations and deallocations in memory-intensive applications. It works by pooling arrays of various sizes and reusing them.
Nice article: https://medium.com/@epeshk/the-big-performance-difference-between-arraypools-in-net-b25c9fc5e31d
{ | ||
internal readonly struct LongMemory<T> | ||
{ | ||
private readonly T[] _array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type probably needs to hold a raw pointer rather than an array, arrays in .NET are limited to 2 GiB, which is why the new long types are needed. It might be worth reaching out to Curt who's the Arrow maintainer to say you're looking into this and asking if it's possible for him to share the work he started on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about this some more, rather than a raw pointer it probably needs to hold a LongMemoryManager or ILongMemoryOwner, as you need to be able to Pin this and ensure the underlying memory isn't garbage collected. The memory manager is then the thing that might hold a pointer, but here should just be an abstract base class like MemoryManager
} | ||
} | ||
|
||
public Span<T> Span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably return a LongSpan
, not sure if the name should change too.
|
||
public Span<T> AsSpan() | ||
{ | ||
return _memory.Span; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting a Span from a LongSpan should require checking that the length is less than int.MaxValue, and throw an exception if not.
{ | ||
internal readonly struct LongSpan<T> | ||
{ | ||
private readonly LongMemory<T> _memory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a good idea to look at how the existing Span type and other corresponding types are implemented in .NET. Eg. if you go to the docs for the Span type (https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=net-8.0) there's a "Source" link that takes you to https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/Span.cs
There you can see Span holds:
internal readonly ref T _reference;
private readonly int _length;
I imagine that LongSpan should then look like:
internal readonly ref T _reference;
private readonly long _length;
(and so LongSpan will need to be a ref struct
)
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
internal class LongMemoryOwner<T>: IMemoryOwner<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to define a LongMemoryOwner class, but rather an ILongMemoryOwner
interface similar to IMemoryOwner
private IntPtr _ptr; | ||
private long _size; | ||
|
||
public LargeMemory(long size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to make these types match the standard, non-large types as closely as possible. Eg. you can see the source for System.Memory
here. In .NET the Memory<T>
type doesn't have a constructor that will allocate memory, and it isn't disposable. It's a struct that holds a reference to an array or a MemoryManager
. I think for simplicity we can skip supporting an array backed LargeMemory
and only work with a LargeMemoryManager
, so LargeMemory
should be quite simple and mostly delegate functionality to a LargeMemoryManager
.
The MemoryManager
is the thing that is disposable and can actually manage the memory, but this is an abstract class so it doesn't implement memory allocation or disposal directly. In Arrow there's a NativeMemoryManager, which inherits from MemoryManager
, and I think a lot of the functionality currently in this LargeMemory
class should be in a new LargeNativeMemoryManager
class instead, which should look pretty similar to the NativeMemoryManager
.
Arrow additionally separates the functionality for allocating memory from the MemoryManager
, eg. with the NativeMemoryAllocator class.
In order to integrate this new large memory functionality with Arrow, we might want to do something like extend the MemoryAllocator
abstract base class by adding a new method, public ILargeMemoryOwner<byte> AllocateLarge(long length)
. Then the existing NativeMemoryAllocator
could be extended to create either a NativeMemoryManager
or LargeNativeMemoryManager
depending on which method is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice progress Anreet, I've added some comments on the latest changes (some of it just repeating what we discussed in the meeting)
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
internal interface ILargeNativeAllocationOwner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a separate interface here, we can just add a new overload of the Release
method to INativeAllocationOwner
.
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
internal interface ILargeOwnableAllocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this class is needed yet, let's remove it and we can always add it back later if needed when looking at C Data Interface support.
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
internal abstract class LargeImportedAllocationOwner:ILargeNativeAllocationOwner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, we can remove this for now, it's only needed for the C Data Interface.
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
internal abstract class LargeMemoryAllocator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than a separate class, we should add methods for large memory allocation to the existing MemoryAllocator
class.
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
internal class LargeNativeMemoryAllocator:LargeMemoryAllocator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned on the LargeMemoryAllocator
, this logic should be added to the existing NativeMemoryAllocator
class rather than be a separate class.
|
||
internal class LargeNullMemoryOwner: ILargeMemoryOwner<byte> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a new class, we can make NullMemoryOwner
also implement ILargeMemoryOwner<byte>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just noticed my previous review wasn't submitted, so this doesn't account for the most recent changes
public override unsafe Span<byte> GetSpan() | ||
{ | ||
void* ptr = CalculatePointer(0); | ||
return new Span<byte>(ptr, _length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check for overflow here? Might also want a GetLargeSpan
method.
// TODO: Should the allocation be moved to NativeMemory? | ||
|
||
long size = length + Alignment; | ||
IntPtr ptr = Marshal.AllocHGlobal(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocHGlobal only accepts an int size, so we'll need to find a way to allocate > max int32. Maybe use NativeMemory.Alloc
instead (https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.nativememory.alloc?view=net-8.0#system-runtime-interopservices-nativememory-alloc(system-uintptr))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, looks like we can use NativeMemory.AlignedAlloc
and then we don't need to deal with alignment ourselves
GC.AddMemoryPressure(bytesAllocated); | ||
|
||
// Ensure all allocated memory is zeroed. | ||
manager.LargeMemory.Span.Fill(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to use a LargeSpan here so we can fill > 2 GB
|
||
namespace Apache.Arrow.Memory | ||
{ | ||
public abstract class LargeMemoryManager<T>:ILargeMemoryOwner<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to help with using an ILargeMemoryOwner in ArrowBuffer, it would help if this also implements the IMemoryOwner interface.
This PR implements 64-bit integer based replacements for .NET memory types (Span, ReadOnlySpan, ReadOnlyMemory, Memory, MemoryManager and MemoryOwner)