Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement 64-bit integer based replacements for .NET memory types #2

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

anreetka
Copy link
Owner

This PR implements 64-bit integer based replacements for .NET memory types (Span, ReadOnlySpan, ReadOnlyMemory, Memory, MemoryManager and MemoryOwner)

}
}

public void CopyTo(LongMemory<T> destination)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be inefficient with large datasets. What about using Buffer.MemoryCopy or a bulk copy method to optimize performance?

{
private readonly T[] _array;
private readonly long _length;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought is you could look into using .Net's ArrayPool to mitigate large object allocation. ArrayPool provides reusable arrays, reducing the overhead of frequent allocations and deallocations in memory-intensive applications. It works by pooling arrays of various sizes and reusing them.

Nice article: https://medium.com/@epeshk/the-big-performance-difference-between-arraypools-in-net-b25c9fc5e31d

{
internal readonly struct LongMemory<T>
{
private readonly T[] _array;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type probably needs to hold a raw pointer rather than an array, arrays in .NET are limited to 2 GiB, which is why the new long types are needed. It might be worth reaching out to Curt who's the Arrow maintainer to say you're looking into this and asking if it's possible for him to share the work he started on.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@adamreeve adamreeve Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about this some more, rather than a raw pointer it probably needs to hold a LongMemoryManager or ILongMemoryOwner, as you need to be able to Pin this and ensure the underlying memory isn't garbage collected. The memory manager is then the thing that might hold a pointer, but here should just be an abstract base class like MemoryManager

}
}

public Span<T> Span

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably return a LongSpan, not sure if the name should change too.


public Span<T> AsSpan()
{
return _memory.Span;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting a Span from a LongSpan should require checking that the length is less than int.MaxValue, and throw an exception if not.

{
internal readonly struct LongSpan<T>
{
private readonly LongMemory<T> _memory;
Copy link

@adamreeve adamreeve Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to look at how the existing Span type and other corresponding types are implemented in .NET. Eg. if you go to the docs for the Span type (https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=net-8.0) there's a "Source" link that takes you to https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/Span.cs

There you can see Span holds:

internal readonly ref T _reference;
private readonly int _length;

I imagine that LongSpan should then look like:

internal readonly ref T _reference;
private readonly long _length;

(and so LongSpan will need to be a ref struct )


namespace Apache.Arrow.Memory
{
internal class LongMemoryOwner<T>: IMemoryOwner<T>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to define a LongMemoryOwner class, but rather an ILongMemoryOwner interface similar to IMemoryOwner

@anreetka anreetka requested a review from adamreeve November 6, 2024 18:06
private IntPtr _ptr;
private long _size;

public LargeMemory(long size)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to make these types match the standard, non-large types as closely as possible. Eg. you can see the source for System.Memory here. In .NET the Memory<T> type doesn't have a constructor that will allocate memory, and it isn't disposable. It's a struct that holds a reference to an array or a MemoryManager. I think for simplicity we can skip supporting an array backed LargeMemory and only work with a LargeMemoryManager, so LargeMemory should be quite simple and mostly delegate functionality to a LargeMemoryManager.

The MemoryManager is the thing that is disposable and can actually manage the memory, but this is an abstract class so it doesn't implement memory allocation or disposal directly. In Arrow there's a NativeMemoryManager, which inherits from MemoryManager, and I think a lot of the functionality currently in this LargeMemory class should be in a new LargeNativeMemoryManager class instead, which should look pretty similar to the NativeMemoryManager.

Arrow additionally separates the functionality for allocating memory from the MemoryManager, eg. with the NativeMemoryAllocator class.

In order to integrate this new large memory functionality with Arrow, we might want to do something like extend the MemoryAllocator abstract base class by adding a new method, public ILargeMemoryOwner<byte> AllocateLarge(long length). Then the existing NativeMemoryAllocator could be extended to create either a NativeMemoryManager or LargeNativeMemoryManager depending on which method is used.

Copy link

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress Anreet, I've added some comments on the latest changes (some of it just repeating what we discussed in the meeting)


namespace Apache.Arrow.Memory
{
internal interface ILargeNativeAllocationOwner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a separate interface here, we can just add a new overload of the Release method to INativeAllocationOwner.


namespace Apache.Arrow.Memory
{
internal interface ILargeOwnableAllocation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this class is needed yet, let's remove it and we can always add it back later if needed when looking at C Data Interface support.


namespace Apache.Arrow.Memory
{
internal abstract class LargeImportedAllocationOwner:ILargeNativeAllocationOwner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, we can remove this for now, it's only needed for the C Data Interface.


namespace Apache.Arrow.Memory
{
internal abstract class LargeMemoryAllocator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rather than a separate class, we should add methods for large memory allocation to the existing MemoryAllocator class.


namespace Apache.Arrow.Memory
{
internal class LargeNativeMemoryAllocator:LargeMemoryAllocator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on the LargeMemoryAllocator, this logic should be added to the existing NativeMemoryAllocator class rather than be a separate class.


internal class LargeNullMemoryOwner: ILargeMemoryOwner<byte>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a new class, we can make NullMemoryOwner also implement ILargeMemoryOwner<byte>

Copy link

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just noticed my previous review wasn't submitted, so this doesn't account for the most recent changes

public override unsafe Span<byte> GetSpan()
{
void* ptr = CalculatePointer(0);
return new Span<byte>(ptr, _length);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for overflow here? Might also want a GetLargeSpan method.

// TODO: Should the allocation be moved to NativeMemory?

long size = length + Alignment;
IntPtr ptr = Marshal.AllocHGlobal(size);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllocHGlobal only accepts an int size, so we'll need to find a way to allocate > max int32. Maybe use NativeMemory.Alloc instead (https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.nativememory.alloc?view=net-8.0#system-runtime-interopservices-nativememory-alloc(system-uintptr))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looks like we can use NativeMemory.AlignedAlloc and then we don't need to deal with alignment ourselves

GC.AddMemoryPressure(bytesAllocated);

// Ensure all allocated memory is zeroed.
manager.LargeMemory.Span.Fill(0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to use a LargeSpan here so we can fill > 2 GB


namespace Apache.Arrow.Memory
{
public abstract class LargeMemoryManager<T>:ILargeMemoryOwner<T>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to help with using an ILargeMemoryOwner in ArrowBuffer, it would help if this also implements the IMemoryOwner interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants