
Add 64 bits support to Array underlying storage #12221

Open
GPSnoopy opened this issue Mar 8, 2019 · 160 comments
Labels: api-needs-work, area-System.Runtime
Milestone: Future

@GPSnoopy commented Mar 8, 2019

While the System.Array API supports LongLength and operator this[long i], the CLR does not allow arrays to be allocated with more than 2^31-1 elements (int.MaxValue).

This limitation has become a daily annoyance when working with HPC or big data. We frequently hit this limit.

Why this matters

  • .NET arrays are the de facto storage unit of many APIs and collections.
  • Currently one has to allocate native memory instead, which does not work with managed code (Span could have helped, but it's been limited to Int32 as well).
  • HPC frameworks expect data to be contiguous in memory (i.e. the data cannot be split into separate arrays), e.g. BLAS libraries.
  • It requires applications to be coded and designed differently depending on whether they handle <2G or >2G elements, i.e. it does not scale.
  • It's an arbitrary limitation with little value to the user and application:
    • With a desktop with 64GB, why can one only allocate 3.1% of its memory in one go?
    • With a data centre machine with 1,536GB, why can one only allocate 0.1% of its memory in one go?

In C++ this is solved with std::size_t (whose typedef changes depending on the target platform). Ideally, .NET would have taken the same route when designing System.Array. Why they haven't is a mystery, given that AMD64 and .NET Framework appeared around the same time.

Proposal
I suggest that when the CLR/JIT runs the .NET application in x64, it allows the array long constructor to allocate more than int.MaxValue items:

  • Indexing the array with operator this[long i] should work as expected and give access to the entire array.
  • Indexing the array with operator this[int i] should work as expected but implicitly limit the access to only the first int.MaxValue elements.
  • The LongLength property should return the total number of elements in the array.
  • The Length property should return the total number of elements in the array, or throw OverflowException if there are more than int.MaxValue elements (this matches the current behaviour on multi-dimensional arrays with more than int.MaxValue elements).

I naively believe that the above should not break any existing application.

Bonus points for extending 64-bit support to Span and ReadOnlySpan.

@tannergooding (Member)

I naively believe that the above should not break any existing application.

One of the simplest breaks is any program that does the following:

for (int i = 0; i < array.Length; i++)
{
}

This works fine for any existing program, since CoreCLR actually defines a limit that is just under int.MaxValue. Allowing lengths greater than int.MaxValue would cause the index to overflow to -2147483648 and either lead to unexpected behavior or cause an IndexOutOfRangeException to be thrown.
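
To make the failure mode concrete, here is a minimal sketch of what happens to a 32-bit index once it passes int.MaxValue (unchecked arithmetic, which is the C# default):

int i = int.MaxValue;
i++;                     // unchecked by default: wraps around to -2147483648
Console.WriteLine(i);    // prints -2147483648
// If the array were (hypothetically) longer than int.MaxValue elements, the wrapped
// negative index would either end the loop early or trigger an IndexOutOfRangeException.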

@giuliojiang commented Mar 8, 2019

I frequently run into this limit when processing large chunks of data from network streams into byte arrays, and always need to implement chunking logic in order to process the data.
It would be great to be able to use long indexes on arrays.

@GPSnoopy (Author) commented Mar 8, 2019

@tannergooding That's why I proposed above to keep the existing behaviour of throwing OverflowException on Length when there are more than int.MaxValue elements. Nothing changes there.

I'm suggesting that changing the CLR implementation as proposed above would allow applications and people who want to use large arrays to do so without breaking existing applications. You are right that simply passing a large array into a library that does not support it will break, but at least this will give us choice. We need to start somewhere, and .NET cannot keep ignoring this problem.

@philjdf commented Mar 8, 2019

Yesterday I happily created an array in Python which contained more than two billion elements. When can I do the same in .NET?

Currently we get an exception if we try to construct an array with more than 2B elements. What's wrong with deferring that exception until something calls the Length property which can no longer return a valid int? @tannergooding's example wouldn't cause problems. Are there other examples which break?

@GrabYourPitchforks (Member)

This is an interesting idea, but I wonder if it would be better to have a LargeArray<T> or similar class that has these semantics rather than try to shoehorn it into the existing Array class. The reason I suggest this course of action is that the GC currently has a hard dependency on the element count of any variable-length managed object fitting into a 32-bit signed integer. Changing the normal Array class to have a 64-bit length property would at minimum also affect the String type, and it may have other unintended consequences throughout the GC that would hinder its efficiency, even when collecting normal fixed-size objects. Additionally, accessing the existing Array.Length property would no longer be a simple one-instruction dereference; it'd now be an overflow check with associated branching.

If we had a theoretical LargeArray<T> class, it could be created from the beginning with a "correct" API surface, including even using nuint instead of long for the indexer. If it allowed T to be a reference type, we could also eliminate the weird pseudo-covariance / contravariance behavior that existing Array instances have, which would make writing to a LargeArray<T> potentially cheaper than writing to a normal Array.

@GPSnoopy changed the title from "Add 64-bit support to Array underlying storage" to "Add 64 bits support to Array underlying storage" on Apr 4, 2019
@GPSnoopy (Author) commented Apr 4, 2019

@GrabYourPitchforks interesting facts about the GC. I wasn't aware of such limitations.

A LargeArray<T> class could be a temporary solution. My main concern is that it would stay a very niche class with no interoperability with the rest of the .NET ecosystem, and ultimately would be an evolutionary dead end. I do like the idea of nuint/nint though.

My gut feeling is that Array, String and the GC are ripe for a 64-bit overhaul. We should bite the bullet and do it. So far I've been quite impressed by the .NET Core team's willingness to revisit old decisions and areas that Microsoft had kept shut in the past (e.g. SIMD/platform-specific instructions, float.ToString() roundtrip fixes, bug fixes that change backward compatibility, etc).

@juliusfriedman (Contributor)

I guess I could palate a LargeArray, but if that's going to be implemented then I hope it doesn't create a new type which is not actually an Array. IMHO it would have been much easier to address this if we had created another subtype of Array internally instead of inventing Span; however, Span also solves other problems....

@GSPP commented Apr 22, 2019

These days 2GB arrays are barely enough for many applications to run reliably. RAM prices have stagnated for a few years now. Surely, the industry will resolve this problem sooner or later. As RAM amounts resume increasing at a Moore's-law rate, this 2GB array issue will become very commonplace sooner or later.

A LargeArray<T> type might be a good medium-term solution. But won't arrays beyond 2GB be very commonplace 10 years from now? Do we then want to litter application code and API surfaces with LargeArray<T>? It would often be a hard choice whether to go for LargeArray<T> or T[].

Thinking in the very long term it seems far better to find a way to fix T[].

@GSPP commented Apr 22, 2019

If 64-bit support is implemented, there could be a tool that analyzes your code for legacy patterns (e.g. usage of the int Length property or the typical for (int i = 0; i < array.Length; i++) loop). The tool should then be able to mass-upgrade the source code. This could be a Roslyn analyzer.
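
A sketch of the kind of mechanical rewrite such a tool might suggest, using the LongLength property that already exists on System.Array (Process is a placeholder method; whether LongLength or a new native-sized property would be the real target is an open question here):

static void ProcessAll(byte[] array)
{
    // Legacy pattern: the int index would overflow past int.MaxValue elements.
    for (int i = 0; i < array.Length; i++) { Process(array[i]); }

    // 64-bit-safe pattern: C# already accepts long indices on single-dimensional
    // arrays, and LongLength already exists on System.Array.
    for (long i = 0; i < array.LongLength; i++) { Process(array[i]); }
}

static void Process(byte b) { /* placeholder */ }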

@GrabYourPitchforks (Member)

Since this would be such a massive ecosystem breaking change, one other thing you'd probably have to do is analyze the entire graph of all dependencies your application consumes. If the change were made to T[] directly (rather than LargeArray<T> or similar), assemblies would need to mark themselves with something indicating "I support this concept!", and the loader would probably want to block / warn when such assemblies are loaded. Otherwise you could end up in a scenario where two different assemblies loaded into the same application have different views of what an array is, which would result in a never-ending bug farm.

@juliusfriedman (Contributor) commented Apr 26, 2019

Not if large array were an Array... i.e. derived from it, if only internally perhaps, like I suggested back in the Span threads.

@GrabYourPitchforks (Member)

If large array is a glorified array, then you could pass a large array into an existing API that accepts a normal array, and you'd end up right back with the problems as originally described in this thread.

@GrabYourPitchforks (Member)

Furthermore, I'm not sure I buy the argument that adding large array would bifurcate the ecosystem. The scenario for large array is that you're operating with enormous data sets (potentially over 2bn elements). By definition you wouldn't be passing this data to legacy APIs anyway since those APIs wouldn't know what to do with that amount of data. Since this scenario is so specialized it almost assumes that you've already accepted that you're limited to calling APIs which have been enlightened.

@juliusfriedman (Contributor) commented Apr 27, 2019

You have LongLength on the Array

The only fundamental diff is that one is on the LOH and one is not.

By the same token, Span wouldn't be able to hold more than that either, so a large span would be needed as well...

@GPSnoopy (Author) commented Oct 2, 2019

From what I can gather and summarise from the above, there are two pieces of work.

  1. Update the CLR so that Array can work with 64-bit lengths and indices. This includes changes to the Array implementation itself but, as comments above have pointed out, also to System.String and the garbage collector. It is likely to be relatively easy to come up with a fork of coreclr that can achieve this, as a proof of concept with no regard for backward compatibility.

  2. Find a realistic way to achieve backward compatibility. This is the hard part. I think this is unlikely to succeed without compromising some aspect of the CLR. Whether it is Length throwing on overflow, or awkwardly introducing new specific classes like LargeArray.

But the more I think about it, the more I think this issue is missing the point and ultimately the real problem with .NET as it stands. Even if the initial proposal was to be implemented, it would only fix the immediate 64-bit issue with Array but still leave collections and Span with the same indexing and length limitations.

I've started reading The Rust Programming Language (kind of felt overdue) and it struck me that Rust also mimics C++ size_t and ssize_t with usize and isize. C#, on the other hand, somehow decided not to expose this CPU architectural detail and forces everyone to the lowest common denominator for most of its API: a 32-bit CPU with 32-bit addressing.

I'd like to emphasise that the 32-bit limitation is purely arbitrary from a user point of view. There is no such thing as a small array and a big array; an image application should not have to be implemented differently depending on whether it works with 2,147,483,647 pixels or 2,147,483,648 pixels. Especially when it's data driven and the application has little control over what the user is up to. Even more frustrating if the hardware has long been capable of it. If you do not believe me or think I'm talking nonsense, I invite you to learn how to program for MS-DOS 16-bit with NEAR and FAR pointers (hint: there is a reason why Doom required a 386 32-bit CPU).

Instead of tinkering around the edges, what is the general appetite for a more ambitious approach to fix this limitation?

Here is a controversial suggestion, bit of a kick in the nest:

  • Take on @GrabYourPitchforks' idea and introduce nint and nuint (I also like size and ssize, but I can imagine a lot of clashes with existing code).
  • Allow implicit conversion from int to nint (and uint to nuint) but not the reverse (a minimal sketch of these conversion rules follows at the end of this comment).
  • Change all Length properties on Array, String, Span and collections to return nint (leaving aside the debate of signed vs unsigned and keeping the .NET convention of signed lengths).
  • Change all indexing operators on Array, String, Span and collections to only take nint.
  • Remove LongLength properties and old indexing operators.
  • Take a page from Nullable Reference book and allow this to be an optional compilation feature (this is where it hurts).
  • Only allow a nint assembly to depend on another assembly if it's also using the new length types.
  • But allow a global or per-reference "I know what I'm doing" override, in which case calling the old int32 Length property on an Array is undefined (or just wraps around the 64-bit value).
  • Spend a decade refactoring for loops and var i = 0.

I understand this is far from ideal and can create uncomfortable ecosystem situations (Python2 vs Python3 anyone?). Open to suggestions on how to introduce size types in .NET in a way that doesn't leave .NET and C# more irrelevant on modern hardware each year.
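
A minimal sketch of the conversion rules proposed in the second bullet above, written with the nint/nuint names (this mirrors how native-sized integer conversions behave in current C#):

int i = 42;
nint n = i;          // implicit widening: int -> nint always fits
// int back = n;     // would not compile: narrowing must be explicit
int back = (int)n;   // explicit cast; can overflow on 64-bit platforms

uint u = 7u;
nuint nu = u;        // implicit widening: uint -> nuint
// Converting between nint and nuint also requires an explicit cast.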

@MichalStrehovsky (Member) commented Oct 2, 2019

If we can solve the issue of the GC not tolerating variable-length objects bigger than 2 GB, here are a couple of things that might make a LargeArray<T> with a native-word-sized Length more palatable:

  • Array length is already represented as a pointer-sized integer in the memory layout of arrays in .NET. On 64-bit platforms, the extra bytes serve as padding and are always zero.
  • If we were to introduce a LargeArray<T>, we can give it the same memory layout as existing arrays. The only difference is that the bytes that serve as padding for normal arrays would have a meaning for LargeArray<T>.
  • If the memory layout is the same, code that operates on LargeArray<T> can also operate on normal arrays (and for a smaller LargeArray<T>, vice-versa)
  • We can enable casting between LargeArray<T> and normal arrays
  • Casting from an array to LargeArray<T> is easy - we just follow the normal array casting rules (we could get rid of the covariance though, as @GrabYourPitchforks calls out). If we know the element types, it's basically a no-op.
  • When casting from LargeArray<T> to normal arrays, we would additionally check the size and throw InvalidCastException if the LargeArray<T> instance is too long.
  • Similar rules would apply when casting to collection types (e.g. you can cast LargeArray<T> to ICollection<T> only if the element count is less than MaxInt).
  • Existing code operating on arrays doesn't have to worry about Length being longer than MaxInt. Casting rules guarantee this can't happen.
  • LargeArray<T> would not derive from System.Array, but one could explicitly cast to it (the length-check throwing InvalidCastException would happen in that case). The same for non-generic collection types (e.g. casting to ICollection that has the Int32 Count property would only be allowed for a small LargeArray<T>).

This scheme would allow LargeArray<T> and normal arrays to co-exist pretty seamlessly.
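
A hypothetical usage sketch of those casting rules; LargeArray<T> does not exist today, so every line below is illustrative only:

// Hypothetical type and rules; nothing here compiles against the real BCL.
LargeArray<byte> big = new LargeArray<byte>(5_000_000_000);      // > int.MaxValue elements
byte[] narrow = (byte[])big;                     // proposed: throws InvalidCastException (too long)
ICollection<byte> col = (ICollection<byte>)big;  // proposed: same length check applies
LargeArray<byte> widened = (LargeArray<byte>)new byte[100];      // proposed: always succeeds
byte[] small = (byte[])new LargeArray<byte>(100);                // proposed: fits, so it succeeds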

@kasthack commented Oct 2, 2019

@MichalStrehovsky

Similar rules would apply when casting to collection types (e.g. you can cast LargeArray to ICollection only if the element count is less than MaxInt).

This would make LargeArray<T> incompatible with a lot of older code in these cases:

  • Methods that currently accept a more specific type than they actually need (like ICollection<T> instead of IEnumerable<T> while the code just enumerates the values).

  • Probably, some cases where ICollection<T> isn't used directly but inherited from. For instance, methods that accept ISet/IList/IDictionary<...> (I assume that those interfaces and their implementations would eventually be updated for 64-bit lengths as well), which inherit from ICollection<T>.

I would go with overflow checks when .Count is called to keep the compatibility and add .LongCount property with a default implementation to the old interfaces.

@MichalStrehovsky (Member)

@kasthack I was trying to come up with a compatible solution where one never has to worry about getting OverflowException in the middle of a NuGet package that one doesn't have the source for. Allowing cast to ICollection<T> to succeed no matter the size is really no different from just allowing arrays to be longer than 2 GB (no need for a LargeArray<T> type). Some code will work, some won't.

With explicit casting, it's clear that we're crossing a boundary into "legacy land" and we need to make sure "legacy land" can do everything it would reasonably be able to do with a normal ICollection<T> before we do the cast.

methods that accept ISet/IList/IDictionary<...>

Arrays don't implement ISet/IDictionary so a cast to these would never succeed. For IList, the same rules would apply as for ICollection (ICollection was just an example above).

@philjdf commented Oct 3, 2019

@GPSnoopy's post makes me wonder whether the following variation might make sense:

  1. Introduce new nint and nuint types, but don't change the signatures of anything to use them. Nothing breaks.
  2. Introduce new arrays types (with fixed covariance), new span types, etc, which use nint and nuint. Keep the old ones and don't touch them. Make it fast and easy to convert between old and new versions of these types (with an exception if your 64-bit value is too big to fit into the 32-bit counterpart), but conversion should be explicit. Nothing breaks, type safety and all that.
  3. Add a C# compiler switch /HeyEveryoneIts2019 which, when you write double[], gives you the new type of array instead of the old one; everything's nint and nuint, and the compiler adds conservative/obvious conversions to/from old-style arrays when you call outside assemblies which want old-style arrays. This way, if it gets through the conversion without an exception, you won't break any old referenced code.

@GSPP commented Oct 3, 2019

It has been proposed that we could make Array.Length and array indices native-sized (nint or IntPtr).

This would be a portability issue. Code would need to be tested on both bitnesses which currently is rarely required for most codebases. Code on the internet would be subtly broken all the time because developers would only test their own bitness.

Likely, there will be language level awkwardness when nint and int come into contact. This awkwardness is a main reason unsigned types are not generally used.

In C languages the zoo of integer types with their loose size guarantees is a pain point.

I don't think we want to routinely use variable-length types in normal code. If nint is introduced as a type it should be for special situations. Likely, it is most useful as a performance optimization or when interoperating with native code.


All arrays should transparently support large sizes and large indexing. There should be no LargeArray<T> and no LargeSpan<T>, so that we don't bifurcate the type system; that would entail an enormous duplication of APIs that operate on arrays and spans.

If the object size increase on 32 bit is considered a problem (it might well be) this could be behind a config switch.


Code that cares about large arrays needs to switch to long.

Likely, it will be fine even in the very long term to keep most code on int. In my experience, over all the code I ever worked on, it is quite rare to have large collections. Most collections are somehow related to a concept that is inherently fairly limited. For example, a list of customers will not have billions of items in it except if you work for one of 10 companies in the entire world. They can use long. Luckily for us, our reality is structured so that most types of objects do not exist in amounts of billions.

I see no realistic way to upgrade the existing collection system to 64 bit indexes. It would create unacceptable compatibility issues. For example, if ICollection<T>.Count becomes 64 bit, all calling code is broken (all arithmetic but also storing indexes somewhere). This must be opt-in.

It would be nicer to have a more fundamental and more elegant solution. But I think this is the best tradeoff that we can achieve.

@FraserWaters-GR

Just note there is already a case today where Length can throw: multi-dimensional arrays can be large enough that the total number of elements is greater than int.MaxValue. I think this is why LongLength was originally added.
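
A sketch of that existing behaviour (it needs a 64-bit process with gcAllowVeryLargeObjects enabled and a few GB of free memory, so treat the exact output as illustrative):

// 46,341 * 46,341 = 2,147,488,281 elements, just over int.MaxValue.
var big = new byte[46_341, 46_341];
Console.WriteLine(big.LongLength);   // 2147488281
Console.WriteLine(big.Length);       // throws OverflowException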

@huoyaoyuan (Member)

Note: currently, when you write foreach over an array, the C# (and also VB) compiler actually generates a for loop and stores the index in 32 bits. This means that existing code must break with arrays of more than 2G elements, or at least a recompile is required.

I really hope all size-like parameters across the ecosystem use nuint (avoiding the check for >= 0).
There could be a [SizeAttribute] on all such parameters, and the JIT could generate the positive guard and bit expansion to allow existing int-size-compiled assemblies to run against a native-sized corelib.
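
For reference, roughly what the compiler lowers a foreach over an array to today (an approximation, not the exact generated IL; note the 32-bit index):

static long Sum(byte[] array)
{
    // Source form:  foreach (byte b in array) { total += b; }
    long total = 0;
    for (int i = 0; i < array.Length; i++)   // index and bound are both 32-bit ints
    {
        byte b = array[i];
        total += b;
    }
    return total;
}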

@msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan added the untriaged label Feb 26, 2020
@lostmsu commented Jun 20, 2020

One option is to create a NuGet package for LargeArray<T>, and polish it in practical use. Once polished, make it part of the standard. This is what C++ did to parts of Boost.

But CLR should be ready by then.

@GPSnoopy (Author) commented Jul 2, 2020

But CLR should be ready by then.

@lostmsu Are you aware of ongoing work to fix this in CLR that I'm not aware of?

@lostmsu commented Jul 2, 2020

@GPSnoopy nope, that was an argument to start the CLR work ahead of time before BCL, so that BCL could catch up.

@joperezr removed the untriaged label Jul 2, 2020
@MineCake147E (Contributor) commented Mar 17, 2024

It's still true today that if your app takes up more than 1GB of RAM, a huge chunk of users won't be able to use it and will file bugs.

And even if the total size of someone's RAM is high, that doesn't mean any individual app can afford nearly as much of it. On my machine, each individual app always takes less than 1GB but (for some reason), more than half of my 16GB RAM is always used up even with only 3 apps running.

Users always need the right equipment to run a given piece of software anyway.
Application-specific machines, such as supercomputers, servers, workstations, enthusiast HEDT PCs, and gaming PCs, are all tailored to the set of software the users want to use.

@tannergooding (Member)

Users always need the right equipment to run a given piece of software anyway.
Application-specific machines, such as supercomputers, servers, workstations, enthusiast HEDT PCs, and gaming PCs, are all tailored to the set of software the users want to use.

There is a large difference between specialized software and general-purpose software, and many of the callouts you gave are still categorized under "general purpose".

Even for games, the "typical" gaming PC (based on the Steam Hardware Survey, but also accounting for consoles and other reporting sites) is a 6-core, 16GB PC running something like an RTX 3060. Machines running 4 cores, 8GB of RAM, and older cards like a GTX 1060 aren't that far behind in usage either, nor are many mobile chipsets.

But, even if you do have a more powerful machine, a typical user is still sharing that machine with other apps, services, and still doesn't want to go over around 80% consumption of the max (due to risk of causing thrashing and contention issues).

you had to recompile everything that's part of your app, and the world didn't fall apart

This world did semi-regularly have issues, often large issues, which app developers had to deal with. Almost every single platform (C, C++, Java, Rust, C#, Python, etc) has been moving towards making their ABIs more stable.

they're considering doing it again as a new major release to fix issues that can't be fixed otherwise

Right, sometimes breaks are necessary; especially in the AOT world. But they are being made intentional and targeted to reduce risk and pain.

I think .NET is (still) way too afraid of breaking changes. Major releases exist for a reason. IMHO it's a shame that .NET went through such a huge breaking change rebuilding itself as .NET Core and going through many incompatibilities and with so many application frameworks that are simply gone in the new .NET, yet it still didn't address any foundational issues like this

There's two main considerations here.

First, you need to show that this is an actual foundational issue. The general argument I've given above is that while having 64-bit arrays would be convenient, it is not actually a foundational issue and in many ways not an issue at all due to the alternatives that exist and the real world considerations that an application or library working with such large data would fundamentally have to make to ensure that it remains usable.

The second is that .NET Core 1.0 did try to make some big breaks and we found out by .NET Core 2.1 that many of them were a mistake and caused too much pain, so much so that we brought most APIs back and made the ecosystem almost whole/consistent again by .NET Core 3.1. The few exceptions were scenarios that were used so little and which were so fundamentally problematic, that bringing them forward as is was a non-starter and where we've since provided or are working on providing alternatives instead.

So it would still be a subpar experience in this way,

Yes, but working with large data is a different experience altogether regardless. You can't just take a data structure that was designed to work with thousands of elements and expect it to behave well when working with billions or more elements. The inverse is also true.

Any application that required working with many gigabytes or more of data and decided to do that by simply writing its code the same way as it would for kilobytes of data would end up digging its own pitfalls. We don't live in an idealized world, costs don't actually scale linearly, and not every operation is equivalent. We have many tiers of costs and we have limits on operations and how much can be pushed through a given pipeline. An application that might saturate those pipelines has to rethink how it's working so it doesn't become bottlenecked. It likewise needs to aim to balance across all pipelines to ensure they all achieve suitable levels of use (neither maxing them out nor underutilizing them).

.NET arrays already allow you to work with gigabytes of data, up to almost 2.14 billion elements (regardless of the size per element). Once you start approaching this limit, you're already at the point where you probably should've started restructuring and rethinking how your application works, even if your machine has terabytes of RAM available to it.
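
For reference, the current per-array limit is exposed directly on newer runtimes (a quick check, assuming .NET 6 or later):

// Prints 2147483591 (0x7FFFFFC7) on current 64-bit runtimes.
Console.WriteLine(Array.MaxLength);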

@MineCake147E (Contributor) commented Mar 19, 2024

Any application that required working with many gigabytes or more of data and decided to do that by simply writing its code the same way as it would for kilobytes of data would end up digging its own pitfalls.

It's certainly true that applications dealing with substantial amounts of data need careful consideration in their design and implementation.
While it might seem tempting to simply write code as if handling kilobytes of data when dealing with gigabytes or more, it's crucial to recognize the potential pitfalls such an approach can bring.

However, rather than solely attributing responsibility to the .NET Runtime for addressing users' mistakes, it's worth considering a more balanced perspective.
While the runtime can offer certain safeguards and guidance, it's ultimately the responsibility of the developers to ensure their applications are designed and optimized appropriately for handling large datasets.

Mistakes and challenges in dealing with large-scale data can indeed be valuable learning experiences for programmers.
By encountering and addressing issues firsthand, developers gain insights into what went wrong and how to improve their approaches in the future.
Improving programmers' comprehension of effective data management strategies is possible irrespective of the array allocation length limit; however, arbitrary constraints might impede this learning journey.

Furthermore, while partitioned memory allocations may seem advantageous, they come with their own set of drawbacks.
These include costly random access, complex operations, debugging and testing challenges, additional overhead in C# code, and cumbersome cross-boundary vector loads and stores in software.
Is the tradeoff always worth it? If a simpler solution existed, allowing for the allocation of contiguous managed memory regions with more than 2 billion elements, many of these complexities would dissipate.
Handling large linear arrays could also potentially benefit garbage collection efficiency, simplifying management compared to dealing with numerous smaller arrays.

First, you need to show that this is an actual foundational issue.

Imagine if Array.MaxLength were restricted to 32767.
Historically, computing faced similar constraints due to limitations in hardware, such as 16-bit architectures.
But as technology evolved, so did the need for larger data structures, leading to the adoption of 32-bit and eventually 64-bit systems.

The crux of the matter isn't merely the fixed value of 2,147,483,591 for Array.MaxLength.
Instead, it's the widespread reliance on int for Length properties and indices throughout the .NET ecosystem, rather than embracing nuint, akin to C++'s size_t.
In hindsight, the preference for nuint seems more logical, given its implementation-defined size, aligned with pointers.

While implementing radical changes at this stage could entail significant disruptions, there's merit in introducing new types such as NativeArray<T>, NativeSpan<T>, ReadOnlyNativeSpan<T>, NativeMemory<T>, and ReadOnlyNativeMemory<T>, all featuring nuint Length { get; }.
Moreover, NativeArray<T>, NativeSpan<T>, and ReadOnlyNativeSpan<T> could also have indexers accepting nuint indices.
As a positive side effect, this approach could enhance the management experience of large unmanaged memory regions.

@tannergooding (Member)

While the runtime can offer certain safeguards and guidance, it's ultimately the responsibility of the developers to ensure their applications are designed and optimized appropriately for handling large datasets.

Yes. However, if .NET already doesn't support the thing developers shouldn't be doing (in this case for historical reasons) then there is little to no benefit in .NET doing the massively complex work to support it because we'd be adding support for something that no real world scenario should be using in the first place.

The discussion basically stops there as there is no justification for adding support for something which no one should ever be using and for which if users "really" decide they need it, there are plenty of viable workarounds.

If developers need to learn, they can just as easily find this issue and read some of the feedback. Developers coming in and saying "we want this anyways, even if it's the wrong thing" just adds confusion to the topic and hinders that learning process.

Handling large linear arrays could also potentially benefit garbage collection efficiency, simplifying management compared to dealing with numerous smaller arrays.

That's really not how that works. The GC and memory allocators in general are optimized for the common cases. They tend to have more overhead, more restrictions, and generally behave worse when introduced to uncommon patterns or extremely large allocations.

Today, the GC considers anything over 85KB as "large" and places it on a special heap (LOH) which is almost never compacted or moved. When such large allocations are over reference types, they have a lot more data that needs to be processed to determine liveness and the overhead of that many objects is going to be substantially more than almost anything else.

Even if the support did exist, it wouldn't get optimized because it wouldn't be a common case. It would never become common because users would start trying it and see that the perf was awful and they'd end up having to research and find out they shouldn't be doing this stuff in the first place. They then wouldn't use the feature as they'd see they should be doing other things instead.

Imagine if Array.MaxLength were restricted to 32767.

This is a straw person argument. Computers being restricted to 16-bits would indeed be limiting, but we aren't restricted to 16-bits. We're restricted to a reasonably large boundary (2.14 billion elements, which could take up more than 32-bits of memory space) well above and beyond what hardware supports for its built in systems and the amounts they are designed to be efficient around.

What you're stating is like arguing that because we went from 8-bit to 16-bit, 16-bit to 32-bit, and 32-bit to 64-bit address spaces in the past 40 years that we will need to go from 64-bit to 128-bit in the next 100. This is the type of thing that a person might naively state without considering that this is an exponential growth and that 2^64 is such an astronomically large number that a single computer cannot actually hold that much RAM without us fundamentally breaking how the hardware is manufactured and even beyond that, essentially breaking our current understanding of physics.

To actually work with that much memory, you need many machines. Once you have many machines, you have non-coherent memory. Once you have non-coherent memory, you have to fundamentally change how you work with the data for both correctness and performance. Distributed computing then typically has a customized Operating System, specialized infrastructure and networking layouts, and unique data management systems to ensure that data can be correctly and efficiently moved about without hitting bottlenecks. This is true for super computers, IoT compute clusters, server setups for datacenters, etc.

The same general considerations tend to happen when you get beyond 2 billion elements. Once you start getting that big, you can't actually reliably make the allocations unless you limit the hardware it runs against. Even if you restrict the hardware you run against, you're going to ultimately hurt the overall performance and considerations of the application because you aren't the only process running. The cost of accessing any piece of memory isn't linear, it starts getting bottlenecked by the increasingly smaller caches and prefetch mechanisms (so much so that in many cases you want to bypass these caches by using non-temporal accesses). You also start getting into compute times where you fundamentally have to start parallelizing the computation.

All of these changes end up meaning that the way the data needs to be worked with has changed as does how you need to manage the data. You have a need to chunk the data into many separate views that can be independently accessed by separate cores and then independently freed, paged out, etc as the data gets processed. You have to balance the management of that data with the processing time of that data to ensure that the CPU stays reasonably saturated without oversaturating it (typically about 80% utilization is the ideal).

You may start considering the need for GPU compute, where a single GPU may peak around 24GB of RAM. Where the GPU itself may be limited in how much they can transfer at one time (until relatively recently this was capped at 256MB). Where you actually have to copy data from one memory system to another and where that memory is non-coherent. Where the allocations and the amount a shader can access at any one time may itself be limited, etc.

None of this is a problem unique to .NET. It's just the world of working with "big" data and where "big" is a lot smaller than 2 billion elements.

Instead, it's the widespread reliance on int for Length properties and indices throughout the .NET ecosystem, rather than embracing nuint, akin to C++'s size_t.

This also isn't really a good argument and there are many reasons why using int is "better". Even if there were support for some NativeArray type, it's entirely likely that the indexer would be nint (not nuint) as there are benefits to using a signed type and many drawbacks to using an unsigned type. There are some conceptual reasons for why nuint is "better", but in practice it tends to lose out and there are many places you may fundamentally require signed anyways.

Types like T[] itself already support nuint-based indexers, even with the length being limited to 31 bits. Such support could be added to Span<T> if we saw enough benefit to doing so. However, the JIT is already optimized around the fact that it's a 31-bit, always-positive length and will typically do the relevant hoisting and other opts to ensure the indexing remains just as efficient regardless. This also allows the general codegen to be smaller and more efficient, as there is an encoding and often perf optimization to using 32-bit registers (over 64-bit registers) on many platforms.

@MineCake147E (Contributor)

This is a straw person argument.

Oops.
I didn't realize it was until you pointed it out. Sorry about that.

@GPSnoopy (Author)

Just to cross check a few facts, plus some comments.

What you're stating is like arguing that because we went from 8-bit to 16-bit, 16-bit to 32-bit, and 32-bit to 64-bit address spaces in the past 40 years that we will need to go from 64-bit to 128-bit in the next 100. This is the type of thing that a person might naively state without considering that this is an exponential growth and that 2^64 is such an astronomically large number that a single computer cannot actually hold that much RAM without us fundamentally breaking how the hardware is manufactured and even beyond that, essentially breaking our current understanding of physics.

AMD Zen 4 increased its virtual address space from 48-bit to 57-bit. Wikipedia indicates Intel did the same with Ice Lake in 2019 (https://en.wikipedia.org/wiki/Intel_5-level_paging).

Intel 386 (the first x86 32-bit CPU) was introduced in 1985. AMD Opteron (the first x86 64-bit CPU) was introduced in 2003. This is 18 years later, not 40. While there is no immediate need for 128-bit physical addressing, actual support for virtual addressing using more than 64 bits is not as far away as you think.

In the meantime, the typical amount of RAM per CPU (i.e. UMA) on a server is 768GB-1.5TB. AWS has NUMA instances that go up to 24TB.

The cost of accessing any piece of memory isn't linear, it starts getting bottlenecked by the increasingly smaller caches and prefetch mechanisms (so much so that in many cases you want to bypass these caches by using non-temporal accesses). You also start getting into compute times where you fundamentally have to start parallelizing the computation.

You assume too much about how one might use large arrays. The random-access and linear access cost of any array are the same for any size. Cache locality for random access on large arrays depends heavily on the algorithm and its data locality, rather than the fact that a large array has been allocated in one go. Cache locality goes out of the window if you do a linear access on an array that does not fit in cache (even if you split the array into smaller ones). Not sure why you want to deny the ability for people to use large arrays.

You may start considering the need for GPU compute, where a single GPU may peak around 24GB of RAM. Where the GPU itself may be limited in how much they can transfer at one time (until relatively recently this was capped at 256MB). Where you actually have to copy data from one memory system to another and where that memory is non-coherent. Where the allocations and the amount a shader can access at any one time may itself be limited, etc.

Last night, NVIDIA announced the Blackwell GPU which has 192GB of RAM. The previous NVIDIA GPU, the H100, has 80GB of RAM. CUDA has no issue with large data transfer (larger than 256MB) and 64-bit addressing.

None of this is a problem unique to .NET. It's just the world of working with "big" data and where "big" is a lot smaller than 2 billion elements.

You and I do not have the same definition of big data. If I can fit it on a laptop, it's not big.

This also isn't really a good argument and there are many reasons why using int is "better".

I think the whole technical discussion is moot. @tannergooding is just regurgitating the same arguments as people did in 2003 when AMD64 came out; arguing why it's useless, has too many performance compromises (e.g. pointers being twice the size), and somehow going on a crusade to unilaterally deny its access to programmers/users. In the meantime the world has moved on. Seriously.

The real question is where the actual decision makers at Microsoft see C# and .NET going. My observation (and this is purely my biased opinion) is that Microsoft has nowadays little appetite for .NET (in markets such as system programming, REST servers, large scale services, HPC, GPUs, AI, etc), instead chasing subscription-based revenues and AI services; leaving .NET mostly for "small" applications or services, mobile apps, and some Unity games.

@Neme12 commented Mar 19, 2024

Users always need the right equipment to run a given piece of software anyway.
Application-specific machines, such as supercomputers, servers, workstations, enthusiast HEDT PCs, and gaming PCs, are all tailored to the set of software the users want to use.

If you agree that this is a specialized scenario for specialized hardware, sure. I was pushing back on gigabytes of memory being called normal data rather than big data.

@tannergooding (Member)

AMD Zen 4 increased its virtual address space from 48-bit to 57-bit.

Yes, and that is still nowhere near the entire 64-bit address space. It is an exponential growth.

Different scales in terms of nanoseconds:

  • 2^16: 65.54 microseconds
  • 2^32: 4.295 seconds
  • 2^48: 3.258 days (78.19 hours)
  • 2^57: 1668 days (238.3 weeks)
  • 2^64: 584.6 years (213504 days)

The reason I use nanoseconds is that computers currently operate at GHz clock speeds, so one cycle at 1 GHz is 1 ns.

The fastest CPUs can boost to around 6GHz today and with the assistance of liquid nitrogen we've managed to nearly hit 9GHz. The fastest CPUs can do parallel dispatch of up to around 5 additions per cycle, assuming they are non-dependent.

So assuming we simply wanted to increment 2^64 times, it would take a single core on a modern processor 12.97 years to do so, operating under peak efficiency and never slowing down. If we scale that to the peak number of cores in a single CPU (192) and factor in that they're hyperthreaded and without factoring in the lower clock speeds, then a single modern CPU can iterate 2^64 times in around 12.33 days.

This of course isn't factoring in that memory access is tens to hundreds of times slower than direct instruction execution, that the peak transfer rate of things like DDR5 is 64 GB/s and in practice is much lower due to contention, random access, etc. It also isn't factoring in that we're hitting the limits of how small we can make transistors due to the effects of quantum tunnelling or that there are other fundamental barriers related to heat dissipation, the speed at which electricity can travel through various materials, etc. That while we will continue to see some increases, we require a massive scientific breakthrough and likely a change in how computers are designed to get them substantially faster that what we have today.

The random-access and linear access cost of any array are the same for any size. Cache locality for random access on large arrays depends heavily on the algorithm and its data locality, rather than the fact that a large array has been allocated in one go. Cache locality goes out of the window if you do a linear access on an array that does not fit in cache (even if you split the array into smaller ones). Not sure why you want to deny the ability for people to use large arrays.

The consideration is that many of the necessary data management tasks can no longer be trivially done when it is actually a single allocation. These become significantly simpler if you break the data up into appropriately sized chunks, and that is just how the data is actually interacted with in systems that are designed to work with large data. They recognize that it is technically a contiguous thing, but that it needs to be appropriately buffered, chunked, streamed, and managed to account for real-world hardware limitations.

Last night, NVIDIA announced the Blackwell GPU which has 192GB of RAM. The previous NVIDIA GPU, the H100, has 80GB of RAM.

Yes and this is an extremely high end GPU that is not meant for a typical consumer. The typical GPU is far below 24GB with 24GB being the upper end of what the highest end consumer PCs and even the upper end of what many cloud systems offer.

CUDA has no issue with large data transfer (larger than 256MB) and 64-bit addressing.

There is a difference between the API behind the scenes chunking the data for you and it actually working with greater than 256MB chunks.

The limit of 256MB transfers comes about from the PCIe specification which required an optional feature introduced in 2007 known as reBAR (resizable base address register) to do individual accesses of more than 256MB in size at a time. GPU manufacturers (such as AMD, Intel, and Nvidia) only started offering support for this feature around 3-4 years ago.

You and I do not have the same definition of big data. If I can fit it on a laptop, it's not big.

You're free to have your own definitions for big vs small, but that doesn't negate how general systems think about the data.

To the GC, anything over 85KB is large and goes on a special heap.

To standard implementations of memcpy (provided by MSVC, GCC, Clang, etc) 256KB is typically considered large and is the point at which non-temporal accesses start getting used. Other cutoffs where the algorithms change tend to be around 8K and around 32 bytes. All done to account for typical allocation sizes, overhead of branches to dispatch to the right algorithm, and real world hardware costs for these various sizes.

The same general considerations also apply to the CPU optimization manuals, the implementations of malloc provided by the C Runtime, the memory management APIs provided by the Operating System, etc.

In terms of address space, these are relatively tiny. Even in terms of typical file size (considering modern images or videos), these are relatively tiny amounts. But they are still significant in terms of the scale of the CPU, the hardware itself, and the limits its designed around for efficiency.

I think the whole technical discussion is moot. @tannergooding is just regurgitating the same arguments as people did in 2003 when AMD64 came out; arguing why it's useless, has too many performance compromises (e.g. pointers being twice the size), and somehow going on a crusade to unilaterally deny its access to programmers/users. In the meantime the world has moved on. Seriously.

They are very different arguments. When 64-bit came out, we were already hitting the limits of 32-bit address space, especially across an entire machine. There was a clear need for the growth even if individual applications weren't near these limits and even if typical applications weren't near those limits.

There has been no argument against needing to work with data that is more than 2GB in size nor against there being cases where having a single linear allocation would be convenient.

I have stated that a NativeSpan type might be beneficial. It's something I've pushed for on several occasions. I have stated that there are concrete concerns with a NativeArray type, particularly if that is allowed to contain managed references. I have stated that working with big data is problematic today and it is an area where we could improve things substantially.

The arguments have therefore been around how the considerations for accessing big data change compared to accessing lesser amounts of data, how we are nowhere near the limits of the 64-bit address space, how we are approaching fundamental limits in how computers operate that are unlikely to change without a fundamental breakthrough in our understanding of physics, and how the effort would be better spent investing in making it easier to work with big data, irrespective of how it's been allocated and particularly allowing for many separate allocations so that the code can work across a broader range of machines and considerations (both small and large distributed systems require similar considerations here for big data).

The real question is where the actual decision makers at Microsoft see C# and .NET going. My observation (and this is purely my biased opinion) is that Microsoft has nowadays little appetite for .NET (in markets such as system programming, REST servers, large scale services, HPC, GPUs, AI, etc), instead chasing subscription-based revenues and AI services; leaving .NET mostly for "small" applications or services, mobile apps, and some Unity games.

.NET works just fine with large scale services today and is used by many systems that operate on global scales servicing millions of customers per day, touching petabytes and more worth of data: https://dotnet.microsoft.com/en-us/platform/customers

There are of course even more big customers and scenarios that aren't listed on the showcase page. There are plenty of community projects that are targeting these scenarios, there are efforts by the .NET Libraries team to continue improving our support for things like Tensors, working with large data in general, etc.

.NET has a clear interest in supporting working with big data, but that in part has to come with consideration for backwards compatibility, with consideration for real world usage (not just convenient usage), and with consideration for what will lead users towards the greatest success.

@Neme12 commented Mar 19, 2024

While I like the appeal of using nint for size just for the sake of correctness and it bothers me a little that .NET is using int everywhere, I have to concede that changing that and breaking compatibility for relatively little benefit probably isn't worth it. Although it would have been nice if at least Span used nint correctly from the beginning so that you could at least refer to unmanaged memory, which definitely can be larger than int-sized, in a convenient (and consistent) way.

In the meantime, the typical amount of RAM per CPU (i.e. UMA) on a server is 768GB-1.5TB. AWS has NUMA instances that go up to 24TB.

That's true, but:

  1. All that memory is there to be able to run tons and tons of apps at once, not for just one app to use up most of it.
  2. Your app can still use all that memory, you just can't allocate a contiguous block of (managed) memory that is that big, which is a much smaller limitation than if you couldn't use that memory at all.

@Neme12 commented Mar 19, 2024

This also allows the general codegen to be smaller and more efficient, as there is an encoding and often perf optimization to using 32-bit registers (over 64-bit registers) on many platforms.

I thought using a native size would be more efficient, because the JIT has to widen an int to native size for array access? Or doesn't it?

@tannergooding (Member) commented Mar 19, 2024

I thought using a native size would be more efficient, because the JIT has to widen an int to native size for array access? Or doesn't it?

TL;DR: Not really. It's implicit as part of 32-bit operations.

RISC-based architectures (Arm32/Arm64, RISC-V, LoongArch, and others) typically use a fixed-width encoding, so the actual decoding cost is the same. The execution costs, however, can still differ, although this mainly applies to operations like multiplication and division, not to addition, subtraction, or shifting. However, because the encoding is fixed-width, generating constants can take multiple instructions, so you may need 3-4 instructions to generate some 64-bit constants.

CISC-based architectures (namely x86 and x64) typically use a variable-width encoding. For many reasons, including back-compat but also factoring in common access sizes, the smallest encoding works with 32-bit registers and you need an additional prefix byte to do 8-, 16-, or 64-bit register access. This minorly increases the cost to the decoder and can negatively impact other code by pushing bytes outside the normal decode/icache windows.

For both sets of architectures, it is typical that doing a 32-bit operation will implicitly zero the upper 32-bits. So doing an inc eax for example ensures the upper 32-bits are zero and no explicit zero extension is needed. The simplest example is:

public static int M(Span<int> span)
{
    int sum = 0;

    for (int i = 0; i < span.Length; i++)
    {
        sum += span[i];
    }

    return sum;
}

Which generates the following:

; Method Program:M(System.Span`1[int]):int (FullOpts)
G_M000_IG01:                ;; offset=0x0000
    ; No prologue

G_M000_IG02:                ;; offset=0x0000
    mov      rax, bword ptr [rcx]         ; load the byref field into rax
    mov      ecx, dword ptr [rcx+0x08]    ; load the length into ecx, implicitly zero-extending
    xor      edx, edx                     ; zero the sum local
    xor      r8d, r8d                     ; zero the index
    test     ecx, ecx                     ; check if the length is 0
    jle      SHORT G_M000_IG04            ; skip loop if it is
    align    [0 bytes for IG03]

G_M000_IG03:                ;; offset=0x000F
    mov      r10d, r8d                    ; copy index into r10d, technically unnecessary
    add      edx, dword ptr [rax+4*r10]   ; load element from the index and add to the sum
    inc      r8d                          ; increment the index
    cmp      r8d, ecx                     ; check if we've hit the bounds
    jl       SHORT G_M000_IG03            ; if not, continue the loop

G_M000_IG04:                ;; offset=0x001E
    mov      eax, edx                     ; move the sum into the return register

G_M000_IG05:                ;; offset=0x0020
       ret                                ; return
; Total bytes of code: 33

If you instead use 64-bit based indexing, you end up increasing the encoding cost by around 1 byte per instruction (which uses the wider register). It probably won't matter in practice, but it's still unnecessary "bloat" for the 99% use case.

@huoyaoyuan (Member)

I think we shouldn't waste much time on how large the benefit is - it is definitely beneficial for many cases.

Instead, the topic should focus more on how world-breaking it would be for existing 32-bit-based code. Doing more magic in codegen may also affect performance.

If you instead use 64-bit based indexing, you end up increasing the encoding cost by around 1 byte per instruction (which uses the wider register). It probably won't matter in practice, but it's still unnecessary "bloat" for the 99% use case.

This looks xarch-specific. Popular RISC architectures use the same size for instruction encoding. This is something to consider but should not matter too much.

@MineCake147E
Contributor

Yes. However, if .NET already doesn't support the thing developers shouldn't be doing (in this case for historical reasons) then there is little to no benefit in .NET doing the massively complex work to support it because we'd be adding support for something that no real world scenario should be using in the first place.

The discussion basically stops there as there is no justification for adding support for something which no one should ever be using and for which if users "really" decide they need it, there are plenty of viable workarounds.

OK, I see that you're right about this. I agree.
I won't argue this point any further.

This also isn't really a good argument, and there are many reasons why using int is "better". Even if there were support for some NativeArray type, it's entirely likely that the indexer would be nint (not nuint) as there are benefits to using a signed type and many drawbacks to using an unsigned type. There are some conceptual reasons for why nuint is "better", but in practice it tends to lose out and there are many places you may fundamentally require signed anyways.

Would you clarify what the drawbacks of using unsigned types are?

I initially thought that nint was better, but I ended up thinking otherwise.
Here's a small sample of the benefits of using unsigned types:

  • No need for checking if the Length is negative
  • No need for checking if the index is negative
  • No need for reserving unused sign bit, hence twice the maximum representable size
  • Zero extension tends to be faster than sign extension
    • Zero extension could be done with renaming
  • Interoperation with native libraries can be easier

Here's a small sample of reasons why unsigned indexing wouldn't hurt performance anyway (see the sketch at the end of this comment):

  • Loop unrolling can be done in much the same way as with signed indexing
  • Reverse loops can also be done in much the same way as with signed indexing
    • Simply replace i >= 0 with i < Length, because -1 wraps to MaxValue
  • A tail-relative index can easily be calculated with Length + ~index anyway

I don't see any case where I would absolutely require signed index types.
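As a rough illustration of the reverse-loop and tail-relative claims above, here is a minimal sketch under stated assumptions: the method names are made up, Unsafe.Add is used with a nuint offset because Span's own indexer only takes int, and argument validation is omitted.

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class UnsignedIndexingSketch
{
    public static int SumReverse(Span<int> span)
    {
        ref int head = ref MemoryMarshal.GetReference(span);
        nuint length = (nuint)(uint)span.Length;
        int sum = 0;

        // Reverse loop: decrementing past 0 wraps to nuint.MaxValue, so
        // "i < length" ends the loop without a signed "i >= 0" test.
        // For an empty span, length - 1 wraps and the body never runs.
        for (nuint i = length - 1; i < length; i--)
        {
            sum += Unsafe.Add(ref head, i);
        }

        return sum;
    }

    public static ref int FromEnd(Span<int> span, nuint indexFromEnd)
    {
        ref int head = ref MemoryMarshal.GetReference(span);
        nuint length = (nuint)(uint)span.Length;

        // Tail-relative access: length + ~indexFromEnd == length - indexFromEnd - 1,
        // so indexFromEnd == 0 maps to the last element. Bounds checking omitted.
        return ref Unsafe.Add(ref head, length + ~indexFromEnd);
    }
}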

@2A5F

2A5F commented Mar 20, 2024

For using signed, there is one concrete reason: the clunky Enumerator design of C#

public ref struct Enumerator
{
    /// <summary>The span being enumerated.</summary>
    private readonly Span<T> _span;
    /// <summary>The next index to yield.</summary>
    private int _index;
 
    /// <summary>Initialize the enumerator.</summary>
    /// <param name="span">The span to enumerate.</param>
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal Enumerator(Span<T> span)
    {
        _span = span;
        _index = -1;
    }
 
    /// <summary>Advances the enumerator to the next element of the span.</summary>
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool MoveNext()
    {
        int index = _index + 1;
        if (index < _span.Length)
        {
            _index = index;
            return true;
        }
 
        return false;
    }
 
    /// <summary>Gets the element at the current position of the enumerator.</summary>
    public ref T Current
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => ref _span[_index];
    }
}

Java-style iterators work better with unsigned

public ref struct Enumerator
{
    private readonly Span<T> _span;
    private uint _index;
 
    internal Enumerator(Span<T> span)
    {
        _span = span;
        _index = 0;
    }

    public bool HasNext() => _index < _span.Length;
    public ref T Next() => ref _span[(int)_index++]; // cast needed: Span's indexer takes int
}
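For comparison, a sketch of how such a Java-style enumerator would be consumed. This is pseudo-usage only: it assumes T is int and that the constructor is accessible (the real Span enumerator is a nested type with an internal constructor), and it highlights that C#'s foreach statement is bound to the MoveNext/Current shape, so this style would need an explicit loop:

var span = new int[] { 1, 2, 3 }.AsSpan();
var e = new Enumerator(span);   // the Java-style enumerator sketched above (hypothetical access)
int sum = 0;

while (e.HasNext())
{
    sum += e.Next();
}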

@MineCake147E
Contributor

For using signed, there is one concrete reason: the clunky Enumerator design of C#

You can do the same thing with ~0 instead of -1, because neither would be a valid index anyway.

public ref struct Enumerator
{
    private readonly ref T _head;
    private readonly nuint _length;
    /// <summary>The next index to yield.</summary>
    private nuint _index;
 
    /// <summary>Initialize the enumerator.</summary>
    /// <param name="span">The span to enumerate.</param>
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal Enumerator(Span<T> span)
    {
        _head = ref MemoryMarshal.GetReference(span);
        _length = (uint)span.Length;
        _index = ~(nuint)0;
    }
 
    /// <summary>Advances the enumerator to the next element of the span.</summary>
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool MoveNext()
    {
        var index = ++_index;
        return index < _length;
    }
 
    /// <summary>Gets the element at the current position of the enumerator.</summary>
    public T Current
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _index < _length ? Unsafe.Add(ref _head, _index) : default!;
    }
}

@tannergooding
Member

I think we shouldn't waste much time on how large the benefit is - it is definitely beneficial for many cases.

That's part of the discussion point: whether there is actually a benefit or whether it is simply convenience.

Many times convenience is a benefit, but other times it may itself be a pit of failure and be inappropriate due to the pain it causes or due to leading users down the wrong path, especially users who may not know better.

Instead, the topic should focus more on how world-breaking it would be for existing 32-bit-based code. Doing more magic in codegen may also affect performance.

.NET cannot make a breaking change here. As has been iterated many times, almost every single for loop written would now have a chance for silent failure. Any such change to make System.Array actually support 64-bits would have to be a hard break where roll forward was explicitly blocked and where there would need to be significant analyzer and language work to help surface the bugs users might hit (even though the vast majority of code would never actually encounter large arrays). There would likely be security and other concerns from such a break as well.

If it was deemed viable, a new array type could be introduced and which new code could opt into using. However, there have been many reasons pointed out as to why this itself isn't viable and how it interplays poorly with a GC. Such a type would reasonably need to be restricted to unmanaged data only (data that isn't itself a reference type and doesn't contain reference types).

The most viable path here would be to introduce some NativeSpan type which is essentially Span<T> but with an nint based length.

This looks xarch-specific. Popular RISC architectures use the same size for every instruction encoding. This is something to consider but should not matter too much.

The considerations for RISC architectures were also called out, including the extended instruction sequences often required to work with anything over a 16-bit constant.

Every architecture has tradeoffs here and almost every one ends up being more expensive for larger register sizes. It only varies where that expense is seen (encoding size vs operation count vs number of instructions vs ...).

@tannergooding
Member

Would you clarify what the drawbacks of using unsigned types are?

Here's a small sample of the benefits of using unsigned types:

Sure.

One of the primary things to note is that using a signed value doesn't mean negative values are allowed, so many of the "benefits" you've listed for unsigned types equally apply to signed. Many of the checks are likewise equivalent due to the two's complement representation.

No need for checking if the Length is negative

The lengths for types like Array and Span are known to be "never negative" already; it's an implicit assumption made by the runtime which can only be violated by the user unsafely mutating the underlying state (which is itself undefined and dangerous behavior).

No need for checking if the index is negative

You only have a single check for signed as well. Due to the two's complement representation, (x < 0) || (x >= KnownPositive) can just be emitted as (uint)x >= KnownPositive. It's an implicit optimization done by many compilers, since all negatives have the most significant bit set and will therefore compare as greater than KnownPositive.
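For clarity, a small sketch of that folding; Validate/ValidateFolded and knownPositiveLength are illustrative names, not anything from the runtime:

static void Validate(int index, int knownPositiveLength)
{
    // Written naively: two comparisons.
    if (index < 0 || index >= knownPositiveLength)
        throw new ArgumentOutOfRangeException(nameof(index));
}

static void ValidateFolded(int index, int knownPositiveLength)
{
    // Equivalent single check: a negative index reinterpreted as uint becomes
    // a huge value, so it also fails the unsigned comparison.
    if ((uint)index >= (uint)knownPositiveLength)
        throw new ArgumentOutOfRangeException(nameof(index));
}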

No need for reserving unused sign bit, hence twice the maximum representable size

This is really a non-issue. Due to having an operating system, a runtime, general program state, etc., you can never actually have something that is nuint.MaxValue in length.

On a 32-bit system, it's typical for the OS to explicitly reserve at minimum 1GB of the address space, so the maximum user space is 3GB, which is of course shared with other subsystems. Notably, without explicit opt-in, many systems historically limited this user space to 2GB as well.

On 64-bit systems, most modern hardware actually has special support for always masking off the most significant 7-11 bits of the address so that it can be used to encode special information. However, even if this were to ever expand to the full space, it's very likely that the last bit would be reserved for the system regardless. The amount of space required to encode the page tables when you actually have that much memory, the general needs for the system and memory managers to maintain state, the need to represent the actual programs, etc all prevent you from ever actually using all bits, so reserving the last bit is fine.

Zero extension tends to be faster than sign extension
Zero extension could be done with renaming

Depends on the CPU. Many modern CPUs have zero-cost sign-extension as well. However, sign-extension is only needed for values that "could" be negative. The general index usage, bounds checks, and well-known state for core types mean that values are typically known to be never negative and can thus use zero-extension regardless. There are a couple of niche cases otherwise, but those are easy to work around. More notably, if using nint there is no need to extend at all; it's already the right size.

Interoperation with native libraries can be easier

This depends. Many native libraries do take size_t. However, many other common usages like ptrdiff_t and ssize_t are explicitly signed themselves. You really get a mix of both depending on the scenario and many newer libraries have opted to use signed to avoid some of the issues you encounter, especially when doing subtractions and comparisons with unsigned types.

It's ultimately all about tradeoffs and for typical usages the types can be treated equivalently due to knowing the value is never negative. So, you then just need to consider where the differences do come in, the risk around them, and how much is needed to workaround any issues when they do come up.

@davidxuang

davidxuang commented Mar 20, 2024

TBH, I don't think it's possible that unsigned integers would ever be considered: they are not even CLS-compliant. That's a huge drawback for interoperating with, or implementing, some languages.

There are several possible ways to introduce huge arrays: new types, a simple breaking change, making LongLength on arrays actually functional, or making it optional (the way Python tries to drop the GIL in PEP 703) by introducing a size_t-like alias. Though I suppose byte buffers may be the only case where a long indexer is often needed.

@tannergooding
Member

tannergooding commented Mar 20, 2024

they are not even CLS-compliant.

CLS compliance notably doesn't really matter that much anymore and we've even looked at removing it in the past.

It was a consideration 20+ years ago when .NET was first introduced, and there are technically some languages that may not support unsigned types. However, CLS compliance itself hasn't been updated since then and doesn't take into account any new features (like DIMs, ref structs, etc., many of which would likely not be considered CLS-compliant given the original spirit of the feature), and Roslyn doesn't itself accurately warn in all the places it "should" (which includes places where a modreq might be emitted, as is the case when using several language features with virtual members).

We've also, since .NET Core 2.1 at least, been a lot more flexible and overall consistent with exposing support for unsigned types where that makes sense or to ensure they have general parity with signed types (such as for Generic Math, conversion support, optimizations, Math and reading/writing APIs, etc).

So, if we truly thought that unsigned was appropriate, we'd end up having that discussion and using it. The biggest "blocker" is just that the rest of the ecosystem already uses signed types almost exclusively for the general indexing concept and so it simplifies many things to continue being consistent. We also get some of the other benefits like a clear sentinel for uninitialized or invalid state (think a sentinel similar to null) and code being less error prone when doing things like computing offsets (smallOffset - largerOffset is negative, not an overflow resulting in a very large positive), or doing reverse loop iterations, etc.

There are several possible ways to introduce huge arrays

I would expect the most likely thing to be some NativeSpan<T> like type, which can track a large length and work for any type of data (including unsafely reinterpreting something like a large long[] as a NativeSpan<byte>, since it might not be representable as Span<T>).

It might be conceivable to then define some UnmanagedArray<T> (where T : unmanaged) which allows for safe allocation and cleanup of very large buffers, but I wouldn't expect it to ever work for reference types or structs that contain reference types (directly or indirectly) due to the overhead that places on the GC in tracking that many objects. It's also something that is easily polyfilled by the community for the special scenarios that need it: https://source.terrafx.dev/#TerraFX/UnmanagedArray%25601.cs,6e619c21500b58b1
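To make the shape of such a type concrete, here is a purely hypothetical sketch of a NativeSpan<T>-like ref struct with an nint length. Nothing like this is a proposed or approved API; the type name, members, and validation are assumptions, and ref fields require C# 11 or later.

using System;
using System.Runtime.CompilerServices;

// Hypothetical shape only: a Span-like view whose length is nint instead of int.
public readonly ref struct NativeSpan<T>
{
    private readonly ref T _reference;
    private readonly nint _length;

    public NativeSpan(ref T reference, nint length)
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));

        _reference = ref reference;
        _length = length;
    }

    public nint Length => _length;

    public ref T this[nint index]
    {
        get
        {
            // Single unsigned comparison covers both "negative" and "too large".
            if ((nuint)index >= (nuint)_length)
                throw new IndexOutOfRangeException();

            return ref Unsafe.Add(ref _reference, index);
        }
    }
}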

@MineCake147E
Contributor

MineCake147E commented Mar 20, 2024

The lengths for types like Array and Span are known to be "never negative" already; it's an implicit assumption made by the runtime which can only be violated by the user unsafely mutating the underlying state (which is itself undefined and dangerous behavior).

You still have to check it in Slice(start, length).

if ((uint)start > (uint)_length || (uint)length > (uint)(_length - start))

if ((ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)_length)

The unsigned-cast hack is still a negativity check, just written in a different form.

You only have a single check for signed as well. Due to the two's complement representation, (x < 0) || (x >= KnownPositive) can just be emitted as (uint)x >= KnownPositive. It's an implicit optimization done by many compilers, since all negatives have the most significant bit set and will therefore compare as greater than KnownPositive.

Yes, I knew that.
However, using signed types for length still spreads the misconception that it could be negative.
There'd be plenty of user code that checks the sign bit of the length and/or index.

Additional question: why do you need to reinterpret signed values as unsigned ones, if signed types are better?

@jkotas
Member

jkotas commented Mar 21, 2024

unsigned lengths

A lot has been written about the advantages and disadvantages of signed and unsigned types for lengths. The general consensus is that signed lengths are less error prone for the general higher-level programming that is the focus of C# and .NET. Other, much newer languages that target a similar audience have made the same choice.

On the other hand, Rust that is more focused on low level programming has unsigned lengths.

@Neme12

Neme12 commented Mar 21, 2024

Would you clarify what the drawbacks of using unsigned types are?

I don't think this is the appropriate discussion for that here. But my 2 cents are: While I like the appeal of unsigned lengths as something that can be correct by construction if your API never accepts negative values, so that you don't have to check for it on every input... in practice, unsigned lengths have many disadvantages - the biggest one being that if you have two unsigned values, you just can't take their difference, because the difference might not fit into an unsigned value. This is not only an annoying issue, but also a correctness issue, because even if you don't accept negative values, you often need them as part of your computation.

Even C++ ended up adding a signed size facility (std::ssize in C++20) because of how cumbersome size_t is.

I would love it if there were unsigned types that had the same max value as corresponding signed types, just couldn't be negative - so you could always take their difference. Those would have none of the disadvantages of unsigned types. But unfortunately, this isn't how either the hardware or platforms are designed, and we have what we have. The best we can do given what we have are signed types.
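A tiny example of that difference problem (illustrative values and names only):

uint uSmall = 10, uLarge = 20;
Console.WriteLine(uSmall - uLarge); // 4294967286: wraps to a huge positive instead of being negative

int sSmall = 10, sLarge = 20;
Console.WriteLine(sSmall - sLarge); // -10, as expected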

@Neme12

Neme12 commented Mar 21, 2024

No need for checking if the Length is negative

The lengths for types like Array and Span are known to be "never negative" already; it's an implicit assumption made by the runtime which can only be violated by the user unsafely mutating the underlying state (which is itself undefined and dangerous behavior).

No need for checking if the index is negative

You only have a single check for signed as well. Due to the two's complement representation, (x < 0) || (x >= KnownPositive) can just be emitted as (uint)x >= KnownPositive. It's an implicit optimization done by many compilers, since all negatives have the most significant bit set and will therefore compare as greater than KnownPositive.

That's true, but @MineCake147E still has a point (although I don't agree that unsigned would have more benefits than downsides). There are many places where you need to check that an int is not negative for validation, often having nothing to do with array indexing; and even when it does relate to array indexing, with unsigned you'll likely have a different code path (e.g. throw an ArgumentException instead of an IndexOutOfRangeException).

No need for reserving unused sign bit, hence twice the maximum representable size

Although as I said, this isn't actually a benefit - in practice it's a problem.

@huoyaoyuan
Member

.NET cannot make a breaking change here. As has been iterated many times, almost every single for loop written would now have a chance for silent failure. Any such change to make System.Array actually support 64-bits would have to be a hard break where roll forward was explicitly blocked and where there would need to be significant analyzer and language work to help surface the bugs users might hit (even though the vast majority of code would never actually encounter large arrays). There would likely be security and other concerns from such a break as well.

I imagined letting the JIT insert some magic to adapt an nint length to 32-bit indexing code. This would certainly have some performance impact. Existing code shouldn't fail while processing small arrays, but yes, making it have even a chance of failing is painful.

@MineCake147E
Contributor

MineCake147E commented Mar 21, 2024

Many modern CPUs have zero-cost sign-extension as well.

Even Zen 4 and Golden Cove need one clock cycle to perform movsx rax, eax, while they need no clock cycle to zero-extend the value (Golden Cove needs the destination register to be different, though).
Golden Cove also has fewer execution ports available for movsx than for movzx and mov edx, eax.
MOVSX in uops.info
MOVZX in uops.info
mov r32, r32 in uops.info

code being less error prone when doing things like computing offsets (smallOffset - largerOffset is negative, not an overflow resulting in a very large positive)

This is why the carry flag exists.
The JIT could optimize the subtract-then-compare pattern by simply eliminating the unnecessary cmp instruction.
Example C# code:

var s = length - start;
return length > start ? s : 0;

The JIT could optimize it to something like (in the future):

xor eax, eax
sub rdx, rcx    ; rdx = length - start
cmova rax, rdx
ret

doing reverse loop iterations

As I mentioned earlier, you can simply replace i >= 0 with i < Length, because -1 is now MaxValue.
Also, you can use the dec rcx + jae pattern as well.

clear sentinel for uninitialized or invalid state (think a sentinel similar to null)

I wonder what you mean here.
For index, ~0 would just be fine, because ~0 wouldn't be a valid index for any length.
For Length, it depends on your intention.

@Neme12

Neme12 commented Mar 21, 2024

clear sentinel for uninitialized or invalid state (think a sentinel similar to null)

Yeah... those should never have existed. If you have no value, just use null (or better, Optional<T>).

@Xyncgas

Xyncgas commented Mar 21, 2024

It's not like it's OK to index into an array with a negative int.
The underflow argument holds the other way too: unsigned has a bigger max value (since signed gives up half the range to allow negatives).
0 - 2147483647 doesn't underflow using signed, and 2147483647 + 2147483647 doesn't overflow using unsigned.

The reasoning I can see behind using unsigned for length is that you could then have an array with more than Int32.MaxValue elements.
Practicality aside, a computer computes numbers, and numbers are unbounded.
The benefit of using unsigned integers to represent counts and positions in an array is to avoid bugs. If the reasoning for representing these things with unsigned numbers is accepted, it will eventually show up here too, and that same reasoning creates the expectation that one day you will get an unsigned int from a library and have to convert it to int, which might cause problems down the line because the value was intended to be unsigned.

@Neme12

Neme12 commented Mar 22, 2024

The benefit of using unsigned integers to represent counts and positions in an array is to avoid bugs

I would say it's the opposite, unsigned numbers lead to more bugs (especially when combining them with or converting to/from signed numbers, which you often have to). Using a single data type for all kinds of integers has a huge advantage.

This discussion is pointless though, I'm pretty sure the API review board would never approve an unsigned data type for regular usage like length or indexing. At least not without a really good reason for why it's not only useful, but necessary.

@jeffhandley jeffhandley removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Feb 9, 2025