Large file support #9328

ncannasse · 2020-04-20T13:54:19Z

In order to properly support files >2GB, we need to change the signatures of the seek and tell methods of sys.io.FileInput to use Float instead of Int. Float is easier to handle than Int64 and provides file sizes up to 9007 TB.
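As a rough check of the figures above, here is a small TypeScript sketch of the arithmetic. It assumes Float is an IEEE 754 double (as on JS) and that "TB" means decimal terabytes; the constant and function names are made up for illustration.

```typescript
// Largest offset a signed 32-bit Int can hold: 2^31 - 1 bytes (just under 2 GiB).
const INT32_MAX: number = Math.pow(2, 31) - 1;

// Largest value up to which every integer is exactly representable in a double.
const FLOAT_EXACT_MAX: number = Math.pow(2, 53) - 1;

// Convert a byte count to decimal terabytes (1 TB = 10^12 bytes).
function toTerabytes(bytes: number): number {
  return bytes / 1e12;
}

console.log(INT32_MAX);                                // 2147483647
console.log(Math.floor(toTerabytes(FLOAT_EXACT_MAX))); // 9007

// Past 2^53, neighbouring integers collapse onto the same double.
console.log(Math.pow(2, 53) + 1 === Math.pow(2, 53));  // true
```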

seek() is not a problem: passing an Int will still work, so it should have no impact on user code unless someone overrides it.
tell() will change its return type, so it might break some code relying on it.

The alternative is to add seekLarge() and tellLarge() operations, but I don't much like people having to change their code once they hit the 2GB limit.


Simn · 2020-04-20T14:16:15Z

I would prefer to make Int64 less awkward to work with first. I don't think the required changes are very complicated; we just need a literal syntax and to adapt some TInt things.

RealyUniqueName · 2020-04-20T16:54:39Z

Float for byte counting is a dirty hack.
I agree we should improve the (U)Int64 situation and use it.
Also, supporting bigger file sizes would require haxe.io.Bytes to support bigger data chunks. Or something like this: https://github.com/HaxeFoundation/haxe/blob/a93adb16800a719510027e44ee2809ba1ece5c47/std/haxe/io/BigBuffer.hx

ncannasse · 2020-04-20T20:16:04Z

The problem with Int64 is that it's not supported natively on every platform, JS being quite an important one for example.

ncannasse · 2020-04-20T20:19:32Z

So on many of these platforms using Int64 will prove to be much slower than using Float.

Or we can introduce Int53 which is actually a double with no floating part ;)
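To illustrate where the extra cost comes from on a platform without native 64-bit integers, here is a minimal TypeScript sketch of a 64-bit value emulated as two 32-bit halves. This is a generic emulation written for this thread, not Haxe's actual Int64 implementation; `Int64Pair`, `add64` and `toNumber` are hypothetical names.

```typescript
// A 64-bit integer held as two signed 32-bit halves (hypothetical layout).
interface Int64Pair {
  high: number;
  low: number;
}

// Add two emulated 64-bit values. Every call allocates a new object and
// needs an explicit carry step, where a double needs a single addition.
function add64(a: Int64Pair, b: Int64Pair): Int64Pair {
  const low = (a.low + b.low) | 0;
  // Compare the unsigned low halves to detect a carry into the high half.
  const carry = (a.low >>> 0) + (b.low >>> 0) > 0xffffffff ? 1 : 0;
  const high = (a.high + b.high + carry) | 0;
  return { high, low };
}

// Convert to a double; exact only while the value fits in 53 bits.
function toNumber(v: Int64Pair): number {
  return v.high * 4294967296 + (v.low >>> 0);
}

// 0xFFFFFFFF + 1 carries into the high half.
const sum = add64({ high: 0, low: -1 }, { high: 0, low: 1 });
console.log(sum.high, sum.low, toNumber(sum)); // 1 0 4294967296
```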

nadako · 2020-04-21T10:24:44Z

> Or we can introduce Int53 which is actually a double with no floating part ;)

This is something I've been actually seriously considering for one of my projects, because I didn't want to introduce Int64 object allocations for JS but also didn't want to use Float directly because of fractions. I've never implemented this though, because I just used domain-specific abstract types over numbers in the end.

Also modern JS has BigInt nowadays, maybe something to look into as well.

Aurel300 · 2020-04-21T11:02:38Z

I agree with the Int64 argument. For JS it should be implemented on top of BigInt moving forward. It is available in most browsers nowadays, but for filesystem operations Node.js matters a lot more anyway: BigInt support has been around since Node.js 10.4.0, and 10 is currently the oldest supported version (LTS is 12, latest is 13 at the moment).

Using Float (or even an Int53) would degrade some of the benefits of type safety. Do we specify what seek(10.5) does? Do we check if the number is integer-ish at runtime every time?
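For concreteness, the runtime check in question could look roughly like the TypeScript sketch below. It assumes a Float-typed offset must be an integer within the exactly representable range, and that negative values are allowed for relative seeks; `checkedOffset` is a hypothetical helper, not an existing Haxe or Node.js function.

```typescript
// Largest magnitude at which every integer is still exact in a double.
const MAX_EXACT: number = Math.pow(2, 53) - 1;

// Hypothetical guard that a Float-based seek() would need on every call.
function checkedOffset(pos: number): number {
  // Reject NaN, infinities and anything with a fractional part.
  if (!isFinite(pos) || Math.floor(pos) !== pos) {
    throw new Error("offset is not an integer: " + pos);
  }
  // Reject magnitudes where integers are no longer exactly representable.
  if (Math.abs(pos) > MAX_EXACT) {
    throw new Error("offset is outside the exact integer range: " + pos);
  }
  return pos;
}

console.log(checkedOffset(3e9)); // 3000000000, above the 32-bit limit but valid
// checkedOffset(10.5) throws: the error only surfaces at runtime.
```

The check itself is cheap, but it turns what would be a compile-time type error with an integer type into a runtime failure.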

RealyUniqueName · 2020-04-22T09:58:04Z

I think Int53 might be an option for existing sys API. Why would that degrade type safety?

Aurel300 · 2020-04-23T13:34:39Z

The safety degrades mostly with the Float option. With Int53, the issues I see are actually mostly to do with literals and type casting.

If we want Int53 literals, the parser must be modified. If we do this, we might as well implement Int64 literals.

If we don't implement Int53 literals, how does a user specify a large offset as an input into e.g. seek (or resize in new sys APIs)? The options are:

- A String-parsing function just like for Int64. A bit awkward to use, and unnecessarily slow at runtime.
- Allow a Float to Int53 implicit cast, since we do have Float literals. This is still awkward because there is no hexadecimal syntax for Float literals, but more importantly, we need to do a runtime check that the Float actually represents an integer. That is an unnecessary slowdown and, as always, compile-time errors are better than runtime surprises.

Having a new core type also comes with questions about its use in practice. Is there a toInt32 function that throws when the number is too large to fit? Can you do maths with Int53? Is (10:Float) / 3 different from (10:Int53) / 3 (the latter should use integer division)? How does an Int53 serialise into JSON and haxe.Serializer?
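The division question can be made concrete with a short TypeScript sketch. It assumes an Int53 would truncate toward zero and be stored as a double; `floatDiv` and `int53Div` are hypothetical names, and the truncation shown is only meant for small operands (a real implementation would need care near 2^53).

```typescript
// Ordinary floating-point division, as (10:Float) / 3 would behave.
function floatDiv(a: number, b: number): number {
  return a / b;
}

// Hypothetical Int53 division: divide as doubles, then truncate toward zero.
function int53Div(a: number, b: number): number {
  const q = a / b;
  return q < 0 ? Math.ceil(q) : Math.floor(q);
}

console.log(floatDiv(10, 3));  // 3.3333333333333335
console.log(int53Div(10, 3));  // 3
console.log(int53Div(-10, 3)); // -3

// A double holding an integer serialises to JSON without a fractional part.
console.log(JSON.stringify(Math.pow(2, 53) - 1)); // 9007199254740991
```

The same two operands give different results depending on the declared type, which is one of the questions a new core type would have to settle.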

Overall, I just have the feeling that we shouldn't adopt a hack just because JavaScript doesn't have real integers. It might also be confusing to users in general – a cursory search for Int53 only reveals JS-based projects.

dimensionscape · 2024-05-05T09:19:49Z

Any further consideration for this?

dimensionscape · 2024-05-09T05:17:15Z

Ok, I suppose this remains on the back burner.

I am just going to plug this here in case anyone comes across this issue looking for support for larger files.

https://github.com/Dimensionscape-LLC/HxBigIO

RealyUniqueName added this to the Design milestone Apr 20, 2020

skial mentioned this issue Apr 21, 2020

Haxe Roundup 526 skial/haxe.io#741
