Large file support #9328

Open
ncannasse opened this issue Apr 20, 2020 · 10 comments
@ncannasse
Member

In order to properly support files >2GB, we need to change the signatures of the seek and tell methods of sys.io.FileInput to use Float instead of Int. Float is easier to handle than Int64 and supports file sizes up to about 9007 TB (2^53 bytes).
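To make these limits concrete, here is a minimal TypeScript sketch. TypeScript is used only because JS is one of the targets under discussion, and `fitsInInt32` is a hypothetical helper, not part of any Haxe API. It shows that a 3 GiB offset no longer fits in a signed 32-bit Int but is still exactly representable as a 64-bit float, whose exact integer range ends at 2^53 - 1.

```typescript
// Largest value of a signed 32-bit Int: 2^31 - 1 bytes, just under 2 GiB.
const INT32_MAX: number = 2 ** 31 - 1;

// Largest integer a 64-bit float can represent exactly: 2^53 - 1.
const FLOAT_EXACT_MAX: number = Number.MAX_SAFE_INTEGER;

// Hypothetical helper: true when `offset` fits in a signed 32-bit Int.
function fitsInInt32(offset: number): boolean {
  return Number.isInteger(offset) && offset >= -(2 ** 31) && offset <= INT32_MAX;
}

const threeGiB: number = 3 * 1024 ** 3; // 3221225472 bytes

console.log(fitsInInt32(threeGiB));          // false
console.log(Number.isSafeInteger(threeGiB)); // true
console.log(FLOAT_EXACT_MAX);                // 9007199254740991
console.log(2 ** 53 + 1 === 2 ** 53);        // true: exactness is lost past 2^53
```

The last line is the caveat behind the "9007 TB" figure: beyond 2^53, adjacent integers collapse to the same float value.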

seek() is not a problem: passing an Int will still work, so it should have no impact on user code unless someone overrides it.
tell() will change its return type, so it might break some code relying on it.

The alternative is to add seekLarge() and tellLarge() operations, but I don't much like people having to change their code once they hit the 2GB limit.

@Simn
Member

Simn commented Apr 20, 2020

I would prefer to make Int64 less awkward to work with first. I don't think the required changes are very complicated: we just need a literal syntax and to adapt some TInt things.

@RealyUniqueName
Member

RealyUniqueName commented Apr 20, 2020

Float for byte counting is a dirty hack.
I agree we should improve the (U)Int64 situation and use it.
Also, supporting bigger file sizes would require haxe.io.Bytes to support bigger data chunks. Or something like this: https://github.com/HaxeFoundation/haxe/blob/a93adb16800a719510027e44ee2809ba1ece5c47/std/haxe/io/BigBuffer.hx

@RealyUniqueName RealyUniqueName added this to the Design milestone Apr 20, 2020
@ncannasse
Member Author

The problem with Int64 is that it's not supported natively on every platform, JS being quite an important one for example.

@ncannasse
Member Author

So on many of these platforms, using Int64 will prove to be much slower than using Float.

  • Or we can introduce Int53 which is actually a double with no floating part ;)
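As an illustration of where the extra cost of a non-native Int64 comes from, here is a rough TypeScript sketch of the general two-halves emulation technique. It is not Haxe's actual implementation; `Int64Pair`, `add64` and `toNumber` are hypothetical names. Every addition needs carry handling and allocates a new object, whereas the Float version is a single native addition.

```typescript
// Emulated 64-bit integer: two signed 32-bit halves.
interface Int64Pair {
  high: number;
  low: number;
}

// Add two emulated 64-bit values, propagating the carry from low to high.
function add64(a: Int64Pair, b: Int64Pair): Int64Pair {
  const lowSum = (a.low >>> 0) + (b.low >>> 0); // add the low halves as unsigned
  const carry = lowSum > 0xffffffff ? 1 : 0;
  return { high: (a.high + b.high + carry) | 0, low: lowSum | 0 };
}

// Convert to a Number; only exact while the value stays within +/- 2^53.
function toNumber(v: Int64Pair): number {
  return v.high * 4294967296 + (v.low >>> 0);
}

const lhs: Int64Pair = { high: 0, low: -1 }; // 0x00000000FFFFFFFF = 4294967295
const rhs: Int64Pair = { high: 0, low: 1 };
const total: Int64Pair = add64(lhs, rhs);

console.log(total.high, total.low); // 1 0
console.log(toNumber(total));       // 4294967296
console.log(4294967295 + 1);        // 4294967296, the same sum as one Float addition
```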

@nadako
Member

nadako commented Apr 21, 2020

> Or we can introduce Int53 which is actually a double with no floating part ;)

This is something I've been actually seriously considering for one of my projects, because I didn't want to introduce Int64 object allocations for JS but also didn't want to use Float directly because of fractions. I've never implemented this though, because I just used domain-specific abstract types over numbers in the end.

Also modern JS has BigInt nowadays, maybe something to look into as well.

@Aurel300
Member

I agree with the Int64 argument. For JS it should be implemented on top of BigInt moving forward. It is available in most browsers nowadays, but anyway for filesystem operations Node.js matters a lot more – BigInt support has been around since Node.js 10.4.0 and 10 is currently the oldest supported version (LTS is 12, latest is 13 at the moment).
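A small TypeScript sketch of what BigInt-backed offsets could look like on JS. This only illustrates plain BigInt behaviour and is not a proposal for how Haxe's Int64 would be wired up; `toSafeNumber` is a hypothetical helper.

```typescript
// 2^53 + 1: not exactly representable as a 64-bit float, but fine as a BigInt.
const bigOffset: bigint = BigInt("9007199254740993");

// BigInt arithmetic stays exact past the 2^53 boundary.
const nextOffset: bigint = bigOffset + BigInt(1);
console.log(nextOffset.toString()); // 9007199254740994

// Hypothetical helper: hand an offset to a Number-based API only when
// the conversion is exact, and fail loudly otherwise.
function toSafeNumber(offset: bigint): number {
  const limit = BigInt(Number.MAX_SAFE_INTEGER);
  if (offset > limit || offset < -limit) {
    throw new RangeError("offset " + offset.toString() + " is not exact as a Number");
  }
  return Number(offset);
}

console.log(toSafeNumber(BigInt(3221225472))); // 3221225472
```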

Using Float (or even an Int53) would degrade some of the benefits of type safety. Do we specify what seek(10.5) does? Do we check if the number is integer-ish at runtime every time?
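The runtime check in question might look like the following TypeScript sketch. `checkedOffset` is a hypothetical guard, not an existing API: with a Float-typed parameter, a call such as seek(10.5) type-checks, so the only place left to reject it is at runtime, on every call.

```typescript
// Hypothetical guard a Float-based seek would have to run on each call.
function checkedOffset(offset: number): number {
  // Rejects fractions, NaN, Infinity and integers beyond +/- (2^53 - 1).
  if (!Number.isSafeInteger(offset)) {
    throw new RangeError("not an exact integer offset: " + offset);
  }
  return offset;
}

console.log(checkedOffset(3221225472)); // 3221225472

try {
  checkedOffset(10.5); // compiles fine, fails only when executed
} catch (e) {
  console.log((e as Error).message); // not an exact integer offset: 10.5
}
```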

@RealyUniqueName
Member

I think Int53 might be an option for existing sys API. Why would that degrade type safety?

@Aurel300
Member

Aurel300 commented Apr 23, 2020

The safety degrades mostly with the Float option. With Int53, the issues I see are actually mostly to do with literals and type casting.

If we want Int53 literals, the parser must be modified. If we do this, we might as well implement Int64 literals.

If we don't implement Int53 literals, how does a user specify a large offset as an input into e.g. seek (or resize in new sys APIs)? The options are:

  • A String-parsing function just like for Int64. A bit awkward to use, and unnecessarily slow at runtime.
  • Allow a Float to Int53 implicit cast, since we do have Float literals. This is still awkward because there is no hexadecimal syntax for Float literals, but more importantly – we need to do a runtime check that the Float actually represents an integer. Unnecessary slowdown and as always, compile-time errors are better than runtime surprises.
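For comparison, the string-parsing route from the first option could be sketched in TypeScript as follows. `parseOffset` is a hypothetical function that only handles non-negative decimal or hexadecimal text. It gets hexadecimal input back, but both the parsing and the range check still happen at runtime.

```typescript
// Hypothetical parser for a double-backed integer offset.
function parseOffset(text: string): number {
  // Accept only plain decimal digits or 0x-prefixed hexadecimal digits.
  if (!/^(0x[0-9a-f]+|[0-9]+)$/i.test(text)) {
    throw new SyntaxError("not an integer literal: " + text);
  }
  const value = Number(text);
  // Anything at or above 2^53 cannot be trusted to be exact.
  if (!Number.isSafeInteger(value)) {
    throw new RangeError("offset out of exact range: " + text);
  }
  return value;
}

console.log(parseOffset("3221225472"));  // 3221225472
console.log(parseOffset("0x100000000")); // 4294967296
```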

Having a new core type also comes with questions about its use in practice. Is there a toInt32 function that throws when the number is too large to fit? Can you do maths with Int53? Is (10:Float) / 3 different from (10:Int53) / 3 (the latter should use integer division)? How does an Int53 serialise into JSON and haxe.Serializer?
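Some of these questions can be made concrete with a TypeScript sketch of how a double-backed integer behaves on JS today. `intDiv` and `toInt32` are hypothetical helpers, and the choices made here (truncating division, throwing on overflow) are assumptions for illustration, not decisions from this thread.

```typescript
// Float division versus the integer division an Int53 type would presumably need.
function intDiv(a: number, b: number): number {
  return Math.trunc(a / b);
}

console.log(10 / 3);        // float division keeps the fractional part
console.log(intDiv(10, 3)); // 3

// Hypothetical narrowing conversion that throws instead of wrapping around.
function toInt32(value: number): number {
  if (!Number.isInteger(value) || value < -2147483648 || value > 2147483647) {
    throw new RangeError("value does not fit in Int32: " + value);
  }
  return value;
}

console.log(toInt32(1024)); // 1024

// In JSON, a double-backed integer is written as an ordinary number.
console.log(JSON.stringify({ offset: Number.MAX_SAFE_INTEGER }));
// {"offset":9007199254740991}
```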

Overall, I just have the feeling that we shouldn't adopt a hack just because JavaScript doesn't have real integers. It might also be confusing to users in general – a cursory search for Int53 only reveals JS-based projects.

@dimensionscape
Contributor

Any further consideration for this?

@dimensionscape
Contributor

Ok, I suppose this remains on the back burner.

I am just going to plug this here in case anyone comes across this issue looking for support for larger files.

https://github.com/Dimensionscape-LLC/HxBigIO
