getProgramAccounts() large string error #73

Open
GustavAlbrecht opened this issue Jan 22, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@GustavAlbrecht

GustavAlbrecht commented Jan 22, 2025

Overview

The getProgramAccounts() method call throws an error when used on the Stake program:
Error: Cannot create a string longer than 0x1fffffe8 characters

Steps to reproduce

```js
const STAKE_PROGRAM_ID = "Stake11111111111111111111111111111111111111";

let rpcClient = createSolanaRpc(config.get_program_accounts_rpc_endpoint);

let data = await rpcClient.getProgramAccounts(STAKE_PROGRAM_ID, { encoding: "base64" }).send();
```

Edit: node --version reports v22.11.0.

Description of bug

The core issue is a Node-internal limit on string size. I previously worked around it with libraries like stream-json (see the sketch below). Since getProgramAccounts can be expected to return very large responses, I consider it a bug that this library doesn't handle heavy payloads coming from this call.
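For reference, a minimal sketch of that workaround, assuming axios plus the stream-json and stream-chain packages; the endpoint constant and the handling inside the data callback are placeholders:

```js
import axios from "axios";
import { chain } from "stream-chain";
import { parser } from "stream-json";
import { pick } from "stream-json/filters/Pick";
import { streamArray } from "stream-json/streamers/StreamArray";

const RPC_ENDPOINT = "https://api.mainnet-beta.solana.com"; // placeholder RPC URL

// POST the raw JSON-RPC request and ask axios for a Node stream instead of a string body.
const response = await axios.post(
  RPC_ENDPOINT,
  {
    jsonrpc: "2.0",
    id: 1,
    method: "getProgramAccounts",
    params: ["Stake11111111111111111111111111111111111111", { encoding: "base64" }],
  },
  { responseType: "stream" }
);

// Parse the body incrementally: pick the "result" array and emit one account at a time,
// so the full response text never has to exist as a single JS string.
const pipeline = chain([
  response.data,
  parser(),
  pick({ filter: "result" }),
  streamArray(),
]);

pipeline.on("data", ({ value: account }) => {
  // account.pubkey and account.account are available here, one element at a time.
});
pipeline.on("end", () => console.log("done"));
```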

GustavAlbrecht added the bug label on Jan 22, 2025
@steveluscher
Collaborator

Very interesting. What version of Node is this by the way?

@lorisleiva, do you think this is a limitation introduced by our new bigint-aware JSON parser, one that's not present in the native parser?

@GustavAlbrecht
Author

I'm on Node v22.11.0 (node --version).

@lorisleiva
Member

Ugh, yeah, it is likely that our custom JSON parser (the one that keeps bigint values above 2^53-1 from losing precision) ends up causing this limitation.

This is because we have to await response.text() instead of response.json(), and the latter likely makes use of data streams internally.

As such, we end up hitting the Node limitation on string length, which is set at 0x1fffffe8 characters.

However, that many characters amounts to roughly 512 MiB of data, and perhaps the pros of having safe u64 values (e.g. lamports) on the client outweigh the cons of having a half-gigabyte data size limit by default.

I say by default because anyone can customise their RPC object by providing custom transports and APIs, which, when dealing with such large amounts of data, might be the best way forward anyway.
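For context, the precision loss that the custom parser guards against looks like this; a tiny illustration, not library code:

```js
// Native JSON.parse coerces large integers to IEEE-754 doubles, so u64 values above
// Number.MAX_SAFE_INTEGER (2^53 - 1) silently lose precision.
const { lamports } = JSON.parse('{"lamports":9007199254740993}');
console.log(lamports); // 9007199254740992, off by one, with no error

// A bigint-aware parser can instead preserve the exact value (9007199254740993n),
// which is the safety property discussed above.
```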

@steveluscher
Collaborator

Is this the case, @GustavAlbrecht? Are you downloading more than 512 MB of data from an RPC?

@steveluscher
Collaborator

Do you think there's any advantage to changing the fromJson API to instead deal with the body directly, which is a ReadableStream? I suppose we could let fromJson take in the body instead, build up the JSON string with your parser in a streaming fashion, and then JSON.parse() it. At worst it would make the processor faster (i.e. be able to start sooner) and at best it might fix this problem?
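A rough sketch of reading the body incrementally, assuming a streaming-capable parser behind the hypothetical feedChunk/finish hooks (these names are illustrative, not part of the actual API):

```js
// Hypothetical: consume the ReadableStream body chunk by chunk instead of awaiting response.text().
async function parseJsonFromBody(body, feedChunk, finish) {
  const decoder = new TextDecoder();
  const reader = body.getReader(); // body is a ReadableStream<Uint8Array>
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Hand each decoded chunk to the streaming parser instead of concatenating one giant string.
    feedChunk(decoder.decode(value, { stream: true }));
  }
  feedChunk(decoder.decode()); // flush any bytes still buffered in the decoder
  return finish(); // the streaming parser produces the final parsed value
}
```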

@steveluscher
Collaborator

steveluscher commented Feb 20, 2025

We definitely won't be including something like this in the core library, but you might consider making a custom network transport that either disables our bigint parser (a sketch of that option is below), or uses something like https://github.com/karminski/streaming-json-js to convert it to JSON progressively without having to store the entire string.
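A minimal sketch of the first option, assuming a transport that posts the JSON-RPC payload itself and lets the platform's native JSON parsing handle the response. The exact RpcTransport signature should be checked against the library's types, and whether .json() actually sidesteps the string limit depends on how the runtime implements it:

```js
const RPC_ENDPOINT = "https://api.mainnet-beta.solana.com"; // placeholder RPC URL

// Assumed transport shape: receives the JSON-RPC payload, returns the parsed response.
// Verify against the library's RpcTransport type before wiring this in.
async function nativeJsonTransport({ payload, signal }) {
  const response = await fetch(RPC_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    signal,
  });
  // Native parsing: values such as lamports come back as plain numbers and can lose
  // precision above 2^53 - 1, but the custom bigint-aware text parser is bypassed.
  return await response.json();
}
```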

@GustavAlbrecht
Author

GustavAlbrecht commented Feb 23, 2025

> Is this the case, @GustavAlbrecht? Are you downloading more than 512 MB of data from an RPC?

> We definitely won't be including something like this in the core library, but you might consider making a custom network transport that either disables our bigint parser, or uses something like https://github.com/karminski/streaming-json-js to convert it to JSON progressively without having to store the entire string.

To answer your first question: yes, I download more than 512 MB. I remember that in 2024 I had to switch to progressive parsing because the stake accounts response surpassed the 512 MB limit.

For the second question: to clarify, I was using axios back then, together with the stream-json and stream-chain packages, to fix the issue. I was recently looking to replace that setup with this library, but given this issue I didn't complete the migration. I haven't used stream-json with this library or modified anything, so I don't have a strong opinion on the best way to fix it.

I guess that, in theory, an implementation that parses the JSON text and yields account objects in chunks would be ideal in terms of memory and processing for many use cases, or at least for mine, where I fetch accounts, send them to the stake account parser, and then persist them in Postgres in batches. Right now I first have to wait until the whole text is parsed into account objects before I can start decoding account.data and persisting stake account objects in Postgres. Something along the lines of the sketch below is what I have in mind.
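A rough sketch of that shape, assuming an async iterable of already-parsed account objects coming out of a streaming parser; parseStakeAccount and persistBatch are placeholders for my own decoding and Postgres code:

```js
// Hypothetical consumer: accounts arrive one at a time from a streaming parser,
// get decoded, and are flushed to Postgres in fixed-size batches.
async function persistStakeAccounts(accountStream, { batchSize = 1000 } = {}) {
  let batch = [];
  for await (const account of accountStream) {
    batch.push(parseStakeAccount(account)); // placeholder for the stake account parser
    if (batch.length >= batchSize) {
      await persistBatch(batch); // placeholder for a batched INSERT into Postgres
      batch = [];
    }
  }
  if (batch.length > 0) {
    await persistBatch(batch); // flush the final partial batch
  }
}
```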
