Store connection issue prevents subgraph indexing until graph-node is restarted #4190
Comments
@evaporei ^ as we discussed earlier
thanks @sduchesneau - I think @fordN has also seen this error. Interesting that a restart fixes, while the regular retry does not. Will take a look
@sduchesneau is this from a recent build from source?
It was from a docker build of master,
thanks @sduchesneau! This seems like it might be related to a failure to reset the subgraph on encountering this error:
(we see this log in all of the shared examples where we have seen this)
Hey @sduchesneau and @azf20
@sduchesneau did you see this while using RPC or Firehose providers? I started seeing this recently after adding a firehose provider and I'm not sure if it's because of the firehose provider (the provider being faulty or some bug in the graph node firehose code path).
@balakhonoff have you encountered this issue as well? If so, were you using firehose providers?
@azf20 We did encounter this issue. Also, we can reproduce it reliably in our self-hosted environment.
The issue is a mismatch between the subgraph runner and the store: because we write changes asynchronously, the store pretends to the subgraph runner that things have been written that aren't really in the database yet. That's all good when things are working. But if the async writer encounters an error when it is trying to write something, the subgraph runner's view will deviate from reality, and the store refuses to do anything more (that's what the

What needs to happen at that point is that the subgraph runner tells the store explicitly "I got rid of all in-memory assumptions about what has been written" and reinitializes the store. That can either happen by calling

The retry loop in the subgraph runner here doesn't take that into account and continues with a poisoned store. The easiest fix for this might be to add a method
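The failure mode described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not graph-node's actual code: `ToyStore`, `StoreError`, `transact_block`, `restart`, and `run_with_retries` are all hypothetical names invented for illustration, and the store here is synchronous for simplicity, whereas the real writer is asynchronous. It shows why a retry loop that never resets a poisoned store keeps failing, while one that explicitly resets it can recover on the next attempt.

```rust
/// Hypothetical error type for this sketch (not graph-node's real type).
#[derive(Debug, Clone, PartialEq)]
enum StoreError {
    /// The underlying write failed, e.g. the database connection dropped.
    NoConnection,
    /// A previous write failed and the store has not been reset since.
    Poisoned,
}

/// Toy stand-in for a writable store that poisons itself after a failed write.
struct ToyStore {
    poisoned: bool,
    /// Number of upcoming writes that will fail with a connection error.
    fail_next: u32,
    /// Blocks that were actually written.
    committed: Vec<u64>,
}

impl ToyStore {
    fn new(fail_next: u32) -> Self {
        ToyStore { poisoned: false, fail_next, committed: Vec::new() }
    }

    /// Try to write a block. Once poisoned, every call is refused.
    fn transact_block(&mut self, block: u64) -> Result<(), StoreError> {
        if self.poisoned {
            return Err(StoreError::Poisoned);
        }
        if self.fail_next > 0 {
            self.fail_next -= 1;
            self.poisoned = true;
            return Err(StoreError::NoConnection);
        }
        self.committed.push(block);
        Ok(())
    }

    /// Hypothetical reset: the caller declares that it has dropped all
    /// in-memory assumptions about what was written, so writing may resume.
    fn restart(&mut self) {
        self.poisoned = false;
    }
}

/// Retry a write up to `max_attempts` times and return the attempt number
/// that succeeded. With `reset_on_error == false` this mimics a retry loop
/// that keeps using a poisoned store.
fn run_with_retries(
    store: &mut ToyStore,
    block: u64,
    max_attempts: u32,
    reset_on_error: bool,
) -> Result<u32, StoreError> {
    // Returned unchanged only if `max_attempts` is 0.
    let mut last = StoreError::Poisoned;
    for attempt in 1..=max_attempts {
        match store.transact_block(block) {
            Ok(()) => return Ok(attempt),
            Err(e) => {
                last = e;
                if reset_on_error {
                    store.restart();
                }
            }
        }
    }
    Err(last)
}
```

In this model, a store created with `ToyStore::new(1)` fails every retry with `StoreError::Poisoned` when `reset_on_error` is `false`, no matter how many attempts are allowed, which matches the log in the report where attempt 27 still fails. With `reset_on_error` set to `true`, the second attempt succeeds once the transient connection error has passed.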
BUG!
Current behavior:
store error: no connection to the server
ERRO Subgraph failed with non-deterministic error: Failed to transact block operations: subgraph writer poisoned by previous error, retry_delay_s: 1800, attempt: 27, sgd: 865, subgraph_id: (...), component: SubgraphInstanceManager
How to reproduce:
I don't have a reliable way to reproduce this on demand, unfortunately, but it happened to me on 3 different subgraphs at the same time (under heavy load).
Expected behavior: