
eth_subscribe in WS is not consistent between two nodes #2573

Closed
eduadiez opened this issue Sep 20, 2023 · 9 comments · Fixed by #2635
Labels: bug (Something isn't working), rpc, zkevm-bridge-sync-rpc

Comments

@eduadiez

System information

zkEVM Node version: v0.3.0
OS & Version: Linux
Network: Mainnet

RPC returns the latest block on both nodes, but over WS one of the nodes does not return the latest block. See the screenshot below.

This issue was reported by QuickNode.

[screenshot: latest block via HTTP RPC vs. WS subscription on both nodes]
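For anyone wanting to compare the two transports, the standard way to observe the discrepancy is to open a WS connection to each node and send an `eth_subscribe` request with `{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}`, following the standard Ethereum JSON-RPC pub/sub convention. Each new block then arrives as an `eth_subscription` notification like the sketch below (the subscription id and field values are made-up placeholders); comparing the `number` field across both nodes, and against `eth_blockNumber` over HTTP, exposes the lag:

```json
{
  "jsonrpc": "2.0",
  "method": "eth_subscription",
  "params": {
    "subscription": "0x9cef478923ff08bf67fde6c64013158d",
    "result": {
      "number": "0x52b0d7",
      "hash": "0x…",
      "parentHash": "0x…",
      "timestamp": "0x651234a0"
    }
  }
}
```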

@eduadiez eduadiez added bug Something isn't working rpc labels Sep 20, 2023
@sjoshi10

thank you 🙏

@sjoshi10

As soon as we restart zkevm-rpc, it catches up to the tip on the WS subscription.

@NikitosnikN

Related logs in the RPC component:

{"level":"error","ts":1696403623.5247061,"caller":"jsonrpc/server.go:356","msg":"Unable to read WS message, websocket: close 4000: No new messages within 60 seconds","pid":10,"version":"v0.3.1","stacktrace":"github.com/0xPolygonHermez/zkevm-node/jsonrpc.(*Server).handleWs\n\t/src/jsonrpc/server.go:356\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2109\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2487\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}

@tclemos

tclemos commented Oct 9, 2023

Just to give a heads-up: I'm fully focused on this at the moment.

I have run some experiments to identify the source of the problem, but unfortunately I haven't been able to reproduce it in a controlled environment yet. I'm continuing to investigate and to look for different ways to reproduce the issue.

I've identified some points in the code that could be causing it, and I'll be improving them.

@tclemos tclemos modified the milestones: v0.4.0, v0.3.2 Oct 9, 2023
@tclemos

tclemos commented Oct 10, 2023

@sjoshi10

  • Are both nodes running on the same resources (CPU, RAM, storage, and network)?
  • Could you share the resource specification for these nodes with us?

@tclemos tclemos modified the milestones: v0.3.2, v0.4.0 Oct 11, 2023
@sjoshi10

@tclemos
Both nodes are running on the same resources:

192 cores
1.5TB RAM
1x 3.5TB NVMe

@tclemos

tclemos commented Oct 12, 2023

> @tclemos Both nodes are running on the same resources:
>
> 192 cores
> 1.5TB RAM
> 1x 3.5TB NVMe

Holy moly! 😱

@lyh169

lyh169 commented Nov 14, 2023


Hi, could you please explain what caused "RPC returns the latest block on both nodes but WS doesn't return the latest block" in the old version?

@tclemos

tclemos commented Nov 14, 2023

In previous versions, messages were written sequentially across all connections, so if one connection got stuck for any reason, it delayed the messages for every other connection, creating a snowball effect of accumulated messages. While the node was stuck waiting for one of the connections to accept a write, the network kept producing blocks from new transactions and the synchronizer kept updating the state, but the WS writing sequence remained stuck.

In the new versions we changed this, so one connection should no longer affect the others.

5 participants