Increased latency in eth calls #12281
Comments
try
we've increased it before to
Then add `--pprof`:
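As a rough illustration, here is a minimal sketch of grabbing a heap profile once `--pprof` is enabled. The localhost:6060 endpoint, the output filename, and the 30-second timeout are assumptions, so adjust them to however pprof is actually exposed on your node:

```go
// heapdump.go - fetches a heap profile snapshot from a running node's pprof endpoint.
// The URL below assumes the pprof HTTP listener is on localhost:6060; adjust as needed.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	url := "http://localhost:6060/debug/pprof/heap"
	client := &http.Client{Timeout: 30 * time.Second}

	resp, err := client.Get(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.pprof")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create failed:", err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote heap.pprof")
}
```

The resulting file can then be inspected with `go tool pprof heap.pprof` to see where allocations concentrate around a spike.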
I see. It's probably happening when a new file gets created - in this case RPCDaemon now does a "ReOpen" of files, which isn't really necessary. We're also working on it in #12299, and heap profiling is a bit optimized here: #11710
Good to know it's spotted.
I think it's very old behavior. Do you see it after a recent upgrade?
@JkLondon hi.
We did a node upgrade to 2.60.6 on the 28th of August, so it doesn't look like it's related to bumping up the version. I can't say if it's happening on versions before 2.60, but it has been happening for at least 3 months. It seems that it started getting a bit more spiky in the second half of September. Not sure what these large spikes were, to be honest, but even if we ignore them it averages 7-10 seconds for an eth_blockNumber call. However, I've noticed that in one of the Discord threads someone was complaining in May that he saw a lot of latency on calls when he bumped from 2.58.2 to 2.60.0 and that he had to roll back to resolve it, hence my question about downgrading.
Thank you for finding it. Seems that support request was not resolved. We will work on it.
I will work on removing the need for file re-open (on E3 first, then will port to E2): #12332
@AskAlexSharov Can we expect a release with the fix anytime soon?
Will test if the issue persists and get back in here.
Sorry guys, this is taking longer than anticipated to test, since the underlying base image also changed, so we have to incorporate that as well. Will have some info on this by EOW.
Thank you
Hey guys, we've basically tried to use 2.60.10 to reduce the latency across regions, and here is what we found out and what we are battling against currently:
Honestly, this points to the blockchain size being connected to this, as I see Polygon also has big ledgers on our end, ~9.9 TB. We were thinking of resyncing the nodes to bring the latency down, but honestly it's not a small amount of work, and we want to see if that would be the correct step forward. Would appreciate ideas or suggestions... and sorry for the hold-up.
Thanks for the suggestions and ideas, Alex. Dedicating time to the first point currently. Got no Polygon nodes with "normal" ledger sizes; all have grown to around 9 TB of data. For Ethereum, I'll definitely copy it over. Metrics are produced by a proxy service; they just measure the time it took to get the answer from Erigon. Proxies are deployed in the same network as the Erigons, so the network latency is minimal.
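For context, this is roughly the shape such a timing proxy can take - a minimal sketch, not the actual proxy service described above. The upstream address localhost:8545, the listen port 8080, and the 1-second slow threshold are all assumptions:

```go
// A minimal timing reverse proxy: forwards JSON-RPC requests to an Erigon RPC
// endpoint and logs how long each upstream answer took. Illustrative only.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	upstream, err := url.Parse("http://localhost:8545") // assumed Erigon RPC address
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		proxy.ServeHTTP(w, r)
		elapsed := time.Since(start)
		if elapsed > time.Second { // flag anything slower than 1s
			log.Printf("SLOW %s %s took %s", r.Method, r.URL.Path, elapsed)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

Since the proxy sits in the same network as Erigon, the measured time is dominated by Erigon's own response time rather than by network latency.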
may be 2 reasons:
To catch slow RPC calls:
Also can:
Also can try Erigon3 - chaindata size there is 20 GB. https://github.com/erigontech/erigon?tab=readme-ov-file#erigon3-changes-from-erigon2
Thanks for the insights, will get back to the team with this and discuss. Small update - fresh Polygon DBs are not helping reduce the latency.
Hey @AskAlexSharov, a quick question => is Erigon3 production-ready?
Differences: https://github.com/erigontech/erigon?tab=readme-ov-file#erigon3-changes-from-erigon2
@AskAlexSharov #13320 should also be addressed and would be an easy fix - is there any progress on fixing the issue?
Let's make sure that the latency is good at least in E3.
System information
Erigon version: 2.60.8
OS & Version: Alpine Linux v3.20 in K8s
Erigon Command (with flags/config):
Chain/Network: Ethereum / Polygon / BSC
Expected behaviour
No spikes in eth calls
Actual behaviour
For a while now we’ve been witnessing random latency spikes in calls on:
We are trying to see why these calls last for up to 20 seconds.
eth_blockNumber usually returns in less than a second.
This happens on Ethereum, Polygon, and BSC on Mainnet, and it happens on archive-mode runs.
The graphs below are from archive Erigon running Polygon over the last 28 days: latency for eth_getBlockByNumber, eth_call, and eth_blockNumber.
Our initial guess was that this happens because of high traffic but that doesn’t seem to be the case.
We've had one peak in this range, but that's it.
We can confirm network and I/O are not bottlenecks in this case, as we've thoroughly checked, and other non-Erigon nodes are operational.
Steps to reproduce the behaviour
To reproduce this, one can set up https://github.com/louislam/uptime-kuma and create a monitor, point it at one of the nodes, and POST eth_blockNumber periodically. Random spikes should appear on the graph from time to time.
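Alternatively, a small stand-alone poller does the same job as the uptime-kuma monitor - a minimal sketch, where the endpoint URL and the 5-second interval are placeholders:

```go
// Polls eth_blockNumber at a fixed interval and prints the round-trip time,
// so latency spikes show up directly in the output. The endpoint URL and the
// polling interval below are placeholders.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	endpoint := "http://localhost:8545"
	body := []byte(`{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}`)
	client := &http.Client{Timeout: 30 * time.Second}

	for {
		start := time.Now()
		resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
		elapsed := time.Since(start)
		if err != nil {
			fmt.Printf("%s error after %s: %v\n", time.Now().Format(time.RFC3339), elapsed, err)
		} else {
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			fmt.Printf("%s eth_blockNumber took %s\n", time.Now().Format(time.RFC3339), elapsed)
		}
		time.Sleep(5 * time.Second)
	}
}
```

Spikes show up as outlier durations in the output, which can then be correlated with the Erigon logs.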
Backtrace
Example for eth_blockNumber when spike happens in logs.
From our monitoring, the request was initiated at 14:15:27, but as you can see from the logs it was answered at 14:15:35.
Another occurrence:
Initiated at :27, answered at :34.