Node fails to restart: failed to find any peer in table (interopnet and testnet) #2062
Comments
Looks similar to #2021.
Have you solved the problem? If so, please share the fix. Thanks.
It happened again, this time on testnet...
Commit ffa7be8. I'm going to leave the machine as-is until somebody on the team can inspect it.
I'll investigate a bit more in a few hours. I've also made an EBS snapshot.
I set up a new node to replace the one where this happened, so that I could investigate, but it just happened on the new node too. I suspect that using the node for a lot of DHT lookups while I scan all the miners is triggering something bad.
For reference: ipfs/kubo#1941
I tried to figure out where in the code it was refusing to start up. Here's a quick workaround to get my node to start up again: I patched it into here: At first glance, it appears that it might be trying to make DHT queries before the DHT is ready.
I ran the workaround as-is, but it failed.
I can confirm this is happening. One of my Lotus nodes fails to restart indefinitely.
If the DHT is reporting no peers in the table, try
I think what is possibly happening is that the local list of peers in the DHT is getting zeroed out somehow. Then, when the node restarts, the go-fil-markets module tries to restart deals before the DHT has been repopulated, and go-fil-markets can’t handle the resulting error, so it just exits.
@jimpick Yeah, that sounds about right. In general, That certainly seems to be what's caused this node to go kaput. I wonder if the miner for the deal in question is just unavailable.
My node is starting again ... it appears #2239 is a working fix, so I'll close this now.
Describe the bug
I've been running this node on the interopnet for a day or so, and now it fails to restart:
Version (run lotus --version):