Memory leak on full nodes and validators #532

Closed
tkporter opened this issue Oct 16, 2019 · 4 comments
tkporter commented Oct 16, 2019

Expected Behavior

Memory usage on tx-nodes and validators remains roughly constant over time.

Current Behavior

Memory usage grows steadily over time on validators and, to a lesser extent, on tx-nodes, which suggests a memory leak.

Example validator:

(screenshot: validator memory usage over time)

Example tx-node:

(screenshot: tx-node memory usage over time)

Here's a pprof rendering from a tx-node:

(image: pprof graph from a tx-node)

I wasn't able to find anything conclusive, other than that LevelDB looked suspicious. I'm planning on running my own testnet with --pprof to see if anything noticeable shows up.
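A minimal sketch of how heap snapshots could be collected from a node started with --pprof (assuming the standard Go net/http/pprof endpoint on 127.0.0.1:6060, geth's usual default; adjust the address if it's configured differently):

```go
// heapsnap.go: fetch a heap profile from a running node and save it with a
// timestamp, so snapshots taken hours apart can be compared later.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Standard net/http/pprof heap endpoint; exposed by the node when
	// started with --pprof (host/port assumed here).
	resp, err := http.Get("http://127.0.0.1:6060/debug/pprof/heap")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetching heap profile:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	name := fmt.Sprintf("heap-%s.pb.gz", time.Now().Format("20060102-150405"))
	out, err := os.Create(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, "creating file:", err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "writing profile:", err)
		os.Exit(1)
	}
	fmt.Println("wrote", name)
}
```

Two snapshots taken a few hours apart can then be diffed with `go tool pprof -base heap-<old>.pb.gz heap-<new>.pb.gz` to see which allocation sites are actually growing.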

tkporter added the type:bug (Something isn't working) and investigate labels Oct 16, 2019
tkporter self-assigned this Oct 16, 2019
@mattharrop

My validator node consistently crashes after a few hours of running on an instance with 8 GB of RAM. Logs from the crash are attached.
celocrash.log

@tkporter

Looks like discv5 is the one to blame:

(screenshot: pprof output showing discv5 memory usage)

(screenshot: further pprof output)

(Note that the individual entries don't appear to sum to 631 MB, so there must be some inaccuracy here.)

@daithi-coombes

This issue is very similar to ethereum/go-ethereum#16859, which should have been fixed by ethereum/go-ethereum#16880.

From @mattharrop's logs above, there are plenty of goroutines that have been hanging for over two hours:

 goroutine 1 [chan receive, 160 minutes]:

The network logs also show false positives about bad enode URLs:

Error in upserting a valenode entry      AnnounceMsg="{Address: 0x132EA61AaE41A3beE6a2faf45aAE68e2A7f9457F, View: {Round: 0, Sequence: 496530}, IncompleteEnodeURL: enode://c8f8bc50da1780163761d738a1cbed328df7b82900e9d7ada38fb505fe589af60982b91e9b950a75bf677fabaafdf409fe9d2c0f9ead52394670ed5f1da7f5eb, Signature: f3d205bd4a570b95671c9afba818d73b27cb62fb21376c7a184354b7f9e4745f3d65d4b8aabee3621abe115ac099539037949771e4d6497dd72ac6ffe351d83001}" error="old announce message"

(I'm basing the "false positive" assessment on `error="old announce message"`.)

I'm thinking dangling requests in discovery are the cause, for now. I'll post a heap dump per the instructions in #573 (comment).
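One way to quantify those hung goroutines against a live node is sketched below (again assuming the node was started with --pprof and exposes the standard net/http/pprof endpoint on 127.0.0.1:6060; the `?debug=2` dump includes how long each goroutine has been blocked):

```go
// goroutine-age.go: pull a full goroutine dump from the node and count
// goroutines that have been blocked for a long time, matching headers like
// "goroutine 1 [chan receive, 160 minutes]:" seen in the crash log above.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"regexp"
	"strconv"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:6060/debug/pprof/goroutine?debug=2")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// Goroutine headers look like: "goroutine 123 [chan receive, 160 minutes]:"
	header := regexp.MustCompile(`^goroutine \d+ \[([^,\]]+)(?:, (\d+) minutes)?\]:`)

	stuck := map[string]int{} // wait reason -> goroutines blocked for >= 30 minutes
	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 1<<20), 1<<20)
	for scanner.Scan() {
		m := header.FindStringSubmatch(scanner.Text())
		if m == nil || m[2] == "" {
			continue
		}
		if mins, _ := strconv.Atoi(m[2]); mins >= 30 {
			stuck[m[1]]++
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for reason, n := range stuck {
		fmt.Printf("%4d goroutines blocked >= 30 min in %q\n", n, reason)
	}
}
```

If most of the long-blocked goroutines turn out to sit in discovery code paths, that would back up the dangling-request theory.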

@trianglesphere

Closing because it seems like issue #1197 is the same, except with newer logs.
