BUG REPORT

Seen in the logs on a Kubernetes cluster. Sometimes the process would restart a few times and then work; sometimes it would restart endlessly.
Scope version 1.11.6 (1.11.4 and 1.11.3 showed the same symptom).
I believe this is a race condition caused by running getInitialState() on a background thread: if the eBPF tracker is shut down before that call has finished, the probe crashes.
These nodes have a very large number of TCP connections: conntrack -L reports 245,155 on one of them, of which only around 3,000 are in TIME_WAIT.
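A minimal, self-contained sketch of how I think the race plays out (the type and method names below only approximate the real code in probe/endpoint; this is not the actual scope implementation):

```go
package main

import (
	"sync"
	"time"
)

type ebpfTracker struct {
	ready bool
}

func (t *ebpfTracker) feedInitialConnections() {
	// Dereferencing the receiver panics if t == nil, which is what the
	// stack trace below shows (feedInitialConnections called on 0x0).
	_ = t.ready
}

type connectionTracker struct {
	mu      sync.Mutex
	tracker *ebpfTracker
}

// ReportConnections-style path: kick off the initial connection walk on a
// background goroutine while the main loop keeps running.
func (c *connectionTracker) reportConnections() {
	go func() {
		time.Sleep(10 * time.Millisecond)  // simulate a slow getInitialState()
		c.tracker.feedInitialConnections() // c.tracker may already be nil here
	}()
}

// stopTracker models "ebpf tracker died, gently falling back to proc scanning".
func (c *connectionTracker) stopTracker() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tracker = nil
}

func main() {
	c := &connectionTracker{tracker: &ebpfTracker{}}
	c.reportConnections()             // background goroutine starts
	c.stopTracker()                   // tracker torn down before the goroutine runs
	time.Sleep(50 * time.Millisecond) // goroutine now panics with a nil dereference
}
```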
Logs:
2019-10-13T19:27:25.821787845Z time="2019-10-13T19:27:25Z" level=info msg="publishing to: https://<redacted>@cloud.weave.works."
2019-10-13T19:27:25.821833615Z <probe> INFO: 2019/10/13 19:27:25.821238 Basic authentication disabled
2019-10-13T19:27:25.849864019Z <probe> INFO: 2019/10/13 19:27:25.849501 command line args: --mode=probe --probe-only=true --probe.docker=true --probe.docker.bridge=docker0 --probe.kubernetes.role=host https://<elided>@cloud.weave.works.
2019-10-13T19:27:25.849895079Z <probe> INFO: 2019/10/13 19:27:25.849589 probe starting, version 1.11.6, ID 4240307090b3c701
2019-10-13T19:27:25.856663104Z <probe> WARN: 2019/10/13 19:27:25.856137 Cannot resolve 'scope.weave.local.': dial tcp 172.17.0.1:53: connect: connection refused
2019-10-13T19:27:26.177284533Z <probe> INFO: 2019/10/13 19:27:26.176898 Control connection to cloud.weave.works. starting
2019-10-13T19:27:26.563771966Z <probe> WARN: 2019/10/13 19:27:26.563445 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
2019-10-13T19:27:26.604484396Z <probe> ERRO: 2019/10/13 19:27:26.604152 tcp tracer received event with timestamp 12652644202032895 even though the last timestamp was 12652644205786052. Stopping the eBPF tracker.
2019-10-13T19:27:26.604712068Z <probe> ERRO: 2019/10/13 19:27:26.604464 tcp tracer received event with timestamp 12652644202138598 even though the last timestamp was 12652644205786052. Stopping the eBPF tracker.
2019-10-13T19:27:26.624841063Z <probe> INFO: 2019/10/13 19:27:26.624417 Publish loop for cloud.weave.works. starting
2019-10-13T19:27:26.715718105Z <probe> WARN: 2019/10/13 19:27:26.714537 Dropping report to cloud.weave.works.
2019-10-13T19:27:26.758087095Z <probe> WARN: 2019/10/13 19:27:26.757677 Dropping report to cloud.weave.works.
[...]
2019-10-13T19:27:27.629609655Z <probe> WARN: 2019/10/13 19:27:27.629252 Dropping report to cloud.weave.works.
2019-10-13T19:27:27.661782679Z <probe> WARN: 2019/10/13 19:27:27.659335 ebpf tracker died, restarting it
2019-10-13T19:27:27.662414461Z <probe> WARN: 2019/10/13 19:27:27.661235 Dropping report to cloud.weave.works.
[...]
2019-10-13T19:27:28.205549324Z <probe> WARN: 2019/10/13 19:27:28.205110 Dropping report to cloud.weave.works.
2019-10-13T19:27:28.659833456Z <probe> WARN: 2019/10/13 19:27:28.659448 Endpoint reporter took longer than 1s
2019-10-13T19:27:28.820249185Z <probe> WARN: 2019/10/13 19:27:28.819872 Endpoint reporter took 1.160715041s (longer than 1s)
2019-10-13T19:27:28.878843652Z <probe> ERRO: 2019/10/13 19:27:28.875629 tcp tracer received event with timestamp 12652646202038856 even though the last timestamp was 12652646204876031. Stopping the eBPF tracker.
2019-10-13T19:27:28.878955243Z <probe> ERRO: 2019/10/13 19:27:28.875708 tcp tracer received event with timestamp 12652646202110612 even though the last timestamp was 12652646204876031. Stopping the eBPF tracker.
2019-10-13T19:27:28.963800677Z <probe> WARN: 2019/10/13 19:27:28.961378 ebpf tracker died again, gently falling back to proc scanning
2019-10-13T19:27:31.067514362Z panic: runtime error: invalid memory address or nil pointer dereference
2019-10-13T19:27:31.067560271Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x15a49fa]
2019-10-13T19:27:31.067568748Z
2019-10-13T19:27:31.067573672Z goroutine 1545 [running]:
2019-10-13T19:27:31.067579267Z github.com/weaveworks/scope/probe/endpoint.(*EbpfTracker).feedInitialConnections(0x0, 0x22d8fc0, 0xc42104ca80, 0xc425108600, 0xc428593f48, 0x0, 0x0, 0xc420b7f8e0, 0x18)
2019-10-13T19:27:31.067583045Z /go/src/github.com/weaveworks/scope/probe/endpoint/ebpf.go:324 +0x3a
2019-10-13T19:27:31.067586942Z github.com/weaveworks/scope/probe/endpoint.(*connectionTracker).getInitialState(0xc420628cc0)
2019-10-13T19:27:31.067590264Z /go/src/github.com/weaveworks/scope/probe/endpoint/connection_tracker.go:186 +0x2bc
2019-10-13T19:27:31.06759352Z created by github.com/weaveworks/scope/probe/endpoint.(*connectionTracker).ReportConnections
2019-10-13T19:27:31.067596559Z /go/src/github.com/weaveworks/scope/probe/endpoint/connection_tracker.go:99 +0x383