Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky tests for 3.4 release #10979

Closed
gyuho opened this issue Aug 5, 2019 · 11 comments
Closed

Fix flaky tests for 3.4 release #10979

gyuho opened this issue Aug 5, 2019 · 11 comments
Milestone

Comments

@gyuho
Copy link
Contributor

gyuho commented Aug 5, 2019

Continued from #10700

@gyuho
Copy link
Contributor Author

gyuho commented Aug 5, 2019

Sent out upstream gRPC fixes

For

go test -v -run TestLeasingTxnOwnerIf

# or

=== RUN   TestLeasingNonOwnerPutError
{"level":"warn","ts":"2019-08-03T07:18:10.564Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-99ab9134-5b05-46e0-965d-601c54493c53/localhost:1341936467916325720","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:1341936467916325720: connect: no such file or directory\""}
{"level":"warn","ts":"2019-08-03T07:18:10.565Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-99ab9134-5b05-46e0-965d-601c54493c53/localhost:1341936467916325720","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xbc42f5]

goroutine 26716 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc00017e700, 0x146c6c0, 0xc000038580, 0xc000038588, 0x5, 0x5, 0xc000038590, 0xa, 0xa, 0xc000038590, ...)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:827 +0x155
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc00017e700, 0x0, 0xc001cdca50)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:692 +0x186
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000392120, 0xc001cdca50, 0xc0012b8d38, 0x20c62f8, 0x0)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:572 +0x43d
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc000392120, 0x146c6c0, 0xc000038580, 0x0, 0x0)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:698 +0x644
go.etcd.io/etcd/etcdserver/etcdserverpb.(*leaseLeaseKeepAliveClient).Send(0xc001cb2560, 0xc000038580, 0x0, 0x0)
	/go/src/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3828 +0x6b
go.etcd.io/etcd/clientv3.(*lessor).sendKeepAliveLoop(0xc001c68a00, 0x1768ae0, 0xc001cb2560)
	/go/src/go.etcd.io/etcd/clientv3/lease.go:573 +0x2d5
created by go.etcd.io/etcd/clientv3.(*lessor).resetRecv
	/go/src/go.etcd.io/etcd/clientv3/lease.go:489 +0x345
FAIL	go.etcd.io/etcd/clientv3/integration	136.425s
=== RUN   TestLeasingReconnectTxn
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xbc42f5]

goroutine 32579 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc0024e0980, 0x14974a0, 0xc001bc8b30, 0xc0017da388, 0x5, 0x5, 0xc0017da390, 0xb, 0xb, 0xc0017da390, ...)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:827 +0x155
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc0024e0980, 0x0, 0xc0009dcbe0)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:692 +0x186
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc001ac4000, 0xc0009dcbe0, 0xc0000c5aa8, 0x20c82f8, 0x0)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:572 +0x43d
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc001ac4000, 0x14974a0, 0xc001bc8b30, 0x0, 0x0)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:698 +0x644
go.etcd.io/etcd/etcdserver/etcdserverpb.(*watchWatchClient).Send(0xc001bc8b10, 0xc001bc8b30, 0x0, 0x20c82f8)
	/go/src/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3697 +0x6b
go.etcd.io/etcd/clientv3.(*watchGrpcStream).run(0xc000152750)
	/go/src/go.etcd.io/etcd/clientv3/watch.go:547 +0xd9b
created by go.etcd.io/etcd/clientv3.(*watcher).newWatcherGrpcStream
	/go/src/go.etcd.io/etcd/clientv3/watch.go:277 +0x58c
FAIL	go.etcd.io/etcd/clientv3/integration	162.826s
=== RUN   TestLeasingNonOwnerPutError
{"level":"warn","ts":"2019-08-04T02:13:05.300Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1e17e7fb-c8b4-43a1-9d89-79ca429efaef/localhost:51782595789236259660","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:51782595789236259660: connect: no such file or directory\""}
{"level":"warn","ts":"2019-08-04T02:13:05.300Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1e17e7fb-c8b4-43a1-9d89-79ca429efaef/localhost:51782595789236259660","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[DEBUG] about to panic! true
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbc1cb2]
goroutine 27073 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc/internal/transport.(*Stream).Context(...)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/internal/transport/transport.go:386
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).Context(0xc0017a4480, 0xc001a7d080, 0x7f8cbd54b6)
	/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:565 +0x82
go.etcd.io/etcd/clientv3.(*lessor).sendKeepAliveLoop(0xc0000efb80, 0x1769b40, 0xc001a09d70)
	/go/src/go.etcd.io/etcd/clientv3/lease.go:581 +0x349
created by go.etcd.io/etcd/clientv3.(*lessor).resetRecv
	/go/src/go.etcd.io/etcd/clientv3/lease.go:489 +0x345
FAIL	go.etcd.io/etcd/clientv3/integration	142.556s

@gyuho gyuho mentioned this issue Aug 5, 2019
19 tasks
@gyuho gyuho added this to the etcd-v3.4 milestone Aug 5, 2019
@gyuho
Copy link
Contributor Author

gyuho commented Aug 6, 2019

Right now, we are debugging with gRPC tip grpc/grpc-go#2958, which should fix the panic.

@gyuho
Copy link
Contributor Author

gyuho commented Aug 7, 2019

Now, only functional tests are flaky. Adding server output on failures:

etcd/test

Lines 225 to 240 in 1583547

if [[ "${ETCD_TESTER_EXIT_CODE}" -ne "0" ]]; then
printf "\n"
echo "FAILED! 'tail -1000 /tmp/etcd-functional-1/etcd.log'"
tail -1000 /tmp/etcd-functional-1/etcd.log
printf "\n"
echo "FAILED! 'tail -1000 /tmp/etcd-functional-2/etcd.log'"
tail -1000 /tmp/etcd-functional-2/etcd.log
printf "\n"
echo "FAILED! 'tail -1000 /tmp/etcd-functional-3/etcd.log'"
tail -1000 /tmp/etcd-functional-3/etcd.log
echo "--- FAIL: exit code" ${ETCD_TESTER_EXIT_CODE}
exit ${ETCD_TESTER_EXIT_CODE}
fi

@jingyih
Copy link
Contributor

jingyih commented Aug 7, 2019

Great. So we need grpc-go 1.23 to fix the panics?

@hexfusion
Copy link
Contributor

hexfusion commented Aug 7, 2019

Right now, we are debugging with gRPC tip grpc/grpc-go#2958, which should fix the panic.

Is it too late to get that into 1.23?

grpc/grpc-go#2958 (comment)

@gyuho
Copy link
Contributor Author

gyuho commented Aug 7, 2019

@jingyih @hexfusion gRPC usually releases from master branch cut. It should be included in 1.23.

@tbg
Copy link
Contributor

tbg commented Aug 8, 2019

I also saw TestRoundRobinBalancedResolvableFailoverFromRequestFail fail again in https://travis-ci.com/etcd-io/etcd/jobs/223131529

@tbg
Copy link
Contributor

tbg commented Aug 8, 2019

Also this failure (a functional test?) https://travis-ci.com/etcd-io/etcd/jobs/223267755

[...]
{"level":"fatal","ts":"2019-08-07T17:32:14.954Z","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/etcdmain/main.go:46\nmain.main\n\t/go/src/go.etcd.io/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}

./test: line 199:  8844 Terminated              ./bin/etcd-agent --network tcp --address 127.0.0.1:${a}9027

FAILED! 'tail -1000 /tmp/etcd-functional-2/etcd.log'

tail: cannot open '/tmp/etcd-functional-2/etcd.log' for reading: No such file or directory

@gyuho
Copy link
Contributor Author

gyuho commented Aug 8, 2019

@tbg The functional test failures should be fixed by #11006.

Regarding TestRoundRobinBalancedResolvableFailoverFromRequestFail, @jpbetz any idea?

@tbg
Copy link
Contributor

tbg commented Aug 8, 2019

One more:

--- FAIL: TestSTMSerialize (0.28s)

    v3_stm_test.go:204: etcdserver: leader changed

    v3_stm_test.go:204: etcdserver: leader changed

Haven't seen an issue about this one, but also I don't think it's caused by my PR:

https://travis-ci.com/etcd-io/etcd/jobs/223465586

@gyuho
Copy link
Contributor Author

gyuho commented Aug 13, 2019

Closing this for now. Most notable ones are fixed.

@gyuho gyuho closed this as completed Aug 13, 2019
@gyuho gyuho unpinned this issue Aug 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants