roachtest: import/tpch/nodes=8 failed #34694

Closed
cockroach-teamcity opened this issue Feb 7, 2019 · 6 comments
Labels: C-test-failure (broken test, automatically or manually discovered), O-roachtest, O-robot (originated from a bot)

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/66f13c1d9a12c31e18a198da4ff5ac0bbe2db781

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=import/tpch/nodes=8 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1127849&tab=buildLog

The test failed on provisional_201902062125_v2.2.0-alpha.20190211:
	test.go:743,cluster.go:1585,import.go:150: pq: gs://cockroach-fixtures/tpch-csv/sf-100/lineitem.tbl.2: row 16142300: reading CSV record: read tcp 10.138.0.8:39328->173.194.203.128:443: read: connection reset by peer

@cockroach-teamcity added this to the 2.2 milestone Feb 7, 2019
@cockroach-teamcity added the C-test-failure (broken test, automatically or manually discovered), O-roachtest, and O-robot (originated from a bot) labels Feb 7, 2019
@asubiotto
Contributor

Seems like a node crashed, but I can't find anything in the logs or dmesg. My first instinct is to blame #34241, but I don't have much context.

@petermattis assigned dt and maddyblue and unassigned tbg Feb 7, 2019
@petermattis
Collaborator

Blaming #34241 without evidence is a bit unfair to that poor issue. The other instances of #34241 have always had a stack trace.

The error message suggests to me a problem reading from GCS. @dt or @mjibson, is there anything to be done here? We have retry loops on the reads from cloud providers, right?

@asubiotto
Contributor

asubiotto commented Feb 7, 2019

You're right! Sorry.

Seeing something similar with #34693 and #34700 possibly.

@dt
Member

dt commented Feb 7, 2019

we have a retry around the creation of the Reader but probably not everywhere that we read from it -- we don't currently wrap the Read method on any of the SDKs so it would just be whatever they have internally.

@maddyblue
Contributor

Yes. This was a failure during a read, mid-file. We can't retry ourselves because we don't know the number of bytes that have been read and can't resume from there. Some SDKs do that automatically, including, I think, GCS. I'm not sure what to do here besides upgrade to the latest GCS SDK and see if they had any improvements in internal retrying.

@dt
Member

dt commented Feb 7, 2019

yeah, I'm fine just closing the occasional failure due to external cloud/connectivity issues.

If it were frequent and came up in actual usage, I might want to add more retries. As is, I'd rather assume the cloud readers aren't too flaky and if we get an error from one, say so right away -- since maybe they should fix their network config or something -- instead of sitting in a retry loop.

@dt closed this as completed Feb 7, 2019