Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel: reported stress failure but test is passing #74013

Closed
cockroach-teamcity opened this issue Dec 18, 2021 · 25 comments
Closed

bazel: reported stress failure but test is passing #74013

cockroach-teamcity opened this issue Dec 18, 2021 · 25 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

Comments

@cockroach-teamcity
Copy link
Member

ts/testmodel.TestModelDBQuery failed with artifacts on master @ 3f95a4bd83cce2952a12497de82692866b4da659:

=== RUN   TestModelDBQuery/#12
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)Parameters in this failure:

  • TAGS=bazel,gss

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 18, 2021
@tbg tbg changed the title ts/testmodel: TestModelDBQuery failed bazel: reported stress failure but test is passing Dec 20, 2021
@tbg
Copy link
Member

tbg commented Dec 20, 2021

This test did not fail (If I'm reading the logs correctly). I saw this in another failure recently too (can't find it now, but I did assign that issue to the bazel folks as well so it's in your proj board).

@rickystewart
Copy link
Collaborator

From the logs:

08:40:56     2021/12/18 08:40:56 found outstanding output. Considering last test failed: {github.com/cockroachdb/cockroach/pkg/ts/testmodel TestModelDBQuery/#12}
08:40:56     2021/12/18 08:40:56 consolidating failed subtest "#12" into parent test "TestModelDBQuery"
08:40:56     2021/12/18 08:40:56 failed parent test {"github.com/cockroachdb/cockroach/pkg/ts/testmodel" "TestModelDBQuery"}

@rickystewart rickystewart self-assigned this Dec 20, 2021
@rickystewart
Copy link
Collaborator

Unfortunately the .json file gets deleted, so I can't investigate to see what it might have said.

@tbg
Copy link
Member

tbg commented Dec 21, 2021

I think I found the same in the other issue. Fixing that would be a good first step.

edit: spent a good five minutes looking for that other issue I keep referring to. No dice. Not sure what I did with it but I can't imagine I did anything but rename it + file it for dev-inf and bazel, like this one.

@tbg
Copy link
Member

tbg commented Dec 21, 2021

Found it: #73841

rickystewart added a commit to rickystewart/cockroach that referenced this issue Dec 21, 2021
We've seen some issues where Bazel jobs are failing in `github-post`
(cockroachdb#73841, cockroachdb#74013) with the following output:

    found outstanding output. Considering last test failed:

It's hard to say what the problem is because these scripts haven't kept
the `test.json.txt` in `artifacts`. Here I remove the logic to clean up
the file so we can RC further instances of the problem.

Release note: None
@rickystewart
Copy link
Collaborator

OK, PR to fix that problem: #74152

craig bot pushed a commit that referenced this issue Dec 21, 2021
72992: util/log: fix redactability of logging tags r=abarganier a=knz

Fixes #72905.

Some time in the v21.2 cycle, the log entry preparation logic was
refactored and a mistake was introduced: the logging tags were not any
more subject to the redaction logic. The result was that redaction
markers were missing in log tag values, and if a value had contained
unbalanced redaction markers in a value string (say, as part of a SQL
table key), it would have caused log file corruption and possibly a
confidential data leak.

This patch fixes that, by preparing the logging tags in the same way
as the main message for each entry.

Release note (cli change): A bug affecting the redactability of
logging tags in output log entries has been fixed. This bug had
been introduced in the v21.2 release.

73937: setting: introduce setting classes r=RaduBerinde a=RaduBerinde

This commit introduces the three setting classes in the RFC (#73349):
`SystemOnly`, `TenantReadOnly`, and `TenantWritable`. The `SystemOnly`
class replaces the existing `WithSystemOnly()`.

In this change we don't yet implement the advertised semantics. We
mechanically use `TenantWritable` for all settings except those that
were using `WithSystemOnly()` which use `SystemOnly`; this should not
change any existing behavior. The classes will be revisited in a
separate change, after we implement the semantics.

Release note: None

73978: opt: fix like escape processing for span constraints r=cucaroach a=cucaroach

Fixes: #44123

Previously no attempt was made to properly handle escape ('\\') sequence
in like patterns being turned into constraints. Refactor code used to
process like at runtime to generate a regexp and use that to properly
handle index constraint generation.

Release note (sql change): Escape character processing was missing from
constraint span generation which resulted in incorrect results when
doing escaped like lookups.

74102: sql: do not fetch virtual columns during backfill r=mgartner a=mgartner

Fixes #73372

Release note (bug fix): A bug has been fixed that caused internal errors
when altering the primary key of a table. The bug was only present if
the table had a partial index with a predicate that referenced a virtual
computed column. This bug was present since virtual computed columns
were added in version 21.1.0.

74110: bazel: require setting `cockroach_cross=y` to opt into cross toolchains r=irfansharif a=rickystewart

With `--incompatible_enable_cc_toolchain_resolution` set in #73819, now
Bazel selects the appropriate toolchain for you. Bazel was selecting the
`cross_linux_toolchain` when building for the host platform on Linux,
resulting in link errors when trying to compile `stress` under `race`.
We update the toolchains to instead require opting into the cross
toolchains by defining `cockroach_cross=y`.

Closes #73997.

Release note: None

74111: bench/rttanalysis: allow roundtrips to be off by 1 r=ajwerner a=ajwerner

If we don't have a range, let the currently estimate be wrong by 1.
We mostly care about the ballpark and the growth rate. I'm sick of
these flakes.

Fixes #73884.

Release note: None

74152: ci: don't delete `test.json.txt` after processing r=tbg a=rickystewart

We've seen some issues where Bazel jobs are failing in `github-post`
(#73841, #74013) with the following output:

    found outstanding output. Considering last test failed:

It's hard to say what the problem is because these scripts haven't kept
the `test.json.txt` in `artifacts`. Here I remove the logic to clean up
the file so we can RC further instances of the problem.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
gustasva pushed a commit to gustasva/cockroach that referenced this issue Jan 4, 2022
We've seen some issues where Bazel jobs are failing in `github-post`
(cockroachdb#73841, cockroachdb#74013) with the following output:

    found outstanding output. Considering last test failed:

It's hard to say what the problem is because these scripts haven't kept
the `test.json.txt` in `artifacts`. Here I remove the logic to clean up
the file so we can RC further instances of the problem.

Release note: None
@tbg
Copy link
Member

tbg commented Jan 6, 2022

@rickystewart we have multiple reproductions with the output now, the output always ends mid-stream, for example here:

image

Could dev-inf pick this up again? I'm seeing a few flakes per week from this and I imagine there's a fair amount that don't get routed to me.

@tbg
Copy link
Member

tbg commented Jan 11, 2022

@tbg
Copy link
Member

tbg commented Jan 11, 2022

Maybe as a hot fix, we can disable the nightly bazel stress build?

@rickystewart
Copy link
Collaborator

Maybe as a hot fix, we can disable the nightly bazel stress build?

Done.

@mgartner
Copy link
Collaborator

mgartner commented Jan 12, 2022

I've preserved the incomplete output from one of the "failing" tests here.

@rickystewart
Copy link
Collaborator

@mgartner -- Broken link.

@mgartner
Copy link
Collaborator

@rickystewart Oops. Fixed.

@tbg
Copy link
Member

tbg commented Jan 31, 2022

I think this is another one: nevermind, found the panic in the stress.log output, sorry about the noise!

@rickystewart
Copy link
Collaborator

Since #75087, this should be resolved.

@tbg
Copy link
Member

tbg commented Feb 14, 2022

Unfortunately, this doesn't seem to be resolved, see #76492.

@tbg tbg reopened this Feb 14, 2022
@tbg
Copy link
Member

tbg commented Feb 14, 2022

image

The output just ends. Possible that there is an OOM or something like that, but it's difficult to tell.

@rickystewart
Copy link
Collaborator

If I look in the build log I see the following:

08:28:20     === RUN   TestStoreScanInconsistentResolvesIntents
08:28:20         test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/751d67000aac5f3394c2369309253f02/logTestStoreScanInconsistentResolvesIntents2968565157
08:28:20         test_log_scope.go:80: use -show-logs to present logs inline
08:28:20     panic: interface conversion: interface {} is nil, not uuid.UUID [recovered]
08:28:20       panic: interface conversion: interface {} is nil, not uuid.UUID [recovered]
08:28:20       panic: interface conversion: interface {} is nil, not uuid.UUID
08:28:20     
08:28:20     goroutine 134067015 [running]:
08:28:20     github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc016abfb90, {0x54a7e78, 0xc016c6aa20})
08:28:20       github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:251 +0x94
08:28:20     panic({0x3d056a0, 0xc016c6aae0})
08:28:20       GOROOT/src/runtime/panic.go:1038 +0x215
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send.func1()
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:95 +0x1ff
08:28:20     panic({0x3d056a0, 0xc016c6aae0})
08:28:20       GOROOT/src/runtime/panic.go:1047 +0x266
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.TestStoreScanInconsistentResolvesIntents.func1({{0x54a7dd0, 0xc01a066c80}, {0xc01a7a61d0, 0x8}, 0x0, 0x1, {0x553c0a8, 0xc016090f00}, {{0x16d399b6bbb4912c, 0x0, ...}, ...}, ...})
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_test.go:2098 +0x154
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch({_, _}, {_, _}, {_, _}, {_, _}, _, 0xc010d8ba00, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:255 +0xa43
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatchWrapper(_, {_, _}, {_, _}, {_, _}, _, _, {{0x16d399b6bbb4912d, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_write.go:623 +0x1a7
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatchWithServersideRefreshes(_, {_, _}, {_, _}, {_, _}, _, _, {{0x16d399b6bbb4912d, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_write.go:591 +0x2fd
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatch(_, {_, _}, {_, _}, _, {{0x16d399b6bbb4912d, 0x0, 0x0}, {0x16d399b6bbb4912c, ...}}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_write.go:407 +0x63d
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateProposal(0xc0067cf500, {0x54a7dd0, 0xc01a066c80}, {0xc01a7a61d0, 0x8}, 0xc010d8ba00, {{0x16d399b6bbb4912d, 0x0, 0x0}, {0x16d399b6bbb4912c, ...}}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_proposal.go:799 +0x1d7
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).requestToProposal(0x0, {0x54a7dd0, _}, {_, _}, _, {{{0x0, 0x0, 0x0}, 0xc01fb4ac50, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_proposal.go:899 +0xaa
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evalAndPropose(0xc0067cf500, {0x54a7dd0, _}, _, _, {{{0x0, 0x0, 0x0}, 0xc01fb4ac50, {0x1, ...}, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:110 +0x1e5
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeWriteBatch(0xc0067cf500, {0x54a7dd0, 0xc01a066c80}, 0xc010d8ba00, 0xc0178ae780)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_write.go:153 +0x754
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries(0xc0067cf500, {0x54a7dd0, 0xc01a066c80}, 0xf, 0x44a83a8)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:463 +0x388
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithoutRangeID(0xc0067cf500, {0x54a7e78, 0xc016c6aa50}, 0xc010d8ba00)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:182 +0x437
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Send(...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:98
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send(_, {_, _}, {{{0x16d399b6bbb4912c, 0x0, 0x0}, 0xc01a7a6190, {0x1, 0x1, 0x1, ...}, ...}, ...})
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:197 +0x745
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*senderTransport).SendNext(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x1, 0x1, 0x1, ...}, ...}, ...})
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:337 +0x169
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0, ...}, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2029 +0x11af
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0, ...}, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1577 +0xce5
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0, ...}, ...}, ...}, ...)
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1179 +0x426
08:28:20     github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0, ...}, ...}, ...})
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:800 +0x6e5
08:28:20     github.com/cockroachdb/cockroach/pkg/kv.(*CrossRangeTxnWrapperSender).Send(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0, ...}, ...}, ...})
08:28:20       github.com/cockroachdb/cockroach/pkg/kv/db.go:221 +0xd8
08:28:20     github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher.(*RequestBatcher).sendBatch.func1.1({0x54a7e78, 0xc016c6aa20})
08:28:20       github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher/batcher.go:278 +0xe8
08:28:20     github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher.(*RequestBatcher).sendBatch.func1({0x54a7e78, 0xc016c6aa20})
08:28:20       github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher/batcher.go:309 +0x218
08:28:20     github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
08:28:20       github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:494 +0x16f
08:28:20     created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
08:28:20       github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:483 +0x445
08:28:20     I220214 08:28:19.183654 1 (gostd) testmain.go:1544  [-] 1  Test //pkg/kv/kvserver:kvserver_test exited with error code 2
08:28:20     
08:28:20     
08:28:20     ERROR: exit status 2
08:28:20     
08:28:20     3 runs completed, 1 failures, over 32m25s
08:28:20     context canceled
08:28:20     FAIL
08:28:20     ================================================================================
08:28:20     //pkg/kv/kvserver:kvserver_test                                          FAILED in 1944.9s
08:28:20       /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/execroot/cockroach/bazel-out/k8-dbg/testlogs/pkg/kv/kvserver/kvserver_test/test.log

@rickystewart
Copy link
Collaborator

I'm going to re-close this because it doesn't seem to have anything to do with the original bug (which, indeed, I have not seen repro'd since I fixed it a few weeks back).

@irfansharif
Copy link
Contributor

@rickystewart should we file an issue to make stray panics easily discoverable in bazel test logs/CI?

@rickystewart
Copy link
Collaborator

@irfansharif I think so. We have #74105 and it seems related, but not exactly the same as, the panic logging issue. Let me know if you don't file the issue and I will.

@irfansharif
Copy link
Contributor

@tbg, I followed the sequence of steps above starting from the TC failure link. I feel in this specific instance the panic logging was not that undiscoverable. Probably there are other instances where there isn't the case, but if they come up, I'll file separately with evidence where we can improve.

image

Scrolling below:

image

@tbg
Copy link
Member

tbg commented Feb 15, 2022

@irfansharif my pre-bazel muscle memory has me ignore the build log. All failures were detectable from artifacts. In particular, one could look at full_output.txt, which is where I looked, and where the output ends abruptly:

https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_StressBazel/4369269:id/full_output.tgz!/full_output.txt

I do think if it is straightforward to arrange, panics should be included there as well (this was certainly the old behavior). It's problematic if there are multiple places one has to look for any given kind of failure.

@rickystewart
Copy link
Collaborator

I agree it's problematic if you have to look in one of multiple places to find failures, and for that reason I think that dropping the pre-Bazel muscle memory and checking the build log is a good idea. There are some exceptions -- acceptance and other tests where the meaningful stuff is buried in artifacts -- but I don't think that pattern is desirable or advantageous and that we should continue trying to implement it. In general the movement should be in the opposite direction IMO, making the build log more meaningful and useful rather than less.

Regarding getting panic info into the full_output.txt -- checking teamcity-bazel-support.sh I see the following:

    $testfilter -mode=convert < "$test_json" > "$artifacts_dir"/full_output.txt
    (cd "$artifacts_dir" && tar --strip-components 1 -czf full_output.tgz full_output.txt $(basename $test_json))
    rm -rf "$artifacts_dir"/full_output.txt

@rail wrote this, I think. Looks like we convert the test.json contents to full_output.txt using testfilter. So presumably the panic info is either not in test.json, or testfilter drops it as part of the conversion. One of those would have to be fixed.

@tbg
Copy link
Member

tbg commented Feb 16, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

No branches or pull requests

5 participants