sql: TestShowTraceReplica failed #98598

Closed
cockroach-teamcity opened this issue Mar 14, 2023 · 7 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team


@cockroach-teamcity
Member

cockroach-teamcity commented Mar 14, 2023

sql.TestShowTraceReplica failed with artifacts on master @ 024da43b378167023d483325e714603005c4ba7a:

=== RUN   TestShowTraceReplica
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/f4d1e5a362e4384d18c9c26e2defb957/logTestShowTraceReplica3723250020
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestShowTraceReplica
    show_trace_replica_test.go:121: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/f4d1e5a362e4384d18c9c26e2defb957/logTestShowTraceReplica3723250020
--- FAIL: TestShowTraceReplica (55.06s)
=== RUN   TestShowTraceReplica/SELECT_*_FROM_d.t1
    show_trace_replica_test.go:104: condition failed to evaluate within 45s: SELECT * FROM d.t1: got [[4 4]] expected [[1 1]]
    --- FAIL: TestShowTraceReplica/SELECT_*_FROM_d.t1 (45.09s)
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-25352

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Mar 14, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 14, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Mar 14, 2023
@mgartner
Collaborator

Let's skip this until it can be fixed. This is a dup of #34213.

@mgartner mgartner assigned mgartner and msirek and unassigned mgartner Mar 14, 2023
@msirek
Contributor

msirek commented Mar 14, 2023

Logs were not saved for the failing test at the indicated location (/artifacts/tmp/_tmp/f4d1e5a362e4384d18c9c26e2defb957/logTestShowTraceReplica3723250020), but the output shows a SELECT returning unexpected results: SELECT * FROM d.t1: got [[4 4]] expected [[1 1]]

=== RUN   TestShowTraceReplica
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/f4d1e5a362e4384d18c9c26e2defb957/logTestShowTraceReplica3723250020
    test_log_scope.go:79: use -show-logs to present logs inline
=== RUN   TestShowTraceReplica/SELECT_*_FROM_d.t1
    show_trace_replica_test.go:104: condition failed to evaluate within 45s: SELECT * FROM d.t1: got [[4 4]] expected [[1 1]]
=== RUN   TestShowTraceReplica/UPSERT_INTO_d.t2_VALUES_(1)
=== RUN   TestShowTraceReplica/DELETE_FROM_d.t2
=== RUN   TestShowTraceReplica/ALTER_TABLE_d.t3_SCATTER
=== CONT  TestShowTraceReplica
    show_trace_replica_test.go:121: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/f4d1e5a362e4384d18c9c26e2defb957/logTestShowTraceReplica3723250020
--- FAIL: TestShowTraceReplica (55.06s)
    --- FAIL: TestShowTraceReplica/SELECT_*_FROM_d.t1 (45.09s)
    --- PASS: TestShowTraceReplica/UPSERT_INTO_d.t2_VALUES_(1) (0.18s)
    --- PASS: TestShowTraceReplica/DELETE_FROM_d.t2 (0.01s)
    --- PASS: TestShowTraceReplica/ALTER_TABLE_d.t3_SCATTER (0.03s)

@msirek
Contributor

msirek commented Mar 14, 2023

Given the test output, this command may run the test more like TeamCity runs it:
bazel test --config=cinolint -c fastbuild //pkg/sql:sql_test

This didn't reproduce the problem though.
I also tried running this under stress, with the option --run_under '@com_github_cockroachdb_stress//:stress', but that hung my system.

@msirek
Contributor

msirek commented Mar 14, 2023

I was able to reproduce this with the following command:

bazel test --config=cinolint -c fastbuild //pkg/sql:sql_test --run_under '@com_github_cockroachdb_stress//:stress ' --test_filter=TestShowTraceReplica

...
...
24 runs so far, 0 failures, over 1m40s
26 runs so far, 0 failures, over 1m45s

initialized metamorphic constant "span-reuse-rate" with value 18
initialized metamorphic constant "COCKROACH_ENABLE_HDR_HISTOGRAMS" with value true
initialized metamorphic constant "parse-json-impl" with value 0
initialized metamorphic constant "coldata-batch-size" with value 504
initialized metamorphic constant "mvcc-incremental-iter-tbi" with value false
initialized metamorphic constant "mvcc-value-disable-simple-encoding" with value true
initialized metamorphic constant "storage.value_blocks.enabled" with value false
initialized metamorphic constant "disable-checksstconflicts-range-key-masking" with value true
initialized metamorphic constant "default-batch-bytes-limit" with value 46953
initialized metamorphic constant "kv-batch-size" with value 1
initialized metamorphic constant "datum-row-converter-batch-size" with value 1
initialized metamorphic constant "row-container-rows-per-chunk-shift" with value 1
initialized metamorphic constant "inverted-joiner-batch-size" with value 1
initialized metamorphic constant "spilling-queue-initial-len" with value 1
initialized metamorphic constant "ColIndexJoin-batch-size" with value 476105
initialized metamorphic constant "ColIndexJoin-using-streamer-batch-size" with value 8329006
initialized metamorphic constant "max-batch-size" with value 2060
initialized metamorphic constant "max-batch-byte-size" with value 6837472
initialized metamorphic constant "parallel-scan-result-threshold" with value 3848
initialized metamorphic constant "split-scans-right-for-stats-first" with value true
initialized metamorphic constant "raft-log-truncation-clearrange-threshold" with value 199649
initialized metamorphic constant "copy-batch-size" with value 7399
initialized metamorphic constant "async-IE-result-channel-buffer-size" with value 25
initialized metamorphic constant "copy-fast-path-enabled-default" with value false
initialized metamorphic constant "span-reuse-rate" with value 63
initialized metamorphic constant "COCKROACH_ENABLE_HDR_HISTOGRAMS" with value true
initialized metamorphic constant "parse-json-impl" with value 1
initialized metamorphic constant "coldata-batch-size" with value 4079
initialized metamorphic constant "storage.value_blocks.enabled" with value false
initialized metamorphic constant "mvcc-max-iters-before-seek" with value 1
initialized metamorphic constant "default-batch-bytes-limit" with value 6846
initialized metamorphic constant "kv-batch-size" with value 1
initialized metamorphic constant "datum-row-converter-batch-size" with value 1
initialized metamorphic constant "row-container-rows-per-chunk-shift" with value 1
initialized metamorphic constant "spilling-queue-initial-len" with value 4
initialized metamorphic constant "merge-joiner-groups-buffer" with value 10
initialized metamorphic constant "direct-scans-enabled" with value true
initialized metamorphic constant "ColIndexJoin-batch-size" with value 376542
initialized metamorphic constant "max-batch-size" with value 9077
initialized metamorphic constant "parallel-scan-result-threshold" with value 4626
initialized metamorphic constant "split-scans-right-for-stats-first" with value true
initialized metamorphic constant "copy-batch-size" with value 49
initialized metamorphic constant "async-IE-result-channel-buffer-size" with value 15
initialized metamorphic constant "use-index-lookup-for-descriptors-in-database" with value false
initialized metamorphic constant "copy-fast-path-enabled-default" with value false
I230314 20:09:24.405256 1 (gostd) rand.go:199  [T1] 1  random seed: -7119735781298532229
=== RUN   TestShowTraceReplica
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/31026032a53ac4950ec0ef26cdd2bd3b/logTestShowTraceReplica590011483
    test_log_scope.go:79: use -show-logs to present logs inline
=== RUN   TestShowTraceReplica/SELECT_*_FROM_d.t1
    show_trace_replica_test.go:104: condition failed to evaluate within 45s: SELECT * FROM d.t1: got [[4 4]] expected [[1 1]]
=== RUN   TestShowTraceReplica/UPSERT_INTO_d.t2_VALUES_(1)
=== RUN   TestShowTraceReplica/DELETE_FROM_d.t2
=== RUN   TestShowTraceReplica/ALTER_TABLE_d.t3_SCATTER
=== CONT  TestShowTraceReplica
    show_trace_replica_test.go:121: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/31026032a53ac4950ec0ef26cdd2bd3b/logTestShowTraceReplica590011483
--- FAIL: TestShowTraceReplica (96.02s)
    --- FAIL: TestShowTraceReplica/SELECT_*_FROM_d.t1 (45.23s)
    --- PASS: TestShowTraceReplica/UPSERT_INTO_d.t2_VALUES_(1) (8.97s)
    --- PASS: TestShowTraceReplica/DELETE_FROM_d.t2 (0.01s)
    --- PASS: TestShowTraceReplica/ALTER_TABLE_d.t3_SCATTER (0.27s)
FAIL
I230314 20:11:01.339664 1 (gostd) testmain.go:992  [T1] 1  Test //pkg/sql:sql_test exited with error code 1


ERROR: exit status 1

28 runs completed, 1 failures, over 1m49s
context canceled
FAIL
================================================================================

Given that output, I tried running as a single test with the given seed value:

./dev test pkg/sql -f=TestShowTraceReplica -v --ignore-cache -- --test_env=COCKROACH_RANDOM_SEED=-7119735781298532229

but it passes.

@cockroach-teamcity
Member Author

sql.TestShowTraceReplica failed with artifacts on master @ a36d88bebd1d26161c3c7327b86af72fca88fc2c:

=== RUN   TestShowTraceReplica
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/31026032a53ac4950ec0ef26cdd2bd3b/logTestShowTraceReplica1526383266
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestShowTraceReplica
    show_trace_replica_test.go:121: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/31026032a53ac4950ec0ef26cdd2bd3b/logTestShowTraceReplica1526383266
--- FAIL: TestShowTraceReplica (56.20s)
=== RUN   TestShowTraceReplica/UPSERT_INTO_d.t2_VALUES_(1)
    show_trace_replica_test.go:104: condition failed to evaluate within 45s: UPSERT INTO d.t2 VALUES (1): got [[3 3]] expected [[2 2]]
    --- FAIL: TestShowTraceReplica/UPSERT_INTO_d.t2_VALUES_(1) (45.71s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Mar 15, 2023
97321: copy: enhance copyfrom tests with kvtrace feature and more tests r=cucaroach a=cucaroach

Epic: CRDB-18892
Informs: #91831
Release note: None


98264: colfetcher: track KV CPU time in the direct columnar scan r=yuzefovich a=yuzefovich

This commit addresses a minor TODO to track the KV CPU time when direct
columnar scans are used. In the regular columnar scan this time is
tracked by the cFetcher, but with the KV projection pushdown the
cFetcher is used on the KV server side, so we need to augment the
ColBatchDirectScan to track it. Notably, this means that the decoding
done on the KV server side is included. Additionally, this commit
clarifies how the KV CPU time is obtained from the cFetcher (we don't
need to use a helper (unlike in the case of `bytesRead` and
`batchRequestsIssued` fields which are written to on `cFetcher.Close`),
and we don't need the mutex protection there either).

Epic: None

Release note: None

98546: multitenant: allow secondary tenants to split/scatter by default r=knz a=arulajmani

AdminSplit and AdminScatter requests are subject to capability checks.
Previously, these capabilities were codified in the "enabled" form. As
such, by default, secondary tenants did not have the ability to perform
these operations. This is in violation of what secondary tenants could
do prior to 23.1, at a time before capabilities existed. Moreover,
RESTORE/IMPORT rely on performing these operations for performance.
This made disallowing these operations by default a performance
regression.

This patch flips the phrasing of how these capabilities are stored on
the proto to use the "disable" verbiage. As such, secondary tenants are
able to perform splits and scatters by default. However, no change is
made to the public interface -- users above the `tenantcapabilitiespb`
package continue to interact with these capabilities as they were
before, oblivious to how these things are stored on disk.
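
The idea of storing a capability in "disable" form so that the zero value permits the operation can be sketched as follows. This is an illustration only: the struct, field, and method names are hypothetical and do not reproduce the actual `tenantcapabilitiespb` definitions.

```go
package main

import "fmt"

// tenantCapabilities is a hypothetical stand-in for the stored message.
// Each field uses "disable" phrasing, so the zero value allows the operation.
type tenantCapabilities struct {
	DisableAdminSplit   bool
	DisableAdminScatter bool
}

// CanAdminSplit keeps the public, positively phrased interface unchanged.
func (c tenantCapabilities) CanAdminSplit() bool { return !c.DisableAdminSplit }

// CanAdminScatter keeps the public, positively phrased interface unchanged.
func (c tenantCapabilities) CanAdminScatter() bool { return !c.DisableAdminScatter }

func main() {
	// A tenant with no stored capabilities gets the zero value.
	var defaults tenantCapabilities
	fmt.Println(defaults.CanAdminSplit(), defaults.CanAdminScatter())

	// Explicitly revoking one capability leaves the other untouched.
	restricted := tenantCapabilities{DisableAdminSplit: true}
	fmt.Println(restricted.CanAdminSplit(), restricted.CanAdminScatter())
}
```

Callers only see the `Can...` accessors, so flipping the stored polarity changes the default without changing the interface they use.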

There are a few testing changes here:
- As part of this change, we also clean up a testing knob that was used
by various backup, CDC, and logictests to override capability checks in
the authorizer. This isn't required with the new default behaviour.
- We also add E2E tests for the `CanAdminUnsplit` capability, which were
missing when it was introduced.

Fixes #96736

Release note: None

98615: sql_test: re-skip TestShowTraceReplica r=msirek a=msirek

TestShowTraceReplica wasn't failing under stress, but failed in TeamCity
once enabled. This re-skips the test until it can be reliably reproduced
and debugged.

Informs #98598

Release note: None 

Co-authored-by: Tommy Reilly
Co-authored-by: Yahor Yuzefovich
Co-authored-by: Raphael 'kena' Poss
Co-authored-by: Arul Ajmani
Co-authored-by: Mark Sirek
@mgartner
Collaborator

This is a duplicate of #34213. We were keeping this issue open as a reminder to re-skip the test. I'm going to close this and leave #34213 open.
