-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload: fix rand workload #88362
workload: fix rand workload #88362
Conversation
Note that this does not guarantee that no errors are seen when running the
Fixing this type of "bug" is out of scope for this PR as it's not related to the workload itself. |
pkg/workload/rand/rand.go
Outdated
@@ -67,10 +85,11 @@ var randMeta = workload.Meta{ | |||
g.flags.StringVar(&g.tableName, `table`, ``, `Table to write to`) | |||
g.flags.IntVar(&g.batchSize, `batch`, 1, `Number of rows to insert in a single SQL statement`) | |||
g.flags.StringVar(&g.method, `method`, `upsert`, `Choice of DML name: insert, upsert, ioc-update (insert on conflict update), ioc-nothing (insert on conflict no nothing)`) | |||
g.flags.Int64Var(&g.seed, `seed`, 1, `Key hash seed.`) | |||
g.flags.Int64Var(&g.seed, `seed`, defaultRandomSeed(), `Key hash seed.`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I wasn't aware we were defaulting to 1
, so I checked other workloads. They seem to follow the same pattern, e.g., https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/tpcc/tpcc.go#L187
I wonder why?
pkg/workload/rand/BUILD.bazel
Outdated
go_test( | ||
name = "rand_test", | ||
srcs = ["rand_test.go"], | ||
args = ["-test.timeout=295s"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bazel defaults to "medium" but it seems like this test doesn't need more than a few seconds, so maybe size = "small"
?
209cf14
to
dec6b1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
pkg/workload/rand/BUILD.bazel
line 29 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Bazel defaults to "medium" but it seems like this test doesn't need more than a few seconds, so maybe
size = "small"
?
Good point, fixed.
pkg/workload/rand/rand.go
line 88 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Nice! I wasn't aware we were defaulting to
1
, so I checked other workloads. They seem to follow the same pattern, e.g., https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/tpcc/tpcc.go#L187I wonder why?
Yes, after making this change I realized the other workloads also use 1
as the default seed. IMO, making the seed different on each run is a better default though. Especially in the context of workloads being used in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r1, 2 of 2 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it doesn't attempt to satisfy types or constraints; adding a couple more examples from my local runs,
Error: pq: duplicate key value violates unique constraint "table6234468500453514806_col6234468500453514806_4_expr_key"
Error: pq: bit string length 1 does not match type BIT(36)
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)
pkg/workload/rand/rand.go
line 88 at r2 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Yes, after making this change I realized the other workloads also use
1
as the default seed. IMO, making the seed different on each run is a better default though. Especially in the context of workloads being used in tests.
I agree that it shouldn't default to a fixed value. I am just surprised it's been like this since this commit [1] (circa 2019). We could have been fuzzing more states :(
We should update all workloads to use a new seed by default; obviously, it's out of scope for this PR.
[1] 626b1a6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r1, 2 of 2 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)
pkg/workload/rand/rand.go
line 67 at r4 (raw file):
func defaultRandomSeed() int64 { defaultSeedOnce.Do(func() { defaultSeed = timeutil.Now().UTC().UnixNano()
nit: UTC
is redundant.
On second thought, a time-based seed is not as uniform as a crypto-based seed. cli.Main
is already using a seed by reading it from /dev/urandom
[1]. I think we should do the same in all the workloads.
[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/cli.go#L53
pkg/workload/rand/rand_test.go
line 44 at r4 (raw file):
// insert it into a column func TestRandRun(t *testing.T) { defer leaktest.AfterTest(t)()
Let's also add defer log.Scope(t).Close(t)
since there are quite a few subtests; collectively, their output is verbose. In case of failure, it's nice to have a dedicated log with all the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)
pkg/workload/rand/rand.go
line 88 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
I agree that it shouldn't default to a fixed value. I am just surprised it's been like this since this commit [1] (circa 2019). We could have been fuzzing more states :(
We should update all workloads to use a new seed by default; obviously, it's out of scope for this PR.[1] 626b1a6
Note, another side-effect of this change is that the workload binary needs to be invoked with --init
. Previously, it was possible to reuse the same tables between runs. I don't think we need to preserve that behavior, just noting the (obvious) difference :)
dec6b1a
to
37607b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)
pkg/workload/rand/rand.go
line 88 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Note, another side-effect of this change is that the workload binary needs to be invoked with
--init
. Previously, it was possible to reuse the same tables between runs. I don't think we need to preserve that behavior, just noting the (obvious) difference :)
It doesn't need to be run with --init
; but the caller does need to make sure they pass the same seed that was used when initializing. If a user is running the commands interactively, they can copy and paste the seed that is logged. If calling workload
programatically, the best way to go is to probably generate the seed out of band and pass it to every workload
call.
pkg/workload/rand/rand.go
line 67 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
nit:
UTC
is redundant.On second thought, a time-based seed is not as uniform as a crypto-based seed.
cli.Main
is already using a seed by reading it from/dev/urandom
[1]. I think we should do the same in all the workloads.[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/cli.go#L53
Good catch, fixed.
pkg/workload/rand/rand_test.go
line 44 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Let's also add
defer log.Scope(t).Close(t)
since there are quite a few subtests; collectively, their output is verbose. In case of failure, it's nice to have a dedicated log with all the output.
Good point, added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r5, 2 of 2 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
a discussion (no related file):
Created issue #88566 to deal with the default seeds in workloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the first one is a conflict on a non-primary-key index. It's technically possible to deal with them (the workload already deals with conflicts on the primary key), but it doesn't do that right now.
As for the second one, it's not that it doesn't try to satisfy the type, but more that the type OID read from pg_catalog.pg_attribute
for the BIT
type maps to a variable-length BIT
type even though the schema was defined with as BIT
of a specific length. I'm not entirely sure why that is the case.
@ajwerner would you know if there's a reliable way for this workload to get the width of the BIT
column? I'm asking you directly because I copied a change you made in the schemachange
workload [1] in this PR, and I don't understand why it's needed.
[1]
cockroach/pkg/workload/schemachange/type_resolver.go
Lines 106 to 111 in 975701e
// Special case CHAR to have the right width. | |
if objectID == oid.T_bpchar { | |
t := *types.OidToType[objectID] | |
t.InternalType.Width = 1 | |
return &t, nil | |
} |
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
The OIDs don't imply the widths. I guess you can make a char(2) type or whatever width you want but if you don't name the width I guess we do 1? I think I just wanted to make sure randgen always made a valid value and 1 will do that for all char widths. I've learned now that you can access column widths via the |
The `rand` workload generates random table definitions when intialized, and performs random writes to the first table created when run. It relies on the `randgen` package to generate random data and has internal logic to convert from Datum to the corresponding Go data structure to be passed to a database call. This means that whenever `randgen` is able to generate new types of data, the `rand` workload needs to be changed accordingly. Since there were no automated means to detect when the `rand` workload had drifted (and this workload is probably not run very frequently), running `rand` could fail because it wouldn't know how to generate a Go data structure for a specific datum. This commit updates the `rand` workload to add support for the missing data types and also adds a unit test that verifies that it is able to generate and insert data for all possible column types. If a new type is added to `randgen.RandomDatum`, this test should fail. Release note: None
The `rand` workload would default to the using `1` as the random seed when the corresponding command line flag was not passed. Since it's easy to _not_ pass this flag, it's also easy to always run the same workload over and over again. This commit updates the workload to use a different random seed each time. A consequence of this change is that the seed used to `init` the workload will have to be passed when running it. Release note: None
37607b6
to
d1f7d54
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)
a discussion (no related file):
Pushed a couple of changes here:
- Improved logging of seed, especially in the case where the caller passes
--seed
. Not great, but current solution works. We should come up with a better, reusable strategy in Use different workload seeds by default #88566. - Changed the test to actually test all possible column types that
randgen
can generate; specifically, arrays of different types of data are now part of the test as well. - A consequence of the above is that I had to change the logic for arrays; arrays of geography/geometry data types use
:
as separator. - I used the information on @ajwerner's comment about
information_schema.columns
to get the width of bit/char columns. We shouldn't see mismatched width errors when trying to insert anymore.
@srosenberg or @smg260 could you take another look, please?
a discussion (no related file):
@srosenberg taking a second look at the errors you posted:
Error: pq: duplicate key value violates unique constraint "table6234468500453514806_col6234468500453514806_4_expr_key"
I've seen this happen when I run the workload without first clearing the data from a previous run. As I said, it's technically possible to see this error in other circumstances, but the random generator would have to generate the same value twice, which is quite unlikely for most column types. Could you try running init
with --drop
and let me know if the issue persists (if you happen to have access to the seed that was used in this run)?
Error: pq: bit string length 1 does not match type BIT(36)
This has to do with the wrong width information being used in the rand
workload, which should now be fixed. If you still have the seed for this one, I'd appreciate if you could double check that the fix works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// defaultRandomSeed sets the `defaultSeed` package variable if it's | ||
// not yet set. This makes it possible for every run of the `rand` | ||
// workload to use a different seed by default | ||
func defaultRandomSeed() int64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification - why is sync.Once
required over simply binding randutil.NewPseudoSeed()
to &g.seed
on L92
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)
pkg/workload/rand/rand.go
line 70 at r8 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Clarification - why is
sync.Once
required over simply bindingrandutil.NewPseudoSeed()
to&g.seed
on L92
Because New
is called multiple times, and I just want to set the default seed to a random value once.
bors r=smg260 TFTR! |
Build succeeded: |
Add test-eng as a code owner/watcher for pkg/workload. In light of recent and future improvements [1], [2], TestEng would prefer to be in sync with all changes to the workload code. Over time, the team plans to build expertise in this area. [1] cockroachdb#88362 [2] cockroachdb#88566 Release note: None Release justification: test only change
89014: jobs: Clear out claim info when pausing r=miretskiy a=miretskiy Clear out job claim information when job is paused. Clearing out claim information is beneficial since it allows operator to pause/resume job if they want to try to move job coordinator to another node. Addresses #82698 Release note: none 89026: kvserver: add `SmallEngineBlocks` testing knob and metamorphic params r=erikgrinaker a=erikgrinaker `@cockroachdb/repl-prs` to do the main review, tagging other teams for visibility/review of metamorphic test params. Resolves #86648. --- **kvserver: add `SmallEngineBlocks` testing knob** This patch adds a store testing knob `SmallEngineBlocks` which configures Pebble with a block size of 1 byte. This will store every key in a separate block, which can provoke bugs in time-bound iterators. Release note: None **sql/logictest: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None **kvserver/rangefeed: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None **kvserver/gc: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None **backupccl: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None 89030: codeowners: add test-eng to owners of pkg/workload r=srosenberg a=srosenberg Add test-eng as a code owner/watcher for pkg/workload. In light of recent and future improvements [1], [2], TestEng would prefer to be in sync with all changes to the workload code. Over time, the team plans to build expertise in this area. [1] #88362 [2] #88566 Release note: None Release justification: test only change Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Stan Rosenberg <[email protected]>
The
rand
workload generates random table definitions whenintialized, and performs random writes to the first table created when
run. It relies on the
randgen
package to generate random data andhas internal logic to convert from Datum to the corresponding Go data
structure to be passed to a database call.
This means that whenever
randgen
is able to generate new types ofdata, the
rand
workload needs to be changed accordingly. Since therewere no automated means to detect when the
rand
workload haddrifted (and this workload is probably not run very frequently),
running
rand
could fail because it wouldn't know how to generate aGo data structure for a specific datum.
This commit updates the
rand
workload to add support for the missingdata types and also adds a unit test that verifies that it is able to
generate and insert data for all possible column types. If a new type
is added to
randgen.RandomDatum
, this test should fail.This also changes the random seed used in
rand
in each run.