workload: fix rand workload #88362

renatolabs · 2022-09-21T16:34:40Z

The rand workload generates random table definitions when
intialized, and performs random writes to the first table created when
run. It relies on the randgen package to generate random data and
has internal logic to convert from Datum to the corresponding Go data
structure to be passed to a database call.

This means that whenever randgen is able to generate new types of
data, the rand workload needs to be changed accordingly. Since there
were no automated means to detect when the rand workload had
drifted (and this workload is probably not run very frequently),
running rand could fail because it wouldn't know how to generate a
Go data structure for a specific datum.

This commit updates the rand workload to add support for the missing
data types and also adds a unit test that verifies that it is able to
generate and insert data for all possible column types. If a new type
is added to randgen.RandomDatum, this test should fail.

This also changes the random seed used in rand in each run.

cockroach-teamcity · 2022-09-21T16:34:52Z

This change is

renatolabs · 2022-09-21T16:39:17Z

Note that this does not guarantee that no errors are seen when running the rand workload. One issue I observed in my tests is that random data generation is not constrained by the schema definition. In other words, the following is possible:

table is created with column c1 as INT8, and column c2 is defined as c1 + 200
randgen generates MaxInt64-1 as a value for c1
error: c2 overflows

Fixing this type of "bug" is out of scope for this PR as it's not related to the workload itself.

srosenberg · 2022-09-21T19:46:53Z

pkg/workload/rand/rand.go

@@ -67,10 +85,11 @@ var randMeta = workload.Meta{
 		g.flags.StringVar(&g.tableName, `table`, ``, `Table to write to`)
 		g.flags.IntVar(&g.batchSize, `batch`, 1, `Number of rows to insert in a single SQL statement`)
 		g.flags.StringVar(&g.method, `method`, `upsert`, `Choice of DML name: insert, upsert, ioc-update (insert on conflict update), ioc-nothing (insert on conflict no nothing)`)
-		g.flags.Int64Var(&g.seed, `seed`, 1, `Key hash seed.`)
+		g.flags.Int64Var(&g.seed, `seed`, defaultRandomSeed(), `Key hash seed.`)


Nice! I wasn't aware we were defaulting to 1, so I checked other workloads. They seem to follow the same pattern, e.g., https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/tpcc/tpcc.go#L187

I wonder why?

srosenberg · 2022-09-21T19:51:34Z

pkg/workload/rand/BUILD.bazel

+go_test(
+    name = "rand_test",
+    srcs = ["rand_test.go"],
+    args = ["-test.timeout=295s"],


Bazel defaults to "medium" but it seems like this test doesn't need more than a few seconds, so maybe size = "small"?

renatolabs

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)

pkg/workload/rand/BUILD.bazel line 29 at r2 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Bazel defaults to "medium" but it seems like this test doesn't need more than a few seconds, so maybe size = "small"?

Good point, fixed.

pkg/workload/rand/rand.go line 88 at r2 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Nice! I wasn't aware we were defaulting to 1, so I checked other workloads. They seem to follow the same pattern, e.g., https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/tpcc/tpcc.go#L187

I wonder why?

Yes, after making this change I realized the other workloads also use 1 as the default seed. IMO, making the seed different on each run is a better default though. Especially in the context of workloads being used in tests.

smg260

Reviewed 2 of 4 files at r1, 2 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)

srosenberg

Indeed, it doesn't attempt to satisfy types or constraints; adding a couple more examples from my local runs,

Error: pq: duplicate key value violates unique constraint "table6234468500453514806_col6234468500453514806_4_expr_key"
Error: pq: bit string length 1 does not match type BIT(36)

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)

pkg/workload/rand/rand.go line 88 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Yes, after making this change I realized the other workloads also use 1 as the default seed. IMO, making the seed different on each run is a better default though. Especially in the context of workloads being used in tests.

I agree that it shouldn't default to a fixed value. I am just surprised it's been like this since this commit [1] (circa 2019). We could have been fuzzing more states :(
We should update all workloads to use a new seed by default; obviously, it's out of scope for this PR.

[1] 626b1a6

srosenberg

Reviewed 2 of 4 files at r1, 2 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)

pkg/workload/rand/rand.go line 67 at r4 (raw file):

func defaultRandomSeed() int64 {
	defaultSeedOnce.Do(func() {
		defaultSeed = timeutil.Now().UTC().UnixNano()

nit: UTC is redundant.

On second thought, a time-based seed is not as uniform as a crypto-based seed. cli.Main is already using a seed by reading it from /dev/urandom [1]. I think we should do the same in all the workloads.

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/cli.go#L53

pkg/workload/rand/rand_test.go line 44 at r4 (raw file):

// insert it into a column
func TestRandRun(t *testing.T) {
	defer leaktest.AfterTest(t)()

Let's also add defer log.Scope(t).Close(t) since there are quite a few subtests; collectively, their output is verbose. In case of failure, it's nice to have a dedicated log with all the output.

srosenberg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)

pkg/workload/rand/rand.go line 88 at r2 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

I agree that it shouldn't default to a fixed value. I am just surprised it's been like this since this commit [1] (circa 2019). We could have been fuzzing more states :(
We should update all workloads to use a new seed by default; obviously, it's out of scope for this PR.

[1] 626b1a6

Note, another side-effect of this change is that the workload binary needs to be invoked with --init. Previously, it was possible to reuse the same tables between runs. I don't think we need to preserve that behavior, just noting the (obvious) difference :)

renatolabs

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)

pkg/workload/rand/rand.go line 88 at r2 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Note, another side-effect of this change is that the workload binary needs to be invoked with --init. Previously, it was possible to reuse the same tables between runs. I don't think we need to preserve that behavior, just noting the (obvious) difference :)

It doesn't need to be run with --init; but the caller does need to make sure they pass the same seed that was used when initializing. If a user is running the commands interactively, they can copy and paste the seed that is logged. If calling workload programatically, the best way to go is to probably generate the seed out of band and pass it to every workload call.

pkg/workload/rand/rand.go line 67 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

nit: UTC is redundant.

On second thought, a time-based seed is not as uniform as a crypto-based seed. cli.Main is already using a seed by reading it from /dev/urandom [1]. I think we should do the same in all the workloads.

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/cli.go#L53

Good catch, fixed.

pkg/workload/rand/rand_test.go line 44 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Let's also add defer log.Scope(t).Close(t) since there are quite a few subtests; collectively, their output is verbose. In case of failure, it's nice to have a dedicated log with all the output.

Good point, added.

renatolabs

Reviewed 1 of 3 files at r5, 2 of 2 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)

a discussion (no related file):
Created issue #88566 to deal with the default seeds in workloads.

renatolabs

Yes, the first one is a conflict on a non-primary-key index. It's technically possible to deal with them (the workload already deals with conflicts on the primary key), but it doesn't do that right now.

As for the second one, it's not that it doesn't try to satisfy the type, but more that the type OID read from pg_catalog.pg_attribute for the BIT type maps to a variable-length BIT type even though the schema was defined with as BIT of a specific length. I'm not entirely sure why that is the case.

@ajwerner would you know if there's a reliable way for this workload to get the width of the BIT column? I'm asking you directly because I copied a change you made in the schemachange workload [1] in this PR, and I don't understand why it's needed.

[1]

cockroach/pkg/workload/schemachange/type_resolver.go

Lines 106 to 111 in 975701e

    
           // Special case CHAR to have the right width. 
        
           if objectID == oid.T_bpchar { 
        
           	t := *types.OidToType[objectID] 
        
           	t.InternalType.Width = 1 
        
           	return &t, nil 
        
           }

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)

ajwerner · 2022-09-23T15:58:06Z

The OIDs don't imply the widths. I guess you can make a char(2) type or whatever width you want but if you don't name the width I guess we do 1? I think I just wanted to make sure randgen always made a valid value and 1 will do that for all char widths.

I've learned now that you can access column widths via the information_schema.columns table's character_maximum_length column. That has to be done on a per column basis.

The `rand` workload generates random table definitions when intialized, and performs random writes to the first table created when run. It relies on the `randgen` package to generate random data and has internal logic to convert from Datum to the corresponding Go data structure to be passed to a database call. This means that whenever `randgen` is able to generate new types of data, the `rand` workload needs to be changed accordingly. Since there were no automated means to detect when the `rand` workload had drifted (and this workload is probably not run very frequently), running `rand` could fail because it wouldn't know how to generate a Go data structure for a specific datum. This commit updates the `rand` workload to add support for the missing data types and also adds a unit test that verifies that it is able to generate and insert data for all possible column types. If a new type is added to `randgen.RandomDatum`, this test should fail. Release note: None

The `rand` workload would default to the using `1` as the random seed when the corresponding command line flag was not passed. Since it's easy to _not_ pass this flag, it's also easy to always run the same workload over and over again. This commit updates the workload to use a different random seed each time. A consequence of this change is that the seed used to `init` the workload will have to be passed when running it. Release note: None

renatolabs

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)

a discussion (no related file):
Pushed a couple of changes here:

Improved logging of seed, especially in the case where the caller passes --seed. Not great, but current solution works. We should come up with a better, reusable strategy in Use different workload seeds by default #88566.
Changed the test to actually test all possible column types that randgen can generate; specifically, arrays of different types of data are now part of the test as well.
A consequence of the above is that I had to change the logic for arrays; arrays of geography/geometry data types use : as separator.
I used the information on @ajwerner's comment about information_schema.columns to get the width of bit/char columns. We shouldn't see mismatched width errors when trying to insert anymore.

@srosenberg or @smg260 could you take another look, please?

a discussion (no related file):
@srosenberg taking a second look at the errors you posted:

Error: pq: duplicate key value violates unique constraint "table6234468500453514806_col6234468500453514806_4_expr_key"

I've seen this happen when I run the workload without first clearing the data from a previous run. As I said, it's technically possible to see this error in other circumstances, but the random generator would have to generate the same value twice, which is quite unlikely for most column types. Could you try running init with --drop and let me know if the issue persists (if you happen to have access to the seed that was used in this run)?

Error: pq: bit string length 1 does not match type BIT(36)

This has to do with the wrong width information being used in the rand workload, which should now be fixed. If you still have the seed for this one, I'd appreciate if you could double check that the fix works.

smg260

LGTM

smg260 · 2022-09-28T16:08:40Z

pkg/workload/rand/rand.go

+// defaultRandomSeed sets the `defaultSeed` package variable if it's
+// not yet set. This makes it possible for every run of the `rand`
+// workload to use a different seed by default
+func defaultRandomSeed() int64 {


Clarification - why is sync.Once required over simply binding randutil.NewPseudoSeed() to &g.seedon L92

renatolabs

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @smg260, and @srosenberg)

pkg/workload/rand/rand.go line 70 at r8 (raw file):

Previously, smg260 (Miral Gadani) wrote…

Clarification - why is sync.Once required over simply binding randutil.NewPseudoSeed() to &g.seedon L92

Because New is called multiple times, and I just want to set the default seed to a random value once.

renatolabs · 2022-09-28T20:07:37Z

bors r=smg260

TFTR!

craig · 2022-09-28T21:15:13Z

Build succeeded:

Bazel Essential CI (Cockroach)

Add test-eng as a code owner/watcher for pkg/workload. In light of recent and future improvements [1], [2], TestEng would prefer to be in sync with all changes to the workload code. Over time, the team plans to build expertise in this area. [1] cockroachdb#88362 [2] cockroachdb#88566 Release note: None Release justification: test only change

89014: jobs: Clear out claim info when pausing r=miretskiy a=miretskiy Clear out job claim information when job is paused. Clearing out claim information is beneficial since it allows operator to pause/resume job if they want to try to move job coordinator to another node. Addresses #82698 Release note: none 89026: kvserver: add `SmallEngineBlocks` testing knob and metamorphic params r=erikgrinaker a=erikgrinaker `@cockroachdb/repl-prs` to do the main review, tagging other teams for visibility/review of metamorphic test params. Resolves #86648. --- **kvserver: add `SmallEngineBlocks` testing knob** This patch adds a store testing knob `SmallEngineBlocks` which configures Pebble with a block size of 1 byte. This will store every key in a separate block, which can provoke bugs in time-bound iterators. Release note: None **sql/logictest: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None **kvserver/rangefeed: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None **kvserver/gc: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None **backupccl: add metamorphic test param for small engine blocks** Uses a Pebble block size of 1 byte, to provoke bugs in time-bound iterators. Release note: None 89030: codeowners: add test-eng to owners of pkg/workload r=srosenberg a=srosenberg Add test-eng as a code owner/watcher for pkg/workload. In light of recent and future improvements [1], [2], TestEng would prefer to be in sync with all changes to the workload code. Over time, the team plans to build expertise in this area. [1] #88362 [2] #88566 Release note: None Release justification: test only change Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Stan Rosenberg <[email protected]>

renatolabs requested review from a team and herkolategan and removed request for a team September 21, 2022 16:34

renatolabs requested a review from a team September 21, 2022 16:34

srosenberg reviewed Sep 21, 2022

View reviewed changes

renatolabs force-pushed the fix-rand-workload branch from 209cf14 to dec6b1a Compare September 22, 2022 19:17

renatolabs commented Sep 22, 2022

View reviewed changes

smg260 approved these changes Sep 22, 2022

View reviewed changes

srosenberg reviewed Sep 23, 2022

View reviewed changes

renatolabs force-pushed the fix-rand-workload branch from dec6b1a to 37607b6 Compare September 23, 2022 14:09

renatolabs commented Sep 23, 2022

View reviewed changes

renatolabs mentioned this pull request Sep 23, 2022

Use different workload seeds by default #88566

Closed

renatolabs commented Sep 23, 2022

View reviewed changes

renatolabs added 2 commits September 23, 2022 14:44

renatolabs force-pushed the fix-rand-workload branch from 37607b6 to d1f7d54 Compare September 23, 2022 18:44

renatolabs commented Sep 23, 2022

View reviewed changes

smg260 approved these changes Sep 28, 2022

View reviewed changes

renatolabs commented Sep 28, 2022

View reviewed changes

craig bot merged commit b3c07be into cockroachdb:master Sep 28, 2022

srosenberg mentioned this pull request Sep 29, 2022

codeowners: add test-eng to owners of pkg/workload #89030

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

workload: fix rand workload #88362

workload: fix rand workload #88362

renatolabs commented Sep 21, 2022

cockroach-teamcity commented Sep 21, 2022

renatolabs commented Sep 21, 2022

srosenberg Sep 21, 2022

srosenberg Sep 21, 2022 •

edited

Loading

renatolabs left a comment

smg260 left a comment

srosenberg left a comment

srosenberg left a comment

srosenberg left a comment

renatolabs left a comment

renatolabs left a comment

renatolabs left a comment

ajwerner commented Sep 23, 2022

renatolabs left a comment •

edited

Loading

smg260 left a comment

smg260 Sep 28, 2022

renatolabs left a comment

renatolabs commented Sep 28, 2022

craig bot commented Sep 28, 2022

	// Special case CHAR to have the right width.
	if objectID == oid.T_bpchar {
	t := *types.OidToType[objectID]
	t.InternalType.Width = 1
	return &t, nil
	}

workload: fix rand workload #88362

workload: fix rand workload #88362

Conversation

renatolabs commented Sep 21, 2022

cockroach-teamcity commented Sep 21, 2022

renatolabs commented Sep 21, 2022

srosenberg Sep 21, 2022

Choose a reason for hiding this comment

srosenberg Sep 21, 2022 • edited Loading

Choose a reason for hiding this comment

renatolabs left a comment

Choose a reason for hiding this comment

smg260 left a comment

Choose a reason for hiding this comment

srosenberg left a comment

Choose a reason for hiding this comment

srosenberg left a comment

Choose a reason for hiding this comment

srosenberg left a comment

Choose a reason for hiding this comment

renatolabs left a comment

Choose a reason for hiding this comment

renatolabs left a comment

Choose a reason for hiding this comment

renatolabs left a comment

Choose a reason for hiding this comment

ajwerner commented Sep 23, 2022

renatolabs left a comment • edited Loading

Choose a reason for hiding this comment

smg260 left a comment

Choose a reason for hiding this comment

smg260 Sep 28, 2022

Choose a reason for hiding this comment

renatolabs left a comment

Choose a reason for hiding this comment

renatolabs commented Sep 28, 2022

craig bot commented Sep 28, 2022

srosenberg Sep 21, 2022 •

edited

Loading

renatolabs left a comment •

edited

Loading