Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: allow callers to cancel background steps in mixedversion #100754

Merged

Conversation

renatolabs
Copy link
Contributor

This commit updates the signature of functions in the mixedversion package that create steps that run in the background; this includes functions in the Helper struct, as well as in the main mixedversion.Test struct. These functions now all return a mixedversion.StopFunc that test authors can call when they wish to stop a background function without causing the test to fail.

When the stop function is called, the context passed to the background functions is canceled; however, that context cancelation is captured by the framework and logged as an expected termination. If the context is canceled through other means, the test will fail as usual.

Epic: CRDB-19321

Release note: None

@renatolabs renatolabs requested a review from a team as a code owner April 5, 2023 21:05
@renatolabs renatolabs requested review from srosenberg and smg260 and removed request for a team April 5, 2023 21:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@renatolabs renatolabs force-pushed the rc/4-mixedversion-allow-background-cancel branch from 9db0f53 to 317a9d8 Compare April 5, 2023 21:14
@renatolabs
Copy link
Contributor Author

renatolabs commented Apr 5, 2023

This is a stacked PR, please review them in order (but feel free to review the second or third ones before the first one is merged):

To be reviewed here: the entire diff.

Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @smg260, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/BUILD.bazel line 25 at r1 (raw file):

        "//pkg/util/timeutil",
        "//pkg/util/version",
        "@com_github_pkg_errors//:errors",

Any reason not to opt for github.com/cockroachdb/errors, since roachtest already uses it.


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 229 at r1 (raw file):

		// cancel the underlying context.
		//
		// There should be one channel in `bgChans` for each hook in

Nit: should be put doubts in my head for some reason. "There is one channel in bgChans for each"... might be more assuring.


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 238 at r1 (raw file):

	}

	shouldStop chan struct{}

I've noticed we have a util Stopper which is used in roachtest (https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L71)
It might be a bit heavy weight, or not fit this scenario, just curious if you considered it along the way?


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 34 at r1 (raw file):

	"github.com/cockroachdb/cockroach/pkg/util/ctxgroup"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
	"github.com/pkg/errors"

Ref to question regarding cockroachdb/errors above.

@renatolabs renatolabs force-pushed the rc/4-mixedversion-allow-background-cancel branch from 317a9d8 to 90d840d Compare April 24, 2023 17:39
Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/BUILD.bazel line 25 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Any reason not to opt for github.com/cockroachdb/errors, since roachtest already uses it.

Huh, that's what I meant to have used, something went wrong with my auto import. Thanks for catching, fixed.


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 229 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Nit: should be put doubts in my head for some reason. "There is one channel in bgChans for each"... might be more assuring.

Done.


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 238 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

I've noticed we have a util Stopper which is used in roachtest (https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/test_runner.go#L71)
It might be a bit heavy weight, or not fit this scenario, just curious if you considered it along the way?

I'm a little ambivalent on this one. Stopper has a lot of features we wouldn't need here, and the semantics don't quite line up exactly. Stopper lets you stop all tasks whereas here we want the ability to stop functions individually. So if we used Stopper, we'd need one stopper for each function, and the code wouldn't be signficantly simpler after the change.

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 722 at r2 (raw file):

// hook is run concurrently. `stopChans` should either be `nil` for
// steps that are not meant to be run in the background, or contain
// one stop channel (`shouldStop`) for each hook.

Can this be made into an assertion/check? Seems like it could be subject to misuse resulting in an invalid slice index.


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 745 at r2 (raw file):

	}

	if len(steps) <= 1 {

Semantically does a single hook run in the same way as multiple hooks concurrently?

If so, is this branch necessary?

This commit updates the signature of functions in the `mixedversion`
package that create steps that run in the background; this includes
functions in the `Helper` struct, as well as in the main
`mixedversion.Test` struct. These functions now all return a
`mixedversion.StopFunc` that test authors can call when they wish to
stop a background function without causing the test to fail.

When the stop function is called, the context passed to the background
functions is canceled; however, that context cancelation is captured
by the framework and logged as an expected termination. If the context
is canceled through other means, the test will fail as usual.

Epic: CRDB-19321

Release note: None
@renatolabs renatolabs force-pushed the rc/4-mixedversion-allow-background-cancel branch from 90d840d to 389adc0 Compare April 24, 2023 19:53
Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 722 at r2 (raw file):

Previously, smg260 (Miral Gadani) wrote…

Can this be made into an assertion/check? Seems like it could be subject to misuse resulting in an invalid slice index.

This would only result in an error if there's an internal error (internal to the mixedversion package, not exposed by any public API). If there's such an error and we get a panic, arguably that's ok since we'll get a stack trace.

I might be misunderstanding your suggestion, please clarify if so.


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 745 at r2 (raw file):

Previously, smg260 (Miral Gadani) wrote…

Semantically does a single hook run in the same way as multiple hooks concurrently?

If so, is this branch necessary?

Depends on what's implied by "the same way".

If you're asking why we don't just use concurrentRunStep in the single step case, the reason is mostly so that the test plan generated is be more readable, i.e., so that we're able to say:

- downgrade node 2 to 22.2
- run "func A"
- downgrade node 3 to 22.2

instead of

- downgrade node 2 to 22.2
- run steps concurrently:
    - run "func A"
- downgrade node 3 to 22.2

These "single concurrent runs" would add up and get noisy without providing much value or clarity.

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 722 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

This would only result in an error if there's an internal error (internal to the mixedversion package, not exposed by any public API). If there's such an error and we get a panic, arguably that's ok since we'll get a stack trace.

I might be misunderstanding your suggestion, please clarify if so.

More a general comment about the function being susceptible to a caller breaking the implied (via the comment) contract, however if its more of an internal helper function, not a big deal.

e.g. Would it not be possible for someone to pass a stopChans satisfying 0 < len(stopChans) < len(hooks), in which case the stopChanFor function would behave incorrectly?


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 745 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Depends on what's implied by "the same way".

If you're asking why we don't just use concurrentRunStep in the single step case, the reason is mostly so that the test plan generated is be more readable, i.e., so that we're able to say:

- downgrade node 2 to 22.2
- run "func A"
- downgrade node 3 to 22.2

instead of

- downgrade node 2 to 22.2
- run steps concurrently:
    - run "func A"
- downgrade node 3 to 22.2

These "single concurrent runs" would add up and get noisy without providing much value or clarity.

That makes sense, but since its just for output aesthetics, would it not make sense to decorate when generating the test plan? i.e. during test plan generation, if a concurrent step with only one step is encountered, you could print appropriately?

Not a deal breaker. More of a thought.

Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 238 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I'm a little ambivalent on this one. Stopper has a lot of features we wouldn't need here, and the semantics don't quite line up exactly. Stopper lets you stop all tasks whereas here we want the ability to stop functions individually. So if we used Stopper, we'd need one stopper for each function, and the code wouldn't be signficantly simpler after the change.

Makes sense, I agree the simplicity fits better here.

Copy link
Contributor Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 722 at r2 (raw file):

Previously, smg260 (Miral Gadani) wrote…

More a general comment about the function being susceptible to a caller breaking the implied (via the comment) contract, however if its more of an internal helper function, not a big deal.

e.g. Would it not be possible for someone to pass a stopChans satisfying 0 < len(stopChans) < len(hooks), in which case the stopChanFor function would behave incorrectly?

The only way this could happen is if there's a programming error in the mixedversion package (since the management of hooks and stopChans happen inside the package, they are both private fields not directly manipulated by callers). If the size of stopChans were in the range above, the function will panic and I think that's ok, it should only happen if there's a bug in the framework and trying to account for it might be too defensive (we'd just return an error anyway -- a panic is probably better since it halts execution exactly at the point where the assumption is not held).


pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go line 745 at r2 (raw file):

Previously, smg260 (Miral Gadani) wrote…

That makes sense, but since its just for output aesthetics, would it not make sense to decorate when generating the test plan? i.e. during test plan generation, if a concurrent step with only one step is encountered, you could print appropriately?

Not a deal breaker. More of a thought.

It could be handled at the presentation level, though I think it doesn't necessarily improves things much. I also plan to change this part of the code in the near future: right now when multiple user hooks are expected to run at the same time, they always run concurrently; I want them to run sequentially sometimes too. I'll consider changing how this is handled when I make that change.

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 9 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @srosenberg)

@renatolabs
Copy link
Contributor Author

TFTR!

bors r=smg260

@craig
Copy link
Contributor

craig bot commented Apr 25, 2023

👎 Rejected by code reviews

@renatolabs
Copy link
Contributor Author

bors r=smg260,herkolategan

@craig
Copy link
Contributor

craig bot commented Apr 25, 2023

This PR was included in a batch that timed out, it will be automatically retried

@craig
Copy link
Contributor

craig bot commented Apr 25, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants