Ctx/Signal Handling still not totally correct #728

Closed
ghostsquad opened this issue May 14, 2022 · 9 comments · Fixed by #769

Comments

@ghostsquad (Contributor) commented May 14, 2022

#479 was recently merged (associated issue: #458), but I realized that you don't want to give mvdan/sh an arbitrary context.Background(). What we need to do is give it a new context and forward the cancellation from the parent only if that cancellation isn't coming from a signal.

I use github.com/oklog/run and its associated SignalHandler in the CLIs I build, and I think this can be improved with the following:

task/cmd/task/task.go, lines 205 to 220 at 367c0b3:

ctx := context.Background()
if !watch {
	ctx = getSignalContext()
}
if status {
	if err := e.Status(ctx, calls...); err != nil {
		log.Fatal(err)
	}
	return
}
if err := e.Run(ctx, calls...); err != nil {
	e.Logger.Errf(logger.Red, "%v", err)
	os.Exit(1)
}

More or less, this can be replaced with:

// ctx/cancel were implicit in the original sketch; they need to come
// from a cancellable context, e.g.:
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

var group run.Group
group.Add(run.SignalHandler(ctx, syscall.SIGINT, syscall.SIGTERM))
group.Add(func() error {
	if status {
		return e.Status(ctx, calls...)
	}

	return e.Run(ctx, calls...)
}, func(err error) {
	cancel()
})

if err := group.Run(); err != nil {
	e.Logger.Errf(logger.Red, "%v", err)
	os.Exit(1)
}
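For context on how this behaves: run.Group runs each actor added via Add concurrently, returns from group.Run as soon as the first execute function returns, and then calls every actor's interrupt function with that error. run.SignalHandler is just an actor whose execute function returns a run.SignalError when one of the given signals arrives. A minimal standalone sketch of those semantics (toy actors, not Task code):

package main

import (
	"errors"
	"fmt"
	"time"

	"github.com/oklog/run"
)

func main() {
	var group run.Group

	// Actor 1: returns first and therefore stops the whole group.
	group.Add(func() error {
		time.Sleep(100 * time.Millisecond)
		return errors.New("actor 1 done")
	}, func(err error) {})

	// Actor 2: blocks until its interrupt function is called.
	stop := make(chan struct{})
	group.Add(func() error {
		<-stop
		return nil
	}, func(err error) {
		close(stop)
	})

	// Prints the first actor's error: "actor 1 done".
	fmt.Println(group.Run())
}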

And then this would be updated:

// We used to pass to interp.Runner a context that was cancelled on reception of an
// OS signal. This caused the Runner to terminate the subprocess abruptly.
// The correct behavior instead is for us to completely ignore the signal and let
// the subprocess deal with it. If the subprocess doesn't handle the signal, it will
// be terminated. If the subprocess does handle the signal, it knows better than us
// whether it wants to clean up and terminate or do something different.
// See https://github.com/go-task/task/issues/458 for details.
// So now we pass an empty context just to make the API of interp.Runner happy.
return r.Run(context.Background(), p)

with

subCtx, cancel := context.WithCancel(context.Background())

var group run.Group
group.Add(func() error {
	return r.Run(subCtx, p)
}, func(err error) {
	cancel()
})

group.Add(func() error {
	select {
	case <-ctx.Done():
		// If we receive a SIGINT, let signal propagation handle the cancellation.
		// This is to avoid the problem presented in: https://github.com/go-task/task/issues/458
		if !errors.Is(ctx.Err(), run.SignalError{}) {
			cancel()
		}
	case <-subCtx.Done():
		cancel()
	}
	return nil
}, func(err error) {
	cancel()
})
return group.Run()
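One caveat worth flagging on the sketch above: ctx.Err() only ever returns context.Canceled or context.DeadlineExceeded, so it can never match run.SignalError. One way to tell a signal-driven shutdown apart, assuming the top-level group is wired with run.SignalHandler as in the earlier snippet, is to inspect the error returned by group.Run, since run.SignalHandler's execute function returns a run.SignalError carrying the received signal. A hedged sketch (branch bodies are placeholders, not Task code):

if err := group.Run(); err != nil {
	var sigErr run.SignalError
	if errors.As(err, &sigErr) {
		// The group stopped because of an OS signal (sigErr.Signal).
		// Per #458, leave the subprocess to handle the signal itself
		// instead of cancelling its context.
	} else {
		// Some actor failed for another reason; report and exit.
		e.Logger.Errf(logger.Red, "%v", err)
		os.Exit(1)
	}
}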

Mentioning a few people to get some eyes on this and sanity-check the thinking:
@andreynering @aliculPix4D @marco-m-pix4d

@ghostsquad (Contributor, Author) commented

Updated the code slightly, to reflect that we are disconnecting the parent context (ctx) from the sub-context (subCtx) and only forwarding cancellation under a specific condition.

@andreynering added the "type: bug" label (Something not working as intended.) on May 14, 2022
@andreynering (Member) commented

Just realized that tests are broken when running locally on my M1 MacBook Pro:

$ task
task: [install] go install -v -ldflags="-w -s -X main.version=367c0b3" ./cmd/task
task: [sleepit] go build ../internal/sleepit
task: [test] go test github.com/go-task/task/v3 github.com/go-task/task/v3/args github.com/go-task/task/v3/cmd/task github.com/go-task/task/v3/internal/compiler github.com/go-task/task/v3/internal/compiler/v2 github.com/go-task/task/v3/internal/compiler/v3 github.com/go-task/task/v3/internal/execext github.com/go-task/task/v3/internal/hash github.com/go-task/task/v3/internal/logger github.com/go-task/task/v3/internal/output github.com/go-task/task/v3/internal/sleepit github.com/go-task/task/v3/internal/status github.com/go-task/task/v3/internal/summary github.com/go-task/task/v3/internal/templater github.com/go-task/task/v3/taskfile github.com/go-task/task/v3/taskfile/read
task: [task-2] echo 'cmd ran'
cmd ran
task: [task-2] exit 1
task: [task-2] echo 'failing' && exit 2
failing
task: [task-2] echo 'echo ran'
echo ran
task: [task-1] echo 'task-1 ran successfully'
task-1 ran successfully

--- FAIL: TestSignalSentToProcessGroup (0.00s)
    --- FAIL: TestSignalSentToProcessGroup/child_does_not_handle_sigint:_receives_sigint_and_terminates_immediately (0.00s)
        unix_test.go:114: starting the SUT process: fork/exec /Users/andrey/Developer/andrey/task/bin/task: no such file or directory
    --- FAIL: TestSignalSentToProcessGroup/child_intercepts_sigint:_receives_sigint_and_does_cleanup (0.00s)
        unix_test.go:114: starting the SUT process: fork/exec /Users/andrey/Developer/andrey/task/bin/task: no such file or directory
    --- FAIL: TestSignalSentToProcessGroup/child_simulates_terraform:_receives_1_sigint_and_does_cleanup (0.00s)
        unix_test.go:114: starting the SUT process: fork/exec /Users/andrey/Developer/andrey/task/bin/task: no such file or directory
FAIL
FAIL	github.com/go-task/task/v3	0.348s
ok  	github.com/go-task/task/v3/args	(cached)
?   	github.com/go-task/task/v3/cmd/task	[no test files]
?   	github.com/go-task/task/v3/internal/compiler	[no test files]
?   	github.com/go-task/task/v3/internal/compiler/v2	[no test files]
?   	github.com/go-task/task/v3/internal/compiler/v3	[no test files]
?   	github.com/go-task/task/v3/internal/execext	[no test files]
?   	github.com/go-task/task/v3/internal/hash	[no test files]
?   	github.com/go-task/task/v3/internal/logger	[no test files]
ok  	github.com/go-task/task/v3/internal/output	(cached)
?   	github.com/go-task/task/v3/internal/sleepit	[no test files]
ok  	github.com/go-task/task/v3/internal/status	(cached)
ok  	github.com/go-task/task/v3/internal/summary	(cached)
?   	github.com/go-task/task/v3/internal/templater	[no test files]
ok  	github.com/go-task/task/v3/taskfile	(cached)
?   	github.com/go-task/task/v3/taskfile/read	[no test files]
FAIL
task: Failed to run task "default": task: Failed to run task "test": exit status 1

@andreynering (Member) commented

Ah, commented too quickly. Of course the problem was just the location of the binary. Fixed at 7d474db.

@andreynering (Member) commented

Well, it seems that the test is indeed intermittent, though. Example:

https://github.com/go-task/task/runs/6437587228

[Screenshot of the failing CI run, 2022-05-14 20:44:31]

I'd say it's happening once in 20 runs or something like that.

@ghostsquad (Contributor, Author) commented

We need to track this issue for when GitHub Actions rolls out M1 Mac support:
actions/runner-images#2187

Also consider explicitly building on ARM64 (Linux) as a stop-gap:
https://github.com/uraimo/run-on-arch-action

@ghostsquad (Contributor, Author) commented

I'm going to file a new issue for testing on ARM; I don't think that's actually related to these failures. I think the tests are trying to exercise a racy condition and need some tweaking to make them more deterministic.
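One hedged way to make such tests more deterministic (a sketch of the general technique, not necessarily what this suite does): have the child print a readiness marker once its signal handler is installed, and only then deliver the signal, so the SIGINT can never race the handler installation. The binary path and "ready" marker below are hypothetical:

package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
	"syscall"
)

func main() {
	// Hypothetical SUT that prints "ready" after installing its signal handler.
	cmd := exec.Command("./bin/sut")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Block until the readiness marker appears.
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "ready") {
			break
		}
	}

	// Only now is it safe to deliver the signal.
	if err := cmd.Process.Signal(syscall.SIGINT); err != nil {
		panic(err)
	}
	fmt.Println(cmd.Wait())
}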

@marco-m-pix4d (Contributor) commented

First, thanks for all the effort. I am aware that there is still some flakiness, but I haven't had time to sit down and find the root cause. I hope that in general my PR (just merged) makes things better. If, on the other hand, you want to roll back, I will not take offense :-)

@ghostsquad (Contributor, Author) commented

I don't think we need to roll back. Flaky tests just need to be addressed before they get too annoying.

I realized this may be problematic when I started implementing "fail fast" for parallel steps in my own project.

@andreynering (Member) commented

I just reverted it: d450444

I tried for some minutes to understand what was happening, but it seems this will take more time. I plan to get back to it once possible. Sorry for that.

andreynering added a commit that referenced this issue Jun 12, 2022
Task will now give time for the processes running to do cleanup work

Ref #458
Ref #479
Fixes #728

Co-authored-by: Marco Molteni <[email protected]>
Co-authored-by: aliculPix4D <[email protected]>
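
The commit message hints at the approach that eventually landed in #769: instead of cancelling the runner's context the instant a signal arrives, give running processes a grace period to clean up and only then cancel. A minimal sketch of that idea, with a hypothetical helper name and grace duration (not the actual implementation):

package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// signalContextWithGrace returns a context that is cancelled only after a
// grace period following SIGINT/SIGTERM, so subprocesses that handle the
// signal themselves get time to clean up before the context tears them down.
func signalContextWithGrace(grace time.Duration) context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		sig := <-ch
		log.Printf("received %v: giving processes %v to clean up", sig, grace)
		time.Sleep(grace)
		cancel()
	}()
	return ctx
}

func main() {
	ctx := signalContextWithGrace(5 * time.Second)
	<-ctx.Done() // blocks until a signal arrives and the grace period elapses
}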
@pd93 removed the "type: bug" label (Something not working as intended.) on Dec 16, 2024