[Bug] Dev Server hangs for 10 seconds on early shutdown #4174

Closed
mjameswh opened this issue Jan 19, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@mjameswh
Contributor

Describe the bug

Launching the dev server and then sending it a shutdown signal (i.e. Ctrl-C) shortly afterward causes the process to hang for 10 seconds before it actually exits. If you wait even a short time (less than a second) before sending the signal, termination is immediate.

This delay is caused by the Worker node failing to connect to the Frontend and then waiting for its context deadline to expire (hence the 10 seconds).

This is most apparent when using the dev server in integration tests (e.g. through the SDKs' "test environment" feature), as these tests frequently execute very quickly.

Minimal Reproduction

temporal server start-dev & ; sleep 0.1 ; kill %1 ; fg %1

results in a 10-second delay, followed by this error message:

{"level":"fatal","ts":"2023-01-19T12:17:25.645-0700","msg":"error starting scanner","service":"worker","error":"context deadline exceeded","logging-call-at":"service.go:504","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\tgo.temporal.io/[email protected]/common/log/zap_logger.go:151\ngo.temporal.io/server/service/worker.(*Service).startScanner\n\tgo.temporal.io/[email protected]/service/worker/service.go:504\ngo.temporal.io/server/service/worker.(*Service).Start\n\tgo.temporal.io/[email protected]/service/worker/service.go:390\ngo.temporal.io/server/service/worker.ServiceLifetimeHooks.func1.1\n\tgo.temporal.io/[email protected]/service/worker/fx.go:139"}

With the sleep time in the above command increased to 1.0, no hang and no error message are observed.

Environment/Versions

  • OS and processor: Mac M1
  • Temporal Version: 1.8.5
@mjameswh mjameswh added the bug Something isn't working label Jan 19, 2023
@mjameswh
Contributor Author

Potentially fixed by this PR. Will reevaluate once the PR lands in a CLI release.

@mjameswh
Contributor Author

Can still reproduce with CLI 0.5.0.

@cretz
Member

cretz commented Feb 24, 2023

Users have noticed this with Temporalite too.

@feedmeapples
Contributor

Works fine in v0.8.0. Was the fix included in v0.5.0?

@mjameswh
Contributor Author

Tested just now with 0.8.0, and it still happens.

@bergundy
Member

This looks like a server issue, I'll transfer to temporalio/temporal.

@bergundy bergundy transferred this issue from temporalio/cli Apr 17, 2023
@mjameswh
Contributor Author

This appears to have been fixed in server 1.24.0 / CLI 0.13.0, presumably thanks to #5459.

I tested with various sleep durations, ranging from 0.01 seconds up to 2 seconds, and in all cases, the dev server completed shutdown within 1000-1400 ms of receiving the signal.

4 participants