
TestPantsDaemonIntegration.test_pantsd_sigterm is flaky #7425

Closed
cosmicexplorer opened this issue Mar 24, 2019 · 1 comment · Fixed by #7504


@cosmicexplorer
Contributor

cosmicexplorer commented Mar 24, 2019

To my great shame, this test, added in #6574, has flaked: see https://travis-ci.org/pantsbuild/pants/jobs/510547772.

                     ==================== FAILURES ====================
                     _ TestPantsDaemonIntegration.test_pantsd_sigterm _
                     
                     self = <pants_test.pantsd.test_pantsd_integration.TestPantsDaemonIntegration testMethod=test_pantsd_sigterm>
                     
                         def test_pantsd_sigterm(self):
                           self._assert_pantsd_keyboardinterrupt_signal(
                             signal.SIGTERM,
                     >       ['Signal {signum} (SIGTERM) was raised. Exiting with failure.'.format(signum=signal.SIGTERM)])
                     
                     .pants.d/pyprep/sources/62676ab0dd65698bc674cce2191134fbe580f110/pants_test/pantsd/test_pantsd_integration.py:464: 
                     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                     .pants.d/pyprep/sources/62676ab0dd65698bc674cce2191134fbe580f110/pants_test/pantsd/test_pantsd_integration.py:450: in _assert_pantsd_keyboardinterrupt_signal
                         self.assertIn(msg, waiter_run.stderr_data)
                     E   AssertionError: 'Signal 15 (SIGTERM) was raised. Exiting with failure.' not found in 'INFO] waiting for file /home/travis/build/pantsbuild/pants/.pants.d/tmp/tmpnxjru4og.pants.d/.pids/pantsd/pid to appear...\nInterrupted by user:\nInterrupted by user over pailgun client!\n'

I'm not sure how to reproduce this yet, but running the test enough times locally might do it. I believe it happens when the timeout for reading more content from the nailgun client expires before the subprocess has written all of its output. It's unclear why hitting the timeout doesn't produce a warning log, as it is expected to.

@cosmicexplorer
Contributor Author

Happened again. I think the cause might actually be the opposite of the above: the subprocess likely completes before the timeout expires, so `self._exit_reason` is raised, but no signal is sent to the pantsd-runner subprocess. How can we stop this subprocess from completing before our timeout? We could either make the timeout smaller (probably the right idea), or figure out how to make the subprocess wait longer.

cosmicexplorer added a commit that referenced this issue Apr 7, 2019
…eout (#7504)

### Problem

An attempt to resolve #7425. As mentioned on that ticket, the default (reasonable) timeout for waiting for pailgun subprocesses to complete after a signal (`--pantsd-pailgun-quit-timeout`) was 1 second. I believe the remaining issue here is that some invocations in CI would actually take more than a second to quit, which would cause nondeterministic behavior.
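To make the suspected race concrete, here is a minimal standalone sketch (not Pants code) of a parent that signals a child and only waits a fixed quit timeout for it to exit. Every name here (`CHILD_SOURCE`, `signal_and_collect_stderr`, `child_delay`, `quit_timeout`) is hypothetical, and the sketch assumes a POSIX system. If the child needs longer than the timeout to tear down, it is killed before it writes its message, so whether the message shows up in stderr depends on timing.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical child program: on SIGTERM it spends `delay` seconds "tearing
# down" before reporting the signal on stderr and exiting with failure.
CHILD_SOURCE = textwrap.dedent("""
    import signal, sys, time

    delay = float(sys.argv[1])

    def on_sigterm(signum, frame):
        time.sleep(delay)  # stand-in for slow teardown on a loaded CI machine
        sys.stderr.write("Signal {} (SIGTERM) was raised.\\n".format(signum))
        sys.stderr.flush()
        sys.exit(1)

    signal.signal(signal.SIGTERM, on_sigterm)
    print("ready", flush=True)  # tell the parent the handler is installed
    while True:
        time.sleep(0.05)
""")


def signal_and_collect_stderr(child_delay, quit_timeout):
    """Send SIGTERM to a child, wait up to quit_timeout, return its stderr."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(child_delay)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the child reports "ready"
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=quit_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # the child ran out of time; its message is never written
    _, stderr = proc.communicate()
    return stderr


if __name__ == "__main__":
    # Child needs 1s but only gets 0.1s: the SIGTERM message is missing.
    print(repr(signal_and_collect_stderr(child_delay=1.0, quit_timeout=0.1)))
    # Child needs 0.1s and gets 5s: the SIGTERM message is present.
    print(repr(signal_and_collect_stderr(child_delay=0.1, quit_timeout=5.0)))
```

In this model, a timeout close to the child's real teardown time makes the outcome depend on machine load, which is consistent with a test that only fails occasionally in CI.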

### Solution

- Set the timeout to 5 seconds on any test which intends for its subprocess to complete within the timeout.
- Match more of the stderr contents in each error case to remove ambiguity.
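The second bullet can be illustrated with a small helper that checks several expected messages against captured stderr, in order, and reports which ones are missing. This is only a sketch: `find_missing_in_order` and the `expected` list are hypothetical and are not the assertions used in the Pants test suite. The sample stderr is abbreviated from the failure quoted in the issue.

```python
def find_missing_in_order(expected_messages, stderr_data):
    """Return the expected messages not found in stderr_data, searching in order."""
    missing = []
    position = 0
    for message in expected_messages:
        index = stderr_data.find(message, position)
        if index == -1:
            missing.append(message)
        else:
            # Continue searching after this match so that order is enforced.
            position = index + len(message)
    return missing


# Tail of the stderr captured in the flaky run reported in the issue.
stderr_from_flaky_run = (
    "Interrupted by user:\n"
    "Interrupted by user over pailgun client!\n"
)

# Hypothetical list of messages a stricter test might require.
expected = [
    "Interrupted by user:",
    "Interrupted by user over pailgun client!",
    "Signal 15 (SIGTERM) was raised. Exiting with failure.",
]

missing = find_missing_in_order(expected, stderr_from_flaky_run)
print(missing)
```

Reporting every missing message at once, rather than failing on the first one, makes it easier to tell which error path the subprocess actually took.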

### Result

`TestPantsDaemonIntegration.test_pantsd_sigterm` hopefully won't continue to flake!!