-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent Pants service failure #8198
Comments
22:10:17 [WARN] Encountered error when trying to log this message: b'timestamp: 2019-08-22T18:10:17.718050\nException caught: (pants.pantsd.pants_daemon.RuntimeFailure)\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/bin/pants", line 10, in <module>\n sys.exit(main())\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 78, in main\n PantsLoader.run()\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 74, in run\n cls.load_and_execute(entrypoint)\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 67, in load_and_execute\n entrypoint_main()\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 510, in launch\n PantsDaemon.Factory.create(OptionsBootstrapper.create()).run_sync()\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 443, in run_sync\n self._run_services(self._services)\n File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 396, in _run_services\n raise PantsDaemon.RuntimeFailure(\'service failure for {}, shutting down!\'.format(service))\n\nException message: service failure for <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10d152550>, shutting down!\n\n', This is the most relevant bit I've found in the log so far. Will keep digging. |
|
By the looks of it, you're running under the pants daemon.
Somehow, the run failing resulted in a daemon crash, which is why you're seeing the Were you running any other pants commands at all when this happened, in the same workdir? |
No other commands running, just this. |
Having just restarted my machine, the first pants run of the day gave me a similar error, but this time for FSEventService: ./pants test src/python/toolchain/satresolver::
/Users/tansypants/.cache/pants/plugins/protobuf-3.9.0-cp37-cp37m-macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl-install/google/protobuf/descriptor.py:47: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from google.protobuf.pyext import _message
/Users/tansypants/.cache/pants/plugins/botocore-1.12.207-py2.py3-none-any.whl-install/botocore/vendored/requests/packages/urllib3/_collections.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Mapping, MutableMapping
/Users/tansypants/.cache/pants/plugins/protobuf-3.9.0-cp37-cp37m-macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl-install/google/protobuf/descriptor.py:47: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from google.protobuf.pyext import _message
/Users/tansypants/.cache/pants/plugins/botocore-1.12.207-py2.py3-none-any.whl-install/botocore/vendored/requests/packages/urllib3/_collections.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Mapping, MutableMapping
Exception in thread FSEventServiceThread:
Traceback (most recent call last):
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pywatchman/__init__.py", line 339, in readBytes
buf = [self.sock.recv(size)]
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/tansypants/.pyenv/versions/3.7.3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/Users/tansypants/.pyenv/versions/3.7.3/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 363, in target
service.run()
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/service/fs_event_service.py", line 110, in run
self._watchman.watch_project(self._build_root)
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/watchman.py", line 158, in watch_project
return self.client.query('watch-project', os.path.realpath(path))
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pywatchman/__init__.py", line 1052, in query
res = self.receive()
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pywatchman/__init__.py", line 960, in receive
result = self.recvConn.receive()
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pywatchman/__init__.py", line 695, in receive
buf = [self.transport.readBytes(sniff_len)]
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pywatchman/__init__.py", line 344, in readBytes
raise SocketTimeout('timed out waiting for response')
pywatchman.SocketTimeout: timed out waiting for response, while executing ('watch-project', '/Users/tansypants/code/toolchain')
timestamp: 2019-08-23T14:02:14.196588
Exception caught: (pants.pantsd.pants_daemon.RuntimeFailure)
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/bin/pants", line 10, in <module>
sys.exit(main())
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 78, in main
PantsLoader.run()
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 74, in run
cls.load_and_execute(entrypoint)
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 67, in load_and_execute
entrypoint_main()
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 510, in launch
PantsDaemon.Factory.create(OptionsBootstrapper.create()).run_sync()
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 443, in run_sync
self._run_services(self._services)
File "/Users/tansypants/.cache/pants/setup/bootstrap-Darwin-x86_64/1.20.0.dev1_py37/lib/python3.7/site-packages/pants/pantsd/pants_daemon.py", line 396, in _run_services
raise PantsDaemon.RuntimeFailure('service failure for {}, shutting down!'.format(service))
Exception message: service failure for <pants.pantsd.service.fs_event_service.FSEventService object at 0x10fd937f0>, shutting down!``` |
From the log:
(Removed several hundred filenames for brevity.) 'src/python/toolchain/workflow/worker.py', 'src/python/toolchain/workflow/worker_thread.py')
18:01:42 [INFO] setting up service <pants.pantsd.service.pailgun_service.PailgunService object at 0x10fd93198>
18:01:42 [INFO] setting up service <pants.pantsd.service.store_gc_service.StoreGCService object at 0x10fd933c8>
18:01:42 [INFO] starting service <pants.pantsd.service.fs_event_service.FSEventService object at 0x10fd937f0>
18:01:42 [INFO] starting service <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10fd93668>
18:01:42 [INFO] starting service <pants.pantsd.service.pailgun_service.PailgunService object at 0x10fd93198>
18:01:42 [INFO] starting pailgun server on port 49517
18:01:42 [INFO] starting service <pants.pantsd.service.store_gc_service.StoreGCService object at 0x10fd933c8>
18:01:42 [INFO] handling pailgun request: `./pants test src/python/toolchain/satresolver::`
18:02:14 [INFO] terminating pantsd service: <pants.pantsd.service.fs_event_service.FSEventService object at 0x10fd937f0>
18:02:14 [INFO] terminating pantsd service: <pants.pantsd.service.scheduler_service.SchedulerService object at 0x10fd93668>
18:02:14 [INFO] terminating pantsd service: <pants.pantsd.service.pailgun_service.PailgunService object at 0x10fd93198>
18:02:14 [INFO] pailgun service on port 49517 shutting down
18:02:14 [INFO] terminating pantsd service: <pants.pantsd.service.store_gc_service.StoreGCService object at 0x10fd933c8>
18:02:14 [INFO] terminating pantsd |
And again. This time it didn't even get far enough to yell at me about deprecated things.
Log:
Enormous list of files truncated. I hope they aren't actually useful.
|
Not sure if this one belongs here, but it seems it may be related?
Log:
truncated.
truncated. (These all appear to be the complete list of every file in our repo.)
truncated
|
For one of your posted logs, this line is very relevant:
So, in that case, for some reason, the watchman socket was either corrupted or unable to start. Do you know of anything in your environment that might prevent pants from creating a socket? Another explanation is that the watchman socket file was leftover from the last time you ran, and pants tried to connect to it, but there was no process running. If that happens, running |
Unfortunately clean-all has done nothing to fix this one. I hit this 15-20 times a day. Re-running the same command works just fine, but this error comes up maybe 1 in every 4 pants runs. I don't think there's anything in my environment that would prevent socket creation (except that I'm running OSX, which does occasionally result in things suddenly not working the way one would expect.) |
This appears to be only hitting OSX users at my company. Linux users do not seem to be seeing the same errors. (sample size: 3 OSX users, 1 linux user) |
Okay, so if it's not the socket thing (let's assume that was a one-off because of the restart), let's go back to the other logs you posted. This line here:
Makes me believe that the failures were legitimate, because this triggers when a run completes with a bad error code. However, your run doesn't finish at all which is very surprising. Seeing as you note that the list of invalidating files seems to be the whole repo, it's possible that the globs are misconfigured, and they end up invalidating the daemon way too often. If a file covered by those globs changes, it will invalidate the daemon and crash it. Is it possible that one of those files is changing mid-run? |
It's possible, but unlikely - My general workflow is to make some changes, run I will note though, that the list of al the files in the repo did contain all the Of note: My coworkers say they hit this issue more often when running pants on python targets than anything else. Possibly it is python specific. |
We've hit this issue on multiple machines, on multiple different pants commands (lint and test for sure) on multiple different targets (though possibly always with python targets?) The globs are generally the default python_library(
dependencies = [
'3rdparty/python:plyvel',
'3rdparty/python:setuptools',
'src/python/toolchain/base',
'src/python/toolchain/lang/python',
'src/python/toolchain/packagerepo/pypi',
'src/python/toolchain/satresolver',
'src/python/toolchain/util/file',
],
)
python_tests(
name='tests',
dependencies = [
'3rdparty/python:Django',
'3rdparty/python:pytest',
'3rdparty/python:pytest-django', # Test time requirement.
':pypi',
'src/python/toolchain/base',
'src/python/toolchain/django/webresource',
'src/python/toolchain/lang/python',
'src/python/toolchain/packagerepo/pypi',
'src/python/toolchain/satresolver',
'src/python/toolchain:conftest',
],
)
python_binary(
name = 'depgraph_builder',
dependencies = [
':pypi'
],
entry_point = 'toolchain.satresolver.pypi.depgraph_builder'
) |
How large is this repository? Is it possible that watchman is just taking a long time to respond the first time? What are these set to for you?
|
Not huge. I guess it is possible it's taking ages. We do not appear to be overriding the default watchman settings. Is there some way I can print out the actual settings for those flags on a given run? |
Seen again at the run for 2710a7f: https://gist.github.com/wisechengyi/5666243d6efbf263ae217f43516757f2 |
I removed the flaky-test label per #9386 since this one appears non-specific to a particular test but indicates a larger pantsd production code problem. |
### Problem Using the notify crate for file invalidation is now feature complete, and so we are using two file watching implementations simultaneously under `pantsd`. ### Solution Disable watchman by default, and deprecate it. Additionally, remove the implementation of the `experimental` flag from notify. Fixes #8710, fixes #8198, fixes #7375, fixes #5487, fixes #4999, fixes #3479.
Started seeing this error some time after we upgraded to python3, and before we upgraded to
1.20.0.dev1
(so... May? ish? I can probably nail that down a bit.)The text was updated successfully, but these errors were encountered: