[Pantsd] Enable pantsd in Travis #7320

blorente · 2019-03-06T14:16:02Z

In the context of #4438, a large step will be enabling pantsd in Travis CI. This ticket is created to track progress towards that.

An unordered summary of known issues discovered in #7322:

Forking issues
- "OSX 10.12 sanity check shard fails fast"[ ]
- ([pantsd] Segfault when no_proxy not set #7380) "dynamically adding to --pythonpath does not affect a forked pantsd instance" ?:
  - tests/python/pants_test/backend/python:integration
(have seen it be flaky in other places) https://gist.github.com/stuhood/2d87abf9d08df80c9062ffcb93cff393:
- tests/python/pants_test/backend/python/tasks/native:integration
https://gist.github.com/stuhood/a2a927816f7d7712278b454949778d15
- tests/python/pants_test/backend/project_info/tasks:idea_plugin_integration
"test directly inspecting value of the ENABLE_PANTSD option":
- tests/python/pants_test/goal:run_tracker_integration
"not necessarily broken, just expecting log output". When exceptions are logged to files as opposed to stderr, and tests check for some string in stderr, that's a problem:
- tests/python/pants_test/logging:test_native_engine_logging
- tests/python/pants_test/rules:test_integration
- tests/python/pants_test/engine/legacy:pants_engine_integration
- tests/python/pants_test/engine/legacy:graph_integration
- tests/python/pants_test/build_graph:build_graph_integration
- TestPantsDaemonIntegration.test_pantsd_sigterm
pants.pantsd.watchman.Watchman.WatchmanCrash: error from watchman: RootResolveError: unable to resolve root /home/travis/build/pantsbuild/pants/.pants.d/tmp/$snip.pants.d: directory /home/travis/build/pantsbuild/pants/.pants.d/tmp/$snip.pants.d is not watched -> This presents the case where problems arise if tests create their own buildroots I don't think that's something we should deal with in production, for now? Not sure though.
- tests/python/pants_test/backend/jvm/tasks/jvm_compile/java:cache_compile_integration
- tests/python/pants_test/backend/python/tasks:integration
When hermetic is enabled in PantsDaemonIntegrationTestBase, many of the pantsd tests complain about absolute symlinks on in the python distribution: Exception: Failed to read_link for Link("build-support/pants_dev_deps.py37.venv/.Python"): Custom { kind: InvalidData, error: StringError("Absolute symlink: \"/Users/bescobar/workspace/otherpants/build-support/pants_dev_deps.py37.venv/.Python\"") }, or because some plugins are not loaded. I think this is because of the --pants-config-files=[] that is passed when hermetic. To this point, the reason things had to be hermetic in the first place was because some tests run pantsd multiple times, and expect it to be alive between runs in the same test. Since the PANTS_SHUTDOWN_PANTSD_AFTER_RUN option leaked from the travis env, every pants run in the test shut down the daemon. Stripping that option should be enough to make those tests run non-hermetically, I think.
- tests/python/pants_test/pantsd:pantsd_integration
  - test_pantsd_compile
  - test_pantsd_run
  - test_pantsd_filesystem_invalidation
  - test_pantsd_aligned_output
- tests/python/pants_test/engine/legacy:console_rule_integration
  - test_v2_list

So, TL;DR, the current set of issues this has (which may or may not be relevant to enabling pantsd itself). Barring these, travis passes (ticked means "probably not relevant outside of the test harness"):

Forking issues.
Some tests had to be marked hermetic. The root cause was that the env variable PANTS_SHUTDOWN_PANTSD_AFTER_RUN=true was leaking from the travis env. There are tests that expect the daemon to be alive run after run, because they test things like FS invalidation. Even though we turned the CLI option off in the pantsd running context, the ENV var still leaked. That meant that pantsd got restarted after every run within the test. This problem is both not very relevant because it only affects the test harness, and not very hard to solve for the test harness.
Some tests expect output in stderr, but they are not getting it. When the daemon is enabled, the output gets logged into the log file or logs/exceptions.log, and doesn't reach stderr. I have to investigate more to see if this is an error that actually happens in the wild.
- This one seems legitimate, it's pretty easy to repro locally with:
```
$ cat stderr_repro.sh
#!/bin/bash
mv $1/TEST_BUILD $1/BUILD
./pants --enable-pantsd --build-file-imports=warn run "${1}:hello"
mv $1/BUILD $1/TEST_BUILD
$ ./stderr_repro.sh testprojects/src/python/build_file_imports_module
<The warning will not appear in the console, but in the pantsd log>
```
  When we stop forking, fixing this issue should be easy.
Some tests create custom buildroots, inside the workdir of the pants instance running the tests. This means that the watchman instance orchestrating the tests has a hard time tracking those roots. I'm not sure how much this is an Issue, but I can see a world where we don't care that much about buildroots within buildroots, particularly given that specifying a custom buildroot is already discouraged.
- The current hypothesis is that mock_buildroot() actually symlinks the inner .pants.d directory to the outer one (something like .pants.d/tmpabc123/.pants.d -> .pants.d). Therefore, when we run the test wit the daemon enabled, and we try to watch the workdir (.pants.d/tmpabc123/.pants.d), watchman will crash because it doesn't support symlinks. If this is the case, this means that the issues is in the test harness, not pantsd (except maybe we could give a more graceful error if we fail because of this).
- EDIT: I don't think it's symlinks specifically anymore, because when setting a folder other than the outer workdir as the root of our mock buildroot, the test doesn't repro anymore.
- EDIT: No, that was not what was wrong. The story goes: When calling mock_buildroot(), we link some essential files to basically recreate a minimal buildroot. We don't link a file called .python_interpreter_constraints. This means that, when the test runs pants in the current buildroot, it sees that the python interpreter constraints file is missing, and triggers a ./pants clean-all, which in turn gets rid of the symlink of .pants.d/tmpabc123/.pants.d -> .pants.d, and replaces it with a new folder. I'm not sure what the connection is between this and watchman crashing, but if we also link the .python_interpreter_constraints file, it works.

The text was updated successfully, but these errors were encountered:

stuhood · 2019-03-06T21:28:57Z

I believe that #7300 failed due to a known issue with pantsd on OSX < 10.12 (EDIT: now reported as #7323).

Nonetheless, I don't think it's actually necessary to use pantsd on the binary builder shards yet... will take a new tack in a new PR.

stuhood · 2019-03-06T23:23:28Z

Iterating on #7322.

stuhood · 2019-03-12T18:28:17Z

Will gather a log today of issues, and triage them based on whether I think that they will be fixed by #7341.

cosmicexplorer · 2019-03-12T18:40:48Z

Is there a ticket or comment somewhere describing the blockers to using pantsd run-over-run for integration tests in travis? I am vaguely aware that the integration tests are supposed to have a separate buildroot, which means pantsd can't immediately be used. There is even a specific warning when using pantsd in an integration test:

pants/src/python/pants/pantsd/pants_daemon.py

Lines 189 to 198 in ecd1554

    
           pidfile_absolute = PantsDaemon.metadata_file_path('pantsd', 'pid', bootstrap_options.pants_subprocessdir) 
        
           if pidfile_absolute.startswith(build_root): 
        
             pidfile = os.path.relpath(pidfile_absolute, build_root) 
        
           else: 
        
             pidfile = None 
        
             logging.getLogger(__name__).warning( 
        
               'Not watching pantsd pidfile because subprocessdir is outside of buildroot. Having ' 
        
               'subprocessdir be a child of buildroot (as it is by default) may help avoid stray ' 
        
               'pantsd processes.' 
        
             )

Is there a reason we can't share that across runs? The time we get back in our lives from having integration tests run faster is going to make everything else easier to iterate on. If this is considered an unnecessary addendum to the pantsd work right now then I can work on it myself, but I'd appreciate having the blockers more clearly laid out.

stuhood · 2019-03-12T19:01:57Z

@cosmicexplorer : All integration tests use new anonymous workdirs, intentionally, to avoid mixing of results from one test method into another. This is mostly because having to worry about state from other tests being mixed into your current test method would drive you fairly quickly insane.

I think that we need to build a huge amount of trust in pantsd before we begin to do that. We could begin collecting blockers into an issue, but I know of 3 or 4 off the top of my head (which we may or may not fix while enabling pantsd by default... unclear), so it doesn't feel like a priority to me.

EDIT: But, if you are interested in pushing on it, I would recommend making it a separate ticket blocked by this one.

stuhood · 2019-03-12T22:53:18Z

I've updated the description with a summary of failures from #7322.

In short... there aren't that many? And it doesn't seem like any of them would be fixed by #7341 (with the exception of maybe the watchman-related failures). I don't think that that means we shouldn't implement and use #7341, but it might affect the priority.

stuhood · 2019-03-12T23:23:33Z

Unassigning for now so that I can work on the release.

stuhood · 2019-03-14T19:39:16Z

Will pick this back up later today.

stuhood · 2019-03-16T00:33:27Z

I've added and applied a decorator to the tests mentioned above in #7322: so it should go mostly-green.

stuhood · 2019-03-20T05:58:10Z

Still looking at a few more failures here.

stuhood · 2019-04-03T18:43:08Z

This is being continued in #7440.

** Problem ** See #7320 for context. We would like to run integration tests under pantsd, to discover more issues. ** Solution ** Enable pantsd in the integration tests in the cron shard for py36 (not in the test runner, but on individual integration tests themselves) via a class property, and blacklist the tests that don't work with a useful docstring. ** Result ** Pantsd will now be enabled in integration and contrib tests running in the cron py36 shards.

blorente · 2019-05-09T08:38:36Z

Closed with #7440

blorente mentioned this issue Mar 6, 2019

Enable pantsd by default #4438

Closed

stuhood self-assigned this Mar 6, 2019

stuhood mentioned this issue Mar 6, 2019

WIP: Enable pantsd #7300

Closed

stuhood added the pantsd label Mar 6, 2019

cosmicexplorer mentioned this issue Mar 12, 2019

keeping a live pantsd up for reducing integration test time in travis #7372

Closed

1 task

stuhood removed their assignment Mar 12, 2019

This was referenced Mar 13, 2019

[pantsd] Implement auto-shutdown after runs #7341

Merged

[pantsd] Segfault when no_proxy not set #7380

Closed

stuhood self-assigned this Mar 14, 2019

stuhood assigned blorente and unassigned stuhood Mar 26, 2019

blorente mentioned this issue Apr 10, 2019

[pantsd] Make PantsDaemonIntegrationTestBase hermetic. #7525

Closed

blorente closed this as completed May 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Pantsd] Enable pantsd in Travis #7320

[Pantsd] Enable pantsd in Travis #7320

blorente commented Mar 6, 2019 •

edited

Loading

stuhood commented Mar 6, 2019 •

edited

Loading

stuhood commented Mar 6, 2019

stuhood commented Mar 12, 2019

cosmicexplorer commented Mar 12, 2019

stuhood commented Mar 12, 2019 •

edited

Loading

stuhood commented Mar 12, 2019

stuhood commented Mar 12, 2019

stuhood commented Mar 14, 2019

stuhood commented Mar 16, 2019

stuhood commented Mar 20, 2019

stuhood commented Apr 3, 2019

blorente commented May 9, 2019

[Pantsd] Enable pantsd in Travis #7320

[Pantsd] Enable pantsd in Travis #7320

Comments

blorente commented Mar 6, 2019 • edited Loading

stuhood commented Mar 6, 2019 • edited Loading

stuhood commented Mar 6, 2019

stuhood commented Mar 12, 2019

cosmicexplorer commented Mar 12, 2019

stuhood commented Mar 12, 2019 • edited Loading

stuhood commented Mar 12, 2019

stuhood commented Mar 12, 2019

stuhood commented Mar 14, 2019

stuhood commented Mar 16, 2019

stuhood commented Mar 20, 2019

stuhood commented Apr 3, 2019

blorente commented May 9, 2019

blorente commented Mar 6, 2019 •

edited

Loading

stuhood commented Mar 6, 2019 •

edited

Loading

stuhood commented Mar 12, 2019 •

edited

Loading