
Text file busy when attempting to execute script written into sandbox #10507

Closed
stuhood opened this issue Jul 29, 2020 · 11 comments · Fixed by #10568 or #10577


stuhood commented Jul 29, 2020

Using a Process to invoke an is_executable=True script can result in:

Error launching process: Os { code: 26, kind: Other, message: "Text file busy" }

We have tests that cover this behavior:

def test_not_executable(self):
    file_name = "echo.sh"
    file_contents = b'#!/bin/bash -eu\necho "Hello"\n'
    digest = self.request_single_product(
        Digest, CreateDigest([FileContent(path=file_name, content=file_contents)])
    )
    req = Process(
        argv=("./echo.sh",), input_digest=digest, description="cat the contents of this file",
    )
    with self.assertRaisesWithMessageContaining(ExecutionError, "Permission"):
        self.request_single_product(ProcessResult, req)

def test_executable(self):
    file_name = "echo.sh"
    file_contents = b'#!/bin/bash -eu\necho "Hello"\n'
    digest = self.request_single_product(
        Digest,
        CreateDigest([FileContent(path=file_name, content=file_contents, is_executable=True)]),
    )
    req = Process(
        argv=("./echo.sh",), input_digest=digest, description="cat the contents of this file",
    )
    result = self.request_single_product(ProcessResult, req)
    self.assertEqual(result.stdout, b"Hello\n")
... so it's possible that this is non-deterministic.

@Eric-Arellano (Contributor) commented:

To reproduce, apply this diff:

diff --git a/src/python/pants/engine/process.py b/src/python/pants/engine/process.py
index 2dbb58714..0001946d2 100644
--- a/src/python/pants/engine/process.py
+++ b/src/python/pants/engine/process.py
@@ -336,12 +336,12 @@ class BinaryPaths(EngineAware):
 @rule(desc="Find binary path")
 async def find_binary(request: BinaryPathRequest) -> BinaryPaths:
     # TODO(John Sirois): Replace this script with a statically linked native binary so we don't
-    #  depend on either /bin/bash being available on the Process host.
-    # TODO(#10507): Running the script directly from a shebang sometimes results in a "Text file
-    #  busy" error.
-    script_path = "./script.sh"
+    #  depend on either `/usr/bin/env` or bash being available on the Process host.
+    script_path = "./find_binary.sh"
     script_content = dedent(
         """
+        #!/usr/bin/env bash
+
         set -euo pipefail

         if command -v which > /dev/null; then
@@ -363,7 +363,7 @@ async def find_binary(request: BinaryPathRequest) -> BinaryPaths:
         Process(
             description=f"Searching for `{request.binary_name}` on PATH={search_path}",
             input_digest=script_digest,
-            argv=["/bin/bash", script_path, request.binary_name],
+            argv=[script_path, request.binary_name],
             env={"PATH": search_path},
         ),
     )

Then run something like ./pants run build-support/bin/check_headers.py --no-process-execution-use-local-cache --no-pantsd. The flags at the end force re-discovery of the Python binary used to run the Pex.

I suspect that this only happens on Linux; I haven't been able to reproduce the error locally on macOS. On my Ubuntu VM, with this change I hit a different error (none of the interpreters were discovered in the first place) rather than error code 26. In CI, we've hit error code 26 a few times.

The current hypothesis is that we're not calling fsync enough; in Rust this is exposed as File::sync_all. See https://doc.rust-lang.org/std/fs/struct.File.html#method.sync_all.

We currently call fsync once, on the input digest:

// Start with async materialization of input snapshots, followed by synchronous materialization
// of other configured inputs. Note that we don't do this in parallel, as that might cause
// non-determinism when paths overlap.
let _metadata = store
  .materialize_directory(workdir_path.clone(), req.input_files)
  .compat()
  .await?;

If you follow materialize_directory, you'll see that we call .sync_all() at the end:

// We fundamentally materialize files for other processes to read; as such, we must ensure
// data is flushed to disk and visible to them as opposed to just our process. Even though
// we need to re-open all written files, executing all fsyncs at the end of the
// materialize call is significantly faster than doing it as we go.
future::join_all(
  materialize_metadata
    .to_path_list()
    .into_iter()
    .map(|path| {
      let path = destination.join(path);
      executor
        .spawn_blocking(move || {
          OpenOptions::new()
            .write(true)
            .create(false)
            .open(path)?
            .sync_all()
        })

The hypothesis is that, for some reason, that is not enough. We want to experiment with whether calling fsync again right here would fix things:

let res: Result<_, String> = Ok(());
res

(I'm not sure exactly what that snippet will look like; Stu had an idea.)
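For illustration only, a minimal sketch of that experiment (the function name, the path list, and the call site are assumptions, not the actual change) would be to re-open and fsync each materialized path one more time immediately before spawning:

// Hypothetical sketch: fsync every materialized path once more, right before
// the process is spawned, in case the earlier sync_all pass was not enough.
use std::fs::OpenOptions;
use std::path::PathBuf;

fn resync_materialized_files(paths: Vec<PathBuf>) -> Result<(), String> {
    for path in paths {
        OpenOptions::new()
            .write(true)
            .create(false)
            .open(&path)
            .and_then(|file| file.sync_all())
            .map_err(|e| format!("Failed to fsync {}: {}", path.display(), e))?;
    }
    Ok(())
}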


stuhood commented Aug 7, 2020

It seems likely that this is Docker-related. This thread is interesting: moby/moby#9547

stuhood assigned stuhood and unassigned benjyw and Eric-Arellano on Aug 7, 2020

stuhood commented Aug 7, 2020

Will see if the nuclear option (a global sync) resolves this.
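Concretely, the nuclear option amounts to something like this sketch (assuming the system sync binary is on the PATH; this is not the exact code that landed):

// Sketch of the "nuclear option": run the system `sync` to flush all
// filesystem buffers, and wait for it to finish before exec'ing the
// freshly-written script.
use std::process::Command;

fn global_sync() -> Result<(), String> {
    let status = Command::new("sync")
        .status()
        .map_err(|e| format!("Failed to spawn sync: {}", e))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("sync exited with status {}", status))
    }
}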

@Eric-Arellano (Contributor) commented:

> It seems likely that this is Docker-related.

That sounds plausible, but I'm not sure Docker is it. We saw this error on the lint shard before RBE was used, i.e. when running ./pants run build-support/bin/get_rbe_token.py, which happens before RBE is involved. That lint shard does not use Docker and only uses the default Travis environment. (Unless, under the hood, Travis is running Docker?)

stuhood added a commit that referenced this issue Aug 7, 2020
### Problem

As described in #10507, we suspect that the use of Docker and AUFS results in a need for heavier-handed measures for syncing files to the filesystem before executing them. Various other strategies were tried and ruled out in #10557.

### Solution

If we detect that we are probably running in docker or an LXC container, spawn a `sync` process before executing.

### Result

Fixes #10507.

[ci skip-build-wheels]
jsirois added a commit to jsirois/pants that referenced this issue Aug 7, 2020
Previously our materialize_directory tail-end fsync could wipe out file
metadata if the OS had already flushed the initial create. Thread
through the full create metadata to allow our tail-end fsync to be
faithful.

Fixes pantsbuild#10507

[ci skip-build-wheels]

Eric-Arellano reopened this on Aug 7, 2020

jsirois commented Aug 7, 2020

Yeah, see: https://stackoverflow.com/questions/37288453/calling-fsync2-after-close2

That and its pointers say that (open, write, close, open, fsync, close) != (open, write, fsync, close) on various Linux kernels. We'll need to always call /bin/sync when present to get this hack working for CI, and then I'm working on a fix that tries to retain perf and still do (open, write, fsync, close) by doing (open, write) x N ... -> (fsync, close) x N.
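A rough sketch of that (open, write) x N -> (fsync, close) x N shape, with illustrative names and no claim that this is the actual fix: keep the write handles open, then fsync them at the end instead of closing and re-opening each path.

// Sketch: write all files first, keeping each File handle open, then fsync and
// close them at the end. Syncing the original write handle avoids the
// close / re-open / fsync sequence that behaves differently across kernels.
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;

fn materialize_all(files: Vec<(PathBuf, Vec<u8>)>) -> std::io::Result<()> {
    // (open, write) x N
    let mut handles = Vec::with_capacity(files.len());
    for (path, contents) in files {
        let mut file = File::create(&path)?;
        file.write_all(&contents)?;
        handles.push(file);
    }
    // (fsync, close) x N -- the close happens when each File is dropped.
    for file in handles {
        file.sync_all()?;
    }
    Ok(())
}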

jsirois added a commit to jsirois/pants that referenced this issue Aug 12, 2020
It's generically unsafe to fork+exec against binaries written out in a
multithreaded program when concurrent forks are possible. Even if all
files are opened for writing with O_CLOEXEC (which is the case for
Rust), if thread1 opens a file for writing and then thread2 forks,
thread2 will hold an open file descriptor. If thread2's subsequent exec
is delayed past the fork+exec thread1 does against the file it wrote,
then the thread1 fork+exec'd process will encounter ETXTBSY.

OSX "solves" this by retrying some number of times when it hits ETXTBSY,
but Linux does not attempt this hack. O_CLOFORK has been proposed and
adopted by some unices, but not Linux. As such we need to make some
tradeoff to allow this use case. This change introduces a lock around
process spawns (fork+exec) to prevent interleaved fork / exec whenever
there is a spawn that we know exec's a binary we wrote out.

Fixes pantsbuild#10507

[ci skip-build-wheels]
jsirois assigned jsirois and unassigned stuhood on Aug 12, 2020

jsirois commented Aug 12, 2020

Wrapping things up here:

  1. The fsync business was a canard I introduced in ~2018 - never related. That's just about file contents on disk, not file contents visible to programs that read through OS buffer caches. Worse, ETXTBSY is about open files, not about partial data. Since you must fsync an open file handle, this was very obviously (in retrospect) sloppy thinking from the get-go.
  2. Unrelated in the end, but very interesting if we do ever switch to tokio async File APIs: Dropping a File does not close it tokio-rs/tokio#2307
  3. Dead-on related and very basic in the end - you simply cannot concurrently fork and write out the binary you will exec against, per standard Unix rules:
    cmd/go: "fork/exec foo.test: text file busy" flakes golang/go#22220
    os: StartProcess ETXTBSY race on Unix systems golang/go#22315

#10577 works around 3 with a fork+exec lock.
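As a sketch of the idea (not the literal implementation in #10577): ordinary spawns take a shared lock around their fork+exec, while a spawn that execs a binary we wrote takes the exclusive lock, so no other fork can be in flight (still holding inherited, not-yet-closed write fds) at that moment.

// Sketch: serialize fork+exec pairs against execs of binaries we wrote out.
use std::process::{Child, Command};
use std::sync::RwLock;

static SPAWN_LOCK: RwLock<()> = RwLock::new(());

fn spawn(cmd: &mut Command, execs_a_file_we_wrote: bool) -> std::io::Result<Child> {
    if execs_a_file_we_wrote {
        // Exclusive: all other fork+exec pairs have finished their exec (which
        // closes any O_CLOEXEC fds they inherited) before we fork+exec here.
        let _guard = SPAWN_LOCK.write().unwrap();
        cmd.spawn()
    } else {
        // Shared: ordinary spawns may still run concurrently with one another.
        let _guard = SPAWN_LOCK.read().unwrap();
        cmd.spawn()
    }
}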

jsirois added a commit that referenced this issue Aug 15, 2020
It's generically unsafe to fork+exec against binaries written out in a
multithreaded program when concurrent forks are possible. Even if all
files are opened for writing with O_CLOEXEC (which is the case for
Rust), if thread1 opens a file for writing and then thread2 forks,
thread2 will hold an open file descriptor. If thread2's subsequent exec
is delayed past the fork+exec thread1 does against the file it wrote,
then the thread1 fork+exec'd process will encounter ETXTBSY.

OSX "solves" this by retrying some number of times when it hits ETXTBSY,
but Linux does not attempt this hack. O_CLOFORK has been proposed and
adopted by some unices, but not Linux.

As such we need to make some tradeoff to allow this use case. This 
change introduces a lock around process spawns (fork+exec) to prevent
interleaved fork / exec whenever there is a spawn that we know exec's a 
binary we wrote out. Since forks can still happen in our process space
outside of our control (in libraries), this change also introduces a 
bounded ETXTBSY retry loop as a fallback for binaries we wrote out.

Fixes #10507

jsirois commented Aug 15, 2020

For completeness, some postscript on #10577:

Although controlling forks in a central location and serializing fork/exec against any binaries we write out solves the issue for code we control, it does not solve the general problem when we do not control all code (libraries we link against). As such, #10577 added a bounded retry loop to protect against uncontrolled forks.
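For illustration, the bounded retry fallback could look roughly like the sketch below (not the code that landed; the attempt count and backoff are arbitrary):

// Sketch: retry a spawn a bounded number of times when exec fails with
// ETXTBSY (errno 26 on Linux), to cover forks we do not control.
use std::process::{Child, Command};
use std::{io, thread, time::Duration};

fn spawn_with_etxtbsy_retry(cmd: &mut Command, max_attempts: u32) -> io::Result<Child> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match cmd.spawn() {
            Err(e) if e.raw_os_error() == Some(26) && attempt < max_attempts => {
                // Back off briefly and try again.
                thread::sleep(Duration::from_millis(10 * u64::from(attempt)));
            }
            other => return other,
        }
    }
}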

An alternative approach requiring no explicit locks or retry-loop workarounds would be to write out binaries in a separate multithreaded process that does no forking at all. We'd need to wire up a communication channel (pipes) to send digests that need to be written out and to be able to await completion of the writing. That form of solution would probably pair well with our still-unused brfs FUSE filesystem. To use that filesystem, we'll want to keep it mounted at least as long as pantsd stays up, if not longer. It's easy to imagine a second Pants daemon with a lifetime >= pantsd that:

  1. fork+exec's a brfs mount
  2. starts an event loop that waits on requests via OS-level pipes to materialize digests, does so by creating symlink farms in chroots, and then communicates completions over said pipes.

Note that there is a fork+exec here, but only one, and it happens on daemon startup before the event loop is ever started. Note also that we would finally leverage brfs, which should significantly improve digest materialization performance - no data copying, just symlinks that point directly into LMDB memory.


stuhood commented Sep 14, 2020

Would brfs even suffer from this, given that you never "open" the executable file for writing? It just "exists" from the get-go when read by a client?


jsirois commented Sep 14, 2020

> Would brfs even suffer from this, given that you never "open" the executable file for writing? It just "exists" from the get-go when read by a client?

You're right, it would not, since, critically, we would use it asymmetrically: write directly to LMDB (not through the brfs filesystem) and only read from the brfs filesystem. If we wrote through the brfs filesystem interface, we'd have the same problem.
