
Binary cache: async push_success #908

Merged

Conversation


@autoantwort autoantwort commented Feb 15, 2023

This results in ~10-20% faster build times on my machine.

For example, building boost on my M1 Mac went down from 2.948 min to 2.375 min.

@autoantwort autoantwort marked this pull request as draft February 15, 2023 20:04
@autoantwort (Contributor, Author)

How or when should "upload messages" (like `Uploaded binaries to {count} {vendor}.`) be printed?

@Thomas1664 (Contributor)

Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

@autoantwort (Contributor, Author)

> Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

Kind of. In general, we need an option to decide whether a binary cache failure should be a hard error or only a warning.

@Thomas1664 (Contributor)

> Kind of. In general, we need an option to decide whether a binary cache failure should be a hard error or only a warning.

The problem is that we can almost never be sure that there isn't some nested API call that exits on failure. But it seems like #909 at least partially addresses this issue.

@autoantwort (Contributor, Author)

Yeah, but there are nearly no hard exits in the binary cache. It currently also only prints warnings.

@ras0219-msft (Contributor) left a comment

I like this direction; unblocking I/O work has great potential for making vcpkg much faster.

However, we need to be very careful about the impacts of concurrency -- deadlocks suck :(

src/vcpkg.cpp (outdated):

```diff
@@ -156,6 +157,7 @@ namespace vcpkg::Checks
     // Implements link seam from basic_checks.h
     void on_final_cleanup_and_exit()
     {
+        BinaryCache::wait_for_async_complete();
```
@ras0219-msft (Contributor)

I do not think we can do this here. This is on the critical path for ctrl-c handling and should only be used for extremely fast, emergency tear-down behavior (like restoring the console).

If there happens to be an exit anywhere in any BinaryCache implementation, this would deadlock. Importantly, this includes any sort of assertion we might want to do, like checking pointers for null.

Unfortunately, the only path forward I see is to call this (or appropriately scope the BinaryCache itself) at the relevant callers. The consequence of possibly not uploading some set of binary caches in the case of an unhandled program error (such as a permissions issue on a directory expected to be writable) is vastly preferable to deadlocks.

@autoantwort (Contributor, Author)

I have changed the BinaryCache::wait_for_async_complete() implementation so that it no longer deadlocks.

I also moved the call to Checks::exit_with_code, which is not called when ctrl+c is handled. (I would personally like to have a way to terminate vcpkg that waits until the binary cache is done, so that I don't lose progress.)

And I prefer it when built packages are uploaded to the binary caches before vcpkg exits because of an error; otherwise I have to rebuild the already-built packages later when there is no cache entry.

@ras0219-msft (Contributor) commented Mar 3, 2023

Agreed that it is desirable to finish uploading on "understood" errors, for example if a package failed to build or failed to be installed.

I was also wrong about my original assessment of a deadlock. My concern was the call path of the binary upload thread calling Checks::unreachable() or .value_or_exit(), but it seems that std::thread::join() does have a carve-out to handle this specific case: it will throw a resource_deadlock_would_occur if you try to join yourself.

I've put some other concerns below, but I don't want those to distract from my main point: We must make it as trivial / correct-by-construction as possible to guarantee that the binary cache thread NEVER attempts to wait on itself. I think the best approach for vcpkg right now is to add calls from `Install::perform()` etc. to `BinaryCache::wait_for_async_complete()` before any "user-facing" error, such as the exit guarded by `result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO`. This is motivated by the perspective that it's always safer to terminate than to join and risk a deadlock / race condition / etc.
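A minimal sketch of the suggested call-site pattern; the guard condition is quoted from the comment above, while the surrounding code in `Install::perform()` and the use of `Checks::exit_fail` are assumptions for illustration:

```cpp
// Sketch only: drain pending background uploads before a user-facing exit,
// so already-built packages still land in the binary cache. This runs on
// the main thread, never on the upload thread, so the wait cannot self-join.
if (result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO)
{
    BinaryCache::wait_for_async_complete();
    Checks::exit_fail(VCPKG_LINE_INFO);
}
```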


There's still a UB data race if the main thread and binary upload thread attempt to exit at the same time:

> Concurrently calling join() on the same thread object from multiple threads constitutes a data race that results in undefined behavior.
>
> -- https://en.cppreference.com/w/cpp/thread/thread/join

There's also a serious "scalability" problem if we ever want a second background thread for whatever reason, because BGThread A would join on BGThread B, while BGThread B tries to join on BGThread A. This might be solvable with ever more complex structures, such as a thread ownership DAG that gets threads to join only on their direct children, but I don't think the benefit is worth the cost.

@autoantwort (Contributor, Author)

The UB and the joining itself could simply be prevented by checking `if (std::this_thread::get_id() == instance->push_thread.get_id())`. My concern with the explicit approach is that it is easy to forget to call the BinaryCache's waiting function; every time you want to exit, you have to remember to call it. This seems very prone to human error.
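A minimal sketch of that guard (the member names, including the `instance` singleton pointer, follow the comment above; this is illustrative, not the PR's actual code):

```cpp
#include <thread>

struct BinaryCache
{
    std::thread push_thread;
    static BinaryCache* instance; // assumed singleton, as in the comment above

    static void wait_for_async_complete()
    {
        // Never join from the push thread itself: joining yourself cannot
        // make progress, and std::thread::join() reports it by throwing
        // std::system_error with resource_deadlock_would_occur.
        if (std::this_thread::get_id() == instance->push_thread.get_id())
        {
            return;
        }

        if (instance->push_thread.joinable())
        {
            instance->push_thread.join();
        }
    }
};

BinaryCache* BinaryCache::instance = nullptr;
```

Note this only removes the self-join; as quoted above, two foreground threads calling join() concurrently would still be a data race.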

@autoantwort (Contributor, Author) commented Mar 5, 2023

I have now implemented your request.

@autoantwort (Contributor, Author)

@ras0219-msft Is there anything left that is preventing this PR from being merged?

@autoantwort autoantwort marked this pull request as ready for review March 5, 2023 20:28
Comment on lines 98 to 102:

```cpp
std::vector<std::pair<Color, std::string>> m_published;

// buffers messages until newline is reached
// guarded by m_print_directly_lock
std::vector<std::pair<Color, std::string>> m_unpublished;
```
@autoantwort (Contributor, Author)

@BillyONeal Now that you have implemented the "error document" type stuff, am I right that this should be a DiagnosticLine etc. instead?

@BillyONeal (Member) commented Dec 26, 2024

I'm not sure yet; I ran into a problem where I'm not sure exactly how "status"-type information should be conveyed through this infrastructure. I'm working on it over here: main...BillyONeal:vcpkg-tool:message-sink-line

As part of that I realized that #1137 touches the same area and would be easier to merge first, so I'm doing that right now...

@autoantwort (Contributor, Author)

Why does a "status" need to be conveyed here at all? These are only informational messages.
Or what is the problem you are trying to solve here?

@BillyONeal (Member) commented Jan 3, 2025

Some messages, like "I am about to touch the network now", are time sensitive and under normal conditions must be printed. However, errors/warnings like "the download failed" might need to be suppressed, if a subsequent retry / alternate makes the overall process succeed.

For example, when downloading a file, in "time order" let's say this happens:

  1. We try to download the file from an asset cache. This should get a timely message because it touches the network.
  2. Attempting to contact the asset cache fails. This normally emits an error.
  3. We try to download the file from upstream. This needs a timely message because it touches the network.
  4. Download from the real upstream succeeds. We need to 'go back in time' and not emit the error from #2. This means we can't print that error when it happens; we have to buffer it until we know whether it will actually be emitted or not. But we normally must not buffer #1.
  5. We try to submit the freshly downloaded file back to the asset cache. This needs a timely message because it touches the network.
  6. Submitting back to the asset cache fails for some reason. That normally emits an error, but in this context needs to be reduced to a warning because we can continue without it.

If any of this is happening on 'background' threads, even the 'timely' messages need to be held until the next synchronization point with the thread that owns the console. This is what 'statusln' is for in my WIP. They need to share one channel, rather than just passing both MessageSink and DiagnosticContext, to handle this background case where a caller wants to keep the original time order of diagnostics and status.

@autoantwort (Contributor, Author)

> If any of this is happening on 'background' threads, even the 'timely' messages need to be held until the next synchronization point with the thread that owns the console.

This PR already does this (for the scope of the PR).

> background case where a caller wants to keep the original time order of diagnostics and status

That is already the case in this PR. But yeah, this PR doesn't know when a message ends (if it spans multiple lines), which is the whole reason the error-document-type stuff was created. But couldn't this simply be solved by buffering DiagnosticLines instead and using DiagnosticContext instead of MessageSink in this PR, or what breaks then?
Currently: every message (regardless of type) -> MessageSinkBuffer -> MessageSink
Future: every message (regardless of type) -> DiagnosticLineBuffer -> MessageSink

@BillyONeal (Member) commented Jan 4, 2025

> But couldn't this simply be solved by buffering DiagnosticLines instead and using DiagnosticContext instead of MessageSink in this PR, or what breaks then?

Then the 'inner' parts have no way to emit the 'intended to be timely' messages. What I'm doing is (a sketch follows below):

  1. Add one function, statusln, to DiagnosticContext, where these timely messages go. Normal buffering of errors won't buffer these, but background-thread stuff will. Notably, there is no non-ln version, because it needs to be a reasonable buffering point. (Embedded newlines are fine, but callers need to assume there will be a newline there.)
  2. Teach downloads.cpp et al. to use DiagnosticContext, which implied teaching system.process.cpp how to use DiagnosticContext.
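A minimal sketch of the two-channel idea described above; all names here (`DiagnosticContext`, `statusln`, `AttemptDiagnosticContext`) are modeled on the discussion, not copied from the WIP branch:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Two channels: errors/warnings may be buffered, status lines are timely.
struct DiagnosticContext
{
    virtual void report_error(const std::string& line) = 0;
    virtual void statusln(const std::string& line) = 0; // no non-ln variant
    virtual ~DiagnosticContext() = default;
};

struct ConsoleDiagnosticContext final : DiagnosticContext
{
    void report_error(const std::string& line) override { std::fprintf(stderr, "error: %s\n", line.c_str()); }
    void statusln(const std::string& line) override { std::printf("%s\n", line.c_str()); }
};

// Wraps one retryable attempt: errors are held back so a later successful
// attempt (e.g. the authoritative download) can discard them, but statusln
// passes straight through because it announces network activity as it happens.
struct AttemptDiagnosticContext final : DiagnosticContext
{
    explicit AttemptDiagnosticContext(DiagnosticContext& inner) : m_inner(inner) { }

    void report_error(const std::string& line) override { m_errors.push_back(line); }
    void statusln(const std::string& line) override { m_inner.statusln(line); }

    void discard_errors() { m_errors.clear(); } // a later attempt succeeded

    void flush_errors() // every attempt failed; emit the held-back errors
    {
        for (const auto& e : m_errors) m_inner.report_error(e);
        m_errors.clear();
    }

private:
    DiagnosticContext& m_inner;
    std::vector<std::string> m_errors;
};
```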

@autoantwort (Contributor, Author)

> Then the 'inner' parts have no way to emit the 'intended to be timely' messages.

Iiuc, 'intended to be timely' = must be printed immediately. The inner parts that don't run in the background can print their stuff immediately via the MessageSink/DiagnosticContext, and the stuff in the background thread is never allowed to print immediately, otherwise you get messages interleaved with the build output.
So I don't understand why we need this extra "print messages immediately" channel when the only possible states are "print everything immediately" or "print nothing immediately".
Maybe I should just wait and see what your resulting code looks like 😅

@BillyONeal (Member)

> So I don't understand why we need this extra "print messages immediately" channel when the only possible states are "print everything immediately" or "print nothing immediately".

No, there's a third condition, which is step 4 in my example above. We need to buffer errors and warnings from the inner operation, because we may want to swallow them and not emit them if a subsequent attempt succeeds, but we must not buffer any of the timely status messages.

Example: https://github.com/BillyONeal/vcpkg-tool/blob/9aa671863a68ef90d0c355e4594bd9925d9df083/src/vcpkg/base/downloads.cpp#L926-L966

@autoantwort (Contributor, Author)

Aah thanks for the example! :)
But iiuc this "problem" is not caused by this PR and already existed beforehand?

@BillyONeal (Member) commented Jan 4, 2025

Yes, it has nothing to do with your PR. It's a problem I ran into trying to DiagnosticContext-ize the downloads stuff, which I want to do in order to have confidence that your PR is correct. (This way everything the background thread might touch speaks the 'can be made background thread aware' language)

BillyONeal added a commit that referenced this pull request Jan 15, 2025
…utput (#1565)

Extensive overhaul of our downloads handling and console output; @JavierMatosD and I have gone back and forth several times and yet kept introducing unintended bugs in other places, which led me to believe targeted fixes would no longer cut it.

Fixes many longstanding bugs and hopefully makes our console output for this more understandable:
* We no longer print 'error' when an asset cache misses but the authoritative download succeeds. This partially undoes #1541. It is good to print errors immediately when they happen, but if a subsequent authoritative download succeeds we need to not print those errors.
* We now always and consistently print output from x-scripts at the time it actually happens. Resolves https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2300063
* We don't tell the user that proxy settings might fix a hash mismatch problem.
* We do tell the user that proxy settings might fix a download from asset cache problem.
* We now always tell the user the full command line we tried when invoking an x-script that fails.
* We don't crash if an x-script doesn't create the file we expect, or creates a file with the wrong hash.
* We now always print what we are doing *before* touching the network, so if we hang the user knows which server is being problematic. Note that this includes storing back to asset caches which we were previously entirely silent about except in case of failure.

Other changes:
* Removed debug output about asset cache configuration. The output was misleading / wrong depending on readwrite settings, and echoing to the user exactly what they said before we've interpreted it is not useful debug output. (Contrast with other `VcpkgPaths` debug output, which tends to be paths we have likely changed from something the user said.)

Other notes:
* This makes all dependencies of #908 speak `DiagnosticContext` so it will be easy to audit that the foreground/background thread behavior is correct after this.
* I did test the curl status parsing on old Ubuntu again.

Special thanks to @JavierMatosD for his help in review of the first console output attempts and for blowing the dust off this area in the first place.
Merge commits into feature/async-binary-cache-push-success:

…cache-push-success — conflicts: include/vcpkg/base/fwd/message_sinks.h, include/vcpkg/base/message_sinks.h, src/vcpkg/base/message_sinks.cpp

…cache-push-success — conflicts: src/vcpkg/commands.install.cpp, src/vcpkg/commands.set-installed.cpp

…ture/async-binary-cache-push-success — conflicts: include/vcpkg/binarycaching.h, src/vcpkg/binarycaching.cpp

… the work queue is drained before returning that no work is left.
* Restore autoantwort's only printing counts when done.
* Note which specs we are submitting in messages from the background.
@BillyONeal BillyONeal marked this pull request as ready for review February 3, 2025 18:34
@BillyONeal (Member)

@autoantwort I pushed some changes here; can you let me know if you are happy with them? Thanks!

```cpp
static ExpectedL<BinaryProviders> make_binary_providers(const VcpkgCmdArguments& args, const VcpkgPaths& paths)

void ReadOnlyBinaryCache::fetch(View<InstallPlanAction> actions)
{
    std::vector<const InstallPlanAction*> action_ptrs;
```
@BillyONeal (Member) commented Feb 3, 2025

This block is just moved up from line 2325, as these things became members of ReadOnlyBinaryCache or BinaryCache rather than being local to this file.

```cpp
    });
}

void BinaryCacheSynchronizer::add_submitted() noexcept
```
@BillyONeal (Member)

This starts meaningfully new code.

@autoantwort (Contributor, Author) left a comment

LGTM

Comment on lines +217 to +218:

```cpp
using backing_uint_t = std::conditional_t<sizeof(size_t) == 4, uint32_t, uint64_t>;
using counter_uint_t = std::conditional_t<sizeof(size_t) == 4, uint16_t, uint32_t>;
```
@autoantwort (Contributor, Author)

Why does this depend on `size_t`?

@BillyONeal (Member)

There are a lot of 64-bit machines without 32-bit atomics, and a lot of 32-bit machines without 64-bit atomics, and I wanted to choose something least likely to put us into the lockful-atomics world.
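A sketch of that reasoning; the `static_assert`s are illustrative additions, and the packing of two half-width counters into one backing integer is inferred from the paired aliases rather than quoted from the PR:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Pick the backing integer to match the platform word size, so the atomic
// holding two packed half-width counters is as likely as possible to be
// lock-free.
using backing_uint_t = std::conditional_t<sizeof(size_t) == 4, uint32_t, uint64_t>;
using counter_uint_t = std::conditional_t<sizeof(size_t) == 4, uint16_t, uint32_t>;

static_assert(sizeof(backing_uint_t) == 2 * sizeof(counter_uint_t),
              "two counters must pack exactly into the backing integer");
static_assert(std::atomic<backing_uint_t>::is_always_lock_free,
              "the synchronizer relies on a lock-free word-sized atomic");
```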

@BillyONeal BillyONeal merged commit a6289e8 into microsoft:main Feb 5, 2025
6 checks passed
@BillyONeal (Member)

Thanks for the contribution!

@BillyONeal (Member)

time results

@Neumann-A (Contributor)

Why is this so inconsistent? I would have expected less variance in the results, especially for stuff taking a day or longer.

@dg0yt (Contributor) commented Feb 14, 2025

Does the artifact size with static linkage explain most of the inconsistency? In particular when ports install executables.

@Neumann-A (Contributor)

> Does the artifact size with static linkage explain most of the inconsistency? In particular when ports install executables.

Hmm, maybe. The android triplets are more or less consistent, and the -static and -static-md triplets are also more or less consistent. @BillyONeal, do you have storage data for the different triplets?

@BillyONeal (Member)

My supposition as well is that the difference is mostly a function of binary cache size; I don't have those stats, though. For instance, the triplets with an LLVM show more improvement. The improvement for macOS seems bigger, which might be explained by not being in the same data center as the caches.

@autoantwort autoantwort deleted the feature/async-binary-cache-push-success branch February 15, 2025 15:14