Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that the ptr field of Allocator/Random should not be compared and remove existing comparison #22691

Merged
merged 2 commits into from
Feb 4, 2025

Conversation

squeek502
Copy link
Collaborator

@squeek502 squeek502 commented Jan 31, 2025

Release notes: #22691 (comment)


This is an alternative to #21760 (it's the 1st option in that PR's description):

Codify that comparing ptr is always illegal/unsafe, and remove all existing comparisons.

(suggested for exploration here: #21760 (comment))

However, I personally think this is probably not the best solution to #21756 / #17704, for a few reasons:

  • A doc comment is about the best we can currently do to help unsuspecting users to avoid comparing these fields and invoking (currently unchecked, see debug safety feature: runtime undefined value detection #211) illegal behavior
  • Unless I'm thinking about it wrong, in the future it seems likely that this illegal behavior would become a runtime panic at best (i.e. not a compile error) which doesn't seem ideal (you could still write library code that appears to work fine if you don't use any Allocator implementations that set .ptr = undefined)
  • The solution in Do not use undefined for the ptr field of Allocator/Random instances #21760 better communicates the intention behind setting .ptr = undefined, and removes the possibility of illegal behavior (and doesn't increase the field's size, so there's no real downside AFAIK)

If #21760 is merged, this PR could be seen as just a straightforward improvement to the API of process.Child.collectOutput, plus an irrelevant extra commit that could be dropped.

What I personally would like to see happen is exactly that: merge #21760 to address #21756 / #17704, and then drop the second commit from this PR and merge it too.


Note that the changes to collectOutput differ from those suggested in #21760 (comment) because I realized that there's no tangible benefit to taking ArrayList pointers--the memory for stdout/stderr is handled via std.io.poll and its LinearFifos, and then the result is stuffed into the ArrayList fields and returned, so doing something like calling ensureTotalCapacity/etc before collectOutput is wasted at best (or will leak memory at worst).

Instead, the ArrayList parameters were removed in favor of just an allocator parameter, and the ArrayList usage is now just an implementation detail (it's just used for the convenience of toOwnedSlice).


If this is merged instead of #21760, then:

@mlugg
Copy link
Member

mlugg commented Jan 31, 2025

I think this is a much better approach than #21760.

To me, a doc comment isn't some "best effort" thing, it's precisely the right thing to do here. Doc comments are how we communicate aspects of APIs which aren't clear from the type system alone. In this case, we're trying to communicate the constraint "comparisons with this field may result in undefined"; or perhaps even "this field should not be read aside from to pass to the functions in vtable".

  • Unless I'm thinking about it wrong, in the future it seems likely that this illegal behavior would become a runtime panic at best (i.e. not a compile error) which doesn't seem ideal (you could still write library code that appears to work fine if you don't use any Allocator implementations that set .ptr = undefined)

The alternative is to write code which is almost certainly a bug. #21760 masks weird behavior by making it valid; this approach means that it does at least sometimes have the possibility to result in a safety panic! The specific example in #21760 isn't indicative of a bug as such, but of an incredibly poor API. I can't think of any reason to look at ptr which is not indicative of either a bug or a poor API.

I disagree. The intention here is to convey "this pointer is guaranteed to be unused, so its value is irrelevant"; those are precisely the semantics that undefined implements and conveys.

@squeek502
Copy link
Collaborator Author

Makes sense.

The intention here is to convey "this pointer is guaranteed to be unused, so its value is irrelevant"

I think this is our only real disagreement, as I don't see this as a sufficient guarantee. Can totally understand this being merged instead of #21760, though; it's just not the choice I would personally make.

@squeek502 squeek502 force-pushed the child-internal-array-list branch from 39536a6 to 73c121a Compare January 31, 2025 22:26
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this one up too.

Merging it fixes the immediate problem but does not preclude a follow-up to also do your suggestion.

The API of collectOutput is not to my taste, but I fixed it by removing pub. It appears to be unused outside this file.

@andrewrk andrewrk enabled auto-merge February 2, 2025 04:56
@squeek502
Copy link
Collaborator Author

squeek502 commented Feb 2, 2025

Might be worth noting that collectOutput is used in 3rd party code; 214 results on GitHub:

https://github.com/search?q=collectOutput+language%3Azig&type=code

@andrewrk andrewrk disabled auto-merge February 2, 2025 04:59
@andrewrk
Copy link
Member

andrewrk commented Feb 2, 2025

Thanks for checking, in that case I'll make a different flavor of review

stdout: *std.ArrayList(u8),
stderr: *std.ArrayList(u8),
/// Used for `stdout` and `stderr`.
allocator: Allocator,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is going to break some people's code, let's put a little effort into making it more future-proof. My suggestion also happens to be more familiar to existing callsites.

  • Make them both std.ArrayListUnmanaged
  • Make the function only append to them, not clear them

Sure, this implementation uses that fifo abstraction but that's not necessarily how it has to work. And append-only array list is generally a very flexible API from the caller's perspective.

I'll note also that this patch introduces more error handling that can be deleted with the above suggestion.

Copy link
Collaborator Author

@squeek502 squeek502 Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a simple version of this API while I look into improving the implementation (the version I just pushed still uses std.io.poll and then just calls appendSlice to copy the data from the FIFO to the ArrayListUnmanaged).

Unsure how long it'll take to write the improved version; one option would be to leave improving the implementation as a follow-up issue.

Copy link
Collaborator Author

@squeek502 squeek502 Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed an attempt at making std.io.Poller not FIFO-specific and then using the ArrayListUnmanageds directly here:

squeek502@b8c7327

Let me know if it looks like I'm vaguely on the right track and I'll add it to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't look at your branch yet, I'll try to get to that later

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, it's only a minor improvement after the list.capacity == 0 optimization from #22691 (comment)

@squeek502 squeek502 force-pushed the child-internal-array-list branch from 3b4a04d to fd40c36 Compare February 2, 2025 09:08
@alexrp alexrp force-pushed the child-internal-array-list branch from fd40c36 to 16200c8 Compare February 3, 2025 18:49
@squeek502 squeek502 force-pushed the child-internal-array-list branch from 16200c8 to 4041cc0 Compare February 4, 2025 01:07
@andrewrk
Copy link
Member

andrewrk commented Feb 4, 2025

Thanks, appreciate all the follow-ups.

@andrewrk andrewrk enabled auto-merge February 4, 2025 03:16
@andrewrk andrewrk merged commit d72f3d3 into ziglang:master Feb 4, 2025
10 checks passed
@squeek502
Copy link
Collaborator Author

squeek502 commented Feb 5, 2025

This should get the breaking and release notes labels due to the collectOutput API change. I can try writing up some release notes for this if you'd like.

@alexrp alexrp added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library. labels Feb 5, 2025
@squeek502
Copy link
Collaborator Author

squeek502 commented Feb 6, 2025

Release Notes

std.process.Child.collectOutput API changes

Upgrade guide:

var stdout = std.ArrayList(u8).init(allocator);
defer stdout.deinit();
var stderr = std.ArrayList(u8).init(allocator);
defer stderr.deinit();

try child.collectOutput(&stdout, &stderr, max_output_bytes);

var stdout: std.ArrayListUnmanaged(u8) = .empty;
defer stdout.deinit(allocator);
var stderr: std.ArrayListUnmanaged(u8) = .empty;
defer stderr.deinit(allocator);

try child.collectOutput(allocator, &stdout, &stderr, max_output_bytes);

Before, collectOutput included a check to ensure that stdout.allocator was the same as stderr.allocator, which was necessary due to its internal implementation. However, comparing the ptr field of an Allocator interface can lead to illegal behavior, since Allocator.ptr is set to undefined in cases where the allocator implementation does not have any associated state (page_allocator, c_allocator, etc).

With this change, that unsafe Allocator.ptr comparison has been eliminated from collectOutput (and it was the only occurrence of such a comparison in the Zig codebase). Additionally, the documentation for the ptr field of both the Allocator and Random interfaces has been updated to mention that any comparison of those fields could lead to illegal behavior. In the future, this comparison will become detectable illegal behavior.

@xdBronch
Copy link
Contributor

xdBronch commented Feb 6, 2025

probably shouldnt use decl literals in the before since 0.13 didnt have them

@squeek502
Copy link
Collaborator Author

Good call, removed them from the 'before' code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
5 participants