introduce allowzero pointer attribute which makes 0x0 an allowed address #1953
Comments
Perhaps I am late to the party here, but why is this not an option?

test "aoeu" {
    var ptr = @intToPtr(?*i32, 0);
}
|
To reiterate, the compiler infrastructure to support
|
I dislike this new pointer flavour; I think that pointers in Zig are overcomplex (especially the upcoming(?) zero-terminated data pointer). Also, this problem looks like a design flaw of the language, because the implementation (or optimisation) of optional pointers interferes with the ability to use the full address space when not restricted by the target platform.

Also, any real-world use case for zeroable pointers that I can think of is low-level memory management in a kernel. For such a rare task, this pointer kind should probably be accessible via a builtin function.

const text_boot = @intToPtr(@ZeroablePtr([*]const u8), 0)[0..@ptrToInt(&__end_init)]; // does not look too bad

Or the implementation/optimisation of pointers could be platform dependent. When targeting linux/win/mac/bsd, pointers would be optimised, and when targeting freestanding, pointers would be defined like any other optional type (maybe with a compiler option to turn on the optimisation).

Edit: couldn't find an issue about pointers to null-terminated data, so I'm going to share my thoughts here. |
See #265
It's not just legacy APIs, e.g. look at any linux syscall that takes a path. |
I don't like this proposal, but I do think it falls in line with "Communicate intent precisely." |
Hey everyone, thanks for pushing back. I think I jumped the gun on this one, and I appreciate the feedback.
The example use case does this:

const text_boot = @intToPtr([*]const u8, 0)[0..@ptrToInt(&__end_init)];
for (text_boot) |text_boot_byte, byte_index| {

If you put a
That's a fair question. Which of the other two proposals (see #1952) to solve this problem do you prefer? Do you have a fourth idea in mind?
I'm very interested in your opinions on null terminated pointers - would you mind repeating your comment on #265? I removed the accepted label and re-opened #1952 for re-consideration.

We do have a problem to solve here. In status quo zig, pointers which have address 0, but are dereferencable, are causing unchecked undefined behavior. This is not relevant for most OS targets, but it is relevant for the freestanding target, which is a first class citizen of zig use cases.

One problem I thought of with this |
Here are my two cents on pointers and optionals. To me, the most correct thing to do (in terms of sane semantics) is to always use 1 extra bit for all optionals. Then there are never any surprises. We don't need
We already have slices, which are an alternative to null-terminated pointers. Slices are 16 bytes (on 64-bit) where null-terminated pointers are only 8 (+ @sizeOf(T), since we need one extra element). Null-terminated pointers use less memory (at least sometimes), but their semantics are hard to use, so we avoid them in Zig. I think this null pointer optimization falls under the same category. We take a small hit, but in return, we get simpler semantics that have fewer edge cases. I don't think optional pointers are very common in Zig code, so I don't think this optimization is really that big anyway (can someone show me code that allocates a lot of optional pointers? Is it really necessary?). |
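The size trade-off described in the comment above can be sketched in C. This is only a rough model, not Zig's actual layout: the type names (OptPtrSentinel, OptPtrTagged, ByteSlice) and helper functions are hypothetical and introduced purely for illustration. The sketch contrasts an optional pointer that reserves address 0 as "absent" with one that carries an explicit flag, and shows a pointer-plus-length slice next to them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of an optional pointer that reserves address 0 as "no value".
 * It costs nothing beyond the pointer itself. */
typedef struct {
    int32_t *ptr; /* NULL means "absent" */
} OptPtrSentinel;

/* Model of an optional pointer with an explicit flag, so every address,
 * including 0, remains representable. The flag makes the type larger. */
typedef struct {
    uintptr_t addr;  /* any value, including 0 */
    bool has_value;  /* the "extra bit" discussed in the thread */
} OptPtrTagged;

/* Model of a slice: pointer plus length, the alternative to a
 * null-terminated pointer. */
typedef struct {
    const uint8_t *ptr;
    size_t len;
} ByteSlice;

/* Absence is encoded in the address itself. */
static bool sentinel_is_null(OptPtrSentinel p) { return p.ptr == NULL; }

/* Absence is encoded in the flag; the address is not inspected. */
static bool tagged_is_null(OptPtrTagged p) { return !p.has_value; }
```

On a typical 64-bit target the sentinel form is 8 bytes, while the tagged form and the slice are usually 16 bytes each; the exact numbers depend on the ABI's padding rules, so treat them as an assumption rather than a guarantee.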
Sharing an irc log from earlier:
|
Similar to @Rocknest's proposal
We could have allowzero be enablable for all pointers in a binary with a top level option. Then the gzip library still works in bootloaders, as long as the library doesn't assume that It does seem like more of a target configuration rather than a pointer configuration. |
Pointers & slices also have unintuitive behaviour when used with undefined values. In #1831 the following example is mentioned as not working:

var b: []const u8 = ([*]const u8)(undefined)[0..0]; // len = 0, ptr = undefined
var b_opt: ?[*]const u8 = b.ptr; // isNull = false, ptr = undefined
@import("std").debug.assert(b_opt != null); // undefined is not the same as null!

If I understand correctly, this is also caused by the pointer optimisation. So this is a case against the whole 0-special-casing of pointers, even on targets such as linux/windows. Also, wouldn't checking an undefined optional pointer trigger undefined behaviour (undetectable even at runtime)?

fn isNullPtr(ptr: ?*T) bool {
    return (ptr == null);
}
// somewhere else called as
var aPtr: ?*T = undefined;
_ = isNullPtr(aPtr); // what happens now? could this be detected at debug runtime without an invisible field etc.?

I don't know how undefined optional pointers could be useful, but also there is u0. For me it is important to have sound and intuitive semantics. |
@andrewrk one solution is to do it manually with usize, another one is maybe use the new C pointer type |
The docs for the C pointer type are going to say "never choose this type for your pointers." It's only for interop with auto-parsed C headers. Using the new C pointer type is not a valid solution to the use case. |
That's good, it leaves us with 'only one obvious way to do things' - convert to usize and make assumptions about special cases by yourself (maybe not so obvious). |
Here are some things that I would consider to be settled:
These questions remain:
I hate to say it but upon reconsidering all this stuff all day today, I'm starting to think that the best option is
|
I'm accepting this issue again. Still willing to discuss this, but if anyone wants to push back, they'll have to solve all the problems in their counter proposal. Until that happens this is the planned way forward. I want to note that I don't think this issue even would affect anyone in this thread except me since @kristate I think your ideas about safety-checked undefined behavior is its own separate proposal that has to do with language-level assertions and how they should work. I invite you to start a new proposal to talk about what constructs should return errors, or have safety-checked undefined behavior. |
EDIT: I've now created #1959 for my counter-proposal, which is less of a slog to read through, and hopefully still contains sound reasoning, though it doesn't address all the discussion points here as I think I did in this comment originally. TL;DWR (this comment/post got really long again, my apologies, but I don't want to trim anything and obfuscate some obvious logic mistakes I might have made along the way):
To reiterate: Zig's stance previously was not to use null pointers, and as such I don't see the motivating reason for introducing the null pointer value/concept into Zig pointers, beyond these two distinct use cases:
I thought I'd share some details (regarding C via the C standard(s)) that maybe everyone already knows, but shouldn't be overlooked in this conversation. (Note that I have no insight or knowledge about LLVM; to me previous comments suggest that throughout (most?) of their pipeline null pointers do coincide with all-bit-0 representations. Any knowledge all of you have surpasses mine.)
Why do we need null pointers? Zig's stance on null pointers was made very clear as "strongly against" (I believe), although the actual runtime benefits of smaller storage size should not be dismissed completely, so discussion for reintroducing the lean representation of optional pointers by sacrificing one address should still be held. On this front, if LLVM's intrinsics around these attributes (reminder: I know nothing about them) are hyper-optimized for an all-bit-0 representation, the correct choice might be to use that. (Additionally, C-pointer to Zig-pointer conversions can be no-ops, if the representations for the involved pointer types coincide.) There is a pragmatic solution to finding a null pointer value (as in address corresponding to specific representation) for Zig-pointers:
The first is trivial for any pointer with alignment > 1, and completely realizable on current x64 systems, e.g. by misusing their purposefully-limited pointer representation.

To be honest, I fail to see the issue brought up in the library use case, via the "gunzip" example.
|
Looks like I joined too late / took too long. In response to @andrewrk 's second-last comment: |
@rohlem the C ABI that Zig uses is determined by the "environment" part of the target. You can use
(Side note, I should open a proposal to rename "environment" to "C ABI" and audit the "unknown" one) As far as I'm aware, all of these C ABIs use 0 as the NULL pointer. Are you suggesting to support an additional C ABI, which uses something other than 0 for NULL? What is it called, and on what systems is it used? |
@andrewrk If it's true that all supported target "environments"/C-ABIs agree that NULL has both representation and value (address) 0, then this silent assumption from before was correct. It was more of a detail of exposition for my argument.

(TL;DR: Sorry, my comment got bloated again. Bottom line: I don't think there are motivating use cases for letting users decide "allowzero" on a per-variable basis (cases 1, 2 vs 4, 3 below), and I think it is reasonably possible to either find an invalid pointer representation, or "ban" a single addressable byte/word, even on freestanding systems (making cases 2 and 4 below obsolete) - note this doesn't have to be address 0.)

The more important thing to note is that, being the null pointer, "distinct from any pointer to data/function", this value (apparently address 0) cannot be used in standard C to load from, nor store to the corresponding data (and therefore, allowing it for C-pointers in Zig seems counter-intuitive to me). That is why this optimization works for C.

If we want the same optimization for optional pointers in Zig, it makes sense to use this same address to make conversions easier. In this case we lose the ability to represent this address using optional pointers, just as C pointers cannot (legally) represent this address. This is not actually an issue, however, because: Pointer operations are meaningless for both the null pointer (as C defines it "pointing to neither function nor data"), as well as an unpopulated optional pointer (the same way arithmetic is "meaningless" on unpopulated optional integers).
As might be obvious, the first two combinations seem the most meaningful to me. |
Did you see these bullet points in my comment?
This is coherent with what you're saying about NULL in C. For C pointers address 0 represents an invalid pointer. It's allowed in C pointers, not so that you can read or write to it, but because it more closely matches the semantics of actual C pointers. C code can be translated into Zig code like this:

#include <stddef.h>
static int *foo(void) {
    return NULL;
}
static void bar(void) {
    int *ptr = foo();
    if (ptr != NULL) {
        *ptr += 1;
    }
}

pub fn foo() [*c]c_int {
    return 0;
}
pub fn bar() void {
    var ptr: [*c]c_int = foo();
    if (ptr != NULL) {
        ptr.* += 1;
    }
}
pub const NULL = if (@typeId(@typeOf(0)) == @import("builtin").TypeId.Pointer) @ptrCast([*c]void, 0) else if (@typeId(@typeOf(0)) == @import("builtin").TypeId.Int) @intToPtr([*c]void, 0) else ([*c]void)(0);

Yep that

pub fn foo() ?[*]c_int {
    return null;
}
pub fn bar() void {
    var ptr: ?[*]c_int = foo();
    if (ptr != @ptrCast(?[*]c_int, (?*c_void)(0))) {
        ptr.?.* += 1;
    }
}
This may be true in C but it's not true in LLVM and in Zig we can make up whatever rules we want.
I don't see your comment taking into account the use case I explicitly mentioned in this thread, or my comment #1953 (comment)
|
Some interesting information that i found:
|
|
However, Zig's pointers are basically C's pointers extended with type-safe comptime attributes, but inside they are backed up by C's vague spec (because of seamless interop of zig<->c pointers). Edit: are pointers received from C checked for alignment etc.? |
This is incorrect. Zig pointers are defined by the zig language specification (#75) and interop with C pointers is defined by the C ABI part of the selected target. (currently named "environ") Let me repeat: the C specification has no bearing on Zig. Let's make sure that rumor doesn't start flying around. |
Oh okay, but what about new special C pointer? |
WebAssembly loads modules into virtual linear memory starting from address 0; it seems that linkers are required to put some useless thing at the start if a language wishes to use 0 as the null pointer. |
After the recent discussion in IRC I realized I was indeed misinterpreting just how specialized this change would be, and if extending the type system really is the easiest solution, it seems a valid approach (knowing most code won't make use of it). I just thought of an edge case with (I assume) undefined behaviour I wanted to mention here, so it is at least documented:
Additionally, I assume a builtin (I assume checking the result of non- |
I still like @rohlem's original idea:
However, if a special pointer is easier to implement (and this special type is going to be used only once in a kernel), then maybe it is a good idea to hide it in a builtin function ( |
@Rocknest Here's how I understand the reasoning behind the proposal:
While I still think yet another type flag makes the language harder to reason about, the alternative (not marking these exceptional uses with
|
I don't understand the use case still. The code that you showed as the use case works as long as the pointer is not declared as optional? Why would you want it to be optional in that use case? How can a hardcoded address be optional? You are filling out the pointer contents with an address in the declaration.

The LLVM docs referenced in #1952 say that while loads from a pointer containing the address 0 are undefined behavior, this is never checked or enforced. Realistically that means it will just try to load from it, right? So a comptime known hardcoded pointer with an address of zero could just be the special case?

Also, the problem could be skipped for this special use case by just doing a compare in assembly, then doing the rest of the addresses starting at one or four or whatever your alignment needs are in Zig. That is not a huge burden on the programmer that deals with issues like this. Encapsulate the ugly business in a function call and it even looks pretty. I feel like I must be missing the point. |
@vegecode Even if the address being 0 is Using assembly to work around the problem makes the code non-portable; plus it requires the programmers to learn and understand the respective assembly. At the end of the day, this proposal is completely self-contained. If it turns out not to be a viable solution, for whatever reason, nobody outside of the intended target audience will even have had to notice it in hindsight. |
@rohlem First of all, thank you for taking the time to respond. I appreciate feedback always. You make good points, although I think you missed my main point by focusing on the assembly section. Also, this use case is non portable. It's dealing with the machine directly, not the abstractions layered on top. Secondly, you misrepresent my argument by compressing multiple statements in my previous comment to one inaccurate paraphrase and then placing it in quotes. That is rude enough, but I can understand if that is coming from misunderstanding my point or from a lack of clarity in my description. You took it a step further even by then elevating your mischaracterization of my thoughts on this specific use case to an overall "mentality." If you meant nothing by it, know that it was not taken as nothing. My main point was that a hardcoded comptime int address pointer could be the exception where null wasn't checked for. If you're using those, you should know what you're doing. Also it would be one less piece to add to the language. Overall I think the proposal is a fair one, and perfectly reasonable. The only thing that rubbed me the wrong way was your response. I won't hold it against you, but I think it's important to be upfront when a social interaction is sub-optimal. |
@vegecode First of all, I'm sorry to have offended or even mischaracterized you (/ your statements) in any way. I tend to (apparently mis-)use quotes often, not to claim paraphrasing someone else's statement, but to point out I'm heavily reducing from surrounding context in these cases, and that those elements are not to be taken literally, but understood and taken in context. I understand and agree with your fundamental stance that a fixed, comptime-known int/address is not optional. In this way it reflects my earlier proposal of "just banning optional Also, while I admittedly don't know a lot about embedded systems, to me it seems the special handling of null pointers was an original, "artificial" invention of the C language, influencing LLVM's mechanisms surrounding it. In this context, I seem to be missing the point of why the use case of dereferencing address 0 were any less portable than using a pointer to dereference any other address. Thank you for pointing out the/my miscommunication, and I'm sorry should parts of this comment still read as if I valued my own opinions and logic over anyone else's view on the matter (which is, generally, a stupid mindset I'd want to avoid having or representing, if possible). |
@rohlem Thank you very much for responding so thoughtfully. It makes sense to value your own thoughts and opinions above others, just not to denigrate other's opinions, which I can see was not your intention. I think one of the best things about the small Zig community so far is that everyone is open to the free exchange of ideas and mature about handling disagreements when they arise. Thanks for being excellent and empathetic! |
|
I'm extracting this proposal from #1952. This proposal conflicts with it, and only this one or that one can be accepted.
Add a pointer attribute to zig pointer types marking a pointer as possibly having the address 0:
Pointers which do not have this attribute are not allowed to have the address 0, which is why ?*i32 is guaranteed to have the same bit representation as *i32. But if you made an optional out of an allowzero pointer, like this: ?*allowzero T then it would add a bit of data to the type, just like it would for a ?usize value.

C pointer types (See #1059) would always allow zero, and the attribute would be a compile error.

If we did this, then @intToPtr would gain safety-checked undefined behavior for when the address is 0 and the destination pointer type does not have the allowzero attribute.

This OS kernel code uses a pointer with address 0 and in status quo zig it is in danger of invoking unchecked undefined behavior for the reasons explained in #1952. After this proposal, it would cause safety-checked undefined behavior (crash in debug mode) which would then be fixed by adding the allowzero attribute to the pointer.