proposal: type for null terminated pointer #265
Comments
How would that work? Wouldn't that be an unbounded linear search? |
Maybe if you cast from That's true though, for pointers we can't really have this check. |
Proposal rejected in order to discourage use of null terminated things. |
How do you plan to interface with C-style null-terminated strings? |
Can you elaborate on this question? We have the cstr module for getting string length and converting to a slice. |
Solved thanks to that. |
Reopening for #518. My proposal is to leave the type unnamed; you have to refer to it as Instances of the type should not be usable in any way, except for implicitly converting to Literals with
assert(x.len > 0);
for (x[0..x.len - 1]) |b| {
    assert(b != 0);
}
assert(x[x.len - 1] == 0);
There is no way to create a There are many cases where you don't want to bother with this safety check, but you still want a
pub fn fromSliceUnsafe(x: []const u8) -> @typeOf(c"") {
    @setDebugSafety(this, false);
    return @typeOf(c"")(x);
}
pub fn fromPointerUnsafe(x: &const u8) -> @typeOf(c"") {
    return fromSliceUnsafe(x[0..0]);
}
The name "Unsafe" should properly inform people that they're taking a risk using these utilities. I'm not sure we need any solution to null-terminated arrays of things other than |
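The checked conversion described by the asserts above can be sketched as follows. This is only an illustration: it assumes the sentinel-terminated slice syntax `[:0]const u8` of later Zig releases (roughly 0.11 or newer) rather than the `@typeOf(c"")` spelling used in the comment, and `fromSliceChecked` is a hypothetical name.

```zig
const std = @import("std");

/// Hypothetical checked conversion: returns null unless `x` ends in a 0 byte
/// and contains no earlier 0 byte (the same conditions as the asserts above).
fn fromSliceChecked(x: []const u8) ?[:0]const u8 {
    if (x.len == 0) return null;
    if (x[x.len - 1] != 0) return null;
    for (x[0 .. x.len - 1]) |b| {
        if (b == 0) return null;
    }
    // The `:0` slice syntax re-types the slice; in safe builds it also
    // checks that the byte at the new end really is 0.
    return x[0 .. x.len - 1 :0];
}
```

Returning an optional instead of asserting is a design choice of this sketch; it lets the caller decide whether a malformed input is a bug or an ordinary error.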
I think this counter proposal has 2 competing ideas:
These don't go well together. I think the original proposal is better, where the type is named. There's also no reason to limit it to I can probably find a Windows API that uses NULL as a sentinel for an array of pointers. The
So the generated code would first have to do essentially a strlen to find the length, use that to convert to a slice, then cast that to the null-terminated type, and then the debug safety check would do a redundant strlen? That doesn't seem right. If we had the
|
From llvm/include/llvm/Support/MemoryBuffer.h:
/// This interface provides simple read-only access to a block of memory, and
/// provides simple methods for reading files and standard input into a memory
/// buffer. In addition to basic access to the characters in the file, this
/// interface guarantees you can read one character past the end of the file,
/// and that this character will read as '\0'.
///
/// The '\0' guarantee is needed to support an optimization -- it's intended to
/// be more efficient for clients which are reading all the data to stop
/// reading when they encounter a '\0' than to continually check the file
/// position to see if it has reached the end of the file. |
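To make the optimization in that comment concrete, here is a small sketch comparing a sentinel-driven scan with a length-driven one. It assumes the `[:0]const u8` syntax of later Zig releases; both function names are hypothetical, and the example counts newlines only to give the loop something to do.

```zig
const std = @import("std");

/// Scans until the 0 sentinel; the loop condition has no `i < buf.len`
/// comparison. Safe builds still bounds-check the indexing itself, and an
/// embedded 0 byte would end the scan early.
fn countLinesSentinel(buf: [:0]const u8) usize {
    var lines: usize = 0;
    var i: usize = 0;
    while (buf[i] != 0) : (i += 1) {
        if (buf[i] == '\n') lines += 1;
    }
    return lines;
}

/// The same count with an explicit position check on every iteration.
fn countLinesBounded(buf: []const u8) usize {
    var lines: usize = 0;
    var i: usize = 0;
    while (i < buf.len) : (i += 1) {
        if (buf[i] == '\n') lines += 1;
    }
    return lines;
}
```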
That may be a better place to put my comment - |
@jido In C, nothing prevents you from removing all null terminators from a char array, and then passing it to strlen. This is the motivation for the null terminated type. The compiler will insert a null terminator at x.len to prevent this bug. You, as a user of a null terminated type, can still insert null terminators between 0 and x.len, and strlen will stop at your null terminator instead of the one at x.len. The one at x.len is there for safety, and trying to override it is undefined behavior (aka a runtime crash in debug mode). |
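The difference between the terminator at x.len and an embedded terminator can be seen in a short sketch. It assumes the sentinel-terminated types of later Zig releases and uses `std.mem.len` as the strlen-style scan; `cLength` is a hypothetical helper name.

```zig
const std = @import("std");

/// strlen-style length: stops at the first 0 byte, wherever it is.
fn cLength(s: [:0]const u8) usize {
    return std.mem.len(s.ptr);
}

/// A literal with an embedded 0. The slice length still covers all five
/// bytes, and the guaranteed terminator sits at index `len`.
const embedded: [:0]const u8 = "ab\x00cd";
```

With this value, `embedded.len` is 5 and `embedded[embedded.len]` is 0, while `cLength(embedded)` stops at the embedded terminator and reports 2.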
Moving some work from Pointer Reform (#770) to here:
|
This is definitely an interesting proposal which I would invite you to file separately from this null terminated pointers issue (and indeed if it was accepted, it could potentially reverse the decision on this one). I think you've made a pretty strong case for userland null-terminated pointer. It's enough to make me go back and consider all the use cases that I have for it. I don't know if I'm ready to reverse the "accepted" label here yet, but I'll admit that I'm going back and questioning this decision. |
That sounds good! I'd be happy to write up a proposal and work on implementing it as well. Also, I just tried this out in Compiler Explorer (updated) and it seems like a one-member extern struct already gets passed the same way as a plain int on x86_64, so the suggestion here could be tentatively evaluated outside stdlib without waiting for that idea to be implemented. |
For the record, what I was thinking by this was that the template would take a struct specifying pointer attributes and you'd write one typedef per set of parameters. Without working out details, something like
const PtrAttrs = struct {
    is_const: bool,
    is_volatile: bool,
    // etc.
};
const c_str_const = TermPtr(u8, '\0', PtrAttrs { is_const: true, is_volatile: false });
const w_str_mut = TermPtr(u16, '\0', PtrAttrs { is_const: false, is_volatile: false });
// etc.
This should generalize to various custom pointer types. |
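One way the userland template might look in later Zig syntax is sketched below. This is an assumption-laden illustration, not the commenter's design: `PtrAttrs` is reduced to a single `is_const` flag, the `len` and `toSlice` helpers are hypothetical, and the wrapper is a one-field extern struct as discussed earlier in the thread.

```zig
const std = @import("std");

/// Simplified attribute set (only constness is modelled in this sketch).
const PtrAttrs = struct {
    is_const: bool,
};

fn TermPtr(comptime T: type, comptime term: T, comptime attrs: PtrAttrs) type {
    return extern struct {
        // The field type is chosen at compile time from the attributes.
        ptr: if (attrs.is_const) [*]const T else [*]T,

        const Self = @This();

        /// Linear scan for the terminator, like strlen.
        pub fn len(self: Self) usize {
            var n: usize = 0;
            while (self.ptr[n] != term) : (n += 1) {}
            return n;
        }

        /// Converts to an ordinary slice that excludes the terminator.
        pub fn toSlice(self: Self) if (attrs.is_const) []const T else []T {
            return self.ptr[0..self.len()];
        }
    };
}

const c_str_const = TermPtr(u8, 0, .{ .is_const = true });
const w_str_mut = TermPtr(u16, 0, .{ .is_const = false });
```

Each set of parameters yields a distinct type, so a `c_str_const` cannot be passed where a `w_str_mut` is expected.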
The ability to define types such as 0-terminated pointers would be a powerful addition to the language. It would also probably solve #1595. Here is how it could look:
pub fn TermPtr(comptime T: type, comptime termValue: T) type {
    return @TransparentStruct(struct { // builtin ensures that there is only one field
        ptr: [*]T, // could be any type, for example 'c_int' to pass type safe values to some c api
        fn from(comptime slice: []T) @This() {
            return @This() { .ptr = (slice ++ termValue).ptr };
        }
    });
}
Not sure about modifiers such as const, align. Maybe something like this:
pub fn TermPtr(comptime T: type, comptime termValue: T) type {
    return @TransitiveStruct(struct { // allows to forward type modifiers from the outside
        ptr: [*]T,
        // methods
    });
}
///////
const c_str = TermPtr(u8, 0x00);
extern fn someApi(path: const c_str); // the type here is '[*]const T', but it is type safe
It could even allow operators to be used on such a type, though that should be optional. For example |
@matthew-mcallister It seems important to me that types that can be represented in C can also be represented in the Zig language without having to import any code. This precedent is set for C integer types, float types, and C pointer types. There must be a C string literal, or Zig string literals must work for C APIs. |
Continuing from IRC: To echo the common usage of null terminated strings, I think the length should always be computed at runtime (at least before the optimizer kicks in). I propose: null slices
null arrays
null literals
misc notes
|
@daurnimator Why not keep I feel like null-terminated literals can still conceivably be handled well by a builtin function. Say |
why would we need |
I meant those names as placeholders; I only added the |
coming in late to this, but wouldn't it be easier to create a hybrid string type? |
Thought it might be helpful to share my experience with this problem. In D I implemented a module to support null-terminated strings. https://github.com/dragon-lang/mar/blob/master/src/mar/sentinel.d The general term I've found for arrays that end in a particular value is a "sentinel array" and/or "sentinel ptr". In D I just implemented them as a wrapper struct around pointers and arrays. The 2 main benefits I see from having sentinel pointers as a part of the type system are:
My library solution for this in D would be equivalent to something like this in Zig:
const std = @import("std");

pub fn SentinelPtr(comptime T: type) type {
    return struct {
        ptr: [*]T,
        // create a sentinel pointer from `ptr`, assume it ends in a sentinel value
        pub fn init(ptr: [*]T) @This() {
            return @This(){
                .ptr = ptr,
            };
        }
    };
}
pub fn SentinelSlice(comptime T: type) type {
    return struct {
        array: []T,
        // create a sentinel slice from `slice`, checking for the sentinel value
        pub fn init(slice: []T) @This() {
            std.debug.assert(slice.ptr[slice.len] == 0);
            return assume(slice);
        }
        // create a sentinel slice from `slice`, assume it ends in a sentinel value
        pub fn assume(slice: []T) @This() {
            return @This(){
                .array = slice,
            };
        }
    };
}
It just boils down to wrapping the pointers/slices inside structs and creating helper functions to create/unwrap them. This is one way to implement it, though if you do it in a library like this then it will be a bit more verbose than a language solution, and you'll probably want to find a way to allow the types to perform automatic const conversion, i.e.
var chars: [2]u8 = undefined;
chars[0] = 'a';
chars[1] = '\0';
var x = SentinelPtr(u8).init(&chars);
var y : SentinelPtr(const u8) = x; // is there a way to support this in zig? |
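For comparison, a language-level sentinel pointer gets the const conversion asked about above for free. The sketch below assumes the `[*:0]u8` syntax of later Zig releases; `length` and `demo` are hypothetical names used only for illustration.

```zig
const std = @import("std");

/// Accepts any null-terminated byte pointer, mutable or not.
fn length(s: [*:0]const u8) usize {
    return std.mem.len(s);
}

fn demo() usize {
    // Two bytes of payload; the array type itself carries the 0 sentinel.
    var chars = [_:0]u8{ 'a', 'b' };
    chars[1] = 'c';
    const x: [*:0]u8 = &chars; // mutable sentinel pointer
    const y: [*:0]const u8 = x; // implicit conversion to the const form
    return length(y);
}
```

A wrapper struct would need an explicit conversion function for the same step, which is part of the verbosity mentioned above.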
Just to extend and visualize @Rocknest 's syntactical proposal:
[]const u8
[*]const u8
[5]const u8
[_]const u8
[*c]const u8
[null]const u8
[*null]const u8
[5 null]const u8
[_ null]const u8
I appreciate having the |
But e.g. |
Ah, true |
Currently the type of c"aoeu" is *const u8. Instead, the type should indicate that the pointer is null terminated. Here are two ideas to represent that:
*0 const u8
*null const u8
This type would be implicitly castable to *const u8. You can explicitly cast the other way, and in debug mode this inserts a safety check to make sure there actually is a null byte there.
It should probably work for any type that supports T == 0 or T == null.
We want to steer users away from this type and instead use []const u8, which includes a pointer and a length. However, we still have to deal with null terminated things from C land, which makes this useful, and some kernel interfaces. For example, we currently have this:
Having the open_c prototype be *0 const u8 would make it more type-safe. Further, we could provide an open function that supported either type for path, and if it happened to be null terminated then it could avoid the stack allocation.
We could also make the type of string literals be []0 const u8, meaning that the pointer value for the slice has a 0 after the last byte. The length would still indicate the memory before the null byte. If you slice this type then the pointer component would change from *0 const u8 to *const u8.
It would be extra helpful if automatic .h import could identify when a pointer in a function is supposed to be null-terminated, and we could emit a compile error if the user passes a pointer that is not null terminated. I'm not sure how we could detect this automatically though.
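The open and open_c split described above might be sketched like this. Everything here is an assumption for illustration: it uses the `[*:0]const u8` and `[:0]const u8` spellings of later Zig releases instead of `*0 const u8`, `openC` is a stand-in that only measures the path rather than calling a real kernel interface, and `max_path` is a hypothetical limit.

```zig
const std = @import("std");

const max_path = 256; // hypothetical limit for this sketch

/// Stand-in for a C-style API that requires a null-terminated path.
/// It returns the path length instead of touching the file system.
fn openC(path: [*:0]const u8) usize {
    return std.mem.len(path);
}

/// Takes an ordinary slice: copies it to a stack buffer and appends the 0.
fn open(path: []const u8) error{NameTooLong}!usize {
    var buf: [max_path]u8 = undefined;
    if (path.len >= buf.len) return error.NameTooLong;
    @memcpy(buf[0..path.len], path);
    buf[path.len] = 0;
    // Re-type the buffer as null terminated; safe builds verify the 0 byte.
    const z = buf[0..path.len :0];
    return openC(z.ptr);
}

/// Takes a slice already known to be null terminated: no copy is needed.
fn openZ(path: [:0]const u8) usize {
    return openC(path.ptr);
}
```

Because the terminator is part of the parameter type, passing a plain `[]const u8` to `openZ` is rejected at compile time, which is the extra type safety the proposal is after.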