proposal: type for null terminated pointer #265

andrewrk · 2017-02-23T21:31:32Z

Currently the type of c"aoeu" is *const u8.

Instead, the type should indicate that the pointer is null terminated. Here are two ideas to represent that:

*0 const u8
*null const u8

This type would be implicitly castable to *const u8. You can explicitly cast the other way, and in debug mode this inserts a safety check to make sure there actually is a null byte there.

It should probably work for any type that supports T == 0 or T == null.

We want to steer users away from this type and toward []const u8, which includes a pointer and a length. However, we still have to deal with null-terminated data from C land and from some kernel interfaces, which makes this type useful. For example, we currently have this:

pub fn open_c(path: *const u8, flags: usize, perm: usize) -> usize {
    arch.syscall3(arch.SYS_open, usize(path), flags, perm)
}

pub fn open(path: []const u8, flags: usize, perm: usize) -> usize {
    const buf = @alloca(u8, path.len + 1);
    @memcpy(&buf[0], &path[0], path.len);
    buf[path.len] = 0;
    return open_c(buf.ptr, flags, perm);
}

Having the open_c prototype be *0 const u8 would make it more type-safe. Further, we could provide an open function that supported either type for path, and if it happened to be null terminated then it could avoid the stack allocation.

We could also make the type of string literals be []0 const u8 meaning that the pointer value for the slice has a 0 after the last byte. The length would still indicate the memory before the null byte. If you slice this type then the pointer component would change from *0 const u8 to *const u8.

It would be extra helpful if automatic .h import could identify when a pointer in a function is supposed to be null-terminated, and we could emit a compile error if the user passes a pointer that is not null terminated. I'm not sure how we could detect this automatically though.

thejoshwolfe · 2017-04-03T20:38:13Z

> in debug mode this inserts a safety check to make sure there actually is a null byte there.

How would that work? Wouldn't that be an unbounded linear search?

andrewrk · 2017-04-03T20:40:17Z

Maybe if you cast from []T to []0 T, then we check for 0 in the last spot (since we have len), and in the new slice, len is the old len minus 1.

That's true though, for pointers we can't really have this check.

andrewrk · 2017-04-03T22:11:58Z

Proposal rejected in order to discourage use of null terminated things.

AndreaOrru · 2017-05-09T17:02:50Z

How do you plan to interface with C-style null-terminated strings?

andrewrk · 2017-05-09T18:55:15Z

Can you elaborate on this question? We have the cstr module for getting string length and converting to a slice.

AndreaOrru · 2017-05-10T09:22:43Z

Solved thanks to that.

thejoshwolfe · 2017-10-02T16:45:02Z

Reopening for #518.

My proposal is to leave the type unnamed; you have to refer to it as @typeOf(c""). This discourages people from using the type, and it clearly associates the type with the c"" syntax.

Instances of the type should not be usable in any way, except for implicitly converting to &const u8. No slicing syntax x[0..5], no array subscripting syntax x[0], no field access x.foo.

Literals with c"foo" syntax should obviously be of this type. Nothing implicitly casts to this type. Creating an instance of this type should be done through @typeOf(c"")(x), where x is of type []const u8. This is the only explicit conversion that should be allowed. In safety mode, the conversion should do this safety check:

assert(x.len > 0);
for (x[0..x.len - 1]) |b| {
    assert(b != 0);
}
assert(x[x.len - 1] == 0);

There is no way to create a @typeOf(c"") directly from a pointer. You have to slice the pointer, explicitly giving it a length, and then convert it. This way the safety check is always bounded.
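For comparison, here is a rough C sketch of the same bounded check. The function name `is_valid_cstr_image` is made up for illustration; it is not part of the cstr module or of any proposal in this thread. It takes an explicit length, so the scan can never run past the memory the caller vouched for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the three asserts above.
 * Returns true when buf[0..len) is non-empty, has no NUL among its
 * first len - 1 bytes, and has a NUL as its last byte. */
static bool is_valid_cstr_image(const char *buf, size_t len) {
    if (len == 0) {
        return false; /* corresponds to assert(x.len > 0) */
    }
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\0') {
            return false; /* interior NUL */
        }
    }
    return buf[len - 1] == '\0'; /* terminator in the last slot */
}
```

The cost is linear in `len`, but it is bounded by a value the caller supplied rather than by wherever a zero byte happens to be.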

There are many cases where you don't want to bother with this safety check, but you still want a @typeOf(c""). For that we can add these functions to the cstr module:

pub fn fromSliceUnsafe(x: []const u8) -> @typeOf(c"") {
    @setDebugSafety(this, false);
    return @typeOf(c"")(x);
}
pub fn fromPointerUnsafe(x: &const u8) -> @typeOf(c"") {
    return fromSliceUnsafe(x[0..0]);
}

The name "Unsafe" should properly inform people that they're taking a risk using these utilities.

I'm not sure we need any solution to null-terminated arrays of things other than u8. These cases exist, but I don't think we need special support for them.

andrewrk · 2017-10-02T17:27:56Z

I think this counter proposal has 2 competing ideas:

1. Introduce a type safety feature that makes it more ergonomic to have safer code.
2. Make the feature intentionally ugly so that using it is unattractive.

These don't go well together. I think the original proposal is better, where the type is named.

There's also no reason to limit it to u8. A null terminated array is not inherently an evil C concept that is intruding in the Zig language. It's a general data storage technique that is valid for some memory constrained use cases.

I can probably find a Windows API that uses NULL as a sentinel for an array of pointers. The argv in libc main would be represented with &null ?&null u8. (And not just libc - this is how it is represented in the official x86_64 ABI specification).

> There is no way to create a @typeOf(c"") directly from a pointer. You have to slice the pointer, explicitly giving it a length, and then convert it. This way the safety check is always bounded.

So the generated code would first have to do essentially a strlen to find the length, use that to convert to a slice, then cast that to the null-terminated type, and then the debug safety check would do a redundant strlen? That doesn't seem right.

If we had the []null u8 type as originally proposed, this would be more straightforward. I think a better way this can go is:

1. Set the null terminator in memory.
2. Cast the &u8 to []null u8. This cast does a linear search for the terminator and sets len appropriately.
3. Now @typeOf(slice.ptr) == &null u8
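The cast described in these steps can be sketched in C roughly as follows. `struct term_slice` and `term_slice_from_ptr` are hypothetical names standing in for []null u8 and the cast; they are assumptions for illustration, not the eventual Zig implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a null-terminated slice: a pointer plus a len
 * that excludes the terminator, so ptr[len] == 0 holds by construction. */
struct term_slice {
    const char *ptr;
    size_t len;
};

/* Performs the single linear search for the terminator and records len.
 * Assumes ptr really is NUL-terminated; nothing here can verify that. */
static struct term_slice term_slice_from_ptr(const char *ptr) {
    size_t len = 0;
    while (ptr[len] != '\0') {
        len++;
    }
    return (struct term_slice){ ptr, len };
}
```

The search happens exactly once, at the cast, and later code can use `len` without rescanning.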

andrewrk · 2017-10-16T01:31:15Z

From llvm/include/llvm/Support/MemoryBuffer.h

/// This interface provides simple read-only access to a block of memory, and
/// provides simple methods for reading files and standard input into a memory
/// buffer.  In addition to basic access to the characters in the file, this
/// interface guarantees you can read one character past the end of the file,
/// and that this character will read as '\0'.
///
/// The '\0' guarantee is needed to support an optimization -- it's intended to
/// be more efficient for clients which are reading all the data to stop
/// reading when they encounter a '\0' than to continually check the file
/// position to see if it has reached the end of the file.
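
As an illustration of the optimization that comment describes, here is a minimal C sketch of a reader that stops on '\0' instead of comparing against an end position. `count_lines` is a made-up example function, not part of LLVM.

```c
#include <assert.h>
#include <stddef.h>

/* Counts '\n' bytes in a buffer. The loop's only exit test is the
 * terminator itself, so it relies on a readable '\0' after the data. */
static size_t count_lines(const char *p) {
    size_t n = 0;
    for (; *p != '\0'; p++) {
        if (*p == '\n') {
            n++;
        }
    }
    return n;
}
```

The loop carries no separate position or length variable, which is the saving the guarantee is meant to enable.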

jido · 2018-03-27T21:16:57Z

This may be a better place to put my comment:
Does the null/0 have to be at index x.len? C strings are stored in fixed-length arrays, but the string length can vary; it is not necessarily equal to the array size. The same applies to null-terminated C arrays.

Hejsil · 2018-03-27T21:38:52Z

@jido In C, nothing prevents you from removing all null terminators from a char array and then passing it to strlen. This is the motivation for the null-terminated type. The compiler will insert a null terminator at x.len to prevent this bug. You, as a user of a null-terminated type, can still insert null terminators between 0 and x.len, and strlen will stop at your null terminator instead of the one at x.len. The one at x.len is there for safety, and trying to overwrite it is undefined behavior (aka a runtime crash in debug mode).

andrewrk · 2018-06-06T04:51:03Z

andrewrk · 2019-02-16T06:37:42Z

I would definitely like to see that happen whichever way null-terminated pointers are implemented.

This is definitely an interesting proposal which I would invite you to file separately from this null terminated pointers issue (and indeed if it was accepted, it could potentially reverse the decision on this one).

I think you've made a pretty strong case for a userland null-terminated pointer. It's enough to make me go back and consider all the use cases that I have for it. I don't know if I'm ready to reverse the "accepted" label here yet, but I'll admit that I'm going back and questioning this decision.

matthew-mcallister · 2019-02-16T07:31:48Z

> This is definitely an interesting proposal which I would invite you to file separately from this null terminated pointers issue (and indeed if it was accepted, it could potentially reverse the decision on this one).

That sounds good! I'd be happy to write up a proposal and work on implementing it as well.

Also, I just tried this out in Compiler Explorer (updated) and it seems like a one-member extern struct already gets passed the same way as a plain int on x86_64, so the suggestion here could be tentatively evaluated outside stdlib without waiting for that idea to be implemented.

matthew-mcallister · 2019-02-16T07:44:54Z

> In principle, you can always make the type template function more flexible and add typedefs for common cases.

For the record, what I meant by this was that the template would take a struct specifying pointer attributes, and you'd write one typedef per set of parameters. Without working out details, something like

const PtrAttrs = struct {
    is_const: bool,
    is_volatile: bool,
    // etc.
};
// TermPtr is the hypothetical type-returning function ("type template function").
const c_str_const = TermPtr(u8, 0, PtrAttrs{ .is_const = true, .is_volatile = false });
const w_str_mut = TermPtr(u16, 0, PtrAttrs{ .is_const = false, .is_volatile = false });
// etc.

This should generalize to various custom pointer types.

Rocknest · 2019-02-16T19:16:29Z

The ability to define types such as 0-terminated pointers would be a powerful addition to the language. Also, it would probably solve #1595.

How it could look:

pub fn TermPtr(comptime T: type, comptime termValue: T) type {
    return @TransparentStruct(struct { // builtin ensures that there is only one field
        ptr: [*]T, // could be any type, for example 'c_int' to pass type safe values to some c api
        fn from(comptime slice: []T) @This() {
            return @This() { .ptr = (slice ++ termValue).ptr };
        }
    });
}

Not sure about modifiers such as const, align. Maybe something like this:

pub fn TermPtr(comptime T: type, comptime termValue: T) type {
    return @TransitiveStruct(struct { // allows to forward type modifiers from the outside
        ptr: [*]T,
        // methods
    });
}
///////
const c_str = TermPtr(u8, 0x00);
extern fn someApi(path: const c_str); // the type here is '[*]const T', but it is type safe

It could even allow using operators on such a type; however, that should be optional. For example, the [index] operator on c_str is ok, but + on a type-safe c_int wrapper for a C API is not ok.

andrewrk · 2019-02-19T00:17:10Z

@matthew-mcallister It seems important to me that types that can be represented in C can be represented in the Zig language without having to import any code. This precedent is set for C integer types, float types, and C pointer types. There must be a C string literal, or Zig string literals must work for C APIs.

daurnimator · 2019-02-19T04:15:33Z

Something I hadn't considered yet:
* Would the null terminated slice/array types assert that a null/0 byte does _not_ occur in any of the elements before the `len` index?
If so, this would guarantee the property that after casting a []null T to [*]null T, finding the length based on the null termination would give the same len value as before. However this would mean that casting []null T to [*]null T should have a runtime safety check, which probably means it shouldn't be an implicit cast. Hmm.

Or the type could have a weaker guarantee, which is only that there shall be a null/0 byte at the len index, and makes no guarantees about items otherwise. However, then the "length" may change when implicitly casting from []null T to [*]null T.
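
The difference between the two guarantees can be seen in a small C sketch. Both function names are made up for illustration: `has_interior_nul` is the check the stronger guarantee would need, and `rescanned_len` is the length that code holding only the bare pointer would compute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Strong-guarantee check: is there a terminator among the first len bytes? */
static bool has_interior_nul(const char *ptr, size_t len) {
    return memchr(ptr, '\0', len) != NULL;
}

/* Length recovered from the pointer alone, by scanning to the first NUL. */
static size_t rescanned_len(const char *ptr) {
    return strlen(ptr);
}
```

For the six bytes of "ao\0eu" with a stored len of 5, `has_interior_nul` returns true and `rescanned_len` returns 2, so under the weaker guarantee the length changes once only the pointer remains.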

Continuing from IRC:

To echo the common usage of null terminated strings, I think the length should always be computed at runtime (at least before the optimizer kicks in). I propose:

null slices

[]null T (null slice) should be struct { ptr: [*]T } where .len performs a strlen-like operation.
[]null T implicitly casts to [*]T
[]null T can be 'cast' to [*]T by simply doing nullslice.ptr
[*]T to []null T needs an explicit cast (via @ptrCast)
[]null T can be 'cast' to []T via nullslice[0..nullslice.len]. The .len here is invoking a strlen-like operation.
For safety, you could make indexing a null slice do a length check: null_u8_slice[5] could emit code that does: assert(strnlen(null_u8_slice.ptr, 5) == 5) before the access.
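
The indexing check in the last point could look roughly like this in C. `index_is_reachable` is a hypothetical name, and the explicit loop is used instead of strnlen so the sketch depends only on standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when no NUL occurs in ptr[0..i), which is the same condition as
 * strnlen(ptr, i) == i. Index i itself may then be an element or the
 * terminator, so reading it stays inside the string's storage. */
static bool index_is_reachable(const char *ptr, size_t i) {
    for (size_t k = 0; k < i; k++) {
        if (ptr[k] == '\0') {
            return false;
        }
    }
    return true;
}
```

Each checked access costs a scan of up to `i` bytes, so a loop over the whole string with this check enabled would be quadratic unless the optimizer removes the redundant scans.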

null arrays

[N]null T is an array of max size N where the first null should be considered the length.
It is similar to []null T, except that it uses a strnlen-like operation instead of a strlen-like one.
It is valid to read [N]null T at N. You are guaranteed to get null.
[N]null T uses @sizeOf(T)*(N+1) space (or possibly less depending on array packing if T is e.g. u1?)
[N]null T implicitly casts to []null T
[N]null T implicitly casts to [*]T
It is a compile error to write to index N.
It is valid to read or write to a [N]null T at any index in the range [0..N)

null literals

c"aoeu" is of type [4]null const u8
(c"aoeu").len == 4
@sizeOf(c"aoeu") == 5
(c"ao\0eu").len == 2
@sizeOf(c"ao\0eu") == 6

misc notes

[*]null T doesn't exist. The length of a null array is always "known": it's before the first null!

matthew-mcallister · 2019-02-20T05:03:43Z

@daurnimator Why not keep [*]null T/[*c]null T and dispense with the others? If the intended purpose of this feature is to decorate pointers in C FFI definitions, then that would be the minimal solution, plus it would encourage use of "real" slices in Zig code and discourage overuse of strlen.

I feel like null-terminated literals can still conceivably be handled well by a builtin function. Say c"hello" produces a raw [*]null u8 pointer. Then a hypothetical @cstrToSlice function can safely check for the null and make a Zig slice at compile time. Or all string literals can have an implicit terminating null and @sliceToCstr will check that its argument has the null (and no interior nulls?) and return a [*]null u8.

andrewrk · 2019-02-20T12:41:01Z

Why would we need @cstrToSlice? std.mem.len already works fine; it'll just have an improved prototype that uses [*]null T instead of [*]const T. @sliceToCstr as you described it could also easily be a userland function.

matthew-mcallister · 2019-02-20T23:35:25Z

I meant those names as placeholders; I only added the @ to satisfy the requirement that you wouldn't need to import from std. As far as my suggestion goes, it's agnostic to how they're implemented.

Androthi · 2019-04-15T00:57:29Z

marler8997 · 2019-06-18T17:57:08Z

Thought it might be helpful to share my experience with this problem. In D I implemented a module to support null-terminated strings.

https://github.com/dragon-lang/mar/blob/master/src/mar/sentinel.d

The general term I've found for arrays that end in a particular value is "sentinel array" and/or "sentinel ptr". In D I just implemented them as a wrapper struct around pointers and arrays.

The 2 main benefits I see from having sentinel pointers as a part of the type system are:

functions that take sentinel pointers can declare this requirement, meaning that if a client passes a non-sentinel pointer then they will get a compile error rather than a runtime bug
it allows the application to control when and how sentinel arrays are allocated, rather than having to convert normal zig arrays to sentinel arrays every time a C function is called

My library solution for this in D would be equivalent to something like this in Zig:

const std = @import("std");

pub fn SentinelPtr(comptime T: type) type {
    return struct {
        ptr: [*]T,
        // create a sentinel pointer from `ptr`, assume it ends in a sentinel value
        pub fn init(ptr: [*]T) @This() {
            return @This() {
                .ptr = ptr,
            };
        }
    };
}
pub fn SentinelSlice(comptime T: type) type {
    return struct {
        array: []T,
        // create a sentinel slice from `slice`
        pub fn init(slice: []T) @This() {
            std.debug.assert(slice.ptr[slice.len] == 0);
            return assume(slice);
        }
        // create a sentinel slice from `slice`, assume it ends in a sentinel value
        pub fn assume(slice: []T) @This() {
            return @This() {
                .array = slice,
            };
        }
    };
}

It just boils down to wrapping the pointers/slices inside structs and creating helper functions to create/unwrap them.

This is one way to implement it. If you do it in a library like this, though, it will be a bit more verbose than a language solution, and you'll probably want to find a way to allow the types to perform automatic const conversion, e.g.

var chars: [2]u8 = undefined;
chars[0] = 'a';
chars[1] = 0; // sentinel
var x = SentinelPtr(u8).init(&chars);
var y: SentinelPtr(const u8) = x; // is there a way to support this in zig?

hryx · 2019-07-03T01:45:31Z

Just to extend and visualize @Rocknest's syntactical proposal:

[]const u8
[*]const u8
[5]const u8
[_]const u8
[*c]const u8
[null]const u8
[*null]const u8
[5 null]const u8
[_ null]const u8

I appreciate having the null inside the brackets because it describes a property of the array/slice type itself (like 5/_/*), as opposed to the const which qualifies the element type.
(Minor thing) 🐤

daurnimator · 2019-07-03T01:48:28Z

> I appreciate having the null inside the brackets because it describes a property of the array/slice type itself (like 5/_/*), as opposed to the const which qualifies the element type.

But e.g. align is on the outside of the []

hryx · 2019-07-03T01:49:27Z

> But e.g. align is on the outside of the []

Ah, true

andrewrk · 2019-11-25T07:26:23Z

Implemented in #3728, landed in 5a98dd4.

Related follow-up issues:

make null terminated pointer syntax support any sentinel value, not just null #3731
slicing sentinel-terminated array or pointer with .. as end index should give a sentinel-terminated slice #3766
Finish taking advantage of sentinel-terminated pointers in the standard library #3767
remove type coercion from array values to references #3768

andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Feb 23, 2017

andrewrk modified the milestone: 0.2.0 Mar 26, 2017

andrewrk closed this as completed Apr 3, 2017

tiehuis added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. rejected labels Sep 15, 2017

andrewrk mentioned this issue Oct 2, 2017

usecase: memory-bound design using fixed-size allocators #518

Closed

thejoshwolfe reopened this Oct 2, 2017

andrewrk removed the rejected label Oct 2, 2017

andrewrk added the accepted This proposal is planned. label Oct 14, 2017

andrewrk mentioned this issue Dec 6, 2017

Add endianness as one of the pointer properties #649

Closed

andrewrk mentioned this issue Dec 15, 2017

design flaw: Access out of bounds on pointer to array #386

Closed

andrewrk mentioned this issue Feb 11, 2018

Pointer Reform #770

Closed

andrewrk modified the milestones: 0.2.0, 0.3.0 Feb 28, 2018

andrewrk modified the milestones: 0.3.0, 0.4.0 Jul 18, 2018

andrewrk mentioned this issue Aug 9, 2018

overhaul std.fmt formatting api #1358

Closed

andrewrk modified the milestones: 0.4.0, 0.5.0 Mar 1, 2019

emekoi mentioned this issue May 6, 2019

WindowsDynLib lookup function should pass a C String to GetProcAddress #2435

Closed

This was referenced May 9, 2019

translate-c: ability to annotate types in C code for better translation #2457

Open

Redis module header file does not translate successfully #2451

Closed

andrewrk mentioned this issue May 27, 2019

rework the API layers between the standard library and the operating system #2527

Merged

5 tasks

andrewrk mentioned this issue Jun 18, 2019

Fix windows create process retry/path search #2705

Merged

andrewrk mentioned this issue Sep 6, 2019

added dynamic library loading via libdl #2598

Closed

andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019

This was referenced Nov 20, 2019

sentinel-terminated pointers #3728

Merged

remove type coercion from array values to references #3768

Closed

andrewrk closed this as completed Nov 25, 2019

andrewrk mentioned this issue Mar 13, 2020

Use std.Buffer even less #4665

Closed

BuzzwordChief mentioned this issue Jul 16, 2021

Docs fix array/pointer/slice type coercion section #9392

Merged
