make null terminated pointer syntax support any sentinel value, not just null #3731

Closed
daurnimator opened this issue Nov 20, 2019 · 14 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@daurnimator
Contributor

daurnimator commented Nov 20, 2019

Accepted Proposal


Idea: Instead of [*]null u8, have [*]sentinel(0) u8.

It's not obvious that null means 0: nowhere else in Zig do we call 0 "null".
Additionally, we could open things up to allow arbitrary values as a sentinel in the future.

Originally posted in #3728

@daurnimator daurnimator added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Nov 20, 2019
@andrewrk andrewrk added this to the 0.7.0 milestone Nov 20, 2019
@andrewrk
Member

andrewrk commented Nov 20, 2019

Here's an alternate syntax proposal:

  • *[10:0]u8 - pointer to array with sentinel 0 at len
  • [:0]u8 - slice with sentinel 0 at len
  • [*:0]u8 - unknown length pointer with sentinel 0

pub extern "c" fn printf(format: [*:0]const u8, ...) c_int;

test "allow any sentinel" {
    assert(@typeOf("hello") == *const [5:0]u8);
    var slice: [:0]const u8 = "hello";
    var ptr: [*:0]const u8 = "hello";
}

test "with enums" {
    const Number = enum {one, two, sentinel};
    var ptr: [*:.sentinel]Number = &[_:.sentinel]Number{.one, .two, .two, .one};
    comptime assert(ptr[4] == .sentinel); // comptime because index is comptime known
}

test "with optional thing" {
    var ptr: [*:null]?i32 = &[_:null]?i32{1, 2, 3, 4};
    comptime assert(ptr[4] == null); // comptime because index is comptime known
}

test "with floats" {
    const nan = std.math.nan_f32;
    var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
    comptime assert(ptr[4] == nan); // comptime because index is comptime known
}
  • basic implementation
  • indexing a sentinel-terminated array at an index that is comptime-known to equal len yields a comptime-known value when loading; storing there is a no-op with a safety check.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.6.0 Nov 20, 2019
@andrewrk
Member

andrewrk commented Nov 20, 2019

Another thing that could be part of this proposal is slicing syntax that obtains sentinel-terminated types from non-sentinel-terminated types:

const std = @import("std");
const assert = std.debug.assert;

test "obtaining a null terminated slice" {
    // here we have a normal array
    var buf: [50]u8 = undefined;

    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    // now we obtain a null terminated slice:
    const ptr = buf[0..3 :0];

    // ptr is a pointer to null-terminated array,
    // because the len was comptime known (See #863)
    comptime assert(@typeOf(ptr) == *[3:0]u8);

    var runtime_len: usize = 3;
    const ptr2 = buf[0..runtime_len :0];
    // ptr2 is a null-terminated slice
    comptime assert(@typeOf(ptr2) == [:0]u8);


    buf[3] += 1;
    _ = buf[0..3 :0]; // safety panic: slice sentinel assertion failed
}

Along with this:

  • slicing a sentinel-terminated pointer without this syntax should yield a non-sentinel-terminated result.
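
The rule above can be sketched as follows. This is a minimal illustration, not the compiler's implementation. It assumes a recent Zig release in which the syntax shipped as proposed and the builtin is spelled `@TypeOf`. `prefix` and `prefixZ` are hypothetical helper names that do not appear in this thread.

```zig
const std = @import("std");

// Hypothetical helper: a plain slice of a sentinel-terminated slice.
// With a runtime-known end, the result type carries no sentinel.
fn prefix(s: [:0]const u8, n: usize) []const u8 {
    comptime std.debug.assert(@TypeOf(s[0..n]) == []const u8);
    return s[0..n];
}

// Hypothetical helper: the `:0` form keeps the sentinel in the type.
// In safe builds this asserts that s[n] == 0.
fn prefixZ(s: [:0]const u8, n: usize) [:0]const u8 {
    return s[0..n :0];
}

test "plain slicing drops the sentinel, sentinel slicing keeps it" {
    const hello: [:0]const u8 = "hello";

    const plain = prefix(hello, 3);
    try std.testing.expectEqualStrings("hel", plain);

    // n == 5 is the only choice here that passes the check,
    // because hello[5] is the terminating 0.
    const terminated = prefixZ(hello, 5);
    try std.testing.expectEqual(@as(usize, 5), terminated.len);
    try std.testing.expectEqual(@as(u8, 0), terminated[5]);
}
```

The return type of `prefixZ` only compiles because of the `:0` in the slice expression: a `[]const u8` does not coerce to `[:0]const u8`.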

@emekoi
Contributor

emekoi commented Nov 21, 2019

related: #1838

@marler8997
Contributor

marler8997 commented Nov 21, 2019

Putting the sentinel value inside the brackets (i.e. [*:0] u8) has the benefit that you know you can't do this with a single-value pointer: *sentinel(0) u8. This is good since sentinel pointers don't make sense with single-value pointers.

@ikskuh
Contributor

ikskuh commented Nov 21, 2019

I like the semantics of "sentinel-terminated pointers", and Andrew's syntax proposal is also quite good. This would also allow safe slicing of C strings:

test "index-out-of-bounds"
{
    // here we have a normal array
    var buf: [50]u8 = undefined;

    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"
}

@mikdusan
Member

const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"

I'm not against the runtime check; I'd just like this (admittedly contrived) case in debug/safe builds to be considered:

var buf: [50]u8 = undefined;
buf[0] = 'a';
buf[1] = 'b';
buf[2] = 'c';
buf[3] = 0xa;

assert(buf[11] == 0x0a);
const ptr = buf[2..10 :0xa]; // this will not @panic

I admit this could be a non-issue as undefined handling at runtime is expected to evolve.

@andrewrk
Member

@MasterQ32 your example in #3731 (comment) would panic, but not for the reason in your comment. It would panic because buf[10] != 0. My proposal for slicing syntax with a sentinel value is that it only does an O(1) assertion: assert(ptr[len] == sentinel);
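
As a rough sketch of that O(1) assertion (an illustration only, not the compiler's code): `sentinelAt` is a hypothetical helper that reads exactly one element, the one at index len. The 0xaa fill is an assumption standing in for how debug builds populate undefined memory.

```zig
const std = @import("std");

/// Hypothetical helper: reports whether the element at index `len`
/// equals `sentinel`. It reads a single element, so the cost is O(1)
/// no matter how long the slice is.
fn sentinelAt(comptime T: type, buf: []const T, len: usize, sentinel: T) bool {
    // The sentinel slot itself must lie inside the backing buffer.
    if (len >= buf.len) return false;
    return buf[len] == sentinel;
}

test "only the element at len is inspected" {
    // Assumption: 0xaa imitates the debug-mode fill of undefined memory.
    var buf = [_]u8{0xaa} ** 50;
    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    // Corresponds to buf[0..3 :0]: buf[3] is 0, so the check passes.
    try std.testing.expect(sentinelAt(u8, &buf, 3, 0));

    // Corresponds to buf[2..10 :0]: the slice has length 8, so the
    // element checked is buf[10], which is not 0.
    try std.testing.expect(!sentinelAt(u8, buf[2..], 8, 0));
}
```

Nothing between the start of the slice and len is scanned, which is why an earlier 0 at buf[3] neither helps nor hurts the second check.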

@andrewrk andrewrk added the accepted This proposal is planned. label Nov 21, 2019
@andrewrk andrewrk changed the title Change [*]null T to [*]sentinel(0) T make null terminated pointer syntax support any sentinel value, not just null Nov 21, 2019
@ikskuh
Contributor

ikskuh commented Nov 21, 2019

@andrewrk ah okay, makes sense to not burn that much perf for safety, even in debug builds

@haoyu234

haoyu234 commented Nov 22, 2019

About nan's behavior:

  • Is nan equal to nan in the zig language?
  • [*:nan]f32, how to judge the length of the sequence? Is nan equal to nan?

In the example above, I saw that nan is used as a sentinel value.

test "with floats" {
    const nan = std.math.nan_f32;
    var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
    comptime assert(ptr[4] == nan); // comptime because index is comptime known
}

I am very confused about whether nan can be a sentinel.

@andrewrk
Member

  • Is nan equal to nan in the zig language?

The zig language's floating point formats are specified by IEEE-754. https://ziglang.org/documentation/master/#Floats

  • [*:nan]f32, how to judge the length of the sequence? Is nan equal to nan?

The memory has a particular bit pattern. The pointer type allows specifying the sentinel bit pattern. If a == b would return true then it is the sentinel value.

@daurnimator
Contributor Author

The pointer type allows specifying the sentinel bit pattern.

Okay... so same bit-patterned nan would be equal?

If a == b would return true then it is the sentinel value.

nan == nan is false (as IEEE-754 demands)... which means the comparison is not by bit pattern.

@andrewrk
Member

andrewrk commented Nov 22, 2019

Thanks, I forgot that nan != nan. My example above is no good. Maybe negative zero would be better to use for this example, but really, it's beside the point. The point is that the compiler lets you use any value, provided that the == operator is allowed for the type, and that a == a holds for all possible values of the type.

I think it could even work fine for float types. The safety assertion would use the isNan operator rather than == for this case.
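
A minimal sketch of what such a check might look like for f32, assuming a recent Zig standard library where NaN is obtained with `std.math.nan(f32)` and tested with `std.math.isNan`. `matchesSentinel` is a hypothetical helper, not something the compiler exposes.

```zig
const std = @import("std");

/// Hypothetical helper: sentinel comparison for f32 that falls back to
/// isNan when the sentinel itself is NaN, since NaN == NaN is false.
fn matchesSentinel(value: f32, sentinel: f32) bool {
    if (std.math.isNan(sentinel)) return std.math.isNan(value);
    return value == sentinel;
}

test "a NaN sentinel needs isNan rather than ==" {
    const nan = std.math.nan(f32);

    // IEEE-754: NaN compares unequal to everything, including itself.
    try std.testing.expect(nan != nan);

    try std.testing.expect(matchesSentinel(nan, nan));
    try std.testing.expect(!matchesSentinel(1.1, nan));

    // Also IEEE-754: -0.0 == 0.0, so a negative-zero sentinel compared
    // with == would match positive zero as well.
    try std.testing.expect(matchesSentinel(-0.0, 0.0));
}
```

This compares by value class rather than by bit pattern, so any NaN payload would match a NaN sentinel. Whether that is the desired behavior is not settled in this thread.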

@andrewrk andrewrk changed the title make null terminated pointer syntax support any sentinel value, not just null slicing syntax which can obtain sentinel terminated types from non-sentinel-terminated types Nov 25, 2019
@daurnimator
Contributor Author

Done as part of #3728

@andrewrk
Member

I moved the unfinished parts of #3731 (comment) and #3731 (comment) to #3770.

@andrewrk andrewrk changed the title slicing syntax which can obtain sentinel terminated types from non-sentinel-terminated types make null terminated pointer syntax support any sentinel value, not just null Nov 25, 2019

7 participants