overhaul std.fmt formatting api #1358

Closed
tiehuis opened this issue Aug 9, 2018 · 23 comments
Labels
accepted This proposal is planned. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.

@tiehuis
Member

tiehuis commented Aug 9, 2018

Remaining Work.

This is a proposal for the formatting interface exposed via std.fmt and which
shows up in most printing functions (e.g. std.debug.warn).

This is largely based on Rust's std::fmt (which in turn is similar to Python3) so see that for a more in-depth reference for certain parts.

Formatting Options

We do take the following formatting options from Rust:

  • positional parameters ("{0}").
  • alignment ("{:<} {:<5} {:0^10}")
  • width ("{:5}")
  • precision ("{:.5} {:.0}")
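As a rough illustration of the adopted options, here is a minimal sketch in the syntax that std.fmt uses in recent Zig releases, where the type specifier sits before the colon (so Rust's "{:<5}" is spelled "{d:<5}" for an integer). The helper name alignDemo is hypothetical, and the exact spelling may differ between Zig versions.

```zig
const std = @import("std");

// Hypothetical helper: formats one integer left-aligned, right-aligned,
// and centered with '*' as the fill character.
fn alignDemo(buf: []u8, n: u32) ![]u8 {
    return std.fmt.bufPrint(buf, "[{d:<5}] [{d:>5}] [{d:*^6}]", .{ n, n, n });
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    // Prints: [42   ] [   42] [**42**]
    std.debug.print("{s}\n", .{try alignDemo(&buf, 42)});
    // Precision limits the number of digits after the decimal point.
    std.debug.print("{d:.3}\n", .{@as(f64, 3.14159)});
}
```

The fill character is only recognised when an alignment character follows it, which is why the centered case is written as "*^6" rather than "*6".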

We do not take the following:

  • # alternate printing forms
  • +, -, 0 sign flags (NOTE: may actually want these)
  • named parameters (format!("{arg1}", arg1 = "example"))
  • runtime specified precision (format!("{:.*}", 3, 5.0923412)) (NOTE: could add this in if reasonable demand)
  • numbered argument specified precision (format!("{0:1$}", 5.0923412, 3))

Format Specifiers

These are largely unchanged, with a few exceptions:

  • {} (primitives) print the default primitive representation (if it exists)
  • {c} (int): print as an ascii character
  • {b} (int): print as binary
  • {x} (int): print as lowercase hex
  • {X} (int): print as uppercase hex
  • {o} (int): print as octal
  • {e} (float): print in exponent form
  • {d} (int/float): print in base10/decimal form
  • {s} ([]u8/*u8): print as null-terminated string
  • {*} (any): print as a pointer (hex) (NOTE: does & make more sense here?)
  • {?} (any): print full debug representation (e.g. traverse structs etc to primitive fields)
  • {#} (any): print raw bytes of the value (hex) (NOTE: do we need this? how often is it used?)

These format specifiers are removed from the current implementation:

  • {.} (float): was to specify decimal float, now {d} replaces this
  • {e10} (float): precision was attached to format specifier. The new format
    specifier type would replace this.
  • {B} (any): printed raw bytes of value, replaced by {#}. This is to
    ensure it cannot be shadowed by a user defined function.

User-defined functions

Alongside this I propose a change in the way format functions are defined.

The current function to implement is of the form:

pub fn format(
    self: *SelfType,
    comptime fmt: []const u8,
    context: var,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!void;

I instead propose changing this to be of the form:

pub fn format(
    self: *SelfType,
    comptime fmt_spec: ?u8,
    context: var,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!?void {
    // This is enforced within `std.fmt`.
    std.debug.assert(fmt_spec == null or
        ('a' <= fmt_spec.? and fmt_spec.? <= 'z') or
        ('A' <= fmt_spec.? and fmt_spec.? <= 'Z')
    );
}

Format specifiers should be simple and ensuring they are only 1 character
at least enforces consistency and simpler format strings. This also makes
switching on the format cases much easier for an implementation and avoids
some easy edge cases.

The format specifier argument is null for the {} case.

If the function does not handle the format specifier, it can return null and
std.fmt will produce an appropriate message.

Old Example

const Vec2 = struct {
    x: f32, y: f32,

    pub fn format(
        self: *Vec2,
        comptime fmt: []const u8,
        context: var,
        comptime Errors: type,
        output: fn (@typeOf(context), []const u8) Errors!void,
    ) Errors!void {
        if (fmt.len > 0) {
            if (fmt.len > 1) {
                unreachable;
            }

            switch (fmt[0]) {
                // point format
                'p' => return std.fmt.format(context, Errors, output, "({.3},{.3})", self.x, self.y),
                // dimension format
                'd' => return std.fmt.format(context, Errors, output, "{.3}x{.3}", self.x, self.y),
                else => unreachable,
            }
        }
        return std.fmt.format(context, Errors, output, "({.3},{.3})", self.x, self.y);
    }
};

New Example

const Vec2 = struct {
    x: f32, y: f32,

    pub fn format(
        self: *Vec2,
        comptime fmt_spec: ?u8,
        context: var,
        comptime Errors: type,
        output: fn (@typeOf(context), []const u8) Errors!void,
    ) Errors!?void {
        switch (fmt_spec) {
            // point format
            null, 'p' => return std.fmt.format(context, Errors, output, "({:.3},{:.3})", self.x, self.y),
            // dimension format
            'd' => return std.fmt.format(context, Errors, output, "{:.3}x{:.3}", self.x, self.y),
            // unhandled format
            else => return null,
        }
    }
};

One extra thing that comes to mind is whether we want to allow access to the
formatting specifiers for user-defined functions, passing the values to each.

An example use-case for the above would be allowing access to the precision
field and printing the vector components with that precision instead of
hardcoding. One concern is format functions don't necessarily have to use that
information for the correct purpose and could use it poorly. This is minor,
though.

Shortcomings/Extras

Left-side format-specifier type

With this proposal {s} becomes {:s}. Is this fine? Since we only accept one
character and don't want named arguments, we could put this on the left side of
: alongside the positional argument. This would mean the common case is the
same as now and fairly clean. With a positional parameter this would change from:

"{0:s} {2} {:b}" -> "{0s} {2} {b}"

This is still unambiguous.

Grammar

format-string := <text> (maybe-format <text>)*
maybe-format := "{{" | "}}" | format
format := '{' argument? (':' format-spec)? '}'
argument := integer? type-spec

type-spec := [a-zA-Z*#?]

format-spec := (fill? align)? width? ('.' precision)?
fill := character
align := '<' | '^' | '>'
width := integer
precision := integer
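To make the grammar concrete, here is a minimal sketch of a parser for the text between '{' and '}'. It is not the std.fmt implementation: Placeholder, parseNumber, isAlign, isTypeSpec and parsePlaceholder are all hypothetical names, and the sketch is slightly more lenient than the grammar in that it also accepts a bare index such as "2", which the example string above uses.

```zig
const std = @import("std");

// Hypothetical result type; the field names are illustrative only.
const Placeholder = struct {
    arg_index: ?usize = null,
    type_spec: ?u8 = null,
    fill: u8 = ' ',
    alignment: ?u8 = null,
    width: ?usize = null,
    precision: ?usize = null,
};

// Reads a run of decimal digits starting at pos, or returns null if none.
fn parseNumber(s: []const u8, pos: *usize) ?usize {
    const start = pos.*;
    var n: usize = 0;
    while (pos.* < s.len and std.ascii.isDigit(s[pos.*])) : (pos.* += 1) {
        n = n * 10 + (s[pos.*] - '0');
    }
    return if (pos.* == start) null else n;
}

fn isAlign(c: u8) bool {
    return c == '<' or c == '^' or c == '>';
}

// type-spec := [a-zA-Z*#?]
fn isTypeSpec(c: u8) bool {
    return (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or
        c == '*' or c == '#' or c == '?';
}

fn parsePlaceholder(s: []const u8) !Placeholder {
    var p = Placeholder{};
    var i: usize = 0;

    // argument := integer? type-spec
    p.arg_index = parseNumber(s, &i);
    if (i < s.len and s[i] != ':') {
        if (!isTypeSpec(s[i])) return error.InvalidFormat;
        p.type_spec = s[i];
        i += 1;
    }
    if (i == s.len) return p;
    if (s[i] != ':') return error.InvalidFormat;
    i += 1;

    // format-spec := (fill? align)? width? ('.' precision)?
    if (i + 1 < s.len and isAlign(s[i + 1])) {
        p.fill = s[i];
        p.alignment = s[i + 1];
        i += 2;
    } else if (i < s.len and isAlign(s[i])) {
        p.alignment = s[i];
        i += 1;
    }
    p.width = parseNumber(s, &i);
    if (i < s.len and s[i] == '.') {
        i += 1;
        p.precision = parseNumber(s, &i) orelse return error.InvalidFormat;
    }
    if (i != s.len) return error.InvalidFormat;
    return p;
}

pub fn main() !void {
    const p = try parsePlaceholder("0s:0>8.3");
    // Prints: arg=0 type=s fill=0 width=8 precision=3
    std.debug.print("arg={d} type={c} fill={c} width={d} precision={d}\n", .{
        p.arg_index.?, p.type_spec.?, p.fill, p.width.?, p.precision.?,
    });
}
```

Checking for an alignment character one position ahead is what lets a digit such as '0' act as a fill character without being mistaken for the start of the width.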

End

Feel free to make any other suggestions and/or highlight any issues. I'd
prefer to keep this as simple as reasonable as long as it covers all the common
use-cases reasonably.

@tiehuis tiehuis added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 9, 2018
@tiehuis tiehuis added this to the 0.4.0 milestone Aug 9, 2018
@kristate
Contributor

kristate commented Aug 9, 2018

{#} will be useful in writing network applications when we need to debug what was sent to and from the line.

@thejoshwolfe
Contributor

> {#} will be useful in writing network applications when we need to debug what was sent to and from the line.

it should be restricted to packed types then.

@andrewrk andrewrk added the accepted This proposal is planned. label Aug 9, 2018
@andrewrk
Member

andrewrk commented Aug 9, 2018

I'd like to further propose:

{s16LE} would be common for printing Windows "wide character" strings.

@thejoshwolfe
Contributor

runtime zfilling is useful. i wanted that feature for this project: https://github.com/thejoshwolfe/hexdump-zip . when that tool was written in javascript, i would determine the digit count for the highest memory address value (which depends on the user-provided input file size), then zfill all memory address representations to that width. the zig implementation of that tool can't easily do that, so i just zfill everything to the maximum conceivable memory address, which is way bulky.

kristate added a commit to kristate/zig that referenced this issue Sep 1, 2018
Supports {x} for lowercase and {X} for uppercase;
kristate added a commit to kristate/zig that referenced this issue Sep 1, 2018
andrewrk added a commit that referenced this issue Sep 1, 2018
allow bytes to be printed-out as hex (#1358)
@tiehuis
Member Author

tiehuis commented Jan 10, 2019

Small note but the api should take an extra FormatOptions struct which can be used when printing user-formatted types.

pub const Alignment = enum {
    Left,
    Center,
    Right,
};

pub const FormatOptions = struct {
    width: ?usize,
    precision: ?usize,
    // `align` is a reserved word in Zig, so the field is spelled out.
    alignment: Alignment,
};

pub fn format(
    self: *Vec2,
    comptime fmt_spec: ?u8,
    context: var,
    options: FormatOptions,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!?void {
    try output(context, "(");
    // printFloat is a hypothetical helper that honours `options`.
    try printFloat(context, Errors, output, self.x, options);
    try output(context, ",");
    try printFloat(context, Errors, output, self.y, options);
    try output(context, ")");
}

@radek-senfeld

I'm toying with Zig on a STM32F103C8 MCU (64kB/128kB flash, 20kB RAM) aka Bluepill. So far it's been a success story after overcoming some road bumps (see #1290 for reference)! Thank you guys for your work! Zig is the kind of fun I was missing in my life recently! 😄

For debugging purposes I've used std.fmt and encountered a strange problem. The following example freezes the MCU:

const value: u32 = 5;
debug.message("test! value: {}", value);

The problem is in https://github.com/ziglang/zig/blob/master/std/fmt.zig

const max_int_digits = 65;

fn formatIntUnsigned(
...
var buf: [max_int_digits - 1]u8 = undefined;
...
)

For a yet unknown reason the buf: [64]u8 is too big. I've tried smaller sizes and [32]u8 works OK.

I thought that the stack overwrites something but the stack pointer should be properly initialized to 0x20005000 at startup. The call is not particularly deep so unless there's something I'm missing, it shouldn't be the cause. Later on I'm going to set up a proper debugging environment and inspect deeper.

I'm writing this as a reminder and a hurray that Zig can be used just fine on resource constrained platforms for embedded projects. Could you please bear it in mind when designing the standard library?

I think a unified [64]u8 buffer is overkill for such a tiny 32-bit micro. I don't know if there's already some plan to target and optimize the libraries for use on these devices. I can imagine a [16]u8 buffer would be just fine in most cases? This could also include smaller data-types everywhere else if possible.

@thejoshwolfe
Contributor

thejoshwolfe commented Mar 19, 2019

the max_int_digits is probably for signed 64 bit base 2:

-0x8000000000000000 =>
"-1000000000000000000000000000000000000000000000000000000000000000"

indeed that seems overkill if we know that we're not going to need that. Seems like a straightforward optimization to make with comptime code that uses the bitcount of the integer type.
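A minimal sketch of that idea, assuming a recent Zig release: size the scratch buffer from the integer type's bit count at compile time, so a u8 needs 9 bytes instead of 64. This is not the code that was later committed to std.fmt; maxBinaryDigits and formatBinary are hypothetical names.

```zig
const std = @import("std");

// Hypothetical helper: worst-case character count for printing any value
// of integer type T in base 2 (one character per bit, plus one for a
// possible '-' sign).
fn maxBinaryDigits(comptime T: type) usize {
    return @bitSizeOf(T) + 1;
}

// Formats value in binary into a buffer whose size depends on T.
fn formatBinary(comptime T: type, value: T, buf: *[maxBinaryDigits(T)]u8) []const u8 {
    // The buffer is sized for the worst case, so this cannot run out of space.
    return std.fmt.bufPrint(buf, "{b}", .{value}) catch unreachable;
}

pub fn main() void {
    var small: [maxBinaryDigits(u8)]u8 = undefined; // 9 bytes
    // Prints: 1101
    std.debug.print("{s}\n", .{formatBinary(u8, 13, &small)});
}
```

For i64 the same helper yields 65, which matches the sign-plus-64-digits case described above.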

andrewrk added a commit that referenced this issue Mar 19, 2019
@andrewrk
Member

andrewrk commented Mar 19, 2019

The number was wrong anyway because it was from before we had arbitrary sized ints, and now you can print a u128 as binary. I pushed a commit to make it use the integer bit count as @thejoshwolfe suggested.

@radek-senfeld

It works! Thank you gentlemen!

tiehuis added a commit that referenced this issue Jun 20, 2019
This removes the odd width and precision specifiers and replaces
them with the more consistent api described in #1358.

Take the following example:

    {1:5.9}

This refers to the argument at index 1 (0-indexed) in the argument list,
i.e. the second argument. It will be printed with a minimum width of 5 and
will have a precision of 9 (if applicable).

Not all types correctly use these parameters just yet. There are still
some gaps to fill in. Fill characters and alignment have yet to
be implemented.
tiehuis added a commit that referenced this issue Jun 21, 2019
@hryx
Contributor

hryx commented Jul 13, 2019

@tiehuis If you're open to outside contributions, how do you feel about these suggestions:

  1. Add a checklist to the description to show what work remains
  2. Add the "contributor friendly" label

@tiehuis
Member Author

tiehuis commented Jul 14, 2019

@hryx Happy with that. I'll write up the remaining parts tonight.

@tiehuis
Member Author

tiehuis commented Jul 14, 2019

Remaining things required in this issue:

  • Handle padding for all builtin types
  • Handle aligning builtin types
  • Handle precision for all builtin types that are appropriate
  • Add parsing of runtime precision/width parameters
  • Add new formatting types (e.g. # and s16BE).

@tiehuis tiehuis added the contributor friendly This issue is limited in scope and/or knowledge of Zig internals. label Jul 14, 2019
@tuket

tuket commented Sep 15, 2019

@data-man Could it be this bug? #1485

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Oct 17, 2019
@komuw

komuw commented Dec 10, 2019

From IRC:

<dingenskirchen> with varargs being phased out in favor of 'tuples'/structs, would something like `std.debug.warn("value: {ident}", .{ident = my_variable});` be possible?

<andrewrk> dingenskirchen, yes
<andrewrk> someone should probably make note of this on https://github.com/ziglang/zig/issues/1358

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021
@Manuzor
Contributor

Manuzor commented Dec 6, 2021

I just stumbled over this (for me) unexpected result:

> cat src/main.zig
const std = @import("std");

pub fn main() anyerror!void {
    const value = @as(u8, 0b00001101); // 13
    std.log.info("value is '{b:08}'", .{value});
}
> zig build run
info: value is '    1101'

The format parser just ignored the leading zero in the format specifier {b:08} when interpreting the width parameter in the format options as a number. I know that a sign flag is not desired, but I think this will be a common user error, especially when coming from C where this would produce a left-aligned zero-filled output (as I expected).

One option is to provide some comptime assistance and produce a compile error for leading zeros in the width parameter.

diff --git a/lib/std/fmt.zig b/lib/std/fmt.zig
index 97dfcc78b..34ac8c940 100644
--- a/lib/std/fmt.zig
+++ b/lib/std/fmt.zig
@@ -309,6 +309,11 @@ pub fn format(
 
                 break :init @field(args, fields_info[arg_index].name);
             } else {
+                if (parser.peek(0)) |ch| {
+                    if (ch == '0') {
+                        @compileError("Leading 0 is not valid for the width parameter. If you intended to zero-fill, try specifying the alignment parameter like this: {" ++ parser.buf[0..parser.pos] ++ "0>" ++ parser.buf[parser.pos + 1 ..] ++ "}");
+                    }
+                }
                 break :init parser.number();
             }
         };

Another option is to allow the fill character to be specified without an explicit alignment parameter, if that fill character is suitable (probably violates Only one obvious way to do things).

diff --git a/lib/std/fmt.zig b/lib/std/fmt.zig
index 34ac8c940..79fdd3023 100644
--- a/lib/std/fmt.zig
+++ b/lib/std/fmt.zig
@@ -268,10 +268,12 @@ pub fn format(
 
         // Parse the fill character
         // The fill parameter requires the alignment parameter to be specified
-        // too
-        if (comptime parser.peek(1)) |ch| {
-            if (comptime mem.indexOfScalar(u8, "<^>", ch) != null) {
-                options.fill = comptime parser.char().?;
+        // too, unless the fill character is unambiguous, such as '0'.
+        if (comptime parser.peek(0)) |ch| {
+            const has_alignment_parameter = mem.indexOfScalar(u8, "<^>", parser.peek(1) orelse 0) != null;
+            if (has_alignment_parameter or mem.indexOfScalar(u8, "123456789<^>", ch) == null) {
+                options.fill = ch;
+                _ = comptime parser.char();
             }
         }

I'd be fine with either, but I think ignoring this will produce many a frowny face.
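For reference, a minimal sketch of the explicit fill-and-alignment spelling that the proposed compile error points to, assuming a recent Zig release. The helper name zeroPadBinary is hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: prints a byte in binary, zero-filled to 8 digits.
// '0' is the fill character, '>' right-aligns the digits, 8 is the width.
fn zeroPadBinary(buf: []u8, value: u8) ![]u8 {
    return std.fmt.bufPrint(buf, "{b:0>8}", .{value});
}

pub fn main() !void {
    const value = @as(u8, 0b00001101); // 13
    var buf: [16]u8 = undefined;
    // Prints: value is '00001101'
    std.debug.print("value is '{s}'\n", .{try zeroPadBinary(&buf, value)});
}
```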

@matu3ba
Contributor

matu3ba commented Dec 12, 2021

Trying to use fmt left me annoyed that 2s complement is not represented as how the memory actually looks.
Take for example

test {
    const print = @import("std").debug.print;
    const min_usable = -2147483647;
    const max_usable = 2147483647;
    print("\n", .{});
    print("min_usable: {d}, {x}\n", .{ min_usable, min_usable });
    print("max_usable: {d}, {x}\n", .{ max_usable, max_usable });
}

i.e. when you quickly want the actual memory representation of an integer in hex or binary to use as a test case.
Then Zig outputs

min_usable: -2147483647, -7fffffff
max_usable: 2147483647, 7fffffff

But I need -0x7fffffff to represent the same number -2147483647, -7fffffff, which for obvious reasons breaks -INT_MIN of that signed integer range.

If I want to print the number properly to copy-paste into a test case, the intuitive approach with absInt does not work either, because the compiler reports that one must pass an integer to absInt

    const print = @import("std").debug.print;
    const math = @import("std").math;
    const test1 = -0x7fffffff;
    print("test: {d}, -0x{x}\n", .{ math.absInt(test1), math.absInt(test1) });

I would prefer that std.fmt could easily print numbers for copy-pasting into test code, as I think this is one of the essential features of Zig (quickly writing the wanted byte representation in tests and coding against that).

Personally I don't understand the design decision not to represent binary and hex the way the actual memory looks for 2s complement. For integer this is however understandable (they are intended for human inspection).

PS: The workaround to cast to unsigned does also not work.
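One possible approach, sketched under two assumptions: the value has an explicit fixed-width type such as i32 (the snippets above use untyped comptime_int constants, which have no memory representation), and the compiler is Zig 0.11 or later, where @bitCast infers its result type. The helper name hexBits is hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: reinterprets the bits of an i32 as a u32 and
// prints them as eight zero-filled hex digits.
fn hexBits(buf: []u8, x: i32) ![]u8 {
    const bits: u32 = @bitCast(x);
    return std.fmt.bufPrint(buf, "0x{x:0>8}", .{bits});
}

pub fn main() !void {
    var buf: [16]u8 = undefined;
    const min_usable: i32 = -2147483647;
    // Prints: 0x80000001
    std.debug.print("{s}\n", .{try hexBits(&buf, min_usable)});
}
```

This prints the two's complement bit pattern rather than a signed magnitude, so the output is not a literal that can be pasted back as an i32 value without a cast.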

@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@HotCRC

HotCRC commented Apr 22, 2022

How does Zig do it?

@ominitay
Contributor

Can we add onto this proposal a way to override the default max depth through either a format string parameter or a formatter?

@Vexu
Member

Vexu commented Aug 1, 2022

Closing as complete with the remaining TODO split into #12313. Any other missing alignment/padding should be reported as new bugs.

@hackerzhuli

hackerzhuli commented Sep 11, 2023

Please put these format specifiers in the documentation. It is hard to find string formatting information for Zig. A frequent use case such as printing debug information should be easy to find in the documentation. It took me a while to figure out that printing a floating point number with nothing after the dot is "{d:.0}". Maybe there should be examples in the docs, because this (or a similar case) seems to be quite common.
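A minimal sketch of that case, assuming a recent Zig release; the helper name wholeNumber is hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: formats a float in decimal form with a precision
// of zero, i.e. no fractional digits.
fn wholeNumber(buf: []u8, x: f64) ![]u8 {
    return std.fmt.bufPrint(buf, "{d:.0}", .{x});
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    // Prints: 42
    std.debug.print("{s}\n", .{try wholeNumber(&buf, 42.0)});
}
```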

@nektro
Contributor

nektro commented Sep 11, 2023

wrong place to post that but it is at https://ziglang.org/documentation/master/std/#A;std:fmt.format
