overhaul std.fmt formatting api #1358

tiehuis · 2018-08-09T06:59:04Z

This is a proposal for the formatting interface exposed via std.fmt, which
shows up in most printing functions (e.g. std.debug.warn).

This is largely based on Rust's std::fmt (which in turn is similar to Python 3), so see that for a more in-depth reference on certain parts.

Formatting Options

We do take the following formatting options from Rust:

positional parameters ("{0}").
alignment ("{:<} {:<5} {:0^10}")
width ("{:5}")
precision ("{:.5} {:.0}")
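For comparison, here is how the four adopted options behave in Rust's std::fmt, which this proposal is modeled on. This is only a sketch of the Rust side; the `demo` function is a hypothetical name, and the Zig syntax proposed here may differ in detail.

```rust
// Rust's std::fmt behaviour for the options this proposal adopts.
fn demo() -> Vec<String> {
    vec![
        format!("{0} {1} {0}", "a", "b"), // positional parameters
        format!("{:<5}|", "ab"),          // left-align within width 5
        format!("{:0^10}", "ab"),         // center within width 10, fill with '0'
        format!("{:5}|", 42),             // width only (numbers are right-aligned)
        format!("{:.5}", 1.0_f64),        // precision of 5 decimal places
        format!("{:.0}", 2.7_f64),        // precision of 0 rounds to an integer
    ]
}

fn main() {
    for line in demo() {
        println!("{}", line);
    }
}
```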

We do not take the following:

# alternate printing forms
+, -, 0 sign flags (NOTE: may actually want these)
named parameters (format!("{arg1}", arg1 = "example"))
runtime specified precision (format!("{:.*}", 3, 5.0923412)) (NOTE: could add this in if there is reasonable demand)
numbered argument specified precision (format!("{0:.1$}", 5.0923412, 3))
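To make the omitted features concrete, this sketch shows what each of them does in Rust's std::fmt. The `demo` function name is hypothetical; none of this is proposed for Zig.

```rust
// Rust features that the proposal leaves out.
fn demo() -> Vec<String> {
    vec![
        format!("{:#x}", 255),               // `#` alternate form adds the 0x prefix
        format!("{:+}", 5),                  // `+` always prints the sign
        format!("{:05}", 42),                // `0` flag zero-pads to the width
        format!("{arg1}", arg1 = "example"), // named parameter
        format!("{:.*}", 3, 5.0923412),      // precision taken from the preceding argument
        format!("{0:.1$}", 5.0923412, 3),    // precision taken from argument 1
        format!("{0:1$}|", 5.5, 6),          // without the '.', argument 1 is the width
    ]
}

fn main() {
    for line in demo() {
        println!("{}", line);
    }
}
```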

Format Specifiers

These are largely unchanged, but a few have changed:

{} (primitives) print the default primitive representation (if it exists)
{c} (int): print as an ascii character
{b} (int): print as binary
{x} (int): print as lowercase hex
{X} (int): print as uppercase hex
{o} (int): print as octal
{e} (float): print in exponent form
{d} (int/float): print in base10/decimal form
{s} ([]u8/*u8): print as null-terminated string
{*} (any): print as a pointer (hex) (NOTE: does & make more sense here?)
{?} (any): print full debug representation (e.g. traverse structs etc to primitive fields)
{#} (any): print raw bytes of the value (hex) (NOTE: do we need this? how often is it used?)

These format specifiers are removed from the current implementation:

{.} (float): was to specify decimal float, now {d} replaces this
{e10} (float): precision was attached to format specifier. The new format
specifier type would replace this.
{B} (any): printed raw bytes of value, replaced by {#}. This is to
ensure it cannot be shadowed by a user defined function.

User-defined functions

Alongside this I propose a change in the way format functions are defined.

The current function to implement is of the form:

pub fn format(
    self: *SelfType,
    comptime fmt: []const u8,
    context: var,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!void;

I instead propose changing this to be of the form:

pub fn format(
    self: *SelfType,
    comptime format: ?u8,
    context: var,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!?void {
    // This is enforced within `std.fmt`.
    std.debug.assert(format == null or
        ('a' <= format.? and format.? <= 'z') or
        ('A' <= format.? and format.? <= 'Z')
    );
}

Format specifiers should be simple and ensuring they are only 1 character
at least enforces consistency and simpler format strings. This also makes
switching on the format cases much easier for an implementation and avoids
some easy edge cases.

format is null for the {} case.

If the function does not handle the format specifier, it can return null and
std.fmt will report an appropriate error message.

Old Example

const Vec2 = struct {
    x: f32, y: f32,

    pub fn format(
        self: *Vec2,
        comptime fmt: []const u8,
        context: var,
        comptime Errors: type,
        output: fn (@typeOf(context), []const u8) Errors!void,
    ) Errors!void {
        if (fmt.len > 0) {
            if (fmt.len > 1) {
                unreachable;
            }

            switch (fmt[0]) {
                // point format
                'p' => return std.fmt.format(context, Errors, output, "({.3},{.3})", self.x, self.y),
                // dimension format
                'd' => return std.fmt.format(context, Errors, output, "{.3}x{.3}", self.x, self.y),
                else => unreachable,
            }
        }
        return std.fmt.format(context, Errors, output, "({.3},{.3})", self.x, self.y);
    }
};

New Example

const Vec2 = struct {
    x: f32, y: f32,

    pub fn format(
        self: *Vec2,
        comptime fmt_spec: ?u8,
        context: var,
        comptime Errors: type,
        output: fn (@typeOf(context), []const u8) Errors!void,
    ) Errors!?void {
        switch (fmt_spec) {
            // point format
            null, 'p' => return std.fmt.format(context, Errors, output, "({:.3},{:.3})", self.x, self.y),
            // dimension format
            'd' => return std.fmt.format(context, Errors, output, "{:.3}x{:.3}", self.x, self.y),
            // unhandled format
            else => return null,
        }
    }
};
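The same contract (a single optional specifier character, with an "unhandled" result left to the caller) can be sketched outside Zig. The Rust version below is an illustration only: `Vec2::format`, `render`, and the fallback message are hypothetical names, not part of std.fmt.

```rust
struct Vec2 {
    x: f32,
    y: f32,
}

impl Vec2 {
    /// Returns None when the specifier is not handled,
    /// mirroring `return null` in the proposed Zig signature.
    fn format(&self, spec: Option<char>, out: &mut String) -> Option<()> {
        match spec {
            // point format (also the default for `{}`)
            None | Some('p') => out.push_str(&format!("({:.3},{:.3})", self.x, self.y)),
            // dimension format
            Some('d') => out.push_str(&format!("{:.3}x{:.3}", self.x, self.y)),
            // unhandled format
            _ => return None,
        }
        Some(())
    }
}

/// Stand-in for the formatting library: it decides what to do when
/// the user function reports an unhandled specifier.
fn render(v: &Vec2, spec: Option<char>) -> String {
    let mut out = String::new();
    match v.format(spec, &mut out) {
        Some(()) => out,
        None => String::from("<unhandled format specifier>"),
    }
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.5 };
    println!("{}", render(&v, None));
    println!("{}", render(&v, Some('d')));
    println!("{}", render(&v, Some('q')));
}
```

Because the specifier is a single optional character, the whole dispatch fits in one `match`, which is the simplification the proposal is after.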

One extra thing that comes to mind is whether we want to allow access to the
formatting specifiers for user-defined functions, passing the values to each.

An example use-case for the above would be allowing access to the precision
field and printing the vector components with that precision instead of
hardcoding. One concern is format functions don't necessarily have to use that
information for the correct purpose and could use it poorly. This is minor,
though.

Shortcomings/Extras

Left-side format-specifier type

With this proposal {s} becomes {:s}. Is this fine? Since we only accept one
character and don't want named arguments, we could put this on the left side of
: alongside the positional argument. This would mean the common case stays the
same as now and fairly clean. With a positional parameter this would change from:

"{0:s} {2} {:b}" -> "{0s} {2} {b}"

This is still unambiguous.

Grammar

format-string := <text> (maybe-format <text>)*
maybe-format := "{{" | "}}" | format
format := '{' argument? (':' format-spec)? '}'
argument := integer? type-spec?

type-spec := [a-zA-Z*#?]

format-spec := (fill? align)? width? ('.' precision)?
fill := character
align := '<' | '^' | '>'
width := integer
precision := integer
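A small recursive-descent parser shows that the grammar is unambiguous with one character of lookahead. This is a sketch in Rust under two assumptions: the type-spec is optional (so that `{2}` parses), and the input is the text between `{` and `}`. `Placeholder`, `take_int`, and `parse` are hypothetical names.

```rust
#[derive(Debug, Default, PartialEq)]
struct Placeholder {
    index: Option<usize>,
    type_spec: Option<char>,
    fill: Option<char>,
    align: Option<char>,
    width: Option<usize>,
    precision: Option<usize>,
}

/// Consumes a run of decimal digits, if any, starting at `pos`.
fn take_int(chars: &[char], pos: &mut usize) -> Option<usize> {
    let mut value: usize = 0;
    let mut seen = false;
    while let Some(d) = chars.get(*pos).and_then(|c| c.to_digit(10)) {
        value = value.saturating_mul(10).saturating_add(d as usize);
        seen = true;
        *pos += 1;
    }
    if seen { Some(value) } else { None }
}

fn is_align(c: char) -> bool {
    matches!(c, '<' | '^' | '>')
}

/// Parses the text between '{' and '}'; returns None on malformed input.
fn parse(body: &str) -> Option<Placeholder> {
    let chars: Vec<char> = body.chars().collect();
    let mut pos = 0;
    let mut p = Placeholder::default();

    // argument := integer? type-spec?
    p.index = take_int(&chars, &mut pos);
    if let Some(&c) = chars.get(pos) {
        if c.is_ascii_alphabetic() || "*#?".contains(c) {
            p.type_spec = Some(c);
            pos += 1;
        }
    }

    // (':' format-spec)?
    if chars.get(pos) == Some(&':') {
        pos += 1;
        // (fill? align)?: a fill is only recognised when an align follows it.
        if pos + 1 < chars.len() && is_align(chars[pos + 1]) {
            p.fill = Some(chars[pos]);
            p.align = Some(chars[pos + 1]);
            pos += 2;
        } else if pos < chars.len() && is_align(chars[pos]) {
            p.align = Some(chars[pos]);
            pos += 1;
        }
        p.width = take_int(&chars, &mut pos);
        if chars.get(pos) == Some(&'.') {
            pos += 1;
            p.precision = Some(take_int(&chars, &mut pos)?);
        }
    }

    // Anything left over (e.g. a multi-character specifier) is an error.
    if pos == chars.len() { Some(p) } else { None }
}

fn main() {
    println!("{:?}", parse("0s"));
    println!("{:?}", parse("b:0>8"));
    println!("{:?}", parse("1:5.9"));
    println!("{:?}", parse("s16LE"));
}
```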

End

Feel free to make any other suggestions and/or highlight any issues. I'd
prefer to keep this as simple as reasonable as long as it covers all the common
use-cases reasonably.


kristate · 2018-08-09T07:01:19Z

{#} will be useful in writing network applications when we need to debug what was sent to and from the line.

thejoshwolfe · 2018-08-09T14:41:23Z

> {#} will be useful in writing network applications when we need to debug what was sent to and from the line.

it should be restricted to packed types then.

andrewrk · 2018-08-09T19:26:19Z

I'd like to further propose:

{s16LE} - decode UTF-16 Little Endian, and print as encoded UTF-8. After proposal: type for null terminated pointer #265 it would work for []u16 as well as [*]null u16 types.
{s16BE} - decode UTF-16 Big Endian, and print as encoded UTF-8. After proposal: type for null terminated pointer #265 it would work for []u16 as well as [*]null u16 types.
{s32LE} - decode UTF-32 Little Endian, and print as encoded UTF-8. After proposal: type for null terminated pointer #265 it would work for []u32 as well as [*]null u32 types.
{s32BE} - decode UTF-32 Big Endian, and print as encoded UTF-8. After proposal: type for null terminated pointer #265 it would work for []u32 as well as [*]null u32 types.

{s16LE} would be common for printing Windows "wide character" strings.

thejoshwolfe · 2018-08-10T19:35:12Z

runtime zfilling is useful. i wanted that feature for this project: https://github.com/thejoshwolfe/hexdump-zip . when that tool was written in javascript, i would determine the digit count for the highest memory address value (which depends on the user-provided input file size), then zfill all memory address representations to that width. the zig implementation of that tool can't easily do that, so i just zfill everything to the maximum conceivable memory address, which is way bulky.
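As an illustration of the use case described above, this is roughly what runtime zero-filling looks like in Rust, where the width can come from an argument. It is a sketch only; `hex_digits` and `format_address` are hypothetical helpers, not taken from hexdump-zip.

```rust
/// Number of hex digits needed to print `max`.
fn hex_digits(max: u64) -> usize {
    let mut digits = 1;
    let mut rest = max >> 4;
    while rest != 0 {
        digits += 1;
        rest >>= 4;
    }
    digits
}

/// Zero-fills `addr` to a width that is only known at runtime.
fn format_address(addr: u64, width: usize) -> String {
    format!("{:0>width$x}", addr, width = width)
}

fn main() {
    // The width depends on the (user-provided) file size.
    let file_size: u64 = 0x1_2345;
    let width = hex_digits(file_size);
    for addr in [0u64, 0x2a, 0x1_2345].iter() {
        println!("{}", format_address(*addr, width));
    }
}
```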


tiehuis · 2019-01-10T00:07:29Z

Small note but the api should take an extra FormatOptions struct which can be used when printing user-formatted types.

pub const Alignment = enum {
    Left,
    Center,
    Right,
};

pub const FormatOptions = struct {
    width: ?usize,
    precision: ?usize,
    // `align` is a reserved keyword in Zig, so the field is named `alignment`.
    alignment: Alignment,
};

pub fn format(
    self: *Vec2,
    comptime fmt_spec: ?u8,
    context: var,
    options: FormatOptions,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!?void {
    // `output` takes only the context and the bytes to write.
    try output(context, "(");
    try printFloat(output, Errors, self.x, options);
    try output(context, ",");
    try printFloat(output, Errors, self.y, options);
    try output(context, ")");
}
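To show how a user-defined formatter might forward such options to its components, here is a minimal Rust sketch. The struct fields mirror FormatOptions above; `print_float` and `format_vec2` are hypothetical helpers, and padding with spaces is an assumption, since the struct carries no fill character.

```rust
#[derive(Clone, Copy)]
enum Alignment {
    Left,
    Center,
    Right,
}

#[derive(Clone, Copy)]
struct FormatOptions {
    width: Option<usize>,
    precision: Option<usize>,
    align: Alignment,
}

/// Hypothetical helper: formats one float according to the options.
fn print_float(value: f32, options: FormatOptions) -> String {
    let digits = match options.precision {
        Some(p) => format!("{:.*}", p, value),
        None => format!("{}", value),
    };
    // Pad with spaces up to the minimum width (assumption: no fill option).
    let pad = options.width.unwrap_or(0).saturating_sub(digits.chars().count());
    let (left, right) = match options.align {
        Alignment::Left => (0, pad),
        Alignment::Right => (pad, 0),
        Alignment::Center => (pad / 2, pad - pad / 2),
    };
    format!("{}{}{}", " ".repeat(left), digits, " ".repeat(right))
}

struct Vec2 {
    x: f32,
    y: f32,
}

/// The options are applied to each component rather than to the whole vector.
fn format_vec2(v: &Vec2, options: FormatOptions) -> String {
    format!("({},{})", print_float(v.x, options), print_float(v.y, options))
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.5 };
    let options = FormatOptions { width: Some(6), precision: Some(2), align: Alignment::Right };
    println!("{}", format_vec2(&v, options));
}
```

Whether width should apply per component or to the whole value is exactly the kind of decision this leaves to the user-defined function.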

radek-senfeld · 2019-03-19T09:08:21Z

I'm toying with Zig on a STM32F103C8 MCU (64kB/128kB flash, 20kB RAM) aka Bluepill. So far it's been a success story after overcoming some road bumps (see #1290 for reference)! Thank you guys for your work! Zig is the kind of fun I was missing in my life recently! 😄

For debugging purposes I've used std.fmt and encountered a strange problem. The following example freezes the MCU:

const value: u32 = 5;
debug.message("test! value: {}", value);

The problem is in https://github.com/ziglang/zig/blob/master/std/fmt.zig

const max_int_digits = 65;

fn formatIntUnsigned(
...
var buf: [max_int_digits - 1]u8 = undefined;
...
)

For a yet unknown reason the buf: [64]u8 is too big. I've tried smaller sizes and [32]u8 works OK.

I thought that the stack was overwriting something, but the stack pointer should be properly initialized to 0x20005000 at startup. The call is not particularly deep, so unless there's something I'm missing, it shouldn't be the cause. Later on I'm going to set up a proper debugging environment and inspect deeper.

I'm writing this as a reminder and a hurray that Zig can be used just fine on resource-constrained platforms for embedded projects. Could you please bear it in mind when designing the standard library?

I think a unified [64]u8 buffer is overkill for such a tiny 32-bit micro. I don't know if there's already a plan to target and optimize the libraries for use on these devices. I can imagine a [16]u8 buffer would be just fine in most cases? This could also include smaller data types everywhere else if possible.

thejoshwolfe · 2019-03-19T13:05:31Z

the max_int_digits is probably for signed 64 bit base 2:

-0x8000000000000000 =>
"-1000000000000000000000000000000000000000000000000000000000000000"

indeed that seems overkill if we know that we're not going to need that. Seems like a straightforward optimization to make with comptime code that uses the bitcount of the integer type.


andrewrk · 2019-03-19T14:12:46Z

The number was wrong anyway because it was from before we had arbitrary sized ints, and now you can print a u128 as binary. I pushed a commit to make it use the integer bit count as @thejoshwolfe suggested.
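The sizing rule discussed here can be written down directly: in base 2 an N-bit unsigned integer needs at most N characters, and a signed one needs N + 1 to leave room for the minus sign. The Rust sketch below only illustrates that arithmetic; `max_digits_base2` is a hypothetical name and this is not the code from the commit.

```rust
/// Worst-case character count for printing an integer of `bits` bits in
/// base 2, including a leading '-' for signed types.
const fn max_digits_base2(bits: usize, signed: bool) -> usize {
    if signed { bits + 1 } else { bits }
}

fn main() {
    // Buffers sized per integer type instead of one fixed worst case.
    let buf_u32 = [0u8; max_digits_base2(32, false)];
    let buf_i64 = [0u8; max_digits_base2(64, true)];
    println!("u32: {} bytes, i64: {} bytes", buf_u32.len(), buf_i64.len());

    // The longest i64 output is the minimum value: '-' plus 64 binary digits.
    let worst = format!("-{:b}", i64::MIN.unsigned_abs());
    assert_eq!(worst.len(), buf_i64.len());
}
```

With this rule a u32 needs 32 bytes, which fits the small stack described above, while a u128 correctly gets 128.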

radek-senfeld · 2019-03-19T17:25:35Z

It works! Thank you gentlemen!

This removes the odd width and precision specifiers and replaces them with the more consistent API described in #1358. Take the following example: {1:5.9}. This refers to the argument at index 1 (0-indexed) in the argument list. It will be printed with a minimum width of 5 and will have a precision of 9 (if applicable). Not all types correctly use these parameters just yet. There are still some gaps to fill in. Fill characters and alignment have yet to be implemented.

hryx · 2019-07-13T20:53:29Z

@tiehuis If you're open to outside contributions, how do you feel about these suggestions:

Add a checklist to the description to show what work remains
Add the "contributor friendly" label

tiehuis · 2019-07-14T04:30:04Z

@hryx Happy with that. I'll write up the remaining parts tonight.

tiehuis · 2019-07-14T11:09:08Z

Remaining things required in this issue:

Handle padding for all builtin types
Handle aligning builtin types
Handle precision for all builtin types that are appropriate
Add parsing of runtime precision/width parameters
Add new formatting types (e.g. # and s16BE).

tuket · 2019-09-15T21:49:48Z

@data-man Could it be this bug? #1485

komuw · 2019-12-10T19:28:01Z

from IRC;

<dingenskirchen> with varargs being phased out in favor of 'tuples'/structs, would something like   
`std.debug.warn("value: {ident}", .{ident = my_variable});` be possible?

<andrewrk> dingenskirchen, yes
<andrewrk> someone should probably make note of this on https://github.com/ziglang/zig/issues/1358

Manuzor · 2021-12-06T21:21:46Z

I just stumbled over this (for me) unexpected result:

> cat src/main.zig
const std = @import("std");

pub fn main() anyerror!void {
    const value = @as(u8, 0b00001101); // 13
    std.log.info("value is '{b:08}'", .{value});
}
> zig build run
info: value is '    1101'

The format parser just ignored the leading zero in the format specifier {b:08} when interpreting the width parameter in the format options as a number. I know that a sign flag is not desired, but I think this will be a common user error, especially when coming from C, where a leading zero in the width produces right-aligned, zero-filled output (as I expected).

One option is to provide some comptime assistance and produce a compile error for leading zeros in the width parameter.

diff --git a/lib/std/fmt.zig b/lib/std/fmt.zig
index 97dfcc78b..34ac8c940 100644
--- a/lib/std/fmt.zig
+++ b/lib/std/fmt.zig
@@ -309,6 +309,11 @@ pub fn format(
 
                 break :init @field(args, fields_info[arg_index].name);
             } else {
+                if (parser.peek(0)) |ch| {
+                    if (ch == '0') {
+                        @compileError("Leading 0 is not valid for the width parameter. If you intended to zero-fill, try specifying the alignment parameter like this: {" ++ parser.buf[0..parser.pos] ++ "0>" ++ parser.buf[parser.pos + 1 ..] ++ "}");
+                    }
+                }
                 break :init parser.number();
             }
         };

Another option is to allow the fill character to be specified without an explicit alignment parameter, if that fill character is suitable (probably violates Only one obvious way to do things).

diff --git a/lib/std/fmt.zig b/lib/std/fmt.zig
index 34ac8c940..79fdd3023 100644
--- a/lib/std/fmt.zig
+++ b/lib/std/fmt.zig
@@ -268,10 +268,12 @@ pub fn format(
 
         // Parse the fill character
         // The fill parameter requires the alignment parameter to be specified
-        // too
-        if (comptime parser.peek(1)) |ch| {
-            if (comptime mem.indexOfScalar(u8, "<^>", ch) != null) {
-                options.fill = comptime parser.char().?;
+        // too, unless the fill character is unambiguous, such as '0'.
+        if (comptime parser.peek(0)) |ch| {
+            const has_alignment_parameter = mem.indexOfScalar(u8, "<^>", parser.peek(1) orelse 0) != null;
+            if (has_alignment_parameter or mem.indexOfScalar(u8, "123456789<^>", ch) == null) {
+                options.fill = ch;
+                _ = comptime parser.char();
             }
         }

I'd be fine with either, but I think ignoring this will produce many a frowny face.
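For reference, Rust's std::fmt supports both spellings, and they differ once a sign is involved. The sketch below shows that behaviour as a comparison point only; `demo` is a hypothetical name, and nothing here describes what Zig does.

```rust
fn demo() -> Vec<String> {
    let value: u8 = 0b0000_1101; // 13
    vec![
        format!("{:8b}", value),   // width only: padded with spaces
        format!("{:08b}", value),  // `0` flag: zero-padded
        format!("{:0>8b}", value), // explicit fill '0' plus right-align
        format!("{:05}", -42),     // the `0` flag pads after the sign
        format!("{:0>5}", -42),    // fill plus align pads before the sign
    ]
}

fn main() {
    for line in demo() {
        println!("'{}'", line);
    }
}
```

The last two lines are the main argument for treating a leading zero as its own flag rather than as a fill character: only the flag keeps the sign in front of the padding.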

matu3ba · 2021-12-12T11:55:06Z

Trying to use fmt left me annoyed that two's complement integers are not printed the way the memory actually looks.
Take for example:

test {
    const print = @import("std").debug.print;
    const min_usable = -2147483647;
    const max_usable = 2147483647;
    print("\n", .{});
    print("min_usable: {d}, {x}\n", .{ min_usable, min_usable });
    print("max_usable: {d}, {x}\n", .{ max_usable, max_usable });
}

i.e. when you quickly want the actual memory representation of an integer in hex or binary to use as a test case.
Then Zig outputs:

min_usable: -2147483647, -7fffffff
max_usable: 2147483647, 7fffffff

But I need -0x7fffffff, not -7fffffff, to represent the same number -2147483647, and the sign-plus-magnitude form for obvious reasons breaks for -INT_MIN of that signed integer range.

If I want to properly print the number for copy-pasting into a test case, the intuitive approach with absInt also does not work, because absInt rejects it ("must pass an integer to absInt"):

    const print = @import("std").debug.print;
    const math = @import("std").math;
    const test1 = -0x7fffffff;
    print("test: {d}, -0x{x}\n", .{ math.absInt(test1), math.absInt(test1) });

I would prefer it if std.fmt could print numbers with relative ease for copy-pasting into test code, as I think this is one of the essential features of Zig (quickly writing the wanted byte representation in tests and coding against that).

Personally I don't understand the design decision not to represent binary and hex the way the actual memory looks for two's complement. For integers this is, however, understandable (they are intended for human inspection).

PS: The workaround of casting to unsigned does not work either.
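The two representations being contrasted can be shown side by side. In Rust, `{:x}` on a signed integer prints the two's complement bit pattern, while a sign-plus-magnitude form has to be built by hand. This is a sketch for comparison only; both function names are hypothetical.

```rust
/// Hex digits of the in-memory two's complement bit pattern.
fn twos_complement_hex(v: i32) -> String {
    format!("{:x}", v as u32) // reinterpret the bits as unsigned
}

/// Sign plus magnitude, in a form that can be pasted back as a literal.
fn sign_magnitude_hex(v: i32) -> String {
    if v < 0 {
        // unsigned_abs avoids the overflow that negating i32::MIN would cause
        format!("-0x{:x}", v.unsigned_abs())
    } else {
        format!("0x{:x}", v)
    }
}

fn main() {
    for v in [-2147483647i32, 2147483647, i32::MIN].iter() {
        println!("{}: {} / {}", v, twos_complement_hex(*v), sign_magnitude_hex(*v));
    }
}
```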

HotCRC · 2022-04-22T09:22:57Z

How does Zig do it?

ominitay · 2022-07-31T21:47:45Z

Can we add onto this proposal a way to override the default max depth through either a format string parameter or a formatter?

Vexu · 2022-08-01T16:51:48Z

Closing as complete with the remaining TODO split into #12313. Any other missing alignment/padding should be reported as new bugs.

hackerzhuli · 2023-09-11T01:39:16Z

Please put these format specifiers in the documentation. It is hard to find string formatting information for Zig. Information for a use case as frequent as printing debug output should be easy to find in the documentation. It took me a while to figure out that printing a floating point number with nothing after the dot is "{d:.0}". Maybe there should be examples in the docs, because this (or something similar) seems to be quite a common case.

nektro · 2023-09-11T03:58:27Z

wrong place to post that but it is at https://ziglang.org/documentation/master/std/#A;std:fmt.format

tiehuis added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 9, 2018

tiehuis added this to the 0.4.0 milestone Aug 9, 2018

andrewrk added the accepted This proposal is planned. label Aug 9, 2018

kristate added a commit to kristate/zig that referenced this issue Sep 1, 2018

std/fmt/index.zig: ziglang#1358 allow bytes to be printed-out as hex;

454b236

Supports {x} for lowercase and {X} for uppercase;

kristate added a commit to kristate/zig that referenced this issue Sep 1, 2018

std/fmt/index.zig: ziglang#1358: test bytes printed-out as hex;

7a633f4

andrewrk added a commit that referenced this issue Sep 1, 2018

Merge pull request #1451 from kristate/fmt-hexbytes-issue1358

1a5c3e4

allow bytes to be printed-out as hex (#1358)

andrewrk added the standard library This issue involves writing Zig code for the standard library. label Feb 10, 2019

andrewrk modified the milestones: 0.4.0, 0.5.0 Feb 10, 2019

tiehuis mentioned this issue Feb 15, 2019

look into using Ryu instead of Errol3 for floating point printing #1299

Closed

tiehuis mentioned this issue Mar 11, 2019

Support hex floating point format in fmt.parseFloat and fmt.print #2047

Closed

andrewrk added a commit that referenced this issue Mar 19, 2019

better buffer length for formatIntUnsigned

af9ac0d

see #1358

tiehuis mentioned this issue Jun 20, 2019

Add positional, precision and width support to std.fmt #2714

Merged

tiehuis added the contributor friendly This issue is limited in scope and/or knowledge of Zig internals. label Jul 14, 2019

andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019

andrewrk modified the milestones: 0.6.0, 0.7.0 Oct 17, 2019

LemonBoy mentioned this issue Sep 21, 2020

std.fmt meets UTF-8 #6390

Merged

andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020

LemonBoy mentioned this issue Oct 21, 2020

std.fmt: Add {q} format specifier #6756

Closed

andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

ominitay mentioned this issue Jul 31, 2022

fmt: Make default_max_depth configurable #12310

Merged

Vexu mentioned this issue Aug 1, 2022

std.fmt: add # specifier to print raw bytes of the value (hex) #12313

Closed

Vexu closed this as completed Aug 1, 2022

Vexu modified the milestones: 0.11.0, 0.10.0 Aug 2, 2022

vjpr mentioned this issue Nov 12, 2022

Zig 0.10.0 oven-sh/bun#1491

Closed

tobbez mentioned this issue Dec 14, 2022

String formatting requiring alignment when specifying fill character is surprising #13932

Open
