std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

MCRusher · 2019-11-22T04:00:57Z

After looking at the source code, why does Buffer.initSize() call Buffer.resize(), which calls ArrayList(u8).resize(), which adds new elements,

instead of just calling ArrayList(u8).ensureCapacity() or similar?

It's possible I am misinterpreting the purpose of initSize, but it seems to me that it would make a lot of sense for initSize to allow preallocation of memory to reduce future reallocations, rather than create a bunch of blank space initialized with random values which must be set from outside the buffer object itself.

A usage case for this would be reading an unbounded length line of text from a file or console using std.Buffer, and then wanting to get an owned slice out of it to pass around between functions. You can expect a certain line length, like ~200 or so, so it would make sense to account for that case.

Currently, I have used initSize with size 0 to do so since calling toOwnedSlice after a readline call results in "¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬some string input" when initSize is set greater than 0, rather than what I expected it to do. I have a related example source file included.

If this is useful and makes sense, I am very curious how, and then also how this should be done, as calling Buffer.resize(0) seems hackish and not well optimized for the use case.

Thanks, and sorry if this is messy or misses the point.

mikdusan · 2019-11-22T23:38:21Z

In Buffer size is not capacity so that being said this is currently the way I would optimize for a minimal capacity:

edit: changed size arg 100 → 0 in following snippet:

var buf = try std.Buffer.initSize(std.heap.direct_allocator, 0);
buf.list.ensureCapacity(200);

assuming things remain, maybe clarifying API doc is in order:

    /// Initialize to size bytes of undefined value.
    /// Must deinitialize with deinit.
    pub fn initSize(allocator: *Allocator, size: usize) !Buffer {...}

daurnimator · 2019-11-23T00:52:28Z

I had an idea..... Buffer could be a mode of std.fifo, except that it writes 0 to empty entries here

zig/lib/std/fifo.zig

Lines 115 to 125 in ad0871e

    
           { // set old range to undefined. Note: may be wrapped around 
        
               const slice = self.readableSliceMut(0); 
        
               if (slice.len >= count) { 
        
                   const unused = @sliceToBytes(slice[0..count]); 
        
                   @memset(unused.ptr, undefined, unused.len); 
        
               } else { 
        
                   const unused = @sliceToBytes(slice[0..]); 
        
                   @memset(unused.ptr, undefined, unused.len); 
        
                   const unused2 = @sliceToBytes(self.readableSliceMut(slice.len)[0 .. count - slice.len]); 
        
                   @memset(unused2.ptr, undefined, unused2.len); 
        
               }

Once the null terminated type comes (#3728) it should be straightforward to add any missing access ors.

MCRusher · 2019-11-23T00:53:36Z

@mikdusan
Thanks for the response, and for a way to achieve what I need.

Can I ask what the purpose of initSize is for a Buffer, and how it is used? It confuses me why this is necessary. It also seems you can do almost the same thing with Buffer.fromOwnedSlice().

Also, would it make sense to add an initCapacity function or similar?

I'm a bit confused by the first line of your example, wouldn't initSize(da,100) give you 100 bytes that a read function won't use, and then ensureCapacity(200) would make the total final allocation size at least 300 bytes, with 100 already being used?

I assume that, since the other init functions require some source to draw from, and initNull leaves the object in an invalid state until resize() or replaceContents() is called:

var buf = try std.Buffer.initSize(std.heap.direct_allocator,0);
try buf.list.ensureCapacity(200);

would be the easiest way to preallocate 200 ready-to-use bytes in advance, although maybe I'm missing something.

Thanks again.

mikdusan · 2019-11-23T00:59:11Z

@mikdusan
I'm a bit confused by the first line of your example, wouldn't initSize(da,100) give you 100 bytes that a read function won't use, and then ensureCapacity(200) would make the total final allocation size at least 300 bytes, with 100 already being used?

@MCRusher, my intention was to show this code as an (untested) example for domain optimization. Sorry for the confusion and I meant to init 0 bytes at start.

var buf = try std.Buffer.initSize(std.heap.direct_allocator,0);
try buf.list.ensureCapacity(200);

and small note, ensure capacity 200 after an init of 100 would only ensure minimum capacity of 200.

mikdusan · 2019-11-23T01:10:08Z

@MCRusher,

Also, would it make sense to add an initCapacity function or similar?

That API doesn't answer the question of size and adding something like initSizeCapacity() almost doesn't seem worth it. Just my opinion, and I would value experienced library creators opinions over mine in this matter.

MCRusher · 2019-11-23T01:44:28Z

@mikdusan
I understand now, thanks for the clarification.

And yeah, after looking at the ArrayList.ensureCapacity() and thinking about about what capacity means (total space, not unused space), you're definitely right about that.

I feel it would make sense to add a function like Buffer.initCapacity() that allows preallocating the capacity of a buffer right at the initialization of the instance.

My reasoning for this is that right now:

var buf = try std.Buffer.initSize(std.heap.direct_allocator,0);
try buf.list.ensureCapacity(200);

seems to essentially do this under the hood:

self.list.init(allocator);
self.list.items = try self.list.allocator.realloc(self.list.items,8);
self.list.len = 1;
self.list.items[self.list.len-1] = 0;
self.list.items = try self.list.allocator.realloc(self.list.items,255);

Whereas something like:

pub fn initCapacity(allocator: *Allocator, capacity: usize) !Buffer {
    var self = initNull(allocator);
    try self.list.ensureCapacity(capacity+1);
    self.list.appendAssumeCapacity(0);
    return self;
}

should only perform one allocation instead of two if I'm not mistaken, but is kind of a complicated setup to do manually, and I feel would be better suited to being a function as shown.

Is this reasonable?

Thanks for the input.

mikdusan · 2019-11-23T02:42:26Z

maybe something like this. here's a barely tested diff.

diff --git a/lib/std/array_list.zig b/lib/std/array_list.zig
index 59fd2a10e..b18af0d29 100644
--- a/lib/std/array_list.zig
+++ b/lib/std/array_list.zig
@@ -41,6 +41,14 @@ pub fn AlignedArrayList(comptime T: type, comptime alignment: ?u29) type {
             };
         }
 
+        /// Initialize with capacity to hold at least num elements.
+        /// Deinitialize with `deinit` or use `toOwnedSlice`.
+        pub fn initCapacity(allocator: *Allocator, num: usize) !Self {
+            var self = Self.init(allocator);
+            try self.ensureCapacity(num);
+            return self;
+        }
+
         /// Release all allocated memory.
         pub fn deinit(self: Self) void {
             self.allocator.free(self.items);
diff --git a/lib/std/buffer.zig b/lib/std/buffer.zig
index 24bd23fa7..c5dee220d 100644
--- a/lib/std/buffer.zig
+++ b/lib/std/buffer.zig
@@ -17,6 +17,7 @@ pub const Buffer = struct {
         return self;
     }
 
+    /// Initialize memory to size bytes of undefined values.
     /// Must deinitialize with deinit.
     pub fn initSize(allocator: *Allocator, size: usize) !Buffer {
         var self = initNull(allocator);
@@ -24,6 +25,14 @@ pub const Buffer = struct {
         return self;
     }
 
+    /// Initialize with capacity to hold at least num bytes.
+    /// Must deinitialize with deinit.
+    pub fn initCapacity(allocator: *Allocator, num: usize) !Buffer {
+        var self = Buffer{ .list = try ArrayList(u8).initCapacity(allocator, num + 1) };
+        self.list.appendAssumeCapacity(0);
+        return self;
+    }
+
     /// Must deinitialize with deinit.
     /// None of the other operations are valid until you do one of these:
     /// * ::replaceContents
@@ -99,6 +108,10 @@ pub const Buffer = struct {
         return self.list.len - 1;
     }
 
+    pub fn capacity(self: Buffer) usize {
+        return self.list.items.len;
+    }
+
     pub fn append(self: *Buffer, m: []const u8) !void {
         const old_len = self.len();
         try self.resize(old_len + m.len);
@@ -156,3 +169,18 @@ test "simple Buffer" {
     try buf2.resize(4);
     testing.expect(buf.startsWith(buf2.toSlice()));
 }
+
+test "Buffer.initSize" {
+    var buf = try Buffer.initSize(debug.global_allocator, 3);
+    testing.expect(buf.len() == 3);
+    try buf.append("hello");
+    testing.expect(mem.eql(u8, buf.toSliceConst()[3..], "hello"));
+}
+
+test "Buffer.initCapacity" {
+    var buf = try Buffer.initCapacity(debug.global_allocator, 10);
+    testing.expect(buf.len() == 0);
+    testing.expect(buf.capacity() >= 10);
+    try buf.append("hello");
+    testing.expect(mem.eql(u8, buf.toSliceConst(), "hello"));
+}

MCRusher · 2019-11-23T03:02:42Z

Thanks for writing up a draft.

I think it makes sense like this, even more sense to extend it to ArrayList as well.

But I don't think my opinion is really what matters a lot for a change.

I'm not entirely familiar with Github, especially in dealing with other people's projects, what should be done from here?

mikdusan · 2019-11-23T03:17:34Z

If you are semi-proficient in forking a project (zig) and submitting a pull-request (PR), then the process might be something like this:

fork zig on github
setup write-access to your fork (via ssh or one of the other? mechanisms)
git clone to your workstation from the fork (your workspace)
associate that write-access to the workspace's .git properties
create a new branch from master to work on
checkout that new branch

consider that you need a zig executable to work against the branch. So either download a zig binary for your workstation, or build it from workspace (which requires LLVM and a few other things)

edit lib/std/*.zig as needed
test (go to zig's IRC for help on how to start running zig unit tests)
once you're happy, make sure your changes are committed, cleaned up, and git push to your github fork
then go back to github and access your fork, on left select your new branch, and create a PR from it
the PR will automatically get created in zig master issues/PR tracker
from there it's your PR and you can add comments and interact with people reviewing your code

if this seems complicated for you, you can alternatively:

clone direct from zig master on github to your workstation
create new branch and checkout new branch
work on it, commit
then get onto zig IRC and ask if someone is willing to create a PR on your behalf (I'd be willing to)
and even simpler, you can create your own diff and add it to this very issue just like I added in previous comment and this could be used by whomever is assisting you to create a PR as well

...hope this helps

MCRusher · 2019-11-23T14:35:36Z

Thanks, this is very helpful but I'm still a bit lost.

To be honest, your draft diff is almost exactly what I want already.

But I did believe that Buffer.capacity() would be counting the null byte as part of the capacity of the Buffer.

To make capacity reflect the actual amount of bytes that can be stored until reallocation, it could be changed to self.list.items.len-1, which should work in every case except Buffer.initNull(), since every other init function appends one byte at least.

Additionally, Buffer.capacity could check if self.list.items.len==0 and return 0 if so to bypass this issue, and not leave any more special cases on Buffer.initNull().

Does having Buffer.capacity reflect the null byte make some sense that I'm not seeing, or would this make sense?

I dropped the your diff into my copy of the zig-0.5.0, with three changes:

Buffer.capacity() reflects usable size:

pub fn capacity(self: Buffer) usize {
    return if (self.list.items.len > 0)
        self.list.items.len - 1
    else
        0;
}

Added conditions to Buffer.initCapacity() test:

test "Buffer.initCapacity" {
    var buf = try Buffer.initCapacity(debug.global_allocator, 10);
    const old_cap = buf.capacity();
    testing.expect(buf.len() == 0);
    testing.expect(old_cap >= 10);
    try buf.append("hello");
    testing.expect(buf.len() == 5);
    testing.expect(buf.capacity() == old_cap);
    testing.expect(mem.eql(u8, buf.toSliceConst(), "hello"));
}

ArrayList.initCapacity() test added:

test "std.ArrayList.initCapacity" {
    var bytes: [1024]u8 = undefined;
    const allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
    var list = try ArrayList(i8).initCapacity(allocator,200);
    defer list.deinit();
    testing.expect(list.count() == 0);
    testing.expect(list.capacity() >= 200);
}

and it seems to work fine, all tests pass. I modified my test example to use these new features and it now works fine with no reallocation, and behaves as expected with user input and file input as well.

Thanks again for the help and consideration so far.

MCRusher · 2019-11-24T14:09:17Z

So I created a fork, that was the easiest thing for me to understand and do.

I just want to make sure it's ok with you to start the pull request, since you wrote most of the actual implementation.

Also how can I best credit your contributions to it?

Thanks a lot for brainstorming with me, writing a diff, and helping me through the process.

mikdusan · 2019-11-24T18:22:41Z

@MCRusher feel free to PR this as solely your contribution, I only created a diff to better communicate on this issue.

MCRusher · 2019-11-25T06:47:29Z

@mikdusan

Alright well thanks a lot for sticking with me and helping me through the process.

Should I close the issue now?

daurnimator · 2019-11-25T06:49:23Z

Should I close the issue now?

usually an issue is closed once a fix is merged.
FWIW if you write "Fixes #3740" in your pull request, github will close this issue for you on merge.

MCRusher · 2019-11-25T07:03:00Z

Okay, thanks.

daurnimator added the standard library This issue involves writing Zig code for the standard library. label Nov 22, 2019

MCRusher mentioned this issue Nov 25, 2019

Add initCapacity for buffer & arraylist #3769

Merged

andrewrk added this to the 0.7.0 milestone Nov 27, 2019

andrewrk closed this as completed in #3769 Nov 27, 2019

andrewrk modified the milestones: 0.7.0, 0.6.0 Feb 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

MCRusher commented Nov 22, 2019

mikdusan commented Nov 22, 2019 •

edited

Loading

daurnimator commented Nov 23, 2019

MCRusher commented Nov 23, 2019

mikdusan commented Nov 23, 2019 •

edited

Loading

mikdusan commented Nov 23, 2019

MCRusher commented Nov 23, 2019 •

edited

Loading

mikdusan commented Nov 23, 2019

MCRusher commented Nov 23, 2019

mikdusan commented Nov 23, 2019

MCRusher commented Nov 23, 2019

MCRusher commented Nov 24, 2019

mikdusan commented Nov 24, 2019

MCRusher commented Nov 25, 2019

daurnimator commented Nov 25, 2019

MCRusher commented Nov 25, 2019

std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

Comments

MCRusher commented Nov 22, 2019

mikdusan commented Nov 22, 2019 • edited Loading

daurnimator commented Nov 23, 2019

MCRusher commented Nov 23, 2019

mikdusan commented Nov 23, 2019 • edited Loading

mikdusan commented Nov 23, 2019

MCRusher commented Nov 23, 2019 • edited Loading

mikdusan commented Nov 23, 2019

MCRusher commented Nov 23, 2019

mikdusan commented Nov 23, 2019

MCRusher commented Nov 23, 2019

MCRusher commented Nov 24, 2019

mikdusan commented Nov 24, 2019

MCRusher commented Nov 25, 2019

daurnimator commented Nov 25, 2019

MCRusher commented Nov 25, 2019

mikdusan commented Nov 22, 2019 •

edited

Loading

mikdusan commented Nov 23, 2019 •

edited

Loading

MCRusher commented Nov 23, 2019 •

edited

Loading