Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

Closed
MCRusher opened this issue Nov 22, 2019 · 15 comments · Fixed by #3769
Closed

std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740

MCRusher opened this issue Nov 22, 2019 · 15 comments · Fixed by #3769
Labels
standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@MCRusher
Copy link
Contributor

After looking at the source code, why does Buffer.initSize() call Buffer.resize(), which calls ArrayList(u8).resize(), which adds new elements,

instead of just calling ArrayList(u8).ensureCapacity() or similar?

It's possible I am misinterpreting the purpose of initSize, but it seems to me that it would make a lot of sense for initSize to allow preallocation of memory to reduce future reallocations, rather than create a bunch of blank space initialized with random values which must be set from outside the buffer object itself.

A usage case for this would be reading an unbounded length line of text from a file or console using std.Buffer, and then wanting to get an owned slice out of it to pass around between functions. You can expect a certain line length, like ~200 or so, so it would make sense to account for that case.

Currently, I have used initSize with size 0 to do so since calling toOwnedSlice after a readline call results in "¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬some string input" when initSize is set greater than 0, rather than what I expected it to do. I have a related example source file included.

If this is useful and makes sense, I am very curious how, and then also how this should be done, as calling Buffer.resize(0) seems hackish and not well optimized for the use case.

Thanks, and sorry if this is messy or misses the point.

@daurnimator daurnimator added the standard library This issue involves writing Zig code for the standard library. label Nov 22, 2019
@mikdusan
Copy link
Member

mikdusan commented Nov 22, 2019

In Buffer size is not capacity so that being said this is currently the way I would optimize for a minimal capacity:

edit: changed size arg 1000 in following snippet:

var buf = try std.Buffer.initSize(std.heap.direct_allocator, 0);
buf.list.ensureCapacity(200);

assuming things remain, maybe clarifying API doc is in order:

    /// Initialize to size bytes of undefined value.
    /// Must deinitialize with deinit.
    pub fn initSize(allocator: *Allocator, size: usize) !Buffer {...}

@daurnimator
Copy link
Contributor

I had an idea..... Buffer could be a mode of std.fifo, except that it writes 0 to empty entries here

zig/lib/std/fifo.zig

Lines 115 to 125 in ad0871e

{ // set old range to undefined. Note: may be wrapped around
const slice = self.readableSliceMut(0);
if (slice.len >= count) {
const unused = @sliceToBytes(slice[0..count]);
@memset(unused.ptr, undefined, unused.len);
} else {
const unused = @sliceToBytes(slice[0..]);
@memset(unused.ptr, undefined, unused.len);
const unused2 = @sliceToBytes(self.readableSliceMut(slice.len)[0 .. count - slice.len]);
@memset(unused2.ptr, undefined, unused2.len);
}

Once the null terminated type comes (#3728) it should be straightforward to add any missing access ors.

@MCRusher
Copy link
Contributor Author

@mikdusan
Thanks for the response, and for a way to achieve what I need.

Can I ask what the purpose of initSize is for a Buffer, and how it is used? It confuses me why this is necessary. It also seems you can do almost the same thing with Buffer.fromOwnedSlice().

Also, would it make sense to add an initCapacity function or similar?

I'm a bit confused by the first line of your example, wouldn't initSize(da,100) give you 100 bytes that a read function won't use, and then ensureCapacity(200) would make the total final allocation size at least 300 bytes, with 100 already being used?

I assume that, since the other init functions require some source to draw from, and initNull leaves the object in an invalid state until resize() or replaceContents() is called:

var buf = try std.Buffer.initSize(std.heap.direct_allocator,0);
try buf.list.ensureCapacity(200);

would be the easiest way to preallocate 200 ready-to-use bytes in advance, although maybe I'm missing something.

Thanks again.

@mikdusan
Copy link
Member

mikdusan commented Nov 23, 2019

@mikdusan
I'm a bit confused by the first line of your example, wouldn't initSize(da,100) give you 100 bytes that a read function won't use, and then ensureCapacity(200) would make the total final allocation size at least 300 bytes, with 100 already being used?

@MCRusher, my intention was to show this code as an (untested) example for domain optimization. Sorry for the confusion and I meant to init 0 bytes at start.

var buf = try std.Buffer.initSize(std.heap.direct_allocator,0);
try buf.list.ensureCapacity(200);

and small note, ensure capacity 200 after an init of 100 would only ensure minimum capacity of 200.

@mikdusan
Copy link
Member

@MCRusher,

Also, would it make sense to add an initCapacity function or similar?

That API doesn't answer the question of size and adding something like initSizeCapacity() almost doesn't seem worth it. Just my opinion, and I would value experienced library creators opinions over mine in this matter.

@MCRusher
Copy link
Contributor Author

MCRusher commented Nov 23, 2019

@mikdusan
I understand now, thanks for the clarification.

And yeah, after looking at the ArrayList.ensureCapacity() and thinking about about what capacity means (total space, not unused space), you're definitely right about that.

I feel it would make sense to add a function like Buffer.initCapacity() that allows preallocating the capacity of a buffer right at the initialization of the instance.

My reasoning for this is that right now:

var buf = try std.Buffer.initSize(std.heap.direct_allocator,0);
try buf.list.ensureCapacity(200);

seems to essentially do this under the hood:

self.list.init(allocator);
self.list.items = try self.list.allocator.realloc(self.list.items,8);
self.list.len = 1;
self.list.items[self.list.len-1] = 0;
self.list.items = try self.list.allocator.realloc(self.list.items,255);

Whereas something like:

pub fn initCapacity(allocator: *Allocator, capacity: usize) !Buffer {
    var self = initNull(allocator);
    try self.list.ensureCapacity(capacity+1);
    self.list.appendAssumeCapacity(0);
    return self;
}

should only perform one allocation instead of two if I'm not mistaken, but is kind of a complicated setup to do manually, and I feel would be better suited to being a function as shown.

Is this reasonable?

Thanks for the input.

@mikdusan
Copy link
Member

maybe something like this. here's a barely tested diff.

diff --git a/lib/std/array_list.zig b/lib/std/array_list.zig
index 59fd2a10e..b18af0d29 100644
--- a/lib/std/array_list.zig
+++ b/lib/std/array_list.zig
@@ -41,6 +41,14 @@ pub fn AlignedArrayList(comptime T: type, comptime alignment: ?u29) type {
             };
         }
 
+        /// Initialize with capacity to hold at least num elements.
+        /// Deinitialize with `deinit` or use `toOwnedSlice`.
+        pub fn initCapacity(allocator: *Allocator, num: usize) !Self {
+            var self = Self.init(allocator);
+            try self.ensureCapacity(num);
+            return self;
+        }
+
         /// Release all allocated memory.
         pub fn deinit(self: Self) void {
             self.allocator.free(self.items);
diff --git a/lib/std/buffer.zig b/lib/std/buffer.zig
index 24bd23fa7..c5dee220d 100644
--- a/lib/std/buffer.zig
+++ b/lib/std/buffer.zig
@@ -17,6 +17,7 @@ pub const Buffer = struct {
         return self;
     }
 
+    /// Initialize memory to size bytes of undefined values.
     /// Must deinitialize with deinit.
     pub fn initSize(allocator: *Allocator, size: usize) !Buffer {
         var self = initNull(allocator);
@@ -24,6 +25,14 @@ pub const Buffer = struct {
         return self;
     }
 
+    /// Initialize with capacity to hold at least num bytes.
+    /// Must deinitialize with deinit.
+    pub fn initCapacity(allocator: *Allocator, num: usize) !Buffer {
+        var self = Buffer{ .list = try ArrayList(u8).initCapacity(allocator, num + 1) };
+        self.list.appendAssumeCapacity(0);
+        return self;
+    }
+
     /// Must deinitialize with deinit.
     /// None of the other operations are valid until you do one of these:
     /// * ::replaceContents
@@ -99,6 +108,10 @@ pub const Buffer = struct {
         return self.list.len - 1;
     }
 
+    pub fn capacity(self: Buffer) usize {
+        return self.list.items.len;
+    }
+
     pub fn append(self: *Buffer, m: []const u8) !void {
         const old_len = self.len();
         try self.resize(old_len + m.len);
@@ -156,3 +169,18 @@ test "simple Buffer" {
     try buf2.resize(4);
     testing.expect(buf.startsWith(buf2.toSlice()));
 }
+
+test "Buffer.initSize" {
+    var buf = try Buffer.initSize(debug.global_allocator, 3);
+    testing.expect(buf.len() == 3);
+    try buf.append("hello");
+    testing.expect(mem.eql(u8, buf.toSliceConst()[3..], "hello"));
+}
+
+test "Buffer.initCapacity" {
+    var buf = try Buffer.initCapacity(debug.global_allocator, 10);
+    testing.expect(buf.len() == 0);
+    testing.expect(buf.capacity() >= 10);
+    try buf.append("hello");
+    testing.expect(mem.eql(u8, buf.toSliceConst(), "hello"));
+}

@MCRusher
Copy link
Contributor Author

Thanks for writing up a draft.

I think it makes sense like this, even more sense to extend it to ArrayList as well.

But I don't think my opinion is really what matters a lot for a change.

I'm not entirely familiar with Github, especially in dealing with other people's projects, what should be done from here?

@mikdusan
Copy link
Member

If you are semi-proficient in forking a project (zig) and submitting a pull-request (PR), then the process might be something like this:

  • fork zig on github
  • setup write-access to your fork (via ssh or one of the other? mechanisms)
  • git clone to your workstation from the fork (your workspace)
  • associate that write-access to the workspace's .git properties
  • create a new branch from master to work on
  • checkout that new branch

consider that you need a zig executable to work against the branch. So either download a zig binary for your workstation, or build it from workspace (which requires LLVM and a few other things)

  • edit lib/std/*.zig as needed
  • test (go to zig's IRC for help on how to start running zig unit tests)
  • once you're happy, make sure your changes are committed, cleaned up, and git push to your github fork
  • then go back to github and access your fork, on left select your new branch, and create a PR from it
  • the PR will automatically get created in zig master issues/PR tracker
  • from there it's your PR and you can add comments and interact with people reviewing your code

if this seems complicated for you, you can alternatively:

  • clone direct from zig master on github to your workstation
  • create new branch and checkout new branch
  • work on it, commit
  • then get onto zig IRC and ask if someone is willing to create a PR on your behalf (I'd be willing to)
  • and even simpler, you can create your own diff and add it to this very issue just like I added in previous comment and this could be used by whomever is assisting you to create a PR as well

...hope this helps

@MCRusher
Copy link
Contributor Author

Thanks, this is very helpful but I'm still a bit lost.

To be honest, your draft diff is almost exactly what I want already.

But I did believe that Buffer.capacity() would be counting the null byte as part of the capacity of the Buffer.

To make capacity reflect the actual amount of bytes that can be stored until reallocation, it could be changed to self.list.items.len-1, which should work in every case except Buffer.initNull(), since every other init function appends one byte at least.

Additionally, Buffer.capacity could check if self.list.items.len==0 and return 0 if so to bypass this issue, and not leave any more special cases on Buffer.initNull().

Does having Buffer.capacity reflect the null byte make some sense that I'm not seeing, or would this make sense?

I dropped the your diff into my copy of the zig-0.5.0, with three changes:

  1. Buffer.capacity() reflects usable size:
pub fn capacity(self: Buffer) usize {
    return if (self.list.items.len > 0)
        self.list.items.len - 1
    else
        0;
}
  1. Added conditions to Buffer.initCapacity() test:
test "Buffer.initCapacity" {
    var buf = try Buffer.initCapacity(debug.global_allocator, 10);
    const old_cap = buf.capacity();
    testing.expect(buf.len() == 0);
    testing.expect(old_cap >= 10);
    try buf.append("hello");
    testing.expect(buf.len() == 5);
    testing.expect(buf.capacity() == old_cap);
    testing.expect(mem.eql(u8, buf.toSliceConst(), "hello"));
}
  1. ArrayList.initCapacity() test added:
test "std.ArrayList.initCapacity" {
    var bytes: [1024]u8 = undefined;
    const allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
    var list = try ArrayList(i8).initCapacity(allocator,200);
    defer list.deinit();
    testing.expect(list.count() == 0);
    testing.expect(list.capacity() >= 200);
}

and it seems to work fine, all tests pass. I modified my test example to use these new features and it now works fine with no reallocation, and behaves as expected with user input and file input as well.

Thanks again for the help and consideration so far.

@MCRusher
Copy link
Contributor Author

So I created a fork, that was the easiest thing for me to understand and do.

I just want to make sure it's ok with you to start the pull request, since you wrote most of the actual implementation.

Also how can I best credit your contributions to it?

Thanks a lot for brainstorming with me, writing a diff, and helping me through the process.

@mikdusan
Copy link
Member

@MCRusher feel free to PR this as solely your contribution, I only created a diff to better communicate on this issue.

@MCRusher
Copy link
Contributor Author

@mikdusan

Alright well thanks a lot for sticking with me and helping me through the process.

Should I close the issue now?

@daurnimator
Copy link
Contributor

Should I close the issue now?

usually an issue is closed once a fix is merged.
FWIW if you write "Fixes #3740" in your pull request, github will close this issue for you on merge.

@MCRusher
Copy link
Contributor Author

Okay, thanks.

@andrewrk andrewrk added this to the 0.7.0 milestone Nov 27, 2019
@andrewrk andrewrk modified the milestones: 0.7.0, 0.6.0 Feb 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants