std.Buffer.initSize() doesn't allow append to utilize the space allocated #3740
Edit: changed the size argument.

```zig
var buf = try std.Buffer.initSize(std.heap.direct_allocator, 0);
try buf.list.ensureCapacity(200);
```

Assuming things remain as they are, maybe clarifying the API doc is in order:

```zig
/// Initialize to size bytes of undefined value.
/// Must deinitialize with deinit.
pub fn initSize(allocator: *Allocator, size: usize) !Buffer {...}
```
I had an idea... Buffer could be a mode of std.fifo, except that it writes (see lines 115 to 125 in ad0871e).
Once the null-terminated type lands (#3728), it should be straightforward to add any missing accessors.
@mikdusan Can I ask what the purpose of initSize is for a Buffer, and how it is used? I'm confused about why it is necessary. It also seems you can do almost the same thing with Buffer.fromOwnedSlice(). Also, would it make sense to add an initCapacity function or similar?

I'm a bit confused by the first line of your example: wouldn't initSize(da, 100) give you 100 bytes that a read function won't use, and then ensureCapacity(200) would make the total final allocation at least 300 bytes, with 100 already in use?

Since the other init functions require some source to draw from, and initNull leaves the object in an invalid state until resize() or replaceContents() is called, I assume that

```zig
var buf = try std.Buffer.initSize(std.heap.direct_allocator, 0);
try buf.list.ensureCapacity(200);
```

would be the easiest way to preallocate 200 ready-to-use bytes in advance, although maybe I'm missing something. Thanks again.
@MCRusher, my intention was to show this code as an (untested) example of domain-specific optimization. Sorry for the confusion; I meant to init 0 bytes at the start:

```zig
var buf = try std.Buffer.initSize(std.heap.direct_allocator, 0);
try buf.list.ensureCapacity(200);
```

And a small note: ensureCapacity(200) after an init of 100 only guarantees a minimum total capacity of 200 bytes, not 200 bytes on top of the existing 100.
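A small test can make the "total, not additional" point concrete. This is a sketch that assumes the Zig 0.5.0 standard library discussed in this thread (std.Buffer wrapping an ArrayList(u8) in its `list` field, plus std.debug.global_allocator); the 99-byte filler is only illustrative. It checks that after initSize(100) and ensureCapacity(200), another 99 bytes plus the null terminator fit without the allocation moving.

```zig
const std = @import("std");
const testing = std.testing;

test "ensureCapacity(200) after initSize(100) means 200 bytes in total" {
    // Assumes Zig 0.5.0, where std.Buffer wraps an ArrayList(u8) in `list`.
    var buf = try std.Buffer.initSize(std.debug.global_allocator, 100);
    defer buf.deinit();
    testing.expect(buf.len() == 100); // 100 bytes of undefined content

    // Request a *total* capacity of 200 bytes in the underlying list.
    try buf.list.ensureCapacity(200);
    testing.expect(buf.len() == 100); // length is unchanged
    testing.expect(buf.list.capacity() >= 200);

    // 100 existing bytes + 99 new bytes + 1 null terminator = 200,
    // so this append fits without moving the allocation.
    const before = buf.list.items.ptr;
    const filler = [_]u8{'x'} ** 99;
    try buf.append(filler[0..]);
    testing.expect(buf.len() == 199);
    testing.expect(buf.list.items.ptr == before);
}
```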
That API doesn't answer the question of size, and adding something like
@mikdusan And yeah, after looking at ArrayList.ensureCapacity() and thinking about what capacity means (total space, not unused space), you're definitely right about that. I feel it would make sense to add a function like Buffer.initCapacity() that preallocates the capacity of a buffer when the instance is initialized. My reasoning is that right now

```zig
var buf = try std.Buffer.initSize(std.heap.direct_allocator, 0);
try buf.list.ensureCapacity(200);
```

seems to essentially do this under the hood:

```zig
self.list = ArrayList(u8).init(allocator);
self.list.items = try self.list.allocator.realloc(self.list.items, 8);
self.list.len = 1;
self.list.items[self.list.len - 1] = 0;
self.list.items = try self.list.allocator.realloc(self.list.items, 255);
```

whereas something like

```zig
pub fn initCapacity(allocator: *Allocator, capacity: usize) !Buffer {
    var self = initNull(allocator);
    try self.list.ensureCapacity(capacity + 1);
    self.list.appendAssumeCapacity(0);
    return self;
}
```

should only perform one allocation instead of two, if I'm not mistaken. But it is a somewhat complicated setup to do manually, and I feel it would be better suited to being a function as shown. Is this reasonable? Thanks for the input.
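The 8 and 255 figures above can be reproduced with a little arithmetic. The sketch below assumes that ArrayList.ensureCapacity in Zig 0.5 grows the capacity by half of itself plus 8 until the request fits (an assumption that is consistent with those two numbers); `grownCapacity` is a hypothetical helper that only mirrors that rule and allocates nothing. Under that assumption, both paths end at 255 bytes, but the initSize(0) path allocates twice while the proposed initCapacity(200) gets there in a single allocation.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical helper mirroring the assumed growth rule of
/// ArrayList.ensureCapacity in Zig 0.5: grow by half the current
/// capacity plus 8 until the requested capacity fits.
fn grownCapacity(current: usize, requested: usize) usize {
    var better = current;
    while (better < requested) {
        better += better / 2 + 8;
    }
    return better;
}

test "two allocations via initSize(0) vs one via initCapacity(200)" {
    // initSize(allocator, 0) resizes the list to 1 element (the null byte).
    const after_init = grownCapacity(0, 1);
    assert(after_init == 8);

    // ensureCapacity(200) then grows the existing 8-byte block a second time.
    assert(grownCapacity(after_init, 200) == 255);

    // The proposed initCapacity(200) asks for 200 + 1 bytes from an empty list,
    // reaching the same capacity with a single allocation.
    assert(grownCapacity(0, 201) == 255);
}
```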
Maybe something like this. Here's a barely tested diff:

```diff
diff --git a/lib/std/array_list.zig b/lib/std/array_list.zig
index 59fd2a10e..b18af0d29 100644
--- a/lib/std/array_list.zig
+++ b/lib/std/array_list.zig
@@ -41,6 +41,14 @@ pub fn AlignedArrayList(comptime T: type, comptime alignment: ?u29) type {
             };
         }
 
+        /// Initialize with capacity to hold at least num elements.
+        /// Deinitialize with `deinit` or use `toOwnedSlice`.
+        pub fn initCapacity(allocator: *Allocator, num: usize) !Self {
+            var self = Self.init(allocator);
+            try self.ensureCapacity(num);
+            return self;
+        }
+
         /// Release all allocated memory.
         pub fn deinit(self: Self) void {
             self.allocator.free(self.items);
diff --git a/lib/std/buffer.zig b/lib/std/buffer.zig
index 24bd23fa7..c5dee220d 100644
--- a/lib/std/buffer.zig
+++ b/lib/std/buffer.zig
@@ -17,6 +17,7 @@ pub const Buffer = struct {
         return self;
     }
 
+    /// Initialize memory to size bytes of undefined values.
     /// Must deinitialize with deinit.
     pub fn initSize(allocator: *Allocator, size: usize) !Buffer {
         var self = initNull(allocator);
@@ -24,6 +25,14 @@ pub const Buffer = struct {
         return self;
     }
 
+    /// Initialize with capacity to hold at least num bytes.
+    /// Must deinitialize with deinit.
+    pub fn initCapacity(allocator: *Allocator, num: usize) !Buffer {
+        var self = Buffer{ .list = try ArrayList(u8).initCapacity(allocator, num + 1) };
+        self.list.appendAssumeCapacity(0);
+        return self;
+    }
+
     /// Must deinitialize with deinit.
     /// None of the other operations are valid until you do one of these:
     /// * ::replaceContents
@@ -99,6 +108,10 @@ pub const Buffer = struct {
         return self.list.len - 1;
     }
 
+    pub fn capacity(self: Buffer) usize {
+        return self.list.items.len;
+    }
+
     pub fn append(self: *Buffer, m: []const u8) !void {
         const old_len = self.len();
         try self.resize(old_len + m.len);
@@ -156,3 +169,18 @@ test "simple Buffer" {
     try buf2.resize(4);
     testing.expect(buf.startsWith(buf2.toSlice()));
 }
+
+test "Buffer.initSize" {
+    var buf = try Buffer.initSize(debug.global_allocator, 3);
+    testing.expect(buf.len() == 3);
+    try buf.append("hello");
+    testing.expect(mem.eql(u8, buf.toSliceConst()[3..], "hello"));
+}
+
+test "Buffer.initCapacity" {
+    var buf = try Buffer.initCapacity(debug.global_allocator, 10);
+    testing.expect(buf.len() == 0);
+    testing.expect(buf.capacity() >= 10);
+    try buf.append("hello");
+    testing.expect(mem.eql(u8, buf.toSliceConst(), "hello"));
+}
```
Thanks for writing up a draft. I think it makes sense like this, and it makes even more sense to extend it to ArrayList as well. But I don't think my opinion matters much for a change like this. I'm not very familiar with GitHub, especially when dealing with other people's projects; what should be done from here?
If you are semi-proficient at forking a project (zig) and submitting a pull request (PR), then the process might be something like this:
Consider that you need a zig executable to work against the branch, so either download a zig binary for your workstation or build it from your workspace (which requires LLVM and a few other things).
If this seems complicated for you, you can alternatively:
...hope this helps.
Thanks, this is very helpful, but I'm still a bit lost. To be honest, your draft diff is almost exactly what I want already. But I believe Buffer.capacity() as drafted counts the null byte as part of the Buffer's capacity. To make capacity reflect the actual number of bytes that can be stored before a reallocation, it could be changed to self.list.items.len - 1, which should work in every case except Buffer.initNull(), since every other init function appends at least one byte. Additionally, Buffer.capacity could check whether self.list.items.len == 0 and return 0 in that case, which bypasses the issue without adding more special cases to Buffer.initNull(). Does having Buffer.capacity include the null byte make some sense that I'm not seeing, or would this change make sense?

I dropped your diff into my copy of zig-0.5.0, with three changes:

```zig
pub fn capacity(self: Buffer) usize {
    return if (self.list.items.len > 0)
        self.list.items.len - 1
    else
        0;
}
```

```zig
test "Buffer.initCapacity" {
    var buf = try Buffer.initCapacity(debug.global_allocator, 10);
    const old_cap = buf.capacity();
    testing.expect(buf.len() == 0);
    testing.expect(old_cap >= 10);
    try buf.append("hello");
    testing.expect(buf.len() == 5);
    testing.expect(buf.capacity() == old_cap);
    testing.expect(mem.eql(u8, buf.toSliceConst(), "hello"));
}
```

```zig
test "std.ArrayList.initCapacity" {
    var bytes: [1024]u8 = undefined;
    const allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
    var list = try ArrayList(i8).initCapacity(allocator, 200);
    defer list.deinit();
    testing.expect(list.count() == 0);
    testing.expect(list.capacity() >= 200);
}
```

It seems to work fine, and all tests pass. I modified my test example to use these new features, and it now works with no reallocation and behaves as expected with both user input and file input. Thanks again for the help and consideration so far.
So I created a fork; that was the easiest thing for me to understand and do. I just want to make sure it's OK with you if I start the pull request, since you wrote most of the actual implementation. Also, how can I best credit your contributions? Thanks a lot for brainstorming with me, writing a diff, and helping me through the process.
@MCRusher feel free to PR this as solely your contribution; I only created a diff to communicate better on this issue.
Alright, well, thanks a lot for sticking with me and helping me through the process. Should I close the issue now?
Usually an issue is closed once a fix is merged.
Okay, thanks.
After looking at the source code: why does Buffer.initSize() call Buffer.resize(), which calls ArrayList(u8).resize(), which adds new elements, instead of just calling ArrayList(u8).ensureCapacity() or similar?
It's possible I am misinterpreting the purpose of initSize, but it seems to me that it would make a lot of sense for initSize to preallocate memory to reduce future reallocations, rather than create a block of space filled with undefined values that must be set from outside the Buffer itself.
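The difference this question turns on can be shown at the ArrayList level. This is a sketch assuming the Zig 0.5.0 std.ArrayList API (with the `count()` and `capacity()` methods used elsewhere in this thread): resize() makes new, undefined elements part of the list, while ensureCapacity() only reserves space that later appends can fill.

```zig
const std = @import("std");
const testing = std.testing;
const ArrayList = std.ArrayList;

test "resize adds elements, ensureCapacity only reserves space" {
    // resize: the list now *contains* 200 elements with undefined values.
    var a = ArrayList(u8).init(std.debug.global_allocator);
    defer a.deinit();
    try a.resize(200);
    testing.expect(a.count() == 200);
    testing.expect(a.capacity() >= 200);

    // ensureCapacity: memory is reserved, but the list is still empty,
    // so the next append writes to index 0.
    var b = ArrayList(u8).init(std.debug.global_allocator);
    defer b.deinit();
    try b.ensureCapacity(200);
    testing.expect(b.count() == 0);
    testing.expect(b.capacity() >= 200);
}
```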
A use case for this would be reading an unbounded-length line of text from a file or the console into a std.Buffer, and then getting an owned slice out of it to pass around between functions. You can expect a typical line length, like ~200 bytes, so it would make sense to account for that case.
Currently, I use initSize with size 0, because when the size is greater than 0, calling toOwnedSlice after a readline call returns "¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬some string input" rather than just the input I expected. I have included a related example source file.
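The use case and the surprising output can both be sketched as tests. This assumes the Zig 0.5.0 std.Buffer API; `appendLine` is a hypothetical stand-in for a real readline call and reads from a string instead of a file or the console. The first test uses the initSize(0) plus ensureCapacity workaround; the second shows that with initSize(4), the owned slice starts with 4 bytes of undefined content before the input, which is most likely where the "¬¬¬" prefix comes from.

```zig
const std = @import("std");
const mem = std.mem;
const testing = std.testing;

/// Hypothetical stand-in for a readline call: appends the first line of
/// `input` (without the '\n') to `buf`.
fn appendLine(buf: *std.Buffer, input: []const u8) !void {
    const end = mem.indexOfScalar(u8, input, '\n') orelse input.len;
    try buf.append(input[0..end]);
}

test "preallocate for a typical line, then take an owned slice" {
    const allocator = std.debug.global_allocator;
    // Workaround from this thread: start empty, then reserve ~200 bytes in total.
    var buf = try std.Buffer.initSize(allocator, 0);
    defer buf.deinit();
    try buf.list.ensureCapacity(200);

    try appendLine(&buf, "some string input\nnext line");

    // toOwnedSlice hands the bytes to the caller and resets buf to the null state.
    const line = buf.toOwnedSlice();
    defer allocator.free(line);
    testing.expect(mem.eql(u8, line, "some string input"));
}

test "initSize(4) leaves 4 undefined bytes in front of the input" {
    const allocator = std.debug.global_allocator;
    var buf = try std.Buffer.initSize(allocator, 4);
    defer buf.deinit();

    try appendLine(&buf, "some string input\n");

    const owned = buf.toOwnedSlice();
    defer allocator.free(owned);
    testing.expect(owned.len == 4 + "some string input".len);
    testing.expect(mem.eql(u8, owned[4..], "some string input"));
}
```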
If this is useful and makes sense, I am very curious whether it could be changed, and also how it should be done, as calling Buffer.resize(0) seems hackish and not well optimized for this use case.
Thanks, and sorry if this is messy or misses the point.