-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vec: add uninitialized(uint) and resize(&mut self, uint, T). #19104
Conversation
if l < n { | ||
self.grow(n - l, value) | ||
} else { | ||
unsafe { self.set_len(n) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The destructors need to be called here. It should just be using truncate
.
8d81162
to
e5f61b1
Compare
Incidentally, modified |
@@ -383,6 +408,29 @@ impl<T: Clone> Vec<T> { | |||
} | |||
} | |||
|
|||
impl<T: Int> Vec<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not T: Copy
at this point? Are you just trying to use Int here as a synonym for "meaningless bits"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int
is the most general trait which includes u8
(expected to be the most frequent use case for this ctor) but is known not to imply Drop
(we should forbid allocating a Vec
of RAII objects such as Vec<Vec>
with this ctor.) According to the document, Int
is
A built-in signed or unsigned integer.
What problem do you expect with Int
? Do you think it is too specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think there's a point in having a bound here. It's still very unsafe either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way uninitialized memory works in LLVM is that each access can return an arbitrary value. For example, using an uninitialized integer for indexing may pass the bounds check and then become a different value for the pointer offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned Vec
contains what we call an "uninitialized" raw memory chunk, but from LLVM's point of view, it should just be "some data returned by C function called je_mallocx()
" and should not be different from, say, "some data returned by C function called getaddrinfo(3)
." If you think it's legitimate to read each struct addrinfo
returned by getaddrinfo(3)
under LLVM, then it should also be legitimate to read void*
buffer returned by je_mallocx()
. Reading such a buffer should surely be called UB, but it will never lead to deletion of your home directory; it just yields unpredictable byte sequence.
Hit the [ir] button to see how Vec
and getaddrinfo
are translated to LLVM IR:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware that Rust is not currently communicating that the memory is uninitialized in a build without --disable-jemalloc. LLVM special cases certain libc symbols like malloc
, and it does infer this information from the symbol name along with doing stuff like dead store elimination of dynamic allocations. The fact that it isn't communicated to LLVM for other allocator symbols is a performance bug. It is not a guarantee that you're allowed to leverage. Note that the reference manual currently states that reading uninitialized memory is UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--disable-jemalloc
Good point. You're right, reading data from uninitialized elements of a Vec
under --disable-jemalloc
is what LLVM define as UB.
Still, I don't think this is "more unsafe" than existing unsafe
functions Vec::from_raw_parts()
and set_len()
which are already so flexible that anything can happen from their incorrect usage. So the discussion should focus on "will adding one more rewards us?"
My point is, allocating a buffer to pass its slice to Reader
's read_*()
methods and alike is a frequent use case and the library should support it with options with varying degrees of safety. Note that the length of a vector has to be set properly beforehand. Then my Vec::uninitialized()
constructor should be preferred over calling bare set_len()
. From a safety point of view, Vec::from_elem()
is the best, but "extra" cost of clearing a buffer is sometimes undesirable. A benchmark with a 16 GB buffer on 3 GHz Xeon machine:
fn main() {
const GB: uint = 1 << 30;
const N: uint = 16 * GB;
let mut v = if 1 < std::os::args().len() {
let mut v = Vec::<u8>::with_capacity(N);
unsafe { v.set_len(N); }
v
} else {
Vec::<u8>::from_elem(N, 0)
};
println!("{}", std::io::stdin().read_at_least(N, v.as_mut_slice()));
}
$ time ./large_vec < /dev/zero
Ok(17179869184)
./large_vec < /dev/zero 4.99s user 11.52s system 99% cpu 16.529 total
$ time ./large_vec < /dev/zero x
Ok(17179869184)
./large_vec x < /dev/zero 0.00s user 9.14s system 99% cpu 9.157 total
Also remember, if someone makes a mistake of reading past the end of the region written by read_*()
, it's just a bug regardless whether or not the rest of the buffer was zero-cleard beforehand in a well-defined way. "Safety" and "well-defined behavior" don't substantially help here. We shouldn't always need to pay the extra cost of clearing a buffer for the doubtful benefit from "safety" and "well-defined behavior".
This is a convenience wrapper on top of `grow()` and `truncate()`. Signed-off-by: NODA, Kai <[email protected]>
Signed-off-by: NODA, Kai <[email protected]>
e5f61b1
to
dbf2a76
Compare
I'd like to drop Incidentally, here's my motivation for adding the let mut buf: Vec<u8> = vec![];
loop {
buf.resize(HDR_LEN, 0);
try!( stream.read_at_least(HDR_LEN, buf[mut]) );
let msg_len = (... either bit operations on buf or transmute into C struct ...);
buf.resize(msg_len, 0);
try!( stream.read_at_least(msg_len, buf[mut]) );
(... cook the data in buf ...)
} Well, |
@nodakai I'm drafting up collections reform part 2. I could add this to the docket for discussion, if that works for you? |
@gankro Yes please, that seems to be the only way for this proposal to make it into the merge. I'll close this PR later. |
Closing due to #19104 (comment) |
Add two more convenience methods to
Vec
.Please review my English in the rustdoc comments!