-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
basic support for arbitrary chunks #25
Conversation
Thanks a lot for taking the time to implement this! I agree that it would be a useful thing to have. I think the API can still be improved. I struggled with a similar thing a lot in Claxon. I considered a callback-based API there too, but then I realised that an iterator-based API is more flexible, more convenient, and more performant:
Also, rather than buffering up the entire chunk in memory, it is advantageous to expose a bounded We could adapt
I had not fully implemented this for Claxon yet, so sorry if the details are a bit vague, but I think this is the way to go. The analogous parts there are For writing, providing slices at construction time limits the caller; it prevents streaming, or may force unnecessary buffering. (For example, if you would want to write a cover art image, and the writer takes an
What do you think?
Thanks for pointing this out, looks like a bug indeed. Do you happen to have a small (< 100 byte) test sample that currently fails to read because of this? It would be nice to add that as a regression test. |
Here is a gist of the issue https://play.rust-lang.org/?gist=861c43a989f7b01ebef5aafcf466d274&version=stable&mode=debug&edition=2015, and a conversation that I think relates to the problem. I think an alternative if to drop the Iterator trait and have a next_chunk() method instead... Something like that:
|
Aaah, that’s right. Now that you mention it, I remember hitting the same issue in Claxon. The root cause is that in
That's a shame indeed. I somehow thought that the An alternative would be to expose
I would prefer to have one that occurred in the wild; we’ve observed files before that should be invalid according to Microsoft’s documentation, but some software wrote them nonetheless. I can imagine that the padding byte should be there, but de facto it is not. All test samples in the repository happen to have only even-sized chunks. According to this document (chapter 2) you are right about the padding byte though; if an artificial test file is the best we can do, then let's go with that. |
Hey @ruuda, here is a WIP for reading arbitrary chunk, please have a look and tell me what you think. The idea is to cut WavReader in half, so there is now a ChunksReader that does most of the heavy lifting while the WavReader keeps the same easy to use interface. I'm working on the writing side too, but once again, our nice big plan will not work as is. The problem here is that Drop::drop has no way to report a problem... So I'm considering other alternatives. |
Here we go, @ruuda, a temptative writer in the same spirit. Tell me if it's worth finalizeing or it it is too far from what you would like to see. |
We already deal with this: there is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the reader part, and I like the way it is going, but I think we can make the API easier and safer to use.
I did not look at the writer part yet ... this PR is getting really big, could you maybe split the writer part into a separate PR?
src/lib.rs
Outdated
@@ -327,6 +327,8 @@ pub enum Error { | |||
IoError(io::Error), | |||
/// Ill-formed WAVE data was encountered. | |||
FormatError(&'static str), | |||
/// An inconsistency in parsing chunks has been detected. | |||
InvalidState(&'static str), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this error represents a programming error. I would use an assertion instead in these cases. An Error
is for errors that can arise at runtime even when the library is used correctly. Users of the library should expect these errors and be prepared to deal with them. An invalid state on the other hand, is a bug. As a user of the library, there is no good way to recover from it.
src/lib.rs
Outdated
@@ -779,6 +787,7 @@ fn append_works_on_files() { | |||
let len = fs::metadata("append.wav").unwrap().len(); | |||
|
|||
let mut appender = WavWriter::append("append.wav").unwrap(); | |||
println!("appender: {:?} len:{}", appender.spec(), len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a leftover debug print?
src/lib.rs
Outdated
} | ||
|
||
#[test] | ||
fn write_read_chunks_is_lossless_with_odd_size() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for covering this case too!
Maybe we can avoid some duplication by putting the test in a loop, where the chunk body size is the loop variable. Something like
let full_contents = b"12345678";
for body_len in 0..contents.len() {
contents = &full_contents[..body_len];
// Write and read here, using `contents` as chunk body.
}
src/read.rs
Outdated
} | ||
// If no data chunk is ever encountered, the function will return | ||
// via one of the try! macros that return an Err on end of file. | ||
pub fn next_chunk(&mut self) -> Result<Option<ChunkHeader>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this next()
to stay as close to std::iter::Iterator
as possible. Also, I think we can make the API safer by returning something like
enum Chunk<'r> {
Fmt(pub WavSpecEx),
Fact(pub Fact),
Data(pub EmbeddedReader<'r>),
Unknown(pub [u8; 4], pub EmbeddedReader<'r>),
}
This would make the API impossible to misuse. It would prevent the case of calling read_fmt_chunk
while not at an fmt chunk, and it would ensure that once the borrow of self
ends, the reader is positioned at the start of a chunk (if EmbeddedReader
would skip to the end of the chunk on drop).
src/read.rs
Outdated
self.reader | ||
} | ||
|
||
pub fn into_wav_reader(mut self) -> Result<WavReader<R>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure how useful this is, because of the concerns you raised before about accessing chunks after the DATA chunk. What we could do instead, is allowing to construct the samples iterator from a WavSpecEx
and a Reader
that is assumed to be at the start of a DATA chunk. The reader could be an EmbeddedReader
from Chunk::Data
. Then if you want access to all chunks and still read the audio data in one pass, you can do so, albeit with a bit more manual work. And if you just want the audio WavReader
still works, or if you just want the raw data of a specific chunk, ChunkReader
works. What do you think?
hey @ruuda, I move the the Writer part away for now and found out we had one sample with non standard chunk to test with. the lossless read/write tests will come back with the Writer later. There was still one thing I could not do as you expected. I can not use the EmbeddedReader for the Data chunk as |
src/lib.rs
Outdated
@@ -790,3 +790,38 @@ fn append_works_on_files() { | |||
|
|||
assert_contents("append.wav", &[11, 13, 17, 19, 23]); | |||
} | |||
|
|||
#[cfg(test)] | |||
macro_rules! gard { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat, I’ve wanted such a thing in a few other places too. What does gard
mean though, did you mean guard
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, indeed, this is a French-ism.
src/read.rs
Outdated
/// how long the chunk is, in bytes | ||
pub len: i64, | ||
/// how much bytes remains to be read | ||
pub remaining: i64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these are i64
rather than u64
? If avoiding a few casts is the reason I would go for u64
instead; that makes it impossible to represent invalid states.
src/read.rs
Outdated
self.reader.seek(io::SeekFrom::Current(offset)) | ||
} | ||
} else { | ||
Err(io::Error::new(io::ErrorKind::Other, "Only relative seek is supported.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also handle SeekFrom::Start
and SeekFrom::End
; we know the current position from the start of the chunk, so we can translate a seek to an absolute position into a relative seek on the underlying reader.
src/read.rs
Outdated
let current_in_chunk = self.len as i64 - self.remaining as i64; | ||
let wanted_in_chunk = current_in_chunk as i64 + offset; | ||
if wanted_in_chunk < 0 { | ||
Err(io::Error::new(io::ErrorKind::Other, "Seeking befoer begin of chunk")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: “befoer”
/// | ||
/// This function will only read the Riff header from the file | ||
/// in order to position the stream to the first chunk. | ||
pub fn new(mut reader: R) -> Result<ChunksReader<R>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this is a nice way to handle the header.
src/read.rs
Outdated
/// | ||
/// This function will panic if it is called while the reader is not in | ||
/// the data chunk, or if the format has not been parsed. | ||
pub fn len(&self) -> u32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing on the chunks iterator, if I would see
let mut chunks_reader = ChunkReader::new(file)?;
chunks_reader.len()
I would think len
would return the number of chunks, not the number of samples in the data chunk. The best solution I can think of would be to leave off len()
and duration()
entirely; a user who reads chunks manually can compute them from the fmt
chunk data instead.
src/read.rs
Outdated
} | ||
let mut kind_str = [0; 4]; | ||
if let Err(_) = self.reader.read_into(&mut kind_str) { | ||
// assumes EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct; it might stop iteration on any kind of IO error. Unfortunately read_into
does not return an io::ErrorKind::UnexpectedEof
because it did not yet exist in Rust 1.4.0. I can fix it later if you like, but could you please put a TODO
here?
Hmm, I can see why now. Making |
Looks good now! I think I’ll go ahead and merge this to master, and I’ll make another pass over it once I get around to fixing up metadata reading in Claxon, to make them consistent, and to get some confidence that the API is right, if it works for both flac and wav. Is that okay with you? |
Happy to hear that. I'm in no rush altogether, but do you want me to give a second go at the writer while I'm on the topic ? |
Sure, go ahead. |
Merged as 0db8127. Thanks a lot! |
Hello there, this is an extension to support writting and reading arbitrary chunks between "fmt " and "data".
Let me know what you think.
(By the way, I think there is a tiny bug in skip_bytes call while looking for the data chunk. As far as I understand it, RIFF chunks must be aligned to even offsets so there might be a padding zero byte... This patch actually fixes it.)