Add MPEG-TS format #147
Comments
Please use the form described in kaitai-io/kaitai_struct#134: 2 YAML blocks and free-form text. To reference an issue, use blocked_by:
- 196 in the second block. |
I spent the last couple of weeks on my first Kaitai Struct file for MPEG-2 video. I found it really hard to implement, and ultimately I failed. First, since I deal with MPEG-TS files, I wrote an MPEG-TS demuxer in Kaitai (https://gist.github.com/kalidasya/ef5a7349aaf53073d2e0e16d3588f751); that was easy. But MPEG-2 and potentially H.264 will not be possible. Here are the obstacles I encountered (in random order, sorry):
rest:
  seq:
    - id: data
      type: magic
      repeat: until
      # next line is ugly. next to the generic exit condition (prefix is 1 or eos reached)
      # we have to make sure we call the next_start_code for each magic otherwise it will not be available later
      # but we cannot do it when the eos was close so the data was not read
      repeat-until: (_.prefix_code == 1 or _io.size - _io.pos < 4) and (_io.size - _io.pos < 4 or _.next_start_code > -1)
  instances:
    next_start_code:
      value: data.last.next_start_code
start_code:
  seq:
    - id: sync
      contents: [0x00, 0x00, 0x01]
    - id: start_code
      type: u1
magic:
  seq:
    - id: stuff
      #contents: [0]
      type: u1
  instances:
    prefix_code:
      type: b24
      pos: _io.pos
      if: _io.size - _io.pos > 4
    next_start_code:
      type: u1
      io: _root._io
      pos: _io.pos + 3
      if: _io.size - _io.pos > 4
so I can simulate some peek-like behaviour. The goal is that you call the rest type when you just want to consume the remaining data in the stream. It will fail if there is no data remaining, and it will consume the next sequence. The lack of peek caused a lot of other trouble: MPEG-2 video sequences have a predefined order, which means that while parsing a sequence I have to read to its end and then parse another one if it is present. These kinds of things are not possible (in the attached gist I ignored the sequence order completely).
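For comparison, the peek-like scan that the rest and magic types try to emulate can be sketched outside Kaitai in plain Python. This is only an illustration and not part of the gist: the helper name iter_start_codes and the sample bytes are made up, and it assumes the whole elementary stream is already in memory.

```python
from typing import Iterator, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"  # same bytes as the `sync` field above


def iter_start_codes(buf: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, start_code) for every 00 00 01 xx sequence in buf.

    Hypothetical helper: it only looks ahead with bytes.find() and never
    consumes data, which is the "peek" the .ksy above has to simulate.
    """
    pos = 0
    while True:
        pos = buf.find(START_CODE_PREFIX, pos)
        # Stop when no prefix is left or the code byte itself is missing.
        if pos == -1 or pos + 3 >= len(buf):
            return
        yield pos, buf[pos + 3]
        pos += 3  # continue searching after this prefix


# Made-up sample: sequence header (0xb3), two junk bytes, picture header (0x00).
sample = b"\x00\x00\x01\xb3\xaa\xbb\x00\x00\x01\x00\xcc"
codes = list(iter_start_codes(sample))
```

Because the scan never moves a stream position, the same offsets can be revisited later, which is the property the cached instances above work around.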
types:
  sequence:
    seq:
      - id: start_code
        type: start_code
        if: _io.size - _io.pos > 4
      - id: data
        type:
          switch-on: start_code.start_code
          cases:
            0x00: picture_header
            0x01...0xAF: slice
            0xb2: user_data
            0xb3: sequence_header
            0xb5: extension_data
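A rough Python equivalent of the dispatch this switch is meant to express, including the 0x01...0xAF range case, might look as follows. The mapping contains only the codes listed above; the function name classify_start_code and the "unknown" label are illustrative assumptions.

```python
def classify_start_code(code: int) -> str:
    """Map a start code byte to the type names used in the switch above."""
    if not 0 <= code <= 0xFF:
        raise ValueError(f"start code must fit in one byte, got {code!r}")
    if 0x01 <= code <= 0xAF:
        # The range case: every value in 0x01..0xAF is a slice.
        return "slice"
    fixed = {
        0x00: "picture_header",
        0xB2: "user_data",
        0xB3: "sequence_header",
        0xB5: "extension_data",
    }
    # "unknown" is a placeholder for codes the snippet above does not list.
    return fixed.get(code, "unknown")
```

Checking the range before the dictionary lookup keeps the 175 slice values out of the table, which is what a range case would buy in the .ksy as well.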
seq:
  - id: start_code_id
    type: b4
  - id: data
    type:
      switch-on: start_code_id
      cases:
        0b0001: sequence_extension
        0b0010: sequence_display_extension
        0b1000: picture_coding_extension
currently you have to merge these structures together as after reading the
- id: sync_word
  contents: [true]
or
- id: sync_word
  contents: [0b1111111] |
Each of these points deserves its own issue in some kaitai-io repo. BTW, the synalysis repo has some grammars for MPEG; you may find them useful. |
@kalidasya Thanks for bringing this all together. A few comments of mine on these:
A relatively complex problem. To the extent you've mentioned in this point, seeking the next sync point this way would be implemented using pluggable scanning algorithms, as per kaitai-io/kaitai_struct#538. There is even a proof-of-concept PR branch by @tinrodriguez8 (kaitai-io/kaitai_struct_compiler#166), but, unfortunately, it has kind of stalled lately.
Yup, should be implemented in kaitai-io/kaitai_struct#12 with
Again, likely scanning algorithms would help, but just to clarify: what are you supposed to do with these bits afterwards? Do they form some kind of a value?
Cached instances are there for a reason. Just to clarify: you don't need/want them to be re-evaluated anywhere outside peeking scenario?
Yeah, given that we have
That's pretty vague. The problem with "ignore EOS" is in defining what exactly happens when we hit an error, i.e. how do we recover. Do we stop this branch / all further processing to some extent, or do we continue, and, if we do, to what extent? This is somewhat discussed in kaitai-io/kaitai_struct#280, but I don't think we even have any solid proposals for an exception/recovery system so far :( |
It is possible, but, unfortunately, the situation with the WebIDE has been somewhat bad lately :( It asks for a major build system revamp, but nobody is interested in sitting down and redoing it, as it's not really a fun project. There is a stuck PR by @fudgepop01 (kaitai-io/kaitai_struct_webide#84): I wasn't able to get it to build fully, and, alas, it looks like @fudgepop01 has lost interest in it too and now concentrates on his VSCode extension project (which might be a great next step).
I'm not sure about root IO (and you're probably not asking for
This is actually interesting, as it should not be. I suspect that implementation of "scanning" / "peeking" via creation of tons of objects in memory might be to blame. Can we investigate it more somehow?
I don't think that struct.unpack has any means to extract individual bits, and the current implementation kind of does that, i.e. it reads bytes and then splits them into bits. Maybe we can optimize it for fixed locations of bytes, so we won't have that many reads and/or condition checks, but I'm not sure that's the main culprit. Again, we'll probably need to investigate/benchmark/profile it more. Are there any good tools out there for Python to do that?
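To illustrate the "read bytes, then split them into bits" approach mentioned here, below is a minimal sketch in plain Python. It is not the kaitaistruct runtime's code: the BitReader class and its method names are hypothetical, and it handles big-endian (most significant bit first) fields only.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over an in-memory bytes object."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._bit_pos = 0  # absolute position in bits

    def read_bits(self, n: int) -> int:
        """Return the next n bits as an unsigned integer."""
        end = self._bit_pos + n
        if n < 0 or end > len(self._data) * 8:
            raise EOFError("not enough bits left")
        first_byte = self._bit_pos // 8
        last_byte = (end + 7) // 8  # exclusive
        # Read whole bytes once, then shift and mask to cut out the bit field.
        chunk = int.from_bytes(self._data[first_byte:last_byte], "big")
        shift = last_byte * 8 - end
        self._bit_pos = end
        return (chunk >> shift) & ((1 << n) - 1)


reader = BitReader(bytes([0b1011_0011, 0b1111_1110]))
start_code_id = reader.read_bits(4)  # upper nibble of the first byte
next_nibble = reader.read_bits(4)    # lower nibble of the first byte
sync_word = reader.read_bits(7)      # seven marker bits of the second byte
```

Each call performs one slice and one integer conversion regardless of the field width, so the per-bit cost stays in the shift and mask rather than in repeated reads.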
We'll likely have that as part of kaitai-io/kaitai_struct#435, i.e. as:
- id: sync_word
  type: b7
  valid: 0b1111111 |
I think that ticket handles it.
In this particular case the data can be dropped, but it's more that the structure exists (I have not attempted full MPEG-2 parsing, only meta information; no picture data, for example).
I think my issue was a combination of using
great!
sorry, I meant here only eos of the
OK, that's understandable; the Web IDE still provides a lot of value.
In my case I was always interested in the root io.pos and often the bits_left attribute, but indeed, giving some clue about where in the byte stream the error happened would help a lot. I am not sure whether a non-root IO would help me identify the bytes I failed to parse. But that was just me; maybe my heuristics were suboptimal.
True, that can be the other culprit; I will try to investigate it more. In general it just reads byte after byte, as the other implementation did, so I ruled that out, but maybe I was wrong.
Indeed, there's a heatwave here and I hallucinated a bit-level struct; yes, that's for bytes only. I will try to figure out where we spend our time by profiling the execution.
yes there are, I used it long ago so I have to refresh my knowledge
|
So what does not have an issue linked: I will take |
It seems @GreyCat is right: I have 15 sec cumulative, and 7 sec of that was spent in the prefix_code property, so a different way of seeking might help a lot. I will try to narrow it down further. Next is 3 sec cumulative in kaitaistruct.read_bits_int (called more times than prefix_code). Within read_bits_int, it seems that out of the 2.4 sec cumulative, 1.8 sec is in the function itself (read_bytes is only 0.4 and isinstance 0.2), so maybe some optimisation can happen there. But I guess the whole seeking should be addressed first; then we will see the performance. |
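Figures like these (cumulative time versus time spent in the function itself) are what Python's built-in cProfile and pstats modules report. A minimal sketch of such a profiling run follows; parse_stub is a made-up stand-in for the generated parser, not the code that was actually measured.

```python
import cProfile
import io
import pstats


def parse_stub(data: bytes) -> int:
    """Made-up stand-in for the parser: counts 00 00 01 prefixes byte by byte."""
    count = 0
    for i in range(len(data) - 2):
        if data[i:i + 3] == b"\x00\x00\x01":
            count += 1
    return count


profiler = cProfile.Profile()
profiler.enable()
found = parse_stub(b"\x00\x00\x01\xb3" * 1000)
profiler.disable()

# Sort by cumulative time; the "tottime" column is time inside the function itself.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)
```

In the report, a large gap between cumtime and tottime for a function points at its callees, while a small gap means the time is spent in the function body itself.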
Sir, I am new to Kaitai. I want to parse a pcap file that has TCP/IP flow info, and I am using Java for programming. I have seen your video on YouTube regarding media parsing; kindly let me know how to parse a pcap file. |
@avi-techno what does it have to do with mpeg-ts? |
Is mpeg working nowadays? |
Basic parsing is possible (@pavja2 has a POC), but a more effective parser depends on kaitai-io/kaitai_struct#196.