
Read video from memory newapi #6771

Merged
merged 20 commits into from
Oct 21, 2022

Conversation

jdsgomes
Contributor

Adds read-from-memory functionality to the new video API.

Addresses #6603

@jdsgomes jdsgomes changed the title Read video from mew newapi Read video from memory newapi Oct 15, 2022
@vedantroy

vedantroy commented Oct 19, 2022

This segfaults on certain videos. Here's a partial-reproduction script (the videos are stored in a pandas dataframe):

import pandas as pd

# df2 = pd.read_pickle("df2.pkl")
df = pd.read_pickle("df.pkl")
# print the # of rows in the df
print(len(df))
# print the keys in the df
print(df.keys())
# print the first row in the df
print(df.iloc[0])


# print the type of the 1st value in the 1st row
first_vid = df.iloc[0][0]
print(f"Length: {len(first_vid)}")
print(f"Type: {type(first_vid)}")

# write first_vid to a file
with open("test.mp4", "wb") as f:
    f.write(first_vid)

import itertools
import copy

import torch
from torchvision.io import VideoReader
import torchvision

def clip_from_start(buf: bytes, expected_frames: int):
    # import av
    # import io
    # buffer = io.BytesIO(buf)
    # container = av.open(buffer)
    # i = 0 
    # for frame in container.decode(video=0):
    #     print(type(frame))
    #     i += 1
    #     print(i)
    #     pass

    tensor = torch.frombuffer(buf, dtype=torch.uint8)
    tensor = copy.deepcopy(tensor)
    # torchvision.io.read_video()
    rdr = VideoReader(tensor)
    sampled_frames = list(itertools.islice(iter(rdr), expected_frames))
    if len(sampled_frames) != expected_frames:
        return None
    data = []
    for frame in sampled_frames:
        data.append(frame["data"])
    return torch.stack(data, dim=0)

clip = clip_from_start(first_vid, 2)
print(clip.shape)

I'm working to get approval of the public copy of the data.

In the meantime, the error is:

test.py:41: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:1563.)
  tensor = torch.frombuffer(buf, dtype=torch.uint8)
malloc(): corrupted top size
Aborted (core dumped)
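Regarding the UserWarning (separate from the crash itself): a minimal sketch of one way to avoid it, assuming the video bytes arrive as a read-only `bytes` object, is to hand `torch.frombuffer` a writable `bytearray` copy instead of deep-copying the resulting tensor afterwards. The buffer contents here are a hypothetical stand-in:

```python
# `buf` stands in for the read-only video bytes from the dataframe.
buf = b"\x00\x01\x02\x03"

# An independent, writable copy of the bytes: torch.frombuffer accepts
# it without warning, and writes through the tensor cannot corrupt the
# original object.
writable = bytearray(buf)

assert memoryview(buf).readonly           # `bytes` is read-only
assert not memoryview(writable).readonly  # `bytearray` is writable

# tensor = torch.frombuffer(writable, dtype=torch.uint8)  # no warning
```

This only addresses the warning; the `malloc(): corrupted top size` abort is a separate problem in the decoder path.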

@jdsgomes
Contributor Author

jdsgomes commented Oct 19, 2022

This segfaults on certain videos. Here's a partial-reproduction script (the videos are stored in a pandas dataframe):

@vedantroy Thanks for reporting this.
I cannot reproduce the error with my test videos. Is there any possibility that the data is corrupt?
Otherwise, please let me know when you can share the sample video so we can verify and debug.
Another option is to save the video to a file and read it from the path, to see whether the same behaviour occurs.

Thanks
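The save-to-file suggestion above can be sketched as a quick sanity check (the byte contents are a hypothetical stand-in for the dataframe value): dump the in-memory bytes to a temporary file and confirm the round trip is lossless before blaming the decoder.

```python
import os
import tempfile

# Hypothetical stand-in for the bytes pulled out of the dataframe.
video_bytes = b"\x00\x00\x00\x18ftypmp42"

# Write the bytes to a temporary .mp4 file.
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
    f.write(video_bytes)
    path = f.name

# Confirm the bytes survive the round trip.
with open(path, "rb") as f:
    assert f.read() == video_bytes
os.unlink(path)

# From here one could compare VideoReader(path) against the in-memory
# VideoReader(tensor) frame by frame.
```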

Contributor

@bjuncek bjuncek left a comment

Looks good to me. I'd love to make the tests stricter but otherwise good to go :)

assert len(vr_pts) == len(vr_pts_mem)

# compare the frames and ptss
for i in range(len(vr_frames)):
Contributor

Given that this is like-for-like decoding (using verifiably the same backend), we should probably insist on either a max delta or exact equality.

Contributor Author

Actually, there are two videos for which this is not true; one is SOX5yA1l24A.mp4.
I verified with the code below, reading the same file twice; even then the results are not the same:

In [3]: reader1 = VideoReader('test/assets/videos/SOX5yA1l24A.mp4')
In [4]: reader2 = VideoReader('test/assets/videos/SOX5yA1l24A.mp4')
In [5]: reader1_frames = []
In [6]: reader2_frames = []

In [8]: for r1frame in reader1:
   ...:     reader1_frames.append(r1frame["data"])

In [9]: for r2frame in reader2:
   ...:     reader2_frames.append(r2frame["data"])

In [12]: for i in range(len(reader1_frames)):
    ...:     if torch.all(reader1_frames[i].eq(reader2_frames[i])):
    ...:         print(f"Frame {i} is not equal")

Frame 0 is not equal
Frame 2 is not equal
Frame 3 is not equal
Frame 4 is not equal
Frame 5 is not equal
Frame 6 is not equal
Frame 7 is not equal
Frame 8 is not equal
Frame 9 is not equal
Frame 10 is not equal
Frame 11 is not equal
Frame 12 is not equal
....

Do you know whether decoding could be non-deterministic for this particular format?

Contributor

This is actually surprising to me. Does this non-deterministic behaviour also occur on the stable API (if we specify the same pts)?

Contributor

Yup; we use the same backend.
@jdsgomes can you check if it happens on the pyav?

Maybe this is a cue that we should reconsider our selection of testing videos.
Specifically, I think we should re-sample them from whatever the most-used datasets are these days, plus some FFmpeg FATE samples.

Contributor Author

I confirmed that this is the case.

Contributor Author

Please note that there is a bug in the test code above (it checks for equality but prints "not equal"); even so, some frames genuinely differ. This happens with both the pyav and video_reader backends.
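For reference, a fixed sketch of that comparison loop: plain Python lists stand in for the decoded frame tensors here, and with real tensors the condition would be `not torch.equal(a, b)`.

```python
# Hypothetical decoded frames from two reads of the same file.
reader1_frames = [[1, 2], [3, 4], [5, 6]]
reader2_frames = [[1, 2], [9, 9], [5, 6]]

# Collect indices of frames that are NOT equal (the original loop
# printed "not equal" inside the branch where the frames matched).
mismatched = [
    i
    for i, (a, b) in enumerate(zip(reader1_frames, reader2_frames))
    if a != b  # with tensors: not torch.equal(a, b)
]
print(mismatched)  # [1]
```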

Contributor

might be related: https://video.stackexchange.com/questions/18902/why-ffmpeg-works-non-deterministically

In this case, I think this test is fine, since it can handle the non-deterministic behaviour.
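The "max delta" idea raised earlier in the review could look like the following sketch. The helper name and thresholds are made up for illustration; with real tensors one would reach for `torch.testing.assert_close(a, b, atol=..., rtol=0)` instead.

```python
def frames_close(a, b, max_delta=2):
    """Return True if every per-element difference is within max_delta."""
    return all(abs(x - y) <= max_delta for x, y in zip(a, b))

# Small decoder jitter passes...
assert frames_close([100, 101, 102], [101, 100, 102], max_delta=2)
# ...but a genuinely different frame fails.
assert not frames_close([100, 110, 102], [100, 100, 102], max_delta=2)
```

A bounded tolerance keeps the test strict about real regressions while absorbing the benign non-determinism discussed above.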

@@ -165,7 +165,7 @@ struct MediaFormat {
struct DecoderParameters {
// local file, remote file, http url, rtmp stream uri, etc. anything that
// ffmpeg can recognize
std::string uri;
std::string uri{std::string()};
Contributor

Why this?
(I don't know C++ well, so I'm just curious.)

Contributor Author

Just to initialise uri as an empty string. (A default-constructed std::string is already empty, so the explicit initialiser is redundant but harmless.)

@@ -76,8 +76,14 @@ def test_frame_reading(self, test_video):

# compare the frames and ptss
for i in range(len(vr_frames)):
assert av_pts[i] == vr_pts[i]
torch.test.assert_equal(av_frames[i], vr_frames[i])
Contributor Author

This was added in the wrong section, so I reverted the commit. My comment about why we cannot make the read-from-memory tests strict still holds.

Contributor

@YosuaMichael YosuaMichael left a comment

Thanks @jdsgomes, I think overall the PR looks good. I will approve first to unblock, but let's wait for approval from @bjuncek before merging.

Also, regarding @vedantroy's comment: I can't reproduce it on my side. In my opinion, we can merge first and open an issue for the possible segfault so we can come back to it later.

Contributor

@bjuncek bjuncek left a comment

Approving to unblock, and starting a new issue to track video assets.

@jdsgomes jdsgomes merged commit 06ad05f into pytorch:main Oct 21, 2022
@github-actions

Hey @jdsgomes!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Oct 21, 2022
Summary:
* add tensor as optional param

* add init from memory

* fix bug

* fix bug

* first working version

* apply formatting and add tests

* simplify tests

* fix tests

* fix wrong variable name

* add path as optional parameter

* add src as optional

* address pr comments

* Fix warning messages

* address pr comments

* make tests stricter

* Revert "make tests stricter"

This reverts commit 6c92e94.

Reviewed By: YosuaMichael

Differential Revision: D40588171

fbshipit-source-id: 1c55b5ebd0bfdd3308931d218269a19481eb3ca9
5 participants