Fix replay staging buffer binding for when the capture/replay devices are not the same. #1115
Conversation
This commit fixes an issue with erroneous memory type selection when binding buffer memory without memory translation, e.g. when binding replay staging buffers. Currently, the code retrieves the capture device's memory types, even though the memory should be bound directly on the replay device. This caused memory-mapping failures later in the replay and was particularly disruptive when replaying a trimmed capture on a machine whose GPU vendor differs from the capture system's.
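To make the failure mode concrete, here is a minimal, hypothetical sketch (not the actual gfxreconstruct code) of how a memory type index is chosen against a `VkPhysicalDeviceMemoryProperties` table. An index found this way is only meaningful for the device whose properties were queried, which is why indexing into the capture device's table during a replay-side bind can select an incompatible or out-of-range type:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical helper (not gfxreconstruct code): find a memory type index on
// one specific device. allowed_type_bits comes from
// VkMemoryRequirements::memoryTypeBits for the resource being bound.
int32_t FindMemoryTypeIndex(const VkPhysicalDeviceMemoryProperties& props,
                            uint32_t                                allowed_type_bits,
                            VkMemoryPropertyFlags                   required_flags)
{
    for (uint32_t i = 0; i < props.memoryTypeCount; ++i)
    {
        if ((allowed_type_bits & (1u << i)) &&
            ((props.memoryTypes[i].propertyFlags & required_flags) == required_flags))
        {
            return static_cast<int32_t>(i);
        }
    }
    return -1; // no compatible memory type on this device
}
```

For a replay staging buffer, `props` should come from `vkGetPhysicalDeviceMemoryProperties` on the replay physical device; GPUs from different vendors rarely lay out their memory type tables identically.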
Author jacobv-nvidia not on autobuild list. Waiting for curator authorization before starting CI build.
CI gfxreconstruct build queued with queue ID 10524.
CI gfxreconstruct build # 2769 running.
Looks like a very useful fix, so thanks for the PR!
One point: it might be cleaner to extend `BindBufferMemory` and `BindImageMemory` with the extra bool parameter and give it a default value of true. Then the two new functions with slightly vague names wouldn't be needed.
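For illustration, a sketch of that suggestion with heavily simplified, hypothetical signatures (the real gfxreconstruct methods take more parameters): one entry point per resource type, with the flag defaulted so existing callers are unchanged.

```cpp
#include <vulkan/vulkan.h>

// Simplified, hypothetical interface sketch; parameter lists are abbreviated.
class VulkanRebindAllocator
{
  public:
    // Defaulting translate_memory to true keeps existing call sites working;
    // direct (replay-side) binds pass false explicitly.
    VkResult BindBufferMemory(VkBuffer buffer, bool translate_memory = true);
    VkResult BindImageMemory(VkImage image, bool translate_memory = true);
};
```

A direct staging-buffer bind would then read `allocator.BindBufferMemory(staging_buffer, /*translate_memory=*/false);` at the call site.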
@andrew-lunarg Yes, I was on the fence about that. I was trying to avoid cluttering the parent interface of the more general class.
CI gfxreconstruct build # 2769 passed.
Thank you for this PR! I have proposed a small change to the calling sequence in a comment.
```
@@ -409,7 +410,9 @@ VkResult VulkanRebindAllocator::BindBufferMemory(VkBuffer buffer,
    create_info.flags = 0;
    create_info.usage = GetBufferMemoryUsage(
        resource_alloc_info->usage,
        capture_memory_properties_.memoryTypes[memory_alloc_info->original_index].propertyFlags,
        is_direct_allocation
```
I apologize it's taken me so long to look at this. I think this function doesn't need to care whether the access is direct or not. I propose elevating the condition into the callers, changing the bool parameter to a `const VkPhysicalDeviceMemoryProperties&`, and then passing in the desired memory properties inside the direct and non-direct functions. Also, there's no need to call the sub-functions `Helper`; it's okay for them to be differentiated by parameters. @jacobv-nvidia may I make that change against your PR using GitHub's edit mechanism inside the PR viewer? An example is at https://github.com/bradgrantham-lunarg/gfxreconstruct/tree/brad-tweak-1115 that works against our local CI, but I'd like to run it on a known failure case if you have one.
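A rough sketch of that proposed calling sequence, with hypothetical, heavily simplified signatures; `replay_memory_properties_` is assumed here by analogy with the `capture_memory_properties_` member visible in the diff above, and the shared overload's body is stubbed out:

```cpp
#include <vulkan/vulkan.h>

class VulkanRebindAllocator
{
  public:
    // Translated path: memory type indices are interpreted against the
    // capture device's table.
    VkResult BindBufferMemory(VkBuffer buffer)
    {
        return BindBufferMemory(buffer, capture_memory_properties_);
    }

    // Direct path: the resource exists only on the replay device.
    VkResult BindBufferMemoryDirect(VkBuffer buffer)
    {
        return BindBufferMemory(buffer, replay_memory_properties_);
    }

  private:
    // Shared implementation, differentiated by its parameter list rather than
    // a "Helper" suffix; real selection/bind logic is omitted in this sketch.
    VkResult BindBufferMemory(VkBuffer buffer, const VkPhysicalDeviceMemoryProperties& memory_properties)
    {
        (void)buffer;
        (void)memory_properties;
        return VK_SUCCESS;
    }

    VkPhysicalDeviceMemoryProperties capture_memory_properties_{};
    VkPhysicalDeviceMemoryProperties replay_memory_properties_{};
};
```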
Sure, go ahead. I agree, your new change seems like a much less hacky implementation of the fix.
I tried to take a trimmed capture of vkcube from llvmpipe and replay it on NVIDIA with
I have just successfully tested it with vkcube, so that should work as a good test case, but you do need to:
For example: I haven't really tested on llvmpipe, so can't speak too much on that part. The issue was discovered and tested using an NVIDIA trace on an AMD card, but I suppose it's possible that the relevant memory type indices just happen to match up between llvmpipe and NVIDIA.
Merged as #1173. We encountered this issue this week in our internal CI, and this change solved our problem, so I prioritized it. Thank you for finding this, @jacobv-nvidia!
@bradgrantham-lunarg Apologies for the delay in responding; I thought I had replied to your earlier comment on your changes. Anyway, I appreciate the merge.
While testing the replay of trimmed capture files on devices other than the original capture device, I encountered an issue in the rebind allocator where allocations would fail, causing the replay to abort.
The initial loading allocations would fail, and, when running the replay with the `--validate` option, I would receive messages such as the following:

The issue seems to be in the rebind allocator's `Direct` binding functions, which are supposed to bind without memory translation for purposes such as replay staging buffers. However, these functions were using the capture device's memory types instead of the replay device's.
This patch should fix the issue.
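For context, a self-contained sketch of a direct replay-side staging allocation, using plain Vulkan calls rather than gfxreconstruct internals (the handle names are hypothetical). The key point matching the fix: both the memory requirements and the memory properties are queried from the replay device, so the chosen `memoryTypeIndex` is valid for the device that `vkAllocateMemory` actually runs on.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical handles: whatever physical/logical device the replayer created.
VkResult AllocateAndBindStaging(VkPhysicalDevice replay_physical_device,
                                VkDevice         replay_device,
                                VkBuffer         staging_buffer,
                                VkDeviceMemory*  out_memory)
{
    VkMemoryRequirements reqs{};
    vkGetBufferMemoryRequirements(replay_device, staging_buffer, &reqs);

    // Query the *replay* device; type indices from the capture device's table
    // (e.g. llvmpipe vs. NVIDIA) are meaningless here.
    VkPhysicalDeviceMemoryProperties props{};
    vkGetPhysicalDeviceMemoryProperties(replay_physical_device, &props);

    const VkMemoryPropertyFlags wanted =
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;

    for (uint32_t i = 0; i < props.memoryTypeCount; ++i)
    {
        if ((reqs.memoryTypeBits & (1u << i)) &&
            ((props.memoryTypes[i].propertyFlags & wanted) == wanted))
        {
            VkMemoryAllocateInfo info{};
            info.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
            info.allocationSize  = reqs.size;
            info.memoryTypeIndex = i;

            VkResult result = vkAllocateMemory(replay_device, &info, nullptr, out_memory);
            if (result == VK_SUCCESS)
            {
                result = vkBindBufferMemory(replay_device, staging_buffer, *out_memory, 0);
            }
            return result;
        }
    }
    return VK_ERROR_FEATURE_NOT_PRESENT; // no host-visible, coherent type found
}
```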