-
Notifications
You must be signed in to change notification settings - Fork 172
Tobin secondary cb synch tracking #2066
base: master
Are you sure you want to change the base?
Conversation
25fb4ed
to
658e576
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but yikes!
658e576
to
6e7a47c
Compare
@tobine, Master runs fine on Dota2 and a release build starts up in ~10 seconds with validation enabled. With the sync changes added, my windows machine blue-screened after about 4 minutes. What information can I get for you? |
@mark-lunarg does that mean you also never saw anything on screen? Can you kill the last CL so that ENABLE_MEMORY_ACCESS_CALLBACK is not defined and re-try? If that works it's a big flag about where the issue is. |
Yep, that worked. |
6e7a47c
to
4267210
Compare
@mark-lunarg thanks, did perf seem on-par with master? I'm pretty sure the issue is with Draw tracking in ValidateCmdDrawType() function. Specifically this block
If you're up for a couple other tests here's what I'd be interested in. If you have time and are willing, please try this on top of the running config that you have (last CL of this PR killed):
If you get to the 2nd experiment and it passes, then that we grab the Draw updates, but fail on checking them for conflicts. If the second experiment doesn't pass (which I expect) then almost certainly something is broken in AddReadWriteBuffersAndImages(). That's the code that I'm going over now, but those 2 experiments would verify I'm on the right track. |
4267210
to
ae690c5
Compare
I may have found the issue. Forgot to update LUT when I added AMD draw commands. I just pushed fix. @mark-lunarg can you pull latest and see if that was the problem? |
Just got out of a meeting, will check it now and look at perf if it works. |
Yeah, that fixed it. Time to first pixels is 27 sec with the PR, 10 without. |
4ef67b0
to
91b6474
Compare
@tobine, is there anything to be done on this one now, or should we move it to inactive state and hold onto the branch? |
@mark-lunarg yes, moved to inactive for now. Want to hang onto this until I have a good landing spot for the barrier checking. |
It would be amazing to get this in at some point - missing sync validation is currently one of the biggest pain points for writing correct Vulkan code. Keep up the good work! |
91b6474
to
4a14b22
Compare
4a14b22
to
f9c7517
Compare
This is the initial version of a single-case update to handle synchronization tracking for the cmd sequence from GH892. As the code stands it's only primarily useful for this single cmd sequence and isn't catching any variety of memory conflicts. The intention is that if the general idea is sound then I will build on this initial code to expand/generalize the algorithms used here for all memory conflict cases.
Add tracking for the memory accesses in CmdBlitImage. Initially just adding a read & write access for the src & dest image memory that conservatively references whole binding and is recorded as imprecise so that it only will cause warnings.
Add tracking for the memory access in CmdFillBuffer. This is a single write to a buffer for which we have precise info.
Add tracking for the memory accesses in CmdClear[DS|Color]Image. Initially just adding imprecise write access covering the whole image binding area.
Update CmdClearAttachments to record a write memory access per image attachment that's cleared. This is an imprecise access across the entire image binding for now.
Update global mem access state when CmdClearAttachments is called.
Add tracking for the memory accesses in CmdResolveImage. Initially just adding a read & write access for the src & dest image memory that conservatively references whole binding and is recorded as imprecise so that it only will cause warnings.
Just moving code to prep for adding mem access support to CmdCopyQueryPoolResults. Refactor it to use the common PreValidate* & PostRecord* pattern.
Add the buffer write access for CmdCopyQueryPoolResults. Updated query pool data struct to store number of elements in the query based on query type and then use that number when calculating total data size.
Update the Draw/Dispatch validate/update framework to account for all memory accesses through descriptors. At validate time grab the complete set of active read and write memory bindings from descriptors. Then verify the memory accesses to flag any conflicts. During state update, record all of the accesses into the cmd buffer state. All of these accesses are imprecise and will only result in warnings if there is a potential synch issue.
A PipelineBarrier command that occurs within a renderPass is only for handling a subpass self-dependency. That case is not currently included in the synch model so updating code to only record barriers that are outside of a renderPass for now.
Moved existing code into RecordBarrierMemoryAccess() function. This generalizes to code so it can more easily be multi-purposed in order to record barriers for CmdWaitEvents().
This is a race condition test to check both a RaW and WaR conflict with buffers used in a Draw. A src buffer is copied to a dst buffer, then, without a barrier, the src is used as a storage buffer in a draw and the dst is used as a uniform buffer in the same draw. This results in a RaW conflict for the Dst/Uniform buffer and WaR conflict for the Src/Storage buffer.
Make any memory access race condition conflict messages warning-level for initial release. There are still some holes in modeling the synch primitives in validation so setting callback to warning allows for a soft release where people can ignore or post bugs on incorrect warnings while the synch model is updated in validation. The current warning message is meant to deter developers from sinking too much time into debugging these warnings, and promotes feedback for known-bad warning cases.
Record barriers in CmdWaitEvents using same method used for barriers from CmdPipelineBarriers. This involves merging any previous barriers to handle dependency chains and then recording the barriers into any previous memory accesses that fall into the source synch scope.
Modify ValidateMemoryAccesses() so that it takes map of prev accesses directly. This is preparation to share the function with command buffer submission validation where we'll deal with sets of initial accesses that may cross many command buffers.
At QueueSubmit time, check for any memory conflicts between command buffers in a single submit. Here's the high-level algorithm: 1. While submitting CBs, store vector of CB state for each CB 2. For each CB, create vector of mem accesses that are "live" meaning that no barrier within the CB made them visible 3. Check live vector against all other live mem accesses up to this point in submit. 3a. If there are no potential conflicts, save live access vector into submit access map and continue with next submit 3b. If there are potential conflicts, we need to replay commands up to this point so continue to step 4. 4. Gather all cmds submitted up to this point in a single vector 5. Note any synch commands that occur between the early and late mem access conflict commands 6. Replay mem access commands, including synch commands between the points of interest 7. Following replay, re-check for conflicts and if they still occur, flag an error Also update the race condition warning message between two separate command buffers with additional information about second cmd buffer.
Added TwoCommandBufferUpdateBufferRaWDependencyMissingBarrier test that is an alternate version of UpdateBufferRaWDependencyMissingBarrier test that separates the CmdUpdateBuffer and CmdDrawIndirect commands across two separate command buffers and verifies that the RaW race condition is flagged.
Merging synch barriers may be a bit naive. Note potential bug for future fix.
Add MultiCBDrawWithBufferWaRandRaWSynchChain test. This features three command buffers with mem access dependencies and synch commands that cross CB boundaries. This is a positive test to verify that no errors occur.
Mis-match between size_t and uint32_t was causing some Windows compiler warnings. Use size_t uniformly where vector size() function involved.
Move the code validate inter-cmd-buffer synch conflicts at QueueSubmit time into its own function. This also fixes a bug where the vector of cmds to replay was not being reset when more than one replay session was required for a single commit. It also streamlines the code a bit by making the update of mem access map with latest live mem accesses common code between fast and slow path.
This commit stores memory accesses into separate read and write maps. This provides a perf boost for checking potential memory access faults by eliminating uneccessary checks when previous access maps that would be checked are empty so we don't iterate over maps that have no potential for any conflicts. A couple of other changes to cleanup memory access checks: 1. Merge "Add*MemoryAccess()" and "ValidateMemoryAccesses()" functions into a single "CreateAndValidate*MemoryAccess()" function. This creates the local access struct and validates it, but it won't be recorded into the CB until/unless validation passes. 2. Split Mem access violation reporting into a separate function that can be shared by a couple of different cases that flag errors.
This is a conservative measure to simplify landing and integration of the memory access checks. For now turning off the callback and also disabling some draw-time checks that seem to have ~10-20% perf hit in limited testing. Turning off callback also means turning off tests that rely on the callback. Will continue advancing this code and testing with checks enabled to reach desired functionality before enabling code on master.
At CmdExcuteCommands() time, check between secondary cmd buffers being executed to see if there are any potential memory conflicts.
Add TwoSecondaryCBUpdateBufferRaWDependencyMissingBarrier test to flag error between two separate secondary command buffers that are issued in the same CmdExecuteCommands() call and have a RaW conflict.
Put memory access updates for active descriptors directly into the read and write vectors. This kills an extra data structure transformation where we moved the accesses through sets before putting them into vectors. There was no benefit to the extra overhead and this update brings the performance of validation with memory access checks on-par with validation when the checks are disabled. Still have the warning callback disabled for now. The ENABLE_MEMORY_ACCESS_CALLBACK define in vk_layer_config.h can be un-commented to enable the warning callback, as well as the descriptor memory access tracking.
Turning on all of the memory tracking code as there is almost no perf hit on "smoketest --validate -s" now. Also the warning callback has a large caveat statement to help developers share suspected issues and disregard as they see fit.
These extended draw types were added without updating the read/write flag LUT. Doh! Initially adding data as copies of matching non-AMD cmds which should work. Added comments to double-check and make sure those are the correct flags. Also added a magic number entry at the end of the LUT CommandToFlags array and a static_assert to make sure this array is correctly sized at compile time.
f9c7517
to
96a3862
Compare
@tobine, this repository will close for write access on Sunday, 5/13/2018. If it is pushed before that time it will be present in the follow-on Vulkan-ValidationLayers repository on Monday, otherwise a new PR will be required in the new repo. |
Replaces #2049
This is all the same code rebased to latest master plus it adds memory access checking for secondary command buffers and streamlines the descriptor update so that perf of validation with memory accesses enabled is on-par with validation when these checks are disabled for "smoketest --validate -s".
I've kept the callback disabled for now but added a commented-out "//#define ENABLE_MEMORY_ACCESS_CALLBACK" in vk_layer_config.h that can be un-commented to enable the callback and disabled tracking.
I think it's in a good state to land as a baseline to make it easier to merge code going forward as I continue to improve the synch tracking.