-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UR] [V2] Add wait before enqueue in command buffer #17709
base: sycl
Are you sure you want to change the base?
Conversation
if (phEvent == nullptr) { | ||
phEvent = &internalEvent; | ||
} | ||
UR_CALL(hCommandBuffer->awaitExecution(commandListLocked)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will block on host. The currentExecution
event, if not null, should be simply added to the wait listwhen enqueuing the command list.
return enqueueGenericCommandListsExp(1, &commandBufferCommandList, phEvent, | ||
numEventsInWaitList, phEventWaitList, | ||
UR_COMMAND_ENQUEUE_COMMAND_BUFFER_EXP); | ||
ur_event_handle_t internalEvent = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could create two events in the command buffer, and use them in sequence to avoid allocating new events. Two because to avoid having the same waitlist/signal event. But we can do that later.
@igchor although I'm not sure there's going to be much performance benefit over just plain always allocating new and deallocating previous one. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, this would be similar to our initial implementation where we had CCS and BCS.
unified-runtime/source/adapters/level_zero/v2/command_buffer.cpp
Outdated
Show resolved
Hide resolved
Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>
commandList->getZeCommandList() == | ||
commandListManager.get_no_lock()->getZeCommandList() && | ||
"Provided command list is not the same as the one in the command buffer"); | ||
return currentExecution; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be under commandListManager lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, never mind, I see the lock is taken in the enqueue function. in that case, I would suggest to add unlocked
to this function name and also pass ur_command_list_manager&
instead of locked<ur_command_list_manager>&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I will simply make it unlocked, and remove ur_command_list_manager
from the arguments - it was there just to ensure that the function is synchronized
return enqueueGenericCommandListsExp(1, &commandBufferCommandList, phEvent, | ||
numEventsInWaitList, phEventWaitList, | ||
UR_COMMAND_ENQUEUE_COMMAND_BUFFER_EXP); | ||
ur_event_handle_t internalEvent = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, this would be similar to our initial implementation where we had CCS and BCS.
} | ||
ur_event_handle_t executionEvent = | ||
hCommandBuffer->getCurrentExecutionEvent(commandListLocked); | ||
std::vector<ur_event_handle_t> extendedWaitList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a new temporary wait list I would suggest to just modify enqueueGenericCommandListsExp
and put the event directly into the commandList->waitList. This would eliminate the extra allocation. You could add extra optionaEvent
param to getWaitListView
function to achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Igor we need avoid dynamic memory allocations in hot path at all cost.
@@ -47,7 +47,38 @@ ur_result_t ur_exp_command_buffer_handle_t_::finalizeCommandBuffer() { | |||
isFinalized = true; | |||
return UR_RESULT_SUCCESS; | |||
} | |||
ur_event_handle_t ur_exp_command_buffer_handle_t_::getCurrentExecutionEvent( | |||
[[maybe_unused]] locked<ur_command_list_manager> &commandList) { | |||
assert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this assert. Don't we want to use events for synchronization only when the command list is different? When the command list is the same, we already have proper synchronization thanks to the fact the list is in-order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This asset was to ensure proper synchronization of the getCurrentExecutionEvent
. It made sure that it was called with locked<ur_command_list_manager>
that comes from the object on which we call getCurrentExecutionEvent
. But with the change that removes synchronizations that you requested above, this assert is not needed anymore.
Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>
Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>
Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>
According to @MichalMrozek, zeCommandListImmediateAppendCommandListsExp has the same requirements as zeCommandQueueExecuteCommandLists, because of this, the command list must not be referenced by device when it is enqueued. This PR fixes this issue by adding event to synchronize append and execution