[UR] [V2] Add wait before enqueue in command buffer #17709

Xewar313 · 2025-03-28T12:50:53Z

According to @MichalMrozek, zeCommandListImmediateAppendCommandListsExp has the same requirements as zeCommandQueueExecuteCommandLists. Because of this, the command list must not be referenced by the device when it is enqueued. This PR fixes the issue by adding an event to synchronize append and execution.

pbalcer · 2025-03-28T12:56:15Z

unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp

+  if (phEvent == nullptr) {
+    phEvent = &internalEvent;
+  }
+  UR_CALL(hCommandBuffer->awaitExecution(commandListLocked));


This will block on the host. The currentExecution event, if not null, should simply be added to the wait list when enqueuing the command list.
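
For illustration, a minimal sketch of the non-blocking approach suggested above. `Event` and `makeWaitList` are hypothetical stand-ins, not adapter code; the sketch assumes the enqueue path accepts a wait list and that `currentExecution` is null when no earlier execution is pending.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an event handle; not the real ur_event_handle_t.
struct Event {
  int id;
};

// Builds the wait list to pass along with the enqueue. If an earlier
// execution of the command buffer is still tracked (currentExecution is not
// null), it is appended so that ordering is expressed as a dependency instead
// of a host-side wait.
std::vector<Event *> makeWaitList(Event *const *userWaitList,
                                  std::size_t numUserEvents,
                                  Event *currentExecution) {
  assert(userWaitList != nullptr || numUserEvents == 0);
  std::vector<Event *> waitList(userWaitList, userWaitList + numUserEvents);
  if (currentExecution != nullptr) {
    waitList.push_back(currentExecution);
  }
  return waitList;
}
```

The temporary vector keeps the sketch short; it is not meant as a recommendation for a hot path.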

pbalcer · 2025-03-28T13:08:55Z

unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp

-  return enqueueGenericCommandListsExp(1, &commandBufferCommandList, phEvent,
-                                       numEventsInWaitList, phEventWaitList,
-                                       UR_COMMAND_ENQUEUE_COMMAND_BUFFER_EXP);
+  ur_event_handle_t internalEvent = nullptr;


You could create two events in the command buffer and use them in sequence to avoid allocating new events. Two are needed so that the wait list and the signal event are never the same event. But we can do that later.

@igchor Although I'm not sure there's going to be much performance benefit over simply always allocating a new event and deallocating the previous one. What do you think?

Yes, I agree, this would be similar to our initial implementation where we had CCS and BCS.
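
A rough sketch of the two-event idea discussed in this thread, assuming the command buffer owns both events up front. `Event` and `EventPair` are hypothetical names for illustration; resetting an event before it is reused is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an event handle.
struct Event {
  int id;
};

// Hands out two preallocated events alternately, so the event an execution
// waits on is never the event it signals.
class EventPair {
public:
  EventPair(Event *first, Event *second) : events{first, second} {
    assert(first != second);
  }

  // Event signalled by the previous execution, or nullptr before the first.
  Event *previous() const { return hasRun ? events[1 - next] : nullptr; }

  // Returns the event the upcoming execution should signal and flips the
  // slot, so the following execution uses the other event.
  Event *acquireSignal() {
    Event *signal = events[next];
    next = 1 - next;
    hasRun = true;
    return signal;
  }

private:
  std::array<Event *, 2> events;
  std::size_t next = 0;
  bool hasRun = false;
};
```

Calling `previous()` and then `acquireSignal()` for each execution yields a distinct wait/signal pair every time. Whether this beats allocating a fresh event per execution is the open question raised above.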

igchor · 2025-03-28T16:14:15Z

unified-runtime/source/adapters/level_zero/v2/command_buffer.cpp

+      commandList->getZeCommandList() ==
+          commandListManager.get_no_lock()->getZeCommandList() &&
+      "Provided command list is not the same as the one in the command buffer");
+  return currentExecution;


This should be under the commandListManager lock.

Oh, never mind, I see the lock is taken in the enqueue function. In that case, I would suggest adding "unlocked" to this function's name and passing ur_command_list_manager& instead of locked<ur_command_list_manager>&.

In that case, I will simply make it unlocked and remove ur_command_list_manager from the arguments; it was there only to ensure that the function is synchronized.
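
A small sketch of the convention being discussed, where a function that relies on the caller already holding the lock says so in its name. `Lockable`, `Locked`, `CommandBufferState` and `getCurrentExecutionUnlocked` are hypothetical and only loosely modelled on the `locked<...>` and `get_no_lock()` names visible in the diff.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical RAII handle: holds the mutex for as long as it is alive.
template <typename T> class Locked {
public:
  Locked(T *object, std::mutex &mutex) : object(object), guard(mutex) {
    assert(object != nullptr);
  }
  T *operator->() { return object; }

private:
  T *object;
  std::unique_lock<std::mutex> guard;
};

// Hypothetical wrapper pairing an object with the mutex that protects it.
template <typename T> class Lockable {
public:
  Locked<T> lock() { return Locked<T>(&object, mutex); }
  // Bypasses the mutex; the caller is responsible for synchronization.
  T *get_no_lock() { return &object; }

private:
  T object{};
  std::mutex mutex;
};

struct CommandBufferState {
  int currentExecution = 0;
  // "Unlocked" in the name signals that the caller must already hold the lock.
  int getCurrentExecutionUnlocked() const { return currentExecution; }
};
```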

igchor · 2025-03-28T16:17:12Z

unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp

-  return enqueueGenericCommandListsExp(1, &commandBufferCommandList, phEvent,
-                                       numEventsInWaitList, phEventWaitList,
-                                       UR_COMMAND_ENQUEUE_COMMAND_BUFFER_EXP);
+  ur_event_handle_t internalEvent = nullptr;


Yes, I agree, this would be similar to our initial implementation where we had CCS and BCS.

igchor · 2025-03-28T16:21:06Z

unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp

+  }
+  ur_event_handle_t executionEvent =
+      hCommandBuffer->getCurrentExecutionEvent(commandListLocked);
+  std::vector<ur_event_handle_t> extendedWaitList;


Instead of creating a new temporary wait list, I would suggest just modifying enqueueGenericCommandListsExp and putting the event directly into commandList->waitList. This would eliminate the extra allocation. You could add an extra optionalEvent param to the getWaitListView function to achieve this.

Agree with Igor: we need to avoid dynamic memory allocations in the hot path at all costs.
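
As a sketch of this suggestion, the version below writes the optional event into storage the command list already owns instead of building a temporary vector. The `CommandList` class, the `WaitListView` struct and this `getWaitListView` signature are assumptions made for illustration, not the adapter's actual interface; the sketch also assumes the reserved capacity is large enough for every wait list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an event handle.
struct Event {
  int id;
};

// Non-owning view over the wait list stored in the command list.
struct WaitListView {
  Event *const *data;
  std::size_t size;
};

class CommandList {
public:
  // Storage is reserved once, outside the hot path.
  explicit CommandList(std::size_t capacity) { waitList.reserve(capacity); }

  // Copies the caller's events into the reused storage and appends
  // optionalEvent when it is not null. clear() keeps the capacity, so no
  // allocation happens while the list fits into the reserved storage.
  WaitListView getWaitListView(Event *const *events, std::size_t numEvents,
                               Event *optionalEvent = nullptr) {
    assert(events != nullptr || numEvents == 0);
    waitList.clear();
    for (std::size_t i = 0; i < numEvents; ++i) {
      waitList.push_back(events[i]);
    }
    if (optionalEvent != nullptr) {
      waitList.push_back(optionalEvent);
    }
    return {waitList.data(), waitList.size()};
  }

private:
  std::vector<Event *> waitList;
};
```

The returned view is only valid until the next call on the same command list, which matches a pattern where the view is consumed immediately by the enqueue.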

igchor · 2025-03-28T19:01:27Z

unified-runtime/source/adapters/level_zero/v2/command_buffer.cpp

@@ -47,7 +47,38 @@ ur_result_t ur_exp_command_buffer_handle_t_::finalizeCommandBuffer() {
  isFinalized = true;
  return UR_RESULT_SUCCESS;
 }
+ur_event_handle_t ur_exp_command_buffer_handle_t_::getCurrentExecutionEvent(
+    [[maybe_unused]] locked<ur_command_list_manager> &commandList) {
+  assert(


I don't understand this assert. Don't we want to use events for synchronization only when the command list is different? When the command list is the same, we already have proper synchronization thanks to the fact the list is in-order.

This assert was there to ensure proper synchronization of getCurrentExecutionEvent. It made sure that the function was called with the locked<ur_command_list_manager> that comes from the object on which getCurrentExecutionEvent is called. With the change you requested above, which removes the synchronization, this assert is no longer needed.

Xewar313 requested review from a team as code owners March 28, 2025 12:50

Xewar313 requested a review from reble March 28, 2025 12:50

pbalcer reviewed Mar 28, 2025

EwanC reviewed Mar 28, 2025

Xewar313 added 3 commits March 28, 2025 13:09

Add wait before execution of command buffer

f9efac4

Fix format

40aa326

Release internal event after creation

e6161ee

EwanC added a commit to reble/llvm that referenced this pull request Mar 28, 2025

Add CTS test

c65615f

Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>

Xewar313 added 2 commits March 28, 2025 15:15

Fix PR comments

bd511a2

Remove test that may fail some platforms

81a0cac

igchor requested changes Mar 28, 2025

igchor reviewed Mar 28, 2025

EwanC added a commit to reble/llvm that referenced this pull request Mar 28, 2025

Add CTS test

6fb031f

Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>

EwanC mentioned this pull request Mar 31, 2025

[SYCL][UR][Graph] Move L0 simultaneous graph synchronization from SYCL-RT to L0 adapter #17734

Open

EwanC added a commit to reble/llvm that referenced this pull request Mar 31, 2025

Add CTS test

d610ee8

Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>

Apply PR comments

b18aaee

Fix compilation

d534922

EwanC added a commit to reble/llvm that referenced this pull request Mar 31, 2025

Add CTS test

19a961e

Taken from intel#17709 Co-authored-by: Mikołaj Komar <[email protected]>
