Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CR-1201062: Fixing issue where enabling both AIE trace and PL trace resulted in PL trace not being generated #8241

Merged
merged 6 commits into from
Jun 20, 2024

Conversation

jvillarre
Copy link
Collaborator

Problem solved by the commit

During application shutdown, if trace is active then all of the trace information from the device needs a final flush from memory to disk. The order in which AIE trace and PL trace is offloaded is not defined, so either order has to be supported. The AIE trace was removing the connection to the PL hardware after it had finished flushing AIE trace, which resulted in the PL trace to be lost.

This pull request fixes that issue by moving the ownership of the DeviceIntf object into a common location so each individual profiling/trace plugin doesn't have to be responsible for managing the shared memory.

A future planned pull request will change this implementation from raw pointers to shared_ptrs to further clarify ownership and make sure we catch all possible memory issues.

Additionally, this pull request renames several objects in profiling to make it clear which ones are associated with PL constructs as opposed to AIE constructs, as the generic names of "device" and "trace" were ambiguous.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

The bug was due to the aie_trace plugin removing the object used to communicate with the PL portion of the device after AIE trace had been flushed. It may have been introduced in recent refactorings, but was discovered with regression testing on multiple combinations of profiling options.

How problem was solved, alternative solutions (if any) and why they were rejected

The problem was solved by no longer deleting the PL device interface at the end of AIE trace flushing. This necessitated moving code to a common location and altering all of the profiling plugins that communicate with the PL side of the device.

Risks (if any) associated the changes in the commit

Low risk of affecting profiling execution in some configuration. Low risk of uncaught memory leaks to be addressed in future pull request.

What has been tested and how, request additional testing if necessary

The original failing test case has been tested on the board. Full regression with all combinations of profiling options enabled needs to be run to fully verify.

Documentation impact (if any)

No documentation impact.

jvillarre added 3 commits May 28, 2024 10:32
…call inside the static info database. Now it is no longer the responsibility of each individual plugin to create this interface. Instead, they must create a concrete xdp::Device to describe how to communicate with the physical device and pass that in to updateDevice.

Signed-off-by: Jason Villarreal <[email protected]>
…so remving unused functions.

Signed-off-by: Jason Villarreal <[email protected]>
@jvillarre jvillarre requested a review from hackwa June 14, 2024 22:22
Copy link
Collaborator

@IshitaGhosh IshitaGhosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the PR renames DeviceIntf to PLDeviceIntf, we can drop "device" and just call it PLIntf. It can be done in a later PR too.

@@ -2111,6 +2105,8 @@ namespace xdp {

if (!initializeStructure(currentXclbin, xrtXclbin)) {
delete currentXclbin;
if (xdpDevice != nullptr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip the nullptr check before delete.

XclbinInfo* xclbin = deviceInfo[deviceId]->currentXclbin();
if (xclbin == nullptr)
return nullptr;
return;
if (xclbin->deviceIntf != nullptr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip nullptr check before delete

@jvillarre jvillarre merged commit 538c9d0 into Xilinx:master Jun 20, 2024
17 checks passed
@jvillarre jvillarre deleted the RenamePL branch November 23, 2024 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants