VITIS-13074 - Dump common control codes before and after being patched #8338

Merged
9 commits merged into Xilinx:master on Aug 21, 2024

@dezhiAmd dezhiAmd commented Aug 6, 2024

Problem solved by the commit

The overall goal is to provide a way to debug hangs caused by control-codes.
This PR adds a flag, dump-control-codes, to the Debug section of xrt.ini. When the flag is true, the control-codes are dumped both before and after patching so that users can narrow down the issue.

Refer to this confluence page for spec: https://confluence.amd.com/x/8avhV

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

How problem was solved, alternative solutions (if any) and why they were rejected

Problem 1: If an application has many run objects, how can you tell which set of pre- and post-patch bin files belongs to which run?
The ultimate solution is to add an id to the ELF file.
For now, the id is a static member of class xrt::module_sram and is incremented every time the constructor is called.
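A per-instance id driven by a static counter could look roughly like the sketch below. The class name `module_sram_sketch`, the use of `std::atomic`, and the file-naming scheme in `dump_name` are all assumptions made for illustration; the PR description only states that the id is a static member incremented in the constructor. The `static inline` member requires C++17.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical stand-in for xrt::module_sram; only the id logic is shown.
class module_sram_sketch {
  // Shared by all instances and bumped once per construction.
  // std::atomic is an assumption, used here so concurrent construction is safe.
  static inline std::atomic<uint32_t> s_next_id{0};

  uint32_t m_id;  // id captured by this instance at construction

public:
  module_sram_sketch() : m_id(s_next_id.fetch_add(1)) {}

  uint32_t id() const { return m_id; }

  // Hypothetical naming scheme that ties a dump file to this instance.
  std::string dump_name(const std::string& stage) const
  {
    return "ctrlcode_" + stage + "_" + std::to_string(m_id) + ".bin";
  }
};
```

With a scheme like this, the pre- and post-patch files of one instance share the same numeric suffix, which is what lets a reader pair them up. The ids reflect construction order only, so they are stable across runs only if the application creates its modules in the same order.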

Problem 2: Minimize the performance impact of the debug flag.
Selected solution:
When constructing xrt::module_sram, check whether dump-control-codes is true. If so, record it in xrt::module_sram::m_debug_mode.
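The selected approach amounts to reading the configuration once per object and consulting only a cached bool afterwards. The sketch below illustrates that pattern under stated assumptions: `debug_config` and `cached_flag_module` are hypothetical types, and the lookup counter exists only to make the cost visible. The commit list for this PR mentions `xrt_core::config::get_xrt_debug`, so the real lookup may differ from what is shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical config source; counts lookups so the caching effect is observable.
struct debug_config {
  bool dump_control_codes = false;
  mutable std::size_t lookups = 0;

  bool get_dump_control_codes() const
  {
    ++lookups;
    return dump_control_codes;
  }
};

// Hypothetical stand-in for xrt::module_sram.
class cached_flag_module {
  bool m_debug_mode;        // read from the config once, at construction
  std::size_t m_dumps = 0;  // number of dumps that would have been written

public:
  explicit cached_flag_module(const debug_config& cfg)
    : m_debug_mode(cfg.get_dump_control_codes())
  {}

  // Hot path: checks only the cached bool, never the config.
  void patch(std::vector<uint8_t>& buf)
  {
    if (m_debug_mode)
      ++m_dumps;            // stand-in for the pre-patch dump
    for (auto& b : buf)
      b ^= 0xFF;            // stand-in for the real patching
    if (m_debug_mode)
      ++m_dumps;            // stand-in for the post-patch dump
  }

  std::size_t dumps() const { return m_dumps; }
};
```

When the flag is off, each call to `patch` pays only for two branch checks on a member bool, which is the performance property this solution is after.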

Alternative solution (rejected because the concept of control-codes is also in xrt-device):
When creating xrt::device, check whether dump-control-codes is true. If so, record it in xrt::device::m_debug_mode. The motivation was to avoid checking dump-control-codes repeatedly in a use case with many xrt::module objects.

Risks (if any) associated with the changes in the commit

What has been tested and how, request additional testing if necessary

Tested on strixb0 using one of the standard transaction-buffer test cases from Larry.

Documentation impact (if any)

The new xrt.ini entry needs to be documented.

@dezhiAmd dezhiAmd requested a review from larry9523 August 6, 2024 23:44
@dezhiAmd dezhiAmd requested a review from maxzhen August 7, 2024 19:27
@dezhiAmd dezhiAmd marked this pull request as ready for review August 8, 2024 17:32
@dezhiAmd dezhiAmd requested a review from stsoe as a code owner August 8, 2024 17:32
@dezhiAmd dezhiAmd requested a review from stsoe August 19, 2024 17:05
@stsoe stsoe merged commit 64dae6c into Xilinx:master Aug 21, 2024
17 checks passed
@dezhiAmd dezhiAmd deleted the VITIS-13074 branch August 21, 2024 16:01
daveliddell pushed a commit to daveliddell/XRT that referenced this pull request Sep 27, 2024
VITIS-13074 - Dump common control codes before and after being patched (Xilinx#8338)

* Dump control-codes, control-packet and preemption-codes

* Attempt to fix compiling errors on linux

* Remove no_exec_cmd_buf

* Remove all change to xrt_kernel.cpp

* Put all change in xrt_module.cpp. Not ideal...

* Add log file

* Trigger debugging elf-flow only if xrt_core::config::get_xrt_debug is true

* Attempt to fix compile error on centos78

---------

Co-authored-by: dezhliao <[email protected]>