-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DebugInfo][DWARF] Utilize DW_AT_LLVM_stmt_sequence attr in line table lookups #123391
Conversation
5d8b108
to
82a6fb8
Compare
@llvm/pr-subscribers-debuginfo Author: None (alx32) ChangesSummary Background The Implementation Details If the attribute is present, the function filters the results to only include line entries from the sequence that starts at the specified offset. This ensures that even when multiple functions share the same address range, we return only the line entries that actually belong to the function represented by the DIE. The implementation:
Full diff: https://github.com/llvm/llvm-project/pull/123391.diff 3 Files Affected:
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
index ff7bf87d8e6b5a..732a4bfc28ab3a 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
@@ -209,6 +209,9 @@ class DWARFDebugLine {
unsigned LastRowIndex;
bool Empty;
+ /// The offset into the line table where this sequence begins
+ uint64_t Offset = 0;
+
void reset();
static bool orderByHighPC(const Sequence &LHS, const Sequence &RHS) {
@@ -243,8 +246,20 @@ class DWARFDebugLine {
uint32_t lookupAddress(object::SectionedAddress Address,
bool *IsApproximateLine = nullptr) const;
+ /// Fills the Result argument with the indices of the rows that correspond
+ /// to the address range specified by \p Address and \p Size.
+ ///
+ /// \param Address - The starting address of the range.
+ /// \param Size - The size of the address range.
+ /// \param Result - The vector to fill with row indices.
+ /// \param Die - if provided, the function will check for a
+ /// DW_AT_LLVM_stmt_sequence attribute. If present, only rows from the
+ /// sequence starting at the matching offset will be added to the result.
+ ///
+ /// Returns true if any rows were found.
bool lookupAddressRange(object::SectionedAddress Address, uint64_t Size,
- std::vector<uint32_t> &Result) const;
+ std::vector<uint32_t> &Result,
+ const DWARFDie *Die = nullptr) const;
bool hasFileAtIndex(uint64_t FileIndex) const {
return Prologue.hasFileAtIndex(FileIndex);
@@ -305,8 +320,20 @@ class DWARFDebugLine {
uint32_t lookupAddressImpl(object::SectionedAddress Address,
bool *IsApproximateLine = nullptr) const;
- bool lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size,
- std::vector<uint32_t> &Result) const;
+ /// Fills the Result argument with the indices of the rows that correspond
+ /// to the address range specified by \p Address and \p Size.
+ ///
+ /// \param Address - The starting address of the range.
+ /// \param Size - The size of the address range.
+ /// \param Result - The vector to fill with row indices.
+ /// \param StmtSequenceOffset - if provided, only rows from the sequence
+ /// starting at the matching offset will be added to the result.
+ ///
+ /// Returns true if any rows were found.
+ bool
+ lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size,
+ std::vector<uint32_t> &Result,
+ std::optional<uint64_t> StmtSequenceOffset) const;
};
const LineTable *getLineTable(uint64_t Offset) const;
@@ -377,7 +404,15 @@ class DWARFDebugLine {
function_ref<void(Error)> ErrorHandler);
void resetRowAndSequence();
- void appendRowToMatrix();
+
+ /// Append the current Row to the LineTable's matrix of rows and update the
+ /// current Sequence information.
+ ///
+ /// \param LineTableOffset - the offset into the line table where the
+ /// current sequence of rows begins. This offset is stored in the Sequence
+ /// to allow filtering rows based on their originating sequence when a
+ /// DW_AT_LLVM_stmt_sequence attribute is present.
+ void appendRowToMatrix(uint64_t LineTableOffset);
struct AddrOpIndexDelta {
uint64_t AddrOffset;
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
index 939a5163d55abc..27b4c6a5489127 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
@@ -570,8 +570,9 @@ void DWARFDebugLine::ParsingState::resetRowAndSequence() {
Sequence.reset();
}
-void DWARFDebugLine::ParsingState::appendRowToMatrix() {
+void DWARFDebugLine::ParsingState::appendRowToMatrix(uint64_t LineTableOffset) {
unsigned RowNumber = LineTable->Rows.size();
+ Sequence.Offset = LineTableOffset;
if (Sequence.Empty) {
// Record the beginning of instruction sequence.
Sequence.Empty = false;
@@ -848,6 +849,8 @@ Error DWARFDebugLine::LineTable::parse(
*OS << '\n';
Row::dumpTableHeader(*OS, /*Indent=*/Verbose ? 12 : 0);
}
+ uint64_t LineTableSeqOffset = *OffsetPtr;
+
bool TombstonedAddress = false;
auto EmitRow = [&] {
if (!TombstonedAddress) {
@@ -857,7 +860,7 @@ Error DWARFDebugLine::LineTable::parse(
}
if (OS)
State.Row.dump(*OS);
- State.appendRowToMatrix();
+ State.appendRowToMatrix(LineTableSeqOffset);
}
};
while (*OffsetPtr < EndOffset) {
@@ -913,6 +916,9 @@ Error DWARFDebugLine::LineTable::parse(
// followed.
EmitRow();
State.resetRowAndSequence();
+ // Cursor now points to right after the end_sequence opcode - so points
+ // to the start of the next sequence - if one exists.
+ LineTableSeqOffset = Cursor.tell();
break;
case DW_LNE_set_address:
@@ -1364,10 +1370,18 @@ DWARFDebugLine::LineTable::lookupAddressImpl(object::SectionedAddress Address,
bool DWARFDebugLine::LineTable::lookupAddressRange(
object::SectionedAddress Address, uint64_t Size,
- std::vector<uint32_t> &Result) const {
+ std::vector<uint32_t> &Result, const DWARFDie *Die) const {
+
+ // If DIE is provided, check for DW_AT_LLVM_stmt_sequence
+ std::optional<uint64_t> StmtSequenceOffset;
+ if (Die) {
+ if (auto StmtSeqAttr = Die->find(dwarf::DW_AT_LLVM_stmt_sequence)) {
+ StmtSequenceOffset = toSectionOffset(StmtSeqAttr);
+ }
+ }
// Search for relocatable addresses
- if (lookupAddressRangeImpl(Address, Size, Result))
+ if (lookupAddressRangeImpl(Address, Size, Result, StmtSequenceOffset))
return true;
if (Address.SectionIndex == object::SectionedAddress::UndefSection)
@@ -1375,12 +1389,13 @@ bool DWARFDebugLine::LineTable::lookupAddressRange(
// Search for absolute addresses
Address.SectionIndex = object::SectionedAddress::UndefSection;
- return lookupAddressRangeImpl(Address, Size, Result);
+ return lookupAddressRangeImpl(Address, Size, Result, StmtSequenceOffset);
}
bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(
object::SectionedAddress Address, uint64_t Size,
- std::vector<uint32_t> &Result) const {
+ std::vector<uint32_t> &Result,
+ std::optional<uint64_t> StmtSequenceOffset) const {
if (Sequences.empty())
return false;
uint64_t EndAddr = Address.Address + Size;
@@ -1401,6 +1416,14 @@ bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(
while (SeqPos != LastSeq && SeqPos->LowPC < EndAddr) {
const DWARFDebugLine::Sequence &CurSeq = *SeqPos;
+
+ // Skip sequences that don't match our stmt_sequence offset if one was
+ // provided
+ if (StmtSequenceOffset && CurSeq.Offset != *StmtSequenceOffset) {
+ ++SeqPos;
+ continue;
+ }
+
// For the first sequence, we need to find which row in the sequence is the
// first in our range.
uint32_t FirstRowIndex = CurSeq.FirstRowIndex;
@@ -1423,7 +1446,7 @@ bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(
++SeqPos;
}
- return true;
+ return !Result.empty();
}
std::optional<StringRef>
diff --git a/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp b/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
index e549128031744e..7a1d634900ab6a 100644
--- a/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
+++ b/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
@@ -6,10 +6,10 @@
//
//===----------------------------------------------------------------------===//
+#include "llvm/DebugInfo/DWARF/DWARFDebugLine.h"
#include "DwarfGenerator.h"
#include "DwarfUtils.h"
#include "llvm/DebugInfo/DWARF/DWARFContext.h"
-#include "llvm/DebugInfo/DWARF/DWARFDebugLine.h"
#include "llvm/Object/ObjectFile.h"
#include "llvm/Testing/Support/Error.h"
#include "gtest/gtest.h"
@@ -2035,4 +2035,132 @@ TEST_F(DebugLineBasicFixture, PrintPathsProperly) {
EXPECT_THAT(Result.c_str(), MatchesRegex("a dir.b dir.b file"));
}
+/// Test that lookupAddressRange correctly filters rows based on DW_AT_LLVM_stmt_sequence.
+///
+/// This test verifies that:
+/// 1. When a DIE has a DW_AT_LLVM_stmt_sequence attribute, lookupAddressRange only returns
+/// rows from the sequence starting at the specified offset
+/// 2. When a DIE has an invalid DW_AT_LLVM_stmt_sequence offset, no rows are returned
+/// 3. When no DW_AT_LLVM_stmt_sequence is present, all matching rows are returned
+///
+/// The test creates a line table with two sequences at the same address range but
+/// different line numbers. It then creates three subprogram DIEs:
+/// - One with DW_AT_LLVM_stmt_sequence pointing to the first sequence
+/// - One with DW_AT_LLVM_stmt_sequence pointing to the second sequence
+/// - One with an invalid DW_AT_LLVM_stmt_sequence offset
+TEST_F(DebugLineBasicFixture, LookupAddressRangeWithStmtSequenceOffset) {
+ if (!setupGenerator())
+ GTEST_SKIP();
+
+ // Create new DWARF with the subprogram DIE
+ dwarfgen::CompileUnit &CU = Gen->addCompileUnit();
+ dwarfgen::DIE CUDie = CU.getUnitDIE();
+
+ CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
+ CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);
+
+ dwarfgen::DIE SD1 = CUDie.addChild(DW_TAG_subprogram);
+ SD1.addAttribute(DW_AT_name, DW_FORM_strp, "sub1");
+ SD1.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
+ SD1.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x1032U);
+ // DW_AT_LLVM_stmt_sequence points to the first sequence
+ SD1.addAttribute(DW_AT_LLVM_stmt_sequence, DW_FORM_sec_offset, 0x2e);
+
+ dwarfgen::DIE SD2 = CUDie.addChild(DW_TAG_subprogram);
+ SD2.addAttribute(DW_AT_name, DW_FORM_strp, "sub2");
+ SD2.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
+ SD2.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x1032U);
+ // DW_AT_LLVM_stmt_sequence points to the second sequence
+ SD2.addAttribute(DW_AT_LLVM_stmt_sequence, DW_FORM_sec_offset, 0x42);
+
+ dwarfgen::DIE SD3 = CUDie.addChild(DW_TAG_subprogram);
+ SD3.addAttribute(DW_AT_name, DW_FORM_strp, "sub3");
+ SD3.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
+ SD3.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x1032U);
+ // Invalid DW_AT_LLVM_stmt_sequence
+ SD3.addAttribute(DW_AT_LLVM_stmt_sequence, DW_FORM_sec_offset, 0x66);
+
+ // Create a line table with multiple sequences
+ LineTable < = Gen->addLineTable();
+
+ // First sequence with addresses 0x1000(Ln100), 0x1004(Ln101)
+ LT.addExtendedOpcode(9, DW_LNE_set_address, {{0x1000U, LineTable::Quad}});
+ LT.addStandardOpcode(DW_LNS_set_prologue_end, {});
+ LT.addStandardOpcode(DW_LNS_advance_line, {{99, LineTable::SLEB}});
+ LT.addStandardOpcode(DW_LNS_copy, {});
+ LT.addByte(0x4b); // Special opcode: address += 4, line += 1
+ LT.addExtendedOpcode(1, DW_LNE_end_sequence, {});
+
+ // Second sequence with addresses 0x1000(Ln200), 0x1004(Ln201)
+ LT.addExtendedOpcode(9, DW_LNE_set_address, {{0x1000U, LineTable::Quad}});
+ LT.addStandardOpcode(DW_LNS_set_prologue_end, {});
+ LT.addStandardOpcode(DW_LNS_advance_line, {{199, LineTable::SLEB}});
+ LT.addStandardOpcode(DW_LNS_copy, {});
+ LT.addByte(0x4b); // Special opcode: address += 4, line += 1
+ LT.addExtendedOpcode(1, DW_LNE_end_sequence, {});
+
+ // Generate the DWARF
+ generate();
+
+ // Parse the line table to get the sequence offset
+ auto ExpectedLineTable = Line.getOrParseLineTable(
+ LineData, /*Offset=*/0, *Context, nullptr, RecordRecoverable);
+ ASSERT_THAT_EXPECTED(ExpectedLineTable, Succeeded());
+ const auto *Table = *ExpectedLineTable;
+
+ uint32_t NumCUs = Context->getNumCompileUnits();
+ ASSERT_EQ(NumCUs, 1u);
+ DWARFUnit *Unit = Context->getUnitAtIndex(0);
+ auto DwarfCUDie = Unit->getUnitDIE(false);
+
+ auto Sub1Die = DwarfCUDie.getFirstChild();
+ auto Sub2Die = Sub1Die.getSibling();
+ auto Sub3Die = Sub2Die.getSibling();
+
+ // Verify Sub1Die is the DIE generated from SD1
+ auto NameAttr1 = Sub1Die.find(DW_AT_name);
+ EXPECT_STREQ(*dwarf::toString(*NameAttr1), "sub1");
+
+ // Verify Sub2Die is the DIE generated from SD2
+ auto NameAttr2 = Sub2Die.find(DW_AT_name);
+ EXPECT_STREQ(*dwarf::toString(*NameAttr2), "sub2");
+
+ // Verify Sub2Die is the DIE generated from SD3
+ auto NameAttr3 = Sub3Die.find(DW_AT_name);
+ EXPECT_STREQ(*dwarf::toString(*NameAttr3), "sub3");
+
+ // Ensure there are two sequences
+ ASSERT_EQ(Table->Sequences.size(), 2u);
+
+ // Lookup addresses in the first sequence with the second sequence's filter
+ {
+ std::vector<uint32_t> Rows;
+ bool Found;
+
+ // Look up using Sub3Die, which has an invalid DW_AT_LLVM_stmt_sequence
+ Found = Table->lookupAddressRange(
+ {0x1000, object::SectionedAddress::UndefSection}, /*Size=*/1, Rows,
+ &Sub3Die);
+ EXPECT_FALSE(Found);
+
+ // Look up using Sub1Die, which has a valid DW_AT_LLVM_stmt_sequence and
+ // should return row 0
+ Found = Table->lookupAddressRange(
+ {0x1000, object::SectionedAddress::UndefSection}, /*Size=*/1, Rows,
+ &Sub1Die);
+ EXPECT_TRUE(Found);
+ ASSERT_EQ(Rows.size(), 1u);
+ EXPECT_EQ(Rows[0], 0);
+
+ // Look up using Sub2Die, which has a valid DW_AT_LLVM_stmt_sequence and
+ // should return row 3
+ Rows.clear();
+ Found = Table->lookupAddressRange(
+ {0x1000, object::SectionedAddress::UndefSection}, /*Size=*/1, Rows,
+ &Sub2Die);
+ EXPECT_TRUE(Found);
+ ASSERT_EQ(Rows.size(), 1u);
+ EXPECT_EQ(Rows[0], 3);
+ }
+}
} // end anonymous namespace
|
82a6fb8
to
f877656
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few changes. Wait for others to comment as well.
The `macho-gsym-merged-callsites-dsym.yaml` test was failing with expensive checks on. This is because we still can't reliably associate line table information to functions until #123391 is merged. Example failure: https://lab.llvm.org/buildbot/#/builders/16/builds/12374
The `macho-gsym-merged-callsites-dsym.yaml` test was failing with expensive checks on. This is because we still can't reliably associate line table information to functions until llvm/llvm-project#123391 is merged. Example failure: https://lab.llvm.org/buildbot/#/builders/16/builds/12374
f877656
to
ad1d1d0
Compare
Rather than passing a DIE and then having the API query for the stmt_sequence, would it be tidier to have the caller do that work? (I imagine at the moment/prior to this patch, the line table parsing didn't have much/any knowledge of DIEs, and that might be a good thing to maintain?) |
@dwblaikie - I think either way is fine. The reason it's like this currently is because the Also if multiple callers will be reusing this functionality they'll have to separately query for the attribute having a bit of duplicate code. I'll change the API to take an offset in a bit if no one else has a strong preference otherwise. |
// Skip sequences that don't match our stmt_sequence offset if one was | ||
// provided | ||
if (StmtSequenceOffset && CurSeq.StmtSeqOffset != *StmtSequenceOffset) { | ||
++SeqPos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might help if the while
turned into a for loop (for (; SeqPos != LastSeq && SeqPos->LowPC < EndAddr; ++SeqPos)
) so that the increment doesn't have to appear in two places/risk getting out of sync.
Though more generally - since the entries will be ordered by StmtSeqOffset - you could do a binary search (llvm::binary_search
) to find the sequence, if the StmtSequenceOffset is specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to binary search approach - though this makes the code a bit more complex I think.
@@ -1401,6 +1409,14 @@ bool DWARFDebugLine::LineTable::lookupAddressRangeImpl( | |||
|
|||
while (SeqPos != LastSeq && SeqPos->LowPC < EndAddr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: this seems questionable... The LowPC of one sequence isn't necessarily related to the LowPC of another sequence (the linker could reorder them), so I'm not sure what this is meant to do but I'm a bit surprised/not sure it does whatever it is meant to do.
1cd753b
to
16e8e20
Compare
return false; | ||
|
||
// Set LastSeq to just past this sequence since we only want this one | ||
LastSeq = std::next(SeqPos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a std::vector, so there's no need for the fancy std::next, this can be LastSeq = SeqPos + 1;
(std::next is handy if you have a non-random-access container/iterator, etc, but otherwise I think it's easier to read with the more direct expression)
// If we have a statement sequence offset, find the specific sequence | ||
// Binary search for sequence with matching StmtSeqOffset | ||
Sequence.StmtSeqOffset = *StmtSequenceOffset; | ||
SeqPos = std::lower_bound(Sequences.begin(), LastSeq, Sequence, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad here - Sequences is sorted by high pc, instead of by offset. So this'll have to be a linear search - so just replacing this lower_bound with llvm::find_if.
And rather than using the Sequence
object, using llvm::find_if you can search for the thing with the desired StmtSeqOffset directly comparing it to *StmtSequenceOffset.
SeqPos = llvm::find_if(Sequences, [&](const DWARFDebugLine::Sequence &Seq) {
return Seq.StmtSeqOffset == *StmtSequenceOffset;
});
...
…e lookups (llvm#123391) **Summary** Add support for filtering line table entries based on `DW_AT_LLVM_stmt_sequence` attribute when looking up address ranges. This ensures that line entries are correctly attributed to their corresponding functions, even when multiple functions share the same address range due to optimizations. **Background** In llvm#110192 we added support to clang to generate the `DW_AT_LLVM_stmt_sequence` attribute for `DW_TAG_subprogram`'s. Corresponding RFC: [New DWARF Attribute for Symbolication of Merged Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434) The `DW_AT_LLVM_stmt_sequence` attribute allows accurate attribution of line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker. **Implementation Details** The patch modifies `DWARFDebugLine::lookupAddressRange` to accept an optional DWARFDie parameter. When provided, the function checks if the `DIE` has a `DW_AT_LLVM_stmt_sequence` attribute. This attribute contains an offset into the line table that marks where the line entries for this DIE's function begin. If the attribute is present, the function filters the results to only include line entries from the sequence that starts at the specified offset. This ensures that even when multiple functions share the same address range, we return only the line entries that actually belong to the function represented by the DIE. The implementation: - Adds an optional DWARFDie parameter to lookupAddressRange - Extracts the `DW_AT_LLVM_stmt_sequence` offset if present - Modifies the address range lookup logic to filter sequences based on their offset - Returns only line entries from the matching sequence
Summary
Add support for filtering line table entries based on
DW_AT_LLVM_stmt_sequence
attribute when looking up address ranges. This ensures that line entries are correctly attributed to their corresponding functions, even when multiple functions share the same address range due to optimizations.Background
In #110192 we added support to clang to generate the
DW_AT_LLVM_stmt_sequence
attribute forDW_TAG_subprogram
's. Corresponding RFC: New DWARF Attribute for Symbolication of Merged FunctionsThe
DW_AT_LLVM_stmt_sequence
attribute allows accurate attribution of line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker.Implementation Details
The patch modifies
DWARFDebugLine::lookupAddressRange
to accept an optional DWARFDie parameter. When provided, the function checks if theDIE
has aDW_AT_LLVM_stmt_sequence
attribute. This attribute contains an offset into the line table that marks where the line entries for this DIE's function begin.If the attribute is present, the function filters the results to only include line entries from the sequence that starts at the specified offset. This ensures that even when multiple functions share the same address range, we return only the line entries that actually belong to the function represented by the DIE.
The implementation:
DW_AT_LLVM_stmt_sequence
offset if present