Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo][DWARF] Utilize DW_AT_LLVM_stmt_sequence attr in line table lookups #123391

Merged
merged 8 commits into from
Feb 6, 2025

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented Jan 17, 2025

Summary
Add support for filtering line table entries based on DW_AT_LLVM_stmt_sequence attribute when looking up address ranges. This ensures that line entries are correctly attributed to their corresponding functions, even when multiple functions share the same address range due to optimizations.

Background
In #110192 we added support to clang to generate the DW_AT_LLVM_stmt_sequence attribute for DW_TAG_subprogram's. Corresponding RFC: New DWARF Attribute for Symbolication of Merged Functions

The DW_AT_LLVM_stmt_sequence attribute allows accurate attribution of line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker.

Implementation Details
The patch modifies DWARFDebugLine::lookupAddressRange to accept an optional DWARFDie parameter. When provided, the function checks if the DIE has a DW_AT_LLVM_stmt_sequence attribute. This attribute contains an offset into the line table that marks where the line entries for this DIE's function begin.

If the attribute is present, the function filters the results to only include line entries from the sequence that starts at the specified offset. This ensures that even when multiple functions share the same address range, we return only the line entries that actually belong to the function represented by the DIE.

The implementation:

  • Adds an optional DWARFDie parameter to lookupAddressRange
  • Extracts the DW_AT_LLVM_stmt_sequence offset if present
  • Modifies the address range lookup logic to filter sequences based on their offset
  • Returns only line entries from the matching sequence

@alx32 alx32 requested review from clayborg, dwblaikie, adrian-prantl, rastogishubham and kyulee-com and removed request for clayborg January 17, 2025 19:56
@alx32 alx32 force-pushed the 13_gsymutil_stmt_seq branch from 5d8b108 to 82a6fb8 Compare January 17, 2025 19:58
@alx32 alx32 marked this pull request as ready for review January 17, 2025 19:58
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-debuginfo

Author: None (alx32)

Changes

Summary
Add support for filtering line table entries based on DW_AT_LLVM_stmt_sequence attribute when looking up address ranges. This ensures that line entries are correctly attributed to their corresponding functions, even when multiple functions share the same address range due to optimizations.

Background
In #110192 we added support to clang to generate the DW_AT_LLVM_stmt_sequence attribute for DW_TAG_subprogram's. Corresponding RFC: New DWARF Attribute for Symbolication of Merged Functions

The DW_AT_LLVM_stmt_sequence attribute allows accurate attribution of line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker.

Implementation Details
The patch modifies DWARFDebugLine::lookupAddressRange to accept an optional DWARFDie parameter. When provided, the function checks if the DIE has a DW_AT_LLVM_stmt_sequence attribute. This attribute contains an offset into the line table that marks where the line entries for this DIE's function begin.

If the attribute is present, the function filters the results to only include line entries from the sequence that starts at the specified offset. This ensures that even when multiple functions share the same address range, we return only the line entries that actually belong to the function represented by the DIE.

The implementation:

  • Adds an optional DWARFDie parameter to lookupAddressRange
  • Extracts the DW_AT_LLVM_stmt_sequence offset if present
  • Modifies the address range lookup logic to filter sequences based on their offset
  • Returns only line entries from the matching sequence

Full diff: https://github.com/llvm/llvm-project/pull/123391.diff

3 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h (+39-4)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp (+30-7)
  • (modified) llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp (+129-1)
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
index ff7bf87d8e6b5a..732a4bfc28ab3a 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
@@ -209,6 +209,9 @@ class DWARFDebugLine {
     unsigned LastRowIndex;
     bool Empty;
 
+    /// The offset into the line table where this sequence begins
+    uint64_t Offset = 0;
+
     void reset();
 
     static bool orderByHighPC(const Sequence &LHS, const Sequence &RHS) {
@@ -243,8 +246,20 @@ class DWARFDebugLine {
     uint32_t lookupAddress(object::SectionedAddress Address,
                            bool *IsApproximateLine = nullptr) const;
 
+    /// Fills the Result argument with the indices of the rows that correspond
+    /// to the address range specified by \p Address and \p Size.
+    ///
+    /// \param Address - The starting address of the range.
+    /// \param Size - The size of the address range.
+    /// \param Result - The vector to fill with row indices.
+    /// \param Die - if provided, the function will check for a
+    /// DW_AT_LLVM_stmt_sequence attribute. If present, only rows from the
+    /// sequence starting at the matching offset will be added to the result.
+    ///
+    /// Returns true if any rows were found.
     bool lookupAddressRange(object::SectionedAddress Address, uint64_t Size,
-                            std::vector<uint32_t> &Result) const;
+                            std::vector<uint32_t> &Result,
+                            const DWARFDie *Die = nullptr) const;
 
     bool hasFileAtIndex(uint64_t FileIndex) const {
       return Prologue.hasFileAtIndex(FileIndex);
@@ -305,8 +320,20 @@ class DWARFDebugLine {
     uint32_t lookupAddressImpl(object::SectionedAddress Address,
                                bool *IsApproximateLine = nullptr) const;
 
-    bool lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size,
-                                std::vector<uint32_t> &Result) const;
+    /// Fills the Result argument with the indices of the rows that correspond
+    /// to the address range specified by \p Address and \p Size.
+    ///
+    /// \param Address - The starting address of the range.
+    /// \param Size - The size of the address range.
+    /// \param Result - The vector to fill with row indices.
+    /// \param StmtSequenceOffset - if provided, only rows from the sequence
+    /// starting at the matching offset will be added to the result.
+    ///
+    /// Returns true if any rows were found.
+    bool
+    lookupAddressRangeImpl(object::SectionedAddress Address, uint64_t Size,
+                           std::vector<uint32_t> &Result,
+                           std::optional<uint64_t> StmtSequenceOffset) const;
   };
 
   const LineTable *getLineTable(uint64_t Offset) const;
@@ -377,7 +404,15 @@ class DWARFDebugLine {
                  function_ref<void(Error)> ErrorHandler);
 
     void resetRowAndSequence();
-    void appendRowToMatrix();
+
+    /// Append the current Row to the LineTable's matrix of rows and update the
+    /// current Sequence information.
+    ///
+    /// \param LineTableOffset - the offset into the line table where the
+    /// current sequence of rows begins. This offset is stored in the Sequence
+    /// to allow filtering rows based on their originating sequence when a
+    /// DW_AT_LLVM_stmt_sequence attribute is present.
+    void appendRowToMatrix(uint64_t LineTableOffset);
 
     struct AddrOpIndexDelta {
       uint64_t AddrOffset;
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
index 939a5163d55abc..27b4c6a5489127 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
@@ -570,8 +570,9 @@ void DWARFDebugLine::ParsingState::resetRowAndSequence() {
   Sequence.reset();
 }
 
-void DWARFDebugLine::ParsingState::appendRowToMatrix() {
+void DWARFDebugLine::ParsingState::appendRowToMatrix(uint64_t LineTableOffset) {
   unsigned RowNumber = LineTable->Rows.size();
+  Sequence.Offset = LineTableOffset;
   if (Sequence.Empty) {
     // Record the beginning of instruction sequence.
     Sequence.Empty = false;
@@ -848,6 +849,8 @@ Error DWARFDebugLine::LineTable::parse(
     *OS << '\n';
     Row::dumpTableHeader(*OS, /*Indent=*/Verbose ? 12 : 0);
   }
+  uint64_t LineTableSeqOffset = *OffsetPtr;
+
   bool TombstonedAddress = false;
   auto EmitRow = [&] {
     if (!TombstonedAddress) {
@@ -857,7 +860,7 @@ Error DWARFDebugLine::LineTable::parse(
       }
       if (OS)
         State.Row.dump(*OS);
-      State.appendRowToMatrix();
+      State.appendRowToMatrix(LineTableSeqOffset);
     }
   };
   while (*OffsetPtr < EndOffset) {
@@ -913,6 +916,9 @@ Error DWARFDebugLine::LineTable::parse(
         // followed.
         EmitRow();
         State.resetRowAndSequence();
+        // Cursor now points to right after the end_sequence opcode - so points
+        // to the start of the next sequence - if one exists.
+        LineTableSeqOffset = Cursor.tell();
         break;
 
       case DW_LNE_set_address:
@@ -1364,10 +1370,18 @@ DWARFDebugLine::LineTable::lookupAddressImpl(object::SectionedAddress Address,
 
 bool DWARFDebugLine::LineTable::lookupAddressRange(
     object::SectionedAddress Address, uint64_t Size,
-    std::vector<uint32_t> &Result) const {
+    std::vector<uint32_t> &Result, const DWARFDie *Die) const {
+
+  // If DIE is provided, check for DW_AT_LLVM_stmt_sequence
+  std::optional<uint64_t> StmtSequenceOffset;
+  if (Die) {
+    if (auto StmtSeqAttr = Die->find(dwarf::DW_AT_LLVM_stmt_sequence)) {
+      StmtSequenceOffset = toSectionOffset(StmtSeqAttr);
+    }
+  }
 
   // Search for relocatable addresses
-  if (lookupAddressRangeImpl(Address, Size, Result))
+  if (lookupAddressRangeImpl(Address, Size, Result, StmtSequenceOffset))
     return true;
 
   if (Address.SectionIndex == object::SectionedAddress::UndefSection)
@@ -1375,12 +1389,13 @@ bool DWARFDebugLine::LineTable::lookupAddressRange(
 
   // Search for absolute addresses
   Address.SectionIndex = object::SectionedAddress::UndefSection;
-  return lookupAddressRangeImpl(Address, Size, Result);
+  return lookupAddressRangeImpl(Address, Size, Result, StmtSequenceOffset);
 }
 
 bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(
     object::SectionedAddress Address, uint64_t Size,
-    std::vector<uint32_t> &Result) const {
+    std::vector<uint32_t> &Result,
+    std::optional<uint64_t> StmtSequenceOffset) const {
   if (Sequences.empty())
     return false;
   uint64_t EndAddr = Address.Address + Size;
@@ -1401,6 +1416,14 @@ bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(
 
   while (SeqPos != LastSeq && SeqPos->LowPC < EndAddr) {
     const DWARFDebugLine::Sequence &CurSeq = *SeqPos;
+
+    // Skip sequences that don't match our stmt_sequence offset if one was
+    // provided
+    if (StmtSequenceOffset && CurSeq.Offset != *StmtSequenceOffset) {
+      ++SeqPos;
+      continue;
+    }
+
     // For the first sequence, we need to find which row in the sequence is the
     // first in our range.
     uint32_t FirstRowIndex = CurSeq.FirstRowIndex;
@@ -1423,7 +1446,7 @@ bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(
     ++SeqPos;
   }
 
-  return true;
+  return !Result.empty();
 }
 
 std::optional<StringRef>
diff --git a/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp b/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
index e549128031744e..7a1d634900ab6a 100644
--- a/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
+++ b/llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
@@ -6,10 +6,10 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "llvm/DebugInfo/DWARF/DWARFDebugLine.h"
 #include "DwarfGenerator.h"
 #include "DwarfUtils.h"
 #include "llvm/DebugInfo/DWARF/DWARFContext.h"
-#include "llvm/DebugInfo/DWARF/DWARFDebugLine.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Testing/Support/Error.h"
 #include "gtest/gtest.h"
@@ -2035,4 +2035,132 @@ TEST_F(DebugLineBasicFixture, PrintPathsProperly) {
   EXPECT_THAT(Result.c_str(), MatchesRegex("a dir.b dir.b file"));
 }
 
+/// Test that lookupAddressRange correctly filters rows based on DW_AT_LLVM_stmt_sequence.
+///
+/// This test verifies that:
+/// 1. When a DIE has a DW_AT_LLVM_stmt_sequence attribute, lookupAddressRange only returns 
+///    rows from the sequence starting at the specified offset
+/// 2. When a DIE has an invalid DW_AT_LLVM_stmt_sequence offset, no rows are returned
+/// 3. When no DW_AT_LLVM_stmt_sequence is present, all matching rows are returned
+///
+/// The test creates a line table with two sequences at the same address range but 
+/// different line numbers. It then creates three subprogram DIEs:
+/// - One with DW_AT_LLVM_stmt_sequence pointing to the first sequence
+/// - One with DW_AT_LLVM_stmt_sequence pointing to the second sequence  
+/// - One with an invalid DW_AT_LLVM_stmt_sequence offset
+TEST_F(DebugLineBasicFixture, LookupAddressRangeWithStmtSequenceOffset) {
+  if (!setupGenerator())
+    GTEST_SKIP();
+
+  // Create new DWARF with the subprogram DIE
+  dwarfgen::CompileUnit &CU = Gen->addCompileUnit();
+  dwarfgen::DIE CUDie = CU.getUnitDIE();
+
+  CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
+  CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);
+
+  dwarfgen::DIE SD1 = CUDie.addChild(DW_TAG_subprogram);
+  SD1.addAttribute(DW_AT_name, DW_FORM_strp, "sub1");
+  SD1.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
+  SD1.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x1032U);
+  // DW_AT_LLVM_stmt_sequence points to the first sequence
+  SD1.addAttribute(DW_AT_LLVM_stmt_sequence, DW_FORM_sec_offset, 0x2e);
+
+  dwarfgen::DIE SD2 = CUDie.addChild(DW_TAG_subprogram);
+  SD2.addAttribute(DW_AT_name, DW_FORM_strp, "sub2");
+  SD2.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
+  SD2.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x1032U);
+  // DW_AT_LLVM_stmt_sequence points to the second sequence
+  SD2.addAttribute(DW_AT_LLVM_stmt_sequence, DW_FORM_sec_offset, 0x42);
+
+  dwarfgen::DIE SD3 = CUDie.addChild(DW_TAG_subprogram);
+  SD3.addAttribute(DW_AT_name, DW_FORM_strp, "sub3");
+  SD3.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
+  SD3.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x1032U);
+  // Invalid DW_AT_LLVM_stmt_sequence
+  SD3.addAttribute(DW_AT_LLVM_stmt_sequence, DW_FORM_sec_offset, 0x66);
+
+  // Create a line table with multiple sequences
+  LineTable &LT = Gen->addLineTable();
+
+  // First sequence with addresses 0x1000(Ln100), 0x1004(Ln101)
+  LT.addExtendedOpcode(9, DW_LNE_set_address, {{0x1000U, LineTable::Quad}});
+  LT.addStandardOpcode(DW_LNS_set_prologue_end, {});
+  LT.addStandardOpcode(DW_LNS_advance_line, {{99, LineTable::SLEB}});
+  LT.addStandardOpcode(DW_LNS_copy, {});
+  LT.addByte(0x4b); // Special opcode: address += 4, line += 1
+  LT.addExtendedOpcode(1, DW_LNE_end_sequence, {});
+
+  // Second sequence with addresses 0x1000(Ln200), 0x1004(Ln201)
+  LT.addExtendedOpcode(9, DW_LNE_set_address, {{0x1000U, LineTable::Quad}});
+  LT.addStandardOpcode(DW_LNS_set_prologue_end, {});
+  LT.addStandardOpcode(DW_LNS_advance_line, {{199, LineTable::SLEB}});
+  LT.addStandardOpcode(DW_LNS_copy, {});
+  LT.addByte(0x4b); // Special opcode: address += 4, line += 1
+  LT.addExtendedOpcode(1, DW_LNE_end_sequence, {});
+
+  // Generate the DWARF
+  generate();
+
+  // Parse the line table to get the sequence offset
+  auto ExpectedLineTable = Line.getOrParseLineTable(
+      LineData, /*Offset=*/0, *Context, nullptr, RecordRecoverable);
+  ASSERT_THAT_EXPECTED(ExpectedLineTable, Succeeded());
+  const auto *Table = *ExpectedLineTable;
+
+  uint32_t NumCUs = Context->getNumCompileUnits();
+  ASSERT_EQ(NumCUs, 1u);
+  DWARFUnit *Unit = Context->getUnitAtIndex(0);
+  auto DwarfCUDie = Unit->getUnitDIE(false);
+
+  auto Sub1Die = DwarfCUDie.getFirstChild();
+  auto Sub2Die = Sub1Die.getSibling();
+  auto Sub3Die = Sub2Die.getSibling();
+
+  // Verify Sub1Die is the DIE generated from SD1
+  auto NameAttr1 = Sub1Die.find(DW_AT_name);
+  EXPECT_STREQ(*dwarf::toString(*NameAttr1), "sub1");
+
+  // Verify Sub2Die is the DIE generated from SD2
+  auto NameAttr2 = Sub2Die.find(DW_AT_name);
+  EXPECT_STREQ(*dwarf::toString(*NameAttr2), "sub2");
+
+  // Verify Sub2Die is the DIE generated from SD3
+  auto NameAttr3 = Sub3Die.find(DW_AT_name);
+  EXPECT_STREQ(*dwarf::toString(*NameAttr3), "sub3");
+
+  // Ensure there are two sequences
+  ASSERT_EQ(Table->Sequences.size(), 2u);
+
+  // Lookup addresses in the first sequence with the second sequence's filter
+  {
+    std::vector<uint32_t> Rows;
+    bool Found;
+
+    // Look up using Sub3Die, which has an invalid DW_AT_LLVM_stmt_sequence
+    Found = Table->lookupAddressRange(
+        {0x1000, object::SectionedAddress::UndefSection}, /*Size=*/1, Rows,
+        &Sub3Die);
+    EXPECT_FALSE(Found);
+
+    // Look up using Sub1Die, which has a valid DW_AT_LLVM_stmt_sequence and
+    // should return row 0
+    Found = Table->lookupAddressRange(
+        {0x1000, object::SectionedAddress::UndefSection}, /*Size=*/1, Rows,
+        &Sub1Die);
+    EXPECT_TRUE(Found);
+    ASSERT_EQ(Rows.size(), 1u);
+    EXPECT_EQ(Rows[0], 0);
+
+    // Look up using Sub2Die, which has a valid DW_AT_LLVM_stmt_sequence and
+    // should return row 3
+    Rows.clear();
+    Found = Table->lookupAddressRange(
+        {0x1000, object::SectionedAddress::UndefSection}, /*Size=*/1, Rows,
+        &Sub2Die);
+    EXPECT_TRUE(Found);
+    ASSERT_EQ(Rows.size(), 1u);
+    EXPECT_EQ(Rows[0], 3);
+  }
+}
 } // end anonymous namespace

@alx32 alx32 force-pushed the 13_gsymutil_stmt_seq branch from 82a6fb8 to f877656 Compare January 17, 2025 20:07
@llvm llvm deleted a comment from github-actions bot Jan 17, 2025
Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few changes. Wait for others to comment as well.

alx32 added a commit that referenced this pull request Jan 21, 2025
The `macho-gsym-merged-callsites-dsym.yaml` test was failing with
expensive checks on. This is because we still can't reliably associate
line table information to functions until
#123391 is merged.

Example failure:
https://lab.llvm.org/buildbot/#/builders/16/builds/12374
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 21, 2025
The `macho-gsym-merged-callsites-dsym.yaml` test was failing with
expensive checks on. This is because we still can't reliably associate
line table information to functions until
llvm/llvm-project#123391 is merged.

Example failure:
https://lab.llvm.org/buildbot/#/builders/16/builds/12374
@alx32 alx32 force-pushed the 13_gsymutil_stmt_seq branch from f877656 to ad1d1d0 Compare January 21, 2025 23:44
@dwblaikie
Copy link
Collaborator

Rather than passing a DIE and then having the API query for the stmt_sequence, would it be tidier to have the caller do that work? (I imagine at the moment/prior to this patch, the line table parsing didn't have much/any knowledge of DIEs, and that might be a good thing to maintain?)

@alx32
Copy link
Contributor Author

alx32 commented Jan 22, 2025

@dwblaikie - I think either way is fine. The reason it's like this currently is because the stmt_sequence offset is somewhat specific/internal to the line table format so it might make more sense for the line table to query it directly from the DIE rather than having callers deal with knowledge relating to the line table format. But you're right the line table had so far no knowledge of DIEs up until now.

Also if multiple callers will be reusing this functionality they'll have to separately query for the attribute having a bit of duplicate code.

I'll change the API to take an offset in a bit if no one else has a strong preference otherwise.

// Skip sequences that don't match our stmt_sequence offset if one was
// provided
if (StmtSequenceOffset && CurSeq.StmtSeqOffset != *StmtSequenceOffset) {
++SeqPos;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might help if the while turned into a for loop (for (; SeqPos != LastSeq && SeqPos->LowPC < EndAddr; ++SeqPos)) so that the increment doesn't have to appear in two places/risk getting out of sync.

Though more generally - since the entries will be ordered by StmtSeqOffset - you could do a binary search (llvm::binary_search) to find the sequence, if the StmtSequenceOffset is specified.

Copy link
Contributor Author

@alx32 alx32 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to binary search approach - though this makes the code a bit more complex I think.

@@ -1401,6 +1409,14 @@ bool DWARFDebugLine::LineTable::lookupAddressRangeImpl(

while (SeqPos != LastSeq && SeqPos->LowPC < EndAddr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: this seems questionable... The LowPC of one sequence isn't necessarily related to the LowPC of another sequence (the linker could reorder them), so I'm not sure what this is meant to do but I'm a bit surprised/not sure it does whatever it is meant to do.

@alx32 alx32 force-pushed the 13_gsymutil_stmt_seq branch from 1cd753b to 16e8e20 Compare February 4, 2025 21:06
return false;

// Set LastSeq to just past this sequence since we only want this one
LastSeq = std::next(SeqPos);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a std::vector, so there's no need for the fancy std::next, this can be LastSeq = SeqPos + 1; (std::next is handy if you have a non-random-access container/iterator, etc, but otherwise I think it's easier to read with the more direct expression)

// If we have a statement sequence offset, find the specific sequence
// Binary search for sequence with matching StmtSeqOffset
Sequence.StmtSeqOffset = *StmtSequenceOffset;
SeqPos = std::lower_bound(Sequences.begin(), LastSeq, Sequence,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad here - Sequences is sorted by high pc, instead of by offset. So this'll have to be a linear search - so just replacing this lower_bound with llvm::find_if.

And rather than using the Sequence object, using llvm::find_if you can search for the thing with the desired StmtSeqOffset directly comparing it to *StmtSequenceOffset.

SeqPos = llvm::find_if(Sequences, [&](const DWARFDebugLine::Sequence &Seq) {
  return Seq.StmtSeqOffset == *StmtSequenceOffset;
});
...

@alx32 alx32 merged commit 464f65a into llvm:main Feb 6, 2025
8 checks passed
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…e lookups (llvm#123391)

**Summary**
Add support for filtering line table entries based on
`DW_AT_LLVM_stmt_sequence` attribute when looking up address ranges.
This ensures that line entries are correctly attributed to their
corresponding functions, even when multiple functions share the same
address range due to optimizations.

**Background**
In llvm#110192 we added support to
clang to generate the `DW_AT_LLVM_stmt_sequence` attribute for
`DW_TAG_subprogram`'s. Corresponding RFC: [New DWARF Attribute for
Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)

The `DW_AT_LLVM_stmt_sequence` attribute allows accurate attribution of
line number information to their corresponding functions, even in
scenarios where functions are merged or share the same address space due
to optimizations like Identical Code Folding (ICF) in the linker.

**Implementation Details**
The patch modifies `DWARFDebugLine::lookupAddressRange` to accept an
optional DWARFDie parameter. When provided, the function checks if the
`DIE` has a `DW_AT_LLVM_stmt_sequence` attribute. This attribute
contains an offset into the line table that marks where the line entries
for this DIE's function begin.

If the attribute is present, the function filters the results to only
include line entries from the sequence that starts at the specified
offset. This ensures that even when multiple functions share the same
address range, we return only the line entries that actually belong to
the function represented by the DIE.

The implementation:
- Adds an optional DWARFDie parameter to lookupAddressRange
- Extracts the `DW_AT_LLVM_stmt_sequence` offset if present
- Modifies the address range lookup logic to filter sequences based on
their offset
- Returns only line entries from the matching sequence
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants