Fix fcase() segfault #6452

MichaelChirico · 2024-09-02T06:01:55Z

Closes #6448.

@aitap I didn't add any test -- any ideas for what would make sense as a regression test here? Technically this could have been caught by existing tests, under a stricter operating environment.

The test from OP did reliably segfault, but it's harder to turn that into a test for the absence of a segfault -- experimenting, I was getting it to run 1500+ times before segfaulting.

MichaelChirico · 2024-09-02T06:02:07Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @MichaelChirico and the rest of your teammates on Graphite

github-actions · 2024-09-02T06:20:36Z

Generated via commit a563ada

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 10 minutes and 59 seconds

Time taken to run atime::atime_pkg on the tests: 5 minutes and 41 seconds

MichaelChirico · 2024-09-02T06:21:15Z

What @aitap suggested is a much simpler one-line fix here:

data.table/src/fifelse.c

Line 238 in 02ced80

value0 = thens;

-      value0 = thens;
+      value0 = PROTECT(thens); nprotect++;

Looking again though, it does look wasteful to me that we repeatedly run getAttrib() on value0 -- and the only reason we keep value0 around to begin with is to make sure subsequent thens match the first one (re-named value0) on class & factor levels. So the approach taken here is to record those attributes instead, and not keep all of value0 protected.

It makes the code a bit lengthier, but should be more memory-efficient.

HughParsonage · 2024-09-02T06:51:06Z

Are we sure this is the fix? I am surprised that a miscalculated UNPROTECT would not have triggered a stack imbalance warning message.

MichaelChirico · 2024-09-02T06:57:21Z

Are we sure this is the fix? I am surprised that a miscalculated UNPROTECT would not have triggered a stack imbalance warning message.

there's no missed unprotect -- because thens is REPROTECT'd.

I confirmed the examples from the issue which reliably segfault pretty quickly, no longer do so.

aitap · 2024-09-02T11:05:39Z

any ideas for what would make sense as a regression test here

~~Can we afford to enable gctorture for one test? With it enabled,~~With manual calls to gc(), it's possible to force value0 to point to memory allocated by one of the eval() calls and thus fail the class test:

library(data.table)
x <- 1:3
system.time(local({
 fcase(
  x<2, structure(list(1), class = "foo"),
  x<3, structure(list(2), class = "foo"),
  # after i=1, `value0` has been unprotected;
  # one of the temporary values from the following eval()s
  # has a chance to show up at its former address
  { gc(full = TRUE); replicate(10, FALSE); x<4 }, `attr<-`(list(3), "class", "foo")
 )
}))
# Error in fcase(x < 2, structure(list(1), class = "foo"), x < 3, structure(list(2),  : 
#   Argument #6 has different class than argument #2, Please make sure all output values have the same class.
# Timing stopped at: 0.01 0 0.02

Without an explicit gc, fcase returns the right answer (a list of 1...3 with the class set) because garbage collection and memory reuse doesn't naturally happen that frequently.

Edit: use replicate to produce an arbitrary number of temporaries and gc() to trigger the issue

MichaelChirico · 2024-09-02T15:09:06Z

could you test making the input a factor with other attributes too? that will trigger the max # of branches in fcase with a getAttrib call

MichaelChirico · 2024-09-02T16:32:43Z

Really slick idea to get gc() to happen in the middle of fcase() execution. Not clear what breadth of systems we can expect such a test to actually work as intended, but it's definitely better than nothing!

aitap · 2024-09-03T13:37:12Z

Here's a test with factors that errors for me on R-devel/data.table from Git on Linux and R-4.4.0/data.table-1.16.0 on Windows, all x86_64:

x <- 1:3
y <- I(as.factor(LETTERS[1:3]))
fcase(
  x<2, y[3],
  x<3, y[2],
  { gc(full = TRUE); for (i in 1:1000) list(factor(letters[3:6])); x<4 }, y[1]
)

It doesn't fail in a more interesting manner because value0 ends up not being a factor:

#2  0x00007ffff44cb764 in fcaseR (rho=0x555558b43390, args=0x555558884098) at fifelse.c:279
279                 error(_("Argument #2 and argument #%d are both factor but their levels are different."), i*2+2);
(gdb) p R_inspect(value0)
@555557edc868 16 STRSXP g0c1 [] (len=1, tl=0)
  @555555a6e668 09 CHARSXP g1c2 [MARK,REF(2109),gp=0x61] [ASCII] [cached] "unique.default"

Somehow, without the lists in the outputs, it takes more effort to get this particular SEXP reused, but it is never free'd, only given out by allocVector.

tdhock

looks reasonable to me, thanks!
~~would we have caught this if we were using rchk? #3618~~
edit: looks like we do have an rchk ci job, why didn't it catch this? isn't rchk supposed to check the PROTECT etc?

MichaelChirico · 2024-09-07T15:11:24Z

rchk is a static analyzer, it can't catch everything. possibly it doesn't know how to handle PROTECT_WITH_INDEX.

* More descriptive variable names for fcase counting variables * Fix fcase() segfault (#6452)

tdhock · 2024-09-08T13:06:35Z

@kalibera you are the main author of rchk right?
This example may be interesting for you to investigate, for improving rchk?

kalibera · 2024-09-09T07:49:06Z

@kalibera you are the main author of rchk right? This example may be interesting for you to investigate, for improving rchk?

Unfortunately it is not feasible examples like this would be covered. The main challenge for rchk here is, in rchk terminology, that something gets implicitly protected, but later implicitly unprotected. rchk gets the first part, but isn't able to collect enough information to track the second part (value0 is implicitly protected by getting a value from protected thens, but later thens loses its protection in an additional iteration of the loop). It is not realistic for cases like this to be handled and in principle one should not assume that a lack of report from rchk means that the code is correct (or even lacks some kind of bugs) - it's just a bug finding tool based on static analysis, not a formal verification tool.

A higher-level explanation could be that rchk is modelling which "variables" are protected, but it is an approximation (often also of how we think about the code, but still an approximation) - in fact R objects on the heap are protected or not. For this abstraction to work better, one should not copy pointers to SEXPs. When that is still needed, one needs to be careful and perhaps could add a comment to the code about why it does not break protection.

Otherwise I confirm the reason of the bug in the code discussed above: value0 holds to an object initially protected via thens, but then no longer protected.

MichaelChirico mentioned this pull request Sep 2, 2024

Fix fcase() segfault (#6452) #6451

Merged

MichaelChirico marked this pull request as ready for review September 2, 2024 06:02

MichaelChirico force-pushed the 09-02-more_descriptive_variable_names_for_fcase_counting_variables branch from 84e0932 to 49bc983 Compare September 2, 2024 06:16

MichaelChirico force-pushed the 09-02-fix_fcase_segfault branch from 4e5a82d to 25896d2 Compare September 2, 2024 06:16

MichaelChirico force-pushed the 09-02-fix_fcase_segfault branch from 25896d2 to 3cb9108 Compare September 2, 2024 06:27

Fix fcase() segfault

a563ada

MichaelChirico force-pushed the 09-02-fix_fcase_segfault branch from 3cb9108 to a563ada Compare September 2, 2024 17:02

MichaelChirico requested a review from tdhock September 7, 2024 05:16

tdhock approved these changes Sep 7, 2024

View reviewed changes

MichaelChirico merged commit 9e845c4 into 09-02-more_descriptive_variable_names_for_fcase_counting_variables Sep 7, 2024
4 of 5 checks passed

MichaelChirico deleted the 09-02-fix_fcase_segfault branch September 7, 2024 15:11

MichaelChirico added a commit that referenced this pull request Sep 7, 2024

Fix fcase() segfault (#6452) (#6451)

b1c6bf3

* More descriptive variable names for fcase counting variables * Fix fcase() segfault (#6452)

MichaelChirico added a commit that referenced this pull request Sep 7, 2024

Fix fcase() segfault (#6452) (#6451)

65dd334

* More descriptive variable names for fcase counting variables * Fix fcase() segfault (#6452)

MichaelChirico mentioned this pull request Sep 7, 2024

intermittent issue w/ fcase in 1.16 #6448

Closed

MichaelChirico mentioned this pull request Sep 28, 2024

Just gc() on ancient R where gc(full=TRUE) is not available #6546

Merged

tanho63 mentioned this pull request Oct 7, 2024

Release data.table version 1.16.2 #6444

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix fcase() segfault #6452

Fix fcase() segfault #6452

MichaelChirico commented Sep 2, 2024 •

edited

Loading

MichaelChirico commented Sep 2, 2024 •

edited

Loading

github-actions bot commented Sep 2, 2024 •

edited

Loading

MichaelChirico commented Sep 2, 2024

HughParsonage commented Sep 2, 2024

MichaelChirico commented Sep 2, 2024 •

edited

Loading

aitap commented Sep 2, 2024 •

edited

Loading

MichaelChirico commented Sep 2, 2024

MichaelChirico commented Sep 2, 2024

aitap commented Sep 3, 2024

tdhock left a comment •

edited

Loading

MichaelChirico commented Sep 7, 2024

tdhock commented Sep 8, 2024

kalibera commented Sep 9, 2024

Fix fcase() segfault #6452

Fix fcase() segfault #6452

Conversation

MichaelChirico commented Sep 2, 2024 • edited Loading

MichaelChirico commented Sep 2, 2024 • edited Loading

github-actions bot commented Sep 2, 2024 • edited Loading

MichaelChirico commented Sep 2, 2024

HughParsonage commented Sep 2, 2024

MichaelChirico commented Sep 2, 2024 • edited Loading

aitap commented Sep 2, 2024 • edited Loading

MichaelChirico commented Sep 2, 2024

MichaelChirico commented Sep 2, 2024

aitap commented Sep 3, 2024

tdhock left a comment • edited Loading

Choose a reason for hiding this comment

MichaelChirico commented Sep 7, 2024

tdhock commented Sep 8, 2024

kalibera commented Sep 9, 2024

MichaelChirico commented Sep 2, 2024 •

edited

Loading

MichaelChirico commented Sep 2, 2024 •

edited

Loading

github-actions bot commented Sep 2, 2024 •

edited

Loading

MichaelChirico commented Sep 2, 2024 •

edited

Loading

aitap commented Sep 2, 2024 •

edited

Loading

tdhock left a comment •

edited

Loading