Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix subassignment to an all-NA vector #903

Merged
merged 13 commits into from
Jul 21, 2021
Merged

Fix subassignment to an all-NA vector #903

merged 13 commits into from
Jul 21, 2021

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Jul 17, 2021

It is going to take quite some time until we fix in vctrs. The code here is reasonably lean and can be removed once vec_slice<-() has an option that implements this.

Closes #773, closes #868.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

@lionel-
Copy link
Member

lionel- commented Jul 19, 2021

Do we really want to make tibble behave differently from vctrs? It is not clear that we can make vctrs behave like this without extensive changes as the vctrs C code (recursive through data frames) generally assumes that assignment does not change the type of underlying C arrays. Given the small activity of the linked issues, this doesn't seem like an urgent fix. I would wait until we figure this out in vctrs to avoid having inconsistent semantics between tibble, dplyr, funs.

@lionel-
Copy link
Member

lionel- commented Jul 19, 2021

I can look into fixing the vctrs issue end of August. Would that be a good timeframe @krlmlr?

@krlmlr
Copy link
Member Author

krlmlr commented Jul 19, 2021

I no longer agree that this is behavior that is "different from vctrs". It was an undesired breaking change when we implemented our own version of subassignment. This PR undoes this breaking change.

The corresponding issue has got a few "thumbs up"; the bug doesn't go unnoticed. It is surprising, and something that I think should "just work".

I see vctrs as an infrastructure provider to simplify certain parts of the implementation; if vctrs is not ready, we can work around or prototype here. Given the number of issues in the vctrs repo, maybe there are other things that are even more important. The workaround here is small and self-contained, and could perhaps be ported to vctrs. Perhaps the goals of tibble's API are also slightly different from vctrs; we could conceivably maintain the "same type no matter what" guarantee in vctrs and relax in tibble.

@lionel-
Copy link
Member

lionel- commented Jul 20, 2021

It was an undesired breaking change when we implemented our own version of subassignment. This PR undoes this breaking change.

This breaking change aligned the semantics with vctrs and packages using vctrs. The PR undoes the alignment.

The corresponding issue has got a few "thumbs up"; the bug doesn't go unnoticed.

There's a few reactions in a 2 years old issue.

It is surprising, and something that I think should "just work".

I think so too, but there's value in keeping semantics aligned.

Could you take a look at #903 (comment)?

@krlmlr
Copy link
Member Author

krlmlr commented Jul 21, 2021

I saw that comment. I'd prefer to keep the work decoupled as much as possible. We have tests in tibble, including tests against dev versions. If vctrs changes in an unpredicted and incompatible way, we get an early warning and can discuss.

We seem to agree that NA as a placeholder for a missing value is deeply idiomatic to R. Does this mean that the semantics in vctrs are incomplete, and the alignment has been carried out a bit too early? The tibble package has been around for a few years now, vctrs still is in v0.x.

What would the fix in vctrs look like? Would it be very different from this PR? I think we can gain a lot by merging here before we fix in vctrs.

I just saw the following property of vec_assign() which is pretty cool:

vctrs::vec_assign(tibble::tibble(a = 1:3, b = 4:6), 2, tibble::tibble(a = 4, b = 8))
#> # A tibble: 3 x 2
#>       a     b
#>   <int> <int>
#> 1     1     4
#> 2     4     8
#> 3     3     6

Created on 2021-07-21 by the reprex package (v2.0.0.9000)

This is not how tibble uses vec_assign() today -- we iterate over columns manually and do a vec_assign() for individual columns. This works for atomic columns and matrices, the tests cover that. Subassignment for packed columns, OTOH, never worked properly for data frames.

tbl <- data.frame(a = 1:3)
tbl$x <- data.frame(b = rep(NA_real_, 3), c = rep(NA_character_, 3))
tbl[2, "x"]$b <- 2
#> Warning in `[<-.data.frame`(`*tmp*`, 2, "x", value = structure(list(b = 2, :
#> provided 2 variables to replace 1 variables
tbl
#>   a x.b  x.c
#> 1 1  NA <NA>
#> 2 2   2    2
#> 3 3  NA <NA>

Created on 2021-07-21 by the reprex package (v2.0.0.9000)

It would be pretty cool if we could do that with the help of vctrs -- when it's ready there:

library(tibble)

tbl <- tibble(a = 1:3, x = tibble(b = NA_real_, c = NA_character_))
tbl[2, "x"]$b <- 2
#> Error: Can't recycle input of size 2 to size 1.

Created on 2021-07-21 by the reprex package (v2.0.0.9000)

Until then I'm fine with a solution that covers the vast majority of the real-world use cases and idioms.

@krlmlr krlmlr merged commit de2abba into master Jul 21, 2021
@krlmlr krlmlr deleted the b-773-na-subassign branch July 21, 2021 03:33
@krlmlr
Copy link
Member Author

krlmlr commented Jul 21, 2021

Happy to collaborate on a proper solution in vctrs any time that works for you.

@lionel-
Copy link
Member

lionel- commented Jul 21, 2021

Happy to collaborate on a proper solution in vctrs any time that works for you.

Thank you. I'm 70% sure we can find a solution in vctrs. Let's start by discussing what should happen in the other 30%. Wouldn't it be unfortunate for tibble to have different assignment semantics than the rest of the tidyverse? This was my concern and reason for holding this PR until we have time to take a deep look.

@krlmlr
Copy link
Member Author

krlmlr commented Jul 23, 2021

We may never be able to achieve perfect consistency, or only at too high a cost. I think this compromise gives us room to breathe for now. In the unfortunate (and less probable) case we still can make it stricter later if it's worthwhile. At the moment it seems that idiomatic behavior is more desirable.

tibble 3.1.3 is on CRAN now. I'm sure it will play out one way or another.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants