Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assignment fails when conversion requires type change (R 4.0.0) #773

Closed
pecanka opened this issue May 18, 2020 · 8 comments · Fixed by #903
Closed

Assignment fails when conversion requires type change (R 4.0.0) #773

pecanka opened this issue May 18, 2020 · 8 comments · Fixed by #903
Labels
bug an unexpected problem or unintended behavior vctrs ↗️ Requires vctrs package
Milestone

Comments

@pecanka
Copy link

pecanka commented May 18, 2020

Starting from R 4.0.0 (tested only on x64), an assignment into a column of a tibble filtered by rows fails when type conversion is required for instance from type logical to type integer, as in:

x = tibble(a=NA)
x[1,]$a = 2

The following error code is returned:

Error during wrapup: Assigned data `<tibble>` must be compatible with existing data.
i Error occurred for column `a`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1.
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

Without the filtering of rows there is no issue, the following works fine:

x = tibble(a=NA)
x$a = 2

In earlier versions of (e.g. R 3.6.2 x64) the code runs fine. Is this a bug?

@pecanka pecanka changed the title Assignment fails when conversion requires type change Assignment fails when conversion requires type change (R 4.0.0) May 18, 2020
@hadley
Copy link
Member

hadley commented Jun 5, 2020

In general, this is a deliberate change, so that you don't accidentally change the type of your columns. However, this specific case seems to be a bug, since generally we treat vectors consisting only of missing values as compatible with any other vector.

library(tibble)
df <- tibble(a = NA)
df[1, "a"] <-  2
#> Error: Assigned data `2` must be compatible with existing data.
#> ℹ Error occurred for column `a`.
#> x Can't convert from <double> to <logical> due to loss of precision.
#> * Locations: 1.

df$a[[1]] <-  2
df
#> # A tibble: 1 x 1
#>       a
#>   <dbl>
#> 1     2

Created on 2020-06-05 by the reprex package (v0.3.0)

@hadley hadley added bug an unexpected problem or unintended behavior vctrs ↗️ Requires vctrs package labels Jun 5, 2020
@krlmlr krlmlr added this to the 3.0.2 milestone Jul 5, 2020
@krlmlr
Copy link
Member

krlmlr commented Jul 5, 2020

Upstream: r-lib/vctrs#1172.

Not sure this is a bug. You can work around by initializing with a typed NA or by using a NA subset:

library(tibble)

# Fails
x <- tibble(id = 1:3, a = NA)
x[1:3, ]$a <- 2
#> Error: Assigned data `<tibble>` must be compatible with existing data.
#> ℹ Error occurred for column `a`.
#> x Can't convert from <double> to <logical> due to loss of precision.
#> * Locations: 1, 2, 3.

# Explicit type
x <- tibble(id = 1:3, a = NA_real_)
x[1:3, ]$a <- 2
x
#> # A tibble: 3 x 2
#>      id     a
#>   <int> <dbl>
#> 1     1     2
#> 2     2     2
#> 3     3     2

# If you have a boilerplate value, or for more complex types:
a <- 2
x <- tibble(id = 1:3, a = a[NA])
x
#> # A tibble: 3 x 2
#>      id     a
#>   <int> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
x[1:3, ]$a <- 2
x
#> # A tibble: 3 x 2
#>      id     a
#>   <int> <dbl>
#> 1     1     2
#> 2     2     2
#> 3     3     2

Created on 2020-07-05 by the reprex package (v0.3.0)

@krlmlr krlmlr removed this from the 3.0.2 milestone Jul 5, 2020
@Lauler
Copy link

Lauler commented Jul 17, 2020

In my opinion it is a fairly common habit among many R programmers -- whether you agree with it or not -- to use an untyped NA column as placeholder. You ought to consider whether an all NA column maybe should be an exception and allow type conversion in this case.

The case for it:

  1. The to-be-converted NA-placeholder pattern is often intentional from the user as opposed to the other type conversion mistakes you are protecting users from.
  2. You are going to cause a lot of frustration among beginner and intermediate R users who don't necessarily know about the existence of typed NA's.

@krlmlr
Copy link
Member

krlmlr commented Feb 23, 2021

I'm not sure about the consequences. We can't seem to solve in r-lib/vctrs#1172. How do we avoid checking a vector for NA-ness with every assignment? This still requires linear time:

all_na <- function(x) {
  identical(x, rep(NA, length(x)))
}

bench::mark(
  all_na(rep(NA, 1000)),
  all_na(rep(NA, 10000)),
  all_na(rep(NA, 100000)),
  all_na(rep(NA, 1000000))
)
#> # A tibble: 4 x 6
#>   expression                  min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all_na(rep(NA, 1000))    1.96µs    4.4µs   206965.    7.91KB     20.7
#> 2 all_na(rep(NA, 10000))  14.25µs  20.39µs    40981.   99.23KB     57.5
#> 3 all_na(rep(NA, 1e+05))  180.4µs 188.85µs     4882.  781.34KB     78.8
#> 4 all_na(rep(NA, 1e+06))    2.2ms   2.44ms      348.    7.63MB     61.1

Created on 2021-02-23 by the reprex package (v1.0.0)

I wonder if we should convert NA columns to "vctrs_unspecified" and test for presence of this class. Hypothetical reprex:

library(tibble)

data <- tibble(a = 1:3, b = NA)
data
#> # A tibble: 3 x 2
#>       a     b
#>   <int> <???>
#> 1     1     .
#> 2     2     .
#> 3     3     .
class(data$b)
#> [1] "vctrs_unspecified"

Created on 2021-02-23 by the reprex package (v1.0.0)

Maybe we could (also) use a solution based on ALTREP?

CC @lionel-.

@krlmlr krlmlr added this to the 3.1.1 milestone Feb 23, 2021
@lionel-
Copy link
Member

lionel- commented Feb 23, 2021

How do we avoid checking a vector for NA-ness with every assignment? This still requires linear time:

This benchmark is showing the worst case. In general the unspecified check returns very quickly.

The problem of changing the type of [<- is harder to solve, especially at the C level. I think it's worth thinking about it again for the next non-patch release of vctrs (we'll probably start work on it in a couple of months).

@krlmlr
Copy link
Member

krlmlr commented Feb 23, 2021

We don't necessarily need to change this in vctrs, we can special-case in tibble for now I guess.

I agree that the check usually will be fast enough, except for corner cases when a user fills a logical column backwards. I really wish we had an "unspecified" sexptype.

@lionel-
Copy link
Member

lionel- commented Feb 23, 2021

I think we need a consistent approach so I wouldn't special-case in tibble before we take a general decision about this. vctrs and tibble should not behave differently.

@krlmlr krlmlr modified the milestones: 3.1.1, 3.1.2 Apr 16, 2021
@krlmlr krlmlr modified the milestones: 3.1.2, 3.1.3 Jul 17, 2021
krlmlr added a commit that referenced this issue Jul 21, 2021
- `tbl[row, col] <- rhs` treats an all-`NA` logical vector as a missing value both for existing data (#773) and for the right-hand side value (#868). This means that a column initialized with `NA` (of type `logical`) will change its type when a row is updated to a value of a different type.
@github-actions
Copy link
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior vctrs ↗️ Requires vctrs package
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants