Using row index .I for joins as an argument to on= #5934

Open
MLopez-Ibanez opened this issue Feb 11, 2024 · 1 comment

Comments

@MLopez-Ibanez
Contributor

library(data.table)
dt1 = CJ(a=1:5, b=c("yes","no"))
# Key: <a, b>
# Index: <b>
#         a      b
#     <int> <char>
#  1:     1     no
#  2:     1    yes
#  3:     2     no
#  4:     2    yes
#  5:     3     no
#  6:     3    yes
#  7:     4     no
#  8:     4    yes
#  9:     5     no
# 10:     5    yes
dt2 = dt1[b == "yes", list(.RESULT = list(seq(0, a))), by=.I][,unlist(.RESULT), by=I]
#         I    V1
#     <int> <int>
#  1:     2     0
#  2:     2     1
#  3:     4     0
#  4:     4     1
#  5:     4     2
#  6:     6     0
#  7:     6     1
#  8:     6     2
#  9:     6     3
# 10:     8     0
# 11:     8     1
# 12:     8     2
# 13:     8     3
# 14:     8     4
# 15:    10     0
# 16:    10     1
# 17:    10     2
# 18:    10     3
# 19:    10     4
# 20:    10     5
#         I    V1
## This is what I want to do but it gives an error
# dt1 <- dt1[dt2, on=list(.I="I")]
# Error in colnamesInt(x, names(on), check_dups = FALSE) : 
#  argument specifying columns received non-existing column(s): cols[1]='.I'
## This is what I need to do instead, but it is much longer and more complicated (and probably slower)
tmp <- copy(dt1)
tmp <- tmp[, row_index:=.I][dt2, on = list(row_index=I)]
tmp[, row_index:=NULL]
dt1 <- tmp
rm(tmp)
dt1
# Key: <a, b>
#         a      b    V1
#     <int> <char> <int>
#  1:     1    yes     0
#  2:     1    yes     1
#  3:     2    yes     0
#  4:     2    yes     1
#  5:     2    yes     2
#  6:     3    yes     0
#  7:     3    yes     1
#  8:     3    yes     2
#  9:     3    yes     3
# 10:     4    yes     0
# 11:     4    yes     1
# 12:     4    yes     2
# 13:     4    yes     3
# 14:     4    yes     4
# 15:     5    yes     0
# 16:     5    yes     1
# 17:     5    yes     2
# 18:     5    yes     3
# 19:     5    yes     4
# 20:     5    yes     5
#         a      b    V1

To make this work, either issue #1494 probably needs to be fixed, or a different symbol such as .ROWI needs to be introduced that doesn't have .I's behavior.

(.ROWI or .ROWINDEX would be clearer than .I anyway.)

@trobx
trobx commented Mar 31, 2024

dt1 <- dt1[, I:=.I][dt2, on=.(I)][, I:=NULL]

The copy to tmp is superfluous (unless I'm missing something?).
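
For this particular example there may be an even shorter route that avoids the temporary column altogether. Because dt2$I already holds row numbers of dt1, the join can be sketched as plain row indexing followed by attaching V1. This is only a workaround under the assumption that the join key on the dt1 side is nothing but the row number; it is not a substitute for the requested on= support, and the name `res` is chosen here for illustration.

```r
library(data.table)

# Same setup as in the issue.
dt1 <- CJ(a = 1:5, b = c("yes", "no"))
dt2 <- dt1[b == "yes", list(.RESULT = list(seq(0, a))), by = .I][, unlist(.RESULT), by = I]

# dt2$I contains row numbers of dt1, so select those rows directly
# (repeated indices repeat the row) and attach V1 in the same order.
# dt1[dt2$I] is a new table, so := here does not modify dt1.
res <- dt1[dt2$I][, V1 := dt2$V1]

res
```

Unlike the `I := .I` variants, this leaves dt1 itself untouched, since no index column is ever added to it by reference.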
