Skip to content

Commit

Permalink
Closes #1010. Retain keys only when sure on rolling joins.
Browse files Browse the repository at this point in the history
  • Loading branch information
arunsrinivasan committed Jan 15, 2015
1 parent 31cf650 commit 1584248
Show file tree
Hide file tree
Showing 3 changed files with 15 additions and 4 deletions.
4 changes: 2 additions & 2 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1066,7 +1066,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
keylen = which.first(!key(x) %chin% ansvars)-1L
if (is.na(keylen)) keylen = length(key(x))
if (keylen > length(rightcols) && !.Call(CisOrderedSubset, irows, nrow(x))) keylen = length(rightcols)
if (keylen && ((is.data.table(i) && haskey(i)) || is.logical(i) || .Call(CisOrderedSubset, irows, nrow(x))))
if (keylen && ((is.data.table(i) && haskey(i)) || is.logical(i) || (.Call(CisOrderedSubset, irows, nrow(x)) && (!roll || length(irows) == 1L)))) # see #1010. don't set key when i has no key, but irows is ordered and roll != FALSE
setattr(ans,"sorted",head(key(x),keylen))
}
setattr(ans, "class", class(x)) # fix for #5296
Expand Down Expand Up @@ -1177,7 +1177,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
byval = i
bynames = head(key(x),length(leftcols))
allbyvars = NULL
bysameorder = haskey(i) || is.sorted(f__)
bysameorder = haskey(i) || (is.sorted(f__) && (!roll || length(f__) == 1L)) # Fix for #1010
## 'av' correct here ?? *** TO DO ***
xjisvars = intersect(av, names(x)[rightcols]) # no "x." for xvars.
# if 'get' is in 'av' use all cols in 'i', fix for bug #5443
Expand Down
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,8 @@

25. `foverlaps()` did not find overlapping intervals correctly *on numeric ranges* in a special case where both `start` and `end` intervals had *0.0*. This is now fixed. Thanks to @tdhock for the reproducible example. Closes [#1006](https://github.com/Rdatatable/data.table/issues/1006) partly.

26. When performing rolling joins, keys are set only when we can be absolutely sure. Closes [#1010](https://github.com/Rdatatable/data.table/issues/1010), which explains cases where keys should not be retained.

#### NOTES

1. Clearer explanation of what `duplicated()` does (borrowed from base). Thanks to @matthieugomez for pointing out. Closes [#872](https://github.com/Rdatatable/data.table/issues/872).
Expand Down
13 changes: 11 additions & 2 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -387,7 +387,7 @@ test(147, DT[,MySum=sum(v)], error="unused argument") # user meant DT[,list(MyS

dt = data.table(a=c(1L,4L,5L), b=1:3, key="a")
test(148, dt[CJ(2:3),roll=TRUE], data.table(a=c(2L,3L),b=c(1L,1L),key="a"))
test(149, dt[J(2:3),roll=TRUE], data.table(a=c(2L,3L),b=c(1L,1L),key="a")) # in future this will detect the subset is ordered and retain the key
test(149, dt[J(2:3),roll=TRUE], data.table(a=c(2L,3L),b=c(1L,1L))) # in future this will detect the subset is ordered and retain the key

# 150:158 test out of order factor levels in key columns (now allowed from v1.8.0)
dt = data.table(x=factor(c("c","b","a"),levels=c("b","a","c")),y=1:3)
Expand Down Expand Up @@ -4651,7 +4651,7 @@ test(1317.1, dt1[dt2, roll=TRUE, nomatch=0L], data.table(x=c(7L,33L,33L), y=as.D
# also test where 'i' is not sorted.
set.seed(1L)
dt2 <- dt2[sample(nrow(dt2))] # key should be gone
test(1317.2, dt1[dt2, roll=TRUE, nomatch=0L], data.table(x=c(7L,33L,33L), y=as.Date(c("2013-07-31", "2013-07-31", "2013-07-31")), z=c(dt1$z[1:2], dt1$z[2]), w=c(dt2$w[1], dt2$w[c(2,6)]), key="x,y"))
test(1317.2, dt1[dt2, roll=TRUE, nomatch=0L], data.table(x=c(7L,33L,33L), y=as.Date(c("2013-07-31", "2013-07-31", "2013-07-31")), z=c(dt1$z[1:2], dt1$z[2]), w=c(dt2$w[1], dt2$w[c(2,6)])))

# bug fix for #472 : "parse" in j
set.seed(100)
Expand Down Expand Up @@ -5771,6 +5771,15 @@ ans=cbind(dt.ref[, .(start,end)], dt.query[2:3, .(q1,q2)])
setkey(ans, q1,q2)
test(1468, foverlaps(dt.query, dt.ref, nomatch=0L), ans)

# Fix for #1010 (discovered while fixing #1007). Don't retain key if i had no key, but irows is sorted, and roll != FALSE... See example in #1010.
require(data.table) ## 1.9.5
DT = data.table(x=c(-5,5), y=1:2, key="x")
test(1469.1, key(DT[J(c(2,0)), roll=TRUE]), NULL)
test(1469.2, key(DT[J(c(2,0)), .(x,y), roll=TRUE]), NULL)
test(1469.3, key(DT[J(c(2,0)), y, roll=TRUE, by=.EACHI]), NULL)
test(1469.4, key(DT[J(c(2,0))]), NULL)
test(1469.5, key(DT[SJ(c(2,0)), roll=TRUE]), "x")

##########################


Expand Down

0 comments on commit 1584248

Please sign in to comment.