Inconsistent and confusing naming convention for unequal join #2006

Guozhong-Gordon · 2017-01-30T10:31:24Z

It is great to have unequal join in data.table.
However, when I start to test the functionality. I found the following issue:

DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)

X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))

DT[X, .( x,foo,v,i.v,x.v, x.y,y), on=.(x, y<=foo)]

The result includes all rows from the second table (X here).
In general, the default naming conventions for the columns in the results are:
names without prefix refers to columns and values from the first table in the join. In the above example, from the table DT.
When we pay attention to the values of v and y, we will see the following inconsistency.
Namely, when the unequal join columns having different names,
eg, y with foo, as the above. It uses the name y from the first table
and actually take the value of "foo" which is from the second table, in the above example (X).
This is rather confusing. Is this a bug or the intention?

When the unequal join for two columns with the same name. It gets even more confusing.
A second example:
DT[X, .(x,y,v,foo,DT.v=x.v, i.v), on=.(x, v>=v)]
Just as noted previously, all columns without
prefix should have the values from the first table!
For those columns that are included in the unequal join
this does not apply! The name v is from the first table,
but the value is from the second table!
This is very confusing.

As a consequence of this, the following syntax
we can get completely confused with the following, when we
do not explicitly specify which tables the v columns comes from:
DT[X, on=.(x, v>=v)]
Look at the v column, Is it the value of the second table, or the first?
DT[X, on=.(x, v>=v), sum(y)*foo, by=.EACHI]
Which y is being summed?

jangorecki · 2017-03-05T12:59:36Z

@Guozhong-Gordon Columns on which you join are taken from i. so you need to take care of expected ones using x.. This behavior is discussed in #1615.

Guozhong-Gordon · 2017-03-05T18:54:16Z

Thanks for pointing to early related discussion. I know what is used as colum name. The issue remains, confusion in the case of unequal join. It is against the convention used in sql, also against common sense intuition. One has to twist the brain to avoid being confused.

…

On Sunday, March 5, 2017, Jan Gorecki ***@***.***> wrote: @Guozhong-Gordon <https://github.com/Guozhong-Gordon> Columns on which you join are taken from i. so you need to take care of expected ones using x.. This behavior is discussed in #1615 <#1615>. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2006 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AYQGzPRPON_4uXrT9i1fPegqOhGzkZSVks5rirG7gaJpZM4LxPfU> .

sritchie73 mentioned this issue Mar 26, 2018

SQL-like column return for non-equi and rolling joins #2706

Closed

10 tasks

mattdowle added this to the v1.10.6 milestone Apr 10, 2018

mattdowle modified the milestones: v1.11.0, v1.11.2 Apr 29, 2018

jangorecki modified the milestones: 1.12.0, 1.11.6 Jun 6, 2018

mattdowle modified the milestones: 1.11.6, 1.12.0 Sep 20, 2018

mattdowle linked a pull request Oct 3, 2018 that will close this issue

Both columns for rolling and non-equi joins #3093

Open

4 tasks

mattdowle modified the milestones: 1.12.0, 1.12.2 Jan 11, 2019

jangorecki added the non-equi joins rolling, overlapping, non-equi joins label Apr 6, 2020

MichaelChirico added the consistency label Oct 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent and confusing naming convention for unequal join #2006

Inconsistent and confusing naming convention for unequal join #2006

Guozhong-Gordon commented Jan 30, 2017

jangorecki commented Mar 5, 2017

Guozhong-Gordon commented Mar 5, 2017 via email

Inconsistent and confusing naming convention for unequal join #2006

Inconsistent and confusing naming convention for unequal join #2006

Comments

Guozhong-Gordon commented Jan 30, 2017

jangorecki commented Mar 5, 2017

Guozhong-Gordon commented Mar 5, 2017 via email