Thanks for pointing to the earlier related discussion. I know what is used as the
column name. The issue remains: the result is confusing in the case of an unequal join. It goes
against the convention used in SQL and against common-sense intuition.
One has to twist the brain to avoid being confused.
It is great to have unequal joins in data.table.
However, when I started to test the functionality, I found the following issue:
library(data.table)
DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))
DT[X, .( x,foo,v,i.v,x.v, x.y,y), on=.(x, y<=foo)]
The result includes all rows from the second table (X here).
In general, the default naming convention for the columns in the result is:
names without a prefix refer to columns and values from the first table in the join. In the above example, that is the table DT.
When we pay attention to the values of v and y, we see the following inconsistency.
Namely, when the unequal join columns have different names,
e.g. y with foo, as above, the result uses the name y from the first table
but actually takes the values of foo, which is from the second table (X in the above example).
This is rather confusing. Is this a bug or the intention?
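To make the origin of each value visible, here is a minimal sketch that labels the columns explicitly with the x. and i. prefixes already used above. The output names DT.y and X.foo are only illustrative choices for this sketch, not anything data.table produces by itself.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# DT.y and X.foo are illustrative output names (assumptions for this sketch).
# x.y asks for y as stored in DT, i.foo asks for foo as stored in X,
# and the bare y is whatever the join reports under the name y.
res <- DT[X, .(x, DT.y = x.y, X.foo = i.foo, y), on = .(x, y <= foo)]
print(res)

# If the behaviour described above holds, this prints TRUE:
# the bare y repeats X's foo rather than DT's y.
print(identical(res$y, res$X.foo))
```

With explicit names, the join condition can be checked directly on the result: every DT.y should be less than or equal to the X.foo in the same row.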
When the unequal join is on two columns with the same name, it gets even more confusing.
A second example:
DT[X, .(x,y,v,foo,DT.v=x.v, i.v), on=.(x, v>=v)]
As noted previously, all columns without a
prefix should have the values from the first table!
For the columns that are included in the unequal join,
this does not apply! The name v is from the first table,
but the value is from the second table!
This is very confusing.
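The same check can be sketched for the case where both tables call the join column v. Again, DT.v and X.v are just illustrative output names assumed for this sketch.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# DT.v and X.v are illustrative output names (assumptions for this sketch).
# x.v is v as stored in DT, i.v is v as stored in X,
# and the bare v is what the join reports under the name v.
res <- DT[X, .(x, DT.v = x.v, X.v = i.v, v), on = .(x, v >= v)]
print(res)

# If the behaviour described above holds, this prints TRUE:
# the bare v carries X's values, not DT's.
print(identical(res$v, res$X.v))
```

Rows of X without any match in DT show NA in DT.v, which is another hint that only the prefixed column reflects what is stored in DT.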
As a consequence, we can get completely confused by the following syntax,
where we do not explicitly specify which table the v column comes from:
DT[X, on=.(x, v>=v)]
Look at the v column. Is it the value from the second table or the first?
DT[X, on=.(x, v>=v), sum(y)*foo, by=.EACHI]
Which y is being summed?
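X has no y column, so the y being summed can only come from DT. One hedged way to confirm this is to recompute the value for one row of X without any join and compare. The names total, by_join and by_hand are assumptions made for this sketch.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# total is an illustrative output name (an assumption for this sketch).
by_join <- DT[X, on = .(x, v >= v), .(total = sum(y) * foo), by = .EACHI]
print(by_join)

# Cross-check for the first row of X, computed without a join:
# take DT's rows with the same x and with DT's v >= X's v,
# sum DT's y over those rows, then multiply by X's foo.
k <- 1L
rows <- DT$x == X$x[k] & DT$v >= X$v[k]
by_hand <- sum(DT$y[rows]) * X$foo[k]
print(by_hand)
```

If the first total in by_join equals by_hand, the sum runs over DT's y for the rows matched by that row of X.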