Submitted by: Stefan Fritsch; Assigned to: Arun ; R-Forge link
Hi,
I couldn't find a bug report for the general problem of matching character vectors with different encodings, so I thought I might open one.
Technically this doesn't have to be a bug (as you're comparing different vectors), but encoding is otherwise handled transparently in R. The user gets no indication of this problem whatsoever, and it often leads to massive and almost unnoticeable errors.
Imho there should be at least a warning. The code for reproduction is below.
Thanks for your time. =)
Code for reproduction
Repository/R-Forge/Revision: 1046
library(data.table)
a<-c("a","ä","ß","z")
# In my case the encoding is latin1, so
# I convert au to UTF-8;
# if you're on Linux you probably need to
# do it the other way around (use the commented-out line instead).
Encoding(a)
au<-iconv(a,"latin1","UTF8")
#au<-iconv(a,"UTF8","latin1")
dt<-data.table(a,b=1:4)
df<-data.frame(a,b=1:4)
rownames(df)<-df$a
a==au
df[au,]
setkey(dt,a)
dt[au]
merge(df,data.frame(a=au),by="a")
merge(dt,data.table(a=au),by="a")
match(a,au)
chmatch(a,au)
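As a possible stop-gap until a warning exists, here is a minimal base R sketch of the two ideas in this report: detecting that two key vectors carry different encoding marks, and normalising both to UTF-8 before matching. The helper `warn_mixed_encodings()` is hypothetical (it is not part of R or data.table), and the strings are built with `\u` escapes so the result does not depend on the encoding of the script file.

```r
# Sketch only: base R, no data.table required.
# Escapes keep the script independent of the source file's own encoding.
a  <- c("a", "\u00e4", "\u00df", "z")   # non-ASCII elements are marked UTF-8
au <- iconv(a, "UTF-8", "latin1")       # same text, stored as latin1 bytes

# Hypothetical helper (not part of R or data.table): warn when the key
# vectors carry more than one declared encoding mark between them.
# Strings marked "unknown" (ASCII or native encoding) are ignored.
warn_mixed_encodings <- function(x, y) {
  marks <- setdiff(unique(c(Encoding(x), Encoding(y))), "unknown")
  mixed <- length(marks) > 1L
  if (mixed) {
    warning("key vectors mix encodings: ", paste(marks, collapse = ", "))
  }
  invisible(mixed)
}

# Suppressed here only so the script runs quietly; drop the wrapper to see it.
mixed <- suppressWarnings(warn_mixed_encodings(a, au))

# Normalise both sides to UTF-8 before comparing or matching.
a_norm  <- enc2utf8(a)
au_norm <- enc2utf8(au)

# After normalisation the underlying bytes agree element by element.
same_bytes <- identical(lapply(a_norm, charToRaw), lapply(au_norm, charToRaw))
idx <- match(a_norm, au_norm)
```

The same `enc2utf8()` normalisation could presumably be applied to key columns before `setkey()`, `merge()` or `chmatch()`, but that has not been verified against revision 1046. The check only sees declared marks, so non-ASCII strings in the native encoding that are marked "unknown" would slip past it.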