Respect package encoding when parsing source code #605
Comments
It seems that non-ASCII characters are allowed in tests; at least R CMD
check does not warn about them, so it would indeed make sense to use the
package encoding.
…On 24 Jun 2017 11:58, "Jeroen Ooms" ***@***.***> wrote:
There is some support for UTF-8 (#550); however, by default
test_check() sources all test files as native. I think that if DESCRIPTION
contains Encoding: UTF-8, all test files should be sourced as UTF-8.
|
We should pull across the code from r-lib/roxygen2#649. Unless you feel strongly, I think it's better to simply read in everything as UTF-8, and warn the user that testthat doesn't support other encodings. |
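A minimal sketch of the "read everything as UTF-8 and warn" idea, under stated assumptions; it is not testthat's or roxygen2's actual code. The helper names `package_encoding()` and `read_lines_utf8()` are hypothetical. It relies on the `encoding` argument of `readLines()`, which only marks the strings as UTF-8 and does not re-encode them into the native encoding.

```r
# Hypothetical helper: return the Encoding field of a DESCRIPTION file,
# or NA if the field is absent.
package_encoding <- function(desc_path) {
  desc <- read.dcf(desc_path, fields = "Encoding")
  unname(desc[1, "Encoding"])
}

# Hypothetical helper: read a source file as UTF-8, warning when the
# package declares some other encoding.
read_lines_utf8 <- function(path, desc_path) {
  enc <- package_encoding(desc_path)
  if (!is.na(enc) && !identical(toupper(enc), "UTF-8")) {
    warning(
      "Package declares encoding '", enc,
      "'; reading '", path, "' as UTF-8 anyway."
    )
  }
  # `encoding` marks the strings as UTF-8; the bytes are left untouched.
  readLines(path, encoding = "UTF-8", warn = FALSE)
}
```

Marking rather than converting is the point of the design: the bytes on disk stay as they are, so nothing is lost in a non-UTF-8 locale at the reading step.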
I don't feel strongly, though if the package declares |
@hadley There are still problems with this. Reading the files in UTF-8 is one thing, but you also need to supply the encoding to Line 27 in 1faa32f
And maybe in This said, I am not convinced that defaulting to UTF-8 is best. UTF-8 is still a bit painful on Windows, as I have just experienced. It is also not the default, of course. |
The old testthat actually (by default) keeps UTF-8 files as UTF-8, because it marks them as "unknown" and then The new behavior is worse, because you end up with strings recoded into the native encoding, typically latin1 on Windows, and this conversion loses information, so it is not even possible to convert it back to UTF-8. Can I fix the |
Yes please! |
Not so easy. :(
## Need to run this in a latin1 locale
old_locale <- Sys.getlocale("LC_CTYPE")
Sys.setlocale("LC_CTYPE", "en_US.ISO8859-1")
## UTF-8 string, quoted, so we can parse it
lines <- as.raw(c(0x22, 0xc3, 0xa1, 0x72, 0x76, 0xc3, 0xad, 0x7a, 0x74,
0xc5, 0xb1, 0x72, 0xc5, 0x91, 0x20, 0x74, 0xc3, 0xbc, 0x6b, 0xc3,
0xb6, 0x72, 0x66, 0xc3, 0xba, 0x72, 0xc3, 0xb3, 0x67, 0xc3, 0xa9,
0x70, 0x22))
lines <- rawToChar(lines)
Encoding(lines) <- "UTF-8"
stringi::stri_enc_isutf8(lines)
#> [1] TRUE
## Parse it, keep it UTF-8
expr <- parse(text = lines, encoding = "UTF-8")[[1]]
Encoding(expr)
#> [1] "UTF-8"
## Ooops
stringi::stri_enc_isutf8(expr)
#> [1] FALSE
## It was recoded into latin1 (the native encoding) :(
stringi::stri_enc_isutf8(iconv(expr, "latin1", "UTF-8"))
#> [1] TRUE
## With a text connection it is OK. Phew!
expr2 <- parse(textConnection(lines, encoding = "UTF-8"), encoding = "UTF-8")[[1]]
stringi::stri_enc_isutf8(expr2)
#> [1] TRUE
Sys.setlocale("LC_CTYPE", old_locale) |
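The text-connection workaround from the example above could be wrapped in a small helper. This is only a sketch: `parse_utf8()` is a hypothetical name, and whether it preserves UTF-8 in every locale is an assumption based on the latin1 result reported above, not something verified across platforms.

```r
# Hypothetical helper: parse source lines through a UTF-8 text connection,
# since parse(text = ...) was seen above to recode strings into the
# native encoding.
parse_utf8 <- function(lines) {
  # Make sure the input is UTF-8 before handing it to the connection.
  lines <- enc2utf8(lines)
  con <- textConnection(lines, encoding = "UTF-8")
  on.exit(close(con))
  parse(con, encoding = "UTF-8", keep.source = FALSE)
}

# Usage: a quoted UTF-8 string built from raw bytes, as in the example above.
src <- rawToChar(as.raw(c(0x22, 0xc3, 0xa1, 0x22)))
Encoding(src) <- "UTF-8"
expr <- parse_utf8(src)[[1]]
# Inspect the bytes to see whether the UTF-8 encoding survived parsing.
charToRaw(expr)
```

Checking the raw bytes with `charToRaw()` avoids relying on the declared `Encoding()` mark, which the example above shows can say "UTF-8" even after the content has been recoded.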
There is some support for UTF-8; however, by default test_check() sources all test files as native. I think that if DESCRIPTION contains Encoding: UTF-8, all test files should be sourced as UTF-8.