strings dependency #216

Closed
elong0527 opened this issue Apr 14, 2024 · 3 comments
Labels
question Further information is requested

Comments

@elong0527
Collaborator

elong0527 commented Apr 14, 2024

In the convert function, I need to replace strings based on a mapping rule.

Currently, I rely on stringi in the convert function. Is there an efficient way to do this in base R?

text[index] <- stringi::stri_replace_all_fixed(text[index], names(char_latex), char_latex,
  vectorize_all = FALSE, opts_fixed = list(case_insensitive = FALSE)
)

Here is an example of the mappings I need to apply to a vector of strings, replacing each pattern on the left with the string on the right:

  char_rtf <- c(
    "^" = "\\super ",
    "_" = "\\sub ",
    ">=" = "\\geq ",
    "<=" = "\\leq "
  )
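For reference, a minimal base-R sketch of this mapping, assuming the rules are stored as a named character vector like char_rtf above (the helper name replace_fixed is hypothetical, not part of r2rtf). It applies each pair in order with gsub(fixed = TRUE), which mirrors the sequential behaviour of stringi::stri_replace_all_fixed(..., vectorize_all = FALSE):

```r
# Hypothetical helper (not part of r2rtf): apply each pattern -> replacement
# pair in turn, treating both sides as literal strings via fixed = TRUE.
replace_fixed <- function(text, mapping) {
  for (i in seq_along(mapping)) {
    text <- gsub(names(mapping)[i], mapping[[i]], text, fixed = TRUE)
  }
  text
}

char_rtf <- c(
  "^" = "\\super ",
  "_" = "\\sub ",
  ">=" = "\\geq ",
  "<=" = "\\leq "
)

x <- c("x^2", "a_i", "age >= 65", "dose <= 10")
replace_fixed(x, char_rtf)
```

Because the pairs are applied one after another, order matters if a replacement could itself contain a later pattern; with the four rules above it does not. seq_along() is used instead of 1:length() so an empty mapping simply returns the input.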

cc: @nanxstats @yihui

@elong0527 elong0527 added the question Further information is requested label Apr 14, 2024
@yihui
Collaborator

yihui commented Apr 14, 2024

I think what you are doing currently is quite reasonable:

r2rtf/R/conversion.R

Lines 106 to 114 in c55513d

if (load_stringi) {
  text[index] <- stringi::stri_replace_all_fixed(text[index], names(char_latex), char_latex,
    vectorize_all = FALSE, opts_fixed = list(case_insensitive = FALSE)
  )
} else {
  for (i in 1:length(char_latex)) {
    text[index] <- gsub(names(char_latex[i]), char_latex[i], text[index], fixed = TRUE)
  }
}

I wouldn't worry too much about the performance of gsub() (with a for-loop). You can do some benchmarking of convert(load_stringi = TRUE) vs. convert(load_stringi = FALSE) to get a clearer idea. I guess the latter is very likely to be slower, but if in practice the difference is 10ms (gsub()) vs. 1ms (stringi), I wouldn't bother thinking about it at all and would just use gsub(). Of course, the time depends on the size of the character vector text. I guess the time difference won't be noticeable if length(text) is relatively small (e.g., less than 1000).
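A rough benchmarking sketch along these lines, using base R's system.time() on synthetic strings. The data, its size, and the helper name replace_gsub are made up for illustration and are not taken from r2rtf; stringi is only timed if it is installed:

```r
# Synthetic benchmark sketch: base-R gsub() loop vs. stringi on made-up data.
char_rtf <- c("^" = "\\super ", "_" = "\\sub ", ">=" = "\\geq ", "<=" = "\\leq ")

# Hypothetical helper mirroring the gsub() fallback branch in convert().
replace_gsub <- function(text, mapping) {
  for (i in seq_along(mapping)) {
    text <- gsub(names(mapping)[i], mapping[[i]], text, fixed = TRUE)
  }
  text
}

set.seed(1)
n <- 1e5  # roughly the "100k records" scale mentioned in this thread
text <- paste0("value_", sample(1:100, n, replace = TRUE), " >= x^2")

t_gsub <- system.time(out_gsub <- replace_gsub(text, char_rtf))
print(t_gsub)

if (requireNamespace("stringi", quietly = TRUE)) {
  t_stri <- system.time(
    out_stri <- stringi::stri_replace_all_fixed(
      text, names(char_rtf), char_rtf, vectorize_all = FALSE
    )
  )
  print(t_stri)
  # Both approaches apply the patterns sequentially, so the strings should match.
  stopifnot(identical(as.character(out_gsub), as.character(out_stri)))
}
```

Timings vary by machine and by how many strings actually contain the patterns, so it is worth rerunning this on real listing text before deciding.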

@elong0527
Collaborator Author

Thanks for the suggestion, I am closing the issue with your blessing.

I added it because of a clinical trial use case that requires saving all safety data as a listing in RTF format for the EU (called an ICH listing).

For a large trial, the listing can exceed 100k records and 10k pages in an RTF file.

@yihui
Collaborator

yihui commented Apr 15, 2024

For 100k records, the time difference may be noticeable, but I guess gsub() should take no longer than one second, which should be fine. Again, benchmarking will give you a clearer idea.

BTW, for gsub(), using perl = TRUE (with the default fixed = FALSE, since fixed = TRUE is incompatible with perl = TRUE) may give you a substantial speedup.
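A sketch of how the perl = TRUE suggestion might be applied (an assumption, not code from r2rtf; the helper name replace_perl is hypothetical). With fixed = FALSE the patterns become regular expressions, so "^" would otherwise act as a start-of-string anchor, and a backslash in the replacement is treated as an escape, so "\\super " would lose its backslash. Both sides are therefore escaped to keep the literal behaviour of fixed = TRUE:

```r
# Hypothetical helper: literal replacement using PCRE (perl = TRUE).
replace_perl <- function(text, mapping) {
  # \Q...\E makes PCRE match the pattern literally (e.g. "^" is not an anchor).
  # Limitation: a pattern that itself contains "\E" would need extra handling.
  patterns <- paste0("\\Q", names(mapping), "\\E")
  # Double each backslash so "\super " is inserted verbatim instead of having
  # its backslash consumed as an escape character in the replacement.
  replacements <- gsub("\\", "\\\\", unname(mapping), fixed = TRUE)
  for (i in seq_along(patterns)) {
    text <- gsub(patterns[i], replacements[i], text, perl = TRUE)
  }
  text
}

char_rtf <- c("^" = "\\super ", "_" = "\\sub ", ">=" = "\\geq ", "<=" = "\\leq ")
x <- c("x^2", "a_i", "age >= 65", "dose <= 10")
replace_perl(x, char_rtf)
```

Whether this is actually faster than the fixed = TRUE loop on real listing data is something the benchmark should confirm.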


2 participants