Experiment implementing po_extract() #243

Merged on Nov 9, 2021 (31 commits)
Commits
6cb76f6 Extract po_scan() (hadley, Nov 6, 2021)
a70b9bf Inform selectively about R/src messages (hadley, Nov 6, 2021)
69bdd72 Optionally exclude condition functions (hadley, Nov 6, 2021)
06e1197 Build in tr_ support + add "explicit" style (hadley, Nov 6, 2021)
d2fe235 Also scan for messagef() etc (hadley, Nov 6, 2021)
05425a4 Delete accidentally committed .pot (hadley, Nov 6, 2021)
04aca15 Add tr/tr_ to known translators (hadley, Nov 6, 2021)
410f83c Consistent function names (hadley, Nov 6, 2021)
da3ab28 Treat tr_ as a dots function (hadley, Nov 7, 2021)
655700e Pull use_tr into own block (hadley, Nov 7, 2021)
ef21080 Make domain_fmt_funs consistent (hadley, Nov 7, 2021)
c3bc2c9 %chin% (MichaelChirico, Nov 7, 2021)
e401af0 typo (MichaelChirico, Nov 7, 2021)
dc077fd Rename to po_extract() (hadley, Nov 8, 2021)
c28af76 Push style argument down to get_r_messages() (hadley, Nov 8, 2021)
743cb45 Start on docs (hadley, Nov 8, 2021)
dab11d7 Merge commit '583db6471e8a272a461006717855cee32ba82dd8' (hadley, Nov 8, 2021)
750e805 Also need to update usage (hadley, Nov 8, 2021)
ee6f642 Invisibly return the message data (hadley, Nov 8, 2021)
b600b24 Read style from DESCRIPTION if not set (hadley, Nov 8, 2021)
df938eb Update test (hadley, Nov 8, 2021)
bbd4b43 Convert get_message_data.Rd to roxygen2 (hadley, Nov 8, 2021)
e00ea69 Inherit params in po_extract() (hadley, Nov 8, 2021)
9e923e3 WS (hadley, Nov 8, 2021)
eefbc6d Merge branch 'master' into po_scan (MichaelChirico, Nov 9, 2021)
fbe527a ws (MichaelChirico, Nov 9, 2021)
0615c70 explicit argument for readability (MichaelChirico, Nov 9, 2021)
eb85e5f clarify docs (MichaelChirico, Nov 9, 2021)
2f56145 update verbose default (MichaelChirico, Nov 9, 2021)
46cef27 TODO comment for later (MichaelChirico, Nov 9, 2021)
ca58ca0 update .rd (MichaelChirico, Nov 9, 2021)
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -16,3 +16,4 @@ Encoding: UTF-8
Config/testthat/edition: 3
VignetteBuilder: knitr
RoxygenNote: 7.1.2
Roxygen: list(markdown = TRUE)
1 change: 1 addition & 0 deletions NAMESPACE
@@ -11,6 +11,7 @@ export(translate_package)
export(get_message_data)
export(write_po_file, po_metadata)

export(po_extract)
export(po_compile)

export(check_cracked_messages, check_untranslated_cat, check_untranslated_src)
85 changes: 82 additions & 3 deletions R/get_message_data.R
@@ -1,19 +1,98 @@
#' Extract user-visible messages from a package
#'
#' This function looks in the R and src directories of a package for
#' user-visible messages and compiles them as a [data.table::data.table()]
#' to facilitate analyzing this corpus as such.
#'
#' @param dir Character, default the present directory; a directory in which
#' an R package is stored.
#' @param custom_translation_functions A `list` with either/both of two
#' components, `R` and `src`, together governing how to extract any
#' non-standard strings from the package.
#'
#' See Details in [translate_package()].
#' @param style Translation style, either `"base"` or `"explicit"`.
#' The default, `NULL`, reads from the `DESCRIPTION` field
#' `Config/potools/style` so you can specify the style once for your
#' package.
#'
#' Both styles extract strings explicitly flagged for translation with
#' `gettext()` or `ngettext()`. The base style additionally extracts
#' strings in calls to `stop()`, `warning()`, and `message()`,
#' and to `stopf()`, `warningf()`, and `messagef()` if you have
#' added those helpers to your package. The explicit style also accepts
#' `tr_()` as a shorthand for `gettext()`. See
#' `vignette("developer")` for more details.
#' @param verbose Logical, default `TRUE` (except during testing). Should
#' extra information about progress, etc. be reported?
#' @return A `data.table` with the following schema:
#'
#' * `message_source`, `character`, either `"R"`
#' or `"src"`, saying whether the string was found in the R or the src
#' folder of the package
#' * `type`, `character`, either `"singular"` or `"plural"`;
#' `"plural"` means the string came from [ngettext()] and can be pluralized
#' * `file`, `character`, the file where the string was found
#' * `msgid`, `character`, the string (character literal or `char` array as
#' found in the source); missing for all `type == "plural"` strings
#' * `msgid_plural`, `list(character, character)`, the strings
#' (character literals or `char` arrays as found in the source); the first
#' applies in English for `n=1` (see `ngettext`), while the second
#' applies for `n!=1`; missing for all `type == "singular"` strings
#' * `call`, `character`, the full call containing the string
#' that was found
#' * `line_number`, `integer`, the line in `file` where the string was found
#' * `is_repeat`, `logical`, whether the `msgid` is a duplicate within this
#' `message_source`
#' * `is_marked_for_translation`, `logical`, whether the string is marked for
#' translation (e.g., in R, all character literals supplied to a `...`
#' argument in [stop()] are so marked)
#' * `is_templated`, `logical`, whether the string is templatable (e.g., uses
#' `%s` or other formatting markers)
#' @author Michael Chirico
#' @seealso [translate_package()], [write_po_file()]
#' @examples
#' pkg <- system.file('pkg', package = 'potools')
#' get_message_data(pkg)
#'
#' # includes strings provided to the custom R wrapper function catf()
#' get_message_data(pkg, custom_translation_functions = list(R = "catf:fmt|1"))
#'
#' # includes untranslated strings provided to the custom
#' # C/C++ wrapper function ReverseTemplateMessage()
#' get_message_data(
#' pkg,
#' custom_translation_functions = list(src = "ReverseTemplateMessage:2")
#' )
#'
#' # cleanup
#' rm(pkg)
get_message_data = function(
dir = ".",
custom_translation_functions = list(R = NULL, src = NULL),
style = NULL,
verbose = !is_testing()
) {
package = get_desc_data(dir, 'Package')
is_base = package == 'base'

if (verbose) message('Getting R-level messages...')
# If style not specified, read from DESCRIPTION
if (is.null(style)) {
style <- get_desc_data(dir, "Config/potools/style")
if (is.na(style)) {
style <- "base"
}
}

if (verbose && dir.exists(file.path(dir, "R"))) message('Getting R-level messages...')
r_message_data = get_r_messages(
dir,
custom_translation_functions = custom_translation_functions$R,
is_base
style = style,
is_base = is_base
)

if (verbose) message('Getting src-level messages...')
if (verbose && dir.exists(file.path(dir, "src"))) message('Getting src-level messages...')
src_message_data = get_src_messages(
dir,
custom_translation_functions = custom_translation_functions$src,
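Note on the style lookup above: a minimal sketch of the resolution order the diff describes (explicit argument first, then the `Config/potools/style` field in DESCRIPTION, then `"base"`). `resolve_style()` is a hypothetical helper written only for illustration; it is not part of potools, and the temporary DESCRIPTION files are made up.

```r
# Hypothetical helper mirroring the style lookup shown in get_message_data():
# an explicit argument wins, then the DESCRIPTION field, then "base".
resolve_style <- function(desc_file, style = NULL) {
  if (is.null(style)) {
    # read.dcf() returns NA for a field that is absent from the file
    style <- unname(drop(read.dcf(desc_file, fields = "Config/potools/style")))
    if (is.na(style)) style <- "base"
  }
  match.arg(style, c("base", "explicit"))
}

# Two throwaway DESCRIPTION files for demonstration
plain <- tempfile()
writeLines(c("Package: demo", "Version: 0.0.1"), plain)
opted_in <- tempfile()
writeLines(
  c("Package: demo", "Version: 0.0.1", "Config/potools/style: explicit"),
  opted_in
)

resolve_style(plain)                     # "base"
resolve_style(opted_in)                  # "explicit"
resolve_style(opted_in, style = "base")  # "base"
```

Reading the field once from DESCRIPTION means callers such as `po_extract()` do not need to repeat the `style` argument on every call.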
43 changes: 35 additions & 8 deletions R/get_r_messages.R
@@ -1,6 +1,8 @@
# Spiritual cousin version of tools::{x,xn}gettext. Instead of iterating the AST
# as R objects, do so from the parse data given by utils::getParseData().
get_r_messages <- function (dir, custom_translation_functions = NULL, is_base = FALSE) {
get_r_messages <- function (dir, custom_translation_functions = NULL, is_base = FALSE, style = c("base", "explicit")) {
style <- match.arg(style)

expr_data <- rbindlist(lapply(parse_r_files(dir, is_base), getParseData), idcol = 'file')
# R-free package (e.g. a data package) fails, #56
if (!nrow(expr_data)) return(r_message_schema())
@@ -37,16 +39,27 @@ get_r_messages <- function (dir, custom_translation_functions = NULL, is_base =
# <!-- mix and match those two types indefinitely -->
# <OP-RIGHT-PAREN>)</OP-RIGHT-PAREN>
# </expr>
dots_funs <- domain_dots_funs(use_conditions = style == "base")
fmt_funs <- domain_fmt_funs(use_conditions = style == "base")

singular_strings = rbind(
get_dots_strings(expr_data, DOMAIN_DOTS_FUNS, NON_DOTS_ARGS),
get_dots_strings(expr_data, dots_funs, NON_DOTS_ARGS),
# treat gettextf separately since it takes a named argument, and we ignore ...
get_named_arg_strings(expr_data, 'gettextf', c(fmt = 1L), recursive = TRUE),
get_named_arg_strings(expr_data, fmt_funs, c(fmt = 1L), recursive = TRUE),
# TODO: drop recursive=FALSE option now that exclude= is available? main purpose of recursive=
# was to block cat(gettextf(...)) usage right?
get_dots_strings(expr_data, 'cat', c("file", "sep", "fill", "labels", "append"), recursive = FALSE)
)

plural_strings = get_named_arg_strings(expr_data, 'ngettext', c(msg1 = 2L, msg2 = 3L), plural = TRUE)

if (style == "explicit") {
tr_ <- get_dots_strings(expr_data, 'tr_', character(), recursive = TRUE)
tr_n <- get_named_arg_strings(expr_data, 'tr_n', c(singular = 2L, plural = 3L), plural = TRUE)

singular_strings <- rbind(singular_strings, tr_)
plural_strings <- rbind(plural_strings, tr_n)
}

# for plural strings, the ordering within lines doesn't really matter since there's only one .pot entry,
# so just use the parent's location to get the line number
plural_strings[ , id := parent]
@@ -146,11 +159,14 @@ get_r_messages <- function (dir, custom_translation_functions = NULL, is_base =
# You are trying to join data.tables where %s has 0 columns.
msg[type == 'singular', 'is_repeat' := duplicated(msgid)]

known_translators = c(DOMAIN_DOTS_FUNS, 'ngettext', 'gettextf', get_fnames(custom_params))
known_translators = c(dots_funs, 'ngettext', fmt_funs, get_fnames(custom_params))
if (style == "explicit") {
known_translators <- c(known_translators, "tr", "tr_")
}
msg[ , 'is_marked_for_translation' := fname %chin% known_translators]

# TODO: assume custom translators are translated? or maybe just check the regex?
msg[ , "is_templated" := fname == "gettextf"]
msg[ , "is_templated" := fname %chin% fmt_funs]
msg[ , "fname" := NULL]

msg[]
@@ -274,10 +290,21 @@ exclude_untranslated = function(expr_data, comments) {
# if (is.null(f_args <- args(f))) next
# if (any(names(formals(f_args)) == 'domain')) cat(obj, '\n')
# }
DOMAIN_DOTS_FUNS = c("warning", "stop", "message", "packageStartupMessage", "gettext")
domain_dots_funs <- function(use_conditions = TRUE) {
c(
"gettext",
if (use_conditions) c("stop", "warning", "message", "packageStartupMessage")
)
}

domain_fmt_funs <- function(use_conditions = TRUE) {
paste0(domain_dots_funs(use_conditions), "f")
}

#
NON_DOTS_ARGS = c("domain", "call.", "appendLF", "immediate.", "noBreaks.")

# for functions (e.g. DOMAIN_DOTS_FUNS) where we extract strings from ... arguments
# for functions (e.g. domain_dots_funs) where we extract strings from ... arguments
get_dots_strings = function(expr_data, funs, arg_names, exclude = c('gettext', 'gettextf', 'ngettext'), recursive = TRUE) {
call_neighbors = get_call_args(expr_data, funs)
call_neighbors = drop_suppressed_and_named(call_neighbors, expr_data, arg_names)
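Note on `domain_dots_funs()` and `domain_fmt_funs()`: the two helpers are copied from the diff above into a standalone snippet to show which function names each style scans for. With `use_conditions = FALSE` (the "explicit" style) only `gettext` and `gettextf` remain; `tr_` and `tr_n` are added separately inside `get_r_messages()`.

```r
# Copied from the diff: functions whose ... arguments are scanned for strings
domain_dots_funs <- function(use_conditions = TRUE) {
  c(
    "gettext",
    if (use_conditions) c("stop", "warning", "message", "packageStartupMessage")
  )
}

# Copied from the diff: the sprintf-style variant of each of the above
domain_fmt_funs <- function(use_conditions = TRUE) {
  paste0(domain_dots_funs(use_conditions), "f")
}

domain_dots_funs(TRUE)
# "gettext" "stop" "warning" "message" "packageStartupMessage"
domain_fmt_funs(TRUE)
# "gettextf" "stopf" "warningf" "messagef" "packageStartupMessagef"

domain_dots_funs(FALSE)  # "gettext"
domain_fmt_funs(FALSE)   # "gettextf"
```

Because `if (FALSE) ...` evaluates to `NULL`, and `c()` drops `NULL`, the explicit style needs no separate branch to build the shorter list.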
45 changes: 45 additions & 0 deletions R/po_extract.R
@@ -0,0 +1,45 @@
#' Extract messages for translation into a `.pot` file
#'
#' `po_extract()` scans your package for strings to be translated and
#' saves them into a `.pot` template file (in the package's `po` directory).
#' You should never modify this file by hand; instead modify the underlying
#' source code and re-run `po_extract()`.
#'
#' @returns The extracted messages as computed by [get_message_data()],
#' invisibly.
#' @inheritParams get_message_data
po_extract <- function(
dir = ".",
custom_translation_functions = list(),
verbose = !is_testing(),
style = NULL) {

message_data <- get_message_data(dir,
custom_translation_functions = custom_translation_functions,
verbose = verbose,
style = style
)

n <- nrow(message_data)
if (!n) {
if (verbose) message('No messages to translate')
return(invisible())
}
# TODO: messagef() is double-translating the ngettext() result...
# it needs an escape valve
if (verbose) messagef(ngettext(n, "Found %i message", "Found %i messages"), n)

po_dir <- file.path(dir, 'po')
dir.create(po_dir, showWarnings = FALSE)

desc <- get_desc_data(dir)
po_params = list(
package = desc[['Package']],
version = desc[['Version']],
copyright = NULL,
Review comment (Owner):

is your sense that these fields (copyright and bugs) aren't really needed?

my sense as well has been these things are basically covered by the DESCRIPTION file already, and kept it to try & be consistent with base (and it's something of a maintenance headache to implement)

Reply (Collaborator, Author):

Yeah, I think we should remove them; it's better to leave to the DESCRIPTION and avoid creating duplicates that might get out of date.

bugs = NULL
)

write_po_files(message_data, po_dir, po_params, template = TRUE, verbose = verbose)
invisible(message_data)
}
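Note on the "Found %i message(s)" line in `po_extract()`: a minimal base-R sketch of the same pattern, using `sprintf()` in place of the package's `messagef()` helper so the snippet runs on its own. `found_msg()` is a hypothetical name, and the outputs shown assume no translation catalog supplies these strings.

```r
# ngettext() picks the singular or plural template based on n;
# sprintf() then fills in the count.
found_msg <- function(n) {
  n <- as.integer(n)
  sprintf(ngettext(n, "Found %i message", "Found %i messages"), n)
}

found_msg(1L)  # "Found 1 message"
found_msg(3L)  # "Found 3 messages"
```

The TODO in the diff concerns `messagef()` translating a template that `ngettext()` has already translated; formatting with plain `sprintf()` as above avoids that second lookup.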
1 change: 0 additions & 1 deletion R/translate_package.R
@@ -78,7 +78,6 @@ translate_package = function(
if (length(responded_yes) && responded_yes) return(invisible())
}

if (verbose) message('Generating .pot files...')
po_params = list(package = package, version = version, copyright = copyright, bugs = bugs)
write_po_files(message_data, po_dir, po_params, template = TRUE, use_base_rules = use_base_rules)

2 changes: 1 addition & 1 deletion R/utils.R
@@ -19,7 +19,7 @@ get_desc_data = function(dir, fields = c('Package', 'Version')) {
stopf('%s is not a package (missing DESCRIPTION)', normalizePath(dir))
}
desc_data <- read.dcf(desc_file, fields)
if (nrow(desc_data) != 1L || anyNA(desc_data)) {
if (missing(fields) && (nrow(desc_data) != 1L || anyNA(desc_data))) {
stopf('%s is not a package (missing Package and/or Version field in DESCRIPTION)', normalizePath(dir))
}
return(drop(desc_data))
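Note on the `missing(fields)` guard: a simplified sketch of why the check was narrowed. With the guard, the Package/Version validation only fires when the caller relies on the default fields, so an optional field such as `Config/potools/style` can come back as `NA` instead of raising an error. `desc_fields()` is a hypothetical stand-in for `get_desc_data()` that takes a file path directly and uses `stop()` rather than the package's `stopf()`.

```r
# Hypothetical, simplified stand-in for get_desc_data()
desc_fields <- function(desc_file, fields = c("Package", "Version")) {
  desc_data <- read.dcf(desc_file, fields = fields)
  # Validate only when the caller did not ask for specific fields
  if (missing(fields) && (nrow(desc_data) != 1L || anyNA(desc_data))) {
    stop("not a package (missing Package and/or Version field in DESCRIPTION)")
  }
  drop(desc_data)
}

# Throwaway DESCRIPTION file for demonstration
desc <- tempfile()
writeLines(c("Package: demo", "Version: 0.0.1"), desc)

desc_fields(desc)                          # named vector: Package, Version
desc_fields(desc, "Config/potools/style")  # NA, no error
```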
5 changes: 4 additions & 1 deletion R/write_po_file.R
@@ -5,7 +5,7 @@
# default by xgettext, etc (unless --no-location is set, or if --add-location=never).
# note also that the gettext manual says we shouldn't write these ourselves... for now i'm
# going to go ahead and try to anyway until it breaks something :)
write_po_files <- function(message_data, po_dir, params, template = FALSE, use_base_rules = FALSE) {
write_po_files <- function(message_data, po_dir, params, template = FALSE, use_base_rules = FALSE, verbose = TRUE) {
if (template) {
r_file <- sprintf("R-%s.pot", params$package)
src_file <- sprintf("%s.pot", if (params$package == 'base') 'R' else params$package)
@@ -32,6 +32,9 @@ write_po_files <- function(message_data, po_dir, params, template = FALSE, use_b
`X-Generator` = sprintf("potools %s", packageVersion("potools"))
))
}

if (verbose) messagef('Writing %s', r_file)

write_po_file(
message_data[message_source == "R"],
file.path(po_dir, r_file),
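Note on template file names: a small sketch of the naming rule visible in `write_po_files()` when `template = TRUE`. R-level messages go to `R-<package>.pot`; src-level messages go to `<package>.pot`, except for base R, where the file is `R.pot`. `pot_file_names()` is a hypothetical helper for illustration only.

```r
# Hypothetical helper reproducing the two sprintf() calls from the diff
pot_file_names <- function(package) {
  c(
    R = sprintf("R-%s.pot", package),
    src = sprintf("%s.pot", if (package == "base") "R" else package)
  )
}

pot_file_names("potools")  # R = "R-potools.pot", src = "potools.pot"
pot_file_names("base")     # R = "R-base.pot",    src = "R.pot"
```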
89 changes: 62 additions & 27 deletions man/get_message_data.Rd

Some generated files are not rendered by default.