Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add optional check for duplication in third dimension to mbind() #151

Conversation

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

I had a hard time tracking down where duplicated dimnames where introduced that .dimextract() complained about, so I generalised the solution.

mbind() checks the R option MAGPIE_MBIND_DUPLICATES, which should have the format '<reaction>_<filter>'. If is either 'warning' or 'stop', mbind() will raise a warning or an error when it introduces duplicates in the third dimension of the resulting magpie object. If is set to 'remove-plus', it will check for duplicates disregarding '|+' components in the names (as are used for REMIND output variables). This is intended for debugging purposes. See also the examples.

## Not run: 
options(MAGPIE_MBIND_DUPLICATES = "warning_remove-plus")
mbind(new.magpie(names = 'foo|bar'), new.magpie(names = 'foo|+|bar'))

## End(Not run)

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q changed the title add option check for duplication in third dimension to mbind() add optional check for duplication in third dimension to mbind() Jun 21, 2023
@tscheypidi
Copy link
Member

I see the application, but the implementation has two problems:

  1. It implements a very specific use-case (reporting objects with a very specific variable notation) while magclass should be as generic as possible. We spent in the past quite some time getting rid of application-specific code in the package and while some still remains the aim is not to introduce any new specific code.
  2. we had in the past some duplicate checks in these functions but removed as it considerably affected performance. Having an option which activates it for all mbind statements should most likely yield again the same performance issues we had before.

I would suggest to take that feature and implement it in another more case-specific package, closer to the IAM reporting logic, either as a check function or a wrapper for mbind.

@tscheypidi tscheypidi closed this Jun 21, 2023
@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member Author

  1. we had in the past some duplicate checks in these functions but removed as it considerably affected performance. Having an option which activates it for all mbind statements should most likely yield again the same performance issues we had before.
> N <- 1000000
> 
> foo <- system.time({
+     for (i in seq_len(N)) {
+         duplicatesOption <- strsplit(
+             x = getOption("MAGPIE_MBIND_DUPLICATES", default = ""), split = "_",
+             fixed = TRUE)[[1]]
+         
+         duplicatesOption[1] %in% c("warning", "stop")
+     }
+ })
> sprintf('%.2e', foo[['elapsed']] / N)
[1] "4.95e-06"

Precious nanoseconds …

> a <- population_magpie[,,'A2']
> b <- population_magpie[,,'B1']
> 
> bar <- system.time({
+     for (i in seq_len(N / 1000)) {
+         invisible(mbind(a, b))
+     }
+ })
> sprintf('%.2e', bar[['elapsed']] / (N / 1000))
[1] "3.33e-03"

More than 1 ‰ of typical run time!

  1. It implements a very specific use-case (reporting objects with a very specific variable notation) while magclass should be as generic as possible. We spent in the past quite some time getting rid of application-specific code in the package and while some still remains the aim is not to introduce any new specific code.

It implements the most generic form possible, plus one special case, that would be useful to 50 % of the paying magclass user base (rough estimate).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants