Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Run] Fix Scientific Notation Errors in Calculator Plugin #28884

Merged
merged 16 commits into from
Jan 3, 2024

Conversation

viggyd
Copy link
Contributor

@viggyd viggyd commented Sep 28, 2023

Summary of the Pull Request

This PR allows the Calculator plugin in PowerToys Run to use scientific notation (e.g. 5e3 = 5000). The PR fixes a regression in PowerToys that was introduced in v0.69.1.

PR Checklist

Detailed Description of the Pull Request / Additional comments

Nothing else fixed.

Validation Steps Performed

I updated the Calculator unit tests to include various expressions using this notation and confirmed that they all pass.

After adding logic to replace `e` as a mathematical constant, there are bugs
when trying to use expressions like `5e3`. This change parses the
`<number>e<number>` format into expanded form to prevent replacement with
the constant.

Regex explanation: `(\d+\.*\d*)[eE](-*\d+)
`(\d+\.*\d*)`: Match any number of digits, followed by an optional `.` and
               more optional digits. The expression is used to capture things
               like: `5.0`, `1.`, `1` before the `e`
`[eE]`: Match either upper or lowercase `e`
`(-*\d+)`: Capture an optional `-` sign as well as a number of digits
The new regex captures a wider variety of numbers. See this post for details
on the regex used:

https://stackoverflow.com/a/23872060
@viggyd viggyd changed the title Module/run/fix scientific notation [Run] Fix Scientific Notation Errors in Calculator Plugin Sep 28, 2023
@viggyd
Copy link
Contributor Author

viggyd commented Sep 28, 2023

@microsoft-github-policy-service agree

Using `[` didn't capture the expression properly. Had to use `(` instead.
Copy link

@ekcom ekcom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions to improve/clarify regex included, but otherwise looks good.

Copy link

@Aradig26 Aradig26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add these fast

Copy link
Collaborator

@jaimecbernardo jaimecbernardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Sorry for the time I took in reviewing this PR. I thought changes were still being discussed.
Thanks for opening the PR.

It looks good to me, but I think scientific notation should only be with the upper case "E".
The main reason being that currently there can be some ambiguous expressions with the constant "e",

What do you think?

@jaimecbernardo
Copy link
Collaborator

@viggyd , Please let me know what you think of the change I proposed (uppercase E only).
Trying to get this in before .75 and I'm closing out the merge window pretty soon.
If you're OK with the change, I can implement it as well.

@viggyd
Copy link
Contributor Author

viggyd commented Oct 24, 2023

@viggyd , Please let me know what you think of the change I proposed (uppercase E only). Trying to get this in before .75 and I'm closing out the merge window pretty soon. If you're OK with the change, I can implement it as well.

Sorry, for some reason I wasn't getting notifications for this. There is one change in the regex to make because it could accept multiple decimal points.

I don't have strong opinions on the capital E, using lowercase is what I'm used to, but I agree it is ambiguous with other usage. I'll make it stricter to reduce that ambiguity. I can push the changes up this evening. If you need them sooner, you're welcome to make them also!

@jaimecbernardo
Copy link
Collaborator

jaimecbernardo commented Oct 24, 2023

There is one change in the regex to make because it could accept multiple decimal points.

Thanks @viggy . I'm not sure what you mean with the multiple decimal points. Anyway, I don't think this is ready to go in yet after a re-review, because it doesn't support different decimal or number grouping characters.
I've added a commit to support only uppercase E and to add the expression to the the RegValidExpressChar so that upper case E is actually accepted.

@viggyd
Copy link
Contributor Author

viggyd commented Oct 24, 2023

There is one change in the regex to make because it could accept multiple decimal points.

Thanks @viggy . I'm not sure what you mean with the multiple decimal points. Anyway, I don't think this is ready to go in yet after a re-review, because it doesn't support different decimal or number grouping characters. I've added a commit to support only uppercase E and to add the expression to the the RegValidExpressChar so that upper case E is actually accepted.

Could you elaborate what you mean by different decimal or number grouping characters? I don't follow. I can add whatever additional changes are necessary

@jaimecbernardo
Copy link
Collaborator

Could you elaborate what you mean by different decimal or number grouping characters? I don't follow. I can add whatever additional changes are necessary

Hi @viggyd ,
There are regional formats where . is not the decimal point, but , is used instead. For example, "1.05" could be written like "1,05" on those countries.
image

With these languages the calculator plugin supports the decimal point of the locale. Here's one example:
image

There's even languages that use . as a group separator. Here's Germany(German), for example:
image

But with this PR, it assumes . will always be the decimal separator:
image
image

@viggyd
Copy link
Contributor Author

viggyd commented Oct 29, 2023

Hi @viggyd , There are regional formats where . is not the decimal point, but , is used instead. For example, "1.05" could be written like "1,05" on those countries. image

Are there any functions to get the decimal separator?

@jaimecbernardo
Copy link
Collaborator

Hi @viggyd , There are regional formats where . is not the decimal point, but , is used instead. For example, "1.05" could be written like "1,05" on those countries. image

Are there any functions to get the decimal separator?

In NumberTranslator.cs there's references to culture.NumberFormat.NumberGroupSeparator and culture.NumberFormat.NumberGroupSeparator

@htcfreek
Copy link
Collaborator

@viggyd
Please check out the plugin settings for calculator as there is a setting to force English input format.

@stefansjfw
Copy link
Collaborator

@viggyd Any progress here? Do you need some more help?

@viggyd
Copy link
Contributor Author

viggyd commented Dec 11, 2023

I'm sorry, I completely forgot about this issue. I'll be able to work on it this weekend, should have something that works by then

Previous regular expression did not allow decimal separators that were not ".".
The new expression uses the culture info decimal separator instead, which
should work better.
@viggyd
Copy link
Contributor Author

viggyd commented Dec 17, 2023

Okay, I pushed my update that accounts for various decimal separators, sorry for the delay!

This comment has been minimized.

Copy link
Collaborator

@Jay-o-Way Jay-o-Way left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two details

Comment on lines 82 to 92
if (CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator == ".")
{
p = string.Format(CultureInfo.CurrentCulture, p, @"\.");
}
else
{
p = string.Format(
CultureInfo.CurrentCulture,
p,
CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this can also be done with a ternary. But both would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a ternary would take too much horizontal space because the CultureInfo.CurrentCulture... is a lot of characters so I left it at the expanded if/else instead

CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator);
}

var r = "($1 * 10^($5))";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create a variable for a single use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason, I was just following a similar format to other functions that are already in the file. I updated the code to remove the single use variable in my latest push

@@ -18,6 +20,7 @@ public static class CalculateHelper
@"sinh\s*\(|cosh\s*\(|tanh\s*\(|arsinh\s*\(|arcosh\s*\(|artanh\s*\(|" +
@"pi|" +
@"==|~=|&&|\|\||" +
@"((-?(\d+(\.\d*)?)|-?(\.\d+))[E](-?\d+\.*\d*))|" + /* expression from CheckScientificNotation between parenthesis */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be updated here as well? only integer after E

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. Fixed!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks :) Change looks good to me, I'll just give it a test and then it'll be good to go! :)

Missed this expression in my last update. Whoops
* -?(\d+({0}\d*)?): Captures a decimal number starting with a number (e.g. "-1.23")
* -?({0}\d+): Captures a decimal number without leading number (e.g. ".23")
* E: Captures capital 'E'
* (-?\d+{0}?\d*): Captures an integer number (e.g. "-1" or "23")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a comment update here as well :D

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@stefansjfw
Copy link
Collaborator

Hm, it still accepts and processes decimal numbers after E. And it gives weird results:
image

not sure why

@stefansjfw
Copy link
Collaborator

Hm, it still accepts and processes decimal numbers after E. And it gives weird results: image

not sure why

Actually it assumes multiplication between 1E1 and .3 -> 1E1 * 0.3 . Weird syntax and can confuse users.

@viggyd
Copy link
Contributor Author

viggyd commented Dec 26, 2023

Looks like that's just something you can do? Definitely weird behavior:
image

@viggyd
Copy link
Contributor Author

viggyd commented Dec 26, 2023

Hm, it still accepts and processes decimal numbers after E. And it gives weird results: image
not sure why

Actually it assumes multiplication between 1E1 and .3 -> 1E1 * 0.3 . Weird syntax and can confuse users.

Any ideas on how best to deal with that? Is there a way to reject a pattern?

@viggyd
Copy link
Contributor Author

viggyd commented Dec 26, 2023

Part of the problem is that there's a lot of other expressions that already allow expressions in parentheses followed by any number of spaces and it treats that as valid. So I can do something like (10) 3 and I'll get 30 as a result even though it looks funny.

Edit: Just checked and it doesn't even have to be in parentheses. 10 3 still results in 30

@stefansjfw
Copy link
Collaborator

Hm, it still accepts and processes decimal numbers after E. And it gives weird results: image
not sure why

Actually it assumes multiplication between 1E1 and .3 -> 1E1 * 0.3 . Weird syntax and can confuse users.

Any ideas on how best to deal with that? Is there a way to reject a pattern?

image

looks like we accept this in general. I'm still worried that e.g. 1E1.5 might confuse users, because 10^1.5 is valid math expression, but E shouldn't accept decimal after (Windows calculator also prevent this when using exp).. Let's keep it as is as we respect the rules here.

there is another problem now. If I change the numbers format, as @jaimecbernardo mentioned above, to e.g. Portuguese:
image

Regex is correctly matched, but the result is wrong 🤔

@viggyd
Copy link
Contributor Author

viggyd commented Dec 27, 2023

I see the problem. Ordinarily, the NumberTranslator converts the number/decimal separators so that by the time input reaches the calculator engine, it's already in English format (i.e. . for decimal separation). For some reason, the translator is having trouble parsing an expression like 2,3E2 and so it doesn't convert the expression to English format. The Mages engine PowerToys is using for calculating actual results requires English format and so it's not giving proper results.

That being said, I'm not sure why the translator isn't properly parsing the input. Seems like it's not recognizing the E as a token to split the expression on?

I can try to either:

  1. Fix the translator so that the input we get when the user inputs scientific notation is in English and just remove the decimal separator logic in the calculator engine
  2. Just make the calculator engine replace any decimal separators with the regular English separator.

I think I'd prefer option 1 since that seems cleaner. I'll need to look into the translator and see how to make it recognize scientific notation input properly.

@stefansjfw
Copy link
Collaborator

I agree. Let's try the option number 1

@Jay-o-Way
Copy link
Collaborator

Jay-o-Way commented Dec 27, 2023

Yeah, those different possibilities for decimals are always a trouble.

Important

Both "...just remove the decimal separator logic in the calculator engine" and "Just make the calculator engine replace any decimal separators" actually sound dangerous and could possibly lead to a regression resulting in bad calculations. Note that a number like "one-thousand two-hundred thirty four" in some languages is written as 1,234 and in some others that would be 1.234. In both cases, the other character denotes a decimal.

Whatever is going to be done, it needs to be tested with all permutations.

@viggyd
Copy link
Contributor Author

viggyd commented Dec 27, 2023

Okay, I think the issue in the NumberTranslator's splitRegex pattern. It's considering something like 2,3E2 to be a single token and it isn't splitting on the E:
image

Because the regular expression is allowing characters [a-fA-F], it's not creating a token split on the E. So the token as a whole (2,3E2) is sent to the decimal.TryParse(...), which doesn't get translated to 2.3E2 presumably because the function doesn't understand the E in there. Interestingly, the same thing should happen for English decimal separators with the tokenization not working, but it works out there because there is no translation needed.

I'm assuming the allowance of a-fA-F is to allow hex notation, but I didn't think PowerToys supported that. Anyone have insight into why these characters are allowed to be part of a singular token? If PowerToys does support hexadecimal notation now, I'm not sure how to fix the translator. Are we forcing all hex numbers to start with 0x? If so, maybe I can look for that and split the token on E if there is no leading 0x. Looks like @jaimecbernardo added the code hexadecimal code. Thoughts?

@Jay-o-Way
Copy link
Collaborator

Jay-o-Way commented Dec 27, 2023

Darn it, you're absolutely right. In short, there is no way to tell if 5E3 is hexadecimal or scientific notation. Don't know how it would be possible to tell. Which jeopardizes the entire issue. This is why Calculator app has modes for input.

Having that said... Isn't scientific notation better to use for results, only? I mean, there's an inherent "insecurity" (or what's the word?) regarding significance. Like, when you say there are (about) 8 billion people on the planet, that's not an exact number. You could write it as 8B or 8E9. Note that both these are also valid as hexadecimal. What I mean is, you normally do not calculate anything with these kinds of estimates.

The hex regex stuff is only used once, there's no need to keep track of it in
the object's state. Just create it during the translation process.
@viggyd
Copy link
Contributor Author

viggyd commented Dec 28, 2023

Darn it, you're absolutely right. In short, there is no way to tell if 5E3 is hexadecimal or scientific notation. Don't know how it would be possible to tell. Which jeopardizes the entire issue. This is why Calculator app has modes for input.

Having that said... Isn't scientific notation better to use for results, only? I mean, there's an inherent "insecurity" (or what's the word?) regarding significance. Like, when you say there are (about) 8 billion people on the planet, that's not an exact number. You could write it as 8B or 8E9. Note that both these are also valid as hexadecimal. What I mean is, you normally do not calculate anything with these kinds of estimates.

I think maybe uncertainty is the word you're looking for?

While that certainly is a use case, I have used scientific notation in PowerToys in the past for the purposes of calculation. It's a convenience when working with smaller numbers, so that rather than needing to put in 0.001 I can just do 1e-3. It's especially useful when converting between various SI units, dimensional analysis, etc.

It looks like PowerToys does require the 0x prefix to denote a hexadecimal number. If I can assume that's always the case, I can do a 2-stage tokenization approach. The first can split out any hexadecimal expressions in the input. After that, all other tokens are free to be split without the a-fA-F in the splitting regex. The hexadecimals that I tokenized in the first stage can remain as one big group.

Something like (pseudocode):

string[] hex_norm_tokens = hexRegex.Split(input);

foreach (string token in hex_norm_tokens)
{
    if (token.startswith == "0x")
    {
         // Add to output builder - no further processing needed
         continue;
    }
    
    string[] reg_tokens = splitRegex.Split(token);
    // All current tokenization happens here
    ...
}

Should be able to split out hexadecimals on something like this: (?:(0x[\da-fA-F]+)+). Regex101 link with test expression: link

I tested this locally and its seems to work well. I'm wondering if I should update the unit tests as well. There's a few edge cases that we talked about during this conversation that'd be good to codify in unit tests somewhere to catch any regressions if they happen in the future

@Jay-o-Way
Copy link
Collaborator

Alrighty. If the 0x prefix is/becomes mandatory, we must make sure this clear in the documentations. But if it works, it works.

@viggyd
Copy link
Contributor Author

viggyd commented Jan 2, 2024

Okay, pushed a change that now includes a unit test for scientific notation in other cultures. It's very small though, let me know if there are any other cases I should test. I couldn't think of anything else that wasn't covered by a different test somewhere else. I can also move the tests I added in QueryTests into this new one I added under ExtendedCalculatorParserTests

Copy link
Collaborator

@stefansjfw stefansjfw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now! Nice work! Thanks for the contribution! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants