Do not use Unicode aware API and CurrentCulture in compiler #16066
2 tests failed. Investigating...
@@ -55,10 +55,10 @@ type PrimaryAssembly =
    static member IsPossiblePrimaryAssembly(fileName: string) =
        let name = System.IO.Path.GetFileNameWithoutExtension(fileName)

        String.Compare(name, "mscorlib", true) <> 0
I think there was a bug, so I changed `<>` to `=` here. Or did I misunderstand something? This was introduced in #13870.
fsharp/src/Compiler/Driver/CompilerImports.fs
Lines 2437 to 2447 in b4afa99
    // We check the exported types of all assemblies, since many may forward System.Object,
    // but only check the actual type definitions for specific assemblies that we know
    // might actually declare System.Object.
    match mdef.Manifest with
    | Some manifest when
        manifest.ExportedTypes.TryFindByName "System.Object" |> Option.isSome
        || PrimaryAssembly.IsPossiblePrimaryAssembly resolvedAssembly.resolvedPath
           && mdef.TypeDefs.ExistsByName "System.Object"
        ->
        mkRefToILAssembly manifest |> Some
    | _ -> None
Based on the calling context, this bug caused any framework assembly to be treated as equivalent to the primary assembly. This may affect some esoteric .NET implementations.
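To make the `<>` versus `=` point concrete, here is a minimal sketch rather than the compiler's actual code. Only "mscorlib" appears in the hunk above; the second candidate name and both function names are assumptions made for illustration.

```fsharp
open System
open System.IO

// Assumed candidate names; only "mscorlib" is visible in the diff.
let candidates = [ "mscorlib"; "netstandard" ]

// Old logic: true as soon as the name differs from ANY candidate,
// which holds for every file name once there are two candidates.
let isPossiblePrimaryAssemblyOld (fileName: string) =
    let name = Path.GetFileNameWithoutExtension fileName
    candidates |> List.exists (fun c -> String.Compare(name, c, true) <> 0)

// Fixed logic: true only when the name equals one of the candidates,
// compared ordinally and ignoring case.
let isPossiblePrimaryAssemblyNew (fileName: string) =
    let name = Path.GetFileNameWithoutExtension fileName
    candidates
    |> List.exists (fun c -> String.Equals(name, c, StringComparison.OrdinalIgnoreCase))

printfn "%b" (isPossiblePrimaryAssemblyOld "FSharp.Core.dll") // true
printfn "%b" (isPossiblePrimaryAssemblyNew "FSharp.Core.dll") // false
printfn "%b" (isPossiblePrimaryAssemblyNew "MSCORLIB.dll")    // true
```

With the old predicate, the `&& mdef.TypeDefs.ExistsByName "System.Object"` condition in the caller is the only effective filter.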
@vzarytovskii All tests are passing. I will fix
@OwnageIsMagic I would be happy to review this next week - would you mind resolving the conflicts here? If you need any help with that, please let us know :)
@@ -371,7 +371,7 @@ let parseFormatStringInternal
  // type checker. They should always have '(...)' after for format string.
  let requireAndSkipInterpolationHoleFormat i =
      if i < len && fmt[i] = '(' then
-         let i2 = fmt.IndexOfOrdinal(")", i+1)
+         let i2 = fmt.IndexOf(')', i+1)
Why not ordinal here?
`char` overloads are always ordinal. My PR is larger than #16439 (which introduced the `IndexOfOrdinal` extension method) and had already established usage of the `char` overload.
Also, it is potentially faster on .NET Framework (on .NET Core both delegate to the same Span function).
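As a small illustration of the claim about overloads, the sketch below compares the `char` overload of `String.IndexOf` with the `string` overload given an explicit ordinal comparison. The sample format string is made up for the example.

```fsharp
open System

// Hypothetical format string with two interpolation holes.
let fmt = "%d(x) and %s(y)"

// The char overload performs an ordinal search.
let viaChar = fmt.IndexOf(')', 3)

// The string overload is culture-sensitive unless a comparison is specified.
let viaOrdinalString = fmt.IndexOf(")", 3, StringComparison.Ordinal)

printfn "%d %d" viaChar viaOrdinalString // 4 4
```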
@@ -3756,7 +3756,7 @@ let writePdb (
  let pdbfileInfo = FileInfo(pdbfile).FullName

  // If pdbfilepath matches output filepath then error
- if String.Compare(outfileInfo, pdbfileInfo, StringComparison.InvariantCulture) = 0 then
+ if outfileInfo = pdbfileInfo then
What's the particular motivation for this one?
File systems usually don't normalize Unicode in paths; an `InvariantCulture` comparison does.
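A minimal sketch of the difference, assuming two made-up strings that render identically but use different code point sequences. The culture-sensitive result depends on the runtime's globalization setup, so no output is claimed for it here.

```fsharp
open System

let composed = "\u00E9"    // 'é' as a single code point
let decomposed = "e\u0301" // 'e' followed by COMBINING ACUTE ACCENT

// F# '=' on strings is an ordinal comparison, like StringComparison.Ordinal.
let ordinalEqual = String.Equals(composed, decomposed, StringComparison.Ordinal)
let operatorEqual = (composed = decomposed)

// Culture-sensitive comparison may treat canonically equivalent strings as equal.
let invariantEqual =
    String.Compare(composed, decomposed, StringComparison.InvariantCulture) = 0

printfn "ordinal: %b, operator: %b, invariant culture: %b" ordinalEqual operatorEqual invariantEqual
```

The ordinal results are `false` because the code units differ, which matches how a file system that does not normalize names would see two distinct paths.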
Those paths are CLI arguments, the logic to compare them should stay case insensitive like it was, to keep the behavior the same.
@T-Gro it was `InvariantCulture`, not `InvariantCultureIgnoreCase`
@T-Gro are you suggesting that it should actually be made case-insensitive?
@@ -925,7 +925,7 @@ module PrintTypes =
  if not denv.includeStaticParametersInTypeNames then
      None, args
  else
-     let regex = System.Text.RegularExpressions.Regex(@"\`\d+")
+     let regex = System.Text.RegularExpressions.Regex(@"\`\d+", System.Text.RegularExpressions.RegexOptions.ECMAScript)
Haven't seen this yet - which of course doesn't mean this is not a valid change - just wondering, is this a recommendation from Microsoft or, in other words, what is it consistent with?
See the notes in the OP.
By default, `Regex` is Unicode-aware, so character classes map to Unicode categories and allow many more characters than the ASCII ones expected here.
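A short sketch of that behaviour, using the pattern from the diff against a made-up input that ends in a non-ASCII decimal digit:

```fsharp
open System.Text.RegularExpressions

// Hypothetical input: a backtick followed by ARABIC-INDIC DIGIT THREE (U+0663).
let input = "List`\u0663"

// Default options: \d matches any Unicode decimal digit.
let unicodeAware = Regex(@"\`\d+").IsMatch(input)

// ECMAScript option: \d is restricted to [0-9].
let asciiOnly = Regex(@"\`\d+", RegexOptions.ECMAScript).IsMatch(input)

printfn "%b %b" unicodeAware asciiOnly // true false
```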
Instead of a regular expression, this could also just be something like:
let tryGetTyparCount (s: string) =
    let indexOfBacktick = s.LastIndexOf '`'
    if indexOfBacktick >= 0 && indexOfBacktick < s.Length - 1 then
        match Int32.TryParse(s.AsSpan(indexOfBacktick + 1)) with
        | true, genericArgCount -> ValueSome genericArgCount
        | false, _ -> ValueNone
    else
        ValueNone
The ilwrite behavior for comparing CLI args should not be changed.
❗ Release notes required
Caution: No release notes found for the changed paths (see table below). Please make sure to add an entry with an informative description of the change as well as link to this pull request, issue and language suggestion if applicable. Release notes for this repository are based on Keep A Changelog format. The following format is recommended for this repository:
If you believe that release notes are not necessary for this PR, please add NO_RELEASE_NOTES label to the pull request. You can open this PR in browser to add release notes: open in github.dev
Any help needed?
Caution: Repository is on lockdown for maintenance, all merges are on hold.
@OwnageIsMagic, this PR seems to address a disparate set of issues.
In general our test suites are American/English, with Unicode tests only in the places where we know that Unicode handling is impactful; we are not confident that they will catch bugs introduced in PRs such as this, or regressions we may reintroduce in the future. To ensure that the change correctly addresses an observable problem, would it be possible to create a specific test case that currently fails, file a bug and provide a fix? This will ensure that we don't regress any of the supplied fixes in the future. I am closing this PR, but please resubmit PRs with regression test cases and targeted fixes. Thanks, Kevin.
@KevinRansom Most fixes in this PR are in 3 categories.
1. Micro optimizations like using
  member spec.IsGFormat =
-     spec.IsDecimalFormat || System.Char.ToLower(spec.TypeChar) = 'g'
+     spec.IsDecimalFormat || spec.TypeChar = 'g' || spec.TypeChar = 'G'
Currently there is no codepoint in Unicode beside
- Regex(@"^(/|--)test:ErrorRanges$", RegexOptions.Compiled ||| RegexOptions.IgnoreCase)
+ Regex(
+     @"^(/|--)test:ErrorRanges$",
+     RegexOptions.Compiled
+     ||| RegexOptions.IgnoreCase
+     ||| RegexOptions.CultureInvariant
  )
Command line flags are literals and should not be compared with default
3. Actual bugs
- while (Char.IsDigit s.[i]) do
+ while (isDigit s[i]) do
      let n = int s.[i] - int '0'
This one manifests itself in
([char]0x2126).ToString().Equals( ([char]0x3a9).ToString(), 'CurrentCulture')
# True
([char]0x2126).ToString().Equals( ([char]0x3a9).ToString(), 'InvariantCulture')
# True
New-Item -Type File ([char]0x2126).ToString()
New-Item -Type File ([char]0x3a9).ToString()
ls
Directory: D:\tst
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 14.05.2024 0:26 0 Ω
-a--- 14.05.2024 0:25 0 Ω
It's 2 different files. The only place where Unicode aware comparison (
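The `Char.IsDigit` case from category 3 can be sketched as follows. `isAsciiDigit` is a hypothetical helper standing in for the `isDigit` used in the diff, whose definition is not shown here.

```fsharp
open System

// Hypothetical ASCII-only check; the PR's own isDigit helper may differ.
let isAsciiDigit (c: char) = c >= '0' && c <= '9'

let arabicIndicThree = '\u0663' // ARABIC-INDIC DIGIT THREE

// Char.IsDigit accepts any Unicode decimal digit ...
printfn "%b" (Char.IsDigit(arabicIndicThree)) // true
// ... while the ASCII-only check rejects it.
printfn "%b" (isAsciiDigit arabicIndicThree)  // false
// Subtracting '0' from such a character does not yield a digit value.
printfn "%d" (int arabicIndicThree - int '0') // 1587
```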
I think this is a viable approach.
Formatted printing, CLI argument handling as well as file system ops should be coverable with tests.
Strictly speaking this is a breaking change, but it's for the good.
Reverts #6524 cc @forki.
Part of #12352 (somewhat)
RegexOptions.ECMAScript
Choosing a StringComparison member for your method call
This PR is not comprehensive (there is at least 1 instance of `Char.IsDigit` left).
Commits are logically separated, you can check each commit individually.
Fixed behaviour: