Emit opted-in string literals into data section as UTF8 #76036

Merged
merged 22 commits into from
Jan 15, 2025
18 changes: 18 additions & 0 deletions docs/features/string-literals-data-section.md
@@ -0,0 +1,18 @@
# String literals in data section

This opt-in Roslyn feature allows changing how string literals in C# programs are emitted into PE files (`.dll`/`.exe`).
By default, string literals are emitted to the UserString heap which is limited to [2^24 bytes](https://github.com/dotnet/roslyn/issues/9852).
When the limit is reached, the following compiler error is reported by Roslyn:

```
error CS8103: Combined length of user strings used by the program exceeds allowed limit. Try to decrease use of string literals.
```
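For a rough sense of scale, the sketch below compares how many bytes a literal occupies as UTF-16 (the encoding the UserString heap uses) and as UTF-8. It ignores the small per-string overhead in the heap, and `HeapBudgetSketch` is a hypothetical name used only for illustration, not a Roslyn type.

```csharp
using System.Text;

// Hypothetical helper for back-of-the-envelope estimates; not part of Roslyn.
public static class HeapBudgetSketch
{
    // 2^24 = 16,777,216 bytes, the UserString heap limit mentioned above.
    public const int UserStringHeapLimit = 1 << 24;

    // Payload size of a literal when stored as UTF-16 (2 bytes per char).
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);

    // Payload size of the same literal when stored as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);
}
```

Under these assumptions, the heap holds roughly 8 million UTF-16 characters in total before CS8103 is reported.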

By turning on the feature flag `utf8-string-encoding`, string literals (where possible) are instead emitted as UTF-8 data into a different section of the PE file
which does not have the same limit. The emit format is similar to [explicit u8 string literals](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-11.0/utf8-string-literals).

The feature flag can take a non-negative integer threshold. Only string literals whose length is over the threshold are emitted in the new way described above.
If the flag is specified without a value, the threshold is 100. Specifying 0 means all string literals are considered for the feature. Specifying `off` turns off the feature, which is also the behavior when the flag is not specified at all.

The feature flag can be specified on the command line like `/features:utf8-string-encoding` or `/features:utf8-string-encoding=20`,
or in a project file in a `<PropertyGroup>` like `<Features>$(Features);utf8-string-encoding</Features>` or `<Features>$(Features);utf8-string-encoding=20</Features>`.
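
The mapping from flag value to threshold described above can be sketched as follows. This is an illustration only, not Roslyn's actual parser: `FeatureFlagSketch` and `ParseThreshold` are hypothetical names, and treating unrecognized or negative values as "off" is an assumption that the text above does not state.

```csharp
#nullable enable
using System.Globalization;

// Hypothetical helper, not part of Roslyn.
public static class FeatureFlagSketch
{
    // flagPresent: whether `utf8-string-encoding` appears in the features list at all.
    // value: the text after '=', or null when the flag is given without a value.
    // Returns null when the feature is off, otherwise the length threshold.
    public static int? ParseThreshold(bool flagPresent, string? value)
    {
        if (!flagPresent) return null;    // flag absent: feature is off
        if (value is null) return 100;    // flag without a value: threshold 100
        if (value == "off") return null;  // explicit opt-out

        // NumberStyles.None accepts digits only, so a parsed value is non-negative.
        if (int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out int n))
            return n;

        return null;                      // assumption: anything else is treated as off
    }
}
```

For instance, `/features:utf8-string-encoding=20` would correspond to `ParseThreshold(true, "20")`, giving a threshold of 20.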
2 changes: 1 addition & 1 deletion src/Compilers/CSharp/Portable/CSharpResources.resx
@@ -5309,7 +5309,7 @@ To remove the warning, you can use /reference instead (set the Embed Interop Typ
<value>Syntax tree should be created from a submission.</value>
</data>
<data name="ERR_TooManyUserStrings" xml:space="preserve">
- <value>Combined length of user strings used by the program exceeds allowed limit. Try to decrease use of string literals.</value>
+ <value>Combined length of user strings used by the program exceeds allowed limit. Consider using feature flag 'utf8-string-encoding'.</value>
</data>
<data name="ERR_PatternNullableType" xml:space="preserve">
<value>It is not legal to use nullable type '{0}?' in a pattern; use the underlying type '{0}' instead.</value>
6 changes: 5 additions & 1 deletion src/Compilers/CSharp/Portable/CodeGen/CodeGenerator.cs
@@ -8,6 +8,7 @@
using System.Collections.Immutable;
using System.Diagnostics;
using System.Reflection.Metadata;
using System.Text;
using Microsoft.CodeAnalysis.CodeGen;
using Microsoft.CodeAnalysis.CSharp.Emit;
using Microsoft.CodeAnalysis.CSharp.Symbols;
@@ -32,6 +33,7 @@ internal sealed partial class CodeGenerator
private readonly BindingDiagnosticBag _diagnostics;
private readonly ILEmitStyle _ilEmitStyle;
private readonly bool _emitPdbSequencePoints;
private readonly int? _utf8StringEncodingThreshold;

private readonly HashSet<LocalSymbol> _stackLocals;

@@ -87,7 +89,8 @@ public CodeGenerator(
PEModuleBuilder moduleBuilder,
BindingDiagnosticBag diagnostics,
OptimizationLevel optimizations,
- bool emittingPdb)
+ bool emittingPdb,
+ Compilation compilation)
{
Debug.Assert((object)method != null);
Debug.Assert(boundBody != null);
@@ -101,6 +104,7 @@ public CodeGenerator(
_builder = builder;
_module = moduleBuilder;
_diagnostics = diagnostics;
_utf8StringEncodingThreshold = compilation.Utf8StringEncodingThreshold;

if (!method.GenerateDebugInfo)
{
24 changes: 23 additions & 1 deletion src/Compilers/CSharp/Portable/CodeGen/EmitExpression.cs
@@ -3481,13 +3481,35 @@ private void EmitConstantExpression(TypeSymbol type, ConstantValue constantValue
{
EmitInitObj(type, used, syntaxNode);
}
- else
+ else if (!TryEmitStringLiteralAsUtf8Encoded(constantValue, syntaxNode))
{
_builder.EmitConstantValue(constantValue);
}
}
}

private bool TryEmitStringLiteralAsUtf8Encoded(ConstantValue constantValue, SyntaxNode syntaxNode)
{
// Emit long strings into data section so they don't overflow the UserString heap.
if (constantValue.IsString &&
constantValue.StringValue.Length > _utf8StringEncodingThreshold &&
LocalRewriter.TryGetUtf8ByteRepresentation(constantValue.StringValue, out byte[] utf8Bytes, out _))
{
var data = utf8Bytes.ToImmutableArray();
var field = _builder.module.GetFieldForDataString(data, syntaxNode, _diagnostics.DiagnosticBag);
if (field is null)
{
return false;
}

_builder.EmitOpCode(ILOpCode.Ldsfld);
_builder.EmitToken(field, syntaxNode, _diagnostics.DiagnosticBag);
return true;
}

return false;
}

private void EmitInitObj(TypeSymbol type, bool used, SyntaxNode syntaxNode)
{
if (used)
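
The length check in `TryEmitStringLiteralAsUtf8Encoded` compares an `int` with the nullable `_utf8StringEncodingThreshold`. In C#, a lifted `>` comparison evaluates to `false` whenever an operand is `null`, so a missing threshold leaves every literal on the regular path. A minimal standalone sketch of that gate follows; the `ThresholdGate` class is hypothetical and is not compiler code.

```csharp
// Hypothetical illustration of the length test; not part of Roslyn.
public static class ThresholdGate
{
    // Mirrors only the length comparison. The method in the diff additionally
    // requires that the literal can be encoded as UTF-8.
    public static bool PassesLengthCheck(string literal, int? threshold)
        => literal.Length > threshold; // false when threshold is null
}
```

Because the comparison is strict, a literal whose length equals the threshold is not redirected, and with a threshold of 0 the empty literal still fails the check.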
11 changes: 8 additions & 3 deletions src/Compilers/CSharp/Portable/Compiler/MethodCompiler.cs
@@ -183,6 +183,8 @@ public static void CompileMethodBodies(

methodCompiler.WaitForWorkers();

moduleBeingBuiltOpt.FreezeDataStringHolders(diagnostics.DiagnosticBag);

// all threads that were adding methods must be finished now, we can freeze the class:
var privateImplClass = moduleBeingBuiltOpt.FreezePrivateImplementationDetails();
if (privateImplClass != null)
@@ -671,8 +673,11 @@ private void CompileSynthesizedMethods(PrivateImplementationDetails privateImplC
foreach (Cci.IMethodDefinition definition in privateImplClass.GetMethods(context).Concat(privateImplClass.GetTopLevelAndNestedTypeMethods(context)))
{
var method = (MethodSymbol)definition.GetInternalSymbol();
- Debug.Assert(method.SynthesizesLoweredBoundBody);
- method.GenerateMethodBody(compilationState, diagnostics);
+ if (method is not null)
+ {
+ Debug.Assert(method.SynthesizesLoweredBoundBody);
+ method.GenerateMethodBody(compilationState, diagnostics);
+ }
}

CompileSynthesizedMethods(compilationState);
@@ -1514,7 +1519,7 @@ private static MethodBody GenerateMethodBody(
{
StateMachineMoveNextBodyDebugInfo moveNextBodyDebugInfoOpt = null;

- var codeGen = new CodeGen.CodeGenerator(method, block, builder, moduleBuilder, diagnosticsForThisMethod, optimizations, emittingPdb);
+ var codeGen = new CodeGen.CodeGenerator(method, block, builder, moduleBuilder, diagnosticsForThisMethod, optimizations, emittingPdb, compilation);

if (diagnosticsForThisMethod.HasAnyErrors())
{
1 change: 1 addition & 0 deletions src/Compilers/CSharp/Portable/Errors/MessageProvider.cs
@@ -250,6 +250,7 @@ public override void ReportDuplicateMetadataReferenceWeak(DiagnosticBag diagnost
public override int ERR_EncUpdateFailedMissingSymbol => (int)ErrorCode.ERR_EncUpdateFailedMissingSymbol;
public override int ERR_InvalidDebugInfo => (int)ErrorCode.ERR_InvalidDebugInfo;
public override int ERR_FunctionPointerTypesInAttributeNotSupported => (int)ErrorCode.ERR_FunctionPointerTypesInAttributeNotSupported;
public override int ERR_MissingPredefinedMember => (int)ErrorCode.ERR_MissingPredefinedMember;

// Generators:
public override int WRN_GeneratorFailedDuringInitialization => (int)ErrorCode.WRN_GeneratorFailedDuringInitialization;

@@ -8,6 +8,7 @@
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis.CodeGen;
using Microsoft.CodeAnalysis.CSharp.Emit;
using Microsoft.CodeAnalysis.CSharp.Symbols;
@@ -21,6 +22,8 @@ namespace Microsoft.CodeAnalysis.CSharp
{
internal sealed partial class LocalRewriter : BoundTreeRewriterWithStackGuard
{
private static readonly UTF8Encoding s_utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

private readonly CSharpCompilation _compilation;
private readonly SyntheticBoundNodeFactory _factory;
private readonly SynthesizedSubmissionFields _previousSubmissionFields;

@@ -6,6 +6,7 @@
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Symbols;
using Microsoft.CodeAnalysis.PooledObjects;
@@ -120,20 +121,37 @@ private BoundExpression MakeUtf8Span(BoundExpression node, IReadOnlyList<byte>?
return result;
}

- private byte[]? GetUtf8ByteRepresentation(BoundUtf8String node)
+ internal static bool TryGetUtf8ByteRepresentation(
+ string s,
+ [NotNullWhen(returnValue: true)] out byte[]? result,
+ [NotNullWhen(returnValue: false)] out string? error)
{
- var utf8 = new System.Text.UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

try
{
- return utf8.GetBytes(node.Value);
+ result = s_utf8.GetBytes(s);
+ error = null;
+ return true;
}
catch (Exception ex)
{
result = null;
error = ex.Message;
return false;
}
}

private byte[]? GetUtf8ByteRepresentation(BoundUtf8String node)
{
if (TryGetUtf8ByteRepresentation(node.Value, out byte[]? result, out string? error))
{
return result;
}
else
{
_diagnostics.Add(
ErrorCode.ERR_CannotBeConvertedToUtf8,
node.Syntax.Location,
- ex.Message);
+ error);

return null;
}
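
The strict `UTF8Encoding` configuration used above is what makes the conversion fallible: with `throwOnInvalidBytes: true`, encoding a string that contains an unpaired surrogate throws instead of substituting a replacement character. Below is a standalone sketch of the same try-pattern outside the compiler. `StrictUtf8` and `TryEncode` are illustrative names, and unlike the method in the diff this version catches only `EncoderFallbackException` and does not return the error message.

```csharp
#nullable enable
using System.Diagnostics.CodeAnalysis;
using System.Text;

// Illustrative only; not part of Roslyn.
public static class StrictUtf8
{
    // No BOM on output, and invalid input raises an exception.
    private static readonly UTF8Encoding s_utf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static bool TryEncode(string s, [NotNullWhen(true)] out byte[]? bytes)
    {
        try
        {
            bytes = s_utf8.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // Reached for input such as a lone surrogate ("\uD800").
            bytes = null;
            return false;
        }
    }
}
```

In the diff, a literal that fails this conversion makes `TryEmitStringLiteralAsUtf8Encoded` return `false`, so the literal is emitted through `_builder.EmitConstantValue` as before.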
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.cs.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.de.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.es.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.fr.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.it.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.ja.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.ko.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.pl.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.pt-BR.xlf
4 changes: 2 additions & 2 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.ru.xlf