Provide syntax warnings to Java #10645

kazcw · 2024-07-24T02:39:02Z

Pull Request Description

Translate syntax warnings and attach to IR when translating operator applications.

We should ensure that all Trees are checked for warnings and every warning is attached to some IR. That would require a bit of refactoring: In TreeToIr, we could define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

Important Notes

Update JNI dependency.
Introduces a cargo bench runner for parser.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
Unit tests have been written where possible.

Akirathan · 2024-07-24T13:39:29Z

lib/rust/parser/debug/src/bin/bench_parse.rs

+}
+
+#[bench]
+fn bench_std_lib(b: &mut test::Bencher) {


Note that there is a framework for benchmarks in the engine. We currently have benchmarks for the engine compiler, like InlineCOmpilerBenchmark. We don't have benchmarks for parser yet, but they should not be difficult to implement.

Is this benchmark run automatically on the CI, or is it supposed to be run manually? Shouldn't we just add parser benchmarks to the already existing engine runtime-benchmarks infrastructure?

It would be great to have an EnsoParser benchmark to measure the overhead of JNI calls when doing TreeToIr. Probably just a copy of ManyLocalVarsBenchmark that doesn't call compiler, but only TreeToIr.

Akirathan

Scala and Java changes seems fine. I like that Parser now produces Warning diagnostics, like the rest of the runtime compiler.

I am not sure about the usefulness of the benchmark in Rust. I would rather add a new benchmark inside the already existing engine benchmarks infrastructure.

kazcw · 2024-07-24T13:50:52Z

I am not sure about the usefulness of the benchmark in Rust. I would rather add a new benchmark inside the already existing engine benchmarks infrastructure.

The important benchmarks to watch will be the Java and JS benchmarks, when they exist.

The Rust benchmark is more of a large microbenchmark. When changes in Rust don't affect the parser's output, the difference in performance can be measured more precisely by limiting the scope of measured operations. This is quite useful during parser development and could not be replaced by Java and JS benchmarks.

JaroslavTulach · 2024-07-24T15:07:25Z

Trees are checked for warnings and every warning is attached to some IR.

That's going to be quite some work.

define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

The following change might be simpler:

diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 2d103c85c2..a75f6cfcf1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -2022,7 +2022,8 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
+  private DiagnosticStorage diag(Tree tree) {
+    // fill with the warnings from tree
     return DiagnosticStorage.apply(nil());
   }

then it is just about replacing all calls to diag() with diag(aTree).

engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java

JaroslavTulach · 2024-07-24T14:44:52Z

engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java

@@ -1197,6 +1201,14 @@ yield translateFunction(fun, name, isOperator, fun.getArgs(), fun.getBody(),
    };
  }

+  private void attachTranslatedWarnings(IR ir, Tree tree) {


With following signature

private static <T extends IR> IR attachTranslatedWarnings(T ir, Tree tree);

one can write yield attachTranslatedWarnings(ir, app).

JaroslavTulach · 2024-07-24T14:52:38Z

lib/rust/parser/generate-java/java/org/enso/syntax2/Parser.java

@@ -94,6 +94,8 @@ private Parser(long stateIn) {

  private static native long getMetadata(long state);

+  private static native String getWarningTemplate(int warningId);


The performance of native methods isn't comparable to non-native ones. If we call this method frequently (for each Tree?), we may see slowdowns. The design of the Tree serde was:

pass source to parser via native call

get byte[] array back

deserialize the array via Java-only means

never calls native method again

This getWarningMessage (as well as getUuidHigh and getUuidLow) are compromising the design and will become a bottleneck once (when we remove other, bigger bottlenecks).

On the Rust side, I implemented warnings with a table of messages with the idea being we can transfer the table all at once, ideally at build time. This method is an initial unoptimized implementation that was much simpler. The method is called for each warning encountered, so this temporary solution will not have any performance impact when warnings are not numerous.

This comment isn't really related to this PR, but I don't know where else to record these findings. Anyway it illustrates what I meant by

will become a bottleneck once (when we remove other, bigger bottlenecks).

I have just performed a measurement on 2024.3.1-nightly.2024.7.27 and executed:

enso$ ./built-distribution/enso-engine-*/enso-*/bin/enso --run test/Base_Tests/ --profiling-path start.npss Nic

that is a nice benchmark for parser as it needs to parse 118 .enso source files in test/Base_Tests and then it executes almost no Enso code. Following hotspots were identified in start.npss main thread by VisualVM:

There is still five more bottlenecks ahead of Parser.parseTree, but still it takes 1.3s to parse those files...

JaroslavTulach · 2024-07-24T14:55:01Z

lib/rust/parser/jni/src/lib.rs

+) -> jstring {
+    let message =
+        enso_parser::syntax::WARNINGS.get(id as usize).copied().unwrap_or("Unknown warning ID");
+    env.new_string(message).unwrap().into_raw()


Is the set of warning messages finite? Then it might make sense to create a cache on the Java side and avoid too frequent native calls and allocation of new_string with every call.

JaroslavTulach · 2024-07-24T15:00:25Z

lib/rust/parser/debug/src/bin/bench_parse.rs

+}
+
+#[bench]
+fn bench_std_lib(b: &mut test::Bencher) {


It would be great to have an EnsoParser benchmark to measure the overhead of JNI calls when doing TreeToIr. Probably just a copy of ManyLocalVarsBenchmark that doesn't call compiler, but only TreeToIr.

kazcw · 2024-07-24T15:42:12Z

Trees are checked for warnings and every warning is attached to some IR.

That's going to be quite some work.

define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

The following change might be simpler:
diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 2d103c85c2..a75f6cfcf1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -2022,7 +2022,8 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
+  private DiagnosticStorage diag(Tree tree) {
+    // fill with the warnings from tree
     return DiagnosticStorage.apply(nil());
   }
then it is just about replacing all calls to diag() with diag(aTree).

Ah yes, I didn't see that the constructors already accept an initial set of diagnostics. That's a good way to do it.

This reverts commit fb2ef2f.

This reverts commit 32e0df5.

kazcw · 2024-07-24T16:38:15Z

Trees are checked for warnings and every warning is attached to some IR.

That's going to be quite some work.

define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

The following change might be simpler:
diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 2d103c85c2..a75f6cfcf1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -2022,7 +2022,8 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
+  private DiagnosticStorage diag(Tree tree) {
+    // fill with the warnings from tree
     return DiagnosticStorage.apply(nil());
   }
then it is just about replacing all calls to diag() with diag(aTree).

I tried to implement this and encountered a problem because of the circular reference in diagnostics: Diagnostics are to be attached to the IR type, but the diagnostic types require an IR argument to construct. I don't think it's currently possible to populate the DiagnosticStorage when it is created.

JaroslavTulach · 2024-07-25T03:59:37Z

I tried to implement this and encountered a problem because of the circular reference in diagnostics: Diagnostics are to be attached to the IR type, but the diagnostic types require an IR argument to construct. I don't think it's currently possible to populate the DiagnosticStorage when it is created.

The only reason why Warning.Syntax requires IR is because the PR put it there:

case class Syntax(ir: IR, message: String) extends Warning {

let's just remove it and we'll be fine. This is a skech that seems to work for me:

diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 5695743d9c..097839a5c1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -104,8 +104,9 @@ final class TreeToIr {
                 methodReference = translateExpression(sig.getVariable());
               }
               var signature = translateType(sig.getType());
+              var where = getIdentifiedLocation(sig);
               var ascription = new Type.Ascription(methodReference, signature, Option.empty(),
-                  getIdentifiedLocation(sig), meta(), diag());
+                  where, meta(), diag(where, sig));
               yield ascription;
             }
             default -> translateExpression(exprTree);
@@ -204,13 +205,16 @@ final class TreeToIr {
         yield new Module(imports.reverse(), exports.reverse(), bindings.reverse(), isPrivate,
             getIdentifiedLocation(module), meta(), DiagnosticStorage.apply(diag));
       }
-      default -> new Module(
+      default -> {
+        var where = getIdentifiedLocation(module);
+        yield new Module(
           nil(), nil(),
           join(translateSyntaxError(module, new Syntax.UnsupportedSyntax("translateModule")),
               nil()),
           false,
-          getIdentifiedLocation(module), meta(), diag()
+          where, meta(), diag(where, module)
       );
+      }
     };
   }
 
@@ -1102,7 +1106,7 @@ final class TreeToIr {
           if (branch.getDocumentation() != null) {
             var comment = translateComment(cas, branch.getDocumentation());
             var loc = getIdentifiedLocation(cas);
-            var doc = new Pattern.Documentation(comment.doc(), loc, meta(), diag());
+            var doc = new Pattern.Documentation(comment.doc(), loc, meta(), diag(loc, comment));
             var br = new Case.Branch(
                 doc,
                 new Empty(Option.empty(), meta(), diag()),
@@ -2034,8 +2038,15 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
-    return DiagnosticStorage.apply(nil());
+  private static DiagnosticStorage diag(Tree tree) {
+    var where = getIdentifiedLocation(tree);
+    return diag(where, tree);
+  }
+
+  private static DiagnosticStorage diag(Option<IdentifiedLocation> where, Tree tree) {
+    var warnings = tree.getWarnings().stream();
+    var diags = warnings.map(warning -> new Warning.Syntax(where, Parser.getWarningMessage(warning)));
+    return DiagnosticStorage.apply(diags);
   }
 
   private static int castToInt(long presumablyInt) {
diff --git engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala
index 90cb0df6dd..864495eac5 100644
--- engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala
+++ engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala
@@ -144,16 +144,13 @@ object Warning {
     override def diagnosticKeys(): Array[Any] = Array(ir.name)
   }
 
-  case class Syntax(ir: IR, message: String) extends Warning {
+  case class Syntax(val location: Option[IdentifiedLocation], message: String) extends Warning {
 
     /** @return a human-readable description of this error condition.
       */
     override def message(source: (IdentifiedLocation => String)): String =
       message
 
-    /** The location at which the diagnostic occurs. */
-    override val location: Option[IdentifiedLocation] = ir.location
-
     /** The important keys identifying identity of the diagnostic
       */
     override def diagnosticKeys(): Array[Any] = Array()

in order to share IdentifiedLocation I am creating another diag(location, tree) method and refactoring the code to share the instance. Looks to me we may need to create an empty diag in certain situations as well because some IR elements aren't directly tight to a Tree element. Of course, the amount of remaining locations that need to be changed is overwhelming.

kazcw changed the title ~~Warnings JNI~~ Provide syntax warnings to Java Jul 24, 2024

kazcw self-assigned this Jul 24, 2024

kazcw force-pushed the wip/kw/warnings-jni branch from ad3374e to 217001c Compare July 24, 2024 03:05

kazcw added the CI: No changelog needed Do not require a changelog entry for this PR. label Jul 24, 2024

kazcw force-pushed the wip/kw/warnings-jni branch from 217001c to e6a41e1 Compare July 24, 2024 03:16

kazcw added 2 commits July 24, 2024 12:42

Update JNI dependency.

fdce181

Warnings JNI

50149bd

kazcw force-pushed the wip/kw/warnings-jni branch 2 times, most recently from 6e2abb8 to 57c76d5 Compare July 24, 2024 12:43

Cargo bench

dd81f5d

kazcw force-pushed the wip/kw/warnings-jni branch from 57c76d5 to dd81f5d Compare July 24, 2024 12:51

kazcw marked this pull request as ready for review July 24, 2024 12:51

kazcw requested review from 4e6, JaroslavTulach, hubertp, Akirathan, farmaazon, vitvakatu, Frizi and AdRiley as code owners July 24, 2024 12:51

Akirathan reviewed Jul 24, 2024

View reviewed changes

Akirathan approved these changes Jul 24, 2024

View reviewed changes

JaroslavTulach approved these changes Jul 24, 2024

View reviewed changes

kazcw added 4 commits July 24, 2024 16:03

Apply @JaroslavTulach suggestion

32e0df5

Fix

fb2ef2f

Revert "Fix"

3dc45ff

This reverts commit fb2ef2f.

Revert "Apply @JaroslavTulach suggestion"

9d7de7f

This reverts commit 32e0df5.

kazcw added the CI: Ready to merge This PR is eligible for automatic merge label Jul 24, 2024

mergify bot merged commit 8b48637 into develop Jul 24, 2024
42 checks passed

mergify bot deleted the wip/kw/warnings-jni branch July 24, 2024 17:54

enso-bot bot mentioned this pull request Jul 25, 2024

Adding return type to signature causes Don't use interop for EnsoObject error #10613

Closed

enso-bot bot mentioned this pull request Jul 30, 2024

Jobs appear to be stuck on Base_Tests #10643

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide syntax warnings to Java #10645

Provide syntax warnings to Java #10645

kazcw commented Jul 24, 2024 •

edited

Loading

Akirathan Jul 24, 2024

JaroslavTulach Jul 24, 2024

Akirathan left a comment

kazcw commented Jul 24, 2024 •

edited

Loading

JaroslavTulach commented Jul 24, 2024

JaroslavTulach Jul 24, 2024

JaroslavTulach Jul 24, 2024

kazcw Jul 24, 2024 •

edited

Loading

JaroslavTulach Jul 25, 2024

JaroslavTulach Jul 29, 2024

JaroslavTulach Jul 24, 2024

JaroslavTulach Jul 24, 2024

kazcw commented Jul 24, 2024

kazcw commented Jul 24, 2024

JaroslavTulach commented Jul 25, 2024

		@@ -94,6 +94,8 @@ private Parser(long stateIn) {

		private static native long getMetadata(long state);

		private static native String getWarningTemplate(int warningId);

Provide syntax warnings to Java #10645

Provide syntax warnings to Java #10645

Conversation

kazcw commented Jul 24, 2024 • edited Loading

Pull Request Description

Important Notes

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Akirathan left a comment

Choose a reason for hiding this comment

kazcw commented Jul 24, 2024 • edited Loading

JaroslavTulach commented Jul 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kazcw Jul 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kazcw commented Jul 24, 2024

kazcw commented Jul 24, 2024

JaroslavTulach commented Jul 25, 2024

kazcw commented Jul 24, 2024 •

edited

Loading

kazcw commented Jul 24, 2024 •

edited

Loading

kazcw Jul 24, 2024 •

edited

Loading