Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide syntax warnings to Java #10645

Merged
merged 7 commits into from
Jul 24, 2024
Merged

Provide syntax warnings to Java #10645

merged 7 commits into from
Jul 24, 2024

Conversation

kazcw
Copy link
Contributor

@kazcw kazcw commented Jul 24, 2024

Pull Request Description

Translate syntax warnings and attach to IR when translating operator applications.

We should ensure that all Trees are checked for warnings and every warning is attached to some IR. That would require a bit of refactoring: In TreeToIr, we could define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

Important Notes

  • Update JNI dependency.
  • Introduces a cargo bench runner for parser.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@kazcw kazcw changed the title Warnings JNI Provide syntax warnings to Java Jul 24, 2024
@kazcw kazcw self-assigned this Jul 24, 2024
@kazcw kazcw force-pushed the wip/kw/warnings-jni branch from ad3374e to 217001c Compare July 24, 2024 03:05
@kazcw kazcw added the CI: No changelog needed Do not require a changelog entry for this PR. label Jul 24, 2024
@kazcw kazcw force-pushed the wip/kw/warnings-jni branch from 217001c to e6a41e1 Compare July 24, 2024 03:16
@kazcw kazcw force-pushed the wip/kw/warnings-jni branch 2 times, most recently from 6e2abb8 to 57c76d5 Compare July 24, 2024 12:43
@kazcw kazcw force-pushed the wip/kw/warnings-jni branch from 57c76d5 to dd81f5d Compare July 24, 2024 12:51
@kazcw kazcw marked this pull request as ready for review July 24, 2024 12:51
}

#[bench]
fn bench_std_lib(b: &mut test::Bencher) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is a framework for benchmarks in the engine. We currently have benchmarks for the engine compiler, like InlineCOmpilerBenchmark. We don't have benchmarks for parser yet, but they should not be difficult to implement.

Is this benchmark run automatically on the CI, or is it supposed to be run manually? Shouldn't we just add parser benchmarks to the already existing engine runtime-benchmarks infrastructure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have an EnsoParser benchmark to measure the overhead of JNI calls when doing TreeToIr. Probably just a copy of ManyLocalVarsBenchmark that doesn't call compiler, but only TreeToIr.

Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala and Java changes seems fine. I like that Parser now produces Warning diagnostics, like the rest of the runtime compiler.

I am not sure about the usefulness of the benchmark in Rust. I would rather add a new benchmark inside the already existing engine benchmarks infrastructure.

@kazcw
Copy link
Contributor Author

kazcw commented Jul 24, 2024

I am not sure about the usefulness of the benchmark in Rust. I would rather add a new benchmark inside the already existing engine benchmarks infrastructure.

The important benchmarks to watch will be the Java and JS benchmarks, when they exist.

The Rust benchmark is more of a large microbenchmark. When changes in Rust don't affect the parser's output, the difference in performance can be measured more precisely by limiting the scope of measured operations. This is quite useful during parser development and could not be replaced by Java and JS benchmarks.

@JaroslavTulach
Copy link
Member

Trees are checked for warnings and every warning is attached to some IR.

That's going to be quite some work.

define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

The following change might be simpler:

diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 2d103c85c2..a75f6cfcf1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -2022,7 +2022,8 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
+  private DiagnosticStorage diag(Tree tree) {
+    // fill with the warnings from tree
     return DiagnosticStorage.apply(nil());
   }

then it is just about replacing all calls to diag() with diag(aTree).

@@ -1197,6 +1201,14 @@ yield translateFunction(fun, name, isOperator, fun.getArgs(), fun.getBody(),
};
}

private void attachTranslatedWarnings(IR ir, Tree tree) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With following signature

private static <T extends IR> IR attachTranslatedWarnings(T ir, Tree tree);

one can write yield attachTranslatedWarnings(ir, app).

@@ -94,6 +94,8 @@ private Parser(long stateIn) {

private static native long getMetadata(long state);

private static native String getWarningTemplate(int warningId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance of native methods isn't comparable to non-native ones. If we call this method frequently (for each Tree?), we may see slowdowns. The design of the Tree serde was:

  • pass source to parser via native call
  • get byte[] array back
  • deserialize the array via Java-only means
  • never calls native method again

This getWarningMessage (as well as getUuidHigh and getUuidLow) are compromising the design and will become a bottleneck once (when we remove other, bigger bottlenecks).

Copy link
Contributor Author

@kazcw kazcw Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Rust side, I implemented warnings with a table of messages with the idea being we can transfer the table all at once, ideally at build time. This method is an initial unoptimized implementation that was much simpler. The method is called for each warning encountered, so this temporary solution will not have any performance impact when warnings are not numerous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't really related to this PR, but I don't know where else to record these findings. Anyway it illustrates what I meant by

will become a bottleneck once (when we remove other, bigger bottlenecks).

I have just performed a measurement on 2024.3.1-nightly.2024.7.27 and executed:

enso$ ./built-distribution/enso-engine-*/enso-*/bin/enso --run test/Base_Tests/ --profiling-path start.npss Nic

that is a nice benchmark for parser as it needs to parse 118 .enso source files in test/Base_Tests and then it executes almost no Enso code. Following hotspots were identified in start.npss main thread by VisualVM:

HotSpots

There is still five more bottlenecks ahead of Parser.parseTree, but still it takes 1.3s to parse those files...

) -> jstring {
let message =
enso_parser::syntax::WARNINGS.get(id as usize).copied().unwrap_or("Unknown warning ID");
env.new_string(message).unwrap().into_raw()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the set of warning messages finite? Then it might make sense to create a cache on the Java side and avoid too frequent native calls and allocation of new_string with every call.

}

#[bench]
fn bench_std_lib(b: &mut test::Bencher) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have an EnsoParser benchmark to measure the overhead of JNI calls when doing TreeToIr. Probably just a copy of ManyLocalVarsBenchmark that doesn't call compiler, but only TreeToIr.

@kazcw
Copy link
Contributor Author

kazcw commented Jul 24, 2024

Trees are checked for warnings and every warning is attached to some IR.

That's going to be quite some work.

define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

The following change might be simpler:

diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 2d103c85c2..a75f6cfcf1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -2022,7 +2022,8 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
+  private DiagnosticStorage diag(Tree tree) {
+    // fill with the warnings from tree
     return DiagnosticStorage.apply(nil());
   }

then it is just about replacing all calls to diag() with diag(aTree).

Ah yes, I didn't see that the constructors already accept an initial set of diagnostics. That's a good way to do it.

@kazcw
Copy link
Contributor Author

kazcw commented Jul 24, 2024

Trees are checked for warnings and every warning is attached to some IR.

That's going to be quite some work.

define helpers wrapping every IR constructor and accepting a Tree parameter. The Tree could be used to populate the IdentifiedLocation when constructing the IR type, and then to attach all warnings after constructing the IR object.

The following change might be simpler:

diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 2d103c85c2..a75f6cfcf1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -2022,7 +2022,8 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
+  private DiagnosticStorage diag(Tree tree) {
+    // fill with the warnings from tree
     return DiagnosticStorage.apply(nil());
   }

then it is just about replacing all calls to diag() with diag(aTree).

I tried to implement this and encountered a problem because of the circular reference in diagnostics: Diagnostics are to be attached to the IR type, but the diagnostic types require an IR argument to construct. I don't think it's currently possible to populate the DiagnosticStorage when it is created.

@kazcw kazcw added the CI: Ready to merge This PR is eligible for automatic merge label Jul 24, 2024
@mergify mergify bot merged commit 8b48637 into develop Jul 24, 2024
42 checks passed
@mergify mergify bot deleted the wip/kw/warnings-jni branch July 24, 2024 17:54
@JaroslavTulach
Copy link
Member

I tried to implement this and encountered a problem because of the circular reference in diagnostics: Diagnostics are to be attached to the IR type, but the diagnostic types require an IR argument to construct. I don't think it's currently possible to populate the DiagnosticStorage when it is created.

The only reason why Warning.Syntax requires IR is because the PR put it there:

case class Syntax(ir: IR, message: String) extends Warning {

let's just remove it and we'll be fine. This is a skech that seems to work for me:

diff --git engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
index 5695743d9c..097839a5c1 100644
--- engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
+++ engine/runtime-parser/src/main/java/org/enso/compiler/core/TreeToIr.java
@@ -104,8 +104,9 @@ final class TreeToIr {
                 methodReference = translateExpression(sig.getVariable());
               }
               var signature = translateType(sig.getType());
+              var where = getIdentifiedLocation(sig);
               var ascription = new Type.Ascription(methodReference, signature, Option.empty(),
-                  getIdentifiedLocation(sig), meta(), diag());
+                  where, meta(), diag(where, sig));
               yield ascription;
             }
             default -> translateExpression(exprTree);
@@ -204,13 +205,16 @@ final class TreeToIr {
         yield new Module(imports.reverse(), exports.reverse(), bindings.reverse(), isPrivate,
             getIdentifiedLocation(module), meta(), DiagnosticStorage.apply(diag));
       }
-      default -> new Module(
+      default -> {
+        var where = getIdentifiedLocation(module);
+        yield new Module(
           nil(), nil(),
           join(translateSyntaxError(module, new Syntax.UnsupportedSyntax("translateModule")),
               nil()),
           false,
-          getIdentifiedLocation(module), meta(), diag()
+          where, meta(), diag(where, module)
       );
+      }
     };
   }
 
@@ -1102,7 +1106,7 @@ final class TreeToIr {
           if (branch.getDocumentation() != null) {
             var comment = translateComment(cas, branch.getDocumentation());
             var loc = getIdentifiedLocation(cas);
-            var doc = new Pattern.Documentation(comment.doc(), loc, meta(), diag());
+            var doc = new Pattern.Documentation(comment.doc(), loc, meta(), diag(loc, comment));
             var br = new Case.Branch(
                 doc,
                 new Empty(Option.empty(), meta(), diag()),
@@ -2034,8 +2038,15 @@ final class TreeToIr {
     return new MetadataStorage();
   }
 
-  private DiagnosticStorage diag() {
-    return DiagnosticStorage.apply(nil());
+  private static DiagnosticStorage diag(Tree tree) {
+    var where = getIdentifiedLocation(tree);
+    return diag(where, tree);
+  }
+
+  private static DiagnosticStorage diag(Option<IdentifiedLocation> where, Tree tree) {
+    var warnings = tree.getWarnings().stream();
+    var diags = warnings.map(warning -> new Warning.Syntax(where, Parser.getWarningMessage(warning)));
+    return DiagnosticStorage.apply(diags);
   }
 
   private static int castToInt(long presumablyInt) {
diff --git engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala
index 90cb0df6dd..864495eac5 100644
--- engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala
+++ engine/runtime-parser/src/main/scala/org/enso/compiler/core/ir/Warning.scala
@@ -144,16 +144,13 @@ object Warning {
     override def diagnosticKeys(): Array[Any] = Array(ir.name)
   }
 
-  case class Syntax(ir: IR, message: String) extends Warning {
+  case class Syntax(val location: Option[IdentifiedLocation], message: String) extends Warning {
 
     /** @return a human-readable description of this error condition.
       */
     override def message(source: (IdentifiedLocation => String)): String =
       message
 
-    /** The location at which the diagnostic occurs. */
-    override val location: Option[IdentifiedLocation] = ir.location
-
     /** The important keys identifying identity of the diagnostic
       */
     override def diagnosticKeys(): Array[Any] = Array()

in order to share IdentifiedLocation I am creating another diag(location, tree) method and refactoring the code to share the instance. Looks to me we may need to create an empty diag in certain situations as well because some IR elements aren't directly tight to a Tree element. Of course, the amount of remaining locations that need to be changed is overwhelming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants