Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make tests more robust #1588

Merged
merged 1 commit into from
Aug 28, 2024
Merged

make tests more robust #1588

merged 1 commit into from
Aug 28, 2024

Conversation

rbri
Copy link
Collaborator

@rbri rbri commented Aug 25, 2024

at least in my win environment, these tests are failing when running from the command line with jdk11.

@andreabergia i guess you are the original author of the test. Please have a look if this is still ok.

@p-bakker
Copy link
Collaborator

Might the reason why it fails be due to using Java 11, which supports an older version of the Unicode standard than which the latest EcmaScript spec demands?

In the test262 sure there are also a couple of unicode-related tests that fail on just Java 11 die to this issue

As both Java and EcmaScript track the latest unicode version, older Java versions only support older Unicode versions and as Unicode support in Rhino depends on the Unicode support of the Java version on which Rhino is run, even if all unicode logic in Rhino is according to spec, on an older version of Java Rhino may behave in an incompatible way

@andreabergia
Copy link
Contributor

Looks good to me, although it should not be necessary - the java source files are utf8, and in the build.gradle file for tests there is:

    systemProperty 'file.encoding', 'UTF-8'

So... the real question is why this it failing for you, locally, without this change?

@rbri
Copy link
Collaborator Author

rbri commented Aug 25, 2024

So... the real question is why this it failing for you, locally, without this change?

Agree, but at the moment i have no idea why this fails, and my gradle knowledge is not that good... my assumption was to simple have it like the other tests

@tonygermano
Copy link
Contributor

tonygermano commented Aug 27, 2024

I ran the test with

./gradlew :tests:test --tests org.mozilla.javascript.tests.ParserTest

On java 11, testParseUnicodeMultibyteCharacter passed. I did get a failure on testParseUnicodeIdentifierPartWhichIsNotJavaIdentifierPart with the following stack trace.

java.lang.AssertionError: extra error: illegal character: 鿫
	at org.junit.Assert.fail(Assert.java:89)
	at org.mozilla.javascript.testing.TestErrorReporter.error(TestErrorReporter.java:42)
	at org.mozilla.javascript.Parser.addError(Parser.java:241)
	at org.mozilla.javascript.Parser.addError(Parser.java:227)
	at org.mozilla.javascript.Parser.addError(Parser.java:232)
	at org.mozilla.javascript.TokenStream.getToken(TokenStream.java:1357)
	at org.mozilla.javascript.Parser.peekToken(Parser.java:380)
	at org.mozilla.javascript.Parser.name(Parser.java:3220)
	at org.mozilla.javascript.Parser.primaryExpr(Parser.java:3117)
	at org.mozilla.javascript.Parser.memberExpr(Parser.java:2722)
	at org.mozilla.javascript.Parser.unaryExpr(Parser.java:2625)
	at org.mozilla.javascript.Parser.expExpr(Parser.java:2545)
	at org.mozilla.javascript.Parser.mulExpr(Parser.java:2528)
	at org.mozilla.javascript.Parser.addExpr(Parser.java:2514)
	at org.mozilla.javascript.Parser.shiftExpr(Parser.java:2497)
	at org.mozilla.javascript.Parser.relExpr(Parser.java:2475)
	at org.mozilla.javascript.Parser.eqExpr(Parser.java:2451)
	at org.mozilla.javascript.Parser.bitAndExpr(Parser.java:2442)
	at org.mozilla.javascript.Parser.bitXorExpr(Parser.java:2433)
	at org.mozilla.javascript.Parser.bitOrExpr(Parser.java:2424)
	at org.mozilla.javascript.Parser.andExpr(Parser.java:2415)
	at org.mozilla.javascript.Parser.orExpr(Parser.java:2406)
	at org.mozilla.javascript.Parser.condExpr(Parser.java:2373)
	at org.mozilla.javascript.Parser.assignExpr(Parser.java:2331)
	at org.mozilla.javascript.Parser.expr(Parser.java:2310)
	at org.mozilla.javascript.Parser.nameOrLabel(Parser.java:2068)
	at org.mozilla.javascript.Parser.statementHelper(Parser.java:1258)
	at org.mozilla.javascript.Parser.statement(Parser.java:1122)
	at org.mozilla.javascript.Parser.parse(Parser.java:616)
	at org.mozilla.javascript.Parser.parse(Parser.java:546)
	at org.mozilla.javascript.tests.ParserTest.parse(ParserTest.java:1462)
	at org.mozilla.javascript.tests.ParserTest.parse(ParserTest.java:1429)
	at org.mozilla.javascript.tests.ParserTest.parse(ParserTest.java:1424)
	at org.mozilla.javascript.tests.ParserTest.parse(ParserTest.java:1420)
	at org.mozilla.javascript.tests.ParserTest.testParseUnicodeIdentifierPartWhichIsNotJavaIdentifierPart(ParserTest.java:1216)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)

I switched to java 17 and reran the test and everything passed.

Likewise, in the shell, java 11:

js> var a鿫='hi'
js: "<stdin>", line 4: illegal character: 
js: var a鿫='hi'
js: .....^
js: "<stdin>", line 4: Compilation produced 1 syntax errors.

java 17:

js> var a鿫='hi'
js> print(a鿫)
hi

@p-bakker
Copy link
Collaborator

Isn't this because Java 11 not recognizing the astral plane character as such, thus returning only the first part of the surrogate pair, which in turn is an invalid character?

@andreabergia
Copy link
Contributor

That passes locally for me on mac os (and I assume on linux, in the github actions CI).
So, that seems a problem with the Windows JDK?

Regardless, the changes in this PR look good to me - the in-memory String objects have the same value regardless of how we write them in the source code (I checked).
So, LGTM 👍

@tonygermano
Copy link
Contributor

tonygermano commented Aug 27, 2024

I'm running linux for my gradle tests and rhino shell.

I think the test is good and the issue is in the parser. The test that is failing does not use a surrogate pair. The one with the surrogate pair is the one that is passing.

This is run with the same java 11 jvm that failed the test and threw the exception.

js> '鿫' === '\u9FEB'
true
js> '𠮷' === '\uD842\uDFB7'
true
js> =1
js: "<stdin>", line 4: illegal character: 
js: =1
js: ^
js: "<stdin>", line 4: Compilation produced 1 syntax errors.

js> 𠮷=1
1

The problem occurs when the particular unicode character appears as part of an identifier, but it's fine in a string.

@tonygermano
Copy link
Contributor

I upgraded from java 11.0.15 to 11.0.24 and everything works now.

@tonygermano
Copy link
Contributor

java 11.0.15

js> var unicodeString = new java.lang.String('鿫')
js> unicodeString.codePointAt(0)
40939
js> java.lang.Character.isUnicodeIdentifierStart(unicodeString.codePointAt(0))
false
js> java.lang.Character.isUnicodeIdentifierPart(unicodeString.codePointAt(0))
false

java 11.0.24

js> var unicodeString = new java.lang.String('鿫')
js> unicodeString.codePointAt(0)
40939
js> java.lang.Character.isUnicodeIdentifierStart(unicodeString.codePointAt(0))
true
js> java.lang.Character.isUnicodeIdentifierPart(unicodeString.codePointAt(0))
true

@rbri
Copy link
Collaborator Author

rbri commented Aug 27, 2024

@tonygermano this means we do not need ro change the test?

@tonygermano
Copy link
Contributor

@tonygermano this means we do not need ro change the test?

For me the test failed on an older java 11, but passed when I ran it on a current java 11. I would say that means we don't need to change it, but I'll defer to others on this.

@gbrail
Copy link
Collaborator

gbrail commented Aug 28, 2024

On Windows, with the latest JDK 11, the tests failed before and passed with this change. Plus, I agree that putting in the literal code points is more portable anyway. Thanks!

@gbrail gbrail merged commit c0936ca into mozilla:master Aug 28, 2024
3 checks passed
@tonygermano
Copy link
Contributor

@gbrail I'm curious what the failure message was when running on the latest windows jdk 11. Was it different than the error I shared? What java distro were you using? I was testing with Azul Zulu.

For me there was definitely something broken in the rhino shell related to identifier names using unicode characters depending on which java version I was using. I think I would expect there to be a failing test to go along with that when the issue is java related, or we probably don't have full test coverage.

@gbrail
Copy link
Collaborator

gbrail commented Aug 28, 2024 via email

@gbrail
Copy link
Collaborator

gbrail commented Aug 28, 2024 via email

@tonygermano
Copy link
Contributor

Interesting... so different than the error I was getting in the older linux jdk 11. Your error looks like an issue with character encoding (I think it's probably reading the source file as windows-1252 instead of UTF-8.)

@tonygermano
Copy link
Contributor

js> new java.lang.String(new java.lang.String('é¿«').getBytes('windows-1252'), 'UTF-8')

@tonygermano
Copy link
Contributor

@gbrail If you revert the change and add systemProp.file.encoding=UTF-8 to gradle.properties does it still fail?

@tonygermano
Copy link
Contributor

tonygermano commented Aug 28, 2024

Ignore that last request. I got my linux jvm to fail in the same way as your windows one by adding this

diff --git a/buildSrc/src/main/groovy/rhino.java-conventions.gradle b/buildSrc/src/main/groovy/rhino.java-conventions.gradle
index 9e6d3b49..54587f4f 100644
--- a/buildSrc/src/main/groovy/rhino.java-conventions.gradle
+++ b/buildSrc/src/main/groovy/rhino.java-conventions.gradle
@@ -17,6 +17,10 @@ dependencies {
     testImplementation "javax.xml.soap:javax.xml.soap-api:1.4.0"
 }
 
+tasks.withType(JavaCompile) {
+    options.encoding = "windows-1252"
+}
+
 compileJava {
     options.compilerArgs = [
         '-Xlint:deprecation,unchecked'

So, I think the solution to allowing unicode non-ASCII characters in the source files would be to make that same change, but force it to "UTF-8" instead of "windows-1252." The tasks.withType(JavaCompile) takes care of both the compileJava and compileTestJava tasks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants