[java-source-utils] Add Java source code utilities #623

Merged: 1 commit, Jul 31, 2020

@jonpryor (Member) commented Apr 10, 2020

Context: https://github.com/javaparser/javaparser/tree/javaparser-parent-3.16.1

There are two parts of the current .jar binding toolchain which are
painful and could be improved:

  1. Parameter names
  2. Documentation extraction

Parameter names (1) are important because they become the names of
event members as part of "event-ification". As such they are
semantically important, and the default behavior of "p0" makes for a
terrible user experience.

If the .class files in the .jar file are built with
javac -parameters (4273e5c), then the .class file will contain
parameter names and we're good. However, this may not be the case.
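
For context, the effect of `javac -parameters` can be observed from Java itself: it emits a `MethodParameters` attribute, and `java.lang.reflect.Parameter.isNamePresent()` reports whether that attribute supplied a name; otherwise `getName()` synthesizes `arg0`, `arg1`, and so on. This is only an illustrative sketch, not how class-parse reads `.class` files; `Listener` and `ParameterNameCheck` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Hypothetical sample type: compile once with and once without `javac -parameters`.
class Listener {
    public void onClick(int viewId, boolean longPress) {}
}

class ParameterNameCheck {
    // Describes each parameter name and whether it came from the class file
    // (MethodParameters attribute) or was synthesized by reflection as "argN".
    static String describe(Method m) {
        StringBuilder sb = new StringBuilder();
        for (Parameter p : m.getParameters()) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(p.getName())
              .append(p.isNamePresent() ? " (from class file)" : " (synthesized)");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method m = Listener.class.getMethod("onClick", int.class, boolean.class);
        // With -parameters:    viewId (from class file), longPress (from class file)
        // Without -parameters: arg0 (synthesized), arg1 (synthesized)
        System.out.println(describe(m));
    }
}
```

Note that `javac -g` debug info does not change this result: reflection only consults `MethodParameters`.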

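If the .class files are built with javac -g, then we'll try to
deduce parameter names from debug info, but that's also problematic;
for example, that debug info (the LocalVariableTable) only exists for
methods with bodies, so abstract and interface methods get no names
from it.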

What else can be used to provide parameter names?

It is not unusual for Java libraries to provide "source .jar" files,
e.g. Android provides android-stubs-src.jar files, and other
libraries may provide a *-sources.jar file. The contents of these
files are Java source code. These files are used by Android IDEs to
provide documentation for the Java library. They contain classes,
methods, parameter names, and associated Javadoc documentation.

What they are not guaranteed to do is compile. As such, we can't
compile them ourselves with javac -parameters and then process the
.class files, as they may refer to unresolvable types.

"Interestingly", we already have some tooling to deal with this:
tools/param-name-importer uses a custom Irony grammar to parse
the Android SDK *-stubs-src.jar files to grab parameter names.
However, this tooling is too strict; try to pass arbitrary Java
source code to it, and it quickly fails.

Which brings us to documentation (2): we have a javadoc2mdoc tool
which will parse Javadoc HTML documentation and convert it into
mdoc(5) documentation, which can later be turned into
XML documentation comments files by way of
mdoc export-msxdoc(1), but this tool is (1) painful to
maintain, because it processes Javadoc HTML, and
(2) requires Javadoc HTML.

Google hasn't updated their downloadable Javadoc .zip file since
API-24 (2016-October). API-30 is currently stable.

If we want newer docs, we either need to scrape the
developer.android.com/reference website to use with the existing
tooling, or... we need to be able to read the Javadoc comments within
the *-stubs-src.jar files provided with the Android SDK.
(Note: Android SDK docs are Apache 2; file format conversion is fine.)

We thus have two use-cases for which parsing Java source code would
be useful.

As luck would have it, there's a decent Apache 2-licensed Java project
which supports parsing Java source code: JavaParser.

Add a new tools/java-source-utils program which will parse Java
source code to produce two artifacts: parameter names and
consolidated Javadoc documentation:

    $ java -jar java-source-utils.jar --help
    java-source-utils [-v] [<-a|--aar> AAR]* [<-j|--jar> JAR]* [<-s|--source> DIRS]*
            [--bootclasspath CLASSPATH]
            [<-P|--output-params> OUT.params.txt] [<-D|--output-javadoc> OUT.xml] FILES

Provide --output-params OUT.params.txt to create the specified file,
which follows the file format laid out in
JavaParameterNamesLoader.cs:

    package java.lang
    ;---------------------------------------
      class Object
        wait(long timeout)
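
A minimal sketch of emitting the `.params.txt` layout shown above. It assumes only what the sample shows (a `package` line, the `;---` separator, two-space-indented `class` lines, and four-space-indented method lines); the real loader in `JavaParameterNamesLoader.cs` may accept more (e.g. interfaces or nested types), and `ParamsTxtWriter` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParamsTxtWriter {
    // Separator copied from the sample output above.
    static final String SEPARATOR = ";---------------------------------------";

    // Formats one method line, e.g. "    wait(long timeout)".
    static String method(String name, List<String> types, List<String> names) {
        if (types.size() != names.size()) {
            throw new IllegalArgumentException("types and names differ in length");
        }
        StringBuilder sb = new StringBuilder("    ").append(name).append('(');
        for (int i = 0; i < types.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(types.get(i)).append(' ').append(names.get(i));
        }
        return sb.append(')').toString();
    }

    // Emits one package: header, separator, then each class with its method lines.
    static String write(String pkg, Map<String, List<String>> methodsByClass) {
        StringBuilder sb = new StringBuilder();
        sb.append("package ").append(pkg).append('\n')
          .append(SEPARATOR).append('\n');
        for (Map.Entry<String, List<String>> e : methodsByClass.entrySet()) {
            sb.append("  class ").append(e.getKey()).append('\n');
            for (String line : e.getValue()) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps classes in insertion (declaration) order.
        Map<String, List<String>> classes = new LinkedHashMap<>();
        classes.put("Object", List.of(method("wait", List.of("long"), List.of("timeout"))));
        System.out.print(write("java.lang", classes));
    }
}
```

Running `main` reproduces the four-line `java.lang.Object` sample.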

Provide --output-javadoc OUT.xml, and the resulting file will be a
class-parse-like XML file which uses //@jni-signature as the "key"
and a child <javadoc/> element to contain documentation, e.g.:

    <api api-source="java-source-utils">
      <package name="java.lang">
        <class name="Object" jni-signature="Ljava/lang/Object;">
          <javadoc>…</javadoc>
          <constructor jni-signature="()V">
            <javadoc>…</javadoc>
          </constructor>
          <method name="wait" jni-signature="(J)V" jni-returns="V" returns="void">
            <parameter name="timeout" jni-type="J" type="long" />
            <javadoc>…</javadoc>
          </method>
        </class>
      </package>
    </api>
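
The `jni-signature` values above use the JVM's field and method descriptor syntax (`J` for `long`, `V` for `void`, `Lpkg/Type;` for classes, `[` for arrays). A sketch of that mapping, assuming the input types are already resolved to fully qualified, erased names with nested types written using `$`; resolving names from source (imports, generics) is the hard part a symbol resolver has to handle and is not shown. `JniDescriptors` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

final class JniDescriptors {
    private static final Map<String, String> PRIMITIVES = Map.of(
        "boolean", "Z", "byte", "B", "char", "C", "short", "S", "int", "I",
        "long", "J", "float", "F", "double", "D", "void", "V");

    // Maps a Java type name to its JNI/JVM type descriptor.
    // Assumes reference types are fully qualified and erased, with nested types
    // already written using '$' (e.g. "android.view.View$OnClickListener").
    static String type(String javaType) {
        if (javaType.endsWith("[]")) {
            return "[" + type(javaType.substring(0, javaType.length() - 2));
        }
        String primitive = PRIMITIVES.get(javaType);
        return primitive != null ? primitive : "L" + javaType.replace('.', '/') + ";";
    }

    // Builds a method descriptor such as "(J)V"; constructors use return type "void".
    static String method(List<String> paramTypes, String returnType) {
        StringBuilder sb = new StringBuilder("(");
        for (String t : paramTypes) {
            sb.append(type(t));
        }
        return sb.append(')').append(type(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(type("java.lang.Object"));        // Ljava/lang/Object;
        System.out.println(method(List.of("long"), "void")); // (J)V
        System.out.println(method(List.of(), "void"));       // ()V
    }
}
```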

This should make it possible to update the Xamarin.Android API
documentation without resorting to web scraping (and updating the code
to deal with whatever new HTML dialects are now used).

If neither --output-params nor --output-javadoc is used, then the
Javadoc XML output is written to stdout.
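
A sketch of that default-output rule, assuming a hypothetical `SourceUtilsOptions` parser that handles only the `-P` and `-D` short forms from the `--help` synopsis (other options are rejected) and uses `"-"` as a made-up marker for stdout. It is not the tool's actual argument parser.

```java
import java.util.ArrayList;
import java.util.List;

final class SourceUtilsOptions {
    static final String STDOUT = "-";   // made-up marker meaning "write to stdout"

    String outputParams;   // -P OUT.params.txt
    String outputJavadoc;  // -D OUT.xml
    final List<String> files = new ArrayList<>();

    static SourceUtilsOptions parse(String... args) {
        SourceUtilsOptions o = new SourceUtilsOptions();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-P") || arg.equals("-D")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(arg + " requires a value");
                }
                if (arg.equals("-P")) o.outputParams = args[++i];
                else o.outputJavadoc = args[++i];
            } else if (arg.startsWith("-")) {
                throw new IllegalArgumentException("not handled in this sketch: " + arg);
            } else {
                o.files.add(arg);   // everything else is a FILES entry
            }
        }
        // Neither output requested: fall back to Javadoc XML on stdout.
        if (o.outputParams == null && o.outputJavadoc == null) {
            o.outputJavadoc = STDOUT;
        }
        return o;
    }
}
```

With `parse("-P", "android.params.txt", "src")`, only the parameter-name file is produced; the stdout fallback applies only when neither output is requested.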

The XML file also contains parameter name information, so that one
file can be the "source of truth" for parameter names and
documentation.

FILES can be:

  • Java source code in a .java file; or
  • A file with a .jar or .zip extension, which will be extracted
    into a temp directory and all .java files within the directory
    will be processed; or
  • A directory tree, and all .java files will be processed.
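
A sketch of expanding one `FILES` argument into `.java` paths with the standard `java.nio.file` and `java.util.zip` APIs. The temp-directory prefix and the "zip slip" check are choices made for this sketch, not necessarily what `java-source-utils` does; `SourceCollector` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class SourceCollector {
    // Expands one FILES argument into the .java files it denotes.
    static List<Path> collect(Path input) throws IOException {
        if (Files.isDirectory(input)) {
            return walk(input);                       // directory tree
        }
        String name = input.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".java")) {
            return List.of(input);                    // single source file
        }
        if (name.endsWith(".jar") || name.endsWith(".zip")) {
            return walk(extract(input));              // archive: extract, then walk
        }
        return List.of();                             // anything else is ignored
    }

    // Recursively finds regular files ending in ".java", in a stable order.
    private static List<Path> walk(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Extracts an archive into a new temp directory, rejecting entries that
    // would escape it ("zip slip").
    private static Path extract(Path archive) throws IOException {
        Path dest = Files.createTempDirectory("java-source-utils");
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(archive))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                Path out = dest.resolve(e.getName()).normalize();
                if (!out.startsWith(dest)) {
                    throw new IOException("entry outside target directory: " + e.getName());
                }
                if (e.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return dest;
    }
}
```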

If a single file references other types, the "root" directory containing
those types may need to be specified via --source DIR:

    $ java -jar "bin/Debug/java-source-utils.jar" -v \
      -s $HOME/android-toolchain/sdk/platforms/_t  \
      $HOME/android-toolchain/sdk/platforms/_t/android/app/Activity.java \
      -P android.params.txt -D android.xml >o.txt 2>&1

TODO:

In some scenarios, types won't be resolvable. What should the output be?

We don't want to require that everything be resolvable -- it's painful, and
possibly impossible, e.g. with internal types -- so instead we should mark
the unresolvable types.

.params.txt output will use .* as a type prefix, e.g.

    method(.*UnresolvableType foo, int bar);

docs.xml will output L.*UnresolvableType;.
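
A sketch of the proposed markers, assuming the caller already knows whether each reference type resolved. `TypeMarkers` is a hypothetical name, and since this is still a TODO, the convention may change.

```java
final class TypeMarkers {
    // .params.txt form: resolved names are kept, unresolved ones get ".*",
    // e.g. "UnresolvableType" -> ".*UnresolvableType".
    static String paramsType(String name, boolean resolved) {
        return resolved ? name : ".*" + name;
    }

    // XML jni-type form, for reference types only:
    // "java.lang.Object" (resolved) -> "Ljava/lang/Object;",
    // "UnresolvableType" (unresolved) -> "L.*UnresolvableType;".
    static String jniReferenceType(String name, boolean resolved) {
        return "L" + (resolved ? name.replace('.', '/') : ".*" + name) + ";";
    }

    public static void main(String[] args) {
        // Prints: method(.*UnresolvableType foo, int bar)
        System.out.println("method(" + paramsType("UnresolvableType", false) + " foo, "
                + paramsType("int", true) + " bar)");
    }
}
```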

Fix JavaParameterNamesLoader.cs to support the above.

@jonpryor force-pushed the jonp-java-source-tool branch 3 times, most recently from d31510e to 1614864 on April 15, 2020 15:23
@jonpryor force-pushed the jonp-java-source-tool branch 6 times, most recently from f537a23 to eed0075 on April 24, 2020 23:41
@jonpryor mentioned this pull request May 5, 2020
@jonpryor force-pushed the jonp-java-source-tool branch 4 times, most recently from eb23fb9 to 1fcd334 on July 29, 2020 19:24
@jonpryor requested a review from jpobst July 30, 2020 15:49
@jpobst (Contributor) commented Jul 30, 2020

Can you expand a bit on what the intended usage/audience of this is? Is it something that will help us build Mono.Android.dll or other Xamarin assemblies better? Or is it intended for 3rd parties?

My hunch is that ~nobody external uses our JavaDoc support. We should be very cautious about adding new "supported" components that will add to our maintenance burden but will rarely be used.

I wonder if for .NET6 we could move all this to an external NuGet. That is, we do not ship with any JavaDoc support by default, but if you add a NuGet package like Xamarin.Android.Bindings.JavaDoc to your project it would light up JavaDoc support. 🤔

@jonpryor force-pushed the jonp-java-source-tool branch from 1fcd334 to ba2119f on July 30, 2020 17:27
@jonpryor marked this pull request as ready for review July 30, 2020 17:27
@jonpryor force-pushed the jonp-java-source-tool branch from ba2119f to fabebee on July 30, 2020 17:30
@jonpryor (Member, Author) replied:

> Can you expand a bit on what the intended usage/audience of this is?

Absolutely.

Step 2: class-parse.exe integration; given:

    java -jar java-source-utils.jar path/to/android.jar --output-javadoc android.xml
    mono class-parse.exe --docspath=android.xml …

I want the resulting api.xml to contain the Javadoc parsed from android.jar. (class-parse already has support for parsing api.xml-style "documentation" files for parameter name overrides, so this would just "extend" the preservation.)

Step 3: generator.exe integration: given an api.xml which contains <javadoc/> elements, generator.exe can emit C# documentation comments from the Javadoc.

Currently, the Mono.Android.xml that we ship comes from the https://github.com/xamarin/android-api-docs repo, which was last updated with API-24, and thus is quite old. Step (2) + Step (3) would ~immediately update the docs contained in Mono.Android.xml for all our customers, entirely bypassing the horror of parsing Javadoc HTML. (Never mind that we can't easily obtain the Javadoc HTML for API-29…)

Furthermore, once we have an updated Mono.Android.xml, we can feed those changes back into the xamarin/android-api-docs repo.

Step (2) and Step (3) also nicely solve dotnet/android#4789, and will make it easier and more reliable to (1) get appropriate parameter names in bindings, so long as @(JavaSourceJar) is provided, and (2) nicely import docs for the binding.

jonpryor added a commit to jonpryor/java.interop that referenced this pull request Jul 30, 2020
Context: dotnet#623
Context: dotnet#623 (comment)

DO NOT MERGE UNTIL AFTER PR dotnet#623 IS MERGED.

Update `class-parse --docspath=PATH` so that if `PATH` contains
`<javadoc/>` elements, as produced by `tools/java-source-utils` (PR dotnet#623),
then those `<javadoc/>` elements will be inserted into the generated
API description.

The intent is to eventually allow `generator` to emit the `<javadoc/>`
data as C# XML Documentation, allowing a pipeline of:

	java -jar java-source-utils.jar path/to/android.jar --output-javadoc android.xml
	mono class-parse.exe --docspath=android.xml -o api.xml …
	mono generator.exe api.xml …
@jonpryor (Member, Author) commented:

See also: #684

@jonpryor (Member, Author) commented:

…and considering the size of PR #674, I think I want to hold off on a "full" implementation of Step (3) until after it's merged, but it shouldn't be too difficult to do a simple prototype…

@jonpryor (Member, Author) commented:

Context: dotnet/android#4789

@jonpryor (Member, Author) commented:

Step (3): #685

@jonpryor force-pushed the jonp-java-source-tool branch from fabebee to d9f05c8 on July 31, 2020 18:37
Context: https://github.com/javaparser/javaparser/tree/javaparser-parent-3.16.1
Context: dotnet#623

There are two parts of the current `.jar` binding toolchain which are
painful and could be improved:

 1. Parameter names
 2. Documentation extraction

Parameter names (1) are important because they become the names of
event members as part of ["event-ification"][0].  As such they are
semantically important, and the default behavior of "p0" makes for a
terrible user experience.

*If* the `.class` files in the `.jar` file are built with
`javac -parameters` (4273e5c), then the `.class` file will contain
parameter names and we're good.  However, this may not be the case.

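If the `.class` files are built with `javac -g`, then we'll try to
deduce parameter names from debug info, but that's also problematic;
for example, that debug info (the `LocalVariableTable`) only exists for
methods with bodies, so abstract and interface methods get no names
from it.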

What else can be used to provide parameter names?

It is not unusual for Java libraries to provide "source `.jar`" files,
e.g. Android provides `android-stubs-src.jar` files, and other
libraries may provide a `*-sources.jar` file.  The contents of these
files are *Java source code*.  These files are used by Android IDEs to
provide documentation for the Java library.  They contain classes,
methods, parameter names, and associated Javadoc documentation.

What they are *not* guaranteed to do is *compile*.  As such, we can't
compile them ourselves with `javac -parameters` and then process the
`.class` files, as they may refer to unresolvable types.

"Interestingly", we *already* have some tooling to deal with this:
`tools/param-name-importer` uses a custom Irony grammar to parse
the Android SDK `*-stubs-src.jar` files to grab parameter names.
However, this tooling is *too strict*; try to pass arbitrary Java
source code to it, and it quickly fails.

Which brings us to documentation (2): we have a [javadoc2mdoc][1] tool
which will parse Javadoc HTML documentation and convert it into
[**mdoc**(5)][2] documentation, which can later be turned into
[XML documentation comments][3] files by way of
[**mdoc export-msxdoc**(1)][4], but this tool is (1) painful to
maintain, because it processes Javadoc *HTML*, and
(2) *requires Javadoc HTML*.

Google hasn't updated their downloadable Javadoc `.zip` file since
API-24 (2016-October).  API-30 is currently stable.

If we want newer docs, we either need to scrape the
developer.android.com/reference website to use with the existing
tooling, or...  we need to be able to read the Javadoc comments within
the `*-stubs-src.jar` files provided with the Android SDK.
(Note: Android SDK docs are Apache 2; file format conversion is fine.)

We thus have two use-cases for which parsing Java source code would
be useful.

As luck would have it, there's a decent Apache 2-licensed Java project
which supports parsing Java source code: [JavaParser][5].

Add a new `tools/java-source-utils` program which will parse Java
source code to produce two artifacts: parameter names and
consolidated Javadoc documentation:

	$ java -jar java-source-utils.jar --help
	java-source-utils [-v] [<-a|--aar> AAR]* [<-j|--jar> JAR]* [<-s|--source> DIRS]*
		[--bootclasspath CLASSPATH]
		[<-P|--output-params> OUT.params.txt] [<-D|--output-javadoc> OUT.xml] FILES

Provide `--output-params OUT.params.txt` to create the specified file,
which follows the file format laid out in
[`JavaParameterNamesLoader.cs`][6]:

	package java.lang
	;---------------------------------------
	  class Object
	    wait(long timeout)

Provide `--output-javadoc OUT.xml`, and the resulting file will be a
`class-parse`-like XML file which uses `//@jni-signature` as the "key"
and a child `<javadoc/>` element to contain documentation, e.g.:

	<api api-source="java-source-utils">
	  <package name="java.lang">
	    <class name="Object" jni-signature="Ljava/lang/Object;">
	      <javadoc>…</javadoc>
	      <constructor jni-signature="()V">
	        <javadoc>…</javadoc>
	      </constructor>
	      <method name="wait" jni-signature="(J)V" jni-returns="V" returns="void">
	        <parameter name="timeout" jni-type="J" type="long" />
	        <javadoc>…</javadoc>
	      </method>
	    </class>
	  </package>
	</api>

This should make it possible to update the Xamarin.Android API
documentation without resorting to web scraping (and updating the code
to deal with whatever new HTML dialects are now used).

If neither `--output-params` nor `--output-javadoc` is used, then the
Javadoc XML output is written to stdout.

The XML file *also* contains parameter name information, so that one
file can be the "source of truth" for parameter names and
documentation.

`FILES` can be:

  * Java source code in a `.java` file; or
  * A file with a `.jar` or `.zip` extension, which will be extracted
    into a temp directory and all `.java` files within the directory
    will be processed; or
  * A directory tree, and all `.java` files will be processed.

If a single file references other types, the "root" directory containing
those types may need to be specified via `--source DIR`:

	$ java -jar "bin/Debug/java-source-utils.jar" -v \
	  -s $HOME/android-toolchain/sdk/platforms/_t  \
	  $HOME/android-toolchain/sdk/platforms/_t/android/app/Activity.java \
	  -P android.params.txt -D android.xml >o.txt 2>&1

TODO:

In some scenarios, types won't be resolvable.  What should the output be?

We don't want to *require* that everything be resolvable -- it's painful, and
possibly impossible, e.g. with internal types -- so instead we should mark
the unresolvable types.

`.params.txt` output will use `.*` as a type prefix, e.g.

	method(.*UnresolvableType foo, int bar);

`docs.xml` will output `L.*UnresolvableType;`.

Fix `JavaParameterNamesLoader.cs` to support the above.

[0]: https://docs.microsoft.com/en-us/xamarin/android/internals/api-design#events-and-listeners
[1]: https://github.com/xamarin/xamarin-android/tree/d48cf04f9749664bf48fc16bcb920d5d941cccab/tools/javadoc2mdoc
[2]: http://docs.go-mono.com/?link=man%3amdoc(5)
[3]: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/xmldoc/
[4]: http://docs.go-mono.com/?link=man%3amdoc-export-msxdoc(1)
[5]: https://javaparser.org
[6]: https://github.com/xamarin/java.interop/blob/93df5a200e7b6f1b5add451aff66bbcb24293720/src/Xamarin.Android.Tools.Bytecode/JavaParameterNamesLoader.cs#L45-L68
@jonpryor force-pushed the jonp-java-source-tool branch from d9f05c8 to 9082bbd on July 31, 2020 18:39
@jonpryor merged commit 69e1b80 into dotnet:master Jul 31, 2020
The github-actions bot locked and limited conversation to collaborators Apr 13, 2024