Make Jackson-jq native compilation and CDI friendly #120

Merged
merged 4 commits into from
Sep 21, 2021


ricardozanini
Contributor

@ricardozanini ricardozanini commented Sep 3, 2021

Fixes #119

Hey @eiiches, in this PR, I'm introducing some suggestions to make the library native compilation friendly. Also, the ObjectMapper used by the Scope is now lazy-loaded with an extension point for CDI injection. This way, our Scope can use the same ObjectMapper used by a CDI container.
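As a rough illustration of the lazy-loading idea described above, here is a minimal, generic sketch. It is not the code from this PR: `LazyHolder`, `set`, and `get` are hypothetical names, and in jackson-jq the held type would be the `ObjectMapper`. The holder builds a default instance on first use unless a container has supplied its own instance beforehand.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper, not part of jackson-jq: holds a value that is created
// lazily unless an externally managed instance is provided first.
final class LazyHolder<T> {
    private final Supplier<? extends T> defaultFactory;
    private volatile T value;

    LazyHolder(Supplier<? extends T> defaultFactory) {
        this.defaultFactory = Objects.requireNonNull(defaultFactory);
    }

    // Extension point: a container (e.g. CDI) can supply its own instance.
    synchronized void set(T provided) {
        this.value = Objects.requireNonNull(provided);
    }

    // Returns the supplied instance, or creates the default one on first call.
    T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    v = defaultFactory.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

With this shape, code that never calls `set` keeps the old behavior, which is one way the external user interface can stay unchanged.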

The modifications in the internal package were needed because we now have to serialize the entire Scope at build time, so that the functions don't all have to be loaded at runtime (which can reduce the application's startup time).

In parallel, I'm working on a Quarkus Extension for jackson-jq that generates bytecode at build time. Part of this work is to serialize the scope and have it created as a CDI bean. It's already done and has been tested with this PR.

Now, I need to align with the Quarkus team on where the extension code will reside: if we can have it here and integrate it with their CI, or if it should be a core extension.

After this, Quarkus users would be able to use this extension with:

public class MyService {

    @Inject
    JacksonJqQuarkusScope scope;

    public List<JsonNode> parse(String expression, JsonNode document) throws JsonQueryException {
        final JsonQuery query = JsonQuery.compile(expression, Versions.JQ_1_6);
        List<JsonNode> out = new ArrayList<>();
        query.apply(this.scope, document, out::add);
        return out;
    }
}

The most important thing is that the external user interface hasn't been changed. The same Usage example we have still works.

@eiiches
Owner

eiiches commented Sep 9, 2021

Hi, thanks for submitting this PR! I've never used Quarkus myself and I need to ask a few questions first to understand how this affects future jackson-jq development.

  • Do we need to provide serialization compatibility in future jackson-jq releases? In other words, does Quarkus by any chance try to deserialize from the binary data generated by serializing a different (probably older) version of Scope?

    • Is it safe to add/remove/change fields in a class?
  • What are the requirements for a class to be (de)serializable in Quarkus? Does it really have to implement getters/setters like JavaBeans, or are there other ways to make it serializable?

About where to place the extension, unless there's a particular reason, I want to keep this repository as minimal as possible. It'd be difficult to maintain many integrations/extensions/plugins for external tools/frameworks here.

@ricardozanini
Contributor Author

ricardozanini commented Sep 9, 2021

Hey @eiiches! Many thanks for your considerations! Replies in-line.

Do we need to provide serialization compatibility in future jackson-jq releases? In other words, does Quarkus by any chance try to deserialize from the binary data generated by serializing a different (probably older) version of Scope?

Scope should be serializable on the Quarkus side, and we can provide a Substitution in the future if we believe something might break. That said, I don't think it will break with old versions, since the serialization/deserialization happens during the same process. We won't have new classes added in between. If users choose to upgrade jackson-jq, they will recompile the whole project, updating all the references. Also, I'm providing a Quarkus implementation of Scope in the extension. We can solve serialization issues there (if any) in the future.

Is it safe to add/remove/change fields in a class?

Yes :)

What are the requirements for a class to be (de)serializable in Quarkus? Does it really have to implement getters/setters like JavaBeans, or are there other ways to make it serializable?

Here are the requirements. Basically, JavaBeans. Another option is to annotate the constructors, but I don't think that's viable since it would make the core API depend on Quarkus, which we don't want.

Is it possible to use https://immutables.github.io/ (or https://developers.google.com/protocol-buffers) in future? It auto-generates immutable classes with getters but (obviously) without setters. I prefer objects to be immutable and want to avoid setters and default no-arg ctors. Can we use Object Substitution for this?

We can definitely use Substitutions for this, as I did for some Jackson objects. The problem is rewriting all these classes again just for that. Not a very clean option, IMO. But if you intend to keep all the internal objects immutable, Quarkus won't be able to serialize them directly. I think substitution will be the only way then.

About where to place the extension, unless there's a particular reason, I want to keep this repository as minimal as possible. It'd be difficult to maintain many integrations/extensions/plugins for external tools/frameworks here.

That's reasonable. I'll ask the Quarkus community whether they can host it there. :)

@ricardozanini
Contributor Author

@eiiches if immutability is the way to go, this is what we need to do on the extension side:

public class SerializableVersion {

    public String version;

    public SerializableVersion() {

    }

    public SerializableVersion(Version version) {
        this.version = version.toString();
    }
}
public class VersionSubstitution implements ObjectSubstitution<Version, SerializableVersion> {

    @Override
    public SerializableVersion serialize(Version obj) {
        return new SerializableVersion(obj);
    }

    @Override
    public Version deserialize(SerializableVersion obj) {
        return Version.valueOf(obj.version);
    }
}

Converting all the serializable objects will require some work, but it can be done. On upgrades, I'll have to keep the substitutions aligned. I'll probably explore a way of generating this code.

@eiiches
Owner

eiiches commented Sep 10, 2021

Thanks for the answers.

The more I think about this, the more it seems that this serialization (or serialization in general) inevitably leaks too many implementation details to the public, and I don't think we can guarantee its stability over time. It'd be difficult to refactor code or even implement a new feature without breaking the Quarkus extension (depending on how we do the serialization).

My understanding is that the whole point of the serialization is to avoid parsing jq at runtime and improve startup time, because (de)serialization is probably faster than parsing jq. But I'm wondering whether there is much of a performance difference, especially when the input is not large. If there isn't a significant difference, we can just read the jq.json file(s) and serialize/store them as strings (without parsing) in the Static Init phase and parse them at runtime. This way, we can avoid making (almost) everything serializable. Wdyt?
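To make the "store as strings, parse at runtime" idea concrete, here is a minimal sketch under stated assumptions. `DeferredDefinitions` is a hypothetical class, not part of jackson-jq or Quarkus, and the parser is passed in as a plain function standing in for the real jq parser. Only strings are captured early, so nothing but `String` needs to be serializable; parsing happens on first access.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: keeps raw definition sources and parses them lazily.
final class DeferredDefinitions<T> {
    private final Map<String, String> sources;   // captured early, plain strings only
    private final Function<String, T> parser;    // stand-in for the real parser, runs at runtime
    private final Map<String, T> parsed = new ConcurrentHashMap<>();

    DeferredDefinitions(Map<String, String> sources, Function<String, T> parser) {
        this.sources = new LinkedHashMap<>(sources);
        this.parser = parser;
    }

    // Parses the named definition on first access and caches the result.
    T get(String name) {
        String src = sources.get(name);
        if (src == null) {
            throw new IllegalArgumentException("unknown definition: " + name);
        }
        return parsed.computeIfAbsent(name, n -> parser.apply(src));
    }
}
```

The trade-off is the one raised above: the parsing cost moves to runtime, in exchange for not exposing internal classes to a serialization mechanism.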


Below is just my unorganized/incomplete brain dump (ignore this):

What we have to (de)serialize:

  • Expression classes (e.g. StringInterpolation, ForeachExpression, ...)
  • Function classes (e.g. JoinFunction, KeysFunction, ...)
  • Scope class

Each of these needs a different strategy for serialization.

Expression classes

Several possibilities here:

  1. Original PR

    Pros: no runtime dependency, simple
    Cons: non-immutable classes, code bloat (getters/setters, no-arg ctors)

  2. Object Substitution + Standard Java Serialization + Immutables (Immutables can auto-implement Serializable for value classes)

    public class ExpressionSubstitution implements ObjectSubstitution<Expression, byte[]> {
        @Override
        public byte[] serialize(Expression expr) { // Not sure if Quarkus supports ObjectSubstitution for interfaces...
            try (final ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
                try (final ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                    oos.writeObject(expr);
                }
                return baos.toByteArray();
            } catch (IOException e) {
                // wrap the checked exception so the method needs no throws clause
                throw new UncheckedIOException(e);
            }
        }
        // TODO: deserialize
    }

    Pros: no runtime dependency

  3. Object Substitution + toString()

    public class ExpressionSubstitution implements ObjectSubstitution<Expression, String> {
        @Override
        public String serialize(Expression expr) {
            return expr.toString();
        }
        // TODO: deserialize by parsing the string using ExpressionParser
    }

    Pros: easy, no runtime dependency
    Cons: the whole point of serialization was to avoid the parsing cost at runtime

  4. ObjectSubstitution (with protobuf serialized bytes) + Protocol Buffers (basically same as 1 but with protobuf)

    Cons: runtime protobuf dependency (can be shaded away by maven-shade-plugin)

  5. ObjectSubstitution (with generated builder) + Protocol Buffers (same as 3 but with Quarkus native serialization)

    // assuming Version.Builder generated by protoc is natively serializable by Quarkus because it has getters/setters and the default private no-arg ctor
    // TODO: does Quarkus recognize non-public ctor
    public class VersionSubstitution implements ObjectSubstitution<Version, Version.Builder> {
        @Override
        public Version.Builder serialize(Version v) {
            return v.toBuilder();
        }
        // TODO: call Version.Builder#build() in deserialize
    }

    Cons: we have to make this for every class (or can we just do implements ProtobufMessageSubstitution<GeneratedMessageV3, GeneratedMessageV3.Builder>)

The biggest problem is that, we can NOT switch between these without breaking the extension.
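Option 2 above leaves the deserialize side as a TODO. As a minimal sketch of the full round trip with standard Java serialization, the example below uses a hypothetical immutable value class (`Literal`) in place of a real expression class; it does not depend on Quarkus or jackson-jq. Note that a `Serializable` class needs neither setters nor a no-arg constructor of its own, which is what makes this option compatible with immutable classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class JavaSerializationRoundTrip {

    // Hypothetical immutable value class standing in for an expression node.
    static final class Literal implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String text;

        Literal(String text) { this.text = text; }

        String text() { return text; }
    }

    // Writes the object into an in-memory byte array.
    static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(obj);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the object back and checks that it has the expected type.
    static <T> T deserialize(byte[] bytes, Class<T> type) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(ois.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```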

Function classes

Built-in Functions naturally have (implicit) no-arg default constructors. Quarkus can natively handle these classes.

Scope class

TBD

@eiiches
Owner

eiiches commented Sep 10, 2021

Fwiw, the Scope initialization roughly takes about 0.6ms (OpenJDK 1.8) or 1.2ms (OpenJDK 15) on my i5-9600K machine.

public static void main(String[] args) {
	for (int j = 0; j < 1000; ++j) {
		final long start = System.currentTimeMillis();
		for (int i = 0; i < 1000; ++i) {
			final Scope s = Scope.newEmptyScope();
			BuiltinFunctionLoader.getInstance().loadFunctions(Versions.JQ_1_6, s);
		}
		final long took = System.currentTimeMillis() - start;
		System.out.printf("%f ms%n", took / 1000.0);
	}
}

@ricardozanini
Contributor Author

Hey @eiiches!

Many thanks for your effort in exploring the serialization possibilities! Really appreciated!

Today, I ran a few tests in Java and native mode using my PoC to understand whether we really need serialization in the first place. Here are the results:

Scenario                   | Start-up time [1] | Memory RSS [2] | Docker image size | Compilation time
Java and Quarkus Extension | 230ms             | 361 MB         | 387 MB            | 11.203 s
Java and no Extension      | 230ms             | 369 MB         | 386 MB            | 11.032 s

I dare say these differences are almost negligible. In Java mode there's no gain from having the extension pre-build the functions, and we shouldn't serialize the objects at all.

The gain here is from the developer's perspective: developers get a Scope reference in the container that they can inject into any service.

But the interesting part is when we are in native mode:

Scenario                     | Start-up time [1] | Memory RSS [2] | Docker image size | Compilation time
Native and Quarkus Extension | .005ms            | 43MB           | 142 MB            | 01:48 min
Native and no Extension      | .006ms            | 52MB           | 171 MB            | 02:20 min

In native mode, we can see a considerable decrease in the image size and compilation time. This can be really significant in some environments, especially if we need to spin up a number of pods.

One important thing to note is that the start-up time wasn't affected whatsoever, which aligns with your test.

The only thing left to consider is the amount of code we need to add in native mode without the extension, because I had to register some classes for reflection and turn on service loader registration.

I believe we can change this behavior. The extension can pre-load the functions in the classloader, avoiding the Service Loader. It would require changing the way we create the Scope, basically making it accept function references.

This way, I can create the Scope at runtime with all the functions already serialized. Thus, we cover both worlds, and we won't bloat the API for external users. Wdyt?
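The "accept function references" idea can be sketched roughly as follows. This is only an illustration with hypothetical names (`PreloadedFunctionRegistry`, `of`, `add`, `get`), and it uses `UnaryOperator<String>` as a stand-in for the library's function type. Keys follow a name/arity form such as `foo/0`. Functions constructed ahead of time are handed over as a map, so no service-loader lookup is needed when the registry is created, and a second stage can still register functions that must be created at runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a registry populated in two stages.
final class PreloadedFunctionRegistry {
    private final Map<String, UnaryOperator<String>> functions = new LinkedHashMap<>();

    // Stage 1: accept already-constructed function references (no ServiceLoader lookup).
    static PreloadedFunctionRegistry of(Map<String, UnaryOperator<String>> preloaded) {
        PreloadedFunctionRegistry registry = new PreloadedFunctionRegistry();
        registry.functions.putAll(preloaded);
        return registry;
    }

    // Stage 2: register functions that can only be created at runtime.
    void add(String key, UnaryOperator<String> fn) {
        functions.put(key, fn);
    }

    // Looks a function up by name and arity; returns null when it is not registered.
    UnaryOperator<String> get(String name, int arity) {
        return functions.get(name + "/" + arity);
    }
}
```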

PoC with extension: https://github.com/ricardozanini/jq-native-poc/tree/with-extension
PoC with reflection enabled: https://github.com/ricardozanini/jq-native-poc/tree/main.

--
[1] - Using this guide
[2] - pmap measure

@ricardozanini
Contributor Author

@eiiches, I reverted the changes from the internal package. However, please consider the current changes in the BuiltinFunctionLoader as a WIP since I experimented on the extension side.

Now, the strategy is only to serialize the functions at build time. This way, we do not need to mark anything for reflection nor enable the service loader in native mode.

Thus, I've been able to roughly keep the test numbers from the table above. The only difference now is that some classes have to be loaded at runtime because of the macros. This is reflected in a change from 142 MB to 152 MB in the final image size. However, the compilation time hasn't changed.

We now need to settle how we can tweak the interface of BuiltinFunctionLoader to enable this "two-stage" loading. Also, the JsonQueryJacksonModule had to have a default constructor to be registered at build time with the Quarkus Jackson Module. Do you see a problem here?

Owner

@eiiches eiiches left a comment

Built-in Functions naturally have (implicit) no-arg default constructors. Quarkus can natively handle these classes.

Sorry, I think I was wrong about this. Functions do have default constructors, but the functions might contain references to objects that Quarkus cannot (de)serialize.

So, the functions also need to be instantiated at runtime, not via deserialization. Looking at the Quarkus doc, this seems possible by using RUNTIME_INIT and Bytecode Recording but I'm not sure...

@Record(RUNTIME_INIT)
@BuildStep
public void jqFunctionsBuildStep(JqFunctionsRecorder recorder) {
    final Map<String, Function> functions = BuiltinFunctionLoader.getInstance().loadFunctionsFromServiceLoader(BuiltinFunctionLoader.class.getClassLoader(), Versions.JQ_1_6);
    functions.forEach((name, function) -> {
       recorder.addFunction(name, function.getClass());
    }); 
}

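@Recorder
class JqFunctionsRecorder {
  // NOTE: `scope` is assumed to be a Scope held by this recorder; how it is created is omitted in this sketch
  public void addFunction(String name, final Class<? extends Function> clazz) {
    // we know the passed class always has a default ctor
    // TODO: we probably want to generate byte codes that directly calls the ctor, instead of  invokevirtual Class.newInstance
    try {
      scope.addFunction(name, clazz.newInstance());
    } catch (InstantiationException | IllegalAccessException e) {
      // newInstance() throws checked exceptions, so rethrow them unchecked
      throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
    }
  }
}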

As for JsonQueryJacksonModule, I don't think this module should be registered to Quarkus's global ObjectMapper. It affects how DoubleNode and FloatNode are serialized to strings and might break other parts of the user applications. I think I need to understand why Quarkus uses a CDI-managed ObjectMapper instead of just new ObjectMapper().

@eiiches
Owner

eiiches commented Sep 15, 2021

why Quarkus uses a CDI-managed ObjectMapper instead of just new ObjectMapper()

https://quarkus.io/guides/rest-json#json

If the only reason Quarkus does this is to make it easy to configure ObjectMapper settings, I think we can (and should) just use a separate ObjectMapper for jackson-jq (EDIT: because the ObjectMapper configurations should not affect each other).

@ricardozanini
Contributor Author

ricardozanini commented Sep 17, 2021

Built-in Functions naturally have (implicit) no-arg default constructors. Quarkus can natively handle these classes.

Sorry, I think I was wrong about this. Functions do have default constructors, but the functions might contain references to objects that Quarkus cannot (de)serialize.

Don't worry. The functions loaded by the service loader can be deserialized since they don't have state. However, the functions that we call "macros" do have state and must stay immutable to follow the standards of this library, so Quarkus can't (de)serialize them. That's why the extension now loads in two phases:

  1. At build time, we load all the functions using the service loader, to avoid using the loader at runtime.
  2. When creating the Scope synthetic bean, we load the macros, since we are at runtime and don't need to worry about serialization.

So, the functions also need to be instantiated at runtime, not via deserialization. Looking at the Quarkus doc, this seems possible by using RUNTIME_INIT and Bytecode Recording but I'm not sure...

That's what we are doing right now in the extension:

// processor in build time
class JacksonJqProcessor {
     //....

    @BuildStep
    @Record(ExecutionTime.STATIC_INIT)
    SyntheticBeanBuildItem quarkusScopeBean(JacksonJqScopeRecorder recorder,
                                            RecorderContext context) throws NoSuchMethodException {
        // preload everything and use it as a parent scope
        final Map<String, Function> functions =
                BuiltinFunctionLoader.getInstance()
                        .loadFunctionsFromServiceLoader(BuiltinFunctionLoader.class.getClassLoader(), Versions.JQ_1_6);

        return SyntheticBeanBuildItem
                .configure(Scope.class)
                .scope(Singleton.class)
                .runtimeValue(recorder.createScope(functions))
                .done();
    }
}

// creating the synthetic bean in runtime
@Recorder
public class JacksonJqScopeRecorder {

    public RuntimeValue<Scope> createScope(final Map<String, Function> functions) {
        final Scope scope = Scope.newEmptyScope();
        functions.forEach(scope::addFunction);
        BuiltinFunctionLoader.getInstance().loadFunctionsFromJsonJq(
                BuiltinFunctionLoader.class.getClassLoader(),
                Versions.JQ_1_6,
                scope).forEach(scope::addFunction);
        return new RuntimeValue<>(scope);
    }
}

As for JsonQueryJacksonModule, I don't think this module should be registered to Quarkus's global ObjectMapper. It affects how DoubleNode and FloatNode are serialized to strings and might break other parts of the user applications. I think I need to understand why Quarkus uses a CDI-managed ObjectMapper instead of just new ObjectMapper().

Oh! This is just to limit the number of ObjectMapper instances in memory. We are trying to avoid having too many mappers in the container. There's no restriction, though. I'll create the mapper at build time and register the jackson-jq module on it just for the Scope.

Many thanks @eiiches!

Signed-off-by: Ricardo Zanini <[email protected]>
@ricardozanini
Contributor Author

Well, simple enough. Now, all we need is to expose the load functions from the BuiltinFunctionLoader to load them in two stages.

@eiiches
Owner

eiiches commented Sep 17, 2021

Thanks for the explanation! @ricardozanini

The functions loaded by the service loader can be deserialized since they don't have a state.

Hmm, I'm still not sure about this. How about this function?

@BuiltinFunction("foo")
public class FooFunction implements Function {
    // Can Quarkus (de)serialize this field? Also, what happens if this field is made `static`?
    // * This field is declared private final (and doesn't have setters).
    // * IntNode doesn't have a default ctor.
    private final JsonNode defaultValue = IntNode.valueOf(1);

    public FooFunction() {} // implicit default ctor

    @Override
    public void apply(...) {
        ... // maybe return the defaultValue under some circumstances
    }
}

The rest of the part looks good and I'm ready to merge this PR once this Function (de)serialization thing is settled.

@ricardozanini
Contributor Author

Hmm, I'm still not sure about this. How about this function?

I don't think we will have a problem in such a case, since Quarkus will create the class the same way, using the same class reference. As long as we don't add an accessor for the attribute, it will be fine.

Behind the scenes, Quarkus will use the default no-arg constructor to deserialize the class, like a simple new FooFunction(). Since it will use the class we have on the classpath, the defaultValue will be 1.

I made a little test to prove this assumption:

    @BuildStep
    @Record(ExecutionTime.STATIC_INIT)
    SyntheticBeanBuildItem quarkusScopeBean(JacksonJqScopeRecorder recorder,
                                            RecorderContext context) throws NoSuchMethodException {
        // preload everything and use it as a parent scope
        final Map<String, Function> functions =
                BuiltinFunctionLoader.getInstance()
                        .loadFunctionsFromServiceLoader(BuiltinFunctionLoader.class.getClassLoader(), Versions.JQ_1_6);

        functions.put("foo/0", new FooFunction());

        return SyntheticBeanBuildItem
                .configure(Scope.class)
                .scope(Singleton.class)
                .runtimeValue(recorder.createScope(functions))
                .done();
    }

The defaultValue is 1 in the synthetic bean creation (after deserialization):

    public RuntimeValue<Scope> createScope(final Map<String, Function> functions) {
        final Scope scope = Scope.newEmptyScope();
        functions.forEach(scope::addFunction);
        BuiltinFunctionLoader.getInstance().loadFunctionsFromJsonJq(
                BuiltinFunctionLoader.class.getClassLoader(),
                Versions.JQ_1_6,
                scope).forEach(scope::addFunction);
        return new RuntimeValue<>(scope);
    }

Test:

    @Inject
    Scope scope;

    @Test
    public void verifyFooFunction() {
        final Function fooFunction = scope.getFunction("foo", 0);
        assertTrue(fooFunction instanceof FooFunction);
    }

Now, if we add an accessor to FooFunction like getDefaultValue, Quarkus will fail because it understands that this is a read-only field:

java.lang.RuntimeException: java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
	[error]: Build step io.quarkus.deployment.steps.MainClassBuildStep#build threw an exception: java.lang.RuntimeException: Failed to record call to method public io.quarkus.runtime.RuntimeValue net.thisptr.jackson.jq.quarkus.JacksonJqScopeRecorder.createScope(java.util.Map)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.writeBytecode(BytecodeRecorderImpl.java:462)
	at io.quarkus.deployment.steps.MainClassBuildStep.writeRecordedBytecode(MainClassBuildStep.java:455)
	at io.quarkus.deployment.steps.MainClassBuildStep.build(MainClassBuildStep.java:177)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at io.quarkus.deployment.ExtensionLoader$2.execute(ExtensionLoader.java:820)
	at io.quarkus.builder.BuildContext.run(BuildContext.java:277)
	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Cannot serialise field 'defaultValue' on object 'net.thisptr.jackson.jq.quarkus.FooFunction@71cb28be' as the property is read only
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadComplexObject(BytecodeRecorderImpl.java:1311)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadObjectInstanceImpl(BytecodeRecorderImpl.java:966)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadObjectInstance(BytecodeRecorderImpl.java:558)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadComplexObject(BytecodeRecorderImpl.java:1147)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadObjectInstanceImpl(BytecodeRecorderImpl.java:966)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadObjectInstance(BytecodeRecorderImpl.java:558)
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.writeBytecode(BytecodeRecorderImpl.java:457)
	... 13 more
Caused by: java.lang.RuntimeException: Cannot serialise field 'defaultValue' on object 'net.thisptr.jackson.jq.quarkus.FooFunction@71cb28be' as the property is read only
	at io.quarkus.deployment.recording.BytecodeRecorderImpl.loadComplexObject(BytecodeRecorderImpl.java:1302)
	... 19 more

So as long as we don't have getters without a setter counterpart, we should be fine.
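The rule observed above (a getter with no matching setter makes the property look read-only) can be illustrated with plain reflection. The sketch below is not how Quarkus's recorder is implemented; `ReadOnlyPropertyCheck` and its two nested classes are hypothetical, and the check only looks at public `getX`/`setX` methods declared on the class itself. A check like this could be used in a unit test to flag function classes that would likely trip the recorder.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

final class ReadOnlyPropertyCheck {

    // Hypothetical function-like class: has state but exposes no accessors.
    static class NoAccessors {
        private final int defaultValue = 1;
    }

    // Hypothetical function-like class: exposes a getter without a setter.
    static class GetterOnly {
        private final int defaultValue = 1;

        public int getDefaultValue() { return defaultValue; }
    }

    // Returns the property names that have a public getter but no public setter.
    static Set<String> readOnlyProperties(Class<?> type) {
        Set<String> getters = new TreeSet<>();
        Set<String> setters = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            int mods = m.getModifiers();
            if (!Modifier.isPublic(mods) || Modifier.isStatic(mods)) {
                continue; // only public instance methods count as accessors here
            }
            String name = m.getName();
            if (name.length() > 3 && name.startsWith("get") && m.getParameterCount() == 0) {
                getters.add(name.substring(3));
            } else if (name.length() > 3 && name.startsWith("set") && m.getParameterCount() == 1) {
                setters.add(name.substring(3));
            }
        }
        getters.removeAll(setters);
        return getters;
    }

    public static void main(String[] args) {
        System.out.println(readOnlyProperties(NoAccessors.class));
        System.out.println(readOnlyProperties(GetterOnly.class));
    }
}
```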

Is that fine in your opinion?

@eiiches
Owner

eiiches commented Sep 17, 2021

Thanks, I understand it now.

Since functions having getters (or non-static public fields) are unusual enough (not 100% sure there won't be though), I think I'm fine with the current implementation if we can't easily avoid (de)serialization 👍

@eiiches
Owner

eiiches commented Sep 17, 2021

One more thing: BuiltinFunctionLoader.getInstance().loadFunctionsFromJsonJq() is called at runtime (right?) and reads jq.json file from classpath. Is this intended? Just to make sure because I thought you might want to avoid reading files at runtime.

@ricardozanini
Contributor Author

Hey @eiiches!

Yes, it's intended. We need that because these functions can't be serialized at build time due to their immutability.

That's fine, though. Based on my tests, that won't be a problem.

Many thanks!

@eiiches
Owner

eiiches commented Sep 21, 2021

@ricardozanini Alright then, I'm merging this now.

Thanks for all your work and effort here! I'll publish a new release (1.0.0-preview.YYYYMMDD) with this change in a few days.

@eiiches eiiches merged commit e5a2971 into eiiches:develop/1.x Sep 21, 2021
@ricardozanini ricardozanini deleted the feature/quarkus-extension branch September 22, 2021 14:03
@eiiches
Owner

eiiches commented Sep 25, 2021

Released in https://github.com/eiiches/jackson-jq/releases/tag/1.0.0-preview.20210926

Development

Successfully merging this pull request may close these issues.

Publish Scope as a CDI bean and make ObjectMapper injectable
3 participants