-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Jackson-jq native compilation and CDI friendly #120
Make Jackson-jq native compilation and CDI friendly #120
Conversation
jackson-jq/src/main/java/net/thisptr/jackson/jq/ObjectMapperDefaultProvider.java
Outdated
Show resolved
Hide resolved
Hi, thanks for submitting this PR! I've never used Quarkus myself and I need to ask a few questions first to understand how this affects future jackson-jq development.
About where to place the extension, unless there's a particular reason, I want to keep this repository minimal as possible. It'd be difficult to maintain many integrations/extensions/plugins for external tools/frameworks here. |
Hey @eiiches! Many thanks for your considerations! Replies in-line.
Yes :)
Here are the requirements. Basically, JavaBeans. Another option is to annotate the constructors, but I don't think this is an option since it will depend on Quarkus in the core API, which we don't want.
We can definitely use Substitutions for this, as I used for some Jackson objects. The problem is rewriting all these classes again just for that. Not a very clean option, IMO. But if you intend to keep all the internal objects immutable, it won't be able to serialize them on the Quarkus side. I think substitution will be the only way then.
That's reasonable. I'll speak with the Quarkus community if they can have us there. :) |
@eiiches if the immutable is the way to go, this is what we need to do in the extension side: public class SerializableVersion {
public String version;
public SerializableVersion() {
}
public SerializableVersion(Version version) {
this.version = version.toString();
}
} public class VersionSubstitution implements ObjectSubstitution<Version, SerializableVersion> {
@Override
public SerializableVersion serialize(Version obj) {
return new SerializableVersion(obj);
}
@Override
public Version deserialize(SerializableVersion obj) {
return Version.valueOf(obj.version);
}
} It will require some work to convert all the serializable objects but can be done. On upgrades, I should keep it up aligned. I'll probably explore a way of generating this code. |
Thanks for the answers. The more I think of this, this serialization (or serialization in general) is inevitably leaking too much implementation details to the public and I don't think we can guarantee its stability over time. It'd be difficult to refactor codes or even implement a new feature without breaking the Quarkus extension (depending on how we do the serialization). My understanding is that the whole point of the serialization is to avoid parsing jq at runtime and improve startup time, because serialization is probably faster than parsing jq. But I'm wondering if, there is much performance difference, especially when the input is not large. If there isn't significant difference, we can just read jq.json(s) and serialize/store them as strings (without parsing) in Static Init phase and parse them at runtime. This way, we can avoid making (almost) everything serializable. Wdyt? Below is just my unorganized/incomplete brain dump (ignore this): What we have to (de)serialize:
Each of these needs a different strategy for serialization. Expression classesSeveral possibilities here:
The biggest problem is that, we can NOT switch between these without breaking the extension. Function classesBuilt-in Functions naturally have (implicit) no-arg default constructors. Quarkus can natively handle these classes. Scope classTBD |
Fwiw, the Scope initialization roughly takes about 0.6ms (OpenJDK 1.8) or 1.2ms (OpenJDK 15) on my i5-9600K machine. public static void main(String[] args) {
for (int j = 0; j < 1000; ++j) {
final long start = System.currentTimeMillis();
for (int i = 0; i < 1000; ++i) {
final Scope s = Scope.newEmptyScope();
BuiltinFunctionLoader.getInstance().loadFunctions(Versions.JQ_1_6, s);
}
final long took = System.currentTimeMillis() - start;
System.out.printf("%f ms%n", took / 1000.0);
}
} |
Hey @eiiches! Many thanks for your effort in exploring the serialization possibilities! Really appreciated! Today, I made a few tests in Java and Native mode using my poc to understand if we really need serialization in the first place. Here's the result:
I dare to say that these results are almost neglectable. There's no gain with the extension to pre-build the functions, and we shouldn't serialize the objects whatsoever. The gain here is from the developer's perspective, which will have a reference of But the interesting part is when we are in native mode:
In native mode, we can see a considerable decrease in the image size and compilation time. This can be really significant in some environments, especially if we need to spin a handful number of pods. One important thing to note is that the start-up time wasn't affected whatsoever, which aligns with your test. The only thing to consider now is the amount of code we need to add in the native mode without the extension. This happens because I had to add some classes for reflection and turn on the service load registration. I believe we can change this behavior. The extension can pre-load the functions in the classloader, avoiding using the Service Loader. It would require a modification in the way we create the Scope, basically accepting functions references. This way, I can create the PoC with extension: https://github.com/ricardozanini/jq-native-poc/tree/with-extension -- |
Signed-off-by: Ricardo Zanini <[email protected]>
@eiiches, I reverted the changes from the Now, the strategy is only to serialize the functions on build time. This way, we do not need to mark anything for reflection nor enable the service loader in native mode. Thus, I've been able to keep somewhat the test numbers from the table above. The only difference now is that some classes have to be loaded in runtime because of the macros. This is reflected in a change from 142 MB to 152 MB in the final image size. However, the compilation time hasn't changed. We need to settle now how we can tweak the interface of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Built-in Functions naturally have (implicit) no-arg default constructors. Quarkus can natively handle these classes.
Sorry, I think I was wrong about this. Functions does have default constructors, but the functions might contain references to objects that Quarkus cannot (de)serialize.
So, the functions also need to be instanciated at runtime, not via deserialization. Looking at the Quarkus doc, this seems possible by using RUNTIME_INIT and Bytecode Recording but I'm not sure...
@Record(RUNTIME_INIT)
@BuildStep
public void jqFunctionsBuildStep(JqFunctionsRecorder recorder) {
final Map<String, Function> functions = BuiltinFunctionLoader.getInstance().loadFunctionsFromServiceLoader(BuiltinFunctionLoader.class.getClassLoader(), Versions.JQ_1_6);
functions.forEach((name, function) -> {
recorder.addFunction(name, function.getClass());
});
}
@Recorder
class JqFunctionsRecorder {
public void addFunction(String name, final Class<?> clazz) {
// we know the passed class always has a default ctor
// TODO: we probably want to generate byte codes that directly calls the ctor, instead of invokevirtual Class.newInstance
scope.addFunction(name, clazz.newInstance());
}
}
As for JsonQueryJacksonModule, I don't think this module should be registered to Quarkus's global ObjectMapper. It affects how DoubleNode and FloatNode are serialized to strings and might break other parts of the user applications. I think I need to understand why Quarkus uses a CDI-managed ObjectMapper instead of just new ObjectMapper()
.
jackson-jq/src/main/java/net/thisptr/jackson/jq/BuiltinFunctionLoader.java
Outdated
Show resolved
Hide resolved
jackson-jq/src/main/java/net/thisptr/jackson/jq/BuiltinFunctionLoader.java
Outdated
Show resolved
Hide resolved
https://quarkus.io/guides/rest-json#json If the only reason Quarkus does this is to make it easy to configure ObjectMapper settings, I think we can (and should) just use a separate ObjectMapper for jackson-jq (EDIT: because the ObjectMapper configurations should not affect each other). |
Don't worry. The functions loaded by the service loader can be deserialized since they don't have a state. Although, the functions that we call "macros", have a state and can't be immutable to follow the standards of this library. That's why the extension now is loading in two phases:
That's how we are doing right now in the extension part: // processor in build time
class JacksonJqProcessor {
//....
@BuildStep
@Record(ExecutionTime.STATIC_INIT)
SyntheticBeanBuildItem quarkusScopeBean(JacksonJqQuarkusScopeRecorder recorder,
RecorderContext context) throws NoSuchMethodException {
// preload everything and use it as a parent scope
final Map<String, Function> functions =
BuiltinFunctionLoader.getInstance()
.loadFunctionsFromServiceLoader(BuiltinFunctionLoader.class.getClassLoader(), Versions.JQ_1_6);
return SyntheticBeanBuildItem
.configure(Scope.class)
.scope(Singleton.class)
.runtimeValue(recorder.createScope(functions))
.done();
}
}
// creating the synthetic bean in runtime
@Recorder
public class JacksonJqScopeRecorder {
public RuntimeValue<Scope> createScope(final Map<String, Function> functions) {
final Scope scope = Scope.newEmptyScope();
functions.forEach(scope::addFunction);
BuiltinFunctionLoader.getInstance().loadFunctionsFromJsonJq(
BuiltinFunctionLoader.class.getClassLoader(),
Versions.JQ_1_6,
scope).forEach(scope::addFunction);
return new RuntimeValue<>(scope);
}
}
Oh! This is just to limit the amount of Many thanks @eiiches! |
Signed-off-by: Ricardo Zanini <[email protected]>
Signed-off-by: Ricardo Zanini <[email protected]>
Well, simple enough. Now, all we need is to expose the load functions from the |
Thanks for the explanation! @ricardozanini
Hmm, I'm still not sure about this. How about this function? @BuiltinFunction("foo")
public class FooFunction implements Function {
// Can Quarkus (de)serialize this field? Also, what happens if this field is made `static`?
// * This field is declared private final (and doesn't have setters).
// * IntNode doesn't have a default ctor.
private final JsonNode defaultValue = IntNode.valueOf(1);
public FooFunction() {} // implicit default ctor
@Override
public void apply(...) {
... // maybe return the defaultValue under some circumstances
}
} The rest of the part looks good and I'm ready to merge this PR once this Function (de)serialization thing is settled. |
I don't think we will have a problem in such a case since Quarkus will create the class this same way, using this same reference. As long as we don't add an accessor to the attribute, it will be fine. Behind the scene, Quarkus will use the default no-arg constructor to deserialize the class like a simple I made a little test to prove this assumption: @BuildStep
@Record(ExecutionTime.STATIC_INIT)
SyntheticBeanBuildItem quarkusScopeBean(JacksonJqScopeRecorder recorder,
RecorderContext context) throws NoSuchMethodException {
// preload everything and use it as a parent scope
final Map<String, Function> functions =
BuiltinFunctionLoader.getInstance()
.loadFunctionsFromServiceLoader(BuiltinFunctionLoader.class.getClassLoader(), Versions.JQ_1_6);
functions.put("foo/0", new FooFunction());
return SyntheticBeanBuildItem
.configure(Scope.class)
.scope(Singleton.class)
.runtimeValue(recorder.createScope(functions))
.done();
} The public RuntimeValue<Scope> createScope(final Map<String, Function> functions) {
final Scope scope = Scope.newEmptyScope();
functions.forEach(scope::addFunction);
BuiltinFunctionLoader.getInstance().loadFunctionsFromJsonJq(
BuiltinFunctionLoader.class.getClassLoader(),
Versions.JQ_1_6,
scope).forEach(scope::addFunction);
return new RuntimeValue<>(scope);
} Test: @Inject
Scope scope;
@Test
public void verifyFooFunction() {
final Function fooFunction = scope.getFunction("foo", 0);
assertTrue(fooFunction instanceof FooFunction);
} Now, if we add an accessor to
So if we do not have getters with a setter counterpart, we should be fine. Is that fine in your opinion? |
Thanks, I understand it now. Since functions having getters (or non-static public fields) are unusual enough (not 100% sure there won't be though), I think I'm fine with the current implementation if we can't easily avoid (de)serialization 👍 |
One more thing: |
Hey @eiiches! Yes, it's intended. We need that because these functions can't be serialized in build time because of the immutability. That's fine, though. Based on my tests, that won't be a problem. Many thanks! |
@ricardozanini Alright then, I'm merging this now. Thanks for all your work and effort here! I'll publish a new release (1.0.0-preview.YYYYMMDD) with this change in a few days. |
Fixes #119
Hey @eiiches, in this PR, I'm introducing some suggestions to make the library native compilation friendly. Also, the
ObjectMapper
used by theScope
is now lazy-loaded with an extension point for CDI injection. This way, ourScope
can use the sameObjectMapper
used by a CDI container.The modifications in the
internal
package were needed because now we need to serialize the entireScope
in build time to not load all the functions during runtime (which can decrease the application's startup time).I'm working parallelly on a Quarkus Extension for
jackson-jq
to generate byte code in build-time. Part of this work is to serialize the scope and have it created as a CDI bean. It's done already and has been tested with this PR.Now, I need to align with the Quarkus team where the extension code will reside: if we can have it here and integrate it with their CI, or if it should be a core extension.
After this, Quarkus users would be able to use this extension with:
The most important thing is that the external user interface hasn't been changed. The same Usage example we have still works.