Add simple, queue based kryo pool #230

Merged: 1 commit merged on Jul 25, 2014


@magro (Collaborator) commented Jul 21, 2014

I'm submitting this as a proposal for a simple Kryo pool; I didn't want to push it directly into master. Some notes on the suggested pool:

Kryo instances are created using a KryoFactory that's passed to the pool.
The pool uses a ConcurrentLinkedQueue to manage Kryo instances.
The pool can also run callbacks, passing them a Kryo instance.

The included KryoPoolBenchmarkTest (with ITER_CNT = 100000) shows
the following output for me (excerpt):

>>> With pool (average): 8 ms
>>> Without pool (average): 2,105 ms

KryoPool usage example:

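```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.pool.KryoCallback;
import com.esotericsoftware.kryo.pool.KryoFactory;
import com.esotericsoftware.kryo.pool.KryoPool;

KryoFactory factory = new KryoFactory() {
  public Kryo create () {
    Kryo kryo = new Kryo();
    // configure the kryo instance, customize settings
    return kryo;
  }
};
KryoPool pool = new KryoPool(factory);
Kryo kryo = pool.borrow();
try {
  // do something with kryo here...
} finally {
  // ...and always release it afterwards, even if an exception was thrown
  pool.release(kryo);
}

// or use a callback: run() borrows and releases the instance itself
// (`input` is assumed to be an existing com.esotericsoftware.kryo.io.Input)
String value = pool.run(new KryoCallback<String>() {
  public String execute(Kryo kryo) {
    return kryo.readObject(input, String.class);
  }
});
```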
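To make the mechanism described above concrete, here is a minimal, generic sketch of a queue-based pool. It is not the code from this PR: `SimplePool` is a hypothetical name, `T` stands in for `Kryo`, and `java.util.function.Supplier` stands in for `KryoFactory`. Like the proposal, it keeps idle instances in a `ConcurrentLinkedQueue`, creates a new instance when the queue is empty, and, when running a callback, releases the instance in a `finally` block.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, generic stand-in for KryoPool: T plays the role of Kryo and
// the Supplier plays the role of KryoFactory.
class SimplePool<T> {
  private final Queue<T> queue = new ConcurrentLinkedQueue<>();
  private final Supplier<T> factory;

  SimplePool(Supplier<T> factory) {
    this.factory = factory;
  }

  // Takes an idle instance, or creates a new one if the queue is empty.
  T borrow() {
    T instance = queue.poll();
    return instance != null ? instance : factory.get();
  }

  // Hands an instance back so another caller can reuse it.
  void release(T instance) {
    queue.offer(instance);
  }

  // Borrows an instance, runs the callback, and always releases the instance.
  <R> R run(Function<T, R> callback) {
    T instance = borrow();
    try {
      return callback.apply(instance);
    } finally {
      release(instance);
    }
  }

  // Number of idle instances currently held by the pool.
  int size() {
    return queue.size();
  }
}
```

Because `borrow()` never blocks, this pool grows to the peak number of concurrent borrowers and never shrinks on its own; that is the behavior the soft-reference discussion in this thread is about.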

@serverperformance (Contributor)

Cool. Simple and clean :)

I would soft-reference the pooled instances, so that they are released when the GC needs to reclaim space.

This would keep the simplicity of the artifact (zero configuration) while making it GC-friendly.

Regards,
Tumi

@serverperformance (Contributor)

And if you go this way, I always find it preferable to soft-reference the whole backing queue rather than each queue entry, as the GC has less work to do: reclaim all or nothing.

Tumi

@magro (Collaborator, Author) commented Jul 21, 2014

Good point on using soft references!

If there are Kryo instances that grow rather big, I'd say soft-referencing single instances is preferable to throwing away all of them.
I don't think single soft references are that expensive in terms of garbage collection, as long as there are not very many (tens or hundreds of thousands?) Kryo instances pooled, which I'd find rather unusual. Am I wrong about that?

So I'd prefer to soft-reference single Kryo instances.

Cheers,
Martin

inoio gmbh - http://inoio.de
Schulterblatt 36, 20357 Hamburg
Amtsgericht Hamburg, HRB 123031
Managing Directors: Dennis Brakhane, Martin Grotzke, Ole Langbehn

@serverperformance (Contributor)

Yes, you are right!

The only difference would be a single null check versus a loop that polls cleared references in borrow(), but in practice there is no difference. So it's better to soft-reference individual instances, if you prefer.

P.S.: Many years ago I came across this article on IBM developerWorks (so much time has passed since 2006), http://www.ibm.com/developerworks/library/j-jtp01246/index.html#2.1 , section “A poor man’s cache”, but you are right that in this case we are not talking about thousands of pooled instances.

Tumi

@romix (Collaborator) commented Jul 21, 2014

+1 from me. Nice idea and pretty simple implementation.

@romix (Collaborator) commented Jul 21, 2014

BTW, should there be an option to set a maximum pool size in the constructor, i.e. to create a size-bounded pool?

@serverperformance (Contributor)

I think in this case there is no need for that parameter (and the small added complexity in borrow and release): the number of pooled instances should be <= the maximum number of threads (except for leaks or bad usage), so that limit is already configured in your thread pool, if any.

(My opinion)
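If a bound were wanted anyway, one way to add it without making `borrow()` block is to cap only the number of idle instances. The sketch below is hypothetical (`BoundedPool`, `capacity`, and `idleCount()` are illustrative names, not part of this PR); it relies on `ArrayBlockingQueue.offer`, which returns `false` instead of blocking when the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical size-bounded variant: keeps at most `capacity` idle instances.
// borrow() never blocks; instances released into a full pool are dropped.
class BoundedPool<T> {
  private final BlockingQueue<T> idle;
  private final Supplier<T> factory;

  BoundedPool(int capacity, Supplier<T> factory) {
    this.idle = new ArrayBlockingQueue<>(capacity);
    this.factory = factory;
  }

  // Reuses an idle instance if there is one, otherwise creates a new one.
  T borrow() {
    T instance = idle.poll();
    return instance != null ? instance : factory.get();
  }

  // offer() returns false instead of blocking when the queue is full;
  // the rejected instance is then simply left for the GC.
  boolean release(T instance) {
    return idle.offer(instance);
  }

  int idleCount() {
    return idle.size();
  }
}
```

As Tumi points out, with a fixed-size thread pool the number of idle instances is already bounded by the number of threads, so a cap like this mainly matters with unbounded or very large thread pools.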

@mindwind

I'm new to Kryo.
I want to know: is it safe to reuse a Kryo instance?
Is the internal state of a reused Kryo instance the same as that of a new Kryo instance?
When you borrow it from the pool like a new instance, is the internal state clean, with no other side effects?

@romix (Collaborator) commented Jul 23, 2014

@mindwind I think it is described on the Kryo home page on GitHub, but I can explain it again:
Kryo instances are not thread-safe. They cannot be used by multiple threads at the same time. But it is fine for different threads to use the same Kryo instance at different times.

@mindwind

@romix Thanks, I will read the home page more carefully.

@magro (Collaborator, Author) commented Jul 23, 2014

I just changed the pool to cache kryo instances using SoftReferences...

@serverperformance (Contributor)

In borrow(), instead of making a single poll(), I think you should keep polling until you get a reference whose instance has not been cleared, as the GC may have reclaimed one instance but not the next.

Cheers,
Tumi

@magro (Collaborator, Author) commented Jul 23, 2014

@serverperformance Good point! I just changed this to

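```java
SoftReference<Kryo> ref;
Kryo res;
// skip references whose Kryo instance has already been cleared by the GC
while ((ref = queue.poll()) != null) {
  if ((res = ref.get()) != null) {
    return res;
  }
}
```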
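Putting the soft-reference change and this loop together, a pool along these lines might look as follows. This is a hypothetical sketch, not the merged code: `SoftReferencePool` and its test hook `offerReference` are illustrative names, and `T` again stands in for `Kryo`. Each idle instance is wrapped in its own `java.lang.ref.SoftReference`, and `borrow()` skips references the GC has already cleared before falling back to the factory.

```java
import java.lang.ref.SoftReference;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical sketch: idle instances are held only through SoftReferences,
// so the GC may clear them individually under memory pressure.
class SoftReferencePool<T> {
  private final Queue<SoftReference<T>> queue = new ConcurrentLinkedQueue<>();
  private final Supplier<T> factory;

  SoftReferencePool(Supplier<T> factory) {
    this.factory = factory;
  }

  T borrow() {
    SoftReference<T> ref;
    T res;
    // Keep polling until a reference whose referent has not been cleared is found.
    while ((ref = queue.poll()) != null) {
      if ((res = ref.get()) != null) {
        return res;
      }
    }
    // Queue empty, or every pooled instance was reclaimed: create a new one.
    return factory.get();
  }

  void release(T instance) {
    queue.offer(new SoftReference<>(instance));
  }

  // Hypothetical test hook: lets a caller enqueue a reference directly,
  // e.g. an already-cleared one to simulate reclamation by the GC.
  void offerReference(SoftReference<T> ref) {
    queue.offer(ref);
  }
}
```

Compared with soft-referencing the whole queue, this lets the GC reclaim idle instances one at a time, which is the trade-off discussed earlier in the thread.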

@romix (Collaborator) commented Jul 23, 2014

@magro Looks good. You should merge it, I'd say. And please add a corresponding section to the docs on our homepage.

@magro (Collaborator, Author) commented Jul 23, 2014

@romix Good point, I just added a section to the README. Do you want to have a quick look at it?

@romix (Collaborator) commented Jul 23, 2014

@magro I had a quick look.

Here are my quick comments:

  1. Change "you should use the KryoPool" to "you may want to use the KryoPool", because people may use their own pooling implementations as well

  2. For the callback-based approach, make it clearer that it will borrow/release Kryo instances automatically and that the callback should not do this itself

  3. "//configure kryo" should be "// configure kryo instance, customize settings"

  4. Maybe mention something about the size of the pool or how it behaves, i.e. that it uses soft references and will release resources under high memory load.

@@ -531,6 +532,36 @@ The serializers Kryo provides use the call stack when serializing nested objects

**Kryo is not thread safe. Each thread should have its own Kryo, Input, and Output instances. Also, the byte[] Input uses may be modified and then returned to its original state during deserialization, so the same byte[] should not be used concurrently in separate threads**.

## Pooling Kryo instances

Because the creation/initialization of `Kryo` instances is rather expensive, in a multithreaded scenario you should use the `KryoPool`. Here's an example that shows how to use it:
Member

I would say most of the time you'd want to use ThreadLocal and have a Kryo per thread. You'd only need a pool if you don't want a Kryo per thread for whatever reason. What is a reason you would not want a Kryo per thread? A thread is pretty heavyweight. I would guess you could have a Kryo per thread in just about any situation and be fine.
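The per-thread alternative suggested here can be sketched with `ThreadLocal.withInitial` (Java 8+). `PerThread` is a hypothetical wrapper and `T` stands in for `Kryo`; each thread lazily creates its own instance on first access and then keeps reusing it.

```java
import java.util.function.Supplier;

// Hypothetical sketch: one instance per thread, created lazily on first use.
// T stands in for Kryo; the Supplier stands in for the code that configures it.
class PerThread<T> {
  private final ThreadLocal<T> local;

  PerThread(Supplier<T> factory) {
    this.local = ThreadLocal.withInitial(factory);
  }

  // Returns the calling thread's instance, creating it on the first call.
  T get() {
    return local.get();
  }
}
```

With Kryo, the supplier would be something like `() -> { Kryo kryo = new Kryo(); /* configure */ return kryo; }`. Instances then live as long as their threads, which ties their number and lifetime to the thread pool in use.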

Collaborator Author

Yes, ThreadLocal is also technically possible. But to be honest, I don't like ThreadLocals that much, because they feel like a global variable that a component accesses, violating IoC / reducing control. Additionally, it depends on the thread pool that's used and the threading strategy (what if Kryo/the ThreadLocal is used from within an I/O event loop running with a single thread only?). So IMHO using ThreadLocals is more complex, and the user needs to be sure of the environment in which a ThreadLocal is used.

Still, I'd mention ThreadLocal as a possible solution in the README.

Ok?

Member

> what if Kryo/the ThreadLocal is used from within an I/O event loop running with a single thread only?

Seems like it would work fine? If you have a single thread, you don't need
more than one Kryo instance.

> So IMHO using ThreadLocals is more complex and the user needs to be sure in which environment he uses a ThreadLocal.

Agreed, but only slightly. As long as both ways are mentioned I'm good with
the pool.

@romix (Collaborator) commented Jul 23, 2014

@magro BTW, Martin, this PR reminds me of another issue we discussed at some point (e.g. #182). The current PR is about pooling Kryo instances. But it would also be cool to have a nice and flexible way to create Kryo instances with the same configuration. Do we want to provide something for that as well?

@serverperformance (Contributor)

Looks perfect to me, nice work!

@magro (Collaborator, Author) commented Jul 23, 2014

@romix Thanks for your suggestions. I improved the README (now as a separate commit) and also mentioned ThreadLocal. Is there still (or again) room for improvement?

Re configuration of Kryo instances, do you have something in mind like Chill's KryoInstantiator, which you mentioned in #182? Yes, we could provide this out of the box as well. I can look into this when I'm looking at memcached-session-manager and kryo2 again (which can take some time, to be honest ;-)).

@romix (Collaborator) commented Jul 23, 2014

@magro README is OK now, IMHO

Re configuration of Kryo instances, I don't have anything concrete at hand. One idea could be really to borrow Chill's approach. Or we could follow what you suggested in #182 and provide a builder API or something like that. Overall:

  • It would be nice to be able to have more flexibility when it comes to configuration/creation of Kryo instances. It would be nice to have it composable, i.e. being able to compose multiple different pieces of code, which configure different aspects of a Kryo instance (e.g. register serializers, customize serializer's settings, etc).
  • It would be nice to be able to "serialize" a configuration of a Kryo instance and all related serializers, so that another node (which performs deserialization) can (re)create a proper Kryo instance without a need to use the same (hand-written) piece of code, which was used by the serializing part. This is particularly important for scenarios where Kryo is to be used as a data storage format for persistent data or where clusters of nodes are formed very dynamically and it is not guaranteed that all of them are running exactly the same user-level code (e.g. they all run JVMs and Kryo in version X.Y.Z, but the apps/services on top of it are different).

magro added a commit that referenced this pull request Jul 25, 2014
Add simple, queue based kryo pool
@magro magro merged commit 23db0d8 into EsotericSoftware:master Jul 25, 2014
@brianfromoregon

Late to the party; I just wanted to share two thoughts:

  1. We generally disallow soft refs in our apps because of their impact on the JVM, so we would avoid the pool if it uses them :(

  2. I use a KryoFactory interface, but learned that it needed two steps of creation instead of a single create() method. Right now the two methods look like this:

    Kryo create(String name)
    void registration(Kryo kryo)

My users chain factories together: end users want their KryoFactory to start with a Kryo created by the library, but all registration needs to happen after the default serializers are added, so that the correct serializer is chosen. So first create is called, and the chain of factories installs default serializers; then registration is called, and the chain of factories registers types. With only a single create method, the library's factory has to do registration inside create(), so the end user's factory cannot change the default serializer for those types.
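The two-step ordering described here can be sketched as a chain that runs the first step of every factory before the second step of any factory. All names below (`TwoPhaseFactory`, `addDefaults`, `register`, `FactoryChain`) are hypothetical, and `FakeKryo` is a stand-in that only records what was done to it, so the ordering can be checked without Kryo itself. Unlike the `create(String name)` method described above, the first step here configures an instance the chain has already created.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a Kryo instance: only records configuration steps, in order.
class FakeKryo {
  final List<String> log = new ArrayList<>();
}

// Hypothetical two-step factory modeled on the create/registration split above.
interface TwoPhaseFactory {
  // Step 1: install default serializers.
  void addDefaults(FakeKryo kryo);

  // Step 2: register concrete types.
  void register(FakeKryo kryo);
}

class FactoryChain {
  private final List<TwoPhaseFactory> factories = new ArrayList<>();

  FactoryChain add(TwoPhaseFactory factory) {
    factories.add(factory);
    return this;
  }

  // Step 1 runs for every factory before step 2 runs for any of them, so a later
  // (end-user) factory can still change the default serializer for types that an
  // earlier (library) factory registers.
  FakeKryo create() {
    FakeKryo kryo = new FakeKryo();
    for (TwoPhaseFactory factory : factories) {
      factory.addDefaults(kryo);
    }
    for (TwoPhaseFactory factory : factories) {
      factory.register(kryo);
    }
    return kryo;
  }
}
```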

@magro (Collaborator, Author) commented Jul 25, 2014

> 1. we generally disallow soft refs in our apps because of impact on jvm, so would avoid the pool if it uses them :(

Can you go into more detail regarding the impact on the jvm (and share some docs/numbers perhaps)?
If this is an issue for you, we could also provide two implementations of the pool (probably changing KryoPool to be an interface then): one using SoftReferences and one using hard references.

> 2. I use a KryoFactory interface but learned that it needed two steps of creation instead of a single create() method.

Can't you just use your internal KryoFactory and chains of factories inside the newly introduced c.e.k.pool.KryoFactory?

@magro (Collaborator, Author) commented Jul 27, 2014

Ping @brianfromoregon

@brianfromoregon

Hi, I think you don't need to change any of what you've produced for me; I just wanted to share the use case early on. Yes, I think an interface is good practice anyway (Kryo should have an interface, IMO). Yes, I could chain my factories inside of your factory; it would then just mean I have two KryoFactory types in my app when ideally there would be one.

Thanks!

@magro (Collaborator, Author) commented Jul 28, 2014

@brianfromoregon Would you use the pool if it didn't use SoftReferences? What change would you suggest regarding the KryoFactory?

@brianfromoregon

Hi, I might use the pool at home, but at work I key Kryos by name, so I wouldn't need the pool. For the KryoFactory, I would suggest keeping it simple like it is now, and keeping its use constrained to the pool package. The reason is that KryoFactory overlaps in a small way with what I believe to be a large gap in the Kryo world: versioning. So I'd avoid committing code in that space ("how do we create and version Kryos") until a strategic direction is agreed upon.

I attended a great talk at OSCON last week about designing APIs for reuse. I agreed with his argument that the key is a message format that can be extended (added to) over time. For that, a good approach is having a "bag" of key/value pairs that can be added to, PLUS not imposing structure in the message, keeping it as just data and letting the reader decide the structure it wishes to impose. Anyway, it got me thinking about Kryo and the serialized forms I'm persisting to file/grid today. They are NOT extensible at all by default; just adding a field to a class breaks them. :( So with Kryo I can't use any of the advice he gave... so how do we version with Kryo? I don't know; like I said, it feels like a big gap that something like protobuf has a better answer for.

@NathanSweet (Member)

On Tue, Jul 29, 2014 at 1:39 AM, Brian Harris wrote:

> I attended a great talk (http://www.oscon.com/oscon2014/public/schedule/detail/34922) at oscon last week about designing APIs for reuse. I agreed with his argument that the key is a message format that can be extended (added to) over time. And for that, a good approach is having a "bag" of key/value pairs which can be added to PLUS not imposing structure in the message, keeping it as just data and letting reader decide structure they wish to impose. Anyway, it got me thinking about Kryo and the serialized forms I'm persisting to file/grid today. They are NOT extensible at all by default, just adding a field to a class breaks it. :( So with Kryo I can't use any of the advice he gave.... so how do we version with Kryo. I don't know, like i said it feels like a big gap that something like protobuf has a better answer for.

Kryo doesn't limit you: you can write serializers that support versioning, and Kryo even comes with a few. TaggedFieldSerializer is great for allowing classes to be extended without breaking previously serialized bytes, and it does this with barely any performance cost. I highly recommend it and have put it to good use a number of times.
CompatibleFieldSerializer is more flexible and has limited forward and backward compatibility, but comes with a performance hit.

-Nate
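The core idea behind tag-based versioning can be shown with a toy encoding: each field is written together with a numeric tag, and the reader only assigns the tags it knows, so fields missing from older bytes keep their defaults. This is a hypothetical illustration using `DataOutputStream`/`DataInputStream`, not how `TaggedFieldSerializer` actually lays out bytes; `TaggedCodec` and `UserV2` are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy tag-based encoding (NOT Kryo's wire format): a field count followed by
// (tag, value) pairs. All values are strings to keep the example short.
final class TaggedCodec {
  static byte[] write(Map<Integer, String> fields) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(fields.size());
      for (Map.Entry<Integer, String> field : fields.entrySet()) {
        out.writeInt(field.getKey());
        out.writeUTF(field.getValue());
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static Map<Integer, String> read(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      Map<Integer, String> fields = new LinkedHashMap<>();
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        int tag = in.readInt();
        fields.put(tag, in.readUTF());
      }
      return fields;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

// "Version 2" of a class: tag 1 existed in version 1, tag 2 was added later.
final class UserV2 {
  String name;           // tag 1
  String email = "n/a";  // tag 2: keeps this default when reading version-1 bytes

  static UserV2 fromFields(Map<Integer, String> fields) {
    UserV2 user = new UserV2();
    if (fields.containsKey(1)) user.name = fields.get(1);
    if (fields.containsKey(2)) user.email = fields.get(2);
    // Unknown tags (e.g. written by a newer version) are simply ignored here.
    return user;
  }
}
```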

@brianfromoregon

Why is TaggedFieldSerializer not the default if it comes so highly recommended? Surely, for out-of-the-box behavior, a slight performance hit would be worth the compatibility gain? I will definitely consider switching to it as the default.

@NathanSweet (Member)

Because it requires tagging fields with an annotation.

@romix (Collaborator) commented Jul 29, 2014

> Why is TaggedFieldSerializer not the default if it has such high recommendation? Surely for out of the box behavior a slight performance hit would be worth the compatibility gain? I will definitely consider switching to it as default.

I think also your guess about performance is not quite wrong. Many people use Kryo for in-flight data. Compatibility is often not an issue for them. If they would need compatibility, they would probably have used protostuff-runtime or avro (which both don't need any pre-defined schemas and can serialize almost any classes out of the box) or they would go for something like thrift, protobuffers, avro and other schema-based approaches.

I think if Kryo is to be seriously used for such scenarios where compatibility and versioning is so important, then we should eventually consider something like schema-based solutions (either predefined or dynamically generated).

Another dimension is where and how you keep the meta-data, i.e. your schema.

  • You may want to embed it in each serialized object graph, so that it is "self-contained" and can be always deserialized without any further information. This is very convenient and useful in certain scenarios. And this is where e.g. TaggedFieldSerializer could be used for starters.
  • But you may also keep your meta-data stored/available separately, so that you do not waste time when producing and space when persisting each of your serialized object graphs. This is typically a preferred way for long-term persistence of data in databases. Here we don't have anything at hand yet, AFAIK.

My conclusion: the problem is not as straightforward as it may seem. We need a good discussion resulting in a reasonable strategy if we want to move forward in that direction.

@NathanSweet
Member

On Tue, Jul 29, 2014 at 10:11 AM, romix [email protected] wrote:

I think if Kryo is to be seriously used for such scenarios where
compatibility and versioning is so important, then we should eventually
consider something like schema-based solutions (either predefined or
dynamically generated).

Kryo is already very good at versioning. Most applications only need
backward compatibility, i.e. newer versions of the app need to read
serialized bytes from any old version. I do this in my apps using
TaggedFieldSerializer and it works amazingly well.
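To make the backward-compatibility idea concrete, here is a minimal, JDK-only sketch of tag-based field encoding. It is a simplification under stated assumptions, not Kryo's TaggedFieldSerializer: a record is just a map from tag to an int value, and tags and values are written as fixed 4-byte ints rather than varints. A newer reader keeps defaults for fields the old bytes lack, reads and discards values for deprecated tags, and rejects tags it has never heard of.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of tag-based field encoding, not Kryo's actual
// TaggedFieldSerializer. A "record" is a map from tag to an int value.
class TaggedSketch {
    // Writes the field count, then a (tag, value) pair per field.
    static byte[] write(Map<Integer, Integer> valuesByTag) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(valuesByTag.size());
            for (Map.Entry<Integer, Integer> e : valuesByTag.entrySet()) {
                out.writeInt(e.getKey());   // tag identifies the field
                out.writeInt(e.getValue()); // the field's value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads bytes written by the same or an older version. Tags missing from
    // the bytes keep their defaults; values for deprecated tags are read and
    // dropped; unknown tags are rejected (backward compatibility only).
    static Map<Integer, Integer> read(byte[] data, Map<Integer, Integer> defaults,
                                      Set<Integer> deprecatedTags) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Map<Integer, Integer> result = new TreeMap<>(defaults);
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int tag = in.readInt();
                int value = in.readInt();
                if (deprecatedTags.contains(tag)) continue; // field was removed
                if (!defaults.containsKey(tag)) throw new IllegalArgumentException("Unknown field tag: " + tag);
                result.put(tag, value);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> v1 = new TreeMap<>();
        v1.put(1, 10); // tag 1: still in use
        v1.put(2, 99); // tag 2: deprecated in version 2
        Map<Integer, Integer> v2Defaults = new TreeMap<>();
        v2Defaults.put(1, 0);
        v2Defaults.put(3, 7); // tag 3: added in version 2
        System.out.println(read(write(v1), v2Defaults, Collections.singleton(2))); // {1=10, 3=7}
    }
}
```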

One of my apps has released more than 200 versions over the years and any
version can read the serialized bytes from ANY previous version. During
this time my classes have evolved a lot. 99% of the time adding a new field
or deprecating and renaming an old field are sufficient and
TaggedFieldSerializer handles this. For my app there have been 3 different
instances where the class structure changed. In these cases I wrote a
serializer that reads the old class and converts it to the new classes as
the old bytes are deserialized.
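The conversion approach described above could look roughly like this. The scenario is hypothetical: an old version stored a person as a single `fullName` string, while the new `Person` class has separate first and last names, and the reader consumes the old layout and builds the new object directly. In Kryo this logic would live in a custom serializer; plain `DataInputStream`/`DataOutputStream` are used here only to keep the sketch self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical scenario: old versions stored a person as one "fullName"
// string; the new Person class has separate first and last names. The reader
// consumes the old layout and builds the new class directly.
class ConvertingReader {
    // The current class definition.
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Produces bytes in the layout the old version used (one UTF string).
    static byte[] writeOldFormat(String fullName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(fullName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the old layout and converts it into the new class on the fly.
    static Person readOldAsNew(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String fullName = in.readUTF();
            int space = fullName.indexOf(' ');
            if (space < 0) return new Person(fullName, ""); // no last name was stored
            return new Person(fullName.substring(0, space), fullName.substring(space + 1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```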

The drawback to TaggedFieldSerializer is that it requires field
annotations, so it isn't a good fit for classes you don't control. In this
case you can transfer data to and from classes you don't control and
classes that use TaggedFieldSerializer. Note this is the same thing you
should do when using FieldSerializer if you care about versioning, because
you don't want to rely on private fields for classes you don't control.

One reason to not use TaggedFieldSerializer is if you need forward
compatibility, i.e. older versions of the app need to read serialized bytes
from any new version. In practice I think this use case is relatively rare
and will never have such an elegant solution (CompatibleFieldSerializer
does it, but isn't terribly elegant).

I don't think it is very interesting to compete with protobuf, which
already works well. I think writing a schema as protobuf does loses most of
what makes Kryo so good: Kryo is good at using Java class definitions to
reduce the effort needed to use serialization efficiently.

Using a schema has its own drawbacks. First, you need to write and
maintain the schema separately from your classes. IMO this is only
worthwhile if you need language interoperability. If you don't, it is much
more comfortable to use Kryo and leverage the class definitions, which
doesn't mean you lose forward and backward compatibility.

Using a schema also means you need to smush your data to fit in the schema.
E.g., protobuf advises against using protobuf-generated classes as first
class citizens in your object model. You need to transfer data to and from
your objects and protobuf objects. Even if you use the protobuf objects
directly to avoid this, you still need to transfer data to and from classes
you don't control.

Another dimension is where and how you keep the meta-data, i.e. your
schema.

You may want to embed it in each serialized object graph, so that it
is "self-contained" and can be always deserialized without any further
information. This is very convenient and useful in certain scenarios. And
this is where e.g. TaggedFieldSerializer could be used for starters.

Both CompatibleFieldSerializer and TaggedFieldSerializer fit this
description. CompatibleFieldSerializer stores field names in the serialized
bytes, which can be quite a bit of overhead along with using chunked
encoding. TaggedFieldSerializer stores only a varint per field, which
usually means 1 byte per field. This is much more efficient and is about as
little additional overhead as possible.
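The "usually 1 byte per field" figure follows from varint encoding. Below is a sketch of an unsigned LEB128-style varint (7 payload bits per byte, high bit set when more bytes follow); Kryo's variable-length ints work in a similar spirit, but this is not its exact code. Any tag below 128 fits in one byte, whereas even a short field name costs at least one byte per character.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of an unsigned LEB128-style varint: 7 payload bits per byte, with
// the high bit set on every byte except the last. Not Kryo's exact code.
class VarIntSketch {
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;                     // unsigned shift so the loop always ends
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(1).length);   // 1
        System.out.println(encode(127).length); // 1
        System.out.println(encode(128).length); // 2
        // For comparison, the field name alone is 9 bytes in UTF-8.
        System.out.println("firstName".getBytes(StandardCharsets.UTF_8).length); // 9
    }
}
```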

But you may also keep your meta-data stored/available separately, so
that you do not waste time when producing and space when persisting each of
your serialized object graphs. This is typically a preferred way for
long-term persistence of data in databases. Here we don't have anything at
hand yet, AFAIK.

Sure we do. TaggedFieldSerializer falls under this category: it stores
the "schema" in the class files as annotations.

-Nate

@romix
Collaborator

romix commented Jul 29, 2014

@nate: Good points and explanations about TaggedFieldSerializer.

Regarding schema-based solutions: Do you realize that Avro and protostuff-runtime have a mode where no predefined schema is required? The schema is generated dynamically during serialization based only on the class files, i.e. it is very similar to what Kryo does, plus it writes a schema (for each object graph). As a result, any Avro instance and any protostuff-runtime instance can read data serialized this way.

Regarding usage of TaggedFieldSerializer: From experience, I can tell you that DB people would almost certainly not accept a non-schema-based solution. This is just a fact of life ;-)

@romix
Collaborator

romix commented Jul 29, 2014

BTW, Avro writes the whole automatically derived schema in front of the serialized data, whereas protostuff-runtime more or less embeds the information in front of each field, similar to TaggedFieldSerializer. I think recent versions of Hazelcast serialization have also started using something similar.

@NathanSweet
Member

On Tue, Jul 29, 2014 at 11:09 AM, romix [email protected] wrote:

@nate: Good points and explanations about
TaggedFieldSerializer.

Regarding schema-based solutions: You realize that AVRO and
protostuff-runtime have a mode, where no pre-defined schema is required? It
is generated dynamically during serialization based only on class-files,
i.e. it is very similar to what Kryo does + it writes a schema (for each
object graph). As a result, any avro instance and any protostuff-runtime
instance can read the data serialized this way.

Yep, any instance can read the data, provided the class definitions match.
The schema and class definitions are tightly coupled, so managing the
schema in the class definition is advantageous. The "schema"
TaggedFieldSerializer writes to the serialized bytes is very minimal.

Regarding usage of TaggedFieldSerializer: From experience, I can tell you
that DB people would almost certainly not accept non-schema based solution.
This is just a fact of life ;-)

Huh? DBAs aren't people. ;)

I don't see the DB as a driving factor for how serialization is done. Kryo
likely only makes sense when putting opaque bytes in the DB, in which case
it doesn't matter that Java, Kryo, TaggedFieldSerializer and class
definitions are needed to interpret them. Otherwise there is a layer
between the serialized bytes and the DB where the data is represented as
objects and can be stuffed into columns in the DB however necessary,
using data types the DB understands.

Cheers,
-Nate

@brianfromoregon

Nathan, TaggedFieldSerializer sounds great, except asking my users to annotate their objects with @Tag is not going to fly. Would it be possible to have a serializer that does the same thing as TaggedFieldSerializer (writes the varint with each field) except that it is written to assume all fields are "annotated"/"tagged"? Then I could change my default serializer to this new one and everyone would get backward compatibility out of the box.

@NathanSweet
Member

On Tue, Jul 29, 2014 at 6:43 PM, Brian Harris [email protected]
wrote:

Nathan, TaggedFieldSerializer sounds great, except asking my users to
annotate their objects with @Tag is not going to
fly. Would it be possible to have a serializer which does the same thing as
TaggedFieldSerializer (puts the varint with each field) except it's written
to assume that all fields are "annotated"/"tagged"? Then I could change my
default serializer to this new thing and everyone gets backward
compatibility out of the box.

Of course you can come up with any scheme you like. The tags tell the
serializer which field a value belongs to (and therefore the type of the
value) and whether it should be ignored (the field was marked deprecated and
effectively removed). What int will you write to identify a value if you
don't have tags? You can write the field name, and this is what
CompatibleFieldSerializer does.

-Nate
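To illustrate why tags are hard to infer automatically, here is a hypothetical sketch that derives "implicit tags" by sorting field names and using each name's position as its tag (plain string lists stand in for reflection, whose field order Java does not guarantee). Adding a single field shifts the tags of existing fields, so bytes written by an older version would be decoded into the wrong fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive "implicit tags" from sorted field names and
// use each name's position as its tag. Shows why such tags are not stable.
class ImplicitTags {
    static Map<String, Integer> tagsFor(List<String> fieldNames) {
        List<String> sorted = new ArrayList<>(fieldNames);
        Collections.sort(sorted);
        Map<String, Integer> tags = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) tags.put(sorted.get(i), i);
        return tags;
    }

    public static void main(String[] args) {
        Map<String, Integer> v1 = tagsFor(Arrays.asList("name", "zip"));
        Map<String, Integer> v2 = tagsFor(Arrays.asList("name", "zip", "age"));
        System.out.println(v1); // {name=0, zip=1}
        System.out.println(v2); // {age=0, name=1, zip=2}
        // Old bytes tagged 0 meant "name", but version 2 reads tag 0 as "age".
    }
}
```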

@romix
Collaborator

romix commented Jul 29, 2014

This discussion about CompatibleFieldSerializer and its alternatives is really interesting and useful, but it is drifting further and further off-topic for this PR. I'd suggest creating a dedicated issue and continuing the discussion there.

BTW, when it comes to third-party classes that cannot be annotated in the usual way, one trick I used in some projects was to allow "external" annotations. Here the annotations are not provided inside the class files but elsewhere (config files, a DB, etc.) in a declarative way. Such external definitions contain class names, the names of the annotated fields, and the annotations themselves (i.e. the class name of each annotation and its parameters). Of course, we introduced a simple interface that abstracted away how annotations were accessed at the low level (i.e. from a class file or from such a declarative description). This way we could "annotate" even the third-party libraries we were using.
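A rough sketch of that indirection, assuming a hypothetical `TagSource` interface and `@FieldTag` annotation (neither is Kryo API): one implementation reads tags from annotations compiled into the class, another from an external mapping such as one loaded from a config file or DB. A serializer written against `TagSource` would not care which implementation it receives.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: TagSource and FieldTag are illustrative, not Kryo API.
class ExternalTags {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface FieldTag { int value(); }

    // Abstraction over where tags come from.
    interface TagSource {
        // Returns the tag for a field, or null if the field is untagged.
        Integer tagFor(Class<?> type, String fieldName);
    }

    // Reads tags from annotations compiled into the class.
    static final class AnnotationTagSource implements TagSource {
        public Integer tagFor(Class<?> type, String fieldName) {
            try {
                Field field = type.getDeclaredField(fieldName);
                FieldTag tag = field.getAnnotation(FieldTag.class);
                return tag == null ? null : tag.value();
            } catch (NoSuchFieldException e) {
                return null;
            }
        }
    }

    // Reads tags from an external, declarative mapping (e.g. loaded from config).
    static final class MapTagSource implements TagSource {
        private final Map<String, Integer> tags = new HashMap<>();

        MapTagSource put(Class<?> type, String fieldName, int tag) {
            tags.put(type.getName() + "#" + fieldName, tag);
            return this;
        }

        public Integer tagFor(Class<?> type, String fieldName) {
            return tags.get(type.getName() + "#" + fieldName);
        }
    }

    // A class we control, annotated directly.
    static class Mine { @FieldTag(1) int id; }

    // Stand-in for a third-party class we cannot annotate.
    static class ThirdParty { int id; }

    public static void main(String[] args) {
        TagSource annotations = new AnnotationTagSource();
        TagSource external = new MapTagSource().put(ThirdParty.class, "id", 1);
        System.out.println(annotations.tagFor(Mine.class, "id"));       // 1
        System.out.println(external.tagFor(ThirdParty.class, "id"));    // 1
        System.out.println(annotations.tagFor(ThirdParty.class, "id")); // null
    }
}
```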

@magro
Collaborator Author

magro commented Jul 31, 2014

As you've probably seen, I've submitted another pull request that changes KryoPool to an interface and makes SoftReferences optional.
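For readers following along, here is a generic, JDK-only sketch of the idea: a queue-based pool that can optionally hold idle instances through `SoftReference`s so the GC may reclaim them under memory pressure. The `SimplePool` name and the `Supplier`-based factory are assumptions for illustration; this is not the API of the KryoPool in either pull request.

```java
import java.lang.ref.SoftReference;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical sketch, not Kryo's KryoPool API: a thread-safe, queue-based
// pool whose idle instances can optionally be held via SoftReferences, so
// the GC may reclaim them when memory runs low.
class SimplePool<T> {
    private final Supplier<T> factory;
    private final boolean softReferences;
    // Holds either T or SoftReference<T>, depending on softReferences.
    private final Queue<Object> idle = new ConcurrentLinkedQueue<>();

    SimplePool(Supplier<T> factory, boolean softReferences) {
        this.factory = factory;
        this.softReferences = softReferences;
    }

    // Returns a pooled instance if one is still available, else a new one.
    @SuppressWarnings("unchecked")
    T borrow() {
        Object entry;
        while ((entry = idle.poll()) != null) {
            T instance = softReferences ? ((SoftReference<T>) entry).get() : (T) entry;
            if (instance != null) return instance;
            // The GC cleared this soft reference; try the next idle entry.
        }
        return factory.get();
    }

    // Hands an instance back to the pool for later reuse.
    void release(T instance) {
        if (softReferences) {
            idle.offer(new SoftReference<>(instance));
        } else {
            idle.offer(instance);
        }
    }
}
```

With soft references enabled, `borrow()` simply skips entries the GC has already cleared and falls back to the factory, so callers never see a null instance; the trade-off is an occasional re-creation cost after a memory-pressure collection.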

Regarding the discussion about versioning: +1 for @romix's suggestion. It would be great if we could continue the discussion either on the mailing list or in a dedicated issue (and not stop it here :-)).

6 participants