Performance degration of FSTLongOffheapMap #72

Closed
CoderFromCasterlyRock opened this issue Jun 19, 2015 · 2 comments

Comments

@CoderFromCasterlyRock

Hi Guys

Firstly thanks for this wonderful library.
I am using FSTLongOffheapMap (version 2.29) on Windows 7 with JDK 1.7 to store some objects. I measured the latency of storing an object, and the 99.99th percentile comes to about 100 microseconds. This is excellent, considering that I ran it on Windows, did not write a custom serializer, and did not tune the GC much.

However, if I change the setup as described in this post of mine, the performance degrades considerably.
http://stackoverflow.com/questions/30928650/performance-degradation-of-fast-serialization

I would really appreciate it if anyone has any ideas/suggestions.
I have a reproducible test in the above link as well.

Many thanks

@RuedigerMoeller
Owner

Thanks for the report, I'll investigate asap.

@RuedigerMoeller
Owner

Hi:

I think the test is flawed:

When I run your sync test (no queues, no thread context switches) in a loop (i.e. with proper warmup), I get a mean of 0.7 microseconds and a maximum outlier of 14 microseconds for storing a single event (I doubled the number of elements in the map, though).
That is the performance of FST itself; the loss and latency you see are caused by your queuing and thread context switches. In addition, the test has a flaw:

You put a burst of 50k events into a queue, and the only delay between them is the time spent on event creation. Because putting events is much faster than storing them, the events accumulate: the n-th event carries the accumulated latency of all events 0..n-1 ;).

It only looks good in the first run because the JVM is not warmed up yet: event creation is slow then, so events do not queue up.
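The accumulation effect can be illustrated with a small deterministic model. This is only a sketch, not the original test: the class `QueueLatencyModel`, its method, and the arrival interval of 100 ns are assumptions made for illustration. The 700 ns service time is loosely based on the 0.7 microsecond mean quoted above.

```java
// Illustrative model only: class name, method name and timings are assumptions,
// not part of the original test or of FST.
final class QueueLatencyModel {

    /**
     * Returns the latency (completion time minus arrival time, in nanoseconds)
     * of each event when a single consumer needs a fixed service time per event.
     */
    static long[] latencies(int eventCount, long arrivalIntervalNanos, long serviceNanos) {
        long[] result = new long[eventCount];
        long consumerFreeAt = 0; // time at which the consumer finishes the previous event
        for (int i = 0; i < eventCount; i++) {
            long arrival = i * arrivalIntervalNanos;
            long start = Math.max(arrival, consumerFreeAt); // wait while the consumer is busy
            consumerFreeAt = start + serviceNanos;
            result[i] = consumerFreeAt - arrival;
        }
        return result;
    }

    public static void main(String[] args) {
        // Burst: an event arrives every 100 ns, but storing one takes 700 ns.
        long[] burst = latencies(50_000, 100, 700);
        // Paced: an event arrives every 3000 ns, so the consumer always catches up.
        long[] paced = latencies(50_000, 3_000, 700);
        System.out.println("burst, last event latency (ns): " + burst[burst.length - 1]);
        System.out.println("paced, last event latency (ns): " + paced[paced.length - 1]);
    }
}
```

In the burst case each event waits for all events queued before it, so the measured latency grows with the event index; in the paced case every event sees only the service time.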

Other issues:

  1. Major: NO WARMUP. Put the test in a loop and let it run several times (e.g. 10) before looking at the numbers.

  2. (minor) Enqueuing is done via offer without checking the result.

  3. You poll the queue and yield whenever no event is available; this can lead to nondeterministic latency spikes. Yield only after some backoff.
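Points 2 and 3 could look roughly like the following sketch. The class `BackoffQueueSketch`, its method names and the spin limit are hypothetical; the dispatcher in the original test is not shown here, so a plain `ArrayBlockingQueue` stands in for its queue.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

// Sketch only: class name, method names and the spin limit are assumptions.
final class BackoffQueueSketch {

    /** Enqueues the event, retrying until offer() succeeds; returns the number of failed attempts. */
    static <T> int enqueue(Queue<T> queue, T event) {
        int failedAttempts = 0;
        while (!queue.offer(event)) { // offer() returns false when a bounded queue is full
            failedAttempts++;
            Thread.yield();
        }
        return failedAttempts;
    }

    /** Polls by busy-spinning first and yields only after spinLimit consecutive empty polls. */
    static <T> T pollWithBackoff(Queue<T> queue, int spinLimit) {
        int emptyPolls = 0;
        T event;
        while ((event = queue.poll()) == null) {
            if (++emptyPolls > spinLimit) {
                Thread.yield();
            }
        }
        return event;
    }

    public static void main(String[] args) throws InterruptedException {
        final int eventCount = 10_000;
        final Queue<Integer> queue = new ArrayBlockingQueue<>(1024);
        Thread producer = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < eventCount; i++) {
                    enqueue(queue, i);
                }
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < eventCount; i++) {
            sum += pollWithBackoff(queue, 1_000);
        }
        producer.join();
        System.out.println("received " + eventCount + " events, sum = " + sum);
    }
}
```

A suitable spin limit depends on the machine and the load, so the value used here is only a placeholder to be tuned by measurement.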

You should put an event into the queue only every ~1-2 microseconds, to avoid queuing up events and thereby measuring aggregated times.

Change TestFSTSerializer to:

        for( int i = 0; i< eventCount; i++ ){
            MktDataEvent event = new MktDataEvent( "EDM6", 99.0, (100 + i), 99.50, (200 + i) );
            dispatcher.enqueue( event );
            // pace the producer: wait ~3 microseconds before enqueuing the next event
            long nanos = System.nanoTime();
            while( System.nanoTime() - nanos < 3000 )
                Thread.yield();
        }

and the main method to (warmup; ignore the first runs):

    public static void main( String ... args ) throws Exception{
        // run the test repeatedly so the JVM is warmed up; ignore the first runs
        for (int i = 0; i < 1000; i++) {
            System.gc();
            Thread.sleep( 2000 );
            System.out.println("start test ==>");
            testDispatchAndPersistence( true );
//            testOffHeapPersistence();
        }
    }

yields:

[Mean = 5.19, StdDeviation = 29.67]

[Max = 544.77, Total count = 50000]

[Buckets = 23, SubBuckets = 256]
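The warmup advice can also be expressed as a small reusable harness that discards the first runs explicitly. This is a minimal sketch: the class `WarmupHarness`, its method and the run counts are assumptions, and the stand-in workload would be replaced by one full test run.

```java
// Sketch only: the class, its method and the run counts are assumptions.
final class WarmupHarness {

    /**
     * Runs the task warmupRuns + measuredRuns times and returns the elapsed
     * nanoseconds of the measured runs only; warmup timings are discarded.
     */
    static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile hot paths
        }
        long[] elapsed = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Runnable work = new Runnable() {
            @Override
            public void run() {
                // Stand-in workload for illustration only.
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 10_000; i++) {
                    sb.append(i);
                }
                if (sb.length() == 0) {
                    throw new IllegalStateException("unexpected empty result");
                }
            }
        };
        long[] elapsed = measure(work, 10, 5);
        for (long nanos : elapsed) {
            System.out.println("run took " + (nanos / 1000) + " micros");
        }
    }
}
```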

Note that thread context switches cost you 3-8 microseconds (so high-end kernel-bypass networks can be nearly as fast as queuing between threads!).
You could try queues that are faster than the java.util.concurrent ones to further reduce latency.
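One common design for low-latency queues is a single-producer/single-consumer ring buffer. The sketch below is an assumption about what such a queue could look like, not a specific library and not something used in the original test; whether it actually beats the java.util.concurrent queues in this setup would have to be measured.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a minimal single-producer/single-consumer ring buffer.
// Safe only if exactly one thread calls offer() and exactly one thread calls poll().
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next index to read, written by the consumer only
    private final AtomicLong tail = new AtomicLong(); // next index to write, written by the producer only

    SpscRingBuffer(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Producer thread only. Returns false if the buffer is full. */
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    /** Consumer thread only. Returns null if the buffer is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T value = (T) slots[index];
        slots[index] = null; // allow the element to be garbage collected
        head.set(h + 1); // volatile write frees the slot for the producer
        return value;
    }
}
```

As with the java.util.concurrent queues, the result of `offer` still has to be checked, since a full buffer rejects the element.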

Note for later testing: as persistence relies on how eagerly the OS writes back dirty pages, you need to tune the OS settings to make writeback much less eager and/or use an SSD.
