Performance degradation of FSTLongOffheapMap #72

CoderFromCasterlyRock · 2015-06-19T01:56:23Z

Hi Guys

Firstly, thanks for this wonderful library.
I am using FSTLongOffheapMap (version 2.29) on Windows 7 with JDK 1.7 to store some objects. I tested the latency of storing an object, and the 99.99th percentile comes to about 100 microseconds. This is excellent considering that I ran it on Windows, didn't write a custom serializer, and didn't tune the GC much.

However, if I change the setup as described in this post of mine, the performance degrades considerably.
http://stackoverflow.com/questions/30928650/performance-degradation-of-fast-serialization

I would really appreciate it if anyone has any ideas/suggestions.
I have a reproducible test in the above link as well.

Many thanks

RuedigerMoeller · 2015-06-19T09:04:09Z

Thanks for the report, I'll investigate asap.

RuedigerMoeller · 2015-06-19T15:30:30Z

Hi:

I think the test is flawed:

When I run your synchronous test (no queues, no thread context switches) in a loop (i.e. with proper warmup), I get a mean of 0.7 microseconds and a max outlier of 14 microseconds for storing a single event (with double the number of elements in the map, though).
That is the performance of FST; the loss and latency you see are caused by your queuing and thread context switches. In addition, the test has a flaw:

You put a burst of 50k events into a queue, with the timestamp taken at Event creation. Because putting events is much faster than storing them, they accumulate: the N'th event gets the accumulated latency of all events 0..N-1 ;).

It only looks good in the first run because the JVM is not warmed up yet: event creation is slow then, so events do not queue up.
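The accumulation effect can be illustrated with a small deterministic model of one producer and one consumer on a FIFO queue. This is only a sketch: the class name, the 100 ns enqueue interval, and the 700 ns store time are assumptions chosen for illustration (the 700 ns loosely follows the mean quoted above), not measurements from the test.

```java
final class BurstLatencySketch {

    /**
     * Models a single consumer draining a FIFO queue.
     * Returns, per event, the time from enqueue until the store completes (in ns).
     */
    static long[] latencies(int eventCount, long enqueueIntervalNanos, long serviceNanos) {
        long[] result = new long[eventCount];
        long consumerFreeAt = 0; // time at which the consumer finishes its current event
        for (int i = 0; i < eventCount; i++) {
            long enqueuedAt = i * enqueueIntervalNanos;
            // An event cannot start before it is enqueued or before the consumer is free.
            long startedAt = Math.max(enqueuedAt, consumerFreeAt);
            consumerFreeAt = startedAt + serviceNanos;
            result[i] = consumerFreeAt - enqueuedAt;
        }
        return result;
    }

    public static void main(String[] args) {
        // Burst: events arrive faster (every 100 ns) than they are stored (700 ns each).
        long[] burst = latencies(50_000, 100, 700);
        // Paced: one event every 3000 ns, so the consumer always keeps up.
        long[] paced = latencies(50_000, 3_000, 700);
        System.out.println("burst: first=" + burst[0] + " ns, last=" + burst[burst.length - 1] + " ns");
        System.out.println("paced: first=" + paced[0] + " ns, last=" + paced[paced.length - 1] + " ns");
    }
}
```

In this model the burst latency grows linearly with the event index, while the paced latency stays at the pure store time, which is why a burst test mostly measures queue depth rather than FST.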

Other issues:

Major: NO WARMUP. Put the test in a loop and let it run several times (e.g. 10) before looking at the numbers.
(minor) Enqueuing is done via offer without checking the result.
You poll the queue and yield if no event is available; this can lead to nondeterministic latency spikes. Yield only after some backoff.
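The two queue-handling points could look roughly like the sketch below. It assumes the dispatcher uses a java.util.concurrent BlockingQueue; the class name, method names, and the spin threshold are hypothetical and not taken from the test code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class QueueHandlingSketch {

    // Hypothetical threshold: number of empty polls to spin through before yielding.
    static final int SPINS_BEFORE_YIELD = 1_000;

    /** Retries until the queue accepts the event, so a full queue never drops it silently. */
    static <T> void enqueue(BlockingQueue<T> queue, T event) {
        while (!queue.offer(event)) {
            Thread.yield(); // queue is full: give the consumer a chance to drain it
        }
    }

    /** Busy-polls first and only starts yielding after a run of empty polls. */
    static <T> T dequeue(BlockingQueue<T> queue) {
        int idleSpins = 0;
        T event;
        while ((event = queue.poll()) == null) {
            if (++idleSpins > SPINS_BEFORE_YIELD) {
                Thread.yield();
            }
        }
        return event;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<String>(4);
        enqueue(queue, "a");
        enqueue(queue, "b");
        System.out.println(dequeue(queue) + dequeue(queue));
    }
}
```

Spinning before yielding keeps the consumer thread on its core during short gaps between events; the right threshold depends on the machine and would need to be measured.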

You should put an event into the queue only every ~1-2 microseconds, to avoid queuing up events and thereby measuring aggregated times.

Change TestFSTSerializer to:

        for( int i = 0; i< eventCount; i++ ){
            MktDataEvent event = new MktDataEvent( "EDM6", 99.0, (100 + i), 99.50, (200 + i) );
            dispatcher.enqueue( event );
            // pace the producer: wait ~3 microseconds before the next event so the queue does not back up
            long nanos = System.nanoTime();
            while( System.nanoTime() - nanos < 3000 )
                Thread.yield();
        }

and the main method (warm up, ignore the first runs):

    public static void main( String ... args ) throws Exception{
        // repeat the test many times; only the later runs reflect warmed-up JIT-compiled code
        for (int i = 0; i < 1000; i++) {
            System.gc();
            Thread.sleep( 2000 );
            System.out.println("start test ==>");
            testDispatchAndPersistence( true );
//            testOffHeapPersistence();
        }
    }

yields:

[Mean = 5.19, StdDeviation = 29.67]

[Max = 544.77, Total count = 50000]

[Buckets = 23, SubBuckets = 256]

Note that thread context switches cost you 3-8 microseconds (so high-end kernel-bypass networks can be nearly as fast as queuing between threads!).
You could try queues faster than the java.util.concurrent ones to further reduce latency.

Note for later testing: as persistence relies on how eagerly the OS writes back dirty pages, you need to tune the OS to write back very lazily and/or use an SSD.

RuedigerMoeller closed this as completed Jun 19, 2015
