Add cache to http server and multithread with eio #119

Closed
wants to merge 5 commits

Conversation

@ailrst commented Jan 8, 2025

This substantially improves the throughput of the server; testing with random opcodes gives around 9000 opcodes/sec.

$ wrk2 -t 8 -c64 -d 30s -s stress.lua -R10000 --latency http://localhost:8000

#[Mean    =     1150.503, StdDeviation   =     1521.297]
#[Max     =    11919.360, Total count    =       190688]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  284353 requests in 30.04s, 104.75MB read
  Socket errors: connect 0, read 0, write 0, timeout 2
  Non-2xx or 3xx responses: 193219
Requests/sec:   9465.37
Transfer/sec:      3.49MB

[figure: benchmark plot comparing the in-memory cache, varnishcachehot, and varnishcachecold configurations]

We can see in these tests that the in-memory cache has little effect, as it is always cold (the test opcodes are uniformly random, and there are far more of them than the cache can hold). For real programs the cache hit rate should be higher. The 'varnishcachehot' and 'varnishcachecold' results are obtained by putting the varnish HTTP cache in front of the server; in the hot case it has pre-cached all the opcodes, and in the cold case it has just been restarted. We could probably get a nix wrapper to do this?

testing with varnish:

nix-shell -p varnish wrk2
mkdir ~/varnish
varnishd -b localhost:8000 -a localhost:8001 -n ~/varnish
cd ../aslp/aslp_server
wrk2 -t 64 -c64 -d 100s -s stress.lua -R10000 --latency http://localhost:8001

Before these changes we had:

#[Mean    =    17361.798, StdDeviation   =     4917.355]
#[Max     =    27901.952, Total count    =        23948]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  33878 requests in 30.00s, 11.59MB read
  Socket errors: connect 0, read 0, write 0, timeout 53
  Non-2xx or 3xx responses: 23437
Requests/sec:   1129.21

I think it's possibly worth separating the server and HTTP client into a separate repo.

I also tested with an LRU cache on the C++ side and it didn't have a positive impact; I suspect an LRU cache is not going to be effective with random input, since with uniformly random opcodes the hit rate is roughly (cache size / opcode-space size) no matter the eviction policy.


Parallelism strategy

We have one request-handler thread (in which the non-thread-safe in-memory cache lives) and multiple lifter threads, which accept work through Eio.Executor_pool (sketched below).

Basic benchmarks show that adding threads to the request handler doesn't improve performance, as we are significantly bound by the lifting side.
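
For reference, a minimal sketch of that layout (not the PR's actual code), assuming eio ≥ 1.1 for Eio.Executor_pool; lift, the cache size, and domain_count:4 are illustrative stand-ins:

```ocaml
(* Sketch: one handler fibre owns the cache; CPU-bound lifting is
   farmed out to worker domains via Eio.Executor_pool. *)

let lift opcode =
  (* stand-in for the expensive lifting call *)
  "semantics-for-" ^ opcode

let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->
  (* Worker domains that run the CPU-bound lifting. *)
  let pool =
    Eio.Executor_pool.create ~sw ~domain_count:4 (Eio.Stdenv.domain_mgr env)
  in
  (* The cache lives in the single request-handler fibre, so it needs
     no locking: nothing else ever touches it. *)
  let cache : (string, string) Hashtbl.t = Hashtbl.create 1024 in
  let handle opcode =
    match Hashtbl.find_opt cache opcode with
    | Some result -> result
    | None ->
        (* submit_exn suspends only this fibre (not the whole domain)
           until a worker is free; weight 1.0 marks the job as
           CPU-saturating. *)
        let result =
          Eio.Executor_pool.submit_exn pool ~weight:1.0 (fun () -> lift opcode)
        in
        Hashtbl.replace cache opcode result;
        result
  in
  print_endline (handle "0xd503201f")
```

Because submit_exn suspends only the calling fibre while a worker domain runs the job, the single handler fibre can keep serving cache hits without any locking around the table.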

@katrinafyi (Member) left a comment

lgtm.

it would be nice to separate the shutdown endpoint from ?opcode=. also, it is useful to have the "Disassembling 0x000" text to show when there is a cache miss. can this be added back?

@ailrst (Author) commented Jan 16, 2025

> it is useful to have the "Disassembling 0x000" text to show when there is a cache miss

If we're doing 1000 opcodes a second, is it really that useful? The IO has a noticeable but slight performance penalty; I could put it back in behind a flag?
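
Something like this hypothetical sketch (the flag name and the Hashtbl-backed cache are stand-ins for the real server code) would keep the logging IO off the cache-hit path:

```ocaml
(* Sketch: gate the cache-miss log behind a --verbose flag, so the hot
   (cache-hit) path stays free of IO. *)

let verbose = ref false

let cache : (string, string) Hashtbl.t = Hashtbl.create 1024

let lift opcode = "semantics-for-" ^ opcode  (* stand-in for the lifter *)

let handle opcode =
  match Hashtbl.find_opt cache opcode with
  | Some result -> result
  | None ->
      (* Only pay the logging cost on a miss, and only when asked. *)
      if !verbose then Printf.eprintf "Disassembling %s\n%!" opcode;
      let result = lift opcode in
      Hashtbl.replace cache opcode result;
      result

let () =
  Arg.parse
    [ ("--verbose", Arg.Set verbose, " Log opcodes on cache miss") ]
    (fun _ -> ())
    "aslp_server [--verbose]";
  print_endline (handle "0xd503201f")
```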

@ailrst (Author) commented Jan 16, 2025

Also note this has been moved to https://github.com/uq-pac/aslp-rpc

@katrinafyi (Member) commented

> is it really that useful?

yeah add it back

@ailrst closed this Jan 21, 2025
@katrinafyi deleted the server-perf branch February 5, 2025