diff --git a/BENCHMARK.md b/BENCHMARK.md
new file mode 100644
index 0000000..a8ec73b
--- /dev/null
+++ b/BENCHMARK.md
@@ -0,0 +1,58 @@
+# Benchmark
+
+## v0.1.0
+
+To evaluate the performance of the `AsyncBatcher` library, we used the [Keras example](examples/keras)
+with [locust](https://locust.io/) to simulate multiple users making requests to the server.
+
+In this example, we have a TensorFlow model that takes ~11ms to make a single prediction and ~13ms to process
+a batch of 200 predictions.
+
+For this benchmark, we ran the FastAPI server with Uvicorn and 12 workers,
+on a MacBook Pro with an Apple M2 Max chip (12 cores).
+
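+A load test of this kind can be driven by a small locustfile. The sketch below is illustrative only: the file name,
+request payload, and wait times are assumptions, not the exact script used to produce these numbers.
+
+```python
+# locustfile.py -- illustrative sketch, not the exact script behind these results
+from locust import HttpUser, task, between
+
+
+class PredictUser(HttpUser):
+    # Assumed think time between requests.
+    wait_time = between(0.1, 0.5)
+
+    @task
+    def predict(self):
+        # Placeholder payload; adapt it to the Keras example's request schema.
+        # Point this at /optimized_predict for the second run.
+        self.client.post("/predict", json={"features": [0.0] * 8})
+```
+
+The server can be started with a command along the lines of `uvicorn main:app --workers 12` (the module and app names
+depend on the example), and the load test can then be pointed at it with `locust -f locustfile.py --host http://localhost:8000`.
+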
+Predict endpoint:
+
+![predict__total_requests_per_second.png](assets/predict__total_requests_per_second.png)
+
+Optimized Predict endpoint:
+
+![optimized_predict__total_requests_per_second.png](assets/optimized_predict__total_requests_per_second.png)
+
+As we can see from the graphs, for the `/predict` endpoint, the failure rate started to increase linearly at
+~284 requests/second, where the 95th percentile response time was ~1100ms. As the request rate kept growing,
+the number of successful requests averaged ~130 requests/second, with a high failure rate
+(~300 failures out of ~400 requests/second).
+
+For the `/optimized_predict` endpoint, the failure rate stayed below 3 requests/second (0.00625%) during the whole test,
+the average response time increased only slightly with the request rate and reached ~120ms at the end of the test
+(> 480 requests/second), while the 95th percentile response time remained almost stable and below 500ms.
+
+You can check the reports for more details:
+- [predict__report.html](assets/predict__report.html)
+- [optimized_predict__report.html](assets/optimized_predict__report.html)
+
+## v0.2.0
+
+In version 0.2.0, we replaced the threading-based implementation with a fully asyncio-based one, and we added support
+for processing multiple batches concurrently.
+
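+To give an intuition of what concurrent batch processing means here, the snippet below is a simplified sketch of the
+general pattern, not the library's actual implementation (`run_batches` and `handle_batch` are hypothetical names):
+a consumer task drains the queue into batches and dispatches each batch as its own task, with a semaphore bounding
+how many batches run at the same time.
+
+```python
+import asyncio
+
+
+async def run_batches(queue: asyncio.Queue, handle_batch, max_batch_size=200, concurrency=12):
+    """Illustrative sketch only: collect items into batches and process up to
+    `concurrency` batches at the same time. Result delivery to individual
+    callers and the `max_queue_time` flush are omitted for brevity."""
+    semaphore = asyncio.Semaphore(concurrency)
+
+    async def process(batch):
+        async with semaphore:
+            await handle_batch(batch)
+
+    while True:
+        batch = [await queue.get()]
+        while len(batch) < max_batch_size and not queue.empty():
+            batch.append(queue.get_nowait())
+        # Dispatch the batch and immediately go back to collecting the next one,
+        # instead of waiting for the current batch to finish.
+        asyncio.create_task(process(batch))
+```
+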
+These two changes improved the performance of the `AsyncBatcher` package. To evaluate the impact, we re-ran the
+load tests with the new version, the configuration below, and a faster ramp-up in the number of requests.
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+# `MlBatcher` and `model` come from the Keras example referenced above.
+executor = ThreadPoolExecutor(max_workers=12)
+batcher = MlBatcher(
+    model=model,
+    max_queue_time=0.01,  # wait at most 10ms to fill a batch
+    concurrency=12,       # process up to 12 batches concurrently
+    executor=executor     # thread pool the batcher can use for the blocking model call
+)
+```
+And the results were impressive:
+
+![optimized_predict__total_requests_per_second_v0.2.0.png](assets/optimized_predict__total_requests_per_second_v0.2.0.png)
+
+The average response time increased only very slowly with the request rate, reaching ~96ms at the end of the test
+at almost ~500 requests/second (20% lower than in v0.1.0), while the 95th percentile response time remained almost
+stable and below 300ms for the whole test (40% lower than in v0.1.0).
diff --git a/README.md b/README.md
index cff4297..e0a36ac 100644
--- a/README.md
+++ b/README.md
@@ -27,35 +27,9 @@ async_batcher.start()
 
 ## Benchmark
 
-To evaluate the performance of the `AsyncBatcher` library, we used the [Keras example](examples/keras)
-with [locust](https://locust.io/) to simulate multiple users making requests to the server.
+The benchmark is available in the [BENCHMARK.md](https://github.com/hussein-awala/async-batcher/blob/main/BENCHMARK.md)
+file.
-
-In this example, we have a tensorflow model that takes ~11ms to make a single prediction and ~13ms to process
-a batch of 200 predictions.
-
-For this benchmark, we ran the FastAPI server with Uvicorn with 12 workers,
-on a MacBook Pro with Apple M2 Max chip (12 cores).
-
-Predict endpoint:
-
-![predict__total_requests_per_second.png](assets%2Fpredict__total_requests_per_second.png)
-
-Optimized Predict endpoint:
-
-![optimized_predict__total_requests_per_second.png](assets%2Foptimized_predict__total_requests_per_second.png)
-
-As we can see from the graphs, for the `/predict` endpoint, the failure rate started to increase linearly at the RPS
-of ~284 requests/second, and the 95th percentile response time was ~1100ms at this point. Then with the increase of
-the RPS, the number of successful requests was ~130 requests/second in average, with a high failure rate
-(~300 from ~400 requests/second).
-
-For the `/optimized_predict` endpoint, the failure was smaller than 3 requests/second (0.00625%) during the whole test,
-the average response time was increasing slightly with the increase of the RPS, and it reached ~120ms in average at
-the end of the test (> 480 requests/second), with a 95th percentile response time almost stable and smaller than 500ms.
-
-You can find check the reports for more details:
-- [predict__report.html](assets/predict__report.html)
-- [optimized_predict__report.html](assets/optimized_predict__report.html)
 
 ## Use cases
 The `AsyncBatcher` library can be used in any application that needs to handle asynchronous requests in batches,
diff --git a/assets/optimized_predict__report_v0.2.0.html b/assets/optimized_predict__report_v0.2.0.html
new file mode 100644
index 0000000..51fe90a
--- /dev/null
+++ b/assets/optimized_predict__report_v0.2.0.html
diff --git a/assets/optimized_predict__total_requests_per_second_v0.2.0.png b/assets/optimized_predict__total_requests_per_second_v0.2.0.png
new file mode 100644
index 0000000..2041d04
Binary files /dev/null and b/assets/optimized_predict__total_requests_per_second_v0.2.0.png differ