Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance optimization: create a new torrent repository using DashMap #784

Merged

Conversation

josecelano
Copy link
Member

@josecelano josecelano commented Apr 9, 2024

Relates to:

This PR adds a new torrent repository implementation where the outer collection for torrents uses a DashMap.

This is something @mickvandijke was working on this PR.

There have been many changes. @da2ce7 has extracted a package for the repositories.

This PR adds a new repo using DashMap to the new package. However, it does not implement some of the extra features @mickvandijke added to the other PR. For example, it does not limit memory consumption. None of the other repos have that feature, so I suggest merging this PR and implementing that feature in the future for all repos.

Why

The current repository used in production is the one using a SkipMap data structure. DashMap was the first type we considered to allow adding new torrents in parallels.

The implementation with DashMap has not been merged because it does not guarantee the order when you iterate over the torrents. The tracker API returns torrents ordered by InfoHash. To avoid breaking the API that PR would need to add a new data structure (kind of Index) to keep the torrent list ordered.

This implementation helps us run performance tests with this option without spending much time fixing its limitations. It looks like the performance is similar to the SkiMap, so we don't need to use it in production now. We only need to keep it for benchmarking in the future if other versions are better.

Benchmarking

Running the Aquatic UDP load test, the DashMap looks slightly better.

SkipMap:

Requests out: 396788.68/second
Responses in: 357105.27/second
  - Connect responses:  176662.91
  - Announce responses: 176863.44
  - Scrape responses:   3578.91
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 105
  - p99.9: 287
  - p100: 351  

DashMap with initial capacity 0 (best result. On average is lower, similar to SkipMap):

Requests out: 410658.38/second
Responses in: 365892.86/second
  - Connect responses:  181258.91
  - Announce responses: 181005.95
  - Scrape responses:   3628.00
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 104
  - p99.9: 295
  - p100: 363

With Criterion:

image

image

image

image

Conclusion

From my point of view, other performance optimisations have more potential than this, considering this implementation is not finished. The changes needed to finish this implementation will probably decrease the performance obtained in this benchmarking. In the future, we can review this option if we change the tracker API behaviour to getting all torrents.

It will be used to create a new torrent repository implementation,
using a DashMap for the torrent list.

DashMap crate: https://crates.io/crates/dashmap
It's not enabled as the deafult repository becuase DashMap does not
return the items in order. Some tests fail:

``` output
failures:

---- core::services::torrent::tests::searching_for_torrents::should_return_torrents_ordered_by_info_hash stdout ----
thread 'core::services::torrent::tests::searching_for_torrents::should_return_torrents_ordered_by_info_hash' panicked at src/core/services/torrent.rs:303:13:
assertion `left == right` failed
  left: [BasicInfo { info_hash: InfoHash([158, 2, 23, 208, 250, 113, 200, 115, 50, 205, 139, 249, 219, 234, 188, 178, 194, 207, 60, 77]), seeders: 1, completed: 0, leechers: 0 }, BasicInfo { info_hash: InfoHash([3, 132, 5, 72, 100, 58, 242, 167, 182, 58, 159, 92, 188, 163, 72, 188, 113, 80, 202, 58]), seeders: 1, completed: 0, leechers: 0 }]
 right: [BasicInfo { info_hash: InfoHash([3, 132, 5, 72, 100, 58, 242, 167, 182, 58, 159, 92, 188, 163, 72, 188, 113, 80, 202, 58]), seeders: 1, completed: 0, leechers: 0 }, BasicInfo { info_hash: InfoHash([158, 2, 23, 208, 250, 113, 200, 115, 50, 205, 139, 249, 219, 234, 188, 178, 194, 207, 60, 77]), seeders: 1, completed: 0, leechers: 0 }]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    core::services::torrent::tests::searching_for_torrents::should_return_torrents_ordered_by_info_hash

test result: FAILED. 212 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.18s

error: test failed, to rerun pass `--lib`
```

On the other hand, to use it, a new data strcuture should be added to the repo:

An Index with sorted torrents by Infohash

The API uses pagination returning torrents in alphabetically order by InfoHash. Adding such an Index would probably
decrease the performace of this repository implementation. And it's
performace looks similar to the current SkipMap implementation.

SkipMap performace with Aquatic UDP load test:

```
Requests out: 396788.68/second
Responses in: 357105.27/second
  - Connect responses:  176662.91
  - Announce responses: 176863.44
  - Scrape responses:   3578.91
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 105
  - p99.9: 287
  - p100: 351
```

DashMap performace with Aquatic UDP load test:

```
Requests out: 410658.38/second
Responses in: 365892.86/second
  - Connect responses:  181258.91
  - Announce responses: 181005.95
  - Scrape responses:   3628.00
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 104
  - p99.9: 295
  - p100: 363
```
@josecelano josecelano requested a review from da2ce7 April 9, 2024 15:28
@josecelano josecelano added the Optimization Make it Faster label Apr 9, 2024
It's only used for benchmarking.
Copy link

codecov bot commented Apr 9, 2024

Codecov Report

Attention: Patch coverage is 90.74074% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 78.16%. Comparing base (af52045) to head (4030fd1).

Files Patch % Lines
...nt-repository/src/repository/dash_map_mutex_std.rs 90.74% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #784      +/-   ##
===========================================
+ Coverage    77.87%   78.16%   +0.29%     
===========================================
  Files          158      159       +1     
  Lines         8758     8863     +105     
===========================================
+ Hits          6820     6928     +108     
+ Misses        1938     1935       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

DashMap implementation does not support returning torrent ordered, so we
have to skip those tests for this reporitoy implementation.
@josecelano josecelano force-pushed the 565-torrent-repository-using-a-dashmap branch from c653eb3 to 4030fd1 Compare April 9, 2024 16:20
@josecelano
Copy link
Member Author

ACK 4030fd1

@josecelano josecelano linked an issue Apr 9, 2024 that may be closed by this pull request
@josecelano josecelano added this to the v3.0.0 milestone Apr 9, 2024
@josecelano josecelano merged commit b5fb03b into torrust:develop Apr 9, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Optimization Make it Faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use dashmap for torrent in memory storage instead of the std hashmap
1 participant