
udp: handle udp requests concurrently #644

Merged (1 commit, Jan 24, 2024)

Conversation

da2ce7 (Contributor) commented on Jan 24, 2024

Extracted from #557

Upgrades the UDP implementation to handle requests concurrently; no timeout is needed because UDP is non-blocking.

There is a hard-coded limit of 50 concurrent requests. With trivial changes this limit could be made configurable, but I don't think that would bring much benefit. The main concern is memory usage, and since UDP is non-blocking each request should return very quickly; the limit could be set to 5000 and it should still be fine.
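
For illustration, a minimal sketch of this model (hypothetical, not the PR's actual code; `receive_loop` and the commented-out `handle_request` are assumed names):

```rust
use std::{io, sync::Arc};

use tokio::net::UdpSocket;

// Each received datagram is handled in its own Tokio task, so one slow
// request never blocks the receive loop.
async fn receive_loop(socket: Arc<UdpSocket>) -> io::Result<()> {
    let mut buf = [0_u8; 1024];
    loop {
        let (len, _from) = socket.recv_from(&mut buf).await?;
        let payload = buf[..len].to_vec();
        let socket = socket.clone();

        tokio::spawn(async move {
            // handle_request(socket, payload).await; // assumed helper
            let _ = (socket, payload);
        });
    }
}
```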

@da2ce7 da2ce7 requested a review from a team as a code owner January 24, 2024 10:29
@da2ce7 da2ce7 mentioned this pull request Jan 24, 2024
@da2ce7 da2ce7 force-pushed the 20240124_udp_reqs_concurrent branch from 7d8ec06 to 72c8348 on January 24, 2024 10:37
@josecelano josecelano added this to the v3.0.0 milestone Jan 24, 2024
@josecelano josecelano linked an issue Jan 24, 2024 that may be closed by this pull request
da2ce7 (Contributor, Author) commented on Jan 24, 2024

> Hi @da2ce7 did you read my comment about dynamically limiting the number of concurrent requests?

This is not related to UDP requests, as UDP requests are non-blocking.

codecov bot commented on Jan 24, 2024

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (444c395) 77.03% compared to head (72c8348) 77.61%.

| Files | Patch % | Lines |
| --- | --- | --- |
| src/servers/udp/server.rs | 84.33% | 13 Missing ⚠️ |
| src/servers/udp/mod.rs | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff             @@
##           develop     #644      +/-   ##
===========================================
+ Coverage    77.03%   77.61%   +0.57%
===========================================
  Files          130      131       +1
  Lines         8519     8555      +36
===========================================
+ Hits          6563     6640      +77
+ Misses        1956     1915      -41
```


da2ce7 (Contributor, Author) commented on Jan 24, 2024

@josecelano I mean, I doubt there is any real improvement above 5 requests at the same time. The limit is the locking speed of our data structures and the kernel sending the UDP packets.

josecelano (Member) commented on Jan 24, 2024

> @josecelano I mean, I doubt there is any real improvement above 5 requests at the same time. The limit is the locking speed of our data structures and the kernel sending the UDP packets.

@da2ce7 my point is:

  • The main bottleneck is accessing the data in the core tracker service (our data-structures).
  • All HTTP and UDP trackers use the same core service.
  • The high-level services (HTTP and UDP) make requests to the low-level core service. And they have to wait (lock) until they can access the core data.
  • The more pending requests at the core level, the longer the response time (from the core service, and from the HTTP and UDP trackers to the end users) and the higher the memory consumption.

The idea is to reject requests at the high level when they detect the core level is taking too long to respond. This should balance the load and avoid server degradation.

Anyway, that limit could be implemented on top of this PR once it is merged; a rough sketch of the idea follows. I think it can be easily changed.
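
A sketch of this load-shedding idea (hypothetical, not part of this PR; `handle_with_shedding` and the commented-out helper are assumed names):

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

// The high-level server holds a semaphore sized to what the core service
// can absorb; when no permit is available, the request is rejected
// immediately instead of queueing up and degrading the whole server.
async fn handle_with_shedding(core_permits: Arc<Semaphore>) {
    match core_permits.try_acquire() {
        Ok(_permit) => {
            // Access the core tracker service while holding the permit.
            // respond_to_request().await; // assumed helper
        }
        Err(_) => {
            // The core is saturated: drop the request (cheap for UDP,
            // where the client will simply retry).
        }
    }
}
```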

josecelano (Member) left a comment:

Hi @da2ce7 I think it's good functionality and a good implementation, but it's not compatible with dynamically changing the number of active requests, since the limit of 50 is hardcoded in the type.

I would merge it, but if we want to make it dynamic in the future, we will need to change this code.

```rust
match result {
    Ok(udp_request) => {
        trace!("Received Request from: {}", udp_request.from);
        Self::make_response(tracker.clone(), socket.clone(), udp_request).await;
    }
    // …
}
```
josecelano (Member) commented on this snippet:

Hi @da2ce7 I would rename `make_response` to `handle_request`.

josecelano (Member) commented:

ACK 72c8348

josecelano (Member) commented:

Hi @da2ce7 another concern is whether we should spawn a new thread for each request, or whether it would be better to have a fixed number of workers, each one processing many requests sequentially.

See #611 (comment)

If I'm not wrong, this is the approach taken by some web servers like Nginx and Apache.

Below is ChatGPT's feedback:

Spawning a New Thread per Request (up to 50)

Advantages:

  1. Responsiveness: Each request is immediately assigned its own thread, potentially leading to faster response times as each request is processed in parallel.
  2. Simplicity: The logic can be simpler as each thread only handles one request, avoiding the complexities of managing a shared worker pool.
  3. Scalability: This approach can handle spikes in requests efficiently, as new threads are created as needed (up to the limit).

Disadvantages:

  1. Resource Intensive: Threads consume system resources (memory and CPU). Having too many threads can lead to high context-switching overhead and increased memory usage.
  2. Scalability Limit: The maximum cap of 50 threads may not be sufficient for high loads, and increasing this limit can strain the system.
  3. Lack of Control: It's harder to manage and monitor individual threads as compared to a fixed pool of workers.

Using a Fixed Number of Worker Threads

Advantages:

  1. Resource Efficiency: A fixed number of threads can be more efficient in terms of resource usage, as there's a limit to how many threads are running at any given time.
  2. Predictability: This model offers more predictable performance, as the system is not continuously creating and destroying threads.
  3. Easier to Manage: Managing a fixed pool of threads is typically simpler, especially for monitoring and debugging purposes.
  4. Load Balancing: You can implement more sophisticated load balancing strategies among the fixed set of worker threads.

Disadvantages:

  1. Potential for Bottlenecks: If the number of worker threads is too low, they might become a bottleneck during peak loads.
  2. Complexity in Implementation: Managing a pool of worker threads and the distribution of requests to them can be more complex than simply spawning a new thread per request.
  3. Less Responsive to Spikes: Unlike the dynamic approach, a fixed number of threads might not handle sudden spikes in traffic as responsively.

Conclusion

The choice between these two approaches depends on your specific requirements and constraints:

  • If your application experiences highly variable loads with significant spikes, dynamically creating threads (up to a limit) might be more beneficial.
  • However, if the load is relatively predictable and you want more control over resource usage, a fixed number of worker threads would be more suitable.

In many modern applications, especially those that are I/O-bound like network applications, an asynchronous or event-driven model (such as using an event loop and non-blocking I/O) can often provide a more scalable solution than either of these threading models. This might be worth considering depending on the specifics of your application and its environment.
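
For comparison, a minimal Tokio sketch of the fixed-worker alternative (hypothetical, not what this PR implements; all names are made up):

```rust
use std::sync::Arc;

use tokio::sync::{mpsc, Mutex};

// A fixed number of workers drain a shared queue, each one processing
// requests sequentially.
fn spawn_worker_pool(workers: usize, requests: mpsc::Receiver<Vec<u8>>) {
    // `Receiver` cannot be cloned, so the workers share it behind a mutex.
    let requests = Arc::new(Mutex::new(requests));

    for _ in 0..workers {
        let requests = requests.clone();
        tokio::spawn(async move {
            loop {
                // Hold the lock only while taking the next request.
                let next = requests.lock().await.recv().await;
                match next {
                    Some(_packet) => { /* process one request sequentially */ }
                    None => break, // channel closed: the receive loop stopped
                }
            }
        });
    }
}
```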

da2ce7 (Contributor, Author) commented on Jan 24, 2024

@josecelano Tokio has a light-weight threading model, with a global thread pool that matches the number of execution cores of the system.

While there is some overhead in creating Tokio tasks, they are far lighter-weight than system threads. I think that having a fixed set of worker threads dedicated to UDP, while more optimal, would not give a significant benefit over the light-weight tasks used here.

The "costly" part in my code is the "yield_now()" call, this happens when there is around 50 active tasks, it will temporally pause the handling of new requests, and give an opportunity for the other tasks to finish up... one attack/abuse would be that the task-pool is deliberately exhausted, causing many unnecessarily pauses for the yield...

... however, the answer to this is a two-level task pool, where tasks are grouped by the client that created the request:

pool (tasks grouped by client) <- pool (tasks for client)

With such a design you would need to connect from many different clients to fill the pool, since each client gets its own pool. The per-client pool would block when full, but the outer pool of client groups would yield when full.
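
A rough sketch of the single-level limit described above (hypothetical; the counter mechanism and names are assumptions, not the PR's exact code):

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

// An atomic counter tracks in-flight tasks; the receive loop yields while
// the hard-coded limit is reached, giving running tasks a chance to finish.
const MAX_CONCURRENT_REQUESTS: usize = 50; // the hard-coded limit

async fn spawn_limited(active: Arc<AtomicUsize>) {
    // While the pool is full, pause accepting and let other tasks run.
    while active.load(Ordering::Acquire) >= MAX_CONCURRENT_REQUESTS {
        tokio::task::yield_now().await;
    }

    active.fetch_add(1, Ordering::AcqRel);
    let counter = active.clone();
    tokio::spawn(async move {
        // handle_request().await; // assumed helper; returns quickly for UDP
        counter.fetch_sub(1, Ordering::AcqRel);
    });
}
```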

josecelano (Member) commented on Jan 24, 2024

> @josecelano Tokio has a light-weight threading model, with a global thread pool that matches the number of execution cores of the system. […]

Hi @da2ce7 I did not remember that Tokio tasks are not equal to threads. Thank you for your explanation. I'm going to merge it; my other concern is not clear yet. @WarmBeer is working on limiting memory consumption. I have the feeling that we need a global solution for limiting both CPU and memory consumption. If @WarmBeer's solution works, we can later limit CPU consumption to avoid system degradation.

@josecelano josecelano merged commit dee86be into torrust:develop Jan 24, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

Process UDP requests concurrently