
udp: handle udp requests concurrently #644

Merged (1 commit, Jan 24, 2024)

Conversation

da2ce7 (Contributor) commented on Jan 24, 2024

Extracted from #557

Upgrades the UDP implementation to handle requests concurrently; no timeout is needed because UDP is non-blocking.

There is a hard-coded limit of 50 concurrent requests. With trivial changes this limit could be made configurable, but I don't think that would bring much benefit. The main concern is memory usage, and since UDP is non-blocking each request should return very quickly; the limit could be set to 5000 and it should still be fine.
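
For illustration, a minimal sketch of this model (hypothetical, not the PR's actual code; `receive_loop` and the commented-out `handle_request` are assumed names):

```rust
use std::{io, sync::Arc};

use tokio::net::UdpSocket;

// Each received datagram is handled in its own Tokio task, so one slow
// request never blocks the receive loop.
async fn receive_loop(socket: Arc<UdpSocket>) -> io::Result<()> {
    let mut buf = [0_u8; 1024];
    loop {
        let (len, _from) = socket.recv_from(&mut buf).await?;
        let payload = buf[..len].to_vec();
        let socket = socket.clone();

        tokio::spawn(async move {
            // handle_request(socket, payload).await; // assumed helper
            let _ = (socket, payload);
        });
    }
}
```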

@da2ce7 da2ce7 requested a review from a team as a code owner January 24, 2024 10:29
@da2ce7 da2ce7 mentioned this pull request Jan 24, 2024
@da2ce7 da2ce7 force-pushed the 20240124_udp_reqs_concurrent branch from 7d8ec06 to 72c8348 on January 24, 2024 10:37
@josecelano josecelano added this to the v3.0.0 milestone Jan 24, 2024
@josecelano josecelano linked an issue Jan 24, 2024 that may be closed by this pull request
da2ce7 (Contributor, Author) commented on Jan 24, 2024

> Hi @da2ce7 did you read my comment about dynamically limiting the number of concurrent requests?

This is not related to UDP requests, as UDP requests are non-blocking.

codecov bot commented on Jan 24, 2024

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (444c395) 77.03% compared to head (72c8348) 77.61%.

| Files | Patch % | Lines |
| --- | --- | --- |
| src/servers/udp/server.rs | 84.33% | 13 Missing ⚠️ |
| src/servers/udp/mod.rs | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff             @@
##           develop     #644      +/-   ##
===========================================
+ Coverage    77.03%   77.61%   +0.57%
===========================================
  Files          130      131       +1
  Lines         8519     8555      +36
===========================================
+ Hits          6563     6640      +77
+ Misses        1956     1915      -41
```


da2ce7 (Contributor, Author) commented on Jan 24, 2024

@josecelano I mean, I doubt there is any real improvement above 5 requests at the same time. The limit is the locking speed of our data structures and the kernel sending the UDP packets.

josecelano (Member) commented on Jan 24, 2024

> @josecelano I mean, I doubt there is any real improvement above 5 requests at the same time. The limit is the locking speed of our data structures and the kernel sending the UDP packets.

@da2ce7 my point is:

  • The main bottleneck is accessing the data in the core tracker service (our data-structures).
  • All HTTP and UDP trackers use the same core service.
  • The high-level services (HTTP and UDP) make requests to the low-level core service. And they have to wait (lock) until they can access the core data.
  • The more pending requests at the core level, the longer the response time (from the core service, and from the HTTP and UDP trackers to the end users) and the higher the memory consumption.

The idea is to reject requests at the high level when they detect the core level is taking too long to respond. This should balance the load and avoid server degradation.

Anyway, that limit could be implemented on top of this PR once it is merged; a rough sketch of the idea follows. I think it can be easily changed.
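
A sketch of this load-shedding idea (hypothetical, not part of this PR; `handle_with_shedding` and the commented-out helper are assumed names):

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

// The high-level server holds a semaphore sized to what the core service
// can absorb; when no permit is available, the request is rejected
// immediately instead of queueing up and degrading the whole server.
async fn handle_with_shedding(core_permits: Arc<Semaphore>) {
    match core_permits.try_acquire() {
        Ok(_permit) => {
            // Access the core tracker service while holding the permit.
            // respond_to_request().await; // assumed helper
        }
        Err(_) => {
            // The core is saturated: drop the request (cheap for UDP,
            // where the client will simply retry).
        }
    }
}
```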

josecelano (Member) left a comment:

Hi @da2ce7 I think it's good functionality and a good implementation, but it's not compatible with dynamically changing the number of active requests, since the limit of 50 is hardcoded in the type.

I would merge it, but if we want to make it dynamic in the future, we will need to change this code.

```rust
match result {
    Ok(udp_request) => {
        trace!("Received Request from: {}", udp_request.from);
        Self::make_response(tracker.clone(), socket.clone(), udp_request).await;
    }
    // …
}
```
josecelano (Member) commented on this snippet:

Hi @da2ce7 I would rename `make_response` to `handle_request`.

josecelano (Member) commented:

ACK 72c8348

josecelano (Member) commented:

Hi @da2ce7 another concern is whether we should spawn a new thread for each request, or whether it would be better to have a fixed number of workers, each one processing many requests sequentially.

See #611 (comment)

If I'm not wrong, this is the approach taken by some web servers like Nginx and Apache.

Below is ChatGPT's feedback:

Spawning a New Thread per Request (up to 50)

Advantages:

  1. Responsiveness: Each request is immediately assigned its own thread, potentially leading to faster response times as each request is processed in parallel.
  2. Simplicity: The logic can be simpler as each thread only handles one request, avoiding the complexities of managing a shared worker pool.
  3. Scalability: This approach can handle spikes in requests efficiently, as new threads are created as needed (up to the limit).

Disadvantages:

  1. Resource Intensive: Threads consume system resources (memory and CPU). Having too many threads can lead to high context-switching overhead and increased memory usage.
  2. Scalability Limit: The maximum cap of 50 threads may not be sufficient for high loads, and increasing this limit can strain the system.
  3. Lack of Control: It's harder to manage and monitor individual threads as compared to a fixed pool of workers.

Using a Fixed Number of Worker Threads

Advantages:

  1. Resource Efficiency: A fixed number of threads can be more efficient in terms of resource usage, as there's a limit to how many threads are running at any given time.
  2. Predictability: This model offers more predictable performance, as the system is not continuously creating and destroying threads.
  3. Easier to Manage: Managing a fixed pool of threads is typically simpler, especially for monitoring and debugging purposes.
  4. Load Balancing: You can implement more sophisticated load balancing strategies among the fixed set of worker threads.

Disadvantages:

  1. Potential for Bottlenecks: If the number of worker threads is too low, they might become a bottleneck during peak loads.
  2. Complexity in Implementation: Managing a pool of worker threads and the distribution of requests to them can be more complex than simply spawning a new thread per request.
  3. Less Responsive to Spikes: Unlike the dynamic approach, a fixed number of threads might not handle sudden spikes in traffic as responsively.

Conclusion

The choice between these two approaches depends on your specific requirements and constraints:

  • If your application experiences highly variable loads with significant spikes, dynamically creating threads (up to a limit) might be more beneficial.
  • However, if the load is relatively predictable and you want more control over resource usage, a fixed number of worker threads would be more suitable.

In many modern applications, especially those that are I/O-bound like network applications, an asynchronous or event-driven model (such as using an event loop and non-blocking I/O) can often provide a more scalable solution than either of these threading models. This might be worth considering depending on the specifics of your application and its environment.
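
For comparison, a minimal Tokio sketch of the fixed-worker alternative (hypothetical, not what this PR implements; all names are made up):

```rust
use std::sync::Arc;

use tokio::sync::{mpsc, Mutex};

// A fixed number of workers drain a shared queue, each one processing
// requests sequentially.
fn spawn_worker_pool(workers: usize, requests: mpsc::Receiver<Vec<u8>>) {
    // `Receiver` cannot be cloned, so the workers share it behind a mutex.
    let requests = Arc::new(Mutex::new(requests));

    for _ in 0..workers {
        let requests = requests.clone();
        tokio::spawn(async move {
            loop {
                // Hold the lock only while taking the next request.
                let next = requests.lock().await.recv().await;
                match next {
                    Some(_packet) => { /* process one request sequentially */ }
                    None => break, // channel closed: the receive loop stopped
                }
            }
        });
    }
}
```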

da2ce7 (Contributor, Author) commented on Jan 24, 2024

@josecelano Tokio has a light-weight threading model, with a global thread pool that matches the number of execution cores of the system.

While there is some overhead in creating Tokio tasks, they are far lighter-weight than system threads. I think that having a fixed set of worker threads dedicated to UDP, while more optimal, would not give a significant benefit over the light-weight tasks used here.

The "costly" part in my code is the "yield_now()" call, this happens when there is around 50 active tasks, it will temporally pause the handling of new requests, and give an opportunity for the other tasks to finish up... one attack/abuse would be that the task-pool is deliberately exhausted, causing many unnecessarily pauses for the yield...

... however, the answer to this is a two-level task pool, where tasks are grouped by the client that created the request:

pool (tasks grouped by client) <- pool (tasks for client)

With such a design you would need to connect from many different clients to fill the pool, since each client gets its own pool. The per-client pool would block when full, but the outer pool of client groups would yield when full.
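
A rough sketch of the single-level limit described above (hypothetical; the counter mechanism and names are assumptions, not the PR's exact code):

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

// An atomic counter tracks in-flight tasks; the receive loop yields while
// the hard-coded limit is reached, giving running tasks a chance to finish.
const MAX_CONCURRENT_REQUESTS: usize = 50; // the hard-coded limit

async fn spawn_limited(active: Arc<AtomicUsize>) {
    // While the pool is full, pause accepting and let other tasks run.
    while active.load(Ordering::Acquire) >= MAX_CONCURRENT_REQUESTS {
        tokio::task::yield_now().await;
    }

    active.fetch_add(1, Ordering::AcqRel);
    let counter = active.clone();
    tokio::spawn(async move {
        // handle_request().await; // assumed helper; returns quickly for UDP
        counter.fetch_sub(1, Ordering::AcqRel);
    });
}
```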

josecelano (Member) commented on Jan 24, 2024

> @josecelano Tokio has a light-weight threading model, with a global thread pool that matches the number of execution cores of the system. […]

Hi @da2ce7 I did not remember that Tokio tasks are not equal to threads. Thank you for your explanation. I'm going to merge it; my other concern is not clear yet. @WarmBeer is working on limiting memory consumption. I have the feeling that we need a global solution for limiting both CPU and memory consumption. If @WarmBeer's solution works, we can later limit CPU consumption to avoid system degradation.

@josecelano josecelano merged commit dee86be into torrust:develop Jan 24, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

Process UDP requests concurrently