Gateway Timeout in VDK UI for Long-Running Jobs Freezes JupyterLab Session [Solution] #2789

Closed
duyguHsnHsn opened this issue Oct 11, 2023 · 1 comment

@duyguHsnHsn
Collaborator

Solves: #2741

Problem Narrative:

The issue involves the VDK JupyterLab extension, which starts a server-side process that can run for an extended period, sometimes over an hour (VDK operations such as Run). While waiting for the process to finish, the extension receives a 504 Gateway Time-out error. The error does not originate from the extension itself; it most likely comes from an intermediate proxy or the server configuration, which drops the connection once it has been idle for longer than the configured timeout.

Proposed solution:

Note: here, "server" refers to the Python component of the extension, not the Control Service.
Asynchronous Requests with Status Checks (Polling)
Given these constraints, the recommended strategy is an asynchronous model with the following steps:

  1. Asynchronous Requests with Status Checks: The client (the React part of the JupyterLab extension) sends a request to start the long-running task. Instead of keeping the request open, the server (the Python part of the extension) responds immediately with an acknowledgment that includes a unique task identifier for future status updates.
  2. Initiate the Process: This approach requires server-side adjustments (in the extension, not in the Control Service) so that the server does not wait for the task to complete before responding. Instead, it returns a task ID or a similar handle, which lets the client track the task's progress independently of the original request.
  3. Client-Side Polling: Using the task ID, the client implements a polling mechanism. It periodically sends a lightweight status request to the server, referencing the task ID. Polling continues at regular intervals, keeping the client informed of the task's status without holding any single connection open long enough to hit the timeout.
  4. Handle Task Completion: Upon receiving server confirmation that the task has concluded, the client issues a final request to fetch the resultant data or undertake subsequent steps as necessary.
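A minimal server-side sketch of steps 1, 2 and 4, assuming the long-running VDK operation can be wrapped in a plain Python callable and run in a background thread. `TaskRunner`, `start`, `get_status` and `slow_job` are hypothetical names chosen for illustration; this is not the Task Runner API introduced later in #2869, and the HTTP handlers that would expose these methods to the React client are omitted.

```python
import threading
import time
import uuid
from typing import Any, Callable, Dict, Optional


class TaskRunner:
    """Hypothetical task registry: runs jobs in background threads and
    lets clients look up their status by task ID."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def start(self, job: Callable[[], Any]) -> str:
        """Steps 1-2: register the job, start it, and return its ID at once."""
        task_id = uuid.uuid4().hex
        with self._lock:
            self._tasks[task_id] = {"status": "running", "result": None, "error": None}
        threading.Thread(target=self._run, args=(task_id, job), daemon=True).start()
        return task_id

    def _run(self, task_id: str, job: Callable[[], Any]) -> None:
        # Runs in the worker thread; records the outcome instead of raising.
        try:
            update: Dict[str, Any] = {"status": "finished", "result": job()}
        except Exception as exc:
            update = {"status": "failed", "error": str(exc)}
        with self._lock:
            self._tasks[task_id].update(update)

    def get_status(self, task_id: str) -> Optional[Dict[str, Any]]:
        """Target of the status checks: a cheap snapshot (None if unknown)."""
        with self._lock:
            task = self._tasks.get(task_id)
            return dict(task) if task is not None else None


def slow_job() -> str:
    # Hypothetical stand-in for a long-running VDK operation such as Run.
    time.sleep(0.2)
    return "done"


if __name__ == "__main__":
    runner = TaskRunner()
    task_id = runner.start(slow_job)  # returns without waiting for slow_job
    print(runner.get_status(task_id)["status"])
    while runner.get_status(task_id)["status"] == "running":
        time.sleep(0.05)  # a real client would poll over HTTP instead
    print(runner.get_status(task_id))  # step 4: read the final result
```

Keeping task state in an in-memory dictionary is the simplest choice for a sketch; a real extension would also need to decide how long finished entries are retained.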

By decoupling the task's duration from the request-response cycle, this solution mitigates the risk of premature connection termination due to timeouts, ensuring that users receive the outcomes of their long-running tasks without disruption.
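The client-side loop from step 3 can be modelled as follows. The real client is the React part of the extension; this Python sketch only illustrates the control flow, assuming a `get_status` callable that performs one short status request per call. `poll_until_done`, `TaskTimeoutError`, and the `interval` and `deadline` defaults are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional


class TaskTimeoutError(Exception):
    """Raised when the task does not finish within the overall deadline."""


def poll_until_done(
    get_status: Callable[[str], Optional[Dict[str, Any]]],
    task_id: str,
    interval: float = 2.0,
    deadline: float = 3600.0,
) -> Dict[str, Any]:
    """Call get_status(task_id) every `interval` seconds until the task
    leaves the "running" state or `deadline` seconds have elapsed."""
    start = time.monotonic()
    while True:
        status = get_status(task_id)  # one short request, not a held connection
        if status is None:
            raise KeyError(f"unknown task id: {task_id}")
        if status["status"] != "running":
            return status  # "finished" or "failed": continue with step 4
        if time.monotonic() - start > deadline:
            raise TaskTimeoutError(task_id)
        time.sleep(interval)


if __name__ == "__main__":
    calls = {"n": 0}

    def fake_status(task_id: str) -> Dict[str, Any]:
        # Hypothetical stand-in for a lightweight HTTP status request.
        calls["n"] += 1
        if calls["n"] < 3:
            return {"status": "running"}
        return {"status": "finished", "result": "job output"}

    final = poll_until_done(fake_status, "task-123", interval=0.01)
    print(final["result"], "after", calls["n"], "polls")  # job output after 3 polls
```

Each status request returns quickly, so no connection stays idle long enough to trigger the proxy's timeout; the `deadline` is a separate, client-side limit on how long the user is willing to wait.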

@yonitoo
Contributor

yonitoo commented Oct 18, 2023

Created follow-up stories #2806 and #2807 based on this solution.

yonitoo added a commit that referenced this issue Nov 16, 2023
…ks in the background and tracks their status (#2869)

What: Introduce Task Runner, a polling mechanism that runs tasks in the
background and tracks their status.
Implements #2806 and #2807.

Why: Address the problem described in #2789.

Testing Done: CI/CD is passing. Introduced relevant tests.

Signed-off-by: Yoan Salambashev <[email protected]>