Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log fetching #20

Open
cisaacstern opened this issue Aug 31, 2022 · 3 comments
Open

Log fetching #20

cisaacstern opened this issue Aug 31, 2022 · 3 comments

Comments

@cisaacstern
Copy link
Member

Reposting @sharkinsspatial's comment from https://github.com/pangeo-forge/registrar/issues/53#issuecomment-1232274752 here, since I believe this will be the best place to discuss this topic going forward:

@cisaacstern I had put together an initial POC with option 2 from above, but based on our discussions the other day I'm going to take a slightly different track that is more aligned with the pangeo-forge-runner approach. Next step include

  1. A PR for pangeo-forge-runner which extends https://github.com/yuvipanda/pangeo-forge-runner/blob/main/pangeo_forge_runner/bakery/dataflow.py to allow requests against a bakery with a job_id and a logs filter pattern and returns a log object.
  2. Extend pangeo-forge-orchestrator with an endpoint which wraps this call.

A few concerns I have with this approach

  1. The potentially large response body depending upon the filter pattern which is applied.
  2. The potential for logs to leak secrets and the lack of security around the pangeo-forge-orchestrator endpoints.

@yuvipanda Does this approach seem structurally ok from the pangeo-forge-runner perspective or should I build out some alternative classes rather than including this directly in the bakery dataflow subclass?

@cisaacstern
Copy link
Member Author

IMHO this path makes a lot of sense, and the questions above are good ones to frame implementation.

To state one perhaps obvious idea, design-wise it makes sense to me for to add a placeholder Bakery.get_logs method to the base bakery, as is done for:

https://github.com/yuvipanda/pangeo-forge-runner/blob/d4ba5045538ac53a7457049acde0d2350f4af37b/pangeo_forge_runner/bakery/base.py#L25

and then have each bakery type override this method with its own implementation. For DataflowBakery, there's already an assumption (though not a requirement) that the gcloud CLI will exist (and be authenticated) in the environment, so perhaps fetching logs via subprocess calls to gcloud would be a good default pathway for DataflowBakery.get_logs?

@cisaacstern
Copy link
Member Author

lack of security around the pangeo-forge-orchestrator endpoints

We do have the ability to lock certain endpoints with authentication headers (and already do so for create/delete/update operations). Certainly we could do so for these endpoints as well (or a subset of them), and let https://pangeo-forge.org authenticate via a next.js server-side call. That being said, this approach doesn't really change the fact that the logs will eventually be publicly queryable (whether the user is querying directly, or with https://pangeo-forge.org as an intermediary).

@cisaacstern
Copy link
Member Author

Just made a few notes on this subject in pangeo-forge/pangeo-forge-orchestrator#150 (comment). Specifically, the fetch_logs executable as part of that PR (adapted from some code @yuvipanda shared) would be great basis for this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant