docs/README.md
Outdated
To query logs for recipe runs, see the API docs at the `/docs#/logs` route of the deployment
(https://api.pangeo-forge.org/docs#/logs for the production deployment).
These routes are protected, due to the risk of leaking secrets in the logs. The
`PANGEO_FORGE_API_KEY` is available to admins via in the deployments secrets config:
@cisaacstern, I'm curious... does this mean that we won't be exposing the logs on the front-end in the near future? Is the idea that if something is broken, as an admin I can query the API to figure out what's going on but any other user won't be able to see these logs?
@andersy005, based on @yuvipanda's comment #145 (comment), I thought the careful thing would be to keep these protected to start, until we've had a chance to think through if/what to do with them.
> Is the idea that if something is broken, as an admin I can query the API to figure out what's going on but any other user won't be able to see these logs?
Yes, I was imagining that this might be the approach initially. But to be honest, I've been doing this for a while myself now (albeit without the help of an Orchestrator API route)... and it can be toilsome.
So perhaps the best thing is to just make these public to start with?
A third option would be to leave the raw logs served here protected, but add another layer/route that applies some type of filtering to them, so that the filtered/sanitized result could be passed to the frontend?
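As a rough illustration of that third option, here is a minimal sketch of a filtering layer that redacts secret-looking values before logs leave the protected route. The pattern list and the function names (`sanitize_line`, `sanitize_logs`) are hypothetical and not part of this PR; a real deployment would need patterns matched to the secrets it actually handles, and a regex pass like this cannot guarantee that nothing leaks.

```python
import re

# Hypothetical pattern: any KEY=value or KEY: value pair whose key name
# contains a secret-looking word. This is an assumption, not an exhaustive list.
SECRET_PATTERN = re.compile(
    r"(?i)([\w-]*(?:api[_-]?key|token|password|secret)[\w-]*)(\s*[=:]\s*)(\S+)"
)


def sanitize_line(line: str) -> str:
    """Replace the value of any secret-looking key with a placeholder."""
    return SECRET_PATTERN.sub(
        lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", line
    )


def sanitize_logs(raw: str) -> str:
    """Apply sanitize_line to every line of a raw log string."""
    return "\n".join(sanitize_line(line) for line in raw.splitlines())
```

Note that this only redacts the first whitespace-delimited token after the separator, so quoted values containing spaces would need extra handling.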
@andersy005, for context, this is an example of the text that this route would return:
https://gist.github.com/cisaacstern/2a79707feaf27c5c0a2d4d93e5738fe5
So entirely aside from leaking secrets, there's the question of how to format this into something that a frontend user would actually find useful (99% of it is Apache Beam boilerplate)...
Basically only the last ~4 lines are relevant to actually debugging the recipe.
One more thought: this route doesn't offer any type of "follow"/"tail" functionality, and I'm not sure that it would perform especially well in a "stream logs to the frontend" capacity. Though conceivably we could do some string formatting to provide just a short error trace from the end of these logs, and display it on the frontend if `recipe_run.conclusion = failed`?
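The string-formatting idea could look something like the sketch below. It assumes the logs arrive as one newline-separated string and that a failed run is marked by the conclusion value `"failed"`; the function name `short_trace` and the default of four lines (taken from the "last ~4 lines" estimate above) are illustrative choices, not part of the PR.

```python
from typing import Optional


def short_trace(logs: str, conclusion: Optional[str], n_lines: int = 4) -> Optional[str]:
    """Return the last n_lines non-blank log lines, for failed runs only."""
    if conclusion != "failed":
        # Nothing to show for in-progress or successful runs.
        return None
    if n_lines <= 0:
        return ""
    lines = [line for line in logs.splitlines() if line.strip()]
    return "\n".join(lines[-n_lines:])
```

A fixed line count is a blunt heuristic: a traceback whose useful part is longer than `n_lines` would be cut off, so the count may need tuning against real logs.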
> Yes, I was imagining that this might be the approach initially.
I see... Perhaps focusing on de-risking our implementation is the right approach for a first try. Once we are ready to make access to the logs fully public, we can revisit this.
> One more thought: this route doesn't offer any type of "follow"/"tail" functionality, and I'm not sure that it would perform especially well in a "stream logs to the frontend" capacity.
Are performance concerns the main reason for not supporting "stream-like" events, or is this "stream" feature not available via `gcloud logging read`? I was wondering if this could be helpful: https://amittallapragada.github.io/docker/fastapi/python/2020/12/23/server-side-events.html
> So entirely aside from leaking secrets, there's the question of how to format this into something that a frontend user would actually find useful (99% of it is apache beam boilerplate)...
Debugging these long tracebacks will probably be a challenge. Nonetheless, I think providing easy access to the logs would be a great feature. Can these logs be filtered and structured (by another service/application) so they can be easily consumed by the front-end?
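For reference, the server-sent events approach mentioned above comes down to framing each log line in the `text/event-stream` wire format. The sketch below shows only that framing, using the standard library; the function name `sse_events` and the event name `log` are assumptions, and it does not address whether the underlying log source can actually be followed.

```python
from typing import Iterable, Iterator


def sse_events(lines: Iterable[str], event: str = "log") -> Iterator[str]:
    """Frame each log line as one server-sent event."""
    for line in lines:
        payload = line.rstrip("\n")
        # A multi-line payload needs a "data:" prefix on every line.
        data = "\n".join(f"data: {part}" for part in payload.split("\n"))
        # A blank line terminates each event.
        yield f"event: {event}\n{data}\n\n"
```

In a FastAPI app, a generator like this would presumably be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.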
> Can these logs be filtered and structured ... so they can be easily consumed by the front-end?
@andersy005, in the latest commits I've added a public `/logs/trace` route which truncates the traceback to its last portion (for failed runs only) and serves it for the frontend to consume. Your feedback pushed me to figure out some way to make this useful now, and this is what I came up with. I totally agree that this is an essential feature to lighten the load on maintainers (I've been there!) and empower recipe contributors to be more self-sufficient.
There will no doubt be edge cases to deal with, but hopefully this feature will be useful in its current form, at least in certain circumstances. Once this PR is merged, I'll make a recommendation for next steps for the logging feature in a new issue.
With 22093ed,
{
"recipe_id": "noaa-oisst-avhrr-only",
"bakery_id": 1,
"feedstock_id": 1,
"head_sha": "f0e28b410c0fc9bdcaeffea1f1676a623c85e30b",
"version": "",
"started_at": "2022-09-30T08:21:28",
"completed_at": null,
"conclusion": null,
"status": "in_progress",
"is_test": true,
"dataset_type": "zarr",
"dataset_public_url": null,
"message": "{\"job_name\": \"a70666f7267652d70722d3135302e6865726f6b756170702e636f6d2504\", \"job_id\": \"2022-09-30_01_21_42-16154033814144718767\"}",
"id": 4,
"bakery": {
"region": "foo",
"name": "pangeo-ldeo-nsf-earthcube",
"description": "bar",
"id": 1
},
"feedstock": {
"spec": "pforgetest/test-staged-recipes",
"provider": "github",
"id": 1
}
}
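In the response above, the `message` field is itself a JSON-encoded string holding `job_name` and `job_id`, so a consumer has to decode it a second time. A minimal sketch of that step follows; the helper name `job_info` is hypothetical, and the empty-dict fallback for missing or non-JSON messages is a design choice made here, not documented behavior.

```python
import json


def job_info(recipe_run: dict) -> dict:
    """Decode the JSON string stored in a recipe run's "message" field."""
    message = recipe_run.get("message") or "{}"
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        # The message may hold plain text in other cases (assumption).
        return {}
    return parsed if isinstance(parsed, dict) else {}
```

For the run shown above, `job_info(run)["job_id"]` would give `"2022-09-30_01_21_42-16154033814144718767"`.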
I'm now somewhat blocked because I've deployed a few jobs from the review app, to test this feature, but they all appear to be stalled. These are all test runs of recipes, which should not require more than ~20 mins max, but they have all been running for +/- 2 hrs with no obvious progress. Too tired to figure out why this is now, so I'll revisit tomorrow.
I've converted this back into a draft, to indicate that getting sidetracked by #156 prevented me from finishing it this week. If I'm not yet on parental leave on Monday, I'll pick it up then. If I am, and someone else would like to finish it (which you would be more than welcome to), here's what I was planning to do next:
This PR remains incomplete. Here are a few notes for anyone who may be interested in completing it (which you would be welcome to do). If it remains incomplete in early November, I will revisit it at that time.
No description provided.