This repository has been archived by the owner on Dec 7, 2023. It is now read-only.
Dataflow job logs router #150
Closed
Commits (44)
4b73e73 logs router first commit (cisaacstern)
e057902 logs router WIP (cisaacstern)
4167666 logs router cont. (cisaacstern)
057760f test logs router first commit (cisaacstern)
550bd86 test get_logs helper (cisaacstern)
79204ae test get logs route (cisaacstern)
94283aa drop defaults from get_logs helper (cisaacstern)
aed5d97 on run, save job_name & job_id in recipe_run message (cisaacstern)
b7244ac fix issue comment test (cisaacstern)
4c85405 preserve existing message, if there is one (cisaacstern)
9198d9b fix existing message syntax (cisaacstern)
ef92a15 add query job logs doc (cisaacstern)
c345f99 Merge remote-tracking branch 'origin/main' into logs-router (cisaacstern)
a484fab update secrets + bakeries config for pr 150 (cisaacstern)
cb2ffb4 tweak heroku.yml (cisaacstern)
e3da9da Merge remote-tracking branch 'origin/main' into logs-router (cisaacstern)
297b16f factor logs query into plain funcs (cisaacstern)
b66cff1 factor logs routes to return recipe_run as intermediate output (cisaacstern)
7391f50 add public trace routes WIP (cisaacstern)
0161d09 finish trace route tests (cisaacstern)
82b1dad different assert for trace tests (cisaacstern)
0e45aa3 update logs docs (cisaacstern)
fd37371 add note about logs retention (cisaacstern)
2bf7d8a ok, i guess drop the details assert (cisaacstern)
dae96e9 remove duplicate subprocess call. huge issue. (cisaacstern)
dbd4b98 Merge remote-tracking branch 'origin/main' into logs-router (cisaacstern)
22093ed patch for 132 (cisaacstern)
448aff0 Merge remote-tracking branch 'origin/main' into logs-router (cisaacstern)
ccb545b add fetch_logs.py executable (cisaacstern)
2cd67c5 dump dataflow logs to json (cisaacstern)
cbe42a7 comment out time.sleep (cisaacstern)
fb22a87 check for secrets in logs, rewrite logs router WIP (cisaacstern)
f26949a log router tests rewrite WIP (cisaacstern)
6be7691 get logs router tests to pass (cisaacstern)
fdb634f setup config for pr 150 review app (cisaacstern)
32ebbff logs route tweaks (cisaacstern)
34fb7ee move logs test params into shared list (cisaacstern)
f3a081a fix surface secrets func and test it (cisaacstern)
f1bd920 add gitleaks to Dockerfile (cisaacstern)
c7dd84e Merge branch 'main' into logs-router (andersy005)
3f4bc3e Merge branch 'main' into logs-router (andersy005)
a5d8848 Merge branch 'main' into logs-router (andersy005)
87264eb Merge branch 'main' into logs-router (andersy005)
8805e0f Merge branch 'main' into logs-router (andersy005)
import json
import subprocess

from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import PlainTextResponse
from sqlmodel import Session, SQLModel, select
from starlette import status

from ..dependencies import check_authentication_header, get_session as get_database_session
from ..models import MODELS

logs_router = APIRouter()

DEFAULT_SEVERITY = Query(
    "ERROR",
    description=(
        "A valid gcloud logging severity as defined in "
        "https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#LogSeverity"
    ),
)
DEFAULT_LIMIT = Query(1, description="Max number of log entries to return.")


def job_id_from_recipe_run(recipe_run: SQLModel) -> str:
    try:
        job_id = json.loads(recipe_run.message)["job_id"]
    except (KeyError, json.JSONDecodeError) as e:
        detail = (
            f"Message field of {recipe_run = } missing 'job_id'."
            if isinstance(e, KeyError)
            else f"Message field of {recipe_run = } not JSON decodable."
        )
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=detail)

    return job_id


def get_logs(
    job_id: str,
    severity: str,
    limit: int,
):
    log_name_prefix = "projects/pangeo-forge-4967/logs/dataflow.googleapis.com"
    query = (
        'resource.type="dataflow_step" '
        f'AND resource.labels.job_id="{job_id}" '
        f'AND logName=("{log_name_prefix}%2Fjob-message" OR "{log_name_prefix}%2Flauncher") '
        f'AND severity="{severity}"'
    )
    cmd = "gcloud logging read".split() + [query]

    if limit:
        cmd += [f"--limit={limit}"]

    logs = subprocess.check_output(cmd)
    return logs


@logs_router.get(
    "/recipe_runs/{id}/logs",
    summary="Get job logs for a recipe_run, specified by database id.",
    tags=["recipe_run", "logs", "admin"],
    response_class=PlainTextResponse,
    dependencies=[Depends(check_authentication_header)],
)
async def logs_from_recipe_run_id(
    id: int,
    *,
    db_session: Session = Depends(get_database_session),
    severity: str = DEFAULT_SEVERITY,
    limit: int = DEFAULT_LIMIT,
):
    recipe_run = db_session.exec(
        select(MODELS["recipe_run"].table).where(MODELS["recipe_run"].table.id == id)
    ).one()
    job_id = job_id_from_recipe_run(recipe_run)
    logs = get_logs(job_id, severity, limit)
    return logs


@logs_router.get(
    "/feedstocks/{feedstock_spec:path}/{commit}/{recipe_id}/logs",
    summary="Get job logs for a recipe run, specified by feedstock_spec, commit, and recipe_id.",
    tags=["feedstock", "logs", "admin"],
    response_class=PlainTextResponse,
    dependencies=[Depends(check_authentication_header)],
)
async def logs_from_feedstock_spec_commit_and_recipe_id(
    feedstock_spec: str,
    commit: str,
    recipe_id: str,
    *,
    db_session: Session = Depends(get_database_session),
    severity: str = DEFAULT_SEVERITY,
    limit: int = DEFAULT_LIMIT,
):
    feedstock = db_session.exec(
        select(MODELS["feedstock"].table).where(MODELS["feedstock"].table.spec == feedstock_spec)
    ).one()
    statement = (
        select(MODELS["recipe_run"].table)
        .where(MODELS["recipe_run"].table.recipe_id == recipe_id)
        .where(MODELS["recipe_run"].table.head_sha == commit)
        .where(MODELS["recipe_run"].table.feedstock_id == feedstock.id)
    )
    recipe_run = db_session.exec(statement).one()
    job_id = job_id_from_recipe_run(recipe_run)
    logs = get_logs(job_id, severity, limit)
    return logs
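For reference, this is the Cloud Logging filter that get_logs assembles before shelling out to gcloud (a minimal reconstruction of the logic above; the job_id value is a hypothetical example of the Dataflow job id format):

```python
# Reconstruction of the filter string and command built by get_logs.
job_id = "2022-06-01_00_00_00-1234567890"  # hypothetical Dataflow job id
severity = "ERROR"
limit = 1
log_name_prefix = "projects/pangeo-forge-4967/logs/dataflow.googleapis.com"
query = (
    'resource.type="dataflow_step" '
    f'AND resource.labels.job_id="{job_id}" '
    f'AND logName=("{log_name_prefix}%2Fjob-message" OR "{log_name_prefix}%2Flauncher") '
    f'AND severity="{severity}"'
)
# The full command passes the filter as a single argument, plus an entry limit.
cmd = "gcloud logging read".split() + [query] + [f"--limit={limit}"]
```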
@cisaacstern, I'm curious... does this mean that we won't be exposing the logs on the front-end in the near future? Is the idea that if something is broken, as an admin I can query the API to figure out what's going on, but any other user won't be able to see these logs?
@andersy005, based on @yuvipanda's comment #145 (comment), I thought the careful thing would be to keep these protected to start, until we've had a chance to think through if/what to do with them.
Yes, I was imagining that this might be the approach initially. But to be honest, I've been doing this for a while myself now (albeit without the help of an Orchestrator API route)... and it can be toilsome.
So perhaps the best thing is to just make these public to start with?
A third option would be to leave the raw logs served here protected, but add another layer/route that applies some type of filtering to them, and the filtered/sanitized result could be passed to the frontend?
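As a rough illustration of that third option, a sanitizing layer could redact credential-shaped strings before logs leave the API. This is only a sketch with hypothetical patterns, not a complete safeguard; a real implementation would need to match the actual secret values (e.g. what gitleaks scans for), not just these shapes:

```python
import re

# Hypothetical redaction patterns for credential-style assignments in log text.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(token|secret|password|api[_-]?key)\b\s*[=:]\s*\S+"),
]


def sanitize(logs: str) -> str:
    """Replace anything that looks like a credential assignment with a placeholder."""
    for pattern in SECRET_PATTERNS:
        logs = pattern.sub("[REDACTED]", logs)
    return logs
```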
@andersy005, for context, this is an example of the text that this route would return:
So entirely aside from leaking secrets, there's the question of how to format this into something that a frontend user would actually find useful (99% of it is apache beam boilerplate)...
Basically only the last ~4 lines are relevant to actually debugging the recipe.
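That observation suggests a simple truncation helper; a sketch (the function name and default line count are illustrative):

```python
def tail_of_trace(logs: str, n_lines: int = 4) -> str:
    """Return only the final n_lines of a raw log dump, where the actual
    recipe error usually lives (the rest is Apache Beam boilerplate)."""
    lines = logs.strip().splitlines()
    return "\n".join(lines[-n_lines:])
```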
One more thought: this route doesn't offer any kind of "follow"/"tail" functionality, and I'm not sure that it would perform especially well in a "stream logs to the frontend" capacity. Though conceivably we could do some string formatting to provide just a short error trace from the end of these logs, and display it on the frontend if recipe_run.conclusion = failed?
I see... Perhaps focusing on de-risking our implementation is the right approach for a first try. Once we are ready to make access to the logs fully public, we can revisit this.
Are performance concerns the main reason for not supporting "stream-like" events, or is this "stream" feature not available via gcloud logging read? I was wondering if this could be helpful: https://amittallapragada.github.io/docker/fastapi/python/2020/12/23/server-side-events.html
Debugging these long tracebacks will probably be a challenge. Nonetheless, I think providing easy access to the logs would be a great feature. Can these logs be filtered and structured (by another service/application) so they can be easily consumed by the front-end?
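On filtering and structuring: gcloud logging read can emit JSON (--format=json), which would let a thin layer reduce each entry to just the fields a frontend needs. A sketch (the field names follow the Cloud Logging LogEntry schema; the reducer function itself is hypothetical):

```python
import json


def structure_entries(raw: str) -> list[dict]:
    """Reduce JSON log output (e.g. from `gcloud logging read --format=json`)
    to frontend-friendly records."""
    entries = json.loads(raw)
    return [
        {
            "timestamp": e.get("timestamp"),
            "severity": e.get("severity"),
            "message": e.get("textPayload", ""),
        }
        for e in entries
    ]
```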
@andersy005, in the latest commits I've added a public /logs/trace route which truncates the traceback to its last portion (for failed runs only) and serves it over a public route for the frontend to consume. Your feedback pushed me to figure out some way to make this useful now, and this is what I came up with. I totally agree that this is an essential feature to lighten the load on maintainers (I've been there!) and empower recipe contributors to be more self-sufficient.

There will no doubt be edge cases to deal with, but hopefully this feature will be useful in its current form, at least in certain circumstances. Once this PR is merged, I'll make a recommendation for next steps for the logging feature in a new issue.