Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-167] [Feature] Supress stdout while running dbt as imported module. #4687

Closed
1 task done
alanmcruickshank opened this issue Feb 5, 2022 · 8 comments
Closed
1 task done
Labels
duplicate This issue or pull request already exists enhancement New feature or request logging

Comments

@alanmcruickshank
Copy link
Contributor

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

This relates to sqlfluff/sqlfluff-github-actions#15.

As part of sqlfluff, we use dbt under the hood to template the code before linting. This still works well 👍.

We also publish a github action which supports annotations. This normally works by linting the file for any bad sql, and then redirecting the output of that command to a file, which can then be picked up to power annotations within a pull request (see readme and examples).

There is a recent tricky bug with that github action where the redirection to that file is picking up an unexpected item which isn't json, namely the line 08:11:10 Partial parse save file not found. Starting full parse.. This happens because we're redirecting stdout to file, and picking up the dbt events which get forwarded to stdout at the same time as the linting output which we were expecting. The reason that this is harder to control than normal is that in functions.py dbt sets up it's own link steam directly to stdout rather than just relying on the python logging framework.

I would love a way to suppress, or at least redirect, any output to stdout from dbt when used in this way (as an imported package, see templater.py for exactly how it's imported).

Describe alternatives you've considered

  1. The current workaround in place is to recommend people use grep to remove the offending line from the output afterwards, but that's gross (SyntaxError: Unexpected token : in JSON at position 2 sqlfluff/sqlfluff-github-actions#15).

  2. Considered forcing the use of the legacy logging framework (by setting the dbt flag ENABLE_LEGACY_LOGGER), but that feels like building a dependency on old code which I imagine is scheduled for removal at some point.

Who will this benefit?

SQLFluff users who want to use the github action with their dbt project will definitely benefit.

Anyone importing dbt as a python library in the future will also benefit, because they can have more control over how dbt logs.

Are you interested in contributing this feature?

I can help contribute, but would love significant input on how we solve this feature before I touch anything.

Anything else?

No response

@alanmcruickshank alanmcruickshank added enhancement New feature or request triage labels Feb 5, 2022
@github-actions github-actions bot changed the title [Feature] Supress stdout while running dbt as imported module. [CT-167] [Feature] Supress stdout while running dbt as imported module. Feb 5, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 5, 2022

Thanks @alanmcruickshank! I'd want someone on the team to weigh in with firmer details, but my sense/hope is that you could silence dbt-core's INFO-level logs by importing and modifying events.functions.STDOUT_LOG, namely its log level. Alternately, this could be a good use for a more general-purpose --log-level config, as an extension of the conversation in #3451 (cc @emmyoop).

You could also try setting the flag / env var for JSON-formatted logging? Similar to how you're setting flags here. That wouldn't hide the message, but would prevent the syntax error.

I think you raised a really fair point in sqlfluff/sqlfluff-github-actions#15 (comment):

To make things more unhelpful, dbt is using a logger called default_stdout rather than dbt. and so it's not immediately obvious that things are coming from there.

I agree that this type of use case is something we'll see more of in the future, such that we should develop a good answer. I'll also give the obligatory reminder that "importing dbt as a python library" is not (yet) an officially supported interface for dbt-core. Definitely a goal to get there :)

@alanmcruickshank
Copy link
Contributor Author

Thanks Jeremy. Importing and modifying a flag feels like an appropriately robust solution for this: not too hacky, but also not too cast-iron that it makes it hard to iterate.

If the team like that solution, let me know which flag and I might be able to solve this all on the sqlfluff side and not need to change anything on the dbt side. 👍

@emmyoop
Copy link
Member

emmyoop commented Feb 7, 2022

@alanmcruickshank, before looking down this path, I just want to confirm that you don't want or need the logs?

You could also try setting the flag / env var for JSON-formatted logging? Similar to how you're setting flags here. That wouldn't hide the message, but would prevent the syntax error.

This option would force all the logs to be json so the offending line would become {"code": "I030", "data": {}, "invocation_id": "ce4bd719-92f5-4919-b3f9-5ba786eb2793", "level": "info", "log_version": 2, "msg": "Partial parse save file not found. Starting full parse.", "pid": 55965, "thread_name": "MainThread", "ts": "2022-02-07T15:59:46.298663Z", "type": "log_line"}.

@ChenyuLInx
Copy link
Contributor

@emmyoop I looked at the code a bit seems like we don’t have an argument to set those events logging levels right? (is that something needed?) This level_overwrite link will always be a warning.
And do you think we will need to support more output other than stdout, feels like this might be useful for other cases too.

@alanmcruickshank
Copy link
Contributor Author

I just want to confirm that you don't want or need the logs?

@emmyoop for this use case I think we'd want to supress entirely, or at least redirect to a separate file rather than stdout. JSON errors are easier to filter out, but we'd still want to filter them.

In other use cases I think the JSON errors could be helpful.

@emmyoop
Copy link
Member

emmyoop commented Feb 7, 2022

@alanmcruickshank we have a change in the works by two community contributors to add a flag to suppress all logs to stdout. I've just sketched out the idea over in #3451 if you want to check it out! Adding a flag to suppress the logs is only half of that specific task, but would get you what you need for this use case.

@ChenyuLInx most logs currently go to stdout as well as a file. There are a few exceptions but that is generally the default behavior. You are correct that we don't have an argument that can be passed in to set that log level other than the --debug flag. Definitely worth a conversation!

@emmyoop emmyoop removed the triage label Feb 7, 2022
@alanmcruickshank
Copy link
Contributor Author

Ah perfect. The solution suggested in #3451 would work perfectly in this case. 👍

Is someone already assigned to work on it or are you wanting extra hands to contribute?

@emmyoop
Copy link
Member

emmyoop commented Feb 7, 2022

@alanmcruickshank we've already had two community members commit to work together on this issue. Thanks for the offer though!

I'm going to close this as a duplicate since the functionality coming out of #3451 will meet your needs as well! Feel free to comment over there if you have anything to add.

@emmyoop emmyoop closed this as completed Feb 7, 2022
@emmyoop emmyoop added the duplicate This issue or pull request already exists label Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists enhancement New feature or request logging
Projects
None yet
Development

No branches or pull requests

4 participants