Extract parser: update heuristic for mixed logs #717

Merged
merged 6 commits into from
Aug 26, 2024

Conversation

matyax
Contributor

@matyax matyax commented Aug 23, 2024

The first log of the sample can be anything, and the fact that we're only checking one felt awkward.

Updated the implementation to:

  • Check all the lines
  • Use JSON.parse() instead of checking the first character, like we do in LogRowMessage.
  • Add both parsers when the type is mixed, instead of only the JSON parser
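As a rough illustration of the three points above, here is a minimal sketch and not the merged code. The names `LogsType`, `isJsonLine` and `detectLogsType` are hypothetical, and treating only objects and arrays as JSON (so that a bare `123` or `true` is not counted) is an assumption added here.

```typescript
type LogsType = 'json' | 'logfmt' | 'mixed';

// Hypothetical helper: true when the line parses as a JSON object or array.
// Assumption: primitives such as `123` or `true` are valid JSON but are not
// treated as JSON log lines here.
function isJsonLine(line: string): boolean {
  try {
    const parsed: unknown = JSON.parse(line);
    return typeof parsed === 'object' && parsed !== null;
  } catch {
    return false;
  }
}

// Hypothetical helper: inspects every line of the sample, not just the first.
function detectLogsType(lines: string[]): LogsType {
  let jsonCount = 0;
  for (const line of lines) {
    if (isJsonLine(line)) {
      jsonCount++;
    }
  }
  // No JSON lines at all (including an empty sample) falls back to logfmt.
  if (jsonCount === 0) {
    return 'logfmt';
  }
  return jsonCount === lines.length ? 'json' : 'mixed';
}
```

With this shape, a sample whose first line happens to be logfmt but whose remaining lines are JSON is reported as `mixed` rather than `logfmt`.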

@matyax matyax requested a review from a team as a code owner August 23, 2024 09:06
@matyax matyax self-assigned this Aug 23, 2024
@matyax
Contributor Author

matyax commented Aug 23, 2024

Follow-up from the chat we had yesterday. BTW, I don't mind going back to checking the first character (after trimming) as we did before. We could also apply both parsers and drop the error on JSON (I don't know the original reason for that).

The one thing that felt non-negotiable was to stop checking only the first line.

@gtk-grafana
Contributor

To recap our discussion in Slack: this is a harder problem than it looks. If the result is mostly one format or the other, but occasionally mixed, we risk firing extra requests, or request looping, despite any checking we do on the client.

We can always choose to use both parsers, which reduces the number of requests, but increases the cost of every query.

I know @cyriltovena gave us the ok in the previous version of the app to use both parsers, but maybe it's worth having the conversation again?
CC @trevorwhitney

My gut says that the cost of running unnecessary queries is higher than using two parsers all the time, but that has performance implications for users that have good data that is 100% one format.

Straw person: we use both parsers, and if users need to squeeze out more performance, we add a config option to the Loki datasource to assert whether the data is JSON, logfmt, or mixed?
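To make the straw person concrete, here is a hedged sketch of how such a hint could select parser stages. Nothing like this exists in the datasource today as far as this thread shows: `parserStages` and `buildQuery` are hypothetical names, defaulting to `mixed` when no hint is set is an assumption, and the exact stage strings (including dropping `__error__` after the JSON parser) are assumptions based on the "apply both parsers and drop error on JSON" remark above.

```typescript
type LogsType = 'json' | 'logfmt' | 'mixed';

// Hypothetical mapping from the asserted format to LogQL parser stages.
function parserStages(type: LogsType): string {
  switch (type) {
    case 'json':
      return '| json | drop __error__, __error_details__';
    case 'logfmt':
      return '| logfmt';
    case 'mixed':
      // Both parsers on every query: fewer requests, higher per-query cost.
      return '| json | logfmt | drop __error__, __error_details__';
  }
}

// Hypothetical helper: `hint` stands in for the proposed datasource setting.
// Without a hint we assume the safe but more expensive 'mixed' behaviour.
function buildQuery(selector: string, hint?: LogsType): string {
  return `${selector} ${parserStages(hint ?? 'mixed')}`;
}
```

The trade-off discussed above shows up directly: users who know their data is 100% one format pay for a single parser, while everyone else gets both.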

Crazy idea: Can we have a new "auto" parser?

@matyax
Contributor Author

matyax commented Aug 23, 2024

Auto parser would be amazing.

@gtk-grafana
Contributor

gtk-grafana commented Aug 23, 2024

Would be a win for LogQL users as well.

@matyax matyax force-pushed the matyax/mixed-logs branch from 3e90712 to c6ba227 Compare August 23, 2024 12:48
@matyax matyax requested a review from gtk-grafana August 23, 2024 12:58
@matyax
Contributor Author

matyax commented Aug 23, 2024

What do you think? Should we go with this improvement and resume the conversation of smarter alternatives later?

const linesField = data.fields.find((f) => f.name === 'Line' || f.name === 'body');
result.type = linesField?.values[0]?.[0] === '{' ? 'json' : 'logfmt';
linesField?.values.forEach((value: string) => {
Contributor

Let's use a normal for loop, some(), or something else that stops executing when we find a non-JSON log line.

Contributor

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach#description

There is no way to stop or break a forEach() loop other than by throwing an exception. If you need such behavior, the forEach() method is the wrong tool.

Contributor Author

Sure. No need to be suboptimal.
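For reference, the early-exit loop discussed in this thread could look roughly like the sketch below. It is not the code that was merged: `isJsonLine` and `detectWithEarlyExit` are hypothetical names, the `inspected` counter exists only to make the early exit visible, and stopping once both formats have been seen (rather than at the first non-JSON line) is an assumption about the intended behaviour.

```typescript
type LogsType = 'json' | 'logfmt' | 'mixed';

// Hypothetical helper: true when the line parses as a JSON object or array.
function isJsonLine(line: string): boolean {
  try {
    const parsed: unknown = JSON.parse(line);
    return typeof parsed === 'object' && parsed !== null;
  } catch {
    return false;
  }
}

// A for...of loop can `break`, which forEach() cannot do.
function detectWithEarlyExit(lines: string[]): { type: LogsType; inspected: number } {
  let sawJson = false;
  let sawOther = false;
  let inspected = 0;

  for (const line of lines) {
    inspected++;
    if (isJsonLine(line)) {
      sawJson = true;
    } else {
      sawOther = true;
    }
    // Once both formats have been seen the answer is 'mixed'; stop scanning.
    if (sawJson && sawOther) {
      break;
    }
  }

  const type: LogsType = sawJson && sawOther ? 'mixed' : sawJson ? 'json' : 'logfmt';
  return { type, inspected };
}
```

A uniform sample is still scanned in full, since only a complete pass can confirm that no line deviates. A mixed sample stops at the first line that differs from the ones before it.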

@gtk-grafana
Contributor

This is a good change, but I'll want some more time to test on Monday. I'm worried about re-introducing the request looping bug.

@matyax matyax force-pushed the matyax/mixed-logs branch from afd04d2 to 1c5f610 Compare August 23, 2024 19:39
@matyax matyax requested a review from gtk-grafana August 23, 2024 19:39
Contributor

@gtk-grafana gtk-grafana left a comment

Looks good, and it's working well locally. I like how this only executes another query for mixed JSON results! Great work, Matias.

@matyax matyax merged commit f1c1ddc into main Aug 26, 2024
4 checks passed

2 participants