-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(vrl): add development design document #8735
Conversation
Signed-off-by: Jean Mertz <[email protected]>
This comment has been minimized.
This comment has been minimized.
Signed-off-by: Jean Mertz <[email protected]>
lib/vrl/DESIGN.md
Outdated
- Use `parse_*` for string decoding functions (e.g. `parse_json` and | ||
`parse_grok`). | ||
|
||
- Use `encode_*` for string encoding functions (e.g. `encode_base64`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be noted that some decoding functions are named decode_*
(decode_base64
and decode_percent
).
Ideally, we'd rectify this, but that's a breaking change. We can create issues for when a current implementation goes against the design principles so that we can do a sweep of breaking changes before we hit 1.0.
Alternatively, we accept this, document it, and keep things as is, even when we release 1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I'm torn on this, I prefer aligning on parse_base64
but using base64
directly it's referred to as "decode".
I'd lean towards internal consistency, maybe document "this decodes base64 encoded values" similar to us referencing "exploding" with the unnest
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some distinction here between functions that parse a string into some other type (like parse_json
or parse_int
) and functions that turn a string into another string (decode_base64
or decode_percent
).
To me parse_base64
reads pretty weird given no "parsing" is really happening, just translation. I think parse
makes sense for only for structured data.
It does makes encode
a little confusing since it is used as the opposite of both decode
and parse
when you might only expect it to be the opposite of decode
. I might suggest we use to_X
like to_json
and to_key_value
but this would overload to_X
given it is also used for converting to primitives. We could go the Go/Java route of unmarshal
and marshal
?
Ideally, we'd rectify this, but that's a breaking change. We can create issues for when a current implementation goes against the design principles so that we can do a sweep of breaking changes before we hit 1.0.
I think we could also just alias the functions to rename them to the desired scheme; dropping the deprecated aliases at 1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 marshal
/unmarshal
seem natural given my history in Go - but it is a larger change (aliasing seems reasonable)
lib/vrl/DESIGN.md
Outdated
|
||
- Return boolean from `is_*` functions (e.g. `is_string`). | ||
|
||
- Return an error when `parse_*` functions fail to decode the string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All but one parse_*
functions are fallible. parse_query_string
being the exception.
|
||
- `parse_*` functions should almost always error when used incorrectly. | ||
|
||
- `to_*` functions must never fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't currently true because we decided to have to_*
functions also take a string, meaning it does both decoding and type conversion.
We can either update this rule to match reality, or (at some point before 1.0) do a breaking change to update the functions to match this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 would that end up being something like:
if is_integer(.field) {
.field = int(.field)
}
lib/vrl/DESIGN.md
Outdated
{ "error": "invalid data format" } | ||
``` | ||
|
||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you've seen any common patterns in the wild that are worth mentioning here, please let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like this pattern.
I am wondering if we'll want to introduce "pipelining" at some point like:
data = decode_base64(.message) |> parse_json
That is really just syntax sugar though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pipe operator support is tracked in https://github.com/timberio/vector/issues/5431, but I don't think there's anything to note here, since it isn't actually part of the language yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment I'd generally make about VRL is that users seem to generally be using !
just about everywhere. This may be because most of the docs/examples use it for terseness, or because it's easier, or even because their event flow is much more known in advance and they need less handling.
lib/vrl/DESIGN.md
Outdated
- Use `parse_*` for string decoding functions (e.g. `parse_json` and | ||
`parse_grok`). | ||
|
||
- Use `encode_*` for string encoding functions (e.g. `encode_base64`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I'm torn on this, I prefer aligning on parse_base64
but using base64
directly it's referred to as "decode".
I'd lean towards internal consistency, maybe document "this decodes base64 encoded values" similar to us referencing "exploding" with the unnest
function
|
||
- `parse_*` functions should almost always error when used incorrectly. | ||
|
||
- `to_*` functions must never fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 would that end up being something like:
if is_integer(.field) {
.field = int(.field)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @JeanMertz ! This seems like a great start and will help guide decisions in VRL in the future. I especially like the target audience and calling out of anti-patterns that we've rejected.
I can review again when the TODOs are addressed.
lib/vrl/DESIGN.md
Outdated
- Use `parse_*` for string decoding functions (e.g. `parse_json` and | ||
`parse_grok`). | ||
|
||
- Use `encode_*` for string encoding functions (e.g. `encode_base64`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some distinction here between functions that parse a string into some other type (like parse_json
or parse_int
) and functions that turn a string into another string (decode_base64
or decode_percent
).
To me parse_base64
reads pretty weird given no "parsing" is really happening, just translation. I think parse
makes sense for only for structured data.
It does makes encode
a little confusing since it is used as the opposite of both decode
and parse
when you might only expect it to be the opposite of decode
. I might suggest we use to_X
like to_json
and to_key_value
but this would overload to_X
given it is also used for converting to primitives. We could go the Go/Java route of unmarshal
and marshal
?
Ideally, we'd rectify this, but that's a breaking change. We can create issues for when a current implementation goes against the design principles so that we can do a sweep of breaking changes before we hit 1.0.
I think we could also just alias the functions to rename them to the desired scheme; dropping the deprecated aliases at 1.0.
lib/vrl/DESIGN.md
Outdated
- For one or more parameters, the first parameter must be the "target" of the | ||
function (e.g. `parse_regex(target: <string>, pattern: <regex>)`). | ||
|
||
- The first parameter must therefor almost always be named `target`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The first parameter must therefor almost always be named `target`. | |
- The first parameter must therefore almost always be named `target`. |
lib/vrl/DESIGN.md
Outdated
{ "error": "invalid data format" } | ||
``` | ||
|
||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like this pattern.
I am wondering if we'll want to introduce "pipelining" at some point like:
data = decode_base64(.message) |> parse_json
That is really just syntax sugar though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent start. I left a few comments, I'm curious what you think?
Signed-off-by: Jean Mertz <[email protected]>
Signed-off-by: Jean Mertz <[email protected]>
Signed-off-by: Jean Mertz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start. I think it would be worth adding a section about how VRL handles types and type inference, particularly in light of the statement that "All errors are caught at compile time."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing!
Would it be worth going into more detail about the type defs returned from a function? For example, it could be worth mentioning that some functions return objects with fields that may or may not be defined - the type defs for those fields should be in the definition as Kind::Bytes | Kind::Null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks great, though I'd think it'd be helpful to discuss the design of the fallibility system itself a bit here too. The way that it interacts with the type system in particular seems to make up the bulk of the language's learning curve.
Signed-off-by: Jean Mertz <[email protected]>
This one should be ready for a final round of reviews. I'll file issues for any lingering comments, to allow us to iterate on this document in future PRs, and avoid keeping this PR open until all i's are dotted and t's are crossed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this!
performant as it can be, and there's no way to use functions in such a way | ||
that performance of a VRL program tanks. We might introduce network calls at | ||
some point, if we find a good caching solution to solve most of our concerns, | ||
but so far we've avoided any network calls inside our stdlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the biggest issue here is async. If we want to start introducing IO we will need to rewire VRL calls to be async. If we do it well it shouldn't affect performance too much since the runtime can be handling other events while waiting on IO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. I don't think it fits this document, but worth keeping in mind.
Signed-off-by: Jean Mertz <[email protected]>
Signed-off-by: Jean Mertz <[email protected]>
closes #3740
There's still a few TODO's left in the document, but it's well enough along that an initial round of reviews is welcome. Any additions will be made in follow-up commits, to make reviewing them easier.
👀 RENDERED