Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(vrl): add development design document #8735

Merged
merged 8 commits into from
Oct 1, 2021
Merged

Conversation

JeanMertz
Copy link
Contributor

@JeanMertz JeanMertz commented Aug 16, 2021

closes #3740

There's still a few TODO's left in the document, but it's well enough along that an initial round of reviews is welcome. Any additions will be made in follow-up commits, to make reviewing them easier.

👀 RENDERED

@JeanMertz JeanMertz added type: task Generic non-code related tasks domain: vrl Anything related to the Vector Remap Language labels Aug 16, 2021
@netlify

This comment has been minimized.

Comment on lines 124 to 127
- Use `parse_*` for string decoding functions (e.g. `parse_json` and
`parse_grok`).

- Use `encode_*` for string encoding functions (e.g. `encode_base64`).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted that some decoding functions are named decode_* (decode_base64 and decode_percent).

Ideally, we'd rectify this, but that's a breaking change. We can create issues for when a current implementation goes against the design principles so that we can do a sweep of breaking changes before we hit 1.0.

Alternatively, we accept this, document it, and keep things as is, even when we release 1.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm torn on this, I prefer aligning on parse_base64 but using base64 directly it's referred to as "decode".

I'd lean towards internal consistency, maybe document "this decodes base64 encoded values" similar to us referencing "exploding" with the unnest function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some distinction here between functions that parse a string into some other type (like parse_json or parse_int) and functions that turn a string into another string (decode_base64 or decode_percent).

To me parse_base64 reads pretty weird given no "parsing" is really happening, just translation. I think parse makes sense for only for structured data.

It does makes encode a little confusing since it is used as the opposite of both decode and parse when you might only expect it to be the opposite of decode. I might suggest we use to_X like to_json and to_key_value but this would overload to_X given it is also used for converting to primitives. We could go the Go/Java route of unmarshal and marshal?

Ideally, we'd rectify this, but that's a breaking change. We can create issues for when a current implementation goes against the design principles so that we can do a sweep of breaking changes before we hit 1.0.

I think we could also just alias the functions to rename them to the desired scheme; dropping the deprecated aliases at 1.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 marshal/unmarshal seem natural given my history in Go - but it is a larger change (aliasing seems reasonable)


- Return boolean from `is_*` functions (e.g. `is_string`).

- Return an error when `parse_*` functions fail to decode the string.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All but one parse_* functions are fallible. parse_query_string being the exception.


- `parse_*` functions should almost always error when used incorrectly.

- `to_*` functions must never fail.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't currently true because we decided to have to_* functions also take a string, meaning it does both decoding and type conversion.

We can either update this rule to match reality, or (at some point before 1.0) do a breaking change to update the functions to match this rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 would that end up being something like:

if is_integer(.field) {
    .field = int(.field)
}

lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
{ "error": "invalid data format" }
```

TODO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've seen any common patterns in the wild that are worth mentioning here, please let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like this pattern.

I am wondering if we'll want to introduce "pipelining" at some point like:

data = decode_base64(.message) |> parse_json

That is really just syntax sugar though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipe operator support is tracked in https://github.com/timberio/vector/issues/5431, but I don't think there's anything to note here, since it isn't actually part of the language yet.

Copy link
Contributor

@spencergilbert spencergilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment I'd generally make about VRL is that users seem to generally be using ! just about everywhere. This may be because most of the docs/examples use it for terseness, or because it's easier, or even because their event flow is much more known in advance and they need less handling.

lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
Comment on lines 124 to 127
- Use `parse_*` for string decoding functions (e.g. `parse_json` and
`parse_grok`).

- Use `encode_*` for string encoding functions (e.g. `encode_base64`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm torn on this, I prefer aligning on parse_base64 but using base64 directly it's referred to as "decode".

I'd lean towards internal consistency, maybe document "this decodes base64 encoded values" similar to us referencing "exploding" with the unnest function


- `parse_*` functions should almost always error when used incorrectly.

- `to_*` functions must never fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 would that end up being something like:

if is_integer(.field) {
    .field = int(.field)
}

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JeanMertz ! This seems like a great start and will help guide decisions in VRL in the future. I especially like the target audience and calling out of anti-patterns that we've rejected.

I can review again when the TODOs are addressed.

lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
Comment on lines 124 to 127
- Use `parse_*` for string decoding functions (e.g. `parse_json` and
`parse_grok`).

- Use `encode_*` for string encoding functions (e.g. `encode_base64`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some distinction here between functions that parse a string into some other type (like parse_json or parse_int) and functions that turn a string into another string (decode_base64 or decode_percent).

To me parse_base64 reads pretty weird given no "parsing" is really happening, just translation. I think parse makes sense for only for structured data.

It does makes encode a little confusing since it is used as the opposite of both decode and parse when you might only expect it to be the opposite of decode. I might suggest we use to_X like to_json and to_key_value but this would overload to_X given it is also used for converting to primitives. We could go the Go/Java route of unmarshal and marshal?

Ideally, we'd rectify this, but that's a breaking change. We can create issues for when a current implementation goes against the design principles so that we can do a sweep of breaking changes before we hit 1.0.

I think we could also just alias the functions to rename them to the desired scheme; dropping the deprecated aliases at 1.0.

- For one or more parameters, the first parameter must be the "target" of the
function (e.g. `parse_regex(target: <string>, pattern: <regex>)`).

- The first parameter must therefor almost always be named `target`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The first parameter must therefor almost always be named `target`.
- The first parameter must therefore almost always be named `target`.

lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
{ "error": "invalid data format" }
```

TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like this pattern.

I am wondering if we'll want to introduce "pipelining" at some point like:

data = decode_base64(.message) |> parse_json

That is really just syntax sugar though.

Copy link
Contributor

@binarylogic binarylogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent start. I left a few comments, I'm curious what you think?

lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
Signed-off-by: Jean Mertz <[email protected]>
Signed-off-by: Jean Mertz <[email protected]>
lib/vrl/DESIGN.md Show resolved Hide resolved
lib/vrl/DESIGN.md Show resolved Hide resolved
@JeanMertz JeanMertz requested a review from binarylogic August 20, 2021 14:32
Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start. I think it would be worth adding a section about how VRL handles types and type inference, particularly in light of the statement that "All errors are caught at compile time."

lib/vrl/DESIGN.md Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
Copy link
Contributor

@StephenWakely StephenWakely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing!

Would it be worth going into more detail about the type defs returned from a function? For example, it could be worth mentioning that some functions return objects with fields that may or may not be defined - the type defs for those fields should be in the definition as Kind::Bytes | Kind::Null.

lib/vrl/DESIGN.md Show resolved Hide resolved
lib/vrl/DESIGN.md Show resolved Hide resolved
Copy link
Member

@lukesteensen lukesteensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, though I'd think it'd be helpful to discuss the design of the fallibility system itself a bit here too. The way that it interacts with the type system in particular seems to make up the bulk of the language's learning curve.

Signed-off-by: Jean Mertz <[email protected]>
@JeanMertz
Copy link
Contributor Author

This one should be ready for a final round of reviews.

I'll file issues for any lingering comments, to allow us to iterate on this document in future PRs, and avoid keeping this PR open until all i's are dotted and t's are crossed.

@JeanMertz JeanMertz requested a review from jszwedko September 28, 2021 14:40
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this!

lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
lib/vrl/DESIGN.md Outdated Show resolved Hide resolved
performant as it can be, and there's no way to use functions in such a way
that performance of a VRL program tanks. We might introduce network calls at
some point, if we find a good caching solution to solve most of our concerns,
but so far we've avoided any network calls inside our stdlib.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest issue here is async. If we want to start introducing IO we will need to rewire VRL calls to be async. If we do it well it shouldn't affect performance too much since the runtime can be handling other events while waiting on IO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. I don't think it fits this document, but worth keeping in mind.

Signed-off-by: Jean Mertz <[email protected]>
@JeanMertz JeanMertz enabled auto-merge (squash) October 1, 2021 13:14
@JeanMertz JeanMertz disabled auto-merge October 1, 2021 14:07
@JeanMertz JeanMertz merged commit 59075fc into master Oct 1, 2021
@JeanMertz JeanMertz deleted the jean/vrl-design-doc branch October 1, 2021 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language type: task Generic non-code related tasks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Remap specification
7 participants