feat(new transform): New remap transform #3341

Merged
merged 33 commits into from
Sep 9, 2020
Changes from 16 commits
ae9e84e
Add remap transform
Jeffail Aug 4, 2020
f6311ca
Allow newlines* at the end of maps
Jeffail Aug 4, 2020
00e2629
WIP: Implement arithmetic parser
Jeffail Aug 5, 2020
e3471fc
WIP: Implement op parser and grouping
Jeffail Aug 6, 2020
645976c
Finish arithmetic implementations
Jeffail Aug 7, 2020
3850561
Add remap meta file
Jeffail Aug 10, 2020
2a449ae
Addressing comments
Jeffail Aug 10, 2020
6d21cf6
Add remap example
Jeffail Aug 11, 2020
b01a940
More tidy up
Jeffail Aug 11, 2020
96629fb
fmt
Jeffail Aug 11, 2020
904d7bf
Clippy stuff
Jeffail Aug 11, 2020
10d5ee5
Implement remap if statements
Jeffail Aug 13, 2020
97fae47
Add if statement example to remap docs
Jeffail Aug 13, 2020
bde0b03
Add only_fields function
Jeffail Aug 14, 2020
a583bee
Rebased
Jeffail Aug 18, 2020
636d8e9
Add coercion functions to remap
Jeffail Aug 19, 2020
9eab5af
Add quoted path segments
Jeffail Aug 20, 2020
594a66e
Rejig
Jeffail Aug 20, 2020
83af802
Implement own unescaping for string lits
Jeffail Aug 21, 2020
02ee95c
Tidy up
Jeffail Aug 21, 2020
4515e2f
Add not operator
Jeffail Aug 21, 2020
46e266d
If statement enforce boolean query result
Jeffail Aug 21, 2020
3962e04
Merge branch 'master' into remap-transform
JeanMertz Sep 7, 2020
a29990a
reduce number of unwraps
JeanMertz Sep 7, 2020
6ff7d4a
add field with spacing test
JeanMertz Sep 7, 2020
975d854
add "quoted path segment" behavior test
JeanMertz Sep 7, 2020
66ea912
update remap internal event for dropped events
JeanMertz Sep 7, 2020
2f2b5d4
Update src/mapping/parser/mod.rs
JeanMertz Sep 9, 2020
9a68ebc
Update src/mapping/parser/mod.rs
JeanMertz Sep 9, 2020
1a15382
Update src/mapping/parser/mod.rs
JeanMertz Sep 9, 2020
4c15291
Update src/mapping/query/functions.rs
JeanMertz Sep 9, 2020
18acbbc
Update src/mapping/query/functions.rs
JeanMertz Sep 9, 2020
a996bd0
fix code suggestions
JeanMertz Sep 9, 2020
166 changes: 166 additions & 0 deletions .meta/transforms/remap.toml.erb
@@ -0,0 +1,166 @@
[transforms.remap]
title = "Remap"
allow_you_to_description = "remap one or more log fields"
beta = false
common = false
function_category = "schema"
input_types = ["log"]
output_types = ["log"]
requirements = {}

<%= render("_partials/fields/_component_options.toml", type: "transform", name: "remap") %>

[transforms.remap.options.mapping]
type = "string"
common = true
required = true
description = """\
A mapping that describes field assignments and deletions to be performed on log \
events.\
"""
examples = [
""".type = "foo"""",
""".new_field = .old_field * 2
del(.old_field)""",
"""only_fields(.message, .timestamp, .name)""",
]

[[transforms.remap.examples]]
label = "Generic"
body = """\
The remap transform makes it easy to add, rename and remove fields with a series of statements. Given events of the following form:

```json
{
"name": "Foo McBarson",
"friends": 23,
"enemies": 17
}
```

Suppose we want to add a new field `type` with the static value `"human_person"`, add a dynamic field `acquaintances` that equals the sum of the fields `friends` and `enemies`, and remove the field `enemies`. We can achieve that with the following mapping:

```toml
[transforms.remap_human_person]
type = "remap"
mapping = \"\"\"
.type = "human_person"
.acquaintances = .friends + .enemies
del(.enemies)
\"\"\"
```

Log events will then be output with the following structure:

```json
{
"name": "Foo McBarson",
"type": "human_person",
"friends": 23,
"acquaintances": 40
}
```\
"""

[[transforms.remap.examples]]
label = "Conditional mapping"
body = """\
Some mappings should only be executed under certain conditions. The remap transform allows you to express this with `if` statements. For example, given events of the following form:

```json
{
"type": "foo",
"foo": {
"body": "hello world",
"id": "XXX"
}
}
```

And, occasionally, events of the following alternative form:

```json
{
"type": "bar",
"body": "hello world",
"id": "YYY"
}
```

If we want a consistent format between the two, where the fields `body` and `id` are always at the root of the event, we can achieve this by conditionally mapping against the field `type` of the input event:

```toml
[transforms.remap_human_person]
type = "remap"
mapping = \"\"\"
if .type == "foo" {
.body = .foo.body
.id = .foo.id
del(.foo)
}
\"\"\"
```

With the above mapping, our events will be output with the following consistent structure:

```json
{
"type": "foo",
"body": "hello world",
"id": "XXX"
}
```\
"""

[[transforms.remap.examples]]
label = "Type Coercion"
body = """\
The remap transform offers the functions `string`, `int`, `float`, `bool` and `timestamp`, which attempt to coerce values into fixed types. Each of these functions takes a query parameter describing the value to coerce, and these queries can themselves include arithmetic and functions.

The `timestamp` function also requires a second parameter describing the timestamp format to parse. All other functions have an optional second parameter that describes a default value to return if the target value does not exist or otherwise cannot be coerced.

For example, given the following event:

```json
{
// ... existing fields
"bytes_in": "5667",
"bytes_out": "20574",
"host": "5.86.210.12",
"message": "GET /embrace/supply-chains/dynamic/vertical",
"status": "201",
"timestamp": "19/06/2019:17:20:49 -0400",
"user_id": "zieme4647"
}
```

And the following configuration:

```toml
[transforms.remap_log]
type = "remap"
mapping = \"\"\"
.bytes_in = int(.bytes_in)
.bytes_out = int(.bytes_out)
.imaginary_bool = bool(.doesnt.exist, true)
.timestamp = timestamp(.timestamp, "%d/%m/%Y:%H:%M:%S %z")
.status = int(.status, 200)
\"\"\"
```

A log event will be output with the following structure:

```json
{
// ... existing fields
"bytes_in": 5667,
"bytes_out": 20574,
"host": "5.86.210.12",
"message": "GET /embrace/supply-chains/dynamic/vertical",
"status": 201,
"timestamp": "2019-06-19T17:20:49-04:00",
"user_id": "zieme4647",
"imaginary_bool": true
}
```\
"""
57 changes: 57 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions Cargo.toml
@@ -127,6 +127,8 @@ bytesize = { version = "1.0.0", optional = true }
glob = "0.2.11"
grok = { version = "~1.0.1", optional = true }
nom = { version = "5.1.2", optional = true }
pest = "2.1.3"
pest_derive = "2.1.0"
uuid = { version = "0.7", features = ["serde", "v4"], optional = true }
exitcode = "1.1.2"
snafu = { version = "0.6", features = ["futures-01", "futures"] }
@@ -289,6 +291,7 @@ transforms = [
"transforms-lua",
"transforms-merge",
"transforms-regex_parser",
"transforms-remap",
"transforms-remove_fields",
"transforms-remove_tags",
"transforms-rename_fields",
@@ -316,6 +319,7 @@ transforms-logfmt_parser = ["logfmt"]
transforms-lua = ["rlua"]
transforms-merge = []
transforms-regex_parser = []
transforms-remap = []
transforms-remove_fields = []
transforms-remove_tags = []
transforms-rename_fields = []
92 changes: 92 additions & 0 deletions benches/bench.rs
@@ -2,16 +2,23 @@ use criterion::{criterion_group, criterion_main, Benchmark, Criterion, Throughpu

use approx::assert_relative_eq;
use futures::{compat::Future01CompatExt, future, stream, StreamExt};
use indexmap::IndexMap;
use rand::{
distributions::{Alphanumeric, Uniform},
prelude::*,
};
use std::convert::TryFrom;
use string_cache::DefaultAtom as Atom;
use vector::config::{self, TransformConfig, TransformContext};
use vector::event::Event;
use vector::test_util::{
next_addr, runtime, send_lines, start_topology, wait_for_tcp, CountReceiver,
};
use vector::transforms::{
add_fields::AddFields,
remap::{Remap, RemapConfig},
Transform,
};
use vector::{sinks, sources, transforms};

mod batch;
@@ -32,6 +39,7 @@ criterion_group!(
benchmark_complex,
bench_elasticsearch_index,
benchmark_regex,
benchmark_remap,
);
criterion_main!(
benches,
@@ -657,6 +665,90 @@ fn bench_elasticsearch_index(c: &mut Criterion) {
);
}

fn benchmark_remap(c: &mut Criterion) {
Review comment (Contributor):

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/bench-0f9a98460f4dba0d
add fields with remap   time:   [1.3603 us 1.3614 us 1.3625 us]                                   
                        change: [-1.5879% +0.1667% +1.5785%] (p = 0.84 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

add fields with add_fields                                                                             
                        time:   [2.2038 us 2.2086 us 2.2152 us]
                        change: [-0.9292% -0.6531% -0.2980%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

Very nice. Good job. :)

    let event = {
        let mut event = Event::from("augment me");
        event.as_mut_log().insert("copy_from", "buz");
        event
    };

    c.bench_function("add fields with remap", |b| {
        let conf = RemapConfig {
            mapping: r#".foo = "bar"
.bar = "baz"
.copy = .copy_from"#
                .to_string(),
            drop_on_err: true,
        };
        let mut tform = Remap::new(conf).unwrap();

        b.iter(|| {
            let result = tform.transform(event.clone()).unwrap();
            assert_eq!(
                result
                    .as_log()
                    .get(&Atom::from("foo"))
                    .unwrap()
                    .to_string_lossy(),
                "bar"
            );
            assert_eq!(
                result
                    .as_log()
                    .get(&Atom::from("bar"))
                    .unwrap()
                    .to_string_lossy(),
                "baz"
            );
            assert_eq!(
                result
                    .as_log()
                    .get(&Atom::from("copy"))
                    .unwrap()
                    .to_string_lossy(),
                "buz"
            );
        })
    });

    c.bench_function("add fields with add_fields", |b| {
        let mut fields = IndexMap::new();
        fields.insert("foo".into(), "bar".into());
        fields.insert("bar".into(), "baz".into());
        fields.insert("copy".into(), "{{ copy_from }}".into());

        let mut tform = AddFields::new(fields, true);

        b.iter(|| {
            let result = tform.transform(event.clone()).unwrap();
            assert_eq!(
                result
                    .as_log()
                    .get(&Atom::from("foo"))
                    .unwrap()
                    .to_string_lossy(),
                "bar"
            );
            assert_eq!(
                result
                    .as_log()
                    .get(&Atom::from("bar"))
                    .unwrap()
                    .to_string_lossy(),
                "baz"
            );
            assert_eq!(
                result
                    .as_log()
                    .get(&Atom::from("copy"))
                    .unwrap()
                    .to_string_lossy(),
                "buz"
            );
        })
    });
}

fn random_lines(size: usize) -> impl Iterator<Item = String> {
let mut rng = SmallRng::from_rng(thread_rng()).unwrap();
