Update output config docs #2597

merged 3 commits into from
Sep 22, 2016
153 changes: 145 additions & 8 deletions libbeat/docs/outputconfig.asciidoc
@@ -106,10 +106,6 @@ The number of workers per configured host publishing events to Elasticsearch. Th
is best used with load balancing mode enabled. Example: If you have 2 hosts and
3 workers, in total 6 workers are started (3 for each host).

===== port

The default port of the Elasticsearch server if the port number is missing in <<hosts-option>> URL. The default port number is 9200.

===== username
Collaborator

More as a historical remark: port was removed as it would not work well with hosts.


The basic authentication username for connecting to Elasticsearch.
@@ -148,9 +144,51 @@ for more information about the environment variables.

===== index

The index root name to write events to. The default is the Beat name.
For example "{beatname_lc}" generates "[{beatname_lc}-]YYYY.MM.DD" indexes (for example,
"{beatname_lc}-2015.04.26").
The index name to write events to. The default is "{beatname_lc}-%{+yyyy.MM.dd}" (for example, "{beatname_lc}-2015.04.26").

===== indices

Array of index selector rules supporting conditionals, *Format String*
Contributor

This description (and the other ones like it in the doc) needs work. It took me a couple of times reading it before I realized that you're describing how to set the index dynamically. I think you're also saying that the index will be set based on the first item in the array that results in a match, but I am not sure.

Author

copy'n'paste ;)

"I think you're also saying that the index will be set based on the first item in the array that results in a match"

Correct.

I did try to keep it short, as these settings are quite complicated in comparison to most other settings. `indices` (and the others) is an array of rules executed one after another. The first matching rule in the array (its conditionals match and its format string doesn't fail) sets the index the event is published to. Once a rule has matched, no other rule in the array is checked, which avoids ambiguity when two rules could both match. If no rule matches, the `index` field is evaluated (which is basically another rule). In truth, `index` is just another rule appended to the list of rules defined in `indices`.

based field access and name mappings. The first rule matching will be used to
set the `index` for the event to be published. If `indices` is missing or no
rule matches, the `index` field will be used.

Rule settings:

*`index`*: The index *Format String* to use. If the fields used are missing, the rule fails.
Contributor

Format String shouldn't be capitalized because it's not a product component. Do you plan to point to the reference content that you wrote about format strings? I'm thinking about relocating the content about the config file to the guides for individual Beats, but I won't do that for now, so use the {libbeat}/filename.html[link text] convention to add links for now. I will clean up the links when I move the content. If you want, I can add links after this PR is merged.

Author

changing all *Format String* to format string.

My idea with documenting all types in "config file format" includes (long-term goal) adding the type to all settings in the reference docs by name + link to the definition of said type. Having some docs in libbeat only and others reused among multiple beats I did struggle with the execution, though.


*`mapping`*: Dictionary mapping index names to new names

*`default`*: Default string value if `mapping` does not find a match.

*`when`*: Condition which must succeed in order to execute the current rule.
Collaborator

Perhaps we can have a general example for topics, pipelines, indices we can link to?

Contributor

+1 We should have a usage type topic that describes, for example, how to set the pipeline dynamically. Having the reference content is good enough for Beta, though.


Example Elasticsearch output with `indices`:

["source","yaml"]
------------------------------------------------------------------------------
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  index: "logs-%{+yyyy.MM.dd}"
  indices:
    - index: "critical-%{+yyyy.MM.dd}"
      when.contains:
        message: "CRITICAL"
    - index: "error-%{+yyyy.MM.dd}"
      when.contains:
        message: "ERR"
------------------------------------------------------------------------------
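
As a further sketch, not part of this change, the `mapping` and `default` rule settings described above might be combined as shown here. The `type` values and the target index names (`web`, `db`, `other`) are hypothetical, and the behavior noted in the comments is an assumption derived from the rule descriptions rather than verified output.

["source","yaml"]
------------------------------------------------------------------------------
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  # Used if `indices` is missing or no rule matches.
  index: "logs-%{+yyyy.MM.dd}"
  indices:
    # If the `type` field is missing, the format string fails and so does the rule.
    - index: "%{[type]}"
      mapping:
        "http": "web"     # hypothetical names
        "nginx": "web"
        "mysql": "db"
      # Assumed to apply when `type` has no entry in `mapping`.
      default: "other"
------------------------------------------------------------------------------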

===== pipeline

*Format String* value configuring the ingest node pipeline to write events to.

===== pipelines

Array of pipeline selector configurations supporting conditionals, *Format String*
based field access and name mappings. The first rule matching will be used to
set the `pipeline` for the event to be published. If `pipelines` is missing or no
rule matches, the `pipeline` field will be used.
Collaborator

Similar example as with indices would be nice here.

Author

too much repetition.

Contributor

I don't like repetition either, but there's no guarantee with reference info that users will read from top to bottom, so some repetition might be necessary (or at least a pointer to the supporting info).
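
For illustration only, a `pipelines` example modeled on the `indices` example above might look like the following sketch. It assumes that each rule takes a `pipeline` format string in the same way an `indices` rule takes `index`; the pipeline names are hypothetical.

["source","yaml"]
------------------------------------------------------------------------------
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  # Used if `pipelines` is missing or no rule matches.
  pipeline: "default_pipeline"
  pipelines:
    # Rules are checked in order; the first matching rule sets the pipeline.
    - pipeline: "critical_pipeline"   # hypothetical pipeline name
      when.contains:
        message: "CRITICAL"
    - pipeline: "error_pipeline"      # hypothetical pipeline name
      when.contains:
        message: "ERR"
------------------------------------------------------------------------------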


===== template

@@ -527,9 +565,57 @@ The password for connecting to Kafka.

===== topic

The Kafka topic used for produced events. The setting can be a format string
The Kafka topic used for produced events. The setting can be a *Format String*
using any event field. To set the topic from document type use `%{[type]}`.

===== topics

Array of topic selector rules supporting conditionals, *Format String*
based field access and name mappings. The first rule matching will be used to
set the `topic` for the event to be published. If `topics` is missing or no
rule matches, the `topic` field will be used.

Rule settings:

*`topic`*: The topic *Format String* to use. If the fields used are missing, the
rule fails.

*`mapping`*: Dictionary mapping topic names to new names

*`default`*: Default string value if `mapping` does not find a match.

*`when`*: Condition which must succeed in order to execute the current rule.
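
A `topics` configuration could be sketched along the same lines as the `indices` example for the Elasticsearch output. The broker address and the topic names below are assumptions chosen for illustration.

["source","yaml"]
------------------------------------------------------------------------------
output.kafka:
  hosts: ["localhost:9092"]           # assumed broker address
  # Used if `topics` is missing or no rule matches.
  topic: "logs-%{[type]}"
  topics:
    # Rules are checked in order; the first matching rule sets the topic.
    - topic: "critical-%{[type]}"
      when.contains:
        message: "CRITICAL"
    - topic: "error-%{[type]}"
      when.contains:
        message: "ERR"
------------------------------------------------------------------------------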

===== key

The Kafka event key. The event key must be unique and can be extracted from the
Collaborator

What happens if no key is set? Is it by default auto generated by Kafka?

Author

Keys in Kafka are optional. If a key is set and reused, the old content will be overwritten. By never changing content, keys can be used for some artificial kind of deduplication.

Collaborator

Can you add a note that the key is optional? From the current description it looks like it is required.

Author

done

event using a *Format String*.

===== partition

Kafka output broker event partitioning strategy. Must be one of `random`,
`round_robin`, or `hash`. By default the `hash` partitioner is used.

*`random.group_events`*: Select a new partition by random every `group_events`
Contributor

I can't parse this sentence. Maybe there is a missing word?

Author

No missing word. Using group_events is supposed to use the name as a numeric replacement. I want to express that the number configured in group_events sets how many consecutive published events will be sent to the same partition before the partitioner selects a new partition to use for the next set of events being published.

Contributor

That sort of substitution works better when the name of the option clearly indicates what it represents. In this case, it's not clear by the name that group_events will resolve to a number, so some readers will stumble over how the sentence is constructed. I realize you're trying to avoid using a lot of words, but it's better to be explicit like you've been in your explanation to me. Also, unless you're telling the user to do something, avoid using the imperative. When someone reads "select", they think they need to select something. It might take them a few seconds to realize that you're essentially saying "configure this option so that the software selects." Does that make sense?

events being published. The default value is 1, meaning that a new
partition is picked randomly after each event.

*`round_robin.group_events`*: Select the next partition every `group_events`
Contributor

Same comment as line 599

events being published. The default value is 1, meaning that the next partition is selected after each event.

*`hash.hash`*: List of fields used to compute the partitioning hash value from.
If no field is configured, the event's `key` value will be used.

*`hash.random`*: Randomly distribute events if no hash or key value can be computed.

All partitioners will try to publish events to all partitions by default. If a
partitions leader becomes unreachable for the beat, the output might block. All
Contributor

partition's leader

Author

fixed

partitioners support setting `reachable_only` to override this
behavior. If `reachable_only` is set to `true`, events will be published to
available partitions only. Note: publishing to the subset of available partitions
Contributor

Make this a proper note by starting it on a separate line and beginning with "NOTE: ". Also, saying "only" makes the sentence grammatically ambiguous in English. I would say something like:

NOTE: Publishing to a subset of available partitions potentially increases resource usage because events may become unevenly distributed.

Author

done

only potentially increases resource usage due to events becoming more unevenly
distributed.
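
To make the `group_events` and `reachable_only` settings more concrete, here is a minimal sketch. It assumes that a partitioner is configured as a nested `partition.round_robin` block, which is inferred from the dotted setting names above; the broker address and the value `100` are illustrative only.

["source","yaml"]
------------------------------------------------------------------------------
output.kafka:
  hosts: ["localhost:9092"]   # assumed broker address
  topic: "%{[type]}"
  partition.round_robin:      # assumed nesting for partitioner settings
    # Send 100 consecutive events to one partition before moving to the next.
    group_events: 100
    # Publish only to partitions whose leader is currently reachable.
    reachable_only: true
------------------------------------------------------------------------------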

===== client_id

The configurable ClientID used for logging, debugging, and auditing purposes. The default is "beats".
@@ -662,6 +748,57 @@ The name of the Redis list or channel the events are published to. The default i
The name of the Redis list or channel the events are published to. The default is
"{beatname_lc}".

The Redis key can be set dynamically using a *Format String* accessing any
fields in the event to be published.

The following configuration uses the `fields.list` field to set the Redis list key. If
`fields.list` is missing, `fallback` is used.

["source","yaml"]
------------------------------------------------------------------------------
output.redis:
  hosts: ["localhost"]
  key: "%{[fields.list]:fallback}"
------------------------------------------------------------------------------

===== keys

Array of key selector configurations supporting conditionals, *Format String*
based field access and name mappings. The first rule matching will be used to
set the `key` for the event to be published. If `keys` is missing or no
rule matches, the `key` field will be used.

Rule settings:

*`key`*: The key *Format String*. If the fields used in the format string are missing, the rule fails.

*`mapping`*: Dictionary mapping key values to new names

*`default`*: Default string value if `mapping` does not find a match.

*`when`*: Condition which must succeed in order to execute the current rule.

Example `keys` settings:

["source","yaml"]
------------------------------------------------------------------------------
output.redis:
  hosts: ["localhost"]
  key: "default_list"
  keys:
    - key: "info_list"   # send to info_list if `message` field contains INFO
      when.contains:
        message: "INFO"
    - key: "debug_list"  # send to debug_list if `message` field contains DEBUG
      when.contains:
        message: "DEBUG"
    - key: "%{[type]}"
      mapping:
        "http": "frontend_list"
        "nginx": "frontend_list"
        "mysql": "backend_list"
------------------------------------------------------------------------------

===== password

The password to authenticate with. The default is no authentication.