
[Proposal] Stop shipping log.offset by default #3934

Open
ruflin opened this issue Dec 20, 2023 · 12 comments
Labels
Team:Elastic-Agent Label for the Agent team

Comments

@ruflin
Contributor

ruflin commented Dec 20, 2023

Currently Elastic Agent (Filebeat under the hood) ships `log.offset` by default for each event when the `log` or `filestream` input is used (are there other inputs that do this too?). Some users rely on `log.offset` to sort log entries that share an identical timestamp; with the switch to nanosecond timestamps as the default (elastic/elasticsearch#102548), this should become less of a problem.

Because `log.offset` also takes up a relevant amount of storage for each event, the proposal is to stop shipping it by default and make it opt-in. This can be considered a breaking change and needs further discussion.

Users who want to remove `log.offset` already today can do so either on the edge with a processor or centrally through an ingest pipeline. The processor on the edge looks as follows:

```yaml
processors:
  - drop_fields:
      fields: ["log.offset"]
      ignore_missing: true
```
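The central alternative could use Elasticsearch's `remove` ingest processor; a minimal sketch (the pipeline name `drop-log-offset` is made up for illustration):

```
PUT _ingest/pipeline/drop-log-offset
{
  "processors": [
    { "remove": { "field": "log.offset", "ignore_missing": true } }
  ]
}
```

The pipeline would still need to be attached to the relevant data streams (e.g. via a `@custom` pipeline), which is the extra setup the edge processor avoids.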

Even though the likely code changes are in Filebeat, I filed this issue in the Elastic Agent repo to make it clear that the changes would also apply to data shipped by Elastic Agent.

Related issues

@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Member

cmacknz commented Dec 20, 2023

No opposition to doing this, besides figuring out how to handle the breaking change aspect of it.

We could introduce this as a feature flag directly in the agent policy, and then make a decision on when to flip it from true to false. This would give a relatively easy way to turn it off globally.

```yaml
# This section enables or disables feature flags supported by Agent and its components.
#agent.features:
#  fqdn:
#    enabled: false
```
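Following that pattern, a flag for this proposal could look as follows; this is a sketch only, and `log_offset` is a hypothetical flag name rather than something the agent supports today:

```yaml
agent.features:
  # Hypothetical flag; not currently implemented by Elastic Agent.
  log_offset:
    enabled: false  # false would stop shipping log.offset by default
```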

We could also argue that this is something that should be in each individual integration that collects logs, but there are a lot of those.

@amitkanfer
Contributor

We could introduce this as a feature flag directly in the agent policy,

@nimarezainia another one to the list ^^

@ruflin
Contributor Author

ruflin commented Dec 21, 2023

We could also argue that this is something that should be in each individual integration that collects logs, but there are a lot of those.

I would argue `log.offset` is something related to the input, not to integrations.

@andrewkroh
Member

Recently, several other new non-ECS fields were added to the filestream input, and they have an impact on users. I think their only purpose is troubleshooting, so having them present all the time does not make sense. IMO it would be nice to make these opt-in in some way too, like log.offset.

```jsonc
// windows
"log": {
  "offset": 0,
  "file": {
    "path": "C:\\Users\\Admin\\Desktop\\test\\fs_metadata\\logs\\some.log",
    "idxhi": 327680,
    "idxlo": 99308,
    "vol": 2860917223,
    "fingerprint": "0e9d2f73e97662c9a156111ea556fc536782043f2a1cd69fecbe3b69cab64c9d"
  }
}

// posix
"log": {
  "offset": 0,
  "file": {
    "inode": 119400347,
    "fingerprint": "2edc986847e209b4016e141a6dc8716d3207350f416969382d431539bf292e4a",
    "path": "/path/to/your/logs/some.log",
    "device_id": 16777234
  }
}
```
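To get a rough sense of the per-event overhead being discussed, here is a quick back-of-envelope sketch; the sample event is hypothetical, and raw JSON size is only a proxy for actual Elasticsearch storage cost:

```python
import json

# Back-of-envelope estimate (not an official benchmark): compare the
# JSON-serialized size of a minimal log event with and without the
# log.offset / log.file troubleshooting fields shown above. The base
# event is a made-up sample; on-disk size in Elasticsearch will differ
# due to compression, doc_values, and mappings.
base_event = {
    "@timestamp": "2024-01-10T12:00:00.000000000Z",
    "message": "connection accepted from 10.0.0.5:51234",
}

troubleshooting_fields = {
    "log": {
        "offset": 0,
        "file": {
            "inode": 119400347,
            "fingerprint": "2edc986847e209b4016e141a6dc8716d3207350f416969382d431539bf292e4a",
            "path": "/path/to/your/logs/some.log",
            "device_id": 16777234,
        },
    }
}

slim = len(json.dumps(base_event))
full = len(json.dumps({**base_event, **troubleshooting_fields}))
overhead = full - slim
print(f"{overhead} extra bytes per event ({overhead / full:.0%} of the full event)")
```

For short log lines like this one, the troubleshooting fields can easily exceed the size of the message itself, which is why the per-field storage question matters.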

@ruflin
Contributor Author

ruflin commented Jan 10, 2024

@cmacknz @rdner Have we tested the overhead of adding these fields? As @andrewkroh described, we should ideally make these opt-in for debugging. @jpountz for awareness.

@cachedout I would like us to have some simple storage-usage benchmarks in place for collecting a simple file, so that we find out quickly when a change increases storage usage. @amitkanfer Does your team already have something like this in place?

@cmacknz
Member

cmacknz commented Jan 10, 2024

Have we tested what is the overhead of the addition of these field?

There is no regular testing for the storage size of individual documents or data streams, only ad hoc testing. I believe this was added as a direct result of being unable to diagnose issues in support cases.

For events read from the same file these fields are exactly the same the majority of the time. Having some way to opt-in to those fields would make sense.

I don't think we have a concept of an option that augments the actual event data with troubleshooting information. I think that is probably useful in general. The easiest alternative would be to convert this information into debug logs but they would be extremely verbose at high event rates (essentially making these the only visible logs).

@cachedout
Contributor

@cachedout I would like us to have some simple benchmarks in place for storage usage for collecting a simple file where we find out quickly in case some changes increase storage usage.

We could certainly consider this, but I was under the impression that the Ingest folks have their own people dedicated to benchmarking and performance. Is that the case @andresrc?

@rdner
Member

rdner commented Jan 12, 2024

@ruflin yes, we ran benchmarks and tested the overhead after adding these new fields elastic/beats#36065 (comment)

However, these benchmarks were aimed at the Filebeat throughput and not the Elasticsearch storage layout.

These new values mentioned in #3934 (comment) and log.offset help us to identify issues when investigating SDHs. We could make them opt-in, sure, but that would be an extra step in the investigation.

Also, adding these particular values was requested by a customer.

@ruflin
Contributor Author

ruflin commented Jan 15, 2024

I understand the value of these added fields, but we need to understand their storage impact. Even if they increase storage usage by only a single-digit percentage, that can have a large $ effect across all users.

@amitkanfer Can you help me find who should take on the work to have some storage benchmarks in place? It is important that we know when changes / new features in inputs increase / reduce the storage usage in Elasticsearch, especially for logs.

@amitkanfer
Contributor

@pierrehilbert - this aligns perfectly with one of the KRs you're sponsoring around lowering storage volume. We also mentioned the need to first evaluate the current state. What do you think about using this proposal as a test case?

@andresrc
Contributor

We could certainly consider this but I was under the impression that the Ingest folks have their own folks dedicated to benchmarking and performance. Is this the case @andresrc ?

I think so, but I'm not sure to what extent.
