-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add file system information to each event #36065
Conversation
This includes: * Unix-like: * device * inode * Windows: * idxlo * idxhi * vol * All: * Fingerprint (if the fingerprint mode is enabled)
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Since we're adding what looks like 3-4 string fields to every event, I wonder how this will impact performance. Should we do some benchmarking first? Or put this behind a configuration setting (either enabled by default so we can turn it off in case of any reported performance problems or disabled by default so we can turn it on only for customers that want this information)? |
Also, should fields like |
@ycombinator we've already discussed the performance impact in the original issue and agreed the value for us is more important. The biggest field here is the fingerprint which is already an opt-in behaviour, this field is present only if the fingerprint mode is active. The rest are just very short strings. |
@ycombinator also adding to my previous comment: the length of all the added fields is constant and they're attached only to events produced by the filestream input. So, the impact is not global. |
It makes me a bit unesasy that we don't actually know what the performance impact of these additional fields on every filestream event would be. But I see the points that the sizes of these fields are constants and also that in terms of a % increase compared to the size of the overall event, the increase might not be significant. The only question I have remaining, then, is #36065 (comment). |
@ycombinator don't get me wrong, I'm all for testing the performance here. We can do it once @alexsapran is available. If you think we should hold this PR until then, I'm fine with that. Regarding the use of integers in #36065 (comment) I made the change here abc9643 |
I'm 99% sure the performance testing is going to agree with our speculation that the performance impact is non-existent or negligible. But I think it's worth waiting for it only because, if we don't have proof, once this is released we won't have a way to undo or disable it in case there are negative consequences. That being said, we do have another month to
Thanks! |
@pierrehilbert @nimarezainia @cmacknz opinions? Should we add a new flag to the filestream input configuration to make these fields optional? The maximum impact from this change is an event size increase by:
I personally think that the input configuration is already too polluted with options, I'm not sure if it's worth to add a new option to save ~145 bytes. |
To be clear, my suggestion for a new configuration flag to enable/disable these fields is only because we haven't (yet) performance-tested the impact of this change — a change that adds ~145 bytes to every filestream event. If we do the performance testing and it proves that the impact is nonexistent or negligible, I don't think we need a flag. |
As mentioned in the issue itself when you created it, I agree that we should validate the perf impact of this change to ensure that it's as little as what we imagine. |
@pierrehilbert see my comment above #36065 (comment) |
Hi all, sorry for the late replay, I was out sick. Looking at the discussion I can definitely help, but at the same time, I want to make sure we are talking about the same items when we say performance. |
Discussed it with @rdner and provided me with the information needed. I will run some verifications and get back to this thread. |
@ycombinator as @alexsapran pointed out to me, in ECS https://www.elastic.co/guide/en/ecs/current/ecs-file.html#field-file-device So, what should we do, keep the integer because it makes sense, or be ECS compliant with a string value? I suppose we cannot make a change to ECS, can we? |
`device` in ECS is not a numerical ID, it's a string name of the device.
@ycombinator I've just realised that in ECS they mean a name of the device, not its ID, so I renamed my new field to |
Overall it would be nice to backport those changes back to ECS so we keep the FB output ECS compliant when possible, I know this is more involved, but it's something to keep an eye out for future work we do especially when adding new fields. |
@alexsapran well technically, the changes in this PR are compliant. It's just ECS does not have additional fields that we have here. I'm not even sure it should. |
I run the benchmarks for the use case of JSON formatted log and unstructured format. JSONThe data show that we were able to achieve the same amount of EPS. I noticed a slight increase in the output throughput metrics, from 20.9MB/s to 21.6MB/s, which is expected given that we are increasing the data we ship. Unstructured RAWWhen I was running the benchmark I noticed that the EPS was increased after this PR, not including the changes from f8582d6 which was pushed after I started. All of the above was done with compression level 0, which is the default compression. I am re-running the unstructured tests and looking into them. Overall I don't see this PR adding any performance regression on the output side of the process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
In elastic#36065 a few fields were renamed in order to clarify their purpose. Unfortunately, this rename was a part of a new feature PR which was not supposed to be backported to 7.17. However, backporting some other changes related to this code has now become challenging and results in build failures (note the fixing commits): - elastic#36095 - elastic#36264 This PR makes the naming consistent with the main branch, so we can easily backport changes.
In #36065 a few fields were renamed in order to clarify their purpose. Unfortunately, this rename was a part of a new feature PR which was not supposed to be backported to 7.17. However, backporting some other changes related to this code has now become challenging and results in build failures (note the fixing commits): - #36095 - #36264 This PR makes the naming consistent with the main branch, so we can easily backport changes.
This includes: * Unix-like: * device * inode * Windows: * idxlo * idxhi * vol * All: * Fingerprint (if the fingerprint mode is enabled)
What does this PR do?
This includes:
Why is it important?
This was requested by some customers and it makes it easier for us to troubleshoot.
Checklist
- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
Note that you need to replace the path placeholders.
You should see this event printed to the console:
On Windows you'd see:
Note the file system metadata in the
log.file
section of the event.Related issues