
Shared knowledge base on mapping log events to Elastic Common Schema #601

Duchadian opened this issue Oct 29, 2019 · 6 comments

@Duchadian

Issue / roadmap suggestion posted here at the suggestion of some of the devs.

Elastic Common Schema (ECS) provides a unified format for logs ingested using Beats, in those cases where a module for parsing and mapping the log's events is available. However, for log sources without such a module, parsing and mapping events to ECS can be difficult.

It would be nice to have a shared knowledge base of log sources that have previously been mapped to ECS. This would make converting new log sources significantly easier, as there would be a central location with examples of earlier conversions. Right now, that knowledge is scattered across various repos where it is relatively difficult to find.

The knowledge base could take a similar form to the grok-patterns repo.

@Duchadian (Author)

Tagging @jsvd because we talked about this at Elastic on Tour.

@jsvd (Member)

jsvd commented Oct 30, 2019

hi @Duchadian I'm moving this to elastic/ecs, thanks for opening the issue :)

jsvd transferred this issue from elastic/logstash on Oct 30, 2019
@webmat (Contributor)

webmat commented Oct 30, 2019

Hi folks, the closest thing we have to this right now is opening an issue on the ECS repo, with your mapping, and tagging it with the label "mapping". Ideally the mapping can be shared via a public web spreadsheet, but it can be shared another way too.

Opening such issues allows us to provide feedback on what needs adjustment, explain edge cases that apply to your situation, and so on. These issues then remain available for the community, which can follow the discussion and the thought process behind each mapping.

Does that help for what you had in mind?

@Duchadian (Author)

Hi @webmat, that's sort of what I had in mind.

From my perspective, what ECS lacks is a centralised repo / website / something where the community (in collaboration with Elastic) can maintain parsers and mappings to ECS for all kinds of log sources across various versions of the log source. This would ensure that ECS can be more easily adopted for all kinds of log sources by users, vendors, and Elastic itself.

What most of the issues with the mapping label are missing is the original event. Without it, it is difficult to see which part of the original log maps to which ECS field. In my opinion, a side-by-side of the original event, the parsed event, and the event mapped to ECS would make this clearer. Otherwise, it is harder to get the semantics of the message right when mapping a new log source to ECS.
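To illustrate what such a side-by-side could look like, here is a minimal sketch in Python. The log line, the regex, and the chosen ECS fields are all my own illustrative assumptions, not taken from any official module:

```python
import re

# Hypothetical raw event from an SSH auth log (illustrative only).
raw = "Oct 29 10:14:02 host1 sshd[4242]: Failed password for root from 203.0.113.5 port 52144 ssh2"

# Parse the original event, then map each captured part to an ECS field,
# so the correspondence between source and ECS is visible side by side.
pattern = re.compile(
    r"(?P<ts>\w{3}\s+\d+ [\d:]+) (?P<host>\S+) "
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: Failed password for (?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)
m = pattern.match(raw)

ecs_event = {
    "event.original": raw,          # the raw line, kept verbatim
    "host.name": m["host"],         # host1
    "process.name": m["proc"],      # sshd
    "process.pid": int(m["pid"]),   # 4242
    "user.name": m["user"],         # root
    "source.ip": m["ip"],           # 203.0.113.5
    "source.port": int(m["port"]),  # 52144
    "event.outcome": "failure",
}
print(ecs_event)
```

Keeping the raw line in `event.original` next to the extracted fields is what makes the mapping reviewable: anyone can check each ECS field against the source text.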

I hope it is at least sort of clear what I mean.

@Randy-312

I second this ask!
It would also be good to highlight where each mapping is in the development cycle, and IF a module is on the way so it can be used as a plugin.

Wikipedia could handle it, but I'd rather have it on GitHub where we can find it easily, and contribute to it.

@webmat (Contributor)

webmat commented Nov 14, 2019

A few things converged recently, and it crystallized an idea I think will be in line with what you have in mind. I'm hoping we can start on this and get a POC before year end 🤞

CSV mapping format

The gist of the idea is to formalize a small CSV-based format (3-6 column names) where one can capture their planned mapping. I would hope to keep it simple enough that people can grasp it quickly, but still precise enough that the files can be leveraged programmatically.

Having this based on CSV would have a few interesting properties:

  • Easy to go from CSV to spreadsheet and back. An online spreadsheet is great to gather feedback on the mapping, whether among colleagues, or by sharing here and tagging as I described above.
  • The resulting CSV files can easily be collected in a community repo. The ECS team won't be able to maintain them over time, but they may still be useful to the community as a reference.
  • Last but not least, having this simple yet precise CSV format would make it possible to consume programmatically, as I mentioned above. This could take the shape of:
    • Scripts to generate the desired kind of pipeline (Beats processors, ES Ingest node, Logstash, etc)
    • Eventually even processors that can consume the CSV directly. Renaming 20 fields in a pipeline is laborious and takes a lot of space. What if you had a processor that took a file that describes all the renames and did lightweight adjustments such as type coercion, lowercasing, etc? This specific idea has been on my mind for a while, as you can see here ;-)

The pipelines resulting from these CSV mapping files would not be feature complete, of course. Since the goal of this CSV format is to make it easy to get started, it wouldn't be Turing-complete. But the generated pipeline would still be a great start.

So that's the gist of this CSV format.
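As a rough illustration of how such a CSV could be consumed programmatically, here is a sketch in Python. The column names (`source_field`, `destination_field`, `type`) are my own guess at what the 3-6 columns might look like, not the actual format; the output targets Elasticsearch ingest `rename` processors as one example of a generated pipeline:

```python
import csv
import io
import json

# Hypothetical CSV mapping format -- the column layout is an assumption,
# standing in for the small format described above.
mapping_csv = """source_field,destination_field,type
srcip,source.ip,ip
user,user.name,keyword
bytes_out,source.bytes,long
"""

def csv_to_ingest_pipeline(text):
    """Turn each CSV row into an Elasticsearch ingest 'rename' processor."""
    processors = []
    for row in csv.DictReader(io.StringIO(text)):
        processors.append({
            "rename": {
                "field": row["source_field"],
                "target_field": row["destination_field"],
                "ignore_missing": True,
            }
        })
    return {"description": "Generated from CSV mapping", "processors": processors}

pipeline = csv_to_ingest_pipeline(mapping_csv)
print(json.dumps(pipeline, indent=2))
```

The same loop could just as easily emit Beats processors or a Logstash filter block; the CSV stays the single source of truth for the mapping.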

Micro-pipelines

Kind of related to this, we've been gradually collecting ideas for "micro-pipelines" that become possible once you have predictable field names. Here are a few examples:

  • once you have a full URL in field url.full or url.original, here's a pipeline that breaks it down to all of the fields in url.*.
    • This applies to other kinds of semi-structured field breakdowns, like breaking down a domain name to subdomain, registered_domain and top_level_domain.
  • We know where the IPs are in ECS
    • One micro-pipeline can geolocate them all (when they're a public IP)
    • Another can perform Autonomous System lookup
  • Perform mappings such as network.transport <=> network.iana_number. When your source has one, this micro-pipeline auto-fills the other.
  • We could build a micro-pipeline that auto-fills related.ip or other related.* fields that enable easy pivoting.

You can check out #181 for the full list of such ideas for micro-pipelines.

Setting expectations

What I'm talking about here is coming out with a public POC, as an experiment. This will not be officially supported, at least initially.

However I think these two experiments could simplify knowledge exchange around ECS within the community. And if the experiments prove useful, perhaps they can take root as I envision them, or lead to something bigger along the same lines :-)
