[pkg/ottl] Support for extracting UserAgent string #32434
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@michalpristas I suspect this is doable via regular expressions, and any converter would be doing regex matching internally. Can this be accomplished with
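To make the regex suggestion concrete, here is a minimal Go sketch (not the collector's code) that pulls a leading `name/version` product token out of a UA string with the standard `regexp` package. The pattern and the `parseProductToken` helper are hypothetical. The sketch handles `curl/7.81.0`, but on a typical browser string it returns the `Mozilla/5.0` compatibility token instead of the actual browser. That gap is why a single expression is hard to get right.

```go
package main

import (
	"fmt"
	"regexp"
)

// productToken matches a leading "name/version" product token, e.g. "curl/7.81.0".
// Hypothetical, illustrative pattern only.
var productToken = regexp.MustCompile(`^([A-Za-z0-9._-]+)/([0-9][A-Za-z0-9.]*)`)

// parseProductToken returns the name and version of the first product token,
// or ok=false if the string does not start with one.
func parseProductToken(ua string) (name, version string, ok bool) {
	m := productToken.FindStringSubmatch(ua)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	fmt.Println(parseProductToken("curl/7.81.0")) // curl 7.81.0 true
	// Browser UAs start with the "Mozilla/5.0" compatibility token, so this
	// naive approach reports "Mozilla" rather than "Chrome".
	fmt.Println(parseProductToken("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")) // Mozilla 5.0 true
}
```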
It can. The reason I'm raising this separately is that in our pipelines this is a very commonly used processor, and having to work with regular expressions may be a bit overwhelming.
I don't think it would be feasible to do user agent string extraction just with the There's a Go library that we could potentially reuse for this: https://github.com/ua-parser/uap-go. As user agent parsing yields multiple values, I'm not sure whether OTTL or a separate processor is the right place for it. I think it would be neat if it were possible to build a log parsing pipeline purely in OTTL, including UA parsing and other steps that may yield multiple values, but I'm not sure what the guidelines and scope of OTTL are. To me, user agent string parsing feels like an essential building block that should be available to users out of the box, one way or another.
Hello, I am interested in working on this if you are looking for a volunteer (it would be my first contribution to OTel 😄).
+1 to adding this function to OTTL.
Assigned the issue to you @pchila 👍
Thanks for volunteering to work on this @pchila. I'd suggest waiting until the I'm okay with adding a function like this, given we have an implementation that parses user agent strings in the standard format (I think RFC 9110 is the official source right now) into a standard map-like structure, such as the attributes provided by semconv. I think following something close to what we have for the @felixbarny That's a good note that caching would be helpful here to improve performance. I think we should save that for a follow-up after this function is implemented, to keep each PR small and to ensure we get caching right.
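The caching follow-up could be as simple as memoizing the parse result per distinct UA string, since traffic often repeats a small set of user agents. Below is a minimal Go sketch of a bounded LRU cache built on `container/list`. The names `lruParser` and `parsedUA` and the capacity are hypothetical, chosen for illustration. For simplicity, the sketch holds the mutex while parsing, which serializes cache misses. A production version might use an existing LRU library or a sharded cache instead.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// parsedUA is a hypothetical result type for one parsed user agent.
type parsedUA struct {
	Name, Version string
}

type entry struct {
	key   string
	value parsedUA
}

// lruParser memoizes an expensive parse function with a bounded LRU cache.
// It is safe for concurrent use.
type lruParser struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // UA string -> element holding *entry
	parse    func(string) parsedUA
}

func newLRUParser(capacity int, parse func(string) parsedUA) *lruParser {
	return &lruParser{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		parse:    parse,
	}
}

// Parse returns the cached result for ua, running the parser only on a miss
// and evicting the least recently used entry when over capacity.
func (p *lruParser) Parse(ua string) parsedUA {
	p.mu.Lock()
	defer p.mu.Unlock()
	if el, ok := p.items[ua]; ok {
		p.order.MoveToFront(el)
		return el.Value.(*entry).value
	}
	v := p.parse(ua) // cache miss: run the expensive regex matching
	p.items[ua] = p.order.PushFront(&entry{key: ua, value: v})
	if p.order.Len() > p.capacity {
		oldest := p.order.Back()
		p.order.Remove(oldest)
		delete(p.items, oldest.Value.(*entry).key)
	}
	return v
}

func main() {
	calls := 0
	p := newLRUParser(2, func(ua string) parsedUA {
		calls++
		return parsedUA{Name: ua}
	})
	p.Parse("a")
	p.Parse("a") // hit
	p.Parse("b")
	p.Parse("c") // evicts "a"
	p.Parse("a") // miss again
	fmt.Println(calls) // 4
}
```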
Thank you @evan-bradley, I will have a look at what is done for I will comment here sketching out the input/output of the function before starting implementation.
So, I had a look at what's been implemented for the @TylerHelmuth, would such a first implementation be OK in your opinion, or do we still need more details/clarification?
When we first started discussing this function, we didn't have https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/ottlfuncs/README.md#adding-new-editorsconverters in place. I think this function does meet the acceptance guidelines, since it is a significant user experience improvement and potentially a performance improvement. I am worried that the semantic convention attributes we'd be producing are not yet stable; when they stabilize, the function could break. We'll want to clearly document which semantic convention version it follows. Maybe the semantic convention version to generate should be an optional parameter.
**Description:** Added a new OTTL converter `UserAgent`: it parses an input string and matches it against a [set of known UA regexes](https://github.com/ua-parser/uap-core/blob/master/regexes.yaml) to identify the user agent and its version.
**Link to tracking Issue:** #32434
**Testing:** Unit tests, E2E tests
**Documentation:** Added UserAgent description in `pkg/ottl/ottlfuncs/README.md`
Co-authored-by: Tyler Helmuth
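Matching against a set of known regexes is typically done first-match-wins. The rules are ordered from most to least specific, and the first one that matches decides the name and version. The Go sketch below is a toy illustration of that idea, loosely modeled on uap-core's `regexes.yaml`. It is not the converter's actual code. The four rules, the `uaRule` type, and `parseUA` are hypothetical, and unmatched strings fall back to `"Other"`, similar to uap-core's fallback family.

```go
package main

import (
	"fmt"
	"regexp"
)

// uaRule is a simplified, hypothetical stand-in for one entry in uap-core's
// regexes.yaml: a pattern with two capture groups (name, version) and an
// optional replacement for the captured name.
type uaRule struct {
	re     *regexp.Regexp
	family string // if non-empty, overrides capture group 1
}

// Rules are tried in order and the first match wins. Edge must precede
// Chrome because Edge UA strings also contain a "Chrome/" token.
var uaRules = []uaRule{
	{re: regexp.MustCompile(`(Edg)/(\d+[\d.]*)`), family: "Edge"},
	{re: regexp.MustCompile(`(Chrome)/(\d+[\d.]*)`)},
	{re: regexp.MustCompile(`(Firefox)/(\d+[\d.]*)`)},
	{re: regexp.MustCompile(`^(curl)/(\d+[\d.]*)`)},
}

// parseUA returns the agent name and version from the first matching rule,
// or "Other" and an empty version when nothing matches.
func parseUA(ua string) (name, version string) {
	for _, r := range uaRules {
		if m := r.re.FindStringSubmatch(ua); m != nil {
			name = m[1]
			if r.family != "" {
				name = r.family
			}
			return name, m[2]
		}
	}
	return "Other", ""
}

func main() {
	edge := "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
		"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.2478.51"
	fmt.Println(parseUA(edge))          // Edge 124.0.2478.51
	fmt.Println(parseUA("curl/7.81.0")) // curl 7.81.0
	fmt.Println(parseUA("my-bot"))
}
```

Reordering the table so Chrome comes first would misreport the Edge string as Chrome, which is the main hazard of a first-match-wins design.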
Hi @pchila @TylerHelmuth, are we good to close this issue now that #34172 has been merged, or is there more work to be done?
Are there any follow-up enhancements that we're planning? For example, adding caching or supporting more attributes, like the ones the Elasticsearch
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
The proposed converter would extract details from the user agent string that a browser sends with its web requests into User Agent semantic convention (SemConv) attributes.
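As a rough picture of the intended output shape, the sketch below assembles already-parsed pieces into a map keyed by the user-agent semantic convention attributes `user_agent.original`, `user_agent.name`, and `user_agent.version`. The helper `toUserAgentAttributes` is hypothetical. The issue does not settle whether empty fields should be omitted, as done here, or emitted.

```go
package main

import "fmt"

// Attribute keys from the OpenTelemetry user-agent semantic conventions.
// user_agent.name and user_agent.version may not be stable yet.
const (
	attrOriginal = "user_agent.original"
	attrName     = "user_agent.name"
	attrVersion  = "user_agent.version"
)

// toUserAgentAttributes (hypothetical) always records the original string and
// adds name/version only when the parser produced them.
func toUserAgentAttributes(original, name, version string) map[string]any {
	attrs := map[string]any{attrOriginal: original}
	if name != "" {
		attrs[attrName] = name
	}
	if version != "" {
		attrs[attrVersion] = version
	}
	return attrs
}

func main() {
	// fmt prints map keys in sorted order.
	fmt.Println(toUserAgentAttributes("curl/7.81.0", "curl", "7.81.0"))
	// map[user_agent.name:curl user_agent.original:curl/7.81.0 user_agent.version:7.81.0]
}
```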
Describe the solution you'd like
Example:
Input:
Result:
Describe alternatives you've considered
No response
Additional context
No response