Allow all properties of an object to be queried and aggregated as top-level attributes #103567

Closed
felixbarny opened this issue Dec 19, 2023 · 1 comment · Fixed by #103648
Labels
>feature :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:StorageEngine

Comments

@felixbarny
Member

In order to properly map events that come in via the OpenTelemetry Protocol (OTLP), we'd like to have the ability to store attributes in an attributes and resource_attributes object.

For example:

{
  "resource_attributes": {
    "host.name": "foo"
  }
}

We still want to be able to run queries and aggregations directly on host.name. While we could just store attributes at the top level, we would lose the information on whether something is a resource attribute or an attribute, and it would be difficult to convert that document back into OTLP without loss of information.

Maybe we can build on the foundations of the alias field type. However, it would need to be a generic kind of alias that makes attributes.* and resource_attributes.* available at the top level. Ideally, this should not have an impact on the field limit of an index.

We could, for example, create a special object field type (maybe root_object) or add a mapping parameter to object field types that optionally makes their subfields available at the top level. Any defined subfield is also available as an alias at the same level at which the object field is defined. These fields behave in queries and aggregations as if they had been defined at the root level. When returning fields from a search, they should probably be returned only with the prefix, for example ["attributes.host.name": ["my-host"]], rather than duplicated, such as ["attributes.host.name": ["my-host"], "host.name": ["my-host"]].
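
To make the intended behavior concrete, here is a minimal Python sketch of the alias idea. It is an in-memory model only, not Elasticsearch code: the helper names `top_level_aliases`, `resolve`, and `fetch_fields` are hypothetical, and it assumes subobjects are disabled so that `host.name` is a single leaf key.

```python
# Hypothetical in-memory model of the proposed behavior; not Elasticsearch code.

def top_level_aliases(object_name, subfields):
    """Map each subfield name to its full, prefixed path."""
    return {name: f"{object_name}.{name}" for name in subfields}


def resolve(field, aliases):
    """Resolve a name used in a query or aggregation to the stored path."""
    return aliases.get(field, field)


def fetch_fields(doc, requested, aliases):
    """Return values keyed by the prefixed path only, so nothing is duplicated."""
    result = {}
    for field in requested:
        path = resolve(field, aliases)
        # Assumes subobjects are disabled: everything after the first dot
        # is one leaf key inside the container object.
        container, _, leaf = path.partition(".")
        holder = doc.get(container)
        if isinstance(holder, dict) and leaf in holder:
            result[path] = [holder[leaf]]
    return result


aliases = top_level_aliases("attributes", ["host.name"])
doc = {"attributes": {"host.name": "my-host"}}

# Both spellings resolve to the same stored field and yield a single entry.
fields = fetch_fields(doc, ["host.name", "attributes.host.name"], aliases)
```

Requesting either `host.name` or `attributes.host.name` produces one entry under the prefixed key, which matches the non-duplicated response shape suggested above.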

When there are multiple root_object definitions, there needs to be a declared order of precedence in which attributes are resolved. For example, if both attributes and resource_attributes define a service.name field, the one in resource_attributes should win.
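
The precedence rule can be sketched as follows. This is only an illustration under stated assumptions: `build_alias_table` and the explicit `precedence` list are hypothetical, since the issue does not specify how the order would be declared.

```python
# Hypothetical sketch of precedence resolution; the precedence list
# (highest priority first) is an assumed way of declaring the order.

def build_alias_table(root_objects, precedence):
    """Build top-level name -> prefixed path, letting higher-priority objects win.

    root_objects: dict of object name -> list of subfield names
    precedence:   list of object names, highest priority first
    """
    table = {}
    # Walk from lowest to highest priority so later writes override earlier ones.
    for obj in reversed(precedence):
        for sub in root_objects.get(obj, []):
            table[sub] = f"{obj}.{sub}"
    return table


root_objects = {
    "attributes": ["service.name", "http.method"],
    "resource_attributes": ["service.name", "host.name"],
}
table = build_alias_table(
    root_objects, precedence=["resource_attributes", "attributes"]
)
```

With `resource_attributes` listed first, a top-level `service.name` resolves to `resource_attributes.service.name`, while fields defined in only one object resolve to that object.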

This can also help with #98384 as we can add dynamic templates that define attributes.* and resource_attributes.* as time_series_dimension. Alternatively, this new field type can be marked as a time_series_dimension which implies that all its sub-fields are also dimensions.

Example mapping:

{
  "dynamic": "strict",
  "properties": {
    "resource_attributes": {
      "type": "root_object",
      "dynamic": true,
      "subobjects": false,
      "time_series_dimension": true,
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}
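
As a rough illustration of how such a mapping could be interpreted, the Python sketch below walks a mapping of this shape and lists which top-level names it would expose and whether each is a dimension. The `root_object` type is the proposal from this issue, not an existing field type, and `exposed_fields` is a hypothetical helper; it assumes a subfield inherits `time_series_dimension` from its container unless it sets the parameter itself.

```python
import json

# The proposed mapping; "root_object" is the type suggested in this issue,
# not an existing Elasticsearch field type.
MAPPING = json.loads("""
{
  "dynamic": "strict",
  "properties": {
    "resource_attributes": {
      "type": "root_object",
      "dynamic": true,
      "subobjects": false,
      "time_series_dimension": true,
      "properties": {
        "host.name": {"type": "keyword"}
      }
    }
  }
}
""")


def exposed_fields(mapping):
    """List (top_level_name, full_path, is_dimension) for root_object subfields."""
    out = []
    for name, field in mapping.get("properties", {}).items():
        if field.get("type") != "root_object":
            continue
        # Assumption: the container's flag is the default for its subfields.
        inherited = field.get("time_series_dimension", False)
        for sub, sub_def in field.get("properties", {}).items():
            is_dimension = sub_def.get("time_series_dimension", inherited)
            out.append((sub, f"{name}.{sub}", is_dimension))
    return out


exposed = exposed_fields(MAPPING)
```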

cc @elastic/opentelemetry-leads @elastic/es-analytics-geo

@felixbarny felixbarny added :Search Foundations/Mapping Index mappings, including merging and defining field types >feature labels Dec 19, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Dec 19, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@felixbarny felixbarny changed the title Allow all properties of an object to be references as top-level attributes Allow all properties of an object to be queried and aggregated as top-level attributes Dec 19, 2023
@kkrik-es kkrik-es self-assigned this Dec 20, 2023
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Dec 21, 2023
`PassThroughObjectMapper` extends `ObjectMapper` to create a container
for fields that also need to be referenced as if they were at the root
level. This is done by creating aliases for all its subfields.

It also supports an option of annotating all its subfields as
dimensions. This will be leveraged in TSDB, where dimension fields can
be dynamically defined as nested under a passthrough object - and still
referenced directly (i.e. without prefixes) in aggregation queries.

Related to elastic#103567
@felixbarny felixbarny linked a pull request Jan 10, 2024 that will close this issue
kkrik-es added a commit that referenced this issue Feb 1, 2024
* Introduce passthrough field type

`PassThroughObjectMapper` extends `ObjectMapper` to create a container
for fields that also need to be referenced as if they were at the root
level. This is done by creating aliases for all its subfields.

It also supports an option of annotating all its subfields as
dimensions. This will be leveraged in TSDB, where dimension fields can
be dynamically defined as nested under a passthrough object - and still
referenced directly (i.e. without prefixes) in aggregation queries.

Related to #103567

* Update docs/changelog/103648.yaml

* no subobjects

* create dimensions dynamically

* remove unused method

* restore ignoreAbove incompatibility with dimension

* fix test

* refactor, skip aliases on conflict

* fix branch

* fix branch

* add tests

* update test

* remove unused variable

* add yaml test for subobject

* minor refactoring

* add unittest for PassThroughObjectMapper

* suggested fixes

* suggested fixes

* update yaml with warning for duplicate alias

* updates from review

* add withoutMappers()
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 9, 2024
That also asserts routing aspects of indexing, searching and getting by id.

Relates to elastic#103567
elasticsearchmachine pushed a commit that referenced this issue Feb 9, 2024
That also asserts routing aspects of indexing, searching and getting by
id.

Relates to #103567
elasticsearchmachine pushed a commit that referenced this issue Mar 13, 2024
#106080)

Supporting non-keyword fields requires that non-keyword fields in the
routing path be included in routing calculations. Routing is performed
on coordinating nodes that lack mappings (or whose mappings haven't been
created yet, for dynamically defined dimensions), so the routing hash
they calculate is passed to data nodes and stored in a new field, namely
_ts_routing_hash. This hash is in turn included in the _id field, so
that get-by-id and delete-by-id operations consistently reach the right
shard.

A few interesting points:

- The hash is passed from the coordinating to data nodes using the `routing` field in `IndexRequest`; adding another field to the latter requires updating dozens of classes.
- We explicitly skip (double-) storing the hash to the routing field, as the latter is not optimized for storage using the TSDB codec.
- The routing hash may not be available in Translog operations; in that case it can be retrieved from the `id` prefix.

Related to #103567
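
The idea of carrying the routing hash inside the document id, as described in the commit message above, can be illustrated with a small Python sketch. The hash function (SHA-256 truncated to 4 bytes), the id layout, and all function names here are assumptions chosen for illustration; the commit message does not describe the actual encoding used by Elasticsearch.

```python
import base64
import hashlib
import struct

# Illustrative only: the real hash function and id layout are not
# specified in the commit message and are assumed here.


def routing_hash(dimensions):
    """Hash sorted dimension key/value pairs to an unsigned 32-bit integer."""
    h = hashlib.sha256()
    for key in sorted(dimensions):
        h.update(f"{key}={dimensions[key]};".encode("utf-8"))
    return struct.unpack(">I", h.digest()[:4])[0]


def make_id(dimensions, timestamp_ms):
    """Build an id whose first 4 bytes are the routing hash."""
    prefix = struct.pack(">I", routing_hash(dimensions))
    payload = f"{sorted(dimensions.items())}|{timestamp_ms}".encode("utf-8")
    suffix = hashlib.sha256(payload).digest()[:12]
    return base64.urlsafe_b64encode(prefix + suffix).decode("ascii")


def routing_hash_from_id(doc_id):
    """Recover the routing hash from the id prefix, without any mappings."""
    raw = base64.urlsafe_b64decode(doc_id)
    return struct.unpack(">I", raw[:4])[0]


def shard_for(hash_value, num_shards):
    """Pick a shard from the hash (simple modulo, assumed for illustration)."""
    return hash_value % num_shards


dims = {"host.name": "foo", "cpu.core": 3}
doc_id = make_id(dims, 1700000000000)
```

Because the hash is recoverable from the id alone, a get-by-id or delete-by-id request can be routed to the same shard that indexed the document, even on a node that has no mapping information.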
@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024