Change @timestamp field type to date_nanos in Logs data streams #102548
Conversation
Hi @eyalkoren, I've created a changelog YAML for you.
It would be great to get feedback from the following groups: …
Eventually we have to make this change because of OTel and, as I would argue, it is the better default. The main question for me is how we can roll it out in the best way with minimal risk for our users. Some more ideas: …
I can only comment on the need from the OTel perspective: with OTLP using a nano-precision timestamp and our vision to store OTLP data with the least mapping possible, we will need this for "Going big on OTel" sooner rather than later. I'm not deep enough in it to comment on the risks, though.
While true, this risk only affects users that actually ingest logs with timestamps at a resolution more granular than milliseconds. If the input timestamps are in milliseconds (such as …), I suppose this will not be the case for most typical logging use cases, but it does have an effect for traces (which are out of scope for this PR, though).
While there's a risk, the semantics of …
@weltenwort I'd also be interested in your opinion on the risks of changing the field type for …
You should expect higher storage, but I don't think that this should drive the decision. If nano resolution is the standard, let's adopt it instead of fighting against it. We should just double check that everything we need works with nanosecond resolution; historically we had a couple of incompatible features, hopefully they have all been fixed since then.
Seconding @AlexanderWert's and @jpountz's comments. I don't have anything more to add :)
It's hard to judge the risk of breaking clients in general since the parsing behavior is very dependent on the specific language and libraries used. Time and date representation is always a tricky topic, so in that sense it sounds like a breaking change to me. Looking at Kibana specifically, the language mostly reduces the precision of too-large numbers instead of failing to parse them. But as far as I'm aware, the best practice is to use the string representations anyway. @felixbarny not sure if that is the information you were looking for.
From an end-user perspective, the biggest reason why I want nanoseconds is to reduce the likelihood of log reordering in Kibana Discover and the Log Stream UI. There is currently no reliable tiebreaker that would prevent that (e.g., Filebeat sending logs into one index with 4 shards, sometimes tens of log lines per microsecond). If that could be fixed I would be happy, with or without nanoseconds.
I think we should adapt the logs@default-pipeline a bit here. Currently, it assigns a default @timestamp equal to the arrival time of the document if it has not been set by the user. This change implicitly also increases the precision of that default timestamp from milliseconds to microseconds. I don't think this level of precision is required for the default timestamp. Therefore, I propose to truncate the timestamp in a script processor, for example like so:
{
  "script": {
    "if": "ctx['@timestamp'] == null",
    "source": "ctx['@timestamp'] = System.currentTimeMillis()"
  }
}
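As a quick sanity check of what such a processor emits, one could run it through the standard simulate pipeline endpoint; a minimal sketch, where the pipeline body is a stand-in for the actual logs@default-pipeline and the sample document is made up:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "if": "ctx['@timestamp'] == null",
          "source": "ctx['@timestamp'] = System.currentTimeMillis()"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "a log line without a timestamp"
      }
    }
  ]
}

The simulated document comes back with @timestamp set to an epoch-millis long, so the default value keeps millisecond precision.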
That's only the case when using timestamps that have a higher precision than milliseconds, right? If the precision of the stored timestamps doesn't increase, Lucene will optimize storage using GCD encoding, is that correct?
Using the default format for …
Playing devil's advocate here and taking the comment from @StephanErb into account: let's assume that for some reason the timestamp was not set on the edge. The arriving bulk request still has the docs in the order in which each line was read. If we use more precision, will each entry get a slightly different timestamp and with it the right sorting order? Or will they all get the same timestamp anyway?
The default timestamp is just a guard against data loss rather than a full replacement for properly setting and parsing the timestamp. Documents that are processed in the same microsecond would still have the same timestamp currently. So an actual nano-precision timestamp set in the application that produces the logs, or the …
This is correct for doc values. The KD index is a bit less smart about these things, so a …
Thanks for all the great feedback everyone 🙏
@felixbarny True. As you noted, we do automatically set the …
@felixbarny This is not a theoretical thing, it is a real issue that broke the Logs UI. You are right, the issue is not that …
I'd need a bottom line here to decide on next steps. To make it as easy as possible, let's start with a go/no-go decision on this change as it is proposed now. Please add your votes with 👍 (go) or 👎 (no-go) and feel free to add a comment below either way if required, for example: "I think we should proceed with this change but it also requires ..." or "I think we should block this PR until the following dependencies are done: ..."
It has more to do with cost than with risk. High-resolution timestamps have a real cost for users. We already know that the default timestamp (arrival timestamp) isn't the actual timestamp, so I don't think paying for the additional resolution is really worth it, given that the precision in terms of getting closer to the real timestamp doesn't increase. The counter-argument is that it gives more precision for when documents arrived, which could help with ordering. I think that's a valid argument and there's some value added by going to a higher granularity. But I still doubt that the additional cost is worth it in the specific case of the default timestamp.
I'm not trying to dismiss the risks here but to get more fidelity into what the exact risks are. I do acknowledge that there's some level of risk associated with making the change. At the same time, I think it's less a question of whether or not we should make the change; I think it's pretty clear that we'll do it. The question is how we roll out the change, how we minimize the risk, and how we communicate the change to users. Given that there's a relatively easy workaround for users that do experience issues by using the …
I'm also wondering whether we should engage the breaking changes committee for this. @jpountz as you're part of the committee, what's your recommendation here?
Yes, that's the point; it relates to the tie-breaking discussion above. I don't think the storage cost is worth consideration in this case. Our automatic timestamp setting should apply to a negligible fraction of the documents (in general, not necessarily within a specific index). When it is applied, I think it's fine that it consumes the same storage. I don't mind either way, only raising this question.
OK, that's an important input. My understanding was that there may be an alternative, for example pushing a nanosecond-resolution timestamp into ECS and then relying on that. If we go down this path, it may be a different field, which would make this PR irrelevant. If we agree to go with this change, let's discuss the rollout and whether we need to do something other than documenting and notifying (for which I will need some more specific pointers as well).
No significant additional comments to make here - I agree with the comments above. We should pay the cost now since this is generally the direction we need to go in, especially with OTel data where we may not always be able to depend on a …
Though not in scope of this PR, it's worth mentioning that I do think it makes sense to avoid making this change on metrics data streams for now, since …
@joshdover maybe it is relevant to this PR.
Actually - I got my fields mixed up.
@joshdover I've not experimented with changing the field type of the …
A significant part of this thread has been about storage costs. Would it be an option to bundle the …
We could bundle it together with the logsdb change, but to be honest, I would prefer to make the change for everything; we just haven't gotten around to pushing it forward yet.
@StephanErb in the meantime, are you able to override the definition of the …
To keep us moving forward on this one: …
Am I missing something in the above summary? If not, I suggest we move forward by running the storage test and finding alignment on how best to communicate the above change so that users are warned ahead of time.
@ruflin Has someone determined the storage impact of date_nanos in logs?
@tylerperk I'm not aware that this has been done so far. We should still do it, but also have a look at the conversation around storage in the previous comments.
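One way to get a concrete number, sketched here under the assumption of a made-up data stream name, is the analyze index disk usage API, which reports per-field storage:

POST /logs-myapp-default/_disk_usage?run_expensive_tasks=true

Comparing the @timestamp entry between an index mapped as date and one mapped as date_nanos would give a rough estimate of the overhead for a given data set.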
Is there a way I can switch to nanos for OTel logs? It seems logs are only sorted up to millis and ranges are only considered up to millis. This is extremely confusing and I would like to override this behavior somehow in my Elastic Cloud deployment. Can I solve this with a query or do I have to change ingest pipelines/templates? Apologies if this isn't the right place to ask, but I am sure a lot of people using OTel will find this useful. For example, this query:

"query": {
  "bool": {
    "must": [
      {
        "range": {
          "@timestamp": {
            "gte": "2024-11-14T13:13:11.092822461Z"
          }
        }
      }
    ]
  }
}

is giving me results with time 2024-11-14T13:13:11.092812461Z, which is clearly earlier... Is it possible? Thanks in advance!
The plan is to default to … But you can customize the data type of …
I don't have a logs@custom mapping, but I do have logs@mappings. I assume it's the same and I can update the data type here?
"component_templates": [
{
"name": "logs@mappings",
"component_template": {
"template": {
"mappings": {
"date_detection": false,
"properties": {
"@timestamp": {
"type": "date_nanos"
},
"data_stream.namespace": {
"type": "constant_keyword"
},
"data_stream.dataset": {
"type": "constant_keyword"
},
"data_stream.type": {
"type": "constant_keyword",
"value": "logs"
}
}
}
},
"version": 14,
"_meta": {
"description": "default mappings for the logs index template installed by x-pack",
"managed": true
},
"deprecated": false
}
}
]
} |
It's best not to modify …
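For reference, a minimal sketch of the logs@custom route discussed in this thread, using the standard component template API; treat the exact body as an assumption rather than a recommendation for every deployment:

PUT _component_template/logs@custom
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date_nanos"
        }
      }
    }
  }
}

A mapping type change like this only takes effect for backing indices created afterwards, for example after a rollover of the data stream; existing backing indices keep their current @timestamp mapping.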
Changing the @timestamp field type for Logs data streams in order to support nanosecond precision.

Implementation

Logs data streams have two sources defining the @timestamp field - ecs@mappings and logs@mappings. As long as there is no formal support for date_nanos timestamps in ECS, we can only change it for logs-*-* data streams through logs@mappings, which will take precedence as it is an explicit mapping, as opposed to the ECS dynamic mappings.
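To check which definition wins for a concrete data stream, one option (a sketch; the index name below is made up) is the simulate index API, which returns the fully resolved template:

POST /_index_template/_simulate_index/logs-myapp-default

The resolved @timestamp entry in the response shows whether the explicit logs@mappings definition or the ECS dynamic mapping is in effect.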
Risks

What stems from the above is that the change proposed here introduces an incompatibility of the default Logs index template with ECS.
Although date and date_nanos types are mostly compatible and interchangeable, there is a risk of breaking specific client behaviors with this change. Even though the value returned through searches is always a string, there may still be inconsistencies created when switching from date to date_nanos, for example:
- The returned value depends on the format associated with the field. Since we don't define an explicit format for @timestamp (before or within this change), the returned string will be formatted according to the default format. Since the default format for date is strict_date_optional_time and the default format for date_nanos is strict_date_optional_time_nanos, the returned value will be different, for example 2023-11-23T12:51:54.048204Z instead of 2023-11-23T12:51:54.048Z. While not a huge risk, this may fail clients that run verifications on the returned value format. Even only breaking tests that assert for specific values is unpleasant.
- A specific format (such as epoch_millis) is specified in the search request and the client fails to parse the returned value due to its different length, like the issue that was recently fixed in Kibana, where the TypeScript library failed to parse the nanoseconds date representation as it was too long.
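Clients that depend on a specific string shape can also pin the format explicitly in the search request instead of relying on the mapping's default, for example via the fields option; a sketch, independent of this PR, with an illustrative index pattern:

POST logs-*/_search
{
  "query": { "match_all": {} },
  "fields": [
    { "field": "@timestamp", "format": "strict_date_optional_time" }
  ],
  "_source": false
}

Requesting the format explicitly should keep the returned strings consistent regardless of whether the underlying field is date or date_nanos.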
Still missing
- Testing data streams that contain some backing indices with a date timestamp and some with a date_nanos timestamp
- Verifying the ability to override back to the date type through logs@custom
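On the first open item, one mitigation at search time (a sketch, not part of this change; the index pattern is illustrative) is the numeric_type sort option, which casts values from mixed date and date_nanos backing indices to a single resolution for sorting:

POST logs-*/_search
{
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "numeric_type": "date_nanos",
        "format": "strict_date_optional_time_nanos"
      }
    }
  ]
}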