From 5ec391d9a4516cfa8981edce72d1968342034aef Mon Sep 17 00:00:00 2001 From: Daniel Palma Date: Thu, 6 Jun 2024 12:21:20 +0100 Subject: [PATCH] Add docs about flow_published_at metadata field (#1483) --- site/docs/concepts/collections.md | 38 ++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/site/docs/concepts/collections.md b/site/docs/concepts/collections.md index 396e7015f5..dfd160905e 100644 --- a/site/docs/concepts/collections.md +++ b/site/docs/concepts/collections.md @@ -38,7 +38,43 @@ An example document for a collection with two fields, `name` and `count` is show } ``` -The `_meta` object is present in all Flow documents, and contains metadata added by Flow. Minimally, every document `_meta` always has a `uuid`, which is a globally unique id for each document. Some capture connectors may add additional `_meta` properties to tie each document to a specific record within the source system. Documents that were captured from cloud storage connectors, for example, will contain `/_meta/file` and `/_meta/offset` properties that tell you where the document came from within your cloud storage bucket. +### System Fields and Metadata + +The `_meta` object is present in all Flow documents, and contains metadata added by Flow. +Minimally, every document `_meta` always has a `uuid`, which is a globally unique id for each document. +Some capture connectors may add additional `_meta` properties to tie each document to a specific record +within the source system. Documents that were captured from cloud storage connectors, for example, +will contain `/_meta/file` and `/_meta/offset` properties that tell you where the document came from +within your cloud storage bucket. + +#### _meta/uuid + +The `_meta/uuid` field is a system-generated globally unique identifier for each document within Estuary Flow. + +#### flow_published_at + +The `flow_published_at` field is a system-generated timestamp within Estuary Flow, +derived from the runtime environment. It captures the exact moment a document is published to a collection, +offering a reliable proxy for when the document was last modified or inserted. + +- Source: The `flow_published_at` field is generated by the runtime environment of Estuary Flow. +- Definition: This field represents the timestamp when a document is captured and subsequently published +to a collection. Essentially, it is a projection of the `_meta/uuid` field, where the UUID contains an +encoded timestamp component. +- Availability: The `flow_published_at` field is available in every collection, as it is a derived +projection from the `_meta/uuid` field. + +For a given document identified by a unique key, the `flow_published_at` field can be used as a proxy for +the last time the document was modified. This is particularly useful when performing incremental updates +or transformations, such as in a data warehouse environment. + +When dealing with materializations that are _not_ delta updates: + +- A document in Estuary Flow is any JSON object emitted by a capture connector. +The `flow_published_at` field provides the timestamp for when this JSON object was captured and inserted +into the collection. +- If the collection is reduced with a strategy like `lastWriteWins` or `merge` on the +capture side, `flow_published_at` becomes the timestamp for the last event that updated the document. ## Viewing collection documents