Add docs about flow_published_at metadata field (#1483)
danthelion authored Jun 6, 2024
1 parent 5209ca6 commit 5ec391d
Showing 1 changed file with 37 additions and 1 deletion.
38 changes: 37 additions & 1 deletion site/docs/concepts/collections.md
@@ -38,7 +38,43 @@ An example document for a collection with two fields, `name` and `count` is show
}
```

The `_meta` object is present in all Flow documents, and contains metadata added by Flow. Minimally, every document `_meta` always has a `uuid`, which is a globally unique id for each document. Some capture connectors may add additional `_meta` properties to tie each document to a specific record within the source system. Documents that were captured from cloud storage connectors, for example, will contain `/_meta/file` and `/_meta/offset` properties that tell you where the document came from within your cloud storage bucket.
### System Fields and Metadata

The `_meta` object is present in all Flow documents and contains metadata added by Flow.
At minimum, every document's `_meta` has a `uuid`, which is a globally unique ID for that document.
Some capture connectors add further `_meta` properties to tie each document to a specific record
within the source system. Documents captured from cloud storage connectors, for example,
contain `/_meta/file` and `/_meta/offset` properties that tell you where the document came from
within your cloud storage bucket.

#### _meta/uuid

The `_meta/uuid` field is a system-generated globally unique identifier for each document within Estuary Flow.

#### flow_published_at

The `flow_published_at` field is a system-generated timestamp within Estuary Flow,
derived from the runtime environment. It records the moment a document is published to a collection,
which makes it a reliable proxy for when the document was last modified or inserted.

- Source: The `flow_published_at` field is generated by the runtime environment of Estuary Flow.
- Definition: This field is the timestamp at which a document is captured and subsequently published
to a collection. It is a projection of the `_meta/uuid` field: the UUID contains an
encoded timestamp component.
- Availability: The `flow_published_at` field is available in every collection, because it is
derived from the `_meta/uuid` field, which every document has.
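
To make the idea of a timestamp encoded in a UUID concrete, here is a minimal Python sketch. It assumes the UUID uses the RFC 4122 version 1 layout, which this page does not state explicitly, and the helper name `uuid_v1_timestamp` and the sample UUID are hypothetical. It illustrates the decoding principle only and is not Flow's own implementation.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals from the start
# of the Gregorian calendar (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid_v1_timestamp(value: str) -> datetime:
    """Decode the timestamp embedded in a version 1 UUID string.

    Assumption: the UUID follows the RFC 4122 version 1 layout.
    """
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError(f"expected a version 1 UUID, got version {parsed.version}")
    # `time` is the 60-bit count of 100 ns intervals; ten of them make a microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


# Hand-constructed sample UUID whose timestamp field is the Unix epoch.
print(uuid_v1_timestamp("13814000-1dd2-11b2-8000-000000000000"))
# 1970-01-01 00:00:00+00:00
```

In practice you would read `flow_published_at` directly rather than decode the UUID yourself; the sketch only shows why the field can exist for every document.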

For a given document identified by a unique key, the `flow_published_at` field can be used as a proxy for
the last time the document was modified. This is particularly useful when performing incremental updates
or transformations, such as in a data warehouse environment.
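
One common way to use such a timestamp for incremental processing is a high-water mark: remember the largest `flow_published_at` already processed and pick up only rows published after it. The following Python sketch assumes rows are available as dictionaries with an RFC 3339 `flow_published_at` string; the row shape, the sample values, and the helper names `parse_published_at` and `rows_since` are hypothetical.

```python
from datetime import datetime


def parse_published_at(value: str) -> datetime:
    """Parse an RFC 3339 timestamp string, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rows_since(rows, watermark):
    """Return rows published strictly after `watermark`, plus the new watermark."""
    fresh = [
        row for row in rows
        if parse_published_at(row["flow_published_at"]) > watermark
    ]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max(
        (parse_published_at(row["flow_published_at"]) for row in fresh),
        default=watermark,
    )
    return fresh, new_watermark


# Hypothetical materialized rows, keyed by `id`.
rows = [
    {"id": 1, "count": 3, "flow_published_at": "2024-06-06T10:00:00Z"},
    {"id": 2, "count": 7, "flow_published_at": "2024-06-06T10:05:00Z"},
]
last_run = parse_published_at("2024-06-06T10:00:00Z")
fresh, last_run = rows_since(rows, last_run)
print([row["id"] for row in fresh])  # [2]
```

The strict comparison means rows sharing the watermark's exact timestamp are not reprocessed; whether that is the right boundary depends on how the downstream job handles duplicates.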

When dealing with materializations that are _not_ delta updates:

- A document in Estuary Flow is any JSON object emitted by a capture connector.
The `flow_published_at` field is the timestamp at which this JSON object was captured and inserted
into the collection.
- If the collection is reduced with a strategy like `lastWriteWins` or `merge` on the
capture side, `flow_published_at` becomes the timestamp of the last event that updated the document.
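
The second point can be illustrated with a simplified model of a last-write-wins reduction. This Python sketch is not Flow's reduction engine: it assumes documents arrive in publication order, represents each one as a flattened dictionary, and uses a hypothetical helper name, `reduce_last_write_wins`.

```python
def reduce_last_write_wins(documents, key="id"):
    """Keep only the most recently published document for each key.

    Assumption: `documents` is iterated in publication order, so a later
    document for the same key replaces the earlier one entirely.
    """
    latest = {}
    for doc in documents:
        latest[doc[key]] = doc
    return latest


# Hypothetical events: two updates to key 1 and one insert for key 2.
events = [
    {"id": 1, "count": 1, "flow_published_at": "2024-06-06T10:00:00Z"},
    {"id": 2, "count": 5, "flow_published_at": "2024-06-06T10:02:00Z"},
    {"id": 1, "count": 2, "flow_published_at": "2024-06-06T10:05:00Z"},
]
reduced = reduce_last_write_wins(events)
print(reduced[1]["flow_published_at"])  # 2024-06-06T10:05:00Z
```

After the reduction, the row for key 1 carries the timestamp of its latest update, which is why the field can stand in for a last-modified time.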

## Viewing collection documents
