diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md
index 8990c4db925..38769adbac8 100644
--- a/website/docs/guides/redshift-qs.md
+++ b/website/docs/guides/redshift-qs.md
@@ -21,7 +21,7 @@ In this quickstart guide, you'll learn how to use dbt Cloud with Redshift. It wi
- Document your models
- Schedule a job to run
-:::tips Videos for you
+:::tip Videos for you
Check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos.
:::
diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md
index 40bdeed1ef2..18e77ce050c 100644
--- a/website/docs/guides/snowflake-qs.md
+++ b/website/docs/guides/snowflake-qs.md
@@ -46,7 +46,7 @@ You can also watch the [YouTube video on dbt and Snowflake](https://www.youtube.
## Create a new Snowflake worksheet
1. Log in to your trial Snowflake account.
-2. In the Snowflake UI, click **+ Worksheet** in the upper right corner to create a new worksheet.
+2. In the Snowflake UI, click **+ Create** in the left-hand corner (underneath the Snowflake logo) to open a dropdown, then select the first option, **SQL Worksheet**.
## Load data
The data used here is stored as CSV files in a public S3 bucket and the following steps will guide you through how to prepare your Snowflake account for that data and upload it.
diff --git a/website/docs/reference/artifacts/run-results-json.md b/website/docs/reference/artifacts/run-results-json.md
index 13ad528d185..118b5615ea8 100644
--- a/website/docs/reference/artifacts/run-results-json.md
+++ b/website/docs/reference/artifacts/run-results-json.md
@@ -3,14 +3,17 @@ title: "Run results JSON file"
sidebar_label: "Run results"
---
-**Current schema**: [`v5`](https://schemas.getdbt.com/dbt/run-results/v5/index.html)
+**Current schema**: [`v6`](https://schemas.getdbt.com/dbt/run-results/v6/index.html)
**Produced by:**
[`build`](/reference/commands/build)
+ [`clone`](/reference/commands/clone)
[`compile`](/reference/commands/compile)
[`docs generate`](/reference/commands/cmd-docs)
+ [`retry`](/reference/commands/retry)
[`run`](/reference/commands/run)
[`seed`](/reference/commands/seed)
+ [`show`](/reference/commands/show)
[`snapshot`](/reference/commands/snapshot)
[`test`](/reference/commands/test)
[`run-operation`](/reference/commands/run-operation)
diff --git a/website/docs/reference/commands/parse.md b/website/docs/reference/commands/parse.md
index 5e8145762f7..967991522bc 100644
--- a/website/docs/reference/commands/parse.md
+++ b/website/docs/reference/commands/parse.md
@@ -9,7 +9,7 @@ The `dbt parse` command parses and validates the contents of your dbt project. I
It will also produce an artifact with detailed timing information, which is useful to understand parsing times for large projects. Refer to [Project parsing](/reference/parsing) for more information.
-Starting in v1.5, `dbt parse` will write or return a [manifest](/reference/artifacts/manifest-json), enabling you to introspect dbt's understanding of all the resources in your project.
+Starting in v1.5, `dbt parse` will write or return a [manifest](/reference/artifacts/manifest-json), enabling you to introspect dbt's understanding of all the resources in your project. Since `dbt parse` doesn't connect to your warehouse, [this manifest will not contain any compiled code](/faqs/Warehouse/db-connection-dbt-compile).
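+
+As a quick, minimal illustration (assuming the default `target` path), you can parse the project and then inspect the artifact that gets written:
+
+```bash
+dbt parse
+ls target/manifest.json    # written by dbt parse, with no warehouse connection required
+```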
By default, the dbt Cloud IDE will attempt a "partial" parse, which means it'll only check changes since the last parse (new or updated parts of your project when you make changes). Since the dbt Cloud IDE automatically parses in the background whenever you save your work, manually running `dbt parse` yourself is likely to be fast because it's just looking at recent changes.
diff --git a/website/docs/reference/commands/version.md b/website/docs/reference/commands/version.md
index 4d5ce6524dd..9643be92ab8 100644
--- a/website/docs/reference/commands/version.md
+++ b/website/docs/reference/commands/version.md
@@ -13,7 +13,7 @@ The `--version` command-line flag returns information about the currently instal
## Versioning
To learn more about release versioning for dbt Core, refer to [How dbt Core uses semantic versioning](/docs/dbt-versions/core#how-dbt-core-uses-semantic-versioning).
-If using a [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks), which provide ongoing updates to dbt, then `dbt_version` represents the release version of dbt in dbt Cloud. This also follows semantic versioning guidelines, using the `YYYY.MM.DD+
` format. The year, month, and day represent the date the version was built (for example, `2024.10.28+996c6a8`). The suffix provides an additional unique identification for each build.
+If using a [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks), which provides ongoing updates to dbt, then `dbt_version` represents the release version of dbt in dbt Cloud. This also follows semantic versioning guidelines, using the `YYYY.M.D+` format. The year, month, and day represent the date the version was built (for example, `2024.10.8+996c6a8`). The suffix provides an additional unique identification for each build.
## Example usages
diff --git a/website/docs/reference/database-permissions/redshift-permissions.md b/website/docs/reference/database-permissions/redshift-permissions.md
index 5f0949a3528..2bca5d4acfa 100644
--- a/website/docs/reference/database-permissions/redshift-permissions.md
+++ b/website/docs/reference/database-permissions/redshift-permissions.md
@@ -10,16 +10,16 @@ The following example provides you with the SQL statements you can use to manage
**Note** that `database_name`, `database.schema_name`, and `user_name` are placeholders and you can replace them as needed for your organization's naming convention.
-
```
-grant usage on database database_name to user_name;
grant create schema on database database_name to user_name;
grant usage on schema database.schema_name to user_name;
grant create table on schema database.schema_name to user_name;
grant create view on schema database.schema_name to user_name;
-grant usage on all schemas in database database_name to user_name;
+grant usage for schemas in database database_name to role role_name;
grant select on all tables in database database_name to user_name;
grant select on all views in database database_name to user_name;
```
+To connect to the database, confirm with an admin that your user role or group has been added to the database. Note that Redshift permissions differ from Postgres, and commands like [`grant connect`](https://www.postgresql.org/docs/current/sql-grant.html) aren't supported in Redshift.
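+
+If it helps to see what that admin step might look like, here's a rough sketch using a group or an RBAC role (`group_name` and `role_name` are placeholders, like the other names on this page):
+
+```
+create group group_name with user user_name;
+-- or, with role-based access control
+create role role_name;
+grant role role_name to user_name;
+```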
+
Check out the [official documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html) for more information.
diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md
index 600a578ef8e..29eb79a9130 100644
--- a/website/docs/reference/node-selection/methods.md
+++ b/website/docs/reference/node-selection/methods.md
@@ -7,6 +7,11 @@ Selector methods return all resources that share a common property, using the
syntax `method:value`. While it is recommended to explicitly denote the method,
you can omit it (the default value will be one of `path`, `file` or `fqn`).
+
+
+The `--select` and `--selector` arguments sound similar, but they are different. To understand the difference, see [Differences between `--select` and `--selector`](/reference/node-selection/yaml-selectors#difference-between---select-and---selector).
+
+
Many of the methods below support Unix-style wildcards:
diff --git a/website/docs/reference/node-selection/syntax.md b/website/docs/reference/node-selection/syntax.md
index 2e53eff72df..dce2a9c40e8 100644
--- a/website/docs/reference/node-selection/syntax.md
+++ b/website/docs/reference/node-selection/syntax.md
@@ -21,6 +21,8 @@ dbt's node selection syntax makes it possible to run only specific resources in
We use the terms "nodes" and "resources" interchangeably. These encompass all the models, tests, sources, seeds, snapshots, exposures, and analyses in your project. They are the objects that make up dbt's DAG (directed acyclic graph).
:::
+The `--select` and `--selector` arguments are similar in that they both allow you to select resources. To understand the difference, see [Differences between `--select` and `--selector`](/reference/node-selection/yaml-selectors#difference-between---select-and---selector).
+
## Specifying resources
By default, `dbt run` executes _all_ of the models in the dependency graph; `dbt seed` creates all seeds, `dbt snapshot` performs every snapshot. The `--select` flag is used to specify a subset of nodes to execute.
@@ -103,6 +105,8 @@ As your selection logic gets more complex, and becomes unwieldly to type out as
consider using a [yaml selector](/reference/node-selection/yaml-selectors). You can use a predefined definition with the `--selector` flag.
Note that when you're using `--selector`, most other flags (namely `--select` and `--exclude`) will be ignored.
+The `--select` and `--selector` arguments are similar in that they both allow you to select resources. To understand the difference, see [Differences between `--select` and `--selector`](/reference/node-selection/yaml-selectors#difference-between---select-and---selector).
+
### Troubleshoot with the `ls` command
Constructing and debugging your selection syntax can be challenging. To get a "preview" of what will be selected, we recommend using the [`list` command](/reference/commands/list). This command, when combined with your selection syntax, will output a list of the nodes that meet that selection criteria. The `dbt ls` command supports all types of selection syntax arguments, for example:
@@ -136,15 +140,6 @@ Together, the [`state`](/reference/node-selection/methods#state) selector and de
State and defer can be set by environment variables as well as CLI flags:
-- `--state` or `DBT_STATE`: file path
-- `--defer` or `DBT_DEFER`: boolean
-
-:::warning Syntax deprecated
-
-In dbt v1.5, we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined.
-
-:::
-
- `--state` or `DBT_STATE`: file path
- `--defer` or `DBT_DEFER`: boolean
- `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional)
@@ -157,6 +152,12 @@ If both the flag and env var are provided, the flag takes precedence.
- The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version.
- These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison.
+:::warning Syntax deprecated
+
+In [dbt v1.5](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5#behavior-changes), we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined.
+
+:::
+
### The "result" status
Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](/reference/artifacts/run-results-json) page.
@@ -204,7 +205,7 @@ When a job is selected, dbt Cloud will surface the artifacts from that job's mos
After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command:
```bash
-# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
+# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt source freshness # must be run again to compare current to previous state
dbt build --select "source_status:fresher+" --state path/to/prod/artifacts
```
diff --git a/website/docs/reference/node-selection/yaml-selectors.md b/website/docs/reference/node-selection/yaml-selectors.md
index ff6628919b7..ef7ca1673eb 100644
--- a/website/docs/reference/node-selection/yaml-selectors.md
+++ b/website/docs/reference/node-selection/yaml-selectors.md
@@ -288,3 +288,22 @@ selectors:
**Note:** While selector inheritance allows the logic from another selector to be _reused_, it doesn't allow the logic from that selector to be _modified_ by means of `parents`, `children`, `indirect_selection`, and so on.
The `selector` method returns the complete set of nodes returned by the named selector.
+
+## Difference between `--select` and `--selector`
+
+In dbt, [`select`](/reference/node-selection/syntax#how-does-selection-work) and `selector` are related concepts used for choosing specific models, tests, or resources. The following table explains the differences and when to use each:
+
+| Feature | `--select` | `--selector` |
+| ------- | ---------- | ------------- |
+| Definition | Ad-hoc, specified directly in the command. | Pre-defined in `selectors.yml` file. |
+| Usage | One-time or task-specific filtering.| Reusable for multiple executions. |
+| Complexity | Requires manual entry of selection criteria. | Can encapsulate complex logic for reuse. |
+| Flexibility | Flexible; less reusable. | Flexible; focuses on reusable and structured logic.|
+| Example | `dbt run --select my_model+` (runs `my_model` and all downstream dependencies with the `+` operator). | `dbt run --selector nightly_diet_snowplow` (runs models defined by the `nightly_diet_snowplow` selector in `selectors.yml`). |
+
+Notes:
+- You can combine `--select` with `--exclude` for ad-hoc selection of nodes.
+- The `--select` and `--selector` syntax both provide the same overall functions for node selection. Using [graph operators](/reference/node-selection/graph-operators) (such as `+` and `@`) and [set operators](/reference/node-selection/set-operators) (such as `union` and `intersection`) in `--select` works the same as the equivalent YAML-based configs in `--selector`.
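+
+To make the comparison concrete, here's a minimal sketch of a reusable selector and its ad-hoc equivalent (the selector name and criteria are illustrative):
+
+```yaml
+# selectors.yml
+selectors:
+  - name: nightly_diet_snowplow
+    description: "Snowplow-tagged models plus my_model and everything downstream of it"
+    definition:
+      union:
+        - method: tag
+          value: snowplow
+        - method: fqn
+          value: my_model
+          children: true
+```
+
+Running `dbt run --selector nightly_diet_snowplow` selects the same nodes as the ad-hoc `dbt run --select "tag:snowplow my_model+"`.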
+
+
+For additional examples, check out [this GitHub Gist](https://gist.github.com/jeremyyeo/1aeca767e2a4f157b07955d58f8078f7).
diff --git a/website/docs/reference/resource-configs/athena-configs.md b/website/docs/reference/resource-configs/athena-configs.md
index fd5bc663ee7..082f3b5c249 100644
--- a/website/docs/reference/resource-configs/athena-configs.md
+++ b/website/docs/reference/resource-configs/athena-configs.md
@@ -106,7 +106,7 @@ lf_grants={
-There are some limitations and recommendations that should be considered:
+Consider these limitations and recommendations:
- `lf_tags` and `lf_tags_columns` configs support only attaching lf tags to corresponding resources.
- We recommend managing LF Tags permissions somewhere outside dbt. For example, [terraform](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_permissions) or [aws cdk](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_lakeformation-readme.html).
@@ -114,8 +114,7 @@ There are some limitations and recommendations that should be considered:
- Any tags listed in `lf_inherited_tags` should be strictly inherited from the database level and never overridden at the table and column level.
- Currently, `dbt-athena` does not differentiate between an inherited tag association and an override it made previously.
- For example, if a `lf_tags_config` value overrides an inherited tag in one run, and that override is removed before a subsequent run, the prior override will linger and no longer be encoded anywhere (neither in Terraform, where the inherited value is configured, nor in the dbt project, where the override previously existed but is now gone).
-
-
+
### Table location
The saved location of a table is determined in precedence by the following conditions:
@@ -144,6 +143,9 @@ The following [incremental models](https://docs.getdbt.com/docs/build/incrementa
- `append`: Insert new records without updating, deleting or overwriting any existing data. There might be duplicate data (great for log or historical data).
- `merge`: Conditionally updates, deletes, or inserts rows into an Iceberg table. Used in combination with `unique_key`. It is only available when using Iceberg.
+Consider this limitation when using Iceberg models:
+
+- Incremental Iceberg models sync all columns on schema change. You can't remove columns used for partitioning with an incremental refresh; you must fully refresh the model.
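+
+As a rough sketch of what an incremental Iceberg model can look like (assuming the `table_type` config described elsewhere on this page; the column and key names are illustrative):
+
+```sql
+{{ config(
+    materialized='incremental',
+    table_type='iceberg',
+    incremental_strategy='merge',
+    unique_key='id',
+    on_schema_change='sync_all_columns'
+) }}
+
+select ...
+```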
### On schema change
@@ -361,8 +363,7 @@ The materialization also supports invalidating hard deletes. For usage details,
### Snapshots known issues
-- Incremental Iceberg models - Sync all columns on schema change. Columns used for partitioning can't be removed. From a dbt perspective, the only way is to fully refresh the incremental model.
-- Tables, schemas and database names should only be lowercase
+- Tables, schemas, and database names should only be lowercase.
- To avoid potential conflicts, make sure [`dbt-athena-adapter`](https://github.com/Tomme/dbt-athena) is not installed in the target environment.
- Snapshot does not support dropping columns from the source table. If you drop a column, make sure to drop the column from the snapshot as well. Another workaround is to NULL the column in the snapshot definition to preserve the history.
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index 1ee89efc95c..95bed967a14 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -10,19 +10,18 @@ When materializing a model as `table`, you may include several optional configs
-| Option | Description | Required? | Model Support | Example |
-|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|---------------|--------------------------|
-| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
-| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
-| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
-| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` |
-| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
-| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
-| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
-
-\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation.
-We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to anotate their python-derived tables with tblproperties.
+| Option | Description | Required? | Model support | Example |
+|-----------|---------|-------------------|---------------|-------------|
+| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
+| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
+| partition_by | Partition the created table by the specified columns. A directory is created for each partition.| Optional | SQL, Python | `date_day` |
+| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` |
+| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
+| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
+| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
+| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
+
+\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. There is not yet a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their Python-derived tables with tblproperties.
@@ -30,45 +29,47 @@ We do not yet have a PySpark API to set tblproperties at table creation, so this
1.8 introduces support for [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) at the table level, in addition to all table configuration supported in 1.7.
-| Option | Description | Required? | Model Support | Example |
-|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|---------------|--------------------------|
-| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
-| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
-| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
-| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
-| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
-| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
-| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL+, Python+ | `{'my_tag': 'my_value'}` |
-| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
+| Option | Description | Required?| Model support | Example |
+|-----------|---------------|----------|---------------|----------|
+| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
+| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
+| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
+| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
+| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
+| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
+| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
+| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL†, Python† | `{'my_tag': 'my_value'}` |
+| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation.
We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their Python-derived tables with tblproperties.
-\+ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
-
+† `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
+
dbt-databricks v1.9 adds support for the `table_format: iceberg` config. Try it now on the [dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks). All other table configurations were also supported in 1.8.
-| Option | Description | Required? | Model Support | Example |
-|---------------------|-----------------------------|-------------------------------------------|-----------------|--------------------------|
-| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
-| file_format+ | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
-| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
-| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
+| Option | Description| Required? | Model support | Example |
+|-------------|--------|-----------|-----------------|---------------|
+| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
+| file_format † | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
+| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
+| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
-| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
-| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
-| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL++, Python++ | `{'my_tag': 'my_value'}` |
-| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
+| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
+| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
+| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
+| databricks_tags     | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL‡, Python‡ | `{'my_tag': 'my_value'}` |
+| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their Python-derived tables with tblproperties.
-\+ When `table_format` is `iceberg`, `file_format` must be `delta`.
-\++ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
+
+† When `table_format` is `iceberg`, `file_format` must be `delta`.
+
+‡ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
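+
+As a minimal sketch combining a few of these options (the values are illustrative):
+
+```sql
+{{ config(
+    materialized='table',
+    table_format='iceberg',
+    file_format='delta',
+    tblproperties={'this.is.my.key': 12},
+    databricks_tags={'my_tag': 'my_value'}
+) }}
+
+select ...
+```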
@@ -260,7 +261,7 @@ This strategy is currently only compatible with All Purpose Clusters, not SQL Wa
This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy.
-If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` + `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
+If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` and `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
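+
+For example, a minimal incremental model using this strategy with a partition column might be configured like this (the column name is illustrative):
+
+```sql
+{{ config(
+    materialized='incremental',
+    incremental_strategy='insert_overwrite',
+    partition_by='date_day'
+) }}
+
+select ...
+```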
-
```yml
@@ -139,20 +138,28 @@ sources:
## Definition
-Set the `event_time` to the name of the field that represents the timestamp of the event -- "at what time did the row occur" -- as opposed to an event ingestion date. You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block.
+You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block.
+
+`event_time` is required for the [incremental microbatch](/docs/build/incremental-microbatch) strategy and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#optimizing-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.
+
+### Best practices
+
+Set the `event_time` to the name of the field that represents the actual timestamp of the event (like `account_created_at`). The timestamp should capture "at what time did the row occur" rather than an event ingestion date. Marking a column as the `event_time` when it isn't diverges from the semantic meaning of the column, which may confuse users when other tools make use of the metadata.
-Here are some examples of good and bad `event_time` columns:
+However, if an ingestion date (like `loaded_at`, `ingested_at`, or `last_updated_at`) is the only timestamp available, you can set `event_time` to one of these fields. Here are some considerations to keep in mind if you do this:
-- ✅ Good:
- - `account_created_at` — This represents the specific time when an account was created, making it a fixed event in time.
- - `session_began_at` — This captures the exact timestamp when a user session started, which won’t change and directly ties to the event.
+- Using `last_updated_at` or `loaded_at` — May result in duplicate entries in the resulting table in the data warehouse over multiple runs. Setting an appropriate [lookback](/reference/resource-configs/lookback) value can reduce duplicates but it can't fully eliminate them since some updates outside the lookback window won't be processed.
+- Using `ingested_at` — Since this column is created by your ingestion/EL tool instead of coming from the original source, it will change if/when you need to resync your connector for some reason. This means that data will be reprocessed and loaded into your warehouse for a second time against a second date. As long as this never happens (or you run a full refresh when it does), microbatches will be processed correctly when using `ingested_at`.
-- ❌ Bad:
+Here are some examples of recommended and not recommended `event_time` columns:
- - `_fivetran_synced` — This isn't the time that the event happened, it's the time that the event was ingested.
- - `last_updated_at` — This isn't a good use case as this will keep changing over time.
-`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch) and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#optimizing-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.
+| Status | Column name | Description |
+|--------------------|---------------------|----------------------|
+| ✅ Recommended | `account_created_at` | Represents the specific time when an account was created, making it a fixed event in time. |
+| ✅ Recommended | `session_began_at` | Captures the exact timestamp when a user session started, which won’t change and directly ties to the event. |
+| ❌ Not recommended | `_fivetran_synced` | This represents the time the event was ingested, not when it happened. |
+| ❌ Not recommended | `last_updated_at` | Changes over time and isn't tied to the event itself. If used, note the considerations mentioned earlier in [best practices](#best-practices). |
## Examples
diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md
index 859e4e9e31a..4556544d189 100644
--- a/website/docs/reference/resource-configs/hard-deletes.md
+++ b/website/docs/reference/resource-configs/hard-deletes.md
@@ -48,7 +48,9 @@ snapshots:
## Description
-The `hard_deletes` config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table.
+The `hard_deletes` config gives you more control over how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table.
+
+You can use `hard_deletes` with dbt-postgres, dbt-bigquery, dbt-snowflake, and dbt-redshift adapters.
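+
+For example, a snapshot opting into the new behavior might be configured like this (the snapshot name is illustrative):
+
+```yaml
+snapshots:
+  - name: my_snapshot
+    config:
+      hard_deletes: new_record  # or 'ignore' (default) or 'invalidate'
+```
+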
import HardDeletes from '/snippets/_hard-deletes.md';
diff --git a/website/docs/reference/resource-configs/snapshot_meta_column_names.md b/website/docs/reference/resource-configs/snapshot_meta_column_names.md
index 24e4c8ca577..59d63374de7 100644
--- a/website/docs/reference/resource-configs/snapshot_meta_column_names.md
+++ b/website/docs/reference/resource-configs/snapshot_meta_column_names.md
@@ -19,7 +19,7 @@ snapshots:
dbt_valid_to:
dbt_scd_id:
dbt_updated_at:
- dbt_is_deleted:
+ dbt_is_deleted:
```
@@ -35,7 +35,7 @@ snapshots:
"dbt_valid_to": "",
"dbt_scd_id": "",
"dbt_updated_at": "",
- "dbt_is_deleted": "",
+ "dbt_is_deleted": "",
}
)
}}
@@ -54,7 +54,7 @@ snapshots:
dbt_valid_to:
dbt_scd_id:
dbt_updated_at:
- dbt_is_deleted:
+ dbt_is_deleted:
```
@@ -67,17 +67,17 @@ In order to align with an organization's naming conventions, the `snapshot_meta_
By default, dbt snapshots use the following column names to track change history using [Type 2 slowly changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) records:
-| Field | Meaning | Notes |
-| -------------- | ------- | ----- |
-| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | The value is affected by the [`strategy`](/reference/resource-configs/strategy). |
-| `dbt_valid_to` | The timestamp when this row is no longer valid. | |
-| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. |
-| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. |
-| `dbt_is_deleted` | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. |
+| Field | Meaning | Notes | Example |
+| -------------- | ------- | ----- | ------- |
+| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | The value is affected by the [`strategy`](/reference/resource-configs/strategy). | `snapshot_meta_column_names: {dbt_valid_from: start_date}` |
+| `dbt_valid_to` | The timestamp when this row is no longer valid. | | `snapshot_meta_column_names: {dbt_valid_to: end_date}` |
+| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_scd_id: scd_id}` |
+| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_updated_at: modified_date}` |
+| `dbt_is_deleted` | A string value indicating whether the record has been deleted (`True` if deleted, `False` if not). | Added when `hard_deletes='new_record'` is configured. | `snapshot_meta_column_names: {dbt_is_deleted: is_deleted}` |
-However, these column names can be customized using the `snapshot_meta_column_names` config.
+All of these column names can be customized using the `snapshot_meta_column_names` config. Refer to the [Example](#example) for more details.
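+
+As a quick sketch of what that can look like (the custom names below come from the Example column above):
+
+```yaml
+snapshots:
+  - name: my_snapshot
+    config:
+      snapshot_meta_column_names:
+        dbt_valid_from: start_date
+        dbt_valid_to: end_date
+        dbt_updated_at: modified_date
+        dbt_is_deleted: is_deleted
+```
+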
-:::warning
+:::warning
To avoid any unintentional data modification, dbt will **not** automatically apply any column renames. So if a user applies `snapshot_meta_column_names` config for a snapshot without updating the pre-existing table, they will get an error. We recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables prior to committing a column name change.
diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md
index 9d84e892236..30275450793 100644
--- a/website/docs/reference/resource-configs/snowflake-configs.md
+++ b/website/docs/reference/resource-configs/snowflake-configs.md
@@ -337,6 +337,15 @@ For dbt limitations, these dbt features are not supported:
- [Model contracts](/docs/collaborate/govern/model-contracts)
- [Copy grants configuration](/reference/resource-configs/snowflake-configs#copying-grants)
+### Troubleshooting dynamic tables
+
+If, after the initial execution, your dynamic table model fails to rerun with the following error message:
+
+```sql
+SnowflakeDynamicTableConfig.__init__() missing 6 required positional arguments: 'name', 'schema_name', 'database_name', 'query', 'target_lag', and 'snowflake_warehouse'
+```
+Ensure that `QUOTED_IDENTIFIERS_IGNORE_CASE` on your account is set to `FALSE`.
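+
+One way to check and, if needed, update the account-level setting (requires the appropriate admin privileges):
+
+```sql
+show parameters like 'QUOTED_IDENTIFIERS_IGNORE_CASE' in account;
+alter account set QUOTED_IDENTIFIERS_IGNORE_CASE = FALSE;
+```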
+
## Temporary tables
Incremental table merges for Snowflake prefer to utilize a `view` rather than a `temporary table`. The reasoning is to avoid the database write step that a temporary table would initiate and save compile time.
diff --git a/website/docs/reference/resource-configs/tags.md b/website/docs/reference/resource-configs/tags.md
index 505a33a00f7..6fc8960af7c 100644
--- a/website/docs/reference/resource-configs/tags.md
+++ b/website/docs/reference/resource-configs/tags.md
@@ -93,6 +93,21 @@ resource_type:
```
+
+To apply tags to a model in your `models/` directory, add the `config` property as shown in the following example:
+
+
+
+```yaml
+models:
+ - name: my_model
+ description: A model description
+ config:
+ tags: ['example_tag']
+```
+
+
+
@@ -126,10 +141,24 @@ You can use the [`+` operator](/reference/node-selection/graph-operators#the-plu
- `dbt run --select +model_name+` — Run a model, its upstream dependencies, and its downstream dependencies.
- `dbt run --select tag:my_tag+ --exclude tag:exclude_tag` — Run model tagged with `my_tag` and their downstream dependencies, and exclude models tagged with `exclude_tag`, regardless of their dependencies.
+
+:::tip Usage notes about tags
+
+When using tags, consider the following:
+
+- Tags are additive across the project hierarchy.
+- Some resource types (like sources and exposures) require tags at the top level.
+
+Refer to [usage notes](#usage-notes) for more information.
+:::
+
## Examples
+
+The following examples show how to apply tags to resources in your project. You can configure tags in the `dbt_project.yml`, `schema.yml`, or SQL files.
+
### Use tags to run parts of your project
-Apply tags in your `dbt_project.yml` as a single value or a string:
+Apply tags in your `dbt_project.yml` as a single value or a string. In the following example, the `jaffle_shop` model is tagged with `contains_pii`.
@@ -153,16 +182,52 @@ models:
- "published"
```
+
+
+
+### Apply tags to models
+
+This section demonstrates applying tags to models in the `dbt_project.yml`, `schema.yml`, and SQL files.
+
+To apply tags to a model in your `dbt_project.yml` file, add the following:
+
+
+
+```yaml
+models:
+ jaffle_shop:
+ +tags: finance # jaffle_shop model is tagged with 'finance'.
+```
+
+
+
+To apply tags to a model in a YAML file in your `models/` directory, add the following using the `config` property:
+
+
+
+```yaml
+models:
+ - name: stg_customers
+ description: Customer data with basic cleaning and transformation applied, one row per customer.
+ config:
+ tags: ['santi'] # stg_customers.yml model is tagged with 'santi'.
+ columns:
+ - name: customer_id
+ description: The unique key for each customer.
+ data_tests:
+ - not_null
+ - unique
+```
-You can also apply tags to individual resources using a config block:
+To apply tags to a model in its SQL file, add the following:
```sql
{{ config(
- tags=["finance"]
+ tags=["finance"] # stg_payments.sql model is tagged with 'finance'.
) }}
select ...
@@ -211,14 +276,10 @@ seeds:
-:::tip Upgrade to dbt Core 1.9
-
-Applying tags to saved queries is only available in dbt Core versions 1.9 and later.
-:::
+
-
The following example shows how to apply a tag to a saved query in the `dbt_project.yml` file. The saved query is then tagged with `order_metrics`.
@@ -263,7 +324,6 @@ Run resources with multiple tags using the following commands:
# Run all resources tagged "order_metrics" and "hourly"
dbt build --select tag:order_metrics tag:hourly
```
-
## Usage notes
diff --git a/website/docs/reference/resource-configs/teradata-configs.md b/website/docs/reference/resource-configs/teradata-configs.md
index 89a2ff76fba..e2a5d5fddca 100644
--- a/website/docs/reference/resource-configs/teradata-configs.md
+++ b/website/docs/reference/resource-configs/teradata-configs.md
@@ -5,32 +5,13 @@ id: "teradata-configs"
## General
-* *Set `quote_columns`* - to prevent a warning, make sure to explicitly set a value for `quote_columns` in your `dbt_project.yml`. See the [doc on quote_columns](https://docs.getdbt.com/reference/resource-configs/quote_columns) for more information.
+* *Set `quote_columns`* - to prevent a warning, make sure to explicitly set a value for `quote_columns` in your `dbt_project.yml`. See the [doc on quote_columns](/reference/resource-configs/quote_columns) for more information.
```yaml
seeds:
+quote_columns: false #or `true` if you have csv column headers with spaces
```
-* *Enable view column types in docs* - Teradata Vantage has a dbscontrol configuration flag called `DisableQVCI`. This flag instructs the database to create `DBC.ColumnsJQV` with view column type definitions. To enable this functionality you need to:
- 1. Enable QVCI mode in Vantage. Use `dbscontrol` utility and then restart Teradata. Run these commands as a privileged user on a Teradata node:
- ```bash
- # option 551 is DisableQVCI. Setting it to false enables QVCI.
- dbscontrol << EOF
- M internal 551=false
- W
- EOF
-
- # restart Teradata
- tpareset -y Enable QVCI
- ```
- 2. Instruct `dbt` to use `QVCI` mode. Include the following variable in your `dbt_project.yml`:
- ```yaml
- vars:
- use_qvci: true
- ```
- For example configuration, see [dbt_project.yml](https://github.com/Teradata/dbt-teradata/blob/main/test/catalog/with_qvci/dbt_project.yml) in `dbt-teradata` QVCI tests.
-
## Models
###
@@ -142,7 +123,7 @@ id: "teradata-configs"
For details, see [CREATE TABLE documentation](https://docs.teradata.com/r/76g1CuvvQlYBjb2WPIuk3g/B6Js16DRQVwPDjgJ8rz7hg).
-* `with_statistics` - should statistics be copied from the base table, e.g.:
+* `with_statistics` - should statistics be copied from the base table. For example:
```yaml
{{
config(
@@ -289,7 +270,7 @@ For example, in the `snapshots/snapshot_example.sql` file:
Grants are supported in dbt-teradata adapter with release version 1.2.0 and above. You can use grants to manage access to the datasets you're producing with dbt. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your `dbt_project.yml`, and define model-specific grants within each model's SQL or YAML file.
-for e.g. :
+For example:
models/schema.yml
```yaml
models:
@@ -299,7 +280,7 @@ for e.g. :
select: ['user_a', 'user_b']
```
-Another e.g. for adding multiple grants:
+Another example for adding multiple grants:
```yaml
models:
@@ -314,8 +295,8 @@ Another e.g. for adding multiple grants:
Refer to [grants](/reference/resource-configs/grants) for more information on Grants.
-## Query Band
-Query Band in dbt-teradata can be set on three levels:
+## Query band
+Query band in dbt-teradata can be set at three levels:
1. Profiles level: In the `profiles.yml` file, the user can provide `query_band` using the following example:
```yaml
@@ -348,6 +329,11 @@ If a user sets some key-value pair with value as `'{model}'`, internally this `'
- For example, if the model the user is running is `stg_orders`, `{model}` will be replaced with `stg_orders` in runtime.
- If no `query_band` is set by the user, the default query_band used will be: ```org=teradata-internal-telem;appname=dbt;```
+## Unit testing
+* Unit testing is supported in dbt-teradata, allowing users to write and execute unit tests using the `dbt test` command.
+  * For detailed guidance, refer to the [dbt unit tests documentation](/docs/build/unit-tests).
+> In Teradata, reusing the same alias across multiple common table expressions (CTEs) or subqueries within a single model isn't permitted and results in parsing errors. Assign a unique alias to each CTE or subquery to ensure proper query execution.
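+
+As a purely illustrative sketch (the model, column, and values below are hypothetical), a unit test might look like:
+
+```yaml
+unit_tests:
+  - name: test_stg_orders_status_is_lowercased
+    model: stg_orders
+    given:
+      - input: ref('raw_orders')
+        rows:
+          - {order_id: 1, status: "COMPLETED"}
+    expect:
+      rows:
+        - {order_id: 1, status: "completed"}  # assumes the model lowercases the status column
+```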
+
## valid_history incremental materialization strategy
_This is available in early access_
@@ -361,26 +347,27 @@ In temporal databases, valid time is crucial for applications like historical re
unique_key='id',
on_schema_change='fail',
incremental_strategy='valid_history',
- valid_from='valid_from_column',
- history_column_in_target='history_period_column'
+ valid_period='valid_period_col',
+ use_valid_to_time='no',
)
}}
```
The `valid_history` incremental strategy requires the following parameters:
-* `valid_from` — Column in the source table of **timestamp** datatype indicating when each record became valid.
-* `history_column_in_target` — Column in the target table of **period** datatype that tracks history.
+* `unique_key`: The primary key of the model (excluding the valid time components), specified as a column name or list of column names.
+* `valid_period`: Name of the model column indicating the period for which the record is considered to be valid. The datatype must be `PERIOD(DATE)` or `PERIOD(TIMESTAMP)`.
+* `use_valid_to_time`: Whether the end bound value of the valid period in the input is considered by the strategy when building the valid timeline. Use `no` if you consider your records to be valid until changed (and supply any value greater than the begin bound for the end bound of the period; a typical convention is `9999-12-31` or `9999-12-31 23:59:59.999999`). Use `yes` if you know until when the record is valid (typically this is a correction in the history timeline).
The valid_history strategy in dbt-teradata involves several critical steps to ensure the integrity and accuracy of historical data management:
* Remove duplicates and conflicting values from the source data:
* This step ensures that the data is clean and ready for further processing by eliminating any redundant or conflicting records.
- * The process of removing duplicates and conflicting values from the source data involves using a ranking mechanism to ensure that only the highest-priority records are retained. This is accomplished using the SQL RANK() function.
+  * Primary key duplicates (two or more records with the same value for the `unique_key` and the BEGIN() bound of the `valid_period` fields) are removed from the dataset produced by the model. If such duplicates exist, the row with the lowest value is retained for all non-primary-key fields (in the order specified in the model). Full-row duplicates are always de-duplicated.
* Identify and adjust overlapping time slices:
- * Overlapping time periods in the data are detected and corrected to maintain a consistent and non-overlapping timeline.
-* Manage records needing to be overwritten or split based on the source and target data:
+ * Overlapping or adjacent time periods in the data are corrected to maintain a consistent and non-overlapping timeline. To achieve this, the macro adjusts the valid period end bound of a record to align with the begin bound of the next record (if they overlap or are adjacent) within the same `unique_key` group. If `use_valid_to_time = 'yes'`, the valid period end bound provided in the source data is used. Otherwise, a default end date is applied for missing bounds, and adjustments are made accordingly.
+* Manage records needing to be adjusted, deleted, or split based on the source and target data:
* This involves handling scenarios where records in the source data overlap with or need to replace records in the target data, ensuring that the historical timeline remains accurate.
-* Utilize the TD_NORMALIZE_MEET function to compact history:
- * This function helps to normalize and compact the history by merging adjacent time periods, improving the efficiency and performance of the database.
+* Compact history:
+ * Normalize and compact the history by merging records of adjacent time periods with the same value, optimizing database storage and performance. We use the function TD_NORMALIZE_MEET for this purpose.
* Delete existing overlapping records from the target table:
* Before inserting new or updated records, any existing records in the target table that overlap with the new data are removed to prevent conflicts.
* Insert the processed data into the target table:
@@ -414,11 +401,6 @@ These steps collectively ensure that the valid_history strategy effectively mana
2 | PERIOD(TIMESTAMP)[2024-03-01 00:00:00.0, 2024-03-12 00:00:00.0] | A | x1
2 | PERIOD(TIMESTAMP)[2024-03-12 00:00:00.0, 9999-12-31 23:59:59.9999] | C | x1
```
-
-
-:::info
-The target table must already exist before running the model. Ensure the target table is created and properly structured with the necessary columns, including a column that tracks the history with period datatype, before running a dbt model.
-:::
## Common Teradata-specific tasks
* *collect statistics* - when a table is created or modified significantly, there might be a need to tell Teradata to collect statistics for the optimizer. You can do this with the `COLLECT STATISTICS` command, for example using dbt's `post-hooks`:
diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md
index 09122859e43..39ef7ae82d7 100644
--- a/website/docs/reference/resource-configs/updated_at.md
+++ b/website/docs/reference/resource-configs/updated_at.md
@@ -64,7 +64,7 @@ You will get a warning if the data type of the `updated_at` column does not matc
## Description
A column within the results of your snapshot query that represents when the record row was last updated.
-This parameter is **required if using the `timestamp` [strategy](/reference/resource-configs/strategy)**.
+This parameter is **required if using the `timestamp` [strategy](/reference/resource-configs/strategy)**. The `updated_at` field may support ISO date strings and Unix epoch integers, depending on the data platform you use.
## Default
diff --git a/website/docs/reference/resource-configs/where.md b/website/docs/reference/resource-configs/where.md
index 63c40b99902..3ea6497423a 100644
--- a/website/docs/reference/resource-configs/where.md
+++ b/website/docs/reference/resource-configs/where.md
@@ -134,7 +134,9 @@ You can override this behavior by:
Within this macro definition, you can reference whatever custom macros you want, based on static inputs from the configuration. At simplest, this enables you to DRY up code that you'd otherwise need to repeat across many different `.yml` files. Because the `get_where_subquery` macro is resolved at runtime, your custom macros can also include [fetching the results of introspective database queries](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query).
-**Example:** Filter your test to the past three days of data, using dbt's cross-platform [`dateadd()`](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros#dateadd) utility macro.
+#### Example
+
+Filter your test to the past N days of data, using dbt's cross-platform [`dateadd()`](/reference/dbt-jinja-functions/cross-database-macros#dateadd) utility macro. You can set the number of days in the placeholder string.
@@ -147,7 +149,7 @@ models:
tests:
- unique:
config:
- where: "date_column > __three_days_ago__" # placeholder string for static config
+ where: "date_column > __3_days_ago__" # placeholder string for static config
```
@@ -158,10 +160,9 @@ models:
{% macro get_where_subquery(relation) -%}
{% set where = config.get('where') %}
{% if where %}
- {% if "__three_days_ago__" in where %}
+ {% if "_days_ago__" in where %}
{# replace placeholder string with result of custom macro #}
- {% set three_days_ago = dbt.dateadd('day', -3, current_timestamp()) %}
- {% set where = where | replace("__three_days_ago__", three_days_ago) %}
+ {% set where = replace_days_ago(where) %}
{% endif %}
{%- set filtered -%}
(select * from {{ relation }} where {{ where }}) dbt_subquery
@@ -171,6 +172,21 @@ models:
{% do return(relation) %}
{%- endif -%}
{%- endmacro %}
+
+{% macro replace_days_ago(where_string) %}
+    {# Use regex to search the where string for the number of days in the placeholder #}
+    {# Default to 3 days when no number is found #}
+ {% set re = modules.re %}
+ {% set days = 3 %}
+ {% set pattern = '__(\d+)_days_ago__' %}
+ {% set match = re.search(pattern, where_string) %}
+ {% if match %}
+ {% set days = match.group(1) | int %}
+ {% endif %}
+ {% set n_days_ago = dbt.dateadd('day', -days, current_timestamp()) %}
+ {% set result = re.sub(pattern, n_days_ago, where_string) %}
+ {{ return(result) }}
+{% endmacro %}
```
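+
+With this macro in place, a placeholder such as `__7_days_ago__` in the `where` config resolves at runtime to your adapter's equivalent of `dateadd(day, -7, current_timestamp())`.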
diff --git a/website/docs/reference/resource-properties/description.md b/website/docs/reference/resource-properties/description.md
index cf7b2b29a5a..a542b9aba79 100644
--- a/website/docs/reference/resource-properties/description.md
+++ b/website/docs/reference/resource-properties/description.md
@@ -14,6 +14,7 @@ description: "This guide explains how to use the description key to add YAML des
{ label: 'Analyses', value: 'analyses', },
{ label: 'Macros', value: 'macros', },
{ label: 'Data tests', value: 'data_tests', },
+ { label: 'Unit tests', value: 'unit_tests', },
]
}>
@@ -150,24 +151,81 @@ macros:
+You can add a description to a [singular data test](/docs/build/data-tests#singular-data-tests) or a [generic data test](/docs/build/data-tests#generic-data-tests).
+
```yml
+# Singular data test example
+
version: 2
data_tests:
- name: data_test_name
description: markdown_string
-
```
+
+
+
+
+```yml
+# Generic data test example
+
+version: 2
+models:
+ - name: model_name
+ columns:
+ - name: column_name
+ tests:
+ - unique:
+ description: markdown_string
+```
-The `description` property is available for generic and singular data tests beginning in dbt v1.9.
+The `description` property is available for [singular data tests](/docs/build/data-tests#singular-data-tests) and [generic data tests](/docs/build/data-tests#generic-data-tests) beginning in dbt v1.9.
+
+
+
+
+
+
+
+
+
+
+
+```yml
+unit_tests:
+  - name: unit_test_name
+    description: "markdown_string"
+    model: model_name
+    given:
+      - input: ref_or_source_call
+        rows:
+          - {column_name: column_value}
+          - {column_name: column_value}
+          - {column_name: column_value}
+          - {column_name: column_value}
+      - input: ref_or_source_call
+        format: csv
+        rows: dictionary | string
+    expect:
+      format: dict | csv | sql
+      fixture: fixture_name
+```
+
+
+
+
+
+
+
+The `description` property is available for [unit tests](/docs/build/unit-tests) beginning in dbt v1.8.
@@ -176,13 +234,17 @@ The `description` property is available for generic and singular data tests begi
## Definition
-A user-defined description. Can be used to document:
+
+A user-defined description used to document:
+
- a model, and model columns
- sources, source tables, and source columns
- seeds, and seed columns
- snapshots, and snapshot columns
- analyses, and analysis columns
- macros, and macro arguments
+- data tests, and data test columns
+- unit tests for models
These descriptions are used in the documentation website rendered by dbt (refer to [the documentation guide](/docs/build/documentation) or [dbt Explorer](/docs/collaborate/explore-projects)).
@@ -196,6 +258,18 @@ Be mindful of YAML semantics when providing a description. If your description c
## Examples
+This section contains examples of how to add descriptions to various resources:
+
+- [Add a simple description to a model and column](#add-a-simple-description-to-a-model-and-column)
+- [Add a multiline description to a model](#add-a-multiline-description-to-a-model)
+- [Use some markdown in a description](#use-some-markdown-in-a-description)
+- [Use a docs block in a description](#use-a-docs-block-in-a-description)
+- [Link to another model in a description](#link-to-another-model-in-a-description)
+- [Include an image from your repo in your descriptions](#include-an-image-from-your-repo-in-your-descriptions)
+- [Include an image from the web in your descriptions](#include-an-image-from-the-web-in-your-descriptions)
+- [Add a description to a data test](#add-a-description-to-a-data-test)
+- [Add a description to a unit test](#add-a-description-to-a-unit-test)
+
### Add a simple description to a model and column
@@ -400,3 +474,80 @@ models:
If mixing images and text, also consider using a docs block.
+### Add a description to a data test
+
+
+
+
+
+
+
+You can add a `description` property to a generic or singular data test.
+
+#### Generic data test
+
+This example shows a generic data test that checks for unique values in a column for the `orders` model.
+
+
+
+```yaml
+version: 2
+
+models:
+ - name: orders
+ columns:
+ - name: order_id
+ tests:
+ - unique:
+ description: "The order_id is unique for every row in the orders model"
+```
+
+
+#### Singular data test
+
+This example shows a singular data test that checks that the total payment amount in the `payments` model is never negative (≥ 0).
+
+
+
+```yaml
+version: 2
+data_tests:
+ - name: assert_total_payment_amount_is_positive
+ description: >
+ Refunds have a negative amount, so the total amount should always be >= 0.
+ Therefore return records where total amount < 0 to make the test fail.
+
+```
+
+
+Note that for the test to run, the `tests/assert_total_payment_amount_is_positive.sql` file must exist in your project.
+
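+A minimal sketch of what that file could contain, assuming a `payments` model with `order_id` and `amount` columns:
+
+```sql
+-- tests/assert_total_payment_amount_is_positive.sql
+select
+    order_id,
+    sum(amount) as total_amount
+from {{ ref('payments') }}
+group by 1
+having sum(amount) < 0
+```
+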
+### Add a description to a unit test
+
+
+
+
+
+
+
+This example shows a unit test that checks that the `opened_at` timestamp is properly truncated to a date in the `stg_locations` model.
+
+
+
+```yaml
+unit_tests:
+ - name: test_does_location_opened_at_trunc_to_date
+ description: "Check that opened_at timestamp is properly truncated to a date."
+ model: stg_locations
+ given:
+ - input: source('ecom', 'raw_stores')
+ rows:
+ - {id: 1, name: "Rego Park", tax_rate: 0.2, opened_at: "2016-09-01T00:00:00"}
+ - {id: 2, name: "Jamaica", tax_rate: 0.1, opened_at: "2079-10-27T23:59:59.9999"}
+ expect:
+ rows:
+ - {location_id: 1, location_name: "Rego Park", tax_rate: 0.2, opened_date: "2016-09-01"}
+ - {location_id: 2, location_name: "Jamaica", tax_rate: 0.1, opened_date: "2079-10-27"}
+```
+
+
diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md
index 018988a4934..4fcc4e8a24d 100644
--- a/website/docs/reference/snapshot-configs.md
+++ b/website/docs/reference/snapshot-configs.md
@@ -284,19 +284,18 @@ Snapshots can be configured in multiple ways:
-1. Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [the dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher).
+1. Defined in YAML files using the `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) or whichever folder you prefer. Available in [the dbt Cloud release tracks](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher.
2. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys.
-1. Defined in a YAML file using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher). The latest snapshot YAML syntax provides faster and more efficient management.
-2. Using a `config` block within a snapshot defined in Jinja SQL.
-3. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys.
-
+1. Using a `config` block within a snapshot defined in Jinja SQL.
+2. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys.
+3. Defined in a YAML file using the `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher).
-Snapshot configurations are applied hierarchically in the order above with higher taking precedence.
+Snapshot configurations are applied hierarchically in the order above with higher taking precedence. You can also apply [tests](/reference/snapshot-properties) to snapshots using the [`tests` property](/reference/resource-properties/data-tests).
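+
+For instance, you could attach column tests to a snapshot in its properties file; a minimal sketch (the snapshot and column names are illustrative):
+
+```yaml
+snapshots:
+  - name: orders_snapshot
+    columns:
+      - name: id
+        tests:
+          - unique
+          - not_null
+```
+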
### Examples
diff --git a/website/docs/terms/elt.md b/website/docs/terms/elt.md
deleted file mode 100644
index 0e7d11bf7dd..00000000000
--- a/website/docs/terms/elt.md
+++ /dev/null
@@ -1,139 +0,0 @@
----
-id: elt
-title: What is ELT (Extract, Load, Transform)?
-description: ELT is the process of first extraction data from different sources, then loading it into a data warehouse, and finally transforming it.
-displayText: ELT
-hoverSnippet: Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, loading it into a target data warehouse, and finally transforming it.
----
-
- What is ELT (Extract, Load, Transform)? How does it differ from ETL?
-
-Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, then loading it into a target , and finally transforming it.
-
-ELT has emerged as a paradigm for how to manage information flows in a modern data warehouse. This represents a fundamental shift from how data previously was handled when Extract, Transform, Load (ETL) was the data workflow most companies implemented.
-
-Transitioning from ETL to ELT means that you no longer have to capture your transformations during the initial loading of the data into your data warehouse. Rather, you are able to load all of your data, then build transformations on top of it. Data teams report that the ELT workflow has several advantages over the traditional ETL workflow which we’ll go over [in-depth later in this glossary](#benefits-of-elt).
-
-## How ELT works
-
-In an ELT process, data is extracted from data sources, loaded into a target data platform, and finally transformed for analytics use. We’ll go over the three components (extract, load, transform) in detail here.
-
-![Diagram depicting the ELT workflow. Data is depicted being extracted from example data sources like an Email CRM, Facebook Ads platform, Backend databases, and Netsuite. The data is then loaded as raw data into a data warehouse. From there, the data is transformed within the warehouse by renaming, casting, joining, or enriching the raw data. The result is then modeled data inside your data warehouse.](/img/docs/terms/elt/elt-diagram.png)
-
-### Extract
-
-In the extraction process, data is extracted from multiple data sources. The data extracted is, for the most part, data that teams eventually want to use for analytics work. Some examples of data sources can include:
-
-- Backend application databases
-- Marketing platforms
-- Email and sales CRMs
-- and more!
-
-Accessing these data sources using Application Programming Interface (API) calls can be a challenge for individuals and teams who don't have the technical expertise or resources to create their own scripts and automated processes. However, the recent development of certain open-source and Software as a Service (SaaS) products has removed the need for this custom development work. By establishing the option to create and manage pipelines in an automated way, you can extract the data from data sources and load it into data warehouses via a user interface.
-
-Since not every data source will integrate with SaaS tools for extraction and loading, it’s sometimes inevitable that teams will write custom ingestion scripts in addition to their SaaS tools.
-
-### Load
-
-During the loading stage, data that was extracted is loaded into the target data warehouse. Some examples of modern data warehouses include Snowflake, Amazon Redshift, and Google BigQuery. Examples of other data storage platforms include data lakes such as Databricks’s Data Lakes. Most of the SaaS applications that extract data from your data sources will also load it into your target data warehouse. Custom or in-house extraction and load processes usually require strong data engineering and technical skills.
-
-At this point in the ELT process, the data is mostly unchanged from its point of extraction. If you use an extraction and loading tool like Fivetran, there may have been some light normalization on your data. But for all intents and purposes, the data loaded into your data warehouse at this stage is in its raw format.
-
-### Transform
-
-In the final transformation step, the raw data that has been loaded into your data warehouse is finally ready for modeling! When you first look at this data, you may notice a few things about it…
-
-- Column names may or may not be clear
-- Some columns are potentially the incorrect data type
-- Tables are not joined to other tables
-- Timestamps may be in the incorrect timezone for your reporting
-- fields may need to be unnested
-- Tables may be missing primary keys
-- And more!
-
-...hence the need for transformation! During the transformation process, data from your data sources is usually:
-
-- **Lightly Transformed**: Fields are cast correctly, timestamp fields’ timezones are made uniform, tables and fields are renamed appropriately, and more.
-- **Heavily Transformed**: Business logic is added, appropriate materializations are established, data is joined together, etc.
-- **QA’d**: Data is tested according to business standards. In this step, data teams may ensure primary keys are unique, model relations match-up, column values are appropriate, and more.
-
-Common ways to transform your data include leveraging modern technologies such as dbt, writing custom SQL scripts that are automated by a scheduler, utilizing stored procedures, and more.
-
-## ELT vs ETL
-
-The primary difference between the traditional ETL and the modern ELT workflow is when [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) and loading take place. In ETL workflows, data extracted from data sources is transformed prior to being loaded into target data platforms. Newer ELT workflows have data being transformed after being loaded into the data platform of choice. Why is this such a big deal?
-
-| | ELT | ETL |
-|---|---|---|
-| Programming skills required| Often little to no code to extract and load data into your data warehouse. | Often requires custom scripts or considerable data engineering lift to extract and transform data prior to load. |
-| Separation of concerns | Extraction, load, and transformation layers can be explicitly separated out by different products. | ETL processes are often encapsulated in one product. |
-| Distribution of transformations | Since transformations take place last, there is greater flexibility in the modeling process. Worry first about getting your data in one place, then you have time to explore the data to understand the best way to transform it. | Because transformation occurs before data is loaded into the target location, teams must conduct thorough work prior to make sure data is transformed properly. Heavy transformations often take place downstream in the BI layer. |
-| [Data team distribution](https://www.getdbt.com/data-teams/analytics-job-descriptions/) | ELT workflows empower data team members who know SQL to create their own extraction and loading pipelines and transformations. | ETL workflows often require teams with greater technical skill to create and maintain pipelines. |
-
-Why has ELT adoption grown so quickly in recent years? A few reasons:
-
-- **The abundance of cheap cloud storage with modern data warehouses.** The creation of modern data warehouses such Redshift and Snowflake has made it so teams of all sizes can store and scale their data at a more efficient cost. This was a huge enabler for the ELT workflow.
-- **The development of low-code or no-code data extractors and loaders.** Products that require little technical expertise such as Fivetran and Stitch, which can extract data from many data sources and load it into many different data warehouses, have helped lower the barrier of entry to the ELT workflow. Data teams can now relieve some of the data engineering lift needed to extract data and create complex transformations.
-- **A true code-based, version-controlled transformation layer with the development of dbt.** Prior to the development of dbt, there was no singular transformation layer product. dbt helps data analysts apply software engineering best practices (version control, CI/CD, and testing) to data transformation, ultimately allowing for anyone who knows SQL to be a part of the ELT process.
-- **Increased compatibility between ELT layers and technology in recent years.** With the expansion of extraction, loading, and transformation layers that integrate closely together and with cloud storage, the ELT workflow has never been more accessible. For example, Fivetran creates and maintains [dbt packages](https://hub.getdbt.com/) to help write dbt transformations for the data sources they connect to.
-
-## Benefits of ELT
-
-You often hear about the benefits of the ELT workflow to data, but you can sometimes forget to talk about the benefits it brings to people. There are a variety of benefits that this workflow brings to the actual data (which we’ll outline in detail below), such as the ability to recreate historical transformations, test data and data models, and more. We'll also want to use this section to emphasize the empowerment the ELT workflow brings to both data team members and business stakeholders.
-
-### ELT benefit #1: Data as code
-
-Ok we said it earlier: The ELT workflow allows data teams to function like software engineers. But what does this really mean? How does it actually impact your data?
-
-#### Analytics code can now follow the same best practices as software code
-
-At its core, data transformations that occur last in a data pipeline allow for code-based and version-controlled transformations. These two factors alone permit data team members to:
-
-- Easily recreate historical transformations by rolling back commits
-- Establish code-based tests
-- Implement CI/CD workflows
-- Document data models like typical software code.
-
-#### Scaling, made sustainable
-
-As your business grows, the number of data sources correspondingly increases along with it. As such, so do the number of transformations and models needed for your business. Managing a high number of transformations without version control or automation is not scalable.
-
-The ELT workflow capitalizes on transformations occurring last to provide flexibility and software engineering best practices to data transformation. Instead of having to worry about how your extraction scripts scale as your data increases, data can be extracted and loaded automatically with a few clicks.
-
-### ELT benefit #2: Bring the power to the people
-
-The ELT workflow opens up a world of opportunity for the people that work on that data, not just the data itself.
-
-#### Empowers data team members
-
-Data analysts, analytics engineers, and even data scientists no longer have to be dependent on data engineers to create custom pipelines and models. Instead, they can use point-and-click products such as Fivetran and Airbyte to extract and load the data for them.
-
-Having the transformation as the final step in the ELT workflow also allows data folks to leverage their understanding of the data and SQL to focus more on actually modeling the data.
-
-#### Promotes greater transparency for end busines users
-
-Data teams can expose the version-controlled code used to transform data for analytics to end business users by no longer having transformations hidden in the ETL process. Instead of having to manually respond to the common question, “How is this data generated?” data folks can direct business users to documentation and repositories. Having end business users involved or viewing the data transformations promote greater collaboration and awareness between business and data folks.
-
-## ELT tools
-
-As mentioned earlier, the recent development of certain technologies and products has helped lower the barrier of entry to implementing the ELT workflow. Most of these new products act as one or two parts of the ELT process, but some have crossover across all three parts. We’ll outline some of the current tools in the ELT ecosystem below.
-
-| Product | E/L/T? | Description | Open source option? |
-|---|---|---|---|
-| Fivetran/HVR | E, some T, L | Fivetran is a SaaS company that helps data teams extract, load, and perform some transformation on their data. Fivetran easily integrates with modern data warehouses and dbt. They also offer transformations that leverage dbt Core. | :x: |
-| Stitch by Talend | E, L | Stitch (part of Talend) is another SaaS product that has many data connectors to extract data and load it into data warehouses. | :x: |
-| Airbyte | E, L | Airbyte is an open-source and cloud service that allows teams to create data extraction and load pipelines. | :white_check_mark: |
-| Funnel | E, some T, L | Funnel is another product that can extract and load data. Funnel’s data connectors are primarily focused around marketing data sources. | :x: |
-| dbt | T | dbt is the transformation tool that enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. dbt offers both an open-source and cloud-based product. | :white_check_mark: |
-
-## Conclusion
-
-The past few years have been a whirlwind for the data world. The increased accessibility and affordability of cloud warehouses, no-code data extractors and loaders, and a true transformation layer with dbt has allowed for the ELT workflow to become the preferred analytics workflow. ETL predates ELT and differs in when data is transformed. In both processes, data is first extracted from different sources. However, in ELT processes, data is loaded into the target data platform and then transformed. The ELT workflow ultimately allows for data team members to extract, load, and model their own data in a flexible, accessible, and scalable way.
-
-## Further reading
-
-Here's some of our favorite content about the ELT workflow:
-
-- [The case for the ELT workflow](https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/)
-- [A love letter to ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/)
-- [What is dbt?](https://getdbt.com/product/what-is-dbt/)
diff --git a/website/docs/terms/etl.md b/website/docs/terms/etl.md
deleted file mode 100644
index 321f59a65d0..00000000000
--- a/website/docs/terms/etl.md
+++ /dev/null
@@ -1,130 +0,0 @@
----
-id: etl
-title: What is ETL (Extract, Transform, Load)?
-description: ETL is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
-displayText: ETL
-hoverSnippet: Extract, Transform, Load (ETL) is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
----
-
-
- What is ETL (Extract, Transform, Load)? How has it evolved?
-
-
-ETL, or “Extract, Transform, Load”, is the process of first extracting data from a data source, transforming it, and then loading it into a target . In ETL workflows, much of the meaningful [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) occurs outside this primary pipeline in a downstream business intelligence (BI) platform.
-
-ETL is contrasted with the newer (Extract, Load, Transform) workflow, where transformation occurs after data has been loaded into the target data warehouse. In many ways, the ETL workflow could have been renamed the ETLT workflow, because a considerable portion of meaningful data transformations happen outside the data pipeline. The same transformations can occur in both ETL and ELT workflows, the primary difference is *when* (inside or outside the primary ETL workflow) and *where* the data is transformed (ETL platform/BI tool/data warehouse).
-
-It’s important to talk about ETL and understand how it works, where it provides value, and how it can hold people back. If you don’t talk about the benefits and drawbacks of systems, how can you expect to improve them?
-
-## How ETL works
-
-In an ETL process, data is first extracted from a source, transformed, and then loaded into a target data platform. We’ll go into greater depth for all three steps below.
-
-![A diagram depicting the ETL workflow. The diagram starts by depicting raw data being extracted from various example data sources like an email CRM, Facebook Ads platform, a backend database, and Netsuite. Once the data is extracted, the raw data is transformed within the data pipeline via renaming, casting, joining, and enriching. After the data is transformed within the data pipeline, the modeled data is loaded into a data warehouse.](/img/docs/terms/etl/etl-diagram.png)
-
-### Extract
-
-In this first step, data is extracted from different data sources. Data that is extracted at this stage is likely going to be eventually used by end business users to make decisions. Some examples of these data sources include:
-
-- Ad platforms (Facebook Ads, Google Ads, etc.)
-- Backend application databases
-- Sales CRMs
-- And more!
-
-To actually get this data, data engineers may write custom scripts that make Application Programming Interface (API) calls to extract all the relevant data. Because making and automating these API calls gets harder as data sources and data volume grows, this method of extraction often requires strong technical skills. In addition, these extraction scripts also involve considerable maintenance since APIs change relatively often. Data engineers are often incredibly competent at using different programming languages such as Python and Java. Data teams can also extract from these data sources with open source and Software as a Service (SaaS) products.
-
-### Transform
-
-At this stage, the raw data that has been extracted is normalized and modeled. In ETL workflows, much of the actual meaningful business logic, metric calculations, and entity joins tend to happen further down in a downstream BI platform. As a result, the transformation stage here is focused on data cleanup and normalization – renaming of columns, correct casting of fields, timestamp conversions.
-
-To actually transform the data, there’s two primary methods teams will use:
-
-- **Custom solutions**: In this solution, data teams (typically data engineers on the team), will write custom scripts and create automated pipelines to transform the data. Unlike ELT transformations that typically use SQL for modeling, ETL transformations are often written in other programming languages such as Python or Scala. Data engineers may leverage technologies such as Apache Spark or Hadoop at this point to help process large volumes of data.
-- **ETL products**: There are ETL products that will extract, transform, and load your data in one platform. [These tools](#etl-tools) often involve little to no code and instead use Graphical User Interfaces (GUI) to create pipelines and transformations.
-
-### Load
-
-In the final stage, the transformed data is loaded into your target data warehouse. Once this transformed data is in its final destination, it’s most commonly exposed to end business users either in a BI tool or in the data warehouse directly.
-
-The ETL workflow implies that your raw data does not live in your data warehouse. *Because transformations occur before load, only transformed data lives in your data warehouse in the ETL process.* This can make it harder to ensure that transformations are performing the correct functionality.
-
-## How ETL is being used
-
-While ELT adoption is growing, we still see ETL use cases for processing large volumes of data and adhering to strong data governance principles.
-
-### ETL to efficiently normalize large volumes of data
-
-ETL can be an efficient way to perform simple normalizations across large data sets. Doing these lighter transformations across a large volume of data during loading can help get the data formatted properly and quickly for downstream use. In addition, end business users sometimes need quick access to raw or somewhat normalized data. Through an ETL workflow, data teams can conduct lightweight transformations on data sources and quickly expose them in their target data warehouse and downstream BI tool.
-
-### ETL for hashing PII prior to load
-
-Some companies will want to mask, hash, or remove PII values before it enters their data warehouse. In an ETL workflow, teams can transform PII to hashed values or remove them completely during the loading process. This limits where PII is available or accessible in an organization’s data warehouse.
-
-## ETL challenges
-
-There are reasons ETL has persisted as a workflow for over twenty years. However, there are also reasons why there’s been such immense innovation in this part of the data world in the past decade. From our perspective, the technical and human limitations we describe below are some of the reasons ELT has surpassed ETL as the preferred workflow.
-
-### ETL challenge #1: Technical limitations
-
-**Limited or lack of version control**
-
-When transformations exist as standalone scripts or deeply woven in ETL products, it can be hard to version control the transformations. Not having version control on transformation as code means that data teams can’t easily recreate or rollback historical transformations and perform code reviews.
-
-**Immense amount of business logic living in BI tools**
-
-Some teams with ETL workflows only implement much of their business logic in their BI platform versus earlier in their transformation phase. While most organizations have some business logic in their BI tools, an excess of this logic downstream can make rendering data in the BI tool incredibly slow and potentially hard to track if the code in the BI tool is not version controlled or exposed in documentation.
-
-**Challenging QA processes**
-
-While data quality testing can be done in ETL processes, not having the raw data living somewhere in the data warehouse inevitably makes it harder to ensure data models are performing the correct functionality. In addition, quality control continually gets harder as the number of data sources and pipelines within your system grows.
-
-### ETL challenge #2: Human limitations
-
-**Data analysts can be excluded from ETL work**
-
-Because ETL workflows often involve incredibly technical processes, they've restricted data analysts from being involved in the data workflow process. One of the greatest strengths of data analysts is their knowledge of the data and SQL, and when extractions and transformations involve unfamiliar code or applications, they and their expertise can be left out of the process. Data analysts and scientists also become dependent on other people to create the schemas, tables, and datasets they need for their work.
-
-**Business users are kept in the dark**
-
-Transformations and business logic can often be buried deep in custom scripts, ETL tools, and BI platforms. At the end of the day, this can hurt business users: They're kept out of the data modeling process and have limited views into how data transformation takes place. As a result, end business users often have little clarity on data definition, quality, and freshness, which ultimately can decrease trust in the data and data team.
-
-## ETL vs ELT
-
-You may read other articles or technical documents that use ETL and ELT interchangeably. On paper, the only difference is the order in which the T and the L appear. However, this mere switching of letters dramatically changes the way data exists in and flows through a business’ system.
-
-In both processes, data from different data sources is extracted in similar ways. However, in ELT, data is then directly loaded into the target data platform versus being transformed in ETL. Now, via ELT workflows, both raw and transformed data can live in a data warehouse. In ELT workflows, data folks have the flexibility to model the data after they’ve had the opportunity to explore and analyze the raw data. ETL workflows can be more constraining since transformations happen immediately after extraction. We break down some of the other major differences between the two below:
-
-| | ELT | ETL |
-|---|---|---|
-| Programming skills required | Often requires little to no code to extract and load data into your data warehouse. | Often requires custom scripts or considerable data engineering lift to extract and transform data prior to load. |
-| Separation of concerns | Extraction, load, and transformation layers can be explicitly separated out by different products. | ETL processes are often encapsulated in one product. |
-| Distribution of transformations | Since transformations take place last, there is greater flexibility in the modeling process. Worry first about getting your data in one place, then you have time to explore the data to understand the best way to transform it. | Because transformation occurs before data is loaded into the target location, teams must conduct thorough work prior to make sure data is transformed properly. Heavy transformations often take place downstream in the BI layer. |
-| [Data team roles](https://www.getdbt.com/data-teams/analytics-job-descriptions/) | ELT workflows empower data team members who know SQL to create their own extraction and loading pipelines and transformations. | ETL workflows often require teams with greater technical skill to create and maintain pipelines. |
-
-While ELT is growing in adoption, it’s still important to talk about when ETL might be appropriate and where you'll see challenges with the ETL workflow.
-
-## ETL tools
-
-There exists a variety of ETL technologies to help teams get data into their data warehouse. A good portion of ETL tools on the market today are geared toward enterprise businesses and teams, but there are some that are also applicable for smaller organizations.
-
-| Platform | E/T/L? | Description | Open source option? |
-|---|---|---|---|
-| Informatica | E, T, L | An all-purpose ETL platform that supports low or no-code extraction, transformations and loading. Informatica also offers a broad suite of data management solutions beyond ETL and is often leveraged by enterprise organizations. | :x: |
-| Integrate.io | E, T, L | A newer ETL product focused on both low-code ETL as well as reverse ETL pipelines. | :x: |
-| Matillion | E, T, L | Matillion is an end-to-end ETL solution with a variety of native data connectors and GUI-based transformations. | :x: |
-| Microsoft SISS | E, T, L | Microsoft’s SQL Server Integration Services (SISS) offers a robust, GUI-based platform for ETL services. SISS is often used by larger enterprise teams. | :x: |
-| Talend Open Studio | E, T, L | An open source suite of GUI-based ETL tools. | :white_check_mark: |
-
-## Conclusion
-
-ETL, or “Extract, Transform, Load,” is the process of extracting data from different data sources, transforming it, and loading that transformed data into a data warehouse. ETL typically supports lighter transformations during the phase prior to loading and more meaningful transformations to take place in downstream BI tools. We’re seeing now that ETL is fading out and the newer ELT workflow is replacing it as a practice for many data teams. However, it’s important to note that ETL allowed us to get us to where we are today: Capable of building workflows that extract data within simple UIs, store data in scalable cloud data warehouses, and write data transformations like software engineers.
-
-## Further Reading
-
-Please check out some of our favorites reads regarding ETL and ELT below:
-
-- [Glossary: ELT](https://docs.getdbt.com/terms/elt)
-- [The case for the ELT workflow](https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/)
-- [A love letter to ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/)
-- [Reverse ETL](https://www.getdbt.com/analytics-engineering/use-cases/operational-analytics/)
-
diff --git a/website/docs/terms/reverse-etl.md b/website/docs/terms/reverse-etl.md
deleted file mode 100644
index a3ccd0b0f70..00000000000
--- a/website/docs/terms/reverse-etl.md
+++ /dev/null
@@ -1,94 +0,0 @@
----
-id: reverse-etl
-title: Reverse ETL
-description: Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
-displayText: reverse ETL
-hoverSnippet: Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
----
-
-
- Reverse ETL, demystified: What it is in plain english
-
-
-Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms. Once in an end platform, that data is often used to drive meaningful business actions, such as creating custom audiences in ad platforms, personalizing email campaigns, or supplementing data in a sales CRM. You may also hear about reverse ETL referred to as operational analytics or data activation.
-
-Reverse ETL efforts typically happen after data teams have set up their [modern data stack](https://www.getdbt.com/blog/future-of-the-modern-data-stack/) and ultimately have a consistent and automated way to extract, load, and transform data. Data teams are also often responsible for setting up the pipelines to send down data to business platforms, and business users are typically responsible for *using the data* once it gets to their end platform.
-
-Ultimately, reverse ETL is a way to put data where the work is already happening, support self-service efforts, and help business users derive real action out of their data.
-
-## How reverse ETL works
-
-In the reverse ETL process, transformed data is synced from a data warehouse to external tools in order to be leveraged by different business teams.
-
-![A diagram depicting how the reverse ETL process works. It starts with data being extract from data sources like email CRMs, Facebook Ad platforms, backend databases, and NetSuite. The raw data is then loaded into a data warehouse. After loading, the data is transformed and modeled. The modeled data is then loaded directly back into the tools that created the data, like Email CRMs, Facebook Ad platforms, and others so the insights are more accessible to business users.](/img/docs/terms/reverse-etl/reverse-etl-diagram.png)
-
-The power of reverse ETL comes from sending down *already transformed data* to business platforms. Raw data, while beautiful in its own way, typically lacks the structure, aggregations, and aliasing to be useful for end business users off the bat. After data teams transform data for business use in pipelines, typically to expose in an end business intelligence (BI) tool, they can also send this cleaned and meaningful data to other platforms where business users can derive value using [reverse ETL tools](#reverse-etl-tools).
-
-Data teams can choose to write additional transformations that may need to happen for end business tools in reverse ETL tools themselves or by creating [additional models in dbt](https://getdbt.com/open-source-data-culture/reverse-etl-playbook/).
-
-## Why use reverse ETL?
-
-There’s a few reasons why your team may want to consider using reverse ETL:
-
-### Putting data where the work is happening
-
-While most data teams would love it if business users spent a significant portion of their time in their BI tool, that’s neither practical nor necessarily the most efficient use of their time. In the real world, many business users will spend some time in a BI tool, identify the data that could be useful in a platform they spend a significant amount of time in, and work with the data team to get that data where they need it. Users feel comfortable and confident in the systems they use everyday—why not put the data in the places that allow them to thrive?
-
-### Manipulating data to fit end platform requirements
-
-Reverse ETL helps you to put data your business users need *in the format their end tool expects*. Oftentimes, end platforms expect data fields to be named or cast in a certain way. Instead of business users having to manually input those values in the correct format, you can transform your data using a product like dbt or directly in a reverse ETL tool itself, and sync down that data in an automated way.
-
-### Supporting self-service efforts
-
-By sending down data-team approved data in reverse ETL pipelines, your business users have the flexibility to use that data however they see fit. Soon, your business users will be making audiences, testing personalization efforts, and running their end platform like a well-oiled, data-powered machine.
-
-
-## Reverse ETL use cases
-
-Just as there are almost endless opportunities with data, there are many potential different use cases for reverse ETL. We won’t go into every possible option, but we’ll cover some of the common use cases that exist for reverse ETL efforts.
-
-### Personalization
-
-Reverse ETL allows business users to access data that they normally would only have access to in a BI tool *in the platforms they use every day*. As a result, business users can now use this data to personalize how they create ads, send emails, and communicate with customers.
-
-Personalization was all the hype a few years ago and now, you rarely ever see an email come into your inbox without some sort of personalization in-place. Data teams using reverse ETL are able to pass down important customer information, such as location, customer lifetime value (CLV), tenure, and other fields, that can be used to create personalized emails, establish appropriate messaging, and segment email flows. All we can say: the possibilities for personalization powered by reverse ETL are endless.
-
-### Sophisticated paid marketing initiatives
-
-At the end of the day, businesses want to serve the right ads to the right people (and at the right cost). A common use case for reverse ETL is for teams to use their customer data to create audiences in ad platforms to either serve specific audiences or create lookalikes. While ad platforms have gotten increasingly sophisticated with their algorithms to identify high-value audiences, it usually never hurts to try supplementing those audiences with your own data to create sophisticated audiences or lookalikes.
-
-### Self-service analytics culture
-
-We hinted at it earlier, but reverse ETL efforts can be an effective way to promote a self-service analytics culture. When data teams put the data where business users need it, business users can confidently access it on their own, driving even faster insights and action. Instead of requesting a data pull from a data team member, they can find the data they need directly within the platform that they use. Reverse ETL allows business users to act on metrics that have already been built out and validated by data teams without creating ad-hoc requests.
-
-### “Real-time” data
-
-It would be amiss if we didn’t mention reverse ETL and the notion of “real-time” data. While you can have the debate over the meaningfulness and true value-add of real-time data another time, reverse ETL can be a mechanism to bring data to end business platforms in a more “real-time” way.
-
-Data teams can set up syncs in reverse ETL tools at higher cadences, allowing business users to have the data they need, faster. Obviously, there’s some cost-benefit analysis on how often you want to be loading data via [ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/) and hitting your data warehouse, but reverse ETL can help move data into external tools at a quicker cadence if deemed necessary.
-
-All this to say: move with caution in the realm of “real-time”, understand your stakeholders’ wants and decision-making process for real-time data, and work towards a solution that’s both practical and impactful.
-
-## Reverse ETL tools
-
-Reverse ETL tools typically establish the connection between your data warehouse and end business tools, offer an interface to create additional transformations or audiences, and support automation of downstream syncs. Below are some examples of tools that support reverse ETL pipelines.
-
-| Tool | Description | Open source option? |
-|:---:|:---:|:---:|
-| Hightouch | A platform to sync data models and create custom audiences for downstream business platforms. | :x: |
-| Polytomic | A unified sync platform for syncing to and from data warehouses (ETL and Reverse ETL), databases, business apps, APIs, and spreadsheets. | :x: |
-| Census | Another reverse ETL tool that can sync data from your data warehouse to your go-to-market tools. | :x: |
-| Rudderstack | Also a CDP (customer data platform), Rudderstack additionally supports pushing down data and audience to external tools, such as ad platforms and email CRMs. | :white_check_mark: |
-| Grouparoo | Grouparoo, part of Airbyte, is an open source framework to move data from data warehouses to different cloud-based tools. | :white_check_mark: |
-
-## Conclusion
-
-Reverse ETL enables you to sync your transformed data stored in your data warehouse to external platforms often used by marketing, sales, and product teams. It allows you to leverage your data in a whole new way. Reverse ETL pipelines can support personalization efforts, sophisticated paid marketing initiatives, and ultimately offer new ways to leverage your data. In doing this, it creates a self-service analytics culture where stakeholders can receive the data they need in, in the places they need, in an automated way.
-
-## Further reading
-
-If you’re interested learning more about reverse ETL and the impact it could have on your team, check out the following:
-
-- [How dbt Labs’s data team approaches reverse ETL](https://getdbt.com/open-source-data-culture/reverse-etl-playbook/)
-- [The operational data warehouse in action: Reverse ETL, CDPs, and the future of data activation](https://www.getdbt.com/coalesce-2021/operational-data-warehouse-reverse-etl-cdp-data-activation/)
-- [The analytics engineering guide: Operational analytics](https://www.getdbt.com/analytics-engineering/use-cases/operational-analytics/)
diff --git a/website/sidebars.js b/website/sidebars.js
index 3a8f560c297..69da75833a7 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -26,11 +26,12 @@ const sidebarSettings = {
label: "About dbt Cloud",
link: { type: "doc", id: "docs/cloud/about-cloud/dbt-cloud-features" },
items: [
- "docs/cloud/about-cloud/dbt-cloud-features",
"docs/cloud/about-cloud/architecture",
- "docs/cloud/about-cloud/tenancy",
- "docs/cloud/about-cloud/access-regions-ip-addresses",
"docs/cloud/about-cloud/browsers",
+ "docs/cloud/about-cloud/change-your-dbt-cloud-theme",
+ "docs/cloud/about-cloud/dbt-cloud-features",
+ "docs/cloud/about-cloud/access-regions-ip-addresses",
+ "docs/cloud/about-cloud/tenancy",
],
}, // About dbt Cloud directory
{
@@ -289,9 +290,9 @@ const sidebarSettings = {
items: [
"docs/cloud/dbt-cloud-ide/develop-in-the-cloud",
"docs/cloud/dbt-cloud-ide/keyboard-shortcuts",
- "docs/cloud/dbt-cloud-ide/ide-user-interface",
- "docs/cloud/dbt-cloud-ide/lint-format",
"docs/cloud/dbt-cloud-ide/git-commit-signing",
+ "docs/cloud/dbt-cloud-ide/lint-format",
+ "docs/cloud/dbt-cloud-ide/ide-user-interface",
{
type: "category",
label: "dbt Copilot",
@@ -366,9 +367,9 @@ const sidebarSettings = {
items: [
"docs/build/about-metricflow",
"docs/build/join-logic",
- "docs/build/validation",
"docs/build/metricflow-time-spine",
"docs/build/metricflow-commands",
+ "docs/build/validation",
],
},
{
@@ -438,10 +439,10 @@ const sidebarSettings = {
link: { type: "doc", id: "docs/build/enhance-your-code" },
items: [
"docs/build/enhance-your-code",
- "docs/build/project-variables",
"docs/build/environment-variables",
- "docs/build/packages",
"docs/build/hooks-operations",
+ "docs/build/packages",
+ "docs/build/project-variables",
],
},
{
@@ -500,13 +501,13 @@ const sidebarSettings = {
link: { type: "doc", id: "docs/deploy/monitor-jobs" },
items: [
"docs/deploy/monitor-jobs",
- "docs/deploy/run-visibility",
- "docs/deploy/retry-jobs",
+ "docs/deploy/artifacts",
"docs/deploy/job-notifications",
"docs/deploy/model-notifications",
- "docs/deploy/webhooks",
- "docs/deploy/artifacts",
+ "docs/deploy/run-visibility",
+ "docs/deploy/retry-jobs",
"docs/deploy/source-freshness",
+ "docs/deploy/webhooks",
],
},
"docs/deploy/deployment-tools",
@@ -524,12 +525,12 @@ const sidebarSettings = {
link: { type: "doc", id: "docs/collaborate/explore-projects" },
items: [
"docs/collaborate/explore-projects",
- "docs/collaborate/data-health-signals",
"docs/collaborate/access-from-dbt-cloud",
"docs/collaborate/column-level-lineage",
+ "docs/collaborate/data-health-signals",
+ "docs/collaborate/explore-multiple-projects",
"docs/collaborate/model-performance",
"docs/collaborate/project-recommendations",
- "docs/collaborate/explore-multiple-projects",
"docs/collaborate/dbt-explorer-faqs",
{
type: "category",
@@ -729,8 +730,8 @@ const sidebarSettings = {
link: { type: "doc", id: "docs/dbt-cloud-apis/sl-api-overview" },
items: [
"docs/dbt-cloud-apis/sl-api-overview",
- "docs/dbt-cloud-apis/sl-jdbc",
"docs/dbt-cloud-apis/sl-graphql",
+ "docs/dbt-cloud-apis/sl-jdbc",
"docs/dbt-cloud-apis/sl-python",
],
},
@@ -809,6 +810,7 @@ const sidebarSettings = {
items: [
"docs/dbt-versions/dbt-cloud-release-notes",
"docs/dbt-versions/compatible-track-changelog",
+ "docs/dbt-versions/2024-release-notes",
"docs/dbt-versions/2023-release-notes",
"docs/dbt-versions/2022-release-notes",
{
@@ -851,18 +853,18 @@ const sidebarSettings = {
"reference/project-configs/asset-paths",
"reference/project-configs/clean-targets",
"reference/project-configs/config-version",
- "reference/project-configs/seed-paths",
"reference/project-configs/dispatch-config",
"reference/project-configs/docs-paths",
"reference/project-configs/macro-paths",
- "reference/project-configs/packages-install-path",
"reference/project-configs/name",
"reference/project-configs/on-run-start-on-run-end",
+ "reference/project-configs/packages-install-path",
"reference/project-configs/profile",
"reference/project-configs/query-comment",
"reference/project-configs/quoting",
"reference/project-configs/require-dbt-version",
"reference/project-configs/snapshot-paths",
+ "reference/project-configs/seed-paths",
"reference/project-configs/model-paths",
"reference/project-configs/test-paths",
"reference/project-configs/version",
@@ -926,27 +928,27 @@ const sidebarSettings = {
type: "category",
label: "General configs",
items: [
+ "reference/advanced-config-usage",
"reference/resource-configs/access",
"reference/resource-configs/alias",
"reference/resource-configs/batch-size",
"reference/resource-configs/begin",
+ "reference/resource-configs/contract",
"reference/resource-configs/database",
+ "reference/resource-configs/docs",
"reference/resource-configs/enabled",
"reference/resource-configs/event-time",
"reference/resource-configs/full_refresh",
- "reference/resource-configs/contract",
"reference/resource-configs/grants",
"reference/resource-configs/group",
- "reference/resource-configs/docs",
"reference/resource-configs/lookback",
+ "reference/resource-configs/meta",
"reference/resource-configs/persist_docs",
+ "reference/resource-configs/plus-prefix",
"reference/resource-configs/pre-hook-post-hook",
"reference/resource-configs/schema",
"reference/resource-configs/tags",
"reference/resource-configs/unique_key",
- "reference/resource-configs/meta",
- "reference/advanced-config-usage",
- "reference/resource-configs/plus-prefix",
],
},
{
@@ -956,10 +958,10 @@ const sidebarSettings = {
"reference/model-properties",
"reference/resource-properties/model_name",
"reference/model-configs",
+ "reference/resource-properties/concurrent_batches",
"reference/resource-configs/materialized",
"reference/resource-configs/on_configuration_change",
"reference/resource-configs/sql_header",
- "reference/resource-properties/concurrent_batches",
],
},
{
@@ -1010,10 +1012,10 @@ const sidebarSettings = {
items: [
"reference/resource-properties/unit-tests",
"reference/resource-properties/unit-test-input",
- "reference/resource-properties/unit-testing-versions",
- "reference/resource-properties/unit-test-overrides",
"reference/resource-properties/data-formats",
"reference/resource-properties/data-types",
+ "reference/resource-properties/unit-testing-versions",
+ "reference/resource-properties/unit-test-overrides",
],
},
{
@@ -1089,15 +1091,15 @@ const sidebarSettings = {
label: "Node selection",
items: [
"reference/node-selection/syntax",
+ "reference/node-selection/exclude",
+ "reference/node-selection/defer",
"reference/node-selection/graph-operators",
"reference/node-selection/set-operators",
- "reference/node-selection/exclude",
"reference/node-selection/methods",
"reference/node-selection/putting-it-together",
+ "reference/node-selection/state-comparison-caveats",
"reference/node-selection/yaml-selectors",
"reference/node-selection/test-selection-examples",
- "reference/node-selection/defer",
- "reference/node-selection/state-comparison-caveats",
],
},
{
@@ -1115,8 +1117,8 @@ const sidebarSettings = {
link: { type: "doc", id: "reference/global-configs/adapter-behavior-changes" },
items: [
"reference/global-configs/adapter-behavior-changes",
- "reference/global-configs/databricks-changes",
"reference/global-configs/redshift-changes",
+ "reference/global-configs/databricks-changes",
],
},
{
@@ -1132,6 +1134,8 @@ const sidebarSettings = {
type: "category",
label: "Available flags",
items: [
+ "reference/global-configs/usage-stats",
+ "reference/global-configs/version-compatibility",
"reference/global-configs/logs",
"reference/global-configs/cache",
"reference/global-configs/failing-fast",
@@ -1141,8 +1145,6 @@ const sidebarSettings = {
"reference/global-configs/print-output",
"reference/global-configs/record-timing-info",
"reference/global-configs/resource-type",
- "reference/global-configs/usage-stats",
- "reference/global-configs/version-compatibility",
"reference/global-configs/warnings",
],
},
@@ -1183,9 +1185,9 @@ const sidebarSettings = {
label: "dbt Artifacts",
items: [
"reference/artifacts/dbt-artifacts",
+ "reference/artifacts/catalog-json",
"reference/artifacts/manifest-json",
"reference/artifacts/run-results-json",
- "reference/artifacts/catalog-json",
"reference/artifacts/sources-json",
"reference/artifacts/sl-manifest",
"reference/artifacts/other-artifacts",
diff --git a/website/snippets/_cloud-environments-info.md b/website/snippets/_cloud-environments-info.md
index 6d202d01998..cc153cf38a8 100644
--- a/website/snippets/_cloud-environments-info.md
+++ b/website/snippets/_cloud-environments-info.md
@@ -8,12 +8,12 @@ In dbt Cloud, there are two types of environments:
- Production
- **Development environment** — Determines the settings used in the dbt Cloud IDE or dbt Cloud CLI, for that particular project.
-Each dbt Cloud project can only have a single development environment but can have any number of deployment environments.
+Each dbt Cloud project can have only one development environment, but can have any number of General deployment environments, one Production deployment environment, and one Staging deployment environment.
-|| Development | Staging | Deployment |
-|------| --- | --- | --- |
-| **Determines settings for** | dbt Cloud IDE or dbt Cloud CLI | dbt Cloud Job runs | dbt Cloud Job runs |
-| **How many can I have in my project?** | 1 | Any number | Any number |
+| | Development | General | Production | Staging |
+|----------|-------------|---------|------------|---------|
+| **Determines settings for** | dbt Cloud IDE or dbt Cloud CLI | dbt Cloud Job runs | dbt Cloud Job runs | dbt Cloud Job runs |
+| **How many can I have in my project?** | 1 | Any number | 1 | 1 |
:::note
For users familiar with development on dbt Core, each environment is roughly analogous to an entry in your `profiles.yml` file, with some additional information about your repository to ensure the proper version of code is executed. More info on dbt core environments [here](/docs/core/dbt-core-environments).
@@ -25,11 +25,12 @@ Both development and deployment environments have a section called **General Set
| Setting | Example Value | Definition | Accepted Values |
| --- | --- | --- | --- |
-| Name | Production | The environment name | Any string! |
-| Environment Type | Deployment | The type of environment | [Deployment, Development] |
-| dbt Version | 1.4 (latest) | The dbt version used | Any dbt version in the dropdown |
-| Default to Custom Branch | ☑️ | Determines whether to use a branch other than the repository’s default | See below |
-| Custom Branch | dev | Custom Branch name | See below |
+| Environment name | Production | The environment name | Any string! |
+| Environment type | Deployment | The type of environment | Deployment, Development |
+| Set deployment type | PROD | Designates the deployment environment type. | Production, Staging, General |
+| dbt version | Latest | dbt Cloud automatically upgrades the dbt version running in this environment, based on the [release track](/docs/dbt-versions/cloud-release-tracks) you select. | Latest, Compatible, Extended |
+| Only run on a custom branch | ☑️ | Determines whether to use a branch other than the repository’s default | See below |
+| Custom branch | dev | Custom Branch name | See below |
:::note About dbt version
diff --git a/website/snippets/_enterprise-permissions-table.md b/website/snippets/_enterprise-permissions-table.md
index b39337697c1..4104759b24d 100644
--- a/website/snippets/_enterprise-permissions-table.md
+++ b/website/snippets/_enterprise-permissions-table.md
@@ -19,7 +19,7 @@ Key:
{`
| Account-level permission| Account Admin | Billing admin | Manage marketplace apps | Project creator | Security admin | Viewer |
|:-------------------------|:-------------:|:------------:|:-------------------------:|:---------------:|:--------------:|:------:|
-| Account settings | W | - | - | R | R | R |
+| Account settings* | W | - | - | R | R | R |
| Audit logs | R | - | - | - | R | R |
| Auth provider | W | - | - | - | W | R |
| Billing | W | W | - | - | - | R |
@@ -38,6 +38,9 @@ Key:
+\* Roles with write (**W**) access to Account settings can modify account-level settings, including [setting up Slack notifications](/docs/deploy/job-notifications#slack-notifications).
+
+
#### Project permissions for account roles
diff --git a/website/snippets/_git-providers-supporting-ci.md b/website/snippets/_git-providers-supporting-ci.md
new file mode 100644
index 00000000000..34bd87db2fc
--- /dev/null
+++ b/website/snippets/_git-providers-supporting-ci.md
@@ -0,0 +1,15 @@
+## Availability of features by Git provider
+
+- If your git provider has a [native dbt Cloud integration](/docs/cloud/git/git-configuration-in-dbt-cloud), you can seamlessly set up [continuous integration (CI)](/docs/deploy/ci-jobs) jobs directly within dbt Cloud.
+
+- For providers without a native integration, you can still use the [Git clone method](/docs/cloud/git/import-a-project-by-git-url) to import your repository by Git URL and use the [dbt Cloud Administrative API](/docs/dbt-cloud-apis/admin-cloud-api) to trigger CI job runs.
+
+The following table outlines the available integration options and their corresponding capabilities.
+
+| **Git provider** | **Native dbt Cloud integration** | **Automated CI job** | **Git clone** | **Information** |
+|------------------|----------------------------------|----------------------|---------------|-----------------|
+| [Azure DevOps](/docs/cloud/git/setup-azure) | ✅ | ✅ | ✅ | Organizations on the Team and Developer plans can connect to Azure DevOps using a deploy key. Note that you won’t be able to configure automated CI jobs, but you can still develop. |
+| [GitHub](/docs/cloud/git/connect-github) | ✅ | ✅ | ✅ | |
+| [GitLab](/docs/cloud/git/connect-gitlab) | ✅ | ✅ | ✅ | |
+| All other git providers using [Git clone](/docs/cloud/git/import-a-project-by-git-url) ([Bitbucket](/docs/cloud/git/import-a-project-by-git-url#bitbucket), [AWS CodeCommit](/docs/cloud/git/import-a-project-by-git-url#aws-codecommit), and others) | ❌ | ❌ | ✅ | Refer to the [Customizing CI/CD with custom pipelines](/guides/custom-cicd-pipelines?step=1) guide to set up continuous integration and continuous deployment (CI/CD). |
+
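The new `_git-providers-supporting-ci.md` snippet above points readers at the dbt Cloud Administrative API for providers without a native integration. As a minimal sketch of that call, assuming a Node 18+ runtime, placeholder account and job IDs, and a `DBT_CLOUD_TOKEN` environment variable (the `cloud.getdbt.com` host may differ for single-tenant or regional access URLs):

```typescript
// Sketch: trigger a dbt Cloud job run from a custom CI pipeline via the
// Administrative API, for git providers without a native integration.
// ACCOUNT_ID, JOB_ID, and DBT_CLOUD_TOKEN are placeholders.

const ACCOUNT_ID = 12345; // placeholder: your dbt Cloud account ID
const JOB_ID = 67890;     // placeholder: the job to trigger
const TOKEN = process.env.DBT_CLOUD_TOKEN ?? "";

async function triggerCiJobRun(cause: string): Promise<number> {
  const response = await fetch(
    `https://cloud.getdbt.com/api/v2/accounts/${ACCOUNT_ID}/jobs/${JOB_ID}/run/`,
    {
      method: "POST",
      headers: {
        Authorization: `Token ${TOKEN}`,
        "Content-Type": "application/json",
      },
      // "cause" is the required field on the job-trigger endpoint.
      body: JSON.stringify({ cause }),
    },
  );

  if (!response.ok) {
    throw new Error(`Job trigger failed with HTTP ${response.status}`);
  }

  const payload = await response.json();
  return payload.data.id; // run ID, useful for polling run status afterwards
}

triggerCiJobRun("Triggered by custom CI pipeline").then(
  (runId) => console.log(`Started run ${runId}`),
  (err) => console.error(err),
);
```

A custom pipeline of the kind described in the linked Customizing CI/CD guide would typically run a call like this after a commit is pushed, then poll the returned run ID for status.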
diff --git a/website/snippets/cloud-feature-parity.md b/website/snippets/cloud-feature-parity.md
index 1107b999c14..f71109292d7 100644
--- a/website/snippets/cloud-feature-parity.md
+++ b/website/snippets/cloud-feature-parity.md
@@ -1,6 +1,6 @@
The following table outlines which dbt Cloud features are supported on the different SaaS options available today. For more information about feature availability, please [contact us](https://www.getdbt.com/contact/).
-| Feature | AWS Multi-tenant | AWS single tenant |Azure multi-tenant ([Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud)) | Azure single tenant |
+| Feature | AWS Multi-tenant | AWS single tenant |Azure multi-tenant | Azure single tenant |
|-------------------------------|------------------|-----------------------|---------------------|---------------------|
| Audit logs | ✅ | ✅ | ✅ | ✅ |
| Continuous integration jobs | ✅ | ✅ | ✅ | ✅ |
diff --git a/website/static/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png b/website/static/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png
new file mode 100644
index 00000000000..29f64cc59f7
Binary files /dev/null and b/website/static/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png differ
diff --git a/website/static/img/blog/authors/mwan.png b/website/static/img/blog/authors/mwan.png
new file mode 100644
index 00000000000..ac852ee3636
Binary files /dev/null and b/website/static/img/blog/authors/mwan.png differ
diff --git a/website/static/img/docs/cloud-integrations/assign-app-to-members.png b/website/static/img/docs/cloud-integrations/assign-app-to-members.png
new file mode 100644
index 00000000000..dac1b415d30
Binary files /dev/null and b/website/static/img/docs/cloud-integrations/assign-app-to-members.png differ
diff --git a/website/static/img/docs/cloud-integrations/azure-subscription.png b/website/static/img/docs/cloud-integrations/azure-subscription.png
new file mode 100644
index 00000000000..19f19dc2814
Binary files /dev/null and b/website/static/img/docs/cloud-integrations/azure-subscription.png differ
diff --git a/website/static/img/docs/cloud-integrations/create-service-principal.png b/website/static/img/docs/cloud-integrations/create-service-principal.png
new file mode 100644
index 00000000000..a072c92b3ef
Binary files /dev/null and b/website/static/img/docs/cloud-integrations/create-service-principal.png differ
diff --git a/website/static/img/docs/cloud-integrations/review-and-assign.png b/website/static/img/docs/cloud-integrations/review-and-assign.png
new file mode 100644
index 00000000000..570717daeda
Binary files /dev/null and b/website/static/img/docs/cloud-integrations/review-and-assign.png differ
diff --git a/website/static/img/docs/cloud-integrations/service-principal-fields.png b/website/static/img/docs/cloud-integrations/service-principal-fields.png
new file mode 100644
index 00000000000..eb391ab122d
Binary files /dev/null and b/website/static/img/docs/cloud-integrations/service-principal-fields.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/azure-enable.png b/website/static/img/docs/dbt-cloud/access-control/azure-enable.png
index 8d95a5cb9fe..7f79bcb3c7c 100644
Binary files a/website/static/img/docs/dbt-cloud/access-control/azure-enable.png and b/website/static/img/docs/dbt-cloud/access-control/azure-enable.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/edit-entra-saml.png b/website/static/img/docs/dbt-cloud/access-control/edit-entra-saml.png
new file mode 100644
index 00000000000..ceda1ee0bcc
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/access-control/edit-entra-saml.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/entra-id-saml.png b/website/static/img/docs/dbt-cloud/access-control/entra-id-saml.png
new file mode 100644
index 00000000000..01ab65cef27
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/access-control/entra-id-saml.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/google-enable.png b/website/static/img/docs/dbt-cloud/access-control/google-enable.png
index 0c46cac6d6e..a2ffd42fb50 100644
Binary files a/website/static/img/docs/dbt-cloud/access-control/google-enable.png and b/website/static/img/docs/dbt-cloud/access-control/google-enable.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png b/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png
index 7da82285a20..89c246ffc45 100644
Binary files a/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png and b/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png b/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png
index c7018a64327..342e89ca631 100644
Binary files a/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png and b/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/saml-enable-entra.png b/website/static/img/docs/dbt-cloud/access-control/saml-enable-entra.png
new file mode 100644
index 00000000000..e0a71da007b
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/access-control/saml-enable-entra.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/saml-enable.png b/website/static/img/docs/dbt-cloud/access-control/saml-enable.png
index a165a3ee59b..212afeb7fef 100644
Binary files a/website/static/img/docs/dbt-cloud/access-control/saml-enable.png and b/website/static/img/docs/dbt-cloud/access-control/saml-enable.png differ
diff --git a/website/static/img/docs/dbt-cloud/access-control/sso-uri.png b/website/static/img/docs/dbt-cloud/access-control/sso-uri.png
index c557b903e57..87787184974 100644
Binary files a/website/static/img/docs/dbt-cloud/access-control/sso-uri.png and b/website/static/img/docs/dbt-cloud/access-control/sso-uri.png differ
diff --git a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png
index 3773d468d6a..4b3c64a7b32 100644
Binary files a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png and b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png differ
diff --git a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-verify-overridden-version.png b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-verify-overridden-version.png
deleted file mode 100644
index a6e553a0b2e..00000000000
Binary files a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-verify-overridden-version.png and /dev/null differ
diff --git a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png
index e17e1eb471f..4ceeb564576 100644
Binary files a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png and b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png differ
diff --git a/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png b/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png
index 01536bab17f..a921c8544b5 100644
Binary files a/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png and b/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png differ
diff --git a/website/static/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png b/website/static/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png
new file mode 100644
index 00000000000..7b9065df74d
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png differ
diff --git a/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png b/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png
index b8b11f6ea00..7494972d4f6 100644
Binary files a/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png and b/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png differ
diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png
new file mode 100644
index 00000000000..80d36bcaaa0
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png differ
diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png
new file mode 100644
index 00000000000..54263d092b5
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png differ
diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg b/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg
deleted file mode 100644
index 04ec9280f14..00000000000
Binary files a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg and /dev/null differ
diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.png
deleted file mode 100644
index 5f75707090c..00000000000
Binary files a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.png and /dev/null differ
diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/theme-selection-in-the-ide.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/theme-selection-in-the-ide.png
new file mode 100644
index 00000000000..cdb85349153
Binary files /dev/null and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/theme-selection-in-the-ide.png differ
diff --git a/website/vercel.json b/website/vercel.json
index 993ff9065bd..927b7ec6b2f 100644
--- a/website/vercel.json
+++ b/website/vercel.json
@@ -2,6 +2,11 @@
"cleanUrls": true,
"trailingSlash": false,
"redirects": [
+ {
+ "source": "/docs/cloud/about-cloud/dark-mode",
+ "destination": "/docs/cloud/about-cloud/change-your-dbt-cloud-theme",
+ "permanent": true
+ },
{
"source": "/docs/collaborate/git/managed-repository",
"destination": "/docs/cloud/git/managed-repository",
@@ -3631,13 +3636,28 @@
"destination": "https://www.getdbt.com/blog/guide-to-surrogate-key",
"permanent": true
},
+ {
+ "source": "/terms/elt",
+ "destination": "https://www.getdbt.com/blog/extract-load-transform",
+ "permanent": true
+ },
+ {
+ "source": "/terms/etl",
+ "destination": "https://www.getdbt.com/blog/extract-transform-load",
+ "permanent": true
+ },
+ {
+ "source": "/terms/reverse-etl",
+ "destination": "https://www.getdbt.com/blog/reverse-etl-playbook",
+ "permanent": true
+ },
{
"source": "/glossary",
"destination": "https://www.getdbt.com/blog",
"permanent": true
},
{
- "source": "/terms/:path((?!elt|etl|reverse-etl).*)",
+ "source": "/terms/:path*",
"destination": "https://www.getdbt.com/blog",
"permanent": true
}