diff --git a/website/blog/2023-11-14-specify-prod-environment.md b/website/blog/2023-11-14-specify-prod-environment.md index 0e205abd749..880488e2b43 100644 --- a/website/blog/2023-11-14-specify-prod-environment.md +++ b/website/blog/2023-11-14-specify-prod-environment.md @@ -14,8 +14,8 @@ is_featured: false --- -:::note You can now use a Staging environment! -This blog post was written before Staging environments. You can now use dbt Cloud can to support the patterns discussed here. Read more about [Staging environments](/docs/deploy/deploy-environments#staging-environment). +:::note You can now specify a Staging environment too! +This blog post was written before dbt Cloud added full support for Staging environments. Now that they exist, you should mark your CI environment as Staging as well. Read more about [Staging environments](/docs/deploy/deploy-environments#staging-environment). ::: :::tip The Bottom Line: diff --git a/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md b/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md new file mode 100644 index 00000000000..80db3f66de6 --- /dev/null +++ b/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md @@ -0,0 +1,67 @@ +--- +title: "Why I wish I had a control plane for my renovation" +description: "When I think back to my renovation, I realize how much smoother it would've been if I’d had a control plane for the entire process." +slug: wish-i-had-a-control-plane-for-my-renovation + +authors: [mark_wan] + +tags: [analytics craft, data_ecosystem] +hide_table_of_contents: false + +date: 2025-01-21 +is_featured: true +--- + +When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be. + + + +We had to coordinate multiple elements: + +- The **architects**, who designed the layout, interior, and exterior. +- The **architectural plans**, which outlined what the house should look like. +- The **builders**, who executed those plans. +- The **inspectors**, **councils**, and **energy raters**, who checked whether everything met the required standards. + + + +Each piece was critical — without the plans, there’s no shared vision; without the builders, the plans don’t come to life; and without inspections, mistakes go unnoticed. + +But as an inexperienced project manager, I was also the one responsible for stitching everything together: +- Architects handed me detailed plans, builders asked for clarifications. +- Inspectors flagged issues that were often too late to fix without extra costs or delays. +- On top of all this, I also don't speak "builder". + +So what should have been quick and collaborative conversations, turned into drawn-out processes because there was no unified system to keep everyone on the same page. + +## In many ways, this mirrors how data pipelines operate + +- The **architects** are the engineers — designing how the pieces fit together. +- The **architectural plans** are your dbt code — the models, tests, and configurations that define what your data should look like. +- The **builders** are the compute layers (for example, Snowflake, BigQuery, or Databricks) that execute those transformations. +- The **inspectors** are the monitoring tools, which focus on retrospective insights like logs, job performance, and error rates. 
+ +Here’s the challenge: monitoring tools, by their nature, look backward. They’re great at telling you what happened, but they don’t help you plan or declare what should happen. And when these roles, plans, execution, and monitoring are siloed, teams are left trying to manually stitch them together, often wasting time troubleshooting issues or coordinating workflows. + +## What makes dbt Cloud different + +[dbt Cloud](https://www.getdbt.com/product/dbt-cloud) unifies these perspectives into a single [control plane](https://www.getdbt.com/blog/data-control-plane-introduction), bridging proactive and retrospective capabilities: + +- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/syntax#state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline. +- **Retrospective insights**: dbt Cloud surfaces [job logs](https://docs.getdbt.com/docs/deploy/run-visibility), performance metrics, and test results, providing the same level of insight as traditional monitoring tools. + +But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively. + +## Why does this matter? + +1. **The silo problem**: Many organizations rely on separate tools for transformation and monitoring. This fragmentation creates blind spots, making it harder to identify and resolve issues. +2. **Integrated workflows**: dbt Cloud eliminates these silos by connecting transformation and monitoring logic in one place. It doesn’t just report on what happened; it ties those insights directly to the proactive plans that define your pipeline. +3. **Operational confidence**: With dbt Cloud, you can trust that your data pipelines are not only functional but aligned with your business goals, monitored in real-time, and easy to troubleshoot. + +## Why I wish I had a control plane for my renovation + +When I think back to my renovation, I realize how much smoother it would have been if I’d had a control plane for the entire process. There are firms that specialize in design-and-build projects, in-house architects, engineers, and contractors. The beauty of these firms is that everything is under one roof, so you know they’re communicating seamlessly. + +In my case, though, my architect, builder, and engineer were all completely separate, which meant I was the intermediary. I was the pigeon service shuttling information between them, and it was exhausting. Discussions that should have taken minutes, stretched into weeks and sometimes even months because there was no centralized communication. + +dbt Cloud is like having that design-and-build firm for your data pipelines. It’s the control plane that unites proactive planning with retrospective monitoring, eliminating silos and inefficiencies. With dbt Cloud, you don’t need to play the role of the pigeon service — it gives you the visibility, integration, and control you need to manage modern data workflows effortlessly. 
diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 3070ec806b5..da08a8aa729 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -623,3 +623,11 @@ yu_ishikawa: url: https://www.linkedin.com/in/yuishikawa0301 name: Yu Ishikawa organization: Ubie +mark_wan: + image_url: /img/blog/authors/mwan.png + job_title: Senior Solutions Architect + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/markwwan/ + name: Mark Wan + organization: dbt Labs diff --git a/website/docs/docs/build/custom-aliases.md b/website/docs/docs/build/custom-aliases.md index ae1d93e66a7..9a1e6b3a647 100644 --- a/website/docs/docs/build/custom-aliases.md +++ b/website/docs/docs/build/custom-aliases.md @@ -157,3 +157,8 @@ If these models should indeed have the same database identifier, you can work ar By default, dbt will create versioned models with the alias `_v`, where `` is that version's unique identifier. You can customize this behavior just like for non-versioned models by configuring a custom `alias` or re-implementing the `generate_alias_name` macro. +## Related docs + +- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize dbt models database, schema, and alias +- [Custom schema](/docs/build/custom-schemas) to learn how to customize dbt schema +- [Custom database](/docs/build/custom-databases) to learn how to customize dbt database diff --git a/website/docs/docs/build/custom-databases.md b/website/docs/docs/build/custom-databases.md index 7a607534230..a336f50c9f5 100644 --- a/website/docs/docs/build/custom-databases.md +++ b/website/docs/docs/build/custom-databases.md @@ -98,3 +98,9 @@ See docs on macro `dispatch`: ["Managing different global overrides across packa ### BigQuery When dbt opens a BigQuery connection, it will do so using the `project_id` defined in your active `profiles.yml` target. This `project_id` will be billed for the queries that are executed in the dbt run, even if some models are configured to be built in other projects. + +## Related docs + +- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize dbt models database, schema, and alias +- [Custom schema](/docs/build/custom-schemas) to learn how to customize dbt model schema +- [Custom aliases](/docs/build/custom-aliases) to learn how to customize dbt model alias name diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md index 6dabf56943c..fb62b5c926a 100644 --- a/website/docs/docs/build/custom-schemas.md +++ b/website/docs/docs/build/custom-schemas.md @@ -207,3 +207,9 @@ In the `generate_schema_name` macro examples shown in the [built-in alternative If your schema names are being generated incorrectly, double-check your target name in the relevant environment. For more information, consult the [managing environments in dbt Core](/docs/core/dbt-core-environments) guide. 
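As a reference point, a minimal sketch of a target-aware `generate_schema_name` override could look like the following. The `prod` target name is an assumption here; substitute whatever name your production target actually uses.

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {#- Only honor custom schemas in the production target; fall back to the target schema everywhere else. -#}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}

        {{ custom_schema_name | trim }}

    {%- else -%}

        {{ default_schema }}

    {%- endif -%}

{%- endmacro %}
```

With an override like this, custom schemas are applied only in production, while development runs keep building into each developer's own target schema.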
+ +## Related docs + +- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize dbt models database, schema, and alias +- [Custom database](/docs/build/custom-databases) to learn how to customize dbt model database +- [Custom aliases](/docs/build/custom-aliases) to learn how to customize dbt model alias name diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index 559fe468644..24c468b976d 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -73,7 +73,9 @@ having total_amount < 0 The name of this test is the name of the file: `assert_total_payment_amount_is_positive`. -Note, you won't need to include semicolons (;) at the end of the SQL statement in your singular test files as it can cause your test to fail. +Note: +- Omit semicolons (;) at the end of the SQL statement in your singular test files, as they can cause your test to fail. +- Singular tests placed in the tests directory are automatically executed when running `dbt test`. Don't reference singular tests in `model_name.yml`, as they are not treated as generic tests or macros, and doing so will result in an error. To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content: diff --git a/website/docs/docs/build/enhance-your-code.md b/website/docs/docs/build/enhance-your-code.md index 5f2d48f6f5a..85fa02f70e2 100644 --- a/website/docs/docs/build/enhance-your-code.md +++ b/website/docs/docs/build/enhance-your-code.md @@ -7,21 +7,17 @@ pagination_prev: null
-
-
-
-
-
+
+ title="Project variables"
+ body="Learn how to use project variables to provide data to models for compilation."
+ link="/docs/build/project-variables"
+ icon="dbt-bit"/>
-
\ No newline at end of file + diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9d51b77e2e4..b1aef5f28db 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -29,7 +29,7 @@ Microbatch is an incremental strategy designed for large time-series datasets: - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). -### How microbatch works +## How microbatch works When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure. @@ -37,6 +37,19 @@ Each "batch" corresponds to a single bounded time period (by default, a single d This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and [retry](#retry) them independently. +### Adapter-specific behavior + +dbt's microbatch strategy uses the most efficient mechanism available for "full batch" replacement on each adapter. This can vary depending on the adapter: + +- `dbt-postgres`: Uses the `merge` strategy, which performs "update" or "insert" operations. +- `dbt-redshift`: Uses the `delete+insert` strategy, which "inserts" or "replaces." +- `dbt-snowflake`: Uses the `delete+insert` strategy, which "inserts" or "replaces." +- `dbt-bigquery`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces." +- `dbt-spark`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces." +- `dbt-databricks`: Uses the `replace_where` strategy, which "inserts" or "replaces." + +Check out the [supported incremental strategies by adapter](/docs/build/incremental-strategy#supported-incremental-strategies-by-adapter) for more info. + ## Example A `sessions` model aggregates and enriches data that comes from two other models: @@ -170,7 +183,7 @@ customers as ( -dbt will instruct the data platform to take the result of each batch query and insert, update, or replace the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform. +dbt will instruct the data platform to take the result of each batch query and [insert, update, or replace](#adapter-specific-behavior) the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform. For details, see [How microbatch works](#how-microbatch-works). It does not matter whether the table already contains data for that day. Given the same input data, the resulting table is the same no matter how many times a batch is reprocessed. 
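To make the configuration concrete, a minimal sketch of a microbatch model might look like the following. The model and column names (`stg_sessions`, `session_start`) and the `begin` date are illustrative placeholders rather than values from this project.

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='session_start',
        batch_size='day',
        begin='2020-01-01',
        lookback=3
    )
}}

-- event_time marks the column that timestamps each row, batch_size bounds each batch
-- to one day, begin sets the earliest date to backfill from, and lookback reprocesses
-- a few of the most recent batches on each run to catch late-arriving records.
select * from {{ ref('stg_sessions') }}
```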
diff --git a/website/docs/docs/build/incremental-strategy.md b/website/docs/docs/build/incremental-strategy.md index 9176e962a3a..b613388a7c9 100644 --- a/website/docs/docs/build/incremental-strategy.md +++ b/website/docs/docs/build/incremental-strategy.md @@ -1,5 +1,6 @@ --- title: "About incremental strategy" +sidebar_label: "About incremental strategy" description: "Learn about the various ways (strategies) to implement incremental materializations." id: "incremental-strategy" --- diff --git a/website/docs/docs/build/metricflow-commands.md b/website/docs/docs/build/metricflow-commands.md index 2da5618b86f..9a35939a8b4 100644 --- a/website/docs/docs/build/metricflow-commands.md +++ b/website/docs/docs/build/metricflow-commands.md @@ -77,7 +77,7 @@ The following table lists the commands compatible with the dbt Cloud IDE and dbt | [`list dimension-values`](#list-dimension-values) | List dimensions with metrics. | ✅ | ✅ | | [`list entities`](#list-entities) | Lists all unique entities. | ✅ | ✅ | | [`list saved-queries`](#list-saved-queries) | Lists available saved queries. Use the `--show-exports` flag to display each export listed under a saved query or `--show-parameters` to show the full query parameters each saved query uses. | ✅ | ✅ | -| [`query`](#query) | Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to help you get started. | ✅ | ✅ | +| [`query`](#query) | Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to query metrics and dimensions (such as querying metrics, using the `where` filter, adding an `order`, and more). | ✅ | ✅ | | [`validate`](#validate) | Validates semantic model configurations. | ✅ | ✅ | | [`export`](#export) | Runs exports for a singular saved query for testing and generating exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. | ❌ | ✅ | | [`export-all`](#export-all) | Runs exports for multiple saved queries at once, saving time and effort. | ❌ | ✅ | @@ -118,7 +118,7 @@ Use the `mf` prefix before the command name to execute them in dbt Core. For exa -### List metrics +## List metrics This command lists the metrics with their available dimensions: ```bash @@ -132,7 +132,7 @@ Options: --help Show this message and exit. ``` -### List dimensions +## List dimensions This command lists all unique dimensions for a metric or multiple metrics. It displays only common dimensions when querying multiple metrics: @@ -146,7 +146,7 @@ Options: --help Show this message and exit. ``` -### List dimension-values +## List dimension-values This command lists all dimension values with the corresponding metric: @@ -168,7 +168,7 @@ Options: --help Show this message and exit. ``` -### List entities +## List entities This command lists all unique entities: @@ -182,7 +182,7 @@ Options: --help Show this message and exit. ``` -### List saved queries +## List saved queries This command lists all available saved queries: @@ -209,7 +209,7 @@ The list of available saved queries: - Export(new_customer_orders, alias=orders, schemas=customer_schema, exportAs=TABLE) ``` -### Validate +## Validate The following command performs validations against the defined semantic model configurations. @@ -234,7 +234,7 @@ Options: --help Show this message and exit. 
``` -### Health checks +## Health checks The following command performs a health check against the data platform you provided in the configs. @@ -244,7 +244,7 @@ Note, in dbt Cloud the `health-checks` command isn't required since it uses dbt mf health-checks # In dbt Core ``` -### Tutorial +## Tutorial Follow the dedicated MetricFlow tutorial to help you get started: @@ -253,7 +253,7 @@ Follow the dedicated MetricFlow tutorial to help you get started: mf tutorial # In dbt Core ``` -### Query +## Query Create a new query with MetricFlow and execute it against your data platform. The query returns the following result: @@ -284,10 +284,11 @@ Options: time of the data (inclusive) *Not available in dbt Cloud yet - --where TEXT SQL-like where statement provided as a string and wrapped in quotes: --where "condition_statement" - For example, to query a single statement: --where "revenue > 100" - To query multiple statements: --where "revenue > 100 and user_count < 1000" - To add a dimension filter to a where filter, ensure the filter item is part of your model. + --where TEXT SQL-like where statement provided as a string and wrapped in quotes. + All filter items must explicitly reference fields or dimensions that are part of your model. + To query a single statement: ---where "{{ Dimension('order_id__revenue') }} > 100" + To query multiple statements: --where "{{ Dimension('order_id__revenue') }} > 100 and {{ Dimension('user_count') }} < 1000" + To add a dimension filter, use the `Dimension()` template wrapper to indicate that the filter item is part of your model. Refer to the [FAQ](#faqs) for more info on how to do this using a template wrapper. --limit TEXT Limit the number of rows out using an int or leave @@ -318,13 +319,18 @@ Options: ``` -### Query examples +## Query examples -The following tabs present various types of query examples that you can use to query metrics and dimensions. Select the tab that best suits your needs: +This section shares various types of query examples that you can use to query metrics and dimensions. The query examples listed are: - +- [Query metrics](#query-metrics) +- [Query dimensions](#query-dimensions) +- [Add `order`/`limit` function](#add-orderlimit) +- [Add `where` clause](#add-where-clause) +- [Filter by time](#filter-by-time) +- [Query saved queries](#query-saved-queries) - +### Query metrics Use the example to query multiple metrics by dimension and return the `order_total` and `users_active` metrics by `metric_time.` @@ -347,9 +353,8 @@ mf query --metrics order_total,users_active --group-by metric_time # In dbt Core | 2017-06-20 | 712.51 | | 2017-06-21 | 541.65 | ``` - - +### Query dimensions You can include multiple dimensions in a query. For example, you can group by the `is_food_order` dimension to confirm if orders were for food or not. Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. @@ -375,9 +380,7 @@ mf query --metrics order_total --group-by order_id__is_food_order # In dbt Core | 2017-06-19 | True | 448.11 | ``` - - - +### Add order/limit You can add order and limit functions to filter and present the data in a readable format. The following query limits the data set to 10 records and orders them by `metric_time`, descending. Note that using the `-` prefix will sort the query in descending order. Without the `-` prefix sorts the query in ascending order. 
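As a rough sketch (assuming the MetricFlow CLI flag names `--order` and `--limit`, with illustrative metric and dimension names), such a query could look like this:

```bash
# In dbt Core: return 10 rows, sorted by metric_time descending (the leading "-" flips the order)
mf query --metrics order_total --group-by metric_time --limit 10 --order -metric_time
```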
@@ -405,21 +408,24 @@ mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --o | 2017-08-29 | False | 333.65 | | 2017-08-28 | False | 334.73 | ``` - - +### Add where clause -You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `is_food_order` with multiple where statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024). Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. +You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `is_food_order` with multiple `where` statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024). **Query** ```bash # In dbt Cloud -dbt sl query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'" +dbt sl query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and {{ TimeDimension('metric_time', 'week') }} >= '2024-02-01'" # In dbt Core -mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'" +mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and TimeDimension('metric_time', 'week') }} >= '2024-02-01'" ``` +Notes: +- The type of dimension changes the syntax you use. So if you have a date field, use `TimeDimension` instead of `Dimension`. +- When you query a dimension, you need to specify the primary entity for that dimension. In the example just shared, the primary entity is `order_id`. + **Result** ```bash ✔ Success 🦄 - query completed after 1.06 seconds @@ -437,9 +443,7 @@ mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Di | 2017-08-22 | True | 401.91 | ``` - - - +### Filter by time To filter by time, there are dedicated start and end time options. Using these options to filter by time allows MetricFlow to further optimize query performance by pushing down the where filter when appropriate. @@ -468,9 +472,7 @@ mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --o | 2017-08-22 | True | 401.91 | ``` - - - +### Query saved queries You can use this for frequently used queries. Replace `` with the name of your [saved query](/docs/build/saved-queries). @@ -487,10 +489,7 @@ For example, if you use dbt Cloud and have a saved query named `new_customer_ord When querying [saved queries](/docs/build/saved-queries), you can use parameters such as `where`, `limit`, `order`, `compile`, and so on. However, keep in mind that you can't access `metric` or `group_by` parameters in this context. This is because they are predetermined and fixed parameters for saved queries, and you can't change them at query time. If you would like to query more metrics or dimensions, you can build the query using the standard format. ::: - - - -### Additional query examples +## Additional query examples The following tabs present additional query examples, like exporting to a CSV. 
Select the tab that best suits your needs: @@ -559,7 +558,7 @@ mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 - -### Time granularity +## Time granularity Optionally, you can specify the time granularity you want your data to be aggregated at by appending two underscores and the unit of granularity you want to `metric_time`, the global time dimension. You can group the granularity by: `day`, `week`, `month`, `quarter`, and `year`. @@ -571,7 +570,7 @@ dbt sl query --metrics revenue --group-by metric_time__month # In dbt Cloud mf query --metrics revenue --group-by metric_time__month # In dbt Core ``` -### Export +## Export Run [exports for a specific saved query](/docs/use-dbt-semantic-layer/exports#exports-for-single-saved-query). Use this command to test and generate exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. Refer to [exports in development](/docs/use-dbt-semantic-layer/exports#exports-in-development) for more info. @@ -581,7 +580,7 @@ Export is available in dbt Cloud. dbt sl export ``` -### Export-all +## Export-all Run [exports for multiple saved queries](/docs/use-dbt-semantic-layer/exports#exports-for-multiple-saved-queries) at once. This command provides a convenient way to manage and execute exports for several queries simultaneously, saving time and effort. Refer to [exports in development](/docs/use-dbt-semantic-layer/exports#exports-in-development) for more info. diff --git a/website/docs/docs/build/metricflow-time-spine.md b/website/docs/docs/build/metricflow-time-spine.md index 5499c61a8e4..cc3c0cfd3a0 100644 --- a/website/docs/docs/build/metricflow-time-spine.md +++ b/website/docs/docs/build/metricflow-time-spine.md @@ -7,7 +7,6 @@ tags: [Metrics, Semantic Layer] --- - It's common in analytics engineering to have a date dimension or "time spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is typically a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarters, defined based on the base column. You can join other tables to the time spine on the base column to calculate metrics like revenue at a point in time, or to aggregate to a specific time grain. @@ -23,7 +22,7 @@ To see the generated SQL for the metric and dimension types that use time spine ## Configuring time spine in YAML - Time spine models are normal dbt models with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties. Add the [`models` key](/reference/model-properties) for the time spine in your `models/` directory. If your project already includes a calendar table or date dimension, you can configure that table as a time spine. Otherwise, review the [example time-spine tables](#example-time-spine-tables) to create one. + Time spine models are normal dbt models with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties. Add the [`models` key](/reference/model-properties) for the time spine in your `models/` directory. If your project already includes a calendar table or date dimension, you can configure that table as a time spine. Otherwise, review the [example time-spine tables](#example-time-spine-tables) to create one. If the relevant model file (`util/_models.yml`) doesn't exist, create it and add the configuration mentioned in the [next section](#creating-a-time-spine-table). 
Some things to note when configuring time spine models: @@ -34,9 +33,9 @@ To see the generated SQL for the metric and dimension types that use time spine - If you're looking to specify the grain of a time dimension so that MetricFlow can transform the underlying column to the required granularity, refer to the [Time granularity documentation](/docs/build/dimensions?dimension=time_gran) :::tip -If you previously used a model called `metricflow_time_spine`, you no longer need to create this specific model. You can now configure MetricFlow to use any date dimension or time spine table already in your project by updating the `model` setting in the Semantic Layer. - -If you don’t have a date dimension table, you can still create one by using the code snippet in the [next section](#creating-a-time-spine-table) to build your time spine model. +- If you previously used a `metricflow_time_spine.sql` model, you can delete it after configuring the `time_spine` property in YAML. The Semantic Layer automatically recognizes the new configuration. No additional `.yml` files are needed. +- You can also configure MetricFlow to use any date dimension or time spine table already in your project by updating the `model` setting in the Semantic Layer. +- If you don’t have a date dimension table, you can still create one by using the code snippet in the [next section](#creating-a-time-spine-table) to build your time spine model. ::: ### Creating a time spine table @@ -112,9 +111,37 @@ models: For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. +### Migrating from SQL to YAML +If your project already includes a time spine (`metricflow_time_spine.sql`), you can migrate its configuration to YAML to address any deprecation warnings you may get. + +1. Add the following configuration to a new or existing YAML file using the [`models` key](/reference/model-properties) for the time spine in your `models/` directory. Name the YAML file whatever you want (for example, `util/_models.yml`): + + + + ```yaml + models: + - name: all_days + description: A time spine with one row per day, ranging from 2020-01-01 to 2039-12-31. + time_spine: + standard_granularity_column: date_day # Column for the standard grain of your table + columns: + - name: date_day + granularity: day # Set the granularity of the column + ``` + + +2. After adding the YAML configuration, delete the existing `metricflow_time_spine.sql` file from your project to avoid any issues. + +3. Test the configuration to ensure compatibility with your production jobs. + +Note that if you're migrating from a `metricflow_time_spine.sql` file: + +- Replace its functionality by adding the `time_spine` property to YAML as shown in the previous example. +- Once configured, MetricFlow will recognize the YAML settings, and then the SQL model file can be safely removed. + ### Considerations when choosing which granularities to create{#granularity-considerations} -- MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible. For example, if you have a time spine at a monthly grain, and query a dimension at a monthly grain, MetricFlow will use the monthly time spine. If you only have a daily time spine, MetricFlow will use the daily time spine and date_trunc to month. +- MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible. 
For example, if you have a time spine at a monthly grain, and query a dimension at a monthly grain, MetricFlow will use the monthly time spine. If you only have a daily time spine, MetricFlow will use the daily time spine and `date_trunc` to month. - You can add a time spine for each granularity you intend to use if query efficiency is more important to you than configuration time, or storage constraints. For most engines, the query performance difference should be minimal and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries. - We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. For example, if you have dimensions at an hourly grain, you should have a time spine at an hourly grain. diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md index e874dced63a..d80c9d37cc0 100644 --- a/website/docs/docs/build/metrics-overview.md +++ b/website/docs/docs/build/metrics-overview.md @@ -7,9 +7,11 @@ tags: [Metrics, Semantic Layer] pagination_next: "docs/build/cumulative" --- -Once you've created your semantic models, it's time to start adding metrics. Metrics can be defined in the same YAML files as your semantic models, or split into separate YAML files into any other subdirectories (provided that these subdirectories are also within the same dbt project repo). +After building [semantic models](/docs/build/semantic-models), it's time to start adding metrics. This page explains the different supported metric types you can add to your dbt project -This article explains the different supported metric types you can add to your dbt project. The keys for metrics definitions are: +Metrics can be defined in the same YAML files as your semantic models, or defined in their dedicated separate YAML files located in any subdirectories within the same dbt project repository. + +The keys for metrics definitions are: @@ -107,7 +109,10 @@ It's possible to define a default time granularity for metrics if it's different The granularity can be set using the `time_granularity` parameter on the metric, and defaults to `day`. If day is not available because the dimension is defined at a coarser granularity, it will default to the defined granularity for the dimension. ### Example -You have a semantic model called `orders` with a time dimension called `order_time`. You want the `orders` metric to roll up to `monthly` by default; however, you want the option to look at these metrics hourly. You can set the `time_granularity` parameter on the `order_time` dimension to `hour`, and then set the `time_granularity` parameter in the metric to `month`. +- You have a semantic model called `orders` with a time dimension called `order_time`. +- You want the `orders` metric to roll up to `monthly` by default; however, you want the option to look at these metrics hourly. +- You can set the `time_granularity` parameter on the `order_time` dimension to `hour`, and then set the `time_granularity` parameter in the metric to `month`. + ```yaml semantic_models: ... 
@@ -120,15 +125,19 @@ semantic_models: - name: orders expr: 1 agg: sum - metrics: - - name: orders - type: simple - label: Count of Orders - type_params: - measure: - name: orders - time_granularity: month -- Optional, defaults to day + +metrics: + - name: orders + type: simple + label: Count of Orders + type_params: + measure: + name: orders + time_granularity: month -- Optional, defaults to day ``` + +Remember that metrics can be defined in the same YAML files as your semantic models but must be defined as a separate top-level section and not nested within the `semantic_models` key. Or you can define metrics in their dedicated separate YAML files located in any subdirectories within the same dbt project repository. + ## Conversion metrics @@ -179,6 +188,7 @@ metrics: name: active_users fill_nulls_with: 0 join_to_timespine: true + cumulative_type_params: window: 7 days ``` diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index f72f1eb75de..b754639c01b 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -80,7 +80,7 @@ The following table outlines the configurations available for snapshots: | [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp | | [unique_key](/reference/resource-configs/unique_key) | A column(s) (string or array) or expression for the record | Yes | `id` or `[order_id, product_id]` | | [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [updated_at](/reference/resource-configs/updated_at) | A column in your snapshot query results that indicates when each record was last updated, used in the `timestamp` strategy. May support ISO date strings and unix epoch integers, depending on the data platform you use. | Only if using the `timestamp` strategy | updated_at | | [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.| No | string | | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | | [hard_deletes](/reference/resource-configs/hard-deletes) | Specify how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`.| No | string | @@ -237,7 +237,7 @@ The `timestamp` strategy requires the following configurations: | Config | Description | Example | | ------ | ----------- | ------- | -| updated_at | A column which represents when the source row was last updated | `updated_at` | +| updated_at | A column which represents when the source row was last updated. May support ISO date strings and unix epoch integers, depending on the data platform you use. 
| `updated_at` | **Example usage:** @@ -437,103 +437,100 @@ Snapshot tables will be created as a clone of your sourc In dbt Core v1.9+ (or available sooner in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks)): - These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config. -ess) - Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. - Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows with the `dbt_is_deleted` meta field when using the `hard_deletes='new_record'` field. -| Field | Meaning | Usage | -| -------------- | ------- | ----- | -| dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. | -| dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | -| dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | -| dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt | -| dbt_is_deleted | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt | +| Field |
Meaning
| Notes | Example| +| -------------- | ------- | ----- | ------- | +| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | This column can be used to order the different "versions" of a record. | `snapshot_meta_column_names: {dbt_valid_from: start_date}` | +| `dbt_valid_to` | The timestamp when this row became invalidated. For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | `snapshot_meta_column_names: {dbt_valid_to: end_date}` | +| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_scd_id: scd_id}` | +| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_updated_at: modified_date}` | +| `dbt_is_deleted` | A string value indicating if the record has been deleted. (`True` if deleted, `False` if not deleted). |Added when `hard_deletes='new_record'` is configured. | `snapshot_meta_column_names: {dbt_is_deleted: is_deleted}` | -*The timestamps used for each column are subtly different depending on the strategy you use: +All of these column names can be customized using the `snapshot_meta_column_names` config. Refer to this [example](/reference/resource-configs/snapshot_meta_column_names#example) for more details. -For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns. +*The timestamps used for each column are subtly different depending on the strategy you use: -
- Details for the timestamp strategy +- For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns. -Snapshot query results at `2024-01-01 11:00` + -| id | status | updated_at | -| -- | ------- | ---------------- | -| 1 | pending | 2024-01-01 10:47 | + Snapshot query results at `2024-01-01 11:00` -Snapshot results (note that `11:00` is not used anywhere): + | id | status | updated_at | + | -- | ------- | ---------------- | + | 1 | pending | 2024-01-01 10:47 | -| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 | + Snapshot results (note that `11:00` is not used anywhere): -Query results at `2024-01-01 11:30`: + | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | + | -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | + | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 | -| id | status | updated_at | -| -- | ------- | ---------------- | -| 1 | shipped | 2024-01-01 11:05 | + Query results at `2024-01-01 11:30`: -Snapshot results (note that `11:30` is not used anywhere): + | id | status | updated_at | + | -- | ------- | ---------------- | + | 1 | shipped | 2024-01-01 11:05 | -| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | -| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 | + Snapshot results (note that `11:30` is not used anywhere): -Snapshot results with `hard_deletes='new_record'`: + | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | + | -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | + | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | + | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 | -| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | -|----|---------|------------------|------------------|------------------|------------------|----------------| -| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False | -| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False | -| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | + Snapshot results with `hard_deletes='new_record'`: + | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | + |----|---------|------------------|------------------|------------------|------------------|----------------| + | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False | + | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False | + | 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | -
-
+
-For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy.
+- For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy.
-
- Details for the check strategy + -Snapshot query results at `2024-01-01 11:00` + Snapshot query results at `2024-01-01 11:00` -| id | status | -| -- | ------- | -| 1 | pending | + | id | status | + | -- | ------- | + | 1 | pending | -Snapshot results: + Snapshot results: -| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| -- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 | + | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | + | -- | ------- | ---------------- | ---------------- | ---------------- | + | 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 | -Query results at `2024-01-01 11:30`: + Query results at `2024-01-01 11:30`: -| id | status | -| -- | ------- | -| 1 | shipped | + | id | status | + | -- | ------- | + | 1 | shipped | -Snapshot results: + Snapshot results: -| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| --- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | -| 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 | + | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | + | --- | ------- | ---------------- | ---------------- | ---------------- | + | 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | + | 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 | -Snapshot results with `hard_deletes='new_record'`: + Snapshot results with `hard_deletes='new_record'`: -| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | -|----|---------|------------------|------------------|------------------|----------------| -| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False | -| 1 | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False | -| 1 | deleted | 2024-01-01 11:40 | | 2024-01-01 11:40 | True | + | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | + |----|---------|------------------|------------------|------------------|----------------| + | 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False | + | 1 | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False | + | 1 | deleted | 2024-01-01 11:40 | | 2024-01-01 11:40 | True | -
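Pulling the configurations above together, a hedged sketch of a YAML-defined snapshot (dbt Core v1.9+ syntax) using the `timestamp` strategy might look like the following; the source, schema, and column names are placeholders.

```yaml
snapshots:
  - name: orders_snapshot
    relation: source('jaffle_shop', 'orders')   # placeholder source
    config:
      schema: snapshots
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
      dbt_valid_to_current: "to_date('9999-12-31')"  # current rows get a far-future date instead of NULL
      hard_deletes: new_record                       # deleted source rows become new rows flagged in dbt_is_deleted
      snapshot_meta_column_names:
        dbt_valid_from: start_date
        dbt_valid_to: end_date
```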
+ ## Configure snapshots in versions 1.8 and earlier @@ -586,7 +583,7 @@ The following table outlines the configurations available for snapshots in versi | [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | | [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | | [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [updated_at](/reference/resource-configs/updated_at) | A column in your snapshot query results that indicates when each record was last updated, used in the `timestamp` strategy. May support ISO date strings and unix epoch integers, depending on the data platform you use. | Only if using the `timestamp` strategy | updated_at | | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | - A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index aad1ac42c8e..6876d9ac9cb 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -14,7 +14,7 @@ search_weight: "heavy" ## Using sources Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools. By declaring these tables as sources in dbt, you can then -- select from source tables in your models using the `{{ source() }}` function, helping define the lineage of your data +- select from source tables in your models using the [`{{ source() }}` function,](/reference/dbt-jinja-functions/source) helping define the lineage of your data - test your assumptions about your source data - calculate the freshness of your source data diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md index b7123fa35d6..4a92a5792c9 100644 --- a/website/docs/docs/build/unit-tests.md +++ b/website/docs/docs/build/unit-tests.md @@ -10,9 +10,6 @@ keywords: - - - Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model. Starting in dbt Core v1.8, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. @@ -219,10 +216,19 @@ dbt test --select test_is_valid_email_address Your model is now ready for production! Adding this unit test helped catch an issue with the SQL logic _before_ you materialized `dim_customers` in your warehouse and will better ensure the reliability of this model in the future. 
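For readers who want to see the shape of such a definition, here is a minimal, hedged sketch of what a unit test like `test_is_valid_email_address` could look like in YAML. The fixture rows and the single input shown are illustrative, not the exact fixtures from this walkthrough.

```yaml
unit_tests:
  - name: test_is_valid_email_address
    description: "Check that the email-validity logic flags malformed addresses."
    model: dim_customers
    given:
      - input: ref('stg_customers')   # static fixture rows stand in for the upstream model
        rows:
          - {customer_id: 1, email: cool_email@example.com}
          - {customer_id: 2, email: badgmail.com}
    expect:
      rows:
        - {customer_id: 1, is_valid_email_address: true}
        - {customer_id: 2, is_valid_email_address: false}
```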
- ## Unit testing incremental models -When configuring your unit test, you can override the output of macros, vars, or environment variables. This enables you to unit test your incremental models in "full refresh" and "incremental" modes. +When configuring your unit test, you can override the output of macros, vars, or environment variables. This enables you to unit test your incremental models in "full refresh" and "incremental" modes. + +:::note +Incremental models need to exist in the database first before running unit tests or doing a `dbt build`. Use the [`--empty` flag](/reference/commands/build#the---empty-flag) to build an empty version of the models to save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](/reference/node-selection/syntax#shorthand). + + ```shell + dbt run --select "config.materialized:incremental" --empty + ``` + + After running the command, you can then perform a regular `dbt build` for that model and then run your unit test. +::: When testing an incremental model, the expected output is the __result of the materialization__ (what will be merged/inserted), not the resulting model itself (what the final table will look like after the merge/insert). diff --git a/website/docs/docs/cloud-integrations/semantic-layer/excel.md b/website/docs/docs/cloud-integrations/semantic-layer/excel.md index c80040dce01..6da7edce3fe 100644 --- a/website/docs/docs/cloud-integrations/semantic-layer/excel.md +++ b/website/docs/docs/cloud-integrations/semantic-layer/excel.md @@ -36,7 +36,7 @@ import Tools from '/snippets/_sl-excel-gsheets.md'; diff --git a/website/docs/docs/cloud/about-cloud-develop-defer.md b/website/docs/docs/cloud/about-cloud-develop-defer.md index d1685c42cba..2d7a605d59c 100644 --- a/website/docs/docs/cloud/about-cloud-develop-defer.md +++ b/website/docs/docs/cloud/about-cloud-develop-defer.md @@ -2,7 +2,7 @@ title: Using defer in dbt Cloud id: about-cloud-develop-defer description: "Learn how to leverage defer to prod when developing with dbt Cloud." -sidebar_label: "Using defer in dbt Cloud" +sidebar_label: "Defer in dbt Cloud" pagination_next: "docs/cloud/cloud-cli-installation" --- diff --git a/website/docs/docs/cloud/about-cloud/change-your-dbt-cloud-theme.md b/website/docs/docs/cloud/about-cloud/change-your-dbt-cloud-theme.md new file mode 100644 index 00000000000..579f12d10fc --- /dev/null +++ b/website/docs/docs/cloud/about-cloud/change-your-dbt-cloud-theme.md @@ -0,0 +1,44 @@ +--- +title: "Change your dbt Cloud theme" +id: change-your-dbt-cloud-theme +description: "Learn about theme switching in dbt Cloud" +sidebar_label: Change your dbt Cloud theme +image: /img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png +--- + +# Change your dbt Cloud theme + +dbt Cloud supports **Light mode** (default), **Dark mode**, and **System mode** (respects your browser's theme for light or dark mode) under the **Theme** section of your user profile. You can seamlessly switch between these modes directly from the profile menu, customizing your viewing experience. + +Your selected theme is stored in your user profile, ensuring a consistent experience across dbt Cloud. + +Theme selection applies across all areas of dbt Cloud, including the [IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), [dbt Explorer](/docs/collaborate/explore-projects), [environments](/docs/environments-in-dbt), [jobs](/docs/deploy/jobs), and more. 
Learn more about customizing themes in [Change themes in dbt Cloud](/docs/cloud/about-cloud/change-your-dbt-cloud-theme#change-themes-in-dbt-cloud). + +## Prerequisites + +- You have a dbt Cloud account. If you don’t, try [dbt Cloud for free!](https://www.getdbt.com/signup) +- Dark mode is currently available on the Developer plan and will gradually be made available for all [plans](https://www.getdbt.com/pricing) in the future. Stay tuned for updates. + +## Change themes in dbt Cloud + +To switch to dark mode in the dbt Cloud UI, follow these steps: + +1. Navigate to your account name at the bottom left of your account. +2. Under **Theme**, select **Dark**. + + + +And that’s it! 🎉 Your chosen selected theme will follow you across all devices. + +## Disable dark mode in dbt Cloud + +To disable dark mode in the dbt Cloud UI, follow these steps: + +1. Navigate to the three dots at the bottom right of the IDE. +2. Select **Switch to light mode** from the menu. + + + +## Legacy dark mode + +The **Switch to dark mode** menu item in the IDE will soon be deprecated. All users who have access to the IDE will default to **Light mode** upon signing in, but can easily switch to **Dark mode** from the user menu in the navigation. Once you switch to your new theme, it will apply to all of your devices. diff --git a/website/docs/docs/cloud/about-cloud/tenancy.md b/website/docs/docs/cloud/about-cloud/tenancy.md index efbc248f91e..812841c5e13 100644 --- a/website/docs/docs/cloud/about-cloud/tenancy.md +++ b/website/docs/docs/cloud/about-cloud/tenancy.md @@ -10,7 +10,7 @@ import AboutCloud from '/snippets/_test-tenancy.md'; ### Multi-tenant -The Multi Tenant (SaaS) deployment environment refers to the SaaS dbt Cloud application hosted by dbt Labs. This is the most commonly used deployment and is completely managed and maintained by dbt Labs, the makers of dbt. As a SaaS product, a user can quickly [create an account](https://www.getdbt.com/signup/) on our North American servers and get started using the dbt and related services immediately. _If your organization requires cloud services hosted on EMEA or APAC regions_, please [contact us](https://www.getdbt.com/contact/). The deployments are hosted on AWS or Azure ([Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud)) and are always kept up to date with the currently supported dbt versions, software updates, and bug fixes. +The Multi Tenant (SaaS) deployment environment refers to the SaaS dbt Cloud application hosted by dbt Labs. This is the most commonly used deployment and is completely managed and maintained by dbt Labs, the makers of dbt. As a SaaS product, a user can quickly [create an account](https://www.getdbt.com/signup/) on our North American servers and get started using the dbt and related services immediately. _If your organization requires cloud services hosted on EMEA or APAC regions_, please [contact us](https://www.getdbt.com/contact/). The deployments are hosted on AWS or Azure and are always kept up to date with the currently supported dbt versions, software updates, and bug fixes. ### Single tenant diff --git a/website/docs/docs/cloud/account-integrations.md b/website/docs/docs/cloud/account-integrations.md index e5ff42cb900..e3c55da9015 100644 --- a/website/docs/docs/cloud/account-integrations.md +++ b/website/docs/docs/cloud/account-integrations.md @@ -33,8 +33,8 @@ Connect your dbt Cloud account to an OAuth provider that are integrated with dbt To configure an OAuth account integration: 1. 
Navigate to **Account settings** in the side menu. 2. Under the **Settings** section, click on **Integrations**. -3. Under **OAuth**, and click on **Link** to connect your Slack account. -4. For custom OAuth providers, under **Custom OAuth integrations**, click on **Add integration** and select the OAuth provider from the list. Fill in the required fields and click **Save**. +3. Under **OAuth**, click on **Link** to [connect your Slack account](/docs/deploy/job-notifications#set-up-the-slack-integration). +4. For custom OAuth providers, under **Custom OAuth integrations**, click on **Add integration** and select the [OAuth provider](/docs/cloud/manage-access/sso-overview) from the list. Fill in the required fields and click **Save**. diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index abd3c86d4a8..33af7ee1393 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -74,7 +74,7 @@ To configure your own linting rules: 1. Create a new file in the root project directory (the parent or top-level directory for your files). Note: The root project directory is the directory where your `dbt_project.yml` file resides. 2. Name the file `.sqlfluff` (make sure you add the `.` before `sqlfluff`). -3. [Create](https://docs.sqlfluff.com/en/stable/configuration.html#new-project-configuration) and add your custom config code. +3. [Create](https://docs.sqlfluff.com/en/stable/configuration/setting_configuration.html#new-project-configuration) and add your custom config code. 4. Save and commit your changes. 5. Restart the IDE. 6. Test it out and happy linting! diff --git a/website/docs/docs/cloud/git/git-configuration-in-dbt-cloud.md b/website/docs/docs/cloud/git/git-configuration-in-dbt-cloud.md index 57558f7cb5b..fc5235526ce 100644 --- a/website/docs/docs/cloud/git/git-configuration-in-dbt-cloud.md +++ b/website/docs/docs/cloud/git/git-configuration-in-dbt-cloud.md @@ -43,4 +43,9 @@ Whether you use a Git integration that natively connects with dbt Cloud or prefe link="/docs/cloud/git/connect-azure-devops" icon="dbt-bit"/> + diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 804b1542e80..bf6cf9d1aa8 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -8,6 +8,10 @@ In dbt Cloud, you can import a git repository from any valid git URL that points ## Git protocols You must use the `git@...` or `ssh:..`. version of your git URL, not the `https://...` version. dbt Cloud uses the SSH protocol to clone repositories, so dbt Cloud will be unable to clone repos supplied with the HTTP protocol. +import GitProvidersCI from '/snippets/_git-providers-supporting-ci.md'; + + + ## Managing deploy keys After importing a project by Git URL, dbt Cloud will generate a Deploy Key for your repository. To find the deploy key in dbt Cloud: diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index c6213b49453..f54bb752937 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -12,15 +12,13 @@ sidebar_label: "Set up Azure DevOps" To use our native integration with Azure DevOps in dbt Cloud, an account admin needs to set up an Microsoft Entra ID app. 
We recommend setting up a separate [Entra ID application than used for SSO](/docs/cloud/manage-access/set-up-sso-microsoft-entra-id). 1. [Register an Entra ID app](#register-a-microsoft-entra-id-app). -2. [Add permissions to your new app](#add-permissions-to-your-new-app). -3. [Add another redirect URI](#add-another-redirect-uri). -4. [Connect Azure DevOps to your new app](#connect-azure-devops-to-your-new-app). -5. [Add your Entra ID app to dbt Cloud](#add-your-azure-ad-app-to-dbt-cloud). +2. [Connect Azure DevOps to your new app](#connect-azure-devops-to-your-new-app). +3. [Add your Entra ID app to dbt Cloud](#add-your-azure-ad-app-to-dbt-cloud). -Once the Microsoft Entra ID app is added to dbt Cloud, an account admin must also [connect a service user](/docs/cloud/git/setup-azure#connect-a-service-user) via OAuth, which will be used to power headless actions in dbt Cloud such as deployment runs and CI. +Once the Microsoft Entra ID app is added to dbt Cloud, an account admin must also connect a [service principal](https://learn.microsoft.com/en-us/entra/identity-platform/app-objects-and-service-principals?tabs=browser), which will be used to power headless actions in dbt Cloud such as deployment runs and CI. -Once the Microsoft Entra ID app is added to dbt Cloud and the service user is connected, then dbt Cloud developers can personally authenticate in dbt Cloud from Azure DevOps. For more on this, see [Authenticate with Azure DevOps](/docs/cloud/git/authenticate-azure). +Once the Microsoft Entra ID app is added to dbt Cloud and the service principal is connected, then dbt Cloud developers can personally authenticate in dbt Cloud from Azure DevOps. For more on this, see [Authenticate with Azure DevOps](/docs/cloud/git/authenticate-azure). The following personas are required to complete the steps on this page: - Microsoft Entra ID admin @@ -38,46 +36,17 @@ A Microsoft Entra ID admin needs to perform the following steps: 4. Provide a name for your app. We recommend using, "dbt Labs Azure DevOps app". 5. Select **Accounts in any organizational directory (Any Entra ID directory - Multitenant)** as the Supported Account Types. Many customers ask why they need to select Multitenant instead of Single tenant, and they frequently get this step wrong. Microsoft considers Azure DevOps (formerly called Visual Studio) and Microsoft Entra ID as separate tenants, and in order for this Entra ID application to work properly, you must select Multitenant. -6. Add a redirect URI by selecting **Web** and, in the field, entering `https://YOUR_ACCESS_URL/complete/azure_active_directory`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/access-regions-ip-addresses) for your region and plan. -7. Click **Register**. - - +6. Click **Register**. Here's what your app should look like before registering it: - -## Add permissions to your new app - -An Entra ID admin needs to provide your new app access to Azure DevOps: - -1. Select **API permissions** in the left navigation panel. -2. Remove the **Microsoft Graph / User Read** permission. -3. Click **Add a permission**. -4. Select **Azure DevOps**. -5. Select the **user_impersonation** permission. This is the only permission available for Azure DevOps. - - - -## Add another redirect URI - -A Microsoft Entra ID admin needs to add another redirect URI to your Entra ID application. This redirect URI will be used to authenticate the service user for headless actions in deployment environments. - -1. 
Navigate to your Microsoft Entra ID application. - -2. Select the link next to **Redirect URIs** -3. Click **Add URI** and add the URI, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/access-regions-ip-addresses) for your region and plan: -`https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user` -4. Click **Save**. - - - ## Create a client secret A Microsoft Entra ID admin needs to complete the following steps: -1. Navigate to your Microsoft Entra ID application. +1. Navigate to **Microsoft Entra ID**, click **App registrations**, and click on your app. 2. Select **Certificates and Secrets** from the left navigation panel. 3. Select **Client secrets** and click **New client secret** 4. Give the secret a description and select the expiration time. Click **Add**. @@ -89,39 +58,84 @@ An Azure admin will need one of the following permissions in both the Microsoft - Azure Service Administrator - Azure Co-administrator -If your Azure DevOps account is connected to Entra ID, then you can proceed to [Connect a service user](#connect-a-service-user). However, if you're just getting set up, connect Azure DevOps to the Microsoft Entra ID app you just created: +:::note -1. From your Azure DevOps account, select **Organization settings** in the bottom left. -2. Navigate to Microsoft Entra ID. -3. Click **Connect directory**. -4. Select the directory you want to connect. -5. Click **Connect**. +You can only add a managed identity or service principal for the tenant to which your organization is connected. You need to add a directory to your organization so that it can access all the service principals and other identities. +Navigate to **Organization settings** --> **Microsoft Entra** --> **Connect Directory** to connect. - +::: -## Add your Microsoft Entra ID app to dbt Cloud +1. From your Azure DevOps account organization screen, click **Organization settings** in the bottom left. +2. Under **General** settings, click **Users**. +3. Click **Add users**, and in the resulting panel, enter the service principal's name in the first field. Then, click the name when it appears below the field. +4. In the **Add to projects** field, click the boxes for any projects you want to include (or select all). +5. Set the **Azure DevOps Groups** to **Project Administrator**. -A dbt Cloud account admin needs to perform the following steps. + -Once you connect your Microsoft Entra ID app and Azure DevOps, you need to provide dbt Cloud information about the app: +## Configure the Entra ID connection +There are two connection methods currently available for dbt Cloud and Azure DevOps: +- **Service principal** (recommended): Create an application connection via client ID and secret for unattended authentication. +- **Service user** (legacy): Create a user that will authenticate the connection with username and password. This configuration should be avoided. -1. Navigate to your account settings in dbt Cloud. -2. Select **Integrations**. -3. Scroll to the Azure DevOps section. -4. Complete the form: - - **Azure DevOps Organization:** Must match the name of your Azure DevOps organization exactly. Do not include the `dev.azure.com/` prefix in this field. ✅ Use `my-devops-org` ❌ Avoid `dev.azure.com/my-devops-org` - - **Application (client) ID:** Found in the Microsoft Entra ID app. - - **Client Secrets:** Copy the **Value** field in the Microsoft Entra ID app client secrets and paste it in the **Client Secret** field in dbt Cloud. 
Entra ID admins are responsible for the Entra ID app secret expiration and dbt Admins should note the expiration date for rotation. - - **Directory(tenant) ID:** Found in the Microsoft Entra ID app. - + -Your Microsoft Entra ID app should now be added to your dbt Cloud Account. People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). + + +## Create a service principal + +The application's service principal represents the Entra ID application object. Whereas a service user represents a real user in Azure with an Entra ID (and an applicable license), the service principal is a secure identity used by an application to access Azure resources unattended. The service principal authenticates with a client ID and secret rather than a username and password (or any other form of user auth). Service principals are the [Microsoft recommended method](https://learn.microsoft.com/en-us/entra/architecture/secure-service-accounts#types-of-microsoft-entra-service-accounts) for authenticating apps. + +### Add a role to the Service Principal + +In your Azure account: + +1. Navigate to **Subscriptions** and click on the appropriate subscription name for the application environment. +2. From the left-side menu of the subscription window, click **Access control (IAM)**. +3. From the top menu, click **Add** and select **Add role assignment** from the dropdown. + + + +4. In the **Role** tab, select a role with appropriate permissions to assign the service principal. +5. Click the **Members** tab. You must set **Assign access to** to **User, group, or service principal**. +6. Click **Select members** and search for your app name in the window. Once it appears, click your app, which will appear in the **Selected members** section. Click **Select** at the bottom to save your selection. + + + +5. Confirm the correct details and click **Review + assign**. + -## Connect a service user +Navigate back to the **App registrations** screen and click the app. On the left menu, click **Roles and administrators**, and you will see the app role assignment. -Because Azure DevOps forces all authentication to be linked to a user's permissions, we recommend an Azure DevOps admin create a "service user" in Azure DevOps whose permissions will be used to power headless actions in dbt Cloud such as dbt Cloud project repo selection, deployment runs, and CI. A service user is a pseudo user set up in the same way an admin would set up a real user, but it's given permissions specifically scoped for service to service interactions. You should avoid linking authentication to a real Azure DevOps user because if this person leaves your organization, dbt Cloud will lose privileges to the dbt Azure DevOps repositories, causing production runs to fail. +### Migrate to service principal + +If your dbt Cloud app does not have a service principal, take the following actions in your Azure account: + +1. Navigate to **Microsoft Entra ID**. +2. Under **Manage** on the left-side menu, click **App registrations**. +3. Click the app for the dbt Cloud and Azure DevOps integration. +4. Locate the **Managed application in local directory** field and click **Create Service Principal**. + + + +5. Follow the instructions in [Add role to service principal](#add-a-role-to-the-service-principal) if the app doesn't already have them assigned. +6. 
In dbt Cloud, navigate to **Account settings** --> **Integrations** and edit the **Azure DevOps** integration. +7. Click the **Service principal** option, fill out the fields, and click **Save**. The services will continue to function uninterrupted. + + + + + + +:::important + +Service users are no longer a recommended method for authentication and accounts using them should [migrate](#migrate-to-service-principal) to Entra ID [service principals](https://learn.microsoft.com/en-us/entra/identity-platform/app-objects-and-service-principals) in the future. Service prinicpals are the [Microsoft recommended service account type](https://learn.microsoft.com/en-us/entra/architecture/secure-service-accounts#types-of-microsoft-entra-service-accounts) for app authentication. + +::: + +An Azure DevOps admin can create a "service user (not recommended)" in Azure DevOps whose permissions will be used to power headless actions in dbt Cloud such as dbt Cloud project repo selection, deployment runs, and CI. A service user is a pseudo user set up in the same way an admin would set up a real user, but it's given permissions specifically scoped for service to service interactions. You should avoid linking authentication to a real Azure DevOps user because if this person leaves your organization, dbt Cloud will lose privileges to the dbt Azure DevOps repositories, causing production runs to fail. :::info Service user authentication expiration dbt Cloud will refresh the authentication for the service user on each run triggered by the scheduler, API, or CI. If your account does not have any active runs for over 90 days, an admin will need to manually refresh the authentication of the service user by disconnecting and reconnecting the service user's profile via the OAuth flow described above in order to resume headless interactions like project set up, deployment runs, and CI. @@ -393,3 +407,25 @@ These tokens are limited to the following [scopes](https://learn.microsoft.com/e - `vso.project`: Grants the ability to read projects and teams. - `vso.build_execute`: Grants the ability to access build artifacts, including build results, definitions, and requests, and the ability to queue a build, update build properties, and the ability to receive notifications about build events with service hooks. ::: + + + + +## Add your Microsoft Entra ID app to dbt Cloud + +A dbt Cloud account admin must take the following actions. + +Once you connect your Microsoft Entra ID app and Azure DevOps, you need to provide dbt Cloud information about the app: + +1. Navigate to your account settings in dbt Cloud. +2. Select **Integrations**. +3. Scroll to the Azure DevOps section. +4. Complete the form: + - **Azure DevOps Organization:** Must match the name of your Azure DevOps organization exactly. Do not include the `dev.azure.com/` prefix in this field. ✅ Use `my-devops-org` ❌ Avoid `dev.azure.com/my-devops-org` + - **Application (client) ID:** Found in the Microsoft Entra ID app. + - **Client Secrets:** Copy the **Value** field in the Microsoft Entra ID app client secrets and paste it in the **Client Secret** field in dbt Cloud. Entra ID admins are responsible for the Entra ID app secret expiration and dbt Admins should note the expiration date for rotation. + - **Directory(tenant) ID:** Found in the Microsoft Entra ID app. + + - **Redirect URI (Service users only)**: Copy this field to **Redirect URIs** field in your Entra ID app. + +Your Microsoft Entra ID app should now be added to your dbt Cloud Account. 
People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index de52434be06..b3a4b30c7dd 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -102,9 +102,11 @@ The audit log supports various events for different objects in dbt Cloud. You wi | Invite Added | invite.Added | User invitation added and sent to the user | | Invite Redeemed | invite.Redeemed | User redeemed invitation | | User Added to Account | account.UserAdded | New user added to the account | -| User Added to Group | user_group_user.Added | An existing user is added to a group | -| User Removed from Account | account.UserRemoved | User removed from the account -| User Removed from Group | user_group_user.Removed | An existing user is removed from a group | +| User Added to Group | user_group_user.Added | An existing user was added to a group | +| User Removed from Account | account.UserRemoved | User removed from the account | +| User Removed from Group | user_group_user.Removed | An existing user was removed from a group | +| User License Created | user_license.added | A new user license was consumed | +| User License Removed | user_license.removed | A user license was removed from the seat count | | Verification Email Confirmed | user.jit.email.Confirmed | Email verification confirmed by user | | Verification Email Sent | user.jit.email.Sent | Email verification sent to user created via JIT | diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index 2f45ad7dcc8..f961201e153 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -19,6 +19,12 @@ Alternatively, you can start the process from the **Settings** page in the **Sin +There are two fields in these settings that you will need for the migration: +- **Single sign-on URL:** This will be in the format of your login URL `https:///login/callback?connection=` +- **Audience URI (SP Entity ID):** This will be in the format `urn:auth0::` + +Replace `` with your accounts login slug. + Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. You can just skip to the section that's right for your environment. These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). :::warning Login \{slug\} @@ -30,17 +36,19 @@ After changing the slug, admins must share the new login URL with their dbt Clou ::: -## SAML 2.0 and Okta +## SAML 2.0 SAML 2.0 users must update a few fields in the SSO app configuration to match the new Auth0 URL and URI. You can approach this by editing the existing SSO app settings or creating a new one to accommodate the Auth0 settings. One approach isn't inherently better, so you can choose whichever works best for your organization. 
-The fields that will be updated are: -- Single sign-on URL — `https:///login/callback?connection={slug}` -- Audience URI (SP Entity ID) — `urn:auth0::{slug}` +### SAML 2.0 and Okta + +The Okta fields that will be updated are: +- Single sign-on URL — `https:///login/callback?connection=` +- Audience URI (SP Entity ID) — `urn:auth0::` Below are sample steps to update. You must complete all of them to ensure uninterrupted access to dbt Cloud and you should coordinate with your identity provider admin when making these changes. -1. Replace `{slug}` with your organization’s login slug. It must be unique across all dbt Cloud instances and is usually something like your company name separated by dashes (for example, `dbt-labs`). +1. Replace `` with your organization’s login slug. It must be unique across all dbt Cloud instances and is usually something like your company name separated by dashes (for example, `dbt-labs`). Here is an example of an updated SAML 2.0 setup in Okta. @@ -56,39 +64,37 @@ Here is an example of an updated SAML 2.0 setup in Okta. 4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. -## Google Workspace +### SAML 2.0 and Entra ID -Google Workspace admins updating their SSO APIs with the Auth0 URL won't have to do much if it is an existing setup. This can be done as a new project or by editing an existing SSO setup. No additional scopes are needed since this is migrating from an existing setup. All scopes were defined during the initial configuration. +The Entra ID fields that will be updated are: +- Single sign-on URL — `https:///login/callback?connection=` +- Audience URI (SP Entity ID) — `urn:auth0::` -Below are steps to update. You must complete all of them to ensure uninterrupted access to dbt Cloud and you should coordinate with your identity provider admin when making these changes. +The new values for these fields can be found in dbt Cloud by navigating to **Account settting** --> **Single sign-on**. -1. Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** - - - -2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** - - +1. Replace `` with your organization’s login slug. It must be unique across all dbt Cloud instances and is usually something like your company name separated by dashes (for example, `dbt-labs`). -3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. +2. Locate your dbt Cloud SAML2.0 app in the **Enterprise applications** section of Azure. Click **Single sign-on** on the left side menu. -Click **Save** once you are done. +3. Edit the **Basic SAML configuration** tile and enter the values from your account: + - Entra ID **Identifier (Entity ID)** = dbt Cloud **Audience URI (SP Entity ID)** + - Entra ID **Reply URL (Assertion Consumer Service URL)** = dbt Cloud **Single sign-on URL** - + -4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. 
_The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. +4. Save the fields and the completed configuration will look something like this: -:::warning Domain authorization + -You must complete the domain authorization before you toggle `Enable New SSO Authentication`, or the migration will not complete successfully. +3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ -::: + - +4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. ## Microsoft Entra ID -Microsoft Entra ID admins will need to make a slight adjustment to the existing authentication app in the Azure portal. This migration does not require that the entire app be deleted or recreated; you can edit the existing app. Start by opening the Azure portal and navigating to the Microsoft Entra ID overview. +Microsoft Entra ID admins using OpenID Connect (ODIC) will need to make a slight adjustment to the existing authentication app in the Azure portal. This migration does not require that the entire app be deleted or recreated; you can edit the existing app. Start by opening the Azure portal and navigating to the Microsoft Entra ID overview. Below are steps to update. You must complete all of them to ensure uninterrupted access to dbt Cloud and you should coordinate with your identity provider admin when making these changes. @@ -113,3 +119,32 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: + +## Google Workspace + +Google Workspace admins updating their SSO APIs with the Auth0 URL won't have to do much if it is an existing setup. This can be done as a new project or by editing an existing SSO setup. No additional scopes are needed since this is migrating from an existing setup. All scopes were defined during the initial configuration. + +Below are steps to update. You must complete all of them to ensure uninterrupted access to dbt Cloud and you should coordinate with your identity provider admin when making these changes. + +1. Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** + + + +2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** + + + +3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. + +Click **Save** once you are done. + +4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. _The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. + +:::warning Domain authorization + +You must complete the domain authorization before you toggle `Enable New SSO Authentication`, or the migration will not complete successfully. 
+ +::: + + + diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 2b2575efc57..84eab8c1b27 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -56,7 +56,7 @@ Client Secret for use in dbt Cloud. 6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page. -7. Use the following configuration values when creating your Credentials, replacing `YOUR_ACCESS_URL` and `YOUR_AUTH0_URI`, which need to be replaced with the [appropriate Access URL and Auth0 URI](/docs/cloud/manage-access/sso-overview#auth0-multi-tenant-uris) for your region and plan. +7. Use the following configuration values when creating your Credentials, replacing `YOUR_ACCESS_URL` and `YOUR_AUTH0_URI`, which need to be replaced with the appropriate Access URL and Auth0 URI from your [account settings](/docs/cloud/manage-access/sso-overview#auth0-uris). | Config | Value | | ------ | ----- | diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md b/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md index 81463cf9ee5..7b095a67288 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md @@ -37,7 +37,7 @@ Log into the Azure portal for your organization. Using the [**Microsoft Entra ID | **Name** | dbt Cloud | | **Supported account types** | Accounts in this organizational directory only _(single tenant)_ | -4. Configure the **Redirect URI**. The table below shows the appropriate Redirect URI values for single-tenant and multi-tenant deployments. For most enterprise use-cases, you will want to use the single-tenant Redirect URI. Replace `YOUR_AUTH0_URI` with the [appropriate Auth0 URI](/docs/cloud/manage-access/sso-overview#auth0-multi-tenant-uris) for your region and plan. +4. Configure the **Redirect URI**. The table below shows the appropriate Redirect URI values for single-tenant and multi-tenant deployments. For most enterprise use-cases, you will want to use the single-tenant Redirect URI. Replace `YOUR_AUTH0_URI` with the [appropriate Auth0 URI](/docs/cloud/manage-access/sso-overview#auth0-uris) for your region and plan. | Application Type | Redirect URI | | ----- | ----- | @@ -138,7 +138,7 @@ To complete setup, follow the steps below in the dbt Cloud application. | **Client Secret** | Paste the **Client Secret** (remember to use the Secret Value instead of the Secret ID) from the steps above;
**Note:** When the client secret expires, an Entra ID admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. | | **Tenant ID** | Paste the **Directory (tenant ID)** recorded in the steps above | | **Domain** | Enter the domain name for your Azure directory (such as `fishtownanalytics.com`). Only use the primary domain; this won't block access for other domains. | -| **Slug** | Enter your desired login slug. Users will be able to log into dbt Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/manage-access/sso-overview#auth0-multi-tenant-uris) for your region and plan. Login slugs must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. | +| **Slug** | Enter your desired login slug. Users will be able to log into dbt Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/manage-access/sso-overview#auth0-uris) for your region and plan. Login slugs must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. | diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-okta.md b/website/docs/docs/cloud/manage-access/set-up-sso-okta.md index 83c9f6492c6..9bc1b3d2683 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-okta.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-okta.md @@ -75,6 +75,9 @@ so pick a slug that uniquely identifies your company. * **Single sign on URL**: `https://YOUR_AUTH0_URI/login/callback?connection=` * **Audience URI (SP Entity ID)**: `urn:auth0::{login slug}` * **Relay State**: `` +* **Name ID format**: `Unspecified` +* **Application username**: `Custom` / `user.getInternalProperty("id")` +* **Update Application username on**: `Create and update` - - Use the **Attribute Statements** and **Group Attribute Statements** forms to map your organization's Okta User and Group Attributes to the format that dbt Cloud expects. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index ca93d81badf..992e4ca2967 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -59,7 +59,9 @@ Additionally, you may configure the IdP attributes passed from your identity pro | email | Unspecified | user.email | The user's email address | | first_name | Unspecified | user.first_name | The user's first name | | last_name | Unspecified | user.last_name | The user's last name | -| NameID (if applicable) | Unspecified | user.email | The user's email address | +| NameID | Unspecified | ID | The user's unchanging ID | + +`NameID` values can be persistent (`urn:oasis:names:tc:SAML:2.0:nameid-format:persistent`) rather than unspecified if your IdP supports these values. Using an email address for `NameID` will work, but dbt Cloud creates an entirely new user if that email address changes. Configuring a value that will not change, even if the user's email address does, is a best practice. dbt Cloud's [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control) relies on group mappings from the IdP to assign dbt Cloud users to dbt Cloud groups. 
To @@ -144,6 +146,9 @@ Login slugs must be unique across all dbt Cloud accounts, so pick a slug that un * **Single sign on URL**: `https://YOUR_AUTH0_URI/login/callback?connection=` * **Audience URI (SP Entity ID)**: `urn:auth0::` * **Relay State**: `` + * **Name ID format**: `Unspecified` + * **Application username**: `Custom` / `user.getInternalProperty("id")` + * **Update Application username on**: `Create and update` @@ -245,7 +250,7 @@ Login slugs must be unique across all dbt Cloud accounts, so pick a slug that un * **Audience URI (SP Entity ID)**: `urn:auth0::` - **Start URL**: `` 5. Select the **Signed response** checkbox. -6. The default **Name ID** is the primary email. Multi-value input is not supported. +6. The default **Name ID** is the primary email. Multi-value input is not supported. If your user profile has a unique, stable value that will persist across email address changes, it's best to use that; otherwise, email will work. 7. Use the **Attribute mapping** page to map your organization's Google Directory Attributes to the format that dbt Cloud expects. 8. Click **Add another mapping** to map additional attributes. @@ -329,9 +334,11 @@ Follow these steps to set up single sign-on (SSO) with dbt Cloud: From the Set up Single Sign-On with SAML page: 1. Click **Edit** in the User Attributes & Claims section. -2. Leave the claim under "Required claim" as is. -3. Delete all claims under "Additional claims." -4. Click **Add new claim** and add these three new claims: +2. Click **Unique User Identifier (Name ID)** under **Required claim.** +3. Set **Name identifier format** to **Unspecified**. +4. Set **Source attribute** to **user.objectid**. +5. Delete all claims under **Additional claims.** +6. Click **Add new claim** and add the following new claims: | Name | Source attribute | | ----- | ----- | @@ -339,16 +346,22 @@ From the Set up Single Sign-On with SAML page: | **first_name** | user.givenname | | **last_name** | user.surname | -5. Click **Add a group claim** from User Attributes and Claims. -6. If you'll assign users directly to the enterprise application, select **Security Groups**. If not, select **Groups assigned to the application**. -7. Set **Source attribute** to **Group ID**. -8. Under **Advanced options**, check **Customize the name of the group claim** and specify **Name** to **groups**. +7. Click **Add a group claim** from **User Attributes and Claims.** +8. If you assign users directly to the enterprise application, select **Security Groups**. If not, select **Groups assigned to the application**. +9. Set **Source attribute** to **Group ID**. +10. Under **Advanced options**, check **Customize the name of the group claim** and specify **Name** to **groups**. **Note:** Keep in mind that the Group ID in Entra ID maps to that group's GUID. It should be specified in lowercase for the mappings to work as expected. The Source Attribute field alternatively can be set to a different value of your preference. ### Finish setup -9. After creating the Azure application, follow the instructions in the [dbt Cloud Setup](#dbt-cloud-setup) section to complete the integration. +9. After creating the Azure application, follow the instructions in the [dbt Cloud Setup](#dbt-cloud-setup) section to complete the integration. The names for fields in dbt Cloud vary from those in the Entra ID app. 
They're mapped as follows: + + | dbt Cloud field | Corresponding Entra ID field | + | ----- | ----- | + | **Identity Provider SSO URL** | Login URL | + | **Identity Provider Issuer** | Microsoft Entra Identifier | + ## OneLogin integration @@ -386,7 +399,7 @@ We recommend using the following values: | name | name format | value | | ---- | ----------- | ----- | -| NameID | Unspecified | Email | +| NameID | Unspecified | OneLogin ID | | email | Unspecified | Email | | first_name | Unspecified | First Name | | last_name | Unspecified | Last Name | diff --git a/website/docs/docs/cloud/secure/databricks-privatelink.md b/website/docs/docs/cloud/secure/databricks-privatelink.md index aaa6e0c6eb7..9cf13489bab 100644 --- a/website/docs/docs/cloud/secure/databricks-privatelink.md +++ b/website/docs/docs/cloud/secure/databricks-privatelink.md @@ -56,6 +56,7 @@ The following steps will walk you through the setup of a Databricks AWS PrivateL - Databricks instance name: - Databricks Azure resource ID: - dbt Cloud multi-tenant environment: EMEA + - Azure region: Region that hosts your Databricks workspace (like, WestEurope, NorthEurope) ``` 5. Once our Support team confirms the resources are available in the Azure portal, navigate to the Azure Databricks Workspace and browse to **Networking** > **Private Endpoint Connections**. Then, highlight the `dbt` named option and select **Approve**. diff --git a/website/docs/docs/collaborate/model-query-history.md b/website/docs/docs/collaborate/model-query-history.md index 872a5a295da..6f7ef6958b1 100644 --- a/website/docs/docs/collaborate/model-query-history.md +++ b/website/docs/docs/collaborate/model-query-history.md @@ -35,7 +35,7 @@ To access the features, you should meet the following: 1. You have a dbt Cloud account on the [Enterprise plan](https://www.getdbt.com/pricing/). Single-tenant accounts should contact their account representative for setup. 2. You have set up a [production](https://docs.getdbt.com/docs/deploy/deploy-environments#set-as-production-environment) deployment environment for each project you want to explore, with at least one successful job run. 3. You have [admin permissions](/docs/cloud/manage-access/enterprise-permissions) in dbt Cloud to edit project settings or production environment settings. -4. Use Snowflake or BigQuery as your data warehouse and can enable query history permissions or work with an admin to do so. Support for additional data platforms coming soon. +4. Use Snowflake or BigQuery as your data warehouse and can enable [query history permissions](#for-snowflake) or work with an admin to do so. Support for additional data platforms coming soon. - For Snowflake users: You **must** have a Snowflake Enterprise tier or higher subscription. 
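To make the Snowflake prerequisite above more concrete, the warehouse-side setup usually amounts to a single grant on the role that your production environment connects with. The sketch below is only illustrative: it assumes a hypothetical role name `DBT_CLOUD_ROLE`, and your account may require a different grant (for example, imported privileges on the `SNOWFLAKE` database) — follow the linked query history permissions section for the exact privileges.

```sql
-- Illustrative sketch only (hypothetical role name DBT_CLOUD_ROLE).
-- Grants the role used by your dbt Cloud production connection read access to
-- Snowflake's query history metadata, which model query history relies on.
grant database role snowflake.governance_viewer to role dbt_cloud_role;
```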
## Enable query history in dbt Cloud diff --git a/website/docs/docs/connect-adapters.md b/website/docs/docs/connect-adapters.md index a15f301a260..e4180710e16 100644 --- a/website/docs/docs/connect-adapters.md +++ b/website/docs/docs/connect-adapters.md @@ -1,5 +1,5 @@ --- -title: "How to connect to adapters" +title: "Connect to adapters" id: "connect-adapters" --- diff --git a/website/docs/docs/core/connect-data-platform/teradata-setup.md b/website/docs/docs/core/connect-data-platform/teradata-setup.md index f4ffbe37f35..b2fb8040935 100644 --- a/website/docs/docs/core/connect-data-platform/teradata-setup.md +++ b/website/docs/docs/core/connect-data-platform/teradata-setup.md @@ -38,17 +38,19 @@ import SetUpPages from '/snippets/_setup-pages-intro.md'; | 1.6.x | ✅ | ✅ | ✅ | ❌ | | 1.7.x | ✅ | ✅ | ✅ | ❌ | | 1.8.x | ✅ | ✅ | ✅ | ✅ | +| 1.9.x | ✅ | ✅ | ✅ | ✅ | ## dbt dependent packages version compatibility -| dbt-teradata | dbt-core | dbt-teradata-util | dbt-util | -|--------------|------------|-------------------|----------------| -| 1.2.x | 1.2.x | 0.1.0 | 0.9.x or below | -| 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 | -| 1.7.x | 1.7.x | 1.1.1 | 1.1.1 | -| 1.8.x | 1.8.x | 1.1.1 | 1.1.1 | -| 1.8.x | 1.8.x | 1.2.0 | 1.2.0 | -| 1.8.x | 1.8.x | 1.3.0 | 1.3.0 | +| dbt-teradata | dbt-core | dbt-teradata-util | dbt-util | +|--------------|----------|-------------------|----------------| +| 1.2.x | 1.2.x | 0.1.0 | 0.9.x or below | +| 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 | +| 1.7.x | 1.7.x | 1.1.1 | 1.1.1 | +| 1.8.x | 1.8.x | 1.1.1 | 1.1.1 | +| 1.8.x | 1.8.x | 1.2.0 | 1.2.0 | +| 1.8.x | 1.8.x | 1.3.0 | 1.3.0 | +| 1.9.x | 1.9.x | 1.3.0 | 1.3.0 | ### Connecting to Teradata @@ -95,7 +97,6 @@ Parameter | Default | Type | Description `browser_tab_timeout` | `"5"` | quoted integer | Specifies the number of seconds to wait before closing the browser tab after Browser Authentication is completed. The default is 5 seconds. The behavior is under the browser's control, and not all browsers support automatic closing of browser tabs. `browser_timeout` | `"180"` | quoted integer | Specifies the number of seconds that the driver will wait for Browser Authentication to complete. The default is 180 seconds (3 minutes). `column_name` | `"false"` | quoted boolean | Controls the behavior of cursor `.description` sequence `name` items. Equivalent to the Teradata JDBC Driver `COLUMN_NAME` connection parameter. False specifies that a cursor `.description` sequence `name` item provides the AS-clause name if available, or the column name if available, or the column title. True specifies that a cursor `.description` sequence `name` item provides the column name if available, but has no effect when StatementInfo parcel support is unavailable. -`connect_failure_ttl` | `"0"` | quoted integer | Specifies the time-to-live in seconds to remember the most recent connection failure for each IP address/port combination. The driver subsequently skips connection attempts to that IP address/port for the duration of the time-to-live. The default value of zero disables this feature. The recommended value is half the database restart time. Equivalent to the Teradata JDBC Driver `CONNECT_FAILURE_TTL` connection parameter. `connect_timeout` | `"10000"` | quoted integer | Specifies the timeout in milliseconds for establishing a TCP socket connection. Specify 0 for no timeout. The default is 10 seconds (10000 milliseconds). `cop` | `"true"` | quoted boolean | Specifies whether COP Discovery is performed. 
Equivalent to the Teradata JDBC Driver `COP` connection parameter. `coplast` | `"false"` | quoted boolean | Specifies how COP Discovery determines the last COP hostname. Equivalent to the Teradata JDBC Driver `COPLAST` connection parameter. When `coplast` is `false` or omitted, or COP Discovery is turned off, then no DNS lookup occurs for the coplast hostname. When `coplast` is `true`, and COP Discovery is turned on, then a DNS lookup occurs for a coplast hostname. @@ -110,7 +111,7 @@ Parameter | Default | Type | Description `log` | `"0"` | quoted integer | Controls debug logging. Somewhat equivalent to the Teradata JDBC Driver `LOG` connection parameter. This parameter's behavior is subject to change in the future. This parameter's value is currently defined as an integer in which the 1-bit governs function and method tracing, the 2-bit governs debug logging, the 4-bit governs transmit and receive message hex dumps, and the 8-bit governs timing. Compose the value by adding together 1, 2, 4, and/or 8. `logdata` | | string | Specifies extra data for the chosen logon authentication method. Equivalent to the Teradata JDBC Driver `LOGDATA` connection parameter. `logon_timeout` | `"0"` | quoted integer | Specifies the logon timeout in seconds. Zero means no timeout. -`logmech` | `"TD2"` | string | Specifies the logon authentication method. Equivalent to the Teradata JDBC Driver `LOGMECH` connection parameter. Possible values are `TD2` (the default), `JWT`, `LDAP`, `KRB5` for Kerberos, or `TDNEGO`. +`logmech` | `"TD2"` | string | Specifies the logon authentication method. Equivalent to the Teradata JDBC Driver `LOGMECH` connection parameter. Possible values are `TD2` (the default), `JWT`, `LDAP`, `BROWSER`, `KRB5` for Kerberos, or `TDNEGO`. `max_message_body` | `"2097000"` | quoted integer | Specifies the maximum Response Message size in bytes. Equivalent to the Teradata JDBC Driver `MAX_MESSAGE_BODY` connection parameter. `partition` | `"DBC/SQL"` | string | Specifies the database partition. Equivalent to the Teradata JDBC Driver `PARTITION` connection parameter. `request_timeout` | `"0"` | quoted integer | Specifies the timeout for executing each SQL request. Zero means no timeout. @@ -210,7 +211,9 @@ For using cross-DB macros, teradata-utils as a macro namespace will not be used, ##### hash - `Hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively. You need to install a User Defined Function (UDF): + `Hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively. You need to install a User Defined Function (UDF) and optionally specify `md5_udf` [variable](/docs/build/project-variables). + + If not specified the code defaults to using `GLOBAL_FUNCTIONS.hash_md5`. See the following instructions on how to install the custom UDF: 1. Download the md5 UDF implementation from Teradata (registration required): https://downloads.teradata.com/download/extensibility/md5-message-digest-udf. 1. Unzip the package and go to `src` directory. 1. Start up `bteq` and connect to your database. 
@@ -228,6 +231,12 @@ For using cross-DB macros, teradata-utils as a macro namespace will not be used, ```sql GRANT EXECUTE FUNCTION ON GLOBAL_FUNCTIONS TO PUBLIC WITH GRANT OPTION; ``` + Instruction on how to add md5_udf variable in dbt_project.yml for custom hash function: + ```yaml + vars: + md5_udf: Custom_database_name.hash_method_function + ``` + ##### last_day `last_day` in `teradata_utils`, unlike the corresponding macro in `dbt_utils`, doesn't support `quarter` datepart. @@ -241,6 +250,15 @@ dbt-teradata 1.8.0 and later versions support unit tests, enabling you to valida ## Limitations +### Browser authentication + +* When running a dbt job with logmech set to "browser", the initial authentication opens a browser window where you must enter your username and password. +* After authentication, this window remains open, requiring you to manually switch back to the dbt console. +* For every subsequent connection, a new browser tab briefly opens, displaying the message "TERADATA BROWSER AUTHENTICATION COMPLETED," and silently reuses the existing session. +* However, the focus stays on the browser window, so you’ll need to manually switch back to the dbt console each time. +* This behavior is the default functionality of the teradatasql driver and cannot be avoided at this time. +* To prevent session expiration and the need to re-enter credentials, ensure the authentication browser window stays open until the job is complete. + ### Transaction mode Both ANSI and TERA modes are now supported in dbt-teradata. TERA mode's support is introduced with dbt-teradata 1.7.1, it is an initial implementation. @@ -254,4 +272,4 @@ The adapter was originally created by [Doug Beatty](https://github.com/dbeatty10 ## License -The adapter is published using Apache-2.0 License. Refer to the [terms and conditions](https://github.com/dbt-labs/dbt-core/blob/main/License.md) to understand items such as creating derivative work and the support model. +The adapter is published using Apache-2.0 License. Refer to the [terms and conditions](https://github.com/dbt-labs/dbt-core/blob/main/License.md) to understand items such as creating derivative work and the support model. diff --git a/website/docs/docs/dbt-cloud-apis/sl-api-overview.md b/website/docs/docs/dbt-cloud-apis/sl-api-overview.md index e4e2a91791d..a6862dcb0fb 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-api-overview.md +++ b/website/docs/docs/dbt-cloud-apis/sl-api-overview.md @@ -30,18 +30,18 @@ plan="dbt Cloud Team or Enterprise"
- - + + "} ``` @@ -58,7 +56,7 @@ Each GQL request also requires a dbt Cloud `environmentId`. The API uses both th ### Metadata calls -**Fetch data platform dialect** +#### Fetch data platform dialect In some cases in your application, it may be useful to know the dialect or data platform that's internally used for the dbt Semantic Layer connection (such as if you are building `where` filters from a user interface rather than user-inputted SQL). @@ -72,13 +70,13 @@ The GraphQL API has an easy way to fetch this with the following query: } ``` -**Fetch available metrics** +#### Fetch available metrics ```graphql metrics(environmentId: BigInt!): [Metric!]! ``` -**Fetch available dimensions for metrics** +#### Fetch available dimensions for metrics ```graphql dimensions( @@ -87,7 +85,7 @@ dimensions( ): [Dimension!]! ``` -**Fetch available granularities given metrics** +#### Fetch available granularities given metrics Note: This call for `queryableGranularities` returns only queryable granularities for metric time - the primary time dimension across all metrics selected. @@ -123,7 +121,7 @@ You can also optionally access it from the metrics endpoint: } ``` -**Fetch measures** +#### Fetch measures ```graphql { @@ -147,7 +145,7 @@ You can also optionally access it from the metrics endpoint: } ``` -**Fetch available metrics given a set of dimensions** +#### Fetch available metrics given a set of dimensions ```graphql metricsForDimensions( @@ -156,7 +154,7 @@ metricsForDimensions( ): [Metric!]! ``` -**Metric Types** +#### Metric types ```graphql Metric { @@ -174,7 +172,7 @@ Metric { MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED] ``` -**Metric Type parameters** +#### Metric type parameters ```graphql MetricTypeParams { @@ -190,7 +188,7 @@ MetricTypeParams { ``` -**Dimension Types** +#### Dimension types ```graphql Dimension { @@ -208,7 +206,7 @@ Dimension { DimensionType = [CATEGORICAL, TIME] ``` -**List saved queries** +#### List saved queries ```graphql { @@ -235,9 +233,13 @@ DimensionType = [CATEGORICAL, TIME] ### Querying -When querying for data, _either_ a `groupBy` _or_ a `metrics` selection is required. +When querying for data, _either_ a `groupBy` _or_ a `metrics` selection is required. The following section provides examples of how to query metrics: + +- [Create dimension values query](#create-dimension-values-query) +- [Create metric query](#create-metric-query) +- [Fetch query result](#fetch-query-result) -**Create Dimension Values query** +#### Create dimension values query ```graphql @@ -249,7 +251,7 @@ mutation createDimensionValuesQuery( ``` -**Create Metric query** +#### Create metric query ```graphql createQuery( @@ -265,6 +267,7 @@ createQuery( ```graphql MetricInput { name: String! + alias: String! } GroupByInput { @@ -283,7 +286,7 @@ OrderByinput { # -- pass one and only one of metric or groupBy } ``` -**Fetch query result** +#### Fetch query result ```graphql query( @@ -323,7 +326,7 @@ mutation { ### Output format and pagination -**Output format** +#### Output format By default, the output is in Arrow format. You can switch to JSON format using the following parameter. However, due to performance limitations, we recommend using the JSON parameter for testing and validation. The JSON received is a base64 encoded string. To access it, you can decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using `pandas.read_json(json, orient="table")`. 
Or you can work with the data directly using `json["data"]`, and find the table schema using `json["schema"]["fields"]`. Alternatively, you can pass `encoded:false` to the jsonResult field to get a raw JSON string directly. @@ -343,7 +346,7 @@ By default, the output is in Arrow format. You can switch to JSON format using t The results default to the table but you can change it to any [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) supported value. -**Pagination** +#### Pagination By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the `pageNum` option. @@ -413,30 +416,39 @@ order_total ordered_at """ ``` -### Additional Create Query examples +### Additional create query examples -The following section provides query examples for the GraphQL API, such as how to query metrics, dimensions, where filters, and more. +The following section provides query examples for the GraphQL API, such as how to query metrics, dimensions, where filters, and more: -**Query two metrics grouped by time** + - [Query metric alias](#query-metric-alias) + - [Query with a time grain](#query-with-a-time-grain) + - [Query multiple metrics and multiple dimensions](#query-multiple-metrics-and-multiple-dimensions) + - [Query a categorical dimension on its own](#query-a-categorical-dimension-on-its-own) + - [Query with a where filter](#query-with-a-where-filter) + - [Query with order](#query-with-order) + - [Query with limit](#query-with-limit) + - [Query saved queries](#query-saved-queries) + - [Query with just compiling SQL](#query-with-just-compiling-sql) + +#### Query metric alias ```graphql mutation { createQuery( - environmentId: BigInt! - metrics: [{name: "food_order_amount"}] - groupBy: [{name: "metric_time"}, {name: "customer__customer_type"}] + environmentId: "123" + metrics: [{name: "metric_name", alias: "metric_alias"}] ) { - queryId + ... } } ``` -**Query with a time grain** +#### Query with a time grain ```graphql mutation { createQuery( - environmentId: BigInt! + environmentId: "123" metrics: [{name: "order_total"}] groupBy: [{name: "metric_time", grain: MONTH}] ) { @@ -447,12 +459,12 @@ mutation { Note that when using granularity in the query, the output of a time dimension with a time grain applied to it always takes the form of a dimension name appended with a double underscore and the granularity level - `{time_dimension_name}__{DAY|WEEK|MONTH|QUARTER|YEAR}`. Even if no granularity is specified, it will also always have a granularity appended to it and will default to the lowest available (usually daily for most data sources). It is encouraged to specify a granularity when using time dimensions so that there won't be any unexpected results with the output data. -**Query two metrics with a categorical dimension** +#### Query multiple metrics and multiple dimensions ```graphql mutation { createQuery( - environmentId: BigInt! 
+ environmentId: "123" metrics: [{name: "food_order_amount"}, {name: "order_gross_profit"}] groupBy: [{name: "metric_time", grain: MONTH}, {name: "customer__customer_type"}] ) { @@ -461,12 +473,12 @@ mutation { } ``` -**Query a categorical dimension on its own** +#### Query a categorical dimension on its own ```graphql mutation { createQuery( - environmentId: 123456 + environmentId: "123" groupBy: [{name: "customer__customer_type"}] ) { queryId @@ -474,7 +486,7 @@ mutation { } ``` -**Query with a where filter** +#### Query with a where filter The `where` filter takes a list argument (or a string for a single input). Depending on the object you are filtering, there are a couple of parameters: @@ -487,7 +499,7 @@ Note: If you prefer a `where` clause with a more explicit path, you can optional ```graphql mutation { createQuery( - environmentId: BigInt! + environmentId: "123" metrics:[{name: "order_total"}] groupBy:[{name: "customer__customer_type"}, {name: "metric_time", grain: month}] where:[{sql: "{{ Dimension('customer__customer_type') }} = 'new'"}, {sql:"{{ Dimension('metric_time').grain('month') }} > '2022-10-01'"}] @@ -497,9 +509,11 @@ mutation { } ``` -For both `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains. +For both `TimeDimension()`, the grain is only required in the `where` filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains. -For example, consider this Semantic model and Metric configuration, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model. +#### Example + +For example, consider this semantic model and metric configuration, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model. ```yaml semantic_model: @@ -536,22 +550,36 @@ metrics: measure: measure_1 ``` -Assuming the user is querying `metric_0` and `metric_1` together, a valid filter would be: +Assuming the user is querying `metric_0` and `metric_1` together, the following are valid or invalid filters: - * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` +|
 Example | Filter |
+| ------- | ------ |
+| ✅ Valid filter | `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` |
+| ❌ Invalid filter | `"{{ TimeDimension('metric_time') }} > '2020-01-01'"` 
 Metrics in the query are defined based on measures with different grains. |
+| ❌ Invalid filter | `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` 

`metric_1` is not available at a month grain. | -Invalid filters would be: - - * ` "{{ TimeDimension('metric_time') }} > '2020-01-01'"` — metrics in the query are defined based on measures with different grains. - * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` — `metric_1` is not available at a month grain. -**Query with Order** +#### Multi-hop joins + +In cases where you need to query across multiple related tables (multi-hop joins), use the `entity_path` argument to specify the path between related entities. The following are examples of how you can define these joins: + +- In this example, you're querying the `location_name` dimension but specifying that it should be joined using the `order_id` field. + ```sql + {{Dimension('location__location_name', entity_path=['order_id'])}} + ``` +- In this example, the `salesforce_account_owner` dimension is joined to the `region` field, with the path going through `salesforce_account`. + ```sql + {{ Dimension('salesforce_account_owner__region',['salesforce_account']) }} + ``` + + +#### Query with order ```graphql mutation { createQuery( - environmentId: BigInt! + environmentId: "123" metrics: [{name: "order_total"}] groupBy: [{name: "metric_time", grain: MONTH}] orderBy: [{metric: {name: "order_total"}}, {groupBy: {name: "metric_time", grain: MONTH}, descending:true}] @@ -561,13 +589,12 @@ mutation { } ``` - -**Query with Limit** +#### Query with limit ```graphql mutation { createQuery( - environmentId: BigInt! + environmentId: "123" metrics: [{name:"food_order_amount"}, {name: "order_gross_profit"}] groupBy: [{name:"metric_time", grain: MONTH}, {name: "customer__customer_type"}] limit: 10 @@ -577,34 +604,17 @@ mutation { } ``` -**Query with just compiling SQL** - -This takes the same inputs as the `createQuery` mutation. - -```graphql -mutation { - compileSql( - environmentId: BigInt! - metrics: [{name:"food_order_amount"} {name:"order_gross_profit"}] - groupBy: [{name:"metric_time", grain: MONTH}, {name:"customer__customer_type"}] - ) { - sql - } -} -``` - -**Querying compile SQL with saved queries** +#### Query saved queries -This query includes the field `savedQuery` and generates the SQL based on a predefined [saved query](/docs/build/saved-queries),rather than dynamically building it from a list of metrics and groupings. You can use this for frequently used queries. +This takes the same inputs as the `createQuery` mutation, but includes the field `savedQuery`. You can use this for frequently used queries. ```graphql mutation { - compileSql( - environmentId: 200532 - savedQuery: "new_customer_orders" # new field + createQuery( + environmentId: "123" + savedQuery: "new_customer_orders" ) { queryId - sql } } ``` @@ -613,30 +623,18 @@ mutation { When querying [saved queries](/docs/build/saved-queries),you can use parameters such as `where`, `limit`, `order`, `compile`, and so on. However, keep in mind that you can't access `metric` or `group_by` parameters in this context. This is because they are predetermined and fixed parameters for saved queries, and you can't change them at query time. If you would like to query more metrics or dimensions, you can build the query using the standard format. ::: -**Create query with saved queries** +#### Query with just compiling SQL -This takes the same inputs as the `createQuery` mutation, but includes the field `savedQuery`. You can use this for frequently used queries. +This takes the same inputs as the `createQuery` mutation. 
```graphql mutation { - createQuery( - environmentId: 200532 - savedQuery: "new_customer_orders" # new field + compileSql( + environmentId: "123" + metrics: [{name:"food_order_amount"} {name:"order_gross_profit"}] + groupBy: [{name:"metric_time", grain: MONTH}, {name:"customer__customer_type"}] ) { - queryId + sql } } ``` - -### Multi-hop joins - -In cases where you need to query across multiple related tables (multi-hop joins), use the `entity_path` argument to specify the path between related entities. The following are examples of how you can define these joins: - -- In this example, you're querying the `location_name` dimension but specifying that it should be joined using the `order_id` field. - ```sql - {{Dimension('location__location_name', entity_path=['order_id'])}} - ``` -- In this example, the `salesforce_account_owner` dimension is joined to the `region` field, with the path going through `salesforce_account`. - ```sql - {{ Dimension('salesforce_account_owner__region',['salesforce_account']) }} - ``` diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index d9ce3bf4fd1..64c1ca529f3 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -202,112 +202,20 @@ select * from semantic_layer.saved_queries() - +For more a more detailed example, see [Query metric alias](#query-metric-alias). + ## Querying the API for metric values @@ -334,10 +242,24 @@ Additionally, when performing granularity calculations that are global (not spec Note that `metric_time` should be available in addition to any other time dimensions that are available for the metric(s). In the case where you are looking at one metric (or multiple metrics from the same data source), the values in the series for the primary time dimension and `metric_time` are equivalent. - ## Examples -Refer to the following examples to help you get started with the JDBC API. +The following sections provide examples of how to query metrics using the JDBC API: + + - [Fetch metadata for metrics](#fetch-metadata-for-metrics) + - [Query common dimensions](#query-common-dimensions) + - [Query grouped by time](#query-grouped-by-time) + - [Query with a time grain](#query-with-a-time-grain) + - [Group by categorical dimension](#group-by-categorical-dimension) + - [Query only a dimension](#query-only-a-dimension) + - [Query by all dimensions](#query-by-all-dimensions) + - [Query with where filters](#query-with-where-filters) + - [Query with a limit](#query-with-a-limit) + - [Query with order by examples](#query-with-order-by-examples) + - [Query with compile keyword](#query-with-compile-keyword) + - [Query a saved query](#query-a-saved-query) + - [Query metric alias](#query-metric-alias) + - [Multi-hop joins](#multi-hop-joins) ### Fetch metadata for metrics @@ -403,6 +325,19 @@ select * from {{ }} ``` +### Query by all dimensions + +You can use the `semantic_layer.query_with_all_group_bys` endpoint to query by all valid dimensions. + +```sql +select * from {{ + semantic_layer.query_with_all_group_bys(metrics =['revenue','orders','food_orders'], + compile= True) +}} +``` + +This returns all dimensions that are valid for the set of metrics in the request. + ### Query with where filters Where filters in API allow for a filter list or string. We recommend using the filter list for production applications as this format will realize all benefits from the where possible. 
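To make the filter-list recommendation concrete, here is a minimal sketch. It reuses the example metric and dimension names from elsewhere on this page (`food_order_amount`, `customer__customer_type`, `metric_time`) and assumes each list entry is the same quoted Jinja expression you would otherwise pass as a single `where` string:

```sql
-- Minimal sketch of the filter-list format: each entry is one filter expression,
-- and the API combines the entries into the compiled WHERE clause.
select * from {{
semantic_layer.query(metrics=['food_order_amount'],
                     group_by=['customer__customer_type', 'metric_time'],
                     where=["{{ Dimension('customer__customer_type') }} = 'new'",
                            "{{ Dimension('metric_time').grain('month') }} > '2022-10-01'"])
}}
```

Passing the filters as a list keeps each condition readable on its own and makes it easier to add or remove conditions programmatically.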
@@ -499,7 +434,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], }} ``` -### Query with Order By Examples +### Query with order by examples Order By can take a basic string that's a Dimension, Metric, or Entity, and this will default to ascending order @@ -577,6 +512,43 @@ select * from {{ semantic_layer.query(saved_query="new_customer_orders", limit=5 The JDBC API will use the saved query (`new_customer_orders`) as defined and apply a limit of 5 records. +### Query metric alias + +You can query metrics using aliases, which allow you to use simpler or more intuitive names for metrics instead of their full definitions. + +```sql +select * from {{ + semantic_layer.query(metrics=[Metric("revenue", alias="metric_alias")]) +}} +``` + +For example, let's say your metric configuration includes an alias like `total_revenue_global` for the `order_total` metric. You can query the metric using the alias instead of the original name: + +```sql +select * from {{ + semantic_layer.query(metrics=[Metric("order_total", alias="total_revenue_global")], group_by=['metric_time']) +}} +``` + +The result will be: + +``` +| METRIC_TIME | TOTAL_REVENUE_GLOBAL | +|:-------------:|:------------------: | +| 2023-12-01 | 1500.75 | +| 2023-12-02 | 1725.50 | +| 2023-12-03 | 1850.00 | +``` + +:::tip +Note that you need to use the actual metric name when using the `where` Jinja clauses. For example, if you used `banana` as an alias for `revenue`, you need to use the actual metric name, `revenue`, in the `where` clause, not `banana`. + +```graphql +semantic_layer.query(metrics=[Metric("revenue", alias="banana")], where="{{ Metric('revenue') }} > 0") +``` + +::: + ### Multi-hop joins In cases where you need to query across multiple related tables (multi-hop joins), use the `entity_path` argument to specify the path between related entities. The following are examples of how you can define these joins: @@ -590,6 +562,7 @@ In cases where you need to query across multiple related tables (multi-hop joins {{ Dimension('salesforce_account_owner__region',['salesforce_account']) }} ``` + ## FAQs diff --git a/website/docs/docs/dbt-cloud-apis/user-tokens.md b/website/docs/docs/dbt-cloud-apis/user-tokens.md index b7bf4fdce28..dc246a0958b 100644 --- a/website/docs/docs/dbt-cloud-apis/user-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/user-tokens.md @@ -1,5 +1,6 @@ --- title: "User tokens" +sidebar_label: "Personal access tokens " id: "user-tokens" pagination_next: "docs/dbt-cloud-apis/service-tokens" --- diff --git a/website/docs/docs/dbt-versions/2024-release-notes.md b/website/docs/docs/dbt-versions/2024-release-notes.md new file mode 100644 index 00000000000..ef03a5102df --- /dev/null +++ b/website/docs/docs/dbt-versions/2024-release-notes.md @@ -0,0 +1,453 @@ +--- +title: "2024 dbt Cloud release notes" +description: "2024 dbt Cloud release notes" +id: "2024-release-notes" +sidebar: "2024 release notes" +pagination_next: null +pagination_prev: null +--- + +dbt Cloud release notes for recent and historical changes. 
Release notes fall into one of the following categories: + +- **New:** New products and features +- **Enhancement:** Performance improvements and feature enhancements +- **Fix:** Bug and security fixes +- **Behavior change:** A change to existing behavior that doesn't fit into the other categories, such as feature deprecations or changes to default settings + +Release notes are grouped by month for both multi-tenant and virtual private cloud (VPC)\* environments + +\* The official release date for this new format of release notes is May 15th, 2024. Historical release notes for prior dates may not reflect all available features released earlier this year or their tenancy availability. + +## December 2024 + +- **New**: Saved queries now support [tags](/reference/resource-configs/tags), which allow you to categorize your resources and filter them. Add tags to your [saved queries](/docs/build/saved-queries) in the `semantic_model.yml` file or `dbt_project.yml` file. For example: + + + ```yml + [saved-queries](/docs/build/saved-queries): + jaffle_shop: + customer_order_metrics: + +tags: order_metrics + ``` + +- **New**: [Dimensions](/reference/resource-configs/meta) now support the `meta` config property in [dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and from dbt Core 1.9. You can add metadata to your dimensions to provide additional context and information about the dimension. Refer to [meta](/reference/resource-configs/meta) for more information. +- **New**: [Auto exposures](/docs/collaborate/auto-exposures) are now generally available to dbt Cloud Enterprise plans. Auto-exposures integrate natively with Tableau (Power BI coming soon) and auto-generate downstream lineage in dbt Explorer for a richer experience. +- **New**: The dbt Semantic Layer supports Sigma as a [partner integration](/docs/cloud-integrations/avail-sl-integrations), available in Preview. Refer to [Sigma](https://help.sigmacomputing.com/docs/configure-a-dbt-semantic-layer-integration) for more information. +- **New**: The dbt Semantic Layer now supports Azure Single-tenant deployments. Refer to [Set up the dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl) for more information on how to get started. +- **Fix**: Resolved intermittent issues in Single-tenant environments affecting Semantic Layer and query history. +- **Fix**: [The dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) now respects the BigQuery [`execution_project` attribute](/docs/core/connect-data-platform/bigquery-setup#execution-project), including for exports. +- **New**: [Model notifications](/docs/deploy/model-notifications) are now generally available in dbt Cloud. These notifications alert model owners through email about any issues encountered by models and tests as soon as they occur while running a job. +- **New**: You can now use your [Azure OpenAI key](/docs/cloud/account-integrations?ai-integration=azure#ai-integrations) (available in beta) to use dbt Cloud features like [dbt Copilot](/docs/cloud/dbt-copilot) and [Ask dbt](/docs/cloud-integrations/snowflake-native-app) . Additionally, you can use your own [OpenAI API key](/docs/cloud/account-integrations?ai-integration=openai#ai-integrations) or use [dbt Labs-managed OpenAI](/docs/cloud/account-integrations?ai-integration=dbtlabs#ai-integrations) key. Refer to [AI integrations](/docs/cloud/account-integrations#ai-integrations) for more information. 
+- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. + +## November 2024 + +- **Enhancement**: Data health signals in dbt Explorer are now available for Exposures, providing a quick view of data health while browsing resources. To view trust signal icons, go to dbt Explorer and click **Exposures** under the **Resource** tab. Refer to [Data health signals for resources](/docs/collaborate/data-health-signals) for more info. +- **Bug**: Identified and fixed an error with Semantic Layer queries that take longer than 10 minutes to complete. +- **Fix**: Job environment variable overrides in credentials are now respected for Exports. Previously, they were ignored. +- **Behavior change**: If you use a custom microbatch macro, set a [`require_batched_execution_for_custom_microbatch_strategy` behavior flag](/reference/global-configs/behavior-changes#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the [microbatch strategy](/docs/build/incremental-microbatch#how-microbatch-compares-to-other-incremental-strategies). +- **Enhancement**: For users that have Advanced CI's [compare changes](/docs/deploy/advanced-ci#compare-changes) feature enabled, you can optimize performance when running comparisons by using custom dbt syntax to customize deferral usage, exclude specific large models (or groups of models with tags), and more. Refer to [Compare changes custom commands](/docs/deploy/job-commands#compare-changes-custom-commands) for examples of how to customize the comparison command. +- **New**: SQL linting in CI jobs is now generally available in dbt Cloud. You can enable SQL linting in your CI jobs, using [SQLFluff](https://sqlfluff.com/), to automatically lint all SQL files in your project as a run step before your CI job builds. SQLFluff linting is available on [dbt Cloud release tracks](/docs/dbt-versions/cloud-release-tracks) and to dbt Cloud [Team or Enterprise](https://www.getdbt.com/pricing/) accounts. Refer to [SQL linting](/docs/deploy/continuous-integration#sql-linting) for more information. +- **New**: Use the [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) config to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. This feature is available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) (formerly called `Versionless`) and dbt Core v1.9 and later. +- **New**: Use the [`event_time`](/reference/resource-configs/event-time) configuration to specify "at what time did the row occur." This configuration is required for [Incremental microbatch](/docs/build/incremental-microbatch) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](/docs/deploy/advanced-ci). 
Available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) (formerly called `Versionless`) and dbt Core v1.9 and higher. +- **Fix**: This update improves [dbt Semantic Layer Tableau integration](/docs/cloud-integrations/semantic-layer/tableau) making query parsing more reliable. Some key fixes include: + - Error messages for unsupported joins between saved queries and ALL tables. + - Improved handling of queries when multiple tables are selected in a data source. + - Fixed a bug when an IN filter contained a lot of values. + - Better error messaging for queries that can't be parsed correctly. +- **Enhancement**: The dbt Semantic Layer supports creating new credentials for users who don't have permissions to create service tokens. In the **Credentials & service tokens** side panel, the **+Add Service Token** option is unavailable for those users who don't have permission. Instead, the side panel displays a message indicating that the user doesn't have permission to create a service token and should contact their administration. Refer to [Set up dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl) for more details. + + +## October 2024 + + + + Documentation for new features and functionality announced at Coalesce 2024: + + - Iceberg table support for [Snowflake](https://docs.getdbt.com/reference/resource-configs/snowflake-configs#iceberg-table-format) + - [Athena](https://docs.getdbt.com/reference/resource-configs/athena-configs) and [Teradata](https://docs.getdbt.com/reference/resource-configs/teradata-configs) adapter support in dbt Cloud + - dbt Cloud now hosted on [Azure](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) + - Get comfortable with [dbt Cloud Release Tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks) that keep your project up-to-date, automatically — on a cadence appropriate for your team + - Scalable [microbatch incremental models](https://docs.getdbt.com/docs/build/incremental-microbatch) + - Advanced CI [features](https://docs.getdbt.com/docs/deploy/advanced-ci) + - [Linting with CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration#sql-linting) + - dbt Assist is now [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot) + - Developer blog on [Snowflake Feature Store and dbt: A bridge between data pipelines and ML](https://docs.getdbt.com/blog/snowflake-feature-store) + - New [Quickstart for dbt Cloud CLI](https://docs.getdbt.com/guides/dbt-cloud-cli?step=1) + - [Auto-exposures with Tableau](https://docs.getdbt.com/docs/collaborate/auto-exposures) + - Semantic Layer integration with [Excel desktop and M365](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel) + - [Data health tiles](https://docs.getdbt.com/docs/collaborate/data-tile) + - [Semantic Layer and Cloud IDE integration](https://docs.getdbt.com/docs/build/metricflow-commands#metricflow-commands) + - Query history in [Explorer](https://docs.getdbt.com/docs/collaborate/model-query-history#view-query-history-in-explorer) + - Semantic Layer Metricflow improvements, including [improved granularity and custom calendar](https://docs.getdbt.com/docs/build/metricflow-time-spine#custom-calendar) + - [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python) is now generally available + + + +- **Behavior change:** [Multi-factor authentication](/docs/cloud/manage-access/mfa) is now enforced on all users who log in with username and password credentials. 
+- **Enhancement**: The dbt Semantic Layer JDBC now allows users to paginate `semantic_layer.metrics()` and `semantic_layer.dimensions()` for metrics and dimensions using `page_size` and `page_number` parameters. Refer to [Paginate metadata calls](/docs/dbt-cloud-apis/sl-jdbc#querying-the-api-for-metric-metadata) for more information. +- **Enhancement**: The dbt Semantic Layer JDBC now allows you to filter your metrics to include only those that contain a specific substring, using the `search` parameter. If no substring is provided, the query returns all metrics. Refer to [Fetch metrics by substring search](/docs/dbt-cloud-apis/sl-jdbc#querying-the-api-for-metric-metadata) for more information. +- **Fix**: The [dbt Semantic Layer Excel integration](/docs/cloud-integrations/semantic-layer/excel) now correctly surfaces errors when a query fails to execute. Previously, it was not clear why a query failed to run. +- **Fix:** Previously, POST requests to the Jobs API with invalid `cron` strings would return HTTP response status code 500s but would update the underlying entity. Now, POST requests to the Jobs API with invalid `cron` strings will result in status code 400s, without the underlying entity being updated. +- **Fix:** Fixed an issue where the `Source` view page in dbt Explorer did not correctly display source freshness status if older than 30 days. +- **Fix:** The UI now indicates when the description of a model is inherited from a catalog comment. +- **Behavior change:** User API tokens have been deprecated. Update to [personal access tokens](/docs/dbt-cloud-apis/user-tokens) if you have any still in use. +- **New**: The dbt Cloud IDE supports signed commits for Git, available for Enterprise plans. You can sign your Git commits when pushing them to the repository to prevent impersonation and enhance security. Supported Git providers are GitHub and GitLab. Refer to [Git commit signing](/docs/cloud/dbt-cloud-ide/git-commit-signing.md) for more information. +- **New:** With dbt Mesh, you can now enable bidirectional dependencies across your projects. Previously, dbt enforced dependencies to only go in one direction. dbt checks for cycles across projects and raises errors if any are detected. For details, refer to [Cycle detection](/docs/collaborate/govern/project-dependencies#cycle-detection). There's also the [Intro to dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro) guide to help you learn more best practices. +- **New**: The [dbt Semantic Layer Python software development kit](/docs/dbt-cloud-apis/sl-python) is now [generally available](/docs/dbt-versions/product-lifecycles). It provides users with easy access to the dbt Semantic Layer with Python and enables developers to interact with the dbt Semantic Layer APIs to query metrics/dimensions in downstream tools. +- **Enhancement**: You can now add a description to a singular data test. Use the [`description` property](/reference/resource-properties/description) to document [singular data tests](/docs/build/data-tests#singular-data-tests). You can also use [docs block](/docs/build/documentation#using-docs-blocks) to capture your test description. The enhancement is available now in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks), and it will be included in dbt Core v1.9. 
+- **New**: Introducing the [microbatch incremental model strategy](/docs/build/incremental-microbatch) (beta), available now in [dbt Cloud Latest](/docs/dbt-versions/cloud-release-tracks) and will soon be supported in dbt Core v1.9. The microbatch strategy allows for efficient, batch-based processing of large time-series datasets for improved performance and resiliency, especially when you're working with data that changes over time (like new records being added daily). To enable this feature in dbt Cloud, set the `DBT_EXPERIMENTAL_MICROBATCH` environment variable to `true` in your project. +- **New**: The dbt Semantic Layer supports custom calendar configurations in MetricFlow, available in [Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Custom calendar configurations allow you to query data using non-standard time periods like `fiscal_year` or `retail_month`. Refer to [custom calendar](/docs/build/metricflow-time-spine#custom-calendar) to learn how to define these custom granularities in your MetricFlow timespine YAML configuration. +- **New**: In the "Latest" release track in dbt Cloud, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. + - Who does this affect? Users of the "Latest" release track in dbt Cloud can define snapshots using the new YAML specification. Users upgrading to "Latest" who have existing snapshot definitions can keep their existing configurations, or they can choose to migrate their snapshot definitions to YAML. + - Users on older versions: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to the "Latest" release track to take advantage of the new snapshot features. +- **Behavior change:** Set [`state_modified_compare_more_unrendered_values`](/reference/global-configs/behavior-changes#source-definitions-for-state) to true to reduce false positives for `state:modified` when configs differ between `dev` and `prod` environments. +- **Behavior change:** Set the [`skip_nodes_if_on_run_start_fails`](/reference/global-configs/behavior-changes#failures-in-on-run-start-hooks) flag to `True` to skip all selected resources from running if there is a failure on an `on-run-start` hook. +- **Enhancement**: In the "Latest" release track in dbt Cloud, snapshots defined in SQL files can now use `config` defined in `schema.yml` YAML files. This update resolves the previous limitation that required snapshot properties to be defined exclusively in `dbt_project.yml` and/or a `config()` block within the SQL file. This will also be released in dbt Core 1.9. +- **New**: In the "Latest" release track in dbt Cloud, the `snapshot_meta_column_names` config allows for customizing the snapshot metadata columns. This feature allows an organization to align these automatically-generated column names with their conventions, and will be included in the upcoming dbt Core 1.9 release. +- **Enhancement**: the "Latest" release track in dbt Cloud infers a model's `primary_key` based on configured data tests and/or constraints within `manifest.json`. The inferred `primary_key` is visible in dbt Explorer and utilized by the dbt Cloud [compare changes](/docs/deploy/run-visibility#compare-tab) feature. This will also be released in dbt Core 1.9. 
Read about the [order dbt infers columns can be used as primary key of a model](https://github.com/dbt-labs/dbt-core/blob/7940ad5c7858ff11ef100260a372f2f06a86e71f/core/dbt/contracts/graph/nodes.py#L534-L541). +- **New:** dbt Explorer now includes trust signal icons, which is currently available as a [Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Trust signals offer a quick, at-a-glance view of data health when browsing your dbt models in Explorer. These icons indicate whether a model is **Healthy**, **Caution**, **Degraded**, or **Unknown**. For accurate health data, ensure the resource is up-to-date and has had a recent job run. Refer to [Data health signals](/docs/collaborate/data-health-signals) for more information. +- **New:** Auto exposures are now available in Preview in dbt Cloud. Auto-exposures helps users understand how their models are used in downstream analytics tools to inform investments and reduce incidents. It imports and auto-generates exposures based on Tableau dashboards, with user-defined curation. To learn more, refer to [Auto exposures](/docs/collaborate/auto-exposures). + + +## September 2024 + +- **Fix**: MetricFlow updated `get_and_expire` to replace the unsupported `GETEX` command with a `GET` and conditional expiration, ensuring compatibility with Azure Redis 6.0. +- **Enhancement**: The [dbt Semantic Layer Python SDK](/docs/dbt-cloud-apis/sl-python) now supports `TimeGranularity` custom grain for metrics. This feature allows you to define custom time granularities for metrics, such as `fiscal_year` or `retail_month`, to query data using non-standard time periods. +- **New**: Use the dbt Copilot AI engine to generate semantic model for your models, now available in beta. dbt Copilot automatically generates documentation, tests, and now semantic models based on the data in your model, . To learn more, refer to [dbt Copilot](/docs/cloud/dbt-copilot). +- **New**: Use the new recommended syntax for [defining `foreign_key` constraints](/reference/resource-properties/constraints) using `refs`, available in the "Latest" release track in dbt Cloud. This will soon be released in dbt Core v1.9. This new syntax will capture dependencies and works across different environments. +- **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) commands in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. +- **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). +- **New**: [Data health tile](/docs/collaborate/data-tile) is now generally available in dbt Explorer. Data health tiles provide a quick at-a-glance view of your data quality, highlighting potential issues in your data. 
You can embed these tiles in your dashboards to quickly identify and address data quality issues in your dbt project. +- **New**: dbt Explorer's Model query history feature is now in Preview for dbt Cloud Enterprise customers. Model query history allows you to view the count of consumption queries for a model based on the data warehouse's query logs. This feature provides data teams insight, so they can focus their time and infrastructure spend on the worthwhile used data products. To learn more, refer to [Model query history](/docs/collaborate/model-query-history). +- **Enhancement**: You can now use [Extended Attributes](/docs/dbt-cloud-environments#extended-attributes) and [Environment Variables](/docs/build/environment-variables) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer Credentials, it will have a higher priority than Extended Attributes. When using environment variables, the default value for the environment will be used. If you're using exports, job environment variable overrides aren't supported yet, but they will be soon. +- **New:** There are two new [environment variable defaults](/docs/build/environment-variables#dbt-cloud-context) — `DBT_CLOUD_ENVIRONMENT_NAME` and `DBT_CLOUD_ENVIRONMENT_TYPE`. +- **New:** The [Amazon Athena warehouse connection](/docs/cloud/connect-data-platform/connect-amazon-athena) is available as a public preview for dbt Cloud accounts that have upgraded to [the "Latest" release track](/docs/dbt-versions/cloud-release-tracks). + +## August 2024 + +- **Fix:** Fixed an issue in [dbt Explorer](/docs/collaborate/explore-projects) where navigating to a consumer project from a public node resulted in displaying a random public model rather than the original selection. +- **New**: You can now configure metrics at granularities at finer time grains, such as hour, minute, or even by the second. This is particularly useful for more detailed analysis and for datasets where high-resolution time data is required, such as minute-by-minute event tracking. Refer to [dimensions](/docs/build/dimensions) for more information about time granularity. +- **Enhancement**: Microsoft Excel now supports [saved selections](/docs/cloud-integrations/semantic-layer/excel#using-saved-selections) and [saved queries](/docs/cloud-integrations/semantic-layer/excel#using-saved-queries). Use Saved selections to save your query selections within the Excel application. The application also clears stale data in [trailing rows](/docs/cloud-integrations/semantic-layer/excel#other-settings) by default. To return your results and keep any previously selected data intact, un-select the **Clear trailing rows** option. +- **Behavior change:** GitHub is no longer supported for OAuth login to dbt Cloud. Use a supported [SSO or OAuth provider](/docs/cloud/manage-access/sso-overview) to securely manage access to your dbt Cloud account. + +## July 2024 +- **Behavior change:** `target_schema` is no longer a required configuration for [snapshots](/docs/build/snapshots). You can now target different schemas for snapshots across development and deployment environments using the [schema config](/reference/resource-configs/schema). +- **New:** [Connections](/docs/cloud/connect-data-platform/about-connections#connection-management) are now available under **Account settings** as a global setting. Previously, they were found under **Project settings**. This is being rolled out in phases over the coming weeks. 
+- **New:** Admins can now assign [environment-level permissions](/docs/cloud/manage-access/environment-permissions) to groups for specific roles. +- **New:** [Merge jobs](/docs/deploy/merge-jobs) for implementing [continuous deployment (CD)](/docs/deploy/continuous-deployment) workflows are now GA in dbt Cloud. Previously, you had to either set up a custom GitHub action or manually build the changes every time a pull request is merged. +- **New**: The ability to lint your SQL files from the dbt Cloud CLI is now available. To learn more, refer to [Lint SQL files](/docs/cloud/configure-cloud-cli#lint-sql-files). +- **Behavior change:** dbt Cloud IDE automatically adds a `--limit 100` to preview queries to avoid slow and expensive queries during development. Recently, dbt Core changed how the `limit` is applied to ensure that `order by` clauses are consistently respected. Because of this, queries that already contain a limit clause might now cause errors in the IDE previews. To address this, dbt Labs plans to provide an option soon to disable the limit from being applied. Until then, dbt Labs recommends removing the (duplicate) limit clause from your queries during previews to avoid these IDE errors. + +- **Enhancement**: Introducing a revamped overview page for dbt Explorer, available in beta. It includes a new design and layout for the Explorer homepage. The new layout provides a more intuitive experience for users to navigate their dbt projects, as well as a new **Latest updates** section to view the latest changes or issues related to project resources. To learn more, refer to [Overview page](/docs/collaborate/explore-projects#overview-page). + +#### dbt Semantic Layer +- **New**: Introduced the [`dbt-sl-sdk` Python software development kit (SDK)](https://github.com/dbt-labs/semantic-layer-sdk-python) Python library, which provides you with easy access to the dbt Semantic Layer with Python. It allows developers to interact with the dbt Semantic Layer APIs and query metrics and dimensions in downstream tools. Refer to the [dbt Semantic Layer Python SDK](/docs/dbt-cloud-apis/sl-python) for more information. +- **New**: Introduced Semantic validations in CI pipelines. Automatically test your semantic nodes (metrics, semantic models, and saved queries) during code reviews by adding warehouse validation checks in your CI job using the `dbt sl validate` command. You can also validate modified semantic nodes to guarantee code changes made to dbt models don't break these metrics. Refer to [Semantic validations in CI](/docs/deploy/ci-jobs#semantic-validations-in-ci) to learn about the additional commands and use cases. +- **New**: We now expose the `meta` field within the [config property](/reference/resource-configs/meta) for dbt Semantic Layer metrics in the [JDBC and GraphQL APIs](/docs/dbt-cloud-apis/sl-api-overview) under the `meta` field. +- **New**: Added a new command in the dbt Cloud CLI called `export-all`, which allows you to export multiple or all of your saved queries. Previously, you had to explicitly specify the [list of saved queries](/docs/build/metricflow-commands#list-saved-queries). +- **Enhancement**: The dbt Semantic Layer now offers more granular control by supporting multiple data platform credentials, which can represent different roles or service accounts. Available for dbt Cloud Enterprise plans, you can map credentials to service tokens for secure authentication. 
Refer to [Set up dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl#set-up-dbt-semantic-layer) for more details. +- **Fix**: Addressed a bug where unicode query filters (such as Chinese characters) were not working correctly in the dbt Semantic Layer Tableau integration. +- **Fix**: Resolved a bug with parsing certain private keys for BigQuery when running an export. +- **Fix**: Addressed a bug that caused a "closed connection" error to be returned when querying or running an Export. +- **Fix**: Resolved an issue in dbt Core where, during partial parsing, all generated metrics in a file were incorrectly deleted instead of just those related to the changed semantic model. Now, only the metrics associated with the modified model are affected. + +## June 2024 +- **New:** Introduced new granularity support for cumulative metrics in MetricFlow. Granularity options for cumulative metrics are slightly different than granularity for other metric types. For other metrics, we use the `date_trunc` function to implement granularity. However, because cumulative metrics are non-additive (values can't be added up), we can't use the `date_trunc` function to change their time grain granularity. + + Instead, we use the `first()`, `last()`, and `avg()` aggregation functions to aggregate cumulative metrics over the requested period. By default, we take the first value of the period. You can change this behavior by using the `period_agg` parameter. For more information, refer to [Granularity options for cumulative metrics](/docs/build/cumulative#granularity-options). + +#### dbt Semantic Layer +- **New:** Added support for SQL optimization in MetricFlow. We will now push down categorical dimension filters to the metric source table. Previously filters were applied after we selected from the metric source table. This change helps reduce full table scans on certain query engines. +- **New:** Enabled `where` filters on dimensions (included in saved queries) to use the cache during query time. This means you can now dynamically filter your dashboards without losing the performance benefits of caching. Refer to [caching](/docs/use-dbt-semantic-layer/sl-cache#result-caching) for more information. +- **Enhancement:** In [Google Sheets](/docs/cloud-integrations/semantic-layer/gsheets), we added information icons and descriptions to metrics and dimensions options in the Query Builder menu. Click on the **Info** icon button to view a description of the metric or dimension. Available in the following Query Builder menu sections: metric, group by, where, saved selections, and saved queries. +- **Enhancement:** In [Google Sheets](/docs/cloud-integrations/semantic-layer/gsheets), you can now apply granularity to all time dimensions, not just metric time. This update uses our [APIs](/docs/dbt-cloud-apis/sl-api-overview) to support granularity selection on any chosen time dimension. +- **Enhancement**: MetricFlow time spine warnings now prompt users to configure missing or small-grain-time spines. An error message is displayed for multiple time spines per granularity. +- **Enhancement**: Errors now display if no time spine is configured at the requested or smaller granularity. +- **Enhancement:** Improved querying error message when no semantic layer credentials were set. +- **Enhancement:** Querying grains for cumulative metrics now returns multiple granularity options (day, week, month, quarter, year) like all other metric types. Previously, you could only query one grain option for cumulative metrics. 
+- **Fix:** Removed errors that prevented querying cumulative metrics with other granularities. +- **Fix:** Fixed various Tableau errors when querying certain metrics or when using calculated fields. +- **Fix:** In Tableau, we relaxed naming field expectations to better identify calculated fields. +- **Fix:** Fixed an error when refreshing database metadata for columns that we can't convert to Arrow. These columns will now be skipped. This mainly affected Redshift users with custom types. +- **Fix:** Fixed Private Link connections for Databricks. + +#### Also available this month: + +- **Enhancement:** Updates to the UI when [creating merge jobs](/docs/deploy/merge-jobs) are now available. The updates include improvements to helper text, new deferral settings, and performance improvements. +- **New**: The dbt Semantic Layer now offers a seamless integration with Microsoft Excel, available in [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Build semantic layer queries and return data on metrics directly within Excel, through a custom menu. To learn more and install the add-on, check out [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). +- **New:** [Job warnings](/docs/deploy/job-notifications) are now GA. Previously, you could receive email or Slack alerts about your jobs when they succeeded, failed, or were canceled. Now with the new **Warns** option, you can also receive alerts when jobs have encountered warnings from tests or source freshness checks during their run. This gives you more flexibility on _when_ to be notified. +- **New:** A [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud) of the dbt Snowflake Native App is now available. With this app, you can access dbt Explorer, the **Ask dbt** chatbot, and orchestration observability features, extending your dbt Cloud experience into the Snowflake UI. To learn more, check out [About the dbt Snowflake Native App](/docs/cloud-integrations/snowflake-native-app) and [Set up the dbt Snowflake Native App](/docs/cloud-integrations/set-up-snowflake-native-app). + +## May 2024 + +- **Enhancement:** We've now introduced a new **Prune branches** [Git button](/docs/cloud/dbt-cloud-ide/ide-user-interface#prune-branches-modal) in the dbt Cloud IDE. This button allows you to delete local branches that have been deleted from the remote repository, keeping your branch management tidy. Available in all regions now and will be released to single tenant accounts during the next release cycle. + +#### dbt Cloud Launch Showcase event + +The following features are new or enhanced as part of our [dbt Cloud Launch Showcase](https://www.getdbt.com/resources/webinars/dbt-cloud-launch-showcase) event on May 14th, 2024: + +- **New:** [dbt Copilot](/docs/cloud/dbt-copilot) is a powerful AI engine helping you generate documentation, tests, and semantic models, saving you time as you deliver high-quality data. Available in private beta for a subset of dbt Cloud Enterprise users and in the dbt Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. + +- **New:** The new low-code editor, now in private beta, enables less SQL-savvy analysts to create or edit dbt models through a visual, drag-and-drop experience inside of dbt Cloud. 
These models compile directly to SQL and are indistinguishable from other dbt models in your projects: they are version-controlled, can be accessed across projects in dbt Mesh, and integrate with dbt Explorer and the Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. + +- **New:** [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) is now Generally Available (GA) to all users. The dbt Cloud CLI is a command-line interface that allows you to interact with dbt Cloud, use automatic deferral, leverage dbt Mesh, and more! + +- **New:** The VS Code extension [Power user for dbt Core and dbt Cloud](https://docs.myaltimate.com/arch/beta/) is now available in beta for [dbt Cloud CLI](https://docs.myaltimate.com/setup/reqdConfigCloud/) users. The extension accelerates dbt and SQL development and includes features such as generating models from your source definitions or SQL, and [more](https://docs.myaltimate.com/)! + +- **New:** [Unit tests](/docs/build/unit-tests) are now GA in dbt Cloud. Unit tests enable you to test your SQL model logic against a set of static inputs. + +- + + Native support in dbt Cloud for Azure Synapse Analytics is now available as a [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud)! + + To learn more, refer to [Connect Azure Synapse Analytics](/docs/cloud/connect-data-platform/connect-azure-synapse-analytics) and [Microsoft Azure Synapse DWH configurations](/reference/resource-configs/azuresynapse-configs). + + Also, check out the [Quickstart for dbt Cloud and Azure Synapse Analytics](/guides/azure-synapse-analytics?step=1). The guide walks you through: + + - Loading the Jaffle Shop sample data (provided by dbt Labs) into Azure Synapse Analytics. + - Connecting dbt Cloud to Azure Synapse Analytics. + - Turning a sample query into a model in your dbt project. A model in dbt is a SELECT statement. + - Adding tests to your models. + - Documenting your models. + - Scheduling a job to run. + + + +- **New:** MetricFlow enables you to now add metrics as dimensions to your metric filters to create more complex metrics and gain more insights. Available for all dbt Cloud Semantic Layer users. + +- **New:** [Staging environment](/docs/deploy/deploy-environments#staging-environment) is now GA. Use staging environments to grant developers access to deployment workflows and tools while controlling access to production data. Available to all dbt Cloud users. + +- **New:** Oauth login support via [Databricks](/docs/cloud/manage-access/set-up-databricks-oauth) is now GA to Enterprise customers. + +- + + dbt Explorer's current capabilities — including column-level lineage, model performance analysis, and project recommendations — are now Generally Available for dbt Cloud Enterprise and Teams plans. With Explorer, you can more easily navigate your dbt Cloud project – including models, sources, and their columns – to gain a better understanding of its latest production or staging state. + + To learn more about its features, check out: + + - [Explore projects](/docs/collaborate/explore-projects) + - [Explore multiple projects](/docs/collaborate/explore-multiple-projects) + - [Column-level lineage](/docs/collaborate/column-level-lineage) + - [Model performance](/docs/collaborate/model-performance) + - [Project recommendations](/docs/collaborate/project-recommendations) + + + +- **New:** Native support for Microsoft Fabric in dbt Cloud is now GA. 
This feature is powered by the [dbt-fabric](https://github.com/Microsoft/dbt-fabric) adapter. To learn more, refer to [Connect Microsoft Fabric](/docs/cloud/connect-data-platform/connect-microsoft-fabric) and [Microsoft Fabric DWH configurations](/reference/resource-configs/fabric-configs). There's also a [quickstart guide](https://docs.getdbt.com/guides/microsoft-fabric?step=1) to help you get started. + +- **New:** dbt Mesh is now GA to dbt Cloud Enterprise users. dbt Mesh is a framework that helps organizations scale their teams and data assets effectively. It promotes governance best practices and breaks large projects into manageable sections. Get started with dbt Mesh by reading the [dbt Mesh quickstart guide](https://docs.getdbt.com/guides/mesh-qs?step=1). + +- **New:** The dbt Semantic Layer [Tableau Desktop, Tableau Server](/docs/cloud-integrations/semantic-layer/tableau), and [Google Sheets integration](/docs/cloud-integrations/semantic-layer/gsheets) is now GA to dbt Cloud Team or Enterprise accounts. These first-class integrations allow you to query and unlock valuable insights from your data ecosystem. + +- **Enhancement:** As part of our ongoing commitment to improving the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#considerations), the filesystem now comes with improvements to speed up dbt development, such as introducing a Git repository limit of 10GB. + +#### Also available this month: + +- **Update**: The [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) is now available for Azure single tenant and is accessible in all [deployment regions](/docs/cloud/about-cloud/access-regions-ip-addresses) for both multi-tenant and single-tenant accounts. + +- **New**: The [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) introduces [declarative caching](/docs/use-dbt-semantic-layer/sl-cache), allowing you to cache common queries to speed up performance and reduce query compute costs. Available for dbt Cloud Team or Enterprise accounts. + +- + + The **Latest** Release Track is now Generally Available (previously Public Preview). + + On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project. dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for _a second sandbox project_ just to try out new features in development. + + To learn more about the new setting, refer to [Release Tracks](/docs/dbt-versions/cloud-release-tracks) for details. + + + + + +- **Behavior change:** Introduced the `require_resource_names_without_spaces` flag, opt-in and disabled by default. If set to `True`, dbt will raise an exception if it finds a resource name containing a space in your project or an installed package. This will become the default in a future version of dbt. Read [No spaces in resource names](/reference/global-configs/behavior-changes#no-spaces-in-resource-names) for more information. + +## April 2024 + +- + + You can now set up a continuous deployment (CD) workflow for your projects natively in dbt Cloud. You can now access a beta release of [Merge jobs](/docs/deploy/merge-jobs), which is a new [job type](/docs/deploy/jobs), that enables you to trigger dbt job runs as soon as changes (via Git pull requests) merge into production. 
+ + + + + +- **Behavior change:** Introduced the `require_explicit_package_overrides_for_builtin_materializations` flag, opt-in and disabled by default. If set to `True`, dbt will only use built-in materializations defined in the root project or within dbt, rather than implementations in packages. This will become the default in May 2024 (dbt Core v1.8 and dbt Cloud release tracks). Read [Package override for built-in materialization](/reference/global-configs/behavior-changes#package-override-for-built-in-materialization) for more information. + +**dbt Semantic Layer** +- **New**: Use Saved selections to [save your query selections](/docs/cloud-integrations/semantic-layer/gsheets#using-saved-selections) within the [Google Sheets application](/docs/cloud-integrations/semantic-layer/gsheets). They can be made private or public and refresh upon loading. +- **New**: Metrics are now displayed by their labels as `metric_name`. +- **Enhancement**: [Metrics](/docs/build/metrics-overview) now supports the [`meta` option](/reference/resource-configs/meta) under the [config](/reference/resource-properties/config) property. Previously, we only supported the now deprecated `meta` tag. +- **Enhancement**: In the Google Sheets application, we added [support](/docs/cloud-integrations/semantic-layer/gsheets#using-saved-queries) to allow jumping off from or exploring MetricFlow-defined saved queries directly. +- **Enhancement**: In the Google Sheets application, we added support to query dimensions without metrics. Previously, you needed a dimension. +- **Enhancement**: In the Google Sheets application, we added support for time presets and complex time range filters such as "between", "after", and "before". +- **Enhancement**: In the Google Sheets application, we added supported to automatically populate dimension values when you select a "where" filter, removing the need to manually type them. Previously, you needed to manually type the dimension values. +- **Enhancement**: In the Google Sheets application, we added support to directly query entities, expanding the flexibility of data requests. +- **Enhancement**: In the Google Sheets application, we added an option to exclude column headers, which is useful for populating templates with only the required data. +- **Deprecation**: For the Tableau integration, the [`METRICS_AND_DIMENSIONS` data source](/docs/cloud-integrations/semantic-layer/tableau#using-the-integration) has been deprecated for all accounts not actively using it. We encourage users to transition to the "ALL" data source for future integrations. + +## March 2024 + +- **New:** The Semantic Layer services now support using Privatelink for customers who have it enabled. +- **New:** You can now develop against and test your Semantic Layer in the Cloud CLI if your developer credential uses SSO. +- **Enhancement:** You can select entities to Group By, Filter By, and Order By. +- **Fix:** `dbt parse` no longer shows an error when you use a list of filters (instead of just a string filter) on a metric. +- **Fix:** `join_to_timespine` now properly gets applied to conversion metric input measures. +- **Fix:** Fixed an issue where exports in Redshift were not always committing to the DWH, which also had the side-effect of leaving table locks open. +- **Behavior change:** Introduced the `source_freshness_run_project_hooks` flag, opt-in and disabled by default. If set to `True`, dbt will include `on-run-*` project hooks in the `source freshness` command. 
This will become the default in a future version of dbt. Read [Project hooks with source freshness](/reference/global-configs/behavior-changes#project-hooks-with-source-freshness) for more information. + + +## February 2024 + +- **New:** [Exports](/docs/use-dbt-semantic-layer/exports#define-exports) allow you to materialize a saved query as a table or view in your data platform. By using exports, you can unify metric definitions in your data platform and query them as you would any other table or view. +- **New:** You can access a list of your [exports](/docs/use-dbt-semantic-layer/exports) with the new list saved-queries command by adding `--show-exports` +- **New:** The dbt Semantic Layer and [Tableau Connector](/docs/cloud-integrations/semantic-layer/tableau) now supports relative date filters in Tableau. + +- + + You can now use the [exports](/docs/use-dbt-semantic-layer/exports) feature with [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl), allowing you to query reliable metrics and fast data reporting. Exports enhance the saved queries feature, allowing you to write commonly used queries directly within your data platform using dbt Cloud's job scheduler. + + By exposing tables of metrics and dimensions, exports enable you to integrate with additional tools that don't natively connect with the dbt Semantic Layer, such as PowerBI. + + Exports are available for dbt Cloud multi-tenant [Team or Enterprise](https://www.getdbt.com/pricing/) plans on dbt versions 1.7 or newer. Refer to the [exports blog](https://www.getdbt.com/blog/announcing-exports-for-the-dbt-semantic-layer) for more details. + + + + + +- + + Now available for dbt Cloud Team and Enterprise plans is the ability to trigger deploy jobs when other deploy jobs are complete. You can enable this feature [in the UI](/docs/deploy/deploy-jobs) with the **Run when another job finishes** option in the **Triggers** section of your job or with the [Create Job API endpoint](/dbt-cloud/api-v2#/operations/Create%20Job). + + When enabled, your job will run after the specified upstream job completes. You can configure which run status(es) will trigger your job. It can be just on `Success` or on all statuses. If you have dependencies between your dbt projects, this allows you to _natively_ orchestrate your jobs within dbt Cloud — no need to set up a third-party tool. + + An example of the **Triggers** section when creating the job: + + + + + +- + + _Now available in the dbt version dropdown in dbt Cloud — starting with select customers, rolling out to wider availability through February and March._ + + On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project. dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for _a second sandbox project_ just to try out new features in development. + + To learn more about the new setting, refer to [Release Tracks](/docs/dbt-versions/cloud-release-tracks) for details. + + + + + + +- + + You can now [override the dbt version](/docs/dbt-versions/upgrade-dbt-version-in-cloud#override-dbt-version) that's configured for the development environment within your project and use a different version — affecting only your user account. 
+
+
+
+
+- 
+
+  Now available for dbt Cloud Team and Enterprise plans is the ability to trigger deploy jobs when other deploy jobs are complete. You can enable this feature [in the UI](/docs/deploy/deploy-jobs) with the **Run when another job finishes** option in the **Triggers** section of your job or with the [Create Job API endpoint](/dbt-cloud/api-v2#/operations/Create%20Job).
+
+  When enabled, your job will run after the specified upstream job completes. You can configure which run status(es) will trigger your job. It can be just on `Success` or on all statuses. If you have dependencies between your dbt projects, this allows you to _natively_ orchestrate your jobs within dbt Cloud — no need to set up a third-party tool.
+
+  An example of the **Triggers** section when creating the job:
+
+
+
+
+- 
+
+  _Now available in the dbt version dropdown in dbt Cloud — starting with select customers, rolling out to wider availability through February and March._
+
+  On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project. dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for _a second sandbox project_ just to try out new features in development.
+
+  To learn more about the new setting, refer to [Release Tracks](/docs/dbt-versions/cloud-release-tracks) for details.
+
+
+
+
+
+- 
+
+  You can now [override the dbt version](/docs/dbt-versions/upgrade-dbt-version-in-cloud#override-dbt-version) that's configured for the development environment within your project and use a different version — affecting only your user account. This lets you test new dbt features without impacting other people working on the same project. And when you're satisfied with the test results, you can safely upgrade the dbt version for your project(s).
+
+  Use the **dbt version** dropdown to specify the version to override with. It's available on your project's credentials page in the **User development settings** section. For example:
+
+
+
+
+- 
+
+  You can now edit, format, or lint files and execute dbt commands directly in your primary git branch in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). This enhancement is available across various repositories, including native integrations, imported git URLs, and managed repos.
+
+  This enhancement is currently available in all dbt Cloud multi-tenant regions and will soon be available to single-tenant accounts.
+
+  The primary branch of the connected git repo has traditionally been _read-only_ in the IDE. This update changes the branch to _protected_ and allows direct edits. When a commit is made, dbt Cloud will prompt you to create a new branch. dbt Cloud will pre-populate the new branch name with `GIT_USERNAME-patch-#`; however, you can edit the field with a custom branch name.
+
+  Previously, the primary branch was displayed as read-only, but now the branch is displayed with a lock icon to identify it as protected:
+
+
+
+
+
+  When you make a commit while on the primary branch, a modal window will open prompting you to create a new branch and enter a commit message:
+
+
+
+
+- **Enhancement:** The dbt Semantic Layer [Google Sheets integration](/docs/cloud-integrations/semantic-layer/gsheets) now exposes a note on the cell where the data was requested, making data requests clearer. The integration also now exposes a new **Time Range** option, which allows you to quickly select date ranges.
+- **Enhancement:** The [GraphQL API](/docs/dbt-cloud-apis/sl-graphql) includes a `requiresMetricTime` parameter to better handle metrics that must be grouped by time. (Certain metrics defined in MetricFlow can't be queried without a time dimension.)
+- **Enhancement:** You can now query offset and cumulative metrics using the time dimension name instead of `metric_time`. [Issue #1000](https://github.com/dbt-labs/metricflow/issues/1000)
+  - You can now query `metric_time` without metrics. [Issue #928](https://github.com/dbt-labs/metricflow/issues/928)
+- **Enhancement:** Added support for consistent SQL query generation, which enables ID generation consistency between otherwise identical MF queries. Previously, the SQL generated by `MetricFlowEngine` was not completely consistent between identical queries. [Issue 1020](https://github.com/dbt-labs/metricflow/issues/1020)
+- **Fix:** The Tableau Connector returns a date filter when filtering by dates. Previously, it erroneously returned a timestamp filter.
+- **Fix:** MetricFlow now validates whether each query contains `metrics`, `group by`, or `saved_query` items. Previously, there was no validation. [Issue 1002](https://github.com/dbt-labs/metricflow/issues/1002)
+- **Fix:** Measures using `join_to_timespine` in MetricFlow now have filters applied correctly after the time spine join.
+- **Fix:** Querying multiple granularities with offset metrics:
+  - If you query a time offset metric with multiple instances of `metric_time`/`agg_time_dimension`, only one of the instances will be offset. All of them should be.
+  - If you query a time offset metric with one instance of `metric_time`/`agg_time_dimension` but filter by a different one, the query will fail.
+- **Fix:** MetricFlow prioritizes a candidate join type over the default type when evaluating nodes to join. For example, the default join type for distinct values queries is `FULL OUTER JOIN`; however, time spine joins require the more appropriate `CROSS JOIN`.
+- **Fix:** Fixed a bug that previously caused errors when entities were referenced in `where` filters.
+
+## January 2024
+
+- 
+
+  Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we’d like to thank the 10 new community contributors to docs.getdbt.com :pray: What a busy start to the year! We merged 110 PRs in January.
+
+  Here's how we improved the [docs.getdbt.com](http://docs.getdbt.com/) experience:
+
+  - Added new hover behavior for images
+  - Added new expandables for FAQs
+  - Pruned outdated notices and snippets as part of the docs site maintenance
+
+  January saw some great new content:
+
+  - New [dbt Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-4-faqs) page
+  - Beta launch of [Explorer’s column-level lineage](https://docs.getdbt.com/docs/collaborate/column-level-lineage) feature
+  - Developer blog posts:
+    - [More time coding, less time waiting: Mastering defer in dbt](https://docs.getdbt.com/blog/defer-to-prod)
+    - [Deprecation of dbt Server](https://docs.getdbt.com/blog/deprecation-of-dbt-server)
+    - From the community: [Serverless, free-tier data stack with dlt + dbt core](https://docs.getdbt.com/blog/serverless-dlt-dbt-stack)
+    - The Extrica team added docs for the [dbt-extrica community adapter](https://docs.getdbt.com/docs/core/connect-data-platform/extrica-setup)
+  - Semantic Layer: New [conversion metrics docs](https://docs.getdbt.com/docs/build/conversion) and added the parameter `fill_nulls_with` to all metric types (launched the week of January 12, 2024)
+  - New [dbt environment command](https://docs.getdbt.com/reference/commands/dbt-environment) and its flags for the dbt Cloud CLI
+
+  January also saw some refreshed content, either aligning with new product features or requests from the community:
+
+  - Native support for [partial parsing in dbt Cloud](https://docs.getdbt.com/docs/cloud/account-settings#partial-parsing)
+  - Updated guidance on using dots or underscores in the [Best practice guide for models](https://docs.getdbt.com/best-practices/how-we-style/1-how-we-style-our-dbt-models)
+  - Updated [PrivateLink for VCS docs](https://docs.getdbt.com/docs/cloud/secure/vcs-privatelink)
+  - Added a new `job_runner` role in our [Enterprise project role permissions docs](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions#project-role-permissions)
+  - Added saved queries to [Metricflow commands](https://docs.getdbt.com/docs/build/metricflow-commands#list-saved-queries)
+  - Removed [as_text docs](https://github.com/dbt-labs/docs.getdbt.com/pull/4726) that were wildly outdated
+
+
+
+- **New:** A new conversion metric type lets you measure conversion events, such as users who viewed a web page and then filled out a form. For more details, refer to [Conversion metrics](/docs/build/conversion) and the configuration sketch at the end of this month's notes.
+- **New:** Instead of specifying the fully qualified dimension name (for example, `order__user__country`) in the group by or filter expression, you now only need to provide the primary entity and dimension name, like `user__country`.
+- **New:** You can now query the [saved queries](/docs/build/saved-queries) you've defined in the dbt Semantic Layer using [Tableau](/docs/cloud-integrations/semantic-layer/tableau), the [GraphQL API](/docs/dbt-cloud-apis/sl-graphql), the [JDBC API](/docs/dbt-cloud-apis/sl-jdbc), and the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation).
+
+- 
+
+  By default, dbt parses all the files in your project at the beginning of every dbt invocation. Depending on the size of your project, this operation can take a long time to complete. With the new partial parsing feature in dbt Cloud, you can reduce the time it takes for dbt to parse your project. When enabled, dbt Cloud parses only the changed files in your project instead of parsing all the project files. As a result, your dbt invocations will take less time to run.
+
+  To learn more, refer to [Partial parsing](/docs/cloud/account-settings#partial-parsing).
+
+
+
+
+- **Enhancement:** The YAML spec parameter `label` is now available for Semantic Layer metrics in the [JDBC and GraphQL APIs](/docs/dbt-cloud-apis/sl-api-overview). This means you can conveniently use `label` as a display name for your metrics when exposing them.
+- **Enhancement:** Added support for `create_metric: true` for a measure, which is a shorthand to quickly create metrics. This is useful in cases when metrics are only used to build other metrics.
+- **Enhancement:** Added support for Tableau parameter filters. You can use the [Tableau connector](/docs/cloud-integrations/semantic-layer/tableau) to create and use parameters with your dbt Semantic Layer data.
+- **Enhancement:** Added support for exposing `expr` and `agg` for [Measures](/docs/build/measures) in the [GraphQL API](/docs/dbt-cloud-apis/sl-graphql).
+- **Enhancement:** Improved error messages in the command-line interface when querying a dimension that isn't reachable for a given metric.
+- **Enhancement:** You can now query entities using our Tableau integration (similar to querying dimensions).
+- **Enhancement:** A new data source called "ALL" is available in our Tableau integration, which contains all semantic objects defined. This has the same information as "METRICS_AND_DIMENSIONS". In the future, we will deprecate "METRICS_AND_DIMENSIONS" in favor of "ALL" for clarity.
+
+- **Fix:** Support for numeric types with precision greater than 38 (like `BIGDECIMAL`) in BigQuery is now available. Previously, these types were unsupported and would return an error.
+- **Fix:** In some instances, large numeric dimensions were being interpreted by Tableau in scientific notation, making them hard to use. These are now displayed as numbers, as expected.
+- **Fix:** Dimension values are now preserved accurately instead of being inadvertently converted into strings.
+- **Fix:** Resolved naming collisions in queries involving multiple derived metrics that use the same metric input. Input metrics are now deduplicated, ensuring each is referenced only once.
+- **Fix:** Resolved warnings triggered when a derived metric used duplicate input measures. Input measures are now deduplicated, enhancing query processing and clarity.
+- **Fix:** Resolved an error where referencing an entity in a filter using the object syntax would fail. For example, `{{Entity('entity_name')}}` would fail to resolve.
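+
+As a rough sketch, the conversion metric type introduced this month might be configured as follows, following the conversion metrics documentation linked above (the metric, measure, and entity names here are hypothetical):
+
+```yaml
+metrics:
+  - name: visit_to_purchase_conversion_rate_7d
+    label: Visit to purchase conversion rate (7-day window)
+    type: conversion
+    type_params:
+      conversion_type_params:
+        entity: user          # entity used to join base and conversion events
+        base_measure:
+          name: visits        # measure counting the base event
+        conversion_measure:
+          name: purchases     # measure counting the conversion event
+        window: 7 days        # count conversions within 7 days of the base event
+```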
diff --git a/website/docs/docs/dbt-versions/cloud-release-tracks.md b/website/docs/docs/dbt-versions/cloud-release-tracks.md index 89836aa13e6..929b901d1d6 100644 --- a/website/docs/docs/dbt-versions/cloud-release-tracks.md +++ b/website/docs/docs/dbt-versions/cloud-release-tracks.md @@ -16,8 +16,8 @@ By moving your environments and jobs to release tracks you can get all the funct | Release track | Description | Plan availability | API value | | ------------- | ----------- | ----------------- | --------- | -| **Latest**
| Formerly called "Versionless", provides a continuous release of the latest functionality in dbt Cloud. Includes early access to new features of the dbt framework before they're available in open source releases of dbt Core. | All plans | `latest` (or `versionless`) | -| **Compatible** | Provides a monthly release aligned with the most recent open source versions of dbt Core and adapters, plus functionality exclusively available in dbt Cloud. | Team + Enterprise | `compatible` | +| **Latest**
| Formerly called "Versionless", provides a continuous release of the latest functionality in dbt Cloud.

Includes early access to new features of the dbt framework before they're available in open source releases of dbt Core. | All plans | `latest` (or `versionless`) | +| **Compatible** | Provides a monthly release aligned with the most recent open source versions of dbt Core and adapters, plus functionality exclusively available in dbt Cloud.

See [Compatible track changelog](/docs/dbt-versions/compatible-track-changelog) for more information. | Team + Enterprise | `compatible` | | **Extended** | The previous month's "Compatible" release. | Enterprise | `extended` | The first "Compatible" release was on December 12, 2024, after the final release of dbt Core v1.9.0. For December 2024 only, the "Extended" release is the same as "Compatible." Starting in January 2025, "Extended" will be one month behind "Compatible." diff --git a/website/docs/docs/dbt-versions/compatible-track-changelog.md b/website/docs/docs/dbt-versions/compatible-track-changelog.md index a8243e2ceff..348b9f26ce4 100644 --- a/website/docs/docs/dbt-versions/compatible-track-changelog.md +++ b/website/docs/docs/dbt-versions/compatible-track-changelog.md @@ -18,6 +18,48 @@ Starting in January 2025, each monthly "Extended" release will match the previou For more information, see [release tracks](/docs/dbt-versions/cloud-release-tracks). +## January 2025 + +Release date: January 14, 2025 + +This release includes functionality from the following versions of dbt Core OSS: +``` +dbt-core==1.9.1 + +# shared interfaces +dbt-adapters==1.13.1 +dbt-common==1.14.0 +dbt-semantic-interfaces==0.7.4 + +# adapters +dbt-athena==1.9.0 +dbt-bigquery==1.9.1 +dbt-databricks==1.9.1 +dbt-fabric==1.9.0 +dbt-postgres==1.9.0 +dbt-redshift==1.9.0 +dbt-snowflake==1.9.0 +dbt-spark==1.9.0 +dbt-synapse==1.8.2 +dbt-teradata==1.9.0 +dbt-trino==1.9.0 +``` + +Changelogs: +- [dbt-core 1.9.1](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-191---december-16-2024) +- [dbt-adapters 1.13.1](https://github.com/dbt-labs/dbt-adapters/blob/main/CHANGELOG.md#dbt-adapters-1131---january-10-2025) +- [dbt-common 1.14.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md) +- [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) +- [dbt-databricks 1.9.1](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-191-december-16-2024) +- [dbt-fabric 1.9.0](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.0) +- [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) +- [dbt-redshift 1.9.0](https://github.com/dbt-labs/dbt-redshift/blob/1.9.latest/CHANGELOG.md#dbt-redshift-190---december-09-2024) +- [dbt-snowflake 1.9.0](https://github.com/dbt-labs/dbt-snowflake/blob/1.9.latest/CHANGELOG.md#dbt-snowflake-190---december-09-2024) +- [dbt-spark 1.9.0](https://github.com/dbt-labs/dbt-spark/blob/1.9.latest/CHANGELOG.md#dbt-spark-190---december-10-2024) +- [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) +- [dbt-teradata 1.9.0](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.0) +- [dbt-trino 1.9.0](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-190---december-20-2024) + ## December 2024 Release date: December 12, 2024 diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index 2a4a9d96528..6009fc4c73a 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -103,7 +103,8 @@ You can read more about each of these behavior changes in the following links: ### Snowflake -- Iceberg Table Format support will be available on three out-of-the-box 
materializations: table, incremental, dynamic tables. +- Iceberg Table Format — Support will be available on three out-of-the-box materializations: table, incremental, dynamic tables. +- Breaking change — When upgrading from dbt 1.8 to 1.9 `{{ target.account }}` replaces underscores with dashes. For example, if the `target.account` is set to `sample_company`, then the compiled code now generates `sample-company`. [Refer to the `dbt-snowflake` issue](https://github.com/dbt-labs/dbt-snowflake/issues/1286) for more information. ### Bigquery diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 369511aae8e..bda22baa3ab 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -14,440 +14,19 @@ dbt Cloud release notes for recent and historical changes. Release notes fall in - **Fix:** Bug and security fixes - **Behavior change:** A change to existing behavior that doesn't fit into the other categories, such as feature deprecations or changes to default settings -Release notes are grouped by month for both multi-tenant and virtual private cloud (VPC)\* environments - -\* The official release date for this new format of release notes is May 15th, 2024. Historical release notes for prior dates may not reflect all available features released earlier this year or their tenancy availability. - -## December 2024 - -- **New**: Saved queries now support [tags](/reference/resource-configs/tags), which allow you to categorize your resources and filter them. Add tags to your [saved queries](/docs/build/saved-queries) in the `semantic_model.yml` file or `dbt_project.yml` file. For example: - - - ```yml - [saved-queries](/docs/build/saved-queries): - jaffle_shop: - customer_order_metrics: - +tags: order_metrics - ``` - -- **New**: [Dimensions](/reference/resource-configs/meta) now support the `meta` config property in [dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and from dbt Core 1.9. You can add metadata to your dimensions to provide additional context and information about the dimension. Refer to [meta](/reference/resource-configs/meta) for more information. -- **New**: [Auto exposures](/docs/collaborate/auto-exposures) are now generally available to dbt Cloud Enterprise plans. Auto-exposures integrate natively with Tableau (Power BI coming soon) and auto-generate downstream lineage in dbt Explorer for a richer experience. -- **New**: The dbt Semantic Layer supports Sigma as a [partner integration](/docs/cloud-integrations/avail-sl-integrations), available in Preview. Refer to [Sigma](https://help.sigmacomputing.com/docs/configure-a-dbt-semantic-layer-integration) for more information. -- **New**: The dbt Semantic Layer now supports Azure Single-tenant deployments. Refer to [Set up the dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl) for more information on how to get started. -- **Fix**: Resolved intermittent issues in Single-tenant environments affecting Semantic Layer and query history. -- **Fix**: [The dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) now respects the BigQuery [`execution_project` attribute](/docs/core/connect-data-platform/bigquery-setup#execution-project), including for exports. -- **New**: [Model notifications](/docs/deploy/model-notifications) are now generally available in dbt Cloud. 
These notifications alert model owners through email about any issues encountered by models and tests as soon as they occur while running a job. -- **New**: You can now use your [Azure OpenAI key](/docs/cloud/account-integrations?ai-integration=azure#ai-integrations) (available in beta) to use dbt Cloud features like [dbt Copilot](/docs/cloud/dbt-copilot) and [Ask dbt](/docs/cloud-integrations/snowflake-native-app) . Additionally, you can use your own [OpenAI API key](/docs/cloud/account-integrations?ai-integration=openai#ai-integrations) or use [dbt Labs-managed OpenAI](/docs/cloud/account-integrations?ai-integration=dbtlabs#ai-integrations) key. Refer to [AI integrations](/docs/cloud/account-integrations#ai-integrations) for more information. -- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. - -## November 2024 - -- **Enhancement**: Data health signals in dbt Explorer are now available for Exposures, providing a quick view of data health while browsing resources. To view trust signal icons, go to dbt Explorer and click **Exposures** under the **Resource** tab. Refer to [Data health signals for resources](/docs/collaborate/data-health-signals) for more info. -- **Bug**: Identified and fixed an error with Semantic Layer queries that take longer than 10 minutes to complete. -- **Fix**: Job environment variable overrides in credentials are now respected for Exports. Previously, they were ignored. -- **Behavior change**: If you use a custom microbatch macro, set a [`require_batched_execution_for_custom_microbatch_strategy` behavior flag](/reference/global-configs/behavior-changes#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the [microbatch strategy](/docs/build/incremental-microbatch#how-microbatch-compares-to-other-incremental-strategies). -- **Enhancement**: For users that have Advanced CI's [compare changes](/docs/deploy/advanced-ci#compare-changes) feature enabled, you can optimize performance when running comparisons by using custom dbt syntax to customize deferral usage, exclude specific large models (or groups of models with tags), and more. Refer to [Compare changes custom commands](/docs/deploy/job-commands#compare-changes-custom-commands) for examples of how to customize the comparison command. -- **New**: SQL linting in CI jobs is now generally available in dbt Cloud. You can enable SQL linting in your CI jobs, using [SQLFluff](https://sqlfluff.com/), to automatically lint all SQL files in your project as a run step before your CI job builds. SQLFluff linting is available on [dbt Cloud release tracks](/docs/dbt-versions/cloud-release-tracks) and to dbt Cloud [Team or Enterprise](https://www.getdbt.com/pricing/) accounts. Refer to [SQL linting](/docs/deploy/continuous-integration#sql-linting) for more information. -- **New**: Use the [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) config to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. 
When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. This feature is available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) (formerly called `Versionless`) and dbt Core v1.9 and later. -- **New**: Use the [`event_time`](/reference/resource-configs/event-time) configuration to specify "at what time did the row occur." This configuration is required for [Incremental microbatch](/docs/build/incremental-microbatch) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](/docs/deploy/advanced-ci). Available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) (formerly called `Versionless`) and dbt Core v1.9 and higher. -- **Fix**: This update improves [dbt Semantic Layer Tableau integration](/docs/cloud-integrations/semantic-layer/tableau) making query parsing more reliable. Some key fixes include: - - Error messages for unsupported joins between saved queries and ALL tables. - - Improved handling of queries when multiple tables are selected in a data source. - - Fixed a bug when an IN filter contained a lot of values. - - Better error messaging for queries that can't be parsed correctly. -- **Enhancement**: The dbt Semantic Layer supports creating new credentials for users who don't have permissions to create service tokens. In the **Credentials & service tokens** side panel, the **+Add Service Token** option is unavailable for those users who don't have permission. Instead, the side panel displays a message indicating that the user doesn't have permission to create a service token and should contact their administration. Refer to [Set up dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl) for more details. 
- - -## October 2024 - - - - Documentation for new features and functionality announced at Coalesce 2024: - - - Iceberg table support for [Snowflake](https://docs.getdbt.com/reference/resource-configs/snowflake-configs#iceberg-table-format) - - [Athena](https://docs.getdbt.com/reference/resource-configs/athena-configs) and [Teradata](https://docs.getdbt.com/reference/resource-configs/teradata-configs) adapter support in dbt Cloud - - dbt Cloud now hosted on [Azure](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) - - Get comfortable with [dbt Cloud Release Tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks) that keep your project up-to-date, automatically — on a cadence appropriate for your team - - Scalable [microbatch incremental models](https://docs.getdbt.com/docs/build/incremental-microbatch) - - Advanced CI [features](https://docs.getdbt.com/docs/deploy/advanced-ci) - - [Linting with CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration#sql-linting) - - dbt Assist is now [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot) - - Developer blog on [Snowflake Feature Store and dbt: A bridge between data pipelines and ML](https://docs.getdbt.com/blog/snowflake-feature-store) - - New [Quickstart for dbt Cloud CLI](https://docs.getdbt.com/guides/dbt-cloud-cli?step=1) - - [Auto-exposures with Tableau](https://docs.getdbt.com/docs/collaborate/auto-exposures) - - Semantic Layer integration with [Excel desktop and M365](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel) - - [Data health tiles](https://docs.getdbt.com/docs/collaborate/data-tile) - - [Semantic Layer and Cloud IDE integration](https://docs.getdbt.com/docs/build/metricflow-commands#metricflow-commands) - - Query history in [Explorer](https://docs.getdbt.com/docs/collaborate/model-query-history#view-query-history-in-explorer) - - Semantic Layer Metricflow improvements, including [improved granularity and custom calendar](https://docs.getdbt.com/docs/build/metricflow-time-spine#custom-calendar) - - [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python) is now generally available - - - -- **Behavior change:** [Multi-factor authentication](/docs/cloud/manage-access/mfa) is now enforced on all users who log in with username and password credentials. -- **Enhancement**: The dbt Semantic Layer JDBC now allows users to paginate `semantic_layer.metrics()` and `semantic_layer.dimensions()` for metrics and dimensions using `page_size` and `page_number` parameters. Refer to [Paginate metadata calls](/docs/dbt-cloud-apis/sl-jdbc#querying-the-api-for-metric-metadata) for more information. -- **Enhancement**: The dbt Semantic Layer JDBC now allows you to filter your metrics to include only those that contain a specific substring, using the `search` parameter. If no substring is provided, the query returns all metrics. Refer to [Fetch metrics by substring search](/docs/dbt-cloud-apis/sl-jdbc#querying-the-api-for-metric-metadata) for more information. -- **Fix**: The [dbt Semantic Layer Excel integration](/docs/cloud-integrations/semantic-layer/excel) now correctly surfaces errors when a query fails to execute. Previously, it was not clear why a query failed to run. -- **Fix:** Previously, POST requests to the Jobs API with invalid `cron` strings would return HTTP response status code 500s but would update the underlying entity. 
Now, POST requests to the Jobs API with invalid `cron` strings will result in status code 400s, without the underlying entity being updated. -- **Fix:** Fixed an issue where the `Source` view page in dbt Explorer did not correctly display source freshness status if older than 30 days. -- **Fix:** The UI now indicates when the description of a model is inherited from a catalog comment. -- **Behavior change:** User API tokens have been deprecated. Update to [personal access tokens](/docs/dbt-cloud-apis/user-tokens) if you have any still in use. -- **New**: The dbt Cloud IDE supports signed commits for Git, available for Enterprise plans. You can sign your Git commits when pushing them to the repository to prevent impersonation and enhance security. Supported Git providers are GitHub and GitLab. Refer to [Git commit signing](/docs/cloud/dbt-cloud-ide/git-commit-signing.md) for more information. -- **New:** With dbt Mesh, you can now enable bidirectional dependencies across your projects. Previously, dbt enforced dependencies to only go in one direction. dbt checks for cycles across projects and raises errors if any are detected. For details, refer to [Cycle detection](/docs/collaborate/govern/project-dependencies#cycle-detection). There's also the [Intro to dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro) guide to help you learn more best practices. -- **New**: The [dbt Semantic Layer Python software development kit](/docs/dbt-cloud-apis/sl-python) is now [generally available](/docs/dbt-versions/product-lifecycles). It provides users with easy access to the dbt Semantic Layer with Python and enables developers to interact with the dbt Semantic Layer APIs to query metrics/dimensions in downstream tools. -- **Enhancement**: You can now add a description to a singular data test. Use the [`description` property](/reference/resource-properties/description) to document [singular data tests](/docs/build/data-tests#singular-data-tests). You can also use [docs block](/docs/build/documentation#using-docs-blocks) to capture your test description. The enhancement is available now in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks), and it will be included in dbt Core v1.9. -- **New**: Introducing the [microbatch incremental model strategy](/docs/build/incremental-microbatch) (beta), available now in [dbt Cloud Latest](/docs/dbt-versions/cloud-release-tracks) and will soon be supported in dbt Core v1.9. The microbatch strategy allows for efficient, batch-based processing of large time-series datasets for improved performance and resiliency, especially when you're working with data that changes over time (like new records being added daily). To enable this feature in dbt Cloud, set the `DBT_EXPERIMENTAL_MICROBATCH` environment variable to `true` in your project. -- **New**: The dbt Semantic Layer supports custom calendar configurations in MetricFlow, available in [Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Custom calendar configurations allow you to query data using non-standard time periods like `fiscal_year` or `retail_month`. Refer to [custom calendar](/docs/build/metricflow-time-spine#custom-calendar) to learn how to define these custom granularities in your MetricFlow timespine YAML configuration. -- **New**: In the "Latest" release track in dbt Cloud, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. 
This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. - - Who does this affect? Users of the "Latest" release track in dbt Cloud can define snapshots using the new YAML specification. Users upgrading to "Latest" who have existing snapshot definitions can keep their existing configurations, or they can choose to migrate their snapshot definitions to YAML. - - Users on older versions: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to the "Latest" release track to take advantage of the new snapshot features. -- **Behavior change:** Set [`state_modified_compare_more_unrendered_values`](/reference/global-configs/behavior-changes#source-definitions-for-state) to true to reduce false positives for `state:modified` when configs differ between `dev` and `prod` environments. -- **Behavior change:** Set the [`skip_nodes_if_on_run_start_fails`](/reference/global-configs/behavior-changes#failures-in-on-run-start-hooks) flag to `True` to skip all selected resources from running if there is a failure on an `on-run-start` hook. -- **Enhancement**: In the "Latest" release track in dbt Cloud, snapshots defined in SQL files can now use `config` defined in `schema.yml` YAML files. This update resolves the previous limitation that required snapshot properties to be defined exclusively in `dbt_project.yml` and/or a `config()` block within the SQL file. This will also be released in dbt Core 1.9. -- **New**: In the "Latest" release track in dbt Cloud, the `snapshot_meta_column_names` config allows for customizing the snapshot metadata columns. This feature allows an organization to align these automatically-generated column names with their conventions, and will be included in the upcoming dbt Core 1.9 release. -- **Enhancement**: the "Latest" release track in dbt Cloud infers a model's `primary_key` based on configured data tests and/or constraints within `manifest.json`. The inferred `primary_key` is visible in dbt Explorer and utilized by the dbt Cloud [compare changes](/docs/deploy/run-visibility#compare-tab) feature. This will also be released in dbt Core 1.9. Read about the [order dbt infers columns can be used as primary key of a model](https://github.com/dbt-labs/dbt-core/blob/7940ad5c7858ff11ef100260a372f2f06a86e71f/core/dbt/contracts/graph/nodes.py#L534-L541). -- **New:** dbt Explorer now includes trust signal icons, which is currently available as a [Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Trust signals offer a quick, at-a-glance view of data health when browsing your dbt models in Explorer. These icons indicate whether a model is **Healthy**, **Caution**, **Degraded**, or **Unknown**. For accurate health data, ensure the resource is up-to-date and has had a recent job run. Refer to [Data health signals](/docs/collaborate/data-health-signals) for more information. -- **New:** Auto exposures are now available in Preview in dbt Cloud. Auto-exposures helps users understand how their models are used in downstream analytics tools to inform investments and reduce incidents. It imports and auto-generates exposures based on Tableau dashboards, with user-defined curation. To learn more, refer to [Auto exposures](/docs/collaborate/auto-exposures). - - -## September 2024 - -- **Fix**: MetricFlow updated `get_and_expire` to replace the unsupported `GETEX` command with a `GET` and conditional expiration, ensuring compatibility with Azure Redis 6.0. 
-- **Enhancement**: The [dbt Semantic Layer Python SDK](/docs/dbt-cloud-apis/sl-python) now supports `TimeGranularity` custom grain for metrics. This feature allows you to define custom time granularities for metrics, such as `fiscal_year` or `retail_month`, to query data using non-standard time periods. -- **New**: Use the dbt Copilot AI engine to generate semantic model for your models, now available in beta. dbt Copilot automatically generates documentation, tests, and now semantic models based on the data in your model, . To learn more, refer to [dbt Copilot](/docs/cloud/dbt-copilot). -- **New**: Use the new recommended syntax for [defining `foreign_key` constraints](/reference/resource-properties/constraints) using `refs`, available in the "Latest" release track in dbt Cloud. This will soon be released in dbt Core v1.9. This new syntax will capture dependencies and works across different environments. -- **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) commands in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. -- **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). -- **New**: [Data health tile](/docs/collaborate/data-tile) is now generally available in dbt Explorer. Data health tiles provide a quick at-a-glance view of your data quality, highlighting potential issues in your data. You can embed these tiles in your dashboards to quickly identify and address data quality issues in your dbt project. -- **New**: dbt Explorer's Model query history feature is now in Preview for dbt Cloud Enterprise customers. Model query history allows you to view the count of consumption queries for a model based on the data warehouse's query logs. This feature provides data teams insight, so they can focus their time and infrastructure spend on the worthwhile used data products. To learn more, refer to [Model query history](/docs/collaborate/model-query-history). -- **Enhancement**: You can now use [Extended Attributes](/docs/dbt-cloud-environments#extended-attributes) and [Environment Variables](/docs/build/environment-variables) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer Credentials, it will have a higher priority than Extended Attributes. When using environment variables, the default value for the environment will be used. If you're using exports, job environment variable overrides aren't supported yet, but they will be soon. -- **New:** There are two new [environment variable defaults](/docs/build/environment-variables#dbt-cloud-context) — `DBT_CLOUD_ENVIRONMENT_NAME` and `DBT_CLOUD_ENVIRONMENT_TYPE`. 
-- **New:** The [Amazon Athena warehouse connection](/docs/cloud/connect-data-platform/connect-amazon-athena) is available as a public preview for dbt Cloud accounts that have upgraded to [the "Latest" release track](/docs/dbt-versions/cloud-release-tracks). - -## August 2024 - -- **Fix:** Fixed an issue in [dbt Explorer](/docs/collaborate/explore-projects) where navigating to a consumer project from a public node resulted in displaying a random public model rather than the original selection. -- **New**: You can now configure metrics at granularities at finer time grains, such as hour, minute, or even by the second. This is particularly useful for more detailed analysis and for datasets where high-resolution time data is required, such as minute-by-minute event tracking. Refer to [dimensions](/docs/build/dimensions) for more information about time granularity. -- **Enhancement**: Microsoft Excel now supports [saved selections](/docs/cloud-integrations/semantic-layer/excel#using-saved-selections) and [saved queries](/docs/cloud-integrations/semantic-layer/excel#using-saved-queries). Use Saved selections to save your query selections within the Excel application. The application also clears stale data in [trailing rows](/docs/cloud-integrations/semantic-layer/excel#other-settings) by default. To return your results and keep any previously selected data intact, un-select the **Clear trailing rows** option. -- **Behavior change:** GitHub is no longer supported for OAuth login to dbt Cloud. Use a supported [SSO or OAuth provider](/docs/cloud/manage-access/sso-overview) to securely manage access to your dbt Cloud account. - -## July 2024 -- **Behavior change:** `target_schema` is no longer a required configuration for [snapshots](/docs/build/snapshots). You can now target different schemas for snapshots across development and deployment environments using the [schema config](/reference/resource-configs/schema). -- **New:** [Connections](/docs/cloud/connect-data-platform/about-connections#connection-management) are now available under **Account settings** as a global setting. Previously, they were found under **Project settings**. This is being rolled out in phases over the coming weeks. -- **New:** Admins can now assign [environment-level permissions](/docs/cloud/manage-access/environment-permissions) to groups for specific roles. -- **New:** [Merge jobs](/docs/deploy/merge-jobs) for implementing [continuous deployment (CD)](/docs/deploy/continuous-deployment) workflows are now GA in dbt Cloud. Previously, you had to either set up a custom GitHub action or manually build the changes every time a pull request is merged. -- **New**: The ability to lint your SQL files from the dbt Cloud CLI is now available. To learn more, refer to [Lint SQL files](/docs/cloud/configure-cloud-cli#lint-sql-files). -- **Behavior change:** dbt Cloud IDE automatically adds a `--limit 100` to preview queries to avoid slow and expensive queries during development. Recently, dbt Core changed how the `limit` is applied to ensure that `order by` clauses are consistently respected. Because of this, queries that already contain a limit clause might now cause errors in the IDE previews. To address this, dbt Labs plans to provide an option soon to disable the limit from being applied. Until then, dbt Labs recommends removing the (duplicate) limit clause from your queries during previews to avoid these IDE errors. - -- **Enhancement**: Introducing a revamped overview page for dbt Explorer, available in beta. 
It includes a new design and layout for the Explorer homepage. The new layout provides a more intuitive experience for users to navigate their dbt projects, as well as a new **Latest updates** section to view the latest changes or issues related to project resources. To learn more, refer to [Overview page](/docs/collaborate/explore-projects#overview-page). - -#### dbt Semantic Layer -- **New**: Introduced the [`dbt-sl-sdk` Python software development kit (SDK)](https://github.com/dbt-labs/semantic-layer-sdk-python) Python library, which provides you with easy access to the dbt Semantic Layer with Python. It allows developers to interact with the dbt Semantic Layer APIs and query metrics and dimensions in downstream tools. Refer to the [dbt Semantic Layer Python SDK](/docs/dbt-cloud-apis/sl-python) for more information. -- **New**: Introduced Semantic validations in CI pipelines. Automatically test your semantic nodes (metrics, semantic models, and saved queries) during code reviews by adding warehouse validation checks in your CI job using the `dbt sl validate` command. You can also validate modified semantic nodes to guarantee code changes made to dbt models don't break these metrics. Refer to [Semantic validations in CI](/docs/deploy/ci-jobs#semantic-validations-in-ci) to learn about the additional commands and use cases. -- **New**: We now expose the `meta` field within the [config property](/reference/resource-configs/meta) for dbt Semantic Layer metrics in the [JDBC and GraphQL APIs](/docs/dbt-cloud-apis/sl-api-overview) under the `meta` field. -- **New**: Added a new command in the dbt Cloud CLI called `export-all`, which allows you to export multiple or all of your saved queries. Previously, you had to explicitly specify the [list of saved queries](/docs/build/metricflow-commands#list-saved-queries). -- **Enhancement**: The dbt Semantic Layer now offers more granular control by supporting multiple data platform credentials, which can represent different roles or service accounts. Available for dbt Cloud Enterprise plans, you can map credentials to service tokens for secure authentication. Refer to [Set up dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl#set-up-dbt-semantic-layer) for more details. -- **Fix**: Addressed a bug where unicode query filters (such as Chinese characters) were not working correctly in the dbt Semantic Layer Tableau integration. -- **Fix**: Resolved a bug with parsing certain private keys for BigQuery when running an export. -- **Fix**: Addressed a bug that caused a "closed connection" error to be returned when querying or running an Export. -- **Fix**: Resolved an issue in dbt Core where, during partial parsing, all generated metrics in a file were incorrectly deleted instead of just those related to the changed semantic model. Now, only the metrics associated with the modified model are affected. - -## June 2024 -- **New:** Introduced new granularity support for cumulative metrics in MetricFlow. Granularity options for cumulative metrics are slightly different than granularity for other metric types. For other metrics, we use the `date_trunc` function to implement granularity. However, because cumulative metrics are non-additive (values can't be added up), we can't use the `date_trunc` function to change their time grain granularity. - - Instead, we use the `first()`, `last()`, and `avg()` aggregation functions to aggregate cumulative metrics over the requested period. By default, we take the first value of the period. 
You can change this behavior by using the `period_agg` parameter. For more information, refer to [Granularity options for cumulative metrics](/docs/build/cumulative#granularity-options). - -#### dbt Semantic Layer -- **New:** Added support for SQL optimization in MetricFlow. We will now push down categorical dimension filters to the metric source table. Previously filters were applied after we selected from the metric source table. This change helps reduce full table scans on certain query engines. -- **New:** Enabled `where` filters on dimensions (included in saved queries) to use the cache during query time. This means you can now dynamically filter your dashboards without losing the performance benefits of caching. Refer to [caching](/docs/use-dbt-semantic-layer/sl-cache#result-caching) for more information. -- **Enhancement:** In [Google Sheets](/docs/cloud-integrations/semantic-layer/gsheets), we added information icons and descriptions to metrics and dimensions options in the Query Builder menu. Click on the **Info** icon button to view a description of the metric or dimension. Available in the following Query Builder menu sections: metric, group by, where, saved selections, and saved queries. -- **Enhancement:** In [Google Sheets](/docs/cloud-integrations/semantic-layer/gsheets), you can now apply granularity to all time dimensions, not just metric time. This update uses our [APIs](/docs/dbt-cloud-apis/sl-api-overview) to support granularity selection on any chosen time dimension. -- **Enhancement**: MetricFlow time spine warnings now prompt users to configure missing or small-grain-time spines. An error message is displayed for multiple time spines per granularity. -- **Enhancement**: Errors now display if no time spine is configured at the requested or smaller granularity. -- **Enhancement:** Improved querying error message when no semantic layer credentials were set. -- **Enhancement:** Querying grains for cumulative metrics now returns multiple granularity options (day, week, month, quarter, year) like all other metric types. Previously, you could only query one grain option for cumulative metrics. -- **Fix:** Removed errors that prevented querying cumulative metrics with other granularities. -- **Fix:** Fixed various Tableau errors when querying certain metrics or when using calculated fields. -- **Fix:** In Tableau, we relaxed naming field expectations to better identify calculated fields. -- **Fix:** Fixed an error when refreshing database metadata for columns that we can't convert to Arrow. These columns will now be skipped. This mainly affected Redshift users with custom types. -- **Fix:** Fixed Private Link connections for Databricks. - -#### Also available this month: - -- **Enhancement:** Updates to the UI when [creating merge jobs](/docs/deploy/merge-jobs) are now available. The updates include improvements to helper text, new deferral settings, and performance improvements. -- **New**: The dbt Semantic Layer now offers a seamless integration with Microsoft Excel, available in [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Build semantic layer queries and return data on metrics directly within Excel, through a custom menu. To learn more and install the add-on, check out [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). -- **New:** [Job warnings](/docs/deploy/job-notifications) are now GA. Previously, you could receive email or Slack alerts about your jobs when they succeeded, failed, or were canceled. 
Now with the new **Warns** option, you can also receive alerts when jobs have encountered warnings from tests or source freshness checks during their run. This gives you more flexibility on _when_ to be notified. -- **New:** A [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud) of the dbt Snowflake Native App is now available. With this app, you can access dbt Explorer, the **Ask dbt** chatbot, and orchestration observability features, extending your dbt Cloud experience into the Snowflake UI. To learn more, check out [About the dbt Snowflake Native App](/docs/cloud-integrations/snowflake-native-app) and [Set up the dbt Snowflake Native App](/docs/cloud-integrations/set-up-snowflake-native-app). - -## May 2024 - -- **Enhancement:** We've now introduced a new **Prune branches** [Git button](/docs/cloud/dbt-cloud-ide/ide-user-interface#prune-branches-modal) in the dbt Cloud IDE. This button allows you to delete local branches that have been deleted from the remote repository, keeping your branch management tidy. Available in all regions now and will be released to single tenant accounts during the next release cycle. - -#### dbt Cloud Launch Showcase event - -The following features are new or enhanced as part of our [dbt Cloud Launch Showcase](https://www.getdbt.com/resources/webinars/dbt-cloud-launch-showcase) event on May 14th, 2024: - -- **New:** [dbt Copilot](/docs/cloud/dbt-copilot) is a powerful AI engine helping you generate documentation, tests, and semantic models, saving you time as you deliver high-quality data. Available in private beta for a subset of dbt Cloud Enterprise users and in the dbt Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. - -- **New:** The new low-code editor, now in private beta, enables less SQL-savvy analysts to create or edit dbt models through a visual, drag-and-drop experience inside of dbt Cloud. These models compile directly to SQL and are indistinguishable from other dbt models in your projects: they are version-controlled, can be accessed across projects in dbt Mesh, and integrate with dbt Explorer and the Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. - -- **New:** [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) is now Generally Available (GA) to all users. The dbt Cloud CLI is a command-line interface that allows you to interact with dbt Cloud, use automatic deferral, leverage dbt Mesh, and more! - -- **New:** The VS Code extension [Power user for dbt Core and dbt Cloud](https://docs.myaltimate.com/arch/beta/) is now available in beta for [dbt Cloud CLI](https://docs.myaltimate.com/setup/reqdConfigCloud/) users. The extension accelerates dbt and SQL development and includes features such as generating models from your source definitions or SQL, and [more](https://docs.myaltimate.com/)! - -- **New:** [Unit tests](/docs/build/unit-tests) are now GA in dbt Cloud. Unit tests enable you to test your SQL model logic against a set of static inputs. - -- - - Native support in dbt Cloud for Azure Synapse Analytics is now available as a [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud)! - - To learn more, refer to [Connect Azure Synapse Analytics](/docs/cloud/connect-data-platform/connect-azure-synapse-analytics) and [Microsoft Azure Synapse DWH configurations](/reference/resource-configs/azuresynapse-configs). 
- - Also, check out the [Quickstart for dbt Cloud and Azure Synapse Analytics](/guides/azure-synapse-analytics?step=1). The guide walks you through: - - - Loading the Jaffle Shop sample data (provided by dbt Labs) into Azure Synapse Analytics. - - Connecting dbt Cloud to Azure Synapse Analytics. - - Turning a sample query into a model in your dbt project. A model in dbt is a SELECT statement. - - Adding tests to your models. - - Documenting your models. - - Scheduling a job to run. - - - -- **New:** MetricFlow enables you to now add metrics as dimensions to your metric filters to create more complex metrics and gain more insights. Available for all dbt Cloud Semantic Layer users. - -- **New:** [Staging environment](/docs/deploy/deploy-environments#staging-environment) is now GA. Use staging environments to grant developers access to deployment workflows and tools while controlling access to production data. Available to all dbt Cloud users. - -- **New:** Oauth login support via [Databricks](/docs/cloud/manage-access/set-up-databricks-oauth) is now GA to Enterprise customers. - -- - - dbt Explorer's current capabilities — including column-level lineage, model performance analysis, and project recommendations — are now Generally Available for dbt Cloud Enterprise and Teams plans. With Explorer, you can more easily navigate your dbt Cloud project – including models, sources, and their columns – to gain a better understanding of its latest production or staging state. - - To learn more about its features, check out: - - - [Explore projects](/docs/collaborate/explore-projects) - - [Explore multiple projects](/docs/collaborate/explore-multiple-projects) - - [Column-level lineage](/docs/collaborate/column-level-lineage) - - [Model performance](/docs/collaborate/model-performance) - - [Project recommendations](/docs/collaborate/project-recommendations) - - - -- **New:** Native support for Microsoft Fabric in dbt Cloud is now GA. This feature is powered by the [dbt-fabric](https://github.com/Microsoft/dbt-fabric) adapter. To learn more, refer to [Connect Microsoft Fabric](/docs/cloud/connect-data-platform/connect-microsoft-fabric) and [Microsoft Fabric DWH configurations](/reference/resource-configs/fabric-configs). There's also a [quickstart guide](https://docs.getdbt.com/guides/microsoft-fabric?step=1) to help you get started. - -- **New:** dbt Mesh is now GA to dbt Cloud Enterprise users. dbt Mesh is a framework that helps organizations scale their teams and data assets effectively. It promotes governance best practices and breaks large projects into manageable sections. Get started with dbt Mesh by reading the [dbt Mesh quickstart guide](https://docs.getdbt.com/guides/mesh-qs?step=1). - -- **New:** The dbt Semantic Layer [Tableau Desktop, Tableau Server](/docs/cloud-integrations/semantic-layer/tableau), and [Google Sheets integration](/docs/cloud-integrations/semantic-layer/gsheets) is now GA to dbt Cloud Team or Enterprise accounts. These first-class integrations allow you to query and unlock valuable insights from your data ecosystem. - -- **Enhancement:** As part of our ongoing commitment to improving the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#considerations), the filesystem now comes with improvements to speed up dbt development, such as introducing a Git repository limit of 10GB. 
- -#### Also available this month: - -- **Update**: The [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) is now available for Azure single tenant and is accessible in all [deployment regions](/docs/cloud/about-cloud/access-regions-ip-addresses) for both multi-tenant and single-tenant accounts. - -- **New**: The [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) introduces [declarative caching](/docs/use-dbt-semantic-layer/sl-cache), allowing you to cache common queries to speed up performance and reduce query compute costs. Available for dbt Cloud Team or Enterprise accounts. - -- - - The **Latest** Release Track is now Generally Available (previously Public Preview). - - On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project. dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for _a second sandbox project_ just to try out new features in development. - - To learn more about the new setting, refer to [Release Tracks](/docs/dbt-versions/cloud-release-tracks) for details. - - - - - -- **Behavior change:** Introduced the `require_resource_names_without_spaces` flag, opt-in and disabled by default. If set to `True`, dbt will raise an exception if it finds a resource name containing a space in your project or an installed package. This will become the default in a future version of dbt. Read [No spaces in resource names](/reference/global-configs/behavior-changes#no-spaces-in-resource-names) for more information. - -## April 2024 - -- - - You can now set up a continuous deployment (CD) workflow for your projects natively in dbt Cloud. You can now access a beta release of [Merge jobs](/docs/deploy/merge-jobs), which is a new [job type](/docs/deploy/jobs), that enables you to trigger dbt job runs as soon as changes (via Git pull requests) merge into production. - - - - - -- **Behavior change:** Introduced the `require_explicit_package_overrides_for_builtin_materializations` flag, opt-in and disabled by default. If set to `True`, dbt will only use built-in materializations defined in the root project or within dbt, rather than implementations in packages. This will become the default in May 2024 (dbt Core v1.8 and dbt Cloud release tracks). Read [Package override for built-in materialization](/reference/global-configs/behavior-changes#package-override-for-built-in-materialization) for more information. - -**dbt Semantic Layer** -- **New**: Use Saved selections to [save your query selections](/docs/cloud-integrations/semantic-layer/gsheets#using-saved-selections) within the [Google Sheets application](/docs/cloud-integrations/semantic-layer/gsheets). They can be made private or public and refresh upon loading. -- **New**: Metrics are now displayed by their labels as `metric_name`. -- **Enhancement**: [Metrics](/docs/build/metrics-overview) now supports the [`meta` option](/reference/resource-configs/meta) under the [config](/reference/resource-properties/config) property. Previously, we only supported the now deprecated `meta` tag. -- **Enhancement**: In the Google Sheets application, we added [support](/docs/cloud-integrations/semantic-layer/gsheets#using-saved-queries) to allow jumping off from or exploring MetricFlow-defined saved queries directly. 
-- **Enhancement**: In the Google Sheets application, we added support to query dimensions without metrics. Previously, you needed a dimension. -- **Enhancement**: In the Google Sheets application, we added support for time presets and complex time range filters such as "between", "after", and "before". -- **Enhancement**: In the Google Sheets application, we added supported to automatically populate dimension values when you select a "where" filter, removing the need to manually type them. Previously, you needed to manually type the dimension values. -- **Enhancement**: In the Google Sheets application, we added support to directly query entities, expanding the flexibility of data requests. -- **Enhancement**: In the Google Sheets application, we added an option to exclude column headers, which is useful for populating templates with only the required data. -- **Deprecation**: For the Tableau integration, the [`METRICS_AND_DIMENSIONS` data source](/docs/cloud-integrations/semantic-layer/tableau#using-the-integration) has been deprecated for all accounts not actively using it. We encourage users to transition to the "ALL" data source for future integrations. - -## March 2024 - -- **New:** The Semantic Layer services now support using Privatelink for customers who have it enabled. -- **New:** You can now develop against and test your Semantic Layer in the Cloud CLI if your developer credential uses SSO. -- **Enhancement:** You can select entities to Group By, Filter By, and Order By. -- **Fix:** `dbt parse` no longer shows an error when you use a list of filters (instead of just a string filter) on a metric. -- **Fix:** `join_to_timespine` now properly gets applied to conversion metric input measures. -- **Fix:** Fixed an issue where exports in Redshift were not always committing to the DWH, which also had the side-effect of leaving table locks open. -- **Behavior change:** Introduced the `source_freshness_run_project_hooks` flag, opt-in and disabled by default. If set to `True`, dbt will include `on-run-*` project hooks in the `source freshness` command. This will become the default in a future version of dbt. Read [Project hooks with source freshness](/reference/global-configs/behavior-changes#project-hooks-with-source-freshness) for more information. - - -## February 2024 - -- **New:** [Exports](/docs/use-dbt-semantic-layer/exports#define-exports) allow you to materialize a saved query as a table or view in your data platform. By using exports, you can unify metric definitions in your data platform and query them as you would any other table or view. -- **New:** You can access a list of your [exports](/docs/use-dbt-semantic-layer/exports) with the new list saved-queries command by adding `--show-exports` -- **New:** The dbt Semantic Layer and [Tableau Connector](/docs/cloud-integrations/semantic-layer/tableau) now supports relative date filters in Tableau. - -- - - You can now use the [exports](/docs/use-dbt-semantic-layer/exports) feature with [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl), allowing you to query reliable metrics and fast data reporting. Exports enhance the saved queries feature, allowing you to write commonly used queries directly within your data platform using dbt Cloud's job scheduler. - - By exposing tables of metrics and dimensions, exports enable you to integrate with additional tools that don't natively connect with the dbt Semantic Layer, such as PowerBI. 
- - Exports are available for dbt Cloud multi-tenant [Team or Enterprise](https://www.getdbt.com/pricing/) plans on dbt versions 1.7 or newer. Refer to the [exports blog](https://www.getdbt.com/blog/announcing-exports-for-the-dbt-semantic-layer) for more details. - - - - - -- - - Now available for dbt Cloud Team and Enterprise plans is the ability to trigger deploy jobs when other deploy jobs are complete. You can enable this feature [in the UI](/docs/deploy/deploy-jobs) with the **Run when another job finishes** option in the **Triggers** section of your job or with the [Create Job API endpoint](/dbt-cloud/api-v2#/operations/Create%20Job). - - When enabled, your job will run after the specified upstream job completes. You can configure which run status(es) will trigger your job. It can be just on `Success` or on all statuses. If you have dependencies between your dbt projects, this allows you to _natively_ orchestrate your jobs within dbt Cloud — no need to set up a third-party tool. - - An example of the **Triggers** section when creating the job: - - - - - -- - - _Now available in the dbt version dropdown in dbt Cloud — starting with select customers, rolling out to wider availability through February and March._ - - On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project. dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for _a second sandbox project_ just to try out new features in development. - - To learn more about the new setting, refer to [Release Tracks](/docs/dbt-versions/cloud-release-tracks) for details. - - - - - - -- - - You can now [override the dbt version](/docs/dbt-versions/upgrade-dbt-version-in-cloud#override-dbt-version) that's configured for the development environment within your project and use a different version — affecting only your user account. This lets you test new dbt features without impacting other people working on the same project. And when you're satisfied with the test results, you can safely upgrade the dbt version for your project(s). - - Use the **dbt version** dropdown to specify the version to override with. It's available on your project's credentials page in the **User development settings** section. For example: - - - - - -- - - You can now edit, format, or lint files and execute dbt commands directly in your primary git branch in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). This enhancement is available across various repositories, including native integrations, imported git URLs, and managed repos. - - This enhancement is currently available to all dbt Cloud multi-tenant regions and will soon be available to single-tenant accounts. - - The primary branch of the connected git repo has traditionally been _read-only_ in the IDE. This update changes the branch to _protected_ and allows direct edits. When a commit is made, dbt Cloud will prompt you to create a new branch. dbt Cloud will pre-populate the new branch name with the GIT_USERNAME-patch-#; however, you can edit the field with a custom branch name. 
- - Previously, the primary branch was displayed as read-only, but now the branch is displayed with a lock icon to identify it as protected: - - - - - - - - - - When you make a commit while on the primary branch, a modal window will open prompting you to create a new branch and enter a commit message: - - - - - -- **Enhancement:** The dbt Semantic Layer [Google Sheets integration](/docs/cloud-integrations/semantic-layer/gsheets) now exposes a note on the cell where the data was requested, indicating clearer data requests. The integration also now exposes a new **Time Range** option, which allows you to quickly select date ranges. -- **Enhancement:** The [GraphQL API](/docs/dbt-cloud-apis/sl-graphql) includes a `requiresMetricTime` parameter to better handle metrics that must be grouped by time. (Certain metrics defined in MetricFlow can't be looked at without a time dimension). -- **Enhancement:** Enable querying metrics with offset and cumulative metrics with the time dimension name, instead of `metric_time`. [Issue #1000](https://github.com/dbt-labs/metricflow/issues/1000) - - Enable querying `metric_time` without metrics. [Issue #928](https://github.com/dbt-labs/metricflow/issues/928) -- **Enhancement:** Added support for consistent SQL query generation, which enables ID generation consistency between otherwise identical MF queries. Previously, the SQL generated by `MetricFlowEngine` was not completely consistent between identical queries. [Issue 1020](https://github.com/dbt-labs/metricflow/issues/1020) -- **Fix:** The Tableau Connector returns a date filter when filtering by dates. Previously it was erroneously returning a timestamp filter. -- **Fix:** MetricFlow now validates if there are `metrics`, `group by`, or `saved_query` items in each query. Previously, there was no validation. [Issue 1002](https://github.com/dbt-labs/metricflow/issues/1002) -- **Fix:** Measures using `join_to_timespine` in MetricFlow now have filters applied correctly after time spine join. -- **Fix:** Querying multiple granularities with offset metrics: - - If you query a time offset metric with multiple instances of `metric_time`/`agg_time_dimension`, only one of the instances will be offset. All of them should be. - - If you query a time offset metric with one instance of `metric_time`/`agg_time_dimension` but filter by a different one, the query will fail. -- **Fix:** MetricFlow prioritizes a candidate join type over the default type when evaluating nodes to join. For example, the default join type for distinct values queries is `FULL OUTER JOIN`, however, time spine joins require `CROSS JOIN`, which is more appropriate. -- **Fix:** Fixed a bug that previously caused errors when entities were referenced in `where` filters. - -## January 2024 - -- - - Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we’d like to thank the 10 new community contributors to docs.getdbt.com :pray: What a busy start to the year! We merged 110 PRs in January. 
- - Here's how we improved the [docs.getdbt.com](http://docs.getdbt.com/) experience: - - - Added new hover behavior for images - - Added new expandables for FAQs - - Pruned outdated notices and snippets as part of the docs site maintenance - - January saw some great new content: - - - New [dbt Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-4-faqs) page - - Beta launch of [Explorer’s column-level lineage](https://docs.getdbt.com/docs/collaborate/column-level-lineage) feature - - Developer blog posts: - - [More time coding, less time waiting: Mastering defer in dbt](https://docs.getdbt.com/blog/defer-to-prod) - - [Deprecation of dbt Server](https://docs.getdbt.com/blog/deprecation-of-dbt-server) - - From the community: [Serverless, free-tier data stack with dlt + dbt core](https://docs.getdbt.com/blog/serverless-dlt-dbt-stack) - - The Extrica team added docs for the [dbt-extrica community adapter](https://docs.getdbt.com/docs/core/connect-data-platform/extrica-setup) - - Semantic Layer: New [conversion metrics docs](https://docs.getdbt.com/docs/build/conversion) and added the parameter `fill_nulls_with` to all metric types (launched the week of January 12, 2024) - - New [dbt environment command](https://docs.getdbt.com/reference/commands/dbt-environment) and its flags for the dbt Cloud CLI - - January also saw some refreshed content, either aligning with new product features or requests from the community: - - - Native support for [partial parsing in dbt Cloud](https://docs.getdbt.com/docs/cloud/account-settings#partial-parsing) - - Updated guidance on using dots or underscores in the [Best practice guide for models](https://docs.getdbt.com/best-practices/how-we-style/1-how-we-style-our-dbt-models) - - Updated [PrivateLink for VCS docs](https://docs.getdbt.com/docs/cloud/secure/vcs-privatelink) - - Added a new `job_runner` role in our [Enterprise project role permissions docs](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions#project-role-permissions) - - Added saved queries to [Metricflow commands](https://docs.getdbt.com/docs/build/metricflow-commands#list-saved-queries) - - Removed [as_text docs](https://github.com/dbt-labs/docs.getdbt.com/pull/4726) that were wildly outdated - - - -- **New:** New metric type that allows you to measure conversion events. For example, users who viewed a web page and then filled out a form. For more details, refer to [Conversion metrics](/docs/build/conversion). -- **New:** Instead of specifying the fully qualified dimension name (for example, `order__user__country`) in the group by or filter expression, you now only need to provide the primary entity and dimensions name, like `user__county`. -- **New:** You can now query the [saved queries](/docs/build/saved-queries) you've defined in the dbt Semantic Layer using [Tableau](/docs/cloud-integrations/semantic-layer/tableau), [GraphQL API](/docs/dbt-cloud-apis/sl-graphql), [JDBC API](/docs/dbt-cloud-apis/sl-jdbc), and the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation). - -- - - By default, dbt parses all the files in your project at the beginning of every dbt invocation. Depending on the size of your project, this operation can take a long time to complete. With the new partial parsing feature in dbt Cloud, you can reduce the time it takes for dbt to parse your project. When enabled, dbt Cloud parses only the changed files in your project instead of parsing all the project files. As a result, your dbt invocations will take less time to run. 
- - To learn more, refer to [Partial parsing](/docs/cloud/account-settings#partial-parsing). - - - - - -- **Enhancement:** The YAML spec parameter `label` is now available for Semantic Layer metrics in [JDBC and GraphQL APIs](/docs/dbt-cloud-apis/sl-api-overview). This means you can conveniently use `label` as a display name for your metrics when exposing them. -- **Enhancement:** Added support for `create_metric: true` for a measure, which is a shorthand to quickly create metrics. This is useful in cases when metrics are only used to build other metrics. -- **Enhancement:** Added support for Tableau parameter filters. You can use the [Tableau connector](/docs/cloud-integrations/semantic-layer/tableau) to create and use parameters with your dbt Semantic Layer data. -- **Enhancement:** Added support to expose `expr` and `agg` for [Measures](/docs/build/measures) in the [GraphQL API](/docs/dbt-cloud-apis/sl-graphql). -- **Enhancement:** You have improved error messages in the command line interface when querying a dimension that is not reachable for a given metric. -- **Enhancement:** You can now query entities using our Tableau integration (similar to querying dimensions). -- **Enhancement:** A new data source is available in our Tableau integration called "ALL", which contains all semantic objects defined. This has the same information as "METRICS_AND_DIMENSIONS". In the future, we will deprecate "METRICS_AND_DIMENSIONS" in favor of "ALL" for clarity. - -- **Fix:** Support for numeric types with precision greater than 38 (like `BIGDECIMAL`) in BigQuery is now available. Previously, it was unsupported so would return an error. -- **Fix:** In some instances, large numeric dimensions were being interpreted by Tableau in scientific notation, making them hard to use. These should now be displayed as numbers as expected. -- **Fix:** We now preserve dimension values accurately instead of being inadvertently converted into strings. -- **Fix:** Resolved issues with naming collisions in queries involving multiple derived metrics using the same metric input. Previously, this could cause a naming collision. Input metrics are now deduplicated, ensuring each is referenced only once. -- **Fix:** Resolved warnings related to using two duplicate input measures in a derived metric. Previously, this would trigger a warning. Input measures are now deduplicated, enhancing query processing and clarity. -- **Fix:** Resolved an error where referencing an entity in a filter using the object syntax would fail. For example, `{{Entity('entity_name')}}` would fail to resolve. +Release notes are grouped by month for both multi-tenant and virtual private cloud (VPC) environments. + +## January 2025 + + +- **New**: Users can now switch themes directly from the user menu, available [in Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). We have added support for **Light mode** (default), **Dark mode**, and automatic theme switching based on system preferences. The selected theme is stored in the user profile and will follow users across all devices. + - Dark mode is currently available on the Developer plan and will be available for all [plans](https://www.getdbt.com/pricing) in the future. We’ll be rolling it out gradually, so stay tuned for updates. For more information, refer to [Change your dbt Cloud theme](/docs/cloud/about-cloud/change-your-dbt-cloud-theme). 
+- **Fix**: dbt Semantic Layer errors in the Cloud IDE are now displayed with proper formatting, fixing an issue where newlines appeared broken or difficult to read. This fix ensures error messages are more user-friendly and easier to parse. +- **Fix**: Fixed an issue where [saved queries](/docs/build/saved-queries) with no [exports](/docs/build/saved-queries#configure-exports) would fail with an `UnboundLocalError`. Previously, attempting to process a saved query without any exports would cause an error due to an undefined relation variable. Exports are optional, and this fix ensures saved queries without exports don't fail. +- **New**: You can now query metric alias in dbt Semantic Layer [GraphQL](/docs/dbt-cloud-apis/sl-graphql) and [JDBC](/docs/dbt-cloud-apis/sl-jdbc) APIs. + - For the JDBC API, refer to [Query metric alias](/docs/dbt-cloud-apis/sl-jdbc#query-metric-alias) for more information. + - For the GraphQL API, refer to [Query metric alias](/docs/dbt-cloud-apis/sl-graphql#query-metric-alias) for more information. +- **Enhancement**: Added support to automatically refresh access tokens when Snowflake's SSO connection expires. Previously, users would get the following error: `Connection is not available, request timed out after 30000ms` and would have to wait 10 minutes to try again. +- **Enhancement**: The [`dbt_version` format](/reference/commands/version#versioning) in dbt Cloud now better aligns with [semantic versioning rules](https://semver.org/). Leading zeroes have been removed from the month and day (`YYYY.M.D+`). For example: + - New format: `2024.10.8+996c6a8` + - Previous format: `2024.10.08+996c6a8` diff --git a/website/docs/docs/dbt-versions/upgrade-dbt-version-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-dbt-version-in-cloud.md index 52faa9385fa..f19240b3998 100644 --- a/website/docs/docs/dbt-versions/upgrade-dbt-version-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-dbt-version-in-cloud.md @@ -26,21 +26,19 @@ To upgrade an environment in the [dbt Cloud Admin API](/docs/dbt-cloud-apis/admi ### Override dbt version -Configure your project to use a different dbt Core version than what's configured in your [development environment](/docs/dbt-cloud-environments#types-of-environments). This _override_ only affects your user account, no one else's. Use this to safely test new dbt features before upgrading the dbt version for your projects. +Configure your project to use a different dbt version than what's configured in your [development environment](/docs/dbt-cloud-environments#types-of-environments). This _override_ only affects your user account, no one else's. Use this to safely test new dbt features before upgrading the dbt version for your projects. 1. Click your account name from the left side panel and select **Account settings**. -1. Choose **Credentials** from the sidebar and select a project. This opens a side panel. -1. In the side panel, click **Edit** and scroll to the **User development settings** section. Choose a version from the **dbt version** dropdown and click **Save**. +2. Choose **Credentials** from the sidebar and select a project. This opens a side panel. +3. In the side panel, click **Edit** and scroll to the **User development settings** section. +4. Choose a version from the **dbt version** dropdown and click **Save**. 
- An example of overriding the configured version with 1.7 for the selected project: + An example of overriding the configured version to the ["Latest" release track](/docs/dbt-versions/cloud-release-tracks) for the selected project: -1. (Optional) Verify that dbt Cloud will use your override setting to build the project. Invoke `dbt build` in the IDE's command bar. Expand the **System Logs** section and find the output's first line. It should begin with `Running with dbt=` and list the version dbt Cloud is using. - - Example output of a successful `dbt build` run: - - +5. (Optional) Verify that dbt Cloud will use your override setting to build the project by invoking a `dbt build` command in the dbt Cloud IDE's command bar. Expand the **System Logs** section and find the output's first line. It should begin with `Running with dbt=` and list the version dbt Cloud is using.

+ For users on Release tracks, the output will display `Running dbt...` instead of a specific version, reflecting the flexibility and continuous automatic updates provided by the release track functionality. ## Jobs @@ -298,13 +296,17 @@ If you believe your project might be affected, read more details in the migratio #### Testing your changes before upgrading -Once you know what code changes you'll need to make, you can start implementing them. We recommend you create a separate dbt project, **Upgrade Project**, to test your changes before making them live in your main dbt project. In your **Upgrade Project**, connect to the same repository you use for your production project. This time, set the development environment [settings](/docs/dbt-versions/upgrade-dbt-version-in-cloud) to run the latest version of dbt Core. Next, check out a branch `dbt-version-upgrade`, make the appropriate updates to your project, and verify your dbt project compiles and runs with the new version in the IDE. If upgrading directly to the latest version results in too many issues, try testing your project iteratively on successive minor versions. There are years of development and a few breaking changes between distant versions of dbt Core (for example, 0.14 --> 1.0). The likelihood of experiencing problems upgrading between successive minor versions is much lower, which is why upgrading regularly is recommended. - -Once you have your project compiling and running on the latest version of dbt in the development environment for your `dbt-version-upgrade` branch, try replicating one of your production jobs to run off your branch's code. You can do this by creating a new deployment environment for testing, setting the custom branch to 'ON' and referencing your `dbt-version-upgrade` branch. You'll also need to set the dbt version in this environment to the latest dbt Core version. - - - - - - -Then add a job to the new testing environment that replicates one of the production jobs your team relies on. If that job runs smoothly, you should be all set to merge your branch into main and change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. +Once you know what code changes you'll need to make, you can start implementing them. We recommend you: +- Create a separate dbt project, "Upgrade project", to test your changes before making them live in your main dbt project. +- In your "Upgrade project", connect to the same repository you use for your production project. +- Set the development environment [settings](/docs/dbt-versions/upgrade-dbt-version-in-cloud) to run the latest version of dbt Core. +- Check out a branch `dbt-version-upgrade`, make the appropriate updates to your project, and verify your dbt project compiles and runs with the new version in the dbt Cloud IDE. + - If upgrading directly to the latest version results in too many issues, try testing your project iteratively on successive minor versions. There are years of development and a few breaking changes between distant versions of dbt Core (for example, 0.14 --> 1.0). The likelihood of experiencing problems upgrading between successive minor versions is much lower, which is why upgrading regularly is recommended. +- Once you have your project compiling and running on the latest version of dbt in the development environment for your `dbt-version-upgrade` branch, try replicating one of your production jobs to run off your branch's code. 
+- You can do this by creating a new deployment environment for testing, setting the custom branch to 'ON' and referencing your `dbt-version-upgrade` branch. You'll also need to set the dbt version in this environment to the latest dbt Core version. + + + +- Then add a job to the new testing environment that replicates one of the production jobs your team relies on. + - If that job runs smoothly, you should be all set to merge your branch into main. + - Then change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. diff --git a/website/docs/docs/deploy/about-ci.md b/website/docs/docs/deploy/about-ci.md index e27d2e7d08e..f442630e14f 100644 --- a/website/docs/docs/deploy/about-ci.md +++ b/website/docs/docs/deploy/about-ci.md @@ -25,3 +25,4 @@ Refer to the guide [Get started with continuous integration tests](/guides/set-u icon="dbt-bit"/>

+ diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 08c7813bfd3..ff7c321282d 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -16,6 +16,10 @@ You can set up [continuous integration](/docs/deploy/continuous-integration) (CI - Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering. - If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account that includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). If you're using GitLab Free, merge requests will trigger CI jobs but CI job status updates (success or failure of the job) will not be reported back to GitLab. +import GitProvidersCI from '/snippets/_git-providers-supporting-ci.md'; + + + ## Set up CI jobs {#set-up-ci-jobs} dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deployment environment](/docs/deploy/deploy-environments#create-a-deployment-environment) that's connected to a staging database. Having a separate environment dedicated for CI will provide better isolation between your temporary CI schema builds and your production data builds. Additionally, sometimes teams need their CI jobs to be triggered when a PR is made to a branch other than main. If your team maintains a staging branch as part of your release process, having a separate environment will allow you to set a [custom branch](/faqs/Environments/custom-branch-settings) and, accordingly, the CI job in that dedicated environment will be triggered only when PRs are made to the specified custom branch. To learn more, refer to [Get started with CI tests](/guides/set-up-ci). diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index c738e641a5b..4488e75cecd 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -27,6 +27,11 @@ When the CI run completes, you can view the run status directly from within the dbt Cloud deletes the temporary schema from your  when you close or merge the pull request. If your project has schema customization using the [generate_schema_name](/docs/build/custom-schemas#how-does-dbt-generate-a-models-schema-name) macro, dbt Cloud might not drop the temporary schema from your data warehouse. For more information, refer to [Troubleshooting](/docs/deploy/ci-jobs#troubleshooting). +import GitProvidersCI from '/snippets/_git-providers-supporting-ci.md'; + + + + ## Differences between CI jobs and other deployment jobs The [dbt Cloud scheduler](/docs/deploy/job-scheduler) executes CI jobs differently from other deployment jobs in these important ways: diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index 0505f1f228d..b9bc1b0ffa9 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -32,9 +32,9 @@ You can create a deploy job and configure it to run on [scheduled days and times - (Optional) **Description** — Provide a description of what the job does (for example, what the job consumes and what the job produces). - **Environment** — By default, it’s set to the deployment environment you created the deploy job from. 3. 
Options in the **Execution settings** section: - - **Commands** — By default, it includes the `dbt build` command. Click **Add command** to add more [commands](/docs/deploy/job-commands) that you want to be invoked when the job runs. - - **Generate docs on run** — Enable this option if you want to [generate project docs](/docs/collaborate/build-and-view-your-docs) when this deploy job runs. - - **Run source freshness** — Enable this option to invoke the `dbt source freshness` command before running the deploy job. Refer to [Source freshness](/docs/deploy/source-freshness) for more details. + - [**Commands**](/docs/deploy/job-commands#built-in-commands) — By default, it includes the `dbt build` command. Click **Add command** to add more [commands](/docs/deploy/job-commands) that you want to be invoked when the job runs. During a job run, [built-in commands](/docs/deploy/job-commands#built-in-commands) are "chained" together and if one run step fails, the entire job fails with an "Error" status. + - [**Generate docs on run**](/docs/deploy/job-commands#checkbox-commands) — Enable this option if you want to [generate project docs](/docs/collaborate/build-and-view-your-docs) when this deploy job runs. If the step fails, the job can succeed if subsequent steps pass. + - [**Run source freshness**](/docs/deploy/job-commands#checkbox-commands) — Enable this option to invoke the `dbt source freshness` command before running the deploy job. If the step fails, the job can succeed if subsequent steps pass. Refer to [Source freshness](/docs/deploy/source-freshness) for more details. 4. Options in the **Triggers** section: - **Run on schedule** — Run the deploy job on a set schedule. - **Timing** — Specify whether to [schedule](#schedule-days) the deploy job using **Intervals** that run the job every specified number of hours, **Specific hours** that run the job at specific times of day, or **Cron schedule** that run the job specified using [cron syntax](#cron-schedule). diff --git a/website/docs/docs/deploy/monitor-jobs.md b/website/docs/docs/deploy/monitor-jobs.md index 40298f0cdbe..2d2bf033937 100644 --- a/website/docs/docs/deploy/monitor-jobs.md +++ b/website/docs/docs/deploy/monitor-jobs.md @@ -10,13 +10,14 @@ Monitor your dbt Cloud jobs to help identify improvement and set up alerts to pr This portion of our documentation will go over dbt Cloud's various capabilities that help you monitor your jobs and set up alerts to ensure seamless orchestration, including: -- [Run visibility](/docs/deploy/run-visibility) — View your run history to help identify where improvements can be made to scheduled jobs. -- [Retry jobs](/docs/deploy/retry-jobs) — Rerun your errored jobs from start or the failure point. -- [Job notifications](/docs/deploy/job-notifications) — Receive email or Slack notifications when a job run succeeds, encounters warnings, fails, or is canceled. -- [Model notifications](/docs/deploy/model-notifications) — Receive email notifications about any issues encountered by your models and tests as soon as they occur while running a job. -- [Webhooks](/docs/deploy/webhooks) — Use webhooks to send events about your dbt jobs' statuses to other systems. -- [Leverage artifacts](/docs/deploy/artifacts) — dbt Cloud generates and saves artifacts for your project, which it uses to power features like creating docs for your project and reporting freshness of your sources. 
-- [Source freshness](/docs/deploy/source-freshness) — Monitor data governance by enabling snapshots to capture the freshness of your data sources. +- [Leverage artifacts](/docs/deploy/artifacts) — dbt Cloud generates and saves artifacts for your project, which it uses to power features like creating docs for your project and reporting freshness of your sources. +- [Job notifications](/docs/deploy/job-notifications) — Receive email or Slack notifications when a job run succeeds, encounters warnings, fails, or is canceled. +- [Model notifications](/docs/deploy/model-notifications) — Receive email notifications about any issues encountered by your models and tests as soon as they occur while running a job. +- [Retry jobs](/docs/deploy/retry-jobs) — Rerun your errored jobs from start or the failure point. +- [Run visibility](/docs/deploy/run-visibility) — View your run history to help identify where improvements can be made to scheduled jobs. +- [Source freshness](/docs/deploy/source-freshness) — Monitor data governance by enabling snapshots to capture the freshness of your data sources. +- [Webhooks](/docs/deploy/webhooks) — Use webhooks to send events about your dbt jobs' statuses to other systems. + To set up and add data health tiles to view data freshness and quality checks in your dashboard, refer to [data health tiles](/docs/collaborate/data-tile). diff --git a/website/docs/docs/deploy/webhooks.md b/website/docs/docs/deploy/webhooks.md index 4ff9c350344..5aa8abbe41f 100644 --- a/website/docs/docs/deploy/webhooks.md +++ b/website/docs/docs/deploy/webhooks.md @@ -36,17 +36,23 @@ You can also check out the free [dbt Fundamentals course](https://learn.getdbt.c ## Create a webhook subscription {#create-a-webhook-subscription} -Navigate to **Account settings** in dbt Cloud (by clicking your account name from the left side panel), and click **Create New Webhook** in the **Webhooks** section. You can find the appropriate dbt Cloud access URL for your region and plan with [Regions & IP addresses](/docs/cloud/about-cloud/access-regions-ip-addresses). - -To configure your new webhook: - -- **Name** — Enter a name for your outbound webhook. -- **Description** — Enter a description of the webhook. -- **Events** — Choose the event you want to trigger this webhook. You can subscribe to more than one event. -- **Jobs** — Specify the job(s) you want the webhook to trigger on. Or, you can leave this field empty for the webhook to trigger on all jobs in your account. By default, dbt Cloud configures your webhook at the account level. -- **Endpoint** — Enter your application's endpoint URL, where dbt Cloud can send the event(s) to. +1. Navigate to **Account settings** in dbt Cloud (by clicking your account name from the left side panel) +2. Go to the **Webhooks** section and click **Create webhook**. +3. To configure your new webhook: + - **Webhook name** — Enter a name for your outbound webhook. + - **Description** — Enter a description of the webhook. + - **Events** — Choose the event you want to trigger this webhook. You can subscribe to more than one event. + - **Jobs** — Specify the job(s) you want the webhook to trigger on. Or, you can leave this field empty for the webhook to trigger on all jobs in your account. By default, dbt Cloud configures your webhook at the account level. + - **Endpoint** — Enter your application's endpoint URL, where dbt Cloud can send the event(s) to. +4. When done, click **Save**. 
+ + dbt Cloud provides a secret token that you can use to [check for the authenticity of a webhook](#validate-a-webhook). It’s strongly recommended that you perform this check on your server to protect yourself from fake (spoofed) requests. + +:::info +dbt Cloud automatically deactivates a webhook after 5 consecutive failed attempts to send events to your endpoint. To reactivate the webhook, locate it in the webhooks list and click the reactivate button; dbt Cloud then resumes sending events to your endpoint. +::: -When done, click **Save**. dbt Cloud provides a secret token that you can use to [check for the authenticity of a webhook](#validate-a-webhook). It’s strongly recommended that you perform this check on your server to protect yourself from fake (spoofed) requests. +To find the appropriate dbt Cloud access URL for your region and plan, refer to [Regions & IP addresses](/docs/cloud/about-cloud/access-regions-ip-addresses). ### Differences between completed and errored webhook events {#completed-errored-event-difference} The `job.run.errored` event is a subset of the `job.run.completed` events. If you subscribe to both, you will receive two notifications when your job encounters an error. However, dbt Cloud triggers the two events at different times: diff --git a/website/docs/docs/supported-data-platforms.md index 75fb8f2dfbe..a0af8f5b070 100644 --- a/website/docs/docs/supported-data-platforms.md +++ b/website/docs/docs/supported-data-platforms.md @@ -1,7 +1,7 @@ --- title: "Supported data platforms" id: "supported-data-platforms" -sidebar_label: "Supported data platforms" +sidebar_label: "About supported data platforms" description: "Connect dbt to any data platform in dbt Cloud or dbt Core, using a dedicated adapter plugin" hide_table_of_contents: true pagination_next: "docs/connect-adapters" diff --git a/website/docs/guides/customize-schema-alias.md index 28d4aada525..e9c3b95fc98 100644 --- a/website/docs/guides/customize-schema-alias.md +++ b/website/docs/guides/customize-schema-alias.md @@ -9,6 +9,7 @@ icon: 'guides' hide_table_of_contents: true level: 'Advanced' recently_updated: true +keywords: ["generate", "schema name", "guide", "dbt", "schema customization", "custom schema"] ---
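+This guide walks through customizing dbt's schema and alias generation. As a quick orientation, the sketch below shows the default `generate_schema_name` pattern that custom implementations usually start from — a simplified illustration of dbt's documented default behavior, not code taken from this guide:
+
+```sql
+{% macro generate_schema_name(custom_schema_name, node) -%}
+    {#- Default behavior: use the target schema, appending any custom schema with an underscore -#}
+    {%- set default_schema = target.schema -%}
+    {%- if custom_schema_name is none -%}
+        {{ default_schema }}
+    {%- else -%}
+        {{ default_schema }}_{{ custom_schema_name | trim }}
+    {%- endif -%}
+{%- endmacro %}
+```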
diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index 8990c4db925..38769adbac8 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -21,7 +21,7 @@ In this quickstart guide, you'll learn how to use dbt Cloud with Redshift. It wi - Document your models - Schedule a job to run -:::tips Videos for you +:::tip Videos for you Check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. ::: diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md index 40bdeed1ef2..18e77ce050c 100644 --- a/website/docs/guides/snowflake-qs.md +++ b/website/docs/guides/snowflake-qs.md @@ -46,7 +46,7 @@ You can also watch the [YouTube video on dbt and Snowflake](https://www.youtube. ## Create a new Snowflake worksheet 1. Log in to your trial Snowflake account. -2. In the Snowflake UI, click **+ Worksheet** in the upper right corner to create a new worksheet. +2. In the Snowflake UI, click **+ Create** in the left-hand corner, underneath the Snowflake logo, which opens a dropdown. Select the first option, **SQL Worksheet**. ## Load data The data used here is stored as CSV files in a public S3 bucket and the following steps will guide you through how to prepare your Snowflake account for that data and upload it. diff --git a/website/docs/reference/artifacts/run-results-json.md b/website/docs/reference/artifacts/run-results-json.md index 13ad528d185..118b5615ea8 100644 --- a/website/docs/reference/artifacts/run-results-json.md +++ b/website/docs/reference/artifacts/run-results-json.md @@ -3,14 +3,17 @@ title: "Run results JSON file" sidebar_label: "Run results" --- -**Current schema**: [`v5`](https://schemas.getdbt.com/dbt/run-results/v5/index.html) +**Current schema**: [`v6`](https://schemas.getdbt.com/dbt/run-results/v6/index.html) **Produced by:** [`build`](/reference/commands/build) + [`clone`](/reference/commands/clone) [`compile`](/reference/commands/compile) [`docs generate`](/reference/commands/cmd-docs) + [`retry`](/reference/commands/retry) [`run`](/reference/commands/run) [`seed`](/reference/commands/seed) + [`show`](/reference/commands/show) [`snapshot`](/reference/commands/snapshot) [`test`](/reference/commands/test) [`run-operation`](/reference/commands/run-operation) diff --git a/website/docs/reference/commands/parse.md b/website/docs/reference/commands/parse.md index 5e8145762f7..967991522bc 100644 --- a/website/docs/reference/commands/parse.md +++ b/website/docs/reference/commands/parse.md @@ -9,7 +9,7 @@ The `dbt parse` command parses and validates the contents of your dbt project. I It will also produce an artifact with detailed timing information, which is useful to understand parsing times for large projects. Refer to [Project parsing](/reference/parsing) for more information. -Starting in v1.5, `dbt parse` will write or return a [manifest](/reference/artifacts/manifest-json), enabling you to introspect dbt's understanding of all the resources in your project. +Starting in v1.5, `dbt parse` will write or return a [manifest](/reference/artifacts/manifest-json), enabling you to introspect dbt's understanding of all the resources in your project. Since `dbt parse` doesn't connect to your warehouse, [this manifest will not contain any compiled code](/faqs/Warehouse/db-connection-dbt-compile). 
By default, the dbt Cloud IDE will attempt a "partial" parse, which means it'll only check changes since the last parse (new or updated parts of your project when you make changes). Since the dbt Cloud IDE automatically parses in the background whenever you save your work, manually running `dbt parse` yourself is likely to be fast because it's just looking at recent changes. diff --git a/website/docs/reference/commands/version.md index 4d5ce6524dd..9643be92ab8 100644 --- a/website/docs/reference/commands/version.md +++ b/website/docs/reference/commands/version.md @@ -13,7 +13,7 @@ The `--version` command-line flag returns information about the currently instal ## Versioning To learn more about release versioning for dbt Core, refer to [How dbt Core uses semantic versioning](/docs/dbt-versions/core#how-dbt-core-uses-semantic-versioning). -If using a [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks), which provide ongoing updates to dbt, then `dbt_version` represents the release version of dbt in dbt Cloud. This also follows semantic versioning guidelines, using the `YYYY.MM.DD+` format. The year, month, and day represent the date the version was built (for example, `2024.10.28+996c6a8`). The suffix provides an additional unique identification for each build. +If using a [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks), which provides ongoing updates to dbt, then `dbt_version` represents the release version of dbt in dbt Cloud. This also follows semantic versioning guidelines, using the `YYYY.M.D+` format. The year, month, and day represent the date the version was built (for example, `2024.10.8+996c6a8`). The suffix provides an additional unique identification for each build. ## Example usages diff --git a/website/docs/reference/database-permissions/redshift-permissions.md index 5f0949a3528..2bca5d4acfa 100644 --- a/website/docs/reference/database-permissions/redshift-permissions.md +++ b/website/docs/reference/database-permissions/redshift-permissions.md @@ -10,16 +10,16 @@ The following example provides you with the SQL statements you can use to manage **Note** that `database_name`, `database.schema_name`, and `user_name` are placeholders and you can replace them as needed for your organization's naming convention. - ``` -grant usage on database database_name to user_name; grant create schema on database database_name to user_name; grant usage on schema database.schema_name to user_name; grant create table on schema database.schema_name to user_name; grant create view on schema database.schema_name to user_name; -grant usage on all schemas in database database_name to user_name; +grant usage for schemas in database database_name to role role_name; grant select on all tables in database database_name to user_name; grant select on all views in database database_name to user_name; ``` +To connect to the database, confirm with an admin that your user role or group has been added to the database. Note that Redshift permissions differ from Postgres, and commands like [`grant connect`](https://www.postgresql.org/docs/current/sql-grant.html) aren't supported in Redshift. + Check out the [official documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html) for more information.
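+Because Redshift doesn't support `grant connect`, access to the database itself is typically handled through user, group, or RBAC role membership combined with grants like the ones above. As an illustrative sketch only — `dbt_user`, `transformer_group`, and `analytics` are placeholder names rather than values from the example above — an admin might run something like:
+
+```
+-- create a group and add the dbt user to it
+create group transformer_group;
+alter group transformer_group add user dbt_user;
+
+-- let the group use and build in the schema dbt writes to
+grant usage on schema analytics to group transformer_group;
+grant create on schema analytics to group transformer_group;
+```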
diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md index 600a578ef8e..29eb79a9130 100644 --- a/website/docs/reference/node-selection/methods.md +++ b/website/docs/reference/node-selection/methods.md @@ -7,6 +7,11 @@ Selector methods return all resources that share a common property, using the syntax `method:value`. While it is recommended to explicitly denote the method, you can omit it (the default value will be one of `path`, `file` or `fqn`). + + +The `--select` and `--selector` arguments sound similar, but they are different. To understand the difference, see [Differences between `--select` and `--selector`](/reference/node-selection/yaml-selectors#difference-between---select-and---selector). + + Many of the methods below support Unix-style wildcards: diff --git a/website/docs/reference/node-selection/syntax.md b/website/docs/reference/node-selection/syntax.md index 2e53eff72df..dce2a9c40e8 100644 --- a/website/docs/reference/node-selection/syntax.md +++ b/website/docs/reference/node-selection/syntax.md @@ -21,6 +21,8 @@ dbt's node selection syntax makes it possible to run only specific resources in We use the terms "nodes" and "resources" interchangeably. These encompass all the models, tests, sources, seeds, snapshots, exposures, and analyses in your project. They are the objects that make up dbt's DAG (directed acyclic graph). ::: +The `--select` and `--selector` arguments are similar in that they both allow you to select resources. To understand the difference, see [Differences between `--select` and `--selector`](/reference/node-selection/yaml-selectors#difference-between---select-and---selector). + ## Specifying resources By default, `dbt run` executes _all_ of the models in the dependency graph; `dbt seed` creates all seeds, `dbt snapshot` performs every snapshot. The `--select` flag is used to specify a subset of nodes to execute. @@ -103,6 +105,8 @@ As your selection logic gets more complex, and becomes unwieldly to type out as consider using a [yaml selector](/reference/node-selection/yaml-selectors). You can use a predefined definition with the `--selector` flag. Note that when you're using `--selector`, most other flags (namely `--select` and `--exclude`) will be ignored. +The `--select` and `--selector` arguments are similar in that they both allow you to select resources. To understand the difference between `--select` and `--selector` arguments, see [this section](/reference/node-selection/yaml-selectors#difference-between---select-and---selector) for more details. + ### Troubleshoot with the `ls` command Constructing and debugging your selection syntax can be challenging. To get a "preview" of what will be selected, we recommend using the [`list` command](/reference/commands/list). This command, when combined with your selection syntax, will output a list of the nodes that meet that selection criteria. The `dbt ls` command supports all types of selection syntax arguments, for example: @@ -136,15 +140,6 @@ Together, the [`state`](/reference/node-selection/methods#state) selector and de State and defer can be set by environment variables as well as CLI flags: -- `--state` or `DBT_STATE`: file path -- `--defer` or `DBT_DEFER`: boolean - -:::warning Syntax deprecated - -In dbt v1.5, we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). 
Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined. - -::: - - `--state` or `DBT_STATE`: file path - `--defer` or `DBT_DEFER`: boolean - `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional) @@ -157,6 +152,12 @@ If both the flag and env var are provided, the flag takes precedence. - The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version. - These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. +:::warning Syntax deprecated + +In [dbt v1.5](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5#behavior-changes), we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined. + +::: + ### The "result" status Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](/reference/artifacts/run-results-json) page. @@ -204,7 +205,7 @@ When a job is selected, dbt Cloud will surface the artifacts from that job's mos After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command: ```bash -# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag. +# You can also set the DBT_STATE environment variable instead of the --state flag. dbt source freshness # must be run again to compare current to previous state dbt build --select "source_status:fresher+" --state path/to/prod/artifacts ``` diff --git a/website/docs/reference/node-selection/yaml-selectors.md b/website/docs/reference/node-selection/yaml-selectors.md index ff6628919b7..ef7ca1673eb 100644 --- a/website/docs/reference/node-selection/yaml-selectors.md +++ b/website/docs/reference/node-selection/yaml-selectors.md @@ -288,3 +288,22 @@ selectors: **Note:** While selector inheritance allows the logic from another selector to be _reused_, it doesn't allow the logic from that selector to be _modified_ by means of `parents`, `children`, `indirect_selection`, and so on. The `selector` method returns the complete set of nodes returned by the named selector. + +## Difference between `--select` and `--selector` + +In dbt, [`select`](/reference/node-selection/syntax#how-does-selection-work) and `selector` are related concepts used for choosing specific models, tests, or resources. The following tables explains the differences and when to best use them: + +| Feature | `--select` | `--selector` | +| ------- | ---------- | ------------- | +| Definition | Ad-hoc, specified directly in the command. | Pre-defined in `selectors.yml` file. | +| Usage | One-time or task-specific filtering.| Reusable for multiple executions. | +| Complexity | Requires manual entry of selection criteria. | Can encapsulate complex logic for reuse. | +| Flexibility | Flexible; less reusable. | Flexible; focuses on reusable and structured logic.| +| Example | `dbt run --select my_model+`
(runs `my_model` and all downstream dependencies with the `+` operator). | `dbt run --selector nightly_diet_snowplow`
(runs models defined by the `nightly_diet_snowplow` selector in `selectors.yml`). | + +Notes: +- You can combine `--select` with `--exclude` for ad-hoc selection of nodes. +- The `--select` and `--selector` syntax both provide the same overall functions for node selection. Using [graph operators](/reference/node-selection/graph-operators) (such as `+`, `@`.) and [set operators](/reference/node-selection/set-operators) (such as `union` and `intersection`) in `--select` is the same as YAML-based configs in `--selector`. + + +For additional examples, check out [this GitHub Gist](https://gist.github.com/jeremyyeo/1aeca767e2a4f157b07955d58f8078f7). diff --git a/website/docs/reference/resource-configs/athena-configs.md b/website/docs/reference/resource-configs/athena-configs.md index fd5bc663ee7..082f3b5c249 100644 --- a/website/docs/reference/resource-configs/athena-configs.md +++ b/website/docs/reference/resource-configs/athena-configs.md @@ -106,7 +106,7 @@ lf_grants={ -There are some limitations and recommendations that should be considered: +Consider these limitations and recommendations: - `lf_tags` and `lf_tags_columns` configs support only attaching lf tags to corresponding resources. - We recommend managing LF Tags permissions somewhere outside dbt. For example, [terraform](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_permissions) or [aws cdk](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_lakeformation-readme.html). @@ -114,8 +114,7 @@ There are some limitations and recommendations that should be considered: - Any tags listed in `lf_inherited_tags` should be strictly inherited from the database level and never overridden at the table and column level. - Currently, `dbt-athena` does not differentiate between an inherited tag association and an override it made previously. - For example, If a `lf_tags_config` value overrides an inherited tag in one run, and that override is removed before a subsequent run, the prior override will linger and no longer be encoded anywhere (for example, Terraform where the inherited value is configured nor in the DBT project where the override previously existed but now is gone). - - + ### Table location The saved location of a table is determined in precedence by the following conditions: @@ -144,6 +143,9 @@ The following [incremental models](https://docs.getdbt.com/docs/build/incrementa - `append`: Insert new records without updating, deleting or overwriting any existing data. There might be duplicate data (great for log or historical data). - `merge`: Conditionally updates, deletes, or inserts rows into an Iceberg table. Used in combination with `unique_key`.It is only available when using Iceberg. +Consider this limitation when using Iceberg models: + +- Incremental Iceberg models — Sync all columns on schema change. You can't remove columns used for partitioning with an incremental refresh; you must fully refresh the model. ### On schema change @@ -361,8 +363,7 @@ The materialization also supports invalidating hard deletes. For usage details, ### Snapshots known issues -- Incremental Iceberg models - Sync all columns on schema change. Columns used for partitioning can't be removed. From a dbt perspective, the only way is to fully refresh the incremental model. -- Tables, schemas and database names should only be lowercase +- Tables, schemas, and database names should only be lowercase. 
- To avoid potential conflicts, make sure [`dbt-athena-adapter`](https://github.com/Tomme/dbt-athena) is not installed in the target environment. - Snapshot does not support dropping columns from the source table. If you drop a column, make sure to drop the column from the snapshot as well. Another workaround is to NULL the column in the snapshot definition to preserve the history. diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index 1ee89efc95c..95bed967a14 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -10,19 +10,18 @@ When materializing a model as `table`, you may include several optional configs -| Option | Description | Required? | Model Support | Example | -|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|---------------|--------------------------| -| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | -| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | -| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | -| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` | -| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | -| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` | -| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` | -| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` | - -\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. -We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to anotate their python-derived tables with tblproperties. +| Option | Description | Required? | Model support | Example | +|-----------|---------|-------------------|---------------|-------------| +| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | +| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | +| partition_by | Partition the created table by the specified columns. 
A directory is created for each partition.| Optional | SQL, Python | `date_day` | +| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` | +| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | +| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` | +| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` | +| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` | + +\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. There is not yet a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to anotate their python-derived tables with tblproperties. @@ -30,45 +29,47 @@ We do not yet have a PySpark API to set tblproperties at table creation, so this 1.8 introduces support for [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) at the table level, in addition to all table configuration supported in 1.7. -| Option | Description | Required? | Model Support | Example | -|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|---------------|--------------------------| -| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | -| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | -| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | -| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` | -| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | -| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` | -| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` | -| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL+, Python+ | `{'my_tag': 'my_value'}` | -| compression | Set the compression algorithm. 
| Optional | SQL, Python | `zstd` | +| Option | Description | Required?| Model support | Example | +|-----------|---------------|----------|---------------|----------| +| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | +| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | +| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | +| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` | +| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | +| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` | +| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` | +| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL, Python | `{'my_tag': 'my_value'}` | +| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` | \* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to anotate their python-derived tables with tblproperties. -\+ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements. - +† `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements. + dbt-databricks v1.9 adds support for the `table_format: iceberg` config. Try it now on the [dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks). All other table configurations were also supported in 1.8. -| Option | Description | Required? | Model Support | Example | -|---------------------|-----------------------------|-------------------------------------------|-----------------|--------------------------| -| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` | -| file_format+ | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | -| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | -| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | +| Option | Description| Required? 
| Model support | Example | +|-------------|--------|-----------|-----------------|---------------| +| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` | +| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | +| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | +| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | | liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` | -| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | -| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` | -| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` | -| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL++, Python++ | `{'my_tag': 'my_value'}` | -| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` | +| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | +| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` | +| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` | +| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL , Python | `{'my_tag': 'my_value'}` | +| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` | \* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to anotate their python-derived tables with tblproperties. -\+ When `table_format` is `iceberg`, `file_format` must be `delta`. -\++ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements. + +† When `table_format` is `iceberg`, `file_format` must be `delta`. + +‡ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements. @@ -260,7 +261,7 @@ This strategy is currently only compatible with All Purpose Clusters, not SQL Wa This strategy is most effective when specified alongside a `partition_by` clause in your model config. 
dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy. -If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` + `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead. +If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` and `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead. - ```yml @@ -139,20 +138,28 @@ sources: ## Definition -Set the `event_time` to the name of the field that represents the timestamp of the event -- "at what time did the row occur" -- as opposed to an event ingestion date. You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block. +You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block. + +`event_time` is required for the [incremental microbatch](/docs/build/incremental-microbatch) strategy and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#optimizing-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments. + +### Best practices + +Set the `event_time` to the name of the field that represents the actual timestamp of the event (like `account_created_at`). The timestamp of the event should represent "at what time did the row occur" rather than an event ingestion date. Marking a column as the `event_time` when it isn't, diverges from the semantic meaning of the column which may result in user confusion when other tools make use of the metadata. -Here are some examples of good and bad `event_time` columns: +However, if an ingestion date (like `loaded_at`, `ingested_at`, or `last_updated_at`) are the only timestamps you use, you can set `event_time` to these fields. Here are some considerations to keep in mind if you do this: -- ✅ Good: - - `account_created_at` — This represents the specific time when an account was created, making it a fixed event in time. - - `session_began_at` — This captures the exact timestamp when a user session started, which won’t change and directly ties to the event. +- Using `last_updated_at` or `loaded_at` — May result in duplicate entries in the resulting table in the data warehouse over multiple runs. 
Setting an appropriate [lookback](/reference/resource-configs/lookback) value can reduce duplicates but it can't fully eliminate them since some updates outside the lookback window won't be processed. +- Using `ingested_at` — Since this column is created by your ingestion/EL tool instead of coming from the original source, it will change if/when you need to resync your connector for some reason. This means that data will be reprocessed and loaded into your warehouse for a second time against a second date. As long as this never happens (or you run a full refresh when it does), microbatches will be processed correctly when using `ingested_at`. -- ❌ Bad: +Here are some examples of recommended and not recommended `event_time` columns: - - `_fivetran_synced` — This isn't the time that the event happened, it's the time that the event was ingested. - - `last_updated_at` — This isn't a good use case as this will keep changing over time. -`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch) and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#optimizing-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments. +|
Status
| Column name | Description | +|--------------------|---------------------|----------------------| +| ✅ Recommended | `account_created_at` | Represents the specific time when an account was created, making it a fixed event in time. | +| ✅ Recommended | `session_began_at` | Captures the exact timestamp when a user session started, which won’t change and directly ties to the event. | +| ❌ Not recommended | `_fivetran_synced` | This represents the time the event was ingested, not when it happened. | +| ❌ Not recommended | `last_updated_at` | Changes over time and isn't tied to the event itself. If used, note the considerations mentioned earlier in [best practices](#best-practices). | ## Examples diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 859e4e9e31a..4556544d189 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -48,7 +48,9 @@ snapshots: ## Description -The `hard_deletes` config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. +The `hard_deletes` config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. + +You can use `hard_deletes` with dbt-postgres, dbt-bigquery, dbt-snowflake, and dbt-redshift adapters. import HardDeletes from '/snippets/_hard-deletes.md'; diff --git a/website/docs/reference/resource-configs/snapshot_meta_column_names.md b/website/docs/reference/resource-configs/snapshot_meta_column_names.md index 24e4c8ca577..59d63374de7 100644 --- a/website/docs/reference/resource-configs/snapshot_meta_column_names.md +++ b/website/docs/reference/resource-configs/snapshot_meta_column_names.md @@ -19,7 +19,7 @@ snapshots: dbt_valid_to: dbt_scd_id: dbt_updated_at: - dbt_is_deleted: + dbt_is_deleted: ``` @@ -35,7 +35,7 @@ snapshots: "dbt_valid_to": "", "dbt_scd_id": "", "dbt_updated_at": "", - "dbt_is_deleted": "", + "dbt_is_deleted": "", } ) }} @@ -54,7 +54,7 @@ snapshots: dbt_valid_to: dbt_scd_id: dbt_updated_at: - dbt_is_deleted: + dbt_is_deleted: ```
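For illustration, here's a minimal sketch of a snapshot YAML that renames these metadata columns. The snapshot, source, and replacement column names are illustrative, and `hard_deletes: new_record` is only needed if you want the deletion-tracking column:

```yaml
snapshots:
  - name: orders_snapshot                       # illustrative snapshot name
    relation: source('jaffle_shop', 'orders')   # illustrative source
    config:
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
      hard_deletes: new_record                  # adds the deletion-tracking column
      snapshot_meta_column_names:
        dbt_valid_from: start_date
        dbt_valid_to: end_date
        dbt_scd_id: scd_id
        dbt_updated_at: modified_date
        dbt_is_deleted: is_deleted
```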
@@ -67,17 +67,17 @@ In order to align with an organization's naming conventions, the `snapshot_meta_ By default, dbt snapshots use the following column names to track change history using [Type 2 slowly changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) records: -| Field | Meaning | Notes | -| -------------- | ------- | ----- | -| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | The value is affected by the [`strategy`](/reference/resource-configs/strategy). | -| `dbt_valid_to` | The timestamp when this row is no longer valid. | | -| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | -| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | -| `dbt_is_deleted` | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. | +| Field |
Meaning
| Notes | Example | +| -------------- | ------- | ----- | ------- | +| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | The value is affected by the [`strategy`](/reference/resource-configs/strategy). | `snapshot_meta_column_names: {dbt_valid_from: start_date}` | +| `dbt_valid_to` | The timestamp when this row is no longer valid. | | `snapshot_meta_column_names: {dbt_valid_to: end_date}` | +| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_scd_id: scd_id}` | +| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_updated_at: modified_date}` | +| `dbt_is_deleted` | A string value indicating if the record has been deleted. (`True` if deleted, `False` if not deleted). |Added when `hard_deletes='new_record'` is configured. | `snapshot_meta_column_names: {dbt_is_deleted: is_deleted}` | -However, these column names can be customized using the `snapshot_meta_column_names` config. +All of these column names can be customized using the `snapshot_meta_column_names` config. Refer to the [Example](#example) for more details. -:::warning +:::warning To avoid any unintentional data modification, dbt will **not** automatically apply any column renames. So if a user applies `snapshot_meta_column_names` config for a snapshot without updating the pre-existing table, they will get an error. We recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables prior to committing a column name change. diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 9d84e892236..30275450793 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -337,6 +337,15 @@ For dbt limitations, these dbt features are not supported: - [Model contracts](/docs/collaborate/govern/model-contracts) - [Copy grants configuration](/reference/resource-configs/snowflake-configs#copying-grants) +### Troubleshooting dynamic tables + +If your dynamic table model fails to rerun with the following error message after the initial execution: + +```sql +SnowflakeDynamicTableConfig.__init__() missing 6 required positional arguments: 'name', 'schema_name', 'database_name', 'query', 'target_lag', and 'snowflake_warehouse' +``` +Ensure that `QUOTED_IDENTIFIERS_IGNORE_CASE` on your account is set to `FALSE`. + ## Temporary tables Incremental table merges for Snowflake prefer to utilize a `view` rather than a `temporary table`. The reasoning is to avoid the database write step that a temporary table would initiate and save compile time. 
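As a quick check for the dynamic table troubleshooting note above, you can inspect and, if needed, reset the account parameter in Snowflake. This is a sketch and requires a role with account-level privileges (such as `ACCOUNTADMIN`):

```sql
-- Check the current account-level value
SHOW PARAMETERS LIKE 'QUOTED_IDENTIFIERS_IGNORE_CASE' IN ACCOUNT;

-- If it returns TRUE, an account administrator can set it back to FALSE
ALTER ACCOUNT SET QUOTED_IDENTIFIERS_IGNORE_CASE = FALSE;
```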
diff --git a/website/docs/reference/resource-configs/tags.md b/website/docs/reference/resource-configs/tags.md index 505a33a00f7..6fc8960af7c 100644 --- a/website/docs/reference/resource-configs/tags.md +++ b/website/docs/reference/resource-configs/tags.md @@ -93,6 +93,21 @@ resource_type: ``` + +To apply tags to a model in your `models/` directory, add the `config` property similar to the following example: + + + +```yaml +models: + - name: my_model + description: A model description + config: + tags: ['example_tag'] +``` + + + @@ -126,10 +141,24 @@ You can use the [`+` operator](/reference/node-selection/graph-operators#the-plu - `dbt run --select +model_name+` — Run a model, its upstream dependencies, and its downstream dependencies. - `dbt run --select tag:my_tag+ --exclude tag:exclude_tag` — Run model tagged with `my_tag` and their downstream dependencies, and exclude models tagged with `exclude_tag`, regardless of their dependencies. + +:::tip Usage notes about tags + +When using tags, consider the following: + +- Tags are additive across project hierarchy. +- Some resource types (like sources, exposures) require tags at the top level. + +Refer to [usage notes](#usage-notes) for more information. +::: + ## Examples + +The following examples show how to apply tags to resources in your project. You can configure tags in the `dbt_project.yml`, `schema.yml`, or SQL files. + ### Use tags to run parts of your project -Apply tags in your `dbt_project.yml` as a single value or a string: +Apply tags in your `dbt_project.yml` as a single value or a string. In the following example, one of the models, the `jaffle_shop` model, is tagged with `contains_pii`. @@ -153,16 +182,52 @@ models: - "published" ``` + + + +### Apply tags to models + +This section demonstrates applying tags to models in the `dbt_project.yml`, `schema.yml`, and SQL files. + +To apply tags to a model in your `dbt_project.yml` file, you would add the following: + + + +```yaml +models: + jaffle_shop: + +tags: finance # jaffle_shop model is tagged with 'finance'. +``` + + + +To apply tags to a model in your `models/` directory YAML file, you would add the following using the `config` property: + + + +```yaml +models: + - name: stg_customers + description: Customer data with basic cleaning and transformation applied, one row per customer. + config: + tags: ['santi'] # stg_customers.yml model is tagged with 'santi'. + columns: + - name: customer_id + description: The unique key for each customer. + data_tests: + - not_null + - unique +``` -You can also apply tags to individual resources using a config block: +To apply tags to a model in your SQL file, you would add the following: ```sql {{ config( - tags=["finance"] + tags=["finance"] # stg_payments.sql model is tagged with 'finance'. ) }} select ... @@ -211,14 +276,10 @@ seeds: -:::tip Upgrade to dbt Core 1.9 - -Applying tags to saved queries is only available in dbt Core versions 1.9 and later. -::: + - This following example shows how to apply a tag to a saved query in the `dbt_project.yml` file. The saved query is then tagged with `order_metrics`. 
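A minimal sketch of this kind of config (the project and saved-query names are illustrative):

```yaml
saved-queries:
  my_project:
    customer_order_metrics:   # illustrative saved query name
      +tags: order_metrics
```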
@@ -263,7 +324,6 @@ Run resources with multiple tags using the following commands: # Run all resources tagged "order_metrics" and "hourly" dbt build --select tag:order_metrics tag:hourly ``` - ## Usage notes diff --git a/website/docs/reference/resource-configs/teradata-configs.md b/website/docs/reference/resource-configs/teradata-configs.md index 89a2ff76fba..e2a5d5fddca 100644 --- a/website/docs/reference/resource-configs/teradata-configs.md +++ b/website/docs/reference/resource-configs/teradata-configs.md @@ -5,32 +5,13 @@ id: "teradata-configs" ## General -* *Set `quote_columns`* - to prevent a warning, make sure to explicitly set a value for `quote_columns` in your `dbt_project.yml`. See the [doc on quote_columns](https://docs.getdbt.com/reference/resource-configs/quote_columns) for more information. +* *Set `quote_columns`* - to prevent a warning, make sure to explicitly set a value for `quote_columns` in your `dbt_project.yml`. See the [doc on quote_columns](/reference/resource-configs/quote_columns) for more information. ```yaml seeds: +quote_columns: false #or `true` if you have csv column headers with spaces ``` -* *Enable view column types in docs* - Teradata Vantage has a dbscontrol configuration flag called `DisableQVCI`. This flag instructs the database to create `DBC.ColumnsJQV` with view column type definitions. To enable this functionality you need to: - 1. Enable QVCI mode in Vantage. Use `dbscontrol` utility and then restart Teradata. Run these commands as a privileged user on a Teradata node: - ```bash - # option 551 is DisableQVCI. Setting it to false enables QVCI. - dbscontrol << EOF - M internal 551=false - W - EOF - - # restart Teradata - tpareset -y Enable QVCI - ``` - 2. Instruct `dbt` to use `QVCI` mode. Include the following variable in your `dbt_project.yml`: - ```yaml - vars: - use_qvci: true - ``` - For example configuration, see [dbt_project.yml](https://github.com/Teradata/dbt-teradata/blob/main/test/catalog/with_qvci/dbt_project.yml) in `dbt-teradata` QVCI tests. - ## Models ### @@ -142,7 +123,7 @@ id: "teradata-configs" For details, see [CREATE TABLE documentation](https://docs.teradata.com/r/76g1CuvvQlYBjb2WPIuk3g/B6Js16DRQVwPDjgJ8rz7hg). -* `with_statistics` - should statistics be copied from the base table, e.g.: +* `with_statistics` - should statistics be copied from the base table. For example: ```yaml {{ config( @@ -289,7 +270,7 @@ For example, in the `snapshots/snapshot_example.sql` file: Grants are supported in dbt-teradata adapter with release version 1.2.0 and above. You can use grants to manage access to the datasets you're producing with dbt. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your `dbt_project.yml`, and define model-specific grants within each model's SQL or YAML file. -for e.g. : +For example: models/schema.yml ```yaml models: @@ -299,7 +280,7 @@ for e.g. : select: ['user_a', 'user_b'] ``` -Another e.g. for adding multiple grants: +Another example for adding multiple grants: ```yaml models: @@ -314,8 +295,8 @@ Another e.g. for adding multiple grants: Refer to [grants](/reference/resource-configs/grants) for more information on Grants. -## Query Band -Query Band in dbt-teradata can be set on three levels: +## Query band +Query band in dbt-teradata can be set on three levels: 1. 
Profiles level: In the `profiles.yml` file, the user can provide `query_band` using the following example:
   ```yaml
@@ -348,6 +329,11 @@ If a user sets some key-value pair with value as `'{model}'`, internally this `'
    - For example, if the model the user is running is `stg_orders`, `{model}` will be replaced with `stg_orders` in runtime.
 - If no `query_band` is set by the user, the default query_band used will be: ```org=teradata-internal-telem;appname=dbt;```
+## Unit testing
+* Unit testing is supported in dbt-teradata, allowing users to write and execute unit tests using the `dbt test` command.
+  * For detailed guidance, refer to the [dbt unit tests documentation](/docs/build/unit-tests).
+> In Teradata, reusing the same alias across multiple common table expressions (CTEs) or subqueries within a single model isn't permitted because it causes parsing errors. Assign a unique alias to each CTE or subquery to ensure proper query execution.
+
 ## valid_history incremental materialization strategy
 _This is available in early access_
@@ -361,26 +347,27 @@ In temporal databases, valid time is crucial for applications like historical re
       unique_key='id',
       on_schema_change='fail',
       incremental_strategy='valid_history',
-      valid_from='valid_from_column',
-      history_column_in_target='history_period_column'
+      valid_period='valid_period_col',
+      use_valid_to_time='no',
   ) }}
 ```
 The `valid_history` incremental strategy requires the following parameters:
-* `valid_from` — Column in the source table of **timestamp** datatype indicating when each record became valid.
-* `history_column_in_target` — Column in the target table of **period** datatype that tracks history.
+* `unique_key`: The primary key of the model (excluding the valid time components), specified as a column name or list of column names.
+* `valid_period`: Name of the model column indicating the period for which the record is considered to be valid. The datatype must be `PERIOD(DATE)` or `PERIOD(TIMESTAMP)`.
+* `use_valid_to_time`: Whether the end bound value of the valid period in the input is considered by the strategy when building the valid timeline. Use `no` if you consider your record to be valid until changed (and supply any value greater than the begin bound as the end bound of the period; a typical convention is `9999-12-31` or `9999-12-31 23:59:59.999999`). Use `yes` if you know until when the record is valid (typically this is a correction in the history timeline).

The valid_history strategy in dbt-teradata involves several critical steps to ensure the integrity and accuracy of historical data management:
* Remove duplicates and conflicting values from the source data:
   * This step ensures that the data is clean and ready for further processing by eliminating any redundant or conflicting records.
-   * The process of removing duplicates and conflicting values from the source data involves using a ranking mechanism to ensure that only the highest-priority records are retained. This is accomplished using the SQL RANK() function.
+   * Primary-key duplicates (two or more records with the same value for the `unique_key` and the BEGIN() bound of the `valid_period` fields) are removed from the dataset produced by the model. If such duplicates exist, the row with the lowest value is retained for all non-primary-key fields (in the order specified in the model). Full-row duplicates are always de-duplicated.
* Identify and adjust overlapping time slices: - * Overlapping time periods in the data are detected and corrected to maintain a consistent and non-overlapping timeline. -* Manage records needing to be overwritten or split based on the source and target data: + * Overlapping or adjacent time periods in the data are corrected to maintain a consistent and non-overlapping timeline. To achieve this, the macro adjusts the valid period end bound of a record to align with the begin bound of the next record (if they overlap or are adjacent) within the same `unique_key` group. If `use_valid_to_time = 'yes'`, the valid period end bound provided in the source data is used. Otherwise, a default end date is applied for missing bounds, and adjustments are made accordingly. +* Manage records needing to be adjusted, deleted, or split based on the source and target data: * This involves handling scenarios where records in the source data overlap with or need to replace records in the target data, ensuring that the historical timeline remains accurate. -* Utilize the TD_NORMALIZE_MEET function to compact history: - * This function helps to normalize and compact the history by merging adjacent time periods, improving the efficiency and performance of the database. +* Compact history: + * Normalize and compact the history by merging records of adjacent time periods with the same value, optimizing database storage and performance. We use the function TD_NORMALIZE_MEET for this purpose. * Delete existing overlapping records from the target table: * Before inserting new or updated records, any existing records in the target table that overlap with the new data are removed to prevent conflicts. * Insert the processed data into the target table: @@ -414,11 +401,6 @@ These steps collectively ensure that the valid_history strategy effectively mana 2 | PERIOD(TIMESTAMP)[2024-03-01 00:00:00.0, 2024-03-12 00:00:00.0] | A | x1 2 | PERIOD(TIMESTAMP)[2024-03-12 00:00:00.0, 9999-12-31 23:59:59.9999] | C | x1 ``` - - -:::info -The target table must already exist before running the model. Ensure the target table is created and properly structured with the necessary columns, including a column that tracks the history with period datatype, before running a dbt model. -::: ## Common Teradata-specific tasks * *collect statistics* - when a table is created or modified significantly, there might be a need to tell Teradata to collect statistics for the optimizer. It can be done using `COLLECT STATISTICS` command. You can perform this step using dbt's `post-hooks`, e.g.: diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md index 09122859e43..39ef7ae82d7 100644 --- a/website/docs/reference/resource-configs/updated_at.md +++ b/website/docs/reference/resource-configs/updated_at.md @@ -64,7 +64,7 @@ You will get a warning if the data type of the `updated_at` column does not matc ## Description A column within the results of your snapshot query that represents when the record row was last updated. -This parameter is **required if using the `timestamp` [strategy](/reference/resource-configs/strategy)**. +This parameter is **required if using the `timestamp` [strategy](/reference/resource-configs/strategy)**. The `updated_at` field may support ISO date strings and unix epoch integers, depending on the data platform you use. 
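For example, a minimal timestamp-strategy snapshot that relies on an `updated_at` column might look like the following sketch (the source and column names are illustrative):

```sql
{% snapshot orders_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```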
## Default diff --git a/website/docs/reference/resource-configs/where.md b/website/docs/reference/resource-configs/where.md index 63c40b99902..3ea6497423a 100644 --- a/website/docs/reference/resource-configs/where.md +++ b/website/docs/reference/resource-configs/where.md @@ -134,7 +134,9 @@ You can override this behavior by: Within this macro definition, you can reference whatever custom macros you want, based on static inputs from the configuration. At simplest, this enables you to DRY up code that you'd otherwise need to repeat across many different `.yml` files. Because the `get_where_subquery` macro is resolved at runtime, your custom macros can also include [fetching the results of introspective database queries](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query). -**Example:** Filter your test to the past three days of data, using dbt's cross-platform [`dateadd()`](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros#dateadd) utility macro. +#### Example + +Filter your test to the past N days of data, using dbt's cross-platform [`dateadd()`](/reference/dbt-jinja-functions/cross-database-macros#dateadd) utility macro. You can set the number of days in the placeholder string. @@ -147,7 +149,7 @@ models: tests: - unique: config: - where: "date_column > __three_days_ago__" # placeholder string for static config + where: "date_column > __3_days_ago__" # placeholder string for static config ``` @@ -158,10 +160,9 @@ models: {% macro get_where_subquery(relation) -%} {% set where = config.get('where') %} {% if where %} - {% if "__three_days_ago__" in where %} + {% if "_days_ago__" in where %} {# replace placeholder string with result of custom macro #} - {% set three_days_ago = dbt.dateadd('day', -3, current_timestamp()) %} - {% set where = where | replace("__three_days_ago__", three_days_ago) %} + {% set where = replace_days_ago(where) %} {% endif %} {%- set filtered -%} (select * from {{ relation }} where {{ where }}) dbt_subquery @@ -171,6 +172,21 @@ models: {% do return(relation) %} {%- endif -%} {%- endmacro %} + +{% macro replace_days_ago(where_string) %} + {# Use regex to search the pattern for the number days #} + {# Default to 3 days when no number found #} + {% set re = modules.re %} + {% set days = 3 %} + {% set pattern = '__(\d+)_days_ago__' %} + {% set match = re.search(pattern, where_string) %} + {% if match %} + {% set days = match.group(1) | int %} + {% endif %} + {% set n_days_ago = dbt.dateadd('day', -days, current_timestamp()) %} + {% set result = re.sub(pattern, n_days_ago, where_string) %} + {{ return(result) }} +{% endmacro %} ``` diff --git a/website/docs/reference/resource-properties/description.md b/website/docs/reference/resource-properties/description.md index cf7b2b29a5a..a542b9aba79 100644 --- a/website/docs/reference/resource-properties/description.md +++ b/website/docs/reference/resource-properties/description.md @@ -14,6 +14,7 @@ description: "This guide explains how to use the description key to add YAML des { label: 'Analyses', value: 'analyses', }, { label: 'Macros', value: 'macros', }, { label: 'Data tests', value: 'data_tests', }, + { label: 'Unit tests', value: 'unit_tests', }, ] }> @@ -150,24 +151,81 @@ macros: +You can add a description to a [singular data test](/docs/build/data-tests#singular-data-tests) or a [generic data test](/docs/build/data-tests#generic-data-tests). 
+ ```yml +# Singular data test example + version: 2 data_tests: - name: data_test_name description: markdown_string - ``` + + + + +```yml +# Generic data test example + +version: 2 +models: + - name: model_name + columns: + - name: column_name + tests: + - unique: + description: markdown_string +``` -The `description` property is available for generic and singular data tests beginning in dbt v1.9. +The `description` property is available for [singular data tests](/docs/build/data-tests#singular-data-tests) or [generic data tests](/docs/build/data-tests#generic-data-tests) beginning in dbt v1.9. + + + + + + + + + + + +```yml +unit_tests: + - name: unit_test_name + description: "markdown_string" + model: model_name + given: ts + - input: ref_or_source_call + rows: + - {column_name: column_value} + - {column_name: column_value} + - {column_name: column_value} + - {column_name: column_value} + - input: ref_or_source_call + format: csv + rows: dictionary | string + expect: + format: dict | csv | sql + fixture: fixture_name +``` + + + + + + + +The `description` property is available for [unit tests](/docs/build/unit-tests) beginning in dbt v1.8. @@ -176,13 +234,17 @@ The `description` property is available for generic and singular data tests begi
## Definition -A user-defined description. Can be used to document: + +A user-defined description used to document: + - a model, and model columns - sources, source tables, and source columns - seeds, and seed columns - snapshots, and snapshot columns - analyses, and analysis columns - macros, and macro arguments +- data tests, and data test columns +- unit tests for models These descriptions are used in the documentation website rendered by dbt (refer to [the documentation guide](/docs/build/documentation) or [dbt Explorer](/docs/collaborate/explore-projects)). @@ -196,6 +258,18 @@ Be mindful of YAML semantics when providing a description. If your description c ## Examples +This section contains examples of how to add descriptions to various resources: + +- [Add a simple description to a model and column](#add-a-simple-description-to-a-model-and-column)
+- [Add a multiline description to a model](#add-a-multiline-description-to-a-model)
+- [Use some markdown in a description](#use-some-markdown-in-a-description)
+- [Use a docs block in a description](#use-a-docs-block-in-a-description)
+- [Link to another model in a description](#link-to-another-model-in-a-description) +- [Include an image from your repo in your descriptions](#include-an-image-from-your-repo-in-your-descriptions)
+- [Include an image from the web in your descriptions](#include-an-image-from-the-web-in-your-descriptions)
+- [Add a description to a data test](#add-a-description-to-a-data-test)
+- [Add a description to a unit test](#add-a-description-to-a-unit-test)
+ ### Add a simple description to a model and column @@ -400,3 +474,80 @@ models: If mixing images and text, also consider using a docs block. +### Add a description to a data test + + + + + + + +You can add a `description` property to a generic or singular data test. + +#### Generic data test + +This example shows a generic data test that checks for unique values in a column for the `orders` model. + + + +```yaml +version: 2 + +models: + - name: orders + columns: + - name: order_id + tests: + - unique: + description: "The order_id is unique for every row in the orders model" +``` + + +#### Singular data test + +This example shows a singular data test that checks to ensure all values in the `payments` model are not negative (≥ 0). + + + +```yaml +version: 2 +data_tests: + - name: assert_total_payment_amount_is_positive + description: > + Refunds have a negative amount, so the total amount should always be >= 0. + Therefore return records where total amount < 0 to make the test fail. + +``` + + +Note that in order for the test to run, the `tests/assert_total_payment_amount_is_positive.sql` SQL file has to exist in the `tests` directory. + +### Add a description to a unit test + + + + + + + +This example shows a unit test that checks to ensure the `opened_at` timestamp is properly truncated to a date for the `stg_locations` model. + + + +```yaml +unit_tests: + - name: test_does_location_opened_at_trunc_to_date + description: "Check that opened_at timestamp is properly truncated to a date." + model: stg_locations + given: + - input: source('ecom', 'raw_stores') + rows: + - {id: 1, name: "Rego Park", tax_rate: 0.2, opened_at: "2016-09-01T00:00:00"} + - {id: 2, name: "Jamaica", tax_rate: 0.1, opened_at: "2079-10-27T23:59:59.9999"} + expect: + rows: + - {location_id: 1, location_name: "Rego Park", tax_rate: 0.2, opened_date: "2016-09-01"} + - {location_id: 2, location_name: "Jamaica", tax_rate: 0.1, opened_date: "2079-10-27"} +``` + + diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 018988a4934..4fcc4e8a24d 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -284,19 +284,18 @@ Snapshots can be configured in multiple ways: -1. Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [the dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher). +1. Defined in YAML files using the `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) or whichever folder you pefer. Available in [the dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks), dbt v1.9 and higher. 2. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. -1. Defined in a YAML file using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher). The latest snapshot YAML syntax provides faster and more efficient management. -2. Using a `config` block within a snapshot defined in Jinja SQL. -3. 
From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. - +1. Using a `config` block within a snapshot defined in Jinja SQL. +2. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. +3. Defined in a YAML file using the `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [the dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 and higher). -Snapshot configurations are applied hierarchically in the order above with higher taking precedence. +Snapshot configurations are applied hierarchically in the order above with higher taking precedence. You can also apply [tests](/reference/snapshot-properties) to snapshots using the [`tests` property](/reference/resource-properties/data-tests). ### Examples diff --git a/website/docs/terms/elt.md b/website/docs/terms/elt.md deleted file mode 100644 index 0e7d11bf7dd..00000000000 --- a/website/docs/terms/elt.md +++ /dev/null @@ -1,139 +0,0 @@ ---- -id: elt -title: What is ELT (Extract, Load, Transform)? -description: ELT is the process of first extraction data from different sources, then loading it into a data warehouse, and finally transforming it. -displayText: ELT -hoverSnippet: Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, loading it into a target data warehouse, and finally transforming it. ---- - - What is ELT (Extract, Load, Transform)? How does it differ from ETL? - -Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, then loading it into a target , and finally transforming it. - -ELT has emerged as a paradigm for how to manage information flows in a modern data warehouse. This represents a fundamental shift from how data previously was handled when Extract, Transform, Load (ETL) was the data workflow most companies implemented. - -Transitioning from ETL to ELT means that you no longer have to capture your transformations during the initial loading of the data into your data warehouse. Rather, you are able to load all of your data, then build transformations on top of it. Data teams report that the ELT workflow has several advantages over the traditional ETL workflow which we’ll go over [in-depth later in this glossary](#benefits-of-elt). - -## How ELT works - -In an ELT process, data is extracted from data sources, loaded into a target data platform, and finally transformed for analytics use. We’ll go over the three components (extract, load, transform) in detail here. - -![Diagram depicting the ELT workflow. Data is depicted being extracted from example data sources like an Email CRM, Facebook Ads platform, Backend databases, and Netsuite. The data is then loaded as raw data into a data warehouse. From there, the data is transformed within the warehouse by renaming, casting, joining, or enriching the raw data. The result is then modeled data inside your data warehouse.](/img/docs/terms/elt/elt-diagram.png) - -### Extract - -In the extraction process, data is extracted from multiple data sources. The data extracted is, for the most part, data that teams eventually want to use for analytics work. 
Some examples of data sources can include: - -- Backend application databases -- Marketing platforms -- Email and sales CRMs -- and more! - -Accessing these data sources using Application Programming Interface (API) calls can be a challenge for individuals and teams who don't have the technical expertise or resources to create their own scripts and automated processes. However, the recent development of certain open-source and Software as a Service (SaaS) products has removed the need for this custom development work. By establishing the option to create and manage pipelines in an automated way, you can extract the data from data sources and load it into data warehouses via a user interface. - -Since not every data source will integrate with SaaS tools for extraction and loading, it’s sometimes inevitable that teams will write custom ingestion scripts in addition to their SaaS tools. - -### Load - -During the loading stage, data that was extracted is loaded into the target data warehouse. Some examples of modern data warehouses include Snowflake, Amazon Redshift, and Google BigQuery. Examples of other data storage platforms include data lakes such as Databricks’s Data Lakes. Most of the SaaS applications that extract data from your data sources will also load it into your target data warehouse. Custom or in-house extraction and load processes usually require strong data engineering and technical skills. - -At this point in the ELT process, the data is mostly unchanged from its point of extraction. If you use an extraction and loading tool like Fivetran, there may have been some light normalization on your data. But for all intents and purposes, the data loaded into your data warehouse at this stage is in its raw format. - -### Transform - -In the final transformation step, the raw data that has been loaded into your data warehouse is finally ready for modeling! When you first look at this data, you may notice a few things about it… - -- Column names may or may not be clear -- Some columns are potentially the incorrect data type -- Tables are not joined to other tables -- Timestamps may be in the incorrect timezone for your reporting -- fields may need to be unnested -- Tables may be missing primary keys -- And more! - -...hence the need for transformation! During the transformation process, data from your data sources is usually: - -- **Lightly Transformed**: Fields are cast correctly, timestamp fields’ timezones are made uniform, tables and fields are renamed appropriately, and more. -- **Heavily Transformed**: Business logic is added, appropriate materializations are established, data is joined together, etc. -- **QA’d**: Data is tested according to business standards. In this step, data teams may ensure primary keys are unique, model relations match-up, column values are appropriate, and more. - -Common ways to transform your data include leveraging modern technologies such as dbt, writing custom SQL scripts that are automated by a scheduler, utilizing stored procedures, and more. - -## ELT vs ETL - -The primary difference between the traditional ETL and the modern ELT workflow is when [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) and loading take place. In ETL workflows, data extracted from data sources is transformed prior to being loaded into target data platforms. Newer ELT workflows have data being transformed after being loaded into the data platform of choice. Why is this such a big deal? 
- -| | ELT | ETL | -|---|---|---| -| Programming skills required| Often little to no code to extract and load data into your data warehouse. | Often requires custom scripts or considerable data engineering lift to extract and transform data prior to load. | -| Separation of concerns | Extraction, load, and transformation layers can be explicitly separated out by different products. | ETL processes are often encapsulated in one product. | -| Distribution of transformations | Since transformations take place last, there is greater flexibility in the modeling process. Worry first about getting your data in one place, then you have time to explore the data to understand the best way to transform it. | Because transformation occurs before data is loaded into the target location, teams must conduct thorough work prior to make sure data is transformed properly. Heavy transformations often take place downstream in the BI layer. | -| [Data team distribution](https://www.getdbt.com/data-teams/analytics-job-descriptions/) | ELT workflows empower data team members who know SQL to create their own extraction and loading pipelines and transformations. | ETL workflows often require teams with greater technical skill to create and maintain pipelines. | - -Why has ELT adoption grown so quickly in recent years? A few reasons: - -- **The abundance of cheap cloud storage with modern data warehouses.** The creation of modern data warehouses such Redshift and Snowflake has made it so teams of all sizes can store and scale their data at a more efficient cost. This was a huge enabler for the ELT workflow. -- **The development of low-code or no-code data extractors and loaders.** Products that require little technical expertise such as Fivetran and Stitch, which can extract data from many data sources and load it into many different data warehouses, have helped lower the barrier of entry to the ELT workflow. Data teams can now relieve some of the data engineering lift needed to extract data and create complex transformations. -- **A true code-based, version-controlled transformation layer with the development of dbt.** Prior to the development of dbt, there was no singular transformation layer product. dbt helps data analysts apply software engineering best practices (version control, CI/CD, and testing) to data transformation, ultimately allowing for anyone who knows SQL to be a part of the ELT process. -- **Increased compatibility between ELT layers and technology in recent years.** With the expansion of extraction, loading, and transformation layers that integrate closely together and with cloud storage, the ELT workflow has never been more accessible. For example, Fivetran creates and maintains [dbt packages](https://hub.getdbt.com/) to help write dbt transformations for the data sources they connect to. - -## Benefits of ELT - -You often hear about the benefits of the ELT workflow to data, but you can sometimes forget to talk about the benefits it brings to people. There are a variety of benefits that this workflow brings to the actual data (which we’ll outline in detail below), such as the ability to recreate historical transformations, test data and data models, and more. We'll also want to use this section to emphasize the empowerment the ELT workflow brings to both data team members and business stakeholders. - -### ELT benefit #1: Data as code - -Ok we said it earlier: The ELT workflow allows data teams to function like software engineers. But what does this really mean? 
How does it actually impact your data? - -#### Analytics code can now follow the same best practices as software code - -At its core, data transformations that occur last in a data pipeline allow for code-based and version-controlled transformations. These two factors alone permit data team members to: - -- Easily recreate historical transformations by rolling back commits -- Establish code-based tests -- Implement CI/CD workflows -- Document data models like typical software code. - -#### Scaling, made sustainable - -As your business grows, the number of data sources correspondingly increases along with it. As such, so do the number of transformations and models needed for your business. Managing a high number of transformations without version control or automation is not scalable. - -The ELT workflow capitalizes on transformations occurring last to provide flexibility and software engineering best practices to data transformation. Instead of having to worry about how your extraction scripts scale as your data increases, data can be extracted and loaded automatically with a few clicks. - -### ELT benefit #2: Bring the power to the people - -The ELT workflow opens up a world of opportunity for the people that work on that data, not just the data itself. - -#### Empowers data team members - -Data analysts, analytics engineers, and even data scientists no longer have to be dependent on data engineers to create custom pipelines and models. Instead, they can use point-and-click products such as Fivetran and Airbyte to extract and load the data for them. - -Having the transformation as the final step in the ELT workflow also allows data folks to leverage their understanding of the data and SQL to focus more on actually modeling the data. - -#### Promotes greater transparency for end busines users - -Data teams can expose the version-controlled code used to transform data for analytics to end business users by no longer having transformations hidden in the ETL process. Instead of having to manually respond to the common question, “How is this data generated?” data folks can direct business users to documentation and repositories. Having end business users involved or viewing the data transformations promote greater collaboration and awareness between business and data folks. - -## ELT tools - -As mentioned earlier, the recent development of certain technologies and products has helped lower the barrier of entry to implementing the ELT workflow. Most of these new products act as one or two parts of the ELT process, but some have crossover across all three parts. We’ll outline some of the current tools in the ELT ecosystem below. - -| Product | E/L/T? | Description | Open source option? | -|---|---|---|---| -| Fivetran/HVR | E, some T, L | Fivetran is a SaaS company that helps data teams extract, load, and perform some transformation on their data. Fivetran easily integrates with modern data warehouses and dbt. They also offer transformations that leverage dbt Core. | :x: | -| Stitch by Talend | E, L | Stitch (part of Talend) is another SaaS product that has many data connectors to extract data and load it into data warehouses. | :x: | -| Airbyte | E, L | Airbyte is an open-source and cloud service that allows teams to create data extraction and load pipelines. | :white_check_mark: | -| Funnel | E, some T, L | Funnel is another product that can extract and load data. Funnel’s data connectors are primarily focused around marketing data sources. 
| :x: |
-| dbt | T | dbt is the transformation tool that enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. dbt offers both an open-source and cloud-based product. | :white_check_mark: |
-
-## Conclusion
-
-The past few years have been a whirlwind for the data world. The increased accessibility and affordability of cloud warehouses, no-code data extractors and loaders, and a true transformation layer with dbt have allowed the ELT workflow to become the preferred analytics workflow. ETL predates ELT and differs in when data is transformed. In both processes, data is first extracted from different sources. However, in ELT processes, data is loaded into the target data platform and then transformed. The ELT workflow ultimately allows for data team members to extract, load, and model their own data in a flexible, accessible, and scalable way.
-
-## Further reading
-
-Here's some of our favorite content about the ELT workflow:
-
-- [The case for the ELT workflow](https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/)
-- [A love letter to ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/)
-- [What is dbt?](https://getdbt.com/product/what-is-dbt/)
diff --git a/website/docs/terms/etl.md b/website/docs/terms/etl.md
deleted file mode 100644
index 321f59a65d0..00000000000
--- a/website/docs/terms/etl.md
+++ /dev/null
@@ -1,130 +0,0 @@
----
-id: etl
-title: What is ETL (Extract, Transform, Load)?
-description: ETL is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
-displayText: ETL
-hoverSnippet: Extract, Transform, Load (ETL) is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
----
-
- What is ETL (Extract, Transform, Load)? How has it evolved?
-
-
-ETL, or “Extract, Transform, Load”, is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse. In ETL workflows, much of the meaningful [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) occurs outside this primary pipeline in a downstream business intelligence (BI) platform.
-
-ETL is contrasted with the newer ELT (Extract, Load, Transform) workflow, where transformation occurs after data has been loaded into the target data warehouse. In many ways, the ETL workflow could have been renamed the ETLT workflow, because a considerable portion of meaningful data transformations happen outside the data pipeline. The same transformations can occur in both ETL and ELT workflows; the primary difference is *when* (inside or outside the primary ETL workflow) and *where* the data is transformed (ETL platform/BI tool/data warehouse).
-
-It’s important to talk about ETL and understand how it works, where it provides value, and how it can hold people back. If you don’t talk about the benefits and drawbacks of systems, how can you expect to improve them?
-
-## How ETL works
-
-In an ETL process, data is first extracted from a source, transformed, and then loaded into a target data platform. We’ll go into greater depth for all three steps below.
-
-![A diagram depicting the ETL workflow. The diagram starts by depicting raw data being extracted from various example data sources like an email CRM, Facebook Ads platform, a backend database, and Netsuite. 
Once the data is extracted, the raw data is transformed within the data pipeline via renaming, casting, joining, and enriching. After the data is transformed within the data pipeline, the modeled data is loaded into a data warehouse.](/img/docs/terms/etl/etl-diagram.png) - -### Extract - -In this first step, data is extracted from different data sources. Data that is extracted at this stage is likely going to be eventually used by end business users to make decisions. Some examples of these data sources include: - -- Ad platforms (Facebook Ads, Google Ads, etc.) -- Backend application databases -- Sales CRMs -- And more! - -To actually get this data, data engineers may write custom scripts that make Application Programming Interface (API) calls to extract all the relevant data. Because making and automating these API calls gets harder as data sources and data volume grows, this method of extraction often requires strong technical skills. In addition, these extraction scripts also involve considerable maintenance since APIs change relatively often. Data engineers are often incredibly competent at using different programming languages such as Python and Java. Data teams can also extract from these data sources with open source and Software as a Service (SaaS) products. - -### Transform - -At this stage, the raw data that has been extracted is normalized and modeled. In ETL workflows, much of the actual meaningful business logic, metric calculations, and entity joins tend to happen further down in a downstream BI platform. As a result, the transformation stage here is focused on data cleanup and normalization – renaming of columns, correct casting of fields, timestamp conversions. - -To actually transform the data, there’s two primary methods teams will use: - -- **Custom solutions**: In this solution, data teams (typically data engineers on the team), will write custom scripts and create automated pipelines to transform the data. Unlike ELT transformations that typically use SQL for modeling, ETL transformations are often written in other programming languages such as Python or Scala. Data engineers may leverage technologies such as Apache Spark or Hadoop at this point to help process large volumes of data. -- **ETL products**: There are ETL products that will extract, transform, and load your data in one platform. [These tools](#etl-tools) often involve little to no code and instead use Graphical User Interfaces (GUI) to create pipelines and transformations. - -### Load - -In the final stage, the transformed data is loaded into your target data warehouse. Once this transformed data is in its final destination, it’s most commonly exposed to end business users either in a BI tool or in the data warehouse directly. - -The ETL workflow implies that your raw data does not live in your data warehouse. *Because transformations occur before load, only transformed data lives in your data warehouse in the ETL process.* This can make it harder to ensure that transformations are performing the correct functionality. - -## How ETL is being used - -While ELT adoption is growing, we still see ETL use cases for processing large volumes of data and adhering to strong data governance principles. - -### ETL to efficiently normalize large volumes of data - -ETL can be an efficient way to perform simple normalizations across large data sets. Doing these lighter transformations across a large volume of data during loading can help get the data formatted properly and quickly for downstream use. 
In addition, end business users sometimes need quick access to raw or somewhat normalized data. Through an ETL workflow, data teams can conduct lightweight transformations on data sources and quickly expose them in their target data warehouse and downstream BI tool. - -### ETL for hashing PII prior to load - -Some companies will want to mask, hash, or remove PII values before it enters their data warehouse. In an ETL workflow, teams can transform PII to hashed values or remove them completely during the loading process. This limits where PII is available or accessible in an organization’s data warehouse. - -## ETL challenges - -There are reasons ETL has persisted as a workflow for over twenty years. However, there are also reasons why there’s been such immense innovation in this part of the data world in the past decade. From our perspective, the technical and human limitations we describe below are some of the reasons ELT has surpassed ETL as the preferred workflow. - -### ETL challenge #1: Technical limitations - -**Limited or lack of version control** - -When transformations exist as standalone scripts or deeply woven in ETL products, it can be hard to version control the transformations. Not having version control on transformation as code means that data teams can’t easily recreate or rollback historical transformations and perform code reviews. - -**Immense amount of business logic living in BI tools** - -Some teams with ETL workflows only implement much of their business logic in their BI platform versus earlier in their transformation phase. While most organizations have some business logic in their BI tools, an excess of this logic downstream can make rendering data in the BI tool incredibly slow and potentially hard to track if the code in the BI tool is not version controlled or exposed in documentation. - -**Challenging QA processes** - -While data quality testing can be done in ETL processes, not having the raw data living somewhere in the data warehouse inevitably makes it harder to ensure data models are performing the correct functionality. In addition, quality control continually gets harder as the number of data sources and pipelines within your system grows. - -### ETL challenge #2: Human limitations - -**Data analysts can be excluded from ETL work** - -Because ETL workflows often involve incredibly technical processes, they've restricted data analysts from being involved in the data workflow process. One of the greatest strengths of data analysts is their knowledge of the data and SQL, and when extractions and transformations involve unfamiliar code or applications, they and their expertise can be left out of the process. Data analysts and scientists also become dependent on other people to create the schemas, tables, and datasets they need for their work. - -**Business users are kept in the dark** - -Transformations and business logic can often be buried deep in custom scripts, ETL tools, and BI platforms. At the end of the day, this can hurt business users: They're kept out of the data modeling process and have limited views into how data transformation takes place. As a result, end business users often have little clarity on data definition, quality, and freshness, which ultimately can decrease trust in the data and data team. - -## ETL vs ELT - -You may read other articles or technical documents that use ETL and ELT interchangeably. On paper, the only difference is the order in which the T and the L appear. 
However, this mere switching of letters dramatically changes the way data exists in and flows through a business’ system.
-
-In both processes, data from different data sources is extracted in similar ways. However, in ELT, data is loaded directly into the target data platform, whereas in ETL it is transformed before being loaded. Now, via ELT workflows, both raw and transformed data can live in a data warehouse. In ELT workflows, data folks have the flexibility to model the data after they’ve had the opportunity to explore and analyze the raw data. ETL workflows can be more constraining since transformations happen immediately after extraction. We break down some of the other major differences between the two below:
-
-| | ELT | ETL |
-|---|---|---|
-| Programming skills required | Often requires little to no code to extract and load data into your data warehouse. | Often requires custom scripts or considerable data engineering lift to extract and transform data prior to load. |
-| Separation of concerns | Extraction, load, and transformation layers can be explicitly separated out by different products. | ETL processes are often encapsulated in one product. |
-| Distribution of transformations | Since transformations take place last, there is greater flexibility in the modeling process. Worry first about getting your data in one place; then you have time to explore the data to understand the best way to transform it. | Because transformation occurs before data is loaded into the target location, teams must conduct thorough work beforehand to make sure data is transformed properly. Heavy transformations often take place downstream in the BI layer. |
-| [Data team roles](https://www.getdbt.com/data-teams/analytics-job-descriptions/) | ELT workflows empower data team members who know SQL to create their own extraction and loading pipelines and transformations. | ETL workflows often require teams with greater technical skill to create and maintain pipelines. |
-
-While ELT is growing in adoption, it’s still important to talk about when ETL might be appropriate and where you’ll see challenges with the ETL workflow.
-
-## ETL tools
-
-There are a variety of ETL technologies to help teams get data into their data warehouse. A good portion of ETL tools on the market today are geared toward enterprise businesses and teams, but there are some that are also applicable for smaller organizations.
-
-| Platform | E/T/L? | Description | Open source option? |
-|---|---|---|---|
-| Informatica | E, T, L | An all-purpose ETL platform that supports low or no-code extraction, transformations and loading. Informatica also offers a broad suite of data management solutions beyond ETL and is often leveraged by enterprise organizations. | :x: |
-| Integrate.io | E, T, L | A newer ETL product focused on both low-code ETL as well as reverse ETL pipelines. | :x: |
-| Matillion | E, T, L | Matillion is an end-to-end ETL solution with a variety of native data connectors and GUI-based transformations. | :x: |
-| Microsoft SSIS | E, T, L | Microsoft’s SQL Server Integration Services (SSIS) offers a robust, GUI-based platform for ETL services. SSIS is often used by larger enterprise teams. | :x: |
-| Talend Open Studio | E, T, L | An open source suite of GUI-based ETL tools. | :white_check_mark: |
-
-## Conclusion
-
-ETL, or “Extract, Transform, Load,” is the process of extracting data from different data sources, transforming it, and loading that transformed data into a data warehouse. 
ETL typically supports lighter transformations prior to loading, with more meaningful transformations taking place in downstream BI tools. We’re seeing now that ETL is fading out and the newer ELT workflow is replacing it as a practice for many data teams. However, it’s important to note that ETL got us to where we are today: Capable of building workflows that extract data within simple UIs, store data in scalable cloud data warehouses, and write data transformations like software engineers.
-
-## Further Reading
-
-Please check out some of our favorite reads regarding ETL and ELT below:
-
-- [Glossary: ELT](https://docs.getdbt.com/terms/elt)
-- [The case for the ELT workflow](https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/)
-- [A love letter to ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/)
-- [Reverse ETL](https://www.getdbt.com/analytics-engineering/use-cases/operational-analytics/)
-
diff --git a/website/docs/terms/reverse-etl.md b/website/docs/terms/reverse-etl.md
deleted file mode 100644
index a3ccd0b0f70..00000000000
--- a/website/docs/terms/reverse-etl.md
+++ /dev/null
@@ -1,94 +0,0 @@
----
-id: reverse-etl
-title: Reverse ETL
-description: Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
-displayText: reverse ETL
-hoverSnippet: Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
----
-
- Reverse ETL, demystified: What it is in plain English
-
-
-Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms. Once in an end platform, that data is often used to drive meaningful business actions, such as creating custom audiences in ad platforms, personalizing email campaigns, or supplementing data in a sales CRM. You may also hear about reverse ETL referred to as operational analytics or data activation.
-
-Reverse ETL efforts typically happen after data teams have set up their [modern data stack](https://www.getdbt.com/blog/future-of-the-modern-data-stack/) and ultimately have a consistent and automated way to extract, load, and transform data. Data teams are also often responsible for setting up the pipelines to send down data to business platforms, and business users are typically responsible for *using the data* once it gets to their end platform.
-
-Ultimately, reverse ETL is a way to put data where the work is already happening, support self-service efforts, and help business users derive real action out of their data.
-
-## How reverse ETL works
-
-In the reverse ETL process, transformed data is synced from a data warehouse to external tools in order to be leveraged by different business teams.
-
-![A diagram depicting how the reverse ETL process works. It starts with data being extracted from data sources like email CRMs, Facebook Ad platforms, backend databases, and NetSuite. The raw data is then loaded into a data warehouse. After loading, the data is transformed and modeled. 
The modeled data is then loaded directly back into the tools that created the data, like Email CRMs, Facebook Ad platforms, and others so the insights are more accessible to business users.](/img/docs/terms/reverse-etl/reverse-etl-diagram.png) - -The power of reverse ETL comes from sending down *already transformed data* to business platforms. Raw data, while beautiful in its own way, typically lacks the structure, aggregations, and aliasing to be useful for end business users off the bat. After data teams transform data for business use in pipelines, typically to expose in an end business intelligence (BI) tool, they can also send this cleaned and meaningful data to other platforms where business users can derive value using [reverse ETL tools](#reverse-etl-tools). - -Data teams can choose to write additional transformations that may need to happen for end business tools in reverse ETL tools themselves or by creating [additional models in dbt](https://getdbt.com/open-source-data-culture/reverse-etl-playbook/). - -## Why use reverse ETL? - -There’s a few reasons why your team may want to consider using reverse ETL: - -### Putting data where the work is happening - -While most data teams would love it if business users spent a significant portion of their time in their BI tool, that’s neither practical nor necessarily the most efficient use of their time. In the real world, many business users will spend some time in a BI tool, identify the data that could be useful in a platform they spend a significant amount of time in, and work with the data team to get that data where they need it. Users feel comfortable and confident in the systems they use everyday—why not put the data in the places that allow them to thrive? - -### Manipulating data to fit end platform requirements - -Reverse ETL helps you to put data your business users need *in the format their end tool expects*. Oftentimes, end platforms expect data fields to be named or cast in a certain way. Instead of business users having to manually input those values in the correct format, you can transform your data using a product like dbt or directly in a reverse ETL tool itself, and sync down that data in an automated way. - -### Supporting self-service efforts - -By sending down data-team approved data in reverse ETL pipelines, your business users have the flexibility to use that data however they see fit. Soon, your business users will be making audiences, testing personalization efforts, and running their end platform like a well-oiled, data-powered machine. - - -## Reverse ETL use cases - -Just as there are almost endless opportunities with data, there are many potential different use cases for reverse ETL. We won’t go into every possible option, but we’ll cover some of the common use cases that exist for reverse ETL efforts. - -### Personalization - -Reverse ETL allows business users to access data that they normally would only have access to in a BI tool *in the platforms they use every day*. As a result, business users can now use this data to personalize how they create ads, send emails, and communicate with customers. - -Personalization was all the hype a few years ago and now, you rarely ever see an email come into your inbox without some sort of personalization in-place. 
Data teams using reverse ETL are able to pass down important customer information, such as location, customer lifetime value (CLV), tenure, and other fields that can be used to create personalized emails, establish appropriate messaging, and segment email flows. All we can say: the possibilities for personalization powered by reverse ETL are endless.
-
-### Sophisticated paid marketing initiatives
-
-At the end of the day, businesses want to serve the right ads to the right people (and at the right cost). A common use case for reverse ETL is for teams to use their customer data to create audiences in ad platforms to either serve specific audiences or create lookalikes. While ad platforms have gotten increasingly sophisticated with their algorithms to identify high-value audiences, it rarely hurts to try supplementing those audiences with your own data to create sophisticated audiences or lookalikes.
-
-### Self-service analytics culture
-
-We hinted at it earlier, but reverse ETL efforts can be an effective way to promote a self-service analytics culture. When data teams put the data where business users need it, business users can confidently access it on their own, driving even faster insights and action. Instead of requesting a data pull from a data team member, they can find the data they need directly within the platform that they use. Reverse ETL allows business users to act on metrics that have already been built out and validated by data teams without creating ad-hoc requests.
-
-### “Real-time” data
-
-It would be remiss if we didn’t mention reverse ETL and the notion of “real-time” data. While the debate over the meaningfulness and true value-add of real-time data can be saved for another time, reverse ETL can be a mechanism to bring data to end business platforms in a more “real-time” way.
-
-Data teams can set up syncs in reverse ETL tools at higher cadences, allowing business users to have the data they need, faster. Obviously, there’s some cost-benefit analysis on how often you want to be loading data via [ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/) and hitting your data warehouse, but reverse ETL can help move data into external tools at a quicker cadence if deemed necessary.
-
-All this to say: move with caution in the realm of “real-time”, understand your stakeholders’ wants and decision-making process for real-time data, and work towards a solution that’s both practical and impactful.
-
-## Reverse ETL tools
-
-Reverse ETL tools typically establish the connection between your data warehouse and end business tools, offer an interface to create additional transformations or audiences, and support automation of downstream syncs. Below are some examples of tools that support reverse ETL pipelines.
-
-| Tool | Description | Open source option? |
-|:---:|:---:|:---:|
-| Hightouch | A platform to sync data models and create custom audiences for downstream business platforms. | :x: |
-| Polytomic | A unified sync platform for syncing to and from data warehouses (ETL and Reverse ETL), databases, business apps, APIs, and spreadsheets. | :x: |
-| Census | Another reverse ETL tool that can sync data from your data warehouse to your go-to-market tools. | :x: |
-| Rudderstack | Also a CDP (customer data platform), Rudderstack additionally supports pushing down data and audiences to external tools, such as ad platforms and email CRMs. 
| :white_check_mark: | -| Grouparoo | Grouparoo, part of Airbyte, is an open source framework to move data from data warehouses to different cloud-based tools. | :white_check_mark: | - -## Conclusion - -Reverse ETL enables you to sync your transformed data stored in your data warehouse to external platforms often used by marketing, sales, and product teams. It allows you to leverage your data in a whole new way. Reverse ETL pipelines can support personalization efforts, sophisticated paid marketing initiatives, and ultimately offer new ways to leverage your data. In doing this, it creates a self-service analytics culture where stakeholders can receive the data they need in, in the places they need, in an automated way. - -## Further reading - -If you’re interested learning more about reverse ETL and the impact it could have on your team, check out the following: - -- [How dbt Labs’s data team approaches reverse ETL](https://getdbt.com/open-source-data-culture/reverse-etl-playbook/) -- [The operational data warehouse in action: Reverse ETL, CDPs, and the future of data activation](https://www.getdbt.com/coalesce-2021/operational-data-warehouse-reverse-etl-cdp-data-activation/) -- [The analytics engineering guide: Operational analytics](https://www.getdbt.com/analytics-engineering/use-cases/operational-analytics/) diff --git a/website/sidebars.js b/website/sidebars.js index 3a8f560c297..69da75833a7 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -26,11 +26,12 @@ const sidebarSettings = { label: "About dbt Cloud", link: { type: "doc", id: "docs/cloud/about-cloud/dbt-cloud-features" }, items: [ - "docs/cloud/about-cloud/dbt-cloud-features", "docs/cloud/about-cloud/architecture", - "docs/cloud/about-cloud/tenancy", - "docs/cloud/about-cloud/access-regions-ip-addresses", "docs/cloud/about-cloud/browsers", + "docs/cloud/about-cloud/change-your-dbt-cloud-theme", + "docs/cloud/about-cloud/dbt-cloud-features", + "docs/cloud/about-cloud/access-regions-ip-addresses", + "docs/cloud/about-cloud/tenancy", ], }, // About dbt Cloud directory { @@ -289,9 +290,9 @@ const sidebarSettings = { items: [ "docs/cloud/dbt-cloud-ide/develop-in-the-cloud", "docs/cloud/dbt-cloud-ide/keyboard-shortcuts", - "docs/cloud/dbt-cloud-ide/ide-user-interface", - "docs/cloud/dbt-cloud-ide/lint-format", "docs/cloud/dbt-cloud-ide/git-commit-signing", + "docs/cloud/dbt-cloud-ide/lint-format", + "docs/cloud/dbt-cloud-ide/ide-user-interface", { type: "category", label: "dbt Copilot", @@ -366,9 +367,9 @@ const sidebarSettings = { items: [ "docs/build/about-metricflow", "docs/build/join-logic", - "docs/build/validation", "docs/build/metricflow-time-spine", "docs/build/metricflow-commands", + "docs/build/validation", ], }, { @@ -438,10 +439,10 @@ const sidebarSettings = { link: { type: "doc", id: "docs/build/enhance-your-code" }, items: [ "docs/build/enhance-your-code", - "docs/build/project-variables", "docs/build/environment-variables", - "docs/build/packages", "docs/build/hooks-operations", + "docs/build/packages", + "docs/build/project-variables", ], }, { @@ -500,13 +501,13 @@ const sidebarSettings = { link: { type: "doc", id: "docs/deploy/monitor-jobs" }, items: [ "docs/deploy/monitor-jobs", - "docs/deploy/run-visibility", - "docs/deploy/retry-jobs", + "docs/deploy/artifacts", "docs/deploy/job-notifications", "docs/deploy/model-notifications", - "docs/deploy/webhooks", - "docs/deploy/artifacts", + "docs/deploy/run-visibility", + "docs/deploy/retry-jobs", "docs/deploy/source-freshness", + "docs/deploy/webhooks", ], }, 
"docs/deploy/deployment-tools", @@ -524,12 +525,12 @@ const sidebarSettings = { link: { type: "doc", id: "docs/collaborate/explore-projects" }, items: [ "docs/collaborate/explore-projects", - "docs/collaborate/data-health-signals", "docs/collaborate/access-from-dbt-cloud", "docs/collaborate/column-level-lineage", + "docs/collaborate/data-health-signals", + "docs/collaborate/explore-multiple-projects", "docs/collaborate/model-performance", "docs/collaborate/project-recommendations", - "docs/collaborate/explore-multiple-projects", "docs/collaborate/dbt-explorer-faqs", { type: "category", @@ -729,8 +730,8 @@ const sidebarSettings = { link: { type: "doc", id: "docs/dbt-cloud-apis/sl-api-overview" }, items: [ "docs/dbt-cloud-apis/sl-api-overview", - "docs/dbt-cloud-apis/sl-jdbc", "docs/dbt-cloud-apis/sl-graphql", + "docs/dbt-cloud-apis/sl-jdbc", "docs/dbt-cloud-apis/sl-python", ], }, @@ -809,6 +810,7 @@ const sidebarSettings = { items: [ "docs/dbt-versions/dbt-cloud-release-notes", "docs/dbt-versions/compatible-track-changelog", + "docs/dbt-versions/2024-release-notes", "docs/dbt-versions/2023-release-notes", "docs/dbt-versions/2022-release-notes", { @@ -851,18 +853,18 @@ const sidebarSettings = { "reference/project-configs/asset-paths", "reference/project-configs/clean-targets", "reference/project-configs/config-version", - "reference/project-configs/seed-paths", "reference/project-configs/dispatch-config", "reference/project-configs/docs-paths", "reference/project-configs/macro-paths", - "reference/project-configs/packages-install-path", "reference/project-configs/name", "reference/project-configs/on-run-start-on-run-end", + "reference/project-configs/packages-install-path", "reference/project-configs/profile", "reference/project-configs/query-comment", "reference/project-configs/quoting", "reference/project-configs/require-dbt-version", "reference/project-configs/snapshot-paths", + "reference/project-configs/seed-paths", "reference/project-configs/model-paths", "reference/project-configs/test-paths", "reference/project-configs/version", @@ -926,27 +928,27 @@ const sidebarSettings = { type: "category", label: "General configs", items: [ + "reference/advanced-config-usage", "reference/resource-configs/access", "reference/resource-configs/alias", "reference/resource-configs/batch-size", "reference/resource-configs/begin", + "reference/resource-configs/contract", "reference/resource-configs/database", + "reference/resource-configs/docs", "reference/resource-configs/enabled", "reference/resource-configs/event-time", "reference/resource-configs/full_refresh", - "reference/resource-configs/contract", "reference/resource-configs/grants", "reference/resource-configs/group", - "reference/resource-configs/docs", "reference/resource-configs/lookback", + "reference/resource-configs/meta", "reference/resource-configs/persist_docs", + "reference/resource-configs/plus-prefix", "reference/resource-configs/pre-hook-post-hook", "reference/resource-configs/schema", "reference/resource-configs/tags", "reference/resource-configs/unique_key", - "reference/resource-configs/meta", - "reference/advanced-config-usage", - "reference/resource-configs/plus-prefix", ], }, { @@ -956,10 +958,10 @@ const sidebarSettings = { "reference/model-properties", "reference/resource-properties/model_name", "reference/model-configs", + "reference/resource-properties/concurrent_batches", "reference/resource-configs/materialized", "reference/resource-configs/on_configuration_change", "reference/resource-configs/sql_header", - 
"reference/resource-properties/concurrent_batches", ], }, { @@ -1010,10 +1012,10 @@ const sidebarSettings = { items: [ "reference/resource-properties/unit-tests", "reference/resource-properties/unit-test-input", - "reference/resource-properties/unit-testing-versions", - "reference/resource-properties/unit-test-overrides", "reference/resource-properties/data-formats", "reference/resource-properties/data-types", + "reference/resource-properties/unit-testing-versions", + "reference/resource-properties/unit-test-overrides", ], }, { @@ -1089,15 +1091,15 @@ const sidebarSettings = { label: "Node selection", items: [ "reference/node-selection/syntax", + "reference/node-selection/exclude", + "reference/node-selection/defer", "reference/node-selection/graph-operators", "reference/node-selection/set-operators", - "reference/node-selection/exclude", "reference/node-selection/methods", "reference/node-selection/putting-it-together", + "reference/node-selection/state-comparison-caveats", "reference/node-selection/yaml-selectors", "reference/node-selection/test-selection-examples", - "reference/node-selection/defer", - "reference/node-selection/state-comparison-caveats", ], }, { @@ -1115,8 +1117,8 @@ const sidebarSettings = { link: { type: "doc", id: "reference/global-configs/adapter-behavior-changes" }, items: [ "reference/global-configs/adapter-behavior-changes", - "reference/global-configs/databricks-changes", "reference/global-configs/redshift-changes", + "reference/global-configs/databricks-changes", ], }, { @@ -1132,6 +1134,8 @@ const sidebarSettings = { type: "category", label: "Available flags", items: [ + "reference/global-configs/usage-stats", + "reference/global-configs/version-compatibility", "reference/global-configs/logs", "reference/global-configs/cache", "reference/global-configs/failing-fast", @@ -1141,8 +1145,6 @@ const sidebarSettings = { "reference/global-configs/print-output", "reference/global-configs/record-timing-info", "reference/global-configs/resource-type", - "reference/global-configs/usage-stats", - "reference/global-configs/version-compatibility", "reference/global-configs/warnings", ], }, @@ -1183,9 +1185,9 @@ const sidebarSettings = { label: "dbt Artifacts", items: [ "reference/artifacts/dbt-artifacts", + "reference/artifacts/catalog-json", "reference/artifacts/manifest-json", "reference/artifacts/run-results-json", - "reference/artifacts/catalog-json", "reference/artifacts/sources-json", "reference/artifacts/sl-manifest", "reference/artifacts/other-artifacts", diff --git a/website/snippets/_cloud-environments-info.md b/website/snippets/_cloud-environments-info.md index 6d202d01998..cc153cf38a8 100644 --- a/website/snippets/_cloud-environments-info.md +++ b/website/snippets/_cloud-environments-info.md @@ -8,12 +8,12 @@ In dbt Cloud, there are two types of environments: - Production - **Development environment** — Determines the settings used in the dbt Cloud IDE or dbt Cloud CLI, for that particular project. -Each dbt Cloud project can only have a single development environment but can have any number of deployment environments. +Each dbt Cloud project can only have a single development environment, but can have any number of General deployment environments, one Production deployment environment and one Staging deployment environment. 
-|| Development | Staging | Deployment |
-|------| --- | --- | --- |
-| **Determines settings for** | dbt Cloud IDE or dbt Cloud CLI | dbt Cloud Job runs | dbt Cloud Job runs |
-| **How many can I have in my project?** | 1 | Any number | Any number |
+| | Development | General | Production | Staging |
+|----------|-------------|---------|------------|---------|
+| **Determines settings for** | dbt Cloud IDE or dbt Cloud CLI | dbt Cloud Job runs | dbt Cloud Job runs | dbt Cloud Job runs |
+| **How many can I have in my project?** | 1 | Any number | 1 | 1 |
:::note
For users familiar with development on dbt Core, each environment is roughly analogous to an entry in your `profiles.yml` file, with some additional information about your repository to ensure the proper version of code is executed. More info on dbt core environments [here](/docs/core/dbt-core-environments).
@@ -25,11 +25,12 @@ Both development and deployment environments have a section called **General Set
| Setting | Example Value | Definition | Accepted Values |
| --- | --- | --- | --- |
-| Name | Production | The environment name | Any string! |
-| Environment Type | Deployment | The type of environment | [Deployment, Development] |
-| dbt Version | 1.4 (latest) | The dbt version used | Any dbt version in the dropdown |
-| Default to Custom Branch | ☑️ | Determines whether to use a branch other than the repository’s default | See below |
-| Custom Branch | dev | Custom Branch name | See below |
+| Environment name | Production | The environment name | Any string! |
+| Environment type | Deployment | The type of environment | Deployment, Development |
+| Set deployment type | PROD | Designates the deployment environment type. | Production, Staging, General |
+| dbt version | Latest | dbt Cloud automatically upgrades the dbt version running in this environment, based on the [release track](/docs/dbt-versions/cloud-release-tracks) you select. | Latest, Compatible, Extended |
+| Only run on a custom branch | ☑️ | Determines whether to use a branch other than the repository’s default | See below |
+| Custom branch | dev | Custom branch name | See below |
:::note About dbt version
diff --git a/website/snippets/_enterprise-permissions-table.md b/website/snippets/_enterprise-permissions-table.md
index b39337697c1..4104759b24d 100644
--- a/website/snippets/_enterprise-permissions-table.md
+++ b/website/snippets/_enterprise-permissions-table.md
@@ -19,7 +19,7 @@ Key:
{`
| Account-level permission| Account Admin | Billing admin | Manage marketplace apps | Project creator | Security admin | Viewer |
|:-------------------------|:-------------:|:------------:|:-------------------------:|:---------------:|:--------------:|:------:|
-| Account settings | W | - | - | R | R | R |
+| Account settings* | W | - | - | R | R | R |
| Audit logs | R | - | - | - | R | R |
| Auth provider | W | - | - | - | W | R |
| Billing | W | W | - | - | - | R |
@@ -38,6 +38,9 @@ Key:
+\* Roles with write (**W**) access to Account settings can modify account-level settings, including [setting up Slack notifications](/docs/deploy/job-notifications#slack-notifications). 
+ + #### Project permissions for account roles diff --git a/website/snippets/_git-providers-supporting-ci.md b/website/snippets/_git-providers-supporting-ci.md new file mode 100644 index 00000000000..34bd87db2fc --- /dev/null +++ b/website/snippets/_git-providers-supporting-ci.md @@ -0,0 +1,15 @@ +## Availability of features by Git provider + +- If your git provider has a [native dbt Cloud integration](/docs/cloud/git/git-configuration-in-dbt-cloud), you can seamlessly set up [continuous integration (CI)](/docs/deploy/ci-jobs) jobs directly within dbt Cloud. + +- For providers without native integration, you can still use the [Git clone method](/docs/cloud/git/import-a-project-by-git-url) to import your git URL and leverage the [dbt Cloud Administrative API](/docs/dbt-cloud-apis/admin-cloud-api) to trigger a CI job to run. + +The following table outlines the available integration options and their corresponding capabilities. + +| **Git provider** | **Native dbt Cloud integration** | **Automated CI job**|**Git clone**| **Information**| +| -----------------| ---------------------------------| -------------------------------------------|-----------------------|---------| +|[Azure DevOps](/docs/cloud/git/setup-azure)
| ✅ | ✅ | ✅ | Organizations on the Team and Developer plans can connect to Azure DevOps using a deploy key. Note that you won’t be able to configure automated CI jobs, but you can still develop.|
+|[GitHub](/docs/cloud/git/connect-github)
| ✅ | ✅ | ✅ | +|[GitLab](/docs/cloud/git/connect-gitlab)
| ✅ | ✅ | ✅ | +|All other git providers using [Git clone](/docs/cloud/git/import-a-project-by-git-url) ([BitBucket](/docs/cloud/git/import-a-project-by-git-url#bitbucket), [AWS CodeCommit](/docs/cloud/git/import-a-project-by-git-url#aws-codecommit), and others)| ❌ | ❌ | ✅ | Refer to the [Customizing CI/CD with custom pipelines](/guides/custom-cicd-pipelines?step=1) guide to set up continuous integration and continuous deployment (CI/CD).| + diff --git a/website/snippets/cloud-feature-parity.md b/website/snippets/cloud-feature-parity.md index 1107b999c14..f71109292d7 100644 --- a/website/snippets/cloud-feature-parity.md +++ b/website/snippets/cloud-feature-parity.md @@ -1,6 +1,6 @@ The following table outlines which dbt Cloud features are supported on the different SaaS options available today. For more information about feature availability, please [contact us](https://www.getdbt.com/contact/). -| Feature | AWS Multi-tenant | AWS single tenant |Azure multi-tenant ([Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud)) | Azure single tenant | +| Feature | AWS Multi-tenant | AWS single tenant |Azure multi-tenant | Azure single tenant | |-------------------------------|------------------|-----------------------|---------------------|---------------------| | Audit logs | ✅ | ✅ | ✅ | ✅ | | Continuous integration jobs | ✅ | ✅ | ✅ | ✅ | diff --git a/website/static/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png b/website/static/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png new file mode 100644 index 00000000000..29f64cc59f7 Binary files /dev/null and b/website/static/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png differ diff --git a/website/static/img/blog/authors/mwan.png b/website/static/img/blog/authors/mwan.png new file mode 100644 index 00000000000..ac852ee3636 Binary files /dev/null and b/website/static/img/blog/authors/mwan.png differ diff --git a/website/static/img/docs/cloud-integrations/assign-app-to-members.png b/website/static/img/docs/cloud-integrations/assign-app-to-members.png new file mode 100644 index 00000000000..dac1b415d30 Binary files /dev/null and b/website/static/img/docs/cloud-integrations/assign-app-to-members.png differ diff --git a/website/static/img/docs/cloud-integrations/azure-subscription.png b/website/static/img/docs/cloud-integrations/azure-subscription.png new file mode 100644 index 00000000000..19f19dc2814 Binary files /dev/null and b/website/static/img/docs/cloud-integrations/azure-subscription.png differ diff --git a/website/static/img/docs/cloud-integrations/create-service-principal.png b/website/static/img/docs/cloud-integrations/create-service-principal.png new file mode 100644 index 00000000000..a072c92b3ef Binary files /dev/null and b/website/static/img/docs/cloud-integrations/create-service-principal.png differ diff --git a/website/static/img/docs/cloud-integrations/review-and-assign.png b/website/static/img/docs/cloud-integrations/review-and-assign.png new file mode 100644 index 00000000000..570717daeda Binary files /dev/null and b/website/static/img/docs/cloud-integrations/review-and-assign.png differ diff --git a/website/static/img/docs/cloud-integrations/service-principal-fields.png b/website/static/img/docs/cloud-integrations/service-principal-fields.png new file mode 100644 index 00000000000..eb391ab122d Binary files /dev/null and b/website/static/img/docs/cloud-integrations/service-principal-fields.png differ diff 
--git a/website/static/img/docs/dbt-cloud/access-control/azure-enable.png b/website/static/img/docs/dbt-cloud/access-control/azure-enable.png index 8d95a5cb9fe..7f79bcb3c7c 100644 Binary files a/website/static/img/docs/dbt-cloud/access-control/azure-enable.png and b/website/static/img/docs/dbt-cloud/access-control/azure-enable.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/edit-entra-saml.png b/website/static/img/docs/dbt-cloud/access-control/edit-entra-saml.png new file mode 100644 index 00000000000..ceda1ee0bcc Binary files /dev/null and b/website/static/img/docs/dbt-cloud/access-control/edit-entra-saml.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/entra-id-saml.png b/website/static/img/docs/dbt-cloud/access-control/entra-id-saml.png new file mode 100644 index 00000000000..01ab65cef27 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/access-control/entra-id-saml.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/google-enable.png b/website/static/img/docs/dbt-cloud/access-control/google-enable.png index 0c46cac6d6e..a2ffd42fb50 100644 Binary files a/website/static/img/docs/dbt-cloud/access-control/google-enable.png and b/website/static/img/docs/dbt-cloud/access-control/google-enable.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png b/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png index 7da82285a20..89c246ffc45 100644 Binary files a/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png and b/website/static/img/docs/dbt-cloud/access-control/new-okta-completed.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png b/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png index c7018a64327..342e89ca631 100644 Binary files a/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png and b/website/static/img/docs/dbt-cloud/access-control/new-okta-config.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/saml-enable-entra.png b/website/static/img/docs/dbt-cloud/access-control/saml-enable-entra.png new file mode 100644 index 00000000000..e0a71da007b Binary files /dev/null and b/website/static/img/docs/dbt-cloud/access-control/saml-enable-entra.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/saml-enable.png b/website/static/img/docs/dbt-cloud/access-control/saml-enable.png index a165a3ee59b..212afeb7fef 100644 Binary files a/website/static/img/docs/dbt-cloud/access-control/saml-enable.png and b/website/static/img/docs/dbt-cloud/access-control/saml-enable.png differ diff --git a/website/static/img/docs/dbt-cloud/access-control/sso-uri.png b/website/static/img/docs/dbt-cloud/access-control/sso-uri.png index c557b903e57..87787184974 100644 Binary files a/website/static/img/docs/dbt-cloud/access-control/sso-uri.png and b/website/static/img/docs/dbt-cloud/access-control/sso-uri.png differ diff --git a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png index 3773d468d6a..4b3c64a7b32 100644 Binary files a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png and b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png differ diff --git 
a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-verify-overridden-version.png b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-verify-overridden-version.png deleted file mode 100644 index a6e553a0b2e..00000000000 Binary files a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-verify-overridden-version.png and /dev/null differ diff --git a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png index e17e1eb471f..4ceeb564576 100644 Binary files a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png and b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png differ diff --git a/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png b/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png index 01536bab17f..a921c8544b5 100644 Binary files a/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png and b/website/static/img/docs/dbt-cloud/connecting-azure-devops/AD app.png differ diff --git a/website/static/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png b/website/static/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png new file mode 100644 index 00000000000..7b9065df74d Binary files /dev/null and b/website/static/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png differ diff --git a/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png b/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png index b8b11f6ea00..7494972d4f6 100644 Binary files a/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png and b/website/static/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png differ diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png new file mode 100644 index 00000000000..80d36bcaaa0 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png differ diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png new file mode 100644 index 00000000000..54263d092b5 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/light-vs-dark.png differ diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg b/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg deleted file mode 100644 index 04ec9280f14..00000000000 Binary files a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg and /dev/null differ diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.png deleted file mode 100644 index 5f75707090c..00000000000 Binary files a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.png and /dev/null differ diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/theme-selection-in-the-ide.png b/website/static/img/docs/dbt-cloud/using-dbt-cloud/theme-selection-in-the-ide.png new 
file mode 100644 index 00000000000..cdb85349153 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/theme-selection-in-the-ide.png differ diff --git a/website/vercel.json b/website/vercel.json index 993ff9065bd..927b7ec6b2f 100644 --- a/website/vercel.json +++ b/website/vercel.json @@ -2,6 +2,11 @@ "cleanUrls": true, "trailingSlash": false, "redirects": [ + { + "source": "/docs/cloud/about-cloud/dark-mode", + "destination": "/docs/cloud/about-cloud/change-your-dbt-cloud-theme", + "permanent": true + }, { "source": "/docs/collaborate/git/managed-repository", "destination": "/docs/cloud/git/managed-repository", @@ -3631,13 +3636,28 @@ "destination": "https://www.getdbt.com/blog/guide-to-surrogate-key", "permanent": true }, + { + "source": "/terms/elt", + "destination": "https://www.getdbt.com/blog/extract-load-transform", + "permanent": true + }, + { + "source": "/terms/etl", + "destination": "https://www.getdbt.com/blog/extract-transform-load", + "permanent": true + }, + { + "source": "/terms/reverse-etl", + "destination": "https://www.getdbt.com/blog/reverse-etl-playbook", + "permanent": true + }, { "source": "/glossary", "destination": "https://www.getdbt.com/blog", "permanent": true }, { - "source": "/terms/:path((?!elt|etl|reverse-etl).*)", + "source": "/terms/:path*", "destination": "https://www.getdbt.com/blog", "permanent": true }