Merge branch 'current' into fix/python-models-dataproc-serverless
mirnawong1 authored Jan 23, 2025
2 parents afc93dd + 28f5b0d commit bf308b1
Showing 110 changed files with 1,809 additions and 1,507 deletions.
4 changes: 2 additions & 2 deletions website/blog/2023-11-14-specify-prod-environment.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,8 @@ is_featured: false

---

:::note You can now use a Staging environment!
This blog post was written before Staging environments. You can now use dbt Cloud can to support the patterns discussed here. Read more about [Staging environments](/docs/deploy/deploy-environments#staging-environment).
:::note You can now specify a Staging environment too!
This blog post was written before dbt Cloud added full support for Staging environments. Now that they exist, you should mark your CI environment as Staging as well. Read more about [Staging environments](/docs/deploy/deploy-environments#staging-environment).
:::

:::tip The Bottom Line:
@@ -0,0 +1,67 @@
---
title: "Why I wish I had a control plane for my renovation"
description: "When I think back to my renovation, I realize how much smoother it would've been if I’d had a control plane for the entire process."
slug: wish-i-had-a-control-plane-for-my-renovation

authors: [mark_wan]

tags: [analytics craft, data_ecosystem]
hide_table_of_contents: false

date: 2025-01-21
is_featured: true
---

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.

<Lightbox src="/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png" width="70%" title="My wife pondering our sanity" />

We had to coordinate multiple elements:

- The **architects**, who designed the layout, interior, and exterior.
- The **architectural plans**, which outlined what the house should look like.
- The **builders**, who executed those plans.
- The **inspectors**, **councils**, and **energy raters**, who checked whether everything met the required standards.

<!--truncate-->

Each piece was critical &mdash; without the plans, there’s no shared vision; without the builders, the plans don’t come to life; and without inspections, mistakes go unnoticed.

But as an inexperienced project manager, I was also the one responsible for stitching everything together:
- Architects handed me detailed plans, and builders asked for clarifications.
- Inspectors flagged issues that were often too late to fix without extra costs or delays.
- On top of all this, I also didn't speak "builder".

So what should have been quick and collaborative conversations turned into drawn-out processes because there was no unified system to keep everyone on the same page.

## In many ways, this mirrors how data pipelines operate

- The **architects** are the engineers &mdash; designing how the pieces fit together.
- The **architectural plans** are your dbt code &mdash; the models, tests, and configurations that define what your data should look like.
- The **builders** are the compute layers (for example, Snowflake, BigQuery, or Databricks) that execute those transformations.
- The **inspectors** are the monitoring tools, which focus on retrospective insights like logs, job performance, and error rates.

Here’s the challenge: monitoring tools, by their nature, look backward. They’re great at telling you what happened, but they don’t help you plan or declare what should happen. And when these roles (plans, execution, and monitoring) are siloed, teams are left trying to manually stitch them together, often wasting time troubleshooting issues or coordinating workflows.

## What makes dbt Cloud different

[dbt Cloud](https://www.getdbt.com/product/dbt-cloud) unifies these perspectives into a single [control plane](https://www.getdbt.com/blog/data-control-plane-introduction), bridging proactive and retrospective capabilities:

- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/syntax#state-selection) of your data before jobs even run &mdash; your architectural plans are baked into the pipeline.
- **Retrospective insights**: dbt Cloud surfaces [job logs](https://docs.getdbt.com/docs/deploy/run-visibility), performance metrics, and test results, providing the same level of insight as traditional monitoring tools.

But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively.

## Why does this matter?

1. **The silo problem**: Many organizations rely on separate tools for transformation and monitoring. This fragmentation creates blind spots, making it harder to identify and resolve issues.
2. **Integrated workflows**: dbt Cloud eliminates these silos by connecting transformation and monitoring logic in one place. It doesn’t just report on what happened; it ties those insights directly to the proactive plans that define your pipeline.
3. **Operational confidence**: With dbt Cloud, you can trust that your data pipelines are not only functional but aligned with your business goals, monitored in real time, and easy to troubleshoot.

## Why I wish I had a control plane for my renovation

When I think back to my renovation, I realize how much smoother it would have been if I’d had a control plane for the entire process. There are firms that specialize in design-and-build projects, with in-house architects, engineers, and contractors. The beauty of these firms is that everything is under one roof, so you know they’re communicating seamlessly.

In my case, though, my architect, builder, and engineer were all completely separate, which meant I was the intermediary. I was the pigeon service shuttling information between them, and it was exhausting. Discussions that should have taken minutes stretched into weeks and sometimes even months because there was no centralized communication.

dbt Cloud is like having that design-and-build firm for your data pipelines. It’s the control plane that unites proactive planning with retrospective monitoring, eliminating silos and inefficiencies. With dbt Cloud, you don’t need to play the role of the pigeon service &mdash; it gives you the visibility, integration, and control you need to manage modern data workflows effortlessly.
8 changes: 8 additions & 0 deletions website/blog/authors.yml
Expand Up @@ -623,3 +623,11 @@ yu_ishikawa:
url: https://www.linkedin.com/in/yuishikawa0301
name: Yu Ishikawa
organization: Ubie
mark_wan:
image_url: /img/blog/authors/mwan.png
job_title: Senior Solutions Architect
links:
- icon: fa-linkedin
url: https://www.linkedin.com/in/markwwan/
name: Mark Wan
organization: dbt Labs
5 changes: 5 additions & 0 deletions website/docs/docs/build/custom-aliases.md
Expand Up @@ -157,3 +157,8 @@ If these models should indeed have the same database identifier, you can work ar

By default, dbt will create versioned models with the alias `<model_name>_v<v>`, where `<v>` is that version's unique identifier. You can customize this behavior just like for non-versioned models by configuring a custom `alias` or re-implementing the `generate_alias_name` macro.
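
The naming rule described above can be sketched in Python. This is only an illustration of the `<model_name>_v<v>` pattern, not dbt's `generate_alias_name` macro itself; the function name `default_alias` and its parameters are hypothetical.

```python
from typing import Optional


def default_alias(
    model_name: str,
    version: Optional[str] = None,
    custom_alias: Optional[str] = None,
) -> str:
    """Illustrative only: mimic the alias pattern described in the docs."""
    # A configured custom alias takes precedence over the default pattern.
    if custom_alias:
        return custom_alias.strip()
    # Versioned models default to <model_name>_v<v>.
    if version is not None:
        return f"{model_name}_v{version}"
    # Non-versioned models default to the model name.
    return model_name


print(default_alias("dim_customers", version="2"))
print(default_alias("dim_customers", version="2", custom_alias="customers_latest"))
```

In a real project, this logic lives in Jinja (the `alias` config or a re-implemented `generate_alias_name` macro), so treat the Python above as a reading aid only.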

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize a dbt model's database, schema, and alias
- [Custom schema](/docs/build/custom-schemas) to learn how to customize a dbt model's schema
- [Custom database](/docs/build/custom-databases) to learn how to customize a dbt model's database
6 changes: 6 additions & 0 deletions website/docs/docs/build/custom-databases.md
Expand Up @@ -98,3 +98,9 @@ See docs on macro `dispatch`: ["Managing different global overrides across packa
### BigQuery

When dbt opens a BigQuery connection, it will do so using the `project_id` defined in your active `profiles.yml` target. This `project_id` will be billed for the queries that are executed in the dbt run, even if some models are configured to be built in other projects.

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize a dbt model's database, schema, and alias
- [Custom schema](/docs/build/custom-schemas) to learn how to customize dbt model schema
- [Custom aliases](/docs/build/custom-aliases) to learn how to customize dbt model alias name
6 changes: 6 additions & 0 deletions website/docs/docs/build/custom-schemas.md
Expand Up @@ -207,3 +207,9 @@ In the `generate_schema_name` macro examples shown in the [built-in alternative
If your schema names are being generated incorrectly, double-check your target name in the relevant environment.

For more information, consult the [managing environments in dbt Core](/docs/core/dbt-core-environments) guide.

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize a dbt model's database, schema, and alias
- [Custom database](/docs/build/custom-databases) to learn how to customize dbt model database
- [Custom aliases](/docs/build/custom-aliases) to learn how to customize dbt model alias name
4 changes: 3 additions & 1 deletion website/docs/docs/build/data-tests.md
Expand Up @@ -73,7 +73,9 @@ having total_amount < 0

The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.

Note, you won't need to include semicolons (;) at the end of the SQL statement in your singular test files as it can cause your test to fail.
Note:
- Omit semicolons (;) at the end of the SQL statement in your singular test files, as they can cause your test to fail.
- Singular tests placed in the tests directory are automatically executed when running `dbt test`. Don't reference singular tests in `model_name.yml`, as they are not treated as generic tests or macros, and doing so will result in an error.

To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content:

24 changes: 10 additions & 14 deletions website/docs/docs/build/enhance-your-code.md
Expand Up @@ -7,21 +7,17 @@ pagination_prev: null

<div className="grid--2-col">

<Card
title="Project variables"
body="Learn how to use project variables to provide data to models for compilation."
link="/docs/build/project-variables"
icon="dbt-bit"/>

<Card
title="Environment variables"
body="Learn how you can use environment variables to customize the behavior of a dbt project."
link="/docs/build/environment-variables"
icon="dbt-bit"/>

</div>
<br />
<div className="grid--2-col">
<Card
title="Hooks and operations"
body="Learn how to use hooks to trigger actions and operations to invoke macros."
link="/docs/build/hooks-operations"
icon="dbt-bit"/>

<Card
title="Packages"
Expand All @@ -30,9 +26,9 @@ pagination_prev: null
icon="dbt-bit"/>

<Card
title="Hooks and operations"
body="Learn how to use hooks to trigger actions and operations to invoke macros."
link="/docs/build/hooks-operations"
icon="dbt-bit"/>
title="Project variables"
body="Learn how to use project variables to provide data to models for compilation."
link="/docs/build/project-variables"
icon="dbt-bit"/>

</div>
</div>
17 changes: 15 additions & 2 deletions website/docs/docs/build/incremental-microbatch.md
Expand Up @@ -29,14 +29,27 @@ Microbatch is an incremental strategy designed for large time-series datasets:

- Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies).

### How microbatch works
## How microbatch works

When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />.

This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills) and concurrently, and to [retry](#retry) them independently.
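
To make the batching idea concrete, here is a minimal Python sketch that splits a time range into day-sized batches, assuming the default `batch_size` of one day. It is not dbt's implementation; the helper `daily_batches` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def daily_batches(start, end):
    """Yield (batch_start, batch_end) pairs covering [start, end), one day each."""
    # Truncate to midnight so every batch has fixed, repeatable bounds.
    cursor = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while cursor < end:
        next_cursor = cursor + timedelta(days=1)
        yield cursor, next_cursor
        cursor = next_cursor


batches = list(
    daily_batches(
        datetime(2024, 9, 1, 13, 30, tzinfo=timezone.utc),
        datetime(2024, 9, 4, tzinfo=timezone.utc),
    )
)

for batch_start, batch_end in batches:
    # Each pair stands in for one bounded query filtered on event_time.
    print(f"event_time >= '{batch_start:%Y-%m-%d}' and event_time < '{batch_end:%Y-%m-%d}'")
```

Because each pair has fixed bounds, any single batch can be rerun or retried without touching the others, which is the property the section describes.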

### Adapter-specific behavior

dbt's microbatch strategy uses the most efficient mechanism available for "full batch" replacement on each adapter. This can vary depending on the adapter:

- `dbt-postgres`: Uses the `merge` strategy, which performs "update" or "insert" operations.
- `dbt-redshift`: Uses the `delete+insert` strategy, which "inserts" or "replaces."
- `dbt-snowflake`: Uses the `delete+insert` strategy, which "inserts" or "replaces."
- `dbt-bigquery`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces."
- `dbt-spark`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces."
- `dbt-databricks`: Uses the `replace_where` strategy, which "inserts" or "replaces."

Check out the [supported incremental strategies by adapter](/docs/build/incremental-strategy#supported-incremental-strategies-by-adapter) for more info.

## Example

A `sessions` model aggregates and enriches data that comes from two other models:
Expand Down Expand Up @@ -170,7 +183,7 @@ customers as (

</Tabs>

dbt will instruct the data platform to take the result of each batch query and insert, update, or replace the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform.
dbt will instruct the data platform to take the result of each batch query and [insert, update, or replace](#adapter-specific-behavior) the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform. For details, see [How microbatch works](#how-microbatch-works).

It does not matter whether the table already contains data for that day. Given the same input data, the resulting table is the same no matter how many times a batch is reprocessed.
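
The idempotency claim can be illustrated with a small in-memory sketch of a "full batch" replacement. It loosely imitates a delete-then-insert approach on a list of rows; the function `replace_batch`, the column names, and the sample data are hypothetical and do not reflect how any adapter actually executes the operation.

```python
from datetime import date


def replace_batch(table, batch_day, new_rows):
    """Return a copy of table with every row for batch_day swapped for new_rows."""
    # "Delete" step: keep only rows that belong to other days.
    kept = [row for row in table if row["day"] != batch_day]
    # "Insert" step: append the freshly computed rows for this batch.
    return kept + list(new_rows)


table = [
    {"day": date(2024, 9, 1), "sessions": 10},
    {"day": date(2024, 9, 2), "sessions": 7},
]
batch_result = [{"day": date(2024, 9, 2), "sessions": 12}]

once = replace_batch(table, date(2024, 9, 2), batch_result)
twice = replace_batch(once, date(2024, 9, 2), batch_result)

# Reprocessing the same batch with the same input leaves the table unchanged.
print(once == twice)
```

Rows for other days are never touched, so replacing one day's batch cannot disturb the rest of the table.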

1 change: 1 addition & 0 deletions website/docs/docs/build/incremental-strategy.md
@@ -1,5 +1,6 @@
---
title: "About incremental strategy"
sidebar_label: "About incremental strategy"
description: "Learn about the various ways (strategies) to implement incremental materializations."
id: "incremental-strategy"
---