Support Materialized Views #1162
Looks like this was added in 0.13.1.
@clausherther not so! #1432 fixed a problem where dbt couldn't run if a materialized view lived in the dbt schema. dbt still does not support the creation of materialized views on Snowflake, though it is something I've been experimenting with recently. In your mind, what's the advantage of using a materialized view over a dbt table model?
Ah, I misread this in the release notes: "Add "MaterializedView" relation type to the Snowflake adapter". Personally, I don't think I have a good use case for MVs in Snowflake; that need has largely been replaced by dbt. I think I opened the issue alongside the one for transient tables, to keep track of new Snowflake features you may potentially want to support.
Alright, thanks for the additional context! Happy to leave this open to track the Snowflake feature, but I also welcome input from anyone who 👍'd the issue to chime in with some use cases!
I'm trying to think about how we would deal with MViews in dbt. You don't really want to drop and recreate them every time, because then there would be a lot of extra time and cost from Snowflake bringing them back up to speed with the underlying data. They do act like views when they are "stale" or not completely refreshed, so at least there wouldn't be any problems with how they behave if they are queried before being completely rebuilt.
Perhaps they would be like incremental tables, where they are only re-created when a) they don't exist or b) a full refresh is forced? There is also the potential for a lot of run-time errors with MViews in Snowflake, because they are limited in what they support from a SQL perspective. My concern is that it would be pretty difficult for dbt to parse and do a compile-time validation of what is not legitimate for an MView, which would in many cases lead to dbt runs failing: the MView compiles, but then fails creation or update at run-time. The list of restrictions is too long for me to write here, but see the link above.
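For illustration, a minimal sketch of what that incremental-style treatment could look like on Snowflake (object names are made up; this is not an official dbt implementation):

```sql
-- First run, or an explicit full refresh: (re)build the materialized view.
-- Snowflake keeps it up to date afterwards, so routine runs can leave it alone.
CREATE OR REPLACE MATERIALIZED VIEW analytics.order_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM raw.orders
GROUP BY customer_id;

-- Subsequent runs: no-op. Dropping and recreating the MV would only force
-- Snowflake to re-materialize from scratch and incur extra compute cost.
```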
@drewbanin materialized views are useful for:

```sql
CREATE MATERIALIZED VIEW cdc_highwater
AS
(
    SELECT primary_key, column_name, MAX(wal_ts) AS wal_ts
    FROM cdc_table
    GROUP BY primary_key, column_name
)
```

to accelerate consolidation into current-state tables like:

```sql
SELECT primary_key
     , DECODE(column_name, 'col1', value) AS col1
     , DECODE(column_name, 'col2', value) AS col2
     , ...
     , DECODE(column_name, 'colN', value) AS colN
FROM cdc_table
JOIN cdc_highwater USING (primary_key, column_name, wal_ts)
```

Frankly, I think only use case 1 is value-generating, since the performance improvements are (allegedly) nontrivial for end users.
Thanks @mike-weinberg! I don't think it would be too challenging to create a materialization for these.
I still think that the list of caveats is too restrictive for most modeling use cases (no window functions, no unions, limited aggregates, can't query views, etc.). Still, happy to support these, as I do think the first use case you've pointed out here @mike-weinberg is good and legitimate. Plus, I imagine that these materialized views will only become more capable over time. If someone is interested in creating this materialization, it should all be pretty doable in user-space! Would love to discuss the implementation further if anyone picks it up :)
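For anyone curious what the user-space route might look like, here's a rough sketch of a custom Snowflake materialization (simplified; a real one would also handle pre-existing relations, full refreshes, grants, and transactions):

```sql
{% materialization materialized_view, adapter='snowflake' %}

  {#- Use the model's target relation as-is; a fuller version would
      also check what already exists in the warehouse. -#}
  {%- set target_relation = this -%}

  {{ run_hooks(pre_hooks) }}

  {#- Build (or replace) the MV from the model's compiled SQL. -#}
  {% call statement('main') -%}
    create or replace materialized view {{ target_relation }} as (
      {{ sql }}
    )
  {%- endcall %}

  {{ run_hooks(post_hooks) }}

  {{ return({'relations': [target_relation]}) }}

{% endmaterialization %}
```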
Redshift is also planning on supporting materialized views (currently in preview): https://docs.aws.amazon.com/redshift/latest/dg/mv.html
It appears that many of the same limitations apply as with Snowflake materialized views. In addition, Redshift appears to keep the underlying materialized data static until a refresh command is issued, at which time it either incrementally updates the data if certain conditions are met, or does a full rebuild. While I agree that there will be limited cases where this makes sense in modeling, surely there are valid and interesting use cases.
Thanks @automagic! I used to be not-so-bullish on materialized views, but the idea of these incremental refreshes is really appealing. One thing I particularly like about them: refreshing can happen concurrently with external queries. When dbt replaces tables, it currently blocks if there are long-running select statements hitting the table. The one challenge here is going to be knowing when dbt should rebuild the MV vs. just refresh it. We can leverage the full-refresh flag here.
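To make the refresh-vs-rebuild distinction concrete, here is roughly how it plays out in Redshift SQL (names are placeholders, not dbt-generated code):

```sql
-- Build once; AUTO REFRESH lets Redshift decide between incremental and full refresh.
CREATE MATERIALIZED VIEW daily_revenue
AUTO REFRESH YES
AS
SELECT order_date, SUM(amount) AS revenue
FROM sales
GROUP BY order_date;

-- "Refresh": cheap, and can run while other sessions keep querying the old data.
REFRESH MATERIALIZED VIEW daily_revenue;

-- "Rebuild": only needed when the defining query itself changes.
DROP MATERIALIZED VIEW daily_revenue;
-- ...followed by CREATE MATERIALIZED VIEW again with the new SQL.
```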
BigQuery has a similar feature, which would definitely be useful: https://cloud.google.com/bigquery/docs/materialized-views-intro
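For reference, a small illustrative example of the BigQuery flavor (dataset and column names are invented):

```sql
CREATE MATERIALIZED VIEW my_dataset.clicks_by_day
OPTIONS (enable_refresh = true, refresh_interval_minutes = 30)
AS
SELECT
  DATE(event_ts) AS event_date,
  COUNT(*) AS clicks
FROM my_dataset.raw_events
GROUP BY event_date;
```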
Even with the limitations native Snowflake MVs have, supporting them in dbt will become more useful once the Transparent MV usage functionality has been released (in private preview at the time of writing). Basically, this is functionality akin to Oracle's QUERY_REWRITE.
Hey @bashyroger, check out:
Since this thread is warming up, let's make it spicy 🌶️! Snowflake recently announced a clever feature they're calling "transparent materialized views". In the future, if a plan node of a query is equivalent to or is a subset of the plan node of a materialized view, then the materialized view will be used, even if it isn't referenced, because it is provably equivalent. So, if you identify common aggregations and build materialized views for those aggregations, then queries with steps equivalent to those materialized views will be rewritten by the planner to use them. Alternatively, you could maintain multiple copies of the same table, but with different clustering keys, and Snowflake will automatically pick the best one. This is analogous to Vertica's concept of projections. Personally, I think this is a big deal, and if Snowflake expands support to include INNER JOIN and UNION ALL clauses, then chains of MVs might become the best way to express many DAGs. Even in the absence of that, the ability to maintain copies of tables with various clustering keys means that MVs could accelerate regular views enough to enable decent-performing, low-latency reporting when combined with a CDC ELT vendor.
EDIT: When I wrote this, I either didn't see the prior two comments, or my page needed to be refreshed. Embarrassing! I'll leave this here though =)
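A toy illustration of the rewrite idea (purely hypothetical names): the second query never references the MV, but its plan is subsumed by the MV's, so the optimizer may serve it from the materialization.

```sql
-- Maintain a common aggregation once:
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- A user query against the base table; with transparent MV usage the planner
-- can answer it from sales_by_region instead of rescanning sales.
SELECT region, SUM(amount) AS total_amount
FROM sales
WHERE region = 'EMEA'
GROUP BY region;
```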
What is the latest state on this one? Is this issue still open?
I also want to know this, as I am going to put Snowflake under source control, and views need to be source-controlled just like tables.
Hey @nareshkumaryadav, this issue is definitely still open.
All of which is to say: I'd find it really valuable to hear more about what you're thinking of doing with MVs in your own project.
@kamleshsinhchauhan It sounds like what you're after may be slightly different: we have long supported views as a standard materialization.
I thought I'd share another use case. We are considering the use of materialized views in Snowflake. Our use case is to load Debezium CDC data into a staging table with a variant column that accumulates the stream. Materialized views would then be created over the staging table to produce a history table for the corresponding source table. We could create the materialized views out of band from dbt and then treat them as dbt sources, but we would prefer to have it all in dbt if possible. It would be great to hear if folks are having success with the experimental implementation, and/or if that is likely to make its way into a dbt release in the near future.
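For what it's worth, a rough sketch of that shape in Snowflake SQL (all object and field names are hypothetical, and the Debezium payload structure is simplified):

```sql
-- Staging table that accumulates the raw CDC stream as VARIANT:
CREATE TABLE IF NOT EXISTS cdc_staging.orders_raw (
    record VARIANT
);

-- Materialized view projecting the change records into a typed history shape:
CREATE MATERIALIZED VIEW history.orders_history AS
SELECT
    record:payload:after:order_id::NUMBER AS order_id,
    record:payload:after:status::STRING   AS status,
    record:payload:op::STRING             AS cdc_op,
    record:payload:ts_ms::NUMBER          AS cdc_ts_ms
FROM cdc_staging.orders_raw;
```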
Not sure I understand why this is an issue? Redshift does support MVs now, including automated refresh, and it internally knows whether they can be incrementally updated or require a full update. The refresh occurs asynchronously (scheduled to avoid conflict with other uses). dbt should simply recognize that when a view is an MV, it needs to be left alone unless the query itself has changed.
Our analytics team is suffering from poor performance of
I didn't know this! I just updated the sample code in dbt-labs/dbt-labs-experimental-features#11 to add an option for it.
The question at this point isn't, "Can you use materialized views with dbt?" You absolutely can, by including some custom macros and materializations in your project. The question is, "Do we have a recommended implementation of MVs we feel great about, such that we want to set it as the standard for all new projects going forward?" Materialized views have come a long way in just the past year, but they still have significant limitations. Particularly on Redshift, MVs do not support late binding, and they can't be identified in
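For anyone wanting to try the user-space route mentioned above, model usage might look something like this, assuming a custom materialization named materialized_view has been added to the project (the config value and the 'stg_orders' ref are illustrative and depend on the macros you install):

```sql
{{ config(
    materialized = 'materialized_view'
) }}

-- Note: on Snowflake an MV can reference only a single table, so the upstream
-- model referenced here must resolve to a table, not a view.
select
    customer_id,
    sum(amount) as total_amount
from {{ ref('stg_orders') }}
group by 1
```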
We support a large number of event types, which we split out into separate tables (one for each event) for a variety of reasons. Currently we keep all of these up to date by running hourly incremental updates. When moving this part of our setup to materialized views:
+1 Would really love to see this implemented. Auto-refreshed materialized views in Redshift are much more performant and efficient than incremental materializations, based on my tests.
+1 for Redshift materialised views built into dbt.
+1 for BigQuery materialised views
+1 for Snowflake materialized views
Materialised views have definite uses in anything latency-sensitive. The warehouse will update the MV on insert into the underlying table rather than waiting for dbt to run. It would be great to create these from dbt and then create references to them for downstream batch dbt runs. Not sure what the plan is with ClickHouse, but it has additional features around MVs, such as the SummingMergeTree and ReplacingMergeTree engines, which can also be really useful to get the semantics you need. Another +1 for this feature, Snowflake primarily.
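To illustrate the ClickHouse angle (table and column names are made up): the MV below is populated at insert time, and the SummingMergeTree target collapses rows by summing the numeric columns.

```sql
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    clicks     UInt64
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Updated on every insert into events; no batch run required.
CREATE MATERIALIZED VIEW clicks_per_user_mv
ENGINE = SummingMergeTree
ORDER BY (event_date, user_id)
AS
SELECT event_date, user_id, clicks
FROM events;
```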
+1 MVs in Snowflake
+1 MVs in Postgres
+1 MVs in Snowflake
+1 for MVs in general, but given this has been open since 2018 and still no MVs... not so hopeful.
At this point we might need a community contributor to write the custom materialization with the logic @drewbanin outlined above. #1162 (comment)
+1 for MVs. Would help solve this issue with cloning: dbt-labs/dbt-snowflake#118
Any update here from the dbt side? This would be very helpful for the performance reasons mentioned many times earlier.
+1, really need this for performance.
+1 MVs in Postgres
I am using PostgreSQL, and when I started using dbt, I really wanted it to support MVs in PostgreSQL. That was when my thinking was to run dbt on demand to create the structure of my data mart. Since then I think I've come more to the "dbt way", where I run dbt on a schedule in order to populate my data mart. In PG, there is no standard way to schedule MV refreshes and no support for log-based refresh of MVs. This says to me that an MV in PG is equivalent to materializing a dbt model as a table and then scheduling a dbt run at the OS level to refresh it. I see the value of MV support for platforms with either internal scheduling (e.g. the Oracle REFRESH clause) or incremental refresh (e.g. the Oracle FAST clause) of MVs based on underlying changes, such as Oracle or, I think, Snowflake. I no longer feel like I need this for PG.
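To spell out the comparison (object names are placeholders): a native Postgres MV only moves when you tell it to, so in practice it behaves like a dbt table model plus an external scheduler.

```sql
-- Native MV: built once, then frozen until an explicit refresh.
CREATE MATERIALIZED VIEW mart.daily_signups AS
SELECT created_at::date AS signup_date, COUNT(*) AS signups
FROM app.users
GROUP BY 1;

-- ...refreshed later from cron (or whatever scheduler you already use):
REFRESH MATERIALIZED VIEW mart.daily_signups;
```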
+1 for ClickHouse MVs!
+1 for MVs in BigQuery and Snowflake
Any updates on this ticket?
Use https://github.com/dbt-labs/dbt-labs-experimental-features/tree/main/materialized-views and give feedback there.
Sharing some early work on integrating dbt, Coral, and Spark to implement materialized views and incremental view maintenance on various engines, starting with Spark. This dbt extension defines a new materialization mode called
Thank you for sharing @wmoustafa! The slides were well-written and explained a couple of big ideas (transpilation via Coral and
Closing in favor of: |
Docs: https://docs.snowflake.net/manuals/sql-reference/sql/create-materialized-view.html
Ideally this would also support the `cluster by` keyword.
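For completeness, an illustrative example of the `cluster by` keyword on a Snowflake MV (table and column names are placeholders):

```sql
CREATE MATERIALIZED VIEW orders_by_customer
  CLUSTER BY (customer_id)
AS
SELECT customer_id, order_id, order_date, amount
FROM orders;
```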